NEXTGEN Cassava PhD students Olumide Alabi (IITA) and Ismail Kayondo (NaCRRI) and postdoctoral fellow Dunia Pino del Carpio (Cornell) recently attended a week-long course at Aarhus University, Denmark, titled “Statistical Models for Genomic Predictions in Animals and Plants.” Below is Olumide Alabi’s report on the course:
Day 1: Background on genomic selection; classical MAS; basic prediction using GWAS; environmental effects
The course started in the morning with the theoretical background to the topics listed above. In a hands-on exercise, we simulated the optimization of breeding plans under phenotypic selection and genomic selection with varying population sizes and marker densities. In the afternoon, we were given a real dataset, on which each of us ran a simple GWAS model in R and, after identifying SNPs with significant effects, fitted prediction models from the GWAS results. The last exercise of the day was to fit the GWAS model with environmental effects and compare prediction models under different cross-validation schemes.
- snp_select <- which(gwas_BW_results[, 4] < 2.7e-5)
- lm(dat_train$BW ~ factor(dat_train$Sex) + factor(dat_train$Batch) + geno_train$V2)
Lessons: Drawing conclusions about how predictions change when larger sets of SNPs, selected from the GWAS results at different p-value thresholds, are included in the prediction model.
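The thresholding idea from Day 1 can be sketched in base R as follows. This is only an illustration on simulated data, not the course dataset or the course scripts: the genotypes, the effect sizes, the train/test split and the four thresholds are all assumptions made for the example.

```r
# Simulated stand-in for the course data; every value here is hypothetical.
set.seed(1)
n <- 300; m <- 200
geno <- matrix(rbinom(n * m, 2, 0.3), n, m,
               dimnames = list(NULL, paste0("snp", seq_len(m))))  # 0/1/2 coding
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
y <- as.numeric(1.0 * (sex == "M") +
                geno[, c(5, 50, 120)] %*% c(0.8, -0.7, 0.6) + rnorm(n))
dat <- data.frame(y = y, sex = sex, geno)
train <- 1:200; test <- 201:300

# Single-marker GWAS on the training set: one linear model per SNP,
# with sex as a fixed effect, keeping the p-value of the SNP term.
pvals <- sapply(colnames(geno), function(snp) {
  fit <- lm(reformulate(c("sex", snp), response = "y"), data = dat[train, ])
  summary(fit)$coefficients[snp, "Pr(>|t|)"]
})

# Fit a prediction model with the SNPs passing a threshold and return the
# correlation between predicted and observed phenotypes in the test set.
accuracy_at <- function(threshold) {
  snp_select <- names(pvals)[pvals < threshold]
  fit <- lm(reformulate(c("sex", snp_select), response = "y"),
            data = dat[train, ])
  cor(predict(fit, newdata = dat[test, ]), dat$y[test])
}

thresholds <- c(0.05 / m, 1e-3, 1e-2, 0.05)   # assumed, from strict to liberal
n_selected <- sapply(thresholds, function(t) sum(pvals < t))
accuracy <- sapply(thresholds, accuracy_at)
print(data.frame(threshold = thresholds, n_snps = n_selected,
                 accuracy = accuracy))
```

Relaxing the threshold admits more SNPs into the model; comparing the accuracy column across rows is one way to see whether the extra markers help or mainly add noise.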
Day 2: Whole-genome SNP regression model; introduction to single-trait and multi-trait GBLUP; cross‐validation systems
After a preliminary theoretical discussion of each of the topics listed above, much of the time was devoted to hands-on exercises. The first exercise of the day was random regression in R with the BGLR package. It was noted that BGLR does not accept missing data, so missing genotypes were replaced with the mean genotype (2p, where p is the allele frequency). We installed a DMU R package developed by the Danish group and used it to run multi-trait WGRR and GBLUP. Finally, we carried out a cross-validation exercise with varying k-fold schemes.
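The mean-genotype replacement and the k-fold comparison can be sketched in base R. This is a minimal sketch on simulated data; the ridge regression with a fixed penalty is a simple stand-in for the whole-genome SNP regression, not the BGLR or DMU models used in class, and `lambda`, the fold counts and the helper names are assumptions.

```r
set.seed(2)
n <- 120; m <- 300
X <- matrix(rbinom(n * m, 2, 0.4), n, m)           # simulated 0/1/2 genotypes
y <- as.numeric(X %*% rnorm(m, sd = 0.1) + rnorm(n))
X[sample(length(X), 500)] <- NA                    # introduce missing calls

# Replace each missing genotype with the marker mean, 2p,
# where p is the allele frequency estimated from the observed calls.
p <- colMeans(X, na.rm = TRUE) / 2
miss <- which(is.na(X), arr.ind = TRUE)
X[miss] <- 2 * p[miss[, "col"]]

# Ridge regression on all SNPs (stand-in for the course's WGRR model).
ridge_fit <- function(X, y, lambda) {
  mu <- mean(y); xbar <- colMeans(X)
  Xc <- sweep(X, 2, xbar)
  b <- solve(crossprod(Xc) + diag(lambda, ncol(X)), crossprod(Xc, y - mu))
  list(mu = mu, xbar = xbar, b = b)
}
ridge_predict <- function(fit, Xnew) {
  as.numeric(fit$mu + sweep(Xnew, 2, fit$xbar) %*% fit$b)
}

# k-fold cross-validation: every individual is predicted exactly once
# from a model trained on the other folds.
cv_accuracy <- function(X, y, k, lambda = 100) {
  fold <- sample(rep(seq_len(k), length.out = nrow(X)))
  yhat <- numeric(nrow(X))
  for (f in seq_len(k)) {
    held_out <- fold == f
    fit <- ridge_fit(X[!held_out, , drop = FALSE], y[!held_out], lambda)
    yhat[held_out] <- ridge_predict(fit, X[held_out, , drop = FALSE])
  }
  cor(yhat, y)
}

cv_results <- sapply(c(2, 5, 10), function(k) cv_accuracy(X, y, k))
print(cv_results)
```

With fewer folds each model is trained on less data, so comparing the three values gives a rough feel for how the choice of k affects the estimated accuracy.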
Day 3: Making, scaling and interpreting genomic relationship matrices; single-step GBLUP and scaling G and A
I initially found some aspects of the Day 3 topics and exercises difficult to comprehend; however, the publications provided (VanRaden, 2008; Legarra et al., 2015), additional explanation from the instructor, and interaction with colleagues in the class helped somewhat. Time was dedicated to the single-step approach to genomic evaluation, the compatibility of the G and A matrices, and running single-step in Rdmu using the pedigree file.
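For readers who want to see how a genomic relationship matrix is built, here is a minimal base R sketch of the first method in VanRaden (2008), G = ZZ' / (2 Σ p(1 − p)), where Z holds genotypes centered by twice the allele frequency. The simulated genotypes and the function name `vanraden_G` are assumptions for illustration; this is not the course's Rdmu code.

```r
set.seed(3)
n <- 50; m <- 500
p_true <- runif(m, 0.1, 0.9)                        # hypothetical frequencies
M <- sapply(p_true, function(p) rbinom(n, 2, p))    # n x m matrix, 0/1/2 coding

vanraden_G <- function(M) {
  p <- colMeans(M) / 2               # allele frequency of each marker
  Z <- sweep(M, 2, 2 * p)            # center each marker by its mean, 2p
  tcrossprod(Z) / (2 * sum(p * (1 - p)))   # scale to be comparable with A
}

G <- vanraden_G(M)
print(round(G[1:4, 1:4], 3))
print(mean(diag(G)))   # average diagonal, close to 1 for unrelated, non-inbred individuals
```

The scaling by 2 Σ p(1 − p) is what puts G on roughly the same scale as the pedigree-based A matrix, which matters when the two are combined in a single-step analysis.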
Day 4: Bayesian shrinkage models; Bayesian mixture/variable selection models
The day began with a theoretical explanation of the prior and posterior distributions of the parameters used in the models. We practiced exercises on the mixture model approach and compared different modelling approaches (LASSO, Bayes A, Bayes B) for GWAS and genomic prediction using the BGLR and Rdmu packages. The day has motivated me to read more on Bayesian statistics.
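A comparison of this kind might look like the sketch below, which assumes the BGLR package is installed and uses its model codes "BL" (Bayesian LASSO), "BayesA" and "BayesB". The data are simulated, and the chain length and the three marker effects are arbitrary choices for the example rather than the settings used in class.

```r
if (requireNamespace("BGLR", quietly = TRUE)) {
  set.seed(4)
  n <- 100; m <- 200
  X <- matrix(rbinom(n * m, 2, 0.3), n, m)         # simulated genotypes
  b_true <- numeric(m)
  b_true[c(10, 60, 150)] <- c(1, -1, 0.8)          # three hypothetical QTL
  y <- as.numeric(X %*% b_true + rnorm(n))

  models <- c(LASSO = "BL", BayesA = "BayesA", BayesB = "BayesB")
  fits <- lapply(models, function(model) {
    BGLR::BGLR(y = y,
               ETA = list(list(X = X, model = model)),
               nIter = 1200, burnIn = 200,          # short chain, sketch only
               saveAt = file.path(tempdir(), paste0(model, "_")),
               verbose = FALSE)
  })

  # Posterior mean marker effects, one column per model
  effects <- sapply(fits, function(fm) fm$ETA[[1]]$b)
  print(round(effects[c(10, 60, 150), ], 2))        # the simulated QTL
  print(round(apply(abs(effects), 2, median), 3))   # typical shrunken effect
} else {
  message("BGLR is not installed; skipping the comparison.")
}
```

The models differ only in the prior placed on marker effects, so looking at the estimated effects side by side shows how strongly each prior shrinks small effects relative to large ones.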
Day 5: Relationships in data; genomic feature models; usual SNP QC
One of the fascinating lessons of the day for me was the genomic feature model, which can be fitted with either GBLUP or a Bayesian approach. You can either use a GBLUP model, building G matrices for SNPs from one chromosome versus the other chromosomes, or a Bayesian model that directly models 19 different variances, one for the SNPs on each chromosome.
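The GBLUP version of the idea rests on splitting the markers into two sets and building one relationship matrix per set. A minimal base R sketch, assuming simulated genotypes and a made-up map of three chromosomes (the helper `scaled_G` is hypothetical, not a course function):

```r
set.seed(5)
n <- 40; m <- 600
chr <- rep(1:3, each = 200)                  # hypothetical marker map
M <- matrix(rbinom(n * m, 2, 0.35), n, m)    # simulated 0/1/2 genotypes

# VanRaden-type G for a set of markers; also returns the scaling constant k.
scaled_G <- function(M) {
  p <- colMeans(M) / 2
  Z <- sweep(M, 2, 2 * p)
  k <- 2 * sum(p * (1 - p))
  list(G = tcrossprod(Z) / k, k = k)
}

focal <- 1
g_focal <- scaled_G(M[, chr == focal])   # SNPs on the chromosome of interest
g_rest  <- scaled_G(M[, chr != focal])   # SNPs on all other chromosomes
g_all   <- scaled_G(M)                   # whole-genome G for comparison

# The whole-genome G is the k-weighted average of the two partial matrices.
recombined <- (g_focal$k * g_focal$G + g_rest$k * g_rest$G) /
  (g_focal$k + g_rest$k)
print(all.equal(recombined, g_all$G))
```

In a feature model the two matrices would enter as separate random effects, each with its own variance component, so that the share of genetic variance attributed to the focal chromosome can be estimated.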
- The lessons of the summer course will be very useful for me in the immediate term, as I will hopefully participate fully alongside Marnin and Uche in the NEXTGEN GS Cycle 3 genomic predictions of the IITA GS program.
- Attending this course has filled the gap that the panel pointed out to me during my comprehensive exam: “Assuming all the support and the associated institutions in my program are not there, how will I cope to implement GS on my own in terms of the predictions, marker system management…”.
- Although I cannot claim 100% understanding of all the theories and exercises at once, the interactive nature of the course was of immense help to my comprehension of what I could apply in my current research and future endeavours.
- The concepts learnt in the course will help me in detailing some of the background concepts of several approaches in my final thesis and publication efforts.
- The value of meeting several new people, exchanging research ideas, and exploring parts of Aarhus city in the evenings after class cannot be overemphasized. Although the course was titled “summer course,” it was cold throughout, and we experienced very long days with only about 4 hours of darkness at night.
I thank the NEXTGEN program management for investing in capacity building by giving us the opportunity to attend courses that are relevant to our present research and that prepare us for future research endeavours.