NextGen PhD Student Visits Cornell for Training on Prediction Modeling

May 13, 2016, Ithaca NY: Olumide Alabi, NextGen Cassava PhD student with the International Institute for Tropical Agriculture (IITA) in Ibadan, Nigeria, recently visited Jean-Luc Jannink’s laboratory group at Cornell University for training on prediction modeling. Olumide reports on his visit here:

Date: 8th March to 6th April, 2016

Location: Dr. Jean-Luc Jannink’s Research group, Plant breeding and genetics department, Bradfield Hall, Cornell University, Ithaca, NY

Acquiring practical skills in genomic prediction modeling was the basis of my brief visit to Cornell. I received helpful explanations of the prediction modeling processes as they apply to past and present genomic selection cycles being implemented in the IITA-NextGen Cassava Breeding Project.

The major activities included:

  1. The prediction modeling for the IITA-Genomic Selection

Marnin Wolfe, postdoctoral associate at Cornell, guided me from the known in genomic prediction in general to the unknown, with practical step-by-step activities using the IITA-NextGen cassava dataset. I received concrete training on the use of the single-step model and on its limitation: it can be computationally intensive with large datasets. I was also trained on the two-step model; formation of the kinship matrix using the “A.mat” function; model.matrix; kin.blup; phenotype dataset curation for prediction modeling; the G-BLUP model; the RR-BLUP model; the inclusion of multiple random effects in prediction modeling using the EMMREML model; and the general theory and coding syntax associated with these models. One of the newest concepts to me was the IITA Cycle 3 prediction with de-regressed BLUPs, especially the theory of reliability estimation and prediction error variance (PEV), and how these influence the accuracy of our predictions. Marnin guided me well through these concepts both theoretically and practically, with exercises, reading assignments, and brainstorming sessions. To wrap up, I was guided through the entire IITA-GS Cycle 3 prediction model; Marnin provided the code with detailed explanations.
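The link between PEV, reliability, and de-regressed BLUPs can be sketched in a few lines of Python. This is only a minimal illustration, assuming the common definitions reliability = 1 - PEV / genetic variance and de-regressed BLUP = BLUP / reliability. The clone names, BLUPs, PEVs, and genetic variance are made-up values, and the function names are hypothetical; none of this is the IITA dataset or the code Marnin provided.

```python
def reliability(pev, genetic_variance):
    """Reliability of a BLUP, assuming r2 = 1 - PEV / sigma2_g."""
    if genetic_variance <= 0:
        raise ValueError("genetic variance must be positive")
    return 1.0 - pev / genetic_variance


def deregress(blup, rel):
    """Undo the shrinkage of a BLUP by dividing it by its reliability."""
    if rel <= 0:
        raise ValueError("reliability must be positive to de-regress")
    return blup / rel


# Hypothetical inputs: (BLUP, PEV) per clone and an assumed genetic variance.
sigma2_g = 4.0
clones = {"clone_A": (1.2, 1.0), "clone_B": (0.6, 3.0)}

deregressed = {}
for name, (blup, pev) in clones.items():
    r2 = reliability(pev, sigma2_g)
    deregressed[name] = deregress(blup, r2)
    print(f"{name}: reliability={r2:.2f}, de-regressed BLUP={deregressed[name]:.2f}")
```

In this toy example the clone with the larger PEV has the lower reliability, so its BLUP was shrunk more and de-regression scales it back up more strongly.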

  2. Fitting the appropriate model for the genetic gain estimation

Estimating the “expected gain” from GS in cassava is not straightforward, as the selection of parents is based on a selection index built from the GEBVs of several traits for each individual. Compared with the conventional breeder’s equation, gain estimation under GS needs a small adjustment, which is basically the selection accuracy factor in the model. To obtain it, we correlated the S.I_GEBVs (predicted) of lines with the S.I_BLUPs (observed). In my brainstorming with Marnin, we came up with the concept highlighted below:

rA = corr(S.I_GEBVs, S.I_BLUPs)

where S.I_GEBVs = wt_T1 × GEBV_T1 + wt_T2 × GEBV_T2 + … + wt_TN × GEBV_TN

wt_T = the economic weight used for trait T in the selection index model

S.I_BLUPs = wt_T1 × BLUP_T1 + wt_T2 × BLUP_T2 + … + wt_TN × BLUP_TN

Hence, the rA could be appropriately fitted in the breeder’s equation for the expected gain estimation.
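As a rough illustration of this idea, the sketch below builds both selection indices for four lines and two traits, correlates them to get rA, and plugs rA into the breeder's equation written as gain = i × rA × σA. All numbers (GEBVs, BLUPs, economic weights, the selection intensity i, and the additive standard deviation σA) are invented for the example, and the function names are hypothetical.

```python
import math


def selection_index(trait_values, weights):
    """Per-line weighted sum over traits: SI = sum over T of wt_T * value_T."""
    n_lines = len(next(iter(trait_values.values())))
    return [
        sum(weights[t] * trait_values[t][k] for t in weights)
        for k in range(n_lines)
    ]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def expected_gain(intensity, accuracy, sigma_a):
    """Breeder's equation with an accuracy term: R = i * rA * sigma_A."""
    return intensity * accuracy * sigma_a


# Hypothetical economic weights and per-trait values for four lines.
weights = {"T1": 2.0, "T2": 1.0}
gebvs = {"T1": [1.0, 0.5, -0.5, -1.0], "T2": [0.2, -0.1, 0.3, -0.4]}
blups = {"T1": [1.2, 0.3, -0.2, -1.3], "T2": [0.1, 0.0, 0.2, -0.3]}

si_gebv = selection_index(gebvs, weights)   # predicted index
si_blup = selection_index(blups, weights)   # observed index
r_a = pearson(si_gebv, si_blup)             # selection accuracy rA

# Assumed selection intensity and additive SD of the index.
gain = expected_gain(intensity=1.4, accuracy=r_a, sigma_a=1.0)
print(f"rA = {r_a:.3f}, expected gain = {gain:.3f}")
```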

  3. GWAS exploration on the plant type dataset

Dunia (Research Associate) guided me through GWAS using datasets on plant type and the associated SNP data. To better handle the categorical nature of the plant type trait (compact_1, open_2, umbrella_3 and cylindrical_4), Marnin suggested recoding it as binary scores (e.g. compact: 0_absent, 1_present), so that each category is analyzed as a separate trait. This enabled us to fit a GLIMMIX model, with the flexibility of a link function, for variance components.
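The recoding step can be sketched as follows: each plant type category becomes its own 0/1 trait. This is an illustrative Python snippet with invented scores, not the script used during the visit. The function name and the handling of missing scores (kept as missing in every binary trait) are assumptions.

```python
# Score-to-label mapping taken from the plant type categories above.
PLANT_TYPE_LABELS = {1: "compact", 2: "open", 3: "umbrella", 4: "cylindrical"}


def to_binary_traits(scores):
    """Return one 0/1 column per plant type; missing scores (None) stay None."""
    columns = {label: [] for label in PLANT_TYPE_LABELS.values()}
    for score in scores:
        if score is not None and score not in PLANT_TYPE_LABELS:
            raise ValueError(f"unknown plant type score: {score!r}")
        for code, label in PLANT_TYPE_LABELS.items():
            if score is None:
                columns[label].append(None)
            else:
                columns[label].append(1 if score == code else 0)
    return columns


# Hypothetical plant type scores for six clones (None = not scored).
scores = [1, 3, 2, None, 4, 1]
binary = to_binary_traits(scores)
for label, column in binary.items():
    print(label, column)
```

Each resulting column can then be analyzed as a separate binary trait, one at a time.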

  4. I participated in the research group and graduate student seminars and symposiums.

Skills acquired

I can now implement genomic prediction with more confidence, given an appropriate dataset. I gained a detailed understanding of the past IITA GS cycles and a first-hand understanding of the present Cycle 3 predictions (thanks to Marnin). I also got a better idea of several aspects of statistical modeling to include in my thesis report, especially the expected gain estimation concept and some genomic prediction steps.

Acknowledgement

My appreciation goes to Dr. Jean-Luc Jannink for the time and audience given to me while I was in Ithaca, the update meetings in his office, and the facilitation of my visit, among other things.

Many thanks to Marnin for devoting so much time to coaching me. In fact, he was my tutor throughout the period I was in Ithaca. Dunia did a great job, as did my NextGen graduate student colleagues, Ugo, Uche, and Alfred. Alex of BTI is appreciated for his kind gestures all through my time in Ithaca. I must also mention the logistics support from Dan, Karen and the team in the IP-CALS office.

I want to thank my supervisors at IITA, Drs. Peter Kulakow and Ismail Rabbi, for granting the home support needed to visit Cornell during this period. Thanks to Dr. Chiedozie Egesi and Dr. Hale Tufan. My final appreciation goes to the Cornell-NextGen Cassava project for the full support. My regards to all.

Olumide during his training at Cornell and with Marnin Wolfe, bottom left

NEXTGEN PhD Student Roberto Lozano Attends Cold Spring Harbor Laboratory Course

NEXTGEN Cassava PhD student Roberto Lozano recently attended a two-week course on Statistical Methods for Functional Genomics at Cold Spring Harbor Laboratory (CSHL), and he reports on it here:

CSHL is considered among the leading research institutions in the world in molecular biology and genetics, not only because of its history (a considerably long list of Nobel laureates) but also because of the current research taking place there.

Part of my research as a graduate student is focused on using high-throughput genomic data to identify functional regions across the cassava genome and try to use this information to improve Cassava GS-assisted breeding. Some of the high-throughput genomic data will come from transcriptome sequencing, chromatin footprinting and methylation profiling analysis.

Statistical Methods for Functional Genomics course attendees

High-throughput sequencing has become a major technique in biological research. However, analyzing the big data sets these technologies produce carries challenges that are not always properly tackled, and the resulting errors can threaten the biological inferences that are made. All the techniques that I plan to use in my research carry unique difficulties, and sometimes complex statistical principles underlie their analysis methods. This course tackled all those techniques, and the instructors and speakers have wide experience working with that kind of data. That is what initially drew me to apply for this course.

DNA Sculpture at CSHL

After taking it, I have to admit that it was as good as it could get. All the instructors were great; each of them leads their own top-notch research group, and they were really helpful and resourceful. The invited speakers were great as well, showing some of the latest techniques and applications of next-gen sequencing. The attendees came from all around the world, working in both academia and private companies, and the wide variety of their study fields (cancer, neurobiology, plant genomics, immunology and more) ensured many interesting discussions. Finally, I have to mention that even the location of the Cold Spring Harbor Laboratory was something else: a beautiful environment that lets people focus on their research.

NEXTGEN Students and Postdoc Attend Genomic Selection Course in Aarhus, Denmark

Ismail Kayondo, Dunia Pino del Carpio, and Olumide Alabi at Aarhus University

NEXTGEN Cassava PhD students Olumide Alabi (IITA) and Ismail Kayondo (NaCRRI) and postdoctoral fellow Dunia Pino del Carpio (Cornell) recently attended a week-long course at Aarhus University, Denmark, titled “Statistical Models for Genomic Predictions in Animals and Plants.” Below is Olumide Alabi’s report on the course:

Day 1: Background on genomic selection; classical MAS; basic prediction using GWAS; environmental effects
The course started in the morning with a theoretical background and introduction to the topics stated above. A hands-on exercise on the optimization of breeding plans using phenotypic selection and genomic selection was simulated with varying population sizes and marker densities. In the afternoon, a real dataset was provided to work with, from which we individually ran a simple GWAS model using R and later fitted prediction models from the GWAS result after we had identified significant SNP effects. The last exercise of the day was fitting the GWAS model with environmental effects and comparing prediction models using different cross validation schemes.

  • snp_select = which(gwas_BW_results[,4] < 2.7e-5)
  • lm(dat_train$BW ~factor(dat_train$Sex)+factor(dat_train$Batch)+geno_train$V2)

Lessons: We drew conclusions by computing predictions from larger sets of SNPs, chosen by applying different p-value thresholds to the GWAS results obtained.
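To make the thresholding step concrete, here is a small Python sketch of selecting SNPs at several p-value cutoffs, mirroring the `which(... < 2.7e-5)` line above. The p-values and the two looser thresholds are invented for illustration, and `select_snps` is a hypothetical helper. Note that Python indices start at 0, whereas R's `which` returns 1-based positions.

```python
def select_snps(p_values, threshold):
    """Return 0-based indices of SNPs whose p-value is strictly below threshold."""
    return [i for i, p in enumerate(p_values) if p < threshold]


# Hypothetical GWAS p-values for six SNPs.
p_values = [3.0e-7, 0.42, 1.5e-5, 4.0e-4, 0.03, 9.0e-6]

# The first threshold is the one from the course exercise; the others are assumed.
thresholds = [2.7e-5, 1e-3, 0.05]

selected = {t: select_snps(p_values, t) for t in thresholds}
for t in thresholds:
    print(f"p < {t:g}: {len(selected[t])} SNPs -> {selected[t]}")
```

Looser thresholds admit more SNPs into the prediction model, which is the trade-off the exercise asked us to compare.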

Day 2: Whole-genome SNP regression model; introduction to single-trait and multi-trait GBLUP; cross‐validation systems
After preliminary theoretical discussions of each topic listed above, much time was devoted to hands-on exercises. The first exercise of the day was random regression with the R package BGLR. It was noted that BGLR does not accept missing data, so missing genotypes were replaced with the mean genotype (2p, where p is the allele frequency). A DMU R package developed by the Danish group was installed, and we used it to run multi-trait WGRR and GBLUP. Finally, a cross-validation exercise with varying k-fold schemes was carried out.
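The mean-genotype replacement can be sketched in plain Python as below. It assumes genotypes coded as 0/1/2 copies of one allele, so the mean of the observed values at a marker equals 2p for that allele. The small matrix and the function name are made up for the example.

```python
def impute_with_mean(genotypes):
    """Replace missing calls (None) with the marker mean of observed genotypes.

    genotypes: list of rows (individuals), each a list of 0/1/2 calls or None.
    Returns the imputed matrix and the per-marker means (each equal to 2p).
    """
    n_markers = len(genotypes[0])
    means = []
    for j in range(n_markers):
        observed = [row[j] for row in genotypes if row[j] is not None]
        if not observed:
            raise ValueError(f"marker {j} has no observed genotypes")
        means.append(sum(observed) / len(observed))
    imputed = [
        [means[j] if row[j] is None else row[j] for j in range(n_markers)]
        for row in genotypes
    ]
    return imputed, means


# Hypothetical 5 individuals x 2 markers, with missing calls as None.
genotypes = [
    [0, 2],
    [2, 2],
    [None, None],
    [1, None],
    [1, 0],
]
imputed, marker_means = impute_with_mean(genotypes)
print("marker means (2p):", marker_means)
print("imputed:", imputed)
```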

Day 3: Making, scaling and interpreting Genomic Relationships matrices; single step GBLUP and scaling G and A
I initially found some aspects of the day 3 topics and exercises difficult to comprehend; however, the assigned publications (VanRaden, 2008; Legarra et al., 2015), additional explanation by the instructor, and interaction with colleagues in the class helped somewhat. Time was dedicated to the single-step approach for genomic evaluation, the compatibility of the G and A matrices, and single-step analysis in Rdmu using the pedigree file.
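For readers new to genomic relationship matrices, the following Python sketch shows one widely used construction, the first method in VanRaden (2008): G = ZZ' / (2 Σ p(1 - p)), where Z is the 0/1/2 genotype matrix centered by 2p at each marker. I am assuming this is the construction meant here. The tiny genotype matrix and the function name are invented, and the allele frequencies are estimated from the sample itself.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix G = ZZ' / (2 * sum(p * (1 - p))).

    genotypes: complete matrix (no missing calls) of 0/1/2 allele counts,
    rows = individuals, columns = markers.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency per marker, estimated from the sample.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic; G is undefined")
    # Center each marker by its mean genotype 2p.
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


# Hypothetical 4 individuals x 3 markers.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 1, 1],
    [1, 1, 1],
]
G = vanraden_g(genotypes)
for row in G:
    print([round(v, 3) for v in row])
```

The scaling by 2 Σ p(1 - p) is what puts G on a scale comparable to the pedigree-based A matrix, which matters when the two are combined in single-step GBLUP.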

Day 4: Bayesian shrinkage models; Bayesian mixture/variable selection models
There was a theoretical explanation of the posterior and prior distributions of the parameters used in the modeling. We practiced exercises on the mixture model approach and compared different model approaches for GWAS and genomic prediction (LASSO, Bayes A, Bayes B) using the BGLR and Rdmu packages. My personal motivation is to read more on Bayesian statistics.

Day 5: Relationships in data; genomic feature models; usual SNP QC
One of the fascinating lessons of the day for me was the genomic feature model, using GBLUP models and the Bayesian approach. You can either use a GBLUP model, building G-matrices for SNPs from one chromosome versus the other chromosomes, or a Bayesian model that directly models 19 different variances for the SNPs in each chromosome.

General comments

Olumide Alabi

  • The lessons of the summer course will be very useful for me in the immediate term, as I will hopefully participate fully alongside Marnin and Uche in the NEXTGEN GS Cycle 3 genomic predictions of the IITA GS program.
  • Attending this course has filled the gap the panel pointed out to me during my comprehensive exam: “Assuming all the support and the associated institutions in my program are not there, how will I cope to implement GS on my own in terms of the predictions, marker system management…”.
  • Although I cannot claim 100% understanding of all the theories and exercises at once, the interactive nature of the course was of immense help to my comprehension of what I could apply in my current research and future endeavours.
  • The concepts learnt in the course will help me in detailing some of the background concepts of several approaches in my final thesis and publication efforts.
  • Meeting several new persons, the exchange of research efforts, and the adventure of getting around some part of Aarhus city after class in the evening time cannot be overemphasized. Although the course was titled “summer course,” it was cold all through, coupled with the experience of very long day hours and short night darkness (~ 4 hours).

Acknowledgement
I acknowledge the NEXTGEN program management for the capacity-building investment by giving us the opportunity of attending courses that are of relevance to our present research efforts and preparing us for future research endeavours.