May 13, 2016, Ithaca NY: Olumide Alabi, a NextGen Cassava PhD student with the International Institute of Tropical Agriculture (IITA) in Ibadan, Nigeria, recently visited Jean-Luc Jannink’s laboratory group at Cornell University for training in prediction modeling. Olumide reports on his visit here:

**Date:** 8th March to 6th April, 2016

**Location:** Dr. Jean-Luc Jannink’s research group, Plant Breeding and Genetics department, Bradfield Hall, Cornell University, Ithaca, NY

The purpose of my brief visit to Cornell was to acquire practical skills in genomic prediction modeling. I received hands-on explanations of the prediction modeling processes used in the past and present genomic selection (GS) cycles of the IITA-NextGen Cassava Breeding Project.

My main activities included:

- The prediction modeling for the IITA-Genomic Selection

Marnin Wolfe, a postdoctoral associate at Cornell, guided me from the known to the unknown in genomic prediction, with practical step-by-step activities using the IITA-NextGen cassava dataset. I received concrete training on the single-step model and on its main limitation: it can be computationally intensive with large datasets. I was also trained on the two-step model; construction of the kinship matrix using the “A.mat” function; model.matrix; kin.blup; phenotype dataset curation for prediction modeling; the G-BLUP and RR-BLUP models; the inclusion of multiple random effects using the EMMREML model; and the general theory and coding syntax associated with these models. The newest concepts to me came when I was guided through the IITA Cycle 3 prediction: de-regressed BLUPs, and especially the theory of reliability estimation, prediction error variance (PEV), and how these influence the accuracy of our predictions. Marnin guided me through these concepts both theoretically and practically, with exercises, reading assignments, and brainstorming sessions. To wrap up, I was guided through the entire IITA-GS Cycle 3 prediction model; Marnin provided the code with detailed explanations.
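To make two of these ideas concrete, here is a minimal Python sketch. It is not the R code Marnin provided. It assumes markers coded -1, 0, 1 with no missing values and builds a VanRaden-style additive relationship matrix of the kind “A.mat” is documented to compute, leaving out that function's imputation and filtering options. It also assumes the simplest textbook definitions of reliability (1 - PEV / genetic variance) and of a de-regressed BLUP (BLUP divided by its reliability). The function names and the toy genotypes are hypothetical.

```python
def additive_relationship(markers):
    """Additive relationship matrix from markers coded -1, 0, 1 (no missing data)."""
    n_lines, n_markers = len(markers), len(markers[0])
    # Frequency of the allele counted by the +1 code, one value per marker.
    freqs = [sum(row[k] + 1 for row in markers) / (2 * n_lines)
             for k in range(n_markers)]
    # Scaling by 2 * sum(p * (1 - p)) puts the matrix on the pedigree scale.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic; nothing to scale by")
    # Center each marker column by twice its allele frequency.
    centered = [[row[k] + 1 - 2 * freqs[k] for k in range(n_markers)]
                for row in markers]
    return [[sum(a * b for a, b in zip(row_i, row_j)) / scale
             for row_j in centered] for row_i in centered]


def reliability(pev, genetic_variance):
    """Reliability of a BLUP: 1 - PEV / genetic variance (assumed definition)."""
    return 1 - pev / genetic_variance


def deregress(blup, pev, genetic_variance):
    """De-regressed BLUP: the BLUP divided by its reliability."""
    r2 = reliability(pev, genetic_variance)
    if r2 <= 0:
        raise ValueError("reliability must be positive to de-regress a BLUP")
    return blup / r2


# Toy genotypes: 4 lines x 3 markers (illustrative values only).
toy_markers = [[-1, 0, 1], [1, 0, -1], [0, 1, 1], [1, -1, 0]]
kinship = additive_relationship(toy_markers)

# A BLUP of 3.0 with PEV 0.25 and genetic variance 1.0 (made-up numbers).
toy_deregressed = deregress(3.0, 0.25, 1.0)
```

A line with a high PEV has a low reliability, so its BLUP was shrunk more strongly toward the mean and de-regression scales it back up further. This is one way to see how PEV feeds into the accuracy of a two-step prediction.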

- Fitting the appropriate model for the genetic gain estimation

Estimating the “expected gain” from GS in cassava is not straightforward, because parents are selected on a selection index (S.I) built from the GEBVs of several traits for each individual. Relative to the conventional breeder’s equation, the GS version needs a small adjustment: a selection accuracy factor. To obtain it, we correlated the selection index computed from the GEBVs of the lines (S.I_GEBVs, predicted) with the selection index computed from their BLUPs (S.I_BLUPs, observed). In my brainstorming with Marnin, we came up with the concept highlighted below:

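$r_A = \mathrm{corr}(\mathrm{S.I}_{\mathrm{GEBVs}},\ \mathrm{S.I}_{\mathrm{BLUPs}})$

Where $\mathrm{S.I}_{\mathrm{GEBVs}} = wt_{T1}\,\mathrm{GEBV}_{T1} + wt_{T2}\,\mathrm{GEBV}_{T2} + \dots + wt_{TN}\,\mathrm{GEBV}_{TN}$

$wt_{T}$ = the economic weight used for trait $T$ in the selection index model

$\mathrm{S.I}_{\mathrm{BLUPs}} = wt_{T1}\,\mathrm{BLUP}_{T1} + wt_{T2}\,\mathrm{BLUP}_{T2} + \dots + wt_{TN}\,\mathrm{BLUP}_{TN}$

Hence, $r_A$ can be used as the selection accuracy term in the breeder’s equation when estimating the expected gain.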
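The calculation above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used in the project: the trait names, weights, and GEBV/BLUP values are made-up toy numbers. The form of the breeder's equation used here (gain = selection intensity × accuracy × genetic standard deviation) is the common textbook form and is my assumption, since the exact form is not written out above.

```python
from math import sqrt


def selection_index(trait_values, weights):
    """Weighted sum across traits, one index value per line."""
    return [sum(weights[trait] * values[trait] for trait in weights)
            for values in trait_values]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def expected_gain(intensity, accuracy, genetic_sd):
    """Textbook breeder's equation per cycle (assumed form): i * r_A * sigma_A."""
    return intensity * accuracy * genetic_sd


# Hypothetical economic weights for two traits, T1 and T2.
weights = {"T1": 1.0, "T2": 2.0}

# Toy GEBVs (predicted) and BLUPs (observed) for four lines.
gebvs = [{"T1": 0.4, "T2": -0.2}, {"T1": -0.1, "T2": 0.3},
         {"T1": 0.6, "T2": 0.1}, {"T1": -0.5, "T2": -0.4}]
blups = [{"T1": 0.5, "T2": -0.1}, {"T1": 0.1, "T2": 0.2},
         {"T1": 0.4, "T2": 0.3}, {"T1": -0.7, "T2": -0.2}]

si_gebvs = selection_index(gebvs, weights)
si_blups = selection_index(blups, weights)

# Selection accuracy: correlation between predicted and observed indices.
r_a = pearson(si_gebvs, si_blups)

# Intensity and genetic standard deviation are placeholder inputs.
gain = expected_gain(intensity=1.4, accuracy=r_a, genetic_sd=1.0)
```

Because the same weights are applied to both the GEBVs and the BLUPs, the correlation measures how well the predicted index ranks lines relative to the observed index, and that is the quantity parent selection actually depends on.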

- GWAS exploration on the plant type dataset

Dunia (Research Associate) guided me through genome-wide association studies (GWAS) using the plant type dataset and the associated SNP data. To better handle the categorical nature of the plant type trait (compact = 1, open = 2, umbrella = 3 and cylindrical = 4), Marnin suggested recoding it as binary scores (e.g. compact: 0 = absent, 1 = present) and analysing one plant type at a time as a separate trait. This enabled us to fit a GLIMMIX model, which offers the flexibility of a link function when estimating variance components.
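The recoding step can be illustrated with a short Python sketch. It only shows the one-trait-at-a-time binary coding. The GLIMMIX model itself is not reproduced here, and the function names and example scores are hypothetical.

```python
# Plant type codes as described in the text.
PLANT_TYPES = {1: "compact", 2: "open", 3: "umbrella", 4: "cylindrical"}


def binary_trait(scores, target_code):
    """Recode plant type scores as 1 (present) or 0 (absent) for one type.

    Missing scores (None) are kept as None rather than counted as absent.
    """
    return [None if score is None else int(score == target_code)
            for score in scores]


def one_trait_per_type(scores):
    """Build one binary trait per plant type, keyed by the type name."""
    return {name: binary_trait(scores, code)
            for code, name in PLANT_TYPES.items()}


# Toy plant type scores for six plots, with one missing observation.
toy_scores = [1, 3, 2, None, 1, 4]
binary_traits = one_trait_per_type(toy_scores)
compact_scores = binary_traits["compact"]
```

Each resulting 0/1 vector can then be analysed as its own binary response, and a binary response is the situation in which a model with a link function is appropriate.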

- I participated in the research group and graduate student seminars and symposiums.

**Skills acquired**

With an appropriate dataset available, I can now implement genomic prediction in practice with more confidence. I gained a detailed understanding of the past IITA GS selection cycles and a first-hand understanding of the present Cycle 3 predictions (thanks to Marnin). I also have a clearer idea of several aspects of statistical modeling to include in my thesis report, especially the expected gain estimation concept and some genomic prediction steps.

**Acknowledgement**

My appreciation goes to Dr. Jean-Luc Jannink for the time and audience he gave me while I was in Ithaca, for the update meetings in his office, and for facilitating my visit, among other things.

Many thanks to Marnin for devoting so much time to coaching me; in fact, he was my tutor throughout my time in Ithaca. Dunia did a great job as well, as did my NextGen graduate student colleagues Ugo, Uche, and Alfred. I appreciate Alex of BTI for his kind gestures throughout my stay. I must also mention the logistics support from Dan, Karen, and the team in the IP-CALS office.

I want to thank my supervisors at IITA, Drs. Peter Kulakow and Ismail Rabbi, for granting the support from home that I needed to visit Cornell during this period. Thanks also to Dr. Chiedozie Egesi and Dr. Hale Tufan. My final appreciation goes to the Cornell-NextGen Cassava project for its full support. My regards to all.