# Supplemental Material for Rogers et al., 2020

posted on 16.03.2021, 19:22 by Anna R. Rogers, Jeffrey C. Dunne, Cinta Romay, Martin Bohn, Edward S. Buckler, Ignacio A. Ciampitti, Jode Edwards, David Ertl, Sherry Flint-Garcia, Michael A. Gore, Christopher Graham, Candice N. Hirsch, Elizabeth Hood, David C. Hooker, Joseph Knoll, Elizabeth C. Lee, Aaron Lorenz, Jonathan P. Lynch, John McKay, Stephen P. Moose, Seth C. Murray, Rebecca Nelson, Torbert Rocheford, James C Schnable, Patrick S Schnable, Rajandeep Sekhon, Maninder Singh, Margaret Smith, Nathan Springer, Kurt Thelen, Peter Thomison, Addie Thompson, Mitch Tuinstra, Jason Wallace, Randall J. Wisser, Wenwei Xu, A.R. Gilmour, Shawn M. Kaeppler, Natalia De Leon, James B. Holland

Supplementary Figures:

Figure S1. A Matrix Heatmap

Heatmap of the A Matrix demonstrating relatedness between individuals.

Figure S2. Correlation between A and D matrices

Scatterplot of the off-diagonal values of the A matrix against those of the D matrix, showing the correlation between them.
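
As a rough illustration of the comparison plotted in Figure S2, the sketch below pulls the off-diagonal elements from two small symmetric matrices and computes their Pearson correlation. The 3×3 matrices are made-up toy values, not entries from Files S18 or S19, and the helper names `off_diagonal` and `pearson` are hypothetical.

```python
from statistics import mean

def off_diagonal(matrix):
    """Return the upper-triangle off-diagonal elements of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Toy symmetric matrices (assumed values for illustration only).
A = [[1.0, 0.5, 0.1],
     [0.5, 1.0, 0.3],
     [0.1, 0.3, 1.0]]
D = [[1.0, 0.2, 0.0],
     [0.2, 1.0, 0.1],
     [0.0, 0.1, 1.0]]

r = pearson(off_diagonal(A), off_diagonal(D))
```

Because both matrices are symmetric, only the upper triangle is used so that each pair of individuals is counted once.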

Figure S3. Entry Sharing vs. Kinship between locations

Comparison of entry sharing to mean kinship between Environments.

Figure S4 (a-b). FA Biplot and dendrogram for Environmental clustering

Biplot of the first two factors of the FA model, colored by environmental cluster. Dendrogram of environmental clustering using the same color scheme.

Figure S5. Scree Plot for Env FA clustering

Scree plot for within-between variance ratio from clustering environments based on weather factors

Figure S6. Mean yield vs Yield Factor Loadings/Scores from D×FA(1) Model

Mean yield and mean factor environment loadings or genotype scores on the yield factor from the D×FA(1) model are displayed for the seven environment clusters defined by weather variables and for nine hybrid clusters defined by marker data. A tenth hybrid cluster (Ontario) was not included because its yield performance was observed 10 or fewer times in four of the environment clusters.

Figure S7. Stepwise Regression related to Yield

First ten weather variables selected in the stepwise regression model relating D×FA(1) environment loadings to weather variables summarized over 5-day windows. Each variable and mean environment yield are standardized.

Figure S8. Marker Data Pipeline

Workflow for processing genomic marker data from inbred parents to generate hybrid genotypes.

Figure S9. Stand Boxplots by location

Boxplot of stand for each environment. Outliers detected using the IQR and mean-percentile criteria are colored pink.
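
For readers unfamiliar with IQR-based outlier flagging, a minimal sketch follows. It assumes the conventional 1.5 × IQR fences; the multiplier actually used by the authors is not stated here, and the mean-percentile criterion is not reproduced because its definition is given only in File S1. The function name `iqr_outliers` and the stand counts are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the common boxplot convention (an assumption here).
    Returns one boolean per input value, in the original order.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

# Hypothetical stand counts for the plots of one environment.
stand = [60, 62, 58, 61, 59, 63, 20]
flags = iqr_outliers(stand)
```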

Supplementary Tables:

Table S1. Env -> Site year pairings

Table matching each Environment name used for analysis to its site and year.

Table S2. Stage 1 Covariates.

Experimental design covariates and the years for which each could be fitted, along with the blocking factors and residual variance models tested for each year in the stage 1 analysis.

Table S3. Stage 1 Within-Environment Selected Model Information.

Model chosen for each trait-environment combination, including the fixed effects, random effects, and residual variance structure of the model selected using BIC. In addition, the error variance from the model and heritabilities on both a mean and a plot basis are given.
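
Selecting among candidate models by BIC can be sketched as below, using the standard definition BIC = k·ln(n) − 2·ln(L), where lower is better. The candidate model names, log-likelihoods, and parameter counts are invented for illustration and are not results from Table S3; mixed-model software may also count parameters and observations differently under REML, so this is a simplified view rather than the procedure in File S6.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical candidate models for one trait in one environment.
candidates = {
    "replicate only": {"loglik": -512.4, "n_params": 3},
    "replicate + block": {"loglik": -505.1, "n_params": 4},
    "replicate + block + AR1 residual": {"loglik": -504.8, "n_params": 6},
}
n_obs = 500  # assumed number of plots

scores = {name: bic(m["loglik"], m["n_params"], n_obs)
          for name, m in candidates.items()}
best = min(scores, key=scores.get)
```

In this toy case the richest model improves the likelihood only slightly, so its extra parameters are penalized and the intermediate model is chosen.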

Table S4. Hybrid clusters defined by genotype data, described by pedigrees contributing substantially to the cluster.

Table S5. Pairwise genetic correlations between environments from Yield models fit in Echidna.

Table S6. ANOVA model for Yield using weather-based environment clusters to demonstrate the relationship between environmental data and the environment main effect and G×E variances.

Table S7. Model fit information for stepwise regression models using 5-, 10-, 15-, and 30-day windowed environmental covariates as predictors of parameter estimates from the respective A×FA(1) and D×FA(1) models. The best-fitting model for each parameter by BIC is in bold.

Table S8. Loading terms for the model with best BIC from stepwise regression analysis.

Table S9. Mean and standard deviation of prediction ability for different models to predict hybrid marginal values, based on 10-fold cross-validation.
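
The bookkeeping behind a 10-fold cross-validation summary like Table S9 can be sketched as follows: items are shuffled and dealt into ten folds, a prediction ability is computed for each held-out fold, and the mean and standard deviation across folds are reported. The model fitting itself is omitted, and the hybrid labels, seed, and helper names are hypothetical; how the authors formed their folds is not described here.

```python
import random
from statistics import mean, stdev

def assign_folds(items, k=10, seed=1):
    """Shuffle items and deal them into k folds of near-equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def summarize(per_fold_ability):
    """Mean and sample standard deviation of per-fold prediction abilities."""
    return mean(per_fold_ability), stdev(per_fold_ability)

# Hypothetical hybrid identifiers.
hybrids = [f"H{i:03d}" for i in range(95)]
folds = assign_folds(hybrids)

# Each fold serves once as the validation set; the rest form the training set.
splits = [(sorted(set(hybrids) - set(fold)), fold) for fold in folds]
```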

Supplementary Files:

File S1. Supplemental Methods

Description of the cleaning and exploration of the marker, phenotypic, and weather data. Approximately 10 pages of comprehensive data cleaning information.

File S2. Combining Phenotype Datasets R code

This script combines trait data from the 2014-2016 years of the G2F project and performs quality control checks and filtering.

File S3. G2F Environmental Data Processing Functions.R

This file contains functions for processing G2F weather data from cleaned files. The first function, plant, computes days since planting for each location. The second function, DailyMeans, computes daily values for covariates from data taken throughout a given day at a location, using Julian date format. The third function uses a while loop to allow the user to specify the sliding window size for their environmental data output. Between steps, the functions assume that the user has filled in any missing values using an appropriate source.
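
The three processing steps described above can be illustrated with a short Python sketch. This is not the authors' R code from File S3: the function names, argument shapes, and the choice of consecutive non-overlapping windows are assumptions made for illustration, and missing values are assumed to have been filled in beforehand.

```python
from datetime import date
from statistics import mean

def days_since_planting(obs_date, planting_date):
    """Number of days between planting and an observation date."""
    return (obs_date - planting_date).days

def daily_means(readings):
    """Average sub-daily readings into one value per day.

    readings: iterable of (day, value) pairs, where day is any sortable
    day index (for example, a Julian day of year).
    """
    by_day = {}
    for day, value in readings:
        by_day.setdefault(day, []).append(value)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}

def window_means(daily, window=5):
    """Average an ordered list of daily values over consecutive windows.

    Windows do not overlap (an assumption); an incomplete trailing
    window is dropped.
    """
    out = []
    start = 0
    while start + window <= len(daily):
        out.append(mean(daily[start:start + window]))
        start += window
    return out
```

For example, `window_means(list(range(1, 12)), window=5)` averages days 1-5 and 6-10 and discards the leftover eleventh day.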

File S4. Spec-IEM Correlations

This file contains correlations between data obtained from the in-field Spectrum Watchdog weather stations and data scraped from the Iowa Environmental Mesonet database.

File S5. Weather variables averaged over 5-day periods for each environment. These values are in original units before scaling.

File S6. Stage 1 R Code

This script runs the Stage 1 modeling, fitting multiple model types to account for the experimental design in a given environment. The best model is selected using BIC.

File S7. Hybrid best linear unbiased estimators (BLUEs) for all traits within each environment, based on the selected model

File S8. Echidna Code

Sample Echidna code for models utilizing IDVG Variance Structures

File S9. Echidna Code

Sample Echidna code for models utilizing the A Matrix.

File S10. Echidna Code

Sample Echidna code for models utilizing the D Matrix

File S11. Echidna Code

Sample Echidna code for models utilizing both the A and D matrices in tandem.

File S12. Environment BLUEs (averaged across all hybrids) for all traits

File S13. Hybrid BLUEs (averaged across all environments) for all traits

File S14. Hybrid Cluster assignments and A and D diagonal values

Cluster assignments of hybrids from hierarchical clustering of marker data, along with their diagonal values from the A and D matrices.

File S15. Variance Components All Traits all models

Tables of variance components for each trait and model fit. For Yield, an expanded set of models was fit, while for all other traits only the IDVG models and the simplest model using both the A and D realized relationship matrices were fit.

File S16. Three Year Loadings

Factor loadings of each environment variable from the FA(10) model applied to 30-day period weather data.

File S17. List of Hybrid names in order of rows and columns of the realized genomic relationship matrices

File S18. Realized additive relationship matrix, TASSEL output format

File S19. Realized dominance relationship matrix, TASSEL output format