Supplemental Material for Zhou et al., 2020
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
There has been extensive discussion of the “Replication Crisis” in many fields, including genome-wide association studies (GWAS). We explored replication in a mouse model using an advanced intercross line (AIL), which is a multigenerational intercross between two inbred strains. We re-genotyped a previously published cohort of LG/J x SM/J AIL mice (F34; n=428) using a denser marker set and genotyped a new cohort of AIL mice (F39-43; n=600) for the first time. We identified 36 novel genome-wide significant loci in the F34 and 25 novel loci in the F39-43 cohort. The subset of traits that were measured in both cohorts (locomotor activity, body weight, and coat color) showed high genetic correlations, although the SNP heritabilities were slightly lower in the F39-43 cohort. For this subset of traits, we attempted to replicate loci identified in either F34 or F39-43 in the other cohort. Coat color was robustly replicated; locomotor activity and body weight were only partially replicated, which was inconsistent with our power simulations. We used a random effects model to show that the partial replications could not be explained by Winner’s Curse but could be explained by study-specific heterogeneity. Despite this heterogeneity, we performed a mega-analysis by combining F34 and F39-43 cohorts (n=1,028), which identified four novel loci associated with locomotor activity and body weight. These results illustrate that even with the high degree of genetic and environmental control possible in our experimental system, replication was hindered by study-specific heterogeneity, which has broad implications for ongoing concerns about reproducibility.