Supplemental Material for Lin, Lazarus, and Rhee, 2020
Table S1. Known causal genes and their orthologs in major model plants and crop species.
Table S2. The average value of features used for the Arabidopsis model.
Table S3. The average value of features used for the rice model.
Table S4. Functional annotation, gene expression data, and protein sequence differences of thirteen candidates prioritized by QTG-Finder2 and SD1.
Table S5. Putative transcription factor binding sites identified in the promoter of Sevir.5G394900.
Figure S1 Whole-genome synteny map between Setaria viridis and Setaria italica by SynMap
Figure S2 Parameter tuning for the Setaria viridis model based on cross-validation AUC-ROC. Error bars represent standard deviation, N=3.
Figure S3 Parameter tuning for the Sorghum bicolor model based on cross-validation AUC-ROC
Figure S4 Causal-gene orthologs in 12 major crops and model species.
Figure S5 Models trained with different groups of orthologs performed similarly according to external validation.
Figure S6 Feature importance of the newly added features to the Arabidopsis and rice models.
Figure S7 Multiple sequence alignment for a RIO2 protein across grass species, yeast and human using Clustal Omega
Figure S8 Multiple sequence alignment for a RIO2 protein across grass species using Clustal Omega
Figure S9 Multiple sequence alignment of a candidate gene SD1 across grass species using Clustal Omega
Figure S10 A candidate gene encoding a ribosomal protein in the L1P family has higher expression in Setaria italica (Seita.5G389700) than in Setaria viridis (Sevir.5G394900).
Figure S11 Pairwise sequence alignment shows polymorphisms in the putative promoters of an ortholog pair of genes encoding L1P ribosomal proteins.