Supplemental Material for Chen et al., 2019
Figure S1. Distribution of 17-mers in raw sequence data from the M. baccata genome.
Figure S2. Genetic and physical maps of M. baccata genome.
Figure S3. Relationship between genetic distance and physical distance for the M. baccata chromosomes.
The blue bars on the left indicate pseudomolecules, and the green bars on the right represent linkage groups corresponding to the 17 chromosomes. The light green blocks indicate ordered regions and markers, and the dark green blocks and lines represent scaffolds that are unordered but matched to the genetic map.
Figure S4. Stacked bar plots of the categories of BUSCO groups searched in the M. baccata genome and other Rosaceae plants.
Different colors represent complete single-copy, complete duplicated, fragmented, and missing BUSCOs, respectively. (Mb, Malus baccata; Md, Malus domestica; Pb, Pyrus bretschneideri; Pp, Prunus persica; Fv, Fragaria vesca)
Figure S5. The distribution of the distance between SNPs in M. baccata genome.
Figure S6. SNP frequency per gene in M. baccata genome.
Figure S7. Venn diagram of genes with homology or functional classifications by different methods.
Figure S8. Histogram of Ka/Ks values for orthologous genes from M. baccata and M. domestica.
Figure S9. Divergence time between 10 species.
The numbers at the nodes represent divergence times in MYA (million years ago). The red node represents a known divergence time range. The polygonal star represents a WGD event according to previous studies.
Figure S10. Synteny analyses among Rosaceae species.
Table S1 Estimation of M. baccata genome size based on 17-mer analysis
Table S2 Global statistics of M. baccata genome sequencing.
Table S3 Summary of the M. baccata genome assembly
Table S4 Statistics for each pseudo-chromosome
Table S5 Statistics for linkage groups construction
Table S6 Assessment of sequence coverage of the M. baccata genome assembly by Malus EST sequences downloaded from NCBI.
Table S7 Assessing the gene region assembly by RNA sequencing and assembly.
Table S8 Evaluation of gene space coverage using core eukaryotic gene mapping approach (CEGMA).
Table S9 Summarized BUSCO benchmarking for M. baccata genome
Table S10 Annotation of heterozygous SNPs detected in M. baccata genome.
Table S11 Significantly (Q value < 0.05) enriched KEGG pathways among genes with high frequencies (>3%) of SNPs in M. baccata genome.
Table S12 Statistics of repeats of M. baccata genome
Table S13 Summary and content analysis of different types of TEs in the M. baccata genome.
Table S14 Summary of identified TEs in sequenced Rosaceae plants
Table S15 Statistics of predicted protein-coding genes in M. baccata using five different methods.
Table S16 Comparison of the gene sets of M. baccata with those of other Rosaceae plants
Table S17 Number of genes with homology or functional classifications by different methods. Four protein databases were used for predicting gene functions.
Table S18 Summary of predicted non-coding RNAs in the M. baccata genome.
Table S19 Significantly (Q value < 0.05) enriched KEGG pathways among genes with Ka/Ks >1 from M. baccata and M. domestica.
Table S20 Summary of the annotation of genes with Ka/Ks >1 and P < 0.05 from M. baccata and M. domestica.
Table S21 Significantly enriched KEGG pathways among genes expanded in M. baccata genome.
Table S22 Significantly (Q value < 0.05) enriched KEGG pathways among M. baccata and M. domestica genes from expanded and contracted gene families
Table S23 Summary of synteny blocks with at least four gene pairs between M. baccata and the other Rosaceae genomes.
Table S24 Transcription factors (TFs) present in sequenced Rosaceae plants.
Table S25 Number of NBS gene clusters per chromosome in M. baccata and M. domestica.