Supplemental Material for Hamlin et al., 2019
Dataset posted on 20.09.2019 by Jennafer Hamlin, Guilherme Dias, Casey Bergman, Douda Bensasson
Supplemental data associated with Hamlin et al. (2019). Raw PacBio reads for the three Candida strains are available at the NCBI Sequence Read Archive (SRA) under BioProject PRJNA533645. The phased diploid assemblies for the same three oak strains are associated with the overall BioProject PRJNA543321. GenBank accession numbers for the primary contigs and the haplotigs of each strain are:

NCYC 4144: primary contigs GCA_005890765.1; haplotigs GCA_005890695.1
NCYC 4145: primary contigs GCA_005890775.1; haplotigs GCA_005890685.1
NCYC 4146: primary contigs GCA_005890745.1; haplotigs GCA_005890705.1

Coordinates of haplotigs relative to their respective primary assemblies are available in Files S10, S11, and S12. For the primary assemblies, Files S1, S2, and S3 annotate the positions of centromeres, telomeric repeats, confirmed loss-of-heterozygosity (LOH) regions, assembly gaps, uncertain regions with unexpectedly low heterozygosity, and regions that were not polished by FALCON-Unzip. Unpolished regions in the alternative haplotig assemblies are annotated in Files S4, S5, and S6. The software version numbers used for phased assembly are listed in File S7, and the configuration files used to run FALCON and FALCON-Unzip are provided in Files S8 and S9.
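One common use of haplotig coordinate files such as Files S10, S11, and S12 is to estimate how much of each primary contig is covered by a phased haplotig. The sketch below merges overlapping haplotig placements per primary contig and sums the covered bases. It assumes a hypothetical tab-separated layout (haplotig ID, primary contig ID, start, end) with 0-based, half-open coordinates; the actual column order and coordinate convention of the supplemental files may differ and should be checked before use. The contig names in the example are illustrative only.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def parse_placements(lines):
    """Parse ASSUMED tab-separated lines: haplotig, primary contig, start, end.

    The column layout is hypothetical, not taken from Files S10-S12.
    Blank lines and lines starting with '#' are skipped.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        haplotig, contig, start, end = line.split("\t")[:4]
        rows.append((haplotig, contig, int(start), int(end)))
    return rows


def phased_bases_per_contig(rows):
    """Return {primary contig: number of bases covered by >= 1 haplotig}."""
    by_contig = defaultdict(list)
    for _haplotig, contig, start, end in rows:
        by_contig[contig].append((start, end))
    return {
        contig: sum(e - s for s, e in merge_intervals(intervals))
        for contig, intervals in by_contig.items()
    }


if __name__ == "__main__":
    # Hypothetical placements; two haplotigs overlap on the first contig.
    example = [
        "000000F_001\t000000F\t0\t50000",
        "000000F_002\t000000F\t40000\t90000",
        "000001F_001\t000001F\t1000\t2000",
    ]
    print(phased_bases_per_contig(parse_placements(example)))
    # prints {'000000F': 90000, '000001F': 1000}
```

Merging before summing keeps overlapping haplotig placements from being counted twice; dividing the result by each primary contig's length (for example, taken from the GenBank assemblies listed above) would give a per-contig phased fraction.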