Supplementary Material for Pook et al., 2019

Table S1 UM imputation error for the commercial chicken breeding line when changing a single imputation parameter, with ne = 100 for BEAGLE 5.1, ne = 300 for BEAGLE 5.0 and ne = 10,000 for BEAGLE 4.1. * BEAGLE crashed for this dataset when using phase-segment > 10, phase-states < 100 or imp-step > 5.

Table S2 UM imputation error for the chicken diversity panel when changing a single imputation parameter, with ne = 300 for BEAGLE 5.1, ne = 3,000 for BEAGLE 5.0 and ne = 10,000 for BEAGLE 4.1. * BEAGLE crashed for this dataset when using phase-segment > 10 or phase-states < 100.

Table S3 Phasing error, given as the number of heterozygous markers per switch error, for Pseudo S0 generated based on the KE DH-lines when changing a single imputation parameter.
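
As an illustration of the metric named in Table S3, the following is a minimal sketch of how "heterozygous markers per switch error" could be computed for one individual. The function name, the 0/1 allele coding and the exact counting rule (a switch is a change in which true haplotype the first inferred haplotype follows between consecutive heterozygous markers) are assumptions made here and are not taken from the paper.

```python
def het_markers_per_switch(true_h1, true_h2, phased_h1):
    """Heterozygous markers per switch error for one individual (illustrative)."""
    # Only heterozygous markers carry phase information.
    het = [i for i, (a, b) in enumerate(zip(true_h1, true_h2)) if a != b]
    # Orientation: does the first inferred haplotype follow the first true one?
    orientation = [phased_h1[i] == true_h1[i] for i in het]
    # A switch error is a change of orientation between neighbouring het markers.
    switches = sum(o1 != o2 for o1, o2 in zip(orientation, orientation[1:]))
    return len(het) / switches if switches else float("inf")


# Made-up toy haplotypes: five heterozygous markers, one switch after the second.
true_h1 = [0, 1, 0, 1, 0, 0]
true_h2 = [1, 0, 1, 0, 1, 0]
phased_h1 = [0, 1, 1, 0, 1, 0]
result = het_markers_per_switch(true_h1, true_h2, phased_h1)
```

Under this definition larger values indicate better phasing, and a perfectly phased individual yields infinity.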

Table S4 Inference error rates using different reference genomes compared to B73 for PE DH-lines. Only markers mapped on both the flint reference genome and B73v4 (Jiao et al. 2017) are considered for "critical" markers (error rate > 10%).

Table S5 List of "critical" markers for Kemater Landmais Gelb using different reference genomes.

Table S6 List of "critical" markers for Petkuser Ferdinand Rot using different reference genomes.

Table S7 Error rates for UM imputation for different reference panels, including A (same subpopulation), C (all other subpopulations), D (subpopulations with below-average Nei distance), E (all subpopulations with a reduced error rate when using A + B instead of A alone as the reference panel).

Table S8 Assignments to subpopulations for the chicken diversity panel based on Nei standard genetic distances (Nei 1972).
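
For reference, Nei's standard genetic distance (Nei 1972) between two populations is D = -ln(J_xy / sqrt(J_x * J_y)), where J_x, J_y and J_xy are the per-locus sums of squared (or cross-multiplied) allele frequencies, averaged over loci. The sketch below implements this formula directly. The function name and the input layout (one list of allele frequencies per locus) are hypothetical, and the code is not the pipeline used for Table S8.

```python
import math


def nei_standard_distance(freq_x, freq_y):
    """Nei (1972) standard genetic distance between two populations.

    freq_x, freq_y: lists of loci, each locus a list of allele frequencies
    that sums to 1 (same loci and allele order in both populations).
    """
    n_loci = len(freq_x)
    # Average homozygosity within each population.
    j_x = sum(p * p for locus in freq_x for p in locus) / n_loci
    j_y = sum(q * q for locus in freq_y for q in locus) / n_loci
    # Average probability of drawing identical alleles across populations.
    j_xy = sum(
        p * q
        for locus_x, locus_y in zip(freq_x, freq_y)
        for p, q in zip(locus_x, locus_y)
    ) / n_loci
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Made-up single-locus example: one population fixed, the other at 0.5 / 0.5.
distance = nei_standard_distance([[1.0, 0.0]], [[0.5, 0.5]])
```

Identical allele frequencies give a distance of zero, and the distance is undefined when the two populations share no alleles (J_xy = 0).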

Table S9 Minimum error rates obtained and the parameter settings used for inference and UM imputation. Deviations from the ideal single-parameter settings are caused by BEAGLE crashing when parameters were changed jointly.

Figure S1 Neighbor-joining tree for ten subpopulations in the chicken diversity panel. For a detailed list of which individual is assigned to which subpopulation, see Supplementary Table S8.

Figure S2 Relationship between region error rate and LD (r²) on chromosome 9 in the maize data. Outliers are smoothed using a Nadaraya-Watson estimator (Nadaraya 1964) with a Gaussian kernel and a bandwidth of 3,000 markers in both cases.
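
The smoothing mentioned for Figure S2 can be sketched as follows. This is a minimal Nadaraya-Watson estimator with a Gaussian kernel, evaluated at every marker position. It assumes that positions are marker indices and that the bandwidth is the standard deviation of the kernel in marker units; the caption does not specify how the bandwidth is scaled, and the function name and toy values are illustrative only.

```python
import math


def nadaraya_watson(positions, values, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate at each input position."""
    positions = list(positions)
    smoothed = []
    for x0 in positions:
        # Gaussian weight of every observation relative to the target position.
        weights = [
            math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in positions
        ]
        # Weighted average of the observed values.
        weighted_sum = sum(w * v for w, v in zip(weights, values))
        smoothed.append(weighted_sum / sum(weights))
    return smoothed


# Made-up per-marker error rates with one outlier in the middle.
error_rates = [0.01, 0.02, 0.30, 0.02, 0.01]
smoothed = nadaraya_watson(range(len(error_rates)), error_rates, bandwidth=1.0)
```

This direct version is quadratic in the number of markers; for a whole chromosome, restricting the sum to markers within a few bandwidths of the target position would be a natural shortcut.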

Figure S3 Total number of errors per marker (50 repetitions) for BEAGLE 4.0 with buildwindow set to 10 and 1200 (default) in the maize data.

Figure S4 DR2 values in relation to the number of errors obtained per marker after fitting ne (A) and with default settings (B) in BEAGLE 5.0 for the commercial chicken line. 100 lines were used as the study sample and 788 as the reference sample.

Figure S5 DR2 values in relation to the number of errors obtained per marker after fitting ne (A) and with default settings (B) in BEAGLE 5.0 for the chicken diversity panel. 100 lines were used as the study sample and 1710 as the reference sample.

Figure S6 – Figure S24 Effect of the different parameters on the inference error rates for the maize data in BEAGLE. Default settings are indicated by the vertical line.

Figure S25 – Figure S45 Effect of the different parameters on the inference and phasing error rates for chromosome 10 of 250 Pseudo S0 generated based on the maize data in BEAGLE. Default settings are indicated by the vertical line.

Figure S46 – Figure S74 Effect of the different parameters on the UM imputation error rate for the maize data, the commercial chicken line and the chicken diversity panel in BEAGLE. Default settings are indicated by the vertical line.