Posted on 2018-11-20, 18:19. Authored by Peter L. Ralph, Han Li
Supplementary Tables and Figures for Li and Ralph, "Local PCA Shows How the Effect of Population Structure Differs Along the Genome"
Table S1.
Correlations between MDS coordinates of genomic regions between runs with different parameter values. To produce these, we first ran the algorithm with the specified window size and number of PCs (k) on the full Medicago truncatula dataset. Then to obtain the correlation between results obtained from parameters A in the row of the matrix above and parameters B in the column of the matrix above, we mapped the windows of B to those of A by averaging MDS coordinates of any windows of B whose midpoints lay in the corresponding window of A; we then computed the correlation between the MDS coordinates of A and the averaged MDS coordinates of B. This is not a symmetric operation, so these matrices are not symmetric. As expected, parameter values with smaller windows produce noisier estimates, but plots of MDS values along the genome are visually very similar.
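The window-mapping step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy windows, and the half-open interval convention (`start <= midpoint < end`) are all assumptions made for the example.

```python
from math import sqrt


def average_b_into_a(a_windows, b_windows, b_mds):
    """For each (start, end) window of A, average the MDS values of the
    B windows whose midpoints fall inside it (None if there are none).
    Assumption: windows are half-open intervals [start, end)."""
    averaged = []
    for a_start, a_end in a_windows:
        inside = [
            value
            for (b_start, b_end), value in zip(b_windows, b_mds)
            if a_start <= (b_start + b_end) / 2 < a_end
        ]
        averaged.append(sum(inside) / len(inside) if inside else None)
    return averaged


def pearson(x, y):
    """Pearson correlation, skipping pairs where either value is None."""
    pairs = [(u, v) for u, v in zip(x, y) if u is not None and v is not None]
    n = len(pairs)
    mean_x = sum(u for u, _ in pairs) / n
    mean_y = sum(v for _, v in pairs) / n
    cov = sum((u - mean_x) * (v - mean_y) for u, v in pairs)
    var_x = sum((u - mean_x) ** 2 for u, _ in pairs)
    var_y = sum((v - mean_y) ** 2 for _, v in pairs)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: run A uses windows twice as long as run B.
a_windows = [(0, 100), (100, 200), (200, 300)]
a_mds = [0.1, 0.5, 0.9]
b_windows = [(0, 50), (50, 100), (100, 150), (150, 200), (200, 250), (250, 300)]
b_mds = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

b_on_a = average_b_into_a(a_windows, b_windows, b_mds)
r = pearson(a_mds, b_on_a)
```

Swapping the roles of A and B changes which windows are averaged, which is why the resulting matrix of correlations need not be symmetric.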
Figure S1.
PCA plots for chromosome arms 2L, 2R, 3L, 3R and X of the Drosophila melanogaster dataset.
Figure S2.
PCA plots for all 22 human autosomes from the POPRES data.
Figure S3.
PCA plots for all 8 chromosomes in the Medicago truncatula dataset.
Figure S4.
MDS visualizations of the Gaussian genotypes described in the Appendix, for 50 individuals from each of three populations. (top) The first quarter, middle half, and final quarter of the chromosome each have different population structure, as expected, despite the possibility for PC switching within each. (bottom) The same picture results even after marking a random 50% of the genotypes in the first half of the chromosome as missing.
Figure S5.
MDS visualizations of the results of individual-based simulations using SLiM (see Appendix for details). All simulations are neutral, and the recombination rate is: (top) constant; (top middle) stepwise varying by factors of two across seven equal-length segments, with the highest rates at the ends, so the middle segment has a recombination rate 64 times lower than the ends; (bottom middle) taken from the HapMap human female chromosome 7 map. The bottom figure shows PCA plots corresponding to the three colored windows of the last (HapMap) scenario; the outlying regions are long regions of low recombination rate, so each such region can be dominated by a few correlated trees, similar to an inversion. The inset provides a key to the locations of the individuals on the spatial landscape.
Figure S6.
MDS visualizations of the results of individual-based simulations using SLiM (see Appendix for details). All simulations incorporate linked selection by allowing selected mutations to appear in the same two regions of the genome: the one-sixth of the genome immediately before the halfway point, and the last one-sixth of the genome. (top) Constant recombination rate. (top middle) Stepwise varying recombination rate. (bottom middle) Constant recombination rate with spatially varying effects of selection. (bottom) PCA plots corresponding to the highlighted corners of the last MDS visualization, showing how spatially varying linked selection has affected patterns of relatedness. The (inset) provides a key to the locations of the individuals on the spatial landscape.
Figure S7.
MDS visualizations for each chromosome arm of Drosophila melanogaster, as in Figure 2, except that the method was run using five PCs (k=5) instead of two.
Figure S8.
The proportion of data in each window that are missing, compared to the value of the first MDS coordinate for the Drosophila melanogaster data from Figure 2.
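As a rough illustration of the quantity plotted here, the proportion of missing data per window could be computed along these lines. This is only a sketch under assumed conventions: genotypes are stored as a list of SNP rows with `None` marking a missing call, windows contain a fixed number of SNPs, and a shorter final window is kept; none of these details are taken from the paper.

```python
def missing_proportion_by_window(genotypes, window_size):
    """Return the fraction of missing entries in each consecutive window.

    genotypes: list of SNP rows, each a list of per-sample values,
               with None marking a missing genotype (assumed encoding).
    window_size: number of SNPs per window (last window may be shorter).
    """
    proportions = []
    for start in range(0, len(genotypes), window_size):
        window = genotypes[start:start + window_size]
        n_entries = sum(len(row) for row in window)
        n_missing = sum(value is None for row in window for value in row)
        proportions.append(n_missing / n_entries)
    return proportions


# Hypothetical toy matrix: 4 SNPs by 4 samples, windows of 2 SNPs.
genotypes = [
    [0, 1, None, 2],
    [None, None, 1, 0],
    [2, 2, 2, 2],
    [0, None, 0, 1],
]
props = missing_proportion_by_window(genotypes, window_size=2)
```

Each value in `props` would then be paired with the first MDS coordinate of the same window for plotting.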
Figure S9.
PCA plots for the three sets of genomic windows colored in Figure 2, on each chromosome arm of Drosophila melanogaster. In all plots, each point represents a sample. The first column shows the combined PCA plot for windows whose points are colored green in Figure 2; the second is for orange windows; and the third is for purple windows.
Figure S10.
Variation in structure for windows of 1,000 SNPs across Drosophila melanogaster chromosome arms: without inversions. As in Figure 2, but after omitting, for each chromosome arm, individuals carrying the less frequent orientation of any inversion on that chromosome arm. The values differ from those in Figure 4 in the window size used and in that some MDS values were inverted (but relative orientation is meaningless because chromosome arms were run separately, unlike for Medicago). In all plots, each point represents one window along the genome. The first column shows the MDS visualization of relationships between windows, and the second and third columns show the midpoint of each window against the two MDS coordinates; rows correspond to chromosome arms. Vertical lines show the breakpoints of known polymorphic inversions.
Figure S11.
Recombination rate and the effects of population structure for Drosophila melanogaster: the first MDS coordinate and recombination rate (in cM/Mbp), as in Figure 4, plotted against each other. Since the windows underlying the estimates in Figure 4 do not coincide, to obtain correlations we divided the genome into 100 Kbp bins and, for each variable (recombination rate and MDS coordinate 1), averaged the values of the windows overlapping each bin, with weights proportional to the proportion of overlap. The correlation coefficient and r^2 for each linear regression are as follows: 2L: correlation=0.52, r^2=0.27; 2R: correlation=0.43, r^2=0.18; 3L: correlation=0.47, r^2=0.21; 3R: correlation=0.46, r^2=0.21; X: correlation=0.50, r^2=0.24.
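The binning step described above can be sketched as an overlap-weighted average. This is one reading of the caption rather than the authors' implementation: the function name, the half-open interval convention, and the toy numbers (a bin size of 100 rather than 100 Kbp) are assumptions for illustration.

```python
def bin_average(intervals, values, bin_size, genome_length):
    """Average interval values onto fixed-size bins, weighting each
    interval by the length of its overlap with the bin.

    intervals: list of half-open (start, end) positions (assumed convention).
    values: one value per interval (e.g. recombination rate or MDS coordinate).
    Returns one value per bin, or None for bins that nothing overlaps.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    totals = [0.0] * n_bins
    weights = [0.0] * n_bins
    for (start, end), value in zip(intervals, values):
        first = start // bin_size
        last = min((end - 1) // bin_size, n_bins - 1)
        for b in range(first, last + 1):
            overlap = min(end, (b + 1) * bin_size) - max(start, b * bin_size)
            if overlap > 0:
                totals[b] += overlap * value
                weights[b] += overlap
    return [t / w if w > 0 else None for t, w in zip(totals, weights)]


# Hypothetical toy example: two intervals on a genome of length 300.
intervals = [(0, 150), (150, 300)]
values = [2.0, 4.0]
binned = bin_average(intervals, values, bin_size=100, genome_length=300)
```

Applying the same function to both variables puts them on a common set of bins, after which a correlation or linear regression can be computed bin by bin.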
Figure S12.
MDS plots for human chromosomes 1-8. The first column shows the MDS visualization of relationships between windows, and the second and third columns show the midpoint of each window against the two MDS coordinates; rows correspond to chromosomes. Colored vertical lines show the breakpoints of known, validated inversions, while grey vertical lines show the breakpoints of predicted inversions.
Figure S13.
MDS plots for human chromosomes 9-16, as in Supplementary Figure S12.
Figure S14.
MDS plots for human chromosomes 17-22, as in Supplementary Figure S12.
Figure S15.
Comparison of PCA figures within outlying windows (center column) and flanking non-outlying windows (left and right columns) for the two windows having outlying MDS scores on chromosome 8.
Figure S16.
MDS visualization of variation in the effects of population structure amongst windows across all human autosomes simultaneously. The small group of windows with positive outlying MDS values lie around the inversion at 8p23.
Figure S17.
First MDS coordinate against gene density for all 8 chromosomes of M. truncatula. The first MDS coordinate is significantly correlated with gene count (r=0.149, p=2.2e-16).
Figure S18.
MDS visualizations of the effects of population structure for all 8 chromosomes of the Medicago truncatula data, using windows of 10,000 SNPs.
Figure S19.
PCA plots for regions colored in Figure S18 on all 8 chromosomes of Medicago truncatula: (A) green, (B) orange, and (C) purple.