Supplemental Material for Li and Ralph, 2018

2018-11-20T18:19:29Z (GMT) by Peter L. Ralph and Han Li
Supplementary Tables and Figures for Li and Ralph, "Local PCA Shows How the Effect of Population Structure Differs
Along the Genome"


Table S1.

Correlations between the MDS coordinates of genomic windows obtained in runs
with different parameter values. To produce these, we first ran the algorithm
on the full Medicago truncatula dataset with the specified window size and
number of PCs (k). Then, to obtain the correlation between the results for
parameters A (a row of the matrix) and parameters B (a column of the matrix),
we mapped the windows of B onto those of A by averaging the MDS coordinates of
all windows of B whose midpoints lay in the corresponding window of A; we then
computed the correlation between the MDS coordinates of A and these averaged
MDS coordinates of B. This operation is not symmetric, so the matrices are not
symmetric either. As expected, smaller windows produce noisier estimates, but
plots of MDS values along the genome are visually very similar.
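The window mapping used for Table S1 can be sketched in Python as follows. This
is a minimal illustration, not the authors' code: it assumes windows on a
single chromosome given as [start, end) position pairs, one MDS coordinate per
window, and that windows of A containing no midpoint of B are simply dropped
(the text does not say how such windows were handled). The function names
map_windows and mapped_correlation are hypothetical.

```python
import numpy as np


def map_windows(a_bounds, b_bounds, b_mds):
    """Average the MDS coordinates of B-windows whose midpoints fall in each A-window.

    a_bounds, b_bounds: (n, 2) arrays of [start, end) positions (assumed format).
    b_mds: one MDS coordinate per B-window.
    Returns one value per A-window; NaN where no B midpoint falls inside it.
    """
    a_bounds = np.asarray(a_bounds, dtype=float)
    b_bounds = np.asarray(b_bounds, dtype=float)
    b_mds = np.asarray(b_mds, dtype=float)
    b_mid = b_bounds.mean(axis=1)  # midpoint of each B-window
    out = np.full(len(a_bounds), np.nan)
    for i, (start, end) in enumerate(a_bounds):
        inside = (b_mid >= start) & (b_mid < end)
        if inside.any():
            out[i] = b_mds[inside].mean()
    return out


def mapped_correlation(a_bounds, a_mds, b_bounds, b_mds):
    """Correlation between A's MDS coordinate and B's coordinate mapped onto A."""
    mapped = map_windows(a_bounds, b_bounds, b_mds)
    ok = ~np.isnan(mapped)  # assumption: drop A-windows with no B midpoint
    a_mds = np.asarray(a_mds, dtype=float)
    return np.corrcoef(a_mds[ok], mapped[ok])[0, 1]
```

Mapping four short windows of B onto two long windows of A averages pairs of
values, whereas mapping A back onto B can leave some of B's windows empty; this
asymmetry is why the matrices in Table S1 are not symmetric.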


Figure S1.

PCA plots for chromosome arms 2L, 2R, 3L, 3R and X of the Drosophila
melanogaster dataset.

Figure S2.

PCA plots for all 22 human autosomes from the POPRES data.

Figure S3.

PCA plots for all 8 chromosomes in the Medicago truncatula dataset.


Figure S4.

MDS visualizations of the Gaussian genotypes described in the Appendix, for 50
individuals from each of three populations. (top) The first quarter, middle
half, and final quarter of the chromosome each show different population
structure, as expected, despite the possibility of PC switching within each.
(bottom) The same picture results even after marking a random 50% of the
genotypes in the first half of the chromosome as missing.
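A sketch of the missing-data treatment in the bottom panel, assuming genotypes
are stored as a loci-by-individuals array and that missing entries are coded as
NaN. Each entry in the first half of the loci is masked independently with
probability 0.5, so the masked fraction is only approximately 50%; whether the
original analysis masked exactly half is not stated here. The function name
mask_first_half is hypothetical.

```python
import numpy as np


def mask_first_half(genotypes, frac=0.5, seed=None):
    """Return a float copy of `genotypes` (loci x individuals) in which each
    entry in the first half of the loci is set to NaN with probability `frac`.

    Assumption: missing genotypes are represented as NaN.
    """
    rng = np.random.default_rng(seed)
    masked = np.array(genotypes, dtype=float)  # copy, so the input is untouched
    half = masked.shape[0] // 2  # number of loci in the first half
    drop = rng.random((half, masked.shape[1])) < frac
    masked[:half][drop] = np.nan  # writes through the view into `masked`
    return masked
```

The second half of the chromosome is left complete, so any distortion caused
by missing data would show up as a difference between the two halves.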

Figure S5.

MDS visualizations of the results of individual-based simulations using SLiM
(see Appendix for details). All simulations are neutral, and recombination is:
(top) constant; (top middle) varies stepwise by factors of two in seven
equal-length segments, with highest rates on the ends, so the middle segment
has a recombination rate 64 times lower than the ends; (bottom middle) varies
according to the HapMap human female chromosome 7 map. The bottom figure shows
PCA maps corresponding to the three colored windows of the last (HapMap)
simulation; the outlying regions are long regions of low recombination rate, so
each such region can be dominated by a few correlated trees, similar to an
inversion. The inset provides a key to the locations of the individuals on the
spatial landscape.


Figure S6.

MDS visualizations of the results of individual-based simulations using SLiM
(see Appendix for details). All simulations incorporate linked selection by
allowing selected mutations to appear in the same two regions of the genome:
the one-sixth of the genome immediately before the halfway point, and the last
one-sixth of the genome. (top) Constant recombination rate. (top middle)
Stepwise varying recombination rate. (bottom middle) Constant recombination
rate with spatially varying effects of selection. (bottom) PCA plots
corresponding to the highlighted corners of the last MDS visualization, showing
how spatially varying linked selection has affected patterns of relatedness.
The inset provides a key to the locations of the individuals on the spatial
landscape.

Figure S7.

MDS visualizations for each chromosome arm of Drosophila melanogaster, as in
Figure 2, except that the method was run using five PCs (k=5) instead of
two.

Figure S8.

The proportion of missing data in each window, plotted against the value of the
first MDS coordinate, for the Drosophila melanogaster data from Figure 2.
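The per-window missing fraction shown in Figure S8 can be computed along these
lines, assuming windows are consecutive, non-overlapping runs of a fixed number
of SNPs and missing genotypes are coded as NaN. The helper name is
hypothetical, and dropping a trailing partial window is an assumption made for
simplicity.

```python
import numpy as np


def missing_fraction_per_window(genotypes, window_size):
    """Fraction of NaN entries in consecutive windows of `window_size` loci.

    genotypes: loci x individuals array with NaN for missing (assumed coding).
    A trailing partial window is dropped (assumption).
    """
    g = np.asarray(genotypes, dtype=float)
    n_windows = g.shape[0] // window_size
    return np.array([
        np.isnan(g[i * window_size:(i + 1) * window_size]).mean()
        for i in range(n_windows)
    ])
```

The resulting vector can then be compared with the first MDS coordinate of the
same windows, for example with numpy.corrcoef.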

Figure S9.

PCA plots for the three sets of genomic windows colored in Figure 2, on each
chromosome arm of Drosophila melanogaster. In all plots, each point
represents a sample. The first column shows the combined PCA plot for windows
whose points are colored green in Figure 2; the second is for orange windows;
and the third is for purple windows.

Figure S10.

Variation in structure for windows of 1,000 SNPs across Drosophila melanogaster
chromosome arms, without inversions. As in Figure 2, but after omitting, for
each chromosome arm, individuals carrying the less frequent orientation of any
inversion on that arm. These values differ from those in Figure 4 in the window
size used and in that the signs of some MDS coordinates are flipped (relative
orientation is meaningless, because chromosome arms were run separately,
unlike for Medicago). In all plots, each point represents one window along the genome.
The first column shows the MDS visualization of relationships between windows,
and the second and third columns show the midpoint of each window against the
two MDS coordinates; rows correspond to chromosome arms. Vertical lines show
the breakpoints of known polymorphic inversions.
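The sample filtering described for Figure S10 might look like the following,
under the assumption that each individual's karyotype for each inversion on an
arm is coded as the number of copies (0, 1 or 2) of the inverted arrangement.
The coding, the tie-breaking rule at a frequency of exactly one half, and the
function name major_orientation_mask are all illustrative assumptions.

```python
import numpy as np


def major_orientation_mask(karyotypes):
    """Boolean mask of individuals carrying no copy of the less frequent
    arrangement of any inversion on one chromosome arm.

    karyotypes: n_individuals x n_inversions array of copies (0, 1 or 2) of
    the inverted arrangement (assumed coding).
    """
    k = np.asarray(karyotypes)
    n = k.shape[0]
    inv_freq = k.sum(axis=0) / (2 * n)  # frequency of the inverted arrangement
    keep = np.ones(n, dtype=bool)
    for j, f in enumerate(inv_freq):
        if f <= 0.5:
            # Inverted arrangement is rarer (ties treated this way by assumption):
            # drop anyone carrying it.
            keep &= k[:, j] == 0
        else:
            # Standard arrangement is rarer: keep only inverted homozygotes.
            keep &= k[:, j] == 2
    return keep
```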

Figure S11.

Recombination rate and the effects of population structure for Drosophila
melanogaster: the first MDS coordinate plotted against recombination rate (in
cM/Mbp), both as in Figure 4. Since the windows underlying the two estimates in
Figure 4 do not coincide, to obtain correlations we divided the genome into
100 kbp bins and, for each variable (recombination rate and MDS coordinate 1),
assigned each bin the average of the values of all windows overlapping it,
weighted by the proportion of overlap. The correlation coefficient and r^2 for
each linear regression are as follows: 2L: correlation=0.52, r^2=0.27; 2R:
correlation=0.43, r^2=0.18; 3L: correlation=0.47, r^2=0.21; 3R:
correlation=0.46, r^2=0.21; X: correlation=0.50, r^2=0.24.
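The binning used to compare recombination rate and MDS coordinate 1 can be
sketched as follows. This assumes each estimate comes as windows with
[start, end) coordinates in base pairs on one chromosome arm, and reads
"proportion of overlap" as the fraction of each 100 kbp bin covered by a
window; the function names bin_average and binned_correlation are hypothetical.
Bins overlapped by no window are left as NaN and excluded from the correlation.

```python
import numpy as np


def bin_average(starts, ends, values, chrom_len, bin_size=100_000):
    """Overlap-weighted average of per-window `values` in fixed-size bins.

    Each window contributes to a bin with weight equal to the fraction of the
    bin it covers (assumed reading of "proportion of overlap").
    Bins overlapped by no window are NaN.
    """
    n_bins = int(np.ceil(chrom_len / bin_size))
    num = np.zeros(n_bins)  # weighted sum of values per bin
    den = np.zeros(n_bins)  # total weight per bin
    for s, e, v in zip(starts, ends, values):
        first = int(s // bin_size)
        last = int(min(np.ceil(e / bin_size), n_bins))
        for b in range(first, last):
            lo, hi = b * bin_size, (b + 1) * bin_size
            overlap = min(e, hi) - max(s, lo)
            if overlap > 0:
                w = overlap / bin_size
                num[b] += w * v
                den[b] += w
    out = np.full(n_bins, np.nan)
    has = den > 0
    out[has] = num[has] / den[has]
    return out


def binned_correlation(rec_bins, mds_bins):
    """Pearson correlation (and its square) over bins where both values exist."""
    ok = ~np.isnan(rec_bins) & ~np.isnan(mds_bins)
    r = np.corrcoef(rec_bins[ok], mds_bins[ok])[0, 1]
    return r, r ** 2
```

Because both variables end up on the same grid of bins, the two sets of
windows no longer need to coincide before computing a correlation.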


Figure S12.

MDS plots for human chromosomes 1-8. The first column shows the MDS
visualization of relationships between windows, and the second and third
columns show the midpoint of each window against the two MDS coordinates; rows
correspond to chromosomes. Colored vertical lines show the breakpoints
of known, validated inversions, while grey vertical lines show the breakpoints
of predicted inversions.

Figure S13.

MDS plots for human chromosomes 9-16, as in Supplementary Figure S12.

Figure S14.

MDS plots for human chromosomes 17-22, as in Supplementary Figure S12.

Figure S15.

Comparison of PCA figures within outlying windows (center column) and flanking
non-outlying windows (left and right columns) for the two windows having
outlying MDS scores on chromosome 8.

Figure S16.

MDS visualization of variation in the effects of population structure amongst
windows across all human autosomes simultaneously. The small group of
windows with positive outlying MDS values lies around the inversion at 8p23.

Figure S17.

First MDS coordinate against gene density for all 8 chromosomes of M. truncatula.
The first MDS coordinate is significantly correlated with gene count (r=0.149, p<2.2e-16).

Figure S18.

MDS visualizations of the effects of population structure for all 8 chromosomes
of the Medicago truncatula data, using windows of 10,000 SNPs.



Figure S19.

PCA plots for regions colored in Figure S18 on all 8 chromosomes of
Medicago truncatula: (A) green, (B) orange, and (C) purple.