## Supplemental Material for Li and Ralph, 2018

Posted on 20.11.2018 by Peter L. Ralph, Han Li

Supplementary Tables and Figures for Li and Ralph, "Local PCA Shows How the Effect of Population Structure Differs Along the Genome"

Table S1.

Correlations between the MDS coordinates of genomic regions obtained in runs with different parameter values. To produce these, we first ran the algorithm on the full Medicago truncatula dataset with the specified window size and number of PCs (k). Then, to obtain the correlation between the results for parameters A (row of the matrix above) and parameters B (column of the matrix above), we mapped the windows of B onto those of A by averaging the MDS coordinates of all windows of B whose midpoints lay in the corresponding window of A. We then computed the correlation between the MDS coordinates of A and the averaged MDS coordinates of B. This operation is not symmetric, so these matrices are not symmetric. As expected, parameter values with smaller windows produce noisier estimates, but plots of MDS values along the genome are visually very similar.
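The window-matching step for Table S1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes:

- windows are half-open `(start, end)` base-pair intervals;
- each run yields one MDS coordinate per window;
- windows of A that contain no B midpoint are simply skipped.

The helper names (`map_b_onto_a`, `cross_run_correlation`) are hypothetical.

```python
from math import sqrt
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: half-open genomic interval [start, end) in bp.
Window = Tuple[float, float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def map_b_onto_a(windows_a: Sequence[Window], windows_b: Sequence[Window],
                 mds_b: Sequence[float]) -> List[Optional[float]]:
    """For each window of run A, average the MDS coordinates of all windows of
    run B whose midpoints fall inside it; None if no B midpoint falls inside."""
    mapped: List[Optional[float]] = []
    for start, end in windows_a:
        vals = [m for (bs, be), m in zip(windows_b, mds_b)
                if start <= (bs + be) / 2 < end]
        mapped.append(sum(vals) / len(vals) if vals else None)
    return mapped


def cross_run_correlation(windows_a: Sequence[Window], mds_a: Sequence[float],
                          windows_b: Sequence[Window], mds_b: Sequence[float]) -> float:
    """Correlation between A's MDS coordinate and B's coordinate averaged onto A's windows."""
    mapped = map_b_onto_a(windows_a, windows_b, mds_b)
    pairs = [(a, b) for a, b in zip(mds_a, mapped) if b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(xs, ys)


# Toy example: run A uses three 100 bp windows, run B uses six 50 bp windows.
win_a = [(0, 100), (100, 200), (200, 300)]
win_b = [(0, 50), (50, 100), (100, 150), (150, 200), (200, 250), (250, 300)]
mds_a = [1.0, 2.0, 3.0]
mds_b = [1.0, 3.0, 2.0, 6.0, 8.0, 9.0]

r_ab = cross_run_correlation(win_a, mds_a, win_b, mds_b)  # B averaged onto A
r_ba = cross_run_correlation(win_b, mds_b, win_a, mds_a)  # A averaged onto B
print(round(r_ab, 3), round(r_ba, 3))
```

With these toy numbers the two directions give different values (about 0.976 versus 1.0). This illustrates why the matrices need not be symmetric.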

Figure S1.

PCA plots for chromosome arms 2L, 2R, 3L, 3R and X of the Drosophila melanogaster dataset.

Figure S2.

PCA plots for all 22 human autosomes from the POPRES data.

Figure S3.

PCA plots for all 8 chromosomes in the Medicago truncatula dataset.

Figure S4.

MDS visualizations of the Gaussian genotypes described in the Appendix, for 50 individuals from each of three populations. (top) The first quarter, middle half, and final quarter of the chromosome each have different population structure, as expected, despite the possibility of PC switching within each. (bottom) The same picture results even after marking a random 50% of the genotypes in the first half of the chromosome as missing.
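The masking step in the bottom panel can be illustrated with a short Python sketch. It assumes:

- the genotypes form a loci-by-individuals matrix;
- missing genotypes are encoded as `None`;
- exactly half of the first-half entries are sampled uniformly without replacement.

The Appendix's actual Gaussian genotype model and masking procedure may differ. `mask_first_half` is a hypothetical name, and the input below uses stand-in Gaussian values only.

```python
import random
from typing import List, Optional, Sequence


def mask_first_half(genotypes: Sequence[Sequence[float]], frac: float = 0.5,
                    seed: int = 1) -> List[List[Optional[float]]]:
    """Return a copy of a loci-by-individuals matrix in which a random `frac`
    of the entries in the first half of the loci are replaced by None (missing)."""
    rng = random.Random(seed)
    n_loci, n_ind = len(genotypes), len(genotypes[0])
    half = n_loci // 2
    masked: List[List[Optional[float]]] = [list(row) for row in genotypes]
    cells = [(i, j) for i in range(half) for j in range(n_ind)]
    # Sample exactly round(frac * len(cells)) distinct cells without replacement.
    for i, j in rng.sample(cells, round(frac * len(cells))):
        masked[i][j] = None
    return masked


# Stand-in input: 10 loci (rows) by 6 individuals (columns) of Gaussian values.
rng = random.Random(0)
geno = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(10)]
masked = mask_first_half(geno)
n_missing_first = sum(g is None for row in masked[:5] for g in row)
n_missing_second = sum(g is None for row in masked[5:] for g in row)
print(n_missing_first, n_missing_second)
```

Here 15 of the 30 entries in the first five loci become missing, and the second half is left untouched.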

Figure S5.

MDS visualizations of the results of individual-based simulations using SLiM (see Appendix for details). All simulations are neutral, and recombination is: (top) constant; (top middle) varies stepwise by factors of two in seven equal-length segments, with the highest rates on the ends, so that the middle segment has a recombination rate 64 times lower than the ends; (bottom middle) according to the HapMap human female chromosome 7 map. The bottom figure shows PCA maps corresponding to the three colored windows of the last (HapMap) situation. The outlying regions are long regions of low recombination rate, so each such region can be dominated by a few correlated trees, similar to an inversion. The inset provides a key to the locations of the individuals on the spatial landscape.

Figure S6.

MDS visualizations of the results of individual-based simulations using SLiM (see Appendix for details). All simulations incorporate linked selection by allowing selected mutations to appear in the same two regions of the genome: the one-sixth of the genome immediately before the halfway point, and the last one-sixth of the genome. (top) Constant recombination rate. (top middle) Stepwise varying recombination rate. (bottom middle) Constant recombination rate with spatially varying effects of selection. (bottom) PCA plots corresponding to the highlighted corners of the last MDS visualization, showing how spatially varying linked selection has affected patterns of relatedness. The inset provides a key to the locations of the individuals on the spatial landscape.

Figure S7.

MDS visualizations for each chromosome arm of Drosophila melanogaster, as in Figure 2, except that the method was run using five PCs (k=5) instead of two.

Figure S8.

The proportion of missing data in each window, plotted against the first MDS coordinate for the Drosophila melanogaster data from Figure 2.
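The per-window missing-data proportion in this figure can be computed as in the following Python sketch. It assumes:

- windows are consecutive blocks of a fixed number of SNPs, like the 1,000-SNP windows of Figure S10;
- missing calls are encoded as `None`.

The function names are hypothetical. Each resulting value would then be paired with that window's first MDS coordinate for plotting.

```python
from typing import List, Optional, Sequence

Genotypes = Sequence[Sequence[Optional[float]]]  # loci x individuals, None = missing


def missing_fraction(window: Genotypes) -> float:
    """Fraction of genotype calls in a window that are missing (None)."""
    cells = [g for row in window for g in row]
    return sum(g is None for g in cells) / len(cells)


def missing_by_window(genotypes: Genotypes, window_size: int) -> List[float]:
    """Split the loci into consecutive windows of `window_size` SNPs and return
    the missing fraction in each; a final partial window is kept as-is."""
    return [missing_fraction(genotypes[i:i + window_size])
            for i in range(0, len(genotypes), window_size)]


# Four loci by two individuals, split into two windows of two SNPs each.
geno = [[0, None], [1, 2], [None, None], [0, 1]]
print(missing_by_window(geno, window_size=2))
```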

Figure S9.

PCA plots for the three sets of genomic windows colored in Figure 2, on each chromosome arm of Drosophila melanogaster. In all plots, each point represents a sample. The first column shows the combined PCA plot for windows whose points are colored green in Figure 2; the second is for orange windows; and the third is for purple windows.

Figure S10.

Variation in structure for windows of 1,000 SNPs across Drosophila melanogaster chromosome arms, without inversions. As in Figure 2, but after omitting, for each chromosome arm, individuals carrying the less frequent orientation of any inversion on that arm. The values differ from those in Figure 4 in the window size used, and in that some MDS coordinates have their signs flipped. Relative orientation is meaningless here, since chromosome arms were run separately, unlike for Medicago. In all plots, each point represents one window along the genome. The first column shows the MDS visualization of relationships between windows, and the second and third columns show the midpoint of each window against the two MDS coordinates; rows correspond to chromosome arms. Vertical lines show the breakpoints of known polymorphic inversions.
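The sample filtering used for this figure can be sketched in Python as below, under the following labelled assumptions:

- each individual's karyotype for each inversion is coded as the number of inverted copies (0, 1 or 2);
- "less frequent orientation" is judged from the inverted-arrangement frequency in the sample, with ties treated as the inverted arrangement being the minor one;
- any individual carrying at least one copy of a minor orientation is dropped.

The function name and the inversion labels (`inv_a`, `inv_b`, `inv_c`) are hypothetical.

```python
from typing import Dict, List, Sequence


def keep_major_orientation(karyotypes: Dict[str, Sequence[int]]) -> List[int]:
    """Return indices of individuals carrying no copy of the less frequent
    orientation of any inversion in `karyotypes`.

    `karyotypes` maps an inversion name to each individual's number of
    inverted copies (0, 1 or 2); all inversions are on the same chromosome arm.
    """
    n = len(next(iter(karyotypes.values())))
    keep = set(range(n))
    for counts in karyotypes.values():
        inverted_freq = sum(counts) / (2 * len(counts))
        if inverted_freq <= 0.5:
            # Inverted arrangement is the minor one (ties treated as minor):
            # drop every carrier of at least one inverted copy.
            keep -= {i for i, c in enumerate(counts) if c > 0}
        else:
            # Standard arrangement is the minor one: drop anyone not homozygous inverted.
            keep -= {i for i, c in enumerate(counts) if c < 2}
    return sorted(keep)


# Two hypothetical inversions on one arm, scored in five individuals.
arm_karyotypes = {"inv_a": [0, 1, 0, 2, 0], "inv_b": [0, 0, 0, 0, 1]}
print(keep_major_orientation(arm_karyotypes))
```

The filter is applied per arm, so an individual removed for one arm can still be analysed on another. This matches the caption's "for each chromosome arm".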

Figure S11.

Recombination rate and the effects of population structure for Drosophila melanogaster: the first MDS coordinate and recombination rate (in cM/Mbp), as in Figure 4, plotted against each other. The windows underlying the two estimates in Figure 4 do not coincide. To obtain correlations, we therefore divided the genome into 100 kbp bins. For each variable (recombination rate and MDS coordinate 1), we averaged the values of all windows overlapping each bin, with weight proportional to the proportion of overlap. The correlation coefficient and r^2 for each linear regression are as follows: 2L: correlation=0.52, r^2=0.27; 2R: correlation=0.43, r^2=0.18; 3L: correlation=0.47, r^2=0.21; 3R: correlation=0.46, r^2=0.21; X: correlation=0.50, r^2=0.24.
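The bin-averaging step above can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation. It assumes:

- "proportion of overlap" means the fraction of the bin covered by each window;
- windows are half-open base-pair intervals;
- bins with no overlapping window are dropped before computing the correlation.

The names `bin_average` and `correlation_and_r2` are hypothetical. For a simple linear regression with one predictor, r^2 equals the square of the Pearson correlation. This is why each reported r^2 is close to the square of its listed correlation.

```python
import math
from typing import List, Optional, Sequence, Tuple


def bin_average(windows: Sequence[Tuple[int, int]], values: Sequence[float],
                chrom_len: int, bin_size: int = 100_000) -> List[Optional[float]]:
    """Average window values into fixed-size bins, weighting each window by the
    proportion of the bin it overlaps; None for bins with no overlapping window."""
    out: List[Optional[float]] = []
    for k in range(math.ceil(chrom_len / bin_size)):
        lo, hi = k * bin_size, min((k + 1) * bin_size, chrom_len)
        num = den = 0.0
        for (start, end), v in zip(windows, values):
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                w = overlap / (hi - lo)  # fraction of this bin covered by the window
                num += w * v
                den += w
        out.append(num / den if den > 0 else None)
    return out


def correlation_and_r2(x: Sequence[Optional[float]],
                       y: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Pearson r over bins where both values exist, and r^2 of the simple linear regression."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r


# Toy 300 kbp arm: MDS and recombination estimates on different window grids.
mds1 = bin_average([(0, 150_000), (150_000, 300_000)], [1.0, 3.0], chrom_len=300_000)
recomb = bin_average([(0, 100_000), (100_000, 300_000)], [0.5, 2.5], chrom_len=300_000)
r, r2 = correlation_and_r2(mds1, recomb)
print(mds1, recomb, round(r, 3), round(r2, 3))
```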

Figure S12.

MDS plots for human chromosomes 1-8. The first column shows the MDS visualization of relationships between windows, and the second and third columns show the midpoint of each window against the two MDS coordinates; rows correspond to chromosomes. Colored vertical lines show the breakpoints of known, validated inversions, while grey vertical lines show the breakpoints of predicted inversions.

Figure S13.

MDS plots for human chromosomes 9-16, as in Supplementary Figure S12.

Figure S14.

MDS plots for human chromosomes 17-22, as in Supplementary Figure S12.

Figure S15.

Comparison of PCA figures within outlying windows (center column) and flanking non-outlying windows (left and right columns) for the two windows having outlying MDS scores on chromosome 8.

Figure S16.

MDS visualization of variation in the effects of population structure amongst windows across all human autosomes simultaneously. The small group of windows with positive outlying MDS values lies around the inversion at 8p23.

Figure S17.

First MDS coordinate against gene density for all 8 chromosomes of M. truncatula.

The first MDS coordinate is significantly correlated with gene count (r=0.149, p<2.2e-16).

Figure S18.

MDS visualizations of the effects of population structure for all 8 chromosomes of the Medicago truncatula data, using windows of 10,000 SNPs.

Figure S19.

PCA plots for regions colored in Figure S18 on all 8 chromosomes of Medicago truncatula: (A) green, (B) orange, and (C) purple.