Supplemental Material for Melnick, et al. 2021

software
posted on 23.04.2021, 12:09 by Marko Melnick, Patrick Gonzales, Thomas J. LaRocca, Yuping Song, Joanne Wuu, Michael Benatar, Björn Oskarsson, Leonard Petrucelli, Robin D. Dowell, Christopher D. Link, Mercedes Prudencio

Figure S1. Heatmap of normalized coverage of regular Virome from Emanuel2020 with BLAST to nt database from 05/10/2019


Heatmap of normalized coverage of dark biome contigs binned by species (top 30 species). The nucleotide database was from 5/10/2019, before the discovery of SARS-CoV-2. The top row shows the same row as in the main text, here identified as a bat SARS-like coronavirus.


Figure S2. Boxplot of normalized coverage for superkingdom Bacteria in VonSchack2018


Boxplot of normalized coverage of regular biome contigs binned by superkingdom Bacteria. Compared with colon tissue, blood shows significantly more reads in total RNA than in polyA RNA.



Figure S3. Boxplot of normalized coverage for superkingdom Bacteria in Shin2014


Boxplot of normalized coverage of regular biome contigs binned by superkingdom Bacteria. Globin-depleted (GD) blood has significantly more coverage than non-globin-depleted (NGD) blood.



Figure S4. Boxplot of normalized coverage for superkingdom Viruses in Shin2014


Boxplot of normalized coverage of regular biome contigs binned by superkingdom Viruses. Globin-depleted (GD) blood has significantly more coverage than non-globin-depleted (NGD) blood.



Figure S5. Heatmap of normalized coverage of regular Bacteriome binned by species from Shin2014


Heatmap of normalized coverage of regular biome contigs binned by bacterial species. Globin-depleted (GD) samples are red and non-globin-depleted (NGD) samples are black.



Figure S6. Heatmap of normalized coverage of regular Bacteriome binned by genus from Shin2014


Heatmap of normalized coverage of regular biome contigs binned by bacterial genus. Globin-depleted (GD) samples are red and non-globin-depleted (NGD) samples are black.


Figure S7. Log coverage binned by phylum from our ALS dataset


Coverage is summed across all samples. Alphaproteobacteria, Actinobacteria, Firmicutes, and Bacteroidetes are the most highly represented.



Figure S8. Heatmap of normalized coverage of dark biome contigs binned by species with metadata



Heatmap of normalized coverage of dark biome contigs binned by species. The highest coverage belongs to contigs that show high similarity to velvet tobacco mottle virus. Zero coverage is dark blue, shifting to yellow with increasing values. The samples come from four conditions: control patients [(CTL) green], ALS symptomatic patients [(SYM) purple], C9-ORF positive ALS symptomatic patients [(C9S) blue], and C9-ORF positive asymptomatic patients [(C9A) red]. Other metadata include gender, lane, run, and age at collection.



Figure S9. Phylogenetic tree of the RDRP contig. A phylogeny based solely on viral RDRP protein sequences places the RDRP contig (first row of the figure) closest to positive-sense (+) single-stranded RNA viruses of the Barnavirus, Sobemovirus, and Polerovirus genera.



Figure S10. Log bacterial contigs vs log reads for assembly. Scatterplot where each dot is a sample from a dataset, with the log number of bacterial contigs assembled on the Y-axis and the log number of reads used in SPAdes on the X-axis. Aside from the Shin, Humphrys, and Emanuel datasets, the number of bacterial contigs generally increases with the number of reads.


Figure S11. Log number of bacterial species vs log reads for assembly. Scatterplot where each dot is a sample from a dataset, with the log number of bacterial species detected on the Y-axis and the log number of reads used in SPAdes on the X-axis.



Figure S12. Log number of bacterial genera vs log reads for assembly. Scatterplot where each dot is a sample from a dataset, with the log number of bacterial genera detected on the Y-axis and the log number of reads used in SPAdes on the X-axis.



Figure S13. Upset plots of Bacteria for genus/species of regular genome


Upset plots are Venn diagram-like plots. Each set is on a row, with the total size of the set shown as a blue bar plot on the left. The black histogram on top shows the counts in each intersection of sets (a single dot for one set or connected dots for multiple sets). The highest number of overlapping bacterial genera is between our dataset and Ladd2017 (117), followed by the intersection between our dataset, Ladd2017, and Gagliardi2018 (31).
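As a reading aid, the counts behind such a plot can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the figures: it assumes each bar counts the items found in exactly that combination of datasets (the usual Upset convention), and the function name `exclusive_intersections` and the toy genus assignments are hypothetical, not taken from the study.

```python
from collections import Counter

def exclusive_intersections(sets_by_name):
    """Count items by the exact combination of sets that contain them."""
    membership = {}
    for name, items in sets_by_name.items():
        for item in items:
            # Record every dataset in which this item (e.g. a genus) appears.
            membership.setdefault(item, set()).add(name)
    # One count per distinct combination of datasets.
    return Counter(frozenset(names) for names in membership.values())

# Toy input (illustrative only, not data from the study).
example = {
    "ours": {"Bacillus", "Pseudomonas", "Streptomyces"},
    "Ladd2017": {"Bacillus", "Pseudomonas"},
    "Gagliardi2018": {"Bacillus"},
}
counts = exclusive_intersections(example)
for combo, n in sorted(counts.items(), key=lambda kv: -len(kv[0])):
    print(sorted(combo), n)
```

Each key of `counts` corresponds to one column of connected dots in the plot, and its value to the height of the bar above that column.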



Figure S14. Upset plots of Bacteria for genus/species of dark genome


Upset plots are Venn diagram-like plots. Each set is on a row, with the total size of the set shown as a blue bar plot on the left. The black histogram on top shows the counts in each intersection of sets (a single dot for one set or connected dots for multiple sets). The highest number of overlapping bacterial genera is between our dataset and Ladd2017 (390), followed by the intersection between our dataset, Ladd2017, and Gagliardi2018 (88).



Figure S15. Upset plots of Viruses for genus/species of regular genome


Upset plots are Venn diagram-like plots. Each set is on a row, with the total size of the set shown as a blue bar plot on the left. The black histogram on top shows the counts in each intersection of sets (a single dot for one set or connected dots for multiple sets). The regular virome of each dataset is relatively unique, with very little overlap (<= 3) between datasets (species and genus show a similar pattern).


Figure S16. Upset plots of Viruses for genus/species of dark genome



Upset plots are Venn diagram-like plots. Each set is on a row, with the total size of the set shown as a blue bar plot on the left. The black histogram on top shows the counts in each intersection of sets (a single dot for one set or connected dots for multiple sets). The highest overlap for species in the dark virome is between our dataset and Ladd2017 (13).



Figure S17. Upset plots of Bacteria in the regular biome for genus/species in ALS and Control contigs


Upset plots are Venn diagram-like plots. Each set is on a row, with the total size of the set shown as a blue bar plot on the left. The black histogram on top shows the counts in each intersection of sets (a single dot for one set or connected dots for multiple sets). We assigned a contig to a condition if >= 2 samples from that condition contain at least 90% of the summed normalized coverage (from all samples) to the contig. For genus and species from ALS samples there is little overlap between datasets (<= 1). For control samples the overlap is much higher for both genus and species.
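The assignment rule can be sketched as follows. This is one possible reading of the caption, not the authors' implementation: it assumes that "contain" means the samples of a condition with non-zero coverage, that at least two such samples are required, and that their combined coverage must reach 90% of the contig's total. The function name `assign_condition`, its parameters, and the sample labels are hypothetical.

```python
def assign_condition(coverage, condition_of, min_samples=2, min_fraction=0.9):
    """Return the condition a contig is assigned to, or None.

    coverage:     dict of sample -> normalized coverage for one contig
    condition_of: dict of sample -> condition label (e.g. "ALS", "CTL")
    """
    total = sum(coverage.values())
    if total <= 0:
        return None  # contig has no coverage in any sample
    per_condition = {}
    for sample, value in coverage.items():
        if value > 0:
            cond = condition_of[sample]
            n, s = per_condition.get(cond, (0, 0.0))
            per_condition[cond] = (n + 1, s + value)
    for cond, (n, s) in per_condition.items():
        # Assumed reading: >= 2 covering samples AND >= 90% of total coverage.
        if n >= min_samples and s / total >= min_fraction:
            return cond
    return None

# Toy input (illustrative only, not data from the study).
condition_of = {"a1": "ALS", "a2": "ALS", "c1": "CTL"}
assigned = assign_condition({"a1": 5.0, "a2": 4.5, "c1": 0.5}, condition_of)
single = assign_condition({"a1": 9.5, "a2": 0.0, "c1": 0.5}, condition_of)
spread = assign_condition({"a1": 5.0, "a2": 3.0, "c1": 2.0}, condition_of)
```

Because the threshold exceeds 50%, at most one condition can satisfy it for a given contig, so the order in which conditions are checked does not matter.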



Figure S18. Upset plots of Bacteria in the dark biome for genus/species in ALS and Control contigs



Upset plots are Venn diagram-like plots. Each set is on a row, with the total size of the set shown as a blue bar plot on the left. The black histogram on top shows the counts in each intersection of sets (a single dot for one set or connected dots for multiple sets). We assigned a contig to a condition if >= 2 samples from that condition contain at least 90% of the summed normalized coverage (from all samples) to the contig. Conditions with no recovered viruses have been omitted for clarity. Similarly to the regular bacteriome, there is no overlap in ALS samples and a small amount of overlap in the conditions.

Article title

Application of a bioinformatic pipeline to RNA-seq data identifies novel virus-like sequence in human blood
