Supplemental Material for Basu and Larsson, 2018
Supplementary figure 1. (S1_G3.pdf) General statistics and pipeline. A) Number of primary tumor samples for which raw sequencing data was downloaded, processed and analyzed. B) Pipeline for mapping and quantification of sequencing reads on the human genome.
Supplementary figure 2. (S2_G3.pdf) Distribution of Spearman correlation coefficients comparing the expression of lncRNAs and proximal coding genes (100 Kb windows) across tumor samples in 32 cancer types.
Supplementary figure 3. (S3_G3.pdf) Distribution of Spearman correlation coefficients comparing the expression of lncRNAs and proximal coding genes (100 Kb windows) across tumor samples in 32 cancer types, with coding and lncRNA genomic coordinates shuffled.
Supplementary figure 4. (S4_G3.pdf) Expression correlation enrichment and distribution at a stringent P cut-off. A) Enrichment of significant correlations (P < 1e-5) between lncRNAs and neighboring coding genes (100 Kb) compared to random lncRNA-coding pairs. B) Distribution of Spearman correlation coefficients (ρ, P < 1e-5) for coding genes and their proximal lncRNAs and proximal coding genes (100 Kb).
Supplementary figure 5. (S5_G3.pdf) Percentage of coding/lncRNA pairs showing positive and negative correlation at different distance thresholds across multiple cancer types. A) 25 Kb. B) 50 Kb. C) 100 Kb. D) Mean percentage of positive and negative correlations across all cancers at 25, 50 and 100 Kb.
Supplementary figure 6. (S6_G3.pdf) Distribution of the Cytoplasmic/Nucleus Relative Concentration Index in coding/lncRNA pairs showing negative (LCN) and positive (LCP) correlation, compared against all lncRNAs. A grey asterisk marks a significant difference in distribution between LCN and all lncRNAs, while a red asterisk marks a significant difference between LCN and LCP (Student’s t-test, P < 0.05).
Supplementary table 1. (Supplementary_table-1.xls) Known long non-coding RNA and coding gene associations captured by expression correlation analysis across multiple cancer types.
Supplementary table 2. (Supplementary_table-2.xls) 193 pairs of lncRNAs and coding genes (involving 178 unique lncRNAs and 188 unique coding genes) showing negative expression correlation in the majority of cancers with significant correlation (P < 1e-5) in at least one cancer type.
Supplementary table 3. (Supplementary_table-3.xls) Predicted subcellular localization of lncRNAs which show negative expression correlation in the majority of cancers with significant correlation (P < 1e-5) in at least one cancer type.
Supplementary table 4. (Supplementary_table-4.xls) LncRNAs which are significantly negatively correlated with a coding gene at the expression level while the lncRNA expression correlates positively with the methylation levels at the transcription start site (TSS) of their coding counterparts.
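Illustrative sketch. The proximity-based correlation analysis referred to in Supplementary figures 2 and 5 could be outlined roughly as below. This is not the authors' pipeline: the gene record layout (dictionaries with name, chrom, start, end), the definition of distance as the gap between gene intervals, and the function names ranks, spearman_rho and proximal_pairs are all assumptions made for illustration. P-values are not computed here.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / (var_x * var_y) ** 0.5


def proximal_pairs(lncrnas, coding, window=100_000):
    """List (lncRNA, coding gene) name pairs on the same chromosome whose
    intervals lie within `window` bp of each other (assumed definition)."""
    pairs = []
    for l in lncrnas:
        for c in coding:
            if l["chrom"] != c["chrom"]:
                continue
            # Gap between the two intervals; 0 if they overlap.
            gap = max(0, max(l["start"], c["start"]) - min(l["end"], c["end"]))
            if gap <= window:
                pairs.append((l["name"], c["name"]))
    return pairs
```

Given per-sample expression vectors for each gene in one cancer type, spearman_rho would be applied to every pair returned by proximal_pairs, and the sign of each coefficient tallied to obtain percentages of positive and negative correlations at a chosen window (for example 25, 50 or 100 Kb).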