Posted on 2021-05-20, 18:04. Authored by Ezequiel G. Mogro, Nicolás M. Ambrosis, Mauricio J. Lozano
Supplementary Tables
Table S1. Genomes used for ISCompare evaluation
Table S2. SurroundingLen parameter optimization. Sheet 1, results of the comparison of E. coli K-12 substr. MG1655 with an artificial genome of the same strain containing 100 random IS30 insertions. Sheet 2, results of the comparison using E. meliloti strain 1021 as reference genome and an artificial E. meliloti 2011 genome with 100 random ISRm5 insertions as query. Sheet 3, sensitivity and precision analysis. TP, true positives; FP, false positives; FN, false negatives. Asterisks (*) indicate values found manually by inspecting the VD reports.
Table S3. ISCompare evaluation using 3,000 random IS30 insertions. Sheet 1: Statistical analysis. Sheet 2: Location of 3,000 randomly inserted IS30 copies, set 1. Sheet 3: ISCompare results using a compilation of ISs from the ISFinder database and the randomly inserted IS30 set 1. Sheet 4: Location of 3,000 randomly inserted IS30 copies, set 2. Sheet 5: ISCompare results using IS30 as query IS and the randomly inserted IS30 set 2. TP, true positives; FP, false positives; FN, false negatives. Asterisks (*) indicate values found manually by inspecting the VD reports.
Table S4. Analysis of differentially located ISs in P. aeruginosa strains. Sheet 1: ISCompare result analysis. TP, true positives; FP, false positives; FN, false negatives. Asterisks (*) indicate values found manually by inspecting the VD reports. TP-ALL and FP-ALL indicate the total number of TP or FP detected. Sheet 2: ANIb and DDH results.
Table S5. Analysis of differentially located ISs in E. meliloti strains. Sheet 1: ISCompare results summary. TP, true positives; FP, false positives; FN, false negatives. Asterisks (*) indicate values found manually by inspecting the VD reports. TP-ALL and FP-ALL indicate the total number of TP or FP detected. Sheet 2: ANIb calculated using the ANI matrix calculator server. Sheet 3: dDDH results from http://ggdc.dsmz.de/ggdc.php.
Table S6. Comparison of ISCompare results using the normal vs the Shift mode (-S). A comparison of DLIS in E. meliloti 1021 and GR4 strains was done using ISCompare with -S option set to 5,000 nucleotides. Green shading indicates new DLIS found in the -S mode. Red shading indicates TP DLIS that were incorrectly detected by the -S mode.
Table S7. Comparison of ISCompare and ISSeeker using E. meliloti genomes. E. meliloti 1021 was used as reference and compared to strains GR4 and USDA1022 as queries. For strain GR4, the -S option was also evaluated. As ISSeeker can only analyze one IS at a time, only the location of ISRm2011-2 was analyzed using both programs. Sheet 1: ISCompare vs ISSeeker results summary. Sheet 2: ISSeeker, 1021 vs GR4 results. Sheet 3: ISCompare, 1021 vs GR4 results. Sheet 4: ISCompare, 1021 vs GR4 results with -S 5000 setting. Sheet 5: ISSeeker, 1021 vs USDA1022 results. Sheet 6: ISCompare, 1021 vs USDA1022 results. Sheet 7: ISCompare, 1021 vs USDA1022 results with -S 5000 setting.
Table S8. Comparison of ISCompare and ISSeeker using P. aeruginosa genomes. Sheet 1: ISCompare vs ISSeeker results summary. Sheet 2: ISSeeker results for all the analyzed P. aeruginosa strains. Sheet 3: ISCompare results for all the analyzed P. aeruginosa strains.
Table S9. Comparison of B. pertussis TOHAMA I with strains I127, J299 and J412, which contain an IS481 insertion in the pertactin autotransporter gene. Sheet 1: Summary. Sheet 2: Results of TOHAMA I vs I127 using ISCompare. Sheet 3: Results of TOHAMA I vs J299 using ISCompare. Sheet 4: Results of TOHAMA I vs J412 using ISCompare.
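The sensitivity and precision values reported in Tables S2 to S5 are derived from the TP, FP and FN counts. As a minimal sketch, assuming the standard definitions (sensitivity = TP / (TP + FN), precision = TP / (TP + FP)), they could be computed as follows. The function names and the example counts are hypothetical and are not taken from the tables.

```python
def sensitivity(tp, fn):
    """Fraction of real insertions that were detected: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else float("nan")


def precision(tp, fp):
    """Fraction of reported insertions that are real: TP / (TP + FP)."""
    total = tp + fp
    return tp / total if total else float("nan")


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not values from the tables).
    tp, fp, fn = 95, 2, 5
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
    print(f"precision   = {precision(tp, fp):.3f}")
```

Both functions return NaN when the denominator is zero, so a comparison with no detected or no expected insertions does not raise an error.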
Supplementary files
File S1. ContigBlastHit.pm. Modified ContigBlastHit.pm Perl module from ISSeeker.
Supplementary Figure legends
Figure S1. E. meliloti Average Nucleotide Identity (ANIb) matrix and UPGMA distance tree. The ANIb matrix and UPGMA distance tree were calculated using the ANI-matrix calculator at the Kostas lab server (http://enve-omics.ce.gatech.edu/g-matrix/). The accession numbers of the E. meliloti genomes used are listed in Table S1.
Figure S2. Sensitivity and precision of ISCompare using different SurroundingLen values. SurroundingLen parameter optimization was evaluated using a range of nucleotide lengths between 100 and 2000. The E. coli K-12 substr. MG1655 genome was compared with an artificial genome of the same strain containing 100 random IS30 insertions. In the case of E. meliloti, the comparison was done between strain 1021 as reference genome and an artificial E. meliloti 2011 genome with 100 random ISRm5 insertions as query. DLIS, differentially located ISs; VD, discarded cases or cases tagged for manual verification.
Figure S3. Phylogenetic tree of all the sequenced E. meliloti strains in the NCBI genomes database. The phylogenetic tree was downloaded from the NCBI genomes database. E. meliloti strains were selected according to their phylogenetic distance to the reference strain 1021. Selected strains are shown in color, from green for closely related strains to purple for more distant strains. The phylogenetic tree was edited using the iTOL server (Letunic and Bork, 2019, https://itol.embl.de/).
Figure S4. Phylogenetic tree of all the sequenced P. aeruginosa strains in the NCBI genomes database. The phylogenetic tree was downloaded from the NCBI genomes database. P. aeruginosa strains were selected according to their phylogenetic distance to the reference strain PAO1. Selected strains are shown in bold font. Collapsed branches are displayed as triangles. Leaves are shown as dots. The phylogenetic tree image was manually edited with Inkscape.
Figure S5. Schematic representation of the possible cases resulting in DLIS, SLIS, and VD reports. Panels A-E represent cases that would be correctly identified by ISCompare as DLIS. Panels F-I correspond to cases that are reported only when using the -rs (report SLIS) option. Panels J-N are cases reported for manual verification, most of them involving repeated sequences. Some of these cases could be DLIS. Panels O-S are cases discarded from the analysis due to non-significant or multiple blastn hits. Panels T-V are other particular cases that could produce false positives. The query and reference genomes are represented by thick lines (green for query, and red/orange/yellow for reference) and ISs are represented as black boxes. Grey boxes represent different ISs and other genes. ltrA: group II intron-encoded protein LtrA. Colored circles indicate the category of differentially located IS candidate (green, DLIS; black, SLIS; purple, magenta and dark blue are “Verify manually” categories; yellow, orange and red are “Discarded from the analysis” categories).
Figure S6. ISCompare shift mode. A. Schematic representation of the algorithm variations in the -S shift mode. B. Example of usage of the -S mode for the identification of differentially located ISs flanking Group II introns. C. Example of usage of the -S mode for the identification of differentially located Group II introns using the ltrA gene.
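The artificial genomes mentioned in Tables S2 and S3 and in Figure S2 contain IS copies inserted at random positions. The sketch below shows one way such a test genome could be built. It is an illustration under stated assumptions, not the procedure used by the authors: the function name insert_is_copies and its parameters are hypothetical, sequences are handled as plain strings, and the target site duplications that real IS insertions usually generate are not modeled.

```python
import random


def insert_is_copies(genome, is_seq, n, seed=0):
    """Insert n copies of is_seq at distinct random positions of genome.

    Returns the new sequence and the sorted insertion positions, given in
    coordinates of the original (unmodified) genome.
    """
    rng = random.Random(seed)  # fixed seed so that a run can be reproduced
    positions = sorted(rng.sample(range(len(genome) + 1), n))
    pieces = []
    prev = 0
    for pos in positions:
        pieces.append(genome[prev:pos])  # original sequence up to the site
        pieces.append(is_seq)            # inserted IS copy
        prev = pos
    pieces.append(genome[prev:])         # remainder of the genome
    return "".join(pieces), positions


if __name__ == "__main__":
    # Toy sequences for illustration only.
    toy_genome = "ACGT" * 50
    toy_is = "NNNNN"
    new_genome, sites = insert_is_copies(toy_genome, toy_is, n=10, seed=1)
    print(len(toy_genome), len(new_genome), sites)
```

Recording the insertion positions makes it possible to count TP, FP and FN afterwards by comparing the reported locations against the known ones.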
Article title
Easy identification of insertion sequence mobilization events in related bacterial strains with ISCompare