BMP - 16S profiling pipeline (Ion Torrent)

The BMP advisory board recommends this pipeline as a standard for 16S rRNA data analysis.

We are currently working to improve this workflow and make it easier for end users.

If you have any questions or suggestions, please contact Victor Pylro: victor.pylro@brmicrobiome.org, Leandro Lemos: lemosbioinfo@gmail.com or Luiz Roesch: luiz.roesch@brmicrobiome.org

Please cite our work when using this pipeline: Data analysis for 16S microbial profiling from different benchtop sequencing platforms. J Microbiol Methods. 2014. doi: 10.1016/j.mimet.2014.08.018.

Also, remember to cite all other software used here.

WE STRONGLY RECOMMEND USING BMPOS v2 TO EASILY PERFORM ALL THE STEPS BELOW. TRY IT NOW!

Please cite our work when using BMPOS: BMPOS: a Flexible and User-Friendly Tool Sets for Microbiome Studies. Microb Ecol. 2016. doi: 10.1007/s00248-016-0785-x.

This workflow is under improvement.

Here we provide the recommended pipeline for 16S profiling analysis of Ion Torrent PGM data for BMP users.

This pipeline was optimized to use VSEARCH instead of USEARCH, which makes it more flexible and applicable to larger datasets (the 32-bit version of USEARCH is limited to 4 GB of memory).

What you need: CLICK HERE!

This example assumes reads in FASTQ format.

Throughout the BMP pipeline you will need the forward primer sequence (GGACTACNNGGGTNTCTAAT in the example below; replace any degenerate bases with “N”). You will also need to prepare a FASTA file called “barcodes.fa” containing the barcodes that identify your samples (see example here). The FASTA label for each barcode should be a short name identifying the sample.
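
As a rough illustration of these two inputs, the Python sketch below masks degenerate bases in a primer with “N” and formats a barcode list as FASTA text. The degenerate primer, the sample names, and the barcode sequences are hypothetical placeholders (not taken from this page), and the helper names are made up for the example; substitute your own values.

```python
import re


def mask_degenerate(primer):
    """Replace every base that is not A, C, G or T with N."""
    return re.sub(r"[^ACGT]", "N", primer.upper())


def barcodes_to_fasta(barcodes):
    """Format {sample_name: barcode} as FASTA text, one record per sample."""
    return "".join(f">{sample}\n{barcode}\n" for sample, barcode in barcodes.items())


# Hypothetical primer written with IUPAC degenerate codes (H, V, W).
masked = mask_degenerate("GGACTACHVGGGTWTCTAAT")
print(masked)

# Hypothetical sample names and barcodes; the FASTA label is the sample name.
fasta_text = barcodes_to_fasta({"SampleA": "CTAAGGTAAC", "SampleB": "TAAGGAGAAC"})
print(fasta_text, end="")
```

Saving the printed FASTA text to a file named barcodes.fa gives a file in the layout described above.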

This page gives a complete pipeline to analyze 16S rRNA gene data. Of course, you should edit as needed for your reads and file locations (represented here as $PWD/).

From non-demultiplexed Ion Torrent .fastq files (remember to keep the barcode sequence):

To obtain non-demultiplexed (raw data) .fastq files from the Ion Torrent server:

- Click on "Reanalyze"

- Then click on "Analysis settings"

- Under "Barcode Set", select "None" or "RNA_Barcode_None"

1 - Strip barcodes ("Ex" is a prefix for the read labels; it can be anything you like) <<<UPARSE Scripts>>>

fastq_strip_barcode_relabel2.py $PWD/reads.fastq GGACTACNNGGGTNTCTAAT $PWD/barcodes.fa Ex > reads2.fastq
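
To make the idea behind this step concrete, here is a simplified Python sketch of barcode and primer stripping. It is not the code of the UPARSE script: it assumes each read starts with the barcode followed directly by the primer, allows no mismatches other than “N” wildcards in the primer, and assumes the new read label carries a `barcodelabel=` annotation. The barcodes and the read are hypothetical.

```python
def primer_matches(segment, primer):
    """True if segment equals primer, treating N in the primer as a wildcard."""
    return len(segment) == len(primer) and all(
        p == "N" or p == base for p, base in zip(primer, segment)
    )


def strip_barcode(seq, qual, primer, barcodes):
    """Return (sample, trimmed_seq, trimmed_qual), or None if nothing matches."""
    for sample, barcode in barcodes.items():
        if not seq.startswith(barcode):
            continue
        start = len(barcode)
        end = start + len(primer)
        if primer_matches(seq[start:end], primer):
            return sample, seq[end:], qual[end:]
    return None


primer = "GGACTACNNGGGTNTCTAAT"
barcodes = {"SampleA": "CTAAGGTAAC", "SampleB": "TAAGGAGAAC"}  # hypothetical

# Hypothetical read: barcode + primer + amplicon, with uniform quality.
read_seq = "CTAAGGTAAC" + "GGACTACACGGGTATCTAAT" + "TCCTGTTTG"
read_qual = "I" * len(read_seq)

result = strip_barcode(read_seq, read_qual, primer, barcodes)
if result is not None:
    sample, seq, qual = result
    # Assumed label format: prefix + read number + sample annotation.
    print(f"@Ex1;barcodelabel={sample};")
    print(seq)
```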

2 - Quality filtering, length truncate, and convert to FASTA <<<USING VSEARCH>>>

vsearch --fastx_filter $PWD/reads2.fastq --fastq_maxee 1.0 --fastq_trunclen 200 --fastaout reads.fa
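
The two filtering options can be illustrated with a short Python sketch. It assumes Phred+33 quality encoding, that reads shorter than the truncation length are discarded, and that expected errors are computed on the truncated read; the function names are made up for the example and this is not VSEARCH's own code.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities from a Phred-encoded quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def filter_read(seq, quality, trunclen=200, maxee=1.0):
    """Return the truncated sequence, or None if the read is discarded."""
    if len(seq) < trunclen:
        return None  # too short to truncate to the fixed length
    seq, quality = seq[:trunclen], quality[:trunclen]
    if expected_errors(quality) > maxee:
        return None  # too many expected errors after truncation
    return seq


# Hypothetical reads: 'I' encodes Q40 and '5' encodes Q20 in Phred+33.
good = filter_read("A" * 250, "I" * 250)   # 200 bases at Q40: about 0.02 expected errors
noisy = filter_read("A" * 250, "5" * 250)  # 200 bases at Q20: about 2 expected errors
short = filter_read("A" * 150, "I" * 150)  # shorter than 200 bases

print(good is not None, noisy is None, short is None)
```

Truncating every read to the same length means that reads covering the same template can later be compared as exact duplicates.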

3 - Dereplication <<<USING VSEARCH>>>

vsearch --derep_fulllength $PWD/reads.fa --output derep.fa --sizeout

4 - Abundance sort and discard singletons <<<USING VSEARCH>>>

vsearch --sortbysize $PWD/derep.fa --output sorted.fa --minsize 2
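
Steps 3 and 4 can be pictured with the following Python sketch: identical sequences are collapsed and counted, sorted by decreasing abundance, and sequences seen fewer than twice are dropped. The reads and the `uniq` labels are hypothetical, and the `;size=N` annotation is written here only to mimic the size annotation that `--sizeout` adds.

```python
from collections import Counter


def dereplicate(seqs):
    """Collapse identical sequences; return (sequence, count) by decreasing count."""
    return Counter(seqs).most_common()


def drop_rare(counted, minsize=2):
    """Keep only sequences observed at least `minsize` times."""
    return [(seq, n) for seq, n in counted if n >= minsize]


# Hypothetical reads (real reads would all be 200 bases long after step 2).
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC", "TTGA"]

counted = dereplicate(reads)
kept = drop_rare(counted, minsize=2)

for index, (seq, n) in enumerate(kept, start=1):
    print(f">uniq{index};size={n}")  # hypothetical label
    print(seq)
```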

5 - OTU clustering <<<USING VSEARCH>>>

vsearch --cluster_size $PWD/sorted.fa --consout otus.fa --id 0.97

6 - Map reads back to OTU database <<<USING VSEARCH>>>

vsearch --usearch_global $PWD/reads.fa --db otus.fa --strand plus --id 0.97 --uc map.uc

7 - Assign taxonomy to OTUs using the uclust method in QIIME (use the file “otus.fa” as input file)

assign_taxonomy.py -i $PWD/otus.fa -o output

8 - Align sequences in QIIME, using Greengenes reference sequences (use the file “otus.fa” as input file)

align_seqs.py -i $PWD/otus.fa -o rep_set_align

9 - Filter alignments on QIIME

filter_alignment.py -i $PWD/rep_set_align/otus_aligned.fasta -o filtered_alignment

10 - Make the reference tree on QIIME

make_phylogeny.py -i $PWD/filtered_alignment/otus_aligned_pfiltered.fasta -o rep_set.tre

11 - Convert UC to otu_table.txt <<<UPARSE PYTHON SCRIPT>>>

uc2otutab.py $PWD/map.uc > otu_table.txt
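
As a rough picture of what this conversion involves, the Python sketch below counts hits per OTU and per sample from UC records. It assumes the standard 10-column tab-separated UC layout (record type in column 1, query label in column 9, target label in column 10) and that each read label carries a `barcodelabel=` field naming the sample. The records, the OTU names, and the `OTUId` header are hypothetical, and this is not the code of the UPARSE script.

```python
from collections import defaultdict


def sample_of(label):
    """Extract the sample name from a 'barcodelabel=' field in a read label."""
    for field in label.split(";"):
        if field.startswith("barcodelabel="):
            return field.split("=", 1)[1]
    raise ValueError(f"no barcodelabel= field in {label!r}")


def uc_to_table(lines):
    """Count hit records (type H) per OTU and sample."""
    table = defaultdict(lambda: defaultdict(int))
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10 or cols[0] != "H":
            continue  # skip non-hit records such as N (no hit)
        table[cols[9]][sample_of(cols[8])] += 1
    return table


def format_table(table):
    """Render the counts as tab-delimited text, one row per OTU."""
    samples = sorted({s for counts in table.values() for s in counts})
    rows = ["OTUId\t" + "\t".join(samples)]
    for otu in sorted(table):
        rows.append(otu + "\t" + "\t".join(str(table[otu].get(s, 0)) for s in samples))
    return "\n".join(rows)


# Hypothetical UC records.
uc_lines = [
    "H\t0\t200\t99.0\t+\t0\t0\t200M\tEx1;barcodelabel=SampleA;\tOTU_1",
    "H\t0\t200\t98.5\t+\t0\t0\t200M\tEx2;barcodelabel=SampleA;\tOTU_1",
    "H\t0\t200\t97.5\t+\t0\t0\t200M\tEx3;barcodelabel=SampleB;\tOTU_1",
    "N\t*\t*\t*\t.\t*\t*\t*\tEx4;barcodelabel=SampleB;\t*",
    "H\t1\t200\t99.5\t+\t0\t0\t200M\tEx5;barcodelabel=SampleB;\tOTU_2",
]

text = format_table(uc_to_table(uc_lines))
print(text)
```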

12 - Convert otu_table.txt to otu_table.biom, used by QIIME <<<BIOM SCRIPT>>>

biom convert -i $PWD/otu_table.txt -o otu_table.biom --table-type="OTU table" --to-json

13 - Add metadata (taxonomy) to OTU table

biom add-metadata -i $PWD/otu_table.biom -o otu_table_tax.biom --observation-metadata-fp $PWD/output/otus_tax_assignments.txt --observation-header OTUID,taxonomy,confidence --sc-separated taxonomy --float-fields confidence

14 - Check OTU Table on QIIME.

biom summarize-table -i $PWD/otu_table_tax.biom -o results_biom_table

15 - Run diversity analyses in QIIME (or any other analysis of your choice). The parameter “-e” is the sequencing depth to use for even sub-sampling and maximum rarefaction depth. You should review the output of the ‘biom summarize-table’ command (step 14) to decide on this value.

core_diversity_analyses.py -i $PWD/otu_table_tax.biom -m $PWD/mapping_file.txt -t $PWD/rep_set.tre -e xxxx -o $PWD/core_output
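
One way to reason about the value of “-e” is to check which samples would survive at each candidate depth, since samples with fewer sequences than the chosen depth are dropped from the even sub-sampling. The Python sketch below does this for hypothetical per-sample totals, such as those reported by ‘biom summarize-table’; the numbers and the helper name are made up for the example.

```python
def samples_retained(sample_totals, depth):
    """Names of samples with at least `depth` sequences."""
    return sorted(name for name, total in sample_totals.items() if total >= depth)


# Hypothetical per-sample sequence totals.
sample_totals = {"SampleA": 12500, "SampleB": 9800, "SampleC": 1200}

# Each observed total is a candidate depth; list what each choice would keep.
for depth in sorted(sample_totals.values()):
    kept = samples_retained(sample_totals, depth)
    print(depth, len(kept), kept)
```

Choosing the smallest total keeps every sample but discards the most sequences per sample; a larger depth keeps more sequences per sample at the cost of losing the shallowest samples.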

A reduced dataset for 16S (Illumina and Ion Torrent) and ITS (Illumina) can be obtained here and used as an example.
