BMP - 16S profiling pipeline (Illumina)

16S profiling analysis pipeline (Illumina paired-end)

The BMP advisory board recommends this pipeline as a standard for 16S rRNA data analysis.

We are currently working to improve this workflow and to make it easier for end users.

If you have any questions or suggestions, please contact Victor Pylro: victor.pylro@brmicrobiome.org, Leandro Lemos: lemosbioinfo@gmail.com or Luiz Roesch: luiz.roesch@brmicrobiome.org

Please cite our work when using this pipeline: Data analysis for 16S microbial profiling from different benchtop sequencing platforms. J Microbiol Methods. 2014. doi: 10.1016/j.mimet.2014.08.018.

Also, remember to cite all other software used here.

WE STRONGLY RECOMMEND USING BMPOS v2 TO EASILY PERFORM ALL THE STEPS BELOW. TRY IT NOW!

Please cite our work when using BMPOS: BMPOS: a Flexible and User-Friendly Tool Sets for Microbiome Studies. Microb Ecol. 2016. doi: 10.1007/s00248-016-0785-x.

Here, we provide the recommended pipeline for 16S profiling analysis for BMP users, from Illumina paired-end data.

This pipeline was optimized to use VSEARCH instead of USEARCH. This makes it more flexible and applicable to larger datasets (the 32-bit version of USEARCH is limited to 4 GB of memory).

What you need: CLICK HERE!

This example assumes reads in FASTQ format.

This page gives a complete pipeline to analyze 16S rRNA gene data. Of course, you should edit as needed for your reads and file locations (represented here as $PWD/).

From Illumina paired-end reads (R1.fastq/R2.fastq/Barcode.fastq):

Note: for Illumina data that are already demultiplexed (one file per sample), please contact us. We have a solution, and it will be available here soon.

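1 - Join the forward and reverse Illumina reads (R1.fastq and R2.fastq files) with the fastq-join method and update the barcode file to match, using QIIME 1.9. The -o argument is an output directory; the joined reads and the updated barcodes are written there as fastqjoin.join.fastq and fastqjoin.join_barcodes.fastq.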

join_paired_ends.py -f $PWD/forward_reads.fastq -r $PWD/reverse_reads.fastq -b $PWD/barcodes.fastq -o $PWD/joined.fastq

2 - Demultiplex the .fastq sequence data when barcodes and sequences are in two separate .fastq files. Here, we "turn off" the filtering parameters and store the demultiplexed .fastq file (quality filtering is done in the next step).

split_libraries_fastq.py -i $PWD/joined.fastq/fastqjoin.join.fastq -b $PWD/joined.fastq/fastqjoin.join_barcodes.fastq -o $PWD/slout/ -m $PWD/mapping_file.txt --rev_comp_mapping_barcodes --store_demultiplexed_fastq -r 999 -n 999 -q 0 -p 0.0001
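
To make the effect of --rev_comp_mapping_barcodes concrete, here is a minimal Python sketch of exact-match demultiplexing. It is not QIIME's implementation (QIIME can also correct barcode errors, which this sketch ignores), and the sample names, barcodes and function names are hypothetical.

```python
# Illustrative sketch only: exact-match demultiplexing in which the barcodes
# listed in the mapping file are reverse-complemented before comparison.
# Sample names and barcodes are made up for this example.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_sample(barcode_read, mapping_barcodes, rev_comp_mapping=True):
    """Return the sample whose barcode matches the barcode read, or None.

    With rev_comp_mapping=True the mapping-file barcode is reverse-complemented
    first, which is the situation --rev_comp_mapping_barcodes addresses.
    """
    for sample, barcode in mapping_barcodes.items():
        expected = reverse_complement(barcode) if rev_comp_mapping else barcode
        if barcode_read == expected:
            return sample
    return None


mapping_barcodes = {"SampleA": "AACCGGTTAACC", "SampleB": "TTTTGGGGCCCC"}
print(assign_sample("GGTTAACCGGTT", mapping_barcodes))  # prints SampleA
```

If most reads end up unassigned, checking whether the barcodes match in the other orientation is a reasonable first diagnostic.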

3 - Quality filtering, length truncation, and conversion to FASTA <<<USING VSEARCH>>>

vsearch --fastx_filter $PWD/slout/seqs.fastq --fastq_maxee 1.0 --fastq_trunclen 240 --fastaout reads.fa
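
The two filtering parameters can be illustrated with a short Python sketch: a read is truncated to 240 bases, reads shorter than that are discarded, and the read is kept only if the sum of its per-base error probabilities (the expected errors) does not exceed 1.0. This is a sketch of the idea, assuming Phred+33 quality encoding; the function names are hypothetical and it is not VSEARCH code.

```python
# Illustrative sketch only: expected-error filtering after length truncation.
# Assumes Phred+33 encoded quality strings.

def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for a Phred-encoded quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def filter_read(seq, quality, trunclen=240, maxee=1.0):
    """Return the truncated sequence, or None if the read is rejected.

    Reads shorter than trunclen are rejected; longer reads are cut to trunclen
    and rejected if their expected errors exceed maxee.
    """
    if len(seq) < trunclen:
        return None
    seq, quality = seq[:trunclen], quality[:trunclen]
    if expected_errors(quality) > maxee:
        return None
    return seq


# A 250-base read with Q40 at every position ('I' in Phred+33) passes.
kept = filter_read("A" * 250, "I" * 250)
print(len(kept))
```

Because the expected errors are computed on the truncated read, a lower truncation length usually lets more reads pass, at the cost of shorter sequences.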

4 - Change the sequence headers to make the file compatible with the following UPARSE steps <<<USING BMP PERL SCRIPT>>>. This script generates your converted FASTA file.

bmp-Qiime2Uparse.pl -i $PWD/reads.fa -o reads_uparse.fa

5 - Dereplication <<<USING VSEARCH>>>

vsearch --derep_fulllength $PWD/reads_uparse.fa --output derep.fa --sizeout
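
As a rough picture of what dereplication with --sizeout produces, the Python sketch below collapses identical sequences and appends a ";size=N" abundance annotation to the label of the first read seen for each sequence. The function name and the toy records are hypothetical, and details of VSEARCH's own behavior (for example, tie ordering) may differ.

```python
# Illustrative sketch only: full-length dereplication with size annotations.
from collections import Counter


def dereplicate(records):
    """Collapse identical sequences.

    records is an iterable of (label, sequence) pairs. Returns a list of
    (label;size=N, sequence) pairs, most abundant first, keeping the label of
    the first read observed for each unique sequence.
    """
    counts = Counter()
    first_label = {}
    for label, seq in records:
        seq = seq.upper()  # compare case-insensitively
        counts[seq] += 1
        first_label.setdefault(seq, label)
    return [(f"{first_label[seq]};size={n}", seq)
            for seq, n in counts.most_common()]


reads = [("r1", "ACGT"), ("r2", "GGGG"), ("r3", "ACGT"), ("r4", "acgt")]
for label, seq in dereplicate(reads):
    print(f">{label}\n{seq}")
```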

6 - Abundance sort and discard singletons <<<USING VSEARCH>>>

vsearch --sortbysize $PWD/derep.fa --output sorted.fa --minsize 2
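
The sorting and singleton removal can be sketched in Python as follows: the ";size=N" annotation is read from each label, sequences with an abundance below 2 are dropped, and the rest are ordered from most to least abundant. Treating a missing annotation as abundance 1 is an assumption of this sketch, and the function names are hypothetical.

```python
# Illustrative sketch only: sort by abundance and discard singletons.
import re

SIZE_RE = re.compile(r";size=(\d+)")


def abundance(label):
    """Read the ;size=N annotation from a label (assumed 1 if absent)."""
    match = SIZE_RE.search(label)
    return int(match.group(1)) if match else 1


def sort_by_size(records, minsize=2):
    """Keep (label, sequence) records with abundance >= minsize, sorted by
    decreasing abundance."""
    kept = [rec for rec in records if abundance(rec[0]) >= minsize]
    return sorted(kept, key=lambda rec: abundance(rec[0]), reverse=True)


uniques = [("a;size=1", "AAAA"), ("b;size=5", "CCCC"), ("c;size=2", "GGGG")]
print(sort_by_size(uniques))
```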

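7 - OTU clustering at 97% identity <<<USING VSEARCH>>>. The --cluster_size command performs greedy, abundance-ordered clustering and is used here in place of the UPARSE clustering step of USEARCH.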

vsearch --cluster_size $PWD/sorted.fa --consout otus1.fa --id 0.97
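
The idea behind abundance-ordered greedy clustering can be sketched in a few lines of Python: sequences are visited from most to least abundant, and each one either joins the first existing centroid it matches at 97% identity or founds a new cluster. This is a strong simplification and an assumption-laden sketch: identity is computed position by position on equal-length sequences (plausible here only because all reads were truncated to the same length), whereas VSEARCH computes identity from pairwise alignments and uses heuristics to choose which centroids to compare. All names are hypothetical.

```python
# Illustrative sketch only: greedy centroid-based clustering of
# abundance-sorted, equal-length sequences.

def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this simplified identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences, threshold=0.97):
    """Cluster (label, sequence) pairs given in decreasing-abundance order.

    Returns a dict mapping each centroid label to the labels in its cluster.
    """
    centroids = []  # (label, sequence) of each cluster seed, in creation order
    clusters = {}
    for label, seq in sequences:
        for c_label, c_seq in centroids:
            if identity(seq, c_seq) >= threshold:
                clusters[c_label].append(label)
                break
        else:
            # No centroid was close enough: this sequence seeds a new cluster.
            centroids.append((label, seq))
            clusters[label] = [label]
    return clusters


toy = [
    ("u1;size=10", "A" * 100),
    ("u2;size=3", "A" * 98 + "CC"),        # 98% identical to u1
    ("u3;size=2", "C" * 10 + "A" * 90),    # 90% identical to u1
]
print(greedy_cluster(toy))
```

Processing the most abundant sequences first means that abundant sequences tend to become centroids, which is why the input is sorted by size in the previous step.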

8 - Fasta Formatter <<<FASTX TOOLKIT SCRIPT>>>

fasta_formatter -i otus1.fa -o formated_otus1.fa

9 - Renamer <<<BMP SCRIPT>>>

bmp-otuName.pl -i formated_otus1.fa -o otus.fa

10 - Map reads back to OTU database <<<VSEARCH>>>

vsearch --usearch_global $PWD/reads_uparse.fa --db otus.fa --strand plus --id 0.97 --uc map.uc

11 - Assign taxonomy to the OTUs using the uclust method in QIIME (use the file “otus.fa” as the input file)

assign_taxonomy.py -i $PWD/otus.fa -o output

12 - Align the sequences in QIIME, using Greengenes reference sequences (use the file “otus.fa” as the input file)

align_seqs.py -i $PWD/otus.fa -o rep_set_align

13 - Filter alignments on QIIME

filter_alignment.py -i $PWD/rep_set_align/otus_aligned.fasta -o filtered_alignment

14 - Make the reference tree on QIIME

make_phylogeny.py -i $PWD/filtered_alignment/otus_aligned_pfiltered.fasta -o rep_set.tre

15 - Convert UC to otu-table.txt <<< BMP SCRIPT>>>

bmp-map2qiime.py map.uc > otu_table.txt
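
To show what kind of information is carried from the UC file into the OTU file, here is a minimal Python sketch that reads UC records and groups read labels by the OTU they hit. In the tab-separated UC format, the first column is the record type ("H" for a hit), the ninth column is the query (read) label and the tenth is the target (OTU) label. The sketch is not the BMP script and may differ from what bmp-map2qiime.py does internally; the function names and the toy labels are hypothetical.

```python
# Illustrative sketch only: turn UC hit records into an OTU map
# (one line per OTU: OTU id followed by the reads assigned to it).
from collections import defaultdict


def uc_to_otu_map(uc_lines):
    """Return {otu_label: [read_label, ...]} from UC-format lines."""
    otu_map = defaultdict(list)
    for line in uc_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0] != "H":  # skip reads without a hit (record type N)
            continue
        otu_map[fields[9]].append(fields[8])
    return dict(otu_map)


def format_otu_map(otu_map):
    """Render the map as tab-separated lines, one OTU per line."""
    return ["\t".join([otu] + reads) for otu, reads in sorted(otu_map.items())]


uc_example = [
    "H\t0\t240\t99.2\t+\t0\t0\t240M\tS1_1\tOTU_1",
    "H\t1\t240\t97.5\t+\t0\t0\t240M\tS2_7\tOTU_2",
    "N\t*\t*\t*\t.\t*\t*\t*\tS1_2\t*",
    "H\t0\t240\t100.0\t+\t0\t0\t240M\tS2_3\tOTU_1",
]
for row in format_otu_map(uc_to_otu_map(uc_example)):
    print(row)
```

Reads that did not match any OTU at 97% identity (record type N) do not appear in the result, so the total count in the OTU table can be lower than the number of filtered reads.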

16 - Convert otu_table.txt to otu-table.biom <<< QIIME SCRIPT>>>

make_otu_table.py -i otu_table.txt -t output/otus_tax_assignments.txt -o otu_table.biom

17 - Check OTU Table on QIIME.

biom summarize-table -i $PWD/otu_table.biom -o results_biom_table
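
The most useful numbers in the summary are the per-sample sequence counts. The Python sketch below computes them from a toy OTU table held in a dictionary; the table contents and the function name are hypothetical and the code does not read BIOM files.

```python
# Illustrative sketch only: per-sample sequencing depth from an OTU table
# stored as {otu_id: {sample_id: count}}.
from statistics import median


def sample_depths(otu_table):
    """Return {sample_id: total number of sequences in that sample}."""
    depths = {}
    for counts in otu_table.values():
        for sample, n in counts.items():
            depths[sample] = depths.get(sample, 0) + n
    return depths


table = {
    "OTU_1": {"S1": 10, "S2": 4},
    "OTU_2": {"S1": 5, "S3": 20},
}
depths = sample_depths(table)
print("min:", min(depths.values()))
print("median:", median(depths.values()))
print("max:", max(depths.values()))
```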

18 - Run diversity analyses in QIIME (or any other analysis of your choice). The parameter “-e” is the sequencing depth to use for even sub-sampling and the maximum rarefaction depth. You should review the output of the ‘biom summarize-table’ command (step 17) to decide on this value.

core_diversity_analyses.py -i $PWD/otu_table.biom -m $PWD/mapping_file.txt -t $PWD/rep_set.tre -e xxxx -o $PWD/core_output
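
To illustrate what even sub-sampling to a depth of “-e” means, the Python sketch below draws a fixed number of sequences from one sample without replacement. Samples with fewer sequences than the chosen depth cannot be sub-sampled and are returned as None here. The function name, the fixed seed and the toy counts are assumptions of this sketch, not QIIME code.

```python
# Illustrative sketch only: rarefy one sample to a fixed depth by drawing
# sequences without replacement.
import random


def rarefy(counts, depth, seed=0):
    """Sub-sample {otu_id: count} to exactly `depth` sequences.

    Returns the sub-sampled counts, or None if the sample holds fewer than
    `depth` sequences.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    result = {}
    for otu in rng.sample(pool, depth):
        result[otu] = result.get(otu, 0) + 1
    return result


sample = {"OTU_1": 50, "OTU_2": 30, "OTU_3": 20}
print(rarefy(sample, 40))
print(rarefy({"OTU_1": 5}, 40))  # too shallow for this depth
```

Choosing a high depth keeps more sequences per sample but excludes shallow samples, so the value is a trade-off that depends on the depth distribution reported in step 17.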

A reduced dataset for 16S (Illumina and Ion Torrent) and ITS (Illumina) can be obtained here and used as an example.

This workflow is still being improved.
