16S profiling analysis pipeline (Windows)

 

The BMP advisory board recommends this pipeline as a standard for 16S rRNA data analysis.

We are currently working to improve this workflow and make it easier for end users.

If you have any questions or suggestions, please contact Victor Pylro: victor.pylro@brmicrobiome.org, Daniel Morais: daniel.morais@brmicrobiome.org or Luiz Roesch: luiz.roesch@brmicrobiome.org

 

Please cite our work when using this pipeline: Data analysis for 16S microbial profiling from different benchtop sequencing platforms. J Microbiol Methods. 2014. doi: 10.1016/j.mimet.2014.08.018.

Please cite our work when using the BTW package: BTW - Bioinformatics Through the Windows: an easy-to-install package to analyze 16S rRNA data on the Windows Subsystem for Linux (WSL). (Submitted).

Also, remember to cite all the other software used here:

​

VSEARCH

QIIME 1.9

FastX Toolkit

fastq-join

FLASH

ClustalW

RDP Classifier

Here, we provide the recommended pipeline for 16S profiling analysis using the Windows Subsystem for Linux (WSL)

 

What you need: only the BTW package!! Click here

​

Installing the Windows Subsystem for Linux (WSL): Click here

 

This example assumes reads in FASTQ format.

 

This page gives a complete pipeline to analyze 16S rRNA gene data. Edit the commands as needed for your reads and file locations (represented here as $PWD/).

 

From Illumina paired-end reads: this pipeline assumes an input folder of paired files, matched by file name, where the default tags _R1_ and _R2_ mark the forward and reverse reads, respectively:

 

1 - Take the forward and reverse Illumina reads (R1.fastq and R2.fastq files) and join them with the fastq-join method <<<USING QIIME 1.9>>>

 

multiple_join_paired_ends.py -i input_files -o merged/

​

1.1 - Alternatively, FLASH can perform the same task (it must be run separately for each sample)

flash -m 20 -M 250 -x 0.25 -p 33 R1.fastq R2.fastq -d merged/
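
Because FLASH handles one sample per call, a shell loop can derive each reverse-read file name and a sample name from the forward-read file name. The sketch below is a dry run that only prints the commands. It assumes the _R1_/_R2_ naming convention described above and file names without spaces. The helper names (r2_for, sample_for, flash_commands) and the per-sample output prefix passed to -o are illustrative choices, not part of the BTW package.

```shell
#!/usr/bin/env bash
# Dry run: print one FLASH command per R1/R2 pair instead of executing it.

# Derive the reverse-read file name by swapping the first _R1_ for _R2_.
r2_for() { printf '%s\n' "${1/_R1_/_R2_}"; }

# Derive a sample name: file name without directory, up to the _R1_ tag.
sample_for() { local base="${1##*/}"; printf '%s\n' "${base%%_R1_*}"; }

# Print the FLASH command for every R1 file in the given directory.
flash_commands() {
    local dir="$1" r1
    for r1 in "$dir"/*_R1_*.fastq; do
        [ -e "$r1" ] || continue   # nothing matched: skip the literal pattern
        printf 'flash -m 20 -M 250 -x 0.25 -p 33 %s %s -d merged -o %s\n' \
            "$r1" "$(r2_for "$r1")" "$(sample_for "$r1")"
    done
}

# input_files is the example folder name used in step 1.
flash_commands "${1:-input_files}"
```

Once the printed commands look right, they can be executed by piping the output to bash.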

​

2 - Quality-filter, truncate to a fixed length, and convert each joined sample to FASTA <<<USING VSEARCH>>>

 

vsearch --fastx_filter $PWD/fastqjoin.join.fastq --fastq_maxee 1.0 --fastq_trunclen 220 --fastaout samplex.fa

​

Note: the --fastq_trunclen value depends on the length of your joined reads. You can use FastQC to help make this decision.
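
As a quick complement to a FastQC report, the length distribution of the joined reads can be tabulated directly, which helps because reads shorter than the --fastq_trunclen value are discarded by VSEARCH. This is a minimal sketch, assuming a FASTQ file with exactly four lines per record (no wrapped sequences). The function name length_histogram is hypothetical.

```shell
#!/usr/bin/env bash
# Print "length<TAB>count" for the reads in a 4-line-per-record FASTQ file,
# sorted by increasing read length.
length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }   # line 2 of each record is the sequence
         END { for (len in n) print len "\t" n[len] }' "$1" | sort -n
}

# Example call (file name as used in step 2):
# length_histogram fastqjoin.join.fastq
```

A truncation length near the lower edge of the main peak keeps most reads while still giving every sequence the same length.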

​

3 - Change the sequence headers to make the file compatible with further steps <<<USING BMP PERL SCRIPT>>>. This script generates your converted FASTA file. Sample names should not contain any special characters, symbols or spaces. We strongly recommend keeping sample names as simple as possible.

 

bmp_demultiplexed.pl -i samplex.fa -o samplename -b samplename
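
Sample names can be checked before this step runs. The sketch below accepts only ASCII letters and digits. That allowed set is an assumption, since the text only rules out special characters, symbols and spaces. The function name valid_sample_name is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed only if the name consists solely of ASCII letters and digits.
# Assumption: this is stricter than the text requires, which only forbids
# special characters, symbols and spaces.
valid_sample_name() {
    local LC_ALL=C   # keep the A-Z and a-z ranges limited to ASCII
    [[ "$1" =~ ^[A-Za-z0-9]+$ ]]
}

# Report every name given on the command line that fails the check.
for name in "$@"; do
    valid_sample_name "$name" || printf 'rename before step 3: %s\n' "$name" >&2
done
```

Saved as a script, it can be called with all planned sample names as arguments; names that pass produce no output.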

 

4 - Make a single file containing all your samples

​

cat sample1 sample2 sample3 sample4 ... > reads.fa

​

5 - Dereplication <<<USING VSEARCH>>>

 

vsearch --derep_fulllength $PWD/reads.fa --output derep.fa --sizeout

 

6 - Abundance sort and discard singletons <<<USING VSEARCH>>>

 

vsearch --sortbysize $PWD/derep.fa --output sorted.fa --minsize 2
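
To see how many reads the singleton filter removed, the ";size=N" annotations that --sizeout writes into the FASTA headers can be summed for derep.fa and for sorted.fa. This is a minimal sketch: the function name size_summary is hypothetical, and headers without a size annotation are ignored.

```shell
#!/usr/bin/env bash
# Print the number of unique sequences and the total reads they represent,
# taken from ";size=N" annotations in the FASTA headers.
size_summary() {
    awk '/^>/ {
             if (match($0, /;size=[0-9]+/)) {
                 # skip the 6 characters of ";size=" to reach the number
                 total += substr($0, RSTART + 6, RLENGTH - 6)
                 uniques++
             }
         }
         END { printf "%d unique sequences, %d reads\n", uniques, total }' "$1"
}

# Example calls (file names as used in steps 5 and 6):
# size_summary derep.fa
# size_summary sorted.fa
```

The difference between the two read totals is the number of reads that were singletons.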

 

7 - OTU clustering at 97% identity (--cluster_size, VSEARCH's abundance-ordered clustering, used here in place of the UPARSE method) <<<USING VSEARCH>>>

 

vsearch --cluster_size $PWD/sorted.fa --consout otus1.fa --id 0.97

​

8 - Fasta Formatter <<<FASTX TOOLKIT SCRIPT>>>

​

fasta_formatter -i otus1.fa -o formated_otus1.fa

​

9 - Renamer <<<BMP SCRIPT>>>

​

bmp-otuName.pl -i formated_otus1.fa -o otus.fa

​

10 - Map reads back to OTU database <<<VSEARCH>>>

​

vsearch --usearch_global $PWD/reads.fa --db otus.fa --strand plus --id 0.97 --uc map.uc
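
Before building the OTU table, it can be useful to check how many reads actually matched an OTU. In the tab-separated UC file written by --usearch_global, the first column is the record type: H marks a read that hit a database sequence and N marks a read with no hit. The sketch below counts both. The function name mapping_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Count hit (H) and no-hit (N) records in a UC file.
mapping_summary() {
    awk -F'\t' '$1 == "H" { hit++ }
                $1 == "N" { miss++ }
                END { printf "%d mapped, %d unmapped\n", hit, miss }' "$1"
}

# Example call: pass the file given to --uc in step 10.
# mapping_summary path/to/uc_file
```

Unmapped reads do not contribute to any OTU count, so a large unmapped share is worth investigating before moving on.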

​

11 - Assign taxonomy to OTUs using the RDP Classifier on QIIME (use the file “otus.fa” as input file)

​

assign_taxonomy.py -i $PWD/otus.fa -m rdp -o taxonomy

​

12 - Align sequences on QIIME, using the ClustalW method (use the file “otus.fa” as input file)

​

align_seqs.py -i $PWD/otus.fa -m clustalw -o rep_set_align

​

13 - Filter alignments on QIIME

​

filter_alignment.py -i $PWD/rep_set_align/otus_aligned.fasta -o filtered_alignment

​

14 - Make the reference tree on QIIME

​

make_phylogeny.py -i $PWD/filtered_alignment/otus_aligned_pfiltered.fasta -o rep_set.tre

​

15 - Convert the UC file to otu_table.txt <<<BMP SCRIPT>>>

​

bmp-map2qiime.py map.uc > otu_table.txt

​

16 - Convert otu_table.txt to otu_table.biom <<<QIIME SCRIPT>>>

make_otu_table.py -i otu_table.txt -t taxonomy/otus_tax_assignments.txt -o otu_table.biom

​

17 - Check the OTU table on QIIME

​

biom summarize-table -i $PWD/otu_table.biom -o results_biom_table

​

18 - Run diversity analyses on QIIME (or any other analysis of your choice). The “-e” parameter sets the sequencing depth used for even sub-sampling and the maximum rarefaction depth. Review the output of the ‘biom summarize-table’ command (step 17) to decide on this value.

​

core_diversity_analyses.py -i $PWD/otu_table.biom -m $PWD/mapping_file.txt -t $PWD/rep_set.tre -e xxxx -o $PWD/core_output
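
As a cross-check on the ‘biom summarize-table’ output, per-sample read totals can also be computed from the tab-separated otu_table.txt. This is a minimal sketch under stated assumptions: the first line with more than one field is taken as the header holding the sample names, every later column holds counts only (no taxonomy column), and the function name sample_depths is hypothetical.

```shell
#!/usr/bin/env bash
# Print "total<TAB>sample" for each sample column, shallowest sample first.
sample_depths() {
    awk -F'\t' '
        NF < 2 { next }                  # skip blank or single-field comment lines
        !seen_header {                   # first multi-field line: sample names
            for (i = 2; i <= NF; i++) name[i] = $i
            cols = NF; seen_header = 1; next
        }
        { for (i = 2; i <= cols; i++) total[i] += $i }
        END { for (i = 2; i <= cols; i++) print total[i] "\t" name[i] }' "$1" | sort -n
}

# Example call (file name as produced in step 15):
# sample_depths otu_table.txt
```

Samples whose total falls below the value given to -e are left out of the rarefied analyses, so the first lines of this listing show which samples a given depth would exclude.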

 

The generated .biom OTU table is also fully compatible with MicrobiomeAnalyst, a user-friendly web-based platform for microbiome data analysis and visualization, including taxonomy plots and estimates of α- and β-diversity (http://www.microbiomeanalyst.ca).

​

​​This workflow is under improvement.

​


© Copyright 2012-2013 Brazilian Microbiome Project
