BMP - ITS profiling pipeline (Illumina)

ITS profiling analysis pipeline (Illumina paired-end)

The BMP advisory board recommends this pipeline as a standard for ITS data analysis.

We are working to improve this workflow and to make it easier for end users.

If you have any questions or suggestions, please contact Victor Pylro: victor.pylro@brmicrobiome.org, Leandro Lemos: lemosbioinfo@gmail.com or Luiz Roesch: luiz.roesch@brmicrobiome.org

Please cite our efforts when using this pipeline: Brazilian Microbiome Project: revealing the unexplored microbial diversity--challenges and prospects. Microb Ecol. 2014, 67(2):237-241. doi: 10.1007/s00248-013-0302-4.

Also, remember to cite all other software used here.

WE STRONGLY RECOMMEND USING BMPOS v2 TO EASILY PERFORM ALL THE STEPS BELOW. TRY IT NOW!

Please cite our efforts when using BMPOS: BMPOS: a Flexible and User-Friendly Tool Sets for Microbiome Studies. Microb Ecol. 2016. doi: 10.1007/s00248-016-0785-x.

Here we provide the recommended pipeline for ITS profiling analysis for BMP users, starting from Illumina paired-end data.

This pipeline was optimized to use VSEARCH instead of USEARCH, which makes it more flexible and applicable to larger datasets (32-bit USEARCH can only handle up to 4 GB of data).

What you need: CLICK HERE!

This example assumes reads in FASTQ format.

This page gives a complete pipeline for analyzing fungal ITS data. Edit the commands as needed for your reads and file locations (represented here as $PWD/).
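Before starting, it can help to confirm that the programs used on this page are installed. The following is a minimal sketch, not part of the BMP pipeline itself; the function name require_tools is hypothetical, and the tool names in the example call are simply the commands that appear in the steps on this page.

```shell
# Report every requested tool that is not found on the PATH.
# Returns non-zero if at least one tool is missing.
require_tools() {
  rt_missing=0
  for rt_tool in "$@"; do
    if ! command -v "$rt_tool" >/dev/null 2>&1; then
      echo "missing: $rt_tool" >&2
      rt_missing=1
    fi
  done
  return "$rt_missing"
}

# Example call (adjust the list to the steps you plan to run):
# require_tools vsearch ITSx fasta_formatter split_libraries_fastq.py biom
```

If the function returns non-zero, install or load the missing tools before running the steps.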

From Illumina paired-end reads (R1.fastq/R2.fastq/Barcode.fastq):

We only use the forward file (R1.fastq), since the reads are variable in length.

1 - Demultiplex R1.fastq (barcodes and sequences are contained in two separate .fastq files). Here we "turn off" the filter parameters and store the demultiplexed .fastq file; quality filtering is done in the next step. <<<USING QIIME>>>

split_libraries_fastq.py -i $PWD/R1.fastq -b $PWD/barcode.fastq -o $PWD/slout/ -m $PWD/map.txt --rev_comp_mapping_barcodes --store_demultiplexed_fastq -r 999 -n 999 -q 0 -p 0.0001
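Because the reads and barcodes come from two separate files, a quick sanity check before demultiplexing is to confirm that both files hold the same number of records. This is a minimal sketch that assumes uncompressed FASTQ files with exactly four lines per record; the function names are hypothetical.

```shell
# Print the number of records in an uncompressed, 4-line-per-record FASTQ file.
fastq_record_count() {
  awk 'END { print NR / 4 }' "$1"
}

# Compare the record counts of a reads file and a barcode file.
check_same_record_count() {
  n_reads=$(fastq_record_count "$1")
  n_barcodes=$(fastq_record_count "$2")
  if [ "$n_reads" != "$n_barcodes" ]; then
    echo "record counts differ: $n_reads reads vs $n_barcodes barcodes" >&2
    return 1
  fi
  echo "$n_reads records in both files"
}

# Example call:
# check_same_record_count "$PWD/R1.fastq" "$PWD/barcode.fastq"
```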

2 - Quality filtering and convert to FASTA <<<USING VSEARCH>>>

vsearch --fastx_filter $PWD/slout/seqs.fastq --fastq_maxee 1.0 --fastq_trunclen 240 --fastaout reads.fa
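The --fastq_maxee threshold is applied to the expected number of errors in a read, that is, the sum of the per-base error probabilities 10^(-Q/10). To make that quantity concrete, the sketch below computes it for every read in a FASTQ file. It assumes Phred+33 quality encoding and four-line records, and it scores each read over its full length (no truncation); the function name is hypothetical.

```shell
# Print "read_id<TAB>expected_errors" for every record of a FASTQ file,
# assuming Phred+33 qualities and 4 lines per record.
fastq_expected_errors() {
  awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 1 { id = substr($0, 2) }      # header line without the "@"
    NR % 4 == 0 {                           # quality line
      ee = 0
      for (i = 1; i <= length($0); i++) {
        q = ord[substr($0, i, 1)] - 33      # Phred score of this base
        ee += 10 ^ (-q / 10)                # its error probability
      }
      printf "%s\t%.4f\n", id, ee
    }
  ' "$1"
}

# Example call:
# fastq_expected_errors "$PWD/slout/seqs.fastq" | head
```

For instance, a read of four bases that all have quality Q10 has an expected error of 4 x 0.1 = 0.4, which is below a threshold of 1.0.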

3 - Change sequence header to make file compatible with further UPARSE steps <<<USING BMP PERL SCRIPT>>>. This script will generate your converted FASTA file.

bmp-Qiime2Uparse.pl -i $PWD/reads.fa -o reads_uparse.fa

4 - Dereplication <<<USING VSEARCH>>>

vsearch --derep_fulllength $PWD/reads_uparse.fa --output derep.fa --sizeout

5 - Abundance sort and discard singletons <<<USING VSEARCH>>>

vsearch --sortbysize $PWD/derep.fa --output sorted.fa --minsize 2
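To see how much data the singleton filter removes, the abundance annotations written by --sizeout can be summed before and after this step. The sketch below assumes headers that carry a "size=N" annotation; the function name is hypothetical.

```shell
# Count sequences and sum their "size=N" abundance annotations in a FASTA file.
fasta_size_summary() {
  awk '
    /^>/ {
      n++
      if (match($0, /size=[0-9]+/))
        total += substr($0, RSTART + 5, RLENGTH - 5)   # digits after "size="
    }
    END { printf "sequences=%d reads=%d\n", n, total }
  ' "$1"
}

# Example calls:
# fasta_size_summary "$PWD/derep.fa"
# fasta_size_summary "$PWD/sorted.fa"
```

The difference in "reads" between the two files is the number of reads that were singletons.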

6 - ITSx extractor – selection for ITS sequences based on HMM <<<USING ITSx>>>

ITSx -i sorted.fa -o otus_ITS_extracted --cpu 2 --preserve T -t F

Note: the output file is otus_ITS_extracted.ITS1.fasta

7 - Shortening reads in the ITS extracted FASTA file <<<USING VSEARCH>>>

vsearch --fastx_filter $PWD/otus_ITS_extracted.ITS1.fasta --fastq_trunclen 140 --fastaout ITS1_trimmed.fa
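Truncating to a fixed length also discards sequences shorter than that length, so it is worth checking how many extracted ITS1 sequences would be lost at a given cutoff. The sketch below counts them in a FASTA file (sequences may span several lines); the function name is hypothetical.

```shell
# Count the sequences in a FASTA file and how many are shorter than a cutoff.
count_shorter_than() {
  awk -v min="$2" '
    function tally() { if (seen) { total++; if (len < min) short++ } }
    /^>/ { tally(); seen = 1; len = 0; next }   # new record: score the previous one
    { len += length($0) }                       # accumulate sequence length
    END { tally(); printf "total=%d shorter_than_%d=%d\n", total, min, short }
  ' "$1"
}

# Example call:
# count_shorter_than "$PWD/otus_ITS_extracted.ITS1.fasta" 140
```

If a large share of sequences falls below the cutoff, a shorter truncation length may be more appropriate for your data.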

8 - OTU clustering using UPARSE method <<<USING VSEARCH>>>

vsearch --cluster_size ITS1_trimmed.fa --consout otus1.fa --id 0.97

9 - Fasta Formatter <<<FASTX TOOLKIT SCRIPT>>>

fasta_formatter -i $PWD/otus1.fa -o formated_otus.fa

10 - Renamer <<<BMP SCRIPT>>>

bmp-otuName.pl -i $PWD/formated_otus.fa -o otus.fa

11 - Map reads back to OTU database <<<USING VSEARCH>>>

vsearch --usearch_global reads_uparse.fa --db otus.fa --strand plus --id 0.97 --uc map.uc
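A useful check after mapping is the fraction of reads that matched an OTU. The sketch below reads the tab-separated UC file written by this step and assumes that the first column holds the record type, with H for a hit and N for a read with no hit; the function name is hypothetical.

```shell
# Summarise hits (H) and no-hits (N) in a UC file.
uc_mapping_rate() {
  awk -F '\t' '
    $1 == "H" { hits++ }
    $1 == "N" { misses++ }
    END {
      total = hits + misses
      if (total == 0) { print "no H or N records found"; exit 1 }
      printf "mapped=%d unmapped=%d rate=%.2f%%\n", hits, misses, 100 * hits / total
    }
  ' "$1"
}

# Example call (use the UC file produced by step 11):
# uc_mapping_rate "$PWD/map.uc"
```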

12 - Assign taxonomy to the OTUs using the BLAST method in QIIME. Use the file “otus.fa” from step 10 as the input file and UNITE as the reference database (Download UNITE database HERE)

assign_taxonomy.py -i $PWD/otus.fa -o output -r $PWD/97_otus.fasta -t $PWD/97_otu_taxonomy.txt -m blast

13 - Convert UC to otu-table.txt <<< BMP SCRIPT>>>

bmp-map2qiime.py $PWD/map.uc > otu_table.txt

14 - Convert otu_table.txt to otu-table.biom <<< QIIME SCRIPT>>>

make_otu_table.py -i $PWD/otu_table.txt -t $PWD/output/otus_tax_assignments.txt -o otu_table.biom

15 - Check OTU Table on QIIME.

biom summarize-table -i $PWD/otu_table.biom -o results_biom_table

15.1 OPTIONAL: OTU table filtering to keep only sequences assigned to the kingdom Fungi (k__fungi) <<<USING QIIME>>>

filter_taxa_from_otu_table.py -i $PWD/otu_table.biom -o otu_table_k_fungi.biom -p k__fungi

16 - Run diversity analyses on QIIME by applying non-phylogenetic metrics (or any other analysis of your choice). The parameter “-e” is the sequencing depth to use for even sub-sampling and maximum rarefaction depth. You should review the output of the ‘biom summarize-table’ command (step 15) to decide on this value.

core_diversity_analyses.py -i $PWD/otu_table.biom -m $PWD/mapping_file.txt -e xxxx -o $PWD/core_output --nonphylogenetic_diversity
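To help choose the value for "-e", the per-sample counts can be pulled out of the summary file and sorted. This is only a sketch: it assumes the summary contains a "Counts/sample detail:" line followed by one "SampleID: count" line per sample, which may differ between biom versions, and the function name is hypothetical.

```shell
# List "count<TAB>sample" sorted from the smallest to the largest sample,
# assuming a "Counts/sample detail:" section with "SampleID: count" lines.
list_sample_counts() {
  awk '
    /^Counts\/sample detail:/ { in_detail = 1; next }
    in_detail && NF >= 2 {
      count = $NF
      sample = $0
      sub(/:[^:]*$/, "", sample)    # drop the trailing ": count"
      sub(/^[ \t]+/, "", sample)    # drop leading whitespace
      printf "%d\t%s\n", count, sample
    }
  ' "$1" | sort -n
}

# Example call:
# list_sample_counts "$PWD/results_biom_table" | head
```

Samples with fewer sequences than the chosen depth are dropped from the even-depth analyses, so the first lines of this list show which samples are at risk.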

This workflow is under improvement.
