BMP - 18S profiling pipeline (Illumina)

18S profiling analysis pipeline (Illumina paired-end)

The BMP advisory board recommends this pipeline as a standard for 18S rRNA data analysis.

We are working to improve this workflow and to make it easier for end users.

If you have any questions or suggestions, please contact Victor Pylro: victor.pylro@brmicrobiome.org, Leandro Lemos: lemosbioinfo@gmail.com or Luiz Roesch: luiz.roesch@brmicrobiome.org

Please cite our work when using this pipeline: Brazilian Microbiome Project: revealing the unexplored microbial diversity--challenges and prospects. Microb Ecol. 2014, 67(2):237-241. doi: 10.1007/s00248-013-0302-4.

Also, remember to cite all other software used here.

Here we provide the recommended pipeline for 18S profiling analysis of Illumina paired-end data for BMP users.

What you need: CLICK HERE!

This example assumes reads in FASTQ format.

This page gives some example command lines for constructing a UPARSE pipeline. Edit them as needed for your reads and file locations (represented here as $PWD/). In the USEARCH commands, $u stands for the path to your USEARCH 7 binary.
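Before running the steps, it can help to confirm that the tools named on this page are installed. The sketch below is only an illustration: missing_tools is a hypothetical helper (not part of QIIME, USEARCH or the BMP scripts) that prints every command it cannot find on your PATH.

```shell
# Hypothetical helper: print each named command that is NOT found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools join_paired_ends.py split_libraries_fastq.py fasta_formatter biom perl python` prints nothing when everything is installed, and lists the missing commands otherwise.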

From Illumina paired-end reads (R1.fastq/R2.fastq/Barcode.fastq):

NOTE: You may also use only the R1.fastq file, because skipping read assembly sometimes gives better results. Check your assembly results (Step 1) before deciding which approach is best. If you use only R1, start from Step 2.

1 - Join the forward and reverse Illumina reads (R1.fastq and R2.fastq files) using the fastq-join method, and update the barcode file accordingly, on QIIME 1.8.0.

join_paired_ends.py -f $PWD/forward_reads.fastq -r $PWD/reverse_reads.fastq -b $PWD/barcodes.fastq -o $PWD/joined.fastq
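To judge the assembly results, one simple measure is the share of read pairs that were joined. A minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record; fastq_count and join_percent are hypothetical helper names, not QIIME commands:

```shell
# Count records in an uncompressed FASTQ file (assumes 4 lines per record).
fastq_count() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Percentage of read pairs that were joined: 100 * joined / total.
# Prints 0.0 when total is zero to avoid dividing by zero.
join_percent() {
    awk -v joined="$1" -v total="$2" \
        'BEGIN { if (total > 0) printf "%.1f\n", 100 * joined / total; else print "0.0" }'
}
```

Call fastq_count once on the file holding your joined reads and once on the forward reads file, then pass both numbers to join_percent. A low percentage is a hint that starting from R1 only may be the better option.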

2 - Demultiplex .fastq sequence data, when barcodes and sequences are contained in two separate .fastq files. Here, we are "turning off" filter parameters, and storing the demultiplexed .fastq file (quality filtering will be done in the next step).

split_libraries_fastq.py -i $PWD/joined.fastq -b $PWD/barcode.fastq -o $PWD/slout/ -m $PWD/map.txt --rev_comp_mapping_barcodes --store_demultiplexed_fastq -r 0 -q 0 -n 100

3 - Quality filtering, length truncation, and conversion to FASTA <<<USING USEARCH 7>>>

$u -fastq_filter $PWD/slout/seqs.fastq -fastq_maxee 0.5 -fastq_trunclen 240 -fastaout reads.fa
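The -fastq_maxee filter works on expected errors: each base with Phred score Q contributes an error probability of 10^(-Q/10), and a read is discarded when the sum over the truncated read exceeds the threshold (0.5 here). The sketch below reproduces only that arithmetic for a single quality string. It assumes Phred+33 encoding, and expected_errors is a hypothetical helper, not a USEARCH command.

```shell
# Expected errors of one FASTQ quality string (Phred+33 encoding assumed).
# EE = sum over bases of 10^(-Q/10), where Q = ASCII code - 33.
expected_errors() {
    printf '%s\n' "$1" | awk '
        BEGIN {
            # Lookup table from printable character to ASCII code.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
        }
        {
            ee = 0
            for (i = 1; i <= length($0); i++) {
                q = ord[substr($0, i, 1)] - 33
                ee += 10 ^ (-q / 10)
            }
            printf "%.4f\n", ee
        }'
}
```

As a worked case, 240 bases that all have Q30 give 240 * 0.001 = 0.24 expected errors, which passes -fastq_maxee 0.5, while 240 bases at Q20 give 240 * 0.01 = 2.4 and are discarded.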

4 - Change sequence header to make file compatible with further UPARSE steps <<<USING BMP PERL SCRIPT>>>. This script will generate your converted FASTA file.

perl bmp-Qiime2Uparse.pl -i $PWD/reads.fa -o reads_uparse.fa

5 - Dereplication <<<USING USEARCH 7>>>

$u -derep_fulllength $PWD/reads_uparse.fa -output derep.fa -sizeout

6 - Abundance sort and discard singletons <<<USING USEARCH 7>>>

$u -sortbysize $PWD/derep.fa -output sorted.fa -minsize 2
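Steps 5 and 6 together collapse identical reads into unique sequences annotated with their abundance, sort them by decreasing abundance, and drop sequences seen only once. The toy sketch below shows that idea with standard Unix tools, for intuition only. It is not how USEARCH implements these steps, and it expects one sequence per line rather than FASTA. toy_derep is a hypothetical name.

```shell
# Toy dereplication: input is a file with ONE sequence per line (not FASTA).
# Prints "count sequence" for every sequence seen at least min times
# (second argument, default 2), most abundant first.
toy_derep() {
    sort "$1" | uniq -c \
        | awk -v min="${2:-2}" '$1 >= min { print $1, $2 }' \
        | sort -k1,1nr -k2,2
}
```

With the default minimum of 2, singletons disappear from the output, which mirrors the effect of -minsize 2 in step 6.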

7 - OTU clustering using UPARSE method <<<USING USEARCH 7>>>

$u -cluster_otus $PWD/sorted.fa -otus otus1.fa

8 - Chimera filtering using reference database <<<USING USEARCH 7>>> (Download SILVA dataset_111 HERE)

$u -uchime_ref $PWD/otus1.fa -db $PWD/rep_set/97_Silva_111_rep_set.fasta -strand plus -nonchimeras otus2.fa

9 - Fasta Formatter <<<FASTX TOOLKIT SCRIPT>>>

fasta_formatter -i otus2.fa -o formated_otus2.fa
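With its default settings, fasta_formatter writes each sequence on a single line instead of wrapping it over several lines. If you want to see what that transformation amounts to, here is a minimal awk sketch of the same idea; unwrap_fasta is a hypothetical name and is not part of the FASTX Toolkit.

```shell
# Join wrapped sequence lines so that every FASTA record is exactly two lines:
# the header line, then the whole sequence.
unwrap_fasta() {
    awk '
        /^>/ { if (seq != "") print seq; print; seq = ""; next }
        { seq = seq $0 }
        END { if (seq != "") print seq }' "$1"
}
```

Running `unwrap_fasta otus2.fa | head` is a quick way to inspect the first records in single-line form.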

10 - Renamer <<<BMP SCRIPT>>>

perl bmp-otuName.pl -i formated_otus2.fa -o otus.fa

11 - Map reads back to OTU database <<<USING USEARCH 7>>>

$u -usearch_global reads_uparse.fa -db otus.fa -strand plus -id 0.97 -uc map.uc
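The map.uc file is tab-separated, and in the UC format the first column is the record type: H for a read that hit an OTU and N for a read with no hit at the chosen identity. A quick way to see how many reads were mapped, as a sketch (uc_summary is a hypothetical helper name):

```shell
# Count hit (H) and no-hit (N) records in a UC file from -usearch_global.
uc_summary() {
    awk -F '\t' '
        $1 == "H" { hits++ }
        $1 == "N" { misses++ }
        END { printf "mapped=%d unmapped=%d\n", hits, misses }' "$1"
}
```

Calling `uc_summary map.uc` prints one line with both counts. A large unmapped share means many reads are not represented in the OTU table built in the later steps.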

12 - Assign taxonomy to OTUs using the uclust method on QIIME (use the file “otus.fa” from UPARSE as input file)

assign_taxonomy.py -i $PWD/otus.fa -o output -r $PWD/eukaryotes_only/rep_set_euks/97_Silva_111_rep_set_euk.fasta -t $PWD/taxonomy_euks/97_Silva_111_taxa_map_euks.txt

13 - Align sequences on QIIME, using the SILVA reference alignment as template (use the file “otus.fa” from UPARSE as input file)

align_seqs.py -i $PWD/otus.fa -o rep_set_align -t $PWD/rep_set_aligned_euks/97_Silva_111_rep_set_euk_aligned.fasta

14 - Filter alignments on QIIME

filter_alignment.py -i $PWD/rep_set_align/otus_aligned.fasta -o filtered_alignment

15 - Make the reference tree on QIIME

make_phylogeny.py -i $PWD/filtered_alignment/otus_aligned_pfiltered.fasta -o rep_set.tre

16 - Convert the UC file to otu_table.txt <<<BMP SCRIPT>>>

python map2qiime.py map.uc > otu_table.txt

17 - Convert otu_table.txt to otu_table.biom, adding the taxonomy assignments from step 12 <<<QIIME SCRIPT>>>

make_otu_table.py -i otu_table.txt -t output/otus_tax_assignments.txt -o otu_table.biom

18 - Check OTU Table on QIIME.

biom summarize-table -i $PWD/otu_table.biom -o results_biom_table

19 - Run diversity analyses on QIIME (or any other analysis of your choice). The parameter “-e” is the sequencing depth to use for even sub-sampling and maximum rarefaction depth. Review the output of the ‘biom summarize-table’ command (step 18) to decide on this value.

core_diversity_analyses.py -i $PWD/otu_table.biom -m $PWD/mapping_file.txt -t $PWD/rep_set.tre -e xxxx -o $PWD/core_output
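When choosing the value for -e, a useful reference point is the smallest per-sample count, since any sample below the chosen depth is dropped from the even sub-sampling. The sketch below pulls that number out of the summary file. It assumes the summary lists one "sample: count" line per sample after a "Counts/sample detail:" heading; check your own file, because the layout may differ between biom versions. min_depth is a hypothetical helper name.

```shell
# Print the smallest per-sample count found in a biom summary file.
# ASSUMED layout: a "Counts/sample detail:" line followed by " sample: count" lines.
min_depth() {
    awk -F ': ' '
        /^Counts\/sample detail:/ { in_detail = 1; next }
        in_detail && NF == 2 {
            count = $2 + 0
            if (!seen || count < min) { min = count; seen = 1 }
        }
        END { if (seen) printf "%d\n", min }' "$1"
}
```

Calling `min_depth results_biom_table` gives the depth that keeps every sample. You may prefer a higher value and accept losing a few shallow samples, so treat the result as a starting point rather than the answer.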

This workflow is under improvement.
