Co-occurrence Network Analysis
The BMP advisory board recommends this pipeline as a standard for Co-occurrence Network Analysis.
We are working to improve this workflow and make it easier for end users.
If you have any questions or suggestions, please contact Victor Pylro: victor.pylro@brmicrobiome.org, Leandro Lemos: lemosbioinfo@gmail.com or Luiz Roesch: luiz.roesch@brmicrobiome.org
This tutorial was kindly provided by Francisco Dini-Andreote (Thanks Chico!)
Please, cite our efforts when using this pipeline: Brazilian Microbiome Project: revealing the unexplored microbial diversity--challenges and prospects. Microb Ecol. 2014, 67(2):237-241. doi: 10.1007/s00248-013-0302-4.
Also, remember to cite all other software used in this pipeline.
Here, we provide the recommended pipeline for Co-occurrence Network Analysis for BMP users.
This example assumes an input file (OTU table) in BIOM format.
This page gives some example command lines for constructing a Co-occurrence Network. Of course, you should edit as needed for your file locations (represented here as $PWD/).
1 - Convert your BIOM OTU table to a TXT OTU table <<<USING BIOM>>>
biom convert -i otu_table.biom -o otu_table.txt -b --header-key taxonomy
2 - Format the OTU table (TXT format) using TextWrangler (on Mac) or XXXXXX (on Linux)
Make sure that your OTU table is a tab-delimited TXT file, where columns are samples and rows are OTUs.
Open otu_table.txt and replace all \r with \n
Save and Close
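If you prefer to do the line-ending replacement from the command line rather than in a text editor, a minimal Python sketch (the function name is our own, not part of the pipeline) could look like this:

```python
def to_unix_line_endings(path):
    # Read raw bytes so both Windows (\r\n) and classic-Mac (\r)
    # line endings can be normalized to Unix (\n).
    with open(path, "rb") as f:
        data = f.read()
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    with open(path, "wb") as f:
        f.write(data)

# Usage: to_unix_line_endings("otu_table.txt")
```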
3 - SparCC Installation
Download SparCC (click HERE) and place the directory somewhere on your computer.
You also need a working installation of Python (2.6 or 2.7) and the Numpy library.
4 - Calculating correlations
To calculate correlations using the default settings, open a terminal and run:
python $PWD/SparCC.py $PWD/otu_table.txt --cor_file=$PWD/cor_sparcc.out
In the above, remember to replace [$PWD/] with the location of the SparCC directory. [cor_sparcc.out] is the output correlation matrix.
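To sanity-check the result, you can load the correlation matrix into Python. The layout below (tab-delimited, with OTU ids in the first row and column) is an assumption about how cor_sparcc.out is structured; adjust if your file differs:

```python
import io
import pandas as pd

# A tiny inline sample standing in for cor_sparcc.out (assumed layout:
# tab-delimited square matrix with OTU ids as row and column labels).
sample = "OTU_id\tOTU_1\tOTU_2\nOTU_1\t1.0\t0.35\nOTU_2\t0.35\t1.0\n"
cor = pd.read_csv(io.StringIO(sample), sep="\t", index_col=0)
print(cor.loc["OTU_1", "OTU_2"])  # 0.35

# For the real file:
# cor = pd.read_csv("cor_sparcc.out", sep="\t", index_col=0)
```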
5 - P-values
SparCC uses a permutation-based approach to generate (pseudo) p-values.
So, if you want to calculate p-values for your correlations, follow these steps:
5.1 - Generate permuted datasets:
python [sparcc_path]/MakeBootstraps.py [data_path]/OTU_table.txt -n 100 -o [permutation_path]/perm
5.2 - Run SparCC on the permuted datasets:
python $PWD/SparCC.py $PWD/perm_0.txt --cor_file=$PWD/perm_cor_0.txt
python $PWD/SparCC.py $PWD/perm_1.txt --cor_file=$PWD/perm_cor_1.txt
...
python $PWD/SparCC.py $PWD/perm_99.txt --cor_file=$PWD/perm_cor_99.txt
* Note that this requires running SparCC many times (at least 100 permutations), which may be very time-consuming for a large dataset. It is therefore best done on a multi-node cluster, where jobs can run in parallel.
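Rather than typing out all 100 commands, you can generate them programmatically. This is only a sketch (the $PWD/ placeholders are the same as above and must be replaced with your real paths); the generated lines can be written to a shell script or submitted as cluster jobs:

```python
# Build the full list of SparCC commands for the 100 permuted tables.
n_permutations = 100
commands = [
    f"python $PWD/SparCC.py $PWD/perm_{i}.txt --cor_file=$PWD/perm_cor_{i}.txt"
    for i in range(n_permutations)
]
print(len(commands))  # 100
print(commands[0])
```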
5.3 - Compare the correlations obtained for the real data to the ones obtained from the shuffled data to get p-values.
python $PWD/PseudoPvals.py $PWD/cor_sparcc.out $PWD/perm_cor 100 -o $PWD/pvals_two_sided.txt
Make sure to use exactly the same parameters you used when running SparCC on the real data, and name all the output files consistently: numbered sequentially and with a '.txt' extension.
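The idea behind these pseudo p-values can be sketched as follows: for each OTU pair, the p-value is the fraction of permuted correlations at least as extreme as the observed one. This is our illustration of the permutation logic, not the actual PseudoPvals.py code:

```python
import numpy as np

def pseudo_pvals(real_cor, perm_cors):
    """Two-sided pseudo p-values: fraction of permuted correlations
    whose magnitude is at least that of the observed correlation."""
    perm = np.stack(perm_cors)                  # shape (n_perm, n, n)
    extreme = np.abs(perm) >= np.abs(real_cor)  # as extreme as the real value?
    return extreme.mean(axis=0)                 # fraction = pseudo p-value

# Toy example: one observed 2x2 matrix and four permuted ones.
real = np.array([[1.0, 0.8], [0.8, 1.0]])
perms = [np.array([[1.0, c], [c, 1.0]]) for c in (0.1, -0.9, 0.3, 0.85)]
p = pseudo_pvals(real, perms)
print(p[0, 1])  # 0.5 -> two of the four permuted |correlations| >= 0.8
```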
5.4 - Filtering significant and strong SparCC correlations (to download the script, click HERE)
5.4.1 - Edit the script (using any text editor) to specify the correct file paths and desired thresholds.
If you place the script in the same folder as your correlation and p-value files, you can use just the file names (as the script does at the moment).
Otherwise you will need to specify the full path to the correlation and p-value files.
5.4.2 - In a terminal window, navigate to the folder where the script is located and execute the command:
python $PWD/get_significant_pairs.py
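The filtering logic amounts to keeping each OTU pair whose correlation is strong and whose pseudo p-value is small. The sketch below uses made-up thresholds and tiny inline matrices standing in for cor_sparcc.out and pvals_two_sided.txt; the actual get_significant_pairs.py internals may differ:

```python
import numpy as np

COR_THRESHOLD = 0.6   # "strong": |correlation| >= 0.6 (example threshold)
P_THRESHOLD = 0.01    # "significant": p-value < 0.01 (example threshold)

# Made-up 3x3 correlation and p-value matrices for three OTUs.
cor = np.array([[ 1.0,  0.7, -0.2],
                [ 0.7,  1.0, -0.8],
                [-0.2, -0.8,  1.0]])
pvals = np.array([[0.0,   0.005, 0.40 ],
                  [0.005, 0.0,   0.002],
                  [0.40,  0.002, 0.0  ]])

pairs = []
for i in range(cor.shape[0]):
    for j in range(i + 1, cor.shape[1]):  # upper triangle: each pair once
        if abs(cor[i, j]) >= COR_THRESHOLD and pvals[i, j] < P_THRESHOLD:
            pairs.append((i, j, float(cor[i, j])))
print(pairs)  # [(0, 1, 0.7), (1, 2, -0.8)]
```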
From this point, you can use any network visualizer, such as Gephi or Cytoscape.
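Both Gephi and Cytoscape can import a simple Source/Target/Weight edge list. A minimal sketch of exporting the filtered pairs to such a file (the OTU names here are placeholders):

```python
import csv

# Placeholder edges: (source OTU, target OTU, SparCC correlation).
edges = [("OTU_1", "OTU_2", 0.7), ("OTU_2", "OTU_3", -0.8)]

# Write a CSV edge list that Gephi or Cytoscape can import.
with open("network_edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
```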
This workflow is under improvement.