Biomedical Statistics and Informatics Software Packages


Hybrid-Denovo

Hybrid-denovo is a de novo OTU-picking pipeline that integrates single- and paired-end 16S sequence tags. It takes Illumina paired-end sequencing reads as input and outputs an OTU table in BIOM format, together with representative sequences of the OTUs and a phylogenetic tree of the OTUs.

The most distinctive feature of hybrid-denovo is that it can process a mixture of paired-end and single-end reads. This matters because quality control turns Illumina paired-end data into exactly such a mixture: whenever only one read of a pair passes QC, the surviving read can only be used as a single-end read. For more details, please read our online article.
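To make that mixture concrete, here is an illustration of how pairs split after QC. It is not hybrid-denovo's own QC code: it assumes you already have plain-text lists of the read IDs whose R1 and whose R2 passed QC (one ID per line, mate suffixes stripped), and the function name and output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only, not hybrid-denovo's QC code. classify_pairs and the
# output file names are hypothetical.
# Inputs: a file of R1 read IDs that passed QC, a file of R2 read IDs that
# passed QC (one ID per line, mate suffixes stripped), and an output dir.
classify_pairs() {
  local r1_ids=$1 r2_ids=$2 outdir=$3
  mkdir -p "$outdir"
  # comm needs sorted input; the C locale keeps sort and comm consistent
  LC_ALL=C sort -u "$r1_ids" > "$outdir/r1.sorted"
  LC_ALL=C sort -u "$r2_ids" > "$outdir/r2.sorted"
  # Both mates passed QC: the read can still be treated as paired-end
  LC_ALL=C comm -12 "$outdir/r1.sorted" "$outdir/r2.sorted" > "$outdir/paired.ids"
  # Only R1 passed: the survivor is now a single-end read
  LC_ALL=C comm -23 "$outdir/r1.sorted" "$outdir/r2.sorted" > "$outdir/r1_single.ids"
  # Only R2 passed
  LC_ALL=C comm -13 "$outdir/r1.sorted" "$outdir/r2.sorted" > "$outdir/r2_single.ids"
  rm -f "$outdir/r1.sorted" "$outdir/r2.sorted"
}
```

For example, if R1 passed for reads a, b, c and R2 passed for b, c, d, then paired.ids holds b and c, r1_single.ids holds a, and r2_single.ids holds d.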

System requirements:
Linux platform (we used CentOS 6)

Installation:

From source code:

  1. Download hybrid-denovo.tar.gz and unpack it.
  2. Download the Linux version of USEARCH version 8 from http://www.drive5.com/usearch/download.html
  3. Open config/tool.info and set the paths to USEARCH, Java, Python (version 2.7), and QIIME (or QIIME 2).
  4. This package also bundles several Python libraries: biom-format (version 1.3.1), bitarray (version 0.8.1), pyqi (version 0.2.0), numpy (version 1.8.1), and biopython (version 1.66). They may work in your environment as-is; if you get an error message about a missing library, install it yourself and set its path in tool.info.
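Before filling in tool.info, it can help to confirm that each executable path you plan to enter exists and that the Python you point to is 2.7. The two functions below are hypothetical helpers, not part of the package; they check executables only, and the paths in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of hybrid-denovo. Step 1 itself is just:
#   tar -xzf hybrid-denovo.tar.gz

# check_tools: report every argument that is not an executable regular file
# and return non-zero if any were found.
check_tools() {
  local missing=0 path
  for path in "$@"; do
    if [ ! -f "$path" ] || [ ! -x "$path" ]; then
      echo "not an executable file: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# python_is_27: succeed only if the given interpreter reports version 2.7.
python_is_27() {
  "$1" -c 'import sys; sys.exit(0 if sys.version_info[:2] == (2, 7) else 1)'
}
```

For example: `check_tools /opt/usearch/usearch8 /usr/bin/java /usr/bin/python2.7 && python_is_27 /usr/bin/python2.7` (substitute your own paths).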

From a VirtualBox virtual machine (VM):

  1. Alternatively, download our VirtualBox image, hybriddenovo.ova, which packages all dependencies.
  2. Use a Windows host (we tested on Windows 7).
  3. Install Oracle VirtualBox.
  4. Open the OVA image you downloaded in step 1.
  5. The VM runs Ubuntu, and the sudo password is ‘mayo’ (in case you want to install additional packages).

Files and directories in the package (hybrid-denovo.tar.gz):

  1. hybrid-denovo: the main script file
  2. config: directory that stores configuration files
    1. run.info         : the input parameters of the pipeline (see config/run.info for details)
    2. tool.info        : the paths to the external modules and packages used by the pipeline; its location is set in run.info
  3. external  : external modules and packages
  4. README    : this README file
  5. sampleV3V5: a test sample for V3V5 rDNA amplicon reads
  6. scripts   : shell scripts and jar files developed by us
  7. test      : our test run results

Usage:
/path/to/hybrid-denovo /path/to/run.info

Key parameters to set in run.info (edit config/run.info):

  • R1PAIRED_READ_TYPE: read type (0: single end; 1: paired end with overlap, such as V4 region amplicon; 2: paired end without overlap, such as V3-V5 region amplicon)
  • R1PAIRED_READ_LENGTH: input read length
  • R1PAIRED_INPUT_FILES: a directory containing all input FASTQ files (every *.fastq file in it is used as input)
  • R1PAIRED_WORK_DIR: your working/output directory
  • R1PAIRED_TOOL_INFO: absolute path to tool.info (by default, the pipeline uses /your_source_dir/config/tool.info). Remember to open tool.info and set the correct tool paths.
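Before a long run, it may be worth sanity-checking these values against the constraints listed above. The function below is a hypothetical pre-flight check, not part of the package. It takes the values you intend to put in run.info as arguments rather than parsing run.info, whose exact syntax is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of hybrid-denovo. Pass the values
# you plan to put in run.info; it does not read run.info itself.
# Usage: check_run_settings READ_TYPE READ_LENGTH INPUT_DIR TOOL_INFO
check_run_settings() {
  local read_type=$1 read_len=$2 input_dir=$3 tool_info=$4 status=0 f found=0
  # R1PAIRED_READ_TYPE: 0 single end, 1 paired with overlap, 2 paired without
  case "$read_type" in
    0|1|2) ;;
    *) echo "R1PAIRED_READ_TYPE must be 0, 1 or 2: '$read_type'" >&2; status=1 ;;
  esac
  # R1PAIRED_READ_LENGTH: a positive integer
  case "$read_len" in
    ''|*[!0-9]*) echo "R1PAIRED_READ_LENGTH is not an integer: '$read_len'" >&2; status=1 ;;
    *) if [ "$read_len" -le 0 ]; then echo "R1PAIRED_READ_LENGTH must be > 0" >&2; status=1; fi ;;
  esac
  # R1PAIRED_INPUT_FILES: a directory holding at least one *.fastq file
  for f in "$input_dir"/*.fastq; do
    if [ -f "$f" ]; then found=1; break; fi
  done
  if [ "$found" -eq 0 ]; then echo "no *.fastq files in '$input_dir'" >&2; status=1; fi
  # R1PAIRED_TOOL_INFO: an absolute path to an existing file
  case "$tool_info" in
    /*) if [ ! -f "$tool_info" ]; then echo "tool.info not found: '$tool_info'" >&2; status=1; fi ;;
    *) echo "R1PAIRED_TOOL_INFO must be an absolute path: '$tool_info'" >&2; status=1 ;;
  esac
  return "$status"
}
```

For a V3-V5 run, a call might look like `check_run_settings 2 300 /data/fastq /opt/hybrid-denovo/config/tool.info`, where the read length and both paths are placeholders for your own values.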

Output Files:

  • mapping.txt: a mapping file that associates each sample ID with its FASTQ file;
    you can add other metadata to it for further analysis (e.g., in QIIME).
  • workspace/imtornado/QC.log.txt: QC results showing the number of input reads and
    the number of QC passed reads
  • workspace/imtornado/: results generated by IM-TORNADO using R1 reads only
    • test_R1.biom (BIOM file)
    • test_R1.biom.table (converted by QIIME from BIOM file)
    • test_R1.tree (a phylogenetic tree generated by FastTree)
    • test_R1.otus.final.result.fasta (OTU representatives)
  • workspace/imtornado/: results generated by IM-TORNADO using paired-end reads
    • test_paired.biom (BIOM file)
    • test_paired.biom.table (converted by QIIME from BIOM file)
    • test_paired.tree (a phylogenetic tree generated by FastTree)
    • test_paired.otus.final.result.fasta (OTU representatives)
  • workspace/R1Paired/: results generated by our hybrid-denovo method
    • test_PairedSingle.biom (BIOM file)
    • test_PairedSingle.biom.table (converted by QIIME from BIOM file)
    • test_PairedSingle.tree (a phylogenetic tree generated by FastTree)
    • test_PairedSingle.otus.final.result.fasta (OTU representatives)
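After a run finishes, a quick way to confirm that all of the files listed above were produced is to check for them in one pass. This is a hypothetical helper, not part of the package. It assumes the listed paths are relative to R1PAIRED_WORK_DIR and that the file prefix is "test" as in the listing; pass another prefix if your files are named differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of hybrid-denovo. Assumes the paths listed
# above are relative to the work directory and that "test" is the file prefix
# (as in the listing); pass a different prefix if your files are named otherwise.
# Usage: check_outputs WORK_DIR [PREFIX]
check_outputs() {
  local work_dir=$1 prefix=${2:-test} missing=0 stem ext f
  set -- mapping.txt workspace/imtornado/QC.log.txt
  for stem in "imtornado/${prefix}_R1" "imtornado/${prefix}_paired" \
              "R1Paired/${prefix}_PairedSingle"; do
    for ext in biom biom.table tree otus.final.result.fasta; do
      set -- "$@" "workspace/$stem.$ext"
    done
  done
  # Report every expected file that is absent or empty
  for f in "$@"; do
    if [ ! -s "$work_dir/$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```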

Test run:

  1. Go to the unpacked directory.
  2. mkdir mytest
  3. cd mytest
  4. Run command ‘../hybrid-denovo ../config/run.info’.
  5. Compare your results with ours (in /your_source_dir/test) to confirm that the installation is correct.
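One way to do the comparison in step 5 is to compare the tab-separated .biom.table files while ignoring lines that start with # and the order of the remaining rows. The helper below is hypothetical, not part of the package. It assumes both runs assign the same OTU IDs, and because the # header row is skipped, it compares per-OTU rows but not the sample column labels.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of hybrid-denovo: compare two tab-separated
# .biom.table files, ignoring lines that start with '#' (comment and header
# lines) and the order of the remaining rows. Returns 0 if they match.
compare_tables() {
  local a b status
  a=$(mktemp) || return 2
  b=$(mktemp) || { rm -f "$a"; return 2; }
  grep -v '^#' "$1" | LC_ALL=C sort > "$a"
  grep -v '^#' "$2" | LC_ALL=C sort > "$b"
  if cmp -s "$a" "$b"; then status=0; else status=1; fi
  rm -f "$a" "$b"
  return "$status"
}
```

Point the first argument at the matching table under /your_source_dir/test and the second at the one in your mytest run.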

Notes:

  1. Installing Python libraries individually may cause many dependency issues. We suggest installing QIIME first, as many of the libraries used by the pipeline are installed automatically with QIIME.
  2. Keep biom-format (version 1.3.1) in tool.info, because this version is required. If you have installed QIIME 2, biom-format version 2 may be installed automatically and placed on your default path, which can cause a path conflict.
  3. If you install on Ubuntu, all Python libraries included in this package must be reinstalled, and the compiled C executables must be rebuilt from source.
  4. For large datasets, we have a parallel-computing version; please contact us to request it.
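To diagnose the path conflict described in note 2, it can help to see which biom module a given Python interpreter actually imports, and from where. The function below is a hypothetical helper, not part of the package; it prints "unknown" for the version if the imported module does not define `__version__`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of hybrid-denovo: print the version and file
# location of the biom module that a given Python interpreter imports.
# Prints "unknown" for the version if the module defines no __version__.
biom_version() {
  "${1:-python}" -c 'import sys, biom; sys.stdout.write("%s %s\n" % (getattr(biom, "__version__", "unknown"), biom.__file__))'
}
```

If `biom_version /usr/bin/python2.7` (placeholder path) reports a 2.x version or a location other than the biom-format 1.3.1 copy set in tool.info, the default path is shadowing the required version.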

Questions:
Please contact chen.xianfeng@mayo.edu or chen.jun2@mayo.edu

Page last modified: October 12, 2016