Table of Contents
- Overview
- 1. Installation
- 1.1. Download and unzip the complete set of installation and CLI scripts
- 1.2. Choosing the correct installation script
- 1.3. Setting up the pipeline on cloud VMs – Using the biomodal-cloud-utils script
- 1.4. Setting up the pipeline on HPC clusters
- 1.5. Import the duet multiomics solution pipeline using the biomodal CLI and perform a test run
- 1.6. Testing the duet pipeline on a larger dataset
- 1.7. nf-core community pipelines
- 2. Using the CLI
- 3. Advanced Features
- 3.1. CHG and CHH modification calling
- 3.2. Mapping Quality Filtering
- 3.3. Epigenetic Quantification Options
- 3.3.1. Epigenetic Quantification Output Formats
- 3.3.2. Limit epigenetic quantification to primary chromosomes
- 3.3.3. Include SNVs in CG coverage
- 3.3.4. Output uncovered CG contexts
- 3.4. Variant Associated Methylation
- 3.4.1. Allele Specific Methylation
- 3.4.2. Joint Variant Calling
- 3.4.3. Somatic Variant Calling
- 3.5. Reduce publish footprint
- 3.5.1. Disable Resolved FASTQ publishing
- 3.5.2. CRAMs
- 3.6. Subsampling
- 3.7. Alternative Supported Reference Genomes
- 3.7.1. Supported Mouse References
- 3.8. Elevated resource profiles
- 3.8.1. Limit local resource requirements
- 3.8.2. Duet Pipeline Module Requirements
- 3.9. Resuming a disrupted pipeline
- 3.10. Low GCP CPU quotas
- 3.11. Stopping and starting your Orchestration VM
- 3.12. Processing individual samples
- 3.13. Enable generation of the FASTQC report
- 4. Target Enrichment
- 5. Updating the duet pipeline
- 6. Terraform
- 7. HPC Recommendations
- 7.1. Limited local disk space available for temporary files
- 7.2. Per-task or CPU memory reservation for LSF
- 7.3. LSF executor scratch space
- 7.4. Wall-time and minimum CPU settings
- 7.5. Setting specific Queue, CPU, RAM and DISK per pipeline or workflow module
- 7.6. Setting specific Memory settings for SGE/OGE/UGE HPC clusters
- 8. Release notes
- 9. Automation
- 10. Reference genome pipeline
- 10.1. Checking available reference pipeline versions
- 10.2. Reference pipeline download
- 10.3. Make reference genomes
- 10.3.1. Running the reference genome pipeline
- 10.3.2. Reference genome input and output files requirements
- 10.3.3. Reference genome config file
- 10.4. Module hardware requirements
- 10.5. Using new reference genomes in the duet pipeline
- 11. D(h)MR calling workflow
- Support
Overview
The biomodal duet multiomics solution bioinformatics pipeline is intended for processing FASTQ files generated with the duet multiomics solution library preparation kits. The biomodal command line interface (CLI) is the recommended tool for installing, testing, and running the biomodal duet pipeline on your data.
The duet multiomics solution bioinformatics pipeline includes:
- Standard and bespoke trimming of duet +modC and duet evoC FASTQ files
- FASTQ resolution to convert raw paired-end duet +modC and duet evoC reads into resolved, error-suppressed 4-base genomic reads with accompanying epigenomic information
- Reference genome and control alignment using BWA-MEM
- Lane-merging of aligned BAM files
- Forking of aligned genome reads and control reads into separate BAM files and analysis pathways
- Deduplication of the genome and long control BAM files
- Modified cytosine quantification in CpG context
- Optional modified cytosine quantification in CHG and CHH contexts
- Germline variant calling with optional joint variant calling and allele-specific methylation calling
- Genetic accuracy analysis using controls
- Analysis and generation of quality metrics associated with the genome-aligned reads and the control-aligned reads
- Generation of summary reports in Excel and HTML formats
The biomodal pipeline and CLI use Nextflow as the orchestration tool, leveraging the capabilities of your compute platform to process your data through the duet pipeline. Nextflow performs the following orchestration tasks:
- Launching virtual machines or HPC nodes
- Copying input files to the virtual machines or HPC nodes
- Downloading and running Docker containers with appropriate software dependencies
- Executing analyses on the virtual machines or HPC nodes
- Transferring outputs from analyses to local or cloud storage
- Organising output files into a convenient directory structure
- Coordinating the channelling of outputs from one stage of analysis to the next
The following instructions are intended to be executed at the command line with Bash on a Linux platform, or via a Cloud Shell in a browser. If you prefer to use the GUI (cloud console) in the browser, searching for the command-line instructions will lead you to the platform-specific documentation describing the equivalent steps in the relevant cloud console.
The biomodal CLI and duet pipeline are not supported on Windows platforms. We recommend Ubuntu 22.04 or CentOS/RHEL 7. Other Linux distributions may work, but we have not specifically tested them. We plan to test other Linux distributions in the near future.
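If you want to confirm which distribution a host is running before installing, the Bash snippet below is a minimal sketch of such a pre-flight check. It is not part of the biomodal CLI: the function names classify_platform and check_host_platform are hypothetical, and the check only compares the ID and VERSION_ID fields of the standard /etc/os-release file against the recommended platforms named above (Ubuntu 22.04 and CentOS/RHEL 7).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (NOT part of the biomodal CLI).

# Classify a distribution ID and VERSION_ID against the recommended
# platforms: Ubuntu 22.04 and CentOS/RHEL 7. Prints "recommended" or "untested".
classify_platform() {
  local id="$1" version="$2"
  case "$id" in
    ubuntu)
      [[ "$version" == "22.04" ]] && { echo "recommended"; return 0; }
      ;;
    centos|rhel)
      # Compare only the major version, so "7" and "7.9" both match.
      [[ "${version%%.*}" == "7" ]] && { echo "recommended"; return 0; }
      ;;
  esac
  echo "untested"
}

# Inspect the current host: reject non-Linux systems, then read
# /etc/os-release and print "<id> <version>: <classification>".
check_host_platform() {
  if [[ "$(uname -s)" != "Linux" ]]; then
    echo "unsupported: $(uname -s) is not Linux"
    return 1
  fi
  if [[ ! -r /etc/os-release ]]; then
    echo "untested: /etc/os-release not found"
    return 0
  fi
  # Source in a subshell so ID and VERSION_ID do not leak into the caller.
  (
    . /etc/os-release
    echo "${ID:-unknown} ${VERSION_ID:-unknown}: $(classify_platform "${ID:-}" "${VERSION_ID:-}")"
  )
}
```

After sourcing the snippet, run check_host_platform. An "untested" result does not mean the pipeline will fail; it only means the distribution is not one of the platforms listed as recommended.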
Support
Feel free to contact us at support@biomodal.com.
If your inquiry relates to the CLI or duet pipeline software, please include the output of the biomodal info command.
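To make it easier to attach that output to an email, the Bash function below is a minimal sketch that saves the output of biomodal info, together with a timestamp and basic host details, to a text file. The function name collect_support_info and the default file name biomodal_support_info.txt are hypothetical; the only biomodal call used is biomodal info, and the sketch makes no assumptions about what that command prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of the biomodal CLI): capture the output of
# `biomodal info` plus basic host details into a file for a support request.
collect_support_info() {
  local outfile="${1:-biomodal_support_info.txt}"

  # Fail early if the biomodal CLI is not available on this machine.
  if ! command -v biomodal >/dev/null 2>&1; then
    echo "biomodal CLI not found on PATH" >&2
    return 1
  fi

  {
    echo "# Collected: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "# Host: $(uname -srm)"
    echo "# biomodal info output:"
    # Include stderr so any error messages are captured as well.
    biomodal info 2>&1
  } > "$outfile"

  # Print the path of the file to attach.
  echo "$outfile"
}
```

Running collect_support_info with no arguments writes biomodal_support_info.txt in the current directory and prints its path.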