demo-data-instructions

duet multiomics solution demo datasets

This section describes reference sequencing data generated with the duet multiomics solution +modC or duet evoC reagents and bioinformatics pipeline. The data is freely available and can be used to explore the capabilities of +modC and duet evoC, assess benchmarks, reproduce performance, run demos and more. Please follow the instructions below for the desired dataset.

+modC GIAB Dataset – Ashkenazi Trio

Experimental design

For this dataset, 7 GIAB gDNA samples, including the Ashkenazi Trio, were prepared using the +modC kit and sequenced in duplicate at 2×151 on a NovaSeq 6000 (see Table 1 below). Each sample used 80 ng of input DNA sonicated to 250 bp fragments. Sequencing data was processed using the biomodal pipeline and downsampled to ~30X coverage for downstream analysis.

The following results are representative of routine processing that can be performed and achieved with the +modC kit.

| Sample name | Sample origin | Reads | Mean coverage |
|---|---|---|---|
| CEG1532-EL01-A1200-002 | NA12878 (HG001) | 1000M | 31.3X |
| CEG1532-EL01-A1200-005 | NA24385 (HG002) | 1000M | 31.2X |
| CEG1532-EL01-A1200-008 | NA24149 (HG003) | 1000M | 31.3X |
| CEG1532-EL01-A1200-011 | NA24143 (HG004) | 1000M | 31.3X |
| CEG1532-EL01-A1200-015 | NA24631 (HG005) | 1000M | 31.4X |
| CEG1532-EL01-A1200-017 | NA24694 (HG006) | 1000M | 31.4X |
| CEG1532-EL01-A1200-020 | NA24695 (HG007) | 1000M | 31.0X |

Analysis results

The dataset includes the following files and reports, organised in subfolders:

| Folder | Description |
|---|---|
| reports/summary_reports | Excel sheet containing summary metrics for all samples analysed. The metrics cover reads aligning to the reference genome and reads aligning to the spike-in controls. The sheet includes common NGS workflow metrics and +modC-specific metrics. |
| reports/summary_reports | MultiQC report including plots of the most important metrics for all samples, including coverage, GC content and methylation bias. |
| sample_outputs/allele_specific_methylation | Folder containing the Allele Specific Methylation (ASM) files, with one row per heterozygous allele. Methylation on reads associated with each allele is quantified, and a call of allele-specific methylation is provided where the asymmetry is significant by Fisher's exact test. |
| sample_outputs/bams | Folder containing the Binary Alignment Map (BAM) files representing all sequences aligned to the reference genome. Methylation status is represented using an MM SAM tag. |
| sample_outputs/modc_quantification | Folder containing quantification files reporting modified cytosine content for each sample and analysed genomic context: CG, CHG, CHH. |
| sample_outputs/variant_call_files | Folder containing the Variant Call Format (VCF) files, which record variants detected at specific positions in the reference genome. |

duet evoC Dataset – NA12878 and mouse ES cell line

Experimental design – duet evoC

For this dataset, gDNA from 1 GIAB sample from the dataset described above (80 ng input) and from 1 mouse ES-E14 cell line was sonicated to 250 bp fragments. Libraries prepared using the duet evoC kit were sequenced in duplicate at 2×151 on a NovaSeq 6000. Sequencing data was processed using the biomodal pipeline and downsampled to ~30X coverage (1 billion reads) for downstream analysis.

The following results are representative of routine processing that can be performed and achieved with the duet evoC kit.

| Sample name | Sample origin | Reads | Mean coverage |
|---|---|---|---|
| CEG1485-EL01-D1115-001 | ES-E14 (mouse) | 1000M | 32.8X |
| CEG1485-EL01-D1115-005 | NA12878 (HG001) | 1000M | 32.6X |

Analysis results – duet evoC

The dataset includes the following files and reports, organised in subfolders:

| Folder | Description |
|---|---|
| reports/summary_reports | Excel sheet containing summary metrics for all samples analysed. The metrics cover reads aligning to the reference genome and reads aligning to the spike-in controls. The sheet includes common NGS workflow metrics and duet evoC-specific metrics. |
| reports/summary_reports | MultiQC report including plots of the most important metrics for all samples, including coverage, GC content and methylation bias. |
| sample_outputs/allele_specific_methylation | Folder containing the Allele Specific Methylation (ASM) files, with one row per heterozygous allele. Methylation on reads associated with each allele is quantified, and a call of allele-specific methylation is provided where the asymmetry is significant by Fisher's exact test. |
| sample_outputs/bams | Folder containing the Binary Alignment Map (BAM) files representing all sequences aligned to the reference genome. Methylation status is represented using an MM SAM tag. |
| sample_outputs/modc_quantification | Folder containing quantification files reporting modified cytosine content for each sample and analysed genomic context: CG, CHG, CHH. |
| sample_outputs/variant_call_files | Folder containing the Variant Call Format (VCF) files, which record variants detected at specific positions in the reference genome. |
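For a quick first look at a VCF from this folder, a per-chromosome tally of variant records can be useful. The sketch below reads uncompressed VCF text from stdin and counts non-header lines by the CHROM column. The file name in the usage comment is hypothetical, and whether the published files are gzip-compressed is an assumption.

```shell
# Count variant records per chromosome in VCF text read from stdin.
# Header lines start with '#'; column 1 of each record is CHROM.
variants_per_chrom() {
  awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' | sort
}

# Hypothetical usage (file name and compression are assumptions):
#   gzip -dc sample.vcf.gz | variants_per_chrom
```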

Download instructions

The following instructions refer to the duet multiomics solution +modC and duet evoC demo datasets made available by biomodal for demonstration and performance assessment.

Setup: Install gcloud CLI and authenticate

  1. Create a Google account using your institutional email address by selecting the “Use your existing email address” option during account creation. If you already have a Google account, skip this step.

  2. Download and install the Google Cloud CLI (gcloud). Run the gcloud init command to make sure you are authenticated with your Google account.
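To confirm that the setup steps worked before starting a large download, a small check along the following lines may help. It is a sketch, not part of the biomodal instructions: the function name check_gcloud is made up for illustration, and it relies only on `command -v` and on `gcloud auth list` to report the active account.

```shell
# Succeeds only if gcloud is on PATH and an account is currently active.
check_gcloud() {
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found; install the Google Cloud CLI first" >&2
    return 1
  fi
  # Ask gcloud for the active account; empty output means not authenticated.
  active="$(gcloud auth list --filter=status:ACTIVE --format='value(account)' 2>/dev/null)"
  if [ -z "$active" ]; then
    echo "no active account; run: gcloud init" >&2
    return 1
  fi
  echo "authenticated as: $active"
}
```

After defining the function in a bash session, running `check_gcloud` prints the active account or a hint about which step is still missing.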

If you are using Windows and cannot use bash, you can download the folders individually with the gcloud storage commands listed below for the +modC and duet evoC datasets.

Overview of available datasets

Human datasets +modC and duet evoC

The datasets are available in subfolders as follows, with approximate sizes and download commands:

| Folder | Description | Approximate size | Download command |
|---|---|---|---|
| giab/+modC | +modC GIAB datasets, including FASTQ files | 3.7 TB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/ ./` |
| giab/+modC/input | +modC GIAB FASTQ files | 888 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/input ./` |
| giab/+modC/1.3.0 | +modC GIAB datasets using duet 1.3.0 | 1.43 TB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.3.0 ./` |
| giab/+modC/1.3.0/controls | +modC GIAB datasets using duet 1.3.0 | 30 MB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.3.0/controls ./` |
| giab/+modC/1.3.0/diagnostics | +modC GIAB datasets (with resolved FASTQ files) using duet 1.3.0 | 800 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.3.0/diagnostics ./` |
| giab/+modC/1.3.0/reports | +modC GIAB datasets using duet 1.3.0 | 9 MB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.3.0/reports ./` |
| giab/+modC/1.3.0/sample_outputs | +modC GIAB datasets using duet 1.3.0 | 630 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.3.0/sample_outputs ./` |
| giab/+modC/1.4.1 | +modC GIAB datasets using duet 1.4.1 | 1.38 TB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1 ./` |
| giab/+modC/1.4.1/controls | +modC GIAB datasets using duet 1.4.1 | 29 MB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1/controls ./` |
| giab/+modC/1.4.1/diagnostics | +modC GIAB datasets (with resolved FASTQ files) using duet 1.4.1 | 800 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1/diagnostics ./` |
| giab/+modC/1.4.1/reports | +modC GIAB datasets using duet 1.4.1 | 9 MB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1/reports ./` |
| giab/+modC/1.4.1/sample_outputs | +modC GIAB datasets using duet 1.4.1 | 568 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1/sample_outputs ./` |
| giab/+modC/1.4.1/variant_call_files | +modC GIAB datasets using duet 1.4.1 | 949 MB | `gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1/variant_call_files ./` |
| giab/evoC | duet evoC GIAB datasets | 635 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC ./` |
| giab/evoC/input | duet evoC GIAB FASTQ files | 127 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/input ./` |
| giab/evoC/1.3.0 | duet evoC GIAB datasets using duet 1.3.0 | 312 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.3.0 ./` |
| giab/evoC/1.3.0/controls | duet evoC GIAB datasets using duet 1.3.0 | 4 MB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.3.0/controls ./` |
| giab/evoC/1.3.0/diagnostics | duet evoC GIAB datasets (with resolved FASTQ files) using duet 1.3.0 | 240 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.3.0/diagnostics ./` |
| giab/evoC/1.3.0/reports | duet evoC GIAB datasets using duet 1.3.0 | 5 MB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.3.0/reports ./` |
| giab/evoC/1.3.0/sample_outputs | duet evoC GIAB datasets using duet 1.3.0 | 71 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.3.0/sample_outputs ./` |
| giab/evoC/1.4.1 | duet evoC GIAB datasets using duet 1.4.1 | 196 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1 ./` |
| giab/evoC/1.4.1/controls | duet evoC GIAB datasets using duet 1.4.1 | 4 MB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1/controls ./` |
| giab/evoC/1.4.1/diagnostics | duet evoC GIAB datasets (with resolved FASTQ files) using duet 1.4.1 | 114 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1/diagnostics ./` |
| giab/evoC/1.4.1/reports | duet evoC GIAB datasets using duet 1.4.1 | 4 MB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1/reports ./` |
| giab/evoC/1.4.1/sample_outputs | duet evoC GIAB datasets using duet 1.4.1 | 82 GB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1/sample_outputs ./` |
| giab/evoC/1.4.1/variant_call_files | duet evoC GIAB datasets using duet 1.4.1 | 201 MB | `gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1/variant_call_files ./` |
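When only a few subfolders of one pipeline version are needed, the individual commands above can be generated in a loop. The following is a minimal sketch under stated assumptions: the helper names dataset_uri and fetch_subfolders and the DRY_RUN variable are invented for this example, while the bucket paths and the `gcloud storage cp --recursive` command are the ones listed in the table. By default it only prints the commands so they can be reviewed first.

```shell
BUCKET="gs://biomodal-data"

# Build the bucket path for a dataset, pipeline version and subfolder.
dataset_uri() {
  printf '%s/%s/%s/%s\n' "$BUCKET" "$1" "$2" "$3"
}

# Print the download commands by default; set DRY_RUN=0 to execute them.
fetch_subfolders() {
  dataset="$1"; version="$2"; shift 2
  for sub in "$@"; do
    uri="$(dataset_uri "$dataset" "$version" "$sub")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "gcloud storage cp --recursive $uri ./"
    else
      gcloud storage cp --recursive "$uri" ./
    fi
  done
}

# Example: the two smallest giab/+modC 1.4.1 folders (dry run).
fetch_subfolders "giab/+modC" "1.4.1" reports variant_call_files
```

Because every command copies into the current directory, folders with the same name from different versions (for example 1.3.0/reports and 1.4.1/reports) would land in the same local folder; running each version from its own working directory avoids that.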

Mouse datasets duet evoC

The datasets are available in subfolders as follows, with approximate sizes and download commands:

| Folder | Description | Approximate size | Download command |
|---|---|---|---|
| mouse/evoC | duet evoC mouse datasets, including resolved FASTQ files | 486 GB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC ./` |
| mouse/evoC/input | duet evoC mouse FASTQ files | 125 GB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/input ./` |
| mouse/evoC/1.3.0 | duet evoC mouse datasets using duet 1.3.0 | 174 GB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.3.0 ./` |
| mouse/evoC/1.3.0/controls | duet evoC mouse datasets using duet 1.3.0 | 4 MB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.3.0/controls ./` |
| mouse/evoC/1.3.0/diagnostics | duet evoC mouse datasets (with resolved FASTQ files) using duet 1.3.0 | 114 GB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.3.0/diagnostics ./` |
| mouse/evoC/1.3.0/reports | duet evoC mouse datasets using duet 1.3.0 | 5 MB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.3.0/reports ./` |
| mouse/evoC/1.3.0/sample_outputs | duet evoC mouse datasets using duet 1.3.0 | 60 GB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.3.0/sample_outputs ./` |
| mouse/evoC/1.4.1 | duet evoC mouse datasets using duet 1.4.1 | 186 GB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1 ./` |
| mouse/evoC/1.4.1/controls | duet evoC mouse datasets using duet 1.4.1 | 4 MB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1/controls ./` |
| mouse/evoC/1.4.1/diagnostics | duet evoC mouse datasets (with resolved FASTQ files) using duet 1.4.1 | 115 GB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1/diagnostics ./` |
| mouse/evoC/1.4.1/reports | duet evoC mouse datasets using duet 1.4.1 | 5 MB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1/reports ./` |
| mouse/evoC/1.4.1/sample_outputs | duet evoC mouse datasets using duet 1.4.1 | 71 GB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1/sample_outputs ./` |
| mouse/evoC/1.4.1/variant_call_files | duet evoC mouse datasets using duet 1.4.1 | 225 MB | `gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1/variant_call_files ./` |
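Given the sizes listed above, it can be worth checking free disk space before starting a download. The sketch below compares the space available in a target directory with a required size in gigabytes, using the POSIX output format of df. The function name has_space is hypothetical, and the comparison treats 1 GB as 1024 × 1024 KB, which is close enough for the approximate sizes in the tables.

```shell
# Succeeds if DIR (default: current directory) has at least REQUIRED_GB free.
has_space() {
  required_gb="$1"; dir="${2:-.}"
  # df -Pk prints one header line, then the filesystem line; column 4 is
  # the available space in 1024-byte blocks.
  avail_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
  [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Example (not executed here): only download mouse/evoC if ~486 GB is free.
#   has_space 486 . && gcloud storage cp --recursive gs://biomodal-data/mouse/evoC ./
```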

For additional information and/or assistance, please contact support@biomodal.com.

Cambridge Epigenetix is now biomodal