duet multiomics solution demo datasets
This section covers reference sequencing data generated using the duet multiomics solution +modC or duet evoC reagents and bioinformatics pipeline. The data is freely available and can be used to explore the capabilities of +modC and duet evoC, assess benchmarks, reproduce performance, or run demos. Please follow the instructions below for the desired dataset.
Table of Contents
- +modC GIAB Dataset – Ashkenazi Trio
- duet evoC Dataset – NA12878 and mouse ES cell line
- Download instructions
- Overview of available datasets
+modC GIAB Dataset – Ashkenazi Trio
Experimental design
For this dataset, 7 GIAB gDNA samples (listed in Table 1 below, including the Ashkenazi Trio), each with 80 ng input and sonicated to 250 bp fragments, were prepared using the +modC kit and sequenced in duplicate at 2×151 bp on a NovaSeq 6000. Sequencing data was processed using the biomodal pipeline and downsampled to ~30X coverage for downstream analysis.
The following results are representative of what routine processing with the +modC kit can achieve.
Sample name | Sample Origin | Reads | Mean Coverage |
---|---|---|---|
CEG1532-EL01-A1200-002 | NA12878 (HG001) | 1000M | 31.3X |
CEG1532-EL01-A1200-005 | NA24385 (HG002) | 1000M | 31.2X |
CEG1532-EL01-A1200-008 | NA24149 (HG003) | 1000M | 31.3X |
CEG1532-EL01-A1200-011 | NA24143 (HG004) | 1000M | 31.3X |
CEG1532-EL01-A1200-015 | NA24631 (HG005) | 1000M | 31.4X |
CEG1532-EL01-A1200-017 | NA24694 (HG006) | 1000M | 31.4X |
CEG1532-EL01-A1200-020 | NA24695 (HG007) | 1000M | 31.0X |
Analysis results
The dataset includes the following files and reports organised in subfolders as follows:
Folder | Description |
---|---|
reports/summary_reports | Excel sheet containing summary metrics for all samples analysed. The metrics are associated with reads aligning to the reference genome and with reads aligning to the spike-in controls. The sheet includes common NGS workflow metrics and +modC specific metrics. |
reports/summary_reports | MultiQC report including plots of the most important metrics for all samples, including coverage, GC content and methylation bias. |
sample_outputs/allele_specific_methylation | Folder containing the Allele Specific Methylation (ASM) files, with one row per heterozygous allele. Methylation on reads associated with each allele is quantified, and a call of allele specific methylation is provided where the asymmetry is significant by Fisher’s exact test. |
sample_outputs/bams | Folder containing the Binary Alignment Map (BAM) files representing all sequences aligned to the reference genome. Methylation status is represented using an MM SAM tag. |
sample_outputs/modc_quantification | Folder containing quantification files reporting modified cytosine content for each sample and each analysed genomic context: CG, CHG, CHH. |
sample_outputs/variant_call_files | Folder containing the Variant Call Format (VCF) files, which hold information about variants detected at specific positions in the reference genome. |
duet evoC Dataset – NA12878 and mouse ES cell line
Experimental design – duet evoC
For this dataset, 1 GIAB gDNA sample from the dataset described above (80 ng input) and 1 mouse ES-E14 cell line sample were sonicated to 250 bp fragments. Libraries were prepared using the duet evoC kit and sequenced in duplicate at 2×151 bp on a NovaSeq 6000. Sequencing data was processed using the biomodal pipeline and downsampled to ~30X coverage (1 billion reads) for downstream analysis.
The following results are representative of what routine processing with the duet evoC kit can achieve.
Sample name | Sample Origin | Reads | Mean Coverage |
---|---|---|---|
CEG1485-EL01-D1115-001 | ES-E14 (mouse) | 1000M | 32.8X |
CEG1485-EL01-D1115-005 | NA12878 (HG001) | 1000M | 32.6X |
Analysis results – duet evoC
The dataset includes the following files and reports organised in subfolders as follows:
Folder | Description |
---|---|
reports/summary_reports | Excel sheet containing summary metrics for all samples analysed. The metrics are associated with reads aligning to the reference genome and with reads aligning to the spike-in controls. The sheet includes common NGS workflow metrics and duet evoC specific metrics. |
reports/summary_reports | MultiQC report including plots of the most important metrics for all samples, including coverage, GC content and methylation bias. |
sample_outputs/allele_specific_methylation | Folder containing the Allele Specific Methylation (ASM) files, with one row per heterozygous allele. Methylation on reads associated with each allele is quantified, and a call of allele specific methylation is provided where the asymmetry is significant by Fisher’s exact test. |
sample_outputs/bams | Folder containing the Binary Alignment Map (BAM) files representing all sequences aligned to the reference genome. Methylation status is represented using an MM SAM tag. |
sample_outputs/modc_quantification | Folder containing quantification files reporting modified cytosine content for each sample and each analysed genomic context: CG, CHG, CHH. |
sample_outputs/variant_call_files | Folder containing the Variant Call Format (VCF) files, which hold information about variants detected at specific positions in the reference genome. |
Download instructions
The following instructions refer to the duet multiomics solution +modC and duet evoC demo datasets made available by biomodal for demonstration and performance assessment.
Setup: Install gcloud CLI and authenticate
- Create a Google account using your institutional email address by selecting the “Use your existing email address” option during account creation. If you already have a Google account, skip this step.
- Download and install the Google Cloud CLI (gcloud CLI), then run the gcloud init command to authenticate with your Google account.
If you are using Windows and cannot use bash, you can download the folders individually using the gcloud storage commands listed below for the +modC and duet evoC datasets.
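For bash users, the per-folder commands in the tables below can be wrapped in a small script. The following is a minimal sketch, not a biomodal tool: the bucket name and folder paths come from the tables on this page, while `DEST`, `DRY_RUN` and `download_folder` are illustrative names introduced here. By default it only prints the commands it would run, so nothing is downloaded until `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: download selected subfolders of the demo data with gcloud storage.
# DEST, DRY_RUN and download_folder are hypothetical names used for illustration.
set -euo pipefail

BUCKET="gs://biomodal-data"        # bucket used in the download tables
DEST="${DEST:-./biomodal-demo}"    # assumed local target directory
DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands only, 0 = run them

download_folder() {
  local folder="$1"
  # Mirror the bucket layout locally so folders with the same name
  # (e.g. several "reports" folders) do not end up merged.
  local target
  target="${DEST}/$(dirname "${folder}")/"
  local cmd=(gcloud storage cp --recursive "${BUCKET}/${folder}" "${target}")
  if [ "${DRY_RUN}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    mkdir -p "${target}"
    "${cmd[@]}"
  fi
}

# Example: the small report folders of both human datasets (duet 1.4.1).
for folder in "giab/+modC/1.4.1/reports" "giab/evoC/1.4.1/reports"; do
  download_folder "${folder}"
done
```

Starting with the small reports folders is a cheap way to confirm that authentication works before committing to the multi-hundred-gigabyte folders.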
Overview of available datasets
Human datasets +modC and duet evoC
The datasets are available in subfolders as follows, with approximate sizes and the specific download command for each:
Folder | Description | Approximate Size | Download command |
---|---|---|---|
giab/+modC | +modC GIAB datasets, including FASTQ files | 3.7TB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/ ./ |
giab/+modC/input | +modC GIAB FASTQ files | 888GB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/input ./ |
giab/+modC/1.3.0 | +modC GIAB datasets using duet 1.3.0 | 1.43TB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.3.0 ./ |
giab/+modC/1.3.0/controls | +modC GIAB datasets using duet 1.3.0 | 30MB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.3.0/controls ./ |
giab/+modC/1.3.0/diagnostics | +modC GIAB datasets (with resolved FASTQ files) using duet 1.3.0 | 800GB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.3.0/diagnostics ./ |
giab/+modC/1.3.0/reports | +modC GIAB datasets using duet 1.3.0 | 9MB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.3.0/reports ./ |
giab/+modC/1.3.0/sample_outputs | +modC GIAB datasets using duet 1.3.0 | 630GB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.3.0/sample_outputs ./ |
giab/+modC/1.4.1 | +modC GIAB datasets using duet 1.4.1 | 1.38TB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1 ./ |
giab/+modC/1.4.1/controls | +modC GIAB datasets using duet 1.4.1 | 29MB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1/controls ./ |
giab/+modC/1.4.1/diagnostics | +modC GIAB datasets (with resolved FASTQ files) using duet 1.4.1 | 800GB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1/diagnostics ./ |
giab/+modC/1.4.1/reports | +modC GIAB datasets using duet 1.4.1 | 9MB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1/reports ./ |
giab/+modC/1.4.1/sample_outputs | +modC GIAB datasets using duet 1.4.1 | 568GB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1/sample_outputs ./ |
giab/+modC/1.4.1/variant_call_files | +modC GIAB datasets using duet 1.4.1 | 949MB | gcloud storage cp --recursive gs://biomodal-data/giab/+modC/1.4.1/variant_call_files ./ |
giab/evoC | duet evoC GIAB datasets | 635GB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC ./ |
giab/evoC/input | duet evoC GIAB FASTQ files | 127GB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/input ./ |
giab/evoC/1.3.0 | duet evoC GIAB datasets using duet 1.3.0 | 312GB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.3.0 ./ |
giab/evoC/1.3.0/controls | duet evoC GIAB datasets using duet 1.3.0 | 4MB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.3.0/controls ./ |
giab/evoC/1.3.0/diagnostics | duet evoC GIAB datasets (with resolved FASTQ files) using duet 1.3.0 | 240GB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.3.0/diagnostics ./ |
giab/evoC/1.3.0/reports | duet evoC GIAB datasets using duet 1.3.0 | 5MB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.3.0/reports ./ |
giab/evoC/1.3.0/sample_outputs | duet evoC GIAB datasets using duet 1.3.0 | 71GB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.3.0/sample_outputs ./ |
giab/evoC/1.4.1 | duet evoC GIAB datasets using duet 1.4.1 | 196GB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1 ./ |
giab/evoC/1.4.1/controls | duet evoC GIAB datasets using duet 1.4.1 | 4MB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1/controls ./ |
giab/evoC/1.4.1/diagnostics | duet evoC GIAB datasets (with resolved FASTQ files) using duet 1.4.1 | 114GB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1/diagnostics ./ |
giab/evoC/1.4.1/reports | duet evoC GIAB datasets using duet 1.4.1 | 4MB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1/reports ./ |
giab/evoC/1.4.1/sample_outputs | duet evoC GIAB datasets using duet 1.4.1 | 82GB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1/sample_outputs ./ |
giab/evoC/1.4.1/variant_call_files | duet evoC GIAB datasets using duet 1.4.1 | 201MB | gcloud storage cp --recursive gs://biomodal-data/giab/evoC/1.4.1/variant_call_files ./ |
Mouse datasets duet evoC
The datasets are available in subfolders as follows, with approximate sizes and the specific download command for each:
Folder | Description | Approximate Size | Download command |
---|---|---|---|
mouse/evoC | duet evoC mouse datasets, including resolved FASTQ files | 486GB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC ./ |
mouse/evoC/input | duet evoC mouse FASTQ files | 125GB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/input ./ |
mouse/evoC/1.3.0 | duet evoC mouse datasets using duet 1.3.0 | 174GB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.3.0 ./ |
mouse/evoC/1.3.0/controls | duet evoC mouse datasets using duet 1.3.0 | 4MB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.3.0/controls ./ |
mouse/evoC/1.3.0/diagnostics | duet evoC mouse datasets (with resolved FASTQ files) using duet 1.3.0 | 114GB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.3.0/diagnostics ./ |
mouse/evoC/1.3.0/reports | duet evoC mouse datasets using duet 1.3.0 | 5MB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.3.0/reports ./ |
mouse/evoC/1.3.0/sample_outputs | duet evoC mouse datasets using duet 1.3.0 | 60GB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.3.0/sample_outputs ./ |
mouse/evoC/1.4.1 | duet evoC mouse datasets using duet 1.4.1 | 186GB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1 ./ |
mouse/evoC/1.4.1/controls | duet evoC mouse datasets using duet 1.4.1 | 4MB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1/controls ./ |
mouse/evoC/1.4.1/diagnostics | duet evoC mouse datasets (with resolved FASTQ files) using duet 1.4.1 | 115GB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1/diagnostics ./ |
mouse/evoC/1.4.1/reports | duet evoC mouse datasets using duet 1.4.1 | 5MB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1/reports ./ |
mouse/evoC/1.4.1/sample_outputs | duet evoC mouse datasets using duet 1.4.1 | 71GB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1/sample_outputs ./ |
mouse/evoC/1.4.1/variant_call_files | duet evoC mouse datasets using duet 1.4.1 | 225MB | gcloud storage cp --recursive gs://biomodal-data/mouse/evoC/1.4.1/variant_call_files ./ |
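Because several folders are hundreds of gigabytes or more, it can help to compare the Approximate Size column with the free space on the target disk before starting a download. The sketch below is one way to do that in bash. The helper names `size_to_gb`, `free_gb` and `check_space` are illustrative, and the sketch assumes the listed sizes use decimal units (1 TB = 1000 GB), which this page does not state.

```shell
#!/usr/bin/env bash
# Sketch: compare an approximate size from the tables with free disk space.
# All function names here are hypothetical helpers, not part of gcloud.
set -euo pipefail

# Convert a size such as "1.38TB", "800 GB" or "949MB" to whole gigabytes,
# rounded up. Decimal units are an assumption.
size_to_gb() {
  echo "$1" | awk '{
    s = toupper($0); gsub(/ /, "", s)
    n = s; sub(/[A-Z]+$/, "", n)      # numeric part
    u = s; sub(/^[0-9.]+/, "", u)     # unit part
    if (u == "TB") f = 1000
    else if (u == "GB") f = 1
    else if (u == "MB") f = 0.001
    else exit 1
    printf "%d\n", int(n * f + 0.999)
  }'
}

# Free space, in whole gigabytes, on the filesystem holding directory $1.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 * 1024 / 1e9 }'
}

check_space() {
  local need have
  need="$(size_to_gb "$1")"
  have="$(free_gb "${2:-.}")"
  if [ "${have}" -ge "${need}" ]; then
    echo "OK: need ~${need} GB, ${have} GB free"
  else
    echo "Not enough space: need ~${need} GB, only ${have} GB free"
    return 1
  fi
}

# Example: mouse/evoC/1.4.1/sample_outputs is listed as 71GB.
check_space "71GB" . || echo "Choose a smaller subfolder or a larger disk."
```

The listed sizes are approximate, so leaving some headroom beyond the reported figure is advisable.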
For additional information and/or assistance, please contact support@biomodal.com.