
duet multiomics solution demo datasets

This section covers reference sequencing data generated with the duet multiomics solution +modC or duet evoC reagents and bioinformatics pipeline. The data is freely available and can be used to explore the capabilities of +modC and duet evoC, to assess benchmarks and reproduce performance, for demonstrations, and more. Please follow the instructions below for the desired dataset.


+modC GIAB Dataset – Ashkenazi Trio

Experimental design

For this dataset, seven GIAB gDNA samples (80 ng input each, sonicated to 250 bp fragments), including the Ashkenazi Trio, were prepared using the +modC kit and sequenced in duplicate at 2×151 on a NovaSeq 6000. The samples are listed in Table 1 below. Sequencing data was processed using biomodal pipeline v1.1.1 and downsampled to ~30X coverage for downstream analysis.

The following results are representative of the routine processing and performance achievable with the +modC kit.

Table 1. +modC GIAB samples

Sample name            | Sample origin   | Reads | Mean coverage
CEG1532-EL01-A1200-002 | NA12878 (HG001) | 1000M | 31.3X
CEG1532-EL01-A1200-005 | NA24385 (HG002) | 1000M | 31.2X
CEG1532-EL01-A1200-008 | NA24149 (HG003) | 1000M | 31.3X
CEG1532-EL01-A1200-011 | NA24143 (HG004) | 1000M | 31.3X
CEG1532-EL01-A1200-015 | NA24631 (HG005) | 1000M | 31.4X
CEG1532-EL01-A1200-017 | NA24694 (HG006) | 1000M | 31.4X
CEG1532-EL01-A1200-020 | NA24695 (HG007) | 1000M | 31.0X

Analysis results

The dataset includes the following files and reports, organised in subfolders (approximate sizes in parentheses):

  • summary_report (500KB)
    Excel sheet containing summary metrics for all samples analysed. The metrics are associated with reads aligning to the reference genome and with reads aligning to the spike-in controls. The sheet includes common NGS workflow metrics and +modC specific metrics.

  • multiqc (1.6MB)
    Report including plots of the most important metrics for all samples including coverage, GC content and methylation bias.

  • allele_specific_methylation (1.6GB)
    Folder containing the Allele Specific Methylation (ASM) files, with one row per heterozygous allele. Methylation on the reads associated with each allele is quantified, and an allele-specific methylation call is made where the asymmetry between alleles is significant under Fisher’s exact test.

  • dedup_genome_bams (435GB)
    Folder containing the Binary Alignment Map (BAM) files representing all sequences aligned to the reference genome. Methylation status is represented using the MM SAM tag.

  • genome_modc_quantification (32GB)
    Folder containing quantification files that report modified cytosine content for each sample in each analysed genomic context: CG, CHG and CHH.

  • variant_call_files (1.6GB)
    Folder containing the Variant Call Format (VCF) files, which contain information about variants detected at specific positions relative to the reference genome.
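Because methylation status in the BAM files is stored in the MM SAM tag, a quick way to inspect a few reads is to print that tag from SAM text. The SAM tags specification pairs MM with an ML tag holding per-call probabilities; whether these BAMs also carry ML is an assumption here, and "NA" is printed when a tag is absent. This minimal sketch assumes samtools is installed to convert BAM to SAM; `print_mod_tags` and `sample.bam` are hypothetical names.

```shell
#!/usr/bin/env bash
# Minimal sketch: print read name, MM tag and ML tag for SAM records read on stdin.
# Assumes SAM text (e.g. from samtools view); print_mod_tags is a hypothetical helper,
# not part of the biomodal pipeline. Missing tags are reported as NA.
print_mod_tags() {
  awk -F '\t' '
    /^@/ { next }                        # skip SAM header lines
    {
      mm = "NA"; ml = "NA"
      for (i = 12; i <= NF; i++) {       # optional tags start at column 12
        if (substr($i, 1, 5) == "MM:Z:") mm = substr($i, 6)
        if (substr($i, 1, 5) == "ML:B:") ml = substr($i, 6)
      }
      print $1 "\t" mm "\t" ml
    }'
}

# Example (sample.bam is a placeholder for a file from dedup_genome_bams):
#   samtools view sample.bam | head -n 5 | print_mod_tags
```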

duet evoC Dataset – NA12878 and mouse ES cell line

Experimental design – duet evoC

For this dataset, one GIAB gDNA sample from the dataset described above (NA12878, 80 ng input) and gDNA from the mouse ES-E14 cell line were sonicated to 250 bp fragments. Libraries prepared using the duet evoC kit were sequenced in duplicate at 2×151 on a NovaSeq 6000. Sequencing data was processed using biomodal pipeline v1.1.3a2 and downsampled to ~30X coverage (1 billion reads) for downstream analysis.

The following results are representative of the routine processing and performance achievable with the duet evoC kit.

Sample name            | Sample origin   | Reads | Mean coverage
CEG1485-EL01-D1115-001 | ES-E14 (mouse)  | 1000M | 32.8X
CEG1485-EL01-D1115-005 | NA12878 (HG001) | 1000M | 32.6X

Analysis results – duet evoC

The dataset includes the following files and reports, organised in subfolders (approximate sizes in parentheses):

  • summary_report (500KB)
    Excel sheet containing summary metrics for all samples analysed. The metrics are associated with reads aligning to the reference genome and with reads aligning to the spike-in controls. The sheet includes common NGS workflow metrics and duet evoC specific metrics.

  • multiqc (1.5MB)
    Report including plots of the most important metrics for all samples including coverage, GC content and methylation bias.

  • allele_specific_methylation (26MB mouse; 277MB human)
    Folder containing the Allele Specific Methylation (ASM) files, with one row per heterozygous allele. Methylation on the reads associated with each allele is quantified, and an allele-specific methylation call is made where the asymmetry between alleles is significant under Fisher’s exact test.

  • dedup_genome_bams (58GB mouse; 66GB human)
    Folder containing the Binary Alignment Map (BAM) files representing all sequences aligned to the reference genome. Methylation status is represented using the MM SAM tag.

  • genome_modc_quantification (430MB mouse; 910MB human)
    Folder containing quantification files that report modified cytosine content (mC and hmC) for each sample in CG context.

  • variant_call_files (250MB mouse; 240MB human)
    Folder containing the Variant Call Format (VCF) files, which contain information about variants detected at specific positions relative to the reference genome.
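The ASM files are built on heterozygous sites, so it can be useful to see how many heterozygous genotype calls a VCF contains. This minimal sketch assumes an uncompressed, single-sample VCF on stdin with a GT key in the FORMAT column; `count_het_calls` and `sample.vcf.gz` are hypothetical names, and the actual file names in the variant_call_files folder may differ.

```shell
#!/usr/bin/env bash
# Minimal sketch: count heterozygous genotype calls in an uncompressed, single-sample VCF on stdin.
# Assumes sample data in column 10 and a GT key in FORMAT (column 9).
# count_het_calls is a hypothetical helper, not part of the biomodal pipeline.
count_het_calls() {
  awk -F '\t' '
    /^#/ { next }                        # skip meta-information and header lines
    {
      n = split($9, keys, ":")
      split($10, vals, ":")
      gt = ""
      for (i = 1; i <= n; i++) if (keys[i] == "GT") gt = vals[i]
      gsub(/[|]/, "/", gt)               # treat phased and unphased genotypes alike
      split(gt, a, "/")
      # heterozygous: two called alleles that differ (e.g. 0/1, 1/2)
      if (a[1] != "" && a[2] != "" && a[1] != "." && a[2] != "." && a[1] != a[2]) het++
    }
    END { print het + 0 }'
}

# Example (sample.vcf.gz is a placeholder for a file from variant_call_files):
#   gunzip -c sample.vcf.gz | count_het_calls
```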

Download instructions

The following instructions refer to the duet multiomics solution +modC and duet evoC demo datasets made available by biomodal for demonstration and performance assessment purposes.

Setup: install the gcloud CLI and authenticate

  1. Create a Google account using your institutional email address by selecting the “Use your existing email address” option during account creation. If you already have a Google account, skip this step.

  2. Download and install the Google Cloud CLI (gcloud), then run the gcloud init command to authenticate with your Google account.
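Before starting a download, it may help to confirm that both setup steps succeeded. This is a minimal bash sketch, assuming `gcloud auth list` accepts the commonly used `--filter=status:ACTIVE` and `--format='value(account)'` options to print the active account; `require_cmd` and `active_gcloud_account` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Minimal sketch: check the setup steps before any download.
# require_cmd and active_gcloud_account are hypothetical helper names.

# Succeed if the named command is on PATH; otherwise print a hint and fail.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH" >&2
  return 1
}

# Print the active gcloud account, or fail if no account is authenticated.
active_gcloud_account() {
  local account
  account="$(gcloud auth list --filter=status:ACTIVE --format='value(account)' 2>/dev/null)"
  if [ -z "$account" ]; then
    echo "error: no active gcloud account; run 'gcloud init'" >&2
    return 1
  fi
  printf '%s\n' "$account"
}

# Example:
#   require_cmd gcloud && active_gcloud_account
```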

If you are using Windows and cannot use bash, you can download the folders individually with the gcloud storage commands listed below for the +modC and duet evoC datasets.

+modC files

Downloading fastq files

gcloud storage cp --recursive "gs://biomodal-data/giab/5L_comm_kit/1.1.1/nf-input" ./
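Some folders are large (the +modC BAM folder alone is about 435GB), so checking the size of a folder before copying it can avoid an unwanted transfer. This minimal bash sketch assumes a gcloud release that includes `gcloud storage du`; `fetch_folder` and the `DRY_RUN` variable are hypothetical, and with `DRY_RUN=1` the commands are printed rather than run.

```shell
#!/usr/bin/env bash
# Minimal sketch: report the size of a bucket folder, then copy it.
# fetch_folder and DRY_RUN are hypothetical; with DRY_RUN=1 the gcloud commands are printed, not run.
fetch_folder() {
  local src="$1" dest="${2:-.}"
  local -a du_cmd=(gcloud storage du --summarize --readable-sizes "$src")
  local -a cp_cmd=(gcloud storage cp --recursive "$src" "$dest")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${du_cmd[*]}"
    echo "${cp_cmd[*]}"
    return 0
  fi
  # Show the total size first; stop if the bucket path cannot be read.
  "${du_cmd[@]}" || return 1
  "${cp_cmd[@]}"
}

# Example: the +modC FASTQ files
#   fetch_folder "gs://biomodal-data/giab/5L_comm_kit/1.1.1/nf-input" ./
```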

Downloading pipeline output files

To see the available versions, use the following command:

gcloud storage ls "gs://biomodal-data/giab/5L_comm_kit"

Then set a version of the pipeline, for example:

export VERSION=1.3.0
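Rather than typing a version by hand, the newest folder name can be picked from the listing. This minimal bash sketch assumes the version folders are named like `1.1.1` or `1.3.0` and that GNU `sort -V` is available; `latest_version` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the newest version folder from `gcloud storage ls` output on stdin.
# Assumes folders named like 1.1.1 or 1.3.0 and GNU sort (-V, version sort).
# latest_version is a hypothetical helper.
latest_version() {
  # 1) drop a trailing slash and keep the last path component of each URL,
  # 2) keep names that start like a version number,
  # 3) version-sort so that 1.10.0 comes after 1.9.0, and print the last one.
  sed -e 's#/$##' -e 's#.*/##' \
    | grep -E '^[0-9]+(\.[0-9]+)+' \
    | sort -V \
    | tail -n 1
}

# Example:
#   export VERSION="$(gcloud storage ls "gs://biomodal-data/giab/5L_comm_kit" | latest_version)"
```

Pinning an explicit version, as in the command above, keeps downloads reproducible; picking the latest is mainly a convenience for exploration.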

To download the complete analysis output, use:

gcloud storage cp --recursive "gs://biomodal-data/giab/5L_comm_kit/${VERSION}/run1532_external" ./
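The complete analysis output is large, and rerunning an interrupted `gcloud storage cp --recursive` copies everything again. `gcloud storage rsync` copies only objects that are missing or differ at the destination, so rerunning it resumes the transfer. This minimal bash sketch shows that alternative; `sync_output` and `DRY_RUN` are hypothetical names. Note that rsync copies the contents of the source folder, so the destination folder is named explicitly.

```shell
#!/usr/bin/env bash
# Minimal sketch: resumable download of a run folder with gcloud storage rsync.
# sync_output and DRY_RUN are hypothetical; with DRY_RUN=1 the commands are printed, not run.
sync_output() {
  local src="$1" dest="$2"
  local -a cmd=(gcloud storage rsync --recursive "$src" "$dest")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "mkdir -p $dest"
    echo "${cmd[*]}"
    return 0
  fi
  # rsync fills an existing folder with the contents of src, so create it first.
  mkdir -p "$dest" && "${cmd[@]}"
}

# Example:
#   sync_output "gs://biomodal-data/giab/5L_comm_kit/${VERSION}/run1532_external" ./run1532_external
```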

To download only a subset of the data, append the name of the relevant subfolder to the path. For example, to download only the summary reports:

gcloud storage cp --recursive "gs://biomodal-data/giab/5L_comm_kit/${VERSION}/run1532_external/summary_report" ./

To list the available subfolders:

gcloud storage ls "gs://biomodal-data/giab/5L_comm_kit/${VERSION}/run1532_external"
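To fetch several of the smaller folders while skipping the large BAM folder, the subset command can be repeated per subfolder. This minimal bash sketch assumes `multiqc` is a subfolder name alongside `summary_report`, as in the Analysis results list; `fetch_subfolders` and `DRY_RUN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Minimal sketch: download several subfolders of one run folder.
# fetch_subfolders and DRY_RUN are hypothetical; subfolder names must match the listing.
fetch_subfolders() {
  local base="$1"; shift
  local name
  for name in "$@"; do
    # ${base%/} strips a trailing slash so the path has a single separator.
    local -a cmd=(gcloud storage cp --recursive "${base%/}/${name}" ./)
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}

# Example: the report folders only
#   fetch_subfolders "gs://biomodal-data/giab/5L_comm_kit/${VERSION}/run1532_external" summary_report multiqc
```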

duet evoC files

To see the available versions of the duet evoC datasets, use the following command:

gcloud storage ls "gs://biomodal-data/giab/6L"

Then set a version of the pipeline, for example:

export VERSION=1.3.0
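A mistyped version only shows up as an error once a copy is attempted. This minimal bash sketch checks the chosen `VERSION` against the folder listing first; `version_exists` is a hypothetical helper that reads `gcloud storage ls` output on stdin and requires an exact folder-name match.

```shell
#!/usr/bin/env bash
# Minimal sketch: succeed only if the given version is an exact folder name in the listing on stdin.
# version_exists is a hypothetical helper.
version_exists() {
  local wanted="$1"
  # Keep the last path component of each URL, then match it exactly as a fixed string.
  sed -e 's#/$##' -e 's#.*/##' | grep -qxF -- "$wanted"
}

# Example:
#   gcloud storage ls "gs://biomodal-data/giab/6L" | version_exists "$VERSION" \
#     || echo "version $VERSION not found under giab/6L" >&2
```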

duet evoC files can be downloaded using the following commands:

Human

Downloading duet evoC fastq files

gcloud storage cp --recursive "gs://biomodal-data/giab/6L/${VERSION}/nf-input" ./

Downloading duet evoC pipeline output files

Using the version you have already set, you can download the complete analysis output using:

gcloud storage cp --recursive "gs://biomodal-data/giab/6L/${VERSION}/run1485_external" ./

To download only a subset of the data, append the name of the relevant subfolder to the path. For example, to download only the pipeline reports:

gcloud storage cp --recursive "gs://biomodal-data/giab/6L/${VERSION}/run1485_external/pipeline_report" ./

To list the available subfolders:

gcloud storage ls "gs://biomodal-data/giab/6L/${VERSION}/run1485_external"

Mouse

Downloading fastq files for mouse

gcloud storage cp --recursive "gs://biomodal-data/mouse/6L/1.1.3a2/nf-input" ./

Downloading pipeline output files for mouse

To see the available versions, use the following command:

gcloud storage ls "gs://biomodal-data/mouse/6L"

Then set a version of the pipeline, for example:

export VERSION=1.3.0

To download the complete analysis output, use:

gcloud storage cp --recursive "gs://biomodal-data/mouse/6L/${VERSION}/run1485_external" ./

To download only a subset of the data, append the name of the relevant subfolder to the path. For example, to download only the pipeline reports:

gcloud storage cp --recursive "gs://biomodal-data/mouse/6L/${VERSION}/run1485_external/pipeline_report" ./

To list the available subfolders:

gcloud storage ls "gs://biomodal-data/mouse/6L/${VERSION}/run1485_external"
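The human and mouse commands above both copy `run1485_external` (and `nf-input`) into the current directory, so running both merges the two species into the same local folders. This minimal bash sketch keeps them apart by downloading into one folder per species, assuming the `giab/6L` and `mouse/6L` layouts used above; `evoc_prefix`, `fetch_evoc`, `DRY_RUN` and the `evoc_<species>` folder names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: download duet evoC output for human and mouse into separate local folders.
# evoc_prefix, fetch_evoc and DRY_RUN are hypothetical; bucket paths follow the commands above.
evoc_prefix() {
  case "$1" in
    human) echo "gs://biomodal-data/giab/6L" ;;
    mouse) echo "gs://biomodal-data/mouse/6L" ;;
    *) echo "error: unknown species '$1'" >&2; return 1 ;;
  esac
}

fetch_evoc() {
  local species="$1" version="$2" subfolder="${3:-}"
  local prefix
  prefix="$(evoc_prefix "$species")" || return 1
  # Append /<subfolder> only when one was given; otherwise fetch the whole run folder.
  local src="${prefix}/${version}/run1485_external${subfolder:+/$subfolder}"
  local dest="./evoc_${species}"   # one folder per species avoids merging outputs
  local -a cmd=(gcloud storage cp --recursive "$src" "$dest/")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "mkdir -p $dest"
    echo "${cmd[*]}"
    return 0
  fi
  mkdir -p "$dest" && "${cmd[@]}"
}

# Example:
#   for s in human mouse; do fetch_evoc "$s" "$VERSION" pipeline_report; done
```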

For additional information and/or assistance, please contact support@biomodal.com.

Cambridge Epigenetix is now biomodal