HPC-recommendations

(back to main documentation)

7. LSF and SLURM Recommendations

You should ensure that your local cluster queues and relevant cluster requirements are reflected in the nextflow.config file in the biomodal script folder. However, some settings are less obvious, so we have added a few recommendations below.
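
For example, a minimal sketch of the executor scope (the scheduler name and queue size are illustrative values; adjust them to your cluster):

executor {
  name      = "slurm"   // or "lsf", depending on your scheduler
  queueSize = 50        // illustrative cap on the number of jobs queued at once
}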

(back to main documentation) | (back to top)

7.1. Limited local disk space available for temporary files

If your cluster has limited local disk space available for temporary files, please consider binding a larger local tmp directory using the following settings in the nextflow.config file in the biomodal script folder. This allows the duet containers to use that directory for temporary files (/tmp).

This is especially important when using Singularity or Apptainer containers, as the temporary files are written to the /tmp directory inside some containers.

singularity {
  enabled    = true
  autoMounts = true
  runOptions = "--bind /<your local tmp path>:/tmp"
}

Alternatively, you can pass the TMP and cache environment variables through to the containers exactly as they are set in your environment. You can add this alternative method in the nextflow.config file in the biomodal script folder:

singularity {
  enabled    = true
  autoMounts = true
  envWhitelist = "TMPDIR,SINGULARITY_TMPDIR,SINGULARITY_CACHEDIR,NXF_SINGULARITY_CACHEDIR"
  runOptions = '--bind "$TMPDIR"'
}

Additionally, if you are still experiencing issues related to temporary files, try adding the bind option at the process level in nextflow.config so that it applies to all processes:

process {
  executor = "slurm"
  containerOptions = "--bind /<your local tmp path>:/tmp"
}

We strongly recommend setting both the TMPDIR and SINGULARITY_TMPDIR environment variables on the head/controller node before any jobs are submitted. This ensures that the temporary files are written to the correct location.

export TMPDIR=<your local tmp path>
export SINGULARITY_TMPDIR=$TMPDIR

Please note: if you see errors similar to Error in tempfile() using template /<your local tmp path>/parXXXXX.par: Parent directory (/<your local tmp path>/) does not exist at /venv/bin/parallel, verify that both the TMPDIR and SINGULARITY_TMPDIR environment variables are set correctly and that you have added the relevant --bind options as described above.
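
A quick sanity check is to confirm that both variables resolve to an existing, writable directory before any jobs are submitted (the path is whatever you exported above):

echo "$TMPDIR" "$SINGULARITY_TMPDIR"
ls -ld "$TMPDIR"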

Similarly, you can check that the optional SINGULARITY_CACHEDIR and NXF_SINGULARITY_CACHEDIR environment variables point to the same location as libraryDir = "<some-location>/singularity-images" in the nextflow.config file in the biomodal script folder, to ensure that the singularity containers are cached in the correct location.

export SINGULARITY_CACHEDIR=<same-path-as-libraryDir-in-nextflow-config>
export NXF_SINGULARITY_CACHEDIR=$SINGULARITY_CACHEDIR

If you notice that Nextflow attempts to download containers directly when you run the duet pipeline, check that both the SINGULARITY_CACHEDIR and NXF_SINGULARITY_CACHEDIR environment variables are set correctly and that libraryDir in the nextflow.config file in the biomodal script folder points to the same location. Nextflow first checks the library directory when searching for an image; if the image is not found there, it then checks the cache directory.
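
For reference, a minimal sketch of how libraryDir can sit alongside the other singularity settings shown above (the path placeholder is the same as the one used in the nextflow.config file):

singularity {
  enabled    = true
  autoMounts = true
  libraryDir = "<some-location>/singularity-images"   // should match SINGULARITY_CACHEDIR and NXF_SINGULARITY_CACHEDIR
}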

Please note that, depending on the container software installed on your cluster, you need to set either the Singularity or the Apptainer environment variables:

| Singularity variable | Apptainer variable |
| --- | --- |
| SINGULARITY_CACHEDIR | APPTAINER_CACHEDIR |
| SINGULARITYENV_TMPDIR | APPTAINERENV_TMPDIR |
| SINGULARITYENV_NXF_TASK_WORKDIR | APPTAINERENV_NXF_TASK_WORKDIR |
| SINGULARITYENV_NXF_DEBUG | APPTAINERENV_NXF_DEBUG |
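
If your cluster provides Apptainer, the exports shown earlier can therefore be mirrored along these lines (a sketch using the same placeholder paths as the Singularity examples above):

export APPTAINER_CACHEDIR=<same-path-as-libraryDir-in-nextflow-config>
export APPTAINERENV_TMPDIR=<your local tmp path>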

The biomodal CLI expects to find singularity in $PATH, so please make sure that Apptainer installations include a symlink to the apptainer binary named singularity.
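
Many Apptainer packages create this symlink during installation; if yours does not, it can be added with something along these lines (the target directory is illustrative and must be on $PATH):

ln -s "$(command -v apptainer)" /usr/local/bin/singularity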

These settings should be added to your cluster environment configuration files, such as .bashrc, .bash_profile, .profile, or similar, to ensure that they are set correctly when jobs are submitted on your cluster.
Please note that defining these variables in your startup scripts defines them for the whole user session and could impact other pipelines using Apptainer/Singularity, so please consult your cluster administrator before making these changes.
If you are not using the duet software for an extended period of time, we recommend removing or commenting out these lines until you resume analysis with the duet software pipeline.

Alternatively, you can add these environment variables to envWhitelist as described above.

(back to main documentation) | (back to top)

7.2. Per-task or CPU memory reservation for LSF

Default LSF cluster settings work in per-core memory limit mode, i.e. the requested memory is divided by the number of requested CPUs. If your LSF cluster is configured differently, we recommend trying the following settings in the LSF executor section of the nextflow.config file in the biomodal script folder.

executor {
  name = "lsf"
  perTaskReserve = false
  perJobMemLimit = true
}

(back to main documentation) | (back to top)

7.3. LSF executor scratch space

You can dynamically use LSF scratch space per job using the following settings in the nextflow.config file in the biomodal script folder.

process {
  executor = "lsf"
  // single quotes so the variables are resolved on the execution node, not at config parse time
  scratch  = '$SCRATCHDIR/$LSB_JOBID'
}

(back to main documentation) | (back to top)

7.4. Wall-time and minimum CPU settings

On some clusters it is recommended to set a “wall-time”, i.e. the maximum time a job can run before it is terminated. There may also be a “minimum CPU” requirement for starting jobs on your cluster. You can adjust the Nextflow time and cpuNum parameters in the nextflow.config file in the biomodal script folder.

process {
  executor = "slurm"
  time     = "24h"
}

// cpuNum is a pipeline parameter, so it is set at the top level rather than inside the process scope
params.cpuNum = 1

(back to main documentation) | (back to top)

7.5. Setting specific Queue, CPU, RAM and DISK per pipeline or workflow module

In the nextflow.config file in the biomodal script folder, you can set specific queue, CPU, RAM or DISK requirements per module by using the withName selector. For example, you can set the CPU, RAM or DISK requirements for the BWA_MEM2 and PRELUDE modules as follows:

process {
  //Example of default queue settings
  cpus     = 16
  memory   = "16GB"
  queue    = "<add the name of your default queue>"
  time     = "24h"
  disk     = "800GB"

  //Example of setting per-module specific queue settings
  withName: 'BWA_MEM2' { memory = "64GB"
                         queue  = "<add the name of a larger queue>"
                         time   = "48h" }
  withName: 'PRELUDE'  { cpus   = 32
                         memory = "32GB"
                         queue  = "<add the name of a larger queue>" }
}

(back to main documentation) | (back to top)

7.6. Setting specific Memory settings for SGE/OGE/UGE HPC clusters

If you are on an SGE/OGE/UGE HPC cluster that does not support the h_rt, h_rss or mem_free settings, you can try adding one of the following settings. Either globally:

process {
  //Example of default queue settings
  cpus     = 16
  memory   = "16GB"
  queue    = "<name-of-small-queue>"
  time     = "24h"
  disk     = "800GB"
  // strips the space and trailing "B" from e.g. "16 GB" so the scheduler receives "-l h_vmem=16G"
  clusterOptions = { "-l h_vmem=${task.memory.toString().replaceAll(/[\sB]/,'')}" }
}

or directly in the relevant withName sections if you prefer to remove other memory-specific cluster settings:

withName: 'BWA_MEM2' { cpus = 32
                       clusterOptions = "-l h_vmem=64G"
                       memory = null
                       queue = "<name-of-preferred-queue>"
                       time = "48h" }

If your cluster expects a per-core rather than a per-job memory limit, you should adjust the h_vmem setting to reflect the per-core memory limit.
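
A minimal sketch of such a per-core variant is shown below (it assumes whole-GB memory requests and divides the per-job memory request evenly across the requested CPUs):

process {
  // hypothetical per-core variant: divide the job memory request by the CPU count
  clusterOptions = { "-l h_vmem=${task.memory.toGiga().intdiv(task.cpus)}G" }
}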

(back to main documentation) | (back to top) | (Next)
