@MillironX
Last active January 3, 2022 20:40
Nextflow Code Style Proposal

Code formatting

Shebang

All Nextflow documents should start with the following lines, regardless of whether the file is intended to be used as an entry script:

#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

Module Includes

Module includes should appear at the top of the file, immediately after the shebang and DSL declaration and before any pipeline logic.
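A minimal sketch of the resulting file header. The module paths, the process names TRIM_READS and CLASSIFY_READS, and the params.reads parameter are hypothetical and only illustrate the ordering:

```nextflow
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

// Hypothetical modules: paths and process names are illustrative only
include { TRIM_READS } from './modules/trimming'
include { CLASSIFY_READS } from './modules/classification'

// Pipeline logic starts only after every include
workflow {
    RawReads = Channel.fromPath(params.reads)
    TRIM_READS(RawReads)
    CLASSIFY_READS(TRIM_READS.out)
}
```

Keeping every include in one place means a reader can see a file's dependencies without scanning the workflow body.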

Line limits

  • Indent using 4 spaces (no tabs), even in script and shell blocks
  • Nextflow code should be limited to 120 characters per line
  • script and shell blocks should be limited to 80 characters per line, not counting indentation (e.g. at one indent level, the limit is 84 characters)

Naming

  • Use UPPER_CASE_WITH_UNDERSCORE_SEPARATORS for Process and Workflow names
  • Use CapitalizeEveryWordWithoutSeparators for Channel and Global variable names
  • Use camelCaseWithoutSeparators for input and output variable names
  • Use snake_case_underscore_separators for parameter and process label names
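The conventions above can be sketched in one small pipeline. Every name here is hypothetical: filter_tool is a placeholder program, and treating the emit name as the "output variable name" is an assumption about what the rule intends:

```nextflow
// Parameter names: snake_case (both parameters are hypothetical)
params.reads_glob = 'data/*.fastq'
params.min_quality = 20

// Process name: UPPER_CASE_WITH_UNDERSCORE_SEPARATORS
process FILTER_READS {
    // Process label: snake_case
    label 'process_low'

    input:
    // Input variables: camelCase
    tuple val(sampleName), path(readFile)

    output:
    // Output (emit) name: camelCase
    tuple val(sampleName), path("${sampleName}.filtered.fastq"), emit: filteredReads

    script:
    // filter_tool is a placeholder, not a real program
    """
    filter_tool --min-quality ${params.min_quality} "${readFile}" \\
        > "${sampleName}.filtered.fastq"
    """
}

// Workflow name: UPPER_CASE_WITH_UNDERSCORE_SEPARATORS
workflow QUALITY_CONTROL {
    // Channel name: CapitalizeEveryWordWithoutSeparators
    RawReads = Channel
        .fromPath(params.reads_glob)
        .map { readFile -> tuple(readFile.simpleName, readFile) }

    FILTER_READS(RawReads)
}

workflow {
    QUALITY_CONTROL()
}
```

With distinct casing for each kind of name, a reader can tell a process call, a channel, and a plain value apart at a glance.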

Process formatting

The order of directives and sections in each process should be:

  1. tag
  2. label
  3. publishDir
  4. input:
  5. output:
  6. when:
  7. script:/shell:
  8. stub:

Any other directives for a process should be defined in a nextflow.config file.
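A skeleton process following this order might look like the sketch below. The process name, the aligner command, and the out_dir and run_alignment parameters are all placeholders:

```nextflow
process ALIGN_READS {
    tag "${sampleName}"
    label 'process_medium'
    publishDir "${params.out_dir}/alignments", mode: 'copy'

    input:
    tuple val(sampleName), path(readFile)
    path(referenceGenome)

    output:
    tuple val(sampleName), path("${sampleName}.sam"), emit: alignment

    when:
    params.run_alignment

    script:
    // aligner is a placeholder, not a real program
    """
    aligner --reference "${referenceGenome}" "${readFile}" > "${sampleName}.sam"
    """

    stub:
    """
    touch "${sampleName}.sam"
    """
}
```

Directives such as cpus or memory are deliberately absent. Under this guide they would be set in nextflow.config, for example through a withLabel selector matching process_medium.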

The Script/Shell Block

The purpose of the script block is to delegate the task to another program. It is NOT to perform logic.

It is acceptable to perform minimal Nextflow processing to convert pipeline parameters into program option flags. It is preferable to use ternary operators in most cases to perform such logic. In the case of branching logical paths, use Nextflow's if statement to create multiple script blocks.

// No
script:
"""
READTYPE="ont"
if [ "${params.pe}" == "true" ]; then
    READTYPE="pe"
fi
program --readtype \$READTYPE "${reads}" > badreads
"""

// Ok
script:
if (params.pe) {
    """
    program --readtype pe "${reads}" > goodreads
    """
}
else {
    """
    program --readtype ont "${reads}" > goodreads
    """
}

// Yes
script:
readType = (params.pe) ? 'pe' : 'ont'
"""
program --readtype ${readType} "${reads}" > betterreads
"""

If you feel you need to use a shell block, add the logic to a shell script in the bin folder instead, and call that from a script block.

Do not include shebangs for bash/shell script blocks. Python, Perl, and other non-shell scripts do need shebangs, but they belong in the bin folder as standalone scripts rather than in script blocks.
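Both rules come down to the same pattern: the logic lives in an executable file under bin, and the script block only calls it. A minimal sketch, assuming a hypothetical executable bin/detect_read_type.sh that carries its own shebang (Nextflow adds the pipeline's bin folder to the task PATH):

```nextflow
process DETECT_READ_TYPE {
    input:
    path(readFile)

    output:
    path('read_type.txt'), emit: readType

    script:
    // detect_read_type.sh is a hypothetical script stored in bin/
    """
    detect_read_type.sh "${readFile}" > read_type.txt
    """
}
```

The script can then be linted and tested on its own, outside of any Nextflow run.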

Brace-delimit all Nextflow variables, e.g. ${var}. In the unusual case where you need to reference a shell variable, leave it without braces, e.g. \$VAR (the dollar sign must be escaped inside a script block).

Quote all file/path variables and string variables that might contain spaces or meta-characters. Do not quote numeric variables.
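The bracing and quoting rules can be seen together in a short sketch. The process and its variable names are hypothetical:

```nextflow
process PREVIEW_READS {
    input:
    tuple val(sampleName), path(readFile)
    val(maxLines)

    output:
    path("${sampleName}.head.txt"), emit: preview

    script:
    // ${maxLines} is numeric: braced but not quoted
    // ${readFile} and ${sampleName} may contain spaces: braced and quoted
    // \$PWD is a shell variable: escaped and left without braces
    """
    head -n ${maxLines} "${readFile}" > "${sampleName}.head.txt"
    echo "Wrote preview for ${sampleName} in \$PWD"
    """
}
```

The visual difference between ${var} and \$VAR makes it clear which values Nextflow substitutes before the task runs and which the shell resolves at run time.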

Continue to use 4-space indentation, even though common convention is to use 2 spaces for shell scripts.

Keep each command on a single line, unless it violates the character limit. In that case, place each parameter on a separate line indented with four spaces.

Pipelines should be written on a single line unless they violate the character limit. In that case, each command should be its own line, with the pipes being indented four spaces.

When combining long commands and pipelines, all pipelines are single-indented, while all options are double-indented.

Use double backslashes for escaping newlines, as these will be translated into escaped newlines in the .command.sh script.

"""
command1 \\
        --option1 \\
        --option2 \\
    | command2 \\
        --option3 \\
    | command3 \\
    | command4 \\
        --option3 \\
        --option4
"""

Self-documenting code

Inspired by VB.NET and C#'s self-documenting code, use three slashes to mark a documentation block above a workflow or process declaration. Use YAML instead of XML. Multiple and tuple inputs and outputs must each be documented.

/// summary: |
///   Classifies metagenomic sequencing reads
/// input:
///   - tuple:
///       - name: sampleName
///         type: val(String)
///         description: Unique identifier for this sample
///       - name: readFiles
///         type: path
///         description: The reads files to be classified
///   - name: db
///     type: path
///     description: The location of the Kraken2 database
/// output:
///   - name: classifiedReads
///     tuple:
///       - type: val(String)
///         description: Unique identifier for this sample
///       - type: path
///         description: Files containing reads that Kraken2 could classify
///   - name: unclassifiedReads
///     tuple:
///       - type: val(String)
///         description: Unique identifier for this sample
///       - type: path
///         description: Files containing reads that Kraken2 could not classify
///   - name: txt
///     tuple:
///       - type: val(String)
///         description: Unique identifier for this sample
///       - type: path
///         description: The read-by-read classification report generated by Kraken2
///   - name: version
///     type: path
///     description: Kraken2's self-reported version number
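As a smaller sketch of where such a block sits, the hypothetical process below carries a documentation block in the same YAML layout. It assumes that each documented output name corresponds to an emit name, which the proposal does not state explicitly:

```nextflow
/// summary: |
///   Counts the lines in a text file
/// input:
///   - name: textFile
///     type: path
///     description: The file whose lines are counted
/// output:
///   - name: lineCount
///     type: path
///     description: A file containing the line count
process COUNT_LINES {
    input:
    path(textFile)

    output:
    path('line_count.txt'), emit: lineCount

    script:
    """
    wc -l < "${textFile}" > line_count.txt
    """
}
```

Because three slashes still begin an ordinary line comment, Nextflow ignores the block, and a separate tool could extract the YAML to generate documentation.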