@wolegeyun
Forked from sirselim/basecalling_notes.md
Created April 10, 2023 08:32
a collection of my notes while working on nanopore basecalling on the Jetson Xavier

Jetson Xavier basecalling notes

initial basecalling runs

'fast' flip-flop calling on the Jetson Xavier

guppy_basecaller --disable_pings --compress_fastq -c dna_r9.4.1_450bps_fast.cfg -i flongle_fast5_pass/ -s flongle_test2 -x 'auto' --recursive 

high-accuracy calling with base modifications on the Jetson Xavier

guppy_basecaller --disable_pings --compress_fastq -c dna_r9.4.1_450bps_modbases_dam-dcm-cpg_hac.cfg --fast5_out -i flongle_fast5_pass/ -s flongle_hac_fastq -x 'auto' --recursive

$ guppy_basecaller --compress_fastq -c dna_r9.4.1_450bps_modbases_dam-dcm-cpg_hac.cfg -i flongle_fast5_pass/ -s flongle_hac_fastq -x 'auto' --recursive
ONT Guppy basecalling software version 3.4.1+213a60d0
config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_modbases_dam-dcm-cpg_hac.cfg
model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_modbases_dam-dcm-cpg_hac.jsn
input path:         flongle_fast5_pass/
save path:          flongle_hac_fastq
chunk size:         1000
chunks per runner:  512
records per file:   4000
fastq compression:  ON
num basecallers:    1
gpu device:         auto
kernel path:
runners per device: 4

Found 105 fast5 files to process.
Init time: 2790 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 2493578 ms, Samples called: 3970728746, samples/s: 1.59238e+06
Finishing up any open output files.
Basecalling completed successfully.

From the above, high-accuracy mode takes the Xavier ~41.5 minutes (2,493,578 ms caller time) to complete the basecalling using the default configuration file. For reference, the fast calling mode took ~8 minutes.
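The run time quoted above comes straight from the final summary line of the log. As a minimal sketch (the helper names `parse_summary` and `format_duration` are my own, and the regular expression only assumes the summary format shown in the log above), the conversion and a sanity check of the reported samples/s look like this:

```python
import re

# Pattern for the summary line Guppy prints at the end of a run.
# The format is copied from the log above; other versions may differ.
SUMMARY_RE = re.compile(
    r"Caller time: (\d+) ms, Samples called: (\d+), samples/s: ([0-9.eE+]+)"
)


def parse_summary(line):
    """Return (caller time in ms, samples called, reported samples/s)."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a Guppy summary line")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def format_duration(ms):
    """Format a duration in milliseconds as 'M min S s' (rounded to seconds)."""
    minutes, seconds = divmod(round(ms / 1000), 60)
    return f"{minutes} min {seconds} s"


line = "Caller time: 2493578 ms, Samples called: 3970728746, samples/s: 1.59238e+06"
caller_ms, samples, reported_rate = parse_summary(line)

print(format_duration(caller_ms))
# Recompute the throughput from the two raw numbers and compare it
# with the value Guppy reported.
print(f"recomputed: {samples / (caller_ms / 1000):.5e}, reported: {reported_rate:.5e}")
```

The same two helpers work on the summary line of any of the other runs in these notes.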

optimising settings for Jetson Xavier

  • When performing GPU basecalling there is always one CPU support thread per GPU caller, so the number of callers (--num_callers) dictates the maximum number of CPU threads used.
  • Max chunks per runner (--chunks_per_runner): The maximum number of chunks which can be submitted to a single neural network runner before it starts computation. Increasing this figure will increase GPU basecalling performance when it is enabled.
  • Number of GPU runners per device (--gpu_runners_per_device): The number of neural network runners to create per CUDA device. Increasing this number may improve performance on GPUs with a large number of compute cores, but will increase GPU memory use. This option only affects GPU calling.

A rough inequality relates these settings to the amount of GPU memory available:

runners * chunks_per_runner * chunk_size < 100000 * [max GPU memory in GB]

For example, a GPU with 8 GB of memory would require:

runners * chunks_per_runner * chunk_size < 800000

some suggested settings from ONT

NVIDIA Jetson TX2
--num_callers 1
--gpu_runners_per_device 2
--chunks_per_runner 48

from hac config file (dna_r9.4.1_450bps_modbases_dam-dcm-cpg_hac.cfg)

chunk_size                          = 1000
gpu_runners_per_device              = 4
chunks_per_runner                   = 512
chunks_per_caller                   = 10000
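Plugging the two sets of values above into the rough memory rule gives a feel for how far apart they are. This is only a sketch: the function names are hypothetical, the 8 GB figure is the illustrative one used in the rule's example (not a statement about any particular board), and a chunk_size of 1000 is assumed for the TX2 suggestion because none is listed.

```python
def chunk_load(runners, chunks_per_runner, chunk_size):
    """Left-hand side of the rule: runners * chunks_per_runner * chunk_size."""
    return runners * chunks_per_runner * chunk_size


def memory_budget(gpu_memory_gb):
    """Right-hand side of the rule: 100000 * [max GPU memory in GB]."""
    return 100_000 * gpu_memory_gb


def fits(runners, chunks_per_runner, chunk_size, gpu_memory_gb):
    """True when the settings satisfy the rough inequality."""
    return chunk_load(runners, chunks_per_runner, chunk_size) < memory_budget(gpu_memory_gb)


GPU_MEMORY_GB = 8  # illustrative value taken from the example in the text

# Suggested TX2 settings (chunk_size of 1000 is an assumption).
print(chunk_load(2, 48, 1000), fits(2, 48, 1000, GPU_MEMORY_GB))

# Defaults from the hac config file shown above.
print(chunk_load(4, 512, 1000), fits(4, 512, 1000, GPU_MEMORY_GB))
```

The rule is described as rough, so a failing check is a prompt to watch memory use during a run rather than proof that the run will fail.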

modified testing

'fast' flip-flop calling on the Jetson Xavier

guppy_basecaller --disable_pings --compress_fastq -c dna_r9.4.1_450bps_fast.cfg -i flongle_fast5_pass/ \
  -s flongle_test2 -x 'auto' --recursive --num_callers 4 --gpu_runners_per_device 8 --chunks_per_runner 256
$ guppy_basecaller --disable_pings --compress_fastq -c dna_r9.4.1_450bps_fast.cfg \
-i flongle_fast5_pass/ -s flongle_test2 -x 'auto' --recursive --num_callers 4 \
--gpu_runners_per_device 8 --chunks_per_runner 256
ONT Guppy basecalling software version 3.4.1+213a60d0
config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_fast5_pass/
save path:          flongle_test2
chunk size:         1000
chunks per runner:  256
records per file:   4000
fastq compression:  ON
num basecallers:    4
gpu device:         auto
kernel path:
runners per device: 8

Found 105 fast5 files to process.
Init time: 880 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 428745 ms, Samples called: 3970269916, samples/s: 9.26021e+06
Finishing up any open output files.
Basecalling completed successfully.

With the settings above I was able to shave about a minute off the fast model on the Xavier, getting it down to ~7 minutes.

[Image: jtop screenshot from the Jetson Xavier]

Update: (13th Dec 2019)

Just modifying the number of chunks per runner has allowed me to get the time down to under 6.5 mins (see table below).

chunks_per_runner    time
160 (default)        ~8 mins
256                  7 mins 6 secs
512                  6 mins 28 secs
1024                 6 mins 23 secs

It looks like we might have reached an optimal point here. Next I'll test some of the other parameters and see if we can speed this up further.
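To put numbers on the plateau, the timings from the table can be expressed as a saving relative to the default. This is a small sketch with a hypothetical helper name; the default is only known to be "~8 mins", so the percentages are approximate.

```python
# Run times from the table, in seconds, keyed by chunks_per_runner.
timings_s = {
    160: 8 * 60,        # default, approximate ("~8 mins")
    256: 7 * 60 + 6,
    512: 6 * 60 + 28,
    1024: 6 * 60 + 23,
}


def percent_saved(baseline_s, run_s):
    """Percentage of the baseline run time saved by a faster run."""
    return 100 * (baseline_s - run_s) / baseline_s


baseline = timings_s[160]
for chunks, seconds in timings_s.items():
    print(f"chunks_per_runner={chunks:5d}  {seconds:4d} s  "
          f"saved {percent_saved(baseline, seconds):5.1f}%")

# Doubling from 512 to 1024 only gains a few seconds.
print(timings_s[512] - timings_s[1024], "s gained from 512 -> 1024")
```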

high-accuracy calling with base modifications on the Jetson Xavier

guppy_basecaller --disable_pings --compress_fastq -c dna_r9.4.1_450bps_modbases_dam-dcm-cpg_hac.cfg \
  --num_callers 4 --gpu_runners_per_device 8 --fast5_out -i flongle_fast5_pass/ \
  -s flongle_hac_basemod_fastq -x 'auto' --recursive

increased number of callers

$ guppy_basecaller --disable_pings --compress_fastq -c dna_r9.4.1_450bps_fast.cfg \
  -i flongle_fast5_pass/ -s flongle_test2 -x 'auto' --recursive --num_callers 8 \
  --gpu_runners_per_device 8 --chunks_per_runner 1024
ONT Guppy basecalling software version 3.4.1+213a60d0
config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_fast5_pass/
save path:          flongle_test2
chunk size:         1000
chunks per runner:  1024
records per file:   4000
fastq compression:  ON
num basecallers:    8 
gpu device:         auto
kernel path:
runners per device: 8 

Found 105 fast5 files to process.
Init time: 897 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 383865 ms, Samples called: 3970269916, samples/s: 1.03429e+07
Finishing up any open output files.
Basecalling completed successfully.

increased chunk size

$ guppy_basecaller --disable_pings --compress_fastq -c dna_r9.4.1_450bps_fast.cfg \
  -i flongle_fast5_pass/ -s flongle_test2 -x 'auto' --recursive --num_callers 4 \
  --gpu_runners_per_device 8 --chunks_per_runner 1024 --chunk_size 2000
ONT Guppy basecalling software version 3.4.1+213a60d0
config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_fast5_pass/
save path:          flongle_test2
chunk size:         2000
chunks per runner:  1024
records per file:   4000
fastq compression:  ON
num basecallers:    4
gpu device:         auto
kernel path:
runners per device: 8

Found 105 fast5 files to process.
Init time: 1180 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 503532 ms, Samples called: 3970269916, samples/s: 7.88484e+06
Finishing up any open output files.
Basecalling completed successfully.

increased runners per device and number of callers

$ guppy_basecaller --disable_pings --compress_fastq -c dna_r9.4.1_450bps_fast.cfg \
  -i flongle_fast5_pass/ -s flongle_test2 -x 'auto' --recursive --num_callers 8 \
  --gpu_runners_per_device 16 --chunks_per_runner 1024 --chunk_size 1000
ONT Guppy basecalling software version 3.4.1+213a60d0
config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_fast5_pass/
save path:          flongle_test2
chunk size:         1000
chunks per runner:  1024
records per file:   4000
fastq compression:  ON
num basecallers:    8
gpu device:         auto
kernel path:
runners per device: 16

Found 105 fast5 files to process.
Init time: 1113 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 383466 ms, Samples called: 3970269916, samples/s: 1.03536e+07
Finishing up any open output files.
Basecalling completed successfully.

current 'optimal' parameters

The parameters below gave the best speed so far, with a resulting run time of 6 mins 23 secs.

$ guppy_basecaller --disable_pings --compress_fastq -c dna_r9.4.1_450bps_fast.cfg \
  -i flongle_fast5_pass/ -s flongle_test2 -x 'auto' --recursive --num_callers 4 \
  --gpu_runners_per_device 8 --chunks_per_runner 1024 --chunk_size 1000
ONT Guppy basecalling software version 3.4.1+213a60d0
config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_fast5_pass/
save path:          flongle_test2
chunk size:         1000
chunks per runner:  1024
records per file:   4000
fastq compression:  ON
num basecallers:    4
gpu device:         auto
kernel path:
runners per device: 8

Found 105 fast5 files to process.
Init time: 926 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 382714 ms, Samples called: 3970269916, samples/s: 1.0374e+07
Finishing up any open output files.
Basecalling completed successfully.
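The runs above were typed out by hand, one parameter combination at a time. A sketch of how the same sweep could be generated is below. It only prints the command lines and does not run Guppy; the function name and the save-directory naming scheme are my own inventions, and every flag is one already used in these notes.

```python
import itertools
import shlex

# Arguments shared by every run in the fast-model tests above.
BASE_ARGS = [
    "guppy_basecaller", "--disable_pings", "--compress_fastq",
    "-c", "dna_r9.4.1_450bps_fast.cfg",
    "-i", "flongle_fast5_pass/",
    "-x", "auto", "--recursive",
]


def sweep_commands(num_callers, runners_per_device, chunks_per_runner):
    """Yield one argument list per combination of the three parameters."""
    grid = itertools.product(num_callers, runners_per_device, chunks_per_runner)
    for callers, runners, chunks in grid:
        # Hypothetical naming scheme so each run writes to its own directory.
        save_dir = f"sweep_c{callers}_r{runners}_k{chunks}"
        yield BASE_ARGS + [
            "-s", save_dir,
            "--num_callers", str(callers),
            "--gpu_runners_per_device", str(runners),
            "--chunks_per_runner", str(chunks),
        ]


for command in sweep_commands([4, 8], [8, 16], [1024]):
    print(shlex.join(command))
```

Writing each run to a separate save directory avoids the runs overwriting each other, which the repeated use of flongle_test2 above does not guard against.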

potential V100 examples

V100 config example for high accuracy model

# --device "cuda:0 cuda:1" should now scale nicely across both cards, I haven't checked though
guppy_basecaller \
--disable_pings \
--compress_fastq \
-c dna_r9.4.1_450bps_modbases_dam-dcm-cpg_hac.cfg \
--ipc_threads 16 \
--num_callers 8 \
--gpu_runners_per_device 4 \
--chunks_per_runner 512 \
--device "cuda:0 cuda:1" \
--recursive \
--fast5_out \
-i fast5_input \
-s fastq_output

V100 config example for fast calling model

guppy_basecaller \
--disable_pings \
--compress_fastq \
-c dna_r9.4.1_450bps_fast.cfg \
--ipc_threads 16 \
--num_callers 8 \
--gpu_runners_per_device 64 \
--chunks_per_runner 256 \
--device "cuda:0 cuda:1" \
--recursive \
-i fast5_input \
-s fastq_output

Titan RTX

There has been some discussion about the speed of the recent releases of Guppy (3.4.1 and 3.4.2). I was interested in running some benchmarks across different versions. I had a hunch it may have been something to do with the newly introduced compression of the fast5 files...

Test parameters

The only things I am changing are the version of Guppy being used, and in the case of 3.4.3 I am trying with and without vbz compression of the fast5 files. Everything else is as below:

System:

  • Debian Sid (unstable)
  • 2x 12-Core Intel Xeon Gold 5118 (48 threads)
  • 256 GB RAM
  • Titan RTX
  • Nvidia drivers - 418.56

Guppy GPU basecalling parameters:

  • disable_pings
  • compress_fastq
  • dna_r9.4.1_450bps_fast.cfg
  • num_callers 8
  • gpu_runners_per_device 64
  • chunks_per_runner 256
  • device "cuda:0"
  • recursive

Results

guppy version                      time (seconds)   samples/s
3.1.5#                             93.278           4.25638e+07
3.2.4#                             94.141           4.21737e+07
3.3.0#                             94.953           4.1813e+07
3.3.3#                             95.802           4.14425e+07
3.4.3 (no vbz compressed fast5)    270.953          1.4653e+07
3.4.3 (vbz compressed fast5)       82.877           4.79056e+07

# these versions of Guppy did not support vbz compression of fast5 files.
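The size of the gap is easier to see as a ratio against the fastest run. A minimal sketch using the times from the table (the dictionary and helper names are hypothetical):

```python
# Run times in seconds, copied from the results table.
results_s = {
    "3.1.5": 93.278,
    "3.2.4": 94.141,
    "3.3.0": 94.953,
    "3.3.3": 95.802,
    "3.4.3 (no vbz compressed fast5)": 270.953,
    "3.4.3 (vbz compressed fast5)": 82.877,
}

# The run with the smallest time is the reference point.
fastest = min(results_s, key=results_s.get)


def relative_time(version):
    """Run time of a version as a multiple of the fastest run."""
    return results_s[version] / results_s[fastest]


for version in results_s:
    print(f"{version:34s} {relative_time(version):.2f}x")
```

On this dataset, 3.4.3 reading uncompressed fast5 files takes a little over three times as long as the same version reading vbz-compressed files, while the older versions all sit within about 16% of the fastest run.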

You can view the 'raw' results/output for each run below:

Guppy 3.1.5

~/Downloads/software/guppy/3.1.5/ont-guppy/bin/guppy_basecaller \
    --disable_pings \
    --compress_fastq \
    -c dna_r9.4.1_450bps_fast.cfg \
    --num_callers 8 \
    --gpu_runners_per_device 64 \
    --chunks_per_runner 256 \
    --device "cuda:0" \
    --recursive \
    -i flongle_fast5_pass \
    -s testrun_fast_3.1.5

ONT Guppy basecalling software version 3.1.5+781ed57
config file:        /home/miles/Downloads/software/guppy/3.1.5/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /home/miles/Downloads/software/guppy/3.1.5/ont-guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_fast5_pass
save path:          testrun_fast_3.1.5
chunk size:         1000
chunks per runner:  256
records per file:   4000
fastq compression:  ON
num basecallers:    8
gpu device:         cuda:0
kernel path:        
runners per device: 64

Found 105 fast5 files to process.
Init time: 1000 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 93278 ms, Samples called: 3970269916, samples/s: 4.25638e+07
Finishing up any open output files.
Basecalling completed successfully.

Guppy 3.2.4

~/Downloads/software/guppy/3.2.4/ont-guppy/bin/guppy_basecaller \
    --disable_pings \
    --compress_fastq \
    -c dna_r9.4.1_450bps_fast.cfg \
    --num_callers 8 \
    --gpu_runners_per_device 64 \
    --chunks_per_runner 256 \
    --device "cuda:0" \
    --recursive \
    -i flongle_fast5_pass \
    -s testrun_fast_3.2.4

ONT Guppy basecalling software version 3.2.4+d9ed22f
config file:        /home/miles/Downloads/software/guppy/3.2.4/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /home/miles/Downloads/software/guppy/3.2.4/ont-guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_fast5_pass
save path:          testrun_fast_3.2.4
chunk size:         1000
chunks per runner:  256
records per file:   4000
fastq compression:  ON
num basecallers:    8
gpu device:         cuda:0
kernel path:        
runners per device: 64

Found 105 fast5 files to process.
Init time: 836 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 94141 ms, Samples called: 3970269916, samples/s: 4.21737e+07
Finishing up any open output files.
Basecalling completed successfully.

Guppy 3.3.0

~/Downloads/software/guppy/3.3.0/ont-guppy/bin/guppy_basecaller \
    --disable_pings \
    --compress_fastq \
    -c dna_r9.4.1_450bps_fast.cfg \
    --num_callers 8 \
    --gpu_runners_per_device 64 \
    --chunks_per_runner 256 \
    --device "cuda:0" \
    --recursive \
    -i flongle_fast5_pass \
    -s testrun_fast_3.3.0

ONT Guppy basecalling software version 3.3.0+ef22818
config file:        /home/miles/Downloads/software/guppy/3.3.0/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /home/miles/Downloads/software/guppy/3.3.0/ont-guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_fast5_pass
save path:          testrun_fast_3.3.0
chunk size:         1000
chunks per runner:  256
records per file:   4000
fastq compression:  ON
num basecallers:    8
gpu device:         cuda:0
kernel path:        
runners per device: 64

Found 105 fast5 files to process.
Init time: 722 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 94953 ms, Samples called: 3970269916, samples/s: 4.1813e+07
Finishing up any open output files.
Basecalling completed successfully.

Guppy 3.3.3

~/Downloads/software/guppy/3.3.3/ont-guppy/bin/guppy_basecaller \
    --disable_pings \
    --compress_fastq \
    -c dna_r9.4.1_450bps_fast.cfg \
    --num_callers 8 \
    --gpu_runners_per_device 64 \
    --chunks_per_runner 256 \
    --device "cuda:0" \
    --recursive \
    -i flongle_fast5_pass \
    -s testrun_fast_3.3.3

ONT Guppy basecalling software version 3.3.3+fa743a6
config file:        /home/miles/Downloads/software/guppy/3.3.3/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /home/miles/Downloads/software/guppy/3.3.3/ont-guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_fast5_pass
save path:          testrun_fast_3.3.3
chunk size:         1000
chunks per runner:  256
records per file:   4000
fastq compression:  ON
num basecallers:    8
gpu device:         cuda:0
kernel path:        
runners per device: 64

Found 105 fast5 files to process.
Init time: 726 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 95802 ms, Samples called: 3970269916, samples/s: 4.14425e+07
Finishing up any open output files.
Basecalling completed successfully.

Guppy 3.4.3 (not compressed)

~/Downloads/software/guppy/3.4.3/ont-guppy/bin/guppy_basecaller \
    --disable_pings \
    --compress_fastq \
    -c dna_r9.4.1_450bps_fast.cfg \
    --num_callers 8 \
    --gpu_runners_per_device 64 \
    --chunks_per_runner 256 \
    --device "cuda:0" \
    --recursive \
    -i flongle_fast5_pass \
    -s testrun_fast_3.4.3_uncompressed

ONT Guppy basecalling software version 3.4.3+f4fc735
config file:        /home/miles/Downloads/software/guppy/3.4.3/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /home/miles/Downloads/software/guppy/3.4.3/ont-guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_fast5_pass
save path:          testrun_fast_3.4.3_uncompressed
chunk size:         1000
chunks per runner:  256
records per file:   4000
fastq compression:  ON
num basecallers:    8
gpu device:         cuda:0
kernel path:
runners per device: 64

Found 105 fast5 files to process.
Init time: 738 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 270953 ms, Samples called: 3970269916, samples/s: 1.4653e+07
Finishing up any open output files.
Basecalling completed successfully.

Guppy 3.4.3 (compressed)

~/Downloads/software/guppy/3.4.3/ont-guppy/bin/guppy_basecaller \
    --disable_pings \
    --compress_fastq \
    -c dna_r9.4.1_450bps_fast.cfg \
    --num_callers 8 \
    --gpu_runners_per_device 64 \
    --chunks_per_runner 256 \
    --device "cuda:0" \
    --recursive \
    -i flongle_compressed \
    -s testrun_fast_3.4.3

ONT Guppy basecalling software version 3.4.3+f4fc735
config file:        /home/miles/Downloads/software/guppy/3.4.3/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:         /home/miles/Downloads/software/guppy/3.4.3/ont-guppy/data/template_r9.4.1_450bps_fast.jsn
input path:         flongle_compressed
save path:          testrun_fast_3.4.3
chunk size:         1000
chunks per runner:  256
records per file:   4000
fastq compression:  ON
num basecallers:    8
gpu device:         cuda:0
kernel path:        
runners per device: 64

Found 105 fast5 files to process.
Init time: 721 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 82877 ms, Samples called: 3970269916, samples/s: 4.79056e+07
Finishing up any open output files.
Basecalling completed successfully.