28 Jan 2021

Benchmarking Nanopore basecallers: some observations on the Bonito basecaller

We have sequenced several fish genomes on our MinION. Whenever a new version of the Guppy basecaller is released, I re-basecall a small dataset from each species and align the raw reads to previously published, independent references. Using Heng Li's one-liner for sequence identity, I get an estimate of the raw error rate of the reads.
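For anyone wanting to try something similar, here is a rough sketch of one way to get a per-read percent identity from minimap2's PAF output. It is not Heng Li's one-liner: it uses the simpler matches-over-alignment-length definition (PAF columns 10 and 11) and keeps primary alignments only. The function name paf_identity and the file names in the usage comment are made up for illustration.

```shell
# paf_identity: read PAF records on stdin, print "read_name<TAB>percent_identity".
# PAF column 10 = number of matching bases, column 11 = alignment block length.
# Only primary alignments (tag tp:A:P) are kept. This is a simplified metric,
# not necessarily the same one computed by Heng Li's one-liner.
paf_identity() {
    awk -F'\t' '/tp:A:P/ && $11 > 0 { printf "%s\t%.2f\n", $1, 100 * $10 / $11 }'
}

# Hypothetical usage (file names are placeholders; -c asks minimap2 for
# base-level alignment so that columns 10 and 11 are exact):
#   minimap2 -cx map-ont reference.fa reads.fastq | paf_identity > identity.tsv
```

The resulting two-column table can then be plotted as a histogram to compare basecaller versions.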




Frequency distributions of percent identity to reference for Species 1. 


The Bonito 441 basecaller (using the res_dna_r941_min_crf_v031 model from Rerio) has a nice improvement in raw accuracy. At the moment this comes at the cost of slower basecalling speeds (~3 times slower on our GTX 1080 GPU). According to ONT a speed upgrade should be coming soon with a new Guppy release!

In my tests Bonito produced slightly fewer total bases, but a slightly higher proportion of its reads mapped to the reference (using minimap2 and SAMtools).
 



In Bonito, the low default chunk_size of 720 may be slightly reducing accuracy. Setting chunk_size to 1000 instead gave a small improvement in accuracy, but setting it to 1200 or higher caused a crash.

Lastly, the FASTQ quality scoring appears to be broken in Bonito: there is no relation between the quality scores and the percent identity when mapping the reads to the reference genome (unlike in Guppy):





Plots were made with NanoPlot.
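To check the quality-score observation without a plotting tool, a mean quality per read can be computed straight from the FASTQ file and compared against per-read identity. The sketch below assumes plain four-line FASTQ records with Phred+33 encoding, and it averages the error probabilities rather than the raw Q values, which is the usual way to summarise read quality. The function name fastq_mean_q is made up for illustration.

```shell
# fastq_mean_q: read a 4-line-per-record FASTQ on stdin (Phred+33 assumed),
# print "read_name<TAB>mean_Q", where mean_Q is derived from the mean
# per-base error probability of the read.
fastq_mean_q() {
    awk 'BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
         NR % 4 == 1 { name = substr($1, 2) }        # header line, drop "@"
         NR % 4 == 0 {                               # quality line
             n = length($0)
             if (n == 0) next
             sum = 0
             for (j = 1; j <= n; j++)
                 sum += 10 ^ (-(ord[substr($0, j, 1)] - 33) / 10)
             printf "%s\t%.2f\n", name, -10 * log(sum / n) / log(10)
         }'
}

# Hypothetical usage (file name is a placeholder):
#   fastq_mean_q < reads.fastq > mean_q.tsv
```

Joining this table with a per-read identity table on read name gives the data behind a quality-versus-identity scatter plot.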



25 Jan 2021

Basecalling on the MinION Mk1C - speed up by 3x!

We recently received our new MinION Mk1C sequencer/mini PC. It has a GPU for basecalling, but it is much weaker than the GTX 1080 in our standalone MinION PC, so it will probably not be used much for basecalling. Since the Mk1C runs Ubuntu Linux, one can ssh in and run commands from the terminal, which is how I benchmarked various Guppy parameters. This revealed that while the basecalling speed with the "fast" model cannot be improved much, the "HAC" (high accuracy) model can be sped up by almost 3 times!



Increasing chunks_per_runner seems to be the only setting that makes much difference (thanks to https://github.com/sirselim/jetson_nanopore_sequencing). Increasing it above 512 caused hangs and crashes; in one case I had to force a reboot of the Mk1C by holding the power button for ~10 seconds. All tests were done on a single fast5 file using Guppy423 (MinION Release 20.10.3). Use these settings at your own risk!
 
Best Mk1C basecaller speed:
guppy_basecaller --config dna_r9.4.1_450bps_hac.cfg --input_path /data/jon/fast5 --save_path /data/jon/Guppy423 --qscore_filtering --device auto --num_callers 1 --gpu_runners_per_device 2 --chunks_per_runner 512
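
A benchmark sweep over chunks_per_runner could be set up along these lines. This is only a sketch: the function name sweep_commands and the cpr_<value> output folders are made up, and the function just prints one command per value (with the same flags as the command above) so they can be inspected before anything is run.

```shell
# sweep_commands INPUT_DIR OUTPUT_ROOT VALUE...
# Print one guppy_basecaller command per chunks_per_runner value.
# Nothing is executed here; the output folder naming (cpr_<value>) is an
# illustrative assumption.
sweep_commands() {
    in_dir=$1
    out_root=$2
    shift 2
    for cpr in "$@"; do
        printf 'guppy_basecaller --config dna_r9.4.1_450bps_hac.cfg --input_path %s --save_path %s/cpr_%s --qscore_filtering --device auto --num_callers 1 --gpu_runners_per_device 2 --chunks_per_runner %s\n' \
            "$in_dir" "$out_root" "$cpr" "$cpr"
    done
}

# Hypothetical usage (the output root is a placeholder):
#   sweep_commands /data/jon/fast5 /data/jon/bench 48 256 512
```

Printing the commands first makes it easy to time each one separately and to stop before a value that is known to hang the device.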





Memory use
I used this command to log the memory use every five seconds:
top -d 5 -b | grep 'KiB Mem' >> freeMem.txt
Below is the minimum amount of free memory during each benchmark session (HAC model):

chunks_per_runner    free memory (MB)
48                   816
256                  286
512                  78
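
The minimum can be pulled out of the log with a few lines of awk. This is a sketch that assumes each logged line contains a number followed by the word "free" (as in top's "KiB Mem" summary line) and that the values are in KiB; the function name min_free_mb is made up, and dividing by 1024 is an assumption about how the MB figures are derived.

```shell
# min_free_mb LOGFILE
# Scan a log of top "KiB Mem" lines and print the smallest "free" value,
# converted from KiB to MB (integer, KiB / 1024).
# Assumes the number sits directly before the token starting with "free".
min_free_mb() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^free/) {
                kib = $(i - 1) + 0
                if (!seen || kib < min) { min = kib; seen = 1 }
            }
    } END { if (seen) printf "%d\n", min / 1024 }' "$1"
}

# Usage with the log file written by the top command above:
#   min_free_mb freeMem.txt
```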


Getting temperature readings from the terminal:
As a Linux novice I just copy and paste commands I find online and hope they work:
paste <(cat /sys/devices/virtual/thermal/thermal_zone*/type) <(cat /sys/devices/virtual/thermal/thermal_zone*/temp) | column -s $'\t' -t | sed 's/\(.\)..$/.\1°C/'

Example result:
BCPU-therm        36.5°C
MCPU-therm        36.5°C
GPU-therm         35.0°C
PLL-therm         36.5°C
Tboard_tegra      32.0°C
Tdiode_tegra      33.0°C
PMIC-Die          100.0°C
thermal-fan-est   35.9°C

The 100 °C reading for PMIC-Die is not real. To check that the basecaller is stable with the new settings, I did a full basecalling of a previous run. There were no issues, but it took several days to complete. The temperatures never got very high, but the fan does make a bit of noise!
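
For those who prefer something easier to read than the temperature one-liner above, the same idea can be written as a small loop. This is a sketch under the assumption that each thermal_zone* folder holds a type file with the sensor name and a temp file in millidegrees Celsius; the function name print_temps and its optional folder argument are made up for illustration.

```shell
# print_temps [BASE_DIR]
# Print "sensor_name  temperature" for every thermal zone under BASE_DIR
# (default: /sys/devices/virtual/thermal). The temp files are assumed to
# hold millidegrees Celsius, so the value is divided by 1000.
print_temps() {
    base=${1:-/sys/devices/virtual/thermal}
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        awk -v name="$(cat "$zone/type")" \
            '{ printf "%-16s %.1f°C\n", name, $1 / 1000 }' "$zone/temp"
    done
}

# Usage, logging a reading every five seconds (the log file name is a placeholder):
#   while true; do print_temps >> temps.txt; sleep 5; done
```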