20 Jul 2021

Guppy update - "super-accurate" model

Towards the end of May, Oxford Nanopore released a new version of the Guppy basecaller. This version includes the Bonito basecaller model, which I tested previously and found to have broken quality scoring. You can now select among three models: fast, HAC, and sup, with sup ("super accurate") being the slowest but most accurate. I put our five genomic test datasets through the new version, using the sup model. I am pleased to see that the quality scoring problem from Bonito has been fixed. The sup model shows a small increase in raw accuracy, which comes at the cost of slower basecalling. In conclusion, another nice upgrade in accuracy. One of these days I must do some assembly benchmarks* to see if this translates into better assemblies! Previous testing by a colleague of mine indicated that higher read accuracy does not always give a better assembly.

* I just need to learn how to do assemblies :-)



Error rates were calculated using Heng Li's one-liner. No quality trimming was applied, except for Species 5, for which a minimum quality score of 7 was applied.
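
For readers who want to see what this kind of calculation involves, here is a minimal Python sketch. It is not the one-liner itself. It assumes each alignment comes with a CIGAR string and an NM edit distance (as minimap2 reports when run with -c), and it assumes a gap-compressed definition of the error rate, in which every insertion or deletion counts as a single event regardless of its length. The function name is made up for illustration, and the definition behind the figures in this post may differ.

```python
import re

# Matches one CIGAR operation, e.g. "50M" or "2D".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def gap_compressed_error_rate(cigar: str, nm: int) -> float:
    """Hypothetical helper: per-alignment gap-compressed error rate.

    cigar: CIGAR string of the alignment
    nm:    edit distance (mismatches + inserted bases + deleted bases)
    """
    aligned_columns = 0  # bases in match/mismatch columns (M, =, X)
    gap_bases = 0        # total inserted + deleted bases
    gap_opens = 0        # number of separate insertion/deletion events
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned_columns += n
        elif op in "ID":
            gap_bases += n
            gap_opens += 1
    # NM counts every gap base, so subtracting them leaves the mismatches.
    mismatches = nm - gap_bases
    return (mismatches + gap_opens) / (aligned_columns + gap_opens)

# 98 aligned bases, one 2-base deletion, NM=5:
# 3 mismatches + 1 gap event over 98 + 1 columns, i.e. 4/99.
print(gap_compressed_error_rate("50M2D48M", 5))
```

Counting a gap once rather than per base keeps a single long indel from dominating the error rate of an otherwise clean read.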




The read quality estimates are now at least somewhat correlated with how similar the sequences are to the reference.
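
To make the comparison concrete: a Phred quality score Q corresponds to an error probability of 10^(-Q/10), so a well-calibrated read quality should roughly predict the identity to the reference. The sketch below shows that conversion in Python. The function names are made up for illustration, and averaging the per-base error probabilities is an assumption about how a mean read quality might be derived, not a statement of how Guppy computes it.

```python
import math

def expected_identity(q: float) -> float:
    """Identity implied by a Phred score: 1 - 10^(-Q/10)."""
    return 1.0 - 10 ** (-q / 10)

def mean_read_quality(phred_scores) -> float:
    """Hypothetical helper: mean read quality from per-base Phred scores.

    Averages the error probabilities, not the scores themselves,
    then converts the mean probability back to a Phred score.
    """
    probs = [10 ** (-q / 10) for q in phred_scores]
    mean_p = sum(probs) / len(probs)
    return -10 * math.log10(mean_p)

# A quality cut-off of 7 corresponds to an expected identity of about 0.80.
print(round(expected_identity(7), 2))
```

Averaging probabilities rather than scores means a few low-quality bases pull the read quality down more than a plain mean of the scores would.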