28 Jan 2025

CS DNA in Nanopore sequencing

This post is way out of date; Guppy is no longer the basecaller in common use, but nevertheless I feel it may be of some interest.

In some recent runs I had a closer look at the CS DNA (Control Strand DNA; a 3560 bp piece of the Lambda genome). Guppy detects CS DNA and places it in a separate folder. However, in NanoPlot I noticed a sharp peak in the passed reads that I suspected was from leftover CS DNA.




Next I had a look at the reads in the 'calibration_strands' folder, which contains the CS DNA detected by Guppy. The reads were all between 3000 and 3800 bp, which is the default size range in which Guppy looks for CS DNA. Users can specify a different size range in the basecaller script if desired, but this will slow down the basecalling.
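To check the size window yourself, something like the following could work. It is only a minimal sketch: it assumes plain, unwrapped FASTQ with four lines per record, and the function names `read_lengths` and `fraction_in_window` are my own, not part of Guppy or NanoPlot. The demo data is made up.

```python
import io

def read_lengths(handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of every record is the sequence
            yield len(line.strip())

def fraction_in_window(lengths, low=3000, high=3800):
    """Fraction of reads whose length lies within [low, high] bp."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    inside = sum(low <= n <= high for n in lengths)
    return inside / len(lengths)

def fake_record(name, length):
    """Build one synthetic FASTQ record (illustration only)."""
    return f"@{name}\n{'A' * length}\n+\n{'I' * length}\n"

# Two made-up reads: one inside the default window, one outside it.
demo = io.StringIO(fake_record("r1", 3500) + fake_record("r2", 4200))
print(fraction_in_window(read_lengths(demo)))  # 0.5
```

For a real run you would open the FASTQ files from the 'calibration_strands' folder instead of the `io.StringIO` demo.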

Next, I mapped the 'passed' reads onto the CS DNA reference sequence to see if anything would align. And it did: around 75 Mbases (compared to around 700 Mbases in the 'calibration_strands' folder). So ~10% of the CS DNA bases were not detected and removed by Guppy. The gap in the 3000 - 3800 bp range shows that only CS DNA reads in this size range are removed.
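For anyone wanting to redo the numbers, here is a rough sketch of the arithmetic. It assumes the alignments are available in PAF format (I have not stated which aligner was used, so treat that as an assumption) and counts each read once, by its longest alignment. The function names are hypothetical.

```python
def aligned_bases_from_paf(lines, min_mapq=0):
    """Sum aligned query bases from PAF lines, one (longest) alignment per read."""
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        qname = fields[0]
        qstart, qend = int(fields[2]), int(fields[3])
        mapq = int(fields[11])
        if mapq < min_mapq:
            continue
        span = qend - qstart
        if span > best.get(qname, 0):
            best[qname] = span
    return sum(best.values())

def undetected_fraction(passed_bases, calibration_bases):
    """Share of all CS DNA bases that ended up among the passed reads."""
    return passed_bases / (passed_bases + calibration_bases)

# The figures from this run: ~75 Mbases in 'passed', ~700 Mbases in 'calibration_strands'.
print(f"{undetected_fraction(75, 700):.1%}")  # 9.7%
```

Note that the denominator is all CS DNA (75 + 700 Mbases), which is why the result is a little under 10%.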





I am guessing reads above 3.8 Kbases are chimeras. Around 3% of the CS DNA bases (and incidentally, also ~3% of the CS DNA reads) came from reads longer than 3.8 Kbases. Is this an indication that around 3% of all reads in this run are chimeric? For some context, Wick et al. (2021) reported 0.88% and 1.41% chimeric reads in two ligation runs, and also noted that chimeric read rates up to ~5% do not impact the assembly quality of bacterial genomes.
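The two percentages above (share of reads and share of bases above the cutoff) can be computed from a list of read lengths. This is just a sketch with a hypothetical function name and made-up lengths. It treats "longer than 3.8 Kbases" as a stand-in for "chimeric", which is my guess and not a verified classification.

```python
def long_read_share(lengths, cutoff=3800):
    """Return (share of reads, share of bases) coming from reads longer than cutoff."""
    lengths = list(lengths)
    if not lengths:
        return 0.0, 0.0
    long_lengths = [n for n in lengths if n > cutoff]
    read_share = len(long_lengths) / len(lengths)
    base_share = sum(long_lengths) / sum(lengths)
    return read_share, base_share

# Made-up example: three normal CS DNA reads and one read of double length.
reads, bases = long_read_share([3500, 3600, 3550, 7100])
print(reads, bases)  # 0.25 0.4
```

In general the two shares differ, since long reads carry more bases each, so getting ~3% for both in my run is somewhat coincidental.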

Lastly, and this is hardly worth mentioning, I aligned the passed reads to the full genome sequence of the lambda phage to see if anything aligned outside the amplified region. A negligible number of reads did so: only around 0.03% of all the CS DNA reads.

I feel like there is some untapped potential in CS DNA. With live basecalling enabled, it could show error rates during a sequencing run; perhaps give an indication of the molarity of your library, in the form of the ratio of CS DNA reads to library reads; and possibly also indicate the prevalence of chimeric reads. It is already possible in MinKNOW to align reads to a reference during a run.
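As a rough illustration of the first two ideas, the sketch below derives an error rate and a CS DNA read fraction from alignments of reads against the CS DNA reference. It assumes PAF-formatted alignments, where column 10 is the number of matching bases and column 11 is the alignment block length. The function name `summarize_cs` is hypothetical, and this is not something MinKNOW provides as far as I know.

```python
def summarize_cs(paf_lines, total_reads):
    """Return (error rate, fraction of reads that are CS DNA) from PAF alignments.

    Error rate here is 1 - matches / alignment block length, pooled over all
    alignments, so it counts mismatches and indels together.
    """
    matches = 0
    block = 0
    cs_reads = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        cs_reads.add(fields[0])
        matches += int(fields[9])
        block += int(fields[10])
    error_rate = 1 - matches / block if block else float("nan")
    cs_fraction = len(cs_reads) / total_reads if total_reads else float("nan")
    return error_rate, cs_fraction

# Two made-up alignments out of a hypothetical 10 reads in total.
demo = [
    "r1\t3560\t0\t1000\t+\tcs\t3560\t0\t1000\t950\t1000\t60",
    "r2\t3560\t0\t1000\t+\tcs\t3560\t0\t1000\t900\t1000\t60",
]
error_rate, cs_fraction = summarize_cs(demo, total_reads=10)
print(round(error_rate, 3), cs_fraction)  # 0.075 0.2
```

Turning the read fraction into an actual molarity estimate would additionally need the amount of CS DNA added to the library, which this sketch does not attempt.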