28 Jan 2025

DNA size selection testing with PEG8000

For long-read sequencing it is important to have as few short DNA fragments as possible. John Tyson from the Snutch Lab, Canada, published online the wonderful “Bead-free long fragment LSK109 library preparation” protocol, with results from their experiments on size selection using PEG buffer and centrifugation. Inspired by their promising results, I have done my own size selection tests with PEG8000.

Going straight to the main findings rather than boring you with the minutiae of the methods:

  •     Incubation before centrifugation is beneficial; it reduces the loss of HMW DNA to the supernatant
  •     The lower the starting amount of DNA, the lower the recovery, but also the higher the size cutoff
  •     It’s a good idea to keep the supernatant, as it is easy to lose the pellet
  •     Not all species’ DNA seems to behave the same (see the bullet point above)


PEG8000 size selection testing of genomic DNA with and without incubating the mix for 30 min before centrifugation. Notice the larger amount of HMW DNA in the supernatant with no incubation step. 0.33% Megabase agarose gel run for 2 h at 0.8 V/cm.


Increasing the starting amount of DNA increases the recovery, but at the cost of retaining smaller DNA fragments.

 


The effect of varying incubation times and the final ratio of buffer to DNA. There seems to be little improvement beyond 30 minutes of incubation. Lowering the buffer:DNA ratio lowers the recovery (by removing increasingly larger fragment sizes), but at the risk of losing all the DNA. At a ratio of 0.9:1 there was no distinct cutoff; all fragment sizes were present in the supernatant.
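To put the ratios in perspective, the final PEG and NaCl concentrations in the mix follow directly from the buffer:DNA ratio. The following is a minimal sketch, not part of the original tests. It assumes that the 9% PEG / 1 M NaCl buffer described in the Methods was used at every ratio and that the DNA sample itself contributes no PEG or NaCl. Both are my assumptions, and the function name is purely illustrative.

```python
def final_conc(buffer_conc, ratio):
    """Concentration after mixing `ratio` volumes of buffer with 1 volume of DNA.

    Assumes the DNA sample contains none of the component in question.
    """
    return buffer_conc * ratio / (ratio + 1.0)


# Assumed buffer composition: 9% w/v PEG8000 and 1 M NaCl
for ratio in (1.0, 0.9):
    peg = final_conc(9.0, ratio)   # % w/v
    nacl = final_conc(1.0, ratio)  # M
    print(f"{ratio}:1  PEG {peg:.2f}% w/v  NaCl {nacl:.2f} M")
```

Under these assumptions, going from 1:1 to 0.9:1 lowers the final PEG concentration from 4.50% to about 4.26%, which illustrates how small a change in ratio can shift the outcome.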


Methods

“Rocky Mountains” 9% size selection buffer

Reagent        Unit     Stock conc   Final conc   Input ul
Tris-Cl pH8    mM       1000         10           1.5
NaCl           M        5            1            30
PEG8000        % w/v    25           9            54
H2O            ul                                 64.5
Total          ul                                 150
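The input volumes in the recipe follow from the usual dilution relation (stock conc × stock volume = final conc × total volume). As a rough sketch for scaling the recipe to other batch sizes, the calculation could look like this. The function names are my own and only illustrative; the concentrations are the ones from the table.

```python
def stock_volume(stock_conc, final_conc, total_ul):
    """Volume of stock (ul) needed to reach final_conc in total_ul (C1*V1 = C2*V2)."""
    return final_conc * total_ul / stock_conc


def recipe(total_ul=150.0):
    """Return input volumes (ul) for the 9% size selection buffer."""
    vols = {
        "Tris-Cl pH8": stock_volume(1000.0, 10.0, total_ul),  # mM stock -> mM final
        "NaCl": stock_volume(5.0, 1.0, total_ul),             # M stock -> M final
        "PEG8000": stock_volume(25.0, 9.0, total_ul),         # % w/v stock -> % w/v final
    }
    # Water makes up the remaining volume
    vols["H2O"] = total_ul - sum(vols.values())
    return vols


for reagent, ul in recipe(150.0).items():
    print(f"{reagent:<12} {ul:.1f} ul")
```

For a 150 ul batch this reproduces the volumes in the table; calling `recipe(600.0)` would scale everything fourfold.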


  •     Buffers were made fresh on the day of each test
  •     Gently mix equal volumes (60 ul buffer + 60 ul DNA) of "Rocky Mountains" buffer and DNA in an Eppendorf LoBind tube
  •     Incubate at room temperature for 1 hour in darkness. Keep the tube vertical
  •     Centrifuge for 30 min at 12,000 g at room temperature
  •     Remove the supernatant carefully by pipetting (optionally keep it and recover the DNA)
  •     Carefully add 200 ul 70% ethanol to the side of the tube wall and centrifuge at 12,000 g for 5 min at room temperature
  •     Remove the ethanol carefully
  •     Repeat the previous two steps
  •     Remove the ethanol slowly by pipetting. If done slowly enough, the surface tension should prevent any droplets from remaining on the tube walls, and ~100% of the ethanol can be removed
  •     If no ethanol droplets are visible, allow the pellet to dry for 1 minute before eluting in TE buffer

 

The DNA from the supernatants was recovered with a standard 1:1 SPRI cleanup. Both the size-selected DNA and the supernatant DNA were quantified by Nanodrop and Qubit and run on Megabase agarose to find the approximate size cutoff. I used two pools of DNA, one from a fish and one from a shrimp species, each containing both HMW DNA and a good amount of smear.
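Recovery can be expressed as the DNA mass in each fraction relative to the input mass, where mass is concentration times volume. A small sketch of that bookkeeping follows. The numbers in the example are made up purely for illustration and are not measurements from these tests; the function names are also my own.

```python
def mass_ng(conc_ng_per_ul, volume_ul):
    """DNA mass in ng from a concentration (e.g. a Qubit reading) and a volume."""
    return conc_ng_per_ul * volume_ul


def recovery_percent(fraction_ng, input_ng):
    """Share of the input DNA mass found in a given fraction, in percent."""
    return 100.0 * fraction_ng / input_ng


# Hypothetical example values, NOT real results
input_ng = mass_ng(50.0, 60.0)        # DNA going into the size selection
pellet_ng = mass_ng(40.0, 30.0)       # size-selected DNA after elution
supernatant_ng = mass_ng(30.0, 50.0)  # DNA recovered from the supernatant

pellet_pct = recovery_percent(pellet_ng, input_ng)
supernatant_pct = recovery_percent(supernatant_ng, input_ng)
print(f"pellet {pellet_pct:.0f}%, supernatant {supernatant_pct:.0f}%, "
      f"unaccounted {100.0 - pellet_pct - supernatant_pct:.0f}%")
```

Tracking the supernatant alongside the pellet gives a simple mass balance, which helps tell a high size cutoff apart from a lost pellet.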





CS DNA in Nanopore sequencing

This post is way out of date; Guppy is no longer the basecaller in common use, but nevertheless I feel it may be of some interest.

In some recent runs I had a closer look at the CS DNA (Control Strand DNA; a 3560 bp piece of the Lambda genome). Guppy detects CS DNA and places it in a separate folder. However, in NanoPlot I noticed a sharp peak in the passed reads that I suspected came from leftover CS DNA.




Next I had a look at the reads in the 'calibration_strands' folder, which contains the CS DNA detected by Guppy. The reads were all between 3000 and 3800 bp, which is the default size range in which Guppy looks for CS DNA. Users can specify a different size range in the basecaller script if desired, but this will slow down the basecalling.

Next, I mapped the 'passed' reads onto the CS DNA reference sequence to see if anything would align. And it did: around 75 Mbases (compared to around 700 Mbases in the 'calibration_strands' folder). So ~10% of the CS DNA was not detected and removed by Guppy. The gap in the 3000 - 3800 bp range shows that only CS DNA reads in this size range are removed.
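For anyone wanting to repeat this check, here is a rough sketch of the bookkeeping. It assumes the alignments are available in PAF format (as written by minimap2, for example); this is an assumption, not a statement of which aligner was used here. The reference name "CS" and the toy records are made up for illustration. Only the 75 and 700 Mbases figures come from the run described above.

```python
def read_lengths_from_paf(paf_lines, target_name):
    """Return {read_name: read_length} for reads with at least one hit on target_name.

    PAF columns used: 1 = query name, 2 = query length, 6 = target name.
    """
    lengths = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip empty or malformed lines
        query_name, query_len, target = fields[0], int(fields[1]), fields[5]
        if target == target_name:
            lengths[query_name] = query_len  # each read counted once, even with several hits
    return lengths


def undetected_fraction(passed_bases, calibration_bases):
    """Share of all CS DNA bases that ended up among the passed reads."""
    return passed_bases / (passed_bases + calibration_bases)


# Toy PAF records (hypothetical, for illustration only)
toy_paf = [
    "read1\t3550\t10\t3540\t+\tCS\t3560\t5\t3555\t3400\t3550\t60",
    "read2\t7100\t0\t3500\t+\tCS\t3560\t0\t3560\t3300\t3560\t60",
    "read2\t7100\t3600\t7090\t-\tCS\t3560\t0\t3560\t3350\t3560\t60",
    "read3\t12000\t0\t11900\t+\tgenome_contig\t50000\t100\t12000\t11000\t11900\t60",
]
cs_reads = read_lengths_from_paf(toy_paf, "CS")
print(cs_reads, sum(cs_reads.values()))

# Figures from this run: ~75 Mbases in passed reads, ~700 Mbases in calibration_strands
print(f"{undetected_fraction(75.0, 700.0):.1%}")
```

With the figures from this run the fraction comes out at about 9.7%, in line with the ~10% quoted above.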





I am guessing reads above 3.8 Kbases are chimeras. Around 3% of the CS DNA bases (and incidentally, also ~3% of the CS DNA reads) came from reads longer than 3.8 Kbases. Is this an indication that around 3% of all reads in this run are chimeric? For some context, Wick et al. (2021) reported 0.88% and 1.41% chimeric reads in two ligation runs, and also noted that chimeric read rates up to ~5% do not impact the assembly quality of bacterial genomes.
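The two percentages can be computed from a plain list of lengths of the reads that aligned to the CS DNA reference. A minimal sketch follows, using the 3.8 Kbases threshold from above. The function name and the example lengths are hypothetical, and a read above the threshold is only a candidate chimera, not a confirmed one.

```python
def long_read_share(read_lengths, max_expected_len=3800):
    """Return (share of reads, share of bases) from reads longer than max_expected_len."""
    if not read_lengths:
        return 0.0, 0.0
    long_reads = [n for n in read_lengths if n > max_expected_len]
    read_share = len(long_reads) / len(read_lengths)
    base_share = sum(long_reads) / sum(read_lengths)
    return read_share, base_share


# Hypothetical read lengths, for illustration only
example_lengths = [3500, 3560, 3600, 7100]
read_share, base_share = long_read_share(example_lengths)
print(f"reads: {read_share:.1%}, bases: {base_share:.1%}")
```

Confirming that a long read really is a chimera would need a look at the alignments themselves, for example whether the read carries two separate hits on the CS DNA reference.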

Lastly, and this is hardly worth mentioning, I aligned the passed reads to the full genome sequence of the lambda phage to see if anything aligned outside the amplified region. A negligible number of reads did so: only around 0.03% of all the CS DNA reads.

I feel like there is some untapped potential in CS DNA. With live basecalling enabled, it could show error rates during a sequencing run, perhaps give an indication of the molarity of your library in the form of the ratio of CS DNA reads to library reads, and possibly also indicate the prevalence of chimeric reads. It is already possible in MinKNOW to align reads to a reference during a run.