28 Jan 2025

DNA size selection testing with PEG8000

For long-read sequencing it is important to have as few short DNA fragments as possible. John Tyson from the Snutch Lab, Canada, published on the web the wonderful “Bead-free long fragment LSK109 library preparation”, with results from their experiments with size selection using PEG buffer and centrifugation. Inspired by their promising results, I have done my own size selection tests with PEG8000.

Going straight to the main findings rather than boring you with the minutiae of the methods:

  •     Incubation before centrifugation is beneficial; it reduces the loss of HMW DNA to the supernatant
  •     The lower the starting amount of DNA, the lower the recovery, but also the higher the size cutoff
  •     It’s a good idea to keep the supernatant, as it is easy to lose the pellet
  •     Not all species’ DNA seems to behave the same (see the bullet point above)


PEG8000 size selection testing of genomic DNA with and without incubating the mix for 30 min before centrifugation. Notice the larger amount of HMW DNA in the supernatant when there is no incubation step. 0.33% Megabase agarose gel run for 2 h at 0.8 V/cm.


Increasing the starting amount of DNA increases the recovery, but at the cost of retaining smaller sized DNA.

 


The effect of varying incubation times and the final ratio of buffer to DNA. There seems to be little improvement beyond 30 minutes of incubation. Lowering the buffer:DNA ratio lowers the recovery (by removing increasingly larger fragment sizes), but at the risk of completely losing all the DNA. At a ratio of 0.9:1 there was no distinct cutoff; all fragment sizes were present in the supernatant.


Methods

“Rocky Mountains” 9% size selection buffer

Reagent      | Unit  | Stock conc | Final conc | Input ul
Tris-Cl pH8  | mM    | 1000       | 10         | 1.5
NaCl         | M     | 5          | 1          | 30
PEG8000      | % w/v | 25         | 9          | 54
H2O          | ul    |            |            | 64.5
Total        | ul    |            |            | 150
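The input volumes in the recipe above follow from the usual dilution relation C1 x V1 = C2 x V2. As a rough sketch (the function names `stock_volume` and `recipe` are my own, purely for illustration), the volumes can be recomputed for any total volume like this:

```python
def stock_volume(final_conc, stock_conc, total_volume):
    """Volume of stock needed to reach final_conc in total_volume (C1*V1 = C2*V2)."""
    return final_conc * total_volume / stock_conc


def recipe(total_ul=150.0):
    """Input volumes (ul) for the 9% PEG buffer, scaled to total_ul."""
    # (stock conc, final conc) in matching units, copied from the table
    reagents = {
        "Tris-Cl pH8": (1000.0, 10.0),  # mM
        "NaCl": (5.0, 1.0),             # M
        "PEG8000": (25.0, 9.0),         # % w/v
    }
    volumes = {
        name: stock_volume(final, stock, total_ul)
        for name, (stock, final) in reagents.items()
    }
    # Water makes up whatever volume is left
    volumes["H2O"] = total_ul - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ul in recipe().items():
        print(f"{name}: {ul:.1f} ul")
```

Note that the buffer is later mixed 1:1 with the DNA sample, so the concentrations in the final mix are roughly half of the "Final conc" column (about 4.5% PEG8000 and 0.5 M NaCl), ignoring whatever the DNA itself is dissolved in.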


  •     Buffers were made fresh on the day of each test
  •     Gently mix equal volumes (60 ul buffer + 60 ul DNA) of "Rocky Mountains" buffer and DNA in an Eppendorf LoBind tube
  •     Incubate at room temperature for 1 hour in darkness. Keep the tube vertical
  •     Centrifuge for 30 min at 12,000 x g at room temperature
  •     Remove the supernatant carefully by pipetting (optionally keep it and recover the DNA)
  •     Carefully add 200 ul 70% ethanol to the side of the tube wall and centrifuge at 12,000 x g for 5 min at room temperature
  •     Remove the ethanol carefully
  •     Repeat the previous two steps
  •     Remove the ethanol slowly by pipetting. If done slowly enough, the surface tension should prevent any droplets from remaining on the tube walls, and ~100 % of the ethanol can be removed
  •     If no ethanol droplets are visible, allow the pellet to dry for 1 minute before eluting in TE buffer

 

The DNA from the supernatants was recovered with a standard 1:1 SPRI cleanup. Both the size-selected DNA and the supernatant DNA were quantified by Nanodrop and Qubit and run on a Megabase agarose gel to find the approximate size cutoff. I used two pools of DNA, one from a fish and one from a shrimp species, both of which contained HMW DNA and a good amount of smear.





CS DNA in Nanopore sequencing

This post is way out of date; Guppy is no longer the basecaller in common use, but nevertheless I feel it may be of some interest.

In some recent runs I had a closer look at the CS DNA (Control Strand DNA; a 3560 bp piece of the Lambda genome). Guppy will detect CS DNA and place it in a separate folder; however, in NanoPlot I noticed a sharp peak in the passed reads that I suspected was from leftover CS DNA.




Next I had a look at the reads in the 'calibration_strands' folder, which contains the CS DNA detected by Guppy. The reads were all between 3000 - 3800 bp. This is the default size range where Guppy will look for CS DNA. Users can specify different size ranges in the basecaller script if desired, but this will slow down the basecalling. 

Next, I mapped the 'passed' reads onto the CS DNA reference sequence to see if anything would align. And it did: around 75 Mbases (compared to around 700 Mbases in the 'calibration_strands' folder). So ~10% of the CS DNA was not detected and removed by Guppy. The gap in the 3000 - 3800 bp range shows that only CS DNA reads in this size range are removed.
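For anyone wanting to repeat the arithmetic on their own run, here is a minimal sketch. It takes the escaped fraction to be the CS bases found among the passed reads, divided by all CS bases (passed plus 'calibration_strands'). The function name is made up for illustration.

```python
def escaped_cs_fraction(passed_cs_mbases, calibration_mbases):
    """Fraction of all CS DNA bases that ended up among the passed reads."""
    total = passed_cs_mbases + calibration_mbases
    if total <= 0:
        raise ValueError("total CS DNA yield must be positive")
    return passed_cs_mbases / total


if __name__ == "__main__":
    # Approximate numbers from this run: ~75 Mbases escaped, ~700 Mbases caught
    print(f"{escaped_cs_fraction(75, 700):.1%}")  # prints 9.7%
```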





I am guessing that reads above 3.8 Kbases are chimeras. Around 3% of the CS DNA bases (and incidentally, also ~3% of the CS DNA reads) came from reads longer than 3.8 Kbases. Is this an indication that around 3% of all reads in this run are chimeric? For some context, Wick et al. (2021) reported 0.88% and 1.41% chimeric reads in two ligation runs, and also noted that chimeric read rates of up to ~5% do not impact the assembly quality of bacterial genomes.
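The read and base percentages above can be tallied from a plain list of lengths of the reads that aligned to the CS DNA reference. The sketch below assumes you have already extracted those lengths (for example from your alignment output); the function name and the example lengths are invented, and the 3000 - 3800 bp window is the Guppy default mentioned earlier.

```python
def summarize_cs_lengths(lengths, low=3000, high=3800):
    """Split CS-aligned read lengths into below/within/above the detection window.

    Returns, for each bin, the fraction of reads and the fraction of bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    counts = {"below": 0, "within": 0, "above": 0}
    bases = {"below": 0, "within": 0, "above": 0}
    for n in lengths:
        if n < low:
            key = "below"
        elif n > high:
            key = "above"
        else:
            key = "within"
        counts[key] += 1
        bases[key] += n
    total_reads = len(lengths)
    total_bases = sum(lengths)
    return {
        key: {
            "read_fraction": counts[key] / total_reads,
            "base_fraction": bases[key] / total_bases,
        }
        for key in counts
    }


if __name__ == "__main__":
    # Made-up lengths, only to show the shape of the output
    example = [3500] * 8 + [1000, 7000]
    for name, stats in summarize_cs_lengths(example).items():
        print(name, stats)
```

Reads in the "above" bin are the candidate chimeras; whether they really are chimeric would still need checking against the alignments.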

Lastly, and this is hardly worth mentioning, I aligned the passed reads to the full genome sequence of the lambda phage to see if anything aligned outside the amplified region. A negligible number of reads did so: only around 0.03% of all the CS DNA reads.

I feel like there is some untapped potential in CS DNA. With live basecalling enabled, it could show error rates during a sequencing run; perhaps give an indication of the molarity of your library, in the form of the ratio of CS DNA reads to library reads; and possibly also indicate the prevalence of chimeric reads. It is already possible in MinKNOW to align reads to a reference during a run.





13 Mar 2022

MiSeq post run washes: an update

In a previous post I described how our 5% bleach solution for post-run washes had gone bad. Recently I did three MiSeq runs in a row with various amplicons. This was a chance to investigate potential carryover contamination between runs, as usually much time passes and multiple washes occur between runs. I was surprised to find a correlation in amplicon read numbers between the first two runs. Carryover levels were around 0.02%, whereas Illumina's Technical Support Note indicates "as low as 0.001%" after carrying out the bleach post-run wash. Between the second and third runs I did the post-run wash twice; this seems to have eliminated any carryover. The current batch of bleach arrived ~8 months ago, which already seems to be too long.



Run2 read numbers represent carryover contamination from Run1, for the loci ITS and Uni18s. Each point represents an index combination which was used in Run1 but not Run2. 



3 Dec 2021

ORG.one sequencing success (and some oddities)

ORG.one is an initiative from Oxford Nanopore through which one can apply for free Nanopore consumables for genome sequencing of critically endangered species. Together with a colleague I applied earlier this year, and we were accepted. The sequencing reagents arrived a few weeks ago - two flow cells and one LSK110 sequencing kit. This is the first time I have used the LSK110 kit (until now I have used LSK109), and it seems to work very well. The yields were high: 34 and 37 Gbases from the two flow cells. These flow cells stayed alive for almost a week (with frequent nuclease flushes). I actually added another 24 hours of sequencing time after 5 days of runtime, and got another ~1.6 Gbases of sequence!

A few oddities during the runs: The translocation speed on the flow cell on our MinION Mk1B was slightly high; starting out just above the green zone. The quality score was also marginally lower than for the other flow cell. 



After 4 days, out of nowhere, reads suddenly began going to the "Skipped" folder. A few hours later this behaviour stopped. I have no idea why. 



The other flow cell was run simultaneously on our Mk1C. After a nuclease flush, suddenly the pores on the sides of the sensor chip no longer worked, and a large proportion of the channels had changed status to "Saturated". However, multiple manual Mux scans gradually brought them back to life. The same happened on every subsequent nuclease flush. 




27 Jul 2021

MiSeq post run washes: beware of expired sodium hypochlorite

Bleach, or sodium hypochlorite (NaOCl), is optionally used during MiSeq post-run washes in order to eliminate run-to-run carryover of library template. I noticed a cloudiness in our 5% sodium hypochlorite. This stock solution was purchased several years ago and stored in the fridge as indicated on the label. The bottle had no expiration date. After some online searching it became obvious that bleach has a (very) limited shelf life, depending on the temperature and concentration. After several years our stock had decomposed to saltwater, and seemed to have some fungal growth! Fortunately our MiSeq seemed unaffected; a cursory check found no indications of run cross-contamination, and a later annual instrument maintenance found the capillaries clear and clean. Nevertheless, the moral of the story is: regularly buy fresh NaOCl for your post-run washes!






20 Jul 2021

Guppy update - "super-accurate" model

Towards the end of May Oxford Nanopore released a new version of the Guppy basecaller. This version includes the Bonito basecaller model, which I previously tested, finding that its quality scoring was broken. You can now select among three models: fast, HAC, and sup, with sup ("super accurate") being the slowest but most accurate. I put our five genomic test datasets through the new version, using the sup model. I am pleased to see that the quality scoring problem from Bonito has been fixed. The sup model shows a small increase in the raw accuracy. This comes at the cost of slower basecalling speeds. In conclusion, another nice upgrade in accuracy. One of these days I must do some assembly benchmarks* to see if this translates into better assemblies! Previous testing by a colleague of mine indicated that this was not always the case.

* I just need to learn how to do assemblies :-)



Error rates were calculated using Heng Li's one-liner. No quality trimming was applied, except for Species 5, which had a minimum quality score of 7.
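For readers who prefer something more readable than a one-liner, here is a rough Python sketch of a gap-compressed error rate, where each insertion or deletion counts as a single event regardless of its length. I am assuming this is the definition the one-liner uses; treat that as an assumption rather than a statement of what it does. The input is a list of (NM, CIGAR) pairs, which you would need to pull from your own alignments, and the function name is made up.

```python
import re

# One CIGAR operation: a length followed by an operation letter
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def gap_compressed_error_rate(alignments):
    """Error rate over (nm, cigar) pairs, counting each gap as one event.

    nm is the edit distance (mismatches plus inserted and deleted bases).
    """
    nm_total = 0      # sum of edit distances
    aligned = 0       # aligned columns (M, = and X operations)
    gap_bases = 0     # total inserted + deleted bases
    gap_opens = 0     # number of separate insertions/deletions
    for nm, cigar in alignments:
        nm_total += nm
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op in "M=X":
                aligned += length
            elif op in "ID":
                gap_bases += length
                gap_opens += 1
    denominator = aligned + gap_opens
    if denominator == 0:
        raise ValueError("no aligned columns found")
    mismatches = nm_total - gap_bases
    return (mismatches + gap_opens) / denominator


if __name__ == "__main__":
    # Toy alignment: 98 aligned columns, 2 mismatches, one 2 bp insertion,
    # one 1 bp deletion, so NM = 2 + 2 + 1 = 5
    print(gap_compressed_error_rate([(5, "50M2I40M1D8M")]))
```

With this definition a long indel is penalised no more than a single-base one, which is worth keeping in mind when comparing against error rates computed in other ways.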




The read quality estimates are now at least somewhat correlated to how similar the sequences are to the reference.





6 May 2021

2021 phylogenetic tree of Nanopore library kits

It's that time of the year again. It is time for my annual phylogenetic tree of Nanopore library kits. It should be pretty self-explanatory. The devices are:

F - Flongle
M - MinION
G - GridION
P - PromethION





Note that the LSK109 kit will be discontinued on Sept. 10 2021, except for all COVID-related projects, which will be supported indefinitely. Please let me know if you spot any mistakes or have suggestions for improvements.