How Library Preparation Affects Sequencing Accuracy

In this webinar, Laurence Ettwiller and Jennifer Ong discuss their recent publications and how library preparation affects sequencing accuracy and variant calling.

Script

Deana Martin:

Welcome to the NEB TV webinar series. Today, I'm joined by two of our scientists, Laurence Ettwiller and Jennifer Ong. Hello. How are you?

 

Laurence Ettwiller:

Hello.

 

Jennifer Ong:

Hello.

 

Deana Martin:

And could you tell us what we're talking about today?

 

Jennifer Ong:

Today, we'll be talking about how library preparation affects sequencing accuracy.

 

Deana Martin:

Great, let's get started.

 

Laurence Ettwiller:

Hello, welcome to the NEB TV webinar on how library preparation affects sequencing accuracy. My name is Laurence Ettwiller. I'm a staff scientist in the research department of New England Biolabs. Today, I'm going to talk about the results of my recent paper on DNA damage, and its effect on sequencing accuracy.

 

Since the completion of the human genome in 2003, sequencing humans is mostly about finding the few differences in an ocean of identical sequences. There is roughly one variant per 300 base pairs, and this number can be higher, notably in the case of cancer. But this number gives you a baseline value of the number of identical bases that we need to sequence in order to detect one variation. Since we are focused on differences, sequencing requires high accuracy to be able to identify variants.

 

Fortunately, Illumina sequencing has pretty good accuracy, with most of the bases called with a quality score between 30 and 40. The Phred quality score is a measure of the quality of the identification of a base generated by next generation sequencing. It was originally developed for the Phred base-calling program to help in the automation of DNA sequencing in the Human Genome Project. At 30, the probability of making an incorrect base call is one in every 1,000 bases. It's good, but not perfect.

 

Germline variants are usually shared by all sequences in the case of a homozygous variant, or half of the sequences in the case of a heterozygous variant. Therefore, with enough coverage, we can easily distinguish a true variant from sequencing errors, which tend to be stochastic. In this example, we can observe a germline T-to-C variant at a position highlighted in blue. All the reads share this variant, and the more reads we have at this position, the more certain we are that this variant is real. Coverage therefore compensates for sequencing errors.

 

But for certain samples, variants may not be shared with the majority of the sequences. For example, a tumor acquires somatic variants that are found in only a few cells, giving rise to intratumoral heterogeneity. In the case of a rare variant, only one or a few reads at the specific genomic position will have the evidence of the variant.

 

Coming back to our previous sequencing example, are the read variants, the G-to-T in red or the C-to-A in green, true variants shared by just a small subpopulation of cells? Or are they sequencing errors? In the case of the variant in green, the Phred quality score is 20, which is one error every 100 bases. This read variant likely originated from an error during sequencing, maybe during the clustering step, and can easily be removed by filtering based on the quality score. The G-to-T variant in red, though, has a Phred quality score of 35, which is roughly one error every 3,000 bases. It is less likely to be a sequencing error, but nonetheless can be an artifact.
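The relationship between a Phred quality score and the error probability quoted above can be written out as a short sketch. The function names are illustrative and not part of any tool mentioned in the talk; the formula itself is the standard Phred definition, P = 10^(-Q/10).

```python
def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred quality Q is wrong."""
    return 10 ** (-q / 10)


def bases_per_error(q: float) -> float:
    """Expected number of base calls per erroneous call at quality Q."""
    return 1 / phred_to_error_probability(q)


if __name__ == "__main__":
    # Scores mentioned in the talk: 20, 30, 35 and the upper bound of 40.
    for q in (20, 30, 35, 40):
        print(f"Q{q}: about one error every {bases_per_error(q):,.0f} bases")
```

A quality filter then simply discards base calls whose score falls below a chosen cutoff, which removes the Q20 call but not the Q35 one.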

 

In this presentation, Jennifer and I will look at the origin of artifactual variants with high Phred quality scores. Those errors aren't likely the result of Illumina sequencing, but are introduced before or during library preparation. Those are the most pernicious errors, because they cannot simply be filtered out using a low quality score. In the first part of the presentation, I will focus on damage, while in the second part of the presentation, Jennifer will tackle errors that are introduced during DNA amplification.

 

Damage can be introduced during DNA storage or library preparation, but not all damage is equally bad for sequencing accuracy. For NGS, it all depends on how the damage is handled by the DNA polymerase during amplification or clustering. Also, the end repair step may handle damage differently, because the types of polymerases used are different. Based on the effect of damage on sequencing, we can classify damage in three groups.

 

In the first group, the polymerase goes through the damage, and incorporates the correct base. In this case, there is no consequence for sequencing. In the second group, the damage stalls the polymerase, and while there is no consequence for sequencing accuracy, the blocking damage tends to decrease library yield. Finally, the last bullet point describes the most problematic damage, the mutagenic damage. In the case of mutagenic damage, the polymerase goes through the damage, but it incorporates the wrong base. In this case, this misincorporation will lead to an artifactual variant during sequencing. The same type of damage tends to lead to the same type of misincorporation.

 

For example, oxidative damage leads to 8-oxoguanine, which drives the polymerase to incorporate an adenine instead of a cytosine, thus a G-to-T error signature. That is a problem, because cancers also have mutational signatures.

 

Here is signature 18 from the Alexandrov paper in 2013, showing a C-to-A or G-to-T mutation that is predominantly found in these types of cancers. This makes sense, because some cancers originate from exposure to agents that cause systematic damage, leading to systematic misincorporation of nucleotides. Those misincorporations are fixed after many generations of cells, leading in this case to G-to-T or C-to-A somatic variants.

 

So how do we distinguish real G-to-T or C-to-A somatic variants that are fixed in the cancer cell population from artifactual variants caused by damage during library preparation, which only get fixed after PCR amplification or clustering?

 

It turns out that we can actually make the distinction between variants derived from damage and real variants. As damage in the DNA is by definition not fixed, it affects only one base of the pair; the other base is not affected. Additionally, Illumina sequencing is directional, in the sense that in paired-end mode, read one reads the original strand, while read two reads the reverse complement of the original strand.

 

In the case of non-fixed damage, this directionality can lead to a global imbalance in the total number of variant types between read one and read two. We then devised a score, the GIV score, which stands for global imbalance value.
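The idea of an imbalance between read one and read two can be illustrated with a minimal sketch. It assumes the score is the ratio of the frequency of one substitution type (for example G-to-T) on read one to the frequency of the same type on read two; the function names and the example counts are hypothetical, and the published tool may count and normalize differently.

```python
def variant_frequency(mismatches: int, ref_bases: int) -> float:
    """Fraction of sequenced reference bases that show the substitution."""
    if ref_bases <= 0:
        raise ValueError("ref_bases must be positive")
    return mismatches / ref_bases


def giv_score(r1_mismatches: int, r1_ref_bases: int,
              r2_mismatches: int, r2_ref_bases: int) -> float:
    """Read-one / read-two frequency ratio for one substitution type.

    A value near 1 means no imbalance; a value well above 1 suggests
    damage that shows up preferentially on read one.
    """
    f1 = variant_frequency(r1_mismatches, r1_ref_bases)
    f2 = variant_frequency(r2_mismatches, r2_ref_bases)
    if f2 == 0:
        raise ValueError("no variants of this type on read two")
    return f1 / f2


if __name__ == "__main__":
    # Hypothetical counts of G-to-T calls per 10 million sequenced G bases.
    print("damaged sample:", giv_score(3000, 10_000_000, 1000, 10_000_000))
    print("repaired sample:", giv_score(1000, 10_000_000, 1000, 10_000_000))
```

Because a fixed variant is present on both strands, it contributes equally to read one and read two and leaves the ratio close to 1.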

 

We experimentally demonstrated that the GIV score is specifically measuring errors due to damage. For this, we acoustically sheared human genomic DNA in an unbuffered solution to introduce 8-oxoguanine into the DNA, leading to G-to-T errors as previously demonstrated by Costello et al. Half of the sample was then treated with a cocktail of repair enzymes to repair the 8-oxoguanine, while the other half was directly sequenced using Illumina.

 

The graph that you see represents the G-to-T variant frequency as a function of the position on the read. The left panel represents read one, and the right panel represents read two of the paired-end. You can immediately appreciate the difference between the variant frequency in read one and read two in the non-repaired sample, in blue. If the sample is repaired, as is the case in red, the variant frequency drops to baseline, and the imbalance is gone. This result demonstrates that the imbalance in variant frequency between read one and read two is the result of damage.

 

The GIV score is freely available on GitHub. It estimates the mutagenic damage of a sample using as few as two million reads, and can easily be implemented as part of a QC pipeline. We used the GIV score to estimate damage in public genomic databases, such as the 1000 Genomes Project and the Cancer Genome Atlas, on the right. We calculated the GIV score for all possible substitutions, and ordered the substitution classes by increasing average GIV score. The substitution class with the highest GIV score for both the Cancer Genome Atlas and the 1000 Genomes Project datasets is by far the G-to-T class, suggesting widespread oxidative damage on the guanosine. Any point above the solid gray line represents a sequencing run where more than one third of the G-to-T substitutions on the reads are derived from damage, and are therefore artifactual.
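One way to read the "one third" threshold is through a simple model, which is an assumption of this sketch rather than something stated in the talk: if read two carries only the baseline G-to-T frequency, and read one carries that baseline plus a damage-derived excess, then the damaged fraction of read-one G-to-T calls is 1 - 1/GIV. The function names are hypothetical.

```python
def damage_fraction(giv: float) -> float:
    """Fraction of read-one substitutions attributed to damage.

    Assumes read one = baseline + damage and read two = baseline only,
    so the excess fraction is 1 - 1/GIV (clamped at 0 when GIV <= 1).
    """
    if giv <= 0:
        raise ValueError("GIV must be positive")
    return max(0.0, 1.0 - 1.0 / giv)


def giv_for_fraction(fraction: float) -> float:
    """Inverse mapping: the GIV at which a given fraction is damage-derived."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / (1.0 - fraction)


if __name__ == "__main__":
    print(f"GIV for one third artifactual: {giv_for_fraction(1 / 3):.2f}")
    print(f"Damage fraction at GIV 3: {damage_fraction(3.0):.2f}")
```

Under this model, a threshold of one third corresponds to a GIV of 1.5, and a GIV of 3 would mean that two thirds of the read-one calls of that type come from damage.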

 

Next, instead of looking at artifactual variants directly on the reads, we investigated the consequence of damage at genomic positions with high sequencing coverage. As we have seen before, coverage compensates for sequencing errors, and because damage is stochastic, we expect damage to primarily confound the identification of low allelic fraction variants.

 

To better assess the effect of DNA damage on those variants, we sequenced a targeted set of cancer genes from a tumor sample with or without DNA repair, and normalized the coverage to 300-fold. We counted the number of variants, on the Y-axis, by type of variant, for example, G-to-T in orange and its reverse complement C-to-A in blue, and classified them in four groups: variant frequency of less than 1%, 1-5%, 6-10%, and higher than 10%.

 

We identified an excess of G-to-T variants, in orange, in the non-repaired sample for variant frequencies of less than 1% and 1-5%. If the sample is repaired, the G-to-T variant frequency drops to numbers that are similar to the C-to-A variant frequency. Conversely, variants present in the sample at high frequency, such as germline variants, show the same profile with or without repair. This result demonstrates that damage confounds the identification of low allelic fraction variants.

 

If we now look at the variants defined by the National Cancer Institute GDC Data Portal, the fraction of G-to-T variants in the variant file is elevated in samples that are severely damaged, in red, compared to samples that have no or little damage detected, in blue. This is true for both the raw somatic mutation files and the annotated mutation files. This result suggests that a number of variants in these files are false, and are due to damage instead.

 

With this presentation, I hope to have convinced you of the importance and pervasive nature of mutagenic damage in sequencing. This damage leads to systematic errors that can be measured using a simple metric called the GIV score, which measures the extent of the imbalance between read one and read two. Finally, damage confounds the identification of variants, especially the low allelic fraction variants found in cancers.

 

And now I turn to Jennifer for the second part of the presentation.

 

Jennifer Ong:

Thank you, Laurence. Hello, everyone. My name is Jennifer Ong, and I'm a senior scientist at New England Biolabs. In part two, I'll discuss a recent paper from our lab describing DNA amplification errors, and how these can affect sequencing accuracy.

 

The polymerase chain reaction, or PCR, is a DNA amplification method that is integral to many next generation sequencing sample preparation workflows, including our own NEBNext Ultra II. Therefore, any errors that are introduced during PCR will also increase the total number of sequencing errors.

 

PCR is an exponential DNA amplification method, which is used to amplify a target sequence located between two primers. During the amplification, the target sequence is copied, and then copies are made of those copies. This allows a large amplification from small amounts of material, but it also means that any errors introduced during PCR get copied and amplified as well. We want to measure the frequency and types of errors made during PCR to help us understand their contribution to next generation sequencing.

 

To do this, we set up a very accurate fidelity assay based on Pacific Biosciences' single-molecule real-time (SMRT) sequencing. SMRT sequencing sequences individual DNA molecules, and because each molecule is sequenced multiple times, you can achieve very high accuracy in a process called circular consensus sequencing. This high accuracy is what allows us to distinguish true PCR errors from sequencing errors.

 

To measure the errors generated during PCR, we sequenced the pool of molecules that were produced during DNA amplification. Sequencing libraries were prepared from these PCR products by ligating hairpin-shaped SMRTbell adapters onto the ends of each product, which produces a circular DNA template. During sequencing, the sequencing primer is extended around the circular template in a rolling circle amplification. Rolling circle amplification produces very long sequencing reads that are iterative copies of the top and bottom strand inserts, separated by adapter sequences.

 

The iterative copies of each top and bottom strand, called sub-reads, allow us to generate a very accurate consensus sequence for the top and the bottom strand. In this example, the true PCR error shown in green is present in all sub-reads on the bottom strand, whereas the sequencing errors illustrated in gray are randomly distributed and washed out of the consensus sequence.
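How repeated sub-reads wash out random sequencing errors while keeping a true PCR error can be shown with a toy majority vote. This is only a sketch under strong assumptions (sub-reads already aligned, equal length, substitutions only); real circular consensus software uses alignment and quality-aware models, and the sequences below are made up.

```python
from collections import Counter
from typing import Sequence


def consensus(subreads: Sequence[str]) -> str:
    """Per-position majority vote over pre-aligned, equal-length sub-reads."""
    if not subreads:
        raise ValueError("need at least one sub-read")
    length = len(subreads[0])
    if any(len(s) != length for s in subreads):
        raise ValueError("sub-reads must have equal length")
    # zip(*subreads) yields one column of bases per position.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*subreads))


if __name__ == "__main__":
    # Hypothetical original template: ACGTACGT.
    # A PCR error (T to C at the fourth base) is present in every sub-read;
    # sequencing errors hit single sub-reads at random positions.
    subreads = [
        "ACGCACGT",
        "ACGCATGT",  # sequencing error at the sixth base
        "TCGCACGT",  # sequencing error at the first base
        "ACGCACGT",
        "ACGCACGA",  # sequencing error at the last base
    ]
    print(consensus(subreads))
```

The consensus keeps the C at the fourth base, because every pass over the molecule sees it, while the isolated sequencing errors are outvoted.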

 

In our study, we looked at several different types of PCR errors, including polymerase base substitution events, PCR-mediated recombination, and mutagenic DNA damage caused by thermo cycling. Probably the most well characterized of these errors are polymerase base substitutions. These occur when the polymerase incorporates the wrong nucleotide during replication, but we also wanted to look at less well characterized mistakes like PCR-mediated recombination. This occurs when you're amplifying a mixed pool of closely related but not identical sequences, and template switching by the polymerase produces chimeric PCR products. This could lead to sequencing artifacts for certain types of next generation sequencing assays. Finally, we also wanted to measure the rate of mutagenic DNA damage that occurs from heating during thermo cycling.

 

One of the first things that we wanted to do is to compare the substitution error rates for different DNA polymerases, and in this table here, I'm showing the error rates of two different commonly used polymerases for PCR, Taq polymerase and Q5 DNA polymerase. In a typical fidelity experiment, we'll sequence about 100 million high-accuracy consensus bases in order to derive these error rates.

 

For Taq polymerase, this turns out to be 1.5 times 10 to the minus four substitutions per base per doubling event, and this corresponds to an accuracy of about one substitution every 6,000 bases replicated. In comparison, Q5 DNA polymerase has a much lower substitution rate at 5.3 times 10 to the minus seven substitutions per base per doubling event. This corresponds to an accuracy of one substitution every 1.9 million bases replicated. In comparison to Taq, this is about 280-fold higher fidelity.
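The arithmetic behind these figures can be checked in a few lines. The two error rates are the ones quoted in the talk; the function names are illustrative.

```python
# Substitutions per base per doubling event, as quoted in the talk.
TAQ_ERROR_RATE = 1.5e-4
Q5_ERROR_RATE = 5.3e-7


def bases_per_substitution(error_rate: float) -> float:
    """Average number of bases replicated per substitution."""
    return 1 / error_rate


def relative_fidelity(error_rate: float, reference_rate: float) -> float:
    """Fold improvement in fidelity over a reference polymerase."""
    return reference_rate / error_rate


if __name__ == "__main__":
    print(f"Taq: one substitution per {bases_per_substitution(TAQ_ERROR_RATE):,.0f} bases")
    print(f"Q5:  one substitution per {bases_per_substitution(Q5_ERROR_RATE):,.0f} bases")
    print(f"Q5 vs Taq: {relative_fidelity(Q5_ERROR_RATE, TAQ_ERROR_RATE):.0f}-fold")
```

The reciprocal of the Taq rate comes out between 6,000 and 7,000 bases, the Q5 figure is close to 1.9 million, and the ratio is a little over 280, in line with the rounded numbers given above.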

 

These error rates are actually pretty low for the DNA polymerase. However, because of the exponential nature of PCR, an error that occurs in an early PCR cycle will be replicated, and get propagated through the amplification product. After 16 cycles of PCR, if you're using a low fidelity Taq polymerase, and amplifying one kb, about 60% of the fragments will contain at least one error. Whereas if you're using Q5, only 0.3% of the fragments amplified will contain at least one error. This kind of shows the importance of using a high fidelity polymerase for amplification.
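A simple way to see how a small per-base error rate turns into a large fraction of flawed fragments is an independent-errors model: the chance that a fragment is error-free is (1 - rate) raised to the number of bases copied along its lineage. This is a sketch, not the calculation used in the talk, which does not say how its percentages were derived. The number of effective doublings below is a hypothetical parameter, chosen only because it gives values near the quoted ones.

```python
def fraction_with_error(error_rate: float, length_bases: int,
                        doublings: float) -> float:
    """Fraction of fragments carrying at least one substitution.

    Assumes errors are independent and that each fragment's lineage has
    been copied `doublings` times over `length_bases` bases.
    """
    if not 0 <= error_rate <= 1:
        raise ValueError("error_rate must be between 0 and 1")
    if length_bases < 0 or doublings < 0:
        raise ValueError("length_bases and doublings must be non-negative")
    return 1.0 - (1.0 - error_rate) ** (length_bases * doublings)


if __name__ == "__main__":
    effective_doublings = 6  # illustrative assumption, not from the talk
    for name, rate in (("Taq", 1.5e-4), ("Q5", 5.3e-7)):
        frac = fraction_with_error(rate, 1000, effective_doublings)
        print(f"{name}: {frac:.2%} of 1 kb fragments with at least one error")
```

Whatever the exact number of doublings, the gap between the two polymerases stays at roughly two orders of magnitude in this model.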

 

We then went on to compare the fidelity for several high fidelity DNA polymerases commonly used for next generation sequencing sample preparation workflows. In addition to Q5, we also looked at several engineered DNA polymerases, such as Phusion, PrimeSTAR GXL, and Kapa HiFi ReadyMix. We also looked at several wild-type DNA polymerases from archaea, such as Deep Vent, Pfu, and KOD DNA polymerases, and in this graph here, we're showing the fidelity relative to Taq polymerase. The higher the fidelity, the lower the error rate, and the more accurate the enzyme.

 

In addition to determining the error rate for each polymerase, we're also able to identify the mutations that each typically makes. In this mutation spectrum here, I'm showing the distribution of the different error types. For example, Taq polymerase, which is a family A DNA polymerase, tends to make errors that are A-to-G or T-to-C substitutions 68% of the time. The other high fidelity polymerases, which are all related to family B DNA polymerases, tend to make errors that are G-to-A or C-to-T substitutions the majority of the time.

 

Okay, so onto our next type of PCR error, PCR-mediated recombination. Recombination can occur during amplification of a mixed pool of closely related sequences that may be nearly identical, but maybe only differ by a few mutations. We set out to develop an assay to measure the rate of recombination using pairs of artificial sequences, illustrated here by DNA-1 and DNA-1x. These are nearly identical, only differing by a mutation every 100 base pairs, spread out across the length of the gene. These mutations then serve as markers to help us distinguish between these two template sequences.

 

We then used Taq polymerase to amplify a mixed population of these two sequences during PCR, and then sequenced the products. If no recombination occurred during amplification, then the product should only have the markers from one of the templates. However, if a recombination event has occurred, then the strand will be a chimeric product, which will have markers from both of the templates. As illustrated here, we can see that a recombination event has occurred after the third marker, where it switches from red to blue.

 

These types of recombination events, which we also call crossover events, can occur during PCR when a polymerase extends a primer, but then falls off before it reaches the end of the template. During the next cycle of PCR, this extended primer can anneal to a different template, and then become extended. The resulting product is a chimeric product, and this can cause confusion because it has a different combination of mutations than the original starting pool. We wanted to measure the rate of recombination events by counting up the number of crossover events that we see after amplification.
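Counting crossover events from marker patterns can be sketched as follows. The encoding is hypothetical: each sequenced product is reduced to a string with one letter per marker position, "R" if the marker matches the first template (red) and "B" if it matches the second (blue). A crossover is counted wherever two neighboring markers come from different templates.

```python
from typing import Iterable


def count_crossovers(markers: str) -> int:
    """Number of template switches implied by a string of marker origins."""
    return sum(1 for a, b in zip(markers, markers[1:]) if a != b)


def fraction_chimeric(products: Iterable[str]) -> float:
    """Fraction of products with at least one crossover event."""
    products = list(products)
    if not products:
        raise ValueError("need at least one product")
    chimeric = sum(1 for p in products if count_crossovers(p) > 0)
    return chimeric / len(products)


if __name__ == "__main__":
    # The example from the slide: a switch from red to blue after marker three.
    print(count_crossovers("RRRBBB"))
    # A made-up pool of four products, two of which are chimeric.
    print(fraction_chimeric(["RRRR", "RRBB", "BBBB", "BRRB"]))
```

Dividing the total crossover count by the number of sequenced bases and doublings would then give a rate per base per doubling, in the same units as the substitution rates.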

 

We measured the rate of template switching for Taq polymerase using two of these types of template pairs. We measured the number of crossover events that occurred after a 16-cycle PCR amplification, and also the total number of sequenced bases, to get at the recombination rate, which ended up being on average about one times 10 to the minus four per base per doubling event. These recombination events are very rare at the polymerase level, but because this is PCR, and errors get amplified, after 16 cycles we saw that 23% or 28% of the strands that were produced had at least one recombination event. This is just another example of these really rare mistakes that polymerases make producing a large amount of error in the final PCR products.

 

One of the last topics that I wanted to talk about is mutagenic damage that can occur during thermo cycling. With these experiments, we made plasmid DNA libraries, subjected them to thermo cycling, where we heat and cool the DNA as you would in PCR for 16 cycles, and then measured the error rates. If we just took the plasmid DNA and measured the error rates with no thermo cycling, this had an error rate of about four times 10 to the minus seven. If we then subjected the plasmid libraries to heating and cooling for 16 mock thermo cycles, the error rate went up almost two orders of magnitude, to about two times 10 to the minus five per base.

 

We then used a PreCR cocktail. PreCR is an enzyme mixture that recognizes and repairs DNA damage. We then took our heated libraries, and treated them with PreCR, and measured the error rate again, and found that the error rate after PreCR treatment and mock thermo cycling went back down to basal levels. To us, this indicated that heating and cooling actually causes DNA damage that can be repaired by PreCR.

 

The identity of the mutations after heating and cooling were all C-to-T substitutions, and so C-to-T substitutions are an indicator of cytosine deamination. We concluded that heating and cooling during PCR causes mutagenic damage, and per PCR cycle this happened about once every 700,000 bases.

 

I would like to conclude part two by summarizing our analysis of PCR products at the single molecule level. We measured polymerase base substitution error rates, as well as other types of PCR errors, including PCR-mediated recombination, the result of template switching during amplification of closely related sequences. We also looked at how much DNA damage occurs during thermo cycling, and found that heating causes cytosine deamination, which results in C-to-T substitutions upon sequencing.

 

If you're interested in learning more about the topics discussed here, the links for these two papers are available in the resources widget in the webinar console. Laurence and I will also be available to answer questions.

 

Jennifer Ong:

Hello, everyone. Okay, so it looks like we have one question here about potential genomic context for errors by Taq polymerase, and it's true that Taq polymerase tends to make errors mainly in As and Ts. So if you're amplifying a genome or a specific amplicon that's very A-T rich, then you'll see more errors in those kinds of sequence contexts.

 

There was also another question about any sequence context for crossover events by Taq polymerase, and for the simple recombination assay, we did not see any positional effects for the location of the crossover events, but in our paper we also looked at amplification of particular sequences. One of those was the lacZ amplicon, which contained certain structural elements that were inverted repeats, and these structural elements did induce a form of template switching that we observed over amplification of these particular sequences. So the answer is that some sequence context could also promote recombination or crossover events.

 

Laurence Ettwiller:

Yes, I'm going to take your question regarding the ... Sorry, I'm reading the question. Is there any way to address this issue in Ion Torrent data sets? I'm assuming that the issue is the damage?

 

Because damage, for example oxidative damage, happens before PCR and before sequencing, the damage will be fixed at the moment of sequencing. So Ion Torrent will not help with this. You will have errors due to damage with Ion Torrent as well. Additionally, as far as I know, there's no imbalance in Ion Torrent. There is no paired-end sequencing, so you will not be able to look at the imbalance, the GIV score, on an Ion Torrent platform.

 

Jennifer Ong:

Okay, going back to some of the PCR questions, it looks like there's questions about ... Does the rate of heating and cooling during PCR affect DNA damage?

 

I think that probably the amount of time spent heating at 98 degrees, the denaturation temperature, is what is promoting hydrolysis or cytosine deamination. I think that the length of time at elevated temperatures like 98 degrees, or the number of cycles, would then promote more DNA damage.

 

Laurence Ettwiller:

I have a question: Can you use the GIV score to remove low frequency variants caused by DNA damage? If so, does it work with downstream analysis?

 

Currently, we cannot use the GIV score to remove the specific variants that we call. The GIV is a global imbalance value, so it takes the sequencing reads directly, up to 2 million reads, and we do not have this feature in the GIV score.

 

Jennifer Ong:

Okay, more questions about PCR. We're being asked: Why did we only do the tests with 16 cycles of PCR, and not 30 to 40 cycles as in most PCR methods?

 

We chose to use 16 cycles of PCR, because we found that for the conditions that we're using, and the amount of template that we're using, after 16 cycles, the amplification reaction is still in the linear range, and we haven't saturated the reaction yet. We're trying to perform PCR under conditions where amplification is still occurring in the later cycles. If we performed more cycles of PCR, then the amplification reaction would be saturated at that point.

 

Laurence Ettwiller:

Yes, I have a question regarding the GIV score. Could you explain the GIV score a bit more?

 

The idea is that if you have a damage, the damage is non-fixed. In one of the strands you will have the damage, while in the other strand you don't have the damage. The other thing is a certain type of damage always leads to a certain type of ... The polymerase always behaves a certain way when it encounters damage.

 

For example, with oxidative damage, as in the case of 8-oxoguanine, you will always have a G-to-T when you have 8-oxoguanine. This is unfixed in the case of damage, and the second thing is that Illumina is directional. When you ligate your adapters, you have a P5 and a P7. The P5 is read one, and it reads the original strand, while P7 reads the reverse complement of the original strand. When you do this, you will have an excess of G-to-T compared to C-to-A on read one, versus an excess of C-to-A relative to G-to-T on read two.

 

So it's this imbalance that we are measuring in the case of damage. If it's a true variant, the true variant is fixed, so you will have the changes in both strands, and therefore you will have no imbalance.

 

Jennifer Ong:

Okay, so I have another question here. What are the sequencing accuracy, e.g. SNP-calling implications of crossover events during PCR?

 

So crossover events during PCR can have implications for certain NGS assays. For instance, if you're amplifying a pool of viral populations whose sequences are very closely related to each other, or in microbial identification, when you're amplifying 16S rRNA genes, which are very closely related, and you're trying to sequence the original viral population or the original 16S gene. If you have a crossover event, you may see one mutation or one sequence get combined with another sequence, and it will be a synthetic error. It will not be a sequence that was originally present in your starting pool.

 

So it may seem like you have a new viral variant, or a new microbe, or if you're doing haplotyping, a new haplotype in your sequencing pool that didn't originally exist.

 

Jennifer Ong:

I have another question. Would you recommend DNA damage repair before Illumina sequencing as a normal part of a protocol?

 

If you're talking about 8-oxoguanine, the damage that is introduced during shearing of the DNA, you can first change the buffer composition. So instead of 0.1X TE, which is what people tend to use, I use a different buffer, 1X TE for example. When using 1X TE, you will remove most of the damage, but not everything. If you really want to remove all the damage, you may want to try PreCR.

 

Another question about PCR. Do you believe most errors in the PCR experiments come from damage to the bases, in a template, or growing strand, or from damage to the nucleotide pool? Does the fact that thermo cycling itself causes damage suggest it's in the templates, or maybe there are different types of damage in the template versus the nucleotide pool?

 

There have been experiments done which show that yes, heating does cause damage, again cytosine deamination, to the nucleotide pool, which then can get incorporated into the PCR products. Thermo cycling does cause damage to both the template strand and the nucleotide pool, and can cause C-to-T substitutions in the final product.

 

Laurence Ettwiller:

I have another question: Will C deamination in any way, shape, or form be affected by methylation of genomic CpG content?

 

What we have mostly looked at is samples that are fresh, so the mutation is mostly a G-to-T due to oxidative damage. C deamination happens mostly in FFPE samples. We have not looked at FFPE samples in this study.

 

Jennifer Ong:

Okay. The question here is: How does the number of PCR-mediated mutations introduced compare to the number of mutations introduced during the actual sequencing?

 

Depending on the quality score, usually the number of sequencing errors can be quite high for Illumina sequencing, and these are probably more than the polymerase induced errors. But the polymerase induced errors actually add to the total number of sequencing errors, and it depends a little bit on the data analysis, and what types of filters you're using for base calling.

 

Jennifer Ong:

So there's another question here. Did the length of the fragment when doing PCR affect damage or errors caused during PCR? And for long amplification, DNA damage or errors then becomes more of a problem for producing very long amplicons.

 

I think it's maybe more of the opposite way around, that damage maybe prevents a longer amplification than for shorter products.

 

Laurence Ettwiller:

Yes, so I have another question. Do you think treatment of the amplified library with PreCR to correct DNA damage arising from thermo cycling should be added to the library preparation workflow?

 

PreCR as it currently stands will not withstand the temperature, so during PCR you will not fix the damage that is introduced during the cycling, and therefore you cannot use PreCR during PCR. So the damage can only be fixed before or after.

 

Jennifer Ong:

Okay, and a couple of questions here. Are there polymerases that have lower rates of recombination than Taq, or other ways of reducing chimera formation during PCR?

 

This is an area of interest for us. I wanted to do a further study looking at different polymerases, and that's something that we've planned for the future, but don't have a good answer for this yet. I do think PCR-mediated recombination is probably more related to the PCR protocol and the cycling conditions than maybe the polymerase itself.

 

Laurence Ettwiller:

We have a couple of questions about how PreCR works. I will let Tom answer these questions.

 

Tom May:

Yeah, hi. I was listening in on the seminar, and this question came up. PreCR was developed in my lab. PreCR has a lot of the enzymes that are found in base excision repair. There are a number of enzymes, like Fpg and Endo VIII, these types of enzymes that will go and recognize damage, and cause a break in the double-stranded DNA chain, and there's a DNA polymerase, Bst DNA polymerase, full-length, that goes in and does a short little nick translation to make a little repair patch, and then there's also a DNA ligase present. So it fully seals the nick after it's been translated, so that the damage is actually fully repaired.

 

I mean, clearly because it requires a template strand to do the repair, it works most effectively on double-stranded DNA. If there's single-stranded DNA, some of the DNA repair enzymes will actually remove the damage, but now there's no template strand. There will be no additional bases added in to repair it. It'll just simply remove the damage.

 

Jennifer Ong:

Okay. I have a question here about: Does deamination of dCTP in the nucleotide pool ... Or damage to the nucleotide pool, does this affect the incorporation rate? Mutated versus non-mutated dCTP.

 

Laurence Ettwiller:

So when dCTP is deaminated, it becomes dUTP. dUTP is very similar to thymidine, and the incorporation rate is probably not affected that much.

 

Laurence Ettwiller:

I have another question. How would you limit, or what is the best way to limit, DNA damage in the typical DNA extraction and library prep?

 

So most of the damage, it depends on the sample. If it's a fresh sample, most of the damage comes from the shearing of the DNA during sonication. In that case, the first way to limit the damage is to increase the buffering capacity of the solution during shearing. It turns out that the way you isolate DNA may also affect the damage. So that's something that we have not really looked at, but I think there is a correlation there, but really the buffer composition is what limits the damage. If you wanted to really remove the oxidative damage, then PreCR would also work for that.

 

Jennifer Ong:

Okay, so I had a question about PCR errors, and do they affect Sanger sequencing as much as NGS?

 

And it depends a little how you're preparing your samples for Sanger sequencing. With Sanger sequencing, because you're sequencing a pool of molecules, if you're sequencing a PCR product for instance, you would get ...your Sanger sequencing would be a consensus of the entire pool, which would actually probably be more accurate, because things like errors or DNA damage would be washed away in building the consensus.

 

Or if you're using Sanger sequencing to sequence PCR products that have been cloned into a plasmid, and then purified from E. coli, the cell has DNA damage repair pathways that will repair the damage, and reduce the sequencing error when you actually perform Sanger sequencing.

 

Kari Goodwin:

So I think that's all the questions that we're going to have time to take today. We will be providing the webinar on-demand as well, and following up with each of you with your questions. Thank you so much for joining, and have a great day!

 
