Amplicon NGS for CRISPR: how to design your sequencing strategy

Authors
  • BioTech Bench

This is Part 13 of the CRISPR from Bench to Analysis series.

When Sanger isn't enough

In our previous post, we looked at how to use Sanger sequencing combined with TIDE or ICE to quickly quantify your knockout efficiency. It's cheap, fast, and uses equipment that every biology lab has. For standard verification of a knockout clone, it's perfect.

But Sanger sequencing has limits. It is a consensus method; it mixes the signals of all DNA molecules in your sample into a single trace chromatogram. Because of this, Sanger struggles to:

  • Detect rare alleles or editing events present at frequencies below roughly 5%.
  • Accurately resolve complex mixtures of alleles (e.g. multiplexed editing or highly heterogeneous cell populations).
  • Characterize large insertions or deletions precisely at the nucleotide level.

When you need single-molecule resolution, deep sensitivity, or need to quantify thousands of different editing events at once, it's time to turn to Next-Generation Sequencing (NGS). Specifically, Amplicon NGS.

In this guide, we'll design a robust, cost-effective Amplicon NGS strategy from primer layout to sequencing parameters.


The Core Concept: Barcoded Amplicons

Unlike whole-genome sequencing (WGS), which sequences all the DNA in your cell, Amplicon NGS focuses only on a specific target region. We isolate this region using PCR, attach sequencing adapters and barcodes, and sequence it on a high-throughput platform (typically Illumina).

Because we sequence each PCR product molecule individually, we get a direct digital count. If we sequence 10,000 molecules and 4,000 contain an indel at the cut site, we can confidently state that our editing efficiency is 40%.
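
To make the arithmetic concrete, here is a minimal sketch of that digital count in Python. The function name and the choice of a Wilson score interval are our own assumptions for illustration; they are not taken from any specific analysis tool.

```python
import math

def editing_efficiency(edited_reads, total_reads, z=1.96):
    """Return (fraction, lower, upper) with a Wilson score interval.

    z=1.96 corresponds to an approximate 95% interval.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    p = edited_reads / total_reads
    denom = 1 + z**2 / total_reads
    centre = (p + z**2 / (2 * total_reads)) / denom
    half = z * math.sqrt(
        p * (1 - p) / total_reads + z**2 / (4 * total_reads**2)
    ) / denom
    return p, centre - half, centre + half

# The example from the text: 4,000 edited reads out of 10,000.
frac, low, high = editing_efficiency(4000, 10000)
print(f"{frac:.1%} edited (approx. 95% CI {low:.1%} to {high:.1%})")
```

At this depth the interval is only about one percentage point wide on either side, which is why a plain read count is usually good enough for a knockout experiment.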

Here is how you set this up in the lab.


Step 1: Primer Design (The 50 bp Rule)

Designing primers for CRISPR NGS is different from standard PCR. Your goal is not just to see a band on a gel; it is to capture the full diversity of indels without technical bias.

Follow these rules:

  1. The Cleavage Site in the Middle: Design your primers so the expected double-strand break (DSB) site (3 bp upstream of the PAM for Cas9) is centered in the amplicon.
  2. The 50 bp Buffer Rule: Keep the 3' end of each primer at least 50 bp away from the cleavage site. If a primer binds too close to the cut site, a large deletion can remove the primer binding site, so the edited molecules fail to amplify (allelic dropout). Those alleles never reach the sequencer, and you underestimate your editing efficiency.
  3. Keep it Short: Your entire amplicon should be between 150 bp and 250 bp. Shorter amplicons amplify more efficiently and sequence more reliably.
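
The three rules above are easy to check in a few lines of Python before you order primers. This is a rough sketch under simplifying assumptions: the function name and the sequences are made up, and it only handles a guide written on the same strand as the amplicon with the PAM directly 3' of it.

```python
def check_amplicon_layout(amplicon, protospacer, fwd_primer_len, rev_primer_len,
                          min_buffer=50, min_len=150, max_len=250):
    """Check one amplicon design against the three layout rules.

    Assumes `protospacer` is written 5'->3' on the same strand as
    `amplicon`, with the PAM immediately 3' of it.
    """
    start = amplicon.find(protospacer)
    if start == -1:
        raise ValueError("protospacer not found on this strand of the amplicon")
    # Cas9 cuts 3 bp upstream of the PAM: the break lies just before this index.
    cut = start + len(protospacer) - 3
    # Bases between each primer's 3' end and the cut site.
    fwd_gap = cut - fwd_primer_len
    rev_gap = (len(amplicon) - rev_primer_len) - cut
    return {
        "length": len(amplicon),
        "cut_index": cut,
        "fwd_gap": fwd_gap,
        "rev_gap": rev_gap,
        "offset_from_centre": cut - len(amplicon) / 2,
        "length_ok": min_len <= len(amplicon) <= max_len,
        "buffer_ok": fwd_gap >= min_buffer and rev_gap >= min_buffer,
    }

# Made-up 200 bp amplicon: filler + 20 nt protospacer + TGG PAM + filler.
protospacer = "GACGTTACCGGATCAAGTCC"
amplicon = "A" * 90 + protospacer + "TGG" + "C" * 87
report = check_amplicon_layout(amplicon, protospacer,
                               fwd_primer_len=20, rev_primer_len=20)
for key, value in report.items():
    print(f"{key}: {value}")
```

For a guide on the opposite strand you would search for the reverse complement and place the cut 3 bp from the other end of the protospacer; that case is left out here to keep the sketch short.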

Step 2: The Two-Step PCR Protocol

To sequence PCR products on an Illumina instrument, the DNA must have specific flow cell binding adapters (called P5 and P7) and index sequences (barcodes) so you can tell samples apart.

The most efficient way to add these is a Two-Step PCR workflow:

Genomic DNA
    ↓
[ PCR 1 ]  Using target-specific primers with overhang adapters
    ↓
[ PCR 2 ]  Using generic index primers (attaches P5/P7 and barcodes)
    ↓
Ready to Sequence

PCR 1 (Target Amplification)

You design forward and reverse primers that target your locus of interest. Crucially, you append standard Illumina overhang adapters to the 5' ends:

  • Forward Overhang: 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-[Forward Primer]-3'
  • Reverse Overhang: 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-[Reverse Primer]-3'
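
If you are designing primers for many loci, it is worth letting a script build the full oligo sequences so nothing gets mistyped. A minimal sketch, assuming the overhangs listed above; the two locus-specific primers in the example are placeholders, not real primers.

```python
FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

def add_overhangs(fwd_primer, rev_primer):
    """Prepend the overhang adapters to the 5' end of each PCR 1 primer.

    Both primers are given 5'->3', exactly as you would order them.
    """
    oligos = []
    for overhang, primer in ((FWD_OVERHANG, fwd_primer), (REV_OVERHANG, rev_primer)):
        primer = primer.strip().upper()
        if not primer or set(primer) - set("ACGT"):
            raise ValueError(f"primer must contain only A, C, G, T: {primer!r}")
        oligos.append(overhang + primer)
    return tuple(oligos)

# Placeholder locus-specific primers, for illustration only.
fwd_oligo, rev_oligo = add_overhangs("ccagtacgatcggatacctg", "TGCATGGACTTCAGGACTAC")
print("PCR1 forward:", fwd_oligo)
print("PCR1 reverse:", rev_oligo)
```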

PCR 2 (Barcoding)

You take a small aliquot of your PCR 1 product and run a second PCR (typically only 8-12 cycles). This PCR uses generic primers that bind to the overhang adapters. These primers contain:

  • P5/P7 sequences that bind to the sequencer flow cell.
  • i5/i7 index barcodes (usually 8 bp) that uniquely label each sample.

This setup allows you to order just one set of expensive indexing primers and reuse them for any target locus simply by switching the PCR 1 target-specific primers.
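
Reusing one index set across many projects makes it easy to assign the same i7/i5 combination twice by accident. Here is a small sanity check you could run on your sample list before pooling. The sample names and 8 bp index sequences are invented placeholders, not sequences from any real index kit.

```python
def find_index_collisions(samples):
    """Return {(i7, i5): [sample names]} for index pairs used more than once.

    `samples` maps a sample name to its (i7, i5) index pair.
    """
    by_pair = {}
    for name, pair in samples.items():
        by_pair.setdefault(pair, []).append(name)
    return {pair: names for pair, names in by_pair.items() if len(names) > 1}

# Invented placeholder indices, for illustration only.
samples = {
    "KO_rep1": ("AACCGGTT", "TTGGCCAA"),
    "KO_rep2": ("AACCGGTT", "CCAATTGG"),
    "control": ("AACCGGTT", "TTGGCCAA"),  # same pair as KO_rep1
}
collisions = find_index_collisions(samples)
for pair, names in collisions.items():
    print(f"Index pair {pair} is shared by: {', '.join(names)}")
```

Two samples may share an i7 or an i5 on its own; it is the combination that has to be unique within a run.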


Step 3: Selecting Sequencing Parameters

Once your library is prepared, you need to choose your sequencing run parameters.

Paired-End (PE) Reading

Always use Paired-End sequencing (reading the molecule from both ends). Because our amplicons are short (e.g. 200 bp) and we use PE150 or PE250 reads, the forward and reverse reads will overlap in the middle. This overlap is critical: sequencing error rates increase towards the end of a read. By overlapping the reads directly at the cleavage site, the analysis software can compare the two reads and discard sequencing errors, ensuring your indels are real biological edits and not instrument noise.
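
You can check whether a given amplicon and read length will actually overlap at the cut site with some simple arithmetic. The sketch below uses hypothetical function names and ignores adapter read-through and quality trimming, so treat the numbers as an upper bound.

```python
def read_overlap(amplicon_len, read_len):
    """Number of bases covered by both mates (0 if the reads never meet)."""
    return max(0, min(amplicon_len, 2 * read_len - amplicon_len))

def cut_site_in_overlap(cut_index, amplicon_len, read_len):
    """True if the 0-based cut position is covered by both mates."""
    fwd_end = min(read_len, amplicon_len)        # forward read covers [0, fwd_end)
    rev_start = max(0, amplicon_len - read_len)  # reverse read covers [rev_start, amplicon_len)
    return rev_start <= cut_index < fwd_end

for amplicon_len, read_len in ((200, 150), (200, 250), (400, 150)):
    cut = amplicon_len // 2  # cut site centered, as recommended in Step 1
    print(
        f"{amplicon_len} bp amplicon, PE{read_len}: "
        f"overlap {read_overlap(amplicon_len, read_len)} bp, "
        f"cut site covered twice: {cut_site_in_overlap(cut, amplicon_len, read_len)}"
    )
```

A 200 bp amplicon on PE150 gives a 100 bp overlap centered on the cut site, while a 400 bp amplicon on the same run gives none.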

Sequencing Depth

How many reads do you need per sample? It depends on your goal:

  • Standard knockout confirmation: 10,000 to 30,000 reads per sample. This gives you plenty of statistical power to quantify editing down to 1%.
  • Homology-Directed Repair (HDR) validation: 50,000 to 100,000 reads. HDR is often rare, and deep sequencing ensures accurate quantification.
  • Off-target detection / Rare variant search: 100,000+ reads.
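
To get a feel for these depth ranges, you can ask how likely it is that an allele at a given frequency produces at least some minimum number of supporting reads. The sketch below treats read sampling as a simple binomial draw, which ignores PCR bias and sequencing errors. The threshold of 50 supporting reads is an arbitrary value chosen for illustration, not a standard.

```python
import math

def prob_at_least(k, n_reads, freq):
    """P(X >= k) for X ~ Binomial(n_reads, freq), summed in log space."""
    if not 0 < freq < 1:
        raise ValueError("freq must be strictly between 0 and 1")
    log_p, log_q = math.log(freq), math.log1p(-freq)
    below = 0.0
    for i in range(min(k, n_reads + 1)):
        log_pmf = (
            math.lgamma(n_reads + 1) - math.lgamma(i + 1) - math.lgamma(n_reads - i + 1)
            + i * log_p + (n_reads - i) * log_q
        )
        below += math.exp(log_pmf)
    return max(0.0, 1.0 - below)

min_supporting_reads = 50  # arbitrary illustrative threshold
for depth in (10_000, 30_000, 100_000):
    for freq in (0.01, 0.001):
        p = prob_at_least(min_supporting_reads, depth, freq)
        print(f"depth {depth:>7,}, allele at {freq:.1%}: "
              f"expected {depth * freq:.0f} reads, P(>= {min_supporting_reads}) = {p:.3f}")
```

The pattern to look for is that a 1% allele is comfortably sampled at 10,000 reads, whereas a 0.1% allele only yields an expected 10 reads at that depth, which is why rare-variant searches need the deeper end of the range.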

The Indispensable Control: Untreated Cells

Never run a CRISPR NGS experiment without sequencing an untreated control sample: cells that went through the same transfection process (a mock transfection) but did not receive the gRNA/Cas9.

Why?

  1. Natural SNPs: Your cells may contain natural single-nucleotide polymorphisms (SNPs) or small indels that differ from the reference genome. An untreated control identifies these so you don't mistake them for editing.
  2. Sequencing Errors: Illumina sequencing has a baseline error rate (~0.1%), and most of those errors are substitutions. Spurious insertions or deletions can still appear, particularly in homopolymer runs (e.g. a string of six 'A's), where polymerase slippage during PCR is a common source. Sequencing your control reveals these technical artifacts, allowing your analysis pipeline to subtract them.
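
As a rough illustration of what "subtracting" the control means, the sketch below compares per-allele frequencies between a treated sample and its control. The allele labels and read counts are invented, and the plain frequency subtraction is our simplification; it is not how any particular analysis tool handles background.

```python
def subtract_background(treated, control, reference_label="unmodified"):
    """Return per-allele frequencies in `treated` minus those in `control`.

    `treated` and `control` map an allele label to its read count.
    The reference allele is skipped and results are floored at zero.
    """
    t_total = sum(treated.values())
    c_total = sum(control.values())
    if t_total == 0 or c_total == 0:
        raise ValueError("both samples need at least one read")
    corrected = {}
    for allele, count in treated.items():
        if allele == reference_label:
            continue
        background = control.get(allele, 0) / c_total
        corrected[allele] = max(0.0, count / t_total - background)
    return corrected

# Invented counts, for illustration only.
treated = {"unmodified": 5970, "1bp_del_at_cut": 2500,
           "1bp_ins_at_cut": 1500, "1bp_del_in_polyA": 30}
control = {"unmodified": 9972, "1bp_del_in_polyA": 28}

corrected = subtract_background(treated, control)
for allele, freq in corrected.items():
    print(f"{allele}: {freq:.2%}")
```

In this made-up example the homopolymer deletion almost disappears after correction, because it shows up at nearly the same frequency in the control.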

Summary: Designing Your Strategy

| Parameter | Recommended Choice | Rationale |
|---|---|---|
| Amplicon Size | 150 - 250 bp | Fits within standard read lengths; amplifies efficiently. |
| Primer Distance | ≥ 50 bp from the cut site | Avoids primer binding site destruction by large indels. |
| Protocol | 2-step PCR | Minimizes primer costs and allows flexible multiplexing. |
| Platform | Illumina MiSeq | High accuracy, flexible read lengths (PE150/PE250). |
| Read Mode | Paired-End | Direct read overlap at the cut site filters out sequencing errors. |
| Depth | 20,000 reads/sample | Balanced cost and statistical power for standard knockout screens. |

What's next?

Now that we have designed our amplicon sequencing library and understand the run parameters, how do we search for off-target editing in the lab? We can't amplify every site in the genome with primers.

In the next post, we compare the top laboratory-based off-target detection methods: GUIDE-seq, CIRCLE-seq, and Digenome-seq.