How CRISPR-Cas9 actually works: mechanism, PAM sites, and DNA repair

This is Arc 1, Part 1 of the CRISPR from Bench to Analysis series.

You've probably heard that CRISPR can edit any gene in any organism. That's basically true. But if you're about to design your first CRISPR experiment — or you're trying to figure out why your knockout didn't work — it helps to understand what's actually happening at the molecular level.

Not because the mechanism is complicated (it isn't, once it clicks), but because the biology directly determines your experimental design. Which repair pathway your cells use will decide whether you get a knockout, a precise edit, or nothing. Where you place your guide RNA affects both efficiency and off-target risk. Understanding the mechanism turns CRISPR from a black box into something you can reason about.

What you'll learn in this post

How the Cas9 protein finds and cuts its DNA target
What a PAM site is and why Cas9 can't work without one
The three main ways your cells repair the cut — and how each one affects your results
What to think about before you even open a gRNA design tool

The two-component system

CRISPR-Cas9 has two functional parts: the Cas9 protein (the scissors) and the guide RNA (the GPS).

The guide RNA, usually written as gRNA or sgRNA (single guide RNA), carries a short targeting sequence, typically 20 nucleotides long, that you design to match your target. Cas9 uses this guide to search the genome for the matching DNA. When it finds a match, it cuts.

This is what makes CRISPR so powerful and so practical. To edit a new target, you don't redesign the protein — you just change a 20-nucleotide RNA sequence. That's a $10 oligo synthesis, not months of protein engineering.

The 20-nucleotide part you design is called the spacer. The full gRNA also includes a constant scaffold sequence that Cas9 physically grabs onto, but most design tools handle that automatically, so you only need to specify the spacer.

Diagram of the Cas9 protein structure and its interaction with the sgRNA and target DNA double helix, showing the REC and NUC lobes and their functional domains

Figure 1. Structure of SpCas9 and the Cas9–sgRNA–target DNA complex. The protein is split into two lobes: the REC lobe binds the guide RNA, while the NUC lobe contains the HNH and RuvC nuclease domains (which cut each DNA strand) and the PAM-interacting (PI) domain. The guide RNA threads through the protein to position the 20-nt spacer against the target strand. Adapted from Zhu Y (2022). Advances in CRISPR/Cas9. BioMed Research International, 2022:9978571. doi:10.1155/2022/9978571, under CC BY 4.0.

How Cas9 finds its target: PAM sites and R-loop formation

Cas9 doesn't compare your guide against every stretch of DNA it bumps into. It uses a two-step recognition process that's surprisingly elegant.

Step 1: PAM scanning

Before Cas9 checks whether your guide RNA matches the DNA, it first looks for a short sequence called a PAM site (Protospacer Adjacent Motif). For the most commonly used Cas9 — SpCas9, derived from Streptococcus pyogenes — the PAM sequence is NGG, where N is any nucleotide.

Cas9 scans along the DNA looking for NGG sequences. When it finds one, it pauses and checks whether the 20 nucleotides immediately upstream match your guide RNA. If they do, it cuts. If they don't, it moves on.

This means two things for your experimental design:

Your target must have an NGG PAM site nearby. No PAM, no cut. When you're picking where to target, you're constrained by where NGG sequences happen to fall in your gene of interest.
The PAM is not part of your guide sequence. It sits in the DNA immediately 3' (downstream) of the 20-nt target, on the non-target strand (the strand your guide does not base-pair with). When you're reading off-target prediction results, remember that a PAM has to be present at any off-target site too.
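To make the PAM constraint concrete, here is a minimal Python sketch that lists candidate SpCas9 sites by finding every NGG and taking the 20 nt immediately 5' of it, on both strands. The function names and the tuple layout are assumptions made for this example, not the interface of any design tool; real tools such as CRISPOR also score efficiency and off-targets.

```python
import re

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_spcas9_targets(seq: str, spacer_len: int = 20):
    """List (strand, protospacer, pam, pam_start) for every NGG with room for a spacer.

    pam_start is the 0-based index of the PAM's first base on the strand that
    was scanned; for "-" it indexes into the reverse complement.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead lets overlapping PAMs (for example inside "GGGG") all match.
        for m in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = m.start()
            if pam_start >= spacer_len:
                protospacer = s[pam_start - spacer_len:pam_start]
                hits.append((strand, protospacer, m.group(1), pam_start))
    return hits

if __name__ == "__main__":
    example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"  # made-up sequence
    for hit in find_spcas9_targets(example):
        print(hit)
```

Positions reported for the "-" strand refer to the reverse complement, so convert them before comparing with forward-strand coordinates.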

Step 2: R-loop formation and mismatch tolerance

Once Cas9 finds a PAM, it unwinds the DNA locally and starts checking whether the guide RNA matches the adjacent sequence. This process — where the guide RNA displaces one DNA strand and base-pairs with the other — is called R-loop formation.

Cas9 checks the 20-nucleotide match starting from the PAM-proximal end (the 3' end of your spacer, the end closest to the NGG). The 10–12 nucleotides closest to the PAM are called the seed region, and mismatches here usually abolish or sharply reduce cutting. Mismatches farther from the PAM (the PAM-distal end) are better tolerated.

This is why off-target sites aren't random. They're sequences that match your guide well — especially in the seed region — and happen to have an NGG nearby.
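A rough way to reason about a predicted off-target site is to ask where its mismatches fall. The sketch below assumes a 12-nt seed (the upper end of the 10–12 nt range above) and simply counts mismatches in each region. It is an illustration with a hypothetical function name, not a scoring model like the ones off-target prediction tools use.

```python
def mismatch_profile(spacer: str, site: str, seed_len: int = 12) -> dict:
    """Count mismatches in the PAM-proximal seed and in the PAM-distal region.

    Both sequences are written 5'->3' with the PAM following the 3' end,
    so the seed is the last `seed_len` positions.
    """
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have the same length")
    mismatches = [
        i for i, (a, b) in enumerate(zip(spacer.upper(), site.upper())) if a != b
    ]
    seed_start = len(spacer) - seed_len
    seed = sum(1 for i in mismatches if i >= seed_start)
    return {"seed": seed, "distal": len(mismatches) - seed}

if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"
    candidate = "TCGTACGTACGTACGTACGA"  # made-up site: one distal, one seed mismatch
    print(mismatch_profile(guide, candidate))
```

A site with mismatches only in the distal count deserves more attention than one with the same number of mismatches inside the seed.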

The cut: two nuclease domains, one blunt break

Once Cas9 confirms a full match, it cuts both strands of the DNA. It does this using two nuclease domains within the same protein:

HNH domain — cuts the strand complementary to your guide RNA (the strand your guide base-pairs with)
RuvC domain — cuts the non-complementary strand (the strand that was displaced during R-loop formation)

Both cuts happen close together, typically between positions 3 and 4 upstream of the PAM. The result is a blunt-ended double-strand break (DSB) — both strands severed at roughly the same position.
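The cut position follows directly from the PAM position, which is handy when you later align sequencing reads to the expected break. A minimal sketch, assuming pam_start is the 0-based index of the N in NGG on the strand you are reading, and assuming the idealized blunt cut exactly 3 bp upstream of the PAM described above (the function name is made up for this example):

```python
def split_at_cut(seq: str, pam_start: int, offset: int = 3):
    """Split a sequence at the expected SpCas9 cut, `offset` bp 5' of the PAM.

    Returns (left, right); the last `offset` bases of the protospacer stay
    attached to the PAM on the right-hand fragment.
    """
    cut = pam_start - offset
    if cut < 0 or pam_start > len(seq):
        raise ValueError("pam_start is out of range for this sequence")
    return seq[:cut], seq[cut:]

if __name__ == "__main__":
    example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"  # made-up sequence, PAM at index 20
    left, right = split_at_cut(example, pam_start=20)
    print(left, "|", right)
```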

A double-strand break is one of the most serious forms of DNA damage a cell can experience. Your cells have evolved multiple mechanisms to repair it as quickly as possible. Which mechanism kicks in is not entirely in your control — and it's the most important variable in your experiment.

What happens after the cut: DNA repair pathways

This is where things get biologically interesting, and where a lot of CRISPR experiments go wrong.

Non-homologous end joining (NHEJ)

NHEJ is the cell's default repair pathway for double-strand breaks. It's fast, it's available in all cell types and cell cycle phases, and it's imprecise.

NHEJ ligates the broken ends back together, but it often introduces small insertions or deletions (indels) at the cut site. These indels vary in size (typically 1–10 bp) and in sequence from one allele to the next. If your cut site is in a coding exon, an indel that shifts the reading frame usually brings a premature stop codon into frame downstream, and that's a functional knockout.

NHEJ is your pathway of choice for gene knockouts. It requires no template, it works in dividing and non-dividing cells, and it's the dominant pathway in most mammalian cell types.

One important caveat: not every indel gives you a knockout. In-frame indels (multiples of 3 bp) can leave the reading frame intact. If you're assessing knockout efficiency, you need to check the actual indel spectrum — not just whether a cut occurred. This is something Sanger sequencing + TIDE analysis (covered in Arc 2) can tell you.
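The in-frame versus frameshift distinction is simple arithmetic on indel size. The sketch below assumes you already have an indel spectrum as a dict mapping net size change in bp (negative for deletions, 0 for unedited) to a count or percentage. That input format and the function names are assumptions for illustration, not the output format of any particular analysis tool.

```python
def classify_indel(size: int) -> str:
    """Label an indel by its net length change in bp."""
    if size == 0:
        return "unedited"
    # Python's % returns a non-negative result for a positive modulus,
    # so deletions (negative sizes) are handled the same way as insertions.
    return "in-frame" if size % 3 == 0 else "frameshift"

def frameshift_fraction(indel_counts: dict) -> float:
    """Fraction of edited alleles whose indel shifts the reading frame."""
    edited = sum(n for size, n in indel_counts.items() if size != 0)
    if edited == 0:
        return 0.0
    shifted = sum(
        n for size, n in indel_counts.items() if classify_indel(size) == "frameshift"
    )
    return shifted / edited

if __name__ == "__main__":
    spectrum = {0: 50, -1: 20, 1: 10, -3: 15, -6: 5}  # made-up numbers
    print(frameshift_fraction(spectrum))
```

A frameshift fraction well below your overall editing rate is a hint that a good share of edited alleles may still encode a near-complete protein.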

Homology-directed repair (HDR)

HDR uses a DNA template to repair the break — either the sister chromatid or an exogenous template you provide. When you supply a donor template (a plasmid or single-stranded oligonucleotide) with homology arms flanking the cut site, HDR can incorporate your desired sequence with high precision.

HDR is your pathway for precise edits: point mutations, small insertions, epitope tags, fluorescent protein knock-ins.
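Structurally, a donor template is just your new sequence flanked by the genomic sequence on either side of the cut. The sketch below builds one for a simple insertion, assuming the insert goes exactly at the cut and the arms are symmetric. The function name and the default arm length are placeholders for illustration, not a design recommendation.

```python
def build_insertion_donor(genome: str, cut: int, insert: str, arm_len: int = 40) -> str:
    """Return left homology arm + insert + right homology arm.

    `cut` is the 0-based index of the first base 3' of the break in `genome`.
    """
    if cut - arm_len < 0 or cut + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = genome[cut - arm_len:cut]
    right_arm = genome[cut:cut + arm_len]
    return left_arm + insert + right_arm

if __name__ == "__main__":
    locus = "AAAAACCCCC"  # made-up sequence, cut between the A run and the C run
    print(build_insertion_donor(locus, cut=5, insert="GG", arm_len=3))
```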

The catch: HDR is far less efficient than NHEJ in most mammalian cell types (typically 1–10% of alleles edited vs. 30–80% for NHEJ knockouts). And HDR requires cells to be in S or G2 phase of the cell cycle — it's essentially unavailable in post-mitotic cells like neurons.

If you're trying to do HDR in a primary non-dividing cell type, you're working against the biology. This is one of the main reasons base editing and prime editing (covered in Posts 3 and 4) were developed — they can make precise edits without relying on HDR.

Microhomology-mediated end joining (MMEJ)

MMEJ is a third pathway that's worth knowing about. It uses short regions of microhomology (2–25 bp) flanking the break site to guide repair, which typically results in deletions of the sequence between the microhomology sequences.

MMEJ deletions are more predictable than NHEJ indels. You can predict likely MMEJ outcomes from the sequence context around your cut site using tools like inDelphi or FORECasT. This makes MMEJ useful if you need a specific deletion rather than a random one.
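To see why sequence context makes MMEJ outcomes predictable, you can enumerate the microhomology pairs that flank a cut. The sketch below is a naive enumeration written for this post, not the model used by the prediction tools mentioned above. The function name, the search window, and the microhomology length limits are all assumptions, and it does not estimate how frequent each outcome would be.

```python
def mmej_candidates(seq: str, cut: int, min_mh: int = 2, max_mh: int = 6, window: int = 15):
    """Enumerate deletions that microhomology-mediated repair could produce.

    `cut` is the 0-based index of the first base 3' of the break. A candidate
    needs the same k-mer wholly left of the cut and wholly right of it, both
    within `window` bp; the repaired product keeps one copy and loses the
    other copy plus everything in between.
    Returns (microhomology, deletion_length, repaired_sequence) tuples,
    longest microhomology first.
    """
    seq = seq.upper()
    left_lo = max(0, cut - window)
    right_hi = min(len(seq), cut + window)
    by_product = {}
    for k in range(max_mh, min_mh - 1, -1):
        for i in range(left_lo, cut - k + 1):
            mh = seq[i:i + k]
            for j in range(cut, right_hi - k + 1):
                if seq[j:j + k] == mh:
                    product = seq[:i + k] + seq[j + k:]
                    # Sub-matches of a longer microhomology give the same
                    # product, so keep only the first (longest) one found.
                    by_product.setdefault(product, (mh, j - i, product))
    return list(by_product.values())

if __name__ == "__main__":
    locus = "TTTTGCATAACCGCATGGGG"  # made-up sequence, GCAT repeated around the cut
    for candidate in mmej_candidates(locus, cut=10):
        print(candidate)
```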

Schematic comparing NHEJ and HDR repair pathways after a CRISPR-Cas9 double-strand break, showing indel formation via NHEJ and precise sequence insertion via HDR with a donor template

Figure 2. The two dominant repair outcomes after a Cas9-induced double-strand break. NHEJ (left) rapidly ligates the broken ends with frequent small insertions or deletions, the basis for gene knockout. HDR (right) incorporates a supplied donor template to introduce a precise sequence change, the basis for knock-ins and point mutations, but restricted to dividing cells in S/G2 phase. Adapted from Zhu Y (2022). Advances in CRISPR/Cas9. BioMed Research International, 2022:9978571. doi:10.1155/2022/9978571, under CC BY 4.0.

What this means for your experiment

Before you open a gRNA design tool, ask yourself two questions:

1. What outcome do I want?

Gene knockout → design for NHEJ, maximize cutting efficiency
Precise edit (point mutation, tag insertion) → design for HDR, choose a cell type that divides
Defined deletion → consider MMEJ-based approaches
Base conversion without a DSB → consider base editing (Post 3)
Complex precise edit in a non-dividing cell → consider prime editing (Post 4)

2. What cell type am I working in?

Dividing cells (HEK293, iPSCs, primary T cells in culture): both NHEJ and HDR available
Non-dividing cells (neurons, hepatocytes in vivo): NHEJ available, HDR essentially not

The answers to these questions determine which CRISPR system to use — something we'll build into a proper decision framework in Post 9 (Arc 1 capstone).
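The two questions above can be written down as a small lookup. This is only a restatement of the bullets in this section, with hypothetical outcome labels chosen for the example. It is not the decision framework from Post 9, and real choices depend on more than these two inputs.

```python
def suggest_approach(outcome: str, dividing: bool) -> str:
    """Map a desired outcome and cell state to the approach suggested in this post.

    `outcome` is one of: "knockout", "precise_edit", "defined_deletion",
    "base_conversion" (labels invented for this sketch).
    """
    if outcome == "knockout":
        return "Cas9 + NHEJ"  # NHEJ is available in dividing and non-dividing cells
    if outcome == "precise_edit":
        # HDR needs S/G2 phase, so it is essentially unavailable without division.
        return "Cas9 + HDR donor" if dividing else "prime editing"
    if outcome == "defined_deletion":
        return "MMEJ-based approach"
    if outcome == "base_conversion":
        return "base editing"
    raise ValueError(f"unknown outcome: {outcome!r}")

if __name__ == "__main__":
    print(suggest_approach("precise_edit", dividing=False))
```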

Common mistakes

Assuming any indel gives a knockout. It doesn't. In-frame indels leave the reading frame intact. Always sequence your edited clones and check whether the protein is actually disrupted — Western blot or functional assay, not just a PCR band.

Targeting the last exon for knockouts. A premature stop codon in the last exon typically escapes nonsense-mediated decay, so the cell can still make a truncated protein that keeps most of its sequence and possibly its function. Target an early, constitutively spliced exon.

Expecting high HDR efficiency in a non-dividing cell type. If you're working with primary neurons, cardiomyocytes, or any other post-mitotic cell, HDR rates will be very low. This is biology, not a protocol failure.

Honest caveats

This post describes SpCas9 — the most widely used Cas9 variant, originally from S. pyogenes. Other Cas proteins (SaCas9, AsCas12a, LbCas12a) have different PAM requirements, different cut geometries, and different properties. We cover Cas12a in Post 2.

The repair pathway your cells actually use depends on cell type, cell cycle, chromatin state, and the specific DNA sequence context around the break. The percentages cited here are rough approximations. Actual efficiency in your system may be quite different.

There are also additional repair pathways, such as single-strand annealing, that we haven't covered here. (What the literature calls alternative end-joining largely overlaps with the MMEJ described above.) They're less commonly exploited experimentally, but they exist and can produce unexpected outcomes.

What's next

Post 2 covers Cas12a (Cpf1) — a different Cas protein that processes its own guide RNAs, cuts in a staggered fashion, and has a T-rich PAM requirement that opens up genomic targets Cas9 can't easily reach. If you're ever choosing between Cas9 and Cas12a for an experiment, Post 2 will tell you exactly when each one has the edge.

→ Read Post 2: Cas12a (Cpf1) vs Cas9: which one should you use? (coming February 26)

Want the full protocol details, decision flowcharts, and troubleshooting guide that go with this post? They're in the book CRISPR from Bench to Analysis — including a complete breakdown of how to predict repair outcomes from your cut site sequence.

What aspect of the Cas9 mechanism surprised you most? Or is there something here you'd like me to go deeper on? Drop a comment below.

Resources

Resource	What it's for	Link
CRISPOR	gRNA design + off-target prediction	crispor.tefor.net
CHOPCHOP	gRNA design, supports Cas9 + Cas12a	chopchop.cbu.uib.no
inDelphi	Predict NHEJ/MMEJ repair outcomes	indelphi.giffordlab.mit.edu
FORECasT	Predict repair outcomes from cut context	partslab.sanger.ac.uk/FORECasT
Addgene CRISPR guide	Protocol reference and troubleshooting	addgene.org/guides/crispr
Jinek et al. 2012	Original Cas9 biochemistry paper	Science, 337:816–821
Doudna & Charpentier 2014	Review: CRISPR-Cas9 mechanism and uses	Science, 346:1258096
Zhu Y. 2022	Review: Cas9 structure and repair pathways (source of Figures 1–2)	BioMed Res Int, 2022:9978571