
In silico off-target prediction: how the algorithms work and which tools to trust

Author: BioTech Bench

This is Arc 1, Part 8 of the CRISPR from Bench to Analysis series.


You’ve found a perfect PAM site. The GC content is exactly 50%. The secondary structure looks clean. It’s the "perfect" guide RNA, at least on paper.

But there’s a problem. The haploid human genome is roughly 3 billion base pairs long. An exact copy of your 20-nucleotide spacer elsewhere is unlikely by chance, but near-matches with a few mismatches are common, and Cas9 tolerates some of them. If Cas9 cuts at one of those sites, you might accidentally knock out a tumor suppressor or cause a chromosomal translocation.
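To put numbers on "near-matches are common", here is a back-of-the-envelope sketch. It assumes a uniformly random 3.1 Gb genome, both strands, a 20-nt spacer and an NGG PAM only. Real genomes are not random (repeats make things worse), and the helper names `neighbors_within` and `expected_sites` are hypothetical, chosen for illustration.

```python
from math import comb

# Assumptions: ~3.1e9 bp haploid genome, uniformly random sequence (real genomes
# are not: repeats make things worse), 20-nt spacer, NGG PAM only.
GENOME_BP = 3.1e9
SPACER_LEN = 20
PAM_PROB = 1 / 16  # P(random dinucleotide == "GG")

def neighbors_within(k, n=SPACER_LEN):
    """Count distinct n-mers with at most k mismatches to a fixed n-mer."""
    return sum(comb(n, i) * 3**i for i in range(k + 1))

def expected_sites(k):
    """Expected NGG-adjacent sites within k mismatches in a random genome."""
    candidate_sites = 2 * GENOME_BP * PAM_PROB  # both strands
    return candidate_sites * neighbors_within(k) / 4**SPACER_LEN

for k in range(5):
    print(f"<= {k} mismatches: {expected_sites(k):.4g} expected sites")
```

Under these assumptions an exact duplicate is unlikely (about 0.0004 expected sites), but you should expect roughly 11 sites with up to three mismatches and about 150 with up to four.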

In the early days of CRISPR, we were flying blind. Today, we have powerful in silico off-target prediction tools. This post explains the math behind the scores and which tools you should actually trust for your next experiment.

What you'll learn

  • Why off-targets happen: The "Wobble" and mismatch tolerance
  • Scoring 101: Understanding the MIT Score vs. the CFD Score
  • The "Seed" effect: Why mismatches at the PAM-distal end matter less
  • Top Tools: Cas-OFFinder, CRISPOR, and Benchling compared
  • Practical strategy: How to filter hundreds of potential off-targets down to the 3 you actually need to check in the lab

Why Cas9 is "sloppy"

Cas9 doesn't require a perfect match to cut. It’s remarkably tolerant of mismatches, especially those located far away from the PAM.

The 10-12 nucleotides immediately next to the PAM are called the seed region. A mismatch here usually blocks cutting or sharply reduces it. The PAM-distal end is a different story. With the usual 5'→3' numbering, where position 20 sits right next to the PAM, a mismatch at position 1 or 2 (the "distal" end) often barely matters: Cas9 will still bind, it will still cut, and you will have an off-target.
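Position numbering trips a lot of people up, so here is a small sketch that reports where mismatches fall. It assumes 5'→3' numbering (position 20 adjacent to the PAM) and a 12-nt seed; some papers use 10. The function names are hypothetical.

```python
SEED_LEN = 12  # assumption: the 12 PAM-proximal nt; some papers use 10

def mismatch_positions(spacer, site):
    """1-based positions (5'->3', position 20 next to the PAM) where site differs."""
    return [i + 1 for i, (a, b) in enumerate(zip(spacer, site)) if a != b]

def classify(spacer, site, seed_len=SEED_LEN):
    """Split mismatch positions into PAM-proximal (seed) and PAM-distal groups."""
    positions = mismatch_positions(spacer, site)
    cutoff = len(spacer) - seed_len  # positions above this fall in the seed
    return {
        "seed": [p for p in positions if p > cutoff],
        "distal": [p for p in positions if p <= cutoff],
    }

# Two mismatches at the 5' end: both land in the tolerant, PAM-distal group.
print(classify("GAGTCCGAGCAGAAGAAGAA", "CTGTCCGAGCAGAAGAAGAA"))
```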

MIT vs. CFD: Decoding the Scores

When you run a guide through a design tool, you’ll see two main numbers.

1. The MIT Score (The Classic)

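Developed by the Zhang lab at MIT (Hsu et al., 2013), this was the first widely used scoring system. It’s built on a simple, position-based model: each potential off-target gets a hit score that drops as it accumulates more mismatches and as those mismatches sit closer to the PAM (a lower hit score means the site is less likely to be cut). The hit scores of all off-targets are then rolled up into a single guide-level specificity score.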

  • 0-100 scale: The guide-level score runs from 0 to 100, and higher is better (fewer or weaker predicted off-targets).
  • Limit: It doesn't account for the type of mismatch (e.g., a G-U wobble is less disruptive than a C-A mismatch).
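Here is a minimal sketch of the MIT hit score and guide score, modeled on the Hsu et al. (2013) formula as it is reproduced in open-source tools such as CRISPOR. Treat the weight list as something to verify against CRISPOR's source before relying on it. The distance term here uses the mean gap between adjacent mismatches, and the function names are hypothetical.

```python
# Position weights (index 0 = PAM-distal position 1, index 19 = next to the PAM),
# as reproduced in open-source implementations; verify before real use.
MIT_WEIGHTS = [0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
               0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583]

def mit_hit_score(guide, site):
    """Hit score (0-100) for one off-target; both args are 20-nt spacers, 5'->3'."""
    mm = [i for i, (a, b) in enumerate(zip(guide, site)) if a != b]
    # Term 1: position-specific penalties, larger near the PAM.
    t1 = 1.0
    for i in mm:
        t1 *= 1 - MIT_WEIGHTS[i]
    # Term 2: clustered mismatches (small mean gap) are penalized more.
    if len(mm) < 2:
        t2 = 1.0
    else:
        mean_gap = (mm[-1] - mm[0]) / (len(mm) - 1)  # mean of adjacent gaps
        t2 = 1.0 / (((19 - mean_gap) / 19) * 4 + 1)
    # Term 3: penalty that grows with the square of the mismatch count.
    t3 = 1.0 / len(mm) ** 2 if mm else 1.0
    return 100 * t1 * t2 * t3

def mit_guide_score(guide, off_target_sites):
    """Guide-level 0-100 specificity score; pass off-targets only (no on-target)."""
    total = sum(mit_hit_score(guide, s) for s in off_target_sites)
    return 100 * 100 / (100 + total)
```

Notice that a single mismatch at position 1 keeps the full hit score of 100 (its weight is 0), while one at position 20 drops it to about 42. That is the seed effect, expressed purely by position; the identity of the mismatched bases never enters the formula.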

2. The CFD Score (The Gold Standard)

The Cutting Frequency Determination (CFD) score was developed by the Doench lab (Broad Institute). Its weights come from a large screen of guide variants carrying every possible single mismatch at every position, plus alternative PAMs, so unlike the MIT score it assigns a separate penalty to each mismatch type at each position.

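  • Why it’s better: Like the MIT score, it knows that a mismatch at position 18 (in the seed) usually hurts cutting more than one at position 5. Unlike the MIT score, it also knows that different mismatches at the same position can have very different effects.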
  • Interpretation: Each off-target site gets a score between 0 and 1, where 1.0 means a perfect match next to an NGG PAM. A common rule of thumb treats sites with CFD > 0.2 as "high risk."
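Structurally, a CFD score is a product: one "activity retained" factor per mismatch (looked up by position and mismatch type) times a factor for the PAM. The sketch below shows only that structure. The numbers in both lookup tables are made-up placeholders, not the published Doench et al. (2016) values, and the table layout and function name are hypothetical.

```python
from math import prod

# PLACEHOLDER values for illustration only. The real CFD tables hold a factor for
# every mismatch type at each of the 20 positions and for every PAM dinucleotide.
DEMO_MISMATCH_SCORES = {
    # (position 1-20, guide base, off-target base): fraction of activity retained
    (1, "G", "A"): 0.9,   # made-up: PAM-distal mismatch, mild penalty
    (19, "A", "C"): 0.1,  # made-up: seed mismatch, strong penalty
}
DEMO_PAM_SCORES = {"GG": 1.0, "AG": 0.3}  # made-up NGG vs NAG factors

def cfd_score(guide, site, pam,
              mm_scores=DEMO_MISMATCH_SCORES, pam_scores=DEMO_PAM_SCORES):
    """Multiply one factor per mismatch by the factor for the PAM's last two bases."""
    factors = [
        mm_scores[(i + 1, g, s)]  # KeyError if the demo table lacks this mismatch
        for i, (g, s) in enumerate(zip(guide, site))
        if g != s
    ]
    return prod(factors) * pam_scores[pam[1:]]
```

Because the factors multiply, two mild mismatches can still add up to a low score, and a weak PAM such as NAG scales everything down.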

Which Tools Should You Use?

1. Cas-OFFinder (The Power User Choice)

If you want to find every possible off-target, including those with 5+ mismatches or DNA/RNA bulges, this is the tool. It’s fast because it is written in OpenCL and can run on GPUs as well as CPUs.
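What Cas-OFFinder does at genome scale is, conceptually, an exhaustive mismatch search next to every PAM. This toy brute-force version is not Cas-OFFinder's code or interface. It assumes an NGG PAM only (real searches often allow NAG too), ignores bulges, and uses hypothetical function names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def scan(genome, spacer, max_mm):
    """Return (strand, start, protospacer+PAM, mismatches) for every NGG site
    whose protospacer has at most max_mm mismatches to the spacer."""
    n, length = len(spacer), len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - n - 2):
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":  # NGG only; NAG and other weak PAMs are skipped
                continue
            site = seq[i:i + n]
            mm = sum(a != b for a, b in zip(spacer, site))
            if mm <= max_mm:
                # Report the leftmost + strand coordinate of the 23-nt site.
                start = i if strand == "+" else length - i - n - 3
                hits.append((strand, start, site + pam, mm))
    return hits
```

This loop is far too slow for 3 Gb, which is exactly why dedicated tools parallelize the search.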

2. CRISPOR (The Best for Bench Biologists)

CRISPOR is my personal favorite. It pulls together off-target specificity scores (MIT, CFD) and on-target efficiency scores (such as Rule Set 2) and presents them in a clear, color-coded table.

  • Best for: Standard Cas9/Cas12a design in common organisms (human, mouse, zebrafish).
  • Link: crispor.tefor.net

3. Benchling (The Integrated Choice)

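If your lab already uses Benchling for plasmid maps, their built-in CRISPR tool is excellent. It scores off-targets alongside on-target activity and handles the visual mapping of off-targets onto your genome beautifully. Check which off-target model (MIT or CFD) it reports before comparing its numbers with CRISPOR’s.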


The "Rule of 3" for the Lab

You run a search and get a list of 500 potential off-targets. You can’t deep-sequence all of them. Here is how I filter them for validation:

  1. Deprioritize intergenic/intronic hits: If the off-target sits in the middle of a "gene desert" or deep inside an intron, it’s low priority (though introns can still carry splice sites or regulatory elements).
  2. Focus on exons: Concentrate on off-targets that hit an exonic region of a known gene.
  3. The top 3: Take the 3 highest-scoring (highest CFD) exonic off-targets. If your guide doesn’t cut those, it’s much less likely to cut the hundreds of lower-ranked sites. That is strong evidence, not proof.
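The three steps above can be sketched as a short filter. This assumes you have already annotated each hit with a region label (exon, intron or intergenic) from your genome annotation. The `OffTarget` record, its fields and the site names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class OffTarget:
    locus: str   # hypothetical label for the site
    cfd: float   # CFD score of the site (0-1)
    region: str  # "exon", "intron" or "intergenic", from your annotation

def pick_for_validation(hits, n=3):
    """Drop non-exonic hits, then return the n exonic hits with the highest CFD."""
    exonic = [h for h in hits if h.region == "exon"]
    return sorted(exonic, key=lambda h: h.cfd, reverse=True)[:n]

hits = [
    OffTarget("siteA", 0.90, "intron"),
    OffTarget("siteB", 0.40, "exon"),
    OffTarget("siteC", 0.70, "exon"),
    OffTarget("siteD", 0.10, "exon"),
    OffTarget("siteE", 0.30, "exon"),
]
print([h.locus for h in pick_for_validation(hits)])
```

The high-scoring intronic siteA is dropped, which reflects a deliberate prioritization choice. If your edit could plausibly disrupt splicing, you may want to keep intron-edge hits in the candidate pool.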

My Take

In silico prediction is a filter, not a guarantee. A high off-target score doesn’t mean the site will be cut, and a low score doesn’t mean cutting is impossible. Chromatin accessibility and epigenetic marks play a huge role that these algorithms can’t yet fully predict.

Always use CFD scores over MIT scores when available, and if your project is clinical or high-stakes, move beyond prediction to empirical discovery (like GUIDE-seq or CIRCLE-seq), which we’ll cover in CRISPR-14.


What's the scariest off-target a tool ever predicted for your guide? Drop the gene name below.

Resources

  • CRISPOR (crispor.tefor.net): Recommended design tool
  • Cas-OFFinder (rgenome.net): Comprehensive off-target search
  • Doench et al. (2016), Nature Biotechnology: The CFD scoring paper
  • Hsu et al. (2013), Nature Biotechnology: The original MIT scoring paper