In silico off-target prediction: how the algorithms work and which tools to trust

This is Arc 1, Part 8 of the CRISPR from Bench to Analysis series.

You’ve found a perfect PAM site. The GC content is exactly 50%. The secondary structure looks clean. It’s the "perfect" guide RNA—on paper.

But there’s a problem. The human genome is roughly 3 billion base pairs long. An exact second copy of your 20-nucleotide spacer is unlikely by chance, but sites that differ from it by only a few bases are common, and Cas9 can cut many of them. If it cuts at one, you might accidentally knock out a tumor suppressor or cause a chromosomal translocation.
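To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a random genome with equal base frequencies (real genomes are repetitive, so this undercounts), counts both strands, and ignores the PAM requirement. The function name is made up for this post.

```python
from math import comb

GENOME_BP = 3_000_000_000  # approximate human genome size used in this post
SPACER_LEN = 20

def expected_random_sites(max_mismatches: int) -> float:
    """Expected number of positions (both strands) within max_mismatches
    of a 20-nt spacer, in a random genome with equal base frequencies."""
    # A random 20-mer differs at exactly k positions with
    # probability C(20, k) * 3^k / 4^20; sum that over k = 0..max_mismatches.
    ways = sum(comb(SPACER_LEN, k) * 3**k for k in range(max_mismatches + 1))
    return 2 * GENOME_BP * ways / 4**SPACER_LEN

for k in range(5):
    print(f"<= {k} mismatches: ~{expected_random_sites(k):.3g} expected sites")
```

Under these assumptions an exact chance match is rare, but sites within three or four mismatches run into the hundreds or thousands. Requiring an NGG PAM next to each site would cut those counts by roughly 16-fold.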

In the early days of CRISPR, we were flying blind. Today, we have powerful in silico off-target prediction tools. This post explains the math behind the scores and which tools you should actually trust for your next experiment.

What you'll learn

Why off-targets happen: The "Wobble" and mismatch tolerance
Scoring 101: Understanding the MIT Score vs. the CFD Score
The "Seed" effect: Why mismatches at the PAM-distal end matter less
Top Tools: Cas-OFFinder, CRISPOR, and Benchling compared
Practical strategy: How to filter 100 potential off-targets down to the 3 you actually need to check in the lab

Why Cas9 is "sloppy"

Cas9 doesn't require a 100% perfect match to cut. It’s remarkably tolerant of mismatches, especially those located far away from the PAM.

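The 10-12 nucleotides closest to the PAM are called the seed region. If you have a mismatch there, Cas9 usually won't cut. But if the mismatch is at the PAM-distal end (position 19 or 20 when you count outward from the PAM, as this post does), Cas9 often doesn't care. It can still bind and cut, and you will have an off-target. Be aware that many tools and papers number the spacer the other way around, with position 20 next to the PAM.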
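Here is a small sketch of that seed/distal split. It assumes both sequences are written 5' to 3' with the PAM following the 3' end, numbers positions outward from the PAM (1 is the base next to the PAM), and uses a 12-nt seed, which is the upper end of the range above. The function names and sequences are made up for illustration.

```python
SEED_LEN = 12  # assumption: upper end of the 10-12 nt range quoted above

def mismatches_from_pam(guide: str, site: str) -> list[int]:
    """Return mismatch positions counted from the PAM (1 = base next to the PAM)."""
    if len(guide) != len(site):
        raise ValueError("guide and site must be the same length")
    n = len(guide)
    # Index i runs 5'->3'; the last index is PAM-adjacent, so position = n - i.
    pairs = zip(guide.upper(), site.upper())
    return sorted(n - i for i, (g, s) in enumerate(pairs) if g != s)

def split_seed_distal(positions: list[int]) -> tuple[list[int], list[int]]:
    """Separate mismatch positions into seed (PAM-proximal) and distal groups."""
    seed = [p for p in positions if p <= SEED_LEN]
    distal = [p for p in positions if p > SEED_LEN]
    return seed, distal

guide = "GACGTTACCGGATCAAGCTT"
site = "TACGTTACCGGATCAAGCAT"  # differs at the 5' base and 2 nt from the PAM
positions = mismatches_from_pam(guide, site)
print(split_seed_distal(positions))  # ([2], [20])
```

The position-2 mismatch sits in the seed and would usually protect this site. The position-20 mismatch on its own would not.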

MIT vs. CFD: Decoding the Scores

When you run a guide through a design tool, you’ll see two main numbers.

1. The MIT Score (The Classic)

Developed by the Zhang lab at MIT, this was the first widely used scoring system. It’s based on a simple heuristic: the more mismatches an off-target has, and the closer they are to the PAM, the lower the score (meaning less likely to cut).

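0-100 scale: For the guide-level specificity score, higher is better (fewer or weaker predicted off-targets). For an individual off-target site, a higher score means cutting there is more likely.
Limit: It weights mismatches by number and position, but it doesn't account for the type of mismatch (e.g., a G-U wobble is less disruptive than a C-A mismatch).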
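The 0-100 number reported for a whole guide is an aggregate over its individual off-target sites. Below is a minimal sketch of that aggregation, following the form used in open implementations such as CRISPOR as far as I understand it. The per-site scores are hypothetical inputs on a 0-100 scale, and this does not reproduce the published per-site position weights.

```python
def mit_guide_specificity(site_scores: list[float]) -> int:
    """Fold per-site off-target scores (each 0-100) into one 0-100 guide score.

    More sites, or sites with higher scores, push the guide score down.
    """
    total = sum(site_scores)
    return round(100 * 100 / (100 + total))

print(mit_guide_specificity([]))                # no off-targets found -> 100
print(mit_guide_specificity([12.5, 3.0, 0.8]))  # hypothetical per-site scores -> 86
```

The shape of the formula is the useful part: one strong off-target can cost as much as dozens of weak ones.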

2. The CFD Score (The Gold Standard)

The Cutting Frequency Determination (CFD) score was developed by the Doench lab (Broad Institute). It is built from experimental data on thousands of mismatched guides and assigns a weight to every possible mismatch type at every position.

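Why it’s better: It weights each mismatch by both position and base identity. It knows that a mismatch at position 5 is "heavier" than one at position 18, and also that some base pairings are tolerated better than others at the same position.
Interpretation: A score of 1.0 means a perfect match. As a rule of thumb, off-targets with a CFD > 0.2 are considered "high risk."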
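Structurally, a CFD-style score is just a product of lookup-table weights: one per mismatch, plus one for the PAM. The sketch below shows that structure only. Every weight in it is a placeholder I made up, not a value from the Doench et al. (2016) table, and positions are counted from the PAM as in this post.

```python
# Placeholder weights for illustration only. The real table has an entry for
# every mismatch type at every position.
# Keys: (guide base, target base it faces, position counted from the PAM).
TOY_MISMATCH_WEIGHTS = {
    ("G", "T", 18): 0.9,  # PAM-distal, well tolerated (made-up value)
    ("C", "C", 5): 0.1,   # seed region, poorly tolerated (made-up value)
}
TOY_PAM_WEIGHTS = {"GG": 1.0, "AG": 0.25}  # last two PAM bases (made-up values)

def cfd_like_score(mismatches, pam_suffix):
    """Multiply the PAM weight by the weight of each mismatch.

    Raises KeyError for a mismatch missing from the toy table.
    """
    score = TOY_PAM_WEIGHTS.get(pam_suffix, 0.0)
    for mismatch in mismatches:
        score *= TOY_MISMATCH_WEIGHTS[mismatch]
    return score

print(cfd_like_score([], "GG"))                 # perfect match -> 1.0
print(cfd_like_score([("G", "T", 18)], "GG"))   # one distal mismatch -> 0.9
print(cfd_like_score([("C", "C", 5)], "GG"))    # one seed mismatch -> 0.1
```

Because the weights multiply, a single poorly tolerated seed mismatch can drag a site below any risk threshold on its own.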

Which Tools Should You Use?

1. Cas-OFFinder (The Power User Choice)

If you want to find every possible off-target, including those with 5+ mismatches or DNA/RNA bulges, this is the tool. It's very fast because it is built on OpenCL and can run the search on a GPU.

Best for: Exhaustive searches and finding off-targets for non-standard PAMs.
Link: rgenome.net/cas-offinder
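To make "exhaustive search" concrete, here is a brute-force sketch of the underlying idea: slide along both strands, require an NGG PAM, and keep every site within a mismatch budget. This is not how Cas-OFFinder is implemented. It ignores bulges and non-NGG PAMs, and it would be far too slow for a real genome. All names and sequences are made up.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def scan_strand(seq: str, spacer: str, max_mm: int):
    """Yield (start, protospacer, pam, mismatches) for NGG sites on one strand."""
    n = len(spacer)
    for i in range(len(seq) - n - 2):  # leave room for the 3-nt PAM
        proto, pam = seq[i:i + n], seq[i + n:i + n + 3]
        if pam[1:] == "GG":
            mm = hamming(spacer, proto)
            if mm <= max_mm:
                yield i, proto, pam, mm

def find_off_targets(seq: str, spacer: str, max_mm: int = 3):
    """Search both strands; '-' hits use reverse-complement coordinates."""
    seq, spacer = seq.upper(), spacer.upper()
    hits = [("+", *h) for h in scan_strand(seq, spacer, max_mm)]
    hits += [("-", *h) for h in scan_strand(revcomp(seq), spacer, max_mm)]
    return hits

spacer = "GACGTTACCGGATCAAGCTT"
near_match = "GACGTTACCGGATCAAGCAT"  # one mismatch, 2 nt from the PAM
seq = "AAAA" + spacer + "TGG" + "CCCC" + near_match + "AGG" + "TTTT"
for hit in find_off_targets(seq, spacer):
    print(hit)
```

Real tools add indexing or massive parallelism on top of this idea, which is why they can afford larger mismatch budgets genome-wide.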

2. CRISPOR (The Best for Bench Biologists)

CRISPOR is my personal favorite. It pulls in scores from almost every published algorithm (MIT and CFD for off-targets, Rule Set 2 for on-target efficiency) and presents them in a clear, color-coded table.

Best for: Standard Cas9/Cas12a design in common organisms (Human, Mouse, Zebrafish).
Link: crispor.tefor.net

3. Benchling (The Integrated Choice)

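If your lab already uses Benchling for plasmid maps, their built-in CRISPR tool is convenient and handles the visual mapping of off-targets onto your genome beautifully. Check which score you are reading, though: Benchling's documentation attributes its off-target score to Hsu et al. (2013), the MIT-style score, rather than CFD, so its numbers are not directly comparable to CRISPOR's CFD values.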

The "Rule of 3" for the Lab

You run a search and get a list of 500 potential off-targets. You can't NGS-sequence all of them. Here is how I filter them for validation:

Exclude Intergenic/Intronic: If the off-target is in the middle of a "gene desert" or a non-functional intron, it’s low priority.
Focus on Exons: Only worry about off-targets that hit an exonic region of a known gene.
The Top 3: Take the 3 exonic off-targets with the highest CFD scores. If your guide doesn't cut those, it's much less likely to cut the hundreds of lower-ranked sites, though that isn't ruled out.
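The filter above can be sketched in a few lines. The record fields, loci, gene names, and scores are all hypothetical. In practice you would fill them in from your tool's export plus a gene annotation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OffTarget:
    locus: str
    cfd: float
    region: str                 # "exon", "intron" or "intergenic"
    gene: Optional[str] = None

def pick_validation_sites(candidates, top_n=3):
    """Keep exonic hits only, then return the top_n by CFD score."""
    exonic = [c for c in candidates if c.region == "exon"]
    return sorted(exonic, key=lambda c: c.cfd, reverse=True)[:top_n]

# Hypothetical candidates for illustration.
candidates = [
    OffTarget("chr1:1000", 0.62, "intron", "GENE_A"),
    OffTarget("chr2:2000", 0.41, "exon", "GENE_B"),
    OffTarget("chr3:3000", 0.35, "intergenic"),
    OffTarget("chr4:4000", 0.28, "exon", "GENE_C"),
    OffTarget("chr5:5000", 0.07, "exon", "GENE_D"),
    OffTarget("chr6:6000", 0.19, "exon", "GENE_E"),
]

shortlist = pick_validation_sites(candidates)
for site in shortlist:
    print(site.locus, site.gene, site.cfd)
```

Note that the highest-scoring site overall (the intronic one) is dropped by design. If an intron or intergenic hit overlaps something you care about, such as a known enhancer, add it back by hand.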

My Take

In silico prediction is a filter, not a guarantee. A high CFD score for a site doesn't mean Cas9 will cut there, and a low score doesn't mean it can't. Chromatin accessibility and epigenetic marks play a huge role that these algorithms can't yet fully predict.

Always use CFD scores over MIT scores when available, and if your project is clinical or high-stakes, move beyond prediction to empirical discovery (like GUIDE-seq or CIRCLE-seq), which we’ll cover in CRISPR-14.

What's the scariest off-target a tool ever predicted for your guide? Drop the gene name below.

Resources

Resource	Link	Notes
CRISPOR	crispor.tefor.net	Recommended design tool
Cas-OFFinder	rgenome.net/cas-offinder	Comprehensive off-target search
Doench et al. (2016)	Nature Biotechnology	The CFD scoring paper
Hsu et al. (2013)	Nature Biotechnology	The original MIT scoring paper