Multiple testing correction in R: what p.adjust is really doing to your p-values

By BioTech Bench

This is Arc 2, Part 10 of the R for Biologists series.


In Post 5, we looked at t-tests. You had one gene, two conditions, and you got a p-value of 0.04. You were happy. In the classical lab world, p < 0.05 is the magic threshold for "truth."

But bioinformatics is different. In an RNA-seq experiment, you aren't running one t-test. You're running 20,000 of them—one for every gene in the genome.

If you use the standard p < 0.05 threshold across 20,000 tests, you should expect roughly 1,000 genes (20,000 × 0.05) to look significant purely by random chance, even if nothing is truly differentially expressed. These are False Positives (Type I errors). If you publish those 1,000 genes, you are publishing noise.

This post explains how to fix this using Multiple Testing Correction in R.

What you'll learn

  • The "Jelly Bean" Problem: Why testing more genes leads to more errors
  • Bonferroni vs. Benjamini-Hochberg (FDR): Which one should you use?
  • How to use p.adjust() in R (the only function you need)
  • Interpreting the results: What an "Adjusted P-value" actually represents
  • Why we call them "q-values"

The Problem: The Multiple Testing Burden

Imagine I give you a fair coin. The odds of getting 5 heads in a row are low (~3%). If you do it once and get 5 heads, I might suspect the coin is rigged (p < 0.05).

But what if I give 100 people a fair coin and ask them all to try? Statistically, about 3 of those people will get 5 heads in a row just by luck. If I only report those 3 people, I’m "p-hacking"—I’m presenting random noise as a significant finding.

In genomics, we are the 100 people with the coins. Every gene is a new chance for luck to look like biology.
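You can watch this happen in R. Here is a minimal sketch (the group sizes and seed are arbitrary choices for illustration) that runs 20,000 t-tests where there is no real signal at all:

```r
set.seed(1)  # arbitrary seed, for reproducibility
n_genes <- 20000

# Each "gene": a t-test between two groups drawn from the SAME distribution,
# so every single p-value below 0.05 is a false positive.
null_pvals <- replicate(n_genes, t.test(rnorm(5), rnorm(5))$p.value)

sum(null_pvals < 0.05)  # typically close to 1,000 "significant" genes
```

About 5% of the tests come back "significant" even though nothing is real. That 5% is exactly what a p < 0.05 threshold promises per test, and with 20,000 tests it adds up fast.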


The Fix: p.adjust()

R has a built-in function to handle this: p.adjust(). It takes a vector of raw p-values and returns a vector of "adjusted" p-values.

Let’s simulate 10 p-values, where most are noise but two are actually significant:

raw_pvals <- c(0.001, 0.005, 0.06, 0.12, 0.45, 0.67, 0.89, 0.04, 0.03, 0.99)

1. Bonferroni (The Strict Parent)

The Bonferroni correction is the simplest: it multiplies every p-value by the number of tests (n), capping the result at 1.

  • Pros: Very low risk of false positives.
  • Cons: Extremely "conservative." It often throws out real biology because the bar is too high.
p.adjust(raw_pvals, method = "bonferroni")
[1] 0.01 0.05 0.60 1.00 1.00 1.00 1.00 0.40 0.30 1.00

Notice that our 0.04 and 0.03 values are now 0.40 and 0.30—no longer significant.
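Because the rule is just "multiply and cap," you can reproduce p.adjust()'s Bonferroni output by hand, which is a nice sanity check:

```r
raw_pvals <- c(0.001, 0.005, 0.06, 0.12, 0.45, 0.67, 0.89, 0.04, 0.03, 0.99)

# Bonferroni by hand: multiply each p-value by the number of tests, cap at 1
manual <- pmin(raw_pvals * length(raw_pvals), 1)

all.equal(manual, p.adjust(raw_pvals, method = "bonferroni"))  # TRUE
```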

2. Benjamini-Hochberg / FDR (The Modern Standard)

The False Discovery Rate (FDR) is what almost every bioinformatics paper uses. Instead of trying to eliminate all false positives, it controls the proportion of false positives among your significant results. The resulting adjusted p-values are often reported as "q-values" (strictly speaking, q-values come from Storey's related method, but in practice the terms are used almost interchangeably for FDR-adjusted values).

  • Interpretation: If you set an FDR of 0.05 and find 100 genes, you are accepting that ~5 of them might be noise.
p.adjust(raw_pvals, method = "BH") # BH = Benjamini-Hochberg
[1] 0.0100000 0.0250000 0.1200000 0.2000000 0.6428571 0.8375000 0.9888889
[8] 0.1000000 0.1000000 0.9900000
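Under the hood, BH ranks the p-values from smallest to largest, scales each one by n / rank, and then enforces monotonicity. A sketch of that same computation by hand:

```r
raw_pvals <- c(0.001, 0.005, 0.06, 0.12, 0.45, 0.67, 0.89, 0.04, 0.03, 0.99)
n <- length(raw_pvals)

o  <- order(raw_pvals)                # indices that sort p-values ascending
bh <- raw_pvals[o] * n / seq_len(n)   # scale each sorted p-value by n / rank
bh <- rev(cummin(rev(bh)))            # step-up: enforce monotone adjusted values
bh <- pmin(bh, 1)                     # cap at 1
manual <- bh[order(o)]                # put results back in the original order

all.equal(manual, p.adjust(raw_pvals, method = "BH"))  # TRUE
```

Notice the gentler penalty: the smallest p-value is multiplied by n/1 (same as Bonferroni), but each subsequent one is scaled by a smaller factor, which is why BH keeps far more real biology.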

When to use what?

| Method | Method in R | Best for... |
| --- | --- | --- |
| None | "none" | Single-gene experiments (qPCR). |
| Bonferroni | "bonferroni" | When a false positive is a disaster (e.g., clinical trials). |
| FDR (BH) | "BH" or "fdr" | RNA-seq, proteomics, GWAS. Any high-throughput screening. |

Real-world Example: Volcano Plots

In Post 12, we used adj.P.Val from the limma package to build a volcano plot. That adj.P.Val column was calculated using the Benjamini-Hochberg method.

If we had used raw p-values, our volcano plot would be "noisier," with many more dots above the significance line that don't actually replicate in a follow-up experiment.
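You can see the difference without limma. Here is a minimal sketch (sample sizes, effect size, and seed are arbitrary choices) mixing 2,000 null genes with 50 genes that genuinely differ between groups, then counting "hits" before and after BH correction:

```r
set.seed(42)  # arbitrary seed, for reproducibility

# 2,000 genes with no effect, plus 50 genes with a real shift between groups
pvals <- c(replicate(2000, t.test(rnorm(6), rnorm(6))$p.value),
           replicate(50,   t.test(rnorm(6), rnorm(6, mean = 4))$p.value))

sum(pvals < 0.05)                  # raw: the real hits buried among ~100 false positives
sum(p.adjust(pvals, "BH") < 0.05)  # adjusted: mostly just the 50 real effects survive
```

The raw count is inflated by roughly a hundred null genes that slipped under 0.05 by luck; the BH-adjusted count lands close to the 50 genes that actually changed.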


My Take

If you are analyzing genomic data, never report raw p-values. It's the fastest way to have a reviewer tear your analysis apart.

Think of FDR as a way to prioritize your work. It tells you which genes are the "safest bets" for your next Western blot or CRISPR knockout. If a gene is significant in raw p-values but disappears after FDR correction, it means the evidence isn't strong enough to survive the 20,000-test burden. Move on to the ones that survive.


Struggling to get your favorite gene to survive FDR correction? Drop the p-values below and let's look at the distribution.

Resources

| Resource | Link | Notes |
| --- | --- | --- |
| Noble (2009) | Nature Biotech | "How does multiple testing correction work?" |
| R Documentation | ?p.adjust | Detailed info on all available methods |
| False Discovery Rate | Wikipedia | The math behind Benjamini-Hochberg |