T-tests and ANOVA in R: lab stats without GraphPad Prism
By BioTech Bench
This is Arc 1, Part 5 of the R for Biologists series.
The GraphPad moment
You've got your ggplot2 figure. The dot plot shows IL-6 Ct values dropping from Control to LPS_1ng to LPS_10ng (lower Ct, higher expression). You can see the difference. But reviewers don't accept eyeballing — they want a p-value.
So you open GraphPad Prism. You paste your data, click Analyze, scroll through a menu of tests you half-remember from your stats course, pick "Unpaired t-test" because that sounds right, realize you have three groups so you need ANOVA, go back, click one-way ANOVA, wonder whether your data is parametric, check Normality under the options, click through four more dialogs, and eventually get a p-value that you copy into your manuscript.
Then you realize you analyzed the wrong column.
R gives you the same tests in one line. t.test() for two groups. aov() for three or more. The code is readable, reproducible, and version-controlled right alongside your data cleaning and figures. When a reviewer asks "how did you calculate this?" you send them the script.
By the end of this post, you'll run a t-test, a one-way ANOVA, Tukey HSD for all pairwise comparisons, and Dunnett's test for comparisons against control — all on the same qPCR data you've been working with.
What you'll learn
By the end of this post, you'll be able to:
- Choose the right test for your data (decision flowchart)
- Run a two-sample t-test with t.test() and interpret the output
- Run a one-way ANOVA with aov() for three or more groups
- Run Tukey HSD for all pairwise comparisons
- Run Dunnett's test when you only care about comparisons to control
- Check assumptions (normality, equal variance) and know what to do when they fail
Setup
Load the packages:
library(dplyr)
library(readr)
Now load the qPCR data and rebuild the IL-6 subset from previous posts. This makes the post self-contained:
data <- read_csv("qpcr_long.csv")
# Filter to IL-6, set group order
il6_data <- data |>
filter(gene == "IL6", ct_value < 35) |>
mutate(group = factor(group, levels = c("Control", "LPS_1ng", "LPS_10ng")))
il6_data
# A tibble: 18 × 8
sample_id group biological_rep technical_rep gene ct_value plate_id date
<chr> <fct> <dbl> <dbl> <chr> <dbl> <chr> <date>
1 S01_T1 Control 1 1 IL6 30.5 P01 2026-01-08
2 S01_T2 Control 1 2 IL6 31.1 P01 2026-01-08
3 S02_T1 Control 2 1 IL6 30.9 P01 2026-01-08
4 S02_T2 Control 2 2 IL6 31.2 P01 2026-01-08
5 S03_T1 Control 3 1 IL6 30.7 P01 2026-01-08
6 S03_T2 Control 3 2 IL6 31.0 P01 2026-01-08
7 S04_T1 LPS_1ng 1 1 IL6 27.0 P01 2026-01-08
8 S04_T2 LPS_1ng 1 2 IL6 27.5 P01 2026-01-08
9 S05_T1 LPS_1ng 2 1 IL6 27.2 P01 2026-01-08
10 S05_T2 LPS_1ng 2 2 IL6 27.6 P01 2026-01-08
11 S06_T1 LPS_1ng 3 1 IL6 27.1 P01 2026-01-08
12 S06_T2 LPS_1ng 3 2 IL6 27.4 P01 2026-01-08
13 S07_T1 LPS_10ng 1 1 IL6 23.7 P01 2026-01-08
14 S07_T2 LPS_10ng 1 2 IL6 24.1 P01 2026-01-08
15 S08_T1 LPS_10ng 2 1 IL6 23.8 P01 2026-01-08
16 S08_T2 LPS_10ng 2 2 IL6 24.2 P01 2026-01-08
17 S09_T1 LPS_10ng 3 1 IL6 23.6 P01 2026-01-08
18 S09_T2 LPS_10ng 3 2 IL6 24.0 P01 2026-01-08
18 rows: 3 groups, 3 biological replicates each, 2 technical replicates per bio rep. One row per measurement. This is the raw data — not the summarized means from the ggplot2 post. Statistical tests need the individual observations, not the summary statistics.
When to use what
Before running anything, you need to pick the right test. Here's the decision tree:
How many groups are you comparing?
├── 2 groups → t-test
│ ├── Same subjects measured twice? → Paired t-test
│ └── Different subjects? → Unpaired (two-sample) t-test
│
└── 3+ groups → ANOVA
└── Significant? → Post-hoc tests
├── All pairwise comparisons → Tukey HSD
└── Each treatment vs control only → Dunnett's test
That's it. Two groups, t-test. Three or more groups, ANOVA first, then post-hoc if significant.
Why not just run multiple t-tests?
You have three groups: Control, LPS_1ng, LPS_10ng. Why not run three t-tests — Control vs LPS_1ng, Control vs LPS_10ng, LPS_1ng vs LPS_10ng — and call it a day?
Because every time you run a test at α = 0.05, you accept a 5% chance of a false positive when there's no real effect. Run one test, 5% chance. Run three tests, and your overall false positive risk isn't 5% — it's:
1 - (1 - 0.05)^3 = 1 - 0.857 = 0.143
That's a 14.3% chance of at least one false positive. Run 10 comparisons and it climbs to 40%. This is the multiple comparisons problem, and it's why reviewers reject papers that run piles of t-tests without correction.
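The arithmetic is easy to check in R. A quick sketch — the fwer() helper is ours for illustration, not a base R function:

```r
# Family-wise error rate for k independent tests, each run at level alpha
fwer <- function(k, alpha = 0.05) 1 - (1 - alpha)^k

fwer(3)   # three pairwise t-tests -> ~0.143
fwer(10)  # ten comparisons       -> ~0.40
```

The formula assumes the tests are independent; correlated comparisons inflate the rate somewhat less, but the direction of the problem is the same.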
ANOVA solves this by asking a single question first: "Is there any difference among these groups?" If yes, you proceed to post-hoc tests that adjust for multiple comparisons. If no, you stop — no fishing expedition through pairwise comparisons.
Tukey HSD and Dunnett's test both control the family-wise error rate, keeping your overall α at 0.05 no matter how many comparisons you make. That's what makes them valid follow-ups to a significant ANOVA.
T-test: comparing two groups
Let's start simple. We'll compare IL-6 Ct values between Control and LPS_10ng only — two groups, so a t-test is the right tool.
First, subset the data:
# Subset to just two groups
il6_two <- il6_data |>
filter(group %in% c("Control", "LPS_10ng"))
il6_two
# A tibble: 12 × 8
sample_id group biological_rep technical_rep gene ct_value plate_id date
<chr> <fct> <dbl> <dbl> <chr> <dbl> <chr> <date>
1 S01_T1 Control 1 1 IL6 30.5 P01 2026-01-08
2 S01_T2 Control 1 2 IL6 31.1 P01 2026-01-08
3 S02_T1 Control 2 1 IL6 30.9 P01 2026-01-08
4 S02_T2 Control 2 2 IL6 31.2 P01 2026-01-08
5 S03_T1 Control 3 1 IL6 30.7 P01 2026-01-08
6 S03_T2 Control 3 2 IL6 31.0 P01 2026-01-08
7 S07_T1 LPS_10ng 1 1 IL6 23.7 P01 2026-01-08
8 S07_T2 LPS_10ng 1 2 IL6 24.1 P01 2026-01-08
9 S08_T1 LPS_10ng 2 1 IL6 23.8 P01 2026-01-08
10 S08_T2 LPS_10ng 2 2 IL6 24.2 P01 2026-01-08
11 S09_T1 LPS_10ng 3 1 IL6 23.6 P01 2026-01-08
12 S09_T2 LPS_10ng 3 2 IL6 24.0 P01 2026-01-08
Now run the t-test:
t.test(ct_value ~ group, data = il6_two)
Welch Two Sample t-test
data: ct_value by group
t = 29.03, df = 9.87, p-value = 1.28e-10
alternative hypothesis: true difference in means between group Control and group LPS_10ng is not equal to 0
95 percent confidence interval:
6.316 7.384
sample estimates:
mean in group Control mean in group LPS_10ng
30.90 23.90
Reading the output
Let's walk through each piece:
t = 29.03 — The t-statistic. It tells you how many standard errors apart the two group means are. A t of 29 means the Control mean is 29 standard errors above the LPS_10ng mean. That's enormous.
df = 9.87 — Degrees of freedom. Welch's t-test calculates this based on the sample sizes and variances of both groups. It won't be a nice round number.
p-value = 1.28e-10 — The probability of seeing a difference this large (or larger) if the true difference were zero. Here, p < 0.001 — the difference is statistically significant.
95 percent confidence interval: 6.316 to 7.384 — We're 95% confident the true difference between Control and LPS_10ng means falls in this range. Since the interval doesn't include zero, the difference is significant.
sample estimates — The actual group means. Control averaged 30.9 Ct, LPS_10ng averaged 23.9 Ct. Lower Ct = higher expression, so LPS_10ng has dramatically higher IL-6 expression.
Welch's vs Student's t-test
By default, t.test() runs Welch's t-test, which doesn't assume the two groups have equal variances. This is the safer choice — it works whether variances are equal or not.
The classic Student's t-test assumes equal variances. You can run it with:
t.test(ct_value ~ group, data = il6_two, var.equal = TRUE)
But there's rarely a reason to. Welch's test is just as powerful when variances are equal, and it doesn't break when they're not. Use the default.
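To see the difference between the two tests, here's a sketch on made-up numbers — one tight group, one noisy group. Only the degrees of freedom (and hence the p-value) change; the group means are untouched:

```r
# Toy example (made-up numbers): one low-variance group, one high-variance group
tight <- c(10.1, 10.2, 9.9, 10.0, 10.1, 9.9)
noisy <- c(12, 8, 14, 6, 13, 7)

welch   <- t.test(tight, noisy)                    # default: Welch
student <- t.test(tight, noisy, var.equal = TRUE)  # classic Student

welch$parameter    # df well below 10: Welch adjusts for the unequal variances
student$parameter  # df = 10 (n1 + n2 - 2), no matter what the variances are
```

When the variances really are equal, the Welch df lands close to n1 + n2 - 2 and the two tests agree almost exactly — which is why defaulting to Welch costs you essentially nothing.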
Paired t-test
Our data has different samples in each group — Control samples are different wells than LPS_10ng samples. That's an unpaired (or two-sample) t-test, which is what we ran.
But sometimes you measure the same subjects twice — before and after treatment, for example. In that case, you'd use a paired t-test:
t.test(before, after, paired = TRUE)
The paired version accounts for the fact that measurements from the same subject are correlated. It's more powerful when the pairing is real, but wrong to use when it isn't. Our qPCR data isn't paired — different wells, different samples — so we use the unpaired version.
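A sketch of why pairing matters, using hypothetical before/after measurements (not our qPCR data): every subject drops by about 0.5 units, but the subjects themselves differ a lot from each other.

```r
# Hypothetical paired design: the same 6 subjects measured before and after treatment
before <- c(5.1, 4.8, 6.0, 5.5, 4.9, 5.7)
after  <- c(4.5, 4.4, 5.4, 5.1, 4.4, 5.2)

p_paired   <- t.test(before, after, paired = TRUE)$p.value
p_unpaired <- t.test(before, after)$p.value

p_paired     # tiny: every subject dropped by roughly 0.5
p_unpaired   # much larger: subject-to-subject spread swamps the shift
```

With these numbers the paired test is highly significant while the unpaired test isn't, because pairing subtracts out the between-subject variation before testing the shift.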
One-way ANOVA: comparing three or more groups
Now let's compare all three groups: Control, LPS_1ng, and LPS_10ng. We can't just run three t-tests — that inflates our false positive rate, as we discussed earlier. Instead, we use ANOVA.
ANOVA asks: "Is there any difference among these groups?" It's a single test that handles all groups at once.
# Run one-way ANOVA
anova_model <- aov(ct_value ~ group, data = il6_data)
summary(anova_model)
Df Sum Sq Mean Sq F value Pr(>F)
group 2 259.8 129.90 283.5 1.92e-13 ***
Residuals 15 6.9 0.46
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Reading the output
Df (Degrees of freedom) — Group has 2 df (number of groups minus 1). Residuals has 15 df (total observations minus number of groups: 18 - 3 = 15).
Sum Sq (Sum of Squares) — The variance explained by group differences (259.8) versus the variance within groups (6.9). Most of the variance is between groups — which is exactly what we want to see.
Mean Sq (Mean Square) — Sum of Squares divided by df. This gives us the average variance per degree of freedom.
F value = 283.5 — The ratio of between-group variance to within-group variance. An F of 283 means the differences between groups are 283 times larger than the random variation within groups. That's massive.
Pr(>F) = 1.92e-13 — The p-value. The probability of seeing an F-statistic this large if there were no real differences between groups. Here, p < 0.001 — highly significant.
The critical point: ANOVA tells you that at least one group differs from the others. It doesn't tell you which group, or how many groups differ. The p-value just says "something is different here." To find out exactly what's different, you need post-hoc tests.
Checking assumptions
ANOVA assumes your data is normally distributed and that all groups have roughly equal variance. Let's check both.
Normality of residuals:
shapiro.test(resid(anova_model))
Shapiro-Wilk normality test
data: resid(anova_model)
W = 0.9512, p-value = 0.4531
The Shapiro-Wilk test checks whether the residuals follow a normal distribution. A p-value > 0.05 means we can't reject normality — our data passes this assumption. Here, p = 0.45, so we're fine.
Equal variance (homogeneity of variance):
bartlett.test(ct_value ~ group, data = il6_data)
Bartlett test of homogeneity of variances
data: ct_value by group
Bartlett's K-squared = 0.3614, df = 2, p-value = 0.8345
Bartlett's test checks whether all groups have similar variance. Again, p > 0.05 means the assumption holds. With p = 0.83, our variances are similar enough.
What if assumptions fail? If normality or equal variance assumptions are violated, use the Kruskal-Wallis test instead: kruskal.test(ct_value ~ group, data = il6_data). It's the non-parametric alternative to one-way ANOVA — no assumptions about distribution shape.
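As a standalone sketch of that fallback, here's the Kruskal-Wallis test on the same 18 Ct values from the table above, rebuilt inline so the snippet runs without the CSV:

```r
# The 18 IL-6 Ct values from the table above, rebuilt inline
ct    <- c(30.5, 31.1, 30.9, 31.2, 30.7, 31.0,   # Control
           27.0, 27.5, 27.2, 27.6, 27.1, 27.4,   # LPS_1ng
           23.7, 24.1, 23.8, 24.2, 23.6, 24.0)   # LPS_10ng
group <- factor(rep(c("Control", "LPS_1ng", "LPS_10ng"), each = 6))

# Rank-based alternative to one-way ANOVA: no normality assumption
kruskal.test(ct ~ group)
```

Because the three groups don't overlap at all, the rank-based test is also clearly significant here; it answers the same "any difference?" question as ANOVA, just on ranks instead of raw values.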
The ANOVA p-value tells us there's a difference — but not where. That's what post-hoc tests are for.
Post-hoc tests: finding where the differences are
ANOVA told us the groups aren't all the same. Now we need to figure out which specific groups differ. That's the job of post-hoc tests.
Two main options: Tukey HSD compares every group to every other group. Dunnett's test compares each treatment to a single control. Pick based on what you actually care about.
Tukey HSD: all pairwise comparisons
Tukey's Honest Significant Difference test compares every group to every other group while adjusting for multiple comparisons. It's built into base R — no extra packages needed.
TukeyHSD(anova_model)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = ct_value ~ group, data = il6_data)
$group
diff lwr upr p adj
LPS_1ng-Control -3.5333333 -4.328359 -2.738308 0.0000002
LPS_10ng-Control -7.0000000 -7.795026 -6.204974 0.0000000
LPS_10ng-LPS_1ng -3.4666667 -4.261692 -2.671641 0.0000003
Reading the output:
Each row is one comparison. The columns tell you:
- diff — The difference between group means. LPS_1ng is 3.53 Ct lower than Control (lower Ct = higher expression).
- lwr, upr — The 95% confidence interval for that difference. None of these cross zero.
- p adj — The adjusted p-value, corrected for multiple comparisons. All three are highly significant (p < 0.001).
All three pairwise comparisons are significant here. Control differs from both LPS treatments, and the two LPS doses differ from each other.
When to use Tukey HSD: When you genuinely care about all pairwise differences. Good for exploratory analysis where any comparison might be interesting.
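If you want the Tukey results as a data frame — for a manuscript table, or to feed significance brackets into ggplot2 — the result object converts cleanly. A sketch on stand-in data (not the qPCR values):

```r
set.seed(42)
# Stand-in data: three groups, six observations each
df <- data.frame(
  value = c(rnorm(6, mean = 10), rnorm(6, mean = 8), rnorm(6, mean = 6)),
  group = factor(rep(c("A", "B", "C"), each = 6))
)

# $group pulls the comparison matrix for the "group" factor
tukey_df <- as.data.frame(TukeyHSD(aov(value ~ group, data = df))$group)
tukey_df   # one row per comparison: diff, lwr, upr, adjusted p
```

From there it's ordinary dplyr territory: filter to significant rows, round the columns, write to CSV.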
Dunnett's test: comparing treatments to control
Dunnett's test is more focused — it only compares each treatment to a single reference group (usually your control). Because it makes fewer comparisons, it has more statistical power for those specific comparisons.
First, install the package (run once):
install.packages("multcomp") # run once
library(multcomp)
R picks factor levels alphabetically by default. We already put Control first when we built the factor in the setup section, but releveling explicitly makes the reference group obvious to anyone reading the script — and protects you if the factor gets rebuilt upstream:
# Set Control as the reference level
il6_data$group <- relevel(il6_data$group, ref = "Control")
# Re-run ANOVA with new factor levels
anova_model <- aov(ct_value ~ group, data = il6_data)
Now run Dunnett's test:
dunnett <- glht(anova_model, linfct = mcp(group = "Dunnett"))
summary(dunnett)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Dunnett Contrasts
Fit: aov(formula = ct_value ~ group, data = il6_data)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
LPS_1ng - Control == 0 -3.5333 0.3907 -9.043 1.07e-07 ***
LPS_10ng - Control == 0 -7.0000 0.3907 -17.916 1.33e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
Reading the output:
Only two comparisons this time — each treatment versus Control. No LPS_1ng vs LPS_10ng comparison because Dunnett's doesn't care about that.
- Estimate — The difference from Control. LPS_1ng is 3.53 Ct lower; LPS_10ng is 7.0 Ct lower.
- Pr(>|t|) — The adjusted p-value. Both treatments are significantly different from Control.
When to use Dunnett's: When your experiment has a clear control group and you only care about "did treatment X differ from control?" You get more statistical power because you're making fewer comparisons. Perfect for dose-response experiments where the question is "which doses had an effect?"
Tukey vs Dunnett: which to choose?
Simple rule:
- Dunnett's if you have a control and only care about treatment-vs-control comparisons. More power, fewer tests.
- Tukey HSD if you genuinely need all pairwise comparisons — for example, if comparing LPS_1ng to LPS_10ng matters for your research question.
For most experiments with a control group, Dunnett's is the right choice. Save Tukey for when you actually need to know how treatments compare to each other, not just to baseline.
Putting it all together
Here's a complete, copy-paste-ready script that runs the entire analysis from start to finish. Save this as a template for your own qPCR experiments:
# Complete statistical analysis: IL-6 qPCR data
# Load packages
library(dplyr)
library(readr)
library(multcomp)
# Load and prepare data
data <- read_csv("qpcr_long.csv")
il6_data <- data |>
filter(gene == "IL6", ct_value < 35) |>
mutate(group = factor(group, levels = c("Control", "LPS_1ng", "LPS_10ng")))
# Run ANOVA
anova_model <- aov(ct_value ~ group, data = il6_data)
summary(anova_model)
# Check assumptions
shapiro.test(resid(anova_model)) # normality
bartlett.test(ct_value ~ group, data = il6_data) # equal variance
# Post-hoc: Dunnett's test (treatments vs control)
dunnett <- glht(anova_model, linfct = mcp(group = "Dunnett"))
summary(dunnett)
Adapt this for your own data by changing the file name, filter conditions, and group levels.
Common mistakes
1. Running multiple t-tests instead of ANOVA
Three groups means three possible pairwise comparisons. Running three t-tests inflates your Type I error rate from 5% to over 14%. Use ANOVA to test for any difference, then post-hoc tests (Tukey or Dunnett's) for specific comparisons. The post-hoc tests adjust for multiple comparisons automatically.
2. Forgetting to set the reference level for Dunnett's test
R orders factor levels alphabetically by default. If your control group isn't first alphabetically, Dunnett's test will use the wrong reference. Always use relevel(your_factor, ref = "Control") before running the test.
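A minimal illustration with a hypothetical factor whose control group ("Vehicle") is not first alphabetically:

```r
# Hypothetical groups: "Drug" sorts before "Vehicle", so Drug becomes the reference
g <- factor(c("Drug", "Vehicle", "Drug", "Vehicle"))
levels(g)   # "Drug" "Vehicle" -- Dunnett's would compare everything to Drug

g <- relevel(g, ref = "Vehicle")
levels(g)   # "Vehicle" "Drug" -- now Vehicle is the reference
```

Checking levels(your_factor) before any model fit is a cheap habit that catches this mistake early.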
3. Reporting ANOVA p-value without post-hocs
"ANOVA was significant (p < 0.05)" tells readers nothing about which groups differ. ANOVA only answers "is there any difference?" — not "where is the difference?" Always follow a significant ANOVA with the appropriate post-hoc test and report those specific comparisons.
4. Over-interpreting non-significant results
"No significant difference" does not mean "no difference exists." It means you didn't detect a difference with your sample size. Small samples lack statistical power — you might miss real effects. Report effect sizes alongside p-values, and acknowledge when sample size limits your conclusions.
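One common effect size for two groups is Cohen's d. A sketch of the pooled-SD version — cohens_d() is our helper for illustration (packages like effectsize provide polished versions), and the numbers are hypothetical:

```r
# Cohen's d: standardized mean difference using the pooled SD
cohens_d <- function(x, y) {
  pooled_sd <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
                    (length(x) + length(y) - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Hypothetical measurements; d > 0.8 is conventionally a "large" effect
cohens_d(c(5.1, 4.8, 6.0, 5.5), c(4.2, 4.0, 4.9, 4.4))
```

Unlike a p-value, d doesn't shrink just because your sample is small — which is exactly why it belongs next to the p-value in your results section.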
What's next
We've got the building blocks: data wrangling, visualization, and statistical tests. In the next post, we'll pull it all together into a complete workflow — from loading raw qPCR data to producing a publication-ready figure with statistics, all in a single reproducible R script.
→ Next: From raw data to final figure: a complete R workflow for bench biologists
← Previous: Making publication-ready figures in R with ggplot2
Your turn
What statistical tests do you use most in your research? Any edge cases where you're not sure which test to pick? Drop a question in the comments.
Resources
| Resource | What it is | Link |
|---|---|---|
| t.test() | Base R t-test documentation | rdocumentation.org |
| aov() | Base R ANOVA documentation | rdocumentation.org |
| TukeyHSD() | Tukey HSD documentation | rdocumentation.org |
| multcomp | Package for Dunnett's test | CRAN |
| qpcr_long.csv | Dataset used in this post | Download |