- Published on
From raw data to final figure: a complete R workflow for bench biologists
- Authors

- Name
- BioTech Bench
This is Arc 1, Part 6 of the R for Biologists series.
The payoff
You've built each piece separately.
In Post 2, you loaded a qPCR CSV and explored it with head() and glimpse(). In Post 3, you cleaned it with dplyr — filtered out undetermined wells, averaged technical replicates, calculated delta Ct. In Post 4, you turned those summaries into a dot plot with error bars. In Post 5, you ran ANOVA and Tukey HSD and got your p-values.
Now we put it all together.
This post runs the complete workflow — raw CSV to publication-ready figure with significance brackets — in one script. By the end you'll have a template you can adapt for any lab dataset.
What you'll learn
By the end of this post, you'll be able to:
- Run a complete analysis from raw CSV to exported figure in a single R script
- Add significance brackets and p-value annotations to a ggplot2 figure with
ggsignif - Export at 300 dpi for journal submission
- Apply the same 6-step workflow to a new dataset (ELISA)
Setup
Install ggsignif — the one new package in this post. Everything else you already have from the earlier posts:
install.packages("ggsignif") # new in this post — skip if already installed
Load all four packages:
library(dplyr)
library(readr)
library(ggplot2)
library(ggsignif)
Download qpcr_long.csv if you don't have it — same file from Post 2 onward.
Step 1: Load
data <- read_csv("qpcr_long.csv")
glimpse(data)
Rows: 90
Columns: 8
$ sample_id <chr> "S01_T1", "S01_T2", "S02_T1", ...
$ group <chr> "Control", "Control", "Control", ...
$ biological_rep <dbl> 1, 1, 2, ...
$ technical_rep <dbl> 1, 2, 1, ...
$ gene <chr> "GAPDH", "GAPDH", "GAPDH", ...
$ ct_value <dbl> 21.15, 21.22, 20.87, ...
$ plate_id <chr> "P01", "P01", "P01", ...
$ date <date> 2026-01-08, 2026-01-08, ...
Same 90-row file from Post 2. One row per measurement — one gene, one sample, one technical replicate.
Step 2: Clean
Drop undetermined wells (Ct ≥ 35) and the ACTB reference gene — we'll use GAPDH only. Then keep the columns we actually need:
clean <- data |>
filter(ct_value < 35) |>
filter(!gene %in% c("ACTB")) |>
select(sample_id, group, biological_rep, gene, ct_value)
nrow(clean)
[1] 72
72 rows: 4 genes × 3 groups × 3 biological replicates × 2 technical replicates. All Ct values in this dataset fall below 35, so the first filter keeps everything; its value is as a safeguard for real data that might have undetermined wells.
Step 3: Summarize
The same pipeline from Post 3. Average across all replicates per gene per group:
summary_stats <- clean |>
group_by(group, gene) |>
summarise(
mean_ct = mean(ct_value),
sd_ct = sd(ct_value),
n = n(),
.groups = "drop"
)
print(summary_stats)
# A tibble: 12 × 5
group gene mean_ct sd_ct n
<chr> <chr> <dbl> <dbl> <int>
1 Control GAPDH 21.1 0.196 6
2 Control IL10 32.0 0.293 6
3 Control IL6 30.9 0.338 6
4 Control TNF 29.8 0.238 6
5 LPS_10ng GAPDH 21.2 0.220 6
6 LPS_10ng IL10 27.2 0.306 6
7 LPS_10ng IL6 23.9 0.320 6
8 LPS_10ng TNF 24.1 0.289 6
9 LPS_1ng GAPDH 21.1 0.222 6
10 LPS_1ng IL10 29.3 0.401 6
11 LPS_1ng IL6 27.3 0.352 6
12 LPS_1ng TNF 27.2 0.196 6
We'll focus on IL-6 for the figure. The dose-response is already visible in the numbers: Control 30.9 → LPS_1ng 27.3 → LPS_10ng 23.9. Each ~3.5 Ct drop is roughly an 8-fold increase in expression.
Step 4: Plot
Filter to IL-6, set the group order as a factor, and build the dot plot with error bars. This is the same scaffold from Post 4:
il6_summary <- summary_stats |>
filter(gene == "IL6") |>
mutate(group = factor(group, levels = c("Control", "LPS_1ng", "LPS_10ng")))
p <- ggplot(il6_summary, aes(x = group, y = mean_ct, color = group)) +
geom_point(size = 4) +
geom_errorbar(
aes(ymin = mean_ct - sd_ct, ymax = mean_ct + sd_ct),
width = 0.2
) +
labs(
title = "IL-6 expression by treatment group",
x = NULL,
y = "Mean Ct (lower = higher expression)"
) +
theme_classic() +
theme(legend.position = "none")
p
The dose-response is visible — Control is highest, LPS_10ng is lowest. What's missing: significance annotations to tell readers which differences are statistically meaningful.
Step 5: Annotate significance
ggsignif adds significance brackets as a ggplot2 geom layer — the same way geom_point() or geom_errorbar() adds dots and bars. You specify which groups to compare and what label to show; it draws the bracket.
First, get the p-values from Tukey HSD (the same test from Post 5):
il6_raw <- clean |>
filter(gene == "IL6") |>
mutate(group = factor(group, levels = c("Control", "LPS_1ng", "LPS_10ng")))
anova_model <- aov(ct_value ~ group, data = il6_raw)
TukeyHSD(anova_model)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = ct_value ~ group, data = il6_raw)
$group
diff lwr upr p adj
LPS_1ng-Control -3.5333333 -4.328359 -2.738308 0.0000002
LPS_10ng-Control -7.0000000 -7.795026 -6.204974 0.0000000
LPS_10ng-LPS_1ng -3.4666667 -4.261692 -2.671641 0.0000003
All three comparisons are p < 0.001 — we'll use *** for all three. Now add the brackets:
p_annotated <- p +
geom_signif(
comparisons = list(
c("Control", "LPS_1ng"),
c("LPS_1ng", "LPS_10ng"),
c("Control", "LPS_10ng")
),
annotations = c("***", "***", "***"),
y_position = c(32.0, 33.0, 34.5),
tip_length = 0.01,
color = "black"
) +
expand_limits(y = 36)
p_annotated
geom_signif() takes three key arguments:
comparisons— a list of two-element vectors naming the groups to connect. Group names must match your x-axis factor levels exactly.annotations— the label for each bracket. Use"***"for p < 0.001,"**"for p < 0.01,"*"for p < 0.05,"ns"for not significant. Or pass the numeric p-value as a string:"p = 0.03".y_position— where to place the horizontal bar of each bracket, in data coordinates. Stack them: the innermost bracket lowest, the outermost highest.
The expand_limits(y = 36) line pushes the y axis up to make room for all three brackets. Adjust this value if brackets run off the top of your figure.
Step 6: Export
ggsave("il6_figure.png", plot = p_annotated, dpi = 300, width = 5, height = 5)
ggsave("il6_figure.tiff", plot = p_annotated, dpi = 300, width = 5, height = 5)
300 dpi is the minimum most journals require for figures. Export both PNG (for previewing and presentations) and TIFF (for submission — many journals specify TIFF or EPS). Adjust width and height (in inches) to match your journal's column width guidelines.
The complete script
Here's the full workflow in one clean script. Save it as qpcr_analysis.R, swap in your own file path and column names, and it will run start to finish:
# === qPCR analysis: complete Arc 1 workflow ===
# Packages
library(dplyr)
library(readr)
library(ggplot2)
library(ggsignif)
# Step 1: Load
data <- read_csv("qpcr_long.csv")
# Step 2: Clean
clean <- data |>
filter(ct_value < 35) |>
filter(!gene %in% c("ACTB")) |>
select(sample_id, group, biological_rep, gene, ct_value)
# Step 3: Summarize
summary_stats <- clean |>
group_by(group, gene) |>
summarise(
mean_ct = mean(ct_value),
sd_ct = sd(ct_value),
n = n(),
.groups = "drop"
)
# Step 4: Plot — IL-6 dot plot with error bars
il6_summary <- summary_stats |>
filter(gene == "IL6") |>
mutate(group = factor(group, levels = c("Control", "LPS_1ng", "LPS_10ng")))
il6_raw <- clean |>
filter(gene == "IL6") |>
mutate(group = factor(group, levels = c("Control", "LPS_1ng", "LPS_10ng")))
p <- ggplot(il6_summary, aes(x = group, y = mean_ct, color = group)) +
geom_point(size = 4) +
geom_errorbar(
aes(ymin = mean_ct - sd_ct, ymax = mean_ct + sd_ct),
width = 0.2
) +
labs(
title = "IL-6 expression by treatment group",
x = NULL,
y = "Mean Ct (lower = higher expression)"
) +
theme_classic() +
theme(legend.position = "none")
# Step 5: Significance annotations
anova_model <- aov(ct_value ~ group, data = il6_raw)
TukeyHSD(anova_model) # check p-values before annotating
p_annotated <- p +
geom_signif(
comparisons = list(
c("Control", "LPS_1ng"),
c("LPS_1ng", "LPS_10ng"),
c("Control", "LPS_10ng")
),
annotations = c("***", "***", "***"),
y_position = c(32.0, 33.0, 34.5),
tip_length = 0.01,
color = "black"
) +
expand_limits(y = 36)
p_annotated
# Step 6: Export
ggsave("il6_figure.png", plot = p_annotated, dpi = 300, width = 5, height = 5)
ggsave("il6_figure.tiff", plot = p_annotated, dpi = 300, width = 5, height = 5)
Applying it to ELISA data
The six steps above aren't specific to qPCR. Here's the same workflow on a plate reader dataset — OD450 absorbance readings from a cytokine ELISA, three conditions, three biological replicates each.
Download elisa_data.csv, then:
# Step 1: Load
elisa <- read_csv("elisa_data.csv")
# Step 2: Clean
elisa_clean <- elisa |>
filter(od_value > 0.05) |>
select(sample_id, group, biological_rep, od_value) |>
mutate(group = factor(group, levels = c("Control", "Drug_1uM", "Drug_10uM")))
# Step 3: Summarize
elisa_summary <- elisa_clean |>
group_by(group) |>
summarise(
mean_od = mean(od_value),
sd_od = sd(od_value),
n = n(),
.groups = "drop"
)
# Steps 4–5: Plot + annotate
anova_elisa <- aov(od_value ~ group, data = elisa_clean)
TukeyHSD(anova_elisa)
ggplot(elisa_summary, aes(x = group, y = mean_od, color = group)) +
geom_point(size = 4) +
geom_errorbar(
aes(ymin = mean_od - sd_od, ymax = mean_od + sd_od),
width = 0.2
) +
geom_signif(
comparisons = list(c("Control", "Drug_1uM"), c("Control", "Drug_10uM")),
annotations = c("***", "***"),
y_position = c(1.0, 1.1),
tip_length = 0.01,
color = "black"
) +
expand_limits(y = 1.2) +
labs(x = NULL, y = "Absorbance (OD 450 nm)") +
theme_classic() +
theme(legend.position = "none")
# Step 6: Export
ggsave("elisa_figure.png", dpi = 300, width = 4, height = 5)
The column names changed (ct_value → od_value). The filter threshold changed (< 35 → > 0.05). The axis label changed. The rest is identical.
The steps don't change. Only the column names do.
What's next
That's Arc 1 complete.
You've gone from zero R to a full reproducible analysis pipeline: loading data, cleaning it, summarizing it, plotting it, testing it, and annotating the figure. If you've been following along with your own data, you should have a working analysis.R script by now.
Arc 2 picks up from here and moves into public datasets and bioinformatics workflows — downloading gene expression data from GEO, normalization, DESeq2, pathway enrichment. The tidyverse syntax, ggplot2 patterns, and statistical thinking from Arc 1 all carry straight over.
→ Next: How to download public gene expression datasets from GEO in R
← Previous: T-tests and ANOVA in R: lab stats without GraphPad Prism
What dataset have you been waiting to try this on? Drop it in the comments.
Resources
| Resource | What it is | Link |
|---|---|---|
ggsignif | Significance brackets for ggplot2 | cran.r-project.org/package=ggsignif |
ggplot2 | Visualization (covered in Post 4) | ggplot2.tidyverse.org |
dplyr | Data wrangling (covered in Post 3) | dplyr.tidyverse.org |
qpcr_long.csv | qPCR dataset used throughout Arc 1 | Download |
elisa_data.csv | ELISA dataset used in the outro | Download |