Published on

How to make a gene expression heatmap in R (that actually looks good)

Authors
  • avatar
    Name
    BioTech Bench
    Twitter

This is Arc 2, Part 9 of the R for Biologists series.

The table you can't read

Post 8 ended with a log2 CPM matrix — normalized, log-transformed, ready. You could see that DUSP1 jumped from ~15 to ~17 in dexamethasone, and FKBP5 from ~13 to ~16.6 in dexamethasone. But reading a table of numbers and extracting a pattern across 30 genes and 6 samples is not something human visual systems are good at.

A heatmap is the fix. Color replaces number, and the eye picks up patterns instantly — which genes go up together, which go down, which samples cluster with which. Every RNA-seq paper has at least one. This post shows you how to build one that actually looks publication-ready, not like a default output you're embarrassed to show your PI.

What you'll learn

  • What a heatmap encodes: color = expression level, rows = genes, columns = samples
  • What the dendrogram is actually showing: hierarchical clustering, distance, linkage
  • How to build a minimal heatmap with pheatmap
  • How to upgrade it: RdBu color palette, z-score row scaling, sample group annotations
  • How to control clustering: distance metric, linkage method, cutree_rows for visual gaps

Setup

Install pheatmap and RColorBrewer if you haven't already:

install.packages("pheatmap")     # skip if already installed
install.packages("RColorBrewer") # skip if already installed

Load them:

library(pheatmap)
library(RColorBrewer)

The data: 30 dexamethasone-regulated genes

We're continuing with the dexamethasone experiment from Posts 7 and 8: human airway smooth muscle cells treated with dexamethasone (a glucocorticoid) versus untreated controls. In Post 8 we normalized 10 genes to log2 CPM. Here we expand to 30 — 15 known dex-inducible genes and 15 dex-repressed genes — to get a heatmap that looks like a real result.

The values are log2 CPM, consistent with what Post 8 produced for the genes we share (DUSP1, FKBP5, CRISPLD2, IL6):

mat <- matrix(c(
  14.76, 14.83, 14.64, 17.22, 17.28, 17.33,  # DUSP1
  12.81, 12.83, 12.82, 16.57, 16.66, 16.69,  # FKBP5
  11.87, 11.90, 11.75, 15.29, 15.31, 15.43,  # CRISPLD2
  11.23, 11.45, 11.08, 15.87, 16.12, 15.94,  # TSC22D3
  10.54, 10.72, 10.41, 15.23, 15.08, 15.37,  # ZBTB16
  11.02, 11.34, 10.89, 16.14, 16.41, 16.22,  # PER1
  12.14, 12.38, 12.05, 16.83, 17.09, 16.91,  # KLF9
  11.56, 11.23, 11.74, 15.67, 15.34, 15.82,  # DDIT4
  10.87, 11.13, 10.69, 15.04, 15.31, 15.12,  # CDKN1C
  11.34, 11.08, 11.52, 15.96, 15.68, 16.11,  # ANGPTL4
  10.92, 11.17, 10.78, 15.71, 15.96, 15.82,  # PDK4
  11.73, 11.45, 11.89, 16.38, 16.09, 16.54,  # RGS2
  12.37, 12.64, 12.22, 16.94, 17.21, 17.08,  # SGK1
  11.12, 10.89, 11.28, 15.78, 15.52, 15.93,  # GADD45B
  10.65, 10.41, 10.82, 15.34, 15.07, 15.49,  # TNFRSF9
  13.37, 13.39, 13.28, 11.01, 10.93, 11.44,  # IL6
  14.12, 14.35, 13.98,  9.23,  8.96,  9.37,  # IL1B
  13.56, 13.29, 13.72,  8.67,  8.41,  8.83,  # CXCL8
  14.34, 14.58, 14.19,  9.45,  9.19,  9.61,  # CCL2
  13.89, 13.63, 14.05,  8.90,  8.64,  9.06,  # TNF
  13.23, 13.47, 13.09,  8.34,  8.58,  8.45,  # PTGS2
  14.01, 13.75, 14.17,  9.12,  8.86,  9.28,  # MMP3
  13.67, 13.91, 13.53,  8.78,  9.02,  8.89,  # CCL11
  13.90, 13.64, 14.06,  9.01,  8.75,  9.17,  # CXCL5
  13.45, 13.69, 13.31,  8.56,  8.80,  8.67,  # ICAM1
  13.78, 13.52, 13.94,  8.89,  8.63,  9.05,  # VCAM1
  13.34, 13.58, 13.20,  8.45,  8.69,  8.56,  # CCL7
  14.23, 13.97, 14.39,  9.34,  9.08,  9.50,  # CXCL1
  13.89, 14.13, 13.75,  8.67,  8.91,  8.78,  # CXCL2
  13.45, 13.21, 13.60,  8.23,  7.98,  8.37   # MKI67
), nrow = 30, byrow = TRUE)

rownames(mat) <- c(
  "DUSP1", "FKBP5", "CRISPLD2", "TSC22D3", "ZBTB16", "PER1",
  "KLF9", "DDIT4", "CDKN1C", "ANGPTL4", "PDK4", "RGS2",
  "SGK1", "GADD45B", "TNFRSF9",
  "IL6", "IL1B", "CXCL8", "CCL2", "TNF", "PTGS2",
  "MMP3", "CCL11", "CXCL5", "ICAM1", "VCAM1",
  "CCL7", "CXCL1", "CXCL2", "MKI67"
)
colnames(mat) <- c("Ctrl_1", "Ctrl_2", "Ctrl_3", "Dex_1", "Dex_2", "Dex_3")

Check the dimensions:

dim(mat)
[1] 30  6

Take a look at the first few rows:

head(mat, 3)
         Ctrl_1 Ctrl_2 Ctrl_3 Dex_1 Dex_2 Dex_3
DUSP1     14.76  14.83  14.64 17.22 17.28 17.33
FKBP5     12.81  12.83  12.82 16.57 16.66 16.69
CRISPLD2  11.87  11.90  11.75 15.29 15.31 15.43

The top 15 rows are dex-induced genes; the bottom 15 are dex-repressed. You can't really see the pattern in the table — the numbers overlap too much. That's exactly what the heatmap will fix.

Stage 1 — The minimal heatmap

One function, one line:

pheatmap(mat)
# Plot appears in the Plots pane

What you get: a heatmap with dendrograms on both axes and gene names on the right. The clustering is already working — dex samples are grouping together and ctrl samples are grouping together.

But three things are wrong:

  1. The color scale maps to absolute log2 CPM values. Genes with high absolute expression dominate the colors, hiding the treatment effect between conditions.
  2. No group labels. You can see Ctrl and Dex in the column names, but there's no visual separator.
  3. The default color scale doesn't communicate biology cleanly. Red for up and blue for down is the convention in expression data — but only after row-scaling, which we haven't done yet.

What the dendrogram means

Before fixing the aesthetics, it's worth understanding what that tree on the left is actually showing.

Hierarchical clustering treats each gene's expression profile as a point in multi-dimensional space — in our case, six-dimensional (one dimension per sample). It computes how far apart each pair of genes is, merges the closest two into a cluster, then treats that cluster as a new point and repeats until everything is merged into one tree. The branching structure it builds is the dendrogram. Genes that join at a low branch are similar to each other; genes that only merge near the top are very different.

The distance measure controls what "far apart" means. The default in pheatmap is Euclidean distance — the straight-line distance in 6D space. This is usually adequate, but it means genes with different absolute levels can look "similar" if their profiles happen to be numerically close. We'll switch to correlation-based distance in Stage 3, which measures similarity of shape (does the gene go up and down at the same times?) rather than closeness in absolute value.

The linkage method controls how distances between clusters are computed after the first merge. The default is complete linkage: the distance between two clusters is the maximum pairwise distance between any member of one and any member of the other. Ward.D2 (which we'll use in Stage 3) minimizes total within-cluster variance at each merge step — for expression data, it usually gives tighter, more interpretable groups with cleaner visual separation.

Stage 2 — The publication upgrade

Three changes: colors, scaling, and annotation.

annotation_col <- data.frame(
  Group = c("Ctrl", "Ctrl", "Ctrl", "Dex", "Dex", "Dex")
)
rownames(annotation_col) <- colnames(mat)

pheatmap(mat,
  color          = colorRampPalette(rev(brewer.pal(9, "RdBu")))(100),
  scale          = "row",
  annotation_col = annotation_col,
  border_color   = NA,
  fontsize_row   = 8,
  fontsize_col   = 10
)
# Plot appears in the Plots pane

The heatmap now shows two clear bands: the top 15 genes (dex-induced) are blue in control samples and red in dex samples; the bottom 15 (dex-repressed) are red in controls and blue in dex. A teal/salmon annotation bar above the columns labels the two conditions. The dendrograms are separating gene groups and sample conditions cleanly.

What each argument does:

  • color = colorRampPalette(rev(brewer.pal(9, "RdBu")))(100) — creates a 100-step palette running from blue (low) through white (zero) to red (high). rev() flips the RdBu palette so blue = low and red = high, matching the convention in most expression papers.
  • scale = "row" — z-scores each row: subtracts the gene's mean across all samples and divides by its SD. After scaling, every gene has mean 0 and SD 1. The color now reflects whether expression is above or below that gene's own average — which is what you actually want to see.
  • annotation_col — a data frame with one row per sample (rownames must match the matrix column names) and one or more columns. Each column becomes a colored annotation bar above the heatmap.
  • border_color = NA — removes the grid lines between cells. Optional, but cleaner.
  • fontsize_row = 8, fontsize_col = 10 — adjust font sizes so gene names don't overlap at 30 rows.

Stage 3 — Clustering tuning

Two arguments change the clustering behavior; one adds a visual gap line:

pheatmap(mat,
  color                    = colorRampPalette(rev(brewer.pal(9, "RdBu")))(100),
  scale                    = "row",
  annotation_col           = annotation_col,
  border_color             = NA,
  fontsize_row             = 8,
  fontsize_col             = 10,
  clustering_distance_rows = "correlation",
  clustering_method        = "ward.D2",
  cutree_rows              = 2
)
# Plot appears in the Plots pane

The heatmap now shows a visible gap line separating the 15 dex-induced genes (top cluster, blue in Ctrl / red in Dex) from the 15 dex-repressed genes (bottom cluster, red in Ctrl / blue in Dex). The clusters are tighter and the dendrogram branches more cleanly than in Stage 2.

What changed:

  • clustering_distance_rows = "correlation" — switches to Pearson correlation-based distance for the gene dendrogram. Two genes with the same directional response in dex will cluster together even if their absolute levels differ. This is almost always a better choice than Euclidean for expression data.
  • clustering_method = "ward.D2" — switches to Ward.D2 linkage, which minimizes within-cluster variance at each merge step. It produces more compact, visually distinct clusters than the default complete linkage.
  • cutree_rows = 2 — draws a gap line through the row dendrogram, cutting it into 2 clusters. The number matches the biology: one up-regulated cluster, one down-regulated cluster.

We leave clustering_distance_cols at its default (Euclidean) — with only 6 samples, the choice of distance method matters much less for the column dendrogram.

The full script

Everything together, copy-paste ready:

library(pheatmap)
library(RColorBrewer)

mat <- matrix(c(
  14.76, 14.83, 14.64, 17.22, 17.28, 17.33,  # DUSP1
  12.81, 12.83, 12.82, 16.57, 16.66, 16.69,  # FKBP5
  11.87, 11.90, 11.75, 15.29, 15.31, 15.43,  # CRISPLD2
  11.23, 11.45, 11.08, 15.87, 16.12, 15.94,  # TSC22D3
  10.54, 10.72, 10.41, 15.23, 15.08, 15.37,  # ZBTB16
  11.02, 11.34, 10.89, 16.14, 16.41, 16.22,  # PER1
  12.14, 12.38, 12.05, 16.83, 17.09, 16.91,  # KLF9
  11.56, 11.23, 11.74, 15.67, 15.34, 15.82,  # DDIT4
  10.87, 11.13, 10.69, 15.04, 15.31, 15.12,  # CDKN1C
  11.34, 11.08, 11.52, 15.96, 15.68, 16.11,  # ANGPTL4
  10.92, 11.17, 10.78, 15.71, 15.96, 15.82,  # PDK4
  11.73, 11.45, 11.89, 16.38, 16.09, 16.54,  # RGS2
  12.37, 12.64, 12.22, 16.94, 17.21, 17.08,  # SGK1
  11.12, 10.89, 11.28, 15.78, 15.52, 15.93,  # GADD45B
  10.65, 10.41, 10.82, 15.34, 15.07, 15.49,  # TNFRSF9
  13.37, 13.39, 13.28, 11.01, 10.93, 11.44,  # IL6
  14.12, 14.35, 13.98,  9.23,  8.96,  9.37,  # IL1B
  13.56, 13.29, 13.72,  8.67,  8.41,  8.83,  # CXCL8
  14.34, 14.58, 14.19,  9.45,  9.19,  9.61,  # CCL2
  13.89, 13.63, 14.05,  8.90,  8.64,  9.06,  # TNF
  13.23, 13.47, 13.09,  8.34,  8.58,  8.45,  # PTGS2
  14.01, 13.75, 14.17,  9.12,  8.86,  9.28,  # MMP3
  13.67, 13.91, 13.53,  8.78,  9.02,  8.89,  # CCL11
  13.90, 13.64, 14.06,  9.01,  8.75,  9.17,  # CXCL5
  13.45, 13.69, 13.31,  8.56,  8.80,  8.67,  # ICAM1
  13.78, 13.52, 13.94,  8.89,  8.63,  9.05,  # VCAM1
  13.34, 13.58, 13.20,  8.45,  8.69,  8.56,  # CCL7
  14.23, 13.97, 14.39,  9.34,  9.08,  9.50,  # CXCL1
  13.89, 14.13, 13.75,  8.67,  8.91,  8.78,  # CXCL2
  13.45, 13.21, 13.60,  8.23,  7.98,  8.37   # MKI67
), nrow = 30, byrow = TRUE)

rownames(mat) <- c(
  "DUSP1", "FKBP5", "CRISPLD2", "TSC22D3", "ZBTB16", "PER1",
  "KLF9", "DDIT4", "CDKN1C", "ANGPTL4", "PDK4", "RGS2",
  "SGK1", "GADD45B", "TNFRSF9",
  "IL6", "IL1B", "CXCL8", "CCL2", "TNF", "PTGS2",
  "MMP3", "CCL11", "CXCL5", "ICAM1", "VCAM1",
  "CCL7", "CXCL1", "CXCL2", "MKI67"
)
colnames(mat) <- c("Ctrl_1", "Ctrl_2", "Ctrl_3", "Dex_1", "Dex_2", "Dex_3")

annotation_col <- data.frame(
  Group = c("Ctrl", "Ctrl", "Ctrl", "Dex", "Dex", "Dex")
)
rownames(annotation_col) <- colnames(mat)

pheatmap(mat,
  color                    = colorRampPalette(rev(brewer.pal(9, "RdBu")))(100),
  scale                    = "row",
  annotation_col           = annotation_col,
  border_color             = NA,
  fontsize_row             = 8,
  fontsize_col             = 10,
  clustering_distance_rows = "correlation",
  clustering_method        = "ward.D2",
  cutree_rows              = 2
)
# Plot appears in the Plots pane

What's next

In the next post, we'll revisit p-values — specifically what happens when you test 30 genes at once. Running 30 t-tests and reporting the raw p-values is not valid: at p < 0.05, you'd expect 1–2 false positives by chance alone. Post 10 covers multiple testing correction: what p.adjust() is actually doing, when to use FDR versus Bonferroni, and how to apply both in R.


What dataset did you use your first heatmap on? Drop a comment below.

Resources

ResourceLinkNotes
pheatmap packagecran.r-project.org/package=pheatmapKolde R (2019). pheatmap: Pretty Heatmaps. R package version 1.0.12
RColorBrewercran.r-project.org/package=RColorBrewerNeuwirth E (2022). Color palettes including RdBu
pheatmap GitHubgithub.com/raivokolde/pheatmapSource code and full argument reference