The complete R for biologists learning path: from spreadsheet to bioinformatics

You've heard people say "you should learn R." Maybe a collaborator sent you a script and you had no idea where to start. Maybe you've been clicking through the same GraphPad Prism workflow for three years and you're starting to wonder if there's a better way. Maybe you're trying to analyze RNA-seq data and everyone keeps pointing you toward DESeq2 — an R package.

Whatever brought you here, this is the place to start.

This page is the home of the R for Biologists series — a structured learning path that takes you from "I've never opened a terminal" to running a real differential gene expression analysis. No computer science degree required. No toy datasets about iris flowers. Just R, biological data that looks like yours, and explanations that respect the fact that you're already a scientist.

How this series works

The series is divided into three arcs. Each arc ends at a milestone — a point where you've learned something genuinely useful, and where it would be completely reasonable to stop if you've got what you need.

You don't have to read every post. Jump to wherever you are right now. But if you're starting from scratch, working through them in order is worth it — each post builds on the one before.

Posts go live every week. Bookmark this page and check back, or follow the blog so you don't miss one.

Arc 1 — R for your everyday lab

You'll finish this arc knowing: How to load your own data, clean it, visualize it, and run basic statistical tests — all in R. You won't need GraphPad Prism or Excel for data analysis anymore.

This arc covers the tools that most bench biologists will reach for in their everyday analysis. It's practical, it's fast, and every example uses data that looks like something you'd actually export from your thermocycler or plate reader.

#	Post	Status
1	Why every biologist should learn R (and how to start with swirl)	✅ Published
2	Your first R script: loading and exploring biological data	✅ Published
3	How to clean and organize your lab data in R with dplyr	✅ Published
4	Making publication-ready figures in R with ggplot2	✅ Published
5	T-tests and ANOVA in R: lab stats without GraphPad Prism	✅ Published
6	From raw data to final figure: a complete R workflow for bench biologists	✅ Published

Arc 1 datasets: qPCR export CSVs, ELISA plate reader data, Western blot quantification — all simulated from real lab formats, all downloadable.
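To give a rough taste of what an Arc 1 analysis looks like, here is a minimal sketch in base R. It is an illustration only, not code from the posts: the data frame `qpcr`, its column names, and every value in it are made up here, and the posts themselves use dplyr and ggplot2 where this sketch sticks to base R so it runs without installing anything.

```r
# Hypothetical qPCR-style data, simulated for illustration only
set.seed(42)
qpcr <- data.frame(
  sample    = paste0("S", 1:12),
  condition = rep(c("control", "treated"), each = 6),
  ct_target = c(rnorm(6, mean = 24, sd = 0.4), rnorm(6, mean = 22, sd = 0.4)),
  ct_ref    = rnorm(12, mean = 18, sd = 0.3)
)

# Delta Ct: target gene Ct minus reference gene Ct
qpcr$delta_ct <- qpcr$ct_target - qpcr$ct_ref

# Mean delta Ct per condition
group_means <- aggregate(delta_ct ~ condition, data = qpcr, FUN = mean)
print(group_means)

# Welch two-sample t-test comparing the two conditions
result <- t.test(delta_ct ~ condition, data = qpcr)
print(result$p.value)

# Quick visual check of the two groups
boxplot(delta_ct ~ condition, data = qpcr,
        ylab = "Delta Ct (target - reference)")
```

With your own data, the simulated data frame would be replaced by something like `read.csv()` on your instrument's export file, assuming it has one row per sample.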

Arc 2 — Statistics and public data

You'll finish this arc knowing: How to pull datasets from public repositories like GEO, run analyses that would pass peer review, and make figures that belong in a paper — not just a lab meeting.

This arc is where R starts to feel like a real superpower. You stop working only with your own data and start tapping into thousands of published datasets. You also get serious about statistical rigor — multiple testing correction, normalization, clustering.

#	Post	Status
7	How to download public gene expression datasets from GEO in R	✅ Published
8	Why your data needs normalization — and how to do it in R	✅ Published
9	How to make a gene expression heatmap in R (that actually looks good)	✅ Published
10	Multiple testing correction in R: what p.adjust is really doing to your p-values	✅ Published
11	What is Bioconductor, and why does everyone in bioinformatics use it?	✅ Published
12	A complete statistical analysis of a public dataset in R: GEO walkthrough	✅ Published

Arc 2 datasets: Public GEO datasets (NCBI), RNA-seq count matrices, proteomics data — all free, no login required.
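As a small preview of the multiple testing material in this arc, the sketch below shows how the base R function `p.adjust` changes a set of p-values. The seven p-values are invented for illustration and don't come from any dataset in the series.

```r
# Hypothetical raw p-values from seven tests (illustration only)
p_values <- c(0.001, 0.008, 0.012, 0.030, 0.045, 0.200, 0.600)

# Benjamini-Hochberg adjustment (controls the false discovery rate)
p_bh <- p.adjust(p_values, method = "BH")

# Bonferroni adjustment (controls the family-wise error rate, more conservative)
p_bonf <- p.adjust(p_values, method = "bonferroni")

# Side-by-side comparison
print(data.frame(raw = p_values, BH = p_bh, bonferroni = p_bonf))

# How many tests pass a 0.05 cutoff under each approach
sum(p_values < 0.05)  # 5
sum(p_bh < 0.05)      # 3
sum(p_bonf < 0.05)    # 1
```

Five tests look significant before correction, three survive Benjamini-Hochberg, and only one survives Bonferroni. With thousands of genes instead of seven tests, that gap gets much larger.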

Arc 3 — Bioinformatics proper

You'll finish this arc knowing: How to run a real differential gene expression analysis from raw RNA-seq counts all the way through pathway enrichment. You'll understand what DESeq2 is actually doing under the hood, and you'll have touched single-cell RNA-seq.

This is the arc that most people mean when they say "I want to learn bioinformatics." It's not as intimidating as it looks — especially if you've built up from Arc 1 and Arc 2.

#	Post	Status
13	Understanding RNA-seq data: what those count matrices actually mean	✅ Published
14	Differential gene expression with DESeq2: a step-by-step tutorial	✅ Published
15	DESeq2 vs edgeR: which one should you use and does it actually matter?	✅ Published
16	Volcano plots and MA plots: visualizing RNA-seq results in R	Coming May 26
17	Gene ontology and pathway enrichment analysis in R with clusterProfiler	✅ Published
18	Your first single-cell RNA-seq analysis in R with Seurat	Coming Jun 9
19	The complete RNA-seq pipeline in R: from raw counts to biological insight	Coming Jun 16

Arc 3 datasets: RNA-seq count matrices from TCGA and GEO, 10x Genomics PBMC single-cell data — all public, with accession numbers in each post.

Where should you start?

Never opened R before? Start at Post 1. It covers installation, why R is worth your time, and how to take your first steps interactively using a tool called swirl.

Already have R installed and have done some basics? Jump to Post 2 or Post 3. Post 2 is about loading your actual data. Post 3 is where you start cleaning and wrangling it with dplyr.

Comfortable with R basics but new to bioinformatics? Arc 2 (Post 7) is your entry point. Start with GEO and public datasets.

Already doing genomics and just want the bioinformatics pipeline content? Go straight to Arc 3 (Post 13).

A few things to know about this series

All tools are free and open-source. Every package used in this series is available at no cost. You don't need a Prism license, a MATLAB license, or anything else. R and RStudio are both free.

Every post uses realistic biological data. No iris flowers, no mtcars. Arc 1 uses datasets simulated from real lab export formats, and Arcs 2 and 3 use public datasets. Throughout, the data looks like what you actually export from a thermocycler, a plate reader, or a genomics core.

Code is always runnable. Every code block is tested and complete. If you copy and paste it, it should work — assuming your data is in the format described. When something might break, I'll tell you why and how to fix it.

This is not a comprehensive R textbook. There's a lot of R we won't cover. The goal is to get you doing useful biology in R as fast as possible, not to make you an R expert. If you want deep R theory, there are better places for that. This series is for scientists who want to analyze their data.

Just getting started? Post 1 is waiting. If you've already got R installed and you're ready to load some real data, Post 2 is where the hands-on stuff begins.

Which part of this series are you most looking forward to? Drop a comment below — it helps me know where to focus the most detail.