Conda for biologists: managing bioinformatics environments without losing your mind

Author: BioTech Bench

Dependency hell is real

Let me guess. You tried to install a bioinformatics tool — maybe samtools, maybe fastqc, maybe a Python package for single-cell analysis. The installation instructions said "requires Python 3.9." You had Python 3.12. You installed 3.9. Now your other tool that needed 3.12 is broken. You created a virtual environment, but then a C library was missing. You installed the C library, but it conflicted with your system's version. You gave up and asked the bioinformatics core to run it for you.

This is called dependency hell, and it is the single biggest barrier between bench biologists and computational tools. It is not your fault. It is a genuinely hard problem — different tools need different versions of Python, R, C libraries, and system dependencies, and installing one can break another.

Conda is the solution. It is a package manager and environment manager that creates isolated "bubbles" — each with its own Python version, its own R version, its own libraries — so that installing one tool does not affect anything else on your system. Think of it as a set of sandboxed mini-computers inside your computer, each perfectly configured for one task.

What you'll learn

  • What Conda is and how it works
  • How to install Conda (Miniconda)
  • How to create and manage isolated environments
  • How to install bioinformatics tools from the bioconda channel
  • How to share your environment with collaborators (reproducibility)
  • How to avoid the most common Conda pitfalls

Step 1: Install Conda (Miniconda)

You don't need the full Anaconda distribution — it is 3 GB and comes with hundreds of packages you will never use. Instead, install Miniconda, which is just Conda itself plus Python. It is about 400 MB.

On Linux:

# Download the installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Run it
bash Miniconda3-latest-Linux-x86_64.sh

# Follow the prompts. When it asks to initialize Conda, say yes.

On macOS (Apple Silicon):

# macOS does not ship with wget, so use the built-in curl instead
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
bash Miniconda3-latest-MacOSX-arm64.sh

After installation, restart your terminal. Verify it worked:

conda --version
conda 25.11.1

If you see a version number, you're good. If you see "command not found," Conda is not in your PATH. Run source ~/.bashrc (or source ~/.zshrc on macOS) and try again.
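If "command not found" persists, a quick check like the one below can tell you whether Conda was installed but never added to your PATH. This is a minimal sketch: check_conda is a hypothetical helper name, and it assumes the default install location ~/miniconda3 unless you pass a different folder.

```shell
# Hypothetical helper: report whether conda is reachable from this shell.
# Assumes the default Miniconda prefix (~/miniconda3) unless a path is given.
check_conda() {
    local prefix="${1:-$HOME/miniconda3}"
    if command -v conda >/dev/null 2>&1; then
        echo "conda found on PATH"
    elif [ -x "$prefix/bin/conda" ]; then
        echo "conda installed at $prefix but not on PATH; try: $prefix/bin/conda init"
    else
        echo "conda not found under $prefix"
    fi
}

check_conda
```

If the second message appears, running the suggested conda init command and restarting the terminal is usually enough.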


Step 2: Set up your channels

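Conda installs packages from channels, which are repositories of pre-built software. For bioinformatics you need three: conda-forge, bioconda, and defaults. The order matters, and because each --add puts the new channel at the top of the list, you add the highest-priority channel last:

conda config --add channels bioconda
conda config --add channels conda-forge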

Verify your configuration:

conda config --show channels
channels:
  - conda-forge
  - bioconda
  - defaults

Why this order matters: Conda resolves dependencies across channels in the order listed. conda-forge should be first because it has the most up-to-date general packages. bioconda should be second — it contains bioinformatics-specific tools. defaults (the Anaconda repository) is the fallback. If you put bioconda first, Conda might pull a broken dependency from an older bioconda build instead of getting the maintained version from conda-forge.
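The channel list ends up in a small YAML file, ~/.condarc. The sketch below writes an equivalent file to a scratch location (so it does not touch your real configuration) and prints which channel wins. The scratch path is made up for illustration, and the channel_priority: strict line is an addition that the Bioconda setup instructions recommend. It is not something the commands above set.

```shell
# Write an example .condarc to a scratch file (not your real ~/.condarc)
# so you can compare it with the output of `conda config --show channels`.
example="${TMPDIR:-/tmp}/condarc.example"   # hypothetical scratch path

cat > "$example" <<'EOF'
channels:
  - conda-forge
  - bioconda
  - defaults
channel_priority: strict
EOF

# The first channel listed has the highest priority.
awk '/^  - / { print "highest priority:", $2; exit }' "$example"
```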


Step 3: Create your first environment

Let's create an environment for a hypothetical RNA-seq project. We'll call it rna-seq and specify Python 3.11:

conda create -n rna-seq python=3.11 -y
Executing transaction: done

#
# To activate this environment, use
#
#     $ conda activate rna-seq
#
# To deactivate an active environment, use
#
#     $ conda deactivate

The -n flag names the environment. The -y flag says "yes to everything" (so you don't have to confirm). Conda downloaded Python 3.11 and a minimal set of dependencies into a new isolated directory.

Activate and verify

conda activate rna-seq
python --version
Python 3.11.15

When the environment is active, its name appears in your terminal prompt. Any package you install now goes into this environment only — it does not touch your system Python or any other environment.
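Once you start writing analysis scripts, it is easy to run one in the wrong environment by accident. Here is a minimal sketch of a guard you could put at the top of a script. require_env is a hypothetical helper name. It relies on CONDA_DEFAULT_ENV, the variable that conda activate sets to the active environment's name.

```shell
# Hypothetical guard for analysis scripts: complain early if the expected
# environment is not active. CONDA_DEFAULT_ENV is set by `conda activate`.
require_env() {
    local expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Expected environment '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
    echo "Environment '$expected' is active"
}

require_env rna-seq || echo "Run: conda activate rna-seq"
```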

To see all your environments:

conda env list
# conda environments:
#
# * -> active
# + -> frozen
btest                    /home/redhat/.conda/envs/btest
                         /home/redhat/miniconda3
                         /home/redhat/miniconda3/envs/CRISPRitz
                         /home/redhat/miniconda3/envs/crispor
                         /home/redhat/miniconda3/envs/crispresso2_env
                         /home/redhat/miniconda3/envs/fastqc
                         /home/redhat/miniconda3/envs/paperqa
base                     /usr

A * marks the currently active environment. (In the sample listing above, no row is marked.)


Step 4: Install bioinformatics tools

This is where Conda shines. The bioconda channel has thousands of pre-compiled bioinformatics tools — samtools, bedtools, fastqc, bwa, bowtie2, STAR, DESeq2, cutadapt, trimmomatic, and hundreds more. You don't need to compile anything. Conda downloads the pre-built binary and installs it.

Install samtools

conda install -n rna-seq -c bioconda samtools -y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

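That's it. Samtools is installed: no compiling, no missing headers, no ./configure && make && make install. Verify it, and look at the version number. If you get a very old release (the 1.3.1 shown below dates from 2016), the solver has settled on an outdated build, and pinning a version as described under "Search for available packages" fixes it: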

samtools --version
samtools 1.3.1
Using htslib 1.3.1
Copyright (C) 2016 Genome Research Ltd.

Install FastQC

conda install -n rna-seq -c bioconda fastqc -y

FastQC is a Java application that normally requires you to install Java separately, download the FastQC zip, unzip it, and make the script executable. With Conda, Java is installed as a dependency automatically:

fastqc --version
FastQC v0.12.1

Search for available packages

Not sure if a tool is on bioconda? Search for it:

conda search -c bioconda samtools
# Name                       Version           Build  Channel
samtools                      0.1.12               0  bioconda
samtools                      0.1.12               1  bioconda
samtools                      0.1.13               0  bioconda
samtools                      0.1.14               0  bioconda
samtools                      0.1.18     h20b1175_12  bioconda
samtools                      1.3.1       h8ea3c3a_10  bioconda
samtools                      1.17         h00cdaf9_0  bioconda
samtools                      1.19         h5041a36_0  bioconda
samtools                      1.20         h6e868fa_0  bioconda
samtools                      1.21         f5299c06_0  bioconda

This shows every available version. To install a specific version:

conda install -n rna-seq -c bioconda samtools=1.21 -y
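If you just want the newest version number without scanning the list by eye, you can filter the search output. This is a rough sketch, not an official Conda feature: latest_version is a hypothetical helper, and it relies on sort -V (version sort), which GNU sort provides and which may behave differently on other systems.

```shell
# Hypothetical helper: read `conda search` output on stdin and print the
# highest version. Skips comment lines and anything with fewer than 4 columns.
latest_version() {
    awk '!/^#/ && NF >= 4 { print $2 }' | sort -V | tail -n 1
}

# Demo on a few rows copied from the search output above; prints 1.21
latest_version <<'EOF'
# Name                       Version           Build  Channel
samtools                      0.1.18     h20b1175_12  bioconda
samtools                      1.3.1       h8ea3c3a_10  bioconda
samtools                      1.20         h6e868fa_0  bioconda
samtools                      1.21         f5299c06_0  bioconda
EOF
```

With Conda available, the same helper can be fed directly: conda search -c bioconda samtools | latest_version.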

The one-liner environment

You can create an environment and install everything at once:

conda create -n rna-seq -c conda-forge -c bioconda \
    python=3.11 samtools fastqc cutadapt trimmomatic -y

This creates the environment, installs Python 3.11, and adds four bioinformatics tools — all in one command, all mutually compatible.
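One thing to watch: the one-liner reuses the name rna-seq from Step 3, so it helps to check whether a name is already taken before creating it. A minimal sketch, assuming a hypothetical helper called env_in_list that reads conda env list output on stdin:

```shell
# Hypothetical helper: succeed if an environment name appears in
# `conda env list` output supplied on stdin.
env_in_list() {
    local name="$1"
    awk -v n="$name" '!/^#/ && $1 == n { found = 1 } END { exit !found }'
}

if command -v conda >/dev/null 2>&1; then
    if conda env list | env_in_list rna-seq; then
        echo "rna-seq already exists; pick another name or remove it first"
    else
        echo "rna-seq is free to use"
    fi
else
    echo "conda is not available in this shell"
fi
```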


Step 5: Managing environments

List installed packages

conda list -n rna-seq
# packages in environment at /home/redhat/.conda/envs/rna-seq:
#
# Name                     Version          Build                 Channel
_openmp_mutex              4.5              20_gnu                conda-forge
bzip2                      1.0.8            hda65f42_9            conda-forge
ca-certificates            2026.6.17        hbd8a1cb_0            conda-forge
htslib                     1.21             h9753388_0            bioconda
libgcc                     15.2.0           he0feb66_19           conda-forge
ncurses                    6.5              h5b29e6c_0            conda-forge
python                     3.11.15          hf115687_0_cpython    conda-forge
samtools                   1.21             f5299c06_0            bioconda

Each row shows the package name, version, build hash, and which channel it came from. This is your complete software manifest — useful for troubleshooting and reproducibility.

Remove a package

conda remove -n rna-seq samtools -y

Delete an entire environment

conda env remove -n rna-seq -y

Deactivate

conda deactivate

You are now back in the base environment. Your rna-seq environment still exists — it is just not active.


Step 6: Reproducibility with environment.yml

Here is the scenario: you ran an analysis six months ago. Your paper is in review. The reviewer asks for your code. You send your scripts. They try to run them — and everything breaks, because they have different package versions.

Conda solves this with the environment.yml file. Export your environment:

conda env export -n rna-seq > environment.yml

The file looks like this:

name: rna-seq
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - _openmp_mutex=4.5=20_gnu
  - bzip2=1.0.8=hda65f42_9
  - ca-certificates=2026.6.17=hbd8a1cb_0
  - htslib=1.21=h9753388_0
  - libgcc=15.2.0=he0feb66_19
  - ncurses=6.5=h5b29e6c_0
  - python=3.11.15=hf115687_0_cpython
  - samtools=1.21=f5299c06_0

Now anyone — your collaborator, a reviewer, your future self — can recreate your exact environment with one command:

conda env create -f environment.yml

This is the gold standard for reproducible bioinformatics. Include your environment.yml alongside your code in a GitHub repository, and anyone can reproduce your computational environment exactly.

Tip: For a cleaner file that is less likely to break across platforms (Linux vs. macOS), use --no-builds:

conda env export -n rna-seq --no-builds > environment.yml

This removes the build hashes, which are platform-specific. The resulting file is more portable, at a small risk of getting a slightly different build.
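Another option some people find easier to maintain is a short, hand-written file that lists only the tools you actually asked for and lets Conda work out the rest. The sketch below writes one to a scratch folder. The versions are the ones used in this guide, and the output location is just for illustration; in a real project you would keep the file next to your code.

```shell
# Write a minimal, hand-maintained environment file that pins only the
# top-level tools. The scratch directory is illustrative.
envfile="$(mktemp -d)/environment.yml"

cat > "$envfile" <<'EOF'
name: rna-seq
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.11
  - samtools=1.21
  - fastqc
EOF

echo "Wrote $envfile"
```

It is used the same way as an exported file: conda env create -f followed by the path. The trade-off is that unpinned dependencies may resolve to newer versions later, so it is less exact than a full export.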


Step 7: A real workflow — CRISPR analysis environment

Let's put it all together. Say you want to set up a complete CRISPR off-target analysis environment. You need Python, samtools (for working with alignment files), and CRISPResso2 (for editing efficiency). Here's how:

# Create the environment
conda create -n crispr-analysis -c conda-forge -c bioconda \
    python=3.11 samtools=1.21 crispresso2 -y

# Activate it
conda activate crispr-analysis

# Verify everything is installed
python --version
samtools --version | head -1
CRISPResso --version

# Export for reproducibility
conda env export -n crispr-analysis > environment.yml

One environment, three tools, zero conflicts. When you are done with CRISPR analysis, deactivate it and your system is clean.
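The verification step above can be wrapped into a small function so that the versions you used get saved alongside your results. This is a sketch only: report_versions is a hypothetical helper, and it assumes each tool accepts --version, which is true for the three tools above but not for every program.

```shell
# Hypothetical helper: print one "tool<TAB>version" line per tool.
# Tools that are not on PATH are reported as "not found".
report_versions() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\t%s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
        else
            printf '%s\tnot found\n' "$tool"
        fi
    done
}

report_versions python samtools CRISPResso
```

Redirecting the output to a file (for example, report_versions python samtools CRISPResso > versions.txt) gives you a quick record to keep with the analysis.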


Common tools on bioconda

Here are some of the most popular bioinformatics tools available on the bioconda channel:

| Tool | What it does | Install command |
| --- | --- | --- |
| samtools | BAM/SAM file manipulation | conda install -c bioconda samtools |
| bedtools | Genome interval operations | conda install -c bioconda bedtools |
| bwa | Read alignment to reference genome | conda install -c bioconda bwa |
| bowtie2 | Read alignment (RNA-seq, ChIP-seq) | conda install -c bioconda bowtie2 |
| star | Spliced alignment for RNA-seq | conda install -c bioconda star |
| fastqc | Quality control for sequencing reads | conda install -c bioconda fastqc |
| cutadapt | Adapter trimming | conda install -c bioconda cutadapt |
| trimmomatic | Read trimming and filtering | conda install -c bioconda trimmomatic |
| bcftools | Variant calling and VCF manipulation | conda install -c bioconda bcftools |
| vcftools | VCF analysis and filtering | conda install -c bioconda vcftools |
| multiqc | Aggregate QC reports | conda install -c bioconda multiqc |
| seqkit | FASTA/FASTQ manipulation | conda install -c bioconda seqkit |
| crispresso2 | CRISPR editing analysis | conda install -c bioconda crispresso2 |
| blast | Sequence similarity search | conda install -c bioconda blast |
| prodigal | Gene prediction in prokaryotes | conda install -c bioconda prodigal |
| prokka | Genome annotation | conda install -c bioconda prokka |
| kallisto | Pseudoalignment for RNA-seq quantification | conda install -c bioconda kallisto |
| salmon | Transcript quantification | conda install -c bioconda salmon |

Browse the full catalog at bioconda.github.io.


The real talk

Conda can be slow. The classic dependency solver can take minutes to resolve complex environments. Conda 23.10 and later use the much faster libmamba solver by default, so if you installed Miniconda recently you already have it. On an older installation, you can switch to it yourself:

conda install -n base conda-libmamba-solver
conda config --set solver libmamba

The difference is dramatic: an environment that used to take 5 minutes to solve can resolve in seconds.

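Environments take disk space. Each environment has its own Python plus all its packages, and a typical bioinformatics environment is 1-5 GB. Conda saves some space by hard-linking files from its shared package cache where the filesystem allows, but 20 environments can still add up to tens of gigabytes. Run conda clean --all periodically to remove cached packages and tarballs that no environment is using: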

conda clean --all -y
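To see which environments are actually eating your disk, you can measure each folder. A minimal sketch: env_sizes is a hypothetical helper, and the default path ~/miniconda3/envs is an assumption (conda info --base prints your real install location, and environments usually live in its envs subfolder).

```shell
# Hypothetical helper: print the size in KB of each environment directory
# under an envs folder, largest first. Defaults to ~/miniconda3/envs.
env_sizes() {
    local envs_dir="${1:-$HOME/miniconda3/envs}"
    if [ ! -d "$envs_dir" ]; then
        echo "No environments directory at $envs_dir"
        return 0
    fi
    du -sk "$envs_dir"/*/ 2>/dev/null | sort -rn
}

env_sizes
```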

Don't install everything in base. The base environment is Conda itself. Installing tools there can break Conda. Always create a separate environment:

# GOOD
conda create -n my-project samtools
conda activate my-project

# BAD - with base active, this installs into base and can break conda itself
conda install samtools

Bioconda builds can lag behind. The latest version of a tool on GitHub might not be on bioconda for weeks or months. If you need the bleeding edge, you may have to build from source. But for 95% of use cases, the bioconda version is fine.

Channel priority conflicts. If you see a message about "package not found" or "conflicting dependencies," it is often because your channel priority is wrong. conda-forge should always be prioritized over defaults. If you are still stuck, try installing with --strict-channel-priority:

conda install -n my-env -c conda-forge -c bioconda --strict-channel-priority samtools -y

Mamba: the faster alternative

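If Conda is too slow for you, Mamba is a drop-in replacement written in C++. It uses the same channels and the same environment format, but it is significantly faster at dependency resolution. If you already have Miniconda, the quickest route is to install it into base (the Mamba documentation prefers a fresh Miniforge install, which bundles mamba, so treat this as the convenient option rather than the officially recommended one):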

conda install -n base -c conda-forge mamba

Then just replace conda with mamba in any command:

mamba create -n rna-seq -c bioconda samtools fastqc -y
mamba install -n rna-seq -c bioconda cutadapt -y

Same environments, same packages, much faster. Many bioinformaticians have switched to Mamba entirely.
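If you share scripts with people who may or may not have Mamba, you can let the script choose. A minimal sketch, with pick_frontend as a hypothetical helper name:

```shell
# Hypothetical helper: prefer mamba when it is installed, otherwise fall
# back to conda, so the same script works on either setup.
pick_frontend() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    elif command -v conda >/dev/null 2>&1; then
        echo conda
    else
        return 1
    fi
}

if frontend="$(pick_frontend)"; then
    echo "Using $frontend"
    # Example use (commented out so this sketch changes nothing):
    # "$frontend" create -n rna-seq -c conda-forge -c bioconda samtools fastqc -y
else
    echo "Neither mamba nor conda is on PATH"
fi
```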


The cheat sheet

| Command | What it does |
| --- | --- |
| conda create -n env_name python=3.11 | Create a new environment |
| conda activate env_name | Activate an environment |
| conda deactivate | Leave the current environment |
| conda env list | List all environments |
| conda install -c bioconda tool_name | Install a package from bioconda |
| conda search -c bioconda tool_name | Search for available versions |
| conda list -n env_name | List packages in an environment |
| conda remove -n env_name package_name | Remove a package |
| conda env remove -n env_name | Delete an entire environment |
| conda env export -n env_name > env.yml | Export environment to a file |
| conda env create -f env.yml | Recreate environment from a file |
| conda clean --all | Remove cached packages to free disk space |
| mamba create -n env_name ... | Same as conda, but faster (if mamba installed) |

What's next?

You now have the three foundational skills for computational biology: navigating the command line, managing your software with Conda, and analyzing data with R. With these tools, you can install virtually any bioinformatics tool, run it on real data, and reproduce your results.

The next time someone sends you a GitHub repo with a pipeline, you won't close the tab. You'll create a Conda environment, install the dependencies, and run it.

Already using Conda in your research? What is your most-used environment? Drop a comment below.

Additional Resources

| Resource | Link | What it is |
| --- | --- | --- |
| Miniconda | docs.conda.io/miniconda | Official installer |
| Bioconda | bioconda.github.io | The bioinformatics package channel |
| Conda cheat sheet | docs.conda.io/cheatsheet | Official quick reference |
| Mamba | github.com/mamba-org/mamba | Faster Conda alternative |
| Anaconda Cloud | anaconda.org | Search packages across all channels |
| Conda-forge | conda-forge.org | Community-maintained general packages |

Just getting started with the command line? Check out our Bash for Biologists survival guide first — it covers the terminal basics you need before Conda.