
Authors: Christina Zakarian, Laura Paez, Natalie Elphick
Date: 10/19/21


Preprocess single-cell RNA-seq data from the olfactory bulb with the 10x Genomics Cell Ranger pipeline: align reads, obtain UMI and cell counts, and generate a gene expression count matrix.

1. Installation of Cell Ranger:

conda create --name scRNAseq python=3.8
conda activate scRNAseq
wget -O cellranger-6.1.1.tar.gz ""
tar -xzvf cellranger-6.1.1.tar.gz



Adding Cell Ranger directory to $PATH:

The following line was added to the .bashrc file in the home directory on Talapas so that the cellranger command can be called from anywhere on the command line.

export PATH=/projects/bgmp/czakari2/bioinformatics/yu_project/cellranger/cellranger-6.1.1:$PATH
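
If the export line needs to be added from a script rather than by hand, a minimal sketch like the following appends it only when it is not already present, so re-running the setup does not duplicate the entry. The helper name add_cellranger_to_path is hypothetical and is not part of Cell Ranger.

```shell
# add_cellranger_to_path is a hypothetical helper, not part of Cell Ranger.
# Usage: add_cellranger_to_path <cellranger_dir> [rc_file]
add_cellranger_to_path() {
    acp_dir=$1
    acp_rc=${2:-$HOME/.bashrc}
    # The escaped \$PATH is written literally, so it is expanded
    # each time the rc file is sourced rather than right now.
    acp_line="export PATH=${acp_dir}:\$PATH"
    # -F fixed string, -x whole-line match, -q quiet.
    if ! grep -Fxq "$acp_line" "$acp_rc" 2>/dev/null; then
        printf '%s\n' "$acp_line" >> "$acp_rc"
    fi
}
```

After calling the helper with the Cell Ranger directory, run `source ~/.bashrc` and check that `command -v cellranger` prints a path.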

Verify Installation:

cellranger testrun --id=tiny

Pipestance completed successfully!
Saving pipestance info to testrun/testrun.mri.tgz

2. Build a Custom Reference With cellranger mkref

Cell Ranger's built-in mouse reference uses an older assembly (GRCm38/mm10), but we want the most recent one, Mus musculus GRCm39/mm39, so we will use Cell Ranger's mkref function to build a custom genome reference from the newer assembly.

Max mentioned in his email that we can use either the genome reference or the transcriptome reference (including coding and non-coding genes). We will use the genome reference for two reasons: Cell Ranger's examples use the genome, and the transcriptome reference would require two separate FASTA files (cDNA and ncRNA) as input, which may not be straightforward to use with Cell Ranger.

Download the FASTA and GTF files from Ensembl:

Make a directory to store the downloaded fasta and gtf files:


Confirm checksums:

30904 30598 Mus_musculus.GRCm39.104.gtf.gz
16996 787519 Mus_musculus.GRCm39.dna.primary_assembly.fa.gz
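
The two numbers in front of each file name look like the checksum and block count printed by the Unix `sum` utility. Assuming that is the case, a small sketch like this one compares a downloaded file against its expected pair. The helper name verify_sum is hypothetical.

```shell
# verify_sum is a hypothetical helper: compare `sum` output for a file
# against an expected "checksum blocks" pair.
# Usage: verify_sum <file> <expected_checksum> <expected_blocks>
verify_sum() {
    vs_file=$1
    # Reading from stdin keeps the file name out of the output;
    # "+0" drops any zero padding so both sides are formatted alike.
    vs_actual=$(sum < "$vs_file" | awk '{print $1+0, $2+0}')
    vs_expected=$(printf '%s %s\n' "$2" "$3" | awk '{print $1+0, $2+0}')
    if [ "$vs_actual" = "$vs_expected" ]; then
        echo "OK   $vs_file"
    else
        echo "FAIL $vs_file (got: $vs_actual, expected: $vs_expected)" >&2
        return 1
    fi
}
```

For the files listed above, the calls would be `verify_sum Mus_musculus.GRCm39.104.gtf.gz 30904 30598` and `verify_sum Mus_musculus.GRCm39.dna.primary_assembly.fa.gz 16996 787519`.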

The FASTA and GTF files were unzipped with gunzip before running the mkref command.

Generate the genome reference using mkref (STAR: 2.7.2a):

cd /projects/bgmp/shared/2021_projects/Yu/cellranger_build

cellranger mkref 

The full slurm script (with output in slurm-16423558.out) can be found in the repo under …/cellranger/mkref/
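
For orientation, the general shape of a mkref call is sketched below. This is a reconstruction under assumptions, not the command that was actually run: the genome name is inferred from the --transcriptome path passed to cellranger count, and the input file names are the unzipped versions of the files in the checksum list. The slurm script in the repo remains the authoritative version. The sketch only prints the command unless RUN_MKREF=1 is set (RUN_MKREF is a hypothetical variable introduced here).

```shell
# Hypothetical reconstruction of the mkref call; see the slurm script
# in the repo for the command that was actually run.
genome_name=Mus_musculus.GRCm39.dna.ens104          # assumed: matches the --transcriptome directory name
fasta=Mus_musculus.GRCm39.dna.primary_assembly.fa   # assumed: unzipped FASTA
gtf=Mus_musculus.GRCm39.104.gtf                     # assumed: unzipped GTF

mkref_cmd="cellranger mkref --genome=$genome_name --fasta=$fasta --genes=$gtf"

# Print the command by default; set RUN_MKREF=1 to execute it.
if [ "${RUN_MKREF:-0}" = "1" ]; then
    $mkref_cmd
else
    echo "$mkref_cmd"
fi
```

mkref writes its output to a directory named after the --genome value, which is why that directory is what cellranger count later receives as --transcriptome.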

Run cellranger on the combined FASTQ files

# samples is expected to hold a whitespace-separated list of sample names
for sample in $samples; do
    /usr/bin/time -v cellranger count \
        --id=sample_$sample \
        --transcriptome=/projects/bgmp/shared/2021_projects/Yu/cellranger_build/Mus_musculus.GRCm39.dna.ens104 \
        --fastqs=/projects/bgmp/shared/2021_projects/Yu/BGMP_2021/combined_files_output \
        --sample=$sample
done

The full slurm script and output files can be found in the repo under …/cellranger/count/
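
The loop relies on a samples variable that is not defined in the excerpt. One possible way to build it, assuming the combined FASTQ files follow Illumina-style naming (<sample>_S<n>_L<lane>_<read>_001.fastq.gz, with the lane part optional), is sketched here. The helper name list_samples is hypothetical, and the full slurm script may define the list differently.

```shell
# list_samples is a hypothetical helper: print the unique sample names
# found in a FASTQ directory, assuming Illumina-style file names.
# Usage: list_samples <fastq_dir>
list_samples() {
    for ls_path in "$1"/*_001.fastq.gz; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$ls_path" ] || continue
        basename "$ls_path"
    done |
        # Strip the _S<n>[_L<lane>]_<R1|R2|I1|I2>_001.fastq.gz suffix.
        sed -E 's/_S[0-9]+(_L[0-9]+)?_[RI][0-9]_001\.fastq\.gz$//' |
        sort -u
}
```

With this in place, `samples=$(list_samples /projects/bgmp/shared/2021_projects/Yu/BGMP_2021/combined_files_output)` would give the loop one entry per sample.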

Save cellranger count outputs as RDS objects using Seurat

Script and relevant outputs can be found under ../cellranger/seurat_obj
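
Before loading a sample into Seurat, it can help to confirm that its cellranger count run directory contains the filtered matrix files. The sketch below assumes the standard Cell Ranger output layout (outs/filtered_feature_bc_matrix with barcodes, features and matrix files). The helper name check_count_outputs is hypothetical.

```shell
# check_count_outputs is a hypothetical helper: confirm that a
# cellranger count run directory contains the filtered matrix files.
# Usage: check_count_outputs <run_dir>
check_count_outputs() {
    cco_dir="$1/outs/filtered_feature_bc_matrix"
    for cco_name in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
        # -s is true only for files that exist and are not empty.
        if [ ! -s "$cco_dir/$cco_name" ]; then
            echo "missing or empty: $cco_dir/$cco_name" >&2
            return 1
        fi
    done
    echo "ok: $cco_dir"
}
```

Since each run above was given --id=sample_$sample, the check for one sample would be `check_count_outputs sample_$sample`.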