DoubletFinder (code written by Chris McGinnis)

I'm now a postdoc at Stanford and my UCSF email will be decommissioned soon. I also only check my github repos about once per month, so please reach out directly if you run into any issues.

DoubletFinder is an R package that predicts doublets in single-cell RNA sequencing data.
DoubletFinder is implemented to interface with Seurat >= 2.0.
DoubletFinder was published by Cell Systems in April, 2019: (19)30073-0

Updates

Internalized functions normally in 'modes' package to enable compatibility with R v3.6 and higher.
Added parallelization to paramSweep_v3 (thanks NathanSkeen!). Note: progress no longer updated, but the process is much faster! Fixed bug with smaller datasets.
Added SCTransform compatibility to 'paramSweep_v3' and 'doubletFinder_v3'.
Added 'PCs' argument to 'doubletFinder', 'doubletFinder_v3', 'paramSweep', and 'paramSweep_v3' to avoid conflicts with dimension reduction preferences.
Seurat V3 compatibility: 'doubletFinder_v3' and 'paramSweep_v3' functions added, other functions for parameter estimation remain compatible.
Increased computational efficiency during pANN computation.
Implemented strategy for determining optimal pK values for any scRNA-seq data using pN-pK parameter sweeps and mean-variance-normalized bimodality coefficient (BCmvn).
Included vignette describing 'best-practices' for applying DoubletFinder to scRNA-seq data generated without sample multiplexing.

To install from GitHub:
remotes::install_github('chris-mcginnis-ucsf/DoubletFinder')

Dependencies

DoubletFinder requires the following R packages:
NOTE: These package versions were used in the bioRxiv paper, but other versions may work, as well.

DoubletFinder can be broken up into 4 steps:
(1) Generate artificial doublets from existing scRNA-seq data
(2) Pre-process merged real-artificial data
(3) Perform PCA and use the PC distance matrix to find each cell's proportion of artificial k nearest neighbors (pANN)
(4) Rank order and threshold pANN values according to the expected number of doublets

DoubletFinder takes the following arguments:
seu ~ This is a fully-processed Seurat object (i.e., after NormalizeData, FindVariableGenes, ScaleData, RunPCA, and RunTSNE have all been run).
PCs ~ The number of statistically-significant principal components, specified as a range (e.g., PCs = 1:10).
pN ~ This defines the number of generated artificial doublets, expressed as a proportion of the merged real-artificial data. Default is set to 25%, based on the observation that DoubletFinder performance is largely pN-invariant (see McGinnis, Murrow and Gartner 2019, Cell Systems).
pK ~ This defines the PC neighborhood size used to compute pANN, expressed as a proportion of the merged real-artificial data. No default is set, as pK should be adjusted for each scRNA-seq dataset. Optimal pK values should be estimated using the strategy described below.
nExp ~ This defines the pANN threshold used to make final doublet/singlet predictions. This value can best be estimated from cell loading densities into the 10X/Drop-Seq device, and adjusted according to the estimated proportion of homotypic doublets.

Application to Cell Hashing and Demuxlet data

DoubletFinder successfully recapitulates ground-truth doublet classifications determined using antibody-barcode sample multiplexing (Cell Hashing) and SNP deconvolution (Demuxlet). DoubletFinder identifies false-negative Demuxlet classifications caused by doublets formed from cells with identical SNP profiles. DoubletFinder is insensitive to homotypic doublets, i.e., doublets derived from transcriptionally-similar cell states.

'Best-Practices' for scRNA-seq data generated without sample multiplexing

Input scRNA-seq Data

Do not apply DoubletFinder to aggregated scRNA-seq data representing multiple distinct samples (e.g., multiple 10X lanes). For example, if you run DoubletFinder on aggregated data representing WT and mutant cell lines sequenced across different 10X lanes, artificial doublets will be generated from WT and mutant cells, which cannot exist in your data. These artificial doublets will skew results. Notably, it is okay to run DoubletFinder on data generated by splitting a single sample across multiple 10X lanes.

Ensure that input data is cleared of low-quality cell clusters. There are a variety of ways to do this, but I usually use the following workflow:
(1) Manually threshold raw gene expression matrices according to RNA nUMIs (especially important when dealing with super-loaded 10X data because of the way CellRanger thresholds data; see Lun et al., 2019, Genome Biology).
(2) Pre-process data using standard workflow.
(3) Identify clusters with (A) low RNA UMIs, (B) high % mitochondrial reads, and/or (C) uninformative marker genes.
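
As a rough illustration of how nExp relates to the final doublet/singlet calls described in this README, here is a minimal base-R sketch. It does not call DoubletFinder itself. The pANN scores and cluster labels are simulated, the 7.5% doublet rate is an assumed placeholder that should be replaced with an estimate from your own loading density, and modeling the homotypic proportion as the sum of squared cluster frequencies is one simple assumption rather than a statement about the package internals. All variable names are hypothetical.

```r
# Toy inputs (all values below are made up for illustration).
set.seed(1)
n_cells  <- 1000
pANN     <- runif(n_cells)                       # stand-in for per-cell pANN scores
clusters <- sample(c("A", "B", "C"), n_cells, replace = TRUE,
                   prob = c(0.5, 0.3, 0.2))      # stand-in for cluster annotations

# Assumed doublet formation rate; estimate this from your own loading density.
doublet_rate <- 0.075

# Expected number of doublets before any homotypic adjustment.
n_exp <- round(doublet_rate * n_cells)

# Homotypic proportion modeled (as an assumption) by the sum of squared
# cluster frequencies: the chance that two randomly paired cells share a label.
cluster_freq   <- table(clusters) / n_cells
homotypic_prop <- sum(cluster_freq^2)

# Adjust the expectation downward, since homotypic doublets are hard to detect.
n_exp_adj <- round(n_exp * (1 - homotypic_prop))

# Rank order pANN and label the top n_exp_adj cells as doublets.
calls <- rep("Singlet", n_cells)
calls[order(pANN, decreasing = TRUE)[seq_len(n_exp_adj)]] <- "Doublet"

print(table(calls))
```

Using a rank-based cutoff rather than a fixed pANN value means the number of predicted doublets always matches the expectation, which is why a sensible estimate of the doublet rate matters.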