Perform the scMerge2 algorithm — scMerge2 • scMerge

Merge single-cell data from different batches and experiments leveraging (pseudo)-replicates, control genes and pseudo-bulk.

scMerge2(
  exprsMat,
  batch,
  cellTypes = NULL,
  condition = NULL,
  ctl = rownames(exprsMat),
  chosen.hvg = NULL,
  ruvK = 20,
  use_bpparam = BiocParallel::SerialParam(),
  use_bsparam = BiocSingular::RandomParam(),
  use_bnparam = BiocNeighbors::AnnoyParam(),
  pseudoBulk_fn = "create_pseudoBulk",
  k_pseudoBulk = 30,
  k_celltype = 10,
  exprsMat_counts = NULL,
  cosineNorm = TRUE,
  return_subset = FALSE,
  return_subset_genes = NULL,
  return_matrix = TRUE,
  byChunk = TRUE,
  chunkSize = 50000,
  verbose = TRUE,
  seed = 1
)

Arguments

exprsMat: A gene (row) by cell (column) log-transformed matrix to be adjusted.
batch: A vector indicating the batch information for each cell in the batch-combined matrix.
cellTypes: An optional vector indicating the cell type of each cell in the batch-combined matrix. If NULL, the pseudo-replicate procedure is run to identify cell types.
condition: An optional vector indicating the condition information for each cell in the batch-combined matrix.
ctl: A character vector of negative control genes. It should have a non-empty intersection with the row names of exprsMat.
chosen.hvg: An optional character vector of highly variable genes chosen.
ruvK: An integer indicating the number of unwanted variation factors to remove. Default is 20.
use_bpparam: A BiocParallelParam object from the BiocParallel package. Default is SerialParam().
use_bsparam: A BiocSingularParam object from the BiocSingular package. Default is RandomParam().
use_bnparam: A BiocNeighborsParam object from the BiocNeighbors package. Default is AnnoyParam().
pseudoBulk_fn: A character string indicating how the pseudo-bulk is constructed. Default is "create_pseudoBulk".
k_pseudoBulk: An integer indicating the number of pseudo-bulks constructed within each cell grouping. Default is 30.
k_celltype: An integer indicating the number of nearest neighbours used in buildSNNGraph when grouping cells within each batch. Default is 10.
exprsMat_counts: An optional gene (row) by cell (column) counts matrix to be adjusted.
cosineNorm: A logical value indicating whether cosine normalisation is performed on the input data.
return_subset: If TRUE, the adjusted matrix is returned for only a subset of genes (the highly variable genes, or those given in return_subset_genes).
return_subset_genes: An optional character vector indicating the subset of genes to be adjusted.
return_matrix: A logical value indicating whether the adjusted matrix is calculated and returned. If FALSE, only the estimated parameters are returned.
byChunk: A logical value indicating whether the adjusted matrix is calculated chunk by chunk.
chunkSize: A numeric value indicating the size of each chunk.
verbose: If TRUE, all intermediate steps are shown. Default is TRUE.
seed: A numeric value giving the random seed used.
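
The cosineNorm, byChunk and chunkSize arguments can be illustrated with a small base R sketch. It assumes that cosine normalisation means scaling each cell (column) to unit Euclidean length, and that chunks are blocks of cells; both are assumptions about the conventional meaning, not a description of scMerge2's internals. The helpers cosine_normalise and apply_by_chunk are hypothetical names used only here, and the normalisation simply stands in for any per-cell operation that can be computed one chunk at a time.

```r
# Hypothetical helpers for illustration; they are not part of scMerge.

# Scale every cell (column) to unit Euclidean length.
cosine_normalise <- function(mat) {
  norms <- sqrt(colSums(mat^2))
  norms[norms == 0] <- 1  # leave all-zero cells unchanged
  sweep(mat, 2, norms, "/")
}

# Apply `fn` to blocks of at most `chunk_size` columns and bind the results.
apply_by_chunk <- function(mat, chunk_size, fn) {
  starts <- seq(1, ncol(mat), by = chunk_size)
  pieces <- lapply(starts, function(s) {
    cols <- s:min(s + chunk_size - 1, ncol(mat))
    fn(mat[, cols, drop = FALSE])
  })
  do.call(cbind, pieces)
}

# Toy gene-by-cell matrix (5 genes, 12 cells).
set.seed(1)
toy <- matrix(rexp(5 * 12), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("cell", 1:12)))

whole   <- cosine_normalise(toy)
chunked <- apply_by_chunk(toy, chunk_size = 5, fn = cosine_normalise)

# A per-cell operation gives the same result whether or not it is chunked.
all.equal(whole, chunked)
```

Because the operation acts on each cell independently, processing in chunks changes peak memory use but not the result, which is the usual motivation for an option such as byChunk on large datasets.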

Value

Returns a list object with the following components:

newY: If return_matrix is TRUE, the adjusted matrix is returned.
fullalpha: Alpha estimated from the fastRUVIII model.
M: Replicate matrix.
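
As a rough illustration of what a replicate matrix looks like, the sketch below builds one in base R following the usual RUV-III convention: one row per sample, one column per replicate group, and a 1 where the sample belongs to the group. The labels in replicate_group are made up, and the assumption that the rows of M correspond to pseudo-bulks rather than individual cells in scMerge2 is a guess, not something this page states.

```r
# Hypothetical replicate labels, one per sample (for example, per pseudo-bulk).
replicate_group <- c("A", "A", "B", "C", "B", "C")

# One column per replicate group; an entry is 1 when the sample belongs to it.
M <- model.matrix(~ 0 + factor(replicate_group))
colnames(M) <- levels(factor(replicate_group))
rownames(M) <- paste0("sample", seq_along(replicate_group))

M
rowSums(M)  # each sample belongs to exactly one replicate group
colSums(M)  # number of samples in each group
```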

Author

Yingxin Lin

Examples

## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')
## Running an example data with minimal inputs
library(SingleCellExperiment)
exprsMat <- scMerge2(exprsMat = logcounts(example_sce),
                     batch = example_sce$batch,
                     ctl = segList_ensemblGeneID$mouse$mouse_scSEG)
#> [1] "Cluster within batch"
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> [1] "Normalising data"
#> [1] "Constructing pseudo-bulk"
#> Dimension of pseudo-bulk expression: [1] 1047  131
#> [1] "Identifying MNC using pseudo-bulk:"

#> [1] "Running RUV"
assay(example_sce, "scMerge2") <- exprsMat$newY


example_sce <- scater::runPCA(example_sce, exprs_values = 'scMerge2')
scater::plotPCA(example_sce, colour_by = 'cellTypes', shape = 'batch')