Merge single-cell data from different batches and experiments leveraging (pseudo)-replicates, control genes and pseudo-bulk.
scMerge2(
  exprsMat,
  batch,
  cellTypes = NULL,
  condition = NULL,
  ctl = rownames(exprsMat),
  chosen.hvg = NULL,
  ruvK = 20,
  use_bpparam = BiocParallel::SerialParam(),
  use_bsparam = BiocSingular::RandomParam(),
  use_bnparam = BiocNeighbors::AnnoyParam(),
  pseudoBulk_fn = "create_pseudoBulk",
  k_pseudoBulk = 30,
  k_celltype = 10,
  exprsMat_counts = NULL,
  cosineNorm = TRUE,
  return_subset = FALSE,
  return_subset_genes = NULL,
  return_matrix = TRUE,
  byChunk = TRUE,
  chunkSize = 50000,
  verbose = TRUE,
  seed = 1
)
exprsMat: A gene (row) by cell (column) log-transformed matrix to be adjusted.
batch: A vector indicating the batch of each cell in the batch-combined matrix.
cellTypes: An optional vector indicating the cell type of each cell in the batch-combined matrix. If it is NULL, the pseudo-replicate procedure will be run to identify cell types.
condition: An optional vector indicating the condition of each cell in the batch-combined matrix.
ctl: A character vector of negative control genes. It should have a non-empty intersection with the rows of exprsMat.
chosen.hvg: An optional character vector of chosen highly variable genes.
ruvK: An integer indicating the number of unwanted variation factors to remove. Default is 20.
use_bpparam: A BiocParallelParam class object from the BiocParallel package. Default is SerialParam().
use_bsparam: A BiocSingularParam class object from the BiocSingular package. Default is RandomParam().
use_bnparam: A BiocNeighborsParam class object from the BiocNeighbors package. Default is AnnoyParam().
pseudoBulk_fn: A character string indicating how the pseudo-bulk is constructed. Default is "create_pseudoBulk".
k_pseudoBulk: An integer indicating the number of pseudo-bulk samples constructed within each cell grouping. Default is 30.
k_celltype: An integer indicating the number of nearest neighbours used in buildSNNGraph when grouping cells within each batch. Default is 10.
exprsMat_counts: A gene (row) by cell (column) counts matrix to be adjusted.
cosineNorm: A logical value indicating whether cosine normalisation is performed on the input data.
return_subset: If TRUE, the adjusted matrix for only a subset of genes (the highly variable genes, or those given in return_subset_genes) will be returned.
return_subset_genes: An optional character vector indicating the subset of genes to be adjusted.
return_matrix: A logical value indicating whether the adjusted matrix is calculated and returned. If FALSE, then only the estimated parameters will be returned.
byChunk: A logical value indicating whether the adjusted matrix is calculated by chunk.
chunkSize: A numeric value indicating the size of each chunk.
verbose: If TRUE, all intermediate steps will be shown. Default is TRUE.
seed: A numeric value indicating the random seed used.
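The requirements on batch (one entry per cell) and ctl (a non-empty intersection with the rows of exprsMat) can be checked before calling scMerge2. The following is a minimal base R sketch on toy data; the helper check_scMerge2_inputs and the toy gene and cell names are hypothetical and are not part of the scMerge package.

```r
# Toy inputs (hypothetical stand-ins for real data)
set.seed(1)
exprsMat <- matrix(rnorm(20), nrow = 5, ncol = 4,
                   dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))
batch <- c("b1", "b1", "b2", "b2")
ctl <- c("gene2", "gene4", "geneX")  # "geneX" is deliberately absent from the matrix

# Hypothetical helper, not part of scMerge: checks the documented input constraints
check_scMerge2_inputs <- function(exprsMat, batch, ctl) {
  # batch must describe every cell (column) of the matrix
  stopifnot(length(batch) == ncol(exprsMat))
  # the negative controls must overlap the rows of the matrix
  ctl_found <- intersect(ctl, rownames(exprsMat))
  if (length(ctl_found) == 0) {
    stop("No negative control genes found in rownames(exprsMat)")
  }
  ctl_found
}

# Keep only the control genes that are present in the matrix
ctl_used <- check_scMerge2_inputs(exprsMat, batch, ctl)
```

Passing only the control genes that are present is a design choice of this sketch; the documentation itself only requires that the intersection be non-empty.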
Returns a list object with the following components:
newY: if return_matrix is TRUE, the adjusted matrix is returned.
fullalpha: Alpha estimated from the fastRUVIII model.
M: Replicate matrix.
## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')
## Running scMerge2 on the example data with minimal inputs
library(SingleCellExperiment)
exprsMat <- scMerge2(exprsMat = logcounts(example_sce),
                     batch = example_sce$batch,
                     ctl = segList_ensemblGeneID$mouse$mouse_scSEG)
#> [1] "Cluster within batch"
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> [1] "Normalising data"
#> [1] "Constructing pseudo-bulk"
#> Dimension of pseudo-bulk expression: [1] 1047 131
#> [1] "Identifying MNC using pseudo-bulk:"
#> [1] "Running RUV"
assay(example_sce, "scMerge2") <- exprsMat$newY
example_sce = scater::runPCA(example_sce, exprs_values = 'scMerge2')
scater::plotPCA(example_sce, colour_by = 'cellTypes', shape = 'batch')
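When cell type labels are already available, they can be passed through the cellTypes argument so that the pseudo-replicate procedure is not needed to identify cell types. The following is a sketch under the assumption that the cellTypes column of example_sce (the one used for colouring the PCA plot) is a suitable label vector; the object name res_supervised and the assay name "scMerge2_supervised" are illustrative choices.

```r
library(SingleCellExperiment)
library(scMerge)

## Loading example data and stably expressed genes
data('example_sce', package = 'scMerge')
data('segList_ensemblGeneID', package = 'scMerge')

## Supplying known cell type labels instead of relying on pseudo-replicates
res_supervised <- scMerge2(exprsMat = logcounts(example_sce),
                           batch = example_sce$batch,
                           cellTypes = example_sce$cellTypes,
                           ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
                           verbose = FALSE)

## Store the adjusted matrix as a new assay, as in the example above
assay(example_sce, "scMerge2_supervised") <- res_supervised$newY
```

The result can then be inspected with scater::runPCA and scater::plotPCA in the same way as the unsupervised result.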