Merge single-cell data from different batches and experiments hierarchically, leveraging (pseudo-)replicates, control genes and pseudo-bulk.

scMerge2h(
  exprsMat,
  batch_list = list(),
  h_idx_list = list(),
  cellTypes = NULL,
  condition = NULL,
  ctl = rownames(exprsMat),
  chosen.hvg = NULL,
  ruvK_list = 20,
  use_bpparam = BiocParallel::SerialParam(),
  use_bsparam = BiocSingular::RandomParam(),
  use_bnparam = BiocNeighbors::AnnoyParam(),
  pseudoBulk_fn = "create_pseudoBulk",
  k_pseudoBulk = 30,
  k_celltype = 10,
  exprsMat_counts = NULL,
  cosineNorm = TRUE,
  return_subset = FALSE,
  return_subset_genes = NULL,
  return_matrix = TRUE,
  verbose = TRUE,
  seed = 1
)

Arguments

exprsMat

A gene (row) by cell (column) log-transformed matrix to be adjusted.

batch_list

A list indicating the batch information for each cell in the batch-combined matrix.

h_idx_list

A list giving the cell indices to be merged at each level of the hierarchical merging.
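
The example at the end of this page builds both lists with split(). As a minimal sketch of the structure that example implies, the base-R snippet below constructs a two-level h_idx_list and batch_list for toy data and checks that they line up. The helper check_hierarchy() and the toy vectors are hypothetical and not part of scMerge, and the consistency rules it checks are inferred from the example rather than taken from a documented requirement.

```r
# Toy metadata: 8 cells, 2 batches, 4 samples (all values hypothetical)
batch  <- rep(c("b1", "b2"), each = 4)
sample <- rep(1:4, each = 2)

# Level 1: merge samples within each batch; level 2: merge batches across all cells
h_idx_list <- list(level1 = split(seq_along(batch), batch),
                   level2 = list(seq_along(batch)))
batch_list <- list(level1 = split(sample, batch),
                   level2 = list(batch))

# Hypothetical helper (not an scMerge function): checks that every index set
# comes with exactly one batch label per cell, as in the package example
check_hierarchy <- function(h_idx_list, batch_list) {
  stopifnot(length(h_idx_list) == length(batch_list))
  for (lvl in seq_along(h_idx_list)) {
    idx <- h_idx_list[[lvl]]
    bat <- batch_list[[lvl]]
    stopifnot(length(idx) == length(bat))
    stopifnot(all(lengths(idx) == lengths(bat)))
  }
  invisible(TRUE)
}

check_hierarchy(h_idx_list, batch_list)
```

Running a check like this before calling scMerge2h can catch a mismatch between the two lists early, before any clustering or RUV step starts.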

cellTypes

An optional vector indicating the cell type information for each cell in the batch-combined matrix. If NULL, the pseudo-replicate procedure is run to identify cell types.

condition

An optional vector indicating the condition information for each cell in the batch-combined matrix.

ctl

A character vector of negative control genes. It should have a non-empty intersection with the row names of exprsMat.
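
The intersection requirement can be checked before calling the function. The following is a minimal sketch in base R; the toy matrix, the gene names and the variable ctl_used are hypothetical.

```r
# Toy expression matrix with gene names as row names (values hypothetical)
exprsMat <- matrix(rnorm(12), nrow = 4,
                   dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Candidate negative control genes; "geneX" is absent from the matrix
ctl <- c("gene2", "gene4", "geneX")

# Keep only the controls that are present in the matrix
ctl_used <- intersect(ctl, rownames(exprsMat))
if (length(ctl_used) == 0) {
  stop("None of the control genes are rows of exprsMat")
}
ctl_used
```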

chosen.hvg

An optional character vector of highly variable genes chosen.

ruvK_list

An integer indicating the number of unwanted variation factors to remove. Default is 20. The example below passes a vector, c(2, 5), with one value per hierarchy level.

use_bpparam

A BiocParallelParam object from the BiocParallel package. Default is SerialParam().

use_bsparam

A BiocSingularParam object from the BiocSingular package. Default is RandomParam().

use_bnparam

A BiocNeighborsParam object from the BiocNeighbors package. Default is AnnoyParam().

pseudoBulk_fn

A character string naming the method used to construct the pseudo-bulk. Default is "create_pseudoBulk".

k_pseudoBulk

An integer indicating the number of pseudo-bulk samples constructed within each cell grouping. Default is 30.
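
To illustrate what "k pseudo-bulk samples within a cell grouping" can mean, here is a generic sketch in base R that randomly pools the cells of one grouping into k sets and averages each set. This is an assumption-laden illustration of the general idea only: it is not the package's create_pseudoBulk, whose pooling and aggregation rules may differ, and all names and values are hypothetical.

```r
set.seed(1)
# Toy matrix: 5 genes x 12 cells from one cell grouping (values hypothetical)
x <- matrix(rpois(60, lambda = 5), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("cell", 1:12)))

k <- 3  # number of pseudo-bulk samples to build for this grouping

# Randomly assign each cell to one of k pools of equal size
pool <- sample(rep(seq_len(k), length.out = ncol(x)))

# Average the cells in each pool to get a gene x k pseudo-bulk matrix
pseudo_bulk <- sapply(seq_len(k), function(p) {
  rowMeans(x[, pool == p, drop = FALSE])
})
colnames(pseudo_bulk) <- paste0("pb", seq_len(k))
dim(pseudo_bulk)
```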

k_celltype

An integer indicating the number of nearest neighbours used in buildSNNGraph when grouping cells within each batch. Default is 10.

exprsMat_counts

A gene (row) by cell (column) counts matrix to be adjusted.

cosineNorm

A logical value indicating whether cosine normalisation is performed on the input data.
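
As a rough illustration of cosine normalisation, the base-R sketch below divides each cell (column) by its Euclidean norm so that every cell has unit length. It shows the general technique only; whether scMerge2h applies any additional scaling internally is not stated here, and the toy matrix is hypothetical.

```r
# Toy log-expression matrix: 3 genes x 2 cells (values hypothetical)
x <- matrix(c(3, 4, 0,
              1, 2, 2), nrow = 3,
            dimnames = list(paste0("gene", 1:3), c("cellA", "cellB")))

# Divide each cell (column) by its Euclidean (L2) norm
l2 <- sqrt(colSums(x^2))
x_cos <- sweep(x, 2, l2, "/")

# Every column now has unit length
sqrt(colSums(x_cos^2))
```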

return_subset

If TRUE, the adjusted matrix is returned for only a subset of genes (the highly variable genes, or those given in return_subset_genes).

return_subset_genes

An optional character vector indicating the subset of genes to be adjusted.

return_matrix

A logical value indicating whether the adjusted matrix is calculated and returned. If FALSE, only the estimated parameters are returned.

verbose

If TRUE, all intermediate steps are shown. Default is TRUE.

seed

A numeric value used as the random seed.

Author

Yingxin Lin

Examples


## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')


# Create fake sample information
example_sce$sample <- rep(c(1:4), each = 50)

# Construct a hierarchical index list
h_idx_list <- list(level1 = split(1:200, example_sce$batch),
                   level2 = list(1:200))

# Construct a batch information list
batch_list <- list(level1 = split(example_sce$sample, example_sce$batch),
                   level2 = list(example_sce$batch))
library(SingleCellExperiment)
exprsMat <- scMerge2h(exprsMat = logcounts(example_sce),
                      batch_list = batch_list,
                      h_idx_list = h_idx_list,
                      ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
                      ruvK_list = c(2, 5))
#> [1] "Hierarchical merging level 1, data1"
#> [1] "Cluster within batch"
#> Warning: more singular values/vectors requested than available
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> Warning: requested number of components greater than available rank
#> Warning: more singular values/vectors requested than available
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> Warning: requested number of components greater than available rank
#> [1] "Normalising data"
#> [1] "Constructing pseudo-bulk"
#> Dimension of pseudo-bulk expression: [1] 1047   89
#> [1] "Identifying MNC using pseudo-bulk:"
#> [1] "Running RUV"
#> [1] "Hierarchical merging level 1, data2"
#> [1] "Cluster within batch"
#> Warning: more singular values/vectors requested than available
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> Warning: requested number of components greater than available rank
#> Warning: more singular values/vectors requested than available
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> Warning: requested number of components greater than available rank

#> [1] "Normalising data"
#> [1] "Constructing pseudo-bulk"
#> Dimension of pseudo-bulk expression: [1] 1047   86
#> [1] "Identifying MNC using pseudo-bulk:"

#> [1] "Running RUV"
#> [1] "Hierarchical merging level 2, data1"
#> [1] "Cluster within batch"
#> [1] "Normalising data"
#> [1] "Constructing pseudo-bulk"
#> Dimension of pseudo-bulk expression: [1] 1047  100
#> [1] "Identifying MNC using pseudo-bulk:"

#> [1] "Running RUV"
assay(example_sce, "scMerge2") <- exprsMat[[length(h_idx_list)]]
example_sce <- scater::runPCA(example_sce, exprs_values = 'scMerge2')
scater::plotPCA(example_sce, colour_by = 'cellTypes', shape = 'batch')