A clustering method for cell type classification of spatial transcriptomics data. The tool generates adaptively smoothed, spatially informed gene expression data and uses it for clustering.
Usage
clustSIGNAL(
spe,
samples,
dimRed = "None",
batch = FALSE,
batch_by = "None",
NN = 30,
kernel = "G",
spread = 0.05,
sort = TRUE,
threads = 1,
outputs = "c",
clustParams = list(clust_c = 0, subclust_c = 0, iter.max = 30, k = 5, cluster.fun = "louvain")
)
Arguments
- spe
a SpatialExperiment object.
- samples
a character indicating the name of the colData(spe) column containing sample names.
- dimRed
a character indicating the name of the reduced dimensions to use from the SpatialExperiment object (i.e., from reducedDimNames(spe)). Default value is 'None'.
- batch
a logical parameter for whether or not to perform batch correction. Default value is FALSE.
- batch_by
a character indicating the name of the colData(spe) column containing the batch names. Default value is "None".
- NN
an integer for the number of neighbourhood cells the function should consider. The value must be greater than or equal to 1. Default value is 30.
- kernel
a character for the type of distribution to be used. The two valid values are "G" for a Gaussian distribution and "E" for an exponential distribution. Default value is "G".
- spread
a numeric value for distribution spread, represented by standard deviation for Gaussian distribution and rate for exponential distribution. Default value is 0.05 for Gaussian distribution and 20 for exponential distribution.
- sort
a logical parameter for whether or not to sort the neighbourhood after region description. Default value is TRUE.
- threads
a numeric value for the number of CPU cores to be used for the analysis. Default value is 1.
- outputs
a character for the type of output to return to the user. "c" for a data frame of cell IDs and their respective cluster numbers (default), "n" for the data frame of clusters plus the neighbourhood matrix, "s" for the data frame of clusters plus the final SpatialExperiment object, or "a" for all outputs.
- clustParams
a list of parameters for TwoStepParam clustering methods. The clustering parameters are, in order: number of centers (clust_c) for clustering with KmeansParam, number of centers (subclust_c) for sub-clustering clusters with KmeansParam, maximum iterations (iter.max) for clustering with KmeansParam, k value (k) for clustering with NNGraphParam, and community detection method (cluster.fun) to use with NNGraphParam.
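To give a feel for how kernel and spread interact, the following base R sketch computes illustrative weights for a cell and its NN neighbours under each distribution. The helper neighbour_weights() is hypothetical and not part of clustSIGNAL, and the placement of neighbours at evenly spaced positions on [0, 1] is an assumption made for illustration; the package's internal weighting may differ.

```r
# Hypothetical helper (not a clustSIGNAL function): illustrative weights
# for an index cell followed by its NN neighbours, nearest first.
neighbour_weights <- function(NN = 30, kernel = "G", spread = 0.05) {
  stopifnot(NN >= 1, kernel %in% c("G", "E"), spread > 0)
  # Assumption: the index cell and its neighbours sit at evenly spaced
  # positions on [0, 1], ordered from nearest to farthest.
  x <- seq(0, 1, length.out = NN + 1)
  w <- if (kernel == "G") {
    dnorm(x, mean = 0, sd = spread)  # spread = standard deviation
  } else {
    dexp(x, rate = spread)           # spread = rate
  }
  w / sum(w)                         # normalise so the weights sum to 1
}

w_gauss <- neighbour_weights(NN = 30, kernel = "G", spread = 0.05)
w_exp   <- neighbour_weights(NN = 30, kernel = "E", spread = 20)

# Compare how quickly each kernel down-weights farther neighbours
round(head(w_gauss, 5), 3)
round(head(w_exp, 5), 3)
```

In this sketch, a smaller standard deviation for the Gaussian kernel or a larger rate for the exponential kernel concentrates more weight on the index cell and its nearest neighbours.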
Value
a list of outputs, depending on the value of the outputs argument:
1. clusters: a data frame of cell names and their cluster classification.
2. neighbours: a character matrix containing the cell IDs of each cell's neighbours.
3. spe_final: a SpatialExperiment object with initial 'putative cell type' groups, entropy values, smoothed gene expression, post-smoothing clusters, and silhouette widths included.
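The relationship between the outputs argument and the returned list can be summarised with a small lookup. This is a sketch only: expected_elements() is a hypothetical helper, and it assumes the list elements are named clusters, neighbours, and spe_final as labelled above.

```r
# Hypothetical helper (not a clustSIGNAL function): which list elements
# to expect for each value of the `outputs` argument.
expected_elements <- function(outputs = "c") {
  switch(outputs,
    c = "clusters",
    n = c("clusters", "neighbours"),
    s = c("clusters", "spe_final"),
    a = c("clusters", "neighbours", "spe_final"),
    stop("outputs must be one of 'c', 'n', 's', or 'a'")
  )
}

expected_elements("n")
expected_elements("a")
```

A check such as all(expected_elements("a") %in% names(res_list)) could then be used to confirm that a result contains what downstream code needs, assuming the element names match.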
Examples
data(example)
names(colData(spe))
#> [1] "uniqueID" "sample_id" "entropy" "nsCluster" "initCluster"
# identify the column name with sample labels
samples <- "sample_id"
res_list <- clustSIGNAL(spe, samples, outputs = "c")
#> [1] "Calculating PCA. Time 05:14:50"
#> [1] "ClustSIGNAL run started. Time 05:14:50"
#> [1] "Initial nonspatial clustering performed. Clusters = 6 Time 05:14:50"
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> [1] "Nonspatial subclustering performed. Subclusters = 17 Time 05:14:52"
#> [1] "Regions defined. Time 05:14:52"
#> [1] "Region domainness calculated. Time 05:14:52"
#> [1] "Smoothing performed. NN = 30 Kernel = G Spread = 0.05 Time 05:14:52"
#> [1] "Nonspatial clustering performed on smoothed data. Clusters = 9 Time 05:14:52"
#> [1] "ClustSIGNAL run completed. 05:14:52"
#> Time difference of 2.5301 secs
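To change only some of the clustering parameters, one option is to start from the documented defaults and override selected entries, which keeps every element present and in the documented order. The sketch below uses base R's modifyList(); the values 50 and 10 are arbitrary illustrations, not recommendations, and the clustSIGNAL call is left commented out because it needs the package and example data to be loaded.

```r
# Documented defaults for clustParams, in the documented order
default_params <- list(clust_c = 0, subclust_c = 0, iter.max = 30,
                       k = 5, cluster.fun = "louvain")

# Override selected entries; modifyList() keeps the original element order
my_params <- modifyList(default_params, list(iter.max = 50, k = 10))
str(my_params)

# With the package and example data loaded, the list could be passed as:
# res_list <- clustSIGNAL(spe, samples, outputs = "c", clustParams = my_params)
```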