10 Identifying spatial domains with unsupervised clustering
Beyond spatial relationships between cell types, imaging datasets also contain another rich source of information: spatial domains. To give an idea of what spatial domains might look like, we’ve provided an image on the right, where we can clearly map out the healthy epithelial tissue domain on the left of the image, and the immune and tumour domains on the right of the image.
However, the spatial domains of interest tend to depend heavily on the biological question being answered. For example, when your primary tissue of interest is a solid tumour, spatial domain analysis can provide insights into the proportion of tumour domains versus immune domains, or how tumour domains differ between progressive and non-progressive cancers. Alternatively, if your primary question of interest concerns diabetes, spatial domains can provide insights into marker or cell type differences in pancreatic islets.
In this section, we’ll be exploring the use of lisaClust on two different datasets to identify spatial domains and help predict patient survival.
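# load the packages used in this chapter
library(lisaClust)
library(spicyR)
library(ggplot2)
library(SpatialDatasets)
library(SingleCellExperiment)

# set parameters
set.seed(51773)
use_mc <- TRUE
if (use_mc) {
  # use half of the available cores, and at least one
  nCores <- max(parallel::detectCores()/2, 1)
} else {
  nCores <- 1
}
BPPARAM <- simpleSeg:::generateBPParam(nCores)
theme_set(theme_classic())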
10.1 lisaClust
Clustering Local Indicators of Spatial Association (LISA) functions is a methodology for identifying consistent spatial organisation of multiple cell types in an unsupervised way. This can be used to characterise interactions between multiple cell types simultaneously and can complement traditional pairwise analysis. In our implementation, the LISA curves are a localised summary of an L-function from a Poisson point process model. Our framework, lisaClust, can be used to provide a high-level summary of cell type co-localisation in high-parameter spatial cytometry data, facilitating the identification of distinct tissue compartments or complex cellular microenvironments.
The workflow that lisaClust uses to identify regions of tissue with similar localisation patterns of cells contains multiple key steps. First, cells are treated as objects and assigned coordinates in an x-y space. Second, distances between all cells are calculated and then, by modeling the cells as a multi-type Poisson point process, the distances are used to calculate local indicators of spatial association (LISA). These LISA curves summarize the spatial association between each cell and a specific cell type over a range of radii. The LISA curves are calculated for each cell and cell type and then clustered to assign a region label for each cell.
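The steps above can be sketched in base R. This is a simplified illustration under stated assumptions, not lisaClust's implementation: the coordinates are simulated, no edge correction is applied, and `localL()` is a hypothetical helper written for this example.

```r
set.seed(1)

# Simulated tissue (hypothetical data): type "A" cells fill the left half
# of a 100 x 100 window and type "B" cells fill the right half.
n <- 200
cells <- data.frame(
  x = c(runif(n, 0, 50), runif(n, 50, 100)),
  y = runif(2 * n, 0, 100),
  cellType = rep(c("A", "B"), each = n)
)

# Hypothetical helper: for every cell, compute a local L-type statistic for
# each cell type and radius, centred on its expected value (r) under a
# homogeneous Poisson process.
localL <- function(df, radii) {
  d <- as.matrix(dist(df[, c("x", "y")]))        # all pairwise distances
  diag(d) <- Inf                                 # a cell is not its own neighbour
  area <- diff(range(df$x)) * diff(range(df$y))  # bounding-box window
  types <- sort(unique(df$cellType))
  curves <- lapply(types, function(tp) {
    isType <- df$cellType == tp
    lambda <- sum(isType) / area                 # intensity of this cell type
    sapply(radii, function(r) {
      counts <- rowSums(d[, isType, drop = FALSE] <= r)
      sqrt(counts / (lambda * pi)) - r           # local L(r) - r
    })
  })
  out <- do.call(cbind, curves)
  colnames(out) <- paste(rep(types, each = length(radii)), radii, sep = "_r")
  out
}

lisaCurves <- localL(cells, radii = c(10, 20))

# Cluster the per-cell curves to assign a region label to each cell.
km <- kmeans(lisaCurves, centers = 2, nstart = 10)
cells$region <- paste0("region_", km$cluster)

# Cross-tabulate cell types against the inferred regions.
table(cells$cellType, cells$region)
```

Each row of `lisaCurves` describes one cell's neighbourhood across cell types and radii, so cells with similar surroundings end up in the same cluster even if they are of different types.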
10.1.1 Case study: Keren
We will start by reading in the Keren 2018 dataset from the SpatialDatasets package as a SingleCellExperiment object. Here the data is in a format consistent with that output by CellProfiler.
kerenSPE <- SpatialDatasets::spe_Keren_2018()
see ?SpatialDatasets and browseVignettes('SpatialDatasets') for documentation
loading from cache
10.1.1.1 Generate LISA curves
For the purpose of this demonstration, we will be using only images 5 and 6 of the dataset.
This data comes with pre-annotated cell types, so we can move directly to performing k-means clustering on the local indicators of spatial association (LISA) functions using the lisaClust function. The image ID, cell type column, and spatial coordinates can be specified using the imageID, cellType, and spatialCoords arguments respectively. We will identify 5 regions of co-localisation by setting k = 5.
kerenSPE <- lisaClust(kerenSPE,
                      k = 5)
These regions are stored in colData and can be extracted.
DataFrame with 10 rows and 2 columns
          imageID      region
      <character> <character>
21154           5    region_4
21155           5    region_4
21156           5    region_4
21157           5    region_2
21158           5    region_2
21159           5    region_2
21160           5    region_5
21161           5    region_2
21162           5    region_2
21163           5    region_2
10.1.1.2 Examine cell type enrichment
lisaClust also provides a convenient function, regionMap, for examining which cell types are located in which regions. In this example, we use this to check which cell types appear more frequently in each region than expected by chance.
regionMap(kerenSPE,
          type = "bubble")
Above, we can see that tumour cells are concentrated in region 5, and immune cells are concentrated in regions 1 and 4. We can further segregate these cells by increasing the number of clusters, i.e., increasing the k parameter of the lisaClust function.
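One simple way to quantify "more frequently than expected by chance" is the ratio of observed to expected counts in a region-by-cell-type contingency table. The sketch below uses made-up labels in a hypothetical `cellLabels` data frame, and it is not necessarily the exact statistic that regionMap computes.

```r
# Hypothetical per-cell labels standing in for the region and cell type
# columns of colData.
cellLabels <- data.frame(
  region   = c(rep("region_1", 6), rep("region_2", 6)),
  cellType = c("Tumour", "Tumour", "Tumour", "Tumour", "CD8_T", "CD8_T",
               "CD8_T", "CD8_T", "CD8_T", "CD8_T", "Tumour", "Tumour")
)

# Observed number of cells of each type in each region.
observed <- table(cellLabels$region, cellLabels$cellType)

# Expected counts if region and cell type were independent:
# (row total x column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Ratios above 1 indicate that a cell type is over-represented in a region.
enrichment <- observed / expected
round(enrichment, 2)
```

In this toy example, tumour cells are over-represented in region_1 and CD8_T cells in region_2, which is the kind of pattern the bubble plot is designed to surface.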
How do we choose an appropriate value for k?
The choice of k depends largely on the biological question being asked. For instance, if we are interested in understanding the interactions between immune cells in a tumour microenvironment, the number of clusters should reflect the known biological subtypes of immune cells, such as T cells, B cells, macrophages, etc. In this case, a larger value of k may be needed to capture the diversity within these immune cell populations. On the other hand, if the focus is on interactions between immune cells and tumour cells, we might choose a smaller value of k to group immune cells into broader categories. Additionally, methods like the Gap statistic, Jump statistic, or Silhouette score could be employed to determine an optimal value of k.
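As an illustration of the Silhouette approach, the sketch below picks k by maximising the average silhouette width over a range of candidate values. It assumes the recommended `cluster` package is installed, and it uses a simulated matrix, `curves`, as a hypothetical stand-in for per-cell LISA curves rather than output from lisaClust.

```r
library(cluster)  # recommended package shipped with most R installations

set.seed(1)

# Hypothetical stand-in for a matrix of per-cell LISA curves: three well
# separated groups of 50 "cells", each described by two features.
centres <- rbind(c(0, 0), c(6, 0), c(3, 6))
curves <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centres[i, 1]), rnorm(50, centres[i, 2]))
}))

d <- dist(curves)
candidateK <- 2:6

# Average silhouette width for each candidate k (higher is better).
avgSil <- sapply(candidateK, function(k) {
  km <- kmeans(curves, centers = k, nstart = 20)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avgSil) <- candidateK

# Candidate with the largest average silhouette width.
bestK <- candidateK[which.max(avgSil)]
bestK
```

A data-driven choice like this is best treated as a starting point: the value it suggests should still be checked against the biological question, as discussed above. The Gap statistic can be explored in a similar way with `cluster::clusGap()`.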
10.1.1.3 Plot identified regions
We can use the hatchingPlot function to visualise all 5 regions and 17 cell types simultaneously for a specific image or set of images. The output is a ggplot object where the regions are marked by different hatching patterns. The nbp argument can be used to tune the granularity of the grid used for defining regions.
hatchingPlot(kerenSPE, useImages = 5, nbp = 300)
Concave windows are temperamental. Try choosing values of window.length > and < 1 if you have problems.
Warning in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...): data
length is not a multiple of split variable
Warning in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...): data
length is not a multiple of split variable
Time for this code chunk to run with 5.5 cores: 36.72 seconds
In accordance with the regionMap output, we can see that region 5 is mostly made up of tumour cells, and regions 2 and 4 both contain our immune cell populations.
How could results from lisaClust be used in conjunction with results from spicyR?
lisaClust provides a high-resolution view of the tissue architecture, while spicyR can quantify how these spatial relationships or features contribute to clinical outcomes. spicyR’s L-function metric can be used to determine the degree of localisation or dispersion between different spatial domains. For instance, we can look at co-localisation between region 5 (our tumour cells) and regions 2 or 4 (our immune cells).
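To make the idea of an L-function between two domains concrete, the sketch below computes a cross-type L-function between cells carrying two different region labels. It is a minimal base R illustration on simulated coordinates with no edge correction; `crossL()` is a hypothetical helper and does not reproduce spicyR's implementation.

```r
set.seed(1)

# Hypothetical cells from one image: a tumour region on the left and an
# immune region on the right of a 100 x 100 window.
cells <- data.frame(
  x = c(runif(150, 0, 45), runif(150, 55, 100)),
  y = runif(300, 0, 100),
  region = rep(c("tumour_region", "immune_region"), each = 150)
)

# Hypothetical helper: cross-type L-function between two labels, without
# edge correction. Under complete spatial randomness L(r) is roughly r.
crossL <- function(df, from, to, r, area) {
  a <- as.matrix(df[df$region == from, c("x", "y")])
  b <- as.matrix(df[df$region == to, c("x", "y")])
  # Euclidean distance between every `from` cell and every `to` cell
  d <- sqrt(outer(a[, 1], b[, 1], "-")^2 + outer(a[, 2], b[, 2], "-")^2)
  K <- area * sum(d <= r) / (nrow(a) * nrow(b))
  sqrt(K / pi)
}

r <- 20
Lcross <- crossL(cells, "tumour_region", "immune_region", r = r, area = 100 * 100)

# Negative values suggest the two regions avoid each other at radius r;
# positive values suggest they co-localise.
Lcross - r
```

Computing a statistic like this per image gives one number per patient for each pair of regions, which can then be related to clinical outcomes.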
10.2 sessionInfo
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.4.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Sydney
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] SpatialDatasets_1.4.0 SpatialExperiment_1.16.0
[3] ExperimentHub_2.14.0 AnnotationHub_3.14.0
[5] BiocFileCache_2.14.0 dbplyr_2.5.0
[7] SingleCellExperiment_1.28.1 SummarizedExperiment_1.36.0
[9] Biobase_2.66.0 GenomicRanges_1.58.0
[11] GenomeInfoDb_1.42.0 IRanges_2.40.0
[13] S4Vectors_0.44.0 BiocGenerics_0.52.0
[15] MatrixGenerics_1.18.0 matrixStats_1.4.1
[17] ggplot2_3.5.1 spicyR_1.18.0
[19] lisaClust_1.14.4
loaded via a namespace (and not attached):
[1] splines_4.4.1 later_1.4.1
[3] bitops_1.0-9 filelock_1.0.3
[5] svgPanZoom_0.3.4 tibble_3.2.1
[7] polyclip_1.10-7 lifecycle_1.0.4
[9] rstatix_0.7.2 lattice_0.22-6
[11] MASS_7.3-61 MultiAssayExperiment_1.32.0
[13] backports_1.5.0 magrittr_2.0.3
[15] rmarkdown_2.29 yaml_2.3.10
[17] httpuv_1.6.15 doRNG_1.8.6.1
[19] ClassifyR_3.10.5 sp_2.1-4
[21] dcanr_1.22.0 spatstat.sparse_3.1-0
[23] DBI_1.2.3 minqa_1.2.8
[25] RColorBrewer_1.1-3 abind_1.4-8
[27] zlibbioc_1.52.0 purrr_1.0.2
[29] RCurl_1.98-1.16 tweenr_2.0.3
[31] rappdirs_0.3.3 GenomeInfoDbData_1.2.13
[33] spatstat.utils_3.1-1 terra_1.7-78
[35] pheatmap_1.0.12 goftest_1.2-3
[37] simpleSeg_1.8.0 spatstat.random_3.3-2
[39] svglite_2.1.3 codetools_0.2-20
[41] DelayedArray_0.32.0 ggforce_0.4.2
[43] tidyselect_1.2.1 raster_3.6-30
[45] UCSC.utils_1.2.0 farver_2.1.2
[47] viridis_0.6.5 lme4_1.1-35.5
[49] spatstat.explore_3.3-3 jsonlite_1.8.9
[51] Formula_1.2-5 survival_3.7-0
[53] iterators_1.0.14 systemfonts_1.1.0
[55] foreach_1.5.2 tools_4.4.1
[57] Rcpp_1.0.13-1 glue_1.8.0
[59] gridExtra_2.3 SparseArray_1.6.0
[61] xfun_0.49 mgcv_1.9-1
[63] ggthemes_5.1.0 EBImage_4.48.0
[65] HDF5Array_1.34.0 dplyr_1.1.4
[67] shinydashboard_0.7.2 scam_1.2-17
[69] withr_3.0.2 numDeriv_2016.8-1.1
[71] BiocManager_1.30.25 fastmap_1.2.0
[73] ggh4x_0.2.8 rhdf5filters_1.18.0
[75] boot_1.3-31 fansi_1.0.6
[77] digest_0.6.37 mime_0.12
[79] R6_2.5.1 colorspace_2.1-1
[81] tensor_1.5 jpeg_0.1-10
[83] spatstat.data_3.1-4 RSQLite_2.3.8
[85] utf8_1.2.4 tidyr_1.3.1
[87] generics_0.1.3 data.table_1.16.2
[89] class_7.3-22 httr_1.4.7
[91] htmlwidgets_1.6.4 S4Arrays_1.6.0
[93] pkgconfig_2.0.3 gtable_0.3.6
[95] blob_1.2.4 XVector_0.46.0
[97] htmltools_0.5.8.1 carData_3.0-5
[99] fftwtools_0.9-11 scales_1.3.0
[101] ggupset_0.4.0 png_0.1-8
[103] spatstat.univar_3.1-1 knitr_1.49
[105] rstudioapi_0.17.1 reshape2_1.4.4
[107] rjson_0.2.23 nlme_3.1-166
[109] curl_6.0.1 nloptr_2.1.1
[111] bdsmatrix_1.3-7 rhdf5_2.50.0
[113] cachem_1.1.0 stringr_1.5.1
[115] BiocVersion_3.20.0 vipor_0.4.7
[117] parallel_4.4.1 concaveman_1.1.0
[119] AnnotationDbi_1.68.0 pillar_1.9.0
[121] grid_4.4.1 vctrs_0.6.5
[123] coxme_2.2-22 promises_1.3.2
[125] ggpubr_0.6.0 car_3.1-3
[127] xtable_1.8-4 beeswarm_0.4.0
[129] evaluate_1.0.1 magick_2.8.5
[131] cli_3.6.3 locfit_1.5-9.10
[133] compiler_4.4.1 rlang_1.1.4
[135] crayon_1.5.3 rngtools_1.5.2
[137] ggsignif_0.6.4 labeling_0.4.3
[139] ggbeeswarm_0.7.2 plyr_1.8.9
[141] stringi_1.8.4 viridisLite_0.4.2
[143] nnls_1.6 deldir_2.0-4
[145] BiocParallel_1.40.0 cytomapper_1.18.0
[147] lmerTest_3.1-3 munsell_0.5.1
[149] Biostrings_2.74.0 tiff_0.1-12
[151] spatstat.geom_3.3-4 V8_6.0.0
[153] Matrix_1.7-1 bit64_4.5.2
[155] Rhdf5lib_1.28.0 KEGGREST_1.46.0
[157] shiny_1.9.1 igraph_2.1.1
[159] broom_1.0.7 memoise_2.0.1
[161] bit_4.5.0