12 Case Study: Head and Neck Squamous Cell Carcinoma (Ferguson et al., 2022)

In this section, we will demonstrate how the workflow can be applied to a single dataset, the Ferguson 2022 dataset. The key conclusion of this manuscript (amongst others) is that spatial information about cells and the immune environment can be used to predict primary tumour progression or metastases in patients. We will use our workflow to reach a similar conclusion.

library(cytomapper)
library(dplyr)
library(ggplot2)
library(simpleSeg)
library(FuseSOM)
library(ggpubr)
library(scater)
library(spicyR)
library(ClassifyR)
library(lisaClust)
library(Statial)
library(tidySingleCellExperiment)
library(SpatialExperiment)
library(SpatialDatasets)

# set parameters
set.seed(51773)

# whether to use multiple cores (recommended)
use_mc = TRUE
is_windows = .Platform$OS.type == "windows"

if (use_mc) {
  nCores = max(ceiling(parallel::detectCores() / 2), 1)
  
  if (nCores == 1) {
    BPPARAM = BiocParallel::SerialParam()
  } else if (is_windows) {
    BPPARAM = BiocParallel::SnowParam(workers = nCores, type = "SOCK")
  } else {
    BPPARAM = BiocParallel::MulticoreParam(workers = nCores)
  }
} else {
  BPPARAM = BiocParallel::SerialParam()
}

theme_set(theme_classic())

12.1 Read in images

As before, we can load the Ferguson 2022 image data from the SpatialDatasets package into a CytoImageList and store them as h5 file-on-disk in a temporary directory. We will also assign the metadata columns of the CytoImageList object using the mcols function.

pathToImages <- SpatialDatasets::Ferguson_Images()
tmp <- tempfile()
unzip(pathToImages, exdir = tmp)

# Store images in a CytoImageList on_disk as h5 files to save memory.
images <- cytomapper::loadImages(
  tmp,
  single_channel = TRUE,
  on_disk = TRUE,
  h5FilesPath = HDF5Array::getHDF5DumpDir(),
  BPPARAM = BPPARAM
)

# assign metadata columns
mcols(images) <- S4Vectors::DataFrame(imageID = names(images))

Time for this code chunk to run with 6 cores: 146.41 s

As we’re reading the image channels directly from the names of the TIFF image, we will first clean them for ease of downstream processing.

channelNames(images) <- channelNames(images) |>
                          # remove preceding letters
                          sub(pattern = ".*_", replacement = "", x = _) |> 
                          # remove the .ome
                          sub(pattern = ".ome", replacement = "", x = _)

Similarly, the image names will be taken from the folder name containing the individual TIFF images for each channel. These will often also need to be cleaned.

# cleaning image names to obtain image IDs
split_names <- function(x) {
  sapply(strsplit(x, "_"), `[`, 3)
}

names(images) <- names(images) |> split_names()
mcols(images) <- S4Vectors::DataFrame(imageID = names(images))

12.2 SimpleSeg: Segment the cells in the images

Our simpleSeg R package Bioconductor provides a series of functions to generate simple segmentation masks of images. A key strength of the simpleSeg package is that we have included multiple ways to perform some simple segmentation operations and incorporated multiple automatic procedures to optimise some key parameters when these aren’t specified. These functions leverage the functionality of the EBImage package on Bioconductor. For more flexibility when performing your segmentation in R, we recommend learning to use the EBImage package.

12.2.1 Run simpleSeg

If your images are stored in a list or CytoImageList they can be segmented with a simple call to simpleSeg. simpleSeg is an R implementation of a simple segmentation technique which traces out the nuclei using a specified channel (by setting nucleus =) and then dilates around the traced nuclei by a specified amount (controlled using discSize). The nucleus can be traced out using either one specified channel, or by using the principal components of all channels most correlated to the specified nuclear channel by setting pca = TRUE.

In the example below, we used simpleSeg to trace the nuclei signal in the images based on the HH3 channel, expanding outward from the nucleus by 3 pixels. A more detailed explanation of each of the key parameters is available in the Processing section.

# generate masks
masks <- simpleSeg(images,
                   nucleus = c("HH3"),
                   cellBody = "dilate",
                   discSize = 3,
                   sizeSelection = 20,
                   transform = "sqrt",
                   tissue = c("panCK", "CD45", "HH3"),
                   cores = nCores)

Time for this code chunk to run with 6 cores: 65.99 s

12.2.2 Visualise separation

We can then use the display and colorLabels functions from EBImage to examine the performance of the cell segmentation.

EBImage::display(colorLabels(masks[[1]]))

12.2.3 Visualise outlines

The plotPixels function in cytomapper makes it easy to overlay the mask on top of the nucleus intensity marker to see how well our segmentation process has performed. Here, we can see that the segmentation appears to be performing reasonably. If you see over or under-segmentation of your images, discSize is a key parameter in simpleSeg for optimising the size of the dilation disc after segmenting out the nuclei.

plotPixels(image = images["F3"], 
           mask = masks["F3"],
           img_id = "imageID", 
           colour_by = c("HH3"), 
           display = "single",
           colour = list(HH3 = c("black","blue")),
           legend = NULL,
           bcg = list(
             HH3 = c(1, 1, 2)
           ))

If you wish to visualise multiple markers instead of just the HH3 marker and see how the segmentation mask performs, this can also be done. Here, we can see that our segmentation mask has done a good job of capturing the CD31 signal, but perhaps not such a good job of capturing the FXIIIA signal, which often lies outside of our dilated nuclear mask. This could suggest that we might need to increase the discSize of our dilation.

plotPixels(image = images["F3"], 
           mask = masks["F3"],
           img_id = "imageID", 
           colour_by = c("HH3", "CD31", "FX111A"), 
           display = "single",
           colour = list(HH3 = c("black","blue"),
                         CD31 = c("black", "red"),
                         FX111A = c("black", "green") ),
           legend = NULL,
           bcg = list(
             HH3 = c(1, 1, 2),
             CD31 = c(0, 1, 2),
             FX111A = c(0, 1, 1.5)
           ))

Here, we can see that our segmentation mask has done a good job of capturing the CD31 signal, but perhaps not such a good job of capturing the FXIIIA signal, which often lies outside of our dilated nuclear mask. This could suggest that we might need to increase the discSize of our dilation.

12.3 Summarise cell features

We can then use the measureObjects function from the cytomapper package to characterise the phenotypes of each of the segmented cells. measureObjects will calculate the average intensity of each channel within each cell as well as a few morphological features. By default, measureObjects will return a SingleCellExperiment object, where the channel intensities are stored in the counts assay and the spatial location of each cell is stored in colData in the m.cx and m.cy columns. We can ask measureObjects to return a SpatialExperiment object instead by specifying return_as = "spe", and the spatial coordinates of each cell will be stored in the spatialCoords slot.

# Summarise the expression of each marker in each cell
cells <- cytomapper::measureObjects(masks,
                                    images,
                                    img_id = "imageID",
                                    return_as = "spe",
                                    BPPARAM = BPPARAM)

spatialCoordsNames(cells) <- c("x", "y")

Time for this code chunk to run with 6 cores: 223.48 s

12.4 Load the clinical data

To associate features in our image with disease progression, it is important to read in information which links image identifiers to their progression status. The clinical data is available through the SpatialDatasets package.

clinical <- SpatialDatasets::fergusonClinical()

# ensure image IDs match
rownames(clinical) <- clinical$imageID
clinical <- clinical[names(images), ]

# put the clinical data into the colData slot
colData(cells) <- cbind(colData(cells), clinical[cells$imageID, ])

If needed, the SpatialExperiment object can be stored as an R Data file.

save(cells, file = "data/cells.rda")

In case you already have your SpatialExperiment/SingleCellExperiment object, you may only be interested in our downstream workflow. For the sake of convenience, we’ve provided capability to directly load in the SpatialExperiment object that we’ve generated.

load("data/cells.rda")

12.5 Normalise the data

Next, we can check if the marker intensities of each cell require some form of transformation or normalisation. The reason we do this is two-fold:

The intensities of images are often highly skewed, preventing any meaningful downstream analysis.
The intensities across different images are often different, meaning that what is considered “positive” can be different across images.

By transforming and normalising the data, we aim to reduce these two effects. Below, we extract the marker intensities from the counts assay and take a closer look at the CD3 marker, which should be expressed in the majority of T cells.

# Plot densities of CD3 for each image.
cells |> 
  join_features(features = rownames(cells), shape = "wide", assay = "counts") |> 
  ggplot(aes(x = CD3, colour = imageID)) + 
  geom_density() + 
  theme(legend.position = "none")

Here, we can see that the intensities are very clearly skewed, and it is difficult to distinguish a CD3- cell from a CD3+ cell. Further, we can clearly see some image-level batch effect, where across images, the intensity peaks differ drastically.

Another method of visualising batch effects is using a dimensionality reduction technique and visualising how the images separate out on a 2D plot. If no batch effect is expected, we should see the images largely overlap with each other.

# Usually we specify a subset of the original markers which are informative to separating out distinct cell types for the UMAP and clustering.
ct_markers <- c("podoplanin", "CD13", "CD31",
                "panCK", "CD3", "CD4", "CD8a",
                "CD20", "CD68", "CD16", "CD14", "HLADR", "CD66a")


set.seed(51773)
# Perform dimension reduction using UMAP.
cells <- scater::runUMAP(
  cells,
  subset_row = ct_markers,
  exprs_values = "counts"
)

# Select a subset of images to plot.
someImages <- unique(cells$imageID)[c(1, 5, 10, 20, 30, 40)]

# UMAP by imageID.
scater::plotReducedDim(
  cells[, cells$imageID %in% someImages],
  dimred = "UMAP",
  colour_by = "imageID"
)

The UMAP also indicates that some level of batch effect exists in our dataset.

To mitigate this, we can use the normalizeCells function from simpleSeg. Below, we perform normalisation (specified by method =) by 1) trimming the 99th percentile, 2) dividing by the mean and 3) removing the 1st principal component. This modified data is then stored in the norm assay by default.

# Leave out the nuclei markers from our normalisation process. 
useMarkers <- rownames(cells)[!rownames(cells) %in% c("DNA1", "DNA2", "HH3")]

# Transform and normalise the marker expression of each cell type.
cells <- normalizeCells(cells,
                        markers = useMarkers,
                        transformation = NULL,
                        method = c("trim99", "mean", "PC1"),
                        assayIn = "counts",
                        cores = nCores
)

# Plot densities of CD3 for each image
cells |> 
  join_features(features = rownames(cells), shape = "wide", assay = "norm") |> 
  ggplot(aes(x = CD3, colour = imageID)) + 
  geom_density() + 
  theme(legend.position = "none")

We can see that this normalised data appears more bimodal, not perfectly, but likely to a sufficient degree for clustering, as we can at least observe a clear CD3+ peak at 1.00, and a CD3- peak at around 0.3. For more information on available normalisation and transformation methods and a detailed explanation of the parameters, refer to our Quality Control section.

We can also appreciate through the UMAP a reduction of the batch effect we initially saw.

set.seed(51773)
# Perform dimension reduction using UMAP.
cells <- scater::runUMAP(
  cells,
  subset_row = ct_markers,
  exprs_values = "norm",
  name = "normUMAP"
)

someImages <- unique(cells$imageID)[c(1, 5, 10, 20, 30, 40)]

# UMAP by imageID.
scater::plotReducedDim(
  cells[, cells$imageID %in% someImages],
  dimred = "normUMAP",
  colour_by = "imageID"
)

12.6 FuseSOM: Cluster cells into cell types

We can also appreciate from the UMAP above that there is a division of clusters, most likely representing different cell types. We next aim to empirically distinguish each cluster using our FuseSOM package for clustering. FuseSOM provides a pipeline for the clustering of highly multiplexed in situ imaging cytometry assays. This pipeline uses the Self Organising Map architecture coupled with Multiview hierarchical clustering and provides functions for the estimation of the number of clusters.

Here, we cluster using the runFuseSOM function. We specify the number of clusters to identify to be numClusters = 10. We also specify a set of cell-type specific markers to use (ct_markers), as we want our clusters to be distinct based off cell type markers, rather than markers which might pick up a transitioning cell state.

# Generate SOM and cluster cells into 10 groups
cells <- runFuseSOM(
  cells,
  markers = ct_markers,
  assay = "norm",
  numClusters = 10)

We can also observe how reasonable our choice of k = 10 was, using the estimateNumCluster and optiPlot functions. Here we examine the Gap method, but others, such as Silhouette and Within Cluster Distance are also available. We can see that there are elbow points in the gap statistic at k = 7, k = 10, and k = 11. We’ve specified k = 10, striking a good balance between the number of clusters and the gap statistic. For more discussion on how to select an appropriate value for k, refer to our Unsupervised Clustering section.

cells <- estimateNumCluster(cells, kSeq = 2:30)
optiPlot(cells, method = "gap")

12.6.1 Interpreting cluster phenotype

We can begin the process of understanding what each of these cell clusters are by using the plotGroupedHeatmap function from scater.

# Visualise marker expression in each cluster.
scater::plotGroupedHeatmap(
  cells,
  features = ct_markers,
  group = "clusters",
  exprs_values = "norm",
  center = TRUE,
  scale = TRUE,
  zlim = c(-3, 3),
  cluster_rows = FALSE,
  block = "clusters"
)

At the least, here we can see we capture all the major immune populations that we expect to see, including the CD4 and CD8 T cells, the CD20+ B cells, the CD68+ myeloid populations, the CD66+ granulocytes, the podoplanin+ epithelial cells, and the panCK+ tumour cells.

Given domain-specific knowledge of the tumour-immune landscape, we can go ahead and annotate these clusters as cell types given their expression profiles.

cells <- cells |>
  mutate(cellType = case_when(
      clusters == "cluster_1" ~ "epithelial",
      clusters == "cluster_2" ~ "squamous", # tumour cell type
      clusters == "cluster_3" ~ "monocyte",
      clusters == "cluster_4" ~ "myeloid",
      clusters == "cluster_5" ~ "granulocyte",
      clusters == "cluster_6" ~ "CD8_T",
      clusters == "cluster_7" ~ "CD4_T",
      clusters == "cluster_8" ~ "B",
      clusters == "cluster_9" ~ "endothelial",
      clusters == "cluster_10" ~ "dendritic",
      
  ))

We might also be interested in how these cell types are distributed on the images themselves. Here we examine the distribution of clusters on image F3, noting the healthy epithelial and endothelial structures surrounded by tumour cells.

reducedDim(cells, "spatialCoords") <- spatialCoords(cells)

cells |> 
  filter(imageID == "F3") |> 
  plotReducedDim("spatialCoords", colour_by = "cellType")

We can also use the UMAP we computed earlier to visualise our data in a lower dimension and see how well our annotated cell types cluster out.

# UMAP by cell type
scater::plotReducedDim(
  cells[, cells$imageID %in% someImages],
  dimred = "normUMAP",
  colour_by = "cellType"
)

12.6.2 Testing for association between the proportion of each cell type and progressor status

We recommend using a package such as diffcyt for testing for changes in abundance of cell types. However, the colTest function from the spicyR package allows us to quickly test for associations between the proportions of the cell types and progression status using either Wilcoxon rank sum tests or t-tests.

# Test for changes in cell type proportion across progression status with student's t-test
testProp <- colTest(cells, 
                    condition = "group", 
                    feature = "cellType",
                    type = "ttest")

head(testProp)

            mean in group NP mean in group P tval.t   pval adjPval     cluster
CD8_T                 0.0580          0.0330   2.70 0.0088   0.088       CD8_T
endothelial           0.0820          0.0650   2.00 0.0580   0.290 endothelial
squamous              0.5800          0.6300  -1.50 0.1400   0.470    squamous
dendritic             0.0350          0.0290   1.20 0.2600   0.530   dendritic
myeloid               0.0061          0.0044   1.10 0.2800   0.530     myeloid
B                     0.0210          0.0130   0.99 0.3300   0.530           B

Here, we can see that both CD4 T and granulocytes cells appear to be present in different proportions in non-progressors compared to progressors.

Let’s examine one of these clusters using our getProp function from the spicyR package, which conveniently transforms our proportions into a feature matrix of images by cell type, enabling convenient downstream classification or analysis.

prop <- getProp(cells, feature = "cellType")
prop[1:5, 1:5]

             B      CD4_T      CD8_T  dendritic endothelial
A2 0.004477278 0.02529662 0.09805238 0.01880457  0.05842848
A3 0.005432814 0.04183267 0.13998551 0.02173126  0.07424846
A4 0.005383814 0.03977075 0.13042723 0.01128864  0.06304272
A5 0.018041999 0.09316770 0.06980183 0.04761905  0.06773144
A6 0.014440433 0.08561114 0.04228984 0.02836514  0.07684373

Next, let’s visualise how different the proportions are for the most significant cell type (CD4 T cell) across progression statuses using a boxplot.

# Obtain most significant cell type - CD4 T cells
clusterToUse <- rownames(testProp)[1]

prop |>
  select(all_of(clusterToUse)) |>
  tibble::rownames_to_column("imageID") |>
  left_join(clinical, by = "imageID") |>
  ggplot(aes(x = group, y = .data[[clusterToUse]], fill = group)) +
  geom_boxplot()

The boxplot visualisation of CD4 T cell proportion clearly shows that progressors have a lower proportion of CD4 T cells in the tumour core. We can repeat the above for granulocytes as well. In contrast to CD4 T cells, progressors have a higher proportion of granulocytes compared to non-progressors.

# Obtain second most significant cell type - granulocytes
clusterToUse <- rownames(testProp)[2]

prop |>
  select(all_of(clusterToUse)) |>
  tibble::rownames_to_column("imageID") |>
  left_join(clinical, by = "imageID") |>
  ggplot(aes(x = group, y = .data[[clusterToUse]], fill = group)) +
  geom_boxplot()

If you have already clustered and annotated your cells, you may only be interested in our downstream analysis capabilities, looking into identifying localisation (spicyR), cell regions (lisaClust), and cell-cell interactions (SpatioMark & Kontextual). Therefore, for the sake of convenience, we’ve provided capability to directly load in the SpatialExperiment (SPE) object that we’ve generated up to this point, complete with clusters, cell type annotations, and normalised intensities.

load("data/computed_cells.rda")

12.7 spicyR: Test spatial relationships

Our spicyR package offers a range of functions to support the analysis of immunofluorescence, imaging mass cytometry data, and other assays that provide deep phenotyping of individual cells and their spatial distribution. In this example, we use the spicy function to assess changes in spatial relationships between pairwise combinations of cells.

In simple terms, spicyR utilises the L-function to evaluate whether cell types are localized or dispersed. The L-function quantifies “closeness” between points, where higher values indicate increased localisation and lower values suggest dispersion.

Here, we quantify spatial relationships and mildly account for some global tissue structure using sigma = 50. For a more detailed explanation of the key parameters, refer to the Cell Localisation section. Additional information on optimising the parameters can be found in the spicyR paper.

spicyTest <- spicy(cells,
                   condition = "group",
                   cellTypeCol = "cellType",
                   imageIDCol = "imageID",
                   sigma = 50,
                   BPPARAM = BPPARAM)

topPairs(spicyTest, n = 10)

                         intercept coefficient    p.value adj.pvalue
granulocyte__dendritic   -8.544472   13.166206 0.01984412  0.8140202
dendritic__granulocyte   -7.913578   12.986740 0.02296098  0.8140202
epithelial__myeloid       4.599553  -13.632031 0.02840491  0.8140202
myeloid__epithelial       4.593292  -13.017068 0.03256081  0.8140202
dendritic__dendritic     41.164389  -11.645611 0.06060092  0.9366501
epithelial__squamous      2.143769   -3.616736 0.07664431  0.9366501
CD8_T__granulocyte       -9.196658   11.719884 0.08850050  0.9366501
granulocyte__granulocyte 60.308964  -26.776805 0.09868144  0.9366501
CD4_T__dendritic         10.414405    8.173354 0.10228860  0.9366501
dendritic__CD4_T         10.696917    8.124193 0.10273571  0.9366501
                                from          to
granulocyte__dendritic   granulocyte   dendritic
dendritic__granulocyte     dendritic granulocyte
epithelial__myeloid       epithelial     myeloid
myeloid__epithelial          myeloid  epithelial
dendritic__dendritic       dendritic   dendritic
epithelial__squamous      epithelial    squamous
CD8_T__granulocyte             CD8_T granulocyte
granulocyte__granulocyte granulocyte granulocyte
CD4_T__dendritic               CD4_T   dendritic
dendritic__CD4_T           dendritic       CD4_T

The most significant interaction appears to occur between granulocytes and dendritic cells with a p-value of 0.019. We can visualise these relationships using the signifPlot function.

# Visualise which relationships are changing the most.
signifPlot(spicyTest)

We observe that cell type pairs appear to become less attractive (or avoid more) in the progression sample, except for epithelial cells, which appear to become localised. We can also see that granulocytes and myeloid cells are more dispersed in progressors compared to non-progressors.

As an example, we will use the spicyBoxPlot function to investigate whether the squamous tumour cell type localises with the granulocytes and assess whether this localisation influences tumour progression versus non-progression.

spicyBoxPlot(spicyTest, 
             from = "squamous", 
             to = "granulocyte")

Warning: Removed 1 row containing non-finite outside the scale range
(`stat_boxplot()`).

Alternatively, we can look at the most differentially localised relationship between progressors and non-progressors by specifying rank = 1.

spicyBoxPlot(spicyTest, 
             rank = 1)

Warning: Removed 1 row containing non-finite outside the scale range
(`stat_boxplot()`).

Now that we have examined cellular localisation relationships, we will move on to identifying spatial domains, or regions of co-localisation.

12.8 lisaClust: Find cellular neighbourhoods

Our lisaClust package on Bioconductor provides a series of functions to identify and visualise regions of tissue where spatial associations between cell types is similar. This package can be used to provide a high-level summary of cell-type co-localisation in multiplexed imaging data that has been segmented at a single-cell resolution. Here we use the lisaClust function to clusters cells into 4 regions with distinct spatial ordering. By default, these identified regions are stored in the regions column in the colData of our SpatialExperiment object.

# Cluster cells into spatial regions with similar composition.
cells <- lisaClust(
  cells,
  Rs = c(20, 50, 100),
  k = 4,
  sigma = 20,
  cellType = "cellType",
  BPPARAM = BPPARAM)

Warning: The `BPPARAM` argument of `lisaClust()` is deprecated as of lisaClust 1.14.4.
ℹ Please use the `cores` argument instead.
ℹ The deprecated feature was likely used in the lisaClust package.
  Please report the issue at
  <https://github.com/ellispatrick/lisaClust/issues>.

Generating local L-curves.

For more information on how to choose an appropriate value for k , refer to the section on Identifying spatial domains.

12.8.1 Region-cell type enrichment heatmap

We can try to interpret which spatial orderings the regions are quantifying using the regionMap function. This plots the frequency of each cell type in a region relative to what you would expect by chance.

# Visualise the enrichment of each cell type in each region
regionMap(cells, cellType = "cellType", limit = c(0.2, 2))

We can see here that our regions have neatly separated according to biological milieu, with region 1 and region 3 containing our immune cell types, region 4 containing our healthy epithelial and endothelial cells, and region 2 containing our tumour cells.

12.8.2 Visualise regions

We can quickly examine the spatial arrangement of these regions using plotReducedDim on image F3, where we can see the same division of immune, healthy, and tumour tissue that we identified in our regionMap.

cells |> 
  filter(imageID == "F3") |> 
  plotReducedDim("spatialCoords", colour_by = "region")

Although slower, we have also implemented a function called hatchingPlot that overlays region information using a hatching pattern, allowing it to be viewed alongside cell type classifications. The granularity of the region boundaries can be controlled using the nbp argument.

# Use hatching to visualise regions and cell types.
hatchingPlot(
  cells,
  useImages = "F3",
  cellType = "cellType",
  nbp = 300)

Concave windows are temperamental. Try choosing values of window.length > and < 1 if you have problems.

12.8.3 Test for association with progression

Similar to cell type proportions, we can quickly use the colTest function to test for associations between the proportions of cells in each region and progression status by specifying feature = "region".

# Test if the proportion of each region is associated
# with progression status.
testRegion <- colTest(
  cells,
  feature = "region",
  condition = "group",
  type = "ttest")

testRegion

         mean in group NP mean in group P tval.t  pval adjPval  cluster
region_4            0.580            0.64  -1.70 0.094    0.20 region_4
region_1            0.220            0.19   1.70 0.099    0.20 region_1
region_2            0.150            0.12   0.98 0.330    0.44 region_2
region_3            0.058            0.05   0.55 0.590    0.59 region_3

From the results above, it appears that none of the regions above are significantly associated with changes in progression status.

12.9 Statial: Identify changes in cell state

Our Statial package provides a suite of functions (Kontextual) for robust quantification of cell type localisation which are invariant to changes in tissue structure. In addition, we provide a suite of functions (SpatioMark) for uncovering continuous changes in marker expression associated with varying levels of localisation.

12.9.1 SpatioMark: Continuous changes in marker expression associated with varying levels of localisation

The first step in analysing these changes is to calculate the spatial proximity (getDistances) of each cell to every cell type. These values will then be stored in the reducedDims slot of the SingleCellExperiment object under the names distances. SpatioMark also provides functionality to look into proximal cell abundance using the getAbundance function, which is further explored in the Changes in marker expression section.

cells$m.cx <- spatialCoords(cells)[,"x"]
cells$m.cy <- spatialCoords(cells)[,"y"]

cells <- getDistances(cells,
  maxDist = 200,
  nCores = nCores,
  cellType = "cellType",
  spatialCoords = c("m.cx", "m.cy"))

We can then visualise an example image, specified with image = "F3" and a particular marker interaction with cell type localisation. To visualise these changes, we specify two cell types with the from and to parameters, and a marker with the marker parameter (cell-cell-marker interactions). Here, we specify the changes in the marker podoplanin in squamous tumour cells as its localisation to epithelial cells increases or decreases, where we can observe that podoplanin decreases in tumour cells as its distance to the central cluster of epithelial cells increases.

p <- plotStateChanges(
  cells = cells,
  cellType = "cellType",
  spatialCoords = c("m.cx", "m.cy"),
  type = "distances",
  image = "F3",
  from = "squamous",
  to = "epithelial",
  marker = "podoplanin",
  size = 1,
  shape = 19,
  interactive = FALSE,
  plotModelFit = FALSE,
  method = "lm")

# plot image
p$image

# plot the scatter plot
p$scatter

`geom_smooth()` using formula = 'y ~ x'

SpatioMark aims to holistically uncover all such significant relationships by looking at all interactions across all images. The calcStateChanges function provided by Statial can be expanded for this exact purpose - by not specifying cell types, a marker, or an image, calcStateChanges will examine the most significant correlations between distance and marker expression across the entire dataset.

state_dist <- calcStateChanges(
  cells = cells,
  cellType = "cellType",
  type = "distances",
  assay = 2,
  nCores = nCores,
  minCells = 100)

head(state_dist, n = 10)

      imageID primaryCellType otherCellType     marker         coef      tval
83730      D6        squamous             B       PDL2  0.002257419  46.97671
81937      D2        squamous       myeloid      panCK -0.001636384 -40.22898
81970      D2        squamous       myeloid       TIM3 -0.002043811 -36.71967
83624      D6        squamous     dendritic      HLADR -0.001209405 -36.70992
84990      F5        squamous   granulocyte       PDL2  0.001800396  37.14811
78711      B1        squamous         CD8_T      CXCR3 -0.002693811 -38.68014
83701      D6        squamous             B      panCK  0.001874891  35.81925
76947      A2        squamous             B      CXCR3  0.001531085  35.99011
88021      H5        squamous             B      panCK  0.001951588  36.59020
91086      J5        squamous    epithelial podoplanin -0.001581229 -34.44390
               pval           fdr
83730  0.000000e+00  0.000000e+00
81937 1.236575e-294 5.709268e-290
81970 1.520922e-252 4.681398e-248
83624 2.057602e-249 4.749973e-245
84990 4.651527e-249 8.590440e-245
78711 6.097199e-240 9.383589e-236
83701 3.032438e-239 4.000218e-235
76947 1.186479e-236 1.369494e-232
88021 1.408016e-229 1.444625e-225
91086 2.192554e-222 2.024605e-218

Here, we can see that one of the most significant relationships is between myeloid cells and squamous cells (tumour cells) for the marker TIM3 in image D2. The negative coefficient associated with this relationship tells us that as myeloid cells and squamous cells become more localised, TIM3 expression on the squamous cells decreases.

We can confirm this by examining image D2 using the plotStateChanges function.

p <- plotStateChanges(
  cells = cells,
  cellType = "cellType",
  spatialCoords = c("m.cx", "m.cy"),
  type = "distances",
  image = "D2",
  from = "squamous",
  to = "myeloid",
  marker = "TIM3",
  size = 1,
  shape = 19,
  interactive = FALSE,
  plotModelFit = FALSE,
  method = "lm")

# plot image
p$image

# plot the scatter plot
p$scatter

`geom_smooth()` using formula = 'y ~ x'

The results from our SpatioMark outputs can be converted from a data.frame to a matrix, using the prepMatrix function. The choice of extracting either the t-statistic or the coefficient of the linear regression can be specified using the column = "tval" parameter, with the coefficient being the default extracted parameter. We can see that with SpatioMark, we get some features which are significant after adjusting for FDR.

# Preparing outcome vector
outcome <- cells$group[!duplicated(cells$imageID)]
names(outcome) <- cells$imageID[!duplicated(cells$imageID)]

# Preparing features for Statial
distMat <- prepMatrix(state_dist)

distMat <- distMat[names(outcome), ]

# Remove some very small values
distMat <- distMat[, colMeans(abs(distMat) > 0.0001) > .8]

# test for changes in marker expression across outcomes
results <- colTest(distMat, outcome, type = "ttest")
head(results)

                               mean in group NP mean in group P tval.t    pval
squamous__squamous__VISTA              -0.00590        -0.00230   -3.7 0.00066
endothelial__endothelial__CD13         -0.00190        -0.00068   -3.5 0.00110
endothelial__endothelial__DNA1         -0.42000        -0.03800   -3.6 0.00110
endothelial__endothelial__DNA2         -0.76000        -0.07200   -3.5 0.00120
endothelial__dendritic__ICOS           -0.00026         0.00006   -3.3 0.00280
squamous__squamous__CADM1              -0.00520        -0.00250   -3.1 0.00360
                               adjPval                        cluster
squamous__squamous__VISTA        0.072      squamous__squamous__VISTA
endothelial__endothelial__CD13   0.072 endothelial__endothelial__CD13
endothelial__endothelial__DNA1   0.072 endothelial__endothelial__DNA1
endothelial__endothelial__DNA2   0.072 endothelial__endothelial__DNA2
endothelial__dendritic__ICOS     0.130   endothelial__dendritic__ICOS
squamous__squamous__CADM1        0.140      squamous__squamous__CADM1

When we compare changes in marker expression across outcomes, we can see that one of the most significant relationships is squamous__CD4_T__OX40 , or the expression of marker OX40 in squamous cells and their localisation with CD4 T cells. Expression of the marker OX40 in MC is lower in progressors vs non-progressors. OX40 is a co-stimulatory receptor found on activated T cells, primarily CD4+ and CD8+ T cells. It belongs to the tumor necrosis factor receptor (TNFR) superfamily and plays a crucial role in regulating the immune response.

Understanding the spatial relationships of cells within tissue is crucial for interpreting their roles within tissues. However, without defining appropriate contexts, these relationships can be misinterpreted due to confounding factors such as tissue stricture and the choice of region to image. Kontextual provides a framework to generate contexts to measure spatial relationships between cells.

12.9.2 Kontextual: Robust quantification of cell type localisation which is invariant to changes in tissue structure

Kontextual is a method to evaluate the localisation relationship between two cell types in an image. Kontextual builds on the L-function by contextualising the relationship between two cell types in reference to the typical spatial behaviour of a \(3^{rd}\) cell type/population. By taking this approach, Kontextual is invariant to changes in the window of the image as well as tissue structures which may be present.

The definitions of cell types and cell states are somewhat ambiguous, cell types imply well defined groups of cells that serve different roles from one another, on the other hand cell states imply that cells are a dynamic entity which cannot be discretised, and thus exist in a continuum. For the purposes of using Kontextual we treat cell states as identified clusters of cells, where larger clusters represent a “parent” cell population, and finer sub-clusters representing a “child” cell population. For example a CD4 T cell may be considered a child to a larger parent population of Immune cells. Kontextual thus aims to see how a child population of cells deviate from the spatial behaviour of their parent population, and how that influences the localisation between the child cell state and another cell state.

12.9.2.1 Cell type hierarchy

A key input for Kontextual is an annotation of cell type hierarchies. We will need these to organise all the cells present into cell state populations or clusters, e.g. all the different B cell types are put in a vector called bcells.

Here, we use the treekoR package on Bioconductor to define these hierarchies in a data driven way.

exprs <- t(assay(cells, "norm")) |> data.frame()
fergusonTree <- treekoR::getClusterTree(exprs,
                                        cells$cellType,
                                        hierarchy_method = "hopach",
                                        scale_exprs = FALSE)

parent1 <- c("CD8_T", "CD4_T", "dendritic")
parent2 <- c("B", "granulocyte")
parent3 <- c(parent1, parent2)

parent4 <- c("myeloid", "epithelial", "squamous")
parent5 <- c(parent4, "endothelial")

all = c(parent1, parent2, parent3, parent4, parent5)

treeDf = Statial::parentCombinations(all, parent1, parent2, parent3, parent4, parent5)

fergusonTree$clust_tree |> plot()

Kontextual accepts a SingleCellExperiment object, a single image, or list of images from a SingleCellExperiment object, which gets passed into the cells argument. Here, we’ve specified Kontextual to perform calculations on all pairwise combinations for every cluster using the parentCombinations function to create the treeDf dataframe which we’ve specified in the parentDf parameter. The argument r will specify the radius which the cell relationship will be evaluated on. Kontextual supports parallel processing, the number of cores can be specified using the cores argument. Kontextual can take a single value or multiple values for each argument and will test all combinations of the arguments specified.

We can calculate all pairwise relationships across all images for a single radius.

kontext <- Kontextual(
  cells = cells,
  cellType = "cellType",
  spatialCoords = c("m.cx", "m.cy"),
  parentDf = treeDf,
  r = 50,
  cores = nCores)

Again, we can use the same colTest to quickly test for associations between the Kontextual values and survival probability using either Wilcoxon rank sum tests or t-tests. Similar to SpatioMark, we can specify using either the original L-function by specifying column = "original" in our prepMatrix function.

# Converting Kontextual result into data matrix
kontextMat <- prepMatrix(kontext)

# Replace NAs with 0
kontextMat[is.na(kontextMat)] <- 0

results <- spicyR::colTest(kontextMat, outcome, type = "ttest")

head(results)

                                  mean in group NP mean in group P tval.t
CD4_T__endothelial__parent5                    7.5           14.00   -3.0
squamous__granulocyte__parent3                -4.2            1.70   -2.5
endothelial__CD4_T__parent3                    3.2            7.50   -2.4
granulocyte__squamous__parent5                -1.6            0.71   -2.0
dendritic__CD4_T__parent3                     -6.9           -0.72   -2.0
granulocyte__endothelial__parent5             11.0           -0.76    2.0
                                    pval adjPval
CD4_T__endothelial__parent5       0.0057    0.65
squamous__granulocyte__parent3    0.0190    0.65
endothelial__CD4_T__parent3       0.0250    0.65
granulocyte__squamous__parent5    0.0530    0.65
dendritic__CD4_T__parent3         0.0580    0.65
granulocyte__endothelial__parent5 0.0590    0.65
                                                            cluster
CD4_T__endothelial__parent5             CD4_T__endothelial__parent5
squamous__granulocyte__parent3       squamous__granulocyte__parent3
endothelial__CD4_T__parent3             endothelial__CD4_T__parent3
granulocyte__squamous__parent5       granulocyte__squamous__parent5
dendritic__CD4_T__parent3                 dendritic__CD4_T__parent3
granulocyte__endothelial__parent5 granulocyte__endothelial__parent5

One of the most significant relationships is squamous__granulocyte__parent3 . When we compare the mean values for this relationship between progressors and non-progressors, we can see that squamous cells and granulocytes show increased co-localisation in progressors vs non-progressors with respect to the parent 3 broader population. When we refer to the cell type hierarchy we defined, we can see that parent 3 refers to the parent population of all immune cells.

Now that we have identified and calculated several features that can be used to measure spatial associations across cell types and regions of localisation, we will see how these metrics can be used to predict clinical outcomes for patients.

12.10 ClassifyR: Classification

Our ClassifyR package on Bioconductor provides a convenient framework for evaluating classification in R. We provide functionality to easily include four key modelling stages: data transformation, feature selection, classifier training and prediction, into a cross-validation loop. Here we use the crossValidate function to perform 100 repeats of 5-fold cross-validation to evaluate the performance of a random forest applied to five quantifications of our IMC data:

Cell type proportions (FuseSOM)
Cell type localisation from spicyR using the L-function
Region proportions from lisaClust
Cell type localisation reference to a parent cell type from Kontextual
Cell changes in response to proximal changes from SpatioMark

For a more thorough explanation of ClassifyR, refer to our Classification section.

# Create list to store data.frames
data <- list()

# Add proportions of each cell type in each image
data[["proportions"]] <- getProp(cells, "cellType")

# Add pair-wise associations
spicyMat <- bind(spicyTest)
spicyMat[is.na(spicyMat)] <- 0
spicyMat <- spicyMat |>
  select(!condition) |>
  tibble::column_to_rownames("imageID")

data[["spicyR"]] <- spicyMat

# Add proportions of each region in each image
# to the list of dataframes.
data[["lisaClust"]] <- getProp(cells, "region")


# Add SpatioMark features
data[["SpatioMark"]] <- distMat

# Add Kontextual features
data[["Kontextual"]] <- kontextMat

# Set seed
set.seed(51773)

# Perform cross-validation of a random forest model
# with 100 repeats of 5-fold cross-validation.
cv <- crossValidate(
  measurements = data,
  outcome = outcome,
  classifier = "randomForest",
  nFolds = 5,
  nRepeats = 50,
  nCores = nCores)

12.10.1 Visualise cross-validated prediction performance

Here we use the performancePlot function to assess the AUC from each repeat of the 5-fold cross-validation.

# Calculate AUC for each cross-validation repeat and plot.
performancePlot(
  cv,
  metric = "AUC",
  characteristicsList = list(x = "Assay Name"),
  orderingList = list("Assay Name" = c("proportions", "spicyR", "lisaClust", "Kontextual", "SpatioMark")))

Both cell type proportions and SpatioMark appear to capture information that is predictive of patient outcome.

We can also visualise which features were good at classifying which patients using the sampleMetricMap function from ClassifyR.

samplesMetricMap(cv)

TableGrob (2 x 1) "arrange": 2 grobs
  z     cells    name                 grob
1 1 (2-2,1-1) arrange       gtable[layout]
2 2 (1-1,1-1) arrange text[GRID.text.2130]

Overall, it appears that we were more easily able to identify non-progressors compared to progressors, and patients G6 and H2 within the progressor cohort were particularly difficult to classify.

Here we have used a pipeline of our spatial analysis R packages to demonstrate an easy way to segment, cluster, normalise, quantify and classify high dimensional in situ cytometry data all within R.

12.11 sessionInfo

sessionInfo()

R version 4.5.0 (2025-04-11)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.4.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Sydney
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] SpatialDatasets_1.6.3           ExperimentHub_2.16.0           
 [3] AnnotationHub_3.16.0            BiocFileCache_2.16.0           
 [5] dbplyr_2.5.0                    SpatialExperiment_1.18.1       
 [7] ttservice_0.4.1                 tidyr_1.3.1                    
 [9] tidySingleCellExperiment_1.18.1 Statial_1.10.0                 
[11] lisaClust_1.16.0                ClassifyR_3.12.0               
[13] survival_3.8-3                  BiocParallel_1.42.0            
[15] MultiAssayExperiment_1.34.0     spicyR_1.20.1                  
[17] scater_1.36.0                   scuttle_1.18.0                 
[19] ggpubr_0.6.0                    FuseSOM_1.10.0                 
[21] simpleSeg_1.9.2                 ggplot2_3.5.2                  
[23] dplyr_1.1.4                     cytomapper_1.20.0              
[25] SingleCellExperiment_1.30.1     SummarizedExperiment_1.38.1    
[27] Biobase_2.68.0                  GenomicRanges_1.60.0           
[29] GenomeInfoDb_1.44.0             IRanges_2.42.0                 
[31] S4Vectors_0.46.0                BiocGenerics_0.54.0            
[33] generics_0.1.4                  MatrixGenerics_1.20.0          
[35] matrixStats_1.5.0               EBImage_4.50.0                 

loaded via a namespace (and not attached):
  [1] dichromat_2.0-0.1           tiff_0.1-12                
  [3] dcanr_1.24.0                FCPS_1.3.4                 
  [5] nnet_7.3-20                 goftest_1.2-3              
  [7] Biostrings_2.76.0           HDF5Array_1.36.0           
  [9] TH.data_1.1-3               vctrs_0.6.5                
 [11] spatstat.random_3.4-1       shape_1.4.6.1              
 [13] digest_0.6.37               png_0.1-8                  
 [15] proxy_0.4-27                BiocBaseUtils_1.10.0       
 [17] ggrepel_0.9.6               deldir_2.0-4               
 [19] permute_0.9-7               magick_2.8.5               
 [21] MASS_7.3-65                 reshape2_1.4.4             
 [23] httpuv_1.6.16               foreach_1.5.2              
 [25] withr_3.0.2                 ggfun_0.1.8                
 [27] psych_2.5.3                 xfun_0.52                  
 [29] ellipsis_0.3.2              doRNG_1.8.6.2              
 [31] memoise_2.0.1               ggbeeswarm_0.7.2           
 [33] RProtoBufLib_2.20.0         diptest_0.77-1             
 [35] princurve_2.1.6             systemfonts_1.2.3          
 [37] tidytree_0.4.6              zoo_1.8-14                 
 [39] GlobalOptions_0.1.2         V8_6.0.3                   
 [41] DEoptimR_1.1-3-1            Formula_1.2-5              
 [43] prabclus_2.3-4              KEGGREST_1.48.0            
 [45] promises_1.3.2              httr_1.4.7                 
 [47] rstatix_0.7.2               rhdf5filters_1.20.0        
 [49] fpc_2.2-13                  rhdf5_2.52.0               
 [51] rstudioapi_0.17.1           UCSC.utils_1.4.0           
 [53] concaveman_1.1.0            curl_6.2.3                 
 [55] ScaledMatrix_1.16.0         h5mread_1.0.1              
 [57] analogue_0.18.0             polyclip_1.10-7            
 [59] GenomeInfoDbData_1.2.14     SparseArray_1.8.0          
 [61] fftwtools_0.9-11            doParallel_1.0.17          
 [63] xtable_1.8-4                stringr_1.5.1              
 [65] evaluate_1.0.3              S4Arrays_1.8.0             
 [67] irlba_2.3.5.1               colorspace_2.1-1           
 [69] filelock_1.0.3              spatstat.data_3.1-6        
 [71] flexmix_2.3-20              magrittr_2.0.3             
 [73] ggtree_3.16.0               later_1.4.2                
 [75] viridis_0.6.5               modeltools_0.2-24          
 [77] lattice_0.22-6              genefilter_1.90.0          
 [79] spatstat.geom_3.4-1         robustbase_0.99-4-1        
 [81] XML_3.99-0.18               cowplot_1.1.3              
 [83] RcppAnnoy_0.0.22            ggupset_0.4.1              
 [85] class_7.3-23                svgPanZoom_0.3.4           
 [87] pillar_1.10.2               nlme_3.1-168               
 [89] iterators_1.0.14            compiler_4.5.0             
 [91] beachmat_2.24.0             stringi_1.8.7              
 [93] tensor_1.5                  minqa_1.2.8                
 [95] plyr_1.8.9                  treekoR_1.16.0             
 [97] crayon_1.5.3                abind_1.4-8                
 [99] gridGraphics_0.5-1          locfit_1.5-9.12            
[101] sp_2.2-0                    bit_4.6.0                  
[103] terra_1.8-50                sandwich_3.1-1             
[105] multcomp_1.4-28             fastcluster_1.3.0          
[107] codetools_0.2-20            textshaping_1.0.1          
[109] BiocSingular_1.24.0         coop_0.6-3                 
[111] GetoptLong_1.0.5            plotly_4.10.4              
[113] mime_0.13                   splines_4.5.0              
[115] circlize_0.4.16             Rcpp_1.0.14                
[117] profileModel_0.6.1          knitr_1.50                 
[119] blob_1.2.4                  clue_0.3-66                
[121] BiocVersion_3.21.1          lme4_1.1-37                
[123] fs_1.6.6                    nnls_1.6                   
[125] Rdpack_2.6.4                ggsignif_0.6.4             
[127] ggplotify_0.1.2             tibble_3.2.1               
[129] Matrix_1.7-3                scam_1.2-19                
[131] statmod_1.5.0               svglite_2.2.1              
[133] tweenr_2.0.3                pkgconfig_2.0.3            
[135] pheatmap_1.0.12             tools_4.5.0                
[137] cachem_1.1.0                rbibutils_2.3              
[139] RSQLite_2.3.11              viridisLite_0.4.2          
[141] DBI_1.2.3                   numDeriv_2016.8-1.1        
[143] fastmap_1.2.0               rmarkdown_2.29             
[145] scales_1.4.0                grid_4.5.0                 
[147] shinydashboard_0.7.3        broom_1.0.8                
[149] patchwork_1.3.0             brglm_0.7.2                
[151] BiocManager_1.30.25         carData_3.0-5              
[153] farver_2.1.2                reformulas_0.4.1           
[155] mgcv_1.9-3                  yaml_2.3.10                
[157] ggthemes_5.1.0              cli_3.6.5                  
[159] purrr_1.0.4                 hopach_2.68.0              
[161] lifecycle_1.0.4             uwot_0.2.3                 
[163] mvtnorm_1.3-3               kernlab_0.9-33             
[165] backports_1.5.0             annotate_1.86.0            
[167] cytolib_2.20.0              gtable_0.3.6               
[169] rjson_0.2.23                parallel_4.5.0             
[171] ape_5.8-1                   limma_3.64.1               
[173] edgeR_4.6.2                 jsonlite_2.0.0             
[175] bitops_1.0-9                bit64_4.6.0-1              
[177] Rtsne_0.17                  FlowSOM_2.16.0             
[179] yulab.utils_0.2.0           vegan_2.6-10               
[181] spatstat.utils_3.1-4        BiocNeighbors_2.2.0        
[183] ranger_0.17.0               flowCore_2.20.0            
[185] bdsmatrix_1.3-7             spatstat.univar_3.1-3      
[187] lazyeval_0.2.2              ConsensusClusterPlus_1.72.0
[189] shiny_1.10.0                htmltools_0.5.8.1          
[191] diffcyt_1.28.0              rappdirs_0.3.3             
[193] glue_1.8.0                  XVector_0.48.0             
[195] RCurl_1.98-1.17             treeio_1.32.0              
[197] mclust_6.1.1                mnormt_2.1.1               
[199] coxme_2.2-22                jpeg_0.1-11                
[201] gridExtra_2.3               boot_1.3-31                
[203] igraph_2.1.4                R6_2.6.1                   
[205] ggiraph_0.8.13              labeling_0.4.3             
[207] ggh4x_0.3.0                 cluster_2.1.8.1            
[209] rngtools_1.5.2              Rhdf5lib_1.30.0            
[211] aplot_0.2.5                 nloptr_2.2.1               
[213] DelayedArray_0.34.1         tidyselect_1.2.1           
[215] vipor_0.4.7                 ggforce_0.4.2              
[217] raster_3.6-32               car_3.1-3                  
[219] AnnotationDbi_1.70.0        rsvd_1.0.5                 
[221] DataVisualizations_1.3.3    data.table_1.17.4          
[223] ComplexHeatmap_2.24.0       htmlwidgets_1.6.4          
[225] RColorBrewer_1.1-3          rlang_1.1.6                
[227] spatstat.sparse_3.1-0       spatstat.explore_3.4-3     
[229] colorRamps_2.3.4            lmerTest_3.1-3             
[231] uuid_1.2-1                  ggnewscale_0.5.1           
[233] fansi_1.0.6                 beeswarm_0.4.0