Spatial Linear and Mixed-Effects Modelling with spicyR
Nicolas Canete
Westmead Institute for Medical Research, University of Sydney, Australianicolas.canete@sydney.edu.au
Ellis Patrick
Westmead Institute for Medical Research, University of Sydney, AustraliaSchool of Mathematics and Statistics, University of Sydney, AustraliaAlex Qin
Westmead Institute for Medical Research, University of Sydney, AustraliaSchool of Mathematics and Statistics, University of Sydney, AustraliaShreya Rao
Westmead Institute for Medical Research, University of Sydney, AustraliaSchool of Mathematics and Statistics, University of Sydney, Australia22 November 2024
Source:vignettes/spicyR.Rmd
spicyR.Rmd
Abstract
Perform linear and mixed-effects models to assess and visualise changes in cell localisation across disease conditions.
Installation
if (!require("BiocManager")) {
install.packages("BiocManager")
}
BiocManager::install("spicyR")
Overview
This guide provides step-by-step instructions on how to apply a linear model to multiple segmented and labelled images to assess how the localisation of different cell types changes across different disease conditions.
Example data
We use the Keren et al. (2018) breast cancer dataset to compare the spatial distribution of immune cells in individuals with different levels of tumour infiltration (cold and compartmentalised).
The data is stored as a SpatialExperiment
object and
contains single-cell spatial data from 41 images.
kerenSPE <- SpatialDatasets::spe_Keren_2018()
The cell types in this dataset includes 11 immune cell types (double negative CD3 T cells, CD4 T cells, B cells, monocytes, macrophages, CD8 T cells, neutrophils, natural killer cells, dendritic cells, regulatory T cells), 2 structural cell types (endothelial, mesenchymal), 2 tumour cell types (keratin+ tumour, tumour) and one unidentified category.
Linear modelling
To investigate changes in localisation between two different cell types, we measure the level of localisation between two cell types by modelling with the L-function. The L-function is a variance-stabilised K-function given by the equation
with defined as
where summarises the degree of co-localisation of cell type with cell type , and are the number of cells of type and , is the image area, is the distance between two cells and is an edge correcting factor.
Specifically, the mean difference between the experimental function and the theoretical function is used as a measure for the level of localisation, defined as
where is the sum is taken over a discrete range of between and . Differences of the statistic between two conditions is modelled using a weighted linear model.
Test for change in localisation for a specific pair of cells
Firstly, we can test whether one cell type tends to be more localised
with another cell type in one condition compared to the other. This can
be done using the spicy()
function, where we specify the
condition
parameter.
In this example, we want to see whether or not neutrophils
(to
) tend to be found around CD8 T cells
(from
) in compartmentalised tumours compared to cold
tumours. Given that there are 3 conditions, we can specify the desired
conditions by setting the order of our condition
factor.
spicy
will choose the first level of the factor as the base
condition and the second level as the comparison condition.
spicy
will also naturally coerce the condition
column into a factor if it is not already a factor. The column
containing cell type annotations can be specified using the
cellType
argument. By default, spicy
uses the
column named cellType
in the SpatialExperiment
object.
spicyTestPair <- spicy(
kerenSPE,
condition = "tumour_type",
from = "CD8_T_cell",
to = "Neutrophils"
)
topPairs(spicyTestPair)
#> intercept coefficient p.value adj.pvalue
#> CD8_T_cell__Neutrophils -109.081 112.0185 2.166646e-05 2.166646e-05
#> from to
#> CD8_T_cell__Neutrophils CD8_T_cell Neutrophils
We obtain a spicy
object which details the results of
the modelling performed. The topPairs()
function can be
used to obtain the associated coefficients and p-value.
As the coefficient
in spicyTestPair
is
positive, we find that neutrophils are significantly more likely to be
found near CD8 T cells in the compartmentalised tumours group compared
to the cold tumour group.
Test for change in localisation for all pairwise cell combinations
We can perform what we did above for all pairwise combinations of
cell types by excluding the from
and to
parameters in spicy()
.
spicyTest <- spicy(
kerenSPE,
condition = "tumour_type"
)
topPairs(spicyTest)
#> intercept coefficient p.value adj.pvalue
#> Macrophages__dn_T_CD3 56.446064 -50.08474 1.080273e-07 3.035568e-05
#> dn_T_CD3__Macrophages 54.987151 -48.38664 2.194018e-07 3.082595e-05
#> Macrophages__DC_or_Mono 73.239404 -59.90361 5.224660e-06 4.893765e-04
#> DC_or_Mono__Macrophages 71.777087 -58.46833 7.431172e-06 5.220399e-04
#> dn_T_CD3__dn_T_CD3 -63.786032 100.61010 2.878804e-05 1.208706e-03
#> Neutrophils__dn_T_CD3 -63.141840 69.64356 2.891872e-05 1.208706e-03
#> dn_T_CD3__Neutrophils -63.133725 70.15508 3.011012e-05 1.208706e-03
#> DC__Macrophages 96.893239 -92.55112 1.801300e-04 5.758129e-03
#> Macrophages__DC 96.896215 -93.25194 1.844241e-04 5.758129e-03
#> CD4_T_cell__Keratin_Tumour -4.845037 -22.14995 2.834659e-04 7.409016e-03
#> from to
#> Macrophages__dn_T_CD3 Macrophages dn_T_CD3
#> dn_T_CD3__Macrophages dn_T_CD3 Macrophages
#> Macrophages__DC_or_Mono Macrophages DC_or_Mono
#> DC_or_Mono__Macrophages DC_or_Mono Macrophages
#> dn_T_CD3__dn_T_CD3 dn_T_CD3 dn_T_CD3
#> Neutrophils__dn_T_CD3 Neutrophils dn_T_CD3
#> dn_T_CD3__Neutrophils dn_T_CD3 Neutrophils
#> DC__Macrophages DC Macrophages
#> Macrophages__DC Macrophages DC
#> CD4_T_cell__Keratin_Tumour CD4_T_cell Keratin_Tumour
Again, we obtain a spicy
object which outlines the
result of the linear models performed for each pairwise combination of
cell types.
We can also examine the L-function metrics of individual images by
using the convenient bind()
function on our
spicyTest
results object.
bind(spicyTest)[1:5, 1:5]
#> imageID condition Keratin_Tumour__Keratin_Tumour
#> 1 1 mixed -2.300602
#> 2 2 mixed -1.989699
#> 3 3 compartmentalised 11.373530
#> 4 4 compartmentalised 33.931133
#> 5 5 compartmentalised 17.922818
#> dn_T_CD3__Keratin_Tumour B_cell__Keratin_Tumour
#> 1 -5.298543 -20.827279
#> 2 -16.020022 3.025815
#> 3 -21.857447 -24.962913
#> 4 -36.438476 -40.470221
#> 5 -20.816783 -38.138076
The results can be represented as a bubble plot using the
signifPlot()
function.
signifPlot(
spicyTest,
breaks = c(-3, 3, 1),
marksToPlot = c("Macrophages", "DC_or_Mono", "dn_T_CD3", "Neutrophils",
"CD8_T_cell", "Keratin_Tumour")
)
Here, we can observe that the most significant relationships occur between macrophages and double negative CD3 T cells, suggesting that the two cell types are far more dispersed in compartmentalised tumours compared to cold tumours.
To examine a specific cell type-cell type relationship in more
detail, we can use spicyBoxplot()
and specify either
from = "Macrophages"
and to = "dn_T_CD3"
or
rank = 1
.
spicyBoxPlot(results = spicyTest,
# from = "Macrophages",
# to = "dn_T_CD3"
rank = 1)
Linear modelling for custom metrics
spicyR
can also be applied to custom distance or
abundance metrics. A kNN interactions graph can be generated with the
function buildSpatialGraph
from the imcRtools
package. This generates a colPairs
object inside of the
SpatialExperiment
object.
spicyR
provides the function convPairs
for
converting a colPairs
object into an abundance matrix by
calculating the average number of nearby cells types for every cell type
for a given k
. For example, if there exists on average 5
neutrophils for every macrophage in image 1, the column
Neutrophil__Macrophage
would have a value of 5 for image
1.
kerenSPE <- imcRtools::buildSpatialGraph(kerenSPE,
img_id = "imageID",
type = "knn", k = 20,
coords = c("x", "y"))
pairAbundances <- convPairs(kerenSPE,
colPair = "knn_interaction_graph")
head(pairAbundances["B_cell__B_cell"])
#> B_cell__B_cell
#> 1 12.7349608
#> 10 0.2777778
#> 11 0.0000000
#> 12 1.3333333
#> 13 1.2200957
#> 14 0.0000000
The custom distance or abundance metrics can then be included in the
analysis with the alternateResult
parameter. The
Statial
package contains other custom distance metrics
which can be used with spicy
.
spicyTestColPairs <- spicy(
kerenSPE,
condition = "tumour_type",
alternateResult = pairAbundances,
weights = FALSE
)
topPairs(spicyTestColPairs)
#> intercept coefficient p.value adj.pvalue
#> CD8_T_cell__Neutrophils 0.833333333 -0.7592968 0.002645466 0.3291833
#> B_cell__Tumour 0.001937984 0.2602822 0.004872664 0.3291833
#> Other_Immune__NK 0.012698413 0.2612881 0.005673068 0.3291833
#> Unidentified__CD8_T_cell 0.106626794 0.6387339 0.005906526 0.3291833
#> dn_T_CD3__NK 0.004242424 0.2148797 0.006317829 0.3291833
#> CD4_T_cell__Neutrophils 0.036213602 0.2947696 0.007902670 0.3291833
#> Tregs__CD4_T_cell 0.128876212 0.5726201 0.010207087 0.3291833
#> Endothelial__DC 0.008771930 0.3008523 0.011189533 0.3291833
#> Tumour__Neutrophils 0.021638939 0.2529045 0.011388850 0.3291833
#> Mesenchymal__Neutrophils 0.004504505 0.2494301 0.012761315 0.3291833
#> from to
#> CD8_T_cell__Neutrophils CD8_T_cell Neutrophils
#> B_cell__Tumour B_cell Tumour
#> Other_Immune__NK Other_Immune NK
#> Unidentified__CD8_T_cell Unidentified CD8_T_cell
#> dn_T_CD3__NK dn_T_CD3 NK
#> CD4_T_cell__Neutrophils CD4_T_cell Neutrophils
#> Tregs__CD4_T_cell Tregs CD4_T_cell
#> Endothelial__DC Endothelial DC
#> Tumour__Neutrophils Tumour Neutrophils
#> Mesenchymal__Neutrophils Mesenchymal Neutrophils
signifPlot(
spicyTestColPairs,
breaks = c(-3, 3, 1),
marksToPlot = c("Macrophages", "dn_T_CD3", "CD4_T_cell",
"B_cell", "DC_or_Mono", "Neutrophils", "CD8_T_cell")
)
Performing survival analysis
spicy
can also be used to perform survival analysis to
asses whether changes in co-localisation between cell types are
associated with survival probability. spicy
requires the
SingleCellExperiment
object being used to contain a column
called survival
as a Surv
object.
kerenSPE$event = 1 - kerenSPE$Censored
kerenSPE$survival = Surv(kerenSPE$`Survival_days_capped*`, kerenSPE$event)
We can then perform survival analysis using the spicy
function by specifying condition = "survival"
. We can then
access the corresponding coefficients and p-values by accessing the
survivalResults
slot in the spicy
results
object.
# Running survival analysis
spicySurvival = spicy(kerenSPE,
condition = "survival")
# top 10 significant pairs
head(spicySurvival$survivalResults, 10)
#> # A tibble: 10 × 4
#> test coef se.coef p.value
#> <chr> <dbl> <dbl> <dbl>
#> 1 Other_Immune__Tregs 0.0236 0.00866 0.00000893
#> 2 CD4_T_cell__Tregs 0.0177 0.00685 0.0000124
#> 3 Tregs__Other_Immune 0.0237 0.00873 0.0000126
#> 4 Tregs__CD4_T_cell 0.0171 0.00676 0.0000285
#> 5 CD8_T_cell__CD8_T_cell 0.00605 0.00272 0.000332
#> 6 Tumour__CD8_T_cell -0.0305 0.0114 0.000617
#> 7 CD8_T_cell__Tumour -0.0305 0.0116 0.000721
#> 8 CD4_T_cell__dn_T_CD3 0.00845 0.00353 0.000794
#> 9 dn_T_CD3__CD4_T_cell 0.00840 0.00353 0.000937
#> 10 DC__Other_Immune -0.0289 0.0123 0.00103
Accounting for tissue inhomogeneity
The spicy
function can also account for tissue
inhomogeneity to avoid false positives or negatives. This can be done by
setting the sigma =
parameter within the spicy function. By
default, sigma
is set to NULL
, and
spicy
assumes a homogeneous tissue structure.
For example, when we examine the L-function for
Keratin_Tumour__Neutrophils
when sigma = NULL
and Rs = 100
, the value is positive, indicating attraction
between the two cell types.
# filter SPE object to obtain image 24 data
kerenSubset = kerenSPE[, colData(kerenSPE)$imageID == "24"]
pairwiseAssoc = getPairwise(kerenSubset,
sigma = NULL,
Rs = 100) |>
as.data.frame()
pairwiseAssoc[["Keratin_Tumour__Neutrophils"]]
#> [1] 10.88892
When we specify sigma = 20
and re-calculate the
L-function, it indicates that there is no relationship between
Keratin_Tumour
and Neutrophils
, i.e., there is
no major attraction or dispersion, as it now takes into account tissue
inhomogeneity.
pairwiseAssoc = getPairwise(kerenSubset,
sigma = 20,
Rs = 100) |>
as.data.frame()
pairwiseAssoc[["Keratin_Tumour__Neutrophils"]]
#> [1] 0.9024836
# obtain colData for image 24
cData = colData(kerenSPE) |> as.data.frame() |>
dplyr::filter(imageID == "24")
# obtain cells present in image 24
coords = spatialCoords(kerenSPE) |> as.data.frame()
coords$cellID = rownames(coords)
coords = coords |> dplyr::filter(cellID %in% cData$CellID)
cData$X = coords$x
cData$Y = coords$y
cData = cData |>
dplyr::mutate(cellTypeNew = ifelse(cellType %in% c("Keratin_Tumour", "Neutrophils"),
cellType, "Other"))
pal = setNames(c("#d6b11c", "#850f07"),
c("Keratin_Tumour", "Neutrophils"))
ggplot() +
stat_density_2d(data = cData, aes(x = X, y = Y, fill = after_stat(density)),
geom = "raster",
contour = FALSE) +
geom_point(data = cData |> filter(cellType != "Other"),
aes(x = X, y = Y, colour = cellTypeNew), size = 1) +
scale_color_manual(values = pal) +
scale_fill_distiller(palette = "Blues", direction = 1) +
theme_classic() +
labs(title = "image ID: 24")
Plotting image 24 shows that the supposed co-localisation occurs due to the dense cluster of cells near the top of the image.
Mixed effects modelling
spicyR
supports mixed effects modelling when multiple
images are obtained for each subject. In this case, subject
is treated as a random effect and condition
is treated as a
fixed effect. To perform mixed effects modelling, we can specify the
subject
parameter in the spicy
function.
spicyMixedTest <- spicy(
diabetesData,
condition = "stage",
subject = "case"
)
References
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] survival_3.7-0 dplyr_1.1.4
#> [3] imcRtools_1.12.0 SpatialDatasets_1.4.0
#> [5] ExperimentHub_2.14.0 AnnotationHub_3.14.0
#> [7] BiocFileCache_2.14.0 dbplyr_2.5.0
#> [9] SpatialExperiment_1.16.0 SingleCellExperiment_1.28.1
#> [11] SummarizedExperiment_1.36.0 Biobase_2.66.0
#> [13] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
#> [15] IRanges_2.40.0 S4Vectors_0.44.0
#> [17] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
#> [19] matrixStats_1.4.1 ggplot2_3.5.1
#> [21] spicyR_1.19.3 BiocStyle_2.34.0
#>
#> loaded via a namespace (and not attached):
#> [1] fs_1.6.5 spatstat.sparse_3.1-0
#> [3] bitops_1.0-9 sf_1.0-19
#> [5] EBImage_4.48.0 httr_1.4.7
#> [7] RColorBrewer_1.1-3 numDeriv_2016.8-1.1
#> [9] tools_4.4.2 backports_1.5.0
#> [11] utf8_1.2.4 R6_2.5.1
#> [13] DT_0.33 HDF5Array_1.34.0
#> [15] mgcv_1.9-1 rhdf5filters_1.18.0
#> [17] withr_3.0.2 sp_2.1-4
#> [19] gridExtra_2.3 coxme_2.2-22
#> [21] ClassifyR_3.10.0 cli_3.6.3
#> [23] textshaping_0.4.0 spatstat.explore_3.3-3
#> [25] labeling_0.4.3 sass_0.4.9
#> [27] nnls_1.6 spatstat.data_3.1-4
#> [29] readr_2.1.5 proxy_0.4-27
#> [31] pkgdown_2.1.1 systemfonts_1.1.0
#> [33] ggupset_0.4.0 svglite_2.1.3
#> [35] RSQLite_2.3.8 generics_0.1.3
#> [37] vroom_1.6.5 spatstat.random_3.3-2
#> [39] car_3.1-3 scam_1.2-17
#> [41] Matrix_1.7-1 ggbeeswarm_0.7.2
#> [43] fansi_1.0.6 abind_1.4-8
#> [45] terra_1.7-83 lifecycle_1.0.4
#> [47] yaml_2.3.10 carData_3.0-5
#> [49] rhdf5_2.50.0 SparseArray_1.6.0
#> [51] grid_4.4.2 blob_1.2.4
#> [53] promises_1.3.0 crayon_1.5.3
#> [55] bdsmatrix_1.3-7 shinydashboard_0.7.2
#> [57] lattice_0.22-6 beachmat_2.22.0
#> [59] KEGGREST_1.46.0 magick_2.8.5
#> [61] cytomapper_1.18.0 pillar_1.9.0
#> [63] knitr_1.49 RTriangle_1.6-0.14
#> [65] rjson_0.2.23 boot_1.3-31
#> [67] codetools_0.2-20 glue_1.8.0
#> [69] spatstat.univar_3.1-1 data.table_1.16.2
#> [71] MultiAssayExperiment_1.32.0 vctrs_0.6.5
#> [73] png_0.1-8 gtable_0.3.6
#> [75] cachem_1.1.0 xfun_0.49
#> [77] S4Arrays_1.6.0 mime_0.12
#> [79] tidygraph_1.3.1 pheatmap_1.0.12
#> [81] units_0.8-5 nlme_3.1-166
#> [83] bit64_4.5.2 filelock_1.0.3
#> [85] bslib_0.8.0 svgPanZoom_0.3.4
#> [87] KernSmooth_2.23-24 vipor_0.4.7
#> [89] colorspace_2.1-1 DBI_1.2.3
#> [91] raster_3.6-30 tidyselect_1.2.1
#> [93] bit_4.5.0 compiler_4.4.2
#> [95] curl_6.0.1 BiocNeighbors_2.0.0
#> [97] desc_1.4.3 DelayedArray_0.32.0
#> [99] bookdown_0.41 scales_1.3.0
#> [101] classInt_0.4-10 distances_0.1.11
#> [103] rappdirs_0.3.3 tiff_0.1-12
#> [105] stringr_1.5.1 digest_0.6.37
#> [107] goftest_1.2-3 fftwtools_0.9-11
#> [109] spatstat.utils_3.1-1 minqa_1.2.8
#> [111] rmarkdown_2.29 XVector_0.46.0
#> [113] htmltools_0.5.8.1 pkgconfig_2.0.3
#> [115] jpeg_0.1-10 lme4_1.1-35.5
#> [117] fastmap_1.2.0 rlang_1.1.4
#> [119] htmlwidgets_1.6.4 ggthemes_5.1.0
#> [121] UCSC.utils_1.2.0 shiny_1.9.1
#> [123] ggh4x_0.2.8 farver_2.1.2
#> [125] jquerylib_0.1.4 jsonlite_1.8.9
#> [127] BiocParallel_1.40.0 RCurl_1.98-1.16
#> [129] magrittr_2.0.3 scuttle_1.16.0
#> [131] Formula_1.2-5 GenomeInfoDbData_1.2.13
#> [133] Rhdf5lib_1.28.0 munsell_0.5.1
#> [135] Rcpp_1.0.13-1 ggnewscale_0.5.0
#> [137] viridis_0.6.5 stringi_1.8.4
#> [139] ggraph_2.2.1 zlibbioc_1.52.0
#> [141] MASS_7.3-61 plyr_1.8.9
#> [143] parallel_4.4.2 ggrepel_0.9.6
#> [145] deldir_2.0-4 Biostrings_2.74.0
#> [147] graphlayouts_1.2.1 splines_4.4.2
#> [149] tensor_1.5 hms_1.1.3
#> [151] locfit_1.5-9.10 igraph_2.1.1
#> [153] ggpubr_0.6.0 spatstat.geom_3.3-4
#> [155] ggsignif_0.6.4 reshape2_1.4.4
#> [157] BiocVersion_3.20.0 evaluate_1.0.1
#> [159] BiocManager_1.30.25 tzdb_0.4.0
#> [161] nloptr_2.1.1 tweenr_2.0.3
#> [163] httpuv_1.6.15 tidyr_1.3.1
#> [165] purrr_1.0.2 polyclip_1.10-7
#> [167] ggforce_0.4.2 broom_1.0.7
#> [169] xtable_1.8-4 e1071_1.7-16
#> [171] rstatix_0.7.2 later_1.3.2
#> [173] class_7.3-22 viridisLite_0.4.2
#> [175] ragg_1.3.3 tibble_3.2.1
#> [177] lmerTest_3.1-3 memoise_2.0.1
#> [179] beeswarm_0.4.0 AnnotationDbi_1.68.0
#> [181] concaveman_1.1.0