5 Quality Control
With cell segmentation complete, we can move on to performing quality control checks. In many spatial imaging protocols, there tends to be a degree of variability in the intensity of each image. For example, in one image, the CD3 stain may be too strong, but in another image the CD3 staining may be particularly weak. This variability is often inevitable and can be hard to correct for during the imaging process. Hence, it is important that we identify when such variance occurs and correct it. The simpleSeg package includes additional functions for detecting batch effects and normalising expression data to minimise their impact.
# set parameters
set.seed(51773)
use_mc <- TRUE
if (use_mc) {
nCores <- max(parallel::detectCores()/2, 1)
} else {
nCores <- 1
}
BPPARAM <- simpleSeg:::generateBPParam(nCores)
theme_set(theme_classic())
5.1 simpleSeg: Do my images have a batch effect?
First, let’s load in the images we previously segmented out in the last section. The SpatialDatasets
package conveniently provides the segmented out images for the Ferguson 2022 dataset.
# load in segmented data
fergusonSPE <- SpatialDatasets::spe_Ferguson_2022()
Next, we can check if the marker intensities of each cell require some form of transformation or normalisation. The reason we do this is two-fold:
- The intensities of images are often highly skewed, preventing any meaningful downstream analysis.
- The intensities across different images are often different, meaning that what is considered “positive” can be different across images.
By transforming and normalising the data, we aim to reduce these two effects. Below, we extract the marker intensities from the counts
assay and take a closer look at the CD3 marker, which should be expressed in the majority of T cells.
# plot densities of CD3 for each image
fergusonSPE |>
join_features(features = rownames(fergusonSPE), shape = "wide", assay = "counts") |>
ggplot(aes(x = CD3, colour = imageID)) +
geom_density() +
theme(legend.position = "none")
What we’re looking for
- Do the CD3+ and CD3- peaks clearly separate out in the density plot? To ensure that downstream clustering goes smoothly, we want our cell type specific markers to show 2 distinct peaks representing our CD3+ and CD3- cells. If we see 3 or more peaks where we don’t expect, this might be an indicator that further normalisation is required.
- Are our CD3+ and CD3- peaks consistent across our images? We want to make sure that our density plots for CD3 are largely the same across images so that a CD3+ cell in one image is equivalent to a CD3+ cell in another image.
Here, we can see that the intensities are very clearly skewed, and it is difficult to distinguish a CD3- cell from a CD3+ cell. Further, we can clearly see some image-level batch effects, where across images, the intensity peaks differ drastically.
Another method of visualising batch effects is using a dimensionality reduction technique and visualising how the images separate out on a 2D plot. If no batch effect is expected, we should see the images largely overlap with each other.
set.seed(51773)
# specify a subset of informative markers for UMAP and clustering
ct_markers <- c("podoplanin", "CD13", "CD31",
"panCK", "CD3", "CD4", "CD8a",
"CD20", "CD68", "CD16", "CD14",
"HLADR", "CD66a")
# perform dimension reduction using UMAP
fergusonSPE <- scater::runUMAP(
fergusonSPE,
subset_row = ct_markers,
exprs_values = "counts"
)
# select a subset of images to plot
someImages <- unique(fergusonSPE$imageID)[c(1, 5, 10, 20, 30, 40)]
# UMAP by imageID
scater::plotReducedDim(
fergusonSPE[, fergusonSPE$imageID %in% someImages],
dimred = "UMAP",
colour_by = "imageID"
)
Time for this code chunk to run with 5.5 cores: 47.35 seconds
The UMAP also indicates that some level of batch effect exists in our dataset.
We can use the normalizeCells
function from the simpleSeg package to correct for image-level batch effects. We specify the following parameters:
-
transformation
is an optional argument which specifies the function to be applied to the data. We do not apply an arcsinh transformation here, as we already apply a square root transform in thesimpleSeg
function. -
method = c("trim99", "mean", PC1")
is an optional argument which specifies the normalisation method(s) to be performed. A comprehensive table of methods is provided below. -
assayIn = "counts"
is a required argument which specifies the name of the assay that contains our intensity data. In our context, this is calledcounts
.
# leave out the nuclei markers from our normalisation process
useMarkers <- rownames(fergusonSPE)[!rownames(fergusonSPE) %in% c("DNA1", "DNA2", "HH3")]
# transform and normalise the marker expression of each cell type
fergusonSPE <- normalizeCells(fergusonSPE,
markers = useMarkers,
transformation = NULL,
method = c("trim99", "mean", "PC1"),
assayIn = "counts",
cores = nCores)
This modified data is then stored in the norm
assay by default, but can be changed using the assayOut
parameter.
5.1.0.1 method
Parameters
Method | Description |
---|---|
“mean” | Divides the marker cellular marker intensities by their mean. |
“minMax” | Subtracts the minimum value and scales markers between 0 and 1. |
“trim99” | Sets the highest 1% of values to the value of the 99th percentile.` |
“PC1” | Removes the 1st principal component) can be performed with one call of the function, in the order specified by the user. |
We can then plot the same density curve for the CD3 marker using the normalised data.
# plot densities of CD3 for each image
fergusonSPE |>
join_features(features = rownames(fergusonSPE), shape = "wide", assay = "norm") |>
ggplot(aes(x = CD3, colour = imageID)) +
geom_density() +
theme(legend.position = "none")
Questions revisited
- Do the CD3+ and CD3- peaks clearly separate out in the density plot? If not, we can try optimising the transformation if the distribution looks heavily skewed.
- Are our CD3+ and CD3- peaks consistent across our images? We can try to be more stringent in our normalisation, such as by removing the 1st PC (
method = c(..., "PC1")
) or scaling the values for all images between 0 and 1 (method = c(..., "minMax")
).
Here, we can see that the normalised data appears more bimodal, and we can clearly observe a CD3+ peak at 5.00, and a CD3- peak at around 3.00. Image-level batch effects also appear to have been mitigated.
We can also visualise the effect of normalisation on the UMAP, which shows that the images now overlap with each other to a much greater extent.
set.seed(51773)
# perform dimension reduction using UMAP
fergusonSPE <- scater::runUMAP(
fergusonSPE,
subset_row = ct_markers,
exprs_values = "norm",
name = "normUMAP"
)
someImages <- unique(fergusonSPE$imageID)[c(1, 5, 10, 20, 30, 40)]
# UMAP by imageID
scater::plotReducedDim(
fergusonSPE[, fergusonSPE$imageID %in% someImages],
dimred = "normUMAP",
colour_by = "imageID"
)
Time for this code chunk to run with 5.5 cores: 46.42 seconds
Now that we have completed quality control checks and normalised the expression data to address variability and batch effects, we can proceed to the next step: clustering and cell annotation.
5.2 sessionInfo
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.4.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Sydney
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] SpatialDatasets_1.4.0 SpatialExperiment_1.16.0
[3] ExperimentHub_2.14.0 AnnotationHub_3.14.0
[5] BiocFileCache_2.14.0 dbplyr_2.5.0
[7] scater_1.34.0 scuttle_1.16.0
[9] simpleSeg_1.8.0 ggplot2_3.5.1
[11] ttservice_0.4.1 tidyr_1.3.1
[13] dplyr_1.1.4 tidySingleCellExperiment_1.16.0
[15] SingleCellExperiment_1.28.1 SummarizedExperiment_1.36.0
[17] Biobase_2.66.0 GenomicRanges_1.58.0
[19] GenomeInfoDb_1.42.0 IRanges_2.40.0
[21] S4Vectors_0.44.0 BiocGenerics_0.52.0
[23] MatrixGenerics_1.18.0 matrixStats_1.4.1
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.17.1 jsonlite_1.8.9
[4] magrittr_2.0.3 spatstat.utils_3.1-1 ggbeeswarm_0.7.2
[7] magick_2.8.5 farver_2.1.2 rmarkdown_2.29
[10] zlibbioc_1.52.0 vctrs_0.6.5 memoise_2.0.1
[13] RCurl_1.98-1.16 terra_1.7-78 svgPanZoom_0.3.4
[16] htmltools_0.5.8.1 S4Arrays_1.6.0 curl_6.0.1
[19] BiocNeighbors_2.0.0 raster_3.6-30 Rhdf5lib_1.28.0
[22] SparseArray_1.6.0 rhdf5_2.50.0 htmlwidgets_1.6.4
[25] cachem_1.1.0 plotly_4.10.4 mime_0.12
[28] lifecycle_1.0.4 pkgconfig_2.0.3 rsvd_1.0.5
[31] Matrix_1.7-1 R6_2.5.1 fastmap_1.2.0
[34] GenomeInfoDbData_1.2.13 shiny_1.9.1 digest_0.6.37
[37] colorspace_2.1-1 AnnotationDbi_1.68.0 irlba_2.3.5.1
[40] RSQLite_2.3.8 beachmat_2.22.0 labeling_0.4.3
[43] filelock_1.0.3 fansi_1.0.6 nnls_1.6
[46] httr_1.4.7 polyclip_1.10-7 abind_1.4-8
[49] compiler_4.4.1 bit64_4.5.2 withr_3.0.2
[52] tiff_0.1-12 BiocParallel_1.40.0 DBI_1.2.3
[55] viridis_0.6.5 HDF5Array_1.34.0 cytomapper_1.18.0
[58] rappdirs_0.3.3 DelayedArray_0.32.0 rjson_0.2.23
[61] tools_4.4.1 vipor_0.4.7 beeswarm_0.4.0
[64] httpuv_1.6.15 glue_1.8.0 EBImage_4.48.0
[67] rhdf5filters_1.18.0 promises_1.3.2 grid_4.4.1
[70] generics_0.1.3 gtable_0.3.6 spatstat.data_3.1-4
[73] data.table_1.16.2 BiocSingular_1.22.0 ScaledMatrix_1.14.0
[76] sp_2.1-4 utf8_1.2.4 XVector_0.46.0
[79] RcppAnnoy_0.0.22 spatstat.geom_3.3-4 BiocVersion_3.20.0
[82] ggrepel_0.9.6 pillar_1.9.0 stringr_1.5.1
[85] later_1.4.1 lattice_0.22-6 bit_4.5.0
[88] deldir_2.0-4 tidyselect_1.2.1 locfit_1.5-9.10
[91] Biostrings_2.74.0 knitr_1.49 gridExtra_2.3
[94] svglite_2.1.3 xfun_0.49 shinydashboard_0.7.2
[97] stringi_1.8.4 UCSC.utils_1.2.0 fftwtools_0.9-11
[100] yaml_2.3.10 lazyeval_0.2.2 evaluate_1.0.1
[103] codetools_0.2-20 tibble_3.2.1 BiocManager_1.30.25
[106] cli_3.6.3 uwot_0.2.2 xtable_1.8-4
[109] systemfonts_1.1.0 munsell_0.5.1 Rcpp_1.0.13-1
[112] png_0.1-8 spatstat.univar_3.1-1 parallel_4.4.1
[115] ellipsis_0.3.2 blob_1.2.4 jpeg_0.1-10
[118] bitops_1.0-9 viridisLite_0.4.2 scales_1.3.0
[121] purrr_1.0.2 crayon_1.5.3 rlang_1.1.4
[124] cowplot_1.1.3 KEGGREST_1.46.0