Software
If you would like to contribute to SCDNEY, check out the guidelines.
How do I process my data to get cells?
MoleculeExperiment
MoleculeExperiment contains functions to create and work with objects from the new MoleculeExperiment class. We introduce this class for analysing molecule-based spatial transcriptomics data (e.g., Xenium by 10X, CosMx SMI by Nanostring, and MERSCOPE by Vizgen). This allows researchers to analyse spatial transcriptomics data at the molecule level and to use a standardised data format across vendors.
Peters Couto B, Robertson N, Patrick E, Ghazanfar S (2023). MoleculeExperiment: Prioritising a molecule-level storage of Spatial Transcriptomics Data. doi:10.18129/B9.bioc.MoleculeExperiment, R package version 1.2.2.
scMerge
Like all gene expression data, single-cell data suffers from batch effects and other unwanted variation that make accurate biological interpretation difficult. The scMerge method leverages factor analysis, stably expressed genes (SEGs) and (pseudo-)replicates to remove unwanted variation and merge multiple single-cell datasets. This package contains all the necessary functions in the scMerge pipeline, including the identification of SEGs, replication-identification methods, and merging of single-cell data.
Lin Y, Ghazanfar S, Wang K, Gagnon-Bartsch J, Lo K, Su X, Han Z, Ormerod J, Speed T, Yang P, Yang J (2019). “scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets.” Proceedings of the National Academy of Sciences. doi:10.1073/pnas.1820006116.
simpleSeg
Image segmentation is the process of identifying the borders of individual objects (in this case cells) within an image. This allows features of cells, such as marker expression and morphology, to be extracted, stored and analysed. simpleSeg provides functionality for user-friendly, watershed-based segmentation of multiplexed cellular images in R, based on the intensity of user-specified protein marker channels. simpleSeg can also be used for the normalization of single-cell data obtained from multiple images.
Canete N, Nicholls A, Patrick E (2023). simpleSeg: A package to perform simple cell segmentation. R package version 1.4.1.
What types of cells are in my data?
Cepo
Defining the identity of a cell is fundamental to understanding how heterogeneous cells respond to various environmental signals and perturbations. We present Cepo, a new method to explore cell identities from single-cell RNA-sequencing data using differential stability as a new metric to define cell identity genes. Cepo computes cell-type-specific gene statistics pertaining to differentially stable gene expression.
Kim H, Wang K (2023). Cepo: Cepo for the identification of differentially stable genes. doi:10.18129/B9.bioc.Cepo, R package version 1.8.0.
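To make the idea of "differential stability" concrete, here is a minimal Python sketch. It is an illustration only and not the Cepo statistic or its R interface: the `stability` score (few zeros, low coefficient of variation) and the function names are assumptions made for this example. A gene scores highly for a target cell type when it is expressed more stably there than in the other cell types.

```python
from statistics import mean, pstdev

def stability(values):
    """Hypothetical stability score: high when there are few zeros and a low
    coefficient of variation. This is an assumed stand-in, not Cepo's metric."""
    zero_frac = sum(v == 0 for v in values) / len(values)
    m = mean(values)
    cv = pstdev(values) / m if m > 0 else float("inf")
    return (1 - zero_frac) / (1 + cv)

def differential_stability(expr, cell_types, target):
    """expr maps gene -> list of expression values, one per cell.
    cell_types lists the cell-type label of each cell, in the same order.
    Returns gene -> mean stability difference (target minus each other type)."""
    labels = sorted(set(cell_types))
    others = [t for t in labels if t != target]
    scores = {}
    for gene, values in expr.items():
        by_type = {t: [v for v, c in zip(values, cell_types) if c == t]
                   for t in labels}
        s_target = stability(by_type[target])
        scores[gene] = mean([s_target - stability(by_type[t]) for t in others])
    return scores

# Toy data: "marker" is stable in T cells only; "house" is stable everywhere.
cell_types = ["T"] * 4 + ["B"] * 4
expr = {
    "marker": [5, 5, 5, 5, 0, 0, 9, 0],
    "house": [3, 3, 3, 3, 3, 3, 3, 3],
}
scores = differential_stability(expr, cell_types, target="T")
```

In this toy example the `marker` gene receives a higher score than `house` for T cells, because `house` is equally stable in both cell types and so carries no identity information.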
FuseSOM
A correlation-based multiview self-organizing map for the characterization of cell types in highly multiplexed in situ imaging cytometry assays (FuseSOM) is a tool for unsupervised clustering. FuseSOM is robust and achieves high accuracy by combining a Self Organizing Map architecture and a Multiview integration of correlation-based metrics. This allows FuseSOM to cluster highly multiplexed in situ imaging cytometry assays.
scClassify
scClassify is a multiscale classification framework for single-cell RNA-seq data based on ensemble learning and cell type hierarchies, enabling sample size estimation required for accurate cell type classification and joint classification of cells using multiple references.
Lin Y, Cao Y, Kim HJ, Salim A, Speed TP, Lin DM, Yang P, Yang JYH (2020). “scClassify: sample size estimation and multiscale classification of cells using single and multiple reference.” Molecular systems biology, 16(6), e9389.
scReClassify
A post hoc cell type classification tool that fine-tunes cell type annotations generated by any cell type classification procedure, using the semi-supervised learning technique AdaSampling. The current version of scReClassify supports Support Vector Machine and Random Forest as base classifiers.
Kim T, Lo K, Geddes TA, Kim HJ, Yang JYH, Yang P (2019). “scReClassify: post hoc cell type classification of single-cell RNA-seq data.” BMC Genomics, 20(913).
How do I find changes in my data?
ClassifyR
The software formalises a framework for classification and survival model evaluation in R. There are four stages: data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user by creating an interface to the framework.
Strbenac D, Mann GJ, Ormerod JT, Yang JYH (2015). “ClassifyR: an R package for performance assessment of classification with applications to transcriptomics.” Bioinformatics, 31(11), 1851-1853.
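The four stages and the surrounding cross-validation driver loop can be sketched as follows. This is a minimal Python illustration under stated assumptions, not ClassifyR's R interface: the function names, the centring transform, the two-class differential-mean selection, and the nearest-centroid classifier are all choices made for this example. A seeded random generator is what makes the repeated k-fold scheme reproducible.

```python
import random
from statistics import mean

def column_means(rows):
    return [mean(col) for col in zip(*rows)]

def run_fold(train_x, train_y, test_x, n_features):
    # Stage 1: data transformation (centre every feature on the training means).
    mu = column_means(train_x)
    def centre(row):
        return [v - m for v, m in zip(row, mu)]
    train_x = [centre(r) for r in train_x]
    test_x = [centre(r) for r in test_x]
    # Stage 2: feature selection by differential mean (assumes exactly two classes).
    classes = sorted(set(train_y))
    centroids = {c: column_means([r for r, y in zip(train_x, train_y) if y == c])
                 for c in classes}
    gap = [abs(centroids[classes[0]][j] - centroids[classes[1]][j])
           for j in range(len(mu))]
    keep = sorted(range(len(mu)), key=lambda j: -gap[j])[:n_features]
    # Stage 3: model training (nearest centroid on the selected features).
    model = {c: [centroids[c][j] for j in keep] for c in classes}
    # Stage 4: prediction on the held-out samples.
    def predict(row):
        x = [row[j] for j in keep]
        return min(classes,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))
    return [predict(r) for r in test_x]

def cross_validate(x, y, folds=5, repeats=2, n_features=1, seed=1):
    """Driver loop: repeated k-fold cross-validation, reproducible via the seed."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        order = list(range(len(x)))
        rng.shuffle(order)
        for f in range(folds):
            test_idx = order[f::folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            preds = run_fold([x[i] for i in train_idx], [y[i] for i in train_idx],
                             [x[i] for i in test_idx], n_features)
            correct = sum(p == y[i] for p, i in zip(preds, test_idx))
            accuracies.append(correct / len(test_idx))
    return mean(accuracies)

# Toy data: feature 0 separates the classes, feature 1 is uninformative.
x = [[0.1 * i, i % 3] for i in range(10)] + [[5 + 0.1 * i, i % 3] for i in range(10)]
y = ["a"] * 10 + ["b"] * 10
accuracy = cross_validate(x, y)
```

Note that the transformation and feature selection are fitted inside each fold on the training samples only, so the held-out samples do not influence the model they are used to evaluate.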
treekoR
treekoR is a novel framework that aims to utilise the hierarchical nature of single cell cytometry data to find robust and interpretable associations between cell subsets and patient clinical end points. These associations aim to recapitulate the nested proportions prevalent in workflows involving manual gating, which are often overlooked in workflows that use automatic clustering to identify cell populations. We developed treekoR to: derive a hierarchical tree structure of cell clusters; quantify each cell type as a proportion relative to all cells in a sample (%total) and as a proportion relative to a parent population (%parent); perform significance testing using the calculated proportions; and provide an interactive html visualisation to help highlight key results.
Chan A (2023). treekoR: Cytometry Cluster Hierarchy and Cellular-to-phenotype Associations. doi:10.18129/B9.bioc.treekoR, R package version 1.10.0.
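The difference between %total and %parent can be shown with a small Python sketch. It is an illustration only and does not use the treekoR R interface: the function name, the `parent_of` dictionary, and the toy hierarchy are assumptions for this example, and the tree is given by hand rather than derived from the data.

```python
from collections import Counter

def cluster_proportions(cell_clusters, parent_of):
    """Return {node: (pct_total, pct_parent)} for one sample.

    cell_clusters: leaf cluster label of each cell.
    parent_of: maps every node to its parent; the root maps to None.
    """
    counts = Counter()
    for leaf in cell_clusters:
        node = leaf
        while node is not None:  # a cell counts towards all of its ancestors
            counts[node] += 1
            node = parent_of.get(node)
    total = len(cell_clusters)
    proportions = {}
    for node, n in counts.items():
        parent = parent_of.get(node)
        pct_total = 100.0 * n / total
        pct_parent = 100.0 * n / counts[parent] if parent is not None else 100.0
        proportions[node] = (pct_total, pct_parent)
    return proportions

# Hypothetical hierarchy: all cells -> {T, B}; T -> {CD4, CD8}.
parent_of = {"all": None, "T": "all", "B": "all", "CD4": "T", "CD8": "T"}
cells = ["CD4"] * 30 + ["CD8"] * 10 + ["B"] * 60
props = cluster_proportions(cells, parent_of)
```

Here CD4 cells are 30% of all cells but 75% of their parent T cell population. A shift within the T cell compartment can therefore be visible in %parent even when %total changes little.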
scFeatures
scFeatures is a tool that generates multi-view representations of single-cell and spatial data through the construction of a total of 17 feature types. These features can then be used for a variety of analyses using other software in Bioconductor.
Cao Y, Lin Y, Patrick E, Yang P, Yang JYH (2022). “scFeatures: multi-view representations of single-cell and spatial data for disease outcome prediction.” Bioinformatics, 38(20), 4745-4753. ISSN 1367-4803, doi:10.1093/bioinformatics/btac590.
scHOT
Single cell Higher Order Testing (scHOT) is an R package that facilitates testing changes in higher order structure of gene expression along either a developmental trajectory or across space. scHOT is general and modular in nature: it can be run in multiple data contexts, such as along a continuous trajectory, between discrete groups, and over spatial orientations, and it can accommodate any higher order measurement, such as variability or correlation. scHOT meaningfully adds to first order effect testing, such as differential expression, and provides a framework for interrogating higher order interactions from single cell data.
Ghazanfar S, Lin Y (2023). scHOT: single-cell higher order testing. doi:10.18129/B9.bioc.scHOT, R package version 1.14.0.
spicyR
The spicyR package provides a framework for performing inference on changes in spatial relationships between pairs of cell types for cell-resolution spatial omics technologies. spicyR consists of three primary steps: (i) summarizing the degree of spatial localization between pairs of cell types for each image; (ii) modelling the variability in localization summary statistics as a function of cell counts; and (iii) testing for changes in spatial localizations associated with a response variable.
Canete N, Iyengar S, Ormerod J, Baharlou H, Harman A, Patrick E (2022). “spicyR: spatial analysis of in situ cytometry data in R.” Bioinformatics, 38(11), 3099–3105. doi:10.1093/bioinformatics/btac268.
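Step (i) above, summarizing localization between a pair of cell types in one image, can be illustrated with a cross-type K/L function. The Python sketch below is an assumption-laden illustration and not the spicyR implementation: the function name is hypothetical, no edge correction is applied, and the two cell types are assumed to be distinct. Values above zero suggest the two types sit closer together than expected under spatial randomness; values below zero suggest avoidance.

```python
import math

def cross_l_minus_r(cells, type_a, type_b, r, area):
    """Cross-type L(r) - r for one image, without edge correction.

    cells: list of (x, y, cell_type) tuples.
    area: area of the imaged region, in the same units as x and y squared.
    """
    a = [(x, y) for x, y, t in cells if t == type_a]
    b = [(x, y) for x, y, t in cells if t == type_b]
    if not a or not b:
        return float("nan")  # the pair is not observed in this image
    close_pairs = sum(1 for ax, ay in a for bx, by in b
                      if math.hypot(ax - bx, ay - by) <= r)
    k = area * close_pairs / (len(a) * len(b))  # cross-type K function estimate
    return math.sqrt(k / math.pi) - r           # L(r) - r

# Two toy 10 x 10 images: cell types A and B intermingled versus far apart.
mixed = [(0, 0, "A"), (1, 0, "A"), (0, 1, "B"), (1, 1, "B")]
separated = [(0, 0, "A"), (1, 0, "A"), (9, 9, "B"), (8, 9, "B")]
together = cross_l_minus_r(mixed, "A", "B", r=1.5, area=100)
apart = cross_l_minus_r(separated, "A", "B", r=1.5, area=100)
```

One such summary per image and cell-type pair is the kind of quantity that steps (ii) and (iii) would then model and test against a response variable.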
Statial
Statial is a suite of functions for identifying changes in cell state. Statial provides robust quantification of cell type localisation that is invariant to changes in tissue structure. In addition, Statial uncovers changes in marker expression associated with varying levels of localisation. These features can be used to explore how the structure and function of different cell types may be altered by the agents that surround them.
Ameen F, Iyengar S, Ghazanfar S, Patrick E (2023). Statial: A package to identify changes in cell state relative to spatial associations. R package version 1.4.5.
lisaClust
lisaClust provides a series of functions to identify and visualise regions of tissue where spatial associations between cell types are similar. This package can be used to provide a high-level summary of cell-type colocalization in multiplexed imaging data that has been segmented at a single-cell resolution.
Patrick E, Canete N (2023). lisaClust: Clustering of Local Indicators of Spatial Association. doi:10.18129/B9.bioc.lisaClust, R package version 1.10.1.
SimBench
The SimBench package is designed for benchmarking simulation methods on two key aspects: accuracy of data property estimation and ability to retain biological signals. It contains functions for comparing, on these two aspects, the simulated data produced by a simulation method with the real data that was used as its reference input.
Cao Y, Yang P, Yang J (2021). SimBench: benchmarking simulation methods. R package version 0.99.1.
Can I combine different types of omics data?
CiteFuse
The CiteFuse package implements a suite of methods and tools for CITE-seq data from pre-processing to integrative analytics, including doublet detection, network-based modality integration, cell type clustering, differential RNA and protein expression analysis, ADT evaluation, ligand-receptor interaction analysis, and interactive web-based visualisation of the analyses.
Kim HJ, Lin Y, Geddes TA, Yang P, Yang JYH (2020). “CiteFuse enables multi-modal analysis of CITE-seq data.” Bioinformatics, 36(14), 4137–4143.
StabMap
StabMap performs single cell mosaic data integration by first building a mosaic data topology and then, for each reference dataset, traversing the topology to project and predict data onto a common embedding. Mosaic data should be provided in a list format, with all relevant features included in the data matrices within each list object. The output of stabMap is a joint low-dimensional embedding that takes into account all available relevant features. Expression imputation can also be performed using the StabMap embedding and any of the original data matrices for given reference and query cell lists.
Ghazanfar S (2023). StabMap: Stabilised mosaic single cell data integration using unshared features. R package version 0.1.8.