Introduction to Computational Challenge

As part of Oz Single Cell 2019 conference, we are hosting a single cell data analysis challenge. The challenge aims to foster the development of ideas and approaches to better utilise data arising from single cell sequencing technology, specifically using one or more of three single cell data sets. We encourage innovation from any aspect of single cell analytics from the extraction of biological insight through to complex analytical approaches. Possible entries to the challenge include:

Challenge Entrants

Teams can consist of any number of people, but we encourage teams to have a mix of students, postdocs, and more senior academic staff. To register your team, please fill in our form here, note that this is not binding and you can change your mind later. On 15th July, we will ask each team entering the challenge to provide us with a 200-word description of their entry.

The entries close at 5pm on Friday 19th July.

Judging criteria

Participants are invited to present their analysis during the Oz Single Cell conference. Each team will be allocated a 3 min rapid-fire talk and a panel will judge the entries. The entries will be judged according to the novelty and potential impact on future single cell work. This could be in any of the following areas:

In addition, the clarity and coherence of the presentation will be taken into account.


Dataset 1 - Swarbrick Lab CITE-seq data


Dataset from the Swarbrick lab provides the participants with an opportunity to explore CITE-seq, a powerful new technology that allows simultaneous characterisation of both RNA and surface proteins. Hackers are invited to build stories around the biological (metastatic lymph node) or technical aspects (normalisation, visualisation, integration) of the data.


Cell-surface proteins were historically crucial in describing cellular states, and were often a more reliable indicator of cellular activity and function than RNA expression. The Swarbrick Lab from Garvan Institute has implemented CITE-seq technology (Stoeckius et al. 2017) that couples protein and RNA information for individual cells, allowing a more powerful characterization of cellular phenotypes than transcriptome measurements alone.


In the experiment below we applied multimodal analysis to phenotype immune cells in a human metastatic lymph node. After spiking in mouse cells (3T3) and staining with 96 CITE-seq antibodies (with IgG1, IgG2a and IgG2b as isotype controls), the samples were analysed on Chromium 10x. All of the above are available in raw format or normalized and scaled Seurat objects (v3).

  1. Subset and visualize different cell populations using integrated RNA and CITE-seq data to determine differences between cellular populations
  2. Improve the default statistical framework for the QC of CITE antibodies using mouse spike-ins and IgG controls
  3. Compare CITE profiles with publicly available cell-type signatures (i.e.
  • csv file: link
  • raw_gene_bc_matrices_h5.h5 link
  • filtered_gene_bc_matrices_h5.h5 link
  • Swarbrick_metastatic_lymphnode.CITE.csv
  1. SEURAT OBJECT (.RData file: link)
  • Swarbrick_metastatic_lymphnode.seurat.Rdata
  • Seurat (v3) object with raw and processed data human and mouse data, with CITE-seq counts (RNA and ADT (antibody-derived tags)) included.
  • data processed with Seurat_3.0.0.9100, normalized with SCTransform (stored in SCT assay)
  1. ANTIBODY ANNOTATION (csv file: link)
  • antibody_isotypes.csv
  1. SingleCellExperiment Object (.RData file: link )

Dataset 2 - Benchmarking datasets

Dataset 2.1 - Ding et al. Benchmark Datasets


A multitude of single-cell RNA sequencing methods have been developed in recent years, with dramatic advances in scale and power, and enabling major discoveries and large scale cell mapping efforts. However, these methods have not been systematically and comprehensively benchmarked. Here, we directly compare seven methods for single cell and/or single nucleus profiling from three types of samples – cell lines, peripheral blood mononuclear cells and brain tissue – generating 36 libraries in six separate experiments in a single center. To analyze these datasets, we developed and applied scumi, a flexible computational pipeline that can be used for any scRNA-seq method. We evaluated the methods for both basic performance and for their ability to recover known biological information in the samples. Our study will help guide experiments with the methods in this study as well as serve as a benchmark for future studies and for computational algorithm development.

Experimental design for Ding et al. benchmark data. (Ding et al, 2019)

Experimental design for Ding et al. benchmark data. (Ding et al, 2019)


PBMC data

There are two experiments (PBMC1 and PBMC2), each with x7 scRNA-seq methods (Sci-Seq, Smart-seq2, CEL-seq2, 10x_v2,Seq-well, Drop-seq, inDrops): 31021 cells:

Cortex data

Cortex data 13783 cells, the mouse Cortex single nucleus RNA-seq (snRNA-seq) experiments. These consist of two experiments (Cortex1 and Cortex2), each with 4 snRNA-seq methods (10X, DroNc-Seq, Sci-RNA-Seq, and Smart-seq2):

Cell mixture

cells mixture (HEK293 and NIH3T3) experiments: 28580 cells. These consist of two experiments (Mixture1 and Mixture2), each with x7 scRNA-seq methods. This dataset allowed us to better understand the ability of different methods to detect gene expression (detecting doublets):


Ding, J. et al., 2019. Systematic comparative analysis of single cell RNA-sequencing methods. bioRxiv, p.632216. Available at:

Dataset 2.2 - CellBench: single cell RNA-seq benchmarking


CellBench uses three human lung adenocarcinoma cell lines HCC827, H1975 and H2228, which were cultured separately, and then processed in three different ways. Firstly, single cells from each cell line were mixed in equal proportions, with libraries generated using three different protocols: CEL-seq2, Drop-seq (with Dolomite equipment) and 10X Chromium. Secondly, the single cells were sorted from the three cell lines into 384-well plates, with an equal number of cells per well in different combinations (generally 9-cells, but with some 90-cell population controls). Thirdly, RNA was extracted in bulk for each cell line and the RNA was mixed in 7 different proportions and diluted to single cell equivalent amounts ranging from 3.75pg to 30pg and processed using CEL-seq 2 and SORT-seq. ERCC spike-in controls were present in samples processed using the 2 plate-based technologies (CEL-seq2 and SORT-seq).

Experimental design for cell bench data. (Tian et al, 2019)

Experimental design for cell bench data. (Tian et al, 2019)


Raw data from this series of experiments is available under GEO accession number GSE118767. The processed count data obtained from scPipe and stored in R objects that use the SingleCellExperiment class can be downloaded from The website contains instructions for getting the count data and metadata (including annotations) for each dataset. All data is post sample quality control, without gene filtering.


Tian, L. et al., 2019. Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Nature Methods, 16(6), pp.479–487. Available at:



If you have any questions about this challenge, feel free to email us at: