🚀 Installation guide

Pre-requisites

Python >= 3.8
R >= 4.0

Note

We recommend creating a separate environment such as Mamba to avoid package conflicts.

R dependencies

Before installing Hydra, please make sure you install the following packages:

mamba install -c conda-forge -c bioconda \
  bioconductor-hdf5array \
  bioconductor-singlecellexperiment \
  bioconductor-rhdf5 \
  r-seurat \
  r-glue \
  r-reticulate \
  r-matrix \
  r-ggplot2 \
  r-rlang \
  r-ggridges \
  r-anndata \
  bioconductor-zellkonverter

Installing Hydra

Install Hydra via pip:

pip3 install hydra-tools

Verifying installation

To check the Hydra installation, please run:

hydra --help

You should see an output like this:

Thank you for using Hydra 😄, an interpretable deep generative tool for single-cell omics. Please refer to the full documentation available at https://sydneybiox.github.io/Hydra for detailed usage instructions. If you encounter any issues running the tool - Please open an issue on Github, and we will get back to you as soon as possible!!


📍 NOTE 📍: You need to run feature selection (`fs`) on the train datatset before annotating the cell types in the query dataset. If you have already run feature selection on the train & want to annotate (`annotation`) a different related query dataset, please process the data (`processdata`) first and then provide the path to the directory containing this processed data.

usage: Hydra [-h] [--seed SEED] [--train TRAIN] [--test TEST] [--celltypecol CELLTYPECOL] [--modality {rna,adt,atac}] [--base_dir DIR] [--gene GENE] [--ctofinterest CTOFINTEREST]
             [--predictions PREDICTIONS] [--ctpredictions CTPREDICTIONS] [--processdata_batch_size PROCESSDATA_BATCH_SIZE] [--batch_size BATCH_SIZE] [--attr_batch_size ATTR_BATCH_SIZE]
             [--epochs EPOCHS] [--lr LR] [--gpu GPU] [--z_dim Z_DIM] [--hidden_rna HIDDEN_RNA] [--hidden_adt HIDDEN_ADT] [--hidden_atac HIDDEN_ATAC] [--num_models NUM_MODELS] --setting
             {processdata,fs,plot,annotation}
             ...

positional arguments:
  annotation_args       Additional arguments for annotation script

options:
  -h, --help                  show this help message and exit
  --seed SEED                 seed
  --train TRAIN               Path to the training dataset (Seurat, SCE or Anndata object)
  --test TEST                 Path to the test dataset (Seurat or SCE object)
  --celltypecol CELLTYPECOL
                              Cell type label column in your input dataset (Seurat, SCE or Anndata object). Default: `cell_type`
  --modality {rna,adt,atac}
                              Input data modality. Default: `rna`
  --base_dir DIR              Path to the directory containing processed data directory. Default: Current working directory
  --gene GENE                 Name of the gene whose expression is to be highlighted in the plot
  --ctofinterest CTOFINTEREST
                              Name of the cell type for which a ridgeline plot of gene expression should be generated
  --predictions PREDICTIONS
                              Generate UMAP plot for Hydra predicted cell types
  --ctpredictions CTPREDICTIONS
                              Path to the csv file containing cell types predicted by Hydra
  --processdata_batch_size PROCESSDATA_BATCH_SIZE
                              batch size for processing reference and query datasets
  --batch_size BATCH_SIZE
                              batch size for processing data during training
  --attr_batch_size ATTR_BATCH_SIZE
                              batch size for feature atrribution. Please adjust this based on your GPU memory
  --epochs EPOCHS             num of training epochs
  --lr LR                     learning rate
  --gpu GPU                   Please specify the GPU to use
  --z_dim Z_DIM               Number of neurons in latent space
  --hidden_rna HIDDEN_RNA
                              Number of neurons for RNA layer
  --hidden_adt HIDDEN_ADT
                              Number of neurons for ADT layer
  --hidden_atac HIDDEN_ATAC
                              Number of neurons for ATAC layer
  --num_models NUM_MODELS
                              Number of models for Ensemble Learning
  --setting {processdata,fs,plot,annotation}
                            `processdata` for processing input train and test Seurat, SCE or Anndata objects;
                            `fs` for feature selection to obtain cell-identity genes;
                            `plot` for generating UMAP plot of the dataset (Additionally, highlights gene expression when called with the `--gene` argument; Generates a ridgeline plot of expression of the specified gene in cell type of interest vs all other cell types when called with `--ctofinterest` argument; Generates a UMAP plot of Hydra predicted labels when called with `--predictions` argument);
                            `annotation` for automated annotation of the query dataset

Documentation by Manoj M Wagle