🚀 Installation guide
Prerequisites
- Python >= 3.8
- R >= 4.0
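As a quick sanity check before installing anything, the two version requirements above can be tested from a shell. The sketch below is not part of Hydra: the helper names `version_ge` and `report` are hypothetical, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the minimum versions listed above.

# Succeed (exit status 0) when dotted version $1 is >= dotted version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print whether a tool meets its minimum version.
# $1 = label, $2 = detected version (may be empty), $3 = minimum version
report() {
  if [ -n "$2" ] && version_ge "$2" "$3"; then
    echo "$1 $2: OK (>= $3)"
  else
    echo "$1 ${2:-not found}: need >= $3"
  fi
}

# `python3 --version` prints e.g. "Python 3.10.4"; field 2 is the version.
py_version="$(python3 --version 2>/dev/null | awk '{print $2}')"
# The first line of `R --version` reads e.g. "R version 4.3.1 (...)"; field 3 is the version.
r_version="$(R --version 2>/dev/null | awk 'NR==1 {print $3}')"

report "Python" "$py_version" "3.8"
report "R" "$r_version" "4.0"
```

If either tool is missing from your `PATH`, the corresponding line reports `not found` instead of a version.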
Note
We recommend creating a separate environment, for example with Mamba, to avoid package conflicts.
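One way to set up such an environment is sketched below. The environment name `hydra-env` is hypothetical, and the pinned Python and R versions are only examples that satisfy the stated minimums; they are not versions prescribed by the Hydra authors.

```shell
# Hypothetical environment name; choose any name you like.
ENV_NAME="hydra-env"

if command -v mamba >/dev/null 2>&1; then
  # Example versions only: any Python >= 3.8 and R >= 4.0 meets the prerequisites.
  mamba create -y -n "$ENV_NAME" -c conda-forge python=3.10 r-base=4.3
else
  echo "mamba not found; install Mamba before creating the environment" >&2
fi

# Then, in an interactive shell that has been initialised for Mamba:
#   mamba activate hydra-env
```

Activate the environment before installing the R dependencies and Hydra so that everything lands in the same place.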
R dependencies
Before installing Hydra, install the following R packages:
mamba install -c conda-forge -c bioconda \
bioconductor-hdf5array \
bioconductor-singlecellexperiment \
bioconductor-rhdf5 \
r-seurat \
r-glue \
r-reticulate \
r-matrix \
r-ggplot2 \
r-rlang \
r-ggridges \
r-anndata \
bioconductor-zellkonverter
Installing Hydra
Install Hydra via pip:
pip3 install hydra-tools
Verifying installation
To check the Hydra installation, please run:
hydra --help
You should see output similar to this:
Thank you for using Hydra 😄, an interpretable deep generative tool for single-cell omics. Please refer to the full documentation available at https://sydneybiox.github.io/Hydra for detailed usage instructions. If you encounter any issues running the tool - Please open an issue on Github, and we will get back to you as soon as possible!!
📍 NOTE 📍: You need to run feature selection (`fs`) on the train datatset before annotating the cell types in the query dataset. If you have already run feature selection on the train & want to annotate (`annotation`) a different related query dataset, please process the data (`processdata`) first and then provide the path to the directory containing this processed data.
usage: Hydra [-h] [--seed SEED] [--train TRAIN] [--test TEST] [--celltypecol CELLTYPECOL] [--modality {rna,adt,atac}] [--base_dir DIR] [--gene GENE] [--ctofinterest CTOFINTEREST]
[--predictions PREDICTIONS] [--ctpredictions CTPREDICTIONS] [--processdata_batch_size PROCESSDATA_BATCH_SIZE] [--batch_size BATCH_SIZE] [--attr_batch_size ATTR_BATCH_SIZE]
[--epochs EPOCHS] [--lr LR] [--gpu GPU] [--z_dim Z_DIM] [--hidden_rna HIDDEN_RNA] [--hidden_adt HIDDEN_ADT] [--hidden_atac HIDDEN_ATAC] [--num_models NUM_MODELS] --setting
{processdata,fs,plot,annotation}
...
positional arguments:
annotation_args Additional arguments for annotation script
options:
-h, --help show this help message and exit
--seed SEED seed
--train TRAIN Path to the training dataset (Seurat, SCE or Anndata object)
--test TEST Path to the test dataset (Seurat or SCE object)
--celltypecol CELLTYPECOL
Cell type label column in your input dataset (Seurat, SCE or Anndata object). Default: `cell_type`
--modality {rna,adt,atac}
Input data modality. Default: `rna`
--base_dir DIR Path to the directory containing processed data directory. Default: Current working directory
--gene GENE Name of the gene whose expression is to be highlighted in the plot
--ctofinterest CTOFINTEREST
Name of the cell type for which a ridgeline plot of gene expression should be generated
--predictions PREDICTIONS
Generate UMAP plot for Hydra predicted cell types
--ctpredictions CTPREDICTIONS
Path to the csv file containing cell types predicted by Hydra
--processdata_batch_size PROCESSDATA_BATCH_SIZE
batch size for processing reference and query datasets
--batch_size BATCH_SIZE
batch size for processing data during training
--attr_batch_size ATTR_BATCH_SIZE
batch size for feature atrribution. Please adjust this based on your GPU memory
--epochs EPOCHS num of training epochs
--lr LR learning rate
--gpu GPU Please specify the GPU to use
--z_dim Z_DIM Number of neurons in latent space
--hidden_rna HIDDEN_RNA
Number of neurons for RNA layer
--hidden_adt HIDDEN_ADT
Number of neurons for ADT layer
--hidden_atac HIDDEN_ATAC
Number of neurons for ATAC layer
--num_models NUM_MODELS
Number of models for Ensemble Learning
--setting {processdata,fs,plot,annotation}
`processdata` for processing input train and test Seurat, SCE or Anndata objects;
`fs` for feature selection to obtain cell-identity genes;
`plot` for generating UMAP plot of the dataset (Additionally, highlights gene expression when called with the `--gene` argument; Generates a ridgeline plot of expression of the specified gene in cell type of interest vs all other cell types when called with `--ctofinterest` argument; Generates a UMAP plot of Hydra predicted labels when called with `--predictions` argument);
`annotation` for automated annotation of the query dataset
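The note in the help text implies an order of operations: process the data, run feature selection on the training data, then annotate the query data. A minimal sketch of that sequence follows. The file paths, the working directory, and the `run` wrapper are hypothetical, and the exact combination of flags each setting requires is an assumption based only on the help text above, so check `hydra --help` against your installed version.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs; replace with your own files and directory.
TRAIN="data/reference.rds"
QUERY="data/query.rds"
WORKDIR="hydra_run"

# Print each command; execute it only when DRY_RUN=0 is set in the environment.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# 1. Process the train and test objects (flag combination is assumed).
run hydra --setting processdata --train "$TRAIN" --test "$QUERY" \
  --celltypecol cell_type --modality rna --base_dir "$WORKDIR"

# 2. Feature selection on the training data to obtain cell-identity genes.
run hydra --setting fs --modality rna --base_dir "$WORKDIR"

# 3. Automated annotation of the query dataset.
run hydra --setting annotation --modality rna --base_dir "$WORKDIR"
```

By default the script only prints the three commands so they can be reviewed first; running it with `DRY_RUN=0` executes them.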
Documentation by Manoj M Wagle