Single-cell transcriptomics
In this tutorial, we will use the example Lung datasets (Madissoon et al. and He et al.) introduced in the data processing step earlier. In particular, we will explore how to use Hydra to:
- Select cell-identity genes in a reference scRNA-seq dataset
- Visualize the expression of the selected identity genes
- Perform automated cell type annotation of a query dataset
Capturing cell-identity genes
A key component of Hydra is selecting features that are specific to the cell type of interest. As an interpretable deep learning-based tool, Hydra applies Integrated Gradients, a post-hoc feature attribution approach, to the trained VAE models.
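To give some intuition for Integrated Gradients, here is a minimal sketch on a toy function. It is not Hydra's implementation: the toy model, its hand-written gradient, the all-zero baseline and the helper name `integrated_gradients` are all assumptions made for illustration. The method averages the model's gradient along a straight path from a baseline input to the actual input, then scales that average by the difference between input and baseline.

```python
from typing import Callable, List, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> List[float]:
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # position along the baseline -> input path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    # Attribution = (input - baseline) * average gradient along the path
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


# Hypothetical toy "model": f(x) = x0^2 + 3*x1; the third feature is ignored.
def toy_model(x: Sequence[float]) -> float:
    return x[0] ** 2 + 3.0 * x[1]


def toy_grad(x: Sequence[float]) -> List[float]:
    return [2.0 * x[0], 3.0, 0.0]


x = [2.0, 1.0, 5.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_grad, x, baseline)
print(attributions)
```

The ignored third feature receives an attribution of zero, and the attributions sum (up to numerical error) to `toy_model(x) - toy_model(baseline)`. In a feature selection setting, the inputs would be genes and the attributions their importance scores.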
After the data processing step, the processed Reference and Query datasets are stored in the directory Input_Processed. By default, this directory is saved in the working directory from which Hydra was called.
Below is an example of how to run feature selection:
hydra --setting fs --base_dir [Path to the directory containing `Input_Processed` directory]
Note
Depending on your GPU memory, you might want to adjust the batch size for feature attribution using the argument attr_batch_size. With less GPU memory, we recommend a smaller batch size to avoid CUDA out-of-memory errors (default: 500, tested on a GPU with 24 GB of memory).
Here, we run feature selection on our reference dataset, Lung (Madissoon et al.). Since we are running fs from the same directory that contains the Input_Processed directory, we can skip the base_dir argument.
hydra --setting fs
Thank you for using Hydra, an interpretable deep generative tool for single-cell omics. Please refer to the full documentation available at https://sydneybiox.github.io/Hydra/ for detailed usage instructions. If you encounter any issues running the tool - Please open an issue on Github, and we will get back to you as soon as possible!!
===============================
Device to be used: CUDA
===============================
INFO - 2025-08-10 07:26:12,597 - Starting to run
INFO - 2025-08-10 07:26:12,597 - Training model...
INFO - 2025-08-10 07:26:27,691 - The Dataset is: scRNA-seq
100% | 40/40 [00:25<00:00, 1.56it/s]
INFO - 2025-08-10 07:27:15,789 - Refining Model: 1
Early stopping at epoch: 29
60% | 28/47 [00:00<00:00, 34.50it/s]
INFO - 2025-08-10 07:27:19,113 - Refining Model: 2
Early stopping at epoch: 35
89% | 34/38 [00:01<00:00, 30.72it/s]
INFO - 2025-08-10 07:27:20,479 - Refining Model: 3
100% | 48/48 [00:01<00:00, 37.35it/s]
INFO - 2025-08-10 07:27:22,026 - Refining Model: 4
100% | 31/31 [00:01<00:00, 29.37it/s]
INFO - 2025-08-10 07:27:23,329 - Refining Model: 5
Early stopping at epoch: 25
80% | 24/30 [00:01<00:00, 23.61it/s]
INFO - 2025-08-10 07:27:24,591 - Refining Model: 6
Early stopping at epoch: 30
62% | 29/47 [00:00<00:00, 30.83it/s]
INFO - 2025-08-10 07:27:25,782 - Refining Model: 7
100% | 40/40 [00:01<00:00, 33.50it/s]
INFO - 2025-08-10 07:27:27,247 - Refining Model: 8
Early stopping at epoch: 36
73% | 35/48 [00:01<00:00, 31.24it/s]
INFO - 2025-08-10 07:27:28,633 - Refining Model: 9
Early stopping at epoch: 27
68% | 26/38 [00:01<00:00, 25.03it/s]
INFO - 2025-08-10 07:27:29,911 - Refining Model: 10
100% | 31/31 [00:01<00:00, 27.28it/s]
INFO - 2025-08-10 07:27:31,291 - Refining Model: 11
100% | 50/50 [00:01<00:00, 39.54it/s]
INFO - 2025-08-10 07:27:32,821 - Refining Model: 12
Early stopping at epoch: 35
81% | 34/42 [00:00<00:00, 38.05it/s]
INFO - 2025-08-10 07:27:33,986 - Refining Model: 13
Early stopping at epoch: 28
66% | 27/41 [00:00<00:00, 28.61it/s]
INFO - 2025-08-10 07:27:35,198 - Refining Model: 14
Early stopping at epoch: 24
66% | 23/35 [00:00<00:00, 30.65it/s]
INFO - 2025-08-10 07:27:36,237 - Refining Model: 15
Early stopping at epoch: 29
58% | 28/48 [00:00<00:00, 32.05it/s]
INFO - 2025-08-10 07:27:37,375 - Refining Model: 16
Early stopping at epoch: 30
88% | 29/33 [00:00<00:00, 31.14it/s]
INFO - 2025-08-10 07:27:38,570 - Refining Model: 17
Early stopping at epoch: 34
82% | 33/40 [00:00<00:00, 35.20it/s]
INFO - 2025-08-10 07:27:39,767 - Refining Model: 18
100% | 36/36 [00:01<00:00, 33.82it/s]
INFO - 2025-08-10 07:27:41,093 - Refining Model: 19
Early stopping at epoch: 30
85% | 29/34 [00:00<00:00, 32.17it/s]
INFO - 2025-08-10 07:27:42,242 - Refining Model: 20
100% | 32/32 [00:00<00:00, 32.98it/s]
INFO - 2025-08-10 07:27:43,457 - Refining Model: 21
Early stopping at epoch: 22
60% | 21/35 [00:00<00:00, 31.65it/s]
INFO - 2025-08-10 07:27:44,368 - Refining Model: 22
Early stopping at epoch: 28
68% | 27/40 [00:00<00:00, 35.76it/s]
INFO - 2025-08-10 07:27:45,363 - Refining Model: 23
Early stopping at epoch: 34
75% | 33/44 [00:00<00:00, 36.71it/s]
INFO - 2025-08-10 07:27:46,519 - Refining Model: 24
100% | 30/30 [00:01<00:00, 29.74it/s]
INFO - 2025-08-10 07:27:47,785 - Refining Model: 25
Early stopping at epoch: 29
85% | 28/33 [00:00<00:00, 33.51it/s]
INFO - 2025-08-10 07:27:48,887 - Running feature selection...
100% | 9/9 [01:18<00:00, 8.77s/it]
INFO - 2025-08-10 07:29:14,545 - Completed successfully!
Note
By default, Hydra uses an ensemble size of 25. If you wish to change the number of models, please use the argument num_models when running the tool.
Understanding feature selection results
The feature selection process outputs a set of cell type-specific features along with their corresponding importance scores derived from the reference dataset.
These results are stored in the Feature_Selection sub-directory located within the Results directory.
Results/
└── Feature_Selection
    └── Hydra-25
        ├── fs.CD4-positive, alpha-beta T cell_Hydra.csv
        ├── fs.dendritic cell_Hydra.csv
        ├── fs.endothelial cell of lymphatic vessel_Hydra.csv
        ├── fs.lung multiciliated epithelial cell_Hydra.csv
        ├── fs.mast cell_Hydra.csv
        ├── fs.natural killer cell_Hydra.csv
        ├── fs.plasmacytoid dendritic cell_Hydra.csv
        ├── fs.regulatory T cell_Hydra.csv
        └── fs.T cell_Hydra.csv
Below is an example of feature selection results for the cell type - endothelial cell of lymphatic vessel:

- By default, Hydra outputs the importance scores for all genes for each cell type. The genes are ranked by score, with higher scores indicating better ranks.
- Each row contains a gene together with its associated modality and index number in the dataset (Gene_Modality_Index), and its corresponding importance score.
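As a rough sketch of how such a result file could be inspected outside of Hydra, the snippet below reads a small CSV and lists the highest-scoring rows. The score column name `Importance_Score`, the format of the `Gene_Modality_Index` values and the sample rows are all made up for illustration; check the header of your own fs.*_Hydra.csv file and adjust the names accordingly.

```python
import csv
import io
from typing import Dict, IO, List


def rank_features(handle: IO[str], score_column: str, top_n: int = 3) -> List[Dict[str, str]]:
    """Return the top_n rows of a CSV, sorted by descending score."""
    rows = list(csv.DictReader(handle))
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows[:top_n]


# Made-up sample standing in for a feature selection result file.
# Column names and values are hypothetical, not Hydra's actual output.
sample = io.StringIO(
    "Gene_Modality_Index,Importance_Score\n"
    "GENE_A_rna_0,0.12\n"
    "GENE_B_rna_1,0.87\n"
    "GENE_C_rna_2,0.45\n"
)

top = rank_features(sample, score_column="Importance_Score", top_n=2)
for row in top:
    print(row["Gene_Modality_Index"], row["Importance_Score"])
```

To use this on a real file, replace the `io.StringIO` sample with `open(path, newline="")` on one of the CSV files in the Feature_Selection directory.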
Visualizing marker gene expression
Hydra allows users to visualize the expression of marker genes directly within a dataset. Hydra uses Seurat, ggplot2 and ggridges to generate these plots.
Here, we will visualize the expression of the top gene ENSG00000137077.
hydra --setting plot --modality rna --train scRNA/Lung_Madissoon.h5ad --gene "ENSG00000137077" --ctofinterest "endothelial cell of lymphatic vessel" --celltypecol cell_type
Note
If you want to visualize the marker gene expression in the annotated query dataset, you need to map the cell type labels predicted by Hydra onto the dataset and specify the celltypecol argument.
Thank you for using Hydra, an interpretable deep generative tool for single-cell omics. Please refer to the full documentation available at https://sydneybiox.github.io/Hydra/ for detailed usage instructions. If you encounter any issues running the tool - Please open an issue on Github, and we will get back to you as soon as possible!!
===============================
Device to be used: CUDA
===============================
INFO - 2025-08-10 19:31:02,010 - Starting to run
INFO - 2025-08-10 19:31:02,010 - Generating plot...
Warning: Data is of class matrix. Coercing to dgCMatrix.
Normalizing layer: counts
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Finding variable features for layer counts
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Centering and scaling data matrix
|======================================================================| 100%
PC_ 1
ENSG00000108821, ENSG00000118849, ENSG00000011465, ENSG00000077942, ENSG00000164647, ENSG00000164692, ENSG00000168542, ENSG00000109846, ENSG00000116132, ENSG00000124212
ENSG00000142156, ENSG00000139329, ENSG00000188257, ENSG00000205362, ENSG00000167779, ENSG00000129009, ENSG00000111341, ENSG00000134853, ENSG00000197766, ENSG00000113140
ENSG00000004776, ENSG00000162576, ENSG00000106333, ENSG00000091986, ENSG00000109610, ENSG00000157613, ENSG00000112936, ENSG00000140285, ENSG00000163520, ENSG00000174807
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
19:31:17 UMAP embedding parameters a = 0.9922 b = 1.112
19:31:17 Read 5950 rows and found 10 numeric columns
19:31:17 Using Annoy for neighbor search, n_neighbors = 30
19:31:17 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
19:31:17 Writing NN index file to temp file /tmp/RtmpSm0msP/file3e8b8a7a56378a
19:31:17 Searching Annoy index using 1 thread, search_k = 3000
19:31:19 Annoy recall = 100%
19:31:19 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
19:31:19 Initializing from normalized Laplacian + noise (using RSpectra)
19:31:20 Commencing optimization for 500 epochs, with 249888 positive edges
19:31:20 Using rng type: pcg
Using method 'umap'
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
19:31:25 Optimization finished
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Picking joint bandwidth of 0.196
INFO - 2025-08-10 19:31:27,272 - Completed successfully!
The gene and ctofinterest arguments are optional. If you run Hydra without these arguments, only a UMAP plot of the dataset will be generated.

The gene argument highlights the expression of a specified gene across all cell types on the UMAP plot.

The ctofinterest argument generates a ridgeline plot showing the expression of the specified gene in the cell type of interest versus all other cell types.

The generated plots are stored in the directory Results/Plots.
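When plotting several genes, it can be convenient to assemble the command line programmatically. The sketch below only builds the argument list, using the flags that appear in the plotting command of this tutorial; the helper name `build_plot_command` is hypothetical and not part of Hydra, and the command is printed rather than executed.

```python
import shlex
from typing import List, Optional


def build_plot_command(
    train: str,
    celltypecol: str,
    gene: Optional[str] = None,
    ctofinterest: Optional[str] = None,
    modality: str = "rna",
) -> List[str]:
    """Assemble the argument list for a `hydra --setting plot` call."""
    cmd = [
        "hydra", "--setting", "plot",
        "--modality", modality,
        "--train", train,
        "--celltypecol", celltypecol,
    ]
    # Both arguments are optional; without them only a UMAP plot is generated.
    if gene is not None:
        cmd += ["--gene", gene]
    if ctofinterest is not None:
        cmd += ["--ctofinterest", ctofinterest]
    return cmd


cmd = build_plot_command(
    train="scRNA/Lung_Madissoon.h5ad",
    celltypecol="cell_type",
    gene="ENSG00000137077",
    ctofinterest="endothelial cell of lymphatic vessel",
)
umap_only = build_plot_command(train="scRNA/Lung_Madissoon.h5ad", celltypecol="cell_type")

# shlex.join quotes arguments containing spaces so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in an environment where Hydra is installed.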
Automated cell type annotation
Another key component of Hydra is an automated cell type annotation module. We will use the setting annotation to annotate the query dataset.
Note
You need to run feature selection (fs) on the reference dataset before annotating the query. If you have already run feature selection on the reference and want to annotate (annotation) a different, related query dataset, please process the data (processdata) first and then provide the path to the directory containing the processed data.
In this example, we will annotate the Lung dataset (He et al.) introduced earlier. Since we are running annotation from the same directory that contains the Input_Processed directory, we can skip the base_dir argument.
hydra --setting annotation
Thank you for using Hydra, an interpretable deep generative tool for single-cell omics. Please refer to the full documentation available at https://sydneybiox.github.io/Hydra/ for detailed usage instructions. If you encounter any issues running the tool - Please open an issue on Github, and we will get back to you as soon as possible!!
===============================
Device to be used: CUDA
===============================
INFO - 2025-08-10 19:52:13,972 - Starting to run
Device to be used: cuda
INFO - 2025-08-10 19:52:16,352 - Training classifier: 1
100% | 5/5 [00:09<00:00, 1.87s/it]
INFO - 2025-08-10 19:53:08,709 - Training classifier: 2
100% | 5/5 [00:00<00:00, 70.58it/s]
INFO - 2025-08-10 19:53:11,426 - Training classifier: 3
100% | 5/5 [00:00<00:00, 71.25it/s]
INFO - 2025-08-10 19:53:13,269 - Training classifier: 4
100% | 5/5 [00:00<00:00, 69.50it/s]
INFO - 2025-08-10 19:53:15,711 - Training classifier: 5
100% | 5/5 [00:00<00:00, 61.14it/s]
INFO - 2025-08-10 19:53:17,226 - Training classifier: 6
100% | 5/5 [00:00<00:00, 61.22it/s]
INFO - 2025-08-10 19:53:19,323 - Training classifier: 7
100% | 5/5 [00:00<00:00, 66.35it/s]
INFO - 2025-08-10 19:53:21,421 - Training classifier: 8
100% | 5/5 [00:00<00:00, 60.45it/s]
INFO - 2025-08-10 19:53:23,181 - Training classifier: 9
100% | 5/5 [00:00<00:00, 60.71it/s]
INFO - 2025-08-10 19:53:25,759 - Training classifier: 10
100% | 5/5 [00:00<00:00, 70.00it/s]
INFO - 2025-08-10 19:53:28,174 - Training classifier: 11
100% | 5/5 [00:00<00:00, 69.72it/s]
INFO - 2025-08-10 19:53:30,902 - Training classifier: 12
100% | 5/5 [00:00<00:00, 69.22it/s]
INFO - 2025-08-10 19:53:33,107 - Training classifier: 13
100% | 5/5 [00:00<00:00, 69.46it/s]
INFO - 2025-08-10 19:53:35,200 - Training classifier: 14
100% | 5/5 [00:00<00:00, 68.02it/s]
INFO - 2025-08-10 19:53:37,760 - Training classifier: 15
100% | 5/5 [00:00<00:00, 70.47it/s]
INFO - 2025-08-10 19:53:40,618 - Training classifier: 16
100% | 5/5 [00:00<00:00, 68.59it/s]
INFO - 2025-08-10 19:53:42,993 - Training classifier: 17
100% | 5/5 [00:00<00:00, 69.41it/s]
INFO - 2025-08-10 19:53:45,198 - Training classifier: 18
100% | 5/5 [00:00<00:00, 69.64it/s]
INFO - 2025-08-10 19:53:46,947 - Training classifier: 19
100% | 5/5 [00:00<00:00, 70.02it/s]
INFO - 2025-08-10 19:53:48,458 - Training classifier: 20
100% | 5/5 [00:00<00:00, 69.59it/s]
INFO - 2025-08-10 19:53:49,924 - Training classifier: 21
100% | 5/5 [00:00<00:00, 69.37it/s]
INFO - 2025-08-10 19:53:52,444 - Training classifier: 22
100% | 5/5 [00:00<00:00, 71.17it/s]
INFO - 2025-08-10 19:53:54,682 - Training classifier: 23
100% | 5/5 [00:00<00:00, 69.72it/s]
INFO - 2025-08-10 19:53:57,157 - Training classifier: 24
100% | 5/5 [00:00<00:00, 70.59it/s]
INFO - 2025-08-10 19:53:59,812 - Training classifier: 25
100% | 5/5 [00:00<00:00, 71.28it/s]
INFO - 2025-08-10 19:54:02,705 - Annotating the query dataset...
INFO - 2025-08-10 19:54:22,460 - Completed successfully!
Note
The number of classifiers should equal the number of models (ensemble size) used in the feature selection step. If you have changed the ensemble size, you need to provide the additional argument num_classifiers when calling the annotation setting.
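The result directories in this tutorial are named Hydra-25, which appears to reflect the default ensemble size of 25. Assuming that naming pattern holds for other ensemble sizes (this is an inference from the paths shown here, not documented behaviour), a small helper like the hypothetical `ensemble_size_from_path` below could recover the value to pass as num_classifiers.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def ensemble_size_from_path(path: str) -> Optional[int]:
    """Return N from a path component named 'Hydra-N', or None if absent."""
    for part in PurePosixPath(path).parts:
        match = re.fullmatch(r"Hydra-(\d+)", part)
        if match:
            return int(match.group(1))
    return None


# Path taken from the feature selection output shown in this tutorial.
size = ensemble_size_from_path("Results/Feature_Selection/Hydra-25")
print(size)
```

If the function returns None, the directory name does not follow the assumed pattern and the ensemble size has to be taken from the command used for feature selection.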
The annotation results are stored in the directory Results/Annotation.
Plotting cell type predictions
Hydra also allows users to visualize the predicted labels on the UMAP plot. We will use the Hydra-predicted labels for the Lung dataset (He et al.):
hydra --setting plot --predictions True --test scRNA/Lung_He.h5ad --modality rna --ctpredictions Results/Annotation/Hydra-25/cell_type_predicted_Hydra-25.csv
Thank you for using Hydra, an interpretable deep generative tool for single-cell omics. Please refer to the full documentation available at https://sydneybiox.github.io/Hydra/ for detailed usage instructions. If you encounter any issues running the tool - Please open an issue on Github, and we will get back to you as soon as possible!!
===============================
Device to be used: CUDA
===============================
INFO - 2025-08-10 20:08:55,945 - Starting to run
INFO - 2025-08-10 20:08:55,945 - Generating plot for Hydra predicted cell types...
Warning: Data is of class matrix. Coercing to dgCMatrix.
Normalizing layer: counts
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Finding variable features for layer counts
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Centering and scaling data matrix
|======================================================================| 100%
PC_ 1
Positive: ENSG00000003436, ENSG00000137077, ENSG00000188643, ENSG00000131477, ENSG00000172889, ENSG00000114115, ENSG00000127920, ENSG00000184113, ENSG00000110841, ENSG00000168497
ENSG00000112769, ENSG00000117707, ENSG00000122786, ENSG00000100234, ENSG00000141753, ENSG00000177469, ENSG00000160180, ENSG00000117519, ENSG00000118257, ENSG00000037280
ENSG00000138722, ENSG00000066056, ENSG00000117122, ENSG00000163453, ENSG00000128052, ENSG00000187498, ENSG00000134871, ENSG00000152661, ENSG00000179776, ENSG00000137726
Negative: ENSG00000101439, ENSG00000223865, ENSG00000231389, ENSG00000196126, ENSG00000204287, ENSG00000198502, ENSG00000196735, ENSG00000179344, ENSG00000204257, ENSG00000066336
ENSG00000242574, ENSG00000127951, ENSG00000172243, ENSG00000101336, ENSG00000132514, ENSG00000129226, ENSG00000106066, ENSG00000163563, ENSG00000204482, ENSG00000019582
ENSG00000182578, ENSG00000166428, ENSG00000112799, ENSG00000140749, ENSG00000120708, ENSG00000197629, ENSG00000169413, ENSG00000160593, ENSG00000100079, ENSG00000161921
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
20:09:14 UMAP embedding parameters a = 0.9922 b = 1.112
20:09:14 Read 6483 rows and found 10 numeric columns
20:09:14 Using Annoy for neighbor search, n_neighbors = 30
20:09:14 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
20:09:14 Writing NN index file to temp file /tmp/RtmpU0dZPV/file3ef8bc449f0c19
20:09:14 Searching Annoy index using 1 thread, search_k = 3000
20:09:16 Annoy recall = 100%
20:09:16 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
20:09:16 Initializing from normalized Laplacian + noise (using RSpectra)
20:09:18 Commencing optimization for 500 epochs, with 260692 positive edges
20:09:18 Using rng type: pcg
Using method 'umap'
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
20:09:24 Optimization finished
INFO - 2025-08-10 20:09:24,813 - Completed successfully!
Below is an example UMAP of the Hydra-predicted cell types for the Lung dataset (He et al.):

Below, we include an external figure showing the agreement between the Hydra-predicted cell type labels and the authors' labels:
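For readers who want to quantify this kind of agreement themselves, the following is a minimal sketch that compares two lists of labels. The helper `label_agreement` is hypothetical and the labels are made up; how the predicted and author labels are loaded from the prediction CSV and the h5ad file is left out, since it depends on column names not covered here.

```python
from collections import Counter
from typing import Counter as CounterType, List, Tuple


def label_agreement(
    predicted: List[str], reference: List[str]
) -> Tuple[float, CounterType[Tuple[str, str]]]:
    """Return the fraction of matching labels and (reference, predicted) pair counts."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("label lists must not be empty")
    pairs = Counter(zip(reference, predicted))
    matches = sum(n for (ref, pred), n in pairs.items() if ref == pred)
    return matches / len(reference), pairs


# Made-up labels for illustration only; not results from this tutorial.
author_labels = ["T cell", "T cell", "mast cell", "dendritic cell"]
hydra_labels = ["T cell", "mast cell", "mast cell", "dendritic cell"]

agreement, confusion = label_agreement(hydra_labels, author_labels)
print(f"Overall agreement: {agreement:.2f}")
for (ref, pred), count in sorted(confusion.items()):
    print(f"{ref} -> {pred}: {count}")
```

The pair counts are the entries of a confusion matrix, so they show which cell types are confused with each other as well as the overall agreement.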

Documentation by Manoj M Wagle