Genome Assembly
Heterozygous Genomes
Hybrids
Python
Perl
C++
Bioconda
Docker
Singularity
Problem
Genome assembly can be challenging for organisms with highly variable or repetitive genomes. Standard
tools often produce redundant sequences, inflating genome size and complicating downstream analyses such
as gene annotation and comparative genomics.
Approach
Redundans improves draft genome assemblies by merging redundant contigs, filling gaps, and increasing
contiguity, especially for short-read assemblies of highly heterozygous organisms such as fungi, plants, and
non-model animals. It is a modular Python workflow with a wrapper that runs three stages:
redundancy reduction → scaffolding → gap closing.
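The redundancy reduction stage can be illustrated with a deliberately simplified sketch. It is not the Redundans implementation: it assumes exact containment (on either strand) as the redundancy criterion, whereas a real tool works from alignments with identity and overlap thresholds. The function names and the toy contigs are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def reduce_redundancy(contigs: dict[str, str]) -> dict[str, str]:
    """Keep a contig only if no longer kept contig contains it on either strand.

    Simplifying assumption: exact substring containment stands in for
    alignment-based identity/overlap thresholds.
    """
    kept: dict[str, str] = {}
    # Visit the longest contigs first so shorter ones are compared against them.
    for name, seq in sorted(contigs.items(), key=lambda kv: len(kv[1]), reverse=True):
        forward = seq.upper()
        reverse = reverse_complement(forward)
        if any(forward in k or reverse in k for k in kept.values()):
            continue  # redundant: already represented by a longer contig
        kept[name] = forward
    return kept


# Toy input (hypothetical sequences).
contigs = {
    "c1": "ATGCGTACGTTAGC",
    "c2": "CGTACG",        # contained in c1
    "c3": "GCTAACGT",      # reverse complement of a c1 substring
    "c4": "TTTTGGGGCCCC",  # unrelated, so it is kept
}
reduced = reduce_redundancy(contigs)
```

Sorting by length first means each contig is only ever compared against contigs at least as long as itself, which keeps the representative sequence for each redundant group.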
Contribution
Lead developer of the v2 rewrite (not yet published), available through Bioconda.
Containerised with Singularity for HPC deployment and with Docker images for local use.
Widely adopted (177+ GitHub stars, 2,100+ Bioconda downloads, 300+ Docker pulls).
GitHub
Bioconda
DockerHub
Clinical NLP
Python
AI API
HuffinFace2-BioNER
Problem
PubMed literature grows faster than manual curation can keep up with, so
gene–disease evidence extraction requires scalable automation.
Approach
Async Python pipeline that queries the PubMed, ClinVar, and PMC APIs, applies
HuffinFace2-based BioNER for gene/variant/phenotype extraction, and aggregates
association evidence per gene–disease pair. AI API clients (Perplexity, ChatGPT,
Gemini...) support literature synthesis.
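The concurrency and aggregation pattern described above can be sketched as follows. This is a minimal illustration, not the project's code: `extract_pairs`, `aggregate`, and the placeholder records are hypothetical, and the network and NER steps are replaced by a stub so the example runs offline.

```python
import asyncio
from collections import defaultdict

# Placeholder records (hypothetical). A real pipeline would obtain these
# from literature APIs followed by an entity-recognition model.
STUB_RESULTS = {
    "PMID:1": [("GENE_A", "disease_x")],
    "PMID:2": [("GENE_A", "disease_x"), ("GENE_B", "disease_y")],
    "PMID:3": [("GENE_B", "disease_y")],
}


async def extract_pairs(
    pmid: str, limit: asyncio.Semaphore
) -> tuple[str, list[tuple[str, str]]]:
    """Hypothetical step: fetch one article and return its (gene, disease) mentions."""
    async with limit:           # cap the number of concurrent requests
        await asyncio.sleep(0)  # stand-in for network and model latency
        return pmid, STUB_RESULTS.get(pmid, [])


async def aggregate(
    pmids: list[str], max_concurrent: int = 2
) -> dict[tuple[str, str], list[str]]:
    """Collect the supporting article IDs for every gene-disease pair."""
    limit = asyncio.Semaphore(max_concurrent)
    # gather() returns results in the same order as the input coroutines.
    results = await asyncio.gather(*(extract_pairs(p, limit) for p in pmids))
    evidence: dict[tuple[str, str], list[str]] = defaultdict(list)
    for pmid, pairs in results:
        for pair in pairs:
            evidence[pair].append(pmid)
    return dict(evidence)


evidence = asyncio.run(aggregate(["PMID:1", "PMID:2", "PMID:3"]))
```

The semaphore is one common way to respect API rate limits while still overlapping requests; the limit of two concurrent calls is an arbitrary value chosen for the example.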
Contribution
Original author; built to support gene panel design and clinical evidence aggregation in rheumatological
disease genomics. Ongoing development, with an open-source release and publication planned for 2026-2027.
GitHub
Metagenomics
Taxonomic Profiling
Functional Annotation
Snakemake
R
Python
Docker
Singularity
Problem
Integrating taxonomic profiling with functional annotation in metagenomic studies
requires gluing many heterogeneous tools, making reproducibility difficult.
Approach
End-to-end Snakemake workflow: QC → host depletion → Kraken2 taxonomy →
HUMAnN3 functional profiling → Krona visualization → aggregate statistics in R (alpha diversity).
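As an illustration of the final statistics step, the sketch below computes one common alpha-diversity measure, the Shannon index, from per-taxon read counts. It is written in Python for illustration only (the workflow itself computes its statistics in R), the choice of the Shannon index is an assumption, and the sample counts are hypothetical.

```python
import math


def shannon_index(counts: list[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty sample has no diversity to measure
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per taxon for two samples.
samples = {
    "sample_even": [25, 25, 25, 25],
    "sample_skewed": [97, 1, 1, 1],
}
alpha = {name: shannon_index(counts) for name, counts in samples.items()}
```

A perfectly even community of four taxa reaches the maximum value ln(4), while a community dominated by one taxon scores much lower.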
Contribution
Co-author and lead code developer; published in NAR Genomics and Bioinformatics (2025). Fully
containerised with Singularity for HPC deployment and with Docker images for local use (1,700+ Docker
pulls).
Paper
GitHub
DockerHub
Candida pathogens warehouse
Database
SQL
PHP
Gradle
Docker
Problem
Community resources for the biology of Candida pathogens lacked an integrated, queryable database.
Approach
A genome and transcriptome warehouse for Candida pathogens that integrates genomic data
with functional annotations and expression datasets, deployed as a containerised service on the BSC
cloud environment.
Contribution
Unit testing and front-end design for the CandidaMine database. Contributed to the database deployment
within the BSC's VM infrastructure.
candidamine.org ↗
Phylogenomics
Gene neighbourhoods
SQL
Python
Docker
HTML
JavaScript
Uvicorn
Problem
Community resources for evolutionarily conserved gene neighbourhoods lacked an integrated, queryable database.
Approach
Pipeline and schema contributions, automated data loading, interpretable visualization of gene
neighbourhoods, versioned container images, and REST API endpoints.
Contribution
Co-author on the EvolClustDB publication (Journal of Molecular Biology, 2023); contributed database
schema design, data engineering, containerisation, and deployment.
Paper
evolclustdb.org ↗
Genome Assembly
Heterozygosity
Ploidy Analysis
Python
Docker
Problem
Genome assemblies from highly heterozygous or complex organisms can be misleading due to redundancy,
contamination, or unusual ploidy, complicating downstream analyses.
Approach
Karyon provides diagnostic analyses and visualizations that identify assembly issues such as heterozygosity,
ploidy variation, repeats, and contamination, guiding informed assembly strategies.
Contribution
Co-author on the Karyon publication (GigaScience, 2022); contributed to development, testing, and
visualization modules that help researchers understand and improve complex genome assemblies.
Paper
GitHub
DockerHub
Phylogenomics
Meta-score calculation
SQL
Python
JavaScript
HTML
CSS
Docker
FastAPI
Problem
Building a metadatabase of orthologues at scale requires robust database infrastructure and careful
resource allocation, a demanding task when dealing with thousands of genomes and millions of trees.
Approach
Backend pipeline contributions for data processing, orthologue meta-score calculation, data loading, and
API development.
Contribution
Co-lead on the v2.5 database update: table schema changes, data engineering, data integrity, and deployment.
orthology.phylomedb.org ↗
Phylogenomics
Tree visualization
SQL
Python
JavaScript
HTML
CSS
Docker
FastAPI
Problem
Genome-wide phylome construction at scale requires robust database infrastructure, solid tree
visualization, and automated tree-building workflows.
Approach
Backend pipeline contributions for tree construction, data loading, and database optimization; frontend
development for interactive tree visualization and user interface; containerised deployment as a community
resource.
Contribution
First author on PhylomeDB NAR publications (2021); contributed data engineering, database
optimization, containerisation, and deployment.
Paper
phylomedb.org ↗