// INITIATING SEQUENCE — DATA STREAM ONLINE //

Diego
Fuentes Palacios

Bioinformatics · Genomics · Neuroscience
Machine & Deep Learning · Workflow & Software Architect

📍 Barcelona, Catalonia, ES


Bioinformatician by trade, based in Barcelona. I enjoy coding, and I work to simplify analyses and improve their reproducibility with workflow managers and containers. I also experiment with AI-based projects and sometimes dive into "vibe coding" projects.

Open to research collaborations and biotech, pharma or scientific software roles.

OVERVIEW

I'm Diego, a bioinformatician with a background that spans classical and computational genetics, experimental neuroscience, large-scale genomic data analysis, workflow engineering and AI-based projects.

My academic path started with a BSc in Genetics and shifted into an MSc in Neuroscience, where I first encountered the power of computational approaches to complex-disease research. It culminated in an MSc in Bioinformatics, after which I developed expertise in genome assembly, phylogenomics, metagenomics and reproducible pipeline engineering, with some sysadmin skills sprinkled in.

I have a passion for building tools and pipelines that make complex analyses more accessible and reproducible, and for applying computational methods to unravel biological insights.


Open-source contributions
Professionally I've contributed to flagship bioinformatics tools, including GenLit, Redundans, meTAline and Karyon, and to databases and resources such as PhylomeDB v5, MeTaPhOrs v2.5, EvolClustDB, and CandidaMine. I care deeply about reproducibility: every workflow I build ships in containers (Docker/Singularity) and, with few exceptions, is orchestrated with Snakemake or Nextflow.


Engineering in the AI Era
My seniority is defined by architectural design, intent-driven prompting, and critical validation. For the technologies in my stack where my level ranges from proficient to expert, I design the core functions and orchestrate AI to generate precise structures, then debug and review the resulting code rigorously. Across all levels, I apply efficient prompt engineering, knowing when to leverage AI for acceleration and when to bypass it and resolve complex bottlenecks myself.

EXPERIENCE

2025 – Present Barcelona, ES

Senior Bioinformatician

Institut d'Investigació i Innovació Parc Taulí (I3PT) · Rheumatology group (RheIMID)
  • Calculating polygenic risk scores (PRS) through GWAS harmonisation and imputation QC pipelines for spondyloarthritis (SpA) and psoriatic arthritis patient cohorts. The objective is to predict treatment outcomes in SpA by integrating EHR-derived clinical features with genomic data.
  • Prototyping a clinical NLP pipeline (GenLit) for biomedical literature mining — extracting gene–disease associations, variant mentions, and clinical trial outcomes from PubMed abstracts via transformer-based NER models.
  • Analyzing metagenomic data to identify microbial signatures (taxonomic signal and functional annotation) associated with rheumatological diseases, specifically spondyloarthritis and psoriatic arthritis.
  • Analyzing proteomic data to identify novel protein biomarkers associated with rheumatological arthritic diseases.
  • Collaborating cross-institutionally with clinical and statistical genetics groups to co-author manuscripts and share reproducible analysis code.
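
As an illustration of the PRS step mentioned above, here is a minimal sketch of the standard additive score: the sum of effect-allele dosages weighted by per-variant GWAS effect sizes. The variant IDs, weights and dosages are hypothetical placeholders, and this is not the group's actual pipeline, which also involves harmonisation and imputation QC.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Additive PRS: sum of dosage * effect size over variants present in both inputs."""
    shared = dosages.keys() & weights.keys()  # variants missing from either side are skipped
    return sum(dosages[variant] * weights[variant] for variant in shared)


# Hypothetical effect sizes (e.g. log odds ratios from harmonised summary statistics).
weights = {"rs0001": 0.5, "rs0002": -0.25, "rs0003": 0.125}
# Hypothetical effect-allele dosages (0 to 2) for one individual.
patient = {"rs0001": 2.0, "rs0002": 1.0, "rs0004": 1.0}

score = polygenic_risk_score(patient, weights)
print(score)
```

Only variants found in both the genotype data and the weight table contribute, which is why allele and identifier harmonisation matters before scoring.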
2024 – 2025 Barcelona, ES

Specialist — Digital Health & AI

Deloitte Touche Tohmatsu Limited · Technology Strategy & Transformation
    Specialist in Digital Health (specifically Genetics and Bioinformatics) and AI, working in both functional and technical consultancy roles as well as in project management.
  • Technical Lead in charge of the initial design of the Master Data and Database structure of the central node for the project SIGenES, Ministerio de Sanidad - SNS (10 months)
  • Project Manager in charge of several subprojects (postIAM, Demències and TAVI) within the project DAIPO (Digital transformation in Health, Horizon Europe funds) for the Hospital Universitari Bellvitge and the Àrea Metropolitana Sud (Hospital Viladecans & Atenció Primària Delta Sud). 12 months.
  • Specialist consultant for the Sistema Canario de Salud (SCS), more specifically regarding the projects SIGenES and UNICAS, within a technical office. 4 months.
2020 – 2024 Barcelona, ES

Bioinformatics Research Assistant

Barcelona Supercomputing Center (BSC-CNS) / Institute of Research in Biomedicine (IRB) · Gabaldon Lab
    I worked on database management and development, pipeline development and deployment, virtual machine and cluster management (a custom cluster setup with two twin nodes) and data analysis. I was also the sole admin of our group's GitHub organization for a while, and I set up the CI/CD pipelines for our flagship tools and databases.
  • Led development and maintenance of Redundans2, a modular pipeline for heterozygous genome assembly, scaffolding, and gap-closing.
  • Designed, developed and deployed the meTAline metagenomics workflow integrating taxonomic profiling (Kraken2) with functional annotation (HUMAnN3).
  • Contributed backend infrastructure and data pipelines for PhylomeDB v5, MeTaPhOrs 2.5, EvolClustDB, Karyon, and CandidaMine — containerised services deployed on the BSC cloud environment.
2019 – 2020 Barcelona, ES

Internship & Bioinformatician

Centro Nacional de Análisis Genómico (CNAG) · Genome Assembly & Annotation Team
    During my six months there, first as part of my master's thesis and later on a temporary contract, I developed an automated pipeline for misassembly and structural variant (SV) detection (MASV).
    I also contributed to the genome assemblies of Pelobates cultripes and Lecanosticta acicola.

SELECTED WORKS
// ACCESS GRANTED //

Redundans2 Present
Genome Assembly Heterozygous Genomes Hybrids Python Perl C++ Bioconda Docker Singularity
Problem Genome assembly can be challenging for organisms with highly variable or repetitive genomes. Standard tools often produce redundant sequences, inflating genome size and complicating downstream analyses such as gene annotation and comparative genomics.
Approach Redundans streamlines draft genome assemblies by merging redundant contigs, filling gaps, and improving contiguity, especially for short-read assemblies of highly heterozygous organisms such as fungi, plants, and non-model animals. It is built as a modular Python workflow and wrapper: redundancy reduction → scaffolding → gap closing.
Contribution Lead developer for the v2 rewrite, which is yet to be published. Available through Bioconda. Containerised with Singularity for HPC deployment and with Docker images for local use. Widely adopted (177+ GitHub stars, 2100+ Bioconda downloads, 300+ Docker pulls).
Github Bioconda DockerHub
GenLit Present
Clinical NLP Python AI API HuffinFace2-BioNER
Problem PubMed literature grows faster than manual curation can keep up with, thus gene–disease evidence extraction requires scalable automation.
Approach Async Python pipeline querying PubMed/ClinVar PMC APIs, applying HuffinFace2-based BioNER for gene/variant/phenotype extraction, and aggregating association evidence per gene–disease pair, with the help of AI API clients (Perplexity, ChatGPT, Gemini...) for literature synthesis.
Contribution Original author; built for gene panel design and clinical evidence aggregation in rheumatological disease genomics. Development is ongoing, with an open-source release and publication planned for 2026-2027.
Github
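
The fetch → extract → aggregate pattern described in the GenLit approach can be sketched as follows. This is a toy illustration, not GenLit's code: the abstracts, gene names and disease names are placeholders, `fetch_abstract` stands in for a real literature API call, and a dictionary lookup stands in for the transformer-based NER model.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List, Tuple

# Placeholder abstracts standing in for records returned by a literature API.
FAKE_ABSTRACTS = {
    "PMID1": "GENEA is associated with disease x.",
    "PMID2": "GENEB variants are linked to disease y.",
    "PMID3": "Carriers of GENEA show a higher risk of disease x.",
}
GENES = ("GENEA", "GENEB")            # hypothetical gene vocabulary
DISEASES = ("disease x", "disease y")  # hypothetical disease vocabulary


async def fetch_abstract(pmid: str) -> Tuple[str, str]:
    """Hypothetical fetcher; a real one would await a non-blocking HTTP request."""
    await asyncio.sleep(0)
    return pmid, FAKE_ABSTRACTS[pmid]


def extract_pairs(text: str) -> List[Tuple[str, str]]:
    """Dictionary matching used here in place of an NER model."""
    lowered = text.lower()
    return [(g, d) for g in GENES for d in DISEASES
            if g.lower() in lowered and d in lowered]


async def aggregate(pmids: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Fetch abstracts concurrently, then group supporting PMIDs per gene-disease pair."""
    results = await asyncio.gather(*(fetch_abstract(p) for p in pmids))
    evidence: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for pmid, text in results:
        for pair in extract_pairs(text):
            evidence[pair].append(pmid)
    return dict(evidence)


evidence = asyncio.run(aggregate(["PMID1", "PMID2", "PMID3"]))
print(evidence)
```

`asyncio.gather` returns results in the order the requests were submitted, so the PMID lists per pair stay deterministic even though the fetches run concurrently.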
meTAline 2025
Metagenomics Taxonomic Profiling Functional Annotation Snakemake R Python Docker Singularity
Problem Integrating taxonomic profiling with functional annotation in metagenomic studies requires gluing many heterogeneous tools, making reproducibility difficult.
Approach End-to-end Snakemake workflow: QC → host depletion → Kraken2 taxonomy → HUMAnN3 functional profiling → Krona visualization → aggregate R statistics (alpha diversity).
Contribution Co-author and lead code developer; published in NAR Genomics and Bioinformatics (2025). Fully containerised with Singularity for HPC deployment and with Docker images for local use (1700+ Docker pulls).
Paper Github DockerHub
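
To illustrate the alpha-diversity statistic at the end of the meTAline workflow, here is a minimal sketch of the Shannon index computed from per-taxon read counts. It assumes Shannon as the metric and uses Python with made-up counts; the workflow itself computes its statistics in R, and the exact metrics it reports are not stated here.

```python
import math
from typing import Iterable


def shannon_index(counts: Iterable[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    if total == 0:
        return 0.0  # no reads assigned: diversity is defined as zero here
    return -sum((c / total) * math.log(c / total) for c in observed)


# Made-up read counts per taxon for two samples.
even_sample = [10, 10, 10, 10]   # four equally abundant taxa
skewed_sample = [37, 1, 1, 1]    # one dominant taxon

print(shannon_index(even_sample), shannon_index(skewed_sample))
```

An even community scores higher than a skewed one with the same number of taxa, which is the property that makes the index useful for comparing samples.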
CandidaMine 2023
Candida pathogens warehouse Database SQL PHP Gradle Docker
Problem Community resources for the Candida pathogens' biology lacked integrated, queryable databases.
Approach Develop a genome and transcriptome warehouse for the Candida pathogens, integrating genomic data with functional annotations and expression datasets, and deploy it as a containerised service on the BSC cloud environment.
Contribution Unit tester and front-end designer for the CandidaMine database. Contributed to its deployment within the BSC's VM infrastructure.
candidamine.org ↗
EvolClustDB 2023
Phylogenomics Gene neighbourhoods SQL Python Docker HTML JavaScript Uvicorn
Problem Community resources for evolved gene neighbourhoods lacked integrated, queryable databases.
Approach Pipeline and schema contributions, automated data loading, interpretable visualization of gene neighbourhoods, versioned container images, and REST API endpoints.
Contribution Co-author of the EvolClustDB publication in Journal of Molecular Biology (2023); contributed database schema design, data engineering, containerisation, and deployment.
Paper evolclustdb.org ↗
Karyon 2022
Genome Assembly Heterozygosity Ploidy Analysis Python Docker
Problem Genome assemblies from highly heterozygous or complex organisms can be misleading due to redundancy, contamination, or unusual ploidy, complicating downstream analyses.
Approach Provides diagnostic analyses and visualizations to identify assembly issues such as heterozygosity, ploidy variation, repeats, and contamination, guiding informed assembly strategies.
Contribution Co-author of the Karyon publication in GigaScience (2022); contributed to the development, testing, and visualization modules that help researchers understand and improve complex genome assemblies.
Paper GitHub DockerHub
MeTaPhOrs 2.5 2022
Phylogenomics Meta-score calculation SQL Python JavaScript HTML CSS Docker FastAPI
Problem Building metadatabases of orthologues at scale requires robust database infrastructure and smart resource allocation and utilization, a huge task when dealing with thousands of genomes and millions of trees.
Approach Backend pipeline contributions for data processing, orthologue meta-score calculation, data loading, and API development.
Contribution Co-lead on the database v2.5 update: table schema update, data engineering, data integrity and deployment.
orthology.phylomedb.org ↗
PhylomeDB v5 2021
Phylogenomics Tree visualization SQL Python JavaScript HTML CSS Docker FastAPI
Problem Genome-wide phylome construction at scale requires robust database infrastructure, solid tree visualization and automated tree-building workflows.
Approach Backend pipeline contributions for tree construction, data loading, and database optimization; frontend development for interactive tree visualization and user interface; containerised deployment as a community resource.
Contribution First author on PhylomeDB NAR publications (2021); contributed data engineering, database optimization, containerisation, and deployment.
Paper phylomedb.org ↗

COLLABORATION NETWORK
// [WARNING] PRIVACY VIOLATION //

[Interactive figure: force-directed graph of publications and co-authors derived from Google Scholar. The closer two nodes are, the more collaborations they represent. Legend: Diego Fuentes-Palacios (me), Publication, Co-author.]

NOW & NEXT

Currently Working On

  • Spondyloarthritis project: harmonising multi-cohort GWAS summary statistics, evaluating imputation quality, and building clinically applicable risk models.
  • Iterating on GenLit: improving NER recall for variant extraction, and expanding coverage to other AI APIs for literature synthesis and evidence aggregation.
  • Exploring LLM-assisted literature synthesis for structured evidence aggregation in rheumatological disease genomics.

Open To

  • Senior bioinformatics engineer or technical lead roles in academia, medical centres, biotech, pharma, or precision medicine companies.
  • Open-source contributions to genomics tooling, clinical NLP frameworks, or scientific workflow infrastructure.
  • Research collaborations in computational genomics, metagenomics, phylogenomics, or biomedical text mining.
  • Talks, seminars, or workshop sessions on reproducible pipelines, methodology, AI in precision medicine or biomedical text mining.