About
I am a PhD-qualified data scientist and computational astrophysicist who builds and analyses large-scale predictive models. My research has involved running high-resolution cosmological simulations to study alternative dark matter models, galaxy cluster physics, and large-scale structures. A large part of my day-to-day involves digging into the numerical quirks of these simulations, using statistical models to figure out what is a real physical effect and what is just a computational artefact, and then checking those results against actual observations.
I apply my background in numerical modelling and high-performance computing (HPC) to broader data science challenges. I manage end-to-end data pipelines, wrangle terabyte-scale datasets, and use machine learning to find useful patterns. Lately, I've been focusing heavily on AI orchestration, using foundation models to automate complex data workflows and turn months of manual engineering into a single day's work. Whether I'm optimising legacy C/C++ code for supercomputers or setting up automated ETL pipelines in Python and SQL, I enjoy tackling tough quantitative problems and making sure the data is reliable.
Experience
- Led international projects to build and validate large-scale predictive models.
- Built and ran a custom collaboration platform for a 100-member international consortium, replacing a commercial tool and avoiding its ongoing licensing costs entirely.
- Served on the Swinburne Time Allocation Committee (STACK), reviewing and ranking competing proposals to allocate scarce, high-value observatory time across an international research consortium.
- Built the SAGE Tree Converter, an AI-driven tool that cut a manual 1–3 month engineering job down to a single day.
- Wrote custom GIZMO extensions for distributed HPC environments, improving predictive-model accuracy by 40% by finding and fixing systematic errors.
- Built end-to-end Python and SQL ETL pipelines to process terabyte-scale raw simulation data.
- Used A/B testing, feature engineering, and statistical diagnostics to tell genuine signals apart from computational noise in terabyte-scale simulation outputs.
- Built the Spurious Halo Classifier as a follow-up, replacing a hand-tuned cutoff with scikit-learn and PyTorch models trained on independently labelled data, with experiment tracking (MLflow), model explainability (SHAP), and CI for reproducible results.
- Mentored 3 undergraduate and 2 postgraduate researchers and gave regular progress updates to a cross-disciplinary team across astrophysics, physics, and engineering.
- Developed Python workflows (NumPy, pandas) and R notebooks to integrate and standardise terabyte-scale HPC datasets across international facilities.
- Statistically benchmarked modelling architectures to isolate physical signals from computational artefacts.
- Communicated technical insights across 9 peer-reviewed publications and 6 international conferences.
Projects
- Built a classifier (scikit-learn, PyTorch) to replace a hand-tuned cutoff, training it on independently labelled data for more reliable, repeatable results.
- Organised the workflow around a Databricks-inspired medallion data architecture, with MLflow for experiment tracking and SHAP for explaining model decisions.
- Added CI so training and validation stay reproducible as the models evolve.
- Built end-to-end Python and SQL ETL pipelines to clean and standardise terabyte-scale datasets for distributed HPC environments.
- Used data science techniques like A/B testing and feature engineering to spot anomalies and separate real signals from noise.
- Built interactive Power BI/Fabric dashboards to share modelling results with non-technical stakeholders.
- Wrote custom C/C++ extensions for the GIZMO framework (GIZMO-PBHEF) to model complex system dynamics on distributed HPC.
- Developed and tested new simulation methods, improving accuracy by 40% by tracking down systematic errors across terabyte-scale models.
- Used CI/CD to manage the full build and deployment cycle, keeping parallelised builds stable and reliable.
- Built an AI-assisted CLI that uses LLMs to automatically convert complex hierarchical simulation outputs into standard formats.
- Used an adapter design pattern and a persistent knowledge base to keep token usage low and make the tool easy to extend.
- Turned a manual 1–3 month job into a single-day process, a 98% cut in turnaround time.
Publications
I've contributed to several peer-reviewed astrophysics papers, mostly focusing on large-scale statistical modelling and galaxy formation.
Latest publication: On the redshift evolution of the spin parameter in cosmological simulations, Riera et al. 2026, PASA, 43, e063.
Professional Development
Selected for a competitive 20-person cohort to develop commercialisation pathways for cross-sector digital health projects. Wrapped up the program with an investor pitch night, presenting technical research as a commercial proposition to an industry panel.
Selected for an intensive program focused on reframing technical research for real-world applications through commercial literacy and visual storytelling. Developed capabilities in creative problem-solving and translating complex technical knowledge for non-specialist audiences.
Education
Graduated Summa Cum Laude. Awarded Doctorate Honourable Mention in 2022.
Awarded general enrolment funding and academic stipend by the Spanish Ministry of Education.