Research Areas
My research focuses on a diverse set of themes that lie at the intersection of data management, scalable infrastructures, scientific discovery, and intelligent systems. These areas combine theoretical advances with practical applications, aiming to build tools, methods, and systems that empower science and technology in the era of big data and artificial intelligence.
Scientific Workflow Management
This area investigates how to model, execute, and optimize large-scale scientific workflows.
Research focuses on automation, reproducibility, and fault tolerance to enable scientists
to orchestrate complex experiments seamlessly, reducing human effort and improving scientific productivity.
Cloud Computing
Research in cloud computing addresses elasticity, scalability, and cost-effectiveness
in distributed environments. By leveraging virtualized resources and on-demand infrastructures,
the goal is to optimize performance and reduce barriers for executing data-intensive scientific experiments.
Provenance Data Management
Provenance, or data lineage, ensures that scientific results are transparent and reproducible.
My work develops models, storage systems, and analysis techniques that make it possible to
capture, query, and reason about the origin, evolution, and trustworthiness of data at scale.
Bioinformatics
Bioinformatics integrates computing and biology to address challenges in genomics, transcriptomics,
and molecular biology. Research focuses on designing scalable data pipelines, efficient algorithms,
and workflow solutions that support the analysis of massive and heterogeneous biological datasets.
Data Management for Machine Learning
Preparing data for machine learning is a complex and costly process.
This research area explores methods for feature engineering, data cleaning, integration, and sampling,
ensuring that ML models are trained with high-quality data while optimizing efficiency and scalability.
Machine Learning for Data Management
Instead of only preparing data for ML, this area investigates how ML techniques
can improve traditional data management tasks. Applications include intelligent indexing,
adaptive query optimization, and workload prediction, enabling smarter and more efficient data systems.
eScience
eScience is the use of advanced computational methods to accelerate scientific discovery.
This area involves designing infrastructures, data repositories, and collaborative platforms
that enable interdisciplinary research, promote open science, and help scientists turn vast
amounts of data into knowledge and innovation.