Advanced data clustering.
Scalable clustering methods for chemical libraries, molecular dynamics trajectories and high-dimensional biological data. This is the backbone of our virtual screening post-processing — turning millions of poses into a tractable, diverse and prioritised set of candidates.
What we work on
Concrete clustering problems we solve at production scale.
- Chemical library clustering by 2D fingerprints, 3D shape and pharmacophore similarity for diversity selection.
- Docking pose clustering to reduce redundancy in million-pose screens and surface representative binders.
- Consensus aggregation across docking, shape and pharmacophore scoring methods.
- MD trajectory clustering by RMSD, contact maps and energy landscapes to extract representative conformations.
- High-dimensional biological data — omics, image features and time-series — clustered for downstream interpretation.
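As an illustration of the fingerprint-based clustering listed above, here is a minimal sketch of leader (sphere-exclusion style) clustering with Tanimoto similarity. It assumes fingerprints are already available as sets of on-bit indices. The function names, the toy data and the 0.5 threshold are hypothetical choices for this example and do not describe our production pipeline.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: Dict[str, Set[int]], threshold: float = 0.7) -> Dict[str, List[str]]:
    """Assign each molecule to the first leader it resembles, else make it a new leader."""
    clusters: Dict[str, List[str]] = {}
    for name, fp in fps.items():
        for leader in clusters:
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[leader].append(name)
                break
        else:
            # No existing leader is similar enough: this molecule starts a cluster.
            clusters[name] = [name]
    return clusters


# Toy fingerprints (hypothetical data): each set holds the indices of the on-bits.
toy_fps = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {10, 11, 12},
    "mol_d": {10, 11, 12, 13},
}
clusters = leader_cluster(toy_fps, threshold=0.5)
print(clusters)
```

A single pass like this depends on input order and on the chosen threshold, but it never builds the full pairwise similarity matrix, which is the main practical constraint at library scale.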
Tools we use
- MetaScreener consensus module: aggregates results from multiple VS methods into ranked, deduplicated hit lists.
- ASGARD: clustering and analysis of GROMACS MD trajectories.
- Internal clustering scripts: Python and R pipelines for chemical and biological data, run on HPC.
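To make the idea of a ranked, deduplicated consensus list concrete, the sketch below averages per-method ranks. This is a generic mean-rank scheme chosen for illustration and is not a description of MetaScreener's actual algorithm. The function name, the penalty rule for missing compounds and the toy rankings are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def consensus_rank(rankings: Dict[str, List[str]]) -> List[str]:
    """Order compounds by mean rank across methods (lower is better).

    Repeated IDs within one list keep their best (first) rank. A compound
    missing from a method's list gets a penalty rank of that list's number
    of unique compounds plus one.
    """
    all_ids = {cid for ranked in rankings.values() for cid in ranked}
    totals: Dict[str, float] = defaultdict(float)
    for ranked in rankings.values():
        best: Dict[str, int] = {}
        for position, cid in enumerate(ranked, start=1):
            best.setdefault(cid, position)  # deduplicate: first hit wins
        penalty = len(best) + 1
        for cid in all_ids:
            totals[cid] += best.get(cid, penalty)
    n_methods = len(rankings)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(all_ids, key=lambda cid: (totals[cid] / n_methods, cid))


# Hypothetical ranked hit lists from three scoring methods.
toy_rankings = {
    "docking": ["cpd1", "cpd2", "cpd3", "cpd2"],
    "shape": ["cpd2", "cpd1", "cpd4"],
    "pharmacophore": ["cpd2", "cpd3", "cpd1"],
}
hits = consensus_rank(toy_rankings)
print(hits)
```

Working on ranks rather than raw scores sidesteps the fact that docking, shape and pharmacophore scores live on different scales.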
Applications & target areas
Where scalable clustering changes what is actually possible in a project.
Virtual screening triage
Reducing 10⁶–10⁷ docked poses to a diverse, manageable shortlist for experimental testing.
MD post-processing
Extracting representative conformations from long simulations for further docking and free-energy work.
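One simple way to pick a representative conformation is to take the medoid: the frame with the smallest total RMSD to all others. The sketch below assumes the frames are already superposed and share the same atom order, and it uses toy coordinates. It is an illustration only and does not reproduce what ASGARD does internally.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD between two frames with identical atom order, assumed already superposed."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(a, b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(a))


def medoid_index(frames: List[Coords]) -> int:
    """Index of the frame with the smallest summed RMSD to all frames."""
    totals = [sum(rmsd(f, g) for g in frames) for f in frames]
    return min(range(len(frames)), key=totals.__getitem__)


# Hypothetical two-atom trajectory with three frames.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
representative = medoid_index(toy_frames)
print(representative)
```

The all-pairs loop is quadratic in the number of frames, so for long simulations it would normally be applied within each cluster or to a strided subset of the trajectory.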
Diversity selection
Choosing chemically diverse subsets for screening campaigns and library design.
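A common heuristic for this kind of selection is greedy MaxMin picking: start from a seed and repeatedly add the molecule that is farthest from everything already chosen. The sketch below assumes fingerprints stored as sets of on-bit indices and Tanimoto distance. The function names, seed choice and toy library are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - Tanimoto similarity; two empty fingerprints are treated as identical."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def maxmin_pick(fps: Dict[str, Set[int]], n_pick: int, seed: str) -> List[str]:
    """Greedy MaxMin: repeatedly add the molecule farthest from all picked ones."""
    picked = [seed]
    # Distance from each remaining molecule to its nearest picked molecule.
    nearest = {
        name: tanimoto_distance(fp, fps[seed])
        for name, fp in fps.items()
        if name != seed
    }
    while nearest and len(picked) < n_pick:
        chosen = max(nearest, key=nearest.get)
        picked.append(chosen)
        del nearest[chosen]
        for name in nearest:
            nearest[name] = min(nearest[name], tanimoto_distance(fps[name], fps[chosen]))
    return picked


# Hypothetical four-molecule library.
toy_library = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {10, 11, 12},
    "mol_d": {1, 2, 10, 11},
}
subset = maxmin_pick(toy_library, n_pick=3, seed="mol_a")
print(subset)
```

Here the near-duplicate `mol_b` is left out in favour of the two molecules that differ most from the seed, which is the behaviour wanted from a diversity subset.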
Biological data analysis
Patient stratification, image segmentation and time-series clustering for partner projects.
Selected papers
Reference publications and software underpinning this line of research.
| Topic | Reference |
|---|---|
| ASGARD — MD trajectory analysis and clustering | doi:10.1080/07391102.2024.2349527 |
| MetaScreener — consensus and post-processing of VS campaigns | github.com/bio-hpc/metascreener |
Have millions of poses, trajectories or descriptors to make sense of?
Prof. Horacio Pérez-Sánchez · hperez@ucam.edu