Research Line

Advanced data clustering

Scalable clustering methods for chemical libraries, molecular dynamics (MD) trajectories and high-dimensional biological data. This is the backbone of our virtual screening post-processing: it turns millions of docking poses into a tractable, diverse and prioritised set of candidates.

What we work on

Concrete clustering problems we solve at production scale.

  • Chemical library clustering by 2D fingerprints, 3D shape and pharmacophore similarity for diversity selection.
  • Docking pose clustering to reduce redundancy in million-pose screens and surface representative binders.
  • Consensus aggregation across docking, shape and pharmacophore scoring methods.
  • MD trajectory clustering by RMSD, contact maps and energy landscapes to extract representative conformations.
  • High-dimensional biological data — omics, image features and time-series — clustered for downstream interpretation.
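Fingerprint-based library clustering can be illustrated with a minimal Taylor–Butina sketch in plain Python. This is not our production pipeline. It assumes fingerprints are given as sets of "on" bit indices, and `butina_cluster` and the 0.6 Tanimoto threshold are hypothetical, illustrative choices. A real pipeline would more likely use a cheminformatics toolkit's fingerprint and clustering routines.

```python
from typing import FrozenSet, List, Sequence

# Assumption: a fingerprint is the set of its "on" bit indices (toy representation).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_cluster(fps: Sequence[Fingerprint], threshold: float = 0.6) -> List[List[int]]:
    """Hypothetical Taylor-Butina helper; the first index of each cluster is its centroid."""
    n = len(fps)
    # Neighbour lists from all pairwise similarities (O(n^2), illustration only).
    neighbours: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= threshold:
                neighbours[i].append(j)
                neighbours[j].append(i)

    # Molecules with the most neighbours become centroids first.
    order = sorted(range(n), key=lambda k: len(neighbours[k]), reverse=True)
    assigned = [False] * n
    clusters: List[List[int]] = []
    for centroid in order:
        if assigned[centroid]:
            continue
        # The centroid plus every neighbour not already claimed by an earlier cluster.
        members = [centroid] + [j for j in neighbours[centroid] if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    library = [
        frozenset({1, 2, 3, 4}),
        frozenset({1, 2, 3, 5}),
        frozenset({1, 2, 3, 4, 5}),
        frozenset({10, 11, 12}),
        frozenset({10, 11, 13}),
    ]
    print(butina_cluster(library, threshold=0.6))  # [[0, 1, 2], [3], [4]]
```

The all-pairs neighbour search is quadratic in library size. For million-compound libraries, the sketch would need to be replaced by a sparse or approximate neighbour search.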

Tools we use

  • MetaScreener consensus module — aggregates results from multiple VS methods into ranked, deduplicated hit lists.
  • ASGARD — clustering and analysis of GROMACS MD trajectories.
  • Internal clustering scripts — Python and R pipelines for chemical and biological data, run on HPC.
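The consensus module is described here only at a high level. A hypothetical sketch can show what a "ranked, deduplicated" consensus list might look like, using simple mean-rank aggregation. This is an assumed scheme, not MetaScreener's documented algorithm. The `consensus_rank` name and the penalty rank for compounds a method did not return are illustrative choices.

```python
from typing import Dict, List, Sequence, Set, Tuple


def consensus_rank(rankings: Dict[str, Sequence[str]]) -> List[Tuple[str, float]]:
    """Hypothetical mean-rank consensus over several virtual screening methods.

    `rankings` maps a method name to compound IDs ordered best-first. Repeated IDs
    (e.g. several poses of one compound) are collapsed to their best position.
    A compound missing from a method gets the penalty rank len(unique_ids) + 1.
    """
    per_method: Dict[str, Dict[str, int]] = {}
    compounds: Set[str] = set()
    for method, ranked in rankings.items():
        unique_ids = list(dict.fromkeys(ranked))  # deduplicate, keeping best-first order
        per_method[method] = {cid: r for r, cid in enumerate(unique_ids, start=1)}
        compounds.update(unique_ids)

    scores: List[Tuple[str, float]] = []
    for cid in compounds:
        # Sum of per-method ranks, using the penalty rank where a method missed the compound.
        total = sum(ranks.get(cid, len(ranks) + 1) for ranks in per_method.values())
        scores.append((cid, total / len(per_method)))

    # Lower mean rank is better; ties are broken by compound ID for reproducibility.
    scores.sort(key=lambda item: (item[1], item[0]))
    return scores


if __name__ == "__main__":
    results = {
        "docking": ["A", "A", "B", "C"],  # "A" appears twice: two poses
        "shape": ["B", "A", "D"],
        "pharmacophore": ["A", "C"],
    }
    for cid, mean_rank in consensus_rank(results):
        print(cid, round(mean_rank, 2))
```

Here compound A ranks first with a mean rank of 4/3. Rank-based aggregation avoids normalising docking, shape and pharmacophore scores that live on different scales. Score-based or weighted schemes are equally plausible alternatives.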

Applications & target areas

Where scalable clustering changes what is actually possible in a project.

Virtual screening triage: reducing 10⁶–10⁷ docked poses to a diverse, manageable shortlist for experimental testing.
MD post-processing: extracting representative conformations from long simulations for further docking and free-energy work.
Diversity selection: choosing chemically diverse subsets for screening campaigns and library design.
Biological data analysis: patient stratification, image segmentation and time-series clustering for partner projects.
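For diversity selection, one common greedy approach is MaxMin picking. It repeatedly adds the molecule whose distance to its nearest already-selected molecule is largest. The sketch below is not necessarily the method used in our campaigns. It assumes Tanimoto distance on set-based fingerprints, and `maxmin_pick` and its `seed_index` parameter are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# Assumption: fingerprints as sets of on-bit indices, as a toy representation.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 - Tanimoto similarity; 0.0 for identical fingerprints."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return 1.0 - shared / (len(a) + len(b) - shared)


def maxmin_pick(fps: Sequence[Fingerprint], k: int, seed_index: int = 0) -> List[int]:
    """Hypothetical greedy MaxMin picker returning indices of k diverse molecules."""
    n = len(fps)
    if n == 0 or k <= 0:
        return []
    k = min(k, n)
    picked = [seed_index]
    picked_set = {seed_index}
    # Distance from each molecule to its nearest already-picked molecule.
    nearest = [tanimoto_distance(fp, fps[seed_index]) for fp in fps]
    while len(picked) < k:
        # Choose the unpicked molecule farthest from everything picked so far.
        candidate = max(
            (i for i in range(n) if i not in picked_set), key=lambda i: nearest[i]
        )
        picked.append(candidate)
        picked_set.add(candidate)
        # Update nearest-picked distances with the new pick.
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], tanimoto_distance(fp, fps[candidate]))
    return picked


if __name__ == "__main__":
    library = [
        frozenset({1, 2, 3, 4}),
        frozenset({1, 2, 3, 5}),
        frozenset({1, 2, 3, 4, 5}),
        frozenset({10, 11, 12}),
        frozenset({10, 11, 13}),
    ]
    print(maxmin_pick(library, 3))  # [0, 3, 4]
```

Each new pick needs one pass over the library, so selecting k molecules costs roughly n·k distance evaluations. This avoids building the full pairwise matrix that exact clustering requires.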

Interested in this line?

Contact Prof. Horacio Pérez-Sánchez · hperez@ucam.edu
