Video length is 6:41

Molecular Clustering with GPU Acceleration

In this episode of “MATLAB for Chemistry,” you will learn how GPU acceleration speeds up cheminformatics workflows, specifically molecular clustering based on structural similarity. Traditional pairwise similarity calculations, such as Tanimoto coefficients computed from molecular fingerprints, scale as O(N²) in the number of molecules N, making them impractical for large data sets. GPUs, with their massively parallel architecture, address this by executing thousands of similarity computations simultaneously.
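
For reference, the Tanimoto coefficient for binary fingerprints can be written as follows. The symbols are introduced here for illustration and are not taken from the video: a and b are the numbers of bits set in fingerprints A and B, and c is the number of bits set in both.

```latex
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{c}{a + b - c},
\qquad
\text{number of distinct pairs} = \frac{N(N-1)}{2}
```

For N = 10^6 molecules this is roughly 5 × 10^11 pairs, which is where the O(N²) cost comes from.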

What’s covered in this video:

  1. GPU‑based fingerprint handling: Transfer molecular fingerprints to a GPU using gpuArray in MATLAB®.

  2. Parallel similarity computation: Compute all pairwise Tanimoto similarities on the GPU in parallel, replacing slow serial loops.

  3. Result retrieval: Gather the computed similarity matrix back to CPU memory for downstream clustering.
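
The video performs these steps in MATLAB with gpuArray and gather. As a rough CPU-only reference for what is being computed, the sketch below builds the same pairwise similarity matrix with a serial loop in plain Python. It is not the video's code: the function names and the 8-bit fingerprints are hypothetical, and real fingerprints (for example from RDKit) are typically much longer.

```python
from itertools import combinations


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit fingerprints stored as Python ints."""
    common = bin(a & b).count("1")  # bits set in both fingerprints
    total = bin(a | b).count("1")   # bits set in either fingerprint
    return common / total if total else 0.0


def similarity_matrix(fps):
    """Serial O(N^2) loop over all pairs; this is the part a GPU parallelizes."""
    n = len(fps)
    # Diagonal is set to 1.0 (assumes no all-zero fingerprints).
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = tanimoto(fps[i], fps[j])
    return sim


# Hypothetical 8-bit fingerprints, for illustration only.
fingerprints = [0b10110010, 0b10100010, 0b01001101]
matrix = similarity_matrix(fingerprints)
print(matrix[0][1])  # 3 shared bits out of 4 set in either: 0.75
```

Each entry of the matrix depends only on its own pair of fingerprints, so the entries can be computed independently of one another. That independence is what makes the problem a good fit for GPU execution.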

The demonstration calls RDKit from MATLAB to generate the fingerprints and shows how modest code changes can offload the heavy computations to the GPU. The result is a marked speedup: Tasks that would take hours on a CPU can run in minutes or less.

Why this matters in cheminformatics:

  • Scalability: Enables clustering of thousands to millions of molecules—essential in drug discovery and large virtual screenings

  • High-throughput potential: GPU acceleration supports rapid, iterative exploration of chemical space, critical for hit identification

  • Seamless MATLAB integration: No need for complex CUDA® code; MATLAB handles GPU support via built‑in functions such as gpuArray and gather

Previous videos in this series covered the fundamentals of similarity calculation. This demonstration builds on that foundation by turning computationally intensive clustering tasks into a tractable, GPU‑driven workflow. The approach combines MATLAB, Python® (RDKit), and GPU computing for modern cheminformatics, removing a computational bottleneck while keeping the code clear and maintainable.

  1. Watch this video to see how MATLAB-RDKit enables molecular similarity analysis.
  2. Explore GPU computing in MATLAB.
  3. Learn to speed up computations with gpuArray.
  4. Learn to transfer data from GPU to CPU with gather.
  5. Download the Tox21 dataset for the exercise at the end of the live script.

Published: 22 Jul 2025