Molecular Similarity Analysis
Dive into molecular similarity analysis using this practical cheminformatics walkthrough with MATLAB® and RDKit. See how to import and prepare a molecular data set, compute molecular fingerprints, calculate pairwise similarity scores, and visualize the chemical similarity landscape.
Molecular similarity is a core concept in cheminformatics and drug discovery, enabling you to identify structurally related compounds and assess chemical diversity. Here, you will see that molecules selected for similar physicochemical properties, specifically high LogP (the logarithm of the octanol-water partition coefficient, a measure of lipophilicity) and high LogS (the logarithm of aqueous solubility), can still be structurally diverse. The similarity histogram reveals that most pairs of molecules in this data set have low structural similarity, despite sharing similar property values.
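To make the pairwise-similarity and histogram steps concrete, here is a minimal sketch in plain Python. It assumes the Tanimoto coefficient as the similarity measure, a common choice for fingerprints that the description above does not name explicitly. The molecule names and on-bit indices are hypothetical toy values standing in for fingerprints that RDKit would compute, and the helper names `tanimoto`, `pairwise_similarities`, and `histogram` are illustrative only.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pairwise_similarities(fingerprints):
    """Return {(name_i, name_j): similarity} for every unordered pair of molecules."""
    names = sorted(fingerprints)
    return {
        (m, n): tanimoto(fingerprints[m], fingerprints[n])
        for m, n in combinations(names, 2)
    }


def histogram(values, n_bins=5):
    """Count values in [0, 1] into n_bins equal-width bins; 1.0 goes in the last bin."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical on-bit indices, NOT real fingerprints of real molecules.
toy_fingerprints = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8, 12},
    "mol_c": {2, 3, 5, 11},
    "mol_d": {1, 4, 7, 9, 12},
}

scores = pairwise_similarities(toy_fingerprints)
bin_counts = histogram(scores.values())

for pair, score in scores.items():
    print(pair, round(score, 3))
print("histogram counts (low to high similarity):", bin_counts)
```

A histogram whose counts pile up in the lowest bins is what "most pairs have low structural similarity" looks like numerically. With real data, the sets of on-bit indices would come from a fingerprinting step rather than being typed in by hand.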
This analysis highlights key scientific insights:
- Global properties such as LogP and LogS can be achieved by many different chemical scaffolds and functional groups.
- Property-based filtering alone does not guarantee structural similarity.
- Molecular diversity can persist even within property-constrained data sets.
Combining both property-based and structure-based approaches is essential for comprehensive chemical space exploration, compound library design, and effective lead selection.
The walkthrough guides you through each step, from data import to visualization, using MATLAB and RDKit. Exercises at the end encourage you to investigate how property and structure relate in real chemical data sets.
Published: 26 Jun 2025