Database driven chemical structure identification is common practice in drug discovery. Classification of similar compounds is based on the premise that physicochemical properties are comparable. The mapping of specific compound properties to "fingerprints" has provided a robust method of searching large databases. Currently, database searching efficiency is constrained by the size of the database, the method used to determine similarity and the function defining match quality. To improve database searching, we have concentrated on the latter two constraints and propose a new approach of identifying similar compounds using heteronuclear single quantum coherence (HSQC) spectra.

Carbon HSQC spectra are collected routinely to confirm or elucidate molecular structure in synthetic and natural product chemistry. Experimental results are presented as 2D plots with axes defined by proton (¹H) and carbon (¹³C) chemical shifts. The high-intensity plot features, referred to as "peaks", delineate directly bonded hydrogen and carbon atoms of a compound. Generally, the 2D Cartesian coordinates of the peaks are reported without any reference to intensity or peak size. The intensity of the peaks could also be included in the analysis. However, care must be taken to ensure that all data was acquired using the same acquisition parameters. Since we validate our findings using published data, peak intensities are not included as part of the spectra matching in this work. The location of peaks provides valuable information about the chemical environment of hydrogen and carbon atoms, allowing molecular structure to be inferred from the number and location of peaks, which have specific distributions for each compound.

A number of metrics have been used to quantify the similarity between a compound of interest and a database of compounds, allowing the best database results to be selected as possible replacements for the candidate structure. For example, compound fragments and related properties have been mapped to molecular fingerprints defined using bit strings. The fingerprints capture specific information about molecular structure and specific properties of a molecule. In bit string-based fingerprinting, the Tanimoto (Tc) and Tversky coefficients have been used widely to quantify the level of similarity. Above an acceptable threshold, compounds are deemed similar and therefore have similar chemical or biological properties.

We previously outlined a method of matching HSQC spectra of small compounds motivated by evolutionary optimization. The use of self-adaptive differential evolution allowed matching of a candidate compound's HSQC peaks to individual entries of a database. However, as the number of peaks increased (i.e. beyond 20), the search space became very large, to the extent that the quality of match was not computable in a reasonable amount of time. Our new approach is aimed at increasing computational efficiency by considering three factors limiting the rate of convergence of any algorithm: the choice of metric, the method used to obtain an optimal solution, and the size of the search space. The outcome is a robust algorithm capable of matching spectra containing a large number of peaks rapidly on a standard desktop computer.

We improved the efficiency of our previously reported HSQC spectra matching algorithm by using a discrete genetic algorithm (DGA) implementation instead of differential evolution. We tested our new method on a compound database containing 51 HSQC spectra. The results were compared to bit string based molecular fingerprints incorporating a suitable threshold for the Tanimoto coefficient and to nearest neighbour search, also known as proximity search or closest point search, which is the simplest implementation of all peak matching methods. The database of 51 HSQC spectra from our previous work was used to test the efficacy of our newly developed algorithm. The actual structures of the 51 compounds are listed in Additional file 1.

Treatment of outliers in DGA

A problem with performing DGA-based unique matching of peaks between two spectra is that a single long match can greatly affect the outcome. An example of this problem is the peak-to-peak match of compounds 10 and 12 (Figure 1).
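To make the two baseline comparisons named in this section more concrete, the sketch below computes a Tanimoto coefficient for two bit-string fingerprints and runs a simple nearest neighbour match between two HSQC peak lists. It is an illustration only, not the authors' implementation: the function names, the fingerprints, the peak coordinates and the optional axis scaling factors (`h_scale`, `c_scale`) are all assumptions made for the example, and the nearest neighbour match shown here does not enforce the unique peak-to-peak assignment used by the DGA.

```python
from math import hypot


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit-string fingerprints stored as ints.

    Tc = (bits set in both) / (bits set in either).
    Returning 0.0 for two empty fingerprints is a convention chosen here.
    """
    common = bin(a & b).count("1")
    total = bin(a | b).count("1")
    return common / total if total else 0.0


def nearest_neighbour_match(query, reference, h_scale=1.0, c_scale=1.0):
    """Match each query peak to its closest reference peak.

    Peaks are (1H shift, 13C shift) tuples in ppm. The scale factors are a
    hypothetical way to balance the two axes; they are not from the source.
    Returns a list of (reference index, distance), one entry per query peak.
    Several query peaks may map to the same reference peak.
    """
    matches = []
    for qh, qc in query:
        dists = [
            hypot((qh - rh) / h_scale, (qc - rc) / c_scale)
            for rh, rc in reference
        ]
        best = min(range(len(dists)), key=dists.__getitem__)
        matches.append((best, dists[best]))
    return matches


# Hypothetical fingerprints: 3 bits in common, 5 bits set in total.
fp_a = 0b101101
fp_b = 0b100111
tc = tanimoto(fp_a, fp_b)

# Hypothetical peak lists (1H ppm, 13C ppm).
query = [(1.20, 20.0), (3.50, 55.0), (7.20, 128.0)]
reference = [(1.25, 21.0), (3.40, 56.0), (4.10, 70.0)]
matches = nearest_neighbour_match(query, reference)
total_distance = sum(d for _, d in matches)

print(f"Tanimoto coefficient: {tc:.2f}")
for i, (j, d) in enumerate(matches):
    print(f"query peak {i} -> reference peak {j}, distance {d:.2f}")
print(f"summed distance: {total_distance:.2f}")
```

In this toy data the third query peak has no close partner, so its match is far longer than the other two and accounts for almost all of the summed distance. This is the kind of single long match that can distort a distance-based score.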