The LeadID project aimed to develop computational methods for the analysis of high-throughput screening (HTS) data. So far, the project has produced an analysis tool (HTSview), which appears capable of capturing structure-activity relationship (SAR) information even from HTS data sets. HTSview has a graphical user interface (GUI) that allows fully interactive analysis and visualization of the data. The analysis can be used efficiently to derive models describing the SAR (biophores). Biophores can be used directly for virtual screening. The method does not require 3D alignments, but provides matching information derived from the multiple Feature Tree models. This information is intended to be used for generating 3D pharmacophores.
The newly developed methods have been tested on literature data and in-house data sets for which SAR information was available. The LeadID software could derive predictive SAR models from these data. Initial virtual screening tests using the biophore models indicated that scaffold hopping is possible. More importantly, the resulting biophore models can be chemically interpreted. Thus, the software appears suited to deriving meaningful biophore models from HTS data.
A test version of HTSview will be provided by BioSolveIT in August 2003.
- A computer program with a graphical user interface has been developed that allows the rapid identification of potential biophoric groups from HTS data. The results of the analysis can easily be used for virtual screening of large databases, mining untested compound collections to identify promising new lead structures.
- The methodology is based on the Feature Trees descriptor, which is a fragment-based, non-linear topological molecular descriptor. The molecule is described by a tree structure representing its major chemical building blocks and the way they are connected. Each building block is labeled by a fingerprint (biophore) representing physico-chemical properties of the fragment (e.g. H-bond acceptor, H-bond donor, hydrophobic ring center, hydrophobic).
- A novel concept for identifying SAR has also been introduced: multiple Feature Tree models. The algorithm performs a multiple 2D alignment of Feature Trees. Active molecules are combined into a multiple topological template containing the matched fragments. A two-dimensional mapping of the matches describes the resulting model and, indirectly, the topology of the molecules in the activity region. In first studies, this methodology has shown great potential for generating multiple Feature Tree models.
- Data mining tools such as clustering and statistical fragment analysis, together with the Feature Tree descriptor, enhance the analysis of HTS results significantly. Because the descriptor and the analysis take into account not only the information from the active compounds but also the information from the inactive ones, this new tool offers broader potential than standard descriptors.
- Furthermore, size limitations on the analyzed data set have been reduced and the speed of the calculations has been improved considerably (more than 100 comparisons per second). As a result, several hundred thousand compounds can be analyzed. Feature Trees can be run on multiple processors and can now handle up to 10,000 molecules on a workstation cluster when processing the full similarity matrix (50 million pair comparisons overnight).
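
The figure of 50 million pair comparisons follows from the size of a full similarity matrix: n molecules give n(n-1)/2 unordered pairs. The short Python sketch below reproduces that arithmetic. It is only an illustration and not part of HTSview or Feature Trees; the function names are hypothetical, and the 12-hour length of an "overnight" run is an assumption, since the summary does not state a duration.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered molecule pairs in a full n x n similarity matrix."""
    return n * (n - 1) // 2


def hours_needed(pairs: int, comparisons_per_second: float) -> float:
    """Wall-clock hours needed to process `pairs` at a given comparison rate."""
    return pairs / comparisons_per_second / 3600.0


def required_rate(pairs: int, hours: float) -> float:
    """Comparisons per second needed to finish `pairs` within `hours`."""
    return pairs / (hours * 3600.0)


# Sanity check of the closed form against brute-force enumeration on a tiny set.
assert pair_count(5) == sum(1 for _ in combinations(range(5), 2))

pairs = pair_count(10_000)  # 49,995,000, i.e. roughly 50 million

# At the quoted lower bound of 100 comparisons per second on one processor:
single_cpu_hours = hours_needed(pairs, 100.0)

# ASSUMPTION: "overnight" is taken as 12 hours; the source gives no duration.
overnight_rate = required_rate(pairs, 12.0)

print(f"pairs for 10,000 molecules: {pairs:,}")
print(f"hours at 100 comparisons/s: {single_cpu_hours:.1f}")
print(f"rate needed for a 12 h run: {overnight_rate:.0f} comparisons/s")
```

At the lower bound of 100 comparisons per second, a single processor would need several days for the full matrix, which is consistent with the summary's point that the overnight run relies on a workstation cluster.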
© 2003 Marc Zimmermann, Modified: