The distribution of chemical compounds in high-dimensional molecular descriptor space can be viewed in two dimensions by applying the projection method of this invention. The method is particularly useful for viewing the relationships among a large number of compounds, such as those found in a large-scale high-throughput screening (HTS) library or virtual combinatorial library. After a representative subset of the larger data set of compounds is selected, initial components from the high-dimensional descriptor space are determined by principal component analysis (PCA). To relax a nonlinear mapping (NLM) projection using the PCA components as a starting point, the stress function is modified to reflect a local horizon beyond which the separation of the compounds is not meaningfully measurable. The resulting two-dimensional projections provide clear insight into the distribution of the chemical compounds in the higher-dimensional space. The method generalizes readily to viewing descriptor space in three dimensions and to using high-dimensional descriptors other than those used to describe molecular structure.
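The sequence described above (PCA start, then relaxation of a nonlinear map under a horizon-limited stress) can be sketched roughly as follows. This is a minimal illustration and not the stress function of the invention, whose exact form is not given here. It assumes a plain squared-error stress in which pairs closer than the horizon in descriptor space keep their original distance as the target, while pairs farther apart are only pushed out until they reach the horizon in the projection. The function names, the `horizon` parameter, the step size `lr`, and the use of simple gradient descent are all hypothetical choices made for the example.

```python
import numpy as np


def pca_start(X, n_components=2):
    """Project the rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def pairwise_distances(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def residuals(D, d, horizon):
    """Per-pair error under the assumed horizon rule.

    D: distances in descriptor space, d: distances in the projection.
    Pairs with D <= horizon should reproduce D. Pairs with D > horizon
    are penalized only while they sit closer than the horizon.
    """
    target = np.minimum(D, horizon)
    active = (D <= horizon) | (d < horizon)
    r = np.where(active, d - target, 0.0)
    np.fill_diagonal(r, 0.0)
    return r


def horizon_stress(D, Y, horizon):
    """Sum of squared residuals, each pair counted once."""
    r = residuals(D, pairwise_distances(Y), horizon)
    return 0.5 * (r ** 2).sum()


def relax(X, horizon, n_iter=200, lr=0.5):
    """Gradient-descent relaxation of the 2-D map, starting from PCA."""
    D = pairwise_distances(X)
    Y = pca_start(X)
    n = len(Y)
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=-1))
        r = residuals(D, d, horizon)
        safe_d = np.where(d > 0, d, 1.0)  # avoid 0/0 on the diagonal
        # Half the gradient of the stress with respect to each point.
        grad = ((r / safe_d)[:, :, None] * diff).sum(axis=1)
        Y = Y - (lr / n) * grad
    return Y
```

Calling `relax(X, horizon=h)` on a descriptor matrix `X` (one row per compound in the representative subset) returns the relaxed two-dimensional coordinates. Under these assumptions, the horizon keeps distant pairs from dominating the stress, so the relaxation concentrates on preserving local neighborhoods.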