Normalized web distance (NWD) is a similarity measure, or normalized semantic distance, based on the World Wide Web or another large electronic database (for instance Wikipedia) together with a search engine that returns reliable aggregate page counts. For a set of search terms, the NWD gives a common similarity (common semantics) on a scale from 0 (identical) to 1 (completely different). The NWD approximates the similarity of the members of a set according to all (upper semi)computable properties. We develop the theory and give applications to classification using Amazon, Wikipedia, and the NCBI website of the National Institutes of Health. The last of these yields new correlations between health hazards. Restricting the NWD to a set of two terms yields the earlier normalized Google distance (NGD), but no combination of the NGDs of the pairs in a set can extract the information that the NWD extracts from the set as a whole. The NWD enables a new contextual (different databases) learning approach based on Kolmogorov complexity theory that incorporates knowledge from these databases.
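To make the relation between page counts and the distance concrete, the following is a minimal Python sketch, not the authors' implementation. The two-term case uses the standard NGD formula, (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), where f is a page count and N is the total number of indexed pages. The extension to larger sets, including the (|X| - 1) factor in the denominator, is an assumed form that is not stated in the text above; it is chosen so that it reduces to the NGD for a set of two. The names `nwd`, `ngd`, and `EXAMPLE_COUNTS`, and all counts, are hypothetical.

```python
import math
from typing import FrozenSet, Iterable, Mapping

Counts = Mapping[FrozenSet[str], int]


def nwd(terms: Iterable[str], counts: Counts, total_pages: int) -> float:
    """Distance of a set of search terms computed from aggregate page counts.

    counts maps a frozenset of terms to the number of pages containing all
    of them. The (len(terms) - 1) factor is an assumption of this sketch.
    """
    term_set = frozenset(terms)
    if len(term_set) < 2:
        raise ValueError("need at least two distinct terms")

    joint = counts[term_set]
    if joint == 0:
        # The terms never co-occur, so the distance is unbounded.
        return math.inf

    single_logs = [math.log(counts[frozenset([t])]) for t in term_set]
    numerator = max(single_logs) - math.log(joint)
    denominator = (math.log(total_pages) - min(single_logs)) * (len(term_set) - 1)
    return numerator / denominator


def ngd(x: str, y: str, counts: Counts, total_pages: int) -> float:
    """Two-term special case: the normalized Google distance."""
    return nwd([x, y], counts, total_pages)


# Hypothetical page counts, invented purely for illustration.
EXAMPLE_COUNTS: Counts = {
    frozenset(["horse"]): 1000,
    frozenset(["rider"]): 500,
    frozenset(["horse", "rider"]): 250,
}

if __name__ == "__main__":
    print(ngd("horse", "rider", EXAMPLE_COUNTS, total_pages=1_000_000))
```

The base of the logarithm cancels in the ratio, so natural logarithms are used for convenience. In practice the counts would come from queries to the chosen database's search engine rather than from a hard-coded table.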