Finding relevant scientific articles is crucial for advancing knowledge. Recommendation systems are helpful for this purpose, although they have only recently been applied to science. This article describes EILEEN (Exploratory Innovator of LitEraturE Networks), a recommendation system for scientific publications and grants with open source code and datasets. We describe EILEEN's architecture for ingesting and processing documents, and for modeling the recommendation system and keyphrase estimator. Using a unique dataset of logged-in user behavior, we validate our recommendation system against Latent Semantic Analysis (LSA) and the standard ranking from Elasticsearch (Lucene scoring). We find that a learning-to-rank model based on Random Forest achieves an AUC of 0.9, significantly outperforming both baselines. Our results suggest that we can substantially improve science recommendations and learn about scientists' behavior through their search activity. We make our system available through eileen.io.
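To make the reported metric concrete, the following is a minimal sketch of how an AUC can be computed for a ranker from binary relevance labels (for example, whether a logged-in user clicked a recommended document). The abstract does not say how the AUC was computed, so this is only one standard formulation (the pairwise, Mann-Whitney view). The function name `auc` and the toy labels and scores are hypothetical and are not taken from EILEEN's code or data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen relevant item is scored
    above a randomly chosen non-relevant item; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = user engaged with the document, 0 = did not.
clicked = [1, 0, 1, 0, 0, 1]
# Made-up scores standing in for a learned ranker and a baseline ranker.
ranker_scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.3]
baseline_scores = [0.5, 0.6, 0.4, 0.3, 0.2, 0.7]

if __name__ == "__main__":
    print("ranker AUC:  ", auc(clicked, ranker_scores))
    print("baseline AUC:", auc(clicked, baseline_scores))
```

Under this reading, an AUC of 0.9 means that a relevant document is scored above a non-relevant one in about 90% of such pairs, while 0.5 corresponds to random ordering.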