Web archives are a historically valuable source of information. In some respects, web archives are the only record of the evolution of human society over the last two decades. They preserve a mix of personal and collective memories, the importance of which tends to grow as they age. However, the value of web archives depends on their users being able to search and access the information they require efficiently and effectively. Without the possibility of exploring and exploiting the archived contents, web archives are useless. Web archive access functionalities range from basic browsing to advanced search and analytical services, accessed through user-friendly interfaces. Full-text and URL search have become the predominant and preferred forms of information discovery in web archives, fulfilling user needs and supporting search APIs that feed complex applications. Both full-text and URL search are based on the technology developed for modern web search engines, since the Web is the main resource targeted by both systems. However, while web search engines enable searching over the most recent web snapshot, web archives enable searching over multiple snapshots from the past. Web archives therefore have to deal with a temporal dimension, which gives rise to new challenges and opportunities that are discussed throughout this chapter.