Tens of thousands of engineers use Sourcegraph daily to search for code and rely on it to make progress on software development tasks. We face a key challenge in designing a query language that accommodates the needs of a broad spectrum of users. Our experience shows that users express different and often contradictory preferences for how queries should be interpreted. These preferences stem from differences in usage context, technical experience, and implicit expectations carried over from prior tools. At the same time, designing a code search query language poses unique challenges because it sits at the intersection of traditional search engines and full-fledged programming languages. For example, code search queries adopt certain syntactic conventions in the interest of simplicity and terseness, but in doing so they risk encoding implicit semantics that are ambiguous at face value (a single space in a query could mean three or more semantically different things depending on the surrounding terms). Users often need to disambiguate their intent with additional syntax so that a query expresses what they actually want to search for. This need to disambiguate is one of the primary frustrations we have seen users experience when writing search queries over the last three years. We share the observations that led us to a fresh perspective in which code search behavior can straddle seemingly ambiguous queries. We develop Automated Query Evaluation (AQE), a new technique that automatically generates and adaptively runs alternative query interpretations in frustration-prone conditions. We evaluate AQE with an A/B test across more than 10,000 unique users on our publicly available code search instance. Our main result shows that, relative to the control group, users are on average 22% more likely to click on a search result at all on any given day when AQE is active.
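To make the ambiguity of a single space concrete, the following is a minimal sketch of the general idea of generating alternative interpretations and running them adaptively. It is not Sourcegraph's implementation: the three readings of a space (literal phrase, all terms, any term), the function names `interpretations` and `search`, and the `min_results` threshold are all assumptions chosen for illustration, since the abstract does not enumerate which interpretations AQE generates or when it triggers them.

```python
from typing import Callable

Matcher = Callable[[str], bool]


def interpretations(query: str) -> list[tuple[str, Matcher]]:
    """Return hypothetical readings of a query, most literal first.

    Assumption: a space may mean a literal space, a conjunction of terms,
    or a disjunction of terms. The real set of readings is not given in
    the source.
    """
    terms = query.split()
    return [
        ("literal", lambda line: query in line),
        ("and", lambda line: all(t in line for t in terms)),
        ("or", lambda line: any(t in line for t in terms)),
    ]


def search(query: str, lines: list[str], min_results: int = 1) -> tuple[str, list[str]]:
    """Try each reading in order and stop at the first that yields enough hits.

    Falling through to the next reading only when the current one returns
    too few results stands in for the "frustration-prone conditions"
    mentioned in the abstract.
    """
    for name, matches in interpretations(query):
        hits = [line for line in lines if matches(line)]
        if len(hits) >= min_results:
            return name, hits
    return "none", []


corpus = [
    "func parse(query string)",
    "parse the query",
    "query planner",
    "unrelated",
]
```

With this corpus, `search("parse query", corpus)` finds no line containing the literal phrase and falls back to the conjunction reading, while `search("query planner", corpus)` is satisfied by the literal reading and never runs the alternatives.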