Empirical Decision Rules for Improving the Uncertainty Reporting of Small Sample System Usability Scale Scores

The System Usability Scale (SUS) is a short, survey-based instrument for assessing the usability of a system from the end user's perspective once a prototype is available. Individual scores are gathered with a 10-question survey. Results are reported as the sample mean, which serves as a point estimate of the system's usability (the SUS study score), and a confidence interval on the sample mean communicates the uncertainty associated with that estimate. When the number of individuals surveyed is large, SUS study scores and accompanying confidence intervals that rely on the central limit theorem are appropriate. When only a small number of users are surveyed, however, reliance on the central limit theorem falls short: the resulting confidence intervals suffer from parameter bound violations and from interval widths that confound mappings to adjective and other constructed scales. These shortcomings are especially pronounced when the underlying SUS score data are skewed, as they are in many instances. This paper introduces an empirically based remedy for such small-sample circumstances, proposing a set of decision rules that use either an extended bias-corrected and accelerated (BCa) bootstrap confidence interval or an empirical Bayesian credibility interval about the sample mean to restore and bolster confidence interval accuracy. Data from historical SUS assessments are used to highlight shortfalls in current practice and to demonstrate the improvements these alternative approaches offer while remaining statistically defensible. A freely available online application that automates SUS analysis under these decision rules is introduced and discussed, to help usability practitioners adopt the advocated approaches.
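
The bound-violation problem described above can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper: the conventional SUS scoring rule (odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5), a made-up sample of eight skewed scores, a hard-coded Student t critical value for 7 degrees of freedom, and a textbook BCa bootstrap rather than the extended BCa variant the paper proposes. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def sus_score(responses):
    """Conventional SUS score for one respondent (ten items rated 1-5)."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly ten item responses")
    total = 0
    for i, r in enumerate(responses):
        # Items 1, 3, 5, 7, 9 (even index) are positively worded: r - 1.
        # Items 2, 4, 6, 8, 10 (odd index) are negatively worded: 5 - r.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5


def t_interval(scores, t_crit):
    """Classical t-based interval on the mean; t_crit is supplied by the caller."""
    m = mean(scores)
    half = t_crit * stdev(scores) / len(scores) ** 0.5
    return m - half, m + half


def bca_interval(scores, alpha=0.05, n_boot=5000, seed=0):
    """Textbook BCa bootstrap interval on the mean (not the paper's extension)."""
    rng = random.Random(seed)
    n = len(scores)
    theta = sum(scores) / n
    boots = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_boot))
    nd = NormalDist()

    # Bias correction: share of bootstrap means below the observed mean,
    # clamped away from 0 and 1 so the inverse CDF stays finite.
    prop = sum(b < theta for b in boots) / n_boot
    prop = min(max(prop, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = nd.inv_cdf(prop)

    # Acceleration from the jackknife (leave-one-out) means.
    total = sum(scores)
    jack = [(total - x) / (n - 1) for x in scores]
    jbar = sum(jack) / n
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    def adjusted(z):
        # Adjusted percentile level for the standard normal quantile z.
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(q):
        # Map an adjusted level to an index into the sorted bootstrap means.
        return boots[min(n_boot - 1, max(0, int(q * n_boot)))]

    return pick(adjusted(nd.inv_cdf(alpha / 2))), pick(adjusted(nd.inv_cdf(1 - alpha / 2)))


# Hypothetical small, left-skewed sample of SUS scores (n = 8).
scores = [100, 97.5, 100, 95, 100, 62.5, 100, 97.5]

# 2.365 is the 0.975 quantile of Student's t with 7 degrees of freedom.
print("t interval:  ", t_interval(scores, 2.365))
print("BCa interval:", bca_interval(scores))
```

With this illustrative sample, the upper limit of the t-based interval exceeds 100, the maximum possible SUS score. The bootstrap interval cannot do so, because every resampled mean is an average of observed scores and therefore stays inside the scale.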
