The rapid development of network science and technologies depends on shareable datasets. Currently, there is no standard practice for reporting and sharing network datasets. Some network dataset providers share only links, while others provide some context or basic statistics. As a result, critical information may be unintentionally dropped, and network dataset consumers may misunderstand or overlook critical aspects. Using a network dataset inappropriately can lead to severe consequences (e.g., discrimination), especially when machine learning models on networks are deployed in high-stakes domains. Challenges arise because networks are often used across different domains (e.g., network science, physics) and have complex structures. To facilitate communication between network dataset providers and consumers, we propose the network report. A network report is a structured description that summarizes and contextualizes a network dataset. It extends the idea of dataset reports (e.g., Datasheets for Datasets) from prior work with network-specific descriptions of the non-i.i.d. nature, demographic information, network characteristics, etc. We hope network reports encourage transparency and accountability in network research and development across different fields.