StyleDiff: Attribute Comparison Between Unlabeled Datasets in Latent Disentangled Space

One major challenge in machine learning applications is coping with mismatches between the datasets used in development and those encountered in real-world applications. These mismatches may lead to inaccurate predictions and errors, resulting in poor product quality and unreliable systems. In this study, we propose StyleDiff, which informs developers of the differences between two datasets to support the steady development of machine learning systems. Using disentangled image spaces obtained from recently proposed generative models, StyleDiff compares the two datasets by focusing on attributes in the images and provides an easy-to-understand analysis of the differences between them. StyleDiff runs in $O(dN\log N)$ time, where $N$ is the size of the datasets and $d$ is the number of attributes, which makes it applicable to large datasets. We demonstrate that StyleDiff accurately detects differences between datasets and presents them in an understandable format, using driving scene datasets as an example.
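To illustrate how an $O(dN\log N)$ cost can arise, the following is a minimal sketch, not the authors' implementation. The abstract does not state which per-attribute statistic StyleDiff uses; this sketch assumes a one-dimensional Wasserstein-1 distance computed by sorting each latent dimension, and assumes both datasets have already been encoded into the same $d$-dimensional latent space with the same number of samples $N$. The function names `per_attribute_distance` and `rank_attributes` are hypothetical.

```python
from typing import List, Sequence, Tuple


def per_attribute_distance(
    latents_a: Sequence[Sequence[float]],
    latents_b: Sequence[Sequence[float]],
) -> List[float]:
    """Return one distance per latent dimension (attribute).

    Assumption: the statistic is the 1-D Wasserstein-1 distance between
    equal-size samples; the source abstract does not specify the statistic.
    """
    if not latents_a or len(latents_a) != len(latents_b):
        raise ValueError("this sketch assumes two non-empty datasets of equal size N")
    n = len(latents_a)
    d = len(latents_a[0])
    distances = []
    for k in range(d):
        # Sorting one column costs O(N log N); d columns give O(d N log N).
        col_a = sorted(row[k] for row in latents_a)
        col_b = sorted(row[k] for row in latents_b)
        # For equal-size samples, W1 is the mean absolute gap between
        # matching order statistics.
        distances.append(sum(abs(x - y) for x, y in zip(col_a, col_b)) / n)
    return distances


def rank_attributes(
    latents_a: Sequence[Sequence[float]],
    latents_b: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (attribute index, distance) pairs, largest difference first."""
    distances = per_attribute_distance(latents_a, latents_b)
    return sorted(enumerate(distances), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy latent codes: attribute 0 matches across datasets, attribute 1 is shifted.
    dataset_a = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    dataset_b = [[2.0, 5.0], [0.0, 7.0], [1.0, 6.0]]
    for index, distance in rank_attributes(dataset_a, dataset_b):
        print(f"attribute {index}: distance {distance:.2f}")
```

Ranking attributes by distance is one way to surface the attributes that differ most between the two datasets. In practice the latent codes would come from a generative model's encoder, which this sketch leaves out.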

