Target encoding is an effective technique for encoding categorical variables and is often used in machine learning systems that process tabular data sets with mixed numeric and categorical variables. Recently, an enhanced version of this encoding technique was proposed that uses conjugate Bayesian modeling. This paper presents a further development of the Bayesian encoding method by using sampling techniques, which helps in extracting information from the intra-category distribution of the target variable, improves generalization, and reduces target leakage.
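The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, assuming a binary target and a Beta-Binomial conjugate model. The function names `fit_beta_posteriors` and `sample_encode` and the `prior_strength` parameter are hypothetical and are not taken from the paper. Each category gets a Beta posterior over its target rate, and each row is encoded with a draw from that posterior rather than with a single point estimate.

```python
import numpy as np


def fit_beta_posteriors(categories, targets, prior_strength=1.0):
    """Fit a Beta posterior per category for a binary (0/1) target.

    Assumption: the prior is Beta(alpha0, beta0) centred on the global
    target mean, with `prior_strength` acting as a pseudo-count. Both
    classes must be present so that alpha0 and beta0 are positive.
    """
    categories = np.asarray(categories)
    targets = np.asarray(targets, dtype=float)
    prior_mean = targets.mean()
    alpha0 = prior_mean * prior_strength
    beta0 = (1.0 - prior_mean) * prior_strength

    posteriors = {}
    for cat in np.unique(categories).tolist():
        y = targets[categories == cat]
        successes = y.sum()
        failures = len(y) - successes
        # Conjugate update: add observed counts to the prior pseudo-counts.
        posteriors[cat] = (alpha0 + successes, beta0 + failures)
    return posteriors, (alpha0, beta0)


def sample_encode(categories, posteriors, prior, rng):
    """Encode each row with one draw from its category's Beta posterior.

    Categories not seen during fitting fall back to the prior.
    """
    params = np.array([posteriors.get(c, prior) for c in categories], dtype=float)
    return rng.beta(params[:, 0], params[:, 1])


# Toy data, made up purely for illustration.
cats = ["a", "a", "a", "b", "b", "c"]
y = [1, 1, 0, 0, 0, 1]
posteriors, prior = fit_beta_posteriors(cats, y, prior_strength=2.0)
encoded = sample_encode(cats, posteriors, prior, np.random.default_rng(0))
```

Because rows of the same category receive different draws, the encoded feature reflects the posterior uncertainty for that category. In this sketch, a category with few observations yields widely spread values, which is one plausible way that sampling could discourage a downstream model from over-relying on poorly supported categories.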