Contemporary empirical applications frequently require flexible regression models for complex response types and for large tabular or non-tabular data, such as images or text. Classical regression models either break down under the computational load of processing such data or require additional manual feature extraction to make these problems tractable. Here, we present deeptrafo, a package for fitting flexible regression models for conditional distributions using a tensorflow backend with numerous additional processors, such as neural networks, penalties, and smoothing splines. The package implements deep conditional transformation models (DCTMs) for binary, ordinal, count, survival, continuous, and time series responses, potentially with uninformative censoring. Unlike other available methods, DCTMs do not assume a parametric family of distributions for the response. Further, the data analyst may trade off interpretability and flexibility by supplying custom neural network architectures and smoothers for each term in an intuitive formula interface. We demonstrate how to set up, fit, and work with DCTMs for several response types. We further showcase how to construct ensembles of these models, evaluate models using inbuilt cross-validation, and use other convenience functions for DCTMs in several applications. Lastly, we discuss DCTMs in light of other approaches to regression with non-tabular data.
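The statement that DCTMs do not assume a parametric family for the response can be made concrete with the generic form of a conditional transformation model. The display below is a sketch in notation introduced here for illustration and not taken from the abstract: the symbols $F_Z$, $h$, $a$, $\vartheta$, and $\beta$ are assumptions, and the basis-expansion form in the second line is one common parameterization, not necessarily the one deeptrafo uses for every response type.

```latex
\begin{align*}
  F_{Y \mid X = x}(y) &= F_Z\bigl(h(y \mid x)\bigr), \\
  h(y \mid x)         &= a(y)^\top \vartheta(x) + \beta(x).
\end{align*}
```

Here $F_Z$ is a reference distribution function chosen in advance (for example, the standard logistic), and $h$ is a transformation function that is increasing in $y$ for every $x$. All flexibility sits in $h$, so the conditional distribution of the response is not tied to a named parametric family. In this sketch, $a(y)$ is a basis in the response, $\vartheta(x)$ are covariate-dependent coefficients constrained so that $h$ stays monotone, and $\beta(x)$ is a shift term that could be a linear predictor, a smoother, or a neural network.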