$\texttt{torch-choice}$ is an open-source library for flexible, fast choice modeling with Python and PyTorch. It provides a $\texttt{ChoiceDataset}$ data structure for managing databases flexibly and memory-efficiently. The paper demonstrates how to construct a $\texttt{ChoiceDataset}$ from databases of various formats and describes the functionalities of $\texttt{ChoiceDataset}$. The package implements two widely used models, the multinomial logit and nested logit models, and supports regularization during model estimation. It can optionally use GPUs for estimation, which allows it to scale to massive datasets while remaining computationally efficient. Models can be initialized using either R-style formula strings or Python dictionaries. We compare the computational efficiency of $\texttt{torch-choice}$ with that of $\texttt{mlogit}$ in R as (1) the number of observations increases, (2) the number of covariates increases, and (3) the item set expands. Finally, we demonstrate the scalability of $\texttt{torch-choice}$ on large-scale datasets.
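To make the phrase "multinomial logit with regularization" concrete, the following is a minimal sketch in plain Python (standard library only). It does not use or reproduce the $\texttt{torch-choice}$ API: every name in it ($\texttt{choice\_probabilities}$, $\texttt{penalized\_nll}$, $\texttt{fit}$, the toy $\texttt{sessions}$ data) is hypothetical and chosen for illustration, and plain gradient descent with an L2 penalty is assumed as the estimation procedure.

```python
import math


def choice_probabilities(utilities):
    """Softmax over the item utilities of one choice session (numerically stable)."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def utilities_for(beta, item_features):
    """Linear utility beta . x for every item in a session."""
    return [sum(b * x for b, x in zip(beta, feats)) for feats in item_features]


def penalized_nll(beta, sessions, l2=0.0):
    """Multinomial logit negative log-likelihood plus an L2 penalty on beta.

    sessions is a list of (item_features, chosen_index) pairs, where
    item_features[i] is the feature vector of item i in that session.
    """
    nll = 0.0
    for item_features, chosen in sessions:
        probs = choice_probabilities(utilities_for(beta, item_features))
        nll -= math.log(probs[chosen])
    return nll + l2 * sum(b * b for b in beta)


def gradient(beta, sessions, l2=0.0):
    """Gradient of penalized_nll: expected features minus chosen features."""
    grad = [2.0 * l2 * b for b in beta]
    for item_features, chosen in sessions:
        probs = choice_probabilities(utilities_for(beta, item_features))
        for k in range(len(beta)):
            expected = sum(p * feats[k] for p, feats in zip(probs, item_features))
            grad[k] += expected - item_features[chosen][k]
    return grad


def fit(sessions, num_features, l2=0.0, lr=0.1, steps=500):
    """Estimate beta by fixed-step gradient descent, starting from zero."""
    beta = [0.0] * num_features
    for _ in range(steps):
        grad = gradient(beta, sessions, l2)
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


# Toy data (made up for illustration): four sessions, three items each,
# one feature per item, and the index of the chosen item.
sessions = [
    ([[1.0], [0.0], [0.5]], 0),
    ([[0.2], [0.9], [0.4]], 1),
    ([[0.3], [0.8], [0.1]], 0),
    ([[0.6], [0.1], [0.7]], 2),
]

beta_unpenalized = fit(sessions, num_features=1, l2=0.0)
beta_penalized = fit(sessions, num_features=1, l2=1.0)
print("unpenalized:", beta_unpenalized, "penalized:", beta_penalized)
```

In this sketch the penalty simply shrinks the estimated coefficient toward zero. A tensor-based implementation would typically replace the Python loops with batched operations, which is what makes GPU estimation attractive as the number of observations, covariates, or items grows.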