MATLAB is a mathematical computing environment that many engineers, mathematicians, and students use to process and understand their data. Managing textual data is important to all of data science. MATLAB supports two containers for textual data: (1) cell arrays of character vectors and (2) string arrays. This research showcases the strengths of string arrays over cell arrays by quantifying their performance, memory contiguity, syntax readability, interface fluidity, and autocomplete capabilities. The results demonstrate that string arrays often run 2x to 40x faster than cell arrays on common string benchmarks, improve data locality by reducing metadata overhead, and offer a more expressive syntax through automatic data type conversions and vectorized methods.