Although learning from data is effective and has achieved significant milestones, it faces many challenges and limitations. Learning from data starts from observations and then proceeds to broader generalizations. This framework is controversial in science, yet it has achieved remarkable engineering successes. This paper reflects on some epistemological issues and on the limitations of the knowledge discovered in data. It examines, from theoretical and practical perspectives, the common perception that getting more data is the key to achieving better machine learning models. The paper sheds some light on the shortcomings of using generic mathematical theories to describe the learning process, and it highlights the need for theories specialized in learning from data. While more data generally improves the performance of machine learning models, the relation in practice is shown to be logarithmic at best; beyond a certain point, more data leaves model performance unchanged or degrades it. Recent work in reinforcement learning shows a trend away from data-oriented approaches and toward greater reliance on algorithms. The paper concludes that learning from data is hindered by many limitations, and hence that an approach with an intensional orientation is needed.
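To make the "logarithmic at best" claim concrete, the following minimal sketch assumes a learning curve of the form score = a + b ln(n), where n is the number of training examples. The function names and the coefficients a and b are hypothetical values chosen only for illustration; they are not taken from the paper. Under this assumption, every doubling of the dataset buys the same fixed gain, b ln 2, no matter how much data has already been collected. The sketch does not model the plateau or the degradation described above.

```python
import math


def log_curve(n, a=0.50, b=0.04):
    """Hypothetical learning curve: score = a + b * ln(n).

    a and b are illustrative assumptions, not values from the paper.
    """
    return a + b * math.log(n)


def gain_from_doubling(n, a=0.50, b=0.04):
    """Score improvement obtained by growing the dataset from n to 2n."""
    return log_curve(2 * n, a, b) - log_curve(n, a, b)


if __name__ == "__main__":
    # The gain per doubling stays constant while the number of extra
    # examples needed for that doubling grows with n.
    for n in (1_000, 10_000, 100_000):
        print(f"n={n:>7}: extra examples={n:>7}, gain={gain_from_doubling(n):.4f}")
```

In this toy setting, going from 1,000 to 2,000 examples yields the same improvement as going from 100,000 to 200,000, although the second step requires a hundred times more new data.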