ChatGPT, a question-and-answer dialogue system based on a large language model, has gained huge popularity since its introduction. Its positive aspects have been reported through many media platforms, and some analyses even showed that ChatGPT achieved a decent grade in professional exams in the law, medical, and finance domains, adding extra support to the claim that AI can now assist, and even replace, humans in industrial fields. Others, however, doubt its reliability and trustworthiness. In this paper, we investigate ChatGPT's trustworthiness with respect to logically consistent behaviour. Our findings suggest that, although ChatGPT seems to have achieved improved language understanding ability, it still frequently fails to generate logically consistent predictions. Hence, while it is true that ChatGPT is an impressive and promising new technique, we conclude that its usage in real-world applications without thorough human inspection requires further consideration, especially for risk-sensitive areas.