As the amount of audio-visual content grows, developing automatic captioning and subtitling solutions that meet the expectations of a growing international audience appears to be the only viable way to boost throughput and lower the associated post-production costs. Automatic captioning and subtitling often need to be tightly intertwined to achieve an appropriate level of consistency and synchronization with each other and with the video signal. In this work, we assess a dual decoding scheme that strongly couples these two tasks, and we show that it increases both adequacy and consistency with virtually no additional cost in model size or training complexity.