We propose PiggyBack, a Visual Question Answering platform that lets users easily apply state-of-the-art visual-language pretrained models. PiggyBack supports the full stack of visual question answering tasks: data processing, model fine-tuning, and result visualisation. We integrate pretrained visual-language models provided through HuggingFace, an open-source API platform for deep learning technologies; however, these models cannot be run without programming skills or an understanding of deep learning. PiggyBack therefore provides an easy-to-use browser-based user interface to several deep learning visual-language pretrained models, aimed at general users and domain experts. PiggyBack offers the following benefits: free availability under the MIT License; portability, since it is web-based and thus runs on almost any platform; a comprehensive data creation and processing technique; and ease of use of deep learning-based visual-language pretrained models. A demo video is available on YouTube at https://youtu.be/iz44RZ1lF4s.