Most computer vision models are built on either a convolutional neural network (CNN) or a transformer, where the former (latter) captures local (global) features. To relieve the model performance limitations caused by the lack of global (local) features, we develop a novel classification network, CECT, based on a controllable ensemble of CNN and transformer. CECT is composed of a convolutional encoder block, a transposed-convolutional decoder block, and a transformer classification block. Different from conventional CNN- or transformer-based methods, our CECT can capture features at both multi-local and global scales. Moreover, the contribution of local features at different scales can be controlled with the proposed ensemble coefficients. We evaluate CECT on two public COVID-19 datasets, where it outperforms existing state-of-the-art methods on all evaluation metrics. Given its remarkable feature capture ability, we believe CECT can be extended to other medical image classification scenarios as a diagnosis assistant.
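To make the three-block design concrete, below is a minimal PyTorch sketch of one way such an architecture could be wired: a convolutional encoder producing local features at several scales, transposed-convolutional decoders mapping each scale to a common resolution, scale-wise ensemble coefficients weighting their contributions, and a transformer classifier over the fused map. All layer sizes, the patch size, and the fixed-coefficient choice are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class CECTSketch(nn.Module):
    """Illustrative sketch of a CNN-encoder / transposed-conv-decoder /
    transformer-classifier pipeline with ensemble coefficients
    (hypothetical sizes; not the paper's implementation)."""

    def __init__(self, in_ch=3, num_classes=2, scales=3, embed_dim=256):
        super().__init__()
        # Convolutional encoder block: local features at progressively deeper scales.
        self.encoders = nn.ModuleList()
        ch = in_ch
        for _ in range(scales):
            self.encoders.append(nn.Sequential(
                nn.Conv2d(ch, 64, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU(inplace=True),
            ))
            ch = 64
        # Transposed-convolutional decoder block: upsample each scale back to
        # the input resolution so the multi-scale features can be ensembled.
        self.decoders = nn.ModuleList(
            nn.ConvTranspose2d(64, 64, kernel_size=2 ** (i + 1), stride=2 ** (i + 1))
            for i in range(scales)
        )
        # Ensemble coefficients controlling each scale's contribution
        # (assumed fixed and uniform here).
        self.register_buffer("coeffs", torch.full((scales,), 1.0 / scales))
        # Transformer classification block over patch tokens of the fused map.
        self.proj = nn.Conv2d(64, embed_dim, kernel_size=16, stride=16)  # patch embedding
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        fused, feat = 0, x
        for enc, dec, w in zip(self.encoders, self.decoders, self.coeffs):
            feat = enc(feat)               # local features at the next scale
            fused = fused + w * dec(feat)  # weighted multi-scale ensemble
        tokens = self.proj(fused).flatten(2).transpose(1, 2)  # (B, N, embed_dim)
        return self.head(self.transformer(tokens).mean(dim=1))

# Example: classify a batch of two 224x224 images into two classes.
logits = CECTSketch()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 2])
```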