We present novel active learning strategies for the cold-start stage, i.e., initializing the classification of a large set of data with no attached labels. Moreover, the proposed strategies are designed to handle an imbalanced context in which random selection is highly inefficient. Specifically, our active learning iterations address label scarcity and imbalance using element scores that combine information extracted from a clustering structure with a label propagation model. The strategy is illustrated by a case study on annotating Twitter content with respect to testimonies of a real flood event. We show that our method effectively copes with class imbalance by boosting the recall of samples from the minority class.
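As a rough illustration of how an element score might combine a clustering structure with a label propagation model, the following minimal sketch ranks unlabeled elements for annotation. It is not the scoring function of the paper, which the abstract does not specify. The function names, the two score terms (cluster coverage gap and normalized entropy), and the trade-off parameter `alpha` are all assumptions made for illustration, and the cluster assignments and propagated class probabilities are assumed to be produced elsewhere, with at least two classes.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def selection_scores(cluster_of, propagated, labeled, alpha=0.5):
    """Score each unlabeled element; a higher score means "annotate sooner".

    cluster_of : list[int], cluster id of each element (any clustering method)
    propagated : list[list[float]], class probabilities of each element,
                 as returned by some label propagation model (>= 2 classes)
    labeled    : set[int], indices of elements that are already annotated
    alpha      : hypothetical trade-off between the two terms, in [0, 1]
    """
    max_entropy = math.log(len(propagated[0]))
    cluster_size = Counter(cluster_of)
    labeled_in_cluster = Counter(cluster_of[i] for i in labeled)

    scores = {}
    for i, c in enumerate(cluster_of):
        if i in labeled:
            continue
        # Structure term: fraction of the element's cluster still unlabeled,
        # which favors clusters that have received few annotations so far.
        coverage_gap = 1.0 - labeled_in_cluster[c] / cluster_size[c]
        # Model term: normalized uncertainty of the propagated labels.
        uncertainty = entropy(propagated[i]) / max_entropy
        scores[i] = alpha * coverage_gap + (1 - alpha) * uncertainty
    return scores


def next_query(cluster_of, propagated, labeled, alpha=0.5):
    """Return the index of the unlabeled element with the highest score."""
    scores = selection_scores(cluster_of, propagated, labeled, alpha)
    return max(scores, key=scores.get)
```

In a cold-start setting, `labeled` would start empty and `propagated` would be uniform, so the structure term alone drives the first queries. A term that explicitly favors the minority class, as the abstract suggests, is not modeled here.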