How to Improve Few-Shot Text Classification with Domain Knowledge [The Tenth Paper Published in the Appl. Sci. Special Issue]
Zhou Tao  |  2023-02-03  |  ScienceNet  |  312 views

I have organized a Special Issue in Applied Sciences (a comprehensive, interdisciplinary journal; CiteScore = 3.70, IF = 2.84) under the broad title "Advances in Big Data Analysis". The Special Issue was launched mainly in response to the profound impact that the rapid growth of accessible data and of data-analysis platforms and tools has had on the natural and social sciences. We particularly welcome (but are not limited to) the following four types of submissions:

(1) Fundamental theoretical analyses in data analysis, for example the predictability of a system (such as the predictability of a time series), the minimum error of a classification problem, and the stability and reliability of various data-mining results;

(2) New methods for data analysis, for example new methods for mining causal relations (which is also related to topic 1), new methods for multimodal analysis, new methods for privacy-preserving computation, and so on;

(3) New, high-value datasets, data-analysis platforms, data-analysis tools, and so on;

(4) Applications of big-data analysis methods to the various branches of the natural and social sciences (to gain new insights); we are especially fond of applications to disciplines that have so far not been highly quantitative.

Submission link: https://www.mdpi.com/journal/applsci/special_issues/75Y7F7607U

The submission deadline is June 30, 2023. We process manuscripts very quickly, and submissions are warmly welcome.


The tenth paper in this Special Issue has now been formally published:


Improving Domain-Generalized Few-Shot Text Classification with Multi-Level Distributional Signatures

Abstract

Domain-generalized few-shot text classification (DG-FSTC) is a new setting for few-shot text classification (FSTC). In DG-FSTC, the model is meta-trained on a multi-domain dataset, and meta-tested on unseen datasets with different domains. However, previous methods mostly construct semantic representations by learning from words directly, which is limited in domain adaptability. In this study, we enhance the domain adaptability of the model by utilizing the distributional signatures of texts that indicate domain-related features in specific domains. We propose a Multi-level Distributional Signatures based model, namely MultiDS. Firstly, inspired by pretrained language models, we compute distributional signatures from an extra large news corpus, and we denote these as domain-agnostic features. Then we calculate the distributional signatures from texts in the same domain and texts from the same class, respectively. These two kinds of information are regarded as domain-specific and class-specific features, respectively. After that, we fuse and translate these three distributional signatures into word-level attention values, which enables the model to capture informative features as domain changes. In addition, we utilize domain-specific distributional signatures for the calibration of feature representations in specific domains. The calibration vectors produced by the domain-specific distributional signatures and word embeddings help the model adapt to various domains. Extensive experiments are performed on four benchmarks. The results demonstrate that our proposed method beats the state-of-the-art method with an average improvement of 1.41% on four datasets. Compared with five competitive baselines, our method achieves the best average performance. The ablation studies prove the effectiveness of each proposed module.
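
To make the idea of multi-level distributional signatures a bit more concrete, here is a minimal Python sketch (not the authors' implementation) of how word-level statistics from a general corpus, a domain corpus, and a class corpus could be fused into word attention weights. The inverse-frequency and likelihood-ratio heuristics, the fixed fusion weights, and the toy mini-corpora below are assumptions made purely for illustration; in MultiDS the fusion and the translation into attention values, as well as the feature calibration, are learned from data.

```python
# Illustrative sketch only: three word-level statistics stand in for the
# paper's domain-agnostic, domain-specific and class-specific distributional
# signatures. The heuristics and fixed weights are assumptions for this example.
import math
from collections import Counter


def unigram_stats(texts):
    """Relative word frequencies estimated from a list of tokenized texts."""
    counts = Counter(tok for text in texts for tok in text)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def inverse_freq_signature(word, stats, eps=1e-3):
    """Words frequent in `stats` get small values; rare or unseen words get large ones."""
    return eps / (eps + stats.get(word, 0.0))


def class_ratio_signature(word, class_stats, domain_stats, eps=1e-3):
    """Words more typical of the class than of its domain get large values."""
    return class_stats.get(word, 0.0) / (domain_stats.get(word, 0.0) + eps)


def word_attention(tokens, general_stats, domain_stats, class_stats,
                   weights=(1.0, 1.0, 1.0)):
    """Fuse three signatures per token and softmax-normalize into attention values."""
    scores = []
    for tok in tokens:
        s = (weights[0] * inverse_freq_signature(tok, general_stats)    # domain-agnostic
             + weights[1] * inverse_freq_signature(tok, domain_stats)   # domain-specific
             + weights[2] * class_ratio_signature(tok, class_stats, domain_stats))  # class-specific
        scores.append(s)
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]


# Toy usage with hypothetical mini-corpora.
news_corpus = [["the", "market", "rose", "today"], ["the", "team", "won"]]
domain_corpus = [["the", "drug", "reduced", "symptoms"]]
class_corpus = [["symptoms", "improved", "after", "treatment"]]

attn = word_attention(["the", "drug", "reduced", "symptoms"],
                      unigram_stats(news_corpus),
                      unigram_stats(domain_corpus),
                      unigram_stats(class_corpus))
print(attn)  # "the" is downweighted; the class-indicative "symptoms" gets the most attention
```

In the toy run the function word "the", which is frequent in the general news statistics, receives little attention, while "symptoms", which is typical of the class relative to its domain, receives the most. This is the qualitative behavior that signature-based word attention is meant to provide as the domain changes.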


Free download link for the paper:

https://www.mdpi.com/2076-3417/13/2/1202  




