How to Improve Few-Shot Text Classification with Domain Knowledge [Tenth Published Paper in the Appl. Sci. Special Issue]

Tao Zhou (周涛) | 2023-02-03 | ScienceNet (科学网)

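I have organized a Special Issue in Applied Sciences (a multidisciplinary, cross-disciplinary journal; CiteScore = 3.70; IF = 2.84). Its theme, advances in big data analysis, is deliberately broad. The special issue responds to the major impact that the rapid growth of available data, and of platforms and tools for analyzing it, is having on the natural and social sciences. We especially welcome (but do not limit submissions to) the following four types of manuscripts: (1) fundamental theoretical analysis in data analysis, such as the predictability of a system (for example, the predictability of time series), minimum-error analysis for classification problems, and analysis of the stability and credibility of data mining results; (2) new methods for data analysis, such as new methods for mining causal relationships (which is also related to Topic 1), for multimodal analysis, and for privacy-preserving computation; (3) new, high-value datasets, data analysis platforms, and data analysis tools; (4) applications of big data analysis to the various branches of the natural and social sciences that yield new insights. We are particularly fond of applications to disciplines that have traditionally been less quantitative.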

Submission link: https://www.mdpi.com/journal/applsci/special_issues/75Y7F7607U

The submission deadline is June 30, 2023. We process manuscripts very quickly, and we welcome your submissions.

The tenth paper in the special issue has now been formally published:

Improving Domain-Generalized Few-Shot Text Classification with Multi-Level Distributional Signatures

Abstract

Domain-generalized few-shot text classification (DG-FSTC) is a new setting for few-shot text classification (FSTC). In DG-FSTC, the model is meta-trained on a multi-domain dataset, and meta-tested on unseen datasets with different domains. However, previous methods mostly construct semantic representations by learning from words directly, which is limited in domain adaptability. In this study, we enhance the domain adaptability of the model by utilizing the distributional signatures of texts that indicate domain-related features in specific domains. We propose a Multi-level Distributional Signatures based model, namely MultiDS. Firstly, inspired by pretrained language models, we compute distributional signatures from an extra large news corpus, and we denote these as domain-agnostic features. Then we calculate the distributional signatures from texts in the same domain and texts from the same class, respectively. These two kinds of information are regarded as domain-specific and class-specific features, respectively. After that, we fuse and translate these three distributional signatures into word-level attention values, which enables the model to capture informative features as domain changes. In addition, we utilize domain-specific distributional signatures for the calibration of feature representations in specific domains. The calibration vectors produced by the domain-specific distributional signatures and word embeddings help the model adapt to various domains. Extensive experiments are performed on four benchmarks. The results demonstrate that our proposed method beats the state-of-the-art method with an average improvement of 1.41% on four datasets. Compared with five competitive baselines, our method achieves the best average performance. The ablation studies prove the effectiveness of each proposed module.
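To make the idea of turning distributional signatures into word-level attention more concrete, here is a minimal sketch. It is not the MultiDS implementation. It assumes a frequency-based signature of the form eps / (eps + P(w)), a form used in earlier work on distributional signatures, and it replaces the paper's learned fusion with a plain average followed by a softmax. The function names `unigram_signature` and `attention_weights`, the toy corpora, and the value of `eps` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def unigram_signature(tokenized_corpus, eps=1e-3):
    """Map each word to eps / (eps + P(word)).

    Assumed signature form (not confirmed by the abstract):
    frequent words score near 0, rare words score near 1.
    """
    counts = Counter(tok for doc in tokenized_corpus for tok in doc)
    total = sum(counts.values())
    return {w: eps / (eps + c / total) for w, c in counts.items()}


def attention_weights(tokens, signatures, default=1.0):
    """Average the signatures per token, then apply a softmax.

    Averaging stands in for the learned fusion described in the paper.
    Words unseen in a corpus get `default` (the value for P(word) = 0).
    """
    scores = [
        sum(sig.get(tok, default) for sig in signatures) / len(signatures)
        for tok in tokens
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


# Toy stand-ins for a large general corpus and a small in-domain corpus.
general_corpus = [["the", "market", "rose"], ["the", "team", "won", "the", "match"]]
domain_corpus = [["the", "battery", "drains", "fast"], ["the", "screen", "is", "bright"]]

sig_general = unigram_signature(general_corpus)  # domain-agnostic level
sig_domain = unigram_signature(domain_corpus)    # domain-specific level

sentence = ["the", "battery", "is", "bright"]
weights = attention_weights(sentence, [sig_general, sig_domain])
```

In this toy setting the function word "the" is frequent in both corpora, so it receives a smaller weight than "battery". A class-specific signature, computed from texts of the same class, could be passed as a third entry in the list under the same assumptions.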

Free download link for the paper:

https://www.mdpi.com/2076-3417/13/2/1202

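This article first appeared on the author's ScienceNet blog. The views expressed are the author's own and do not represent the position of this platform.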
