Construction of a Specialized Corpus and Entity Recognition for the Water Intake Permit Domain in the Yangtze River Basin

    Data enhancement and named entity recognition for knowledge extraction from Yangtze River water intake permit management documentation

    • Abstract: Water intake permit management is an important instrument for conserving and protecting water resources. In the Yangtze River Basin, the permit management process generates a large volume of important text documents, but their analysis and use still depend mainly on manual work, which is inefficient. To make information extraction from these documents more intelligent and automated, this paper proposes a method for building a specialized corpus and automating entity recognition for the water intake permit domain of the Yangtze River Basin. To address the highly specialized terminology and the scarcity of samples, a data augmentation method based on dictionaries and pre-trained models is proposed, drawing on expert experience and industry standards, and is used to construct the domain corpus. To address the complex sentence structures, varied language habits, and strong contextual dependencies in permit documents, a multi-feature fusion model for water resource entity recognition is proposed to extract entities accurately from domain-specific text. Experiments show that, when trained on the constructed corpus, entity recognition on water intake permit documents reaches an accuracy of 89.64%, a recall of 88.71%, and an F1 score of 89.26%, and that the total business approval time falls by 66%, providing effective support for automated processing of water intake permit documents.
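The abstract does not spell out how the dictionary-based part of the data augmentation works. The sketch below illustrates one common reading of the idea, entity substitution: each labelled entity in a BIO-tagged sentence is swapped for another dictionary term of the same type, yielding a new training sentence with consistent labels. This is not the authors' implementation. The dictionary contents, the entity types `SOURCE` and `USE`, and the function name `augment` are all hypothetical, English tokens are used only for readability, and the pre-trained-model half of the method is not covered.

```python
import random

# Hypothetical domain dictionary: entity type -> interchangeable terms.
# A real dictionary would come from expert input and industry standards.
DOMAIN_DICT = {
    "SOURCE": ["Yangtze River", "Han River", "Dongting Lake"],
    "USE": ["industrial water", "agricultural irrigation", "domestic water"],
}


def augment(tokens, tags, rng):
    """Return a copy of a BIO-tagged sentence in which every entity span
    is replaced by a different dictionary term of the same entity type."""
    new_tokens, new_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            etype = tag[2:]
            # Find the end of the entity span (B- followed by I- tags).
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            original = " ".join(tokens[i:j])
            candidates = [t for t in DOMAIN_DICT.get(etype, []) if t != original]
            # Keep the original span when the dictionary offers no alternative.
            replacement = rng.choice(candidates).split() if candidates else tokens[i:j]
            new_tokens.extend(replacement)
            new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(replacement) - 1))
            i = j
        else:
            # Non-entity tokens are copied unchanged.
            new_tokens.append(tokens[i])
            new_tags.append(tag)
            i += 1
    return new_tokens, new_tags


tokens = ["The", "plant", "draws", "industrial", "water", "from", "the", "Han", "River"]
tags = ["O", "O", "O", "B-USE", "I-USE", "O", "O", "B-SOURCE", "I-SOURCE"]

# A seeded generator makes the augmentation reproducible.
aug_tokens, aug_tags = augment(tokens, tags, random.Random(0))
print(list(zip(aug_tokens, aug_tags)))
```

Because labels are regenerated from the replacement term, the augmented sentence stays correctly tagged even when the new term has a different number of tokens than the one it replaces.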

       
