Abnormal Sound Detection for Pumped Storage Units Based on an Improved ViT Model

郭明; 戴鸿清; 张志兵; 孙波; 许颜贺

Abstract: Anomaly detection for pumped storage units is complicated by frequent changes in operating conditions, scarce fault acoustic samples, and imbalanced data. To address these problems, this paper proposes an abnormal sound detection method for pumped storage units based on an improved Vision Transformer (ViT) model. First, a Mel spectrogram algorithm converts the one-dimensional acoustic signal into a two-dimensional spectrogram, which enriches the information carried by fault samples. The spectrograms are then fed into the ViT network, where the interaction between the self-attention layers and the image features is used to learn features that are invariant across operating conditions. In addition, the proposed domain prompt and prompt adaptation modules predict the unit state in the target domain from the feature similarity between the source and target domains. On a measured dataset, the method achieves an average accuracy of 90.0%, a recall of 87.9%, and an F1-score of 88.7%. On the MIMII dataset, it improves accuracy, recall, and F1-score over the compared methods by 8.7%, 6.92%, and 4.52% on average, respectively. The proposed model can therefore handle anomaly detection effectively under multiple operating conditions and with few samples.
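
To make the front end described in the abstract more concrete, the following is a minimal sketch of two of its ingredients: the Hz-to-mel frequency mapping that underlies a Mel spectrogram, and the cutting of a two-dimensional spectrogram into flattened patches, which is how a ViT typically turns an image into a token sequence. It is not the authors' implementation. The HTK-style mel formula, the function names, the band count, the frequency range, and the patch size are all assumptions chosen for illustration, and the short-time Fourier transform, the filterbank weighting, and the domain prompt modules are omitted.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to mels (HTK-style formula, an assumed choice)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_mels, f_min, f_max):
    """Centre frequencies (Hz) of n_mels bands equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


def split_into_patches(spec, patch_h, patch_w):
    """Cut a 2-D spectrogram (list of equal-length rows) into non-overlapping
    patches and flatten each patch row-major into one token vector."""
    n_rows, n_cols = len(spec), len(spec[0])
    if n_rows % patch_h or n_cols % patch_w:
        raise ValueError("spectrogram size must be a multiple of the patch size")
    tokens = []
    for r in range(0, n_rows, patch_h):
        for c in range(0, n_cols, patch_w):
            tokens.append(
                [spec[r + i][c + j] for i in range(patch_h) for j in range(patch_w)]
            )
    return tokens


# Hypothetical settings: 8 mel bands between 0 Hz and 8 kHz.
centers = mel_center_frequencies(n_mels=8, f_min=0.0, f_max=8000.0)

# Toy 4 x 6 "spectrogram" (mel bands x time frames) cut into 2 x 3 patches.
toy_spec = [[row * 6 + col for col in range(6)] for row in range(4)]
tokens = split_into_patches(toy_spec, patch_h=2, patch_w=3)
```

In a full pipeline each token would be linearly projected and passed, together with positional information, to the self-attention layers. The mel spacing concentrates bands at low frequencies, which is why the centre frequencies grow faster than linearly.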
