Abnormal sound detection for pumped storage units based on improved ViT model

郭明; 戴鸿清; 张志兵; 孙波; 许颜贺

doi:10.16232/j.cnki.1001-4179.2026.03.029


Abstract

To address frequent operating-condition changes, scarce fault acoustic samples, and data imbalance in anomaly detection for pumped storage units, this paper proposes an abnormal sound detection method based on an improved Vision Transformer (ViT) model. First, the Mel-spectrogram algorithm converts one-dimensional acoustic signals into two-dimensional spectrograms, enriching the information content of the fault samples. The spectrograms are then fed into the ViT network, which uses the interaction between its self-attention layers and the image features to learn features that are invariant across operating conditions. Finally, the proposed domain prompt and prompt adaptation module predicts the unit's state in the target domain from the feature similarity between the source and target domains. On the measured dataset, the proposed method achieves an average accuracy of 90.0%, a recall of 87.9%, and an F₁-score of 0.887. On the public MIMII dataset, it improves on the compared methods by an average of 8.7% in accuracy, 6.92% in recall, and 4.52% in F₁-score. The proposed model can therefore handle anomaly detection effectively under multiple operating conditions and with few fault samples.
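The first step of the method, turning a one-dimensional acoustic signal into a two-dimensional Mel spectrogram, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the frame length, hop size, number of Mel bands, Hann window, and HTK-style Mel formula are all assumptions chosen for illustration, and the function names are hypothetical. A production pipeline would more likely rely on an audio library with an FFT-based routine.

```python
import cmath
import math


def hz_to_mel(hz):
    """HTK-style Hz -> mel conversion (assumed; other mel scales exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one Hann-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies (in Hz), from 0 Hz up to the Nyquist frequency
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            # Weight is zero outside [lo, hi] and peaks at 1 at the centre
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log-energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel_energy = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small offset avoids log10(0) for empty bands
        frames.append([10.0 * math.log10(e + 1e-10) for e in mel_energy])
    return frames


# Usage: a 1 kHz test tone sampled at 8 kHz (synthetic data, not from the paper)
tone = [math.sin(2.0 * math.pi * 1000.0 * t / 8000.0) for t in range(256)]
spectrogram = log_mel_spectrogram(tone, sample_rate=8000)
print(len(spectrogram), len(spectrogram[0]))
```

The result is a time-by-frequency grid that can be treated as a single-channel image, which is what makes an image model such as ViT applicable to acoustic data. The naive DFT here is quadratic in the frame length and is only suitable for short illustrative frames.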
