Abstract:
Traditional object detection algorithms recognize only a limited set of categories when identifying floating objects in rivers, which falls short of current demands for fine-grained detection. To address this, a multi-technology fusion method based on deep learning is proposed. First, DeepLabv3+ semantic segmentation is applied to segment out the riverbank and eliminate background interference from the shore. Then, Detic's open-vocabulary detection capability is integrated, and a custom vocabulary expands the range of detectable categories from dozens to tens of thousands. Finally, a whitelist-blacklist mechanism filters the candidate floating objects, excluding non-floating targets and retaining the types of floating objects relevant to environmental management. Experimental results show that, while maintaining detection efficiency, the proposed method achieves a category recognition accuracy of 87.8%, outperforming YOLOv10 by 4.8% and YOLOv12 by 1.7%, thereby achieving fine-grained detection of river floating objects. Furthermore, the method supports deployment on edge devices, with an inference time of 0.179 seconds per frame, meeting the requirements for precise management in complex river environments.
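The filtering stage described above can be illustrated with a minimal sketch. It is not the proposed implementation: the abstract does not specify how the whitelist and blacklist are applied, so the `Detection` record, the `filter_floatables` function, the box-center test against the water mask, the `min_score` threshold, and the example labels are all hypothetical. The water mask stands in for the output of the segmentation step, and the detections stand in for the output of the open-vocabulary detector.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one detector output."""
    label: str
    score: float
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def box_center_in_water(box: Tuple[int, int, int, int],
                        water_mask: Sequence[Sequence[bool]]) -> bool:
    """Return True if the box center lies on a pixel marked as water."""
    if not water_mask or not water_mask[0]:
        return False
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) // 2
    cy = (y_min + y_max) // 2
    if not (0 <= cy < len(water_mask) and 0 <= cx < len(water_mask[0])):
        return False
    return bool(water_mask[cy][cx])


def filter_floatables(detections: Sequence[Detection],
                      water_mask: Sequence[Sequence[bool]],
                      whitelist: Set[str],
                      blacklist: Set[str],
                      min_score: float = 0.5) -> List[Detection]:
    """Keep detections that are confident, whitelisted, not blacklisted,
    and centered on the water region (assumed filtering rules)."""
    kept = []
    for det in detections:
        if det.score < min_score:
            continue  # low-confidence candidate
        if det.label in blacklist:
            continue  # explicitly excluded non-floating target
        if det.label not in whitelist:
            continue  # category not relevant to management
        if not box_center_in_water(det.box, water_mask):
            continue  # lies on the shore, outside the water region
        kept.append(det)
    return kept


# Toy 4x6 mask: the two leftmost columns are shore, the rest is water.
water_mask = [[x >= 2 for x in range(6)] for _ in range(4)]
detections = [
    Detection("plastic bottle", 0.90, (3, 1, 5, 3)),  # on water
    Detection("boat", 0.95, (2, 0, 4, 2)),            # blacklisted
    Detection("plastic bottle", 0.80, (0, 0, 1, 1)),  # on shore
    Detection("foam", 0.30, (3, 0, 5, 2)),            # low score
]
kept = filter_floatables(
    detections,
    water_mask,
    whitelist={"plastic bottle", "foam"},
    blacklist={"boat"},
)
print([d.label for d in kept])
```

In this sketch the blacklist takes precedence over the whitelist, so a label that appears in both is dropped. That ordering is a design assumption made for the example, not something stated in the abstract.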