Abstract:
The reliability of sample data is one of the key factors affecting the results of regional geological hazard susceptibility evaluation. Taking 236 geological hazards in Baihe County, Shaanxi Province, as the research objects, a hierarchical clustering algorithm and a dynamic K-means clustering algorithm were each used to cluster the geological hazard sample data and obtain the corresponding sample purity. The sample purities produced by the hierarchical clustering algorithm and the K-means clustering algorithm were 91.53% and 92.80%, respectively. Combining the results of the two algorithms, 20 noise points were eliminated and 216 valid sample points were retained, giving a sample purity of 91.53%. The data before and after sample purification were used to build information value (IV) models, referred to as the pre-IV and post-IV models, for regional geological hazard susceptibility evaluation. In the susceptibility maps generated by the pre-IV and post-IV models, 149 and 167 geological hazards fell in the very high and high susceptibility zones, accounting for 63.13% and 70.77% of the total hazards, with hazard densities of 0.508/km² and 0.584/km², respectively. Compared with the initial samples, the number of hazards in the very high and high susceptibility zones increased by 18 after sample purification and the hazard density increased by 0.076/km², so the geological hazards were more concentrated in these zones and the prediction results were more accurate. The results can provide a theoretical and scientific basis for purifying initial sample data in geological hazard susceptibility evaluation research.
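The arithmetic behind the reported figures can be illustrated with a minimal Python sketch. It assumes that sample purity is the share of samples retained after noise removal, and that the information value of a factor class takes the commonly used form ln((N_i/N)/(S_i/S)), where N_i and S_i are the hazard count and area of the class and N and S are the totals. The abstract does not state either definition, so both the formula and the function names `sample_purity` and `information_value` are illustrative assumptions, not the authors' implementation.

```python
import math


def sample_purity(n_valid: int, n_total: int) -> float:
    """Percentage of samples retained after noise points are removed (assumed definition)."""
    return 100.0 * n_valid / n_total


def information_value(hazards_in_class: int, hazards_total: int,
                      area_in_class: float, area_total: float) -> float:
    """Assumed standard form: ln of hazard share divided by area share for one factor class."""
    hazard_share = hazards_in_class / hazards_total
    area_share = area_in_class / area_total
    return math.log(hazard_share / area_share)


# Figures taken from the abstract: 236 initial samples, 20 noise points removed.
n_total = 236
n_valid = n_total - 20
print(f"purity after purification: {sample_purity(n_valid, n_total):.2f}%")

# Change in hazards located in the very high and high susceptibility zones.
print("additional hazards captured:", 167 - 149)
print("density increase (per km^2):", round(0.584 - 0.508, 3))
```

Under this assumed form, a positive information value means a class holds a larger share of hazards than of area, so it contributes to higher susceptibility; a value of zero means the class is neutral.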