Abstract:
Flow direction and flow accumulation algorithms are the foundation of hydrological and hydrodynamic simulations on slopes. Implementing a parallel D8 flow accumulation algorithm on the CUDA architecture can effectively accelerate simulation. The parallelization strategy has become a research focus because it determines how access conflicts during computation are resolved. We optimized the parallelization strategy of the D8 algorithm using the `atomicAdd` function under the CUDA architecture and applied it to extract river networks from sub-basins at different spatial scales in the Ganjiang River Basin (the upper reach, the upper-middle reaches, and the entire basin). The extraction accuracy, the acceleration effect, and the scale dependence of the acceleration were assessed. The results showed that the stream network extraction achieved accuracy comparable to that of the classical algorithm, with relative errors in stream length, basin area, and drainage network density all below 0.3%. Under the CUDA architecture, the computation time of the parallel D8 strategy was significantly lower than that of both the ArcGIS serial algorithm and the MATLAB serial algorithm, with computation time ranked as CUDA D8 parallel < ArcGIS serial < MATLAB serial. Additionally, the speedup ratio was proportional to the number of thread blocks and grids. In particular, when the number of thread blocks was ≤ 128 and > 128, the optimal speedup occurred with the number of grids below 1 024 and above 65 536, respectively. The speedup decreased with increasing spatial scale; for example, the speedup ratio relative to ArcGIS for the upper-middle reaches and for the whole Ganjiang River Basin declined by more than 20% compared with the upper reach. The parallel strategy of the D8 algorithm can provide a theoretical reference for the parallel computing of hydrological-hydrodynamic models.
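To make the access conflict concrete, the following is a minimal serial sketch in Python of D8 flow accumulation, not the CUDA implementation evaluated here. It assumes ESRI-style direction codes (1 = E, 2 = SE, 4 = S, 8 = SW, 16 = W, 32 = NW, 64 = N, 128 = NE, any other value = outlet) and counts upstream cells excluding the cell itself. The names `D8_OFFSETS`, `downstream`, and `flow_accumulation` are hypothetical. The comments mark the two updates where several cells may write to the same downstream cell, which is where a CUDA kernel would need `atomicAdd`.

```python
from collections import deque

# Assumed ESRI-style D8 codes mapped to (row, col) offsets.
# Any other code (e.g. 0) is treated as an outlet with no downstream cell.
D8_OFFSETS = {
    1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
    16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1),
}


def downstream(r, c, code, rows, cols):
    """Return the (row, col) a cell drains to, or None if it drains nowhere."""
    if code not in D8_OFFSETS:
        return None
    dr, dc = D8_OFFSETS[code]
    nr, nc = r + dr, c + dc
    if 0 <= nr < rows and 0 <= nc < cols:
        return nr, nc
    return None  # flow leaves the grid


def flow_accumulation(fdir):
    """Count, for every cell, the number of upstream cells draining into it."""
    rows, cols = len(fdir), len(fdir[0])
    indeg = [[0] * cols for _ in range(rows)]  # unresolved upstream neighbours
    acc = [[0] * cols for _ in range(rows)]    # accumulated upstream cells

    # Pass 1: each cell registers itself with its downstream neighbour.
    for r in range(rows):
        for c in range(cols):
            target = downstream(r, c, fdir[r][c], rows, cols)
            if target is not None:
                # Conflict point: several cells may share one target,
                # so a parallel kernel would use atomicAdd here.
                indeg[target[0]][target[1]] += 1

    # Pass 2: start from source cells and push flow downstream.
    queue = deque(
        (r, c) for r in range(rows) for c in range(cols) if indeg[r][c] == 0
    )
    while queue:
        r, c = queue.popleft()
        target = downstream(r, c, fdir[r][c], rows, cols)
        if target is None:
            continue
        tr, tc = target
        # Conflict point: same shared-target update as above (atomicAdd).
        acc[tr][tc] += acc[r][c] + 1
        indeg[tr][tc] -= 1
        if indeg[tr][tc] == 0:
            queue.append((tr, tc))
    return acc


example = [
    [2, 4, 8],
    [1, 4, 16],
    [1, 0, 16],
]
print(flow_accumulation(example))  # [[0, 0, 0], [0, 5, 0], [0, 8, 0]]
```

In the example grid, five cells drain into the centre cell and the bottom-middle cell is the outlet, so it accumulates all eight other cells. Cells caught in a flow-direction cycle are never resolved by this sketch, so it assumes a depression-free direction grid.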