High performance computing for 4D-LSM
Parallel computing, which distributes computational tasks across several processors and executes them concurrently, has extensive applications in geology and rock engineering, where models are typically large-scale and computationally intensive. By splitting the original domain into many subdomains (the so-called domain decomposition method), the computational time and memory requirements are considerably reduced compared with those of serial computing. Domain decomposition has been utilized in the finite element method, mesh-free methods, the finite difference method, the discrete element method, the lattice spring model, molecular dynamics, the mortar element method and mixed finite elements, as well as in other scientific computing applications such as heat conduction analysis and direct simulation Monte Carlo computation. These applications demonstrate the practicality of domain decomposition in scientific computing. With the aid of domain decomposition, numerical simulations of underground powerhouses, compositional reservoirs, seismic wave propagation, slope stability, incompressible athermal flows, two-phase flow problems, multifield fluid problems and discrete crack propagation have been carried out.
In parallel computing using domain decomposition, the time consumed by interprocessor communication is significant, especially when many processors are involved. To minimize the communication cost between processors, geometric partitioning is designed to minimize the size or length of the subdomain boundaries. The most intuitive and simplest domain decomposition method is to use regular cuboid subdomains of the same size. This method, also known as the linked cell method, is still widely used in parallel implementations because it is extremely time-saving and straightforward. For high-performance parallel computing, it is critical to balance the computational load among processors as much as possible while minimizing the cost of interprocessor communication. Generally, a typical subdomain in the linked cell method has 26 neighbouring subdomains in three-dimensional space that it must communicate with, and this number can be reduced dramatically to 6 by using a proper communication methodology. However, for computational models in geotechnical engineering such as tunnels, buildings and slopes, regular cuboid domain decomposition yields subdomains with different numbers of numerical elements, so the computational time spent on each processor is unequal, i.e., the workload among processors is imbalanced. Computing resources are wasted when some processors complete their computations ahead of the others, because the next step of the computation cannot begin until all processors have finished.
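The 26-versus-6 neighbour counts above can be sketched in a few lines. The following is a minimal illustration (function names and signatures are hypothetical, not the paper's implementation): a particle is mapped to its cuboid cell, and the neighbour set is either all 26 face/edge/corner cells, or only the 6 face cells when a staged communication scheme relays edge and corner data through faces.

```python
import numpy as np

def cell_index(pos, origin, cell_size, grid):
    """Map a particle position to its (i, j, k) cuboid subdomain index."""
    idx = np.floor((np.asarray(pos) - np.asarray(origin)) / cell_size).astype(int)
    return tuple(np.clip(idx, 0, np.asarray(grid) - 1))

def neighbours_26(ijk, grid):
    """All face, edge and corner neighbours of a subdomain (up to 26 in 3D)."""
    i, j, k = ijk
    out = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for dk in (-1, 0, 1):
                if (di, dj, dk) == (0, 0, 0):
                    continue
                n = (i + di, j + dj, k + dk)
                if all(0 <= n[a] < grid[a] for a in range(3)):
                    out.append(n)
    return out

def neighbours_6(ijk, grid):
    """Face neighbours only: with staged x->y->z exchanges, edge and
    corner data are relayed through faces, cutting messages from 26 to 6."""
    i, j, k = ijk
    cand = [(i - 1, j, k), (i + 1, j, k), (i, j - 1, k),
            (i, j + 1, k), (i, j, k - 1), (i, j, k + 1)]
    return [n for n in cand if all(0 <= n[a] < grid[a] for a in range(3))]
```

For an interior subdomain of a 4 x 4 x 4 grid, the two routines return 26 and 6 neighbours, respectively; boundary subdomains have fewer.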
In this work, an improved domain decomposition method is developed for discrete numerical models that solve large-scale geotechnical engineering problems such as underground tunnels, dams, slopes and buildings. The original domain is first partitioned into cubic subdomains. A sparse cuboid topology is adopted to fit the requirements of common geotechnical models, and a six-neighbouring communication strategy is used to minimize the communication cost. To balance the computational load among subdomains, the simulated annealing algorithm (SAA), which can optimize discrete functions, is used. For an arbitrary profile geometry, the abovementioned graph partitioning and the SAA-optimized linked cell method follow different paths to obtain a reasonable domain decomposition result. The subdomain boundaries generated by graph partitioning always have complex shapes, while the decomposed subdomains are of essentially the same size. When implemented in numerical simulations, these complex-shaped subdomain boundaries cause difficulty in data communication, which will be discussed later, and a corresponding difficulty in programming. The SAA-optimized linked cell method retains the main advantage of the linked cell method and remains intuitive and simple. The improved domain decomposition method is implemented in a newly developed discrete numerical model named the four-dimensional lattice spring model (4D-LSM) by using the message passing interface (MPI). The performance of the parallel 4D-LSM is evaluated on a medium-sized cluster and a workstation with a number of numerical models, including dams, tunnels, open pits and bridges. The numerical results demonstrate that the proposed domain decomposition method improves the workload balance among processors compared to the original cubic decomposition method.
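The SAA step described above can be sketched as follows. This is a minimal, hypothetical formulation (the paper's exact move set, cooling schedule and target function are not reproduced here): linked cells carry known particle counts, and annealing reassigns cells among processors to reduce the target function, here taken as the heaviest per-processor load.

```python
import math
import random

def saa_balance(cell_loads, n_proc, steps=20000, seed=0):
    """Simulated annealing sketch: assign linked cells to processors so
    that the heaviest processor load (the target function) is minimized.
    Hypothetical parameters, not the paper's exact implementation."""
    rng = random.Random(seed)
    assign = [i % n_proc for i in range(len(cell_loads))]  # round-robin start

    def max_load(a):
        loads = [0.0] * n_proc
        for c, p in zip(cell_loads, a):
            loads[p] += c
        return max(loads)

    cur = max_load(assign)
    best, best_assign = cur, assign[:]
    t = cur  # initial temperature on the scale of the load itself
    for _ in range(steps):
        i = rng.randrange(len(cell_loads))
        old, new = assign[i], rng.randrange(n_proc)
        if new == old:
            continue
        assign[i] = new
        nxt = max_load(assign)
        # accept improvements always; accept worsenings with Metropolis probability
        if nxt <= cur or rng.random() < math.exp((cur - nxt) / t):
            cur = nxt
            if cur < best:
                best, best_assign = cur, assign[:]
        else:
            assign[i] = old  # reject: restore previous assignment
        t *= 0.9995  # geometric cooling
    return best_assign, best
```

Because the move set only reassigns whole cells, the optimum is bounded below by the heaviest single cell; the annealing loop typically approaches that bound closely for small examples.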
Without workload balance, some computing nodes with heavier computing tasks may run out of memory when running an extremely large-scale model in parallel, even though the overall computational capacity of the cluster satisfies the requirements, making the calculation infeasible. As shown in Figure 1A, a three-dimensional dam model, the computing task carried out by each node varies from 3.4% to 30.1% of the overall computing task. In this case, the highest memory footprint among the computing nodes is more than eight times the lowest. Therefore, the proposed method is meaningful for breaking through the limit of computational scale in large-scale numerical simulations of geotechnical engineering. Moreover, the improvement also yields a speedup of at most 40% over the original cubic decomposition method. As a flexible optimization method, SAA can take more complex factors into consideration, such as the time consumed by communication tasks, by applying a proper target function. Hence, two different types of target functions are tested in this work, and their influences on the performance of the parallel 4D-LSM are investigated. The advantages and disadvantages of the proposed domain decomposition method are discussed. Three types of difficulties in realizing large-scale numerical simulations are also discussed. Considering currently available computing resources, the maximum number of particles in the parallel 4D-LSM is set to one billion. To minimize the time consumed in calculation and preprocessing, the preprocessing of 4D-LSM is redesigned, and a new particle generation method that is more suitable for calculation on supercomputers is proposed. Two one-billion-particle models are realized, and the results show the advantages of the large-scale three-dimensional model.
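The imbalance figures quoted above can be expressed as simple metrics. The sketch below (hypothetical helper, not from the paper) computes the footprint spread between the heaviest and lightest nodes and a load-balance efficiency, the ratio of the ideal per-node share to the actual heaviest share; for the dam model's extremes of 3.4% and 30.1%, the spread is about 8.85, consistent with the "more than eight times" statement.

```python
def imbalance_stats(shares):
    """Workload-imbalance metrics for per-node shares of the total task.

    Returns (spread, efficiency):
      spread     -- heaviest share / lightest share (memory-footprint ratio)
      efficiency -- ideal per-node share / heaviest share (1.0 = perfect balance)
    """
    spread = max(shares) / min(shares)
    efficiency = (sum(shares) / len(shares)) / max(shares)
    return spread, efficiency

# extremes of the dam model in Figure 1A: shares span 3.4% to 30.1%
spread, _ = imbalance_stats([3.4, 30.1])  # spread is about 8.85
```

A perfectly balanced decomposition would give a spread of 1.0 and an efficiency of 1.0; the further the efficiency falls below 1.0, the more processor time is wasted waiting at synchronization points.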