MPM Integration
The material point method (MPM) is used to simulate continuous matter and is especially suited to simulations involving large deformations. Once large deformations are present, a dynamic load balancing solution is needed to simulate large systems efficiently: even if the initial work distribution is good, it usually degrades considerably over the course of the simulation. The load balancing library ALL provides an easy plug and play solution to this problem, and this module describes how the library is integrated. Thanks to the good load balancing provided by the library, larger systems can be simulated at lower computational cost.
Purpose of Module
This module shows how straightforward it is to include the load balancing library in already existing code. Depending on the simulation code, additional precautions must be taken; those needed for the MPM simulation code are presented here, together with the prerequisites the simulation code must fulfil. Looking at these will help determine whether a simulation code is well suited for integrating ALL or whether some further work is needed during integration.
This module also shows a real-world application of the Fortran interface provided with ALL (documented in ALL Fortran interface).
The MPM simulation code with integrated ALL is used by Stephan Schulz in his thesis.
Background Information
The load balancing library ALL is integrated into the material point method simulation code GMPM-PoC, written by Stephan Schulz during his thesis. The simulation code will be released to the public in the future.
Certain aspects of the simulation code require additional treatment of, or additional features from, the library. First, the open boundaries of the simulation code require continuous updates of the outer bounds of the boundary domains: the system extent is adapted to the particles' bounding box each time step. This also means that the geometry generated by the library in the last balance step cannot be used directly. It is therefore saved by the simulation code, adapted to the new system extent and then handed back to the library as the basis for the new geometry.
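The following sketch illustrates this extent update in isolation. It is not the GMPM-PoC implementation; the particle array, the toy 1D slab decomposition and all values are made up for the example. The global particle bounding box is reduced with MPI, and only the outer bounds of domains touching the simulation boundary follow the new extent, while the inner bounds saved from the last balance step are kept:

program adapt_extent
   use mpi
   implicit none
   integer :: ierr, rank, nprocs
   real(8) :: x(3,4)            ! toy particle positions owned by this rank
   real(8) :: lmin(3), lmax(3)  ! local particle bounding box
   real(8) :: lo(3), hi(3)      ! global bounding box = new system extent
   real(8) :: dom(2,3)          ! domain bounds saved from the last balance step
   logical :: at_sim_bound(2,3) ! does this domain touch the lower/upper system boundary?

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

   call random_number(x)
   x(1,:) = x(1,:) + rank       ! toy 1D slab decomposition along x

   ! new system extent: the global bounding box of all particles
   lmin = minval(x, dim=2)
   lmax = maxval(x, dim=2)
   call MPI_Allreduce(lmin, lo, 3, MPI_DOUBLE_PRECISION, MPI_MIN, MPI_COMM_WORLD, ierr)
   call MPI_Allreduce(lmax, hi, 3, MPI_DOUBLE_PRECISION, MPI_MAX, MPI_COMM_WORLD, ierr)

   ! geometry saved from the previous balance step (placeholder slabs here)
   dom(1,:) = [real(rank,8),   0d0, 0d0]
   dom(2,:) = [real(rank+1,8), 1d0, 1d0]

   ! only the outer bounds of boundary domains follow the new extent; the
   ! inner bounds stay where the last balance step put them
   at_sim_bound(1,:) = [rank == 0,        .true., .true.]
   at_sim_bound(2,:) = [rank == nprocs-1, .true., .true.]
   where(at_sim_bound(1,:)) dom(1,:) = lo
   where(at_sim_bound(2,:)) dom(2,:) = hi

   print '(a,i0,a,3f7.2,a,3f7.2)', 'rank ', rank, ': lower', dom(1,:), ', upper', dom(2,:)
   call MPI_Finalize(ierr)
end program adapt_extent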
The communication is based on grid halos and only accommodates nearest neighbor communication, which makes the width of exactly this halo the minimum domain size. The library supports such a minimum size, so only the aforementioned outer domain bounds must be checked for compliance; the library automatically keeps all other domain boundaries sufficiently far apart.
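A minimal illustration of this constraint follows. The names n_halo and dh and the clamping strategy are assumptions for the example, not GMPM-PoC code; in the listing below, the real minimum size is derived from the contact radii and passed to jall%set_min_domain_size:

program halo_min_size
   implicit none
   integer, parameter :: n_halo = 2    ! halo width in grid cells (assumed)
   real(8), parameter :: dh = 0.5d0    ! grid width (assumed)
   real(8) :: min_size(3), dom(2,3)

   min_size = n_halo*dh                ! a domain must stay at least one halo wide
   dom(1,:) = [0.0d0, 0.0d0, 0.0d0]
   dom(2,:) = [0.8d0, 3.0d0, 3.0d0]    ! x width 0.8 < 1.0 violates the minimum

   ! after the extent update only the outer bounds of boundary domains can
   ! violate the minimum; push them outwards to restore it
   if(any(dom(2,:) - dom(1,:) < min_size)) then
      dom(2,:) = max(dom(2,:), dom(1,:) + min_size)
   end if
   print '(a,3f6.2)', 'widths after fix:', dom(2,:) - dom(1,:)
end program halo_min_size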
Lastly, the particle communication at the end of each time step also only accounts for nearest neighbor communication. This means that a domain's boundary must not change so much that the domain needs to retrieve particles from a process that is not its nearest neighbor. Due to the way the library moves boundaries in the staggered grid and tensor approaches, this is already guaranteed: there is always an overlap between the old domain's volume and the new domain's.
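A small self-contained check of this overlap invariant, with illustrative values and the standard axis-aligned box intersection test:

program overlap_check
   implicit none
   real(8) :: old_dom(2,3), new_dom(2,3)

   old_dom(1,:) = [0.0d0, 0.0d0, 0.0d0]
   old_dom(2,:) = [1.0d0, 1.0d0, 1.0d0]
   new_dom(1,:) = [0.4d0, 0.0d0, 0.0d0]  ! shifted by the balancer, but still
   new_dom(2,:) = [1.3d0, 1.0d0, 1.0d0]  ! overlapping the old volume

   ! two axis-aligned boxes overlap iff they overlap in every direction
   if(.not. all(new_dom(1,:) < old_dom(2,:) .and. old_dom(1,:) < new_dom(2,:))) then
      stop 'new domain lost contact with its old volume'
   end if
   print *, 'overlap ok: nearest-neighbour particle exchange is sufficient'
end program overlap_check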
However, the library also places a few requirements on the simulation code. Because the domains change, particles must be able to move between processes, which any particle code with moving particles already implements. Depending on the load balancing method, this communication may be more involved. With the tensor method, the topology does not change and every process keeps the same 26 neighbors during the entire simulation. If the staggered grid approach is used, however, the communication must also handle a changing number of neighbors and determine where they are and which regions they belong to. For example, it is common that one half of a boundary must be communicated to one process and the other half to a different one, so the fixed relationship between boundaries and neighbors is broken up (see the sketch below). The GMPM-PoC code was already designed with such a communication scheme in mind and provided the necessary flexibility to simply enable the staggered grid method after fixing a few communication bugs.
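The sketch below shows one way such neighbor discovery can work under changing domain shapes; it is an assumed scheme for illustration, not the GMPM-PoC communication code. All domain bounds are gathered, and every rank whose domain intersects the halo slab beyond the local upper x face becomes a communication partner, so a face can naturally be split between several neighbors:

program find_face_neighbours
   use mpi
   implicit none
   integer :: ierr, rank, nprocs, r
   real(8) :: dom(2,3)                ! this rank's domain (lower/upper, x/y/z)
   real(8), allocatable :: all_dom(:,:,:)
   real(8) :: slab(2,3)               ! halo slab beyond the upper x face
   real(8), parameter :: halo = 0.1d0 ! halo width (assumed)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

   ! toy slabs along x; a staggered grid also shifts the y/z bounds per rank
   dom(1,:) = [real(rank,8),   0d0, 0d0]
   dom(2,:) = [real(rank+1,8), 1d0, 1d0]

   allocate(all_dom(2,3,nprocs))
   call MPI_Allgather(dom, 6, MPI_DOUBLE_PRECISION, all_dom, 6, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, ierr)

   slab = dom
   slab(1,1) = dom(2,1)               ! the slab starts at the upper x bound...
   slab(2,1) = dom(2,1) + halo        ! ...and extends one halo width outwards

   do r = 1, nprocs
      if(r-1 == rank) cycle
      ! every rank whose domain intersects the slab is a partner for this face;
      ! several ranks may match, splitting the face between them
      if(all(all_dom(1,:,r) < slab(2,:) .and. slab(1,:) < all_dom(2,:,r))) then
         print '(a,i0,a,i0)', 'rank ', rank, ' exchanges its upper x face with rank ', r-1
      end if
   end do
   call MPI_Finalize(ierr)
end program find_face_neighbours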
Building and Testing
To build the code, just run ``make LB=ALL`` and everything, including the dependencies, should be built automatically. Make sure the correct compilers are found in the path; if you want to use the Intel compilers, you need to set ``COMPILER=INTEL`` as well. The usual caveats and the modules required on some HPC systems are described in the main code's README.
Source Code
The main change is the replacement of the original domain decomposition function, which used to partition the system extent equally. Now, ALL is called to update the domain geometry.
! The following additional functions were used:
!
! Additional information:
! - LB_METHOD_PRE is set by the preprocessor to ALL_STAGGERED or ALL_TENSOR.
! - The domain_bounds_old will only be used during initialisation for the
!   initial domain configuration.
! - ``this_image()`` returns the current image index, i.e. current MPI rank+1.
! - The work is estimated using ``lb_estimate_work``, which takes the current
!   domain size and number of particles as arguments.

function domain_decomposition_jall(bounds, dh, num_images3, domain_bounds_old, work, output) result(domain_bounds)
   use ISO_C_BINDING
   type(boundingbox), intent(in) :: bounds !< simulation bounds
   real (kind = real_kind), intent(in) :: dh !< grid width
   integer, dimension(3), intent(in) :: num_images3 !< the 1 indexed number of images in 3D
   type(boundingbox_aligned), intent(in) :: domain_bounds_old !< current domain bounds
   real(real_kind), intent(in) :: work !< work of this domain
   logical, intent(in) :: output !< output domain bounds to `vtk_outline` directory
   type(boundingbox_aligned) :: domain_bounds
   type(boundingbox), save :: domain_old
   type(ALL_t), save :: jall ! ALL object which is initialized once
   real (kind = real_kind), dimension(3,2) :: verts
   integer, dimension(3), save :: this_image3 ! the 1 indexed image number in 3D
   logical, save :: first_run = .true.
   integer, save :: step = 1
   logical, dimension(2,3), save :: domain_at_sim_bound ! true if domain is at the lower/upper simulation boundary
   real (kind = real_kind), dimension(3), save :: min_size
   integer(c_int), parameter :: LB_METHOD = LB_METHOD_PRE
   character (len=ALL_ERROR_LENGTH) :: error_msg

   if(first_run) then
      ! calculate this_image3
      block
         integer :: x, y, z, cnt
         cnt = 1
         do z=1,num_images3(3)
            do y=1,num_images3(2)
               do x=1,num_images3(1)
                  if(this_image()==cnt) this_image3 = (/ x,y,z /)
                  cnt = cnt + 1
               enddo
            enddo
         enddo
      end block
      call jall%init(LB_METHOD,3,4.0d0)
      call jall%set_proc_grid_params(this_image3-1, num_images3)
      call jall%set_proc_tag(this_image())
      call jall%set_communicator(MPI_COMM_WORLD)
      min_size(:) = (abs(Rcont_min)+abs(Rcont_max))*dh
      call jall%set_min_domain_size(min_size)
      domain_old%bounds_unaligned = domain_bounds_old%bounds_aligned
      domain_at_sim_bound(1,:) = this_image3==1 ! first image in a direction is automatically at sim bound
      domain_at_sim_bound(2,:) = this_image3==num_images3 ! last image likewise at sim bound
      call jall%setup()
   endif
   call jall%set_work(real(work,real_kind))
   !! The `domain_old` bounds are not the actual domain bounds, which
   !! are aligned to grid widths, but what we got from the previous
   !! iteration of load balancing. However, the simulation boundaries are
   !! unchanged by the load balancing.
   block
      type(boundingbox_aligned) :: aligned_bnds
      real (kind = real_kind), dimension(3) :: lowest_upper_bound, highest_lower_bound
      !> Align the simulation boundaries to the grid and add an additional
      !! grid width on the top. These may be used instead of our current
      !! bounds, so they should align properly on the upper bound, if we
      !! are a simulation boundary. If the simulation bounds have not
      !! changed they should still coincide with the domain bounds.
      aligned_bnds%bounds_aligned = floor(bounds%bounds_unaligned/dh)*dh
      aligned_bnds%bounds_aligned(2,:) = aligned_bnds%bounds_aligned(2,:) + dh
      !> Make sure the shrinking domain is still always large enough
      !! and in particular is not shrunk into the neighbouring domain.
      !! This can happen if the bounding box is not present in the current
      !! domain, so the outer bound is moved across the inner bound. This
      !! must be avoided at all costs. Additionally, we also need to ensure
      !! the minimum domain width. Also, the outer bounds of all boundary
      !! domains must be the same. To achieve this, the outermost inner
      !! bound is calculated in each direction. This then allows us to
      !! compute the innermost position any outer bound may have to still
      !! be the required distance from every next inner bound.
      ! For the lowest domains:
      lowest_upper_bound = comm_co_min_f(domain_old%bounds_unaligned(2,:))
      aligned_bnds%bounds_aligned(1,:) = min(lowest_upper_bound-min_size, aligned_bnds%bounds_aligned(1,:))
      ! For the highest domains:
      highest_lower_bound = comm_co_max_f(domain_old%bounds_unaligned(1,:))
      aligned_bnds%bounds_aligned(2,:) = max(highest_lower_bound+min_size, aligned_bnds%bounds_aligned(2,:))
      ! And now set the boundary domains' outer bounds to the new, fixed bounds
      where(domain_at_sim_bound)
         domain_old%bounds_unaligned = aligned_bnds%bounds_aligned
      end where
   end block
   !> Make sure that the old domain bounds are sensible. We are only
   !! updating them based on the previous value. This also means
   !! the first call must already contain a sensible approximation
   !! (the equidistant (geometric) distribution suffices for that).
   verts = transpose(domain_old%bounds_unaligned)
   call jall%set_vertices(verts)
   call jall%balance()
   call jall%get_result_vertices(verts)
   domain_bounds%bounds_aligned = transpose(verts)
   domain_old%bounds_unaligned = domain_bounds%bounds_aligned
   domain_bounds%bounds_aligned = nint(domain_bounds%bounds_aligned/dh)*dh
   if(output) then
      call ALL_reset_error()
      call jall%print_vtk_outlines(step)
      if(ALL_error() /= 0) then
         error_msg = ALL_error_description()
         print*, "Error in ALL detected:"
         print*, error_msg
      endif
   endif
   first_run = .false.
   step = step + 1
   call assert_domain_width(domain_bounds, dh)
end function

!> Estimate local work
function lb_estimate_work(n_part, domain_bounds_old) result(work)
   integer, intent(in) :: n_part !< number of particles of this domain
   type(boundingbox_aligned), intent(in) :: domain_bounds_old !< domain bounds
   real(real_kind) :: work
   real(real_kind), parameter :: beta = 0.128 ! empirically determined
   work = n_part + beta*product( domain_bounds_old%bounds_aligned(2,:)-domain_bounds_old%bounds_aligned(1,:) )/grid%dh**3
end function
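The work model above combines a per-particle cost with a per-grid-cell cost, weighted by the empirically determined factor beta. A toy evaluation with made-up numbers (beta is taken from the listing; dh and the bounds are chosen only for illustration):

program work_model
   implicit none
   real(8), parameter :: beta = 0.128d0  ! empirical weight from above
   real(8), parameter :: dh = 0.5d0      ! grid width (assumed)
   real(8) :: bounds(2,3), cells, work
   integer :: n_part

   n_part = 10000
   bounds(1,:) = [0d0, 0d0, 0d0]
   bounds(2,:) = [8d0, 4d0, 4d0]
   cells = product(bounds(2,:) - bounds(1,:))/dh**3   ! 1024 grid cells

   ! particles dominate the cost; the grid term keeps nearly empty domains
   ! from being assigned vanishing work
   work = n_part + beta*cells
   print '(a,f10.1,a,f8.1,a)', 'work =', work, '  (grid term:', beta*cells, ')'
end program work_model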
To include the library and its VTK dependency in the existing make build system, the following snippets were used. This builds a ‘minimal’ VTK and links ALL against it. When linking the main simulation code, VTK is linked using ``$(VTK_LIB)``, where the order of the libraries is important. Calling make recursively in this Makefile is deprecated and should be replaced by appropriate calls to ``cmake --build`` and ``cmake --install``.
MJOBS ?= $(shell getconf _NPROCESSORS_ONLN)
JUELICH_ALL_INCLUDE := external/juelich_all_build/include/modules
JUELICH_ALL_LIB := external/juelich_all_build/lib/libALL_fortran.a external/juelich_all_build/lib/libALL.a
VTK_LIB := $(subst lib/,external/vtk_build/lib/, lib/libvtkFiltersProgrammable-7.1.a lib/libvtkIOParallelXML-7.1.a lib/libvtkIOXML-7.1.a lib/libvtkIOXMLParser-7.1.a lib/libvtkexpat-7.1.a lib/libvtkParallelMPI-7.1.a lib/libvtkParallelCore-7.1.a lib/libvtkIOLegacy-7.1.a lib/libvtkIOCore-7.1.a lib/libvtkCommonExecutionModel-7.1.a lib/libvtkCommonDataModel-7.1.a lib/libvtkCommonTransforms-7.1.a lib/libvtkCommonMisc-7.1.a lib/libvtkCommonMath-7.1.a lib/libvtkCommonSystem-7.1.a lib/libvtkCommonCore-7.1.a lib/libvtksys-7.1.a -ldl -lpthread lib/libvtkzlib-7.1.a)

# ...

# VTK
VTKCONFIG_FILE := external/vtk_build/lib/cmake/vtk-7.1/VTKConfig.cmake
$(VTKCONFIG_FILE):
	mkdir -p external/vtk_build
	cd external/vtk_build && CC=$(CC) CXX=$(CXX) cmake ../vtk -DCMAKE_INSTALL_PREFIX=`pwd` $(EXT_LTO_CMFLAGS) -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DVTK_Group_MPI=OFF -DVTK_Group_Rendering=OFF -DVTK_Group_StandAlone=OFF -DVTK_RENDERING_BACKEND=None -DVTK_USE_CXX11_FEATURES=ON -DModule_vtkCommonDataModel=ON -DModule_vtkFiltersProgrammable=ON -DModule_vtkIOParallelXML=ON -DModule_vtkParallelMPI=ON
	$(MAKE) -j $(MJOBS) -C external/vtk_build
	$(MAKE) -C external/vtk_build install

# juelich_all
$(JUELICH_ALL_LIB): $(VTKCONFIG_FILE)
	mkdir -p external/juelich_all_build
	cd external/juelich_all_build && CC=$(CC) CXX=$(CXX) cmake ../juelich_all $(EXT_LTO_CMFLAGS) -DCMAKE_Fortran_FLAGS="-Wall -Wextra -fbacktrace $(EXT_LTO)" -DCMAKE_INSTALL_PREFIX=`pwd` -DCM_ALL_FORTRAN=ON -DCM_ALL_USE_F08=$(ALL_USE_F08) -DCMAKE_BUILD_TYPE=Release -DCM_ALL_DEBUG=OFF -DCM_ALL_VTK_OUTPUT=ON -DVTK_DIR=`pwd`/../vtk_build/lib/cmake/vtk-7.1
	$(MAKE) -C external/juelich_all_build
	$(MAKE) -C external/juelich_all_build install