MPM Integration

The material point method (MPM) is used to simulate continuous matter and is especially suited for the simulation of large deformations. Once large deformation are present, a dynamic load balancing solution is sensible to efficiently simulate large systems. Even if the initial work distribution is good, it is very often the case, that this distribution is much less so during the simulation run itself. The load balancing library ALL provides an easy plug and play solution to this problem and this module describes the details in how the library is integrated. Thanks to the good load balancing provided by the library larger systems can be simulated with less computational cost.

Purpose of Module

This module shows the straight forwardness of including the load balancing library into already existing code. Depending on the simulation code additional precautions must be taken and those needed for the MPM simulation code are presented here. The prerequisites for the simulation code are also shown. Looking at these will help determine whether a simulation code is particularly suited for integrating ALL or if some further work is needed when integrating.

This module also shows a real world application of the Fortran interface provided with ALL (documented in ALL Fortran interface).

The MPM simulation code with integrated ALL is used by Stephan Schulz in his thesis.

Background Information

The load balancing library ALL is integrated into the material point method simulation code GMPM-PoC, which is written by Stephan Schulz during his thesis. The simulation code will be released to the public in the future.

Certain aspects of the simulation code require additional treatment of the library, or additional features of the library. First, the open boundaries of the simulation code require continuous updates of the outer domain boundaries of the boundary domains. The system extent is adapted to the particle’s bounding box each time step. This also means, the geometry generated in the last balance step by the library cannot be used directly. It is therefore saved by the simulation code, adapted to the new system extent and then given to the library as the basis for the new geometry.

The communication is based on grid halos and only accommodates nearest neighbor communication. This causes the minimum domain size to be the width of exactly this halo. The library supports this feature and only the aforementioned outer domain bounds must be checked for compliance with the minimum size. The other domain boundaries are automatically sufficiently large due to the library.

And lastly, the particle communication at the end of each time step also only accounts for nearest neighbor communication. This means, that a domain’s boundary must not change so much, that it needs to retrieve particles from a process that is not its nearest neighbor. Due to the way the library moves boundaries in the staggered grid and tensor approaches, this is also already guaranteed to be true. There is always an overlap between the old domain’s volume and the new domain’s.

However, the library also has a few requirements of the simulation code. Due to changing domains, particles must be able to be communicated across processes, which is implemented for all particle codes with moving particles. Depending on the load balancing method this communication may be more involved. In the case of the tensor method the topology does not change and every process has the same 26 neighbors during the entire simulation. If, however, the staggered grid approach is used, the communication must also handle changing number of neighbors and determine where they are and what regions they belong to. For example it is common, that one half of a boundary must be communicated to one process and the other to a different one. So the fixed relationship between boundaries and neighbors is broken up. The GMPM-PoC code was already designed with such a communication scheme in mind and provided the necessary flexibility to simply enable the staggered grid method after fixing a few communication bugs.

Building and Testing

To build the code just run make LB=ALL and everything should be build automatically including dependencies. Make sure the correct compiler are found in the path and if you want to use Intel compilers you need to set COMPILER=INTEL as well. The normal caveats and required modules for some HPC systems are the described in the main code’s README.

Source Code

The main changes are the replacement of the original domain decomposition function which used to equi partition the system extent. Now, ALL is called to update the domain geometry.

  1! The following additional functions were used:
  2!
  3! Additional information:
  4! - LB_METHOD_PRE is set by the preprocessor to ALL_STAGGERED or ALL_TENSOR.
  5! - The domain_bounds_old will only be used during initialisation for the
  6!   initial domain configuration.
  7! - ``this_image()`` returns the current image index, i.e. current MPI rank+1.
  8! - The work ist estimated using ``lb_estimate_work`` which takes the current
  9!   domain size and number of particles as arguments.
 10
 11    function domain_decomposition_jall(bounds, dh, num_images3, domain_bounds_old, work, output) result(domain_bounds)
 12        use ISO_C_BINDING
 13        type(boundingbox), intent(in) :: bounds !< simulation bounds
 14        real (kind = real_kind), intent(in) :: dh !< grid width
 15        integer, dimension(3), intent(in) :: num_images3 !< the 1 indexed number of images in 3D
 16        type(boundingbox_aligned), intent(in) :: domain_bounds_old !< current domain bounds
 17        real(real_kind), intent(in) :: work !< work of this domain
 18        logical, intent(in) :: output !< output domain bounds to `vtk_outline` directory
 19        type(boundingbox_aligned) :: domain_bounds
 20        type(boundingbox), save :: domain_old
 21        type(ALL_t), save :: jall ! ALL object which is initialized once
 22        real (kind = real_kind), dimension(3,2) :: verts
 23        integer, dimension(3), save :: this_image3 ! the 1 indexed image number in 3D
 24        logical, save :: first_run = .true.
 25        integer, save :: step = 1
 26        logical, dimension(2,3), save :: domain_at_sim_bound ! true if domain is at the lower/upper simulation boundary
 27        real (kind = real_kind), dimension(3), save :: min_size
 28        integer(c_int), parameter :: LB_METHOD = LB_METHOD_PRE
 29        character (len=ALL_ERROR_LENGTH) :: error_msg
 30        if(first_run) then
 31            ! calculate this_image3
 32            block
 33                integer :: x,y,z, cnt
 34                cnt = 1
 35                do z=1,num_images3(3)
 36                    do y=1,num_images3(2)
 37                        do x=1,num_images3(1)
 38                            if(this_image()==cnt) this_image3 = (/ x,y,z /)
 39                            cnt = cnt + 1
 40                        enddo
 41                    enddo
 42                enddo
 43            end block
 44            call jall%init(LB_METHOD,3,4.0d0)
 45            call jall%set_proc_grid_params(this_image3-1, num_images3)
 46            call jall%set_proc_tag(this_image())
 47            call jall%set_communicator(MPI_COMM_WORLD)
 48            min_size(:) = (abs(Rcont_min)+abs(Rcont_max))*dh
 49            call jall%set_min_domain_size(min_size)
 50            domain_old%bounds_unaligned = domain_bounds_old%bounds_aligned
 51            domain_at_sim_bound(1,:) = this_image3==1 ! first image in a direction is automatically at sim bound
 52            domain_at_sim_bound(2,:) = this_image3==num_images3 ! last image likewise at sim bound
 53            call jall%setup()
 54        endif
 55        call jall%set_work(real(work,real_kind))
 56        !! The `domain_old` bounds are not the actual domain bounds, which
 57        !! are aligned to grid widths, but what we got from the previous
 58        !! iteration of load balancing. However, the simulation boundaries are
 59        !! unchanged by the load balancing.
 60        block
 61            type(boundingbox_aligned) :: aligned_bnds
 62            real (kind = real_kind), dimension(3) :: lowest_upper_bound, highest_lower_bound
 63            !> Align the simulation boundaries to the grid and add an additional
 64            !! grid width on the top. These may be used instead of our current
 65            !! bounds, so they should align properly on the upper bound, if we
 66            !! are a simulation boundary. If the simulation bounds have not
 67            !! changed they should still coincide with the domain bounds.
 68            aligned_bnds%bounds_aligned = floor(bounds%bounds_unaligned/dh)*dh
 69            aligned_bnds%bounds_aligned(2,:) = aligned_bnds%bounds_aligned(2,:) + dh
 70            !> To make sure, the shrinking domain is still always large enough
 71            !! and in particular is not shrunk into the neighbouring domain.
 72            !! This can happen if the bounding box is not present in the current
 73            !! domain, so the outer bound is moved across the inner bound. This
 74            !! must be avoided at all cost. Additionally, we also need to ensure
 75            !! the minimum domain width. Also, the outer bound of all boundary
 76            !! domains, must be the same. To achieve this, the outermost inner
 77            !! bound is calculated in each direction. This then allows us to
 78            !! compute the innermost position any outer bound may have to still
 79            !! be the required distance from every next inner bound.
 80            ! For the lowest domains:
 81            lowest_upper_bound = comm_co_min_f(domain_old%bounds_unaligned(2,:))
 82            aligned_bnds%bounds_aligned(1,:) = min(lowest_upper_bound-min_size, aligned_bnds%bounds_aligned(1,:))
 83            ! For the highest domains:
 84            highest_lower_bound = comm_co_max_f(domain_old%bounds_unaligned(1,:))
 85            aligned_bnds%bounds_aligned(2,:) = max(highest_lower_bound+min_size, aligned_bnds%bounds_aligned(2,:))
 86            ! And now set the boundary domains outer bounds to the new, fixed bounds
 87            where(domain_at_sim_bound)
 88                domain_old%bounds_unaligned = aligned_bnds%bounds_aligned
 89            end where
 90        end block
 91        !> Make sure that the old domain bounds are sensible. we are only
 92        !! updating them, based in the previous value. This also means
 93        !! the first call must already contain a sensible approximation
 94        !! (the equidistant (geometric) distribution suffices for that).
 95        verts = transpose(domain_old%bounds_unaligned)
 96        call jall%set_vertices(verts)
 97        call jall%balance()
 98        call jall%get_result_vertices(verts)
 99        domain_bounds%bounds_aligned = transpose(verts)
100        domain_old%bounds_unaligned = domain_bounds%bounds_aligned
101        domain_bounds%bounds_aligned = nint(domain_bounds%bounds_aligned/dh)*dh
102        if(output) then
103            call ALL_reset_error()
104            call jall%print_vtk_outlines(step)
105            if(ALL_error() /= 0) then
106                error_msg = ALL_error_description()
107                print*, "Error in ALL detected:"
108                print*, error_msg
109            endif
110        endif
111        first_run = .false.
112        step = step + 1
113        call assert_domain_width(domain_bounds, dh)
114    end function
115
116    !> Estimate local work
117    function lb_estimate_work(n_part, domain_bounds_old) result(work)
118        integer, intent(in) :: n_part !< number of particles of this domain
119        type(boundingbox_aligned), intent(in) :: domain_bounds_old !< domain bounds
120        real(real_kind) :: work
121        real(real_kind), parameter :: beta = 0.128 ! empirically determined
122        work = n_part + beta*product( domain_bounds_old%bounds_aligned(2,:)-domain_bounds_old%bounds_aligned(1,:) )/grid%dh**3
123    end function

To include the library and its VTK dependency into the existing make build system, the following snippets were used. This builds a ‘minimal’ VTK and links ALL against this. During the linking of the main simulation code VTK is linked using $(VTK_LIB) where the order is very important. The calling of make in this Makefile is deprecated and should be replaced by appropriate calls to cmake --build and cmake --install.

 1MJOBS ?= $(shell getconf _NPROCESSORS_ONLN)
 2JUELICH_ALL_INCLUDE := external/juelich_all_build/include/modules
 3JUELICH_ALL_LIB := external/juelich_all_build/lib/libALL_fortran.a external/juelich_all_build/lib/libALL.a
 4VTK_LIB := $(subst lib/,external/vtk_build/lib/, lib/libvtkFiltersProgrammable-7.1.a lib/libvtkIOParallelXML-7.1.a lib/libvtkIOXML-7.1.a lib/libvtkIOXMLParser-7.1.a lib/libvtkexpat-7.1.a lib/libvtkParallelMPI-7.1.a lib/libvtkParallelCore-7.1.a lib/libvtkIOLegacy-7.1.a lib/libvtkIOCore-7.1.a lib/libvtkCommonExecutionModel-7.1.a lib/libvtkCommonDataModel-7.1.a lib/libvtkCommonTransforms-7.1.a lib/libvtkCommonMisc-7.1.a lib/libvtkCommonMath-7.1.a lib/libvtkCommonSystem-7.1.a lib/libvtkCommonCore-7.1.a lib/libvtksys-7.1.a -ldl -lpthread lib/libvtkzlib-7.1.a)
 5
 6# ...
 7
 8# VTK
 9VTKCONFIG_FILE := external/vtk_build/lib/cmake/vtk-7.1/VTKConfig.cmake
10$(VTKCONFIG_FILE):
11	mkdir -p external/vtk_build
12	cd external/vtk_build && CC=$(CC) CXX=$(CXX) cmake ../vtk -DCMAKE_INSTALL_PREFIX=`pwd` $(EXT_LTO_CMFLAGS) -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DVTK_Group_MPI=OFF -DVTK_Group_Rendering=OFF -DVTK_Group_StandAlone=OFF -DVTK_RENDERING_BACKEND=None -DVTK_USE_CXX11_FEATURES=ON -DModule_vtkCommonDataModel=ON -DModule_vtkFiltersProgrammable=ON -DModule_vtkIOParallelXML=ON -DModule_vtkParallelMPI=ON
13	$(MAKE) -j $(MJOBS) -C external/vtk_build
14	$(MAKE) -C external/vtk_build install
15
16# juelich_all
17$(JUELICH_ALL_LIB): $(VTKCONFIG_FILE)
18	mkdir -p external/juelich_all_build
19	cd external/juelich_all_build && CC=$(CC) CXX=$(CXX) cmake ../juelich_all $(EXT_LTO_CMFLAGS) -DCMAKE_Fortran_FLAGS="-Wall -Wextra -fbacktrace $(EXT_LTO)" -DCMAKE_INSTALL_PREFIX=`pwd` -DCM_ALL_FORTRAN=ON -DCM_ALL_USE_F08=$(ALL_USE_F08) -DCMAKE_BUILD_TYPE=Release -DCM_ALL_DEBUG=OFF -DCM_ALL_VTK_OUTPUT=ON -DVTK_DIR=`pwd`/../vtk_build/lib/cmake/vtk-7.1
20	$(MAKE) -C external/juelich_all_build
21	$(MAKE) -C external/juelich_all_build install