Runtime Tuning
Besides the obvious tuning parameters exposed through the compile-time options (see Compilation DEFINES, CUDA Compilation, and MPI Compilation), divERGe enables fine-grained CPU parallelization control using the following environment variables:

- DIVERGE_OMP_NUM_THREADS=VAL: set the (maximum) number of OpenMP threads that execute parallel regions in the core components of divERGe to VAL. This specifically excludes parallelism in components like FFTW, OpenBLAS, etc. All loops within the source code are equipped with a num_threads() OpenMP clause. For user programs, a function that returns this number of threads is exported as diverge_omp_num_threads().
- DIVERGE_FFTW_NUM_THREADS=VAL: set the number of FFTW threads to VAL.
- DIVERGE_OMP_IMPLICIT_NUM_THREADS=VAL: set the number of OpenMP threads to VAL for all OpenMP regions that do not specify num_threads(). Especially useful when linking to, e.g., OpenBLAS, where thread control is not possible apart from this global, implicit OpenMP setting.
- DIVERGE_SHARED_MALLOC_EXTRA_COLORS=VAL: boolean flag to control additional splitting of shared-memory communicators. May be beneficial for performance in some cases, and is very useful for debugging.
- DIVERGE_SYMCHECK_MPI: pass the divERGe symmetry maps among MPI ranks in round-robin fashion and then check whether they are equal. Useful for debugging MPI-related symmetry issues. The MPI communication function is called upon symmetry map creation.
- DIVERGE_TU_SYMCHECK_MPI: same as DIVERGE_SYMCHECK_MPI, but for TUFRG-specific data structures.
- DIVERGE_BATCHED_EIGEN3: enforce the batched eigensolver on the CPU (compiled with Eigen3) even if CUDA support is compiled in.