Linear Algebra
divERGe exports some of its linear algebra functions to the C interface. They are useful because they provide convenient wrappers around annoying CUDA functions when compiling with GPU support.
-
void batched_eigen_r(int *devices, int ndevices, complex128_t *H, double *E, index_t dim, index_t Num)
[source] batched eigensolver for Hermitian matrices, good for many small matrices.
- Parameters:
devices (int*) – list of CUDA devices; may be NULL, in which case control is achieved through ndevices: if ndevices == 0, do CPU eigensolution; if ndevices == -1, do GPU eigensolution with automatic device find; if ndevices == -2, do GPU eigensolution with automatic device find and automatic chunking. If CUDA is not compiled in, devices is ignored.
ndevices (int) – number of CUDA devices. If devices != NULL and ndevices < 1, do automatic setup as if devices == NULL. Otherwise, use the device list provided in devices.
H (complex128_t*) – input and output for the matrices to diagonalize. Matrices must be contiguous in memory and ordered as (i,o,p), where the latter two indices correspond to the matrix rows p and columns o (i.e. FORTRAN order). On output, the eigenvectors are stored as (i,b,o), with i the index of the eigenproblem, b the index of the eigenvalue, and o the vector component.
E (double*) – output array for the eigenvalues. Given as (i,b) in memory, with each individual eigenproblem sorted in standard fashion.
dim – linear size of each individual problem (b, o, p)
Num – number of eigenproblems (i)
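The layout conventions can be made concrete with small index helpers. This is a sketch for illustration only; the helper names are hypothetical and not part of the divERGe API:

```c
#include <stddef.h>

/* Input layout of H for batched_eigen_r(): (i,o,p) with p the matrix row
 * and o the matrix column of eigenproblem i, p fastest. Since rows are
 * contiguous, each dim*dim matrix is FORTRAN (column-major) ordered. */
static size_t H_index(size_t i, size_t o, size_t p, size_t dim) {
    return (i * dim + o) * dim + p;
}

/* On output, component o of eigenvector b of problem i sits at (i,b,o),
 * and the corresponding eigenvalue at (i,b) in E. */
static size_t U_index(size_t i, size_t b, size_t o, size_t dim) {
    return (i * dim + b) * dim + o;
}
static size_t E_index(size_t i, size_t b, size_t dim) {
    return i * dim + b;
}
```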
-
void batched_eigen_r_set_eigen3_mode(int use_eigen3)
[source] force the CPU eigensolver built with Eigen3 for batched_eigen_r()
-
void batched_eigen_set_nchunks(int nchunks)
[source] set the number of chunks to use in the batched eigen CUDA backend (which operates with 32-bit indices, thus enforcing an additional chunking). Not thread safe.
-
void batched_eigen_set_nthreads(int nthr)
[source] we need this (annoying) function because some LAPACKE implementations won’t accept running with the implicitly set number of OMP threads
-
void batched_eigen_shut_up(int silence)
[source] disable or enable all mpi_..._printf() messages in batched_eigen() and batched_eigen_r(). Not thread safe.
-
void batched_svd(int *dev, int n_dev, complex128_t *A, complex128_t *U, complex128_t *V, double *S, index_t dim, index_t cnt)
[source] batched SVD on square matrices. If n_dev < 0, use GPU with automatic devicing. If n_dev == 0, use CPU. If CUDA is not compiled in, always use CPU. Notably, there is no batched SVD algorithm in CUSOLVER. Therefore, we manually up/download the matrices to the GPUs, making this solver efficient as an eigensolver for larger matrices (see batched_eigen() for small matrices).
Special features:
if n_dev == INT_MIN, do eigensolution instead of SVD. The result is written to U, and V is not touched.
if n_dev == INT_MIN+1, same as above, but force the CPU algorithm even if CUDA is compiled in.
U and V may be NULL, in which case nothing is copied into the output arrays. One of U or V may be the same as A, in which case the input is overwritten.
-
void batched_gemm_overlapping(const complex128_t *Astart, const complex128_t *Bstart, complex128_t *Cstart, index_t N, complex128_t alpha, complex128_t beta, index_t num)
[source] C may overlap with A/B
-
void batched_gemm(const complex128_t *Astart, const complex128_t *Bstart, complex128_t *Cstart, index_t N, complex128_t alpha, complex128_t beta, index_t num)
[source] C must not overlap with A/B
-
void batched_gemm_with_buffers(const complex128_t *Astart, const complex128_t *Bstart, complex128_t *Cstart, char *GPUbuffer[32][3], index_t N, complex128_t alpha, complex128_t beta, index_t num)
[source] non-overlapping, with GPU buffers in order to be cheap on allocations
-
void batched_gemm_vertex_loop(complex128_t *vertex, const complex128_t *loop, index_t qsize, index_t nk, index_t nb)
[source] specialized function for the vertex·loop product in the grid code
-
void batched_gemm_loop_vertex(const complex128_t *loop, complex128_t *vertex, index_t qsize, index_t nk, index_t nb)
[source] specialized function for the loop·vertex product in the grid code
-
void batched_gemm_N_ll_num(const complex128_t *A, const complex128_t *B, complex128_t *C, index_t N, complex128_t alpha, complex128_t beta, index_t num, bool Aconj, bool Bconj)
[source] assumes col-major order of each individual matrix, A and B may overlap
-
batched_gemm_small_handle_t *batched_gemm_small_init(index_t n, index_t num)
[source] initialization
-
void batched_gemm_small_destroy(batched_gemm_small_handle_t *h)
[source] free resources
-
enum batched_gemm_small_mode_t
[source] collection of operation modes for batched_gemm_small()
-
void batched_gemm_small_mode(batched_gemm_small_handle_t *h, batched_gemm_small_mode_t mode)
[source] to change the automatically generated mode (either CPU or GPU depending on how the compilation went) to something else, we can use this function on an existing handle
-
void batched_gemm_small(batched_gemm_small_handle_t *h, const complex128_t *A, const complex128_t *B, complex128_t *C)
[source] assumes C ordering for all matrices, with one matrix being in memory straight after the other and no additional strides within the matrices.
-
void batched_zgemd_r(complex128_t *out, const complex128_t *U, const double *E, complex128_t alpha, complex128_t beta, index_t N, index_t num)
[source] calculate batched ‘zgemd’ product
\(O = U * \mathrm{diag}(E) * U^\dagger * \alpha + O * \beta\)
for the arrays out( num, N, N ) (\(O\)), U( num, N, N ), E( num, N )
Note
currently, no GPU version is implemented.
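The 'zgemd' product above can be written out as a naive reference loop. This is a sketch only (the function name is hypothetical and double complex stands in for complex128_t); it assumes row-major storage within each N×N matrix, with eigenvector b in column b of U:

```c
#include <complex.h>
#include <stddef.h>

/* Reference (unoptimized) sketch of
 *   O = alpha * U * diag(E) * U^dagger + beta * O
 * for out(num,N,N), U(num,N,N), E(num,N). */
static void zgemd_r_ref(double complex *out, const double complex *U,
                        const double *E, double complex alpha,
                        double complex beta, ptrdiff_t N, ptrdiff_t num) {
    for (ptrdiff_t i = 0; i < num; ++i) {
        const double complex *Ui = U + i*N*N;
        const double *Ei = E + i*N;
        double complex *Oi = out + i*N*N;
        for (ptrdiff_t r = 0; r < N; ++r)
        for (ptrdiff_t c = 0; c < N; ++c) {
            double complex acc = 0.0;
            for (ptrdiff_t b = 0; b < N; ++b) /* sum over eigenvalues */
                acc += Ui[r*N+b] * Ei[b] * conj(Ui[c*N+b]);
            Oi[r*N+c] = alpha*acc + beta*Oi[r*N+c];
        }
    }
}
```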
-
void single_eigen(const complex128_t *H, complex128_t *U, double *E, index_t N)
[source] solve a single eigen problem for densely packed matrices in C order
-
void single_svd(const complex128_t *H, complex128_t *U, complex128_t *V, double *S, index_t N)
[source] solve a single SVD for densely packed matrices in C order. If N < 0, use the non-Hermitian solver (dim = |N|) and copy the real part of the eigenvalues.
-
void single_eigen_sort(complex128_t *U, double *E, index_t N, char which)
[source] sort eigenvectors and eigenvalues of one eigenproblem according to a given algorithm. Notably, the sort function expects FORTRAN ordered eigenvectors, whereas the single_eigen() and single_svd() functions return C ordered eigenvectors.
Sorting options (which):
0: default ('M')
'A': alternating (max negative, max positive, …)
'P': max positive
'N': max negative
'M': magnitude
'm': inverse magnitude
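As an illustration of the 'M' ordering, a comparator can sort eigenvalues by descending absolute value. This is a hypothetical sketch (the comparator name is made up, descending order is an assumption based on 'm' being called "inverse magnitude", and the real function additionally permutes the eigenvectors):

```c
#include <stdlib.h>
#include <math.h>

/* qsort comparator: larger |E| first, as assumed for the 'M' option */
static int cmp_mag_desc(const void *pa, const void *pb) {
    double a = fabs(*(const double *)pa);
    double b = fabs(*(const double *)pb);
    return (a < b) - (a > b);
}
/* usage: qsort(E, N, sizeof(double), cmp_mag_desc); */
```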
-
index_t single_eigen_sort_part(complex128_t *U, double *E, index_t N, char which, index_t num)
[source] same as single_eigen_sort(), but only copy over the first num values. Return the number of values written.
-
index_t single_eigen_sort_N(complex128_t *U, double *E, index_t N, char which, double abs_limit)
[source] same as single_eigen_sort() in terms of the sorting order given by which. In addition to the sorting order, we allow for a relative or absolute cutoff in abs_limit: if abs_limit < 0, use a relative cutoff; if abs_limit > 0, use an absolute cutoff. Returns the number of elements that satisfy the cutoff condition. If abs_limit is all zero bytes, do not cut off the eigensolution.
-
complex128_t power_iteration(const complex128_t *A, complex128_t *v, index_t N, complex128_t *aux, index_t *niter, double *tol, const int *iseed)
[source] do power iteration for the maximum eigenvector of matrix A (N×N, row major). Save the eigenvector in v. The auxiliary buffer aux may be NULL or of size 2*N. If niter != NULL, use the value pointed to as the maximum number of iterations. If tol != NULL, use the value pointed to as the minimum tolerance. If iseed != NULL, use these four integers as seed to construct the random vector; the last integer must be odd.
-
void single_gemm(const complex128_t *A, const complex128_t *B, complex128_t *C, index_t M, index_t N, index_t K, complex128_t alpha, complex128_t beta)
[source] calculates the matrix product \(C = \beta C + \alpha A \cdot B\). A(N,K), B(K,M) and C(N,M) are matrices in C order (Row Major).
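The documented layout can be pinned down with a naive reference loop (the function name is hypothetical; double complex stands in for complex128_t):

```c
#include <complex.h>
#include <stddef.h>

/* Naive reference for C = beta*C + alpha*A*B with row-major
 * A(N,K), B(K,M), C(N,M), mirroring the documented single_gemm layout. */
static void gemm_ref(const double complex *A, const double complex *B,
                     double complex *C, ptrdiff_t M, ptrdiff_t N,
                     ptrdiff_t K, double complex alpha,
                     double complex beta) {
    for (ptrdiff_t r = 0; r < N; ++r)
    for (ptrdiff_t c = 0; c < M; ++c) {
        double complex acc = 0.0;
        for (ptrdiff_t k = 0; k < K; ++k)
            acc += A[r*K+k] * B[k*M+c];
        C[r*M+c] = beta * C[r*M+c] + alpha * acc;
    }
}
```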
-
void single_gemm_serial(const complex128_t *A, const complex128_t *B, complex128_t *C, index_t N, index_t M, index_t K, complex128_t alpha, complex128_t beta)
[source] calculate the matrix product \(C(N,K) = \alpha \, A(N,M) \cdot B(M,K) + \beta \, C(N,K)\) with all matrices in C order (Row Major), but with an Eigen3 backend (so no parallelization issues, but less ideal runtime).
-
void single_gemm_adjoint(const complex128_t *A, const complex128_t *B, complex128_t *C, index_t M, index_t N, index_t K, complex128_t alpha, complex128_t beta)
[source] calculates the matrix product \(C = \beta C + \alpha A \cdot B^\dagger\). A(N,K), B(M,K) and C(N,M) are matrices in C order (Row Major).
-
void single_square_gemm(const complex128_t *A, int AconjTrans, const complex128_t *B, int BconjTrans, complex128_t *C, index_t n)
[source] calculates the GEMM \(C_{12} = A_{13} B_{32}\) for square matrices stored in C order, i.e., row-major. Allows Hermitian transposition on either A or B (if the corresponding integer argument is nonzero).
-
void transpose_inplace(complex128_t *A, index_t N)
[source] in-place transposition of square matrix
A(dimension NxN)
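A straightforward way to do this, shown here as a hypothetical reference (not the library implementation), is to swap the upper and lower triangles:

```c
#include <complex.h>
#include <stddef.h>

/* In-place transpose of a row-major NxN matrix by swapping the strictly
 * upper triangle with the strictly lower triangle. */
static void transpose_inplace_ref(double complex *A, ptrdiff_t N) {
    for (ptrdiff_t r = 0; r < N; ++r)
        for (ptrdiff_t c = r + 1; c < N; ++c) {
            double complex t = A[r*N+c];
            A[r*N+c] = A[c*N+r];
            A[c*N+r] = t;
        }
}
```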
-
void transpose_outofplace(complex128_t *B, const complex128_t *A, index_t N)
[source] out-of-place transposition of square matrix A into B (dimension N×N)
-
void transpose_conj_outofplace(complex128_t *B, const complex128_t *A, index_t N)
[source] out-of-place adjoint of square matrix A into B (dimension N×N)
-
void matrix_invert(const double *A, index_t sz, double *Ainv)
[source] invert the (sz×sz) matrix A and output the result to Ainv
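A small Gauss-Jordan elimination with partial pivoting illustrates what such an inversion does; this is a hypothetical sketch for tiny matrices, not the library's backend:

```c
#include <math.h>
#include <stddef.h>

/* Invert a row-major sz x sz matrix via Gauss-Jordan on [A | I],
 * with partial pivoting. Assumes sz <= 8. Returns -1 if singular. */
static int invert_ref(const double *A, ptrdiff_t sz, double *Ainv) {
    double M[8][16];
    for (ptrdiff_t r = 0; r < sz; ++r)
        for (ptrdiff_t c = 0; c < sz; ++c) {
            M[r][c] = A[r*sz+c];
            M[r][sz+c] = (r == c) ? 1.0 : 0.0;   /* identity block */
        }
    for (ptrdiff_t col = 0; col < sz; ++col) {
        ptrdiff_t piv = col;                      /* partial pivoting */
        for (ptrdiff_t r = col+1; r < sz; ++r)
            if (fabs(M[r][col]) > fabs(M[piv][col])) piv = r;
        if (fabs(M[piv][col]) < 1e-14) return -1; /* singular */
        if (piv != col)
            for (ptrdiff_t c = 0; c < 2*sz; ++c) {
                double t = M[col][c]; M[col][c] = M[piv][c]; M[piv][c] = t;
            }
        double d = M[col][col];
        for (ptrdiff_t c = 0; c < 2*sz; ++c) M[col][c] /= d;
        for (ptrdiff_t r = 0; r < sz; ++r) if (r != col) {
            double f = M[r][col];
            for (ptrdiff_t c = 0; c < 2*sz; ++c) M[r][c] -= f * M[col][c];
        }
    }
    for (ptrdiff_t r = 0; r < sz; ++r)
        for (ptrdiff_t c = 0; c < sz; ++c)
            Ainv[r*sz+c] = M[r][sz+c];            /* right block = A^-1 */
    return 0;
}
```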
-
void cmatrix_invert(const complex128_t *A, index_t sz, complex128_t *Ainv)
[source] same as matrix_invert() but for complex numbers
-
void cfmatrix_invert_F(const gf_complex_t *A, index_t sz, gf_complex_t *Ainv)
[source] same as cmatrix_invert(), but for FORTRAN ordering and, depending on the compilation settings, different precision.
-
complex128_t cmatrix_determinant(const complex128_t *A, index_t sz)
[source] calculate the determinant of a complex square matrix
-
LINALG_IMPLEMENT
[source] Linear Algebra for low dimensions in C
In many parts of divERGe, linear algebra for low-dimensional vector spaces is required. We provide a collection of functions (inline accessible from C, compiled accessible from both C and C++) that are implemented through a macro. So, in your source code, you have to call the implementation macro after including linalg.h:
#include "linalg.h"
LINALG_IMPLEMENT
This gives you basic linear algebra functionality for double (d), complex double (cd), integer (i) and long long integer (ll) vectors and matrices in 2, 3, and 4 dimensions. More dimensions can be implemented with
LINALG_IMPL( LINALG_PREFIX, 7 )
where LINALG_PREFIX is expanded to cla (other prefixes may be used by either defining LINALG_PREFIX or giving something else to LINALG_IMPL), and 7 represents the dimension. If you want to only declare functions and not implement them (e.g. for C++), you can write
LINALG_TYPES( LINALG_PREFIX, 7 )
This statement declares linear algebra functions in dimension 7. Notably, the default symbol visibility is set to LINALG_EXPORT, which expands to static inline. If you want to use the functions in C++, you have to redefine LINALG_EXPORT to something else (how about extern "C"?) in your C++ file that includes linalg.h, as well as in your C file that includes linalg.h in order to define the functions declared to C++.
For enhanced portability, we provide the option to disable decomposition support (which relies on LAPACKE). When LINALG_NO_LAPACKE is defined, the decomposition functions are declared, but their definitions are empty.
The function names are made from a prefix P, a vector/matrix indicator X (can be V or M), a dimension indicator D (an integer), and a datatype indicator T (the four datatype indicators are d: double, cd: complex128_t, i: int, ll: int64_t). The convention then is PXDT. For example, if the prefix is cla (the default), all functions operating on three-dimensional double vectors begin with claV3d, and the three-dimensional vector data type is called claV3d_t.
Each macro invocation of LINALG_IMPL or LINALG_TYPES declares/defines the following types and functions:
Vectors: PVDT_t
The structure PVDT_t (e.g. claV3d_t for a three-dimensional vector of doubles) has one static array of datatype T as member: T v[D]. This member holds the vector's data. The following vector functions are available:
PVDT_t PVDT_zero( void ): generate a zero vector
PVDT_t PVDT_const( T val ): generate a constant vector, where each entry is set to val
PVDT_t PVDT_map( const T* p ): map data to a vector (copy)
void PVDT_rmap( PVDT_t v, T* p ): map vector to data (copy)
PVDT_t PVDT_scale( PVDT_t v, T s ): return \(\boldsymbol{v} \, s\)
PVDT_t PVDT_add( PVDT_t a, PVDT_t b ): return \(\boldsymbol{a} + \boldsymbol{b}\)
PVDT_t PVDT_sub( PVDT_t a, PVDT_t b ): return \(\boldsymbol{a} - \boldsymbol{b}\)
PVDT_t PVDT_conj( PVDT_t a ): return \(\boldsymbol{a}^*\)
PVDT_t PVDT_odot( PVDT_t a, PVDT_t b ): return the element-wise product \(\boldsymbol{a} \circ \boldsymbol{b}\)
T PVDT_dot( PVDT_t a, PVDT_t b ): scalar product \(\boldsymbol{a} \cdot \boldsymbol{b}\)
double PVDT_norm2( PVDT_t a ): squared norm \(\boldsymbol{a} \cdot \boldsymbol{a}^*\)
double PVDT_norm( PVDT_t a ): square root of squared norm
PVDT_t PVDT_cross( PVDT_t a, PVDT_t b ): return the cross product \(\boldsymbol{a} \times \boldsymbol{b}\)
Matrices: PMDT_t
The structure PMDT_t (e.g. claM3d_t for a \(3\times3\) matrix of doubles) has one two-dimensional static array of datatype T as member: T m[D][D]. This member holds the matrix's data. The following matrix(-vector) functions are available:
PMDT_t PMDT_zero( void ): zero-init a matrix
PMDT_t PMDT_id( void ): identity-init a matrix
PMDT_t PMDT_map( const T* p ): map data to a matrix (copy)
void PMDT_rmap( PMDT_t m, T* p ): map matrix to data (copy)
PMDT_t PMDT_scale( PMDT_t v, T s ): return \(\hat{v} \, s\)
PMDT_t PMDT_add( PMDT_t a, PMDT_t b ): return \(\hat{a} + \hat{b}\)
PMDT_t PMDT_sub( PMDT_t a, PMDT_t b ): return \(\hat{a} - \hat{b}\)
PMDT_t PMDT_conj( PMDT_t a ): return \(\hat{a}^*\)
PMDT_t PMDT_odot( PMDT_t a, PMDT_t b ): return \(\hat{a} \circ \hat{b}\)
T PMDT_dot( PMDT_t a, PMDT_t b ): return \(\mathrm{Tr}(\hat{a} \circ \hat{b})\)
double PMDT_norm2( PMDT_t a ): return squared norm \(\mathrm{Tr}(\hat{a} \circ \hat{a}^*)\)
double PMDT_norm( PMDT_t a ): return square root of squared norm
PMDT_t PMDT_matmul( PMDT_t a, PMDT_t b ): return matrix product \(\hat{a} \cdot \hat{b}\)
T PMDT_trace( PMDT_t a ): return trace \(\mathrm{Tr}(\hat a)\)
PVDT_t PMDT_matvec( PMDT_t a, PVDT_t v ): return matrix vector product \(\hat{a} \cdot \boldsymbol{v}\)
PMDT_t PMDT_transpose( PMDT_t a ): return transposed matrix \(\hat{a}^T\)
PMDT_t PMDT_adjoint( PMDT_t a ): return adjoint matrix \(\hat{a}^{T*}\)
PVDT_t PMDT_vecmat( PVDT_t v, PMDT_t a ): return vector matrix product \((\boldsymbol{v}^T \cdot \hat{a})^T\)
PVDd_t PMDT_eigvalsh( PMDT_t H ): return eigenvalues of Hermitian matrix \(\hat{H}\) (only floating point)
PVDd_t PMDT_eigh( PMDT_t H, PMDT_t* pU ): return eigenvalues of Hermitian matrix \(\hat{H}\) and eigenvectors in pU
PVDd_t PMDT_svd( PMDT_t H, PMDT_t* pU, PMDT_t* pVT ): perform an SVD of the matrix \(\hat{H} = \hat{U} \cdot \mathrm{diag}(\boldsymbol{S}) \cdot \hat{V}^T\); return singular values \(\boldsymbol{S}\) (floating point) and left singular vectors \(\hat{U}\) (pU) as well as right singular vectors \(\hat{V}^T\) (pVT)
To get used to this mini-library, you can have a look at e.g. src/misc/hoppings_to_supercell.c (mostly integers, i.e., claV3i_t) or src/diverge_symmetrize_generate_symm_maps.c (three-dimensional double vectors, i.e., claV3d_t). It feels a bit like stack-only Eigen3, but for C (so functions and no overloads).
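To give a feel for what the macro generates, here is a hand-written equivalent of a few of the three-dimensional double vector functions. The myV3d prefix is deliberately not cla, to stress that this is an illustrative sketch and not the macro output:

```c
/* Hand-written illustration of the PVDT pattern for D = 3, T = double,
 * mirroring the claV3d naming convention. */
typedef struct { double v[3]; } myV3d_t;

static myV3d_t myV3d_add(myV3d_t a, myV3d_t b) {
    myV3d_t r;
    for (int i = 0; i < 3; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}
static double myV3d_dot(myV3d_t a, myV3d_t b) {
    double s = 0.0;
    for (int i = 0; i < 3; ++i) s += a.v[i] * b.v[i];
    return s;
}
static myV3d_t myV3d_cross(myV3d_t a, myV3d_t b) {
    myV3d_t r = { { a.v[1]*b.v[2] - a.v[2]*b.v[1],
                    a.v[2]*b.v[0] - a.v[0]*b.v[2],
                    a.v[0]*b.v[1] - a.v[1]*b.v[0] } };
    return r;
}
```

Everything is passed and returned by value, which is what makes the library feel like stack-only Eigen3 for C.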