Linear Algebra

divERGe exports some of its linear algebra functions to the C interface. They are useful because they provide convenient wrappers around the more cumbersome CUDA functions when compiling with GPU support.

void batched_eigen_r(int *devices, int ndevices, complex128_t *H, double *E, index_t dim, index_t Num)
[source]

batched eigensolver for Hermitian matrices, good for many small matrices.

Parameters:
  • devices (int*) – list of CUDA devices. may be NULL, in which case control is exercised through ndevices: if ndevices == 0, do CPU eigensolution; if ndevices == -1, do GPU eigensolution with automatic device discovery; if ndevices == -2, do GPU eigensolution with automatic device discovery and automatic chunking. If CUDA is not compiled in, devices is ignored.

  • ndevices (int) – number of CUDA devices. If devices != NULL and ndevices < 1, do automatic setup as if devices == NULL. Otherwise, use the device list provided in devices.

  • H (complex128_t*) – input and output for the matrices to diagonalize. Matrices must be contiguous in memory and ordered as (i,o,p), where the latter two indices correspond to the matrix rows p and columns o (i.e. FORTRAN order). On output, the eigenvectors are stored as (i,b,o) with i the index of the eigenproblem, b the index of the eigenvalue, and o the vector component.

  • E (double*) – output array for the eigenvalues. Given as (i,b) in memory, with each individual eigenproblem sorted in standard fashion.

  • dim – linear size of each individual problem (b, o, p)

  • Num – number of eigenproblems (i)

void batched_eigen_r_set_eigen3_mode(int use_eigen3)
[source]

force use of the Eigen3-based CPU eigensolver in batched_eigen_r()

int batched_eigen_r_get_eigen3_mode(void)
[source]

get the currently active Eigen3 mode

void batched_eigen_set_nchunks(int nchunks)
[source]

set the number of chunks to use in the batched eigen CUDA backend (which operates with 32-bit indices, thus enforcing additional chunking). not thread safe.

int batched_eigen_get_nchunks(void)
[source]

get the number of chunks for CUDA. not thread safe.

void batched_eigen_set_nthreads(int nthr)
[source]

we need this (annoying) function because some LAPACKE implementations won’t accept running with the implicitly set number of OMP threads

void batched_eigen_shut_up(int silence)
[source]

disable or enable all mpi_..._printf() messages in batched_eigen() and batched_eigen_r(). not thread safe.

void batched_svd(int *dev, int n_dev, complex128_t *A, complex128_t *U, complex128_t *V, double *S, index_t dim, index_t cnt)
[source]

batched SVD on square matrices. if n_dev < 0, use GPU with automatic device selection. if n_dev == 0, use CPU. if CUDA is not compiled in, always use CPU. Notably, there is no batched SVD algorithm in CUSOLVER; therefore, we manually up-/download the matrices to the GPUs, making this solver efficient as an eigensolver for larger matrices (see batched_eigen() for small matrices).

special features:

  • if n_dev == INT_MIN, do eigensolution instead of SVD. The result is written to U, and V is not touched.

  • if n_dev == INT_MIN+1 same as above, but force CPU algorithm even if CUDA is compiled in

U and V may be NULL, in which case nothing is copied into the output arrays. one of U or V may be the same as A, in which case the input is overwritten.

void batched_gemm_overlapping(const complex128_t *Astart, const complex128_t *Bstart, complex128_t *Cstart, index_t N, complex128_t alpha, complex128_t beta, index_t num)
[source]

C may overlap with A/B

void batched_gemm(const complex128_t *Astart, const complex128_t *Bstart, complex128_t *Cstart, index_t N, complex128_t alpha, complex128_t beta, index_t num)
[source]

C must not overlap with A/B

void batched_gemm_with_buffers(const complex128_t *Astart, const complex128_t *Bstart, complex128_t *Cstart, char *GPUbuffer[32][3], index_t N, complex128_t alpha, complex128_t beta, index_t num)
[source]

non-overlapping, with GPU buffers in order to be cheap on allocations

void batched_gemm_vertex_loop(complex128_t *vertex, const complex128_t *loop, index_t qsize, index_t nk, index_t nb)
[source]

specialized function for the vertex·loop product in the grid code

void batched_gemm_loop_vertex(const complex128_t *loop, complex128_t *vertex, index_t qsize, index_t nk, index_t nb)
[source]

specialized function for the loop·vertex product in the grid code

void batched_gemm_cublas_init(index_t N)
[source]

initialization routine

void batched_gemm_cublas_clear(void)
[source]

deinit

void batched_gemm_cublas_w_buffer_clear(void)
[source]

deinit the case where buffers are used

void batched_gemm_N_ll_num(const complex128_t *A, const complex128_t *B, complex128_t *C, index_t N, complex128_t alpha, complex128_t beta, index_t num, bool Aconj, bool Bconj)
[source]

assumes col-major order of each individual matrix, A and B may overlap

type batched_gemm_small_handle_t
[source]

forward declaration of opaque struct

batched_gemm_small_handle_t *batched_gemm_small_init(index_t n, index_t num)
[source]

initialization

void batched_gemm_small_destroy(batched_gemm_small_handle_t *h)
[source]

free resources

enum batched_gemm_small_mode_t
[source]

collection of operation modes for batched_gemm_small()

enumerator batched_gemm_small_mode_cpu
[source]

CPU mode

enumerator batched_gemm_small_mode_gpu
[source]

GPU mode

void batched_gemm_small_mode(batched_gemm_small_handle_t *h, batched_gemm_small_mode_t mode)
[source]

to change the automatically selected mode (CPU or GPU, depending on how the library was compiled) to something else, use this function on an existing handle

void batched_gemm_small(batched_gemm_small_handle_t *h, const complex128_t *A, const complex128_t *B, complex128_t *C)
[source]

assumes C ordering for all matrices, with one matrix being in memory straight after the other and no additional strides within the matrices.

void batched_zgemd_r(complex128_t *out, const complex128_t *U, const double *E, complex128_t alpha, complex128_t beta, index_t N, index_t num)
[source]

calculate the batched ‘zgemd’ product

\(O = U * \mathrm{diag}(E) * U^\dagger * \alpha + O * \beta\)

for the arrays out( num, N, N ) (\(O\)), U( num, N, N ), E( num, N )

Note

currently, no GPU version is implemented.

void single_eigen(const complex128_t *H, complex128_t *U, double *E, index_t N)
[source]

solve a single eigen problem for densely packed matrices in C order

void single_svd(const complex128_t *H, complex128_t *U, complex128_t *V, double *S, index_t N)
[source]

solve a single SVD for densely packed matrices in C order. if N < 0, use the non-Hermitian solver (dim = |N|) and copy the real part of the eigenvalues

void single_eigen_sort(complex128_t *U, double *E, index_t N, char which)
[source]

sort eigenvectors and eigenvalues of one eigenproblem according to a given algorithm. Notably, the sort function expects FORTRAN ordered eigenvectors, whereas the single_eigen() and single_svd() functions return C ordered eigenvectors.

Sorting options (which):

  • 0: default ('M')

  • 'A': alternating (max negative, max positive, …)

  • 'P': max positive

  • 'N': max negative

  • 'M': magnitude

  • 'm': inverse magnitude

index_t single_eigen_sort_part(complex128_t *U, double *E, index_t N, char which, index_t num)
[source]

same as single_eigen_sort(), but only copy over the first num values. returns the number of values written.

index_t single_eigen_sort_N(complex128_t *U, double *E, index_t N, char which, double abs_limit)
[source]

same as single_eigen_sort() in terms of the sorting order given by which. In addition, we allow for a relative or absolute cutoff in abs_limit: if abs_limit < 0, use a relative cutoff; if abs_limit > 0, use an absolute cutoff. returns the number of elements that satisfy the cutoff condition. if abs_limit is all zero bytes, do not cut off the eigensolution.

complex128_t power_iteration(const complex128_t *A, complex128_t *v, index_t N, complex128_t *aux, index_t *niter, double *tol, const int *iseed)
[source]

do power iteration for the maximum eigenvector of matrix A (N×N, row major). save the eigenvector in v. the auxiliary buffer aux may be NULL or of size 2*N. If niter != NULL, use the value pointed to as the maximum number of iterations. If tol != NULL, use the value pointed to as the minimum tolerance. If iseed != NULL, use these four integers as the seed to construct the random starting vector. The last integer must be odd.

void single_gemm(const complex128_t *A, const complex128_t *B, complex128_t *C, index_t M, index_t N, index_t K, complex128_t alpha, complex128_t beta)
[source]

calculates the matrix product \(C = \beta C + \alpha A \cdot B\). A(N,K), B(K,M) and C(N,M) are matrices in C order (Row Major).

void single_gemm_serial(const complex128_t *A, const complex128_t *B, complex128_t *C, index_t N, index_t M, index_t K, complex128_t alpha, complex128_t beta)
[source]

calculate the matrix product \(C(N,K) = \alpha \, A(N,M) \cdot B(M,K) + \beta \, C(N,K)\) with all matrices in C order (Row Major), but with an Eigen3 backend (so no parallelization issues, at the cost of less ideal runtime).

void single_gemm_adjoint(const complex128_t *A, const complex128_t *B, complex128_t *C, index_t M, index_t N, index_t K, complex128_t alpha, complex128_t beta)
[source]

calculates the matrix product \(C = \beta C + \alpha A \cdot B^\dagger\). A(N,K), B(M,K) and C(N,M) are matrices in C order (Row Major).

void single_square_gemm(const complex128_t *A, int AconjTrans, const complex128_t *B, int BconjTrans, complex128_t *C, index_t n)
[source]

calculates the GEMM \(C_{12} = A_{13} B_{32}\) for square matrices stored in C order, i.e., Row-Major. Allows Hermitian transposition of either A or B (if the corresponding integer argument is nonzero).

void transpose_inplace(complex128_t *A, index_t N)
[source]

in-place transposition of square matrix A (dimension NxN)

void transpose_outofplace(complex128_t *B, const complex128_t *A, index_t N)
[source]

out-of-place transposition of square matrix A into B (dimension NxN)

void transpose_conj_outofplace(complex128_t *B, const complex128_t *A, index_t N)
[source]

out-of-place adjoint of square matrix A into B (dimension NxN)

void matrix_invert(const double *A, index_t sz, double *Ainv)
[source]

invert the (sz×sz) matrix A and output result to Ainv

void cmatrix_invert(const complex128_t *A, index_t sz, complex128_t *Ainv)
[source]

same as matrix_invert() but for complex numbers

void cfmatrix_invert_F(const gf_complex_t *A, index_t sz, gf_complex_t *Ainv)
[source]

same as cmatrix_invert() but for FORTRAN ordering and—depending on the compilation settings—different precision.

complex128_t cmatrix_determinant(const complex128_t *A, index_t sz)
[source]

calculate the determinant of a complex square matrix

LINALG_IMPLEMENT
[source]

Linear Algebra for low dimensions in C

In many parts of divERGe, linear algebra for low-dimensional vector spaces is required. We provide a collection of functions (inline accessible from C, compiled accessible from both C and C++) that are implemented through a macro. So, in your source code, you have to call the implementation macro after including linalg.h:

#include "linalg.h"
LINALG_IMPLEMENT

This gives you basic linear algebra functionality for double (d), complex double (cd), integer (i) and long long integer (ll) vectors and matrices in 2, 3, and 4 dimensions. More dimensions can be implemented with

LINALG_IMPL( LINALG_PREFIX, 7 )

where LINALG_PREFIX expands to cla (other prefixes may be used by either redefining LINALG_PREFIX or passing something else to LINALG_IMPL), and 7 represents the dimension. If you want to only declare functions and not implement them (e.g. for C++), you can write

LINALG_TYPES( LINALG_PREFIX, 7 )

This statement declares linear algebra functions in dimension 7. Notably, the default symbol visibility is set to LINALG_EXPORT, which expands to static inline. If you want to use the functions from C++, you have to redefine LINALG_EXPORT to something else (e.g. extern "C") in the C++ file that includes linalg.h, as well as in the C file that includes linalg.h, so that the functions declared to C++ are actually defined.
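One plausible arrangement of the two translation units described above is sketched below. The file names are hypothetical, and the exact redefinition mechanism (here: a #define before inclusion) depends on linalg.h and has not been verified against the actual header:

```c
/* lin_use.cpp -- C++ side: declarations only, with C linkage */
#define LINALG_EXPORT extern "C"
#include "linalg.h"
LINALG_TYPES( LINALG_PREFIX, 3 )

/* lin_impl.c -- C side: definitions with external linkage */
#define LINALG_EXPORT
#include "linalg.h"
LINALG_IMPLEMENT
```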

For enhanced portability, we provide the option to disable decomposition support (that relies on LAPACKE). When LINALG_NO_LAPACKE is defined, the decomposition functions are declared, but their definition is empty.

The function names are made from a prefix P, a vector/matrix indicator X (V or M), a dimension indicator D (an integer), and a datatype indicator T (the four datatype indicators are d: double, cd: complex128_t, i: int, ll: int64_t). The convention then is PXDT. For example, if the prefix is cla (the default), all functions operating on three-dimensional double vectors begin with claV3d, and the three-dimensional vector data type is called claV3d_t.

Each macro invocation of LINALG_IMPL or LINALG_TYPES declares/defines the following types and functions:

Vectors: PVDT_t

The structure PVDT_t (e.g. claV3d_t for a three-dimensional vector of doubles) has one static array of datatype T as member: T v[D]. This member holds the vector’s data. The following vector functions are available:

  • PVDT_t PVDT_zero( void ): generate a zero vector

  • PVDT_t PVDT_const( T val ): generate a constant vector, where each entry is set to val

  • PVDT_t PVDT_map( const T* p ): map data to a vector (copy)

  • void PVDT_rmap( PVDT_t v, T* p ): map vector to data (copy)

  • PVDT_t PVDT_scale( PVDT_t v, T s ): return \(\boldsymbol{v} \, s\)

  • PVDT_t PVDT_add( PVDT_t a, PVDT_t b ): return \(\boldsymbol{a} + \boldsymbol{b}\)

  • PVDT_t PVDT_sub( PVDT_t a, PVDT_t b ): return \(\boldsymbol{a} - \boldsymbol{b}\)

  • PVDT_t PVDT_conj( PVDT_t a ): return \(\boldsymbol{a}^*\)

  • PVDT_t PVDT_odot( PVDT_t a, PVDT_t b ): return the element-wise product \(\boldsymbol{a} \circ \boldsymbol{b}\)

  • T PVDT_dot( PVDT_t a, PVDT_t b ): scalar product \(\boldsymbol{a} \cdot \boldsymbol{b}\)

  • double PVDT_norm2( PVDT_t a ): squared norm \(\boldsymbol{a} \cdot \boldsymbol{a}^*\)

  • double PVDT_norm( PVDT_t a ): square root of squared norm.

  • PVDT_t PVDT_cross( PVDT_t a, PVDT_t b ): return the cross product \(\boldsymbol{a} \times \boldsymbol{b}\)

Matrices: PMDT_t

The structure PMDT_t (e.g. claM3d_t for a \(3\times3\) matrix of doubles) has one two-dimensional static array of datatype T as member: T m[D][D]. This member holds the matrix’s data. The following matrix(-vector) functions are available:

  • PMDT_t PMDT_zero( void ): zero-init a matrix

  • PMDT_t PMDT_id( void ): identity-init a matrix

  • PMDT_t PMDT_map( const T* p ): map data to a matrix (copy)

  • void PMDT_rmap( PMDT_t m, T * p ): map matrix to data (copy)

  • PMDT_t PMDT_scale( PMDT_t v, T s ): return \(\hat{v} \, s\)

  • PMDT_t PMDT_add( PMDT_t a, PMDT_t b ): return \(\hat{a} + \hat{b}\)

  • PMDT_t PMDT_sub( PMDT_t a, PMDT_t b ): return \(\hat{a} - \hat{b}\)

  • PMDT_t PMDT_conj( PMDT_t a ): return \(\hat{a}^*\)

  • PMDT_t PMDT_odot( PMDT_t a, PMDT_t b ): return \(\hat{a} \circ \hat{b}\)

  • T PMDT_dot( PMDT_t a, PMDT_t b ): return \(\mathrm{Tr}(\hat{a} \circ \hat{b})\)

  • double PMDT_norm2( PMDT_t a ): return squared norm \(\mathrm{Tr}(\hat{a} \circ \hat{a}^*)\)

  • double PMDT_norm( PMDT_t a ): return square root of squared norm

  • PMDT_t PMDT_matmul( PMDT_t a, PMDT_t b ): return matrix product \(\hat{a} \cdot \hat{b}\)

  • T PMDT_trace( PMDT_t a ): return trace \(\mathrm{Tr}(\hat a)\)

  • PVDT_t PMDT_matvec( PMDT_t a, PVDT_t v ): return matrix vector product \(\hat{a} \cdot \boldsymbol{v}\)

  • PMDT_t PMDT_transpose( PMDT_t a ): return transposed matrix \(\hat{a}^T\)

  • PMDT_t PMDT_adjoint( PMDT_t a ): return adjoint matrix \(\hat{a}^{T*}\)

  • PVDT_t PMDT_vecmat( PVDT_t v, PMDT_t a ): return vector matrix product \((\boldsymbol{v}^T \cdot \hat{a})^T\)

  • PVDd_t PMDT_eigvalsh( PMDT_t H ): return eigenvalues of Hermitian matrix \(\hat{H}\) (only floating point)

  • PVDd_t PMDT_eigh( PMDT_t H, PMDT_t* pU ): return eigenvalues of Hermitian matrix \(\hat{H}\) and eigenvectors in pU.

  • PVDd_t PMDT_svd( PMDT_t H, PMDT_t* pU, PMDT_t* pVT ): perform an SVD of the matrix \(\hat{H} = \hat{U} \cdot \mathrm{diag}(\boldsymbol{S}) \cdot \hat{V}^T\). return singular values \(\boldsymbol{S}\) (floating point) and left singular vectors \(\hat{U}\) (pU) as well as right singular vectors \(\hat{V}^T\) (pVT).

To get used to this mini-library, you can have a look at e.g. src/misc/hoppings_to_supercell.c (mostly integers, i.e., claV3i_t) or src/diverge_symmetrize_generate_symm_maps.c (three-dimensional double vectors, i.e., claV3d_t). It feels a bit like stack-only Eigen3, but for C (so functions, no overloads).