File debian.changelog of Package elpa
elpa (2025.06.001-0) stable; urgency=low
* improved GPU version of solve_tridi (up to 3x faster), overall
eigenproblem improvement up to 15%
* significantly improved AMD GPU version (up to 5x) for
ROCm >= 6.4.2
* SYCL backend: experimental support for Intel Toolkit >= 2024.01.
Older Intel Toolkits are no longer supported
-- Tobias Melson <code@tobias-melson.de> Thu, 21 Aug 2025 10:26:33 +0100
elpa (2025.01.002-0) stable; urgency=low
* fixes a performance degradation in solve_tridi in the CPU
version
* allows switching NCCL/RCCL usage off, even if compiled with
this support
* patch "0002-Fix-test-correctness-multiply.patch" is included in
this release and has therefore been removed
-- Tobias Melson <code@tobias-melson.de> Fri, 30 Apr 2025 07:49:13 +0100
elpa (2025.01.001-1) stable; urgency=low
* port of tridiagonalization step to GPU: ELPA 1stage/2stage
solvers achieve a speedup of up to 40%
* general multiplication routines for distributed matrices on
GPUs
* port of backtransformation in ELPA 1stage: speedup of up to a
factor of four achieved
-- Tobias Melson <code@tobias-melson.de> Fri, 14 Feb 2025 12:00:00 +0100
elpa (2024.05.001-1) stable; urgency=low
* support of ROCm 6.x and preparation for AMD MI300
* allow internal matrix redistribution if device pointer API is
used
* do not try to autotune GPU code paths if no GPUs are available
* implement a patch for a bug in cusolverDnXtrtri_bufferSize for
CUDA versions < 12.1
* PoC RCCL support for AMD GPUs, only for experienced users
* significantly faster Cholesky decomposition step
* Automatic setting for cuBLAS caching: with CUDA > 12.x a
slowdown had been observed since cuBLAS assumed problematic caching
values
* Autoconf >= 2.71 required for building ELPA
* enable gpu-streams by default for NVIDIA and AMD GPUs
* Updated / improved documentation and man pages
* Fixed compilation error on AMD GPUs
* Fixed SVE 256 compute kernels
* Allow the use of NVIDIA NCCL (currently in parts of ELPA) for
device-to-device communication
* Speedup of GPU version of hermitian_multiply by up to a
factor of 4
* significantly faster full-to-tridiagonal step in ELPA 1stage
GPU
* significantly faster ELPA 2stage solver on Intel GPUs
* Consistent enabling/disabling of SKEW_SYMMETRIC in header files
* new setup_gpu API function
* added CITATION.cff file
* allow test programs to be run with 1 MPI task
* correct a memory leak in the gpu stream setup
* better handling of GPU BLAS handles
* implement the execution of the AMD HIP code path on NVIDIA GPUs
* implement the execution of the SYCL GPU code path on CPUs
(debugging)
* port generalized routines to SYCL GPU
-- Tobias Melson <code@tobias-melson.de> Mon, 8 Jul 2024 14:27:57 +0100