File debian.changelog of Package elpa

elpa (2025.06.001-0) stable; urgency=low

  * improved GPU version of solve_tridi (up to 3x faster), overall
    eigenproblem improvement up to 15%
  * significantly improved AMD GPU version (up to 5x faster) for
    ROCm >= 6.4.2
  * SYCL backend: experimental support for Intel Toolkit >= 2024.01.
    Older Intel Toolkits are no longer supported

 -- Tobias Melson <code@tobias-melson.de>  Thu, 21 Aug 2025 10:26:33 +0100

elpa (2025.01.002-0) stable; urgency=low

  * fixes a performance degradation in solve_tridi in the CPU
    version
  * allows switching NCCL/RCCL usage off, even if ELPA was compiled
    with this support
  * patch "0002-Fix-test-correctness-multiply.patch" is included
    upstream in this release and was therefore removed from the
    package

 -- Tobias Melson <code@tobias-melson.de>  Fri, 30 Apr 2025 07:49:13 +0100

elpa (2025.01.001-1) stable; urgency=low

  * port of tridiagonalization step to GPU: ELPA 1stage/2stage
    solvers achieve speedup of up to 40%
  * general multiplication routines for distributed matrices on
    GPUs
  * port of backtransformation in ELPA 1stage: speedup of up to a
    factor of four achieved

 -- Tobias Melson <code@tobias-melson.de>  Fri, 14 Feb 2025 12:00:00 +0100

elpa (2024.05.001-1) stable; urgency=low

  * support for ROCm 6.x and preparation for AMD MI300
  * allow internal matrix redistribution if device pointer API is
    used
  * do not try to autotune GPU code paths if no GPUs are available
  * implement a patch for a bug in cusolverDnXtrtri_bufferSize for
    CUDA versions < 12.1
  * proof-of-concept RCCL support for AMD GPUs, intended for
    experienced users only
  * significantly faster Cholesky decomposition step
  * Automatic setting for cuBLAS caching: with CUDA > 12.x a
    slowdown had been observed because cuBLAS assumed problematic
    caching values
  * Autoconf >= 2.71 required for building ELPA
  * enable gpu-streams by default for NVIDIA and AMD GPUs
  * Updated / improved documentation and man pages
  * Fixed compilation error on AMD GPUs
  * Fixed SVE 256 compute kernels
  * Allow the use of NVIDIA NCCL for device-to-device communication
    (currently only in parts of ELPA)
  * Speedup of GPU version of hermitian_multiply by up to a factor
    of 4
  * significantly faster full-to-tridiagonal step in ELPA 1stage
    GPU
  * significantly faster ELPA 2stage solver on Intel GPUs
  * Consistent enabling/disabling of SKEW_SYMMETRIC in header files
  * new setup_gpu API function
  * added CITATION.cff file
  * allow test programs to be run with 1 MPI task
  * correct a memory leak in the gpu stream setup
  * better handling of GPU BLAS handles
  * implement the execution of the AMD HIP code path on NVIDIA GPUs
  * implement the execution of the SYCL GPU code path on CPUs
    (debugging)
  * port generalized routines to SYCL GPU

 -- Tobias Melson <code@tobias-melson.de>  Mon,  8 Jul 2024 14:27:57 +0100