<patchinfo incident="33050">
<issue tracker="bnc" id="1221813">[RN, OpenBLAS] Update to version 0.3.25</issue>
<issue tracker="jsc" id="PED-7926"/>
<issue tracker="jsc" id="PED-7927"/>
<packager>eeich</packager>
<rating>important</rating>
<category>feature</category>
<summary>Feature update for openblas</summary>
<description>This update for openblas fixes the following issues:
openblas was updated from version 0.3.21 to version 0.3.25 (jsc#PED-7926, jsc#PED-7927, bsc#1221813):
- Changes in version 0.3.25:
* General:
+ Improved the error message shown on exceeding the maximum
thread count
+ Improved the code to add supplementary thread buffers in
case of overflow
+ Fixed a potential division by zero in `?ROTG`
+ Improved the `?MATCOPY` functions to accept zero-sized rows or
columns
+ Corrected empty prototypes in function declarations
+ Cleaned up unused declarations in the f2c-converted versions
of the LAPACK sources
+ Improved link line rewriting to avoid mixed libgomp/libomp
builds with clang and gfortran
+ Imported the following changes from the upcoming release
3.12 of Reference-LAPACK: LAPACK PR 900, LAPACK PR 904,
LAPACK PR 907, LAPACK PR 909, LAPACK PR 926, LAPACK PR 927,
LAPACK PR 928 and LAPACK PR 930
* Architecture x86_64:
+ Fixed capability-based fallback selection for unknown cpus
in `DYNAMIC_ARCH`
+ Added AVX512 optimizations for `?ASUM` on Intel Sapphire Rapids and
Cooper Lake
* Architecture ARM64:
+ Fixed building with Xcode 15
+ Fixed building on A64FX and Cortex A710/X1/X2
+ Increased the default buffer size for recent arm server cpus
* Architecture POWER PC:
+ Added support for `DYNAMIC_ARCH` builds with clang
+ Fixed union declaration in the `BFLOAT16` test case
- Changes in version 0.3.24:
* General:
+ Declared the arguments of `cblas_xerbla` as `const`
(in accordance with the reference implementation
and others, the previous discrepancy appears to have dated
back to GotoBLAS)
+ Fixed the implementation of `?GEMMT` that was added in 0.3.23
+ Made cpu-specific `SWITCH_RATIO` parameters for GEMM
available to `DYNAMIC_ARCH` builds
+ Fixed missing `SSYCONVF` function in the shared library
+ Fixed parallel build logic used with gmake
+ Fixed several issues with the handling of runtime limits on
the number of OpenMP threads
+ Corrected the error code returned by `SGEADD`/`DGEADD` when
LDA is too small
+ Corrected the error code returned by `IMATCOPY` when LDB
is too small
+ Updated `?NRM2` to support negative increment values (as
introduced in release 3.10.0 of the Reference BLAS)
+ Updated `?ROTG` to use the safe scaling algorithm introduced
in release 3.10.0 of the Reference BLAS
+ Fixed OpenMP builds with clang for the case where libomp is
not in a standard location
+ Fixed a potential overwrite of unrelated memory during
thread initialisation on startup
+ Fixed a potential integer overflow in the multithreading
threshold for `?SYMM`/`?SYRK`
+ Fixed build of the LAPACKE interfaces for the LAPACK 3.11.0
`?TRSYL` functions added in 0.3.22
+ Applied additions and corrections from the development
branch of Reference-LAPACK:
- Fixed actual arguments passed to a number of LAPACK
functions (from Reference-LAPACK PR 885)
- Fixed workspace query results in LAPACK `?SYTRF`/`?TREVC3`
(from Reference-LAPACK PR 883)
- Fixed derivation of the UPLO parameter in `LAPACKE_?larfb`
(from Reference-LAPACK PR 878)
- Fixed a crash in LAPACK `?GELSDD` on `NRHS=0` (from
Reference-LAPACK PR 876)
- Added new LAPACK utility functions `CRSCL` and `ZRSCL`
(from Reference-LAPACK PR 839)
- Corrected the order of eigenvalues for 2x2 matrices in
`?STEMR` (Reference-LAPACK PR 867)
- Removed spurious reference to OpenMP variables outside
OpenMP contexts (Reference-LAPACK PR 860)
- Updated file comments on use of `LAMBDA` variable in
LAPACK (Reference-LAPACK PR 852)
- Fixed documentation of LAPACK `SLASD0`/`DLASD0`
(Reference-LAPACK PR 855)
- Fixed confusing use of "minor" in LAPACK documentation
(Reference-LAPACK PR 849)
- Added new LAPACK functions `?GEDMD` for dynamic mode
decomposition (Reference-LAPACK PR 736)
- Fixed potential stack overflows in the `EIG` part of the
LAPACK testsuite (Reference-LAPACK PR 854)
- Applied small improvements to the variants of
Cholesky and QR functions (Reference-LAPACK PR 847)
- Removed unused variables from LAPACK `?BDSQR`
(Reference-LAPACK PR 832)
- Fixed a potential crash on allocation failure in LAPACKE
`SGEESX`/`DGEESX` (Reference-LAPACK PR 836)
- Added a quick return from `SLARUV`/`DLARUV` for N less than 1
(Reference-LAPACK PR 837)
- Updated function descriptions in LAPACK `?GEGS`/`?GEGV`
(Reference-LAPACK PR 831)
- Improved algorithm description in `?GELSY`
(Reference-LAPACK PR 833)
- Fixed scaling in LAPACK `STGSNA`/`DTGSNA`
(Reference-LAPACK PR 830)
- Fixed crash in `LAPACKE_?geqrt` with row-major data
(Reference-LAPACK PR 768)
- Added LAPACKE interfaces for `C/ZUNHR_COL` and
`S/DORHR_COL` (Reference-LAPACK PR 827)
- Added error exit tests for `SYSV`/`SYTD2`/`GEHD2` to
the testsuite (Reference-LAPACK PR 795)
- Fixed typos in LAPACK source and comments
(Reference-LAPACK PRs 809,811,812,814,820)
- Adopted the refactored `?GEBAL` implementation
(Reference-LAPACK PR 808)
* Architecture x86_64:
+ Added cpu model autodetection for Intel Alder Lake N
+ Added activation of the AMX tile to the Sapphire Rapids
`SBGEMM` kernel
+ Worked around miscompilations of GEMV/SYMV kernels by
gcc's tree-vectorizer
+ Fixed runtime detection of Cooperlake and Sapphire Rapids
in `DYNAMIC_ARCH`
+ Fixed feature-based cputype fallback in `DYNAMIC_ARCH`
+ Corrected `ZAXPY` result on old pre-AVX hardware for the
`INCX=0` case
+ Fixed a potential use of uninitialized variables in ZTRSM
* Architecture ARMV8:
+ Implemented the `SWITCH_RATIO` parameter for improved GEMM
performance on Neoverse
+ Activated SVE SGEMM and DGEMM kernels for Neoverse V1
+ Improved performance of the SVE CGEMM and ZGEMM kernels
on Neoverse V1
+ Improved kernel selection for the ARMV8SVE target and added
it to `DYNAMIC_ARCH`
+ Fixed runtime check for SVE availability in `DYNAMIC_ARCH`
builds to take OS or container restrictions into account
+ Fixed a potential use of uninitialized variables in ZTRSM
* Architecture POWER PC:
+ Fixed compiler warnings in the POWER10 SBGEMM kernel
- Changes in version 0.3.23:
* General:
+ Fixed a serious regression in `GETRF`/`GETF2` and
`ZGETRF`/`ZGETF2` where subnormal but nonzero data elements
triggered the singularity flag
+ Fixed a long-standing bug in `CSPR`/`ZSPR` in single-threaded
operation for cases where elements of the X vector are real
numbers (or complex with only the real part zero)
* Architecture x86_64:
+ Added further CPUID values for Intel Raptor Lake
- Changes in version 0.3.22:
* General:
+ Updated the included LAPACK to Reference-LAPACK release 3.11.0
plus post-release corrections and improvements
+ Added a threshold for multithreading in `SYMM`, `SYMV` and
`SYR2K`
+ Increased the threshold for multithreading in `SYRK`
+ OpenBLAS no longer decreases the global `OMP_NUM_THREADS`
when it exceeds the maximum thread count the library was
compiled for
+ Fixed `?GETF2` potentially returning `NaN` with tiny matrix
elements
+ Fixed `openblas_set_num_threads` to work in `USE_OPENMP`
builds
+ Fixed cpu core counting in `USE_OPENMP` builds returning the
number of OMP "places" rather than cores
+ Fixed stride calculation in the optimized small-matrix path of
complex `SYR`
+ Fixed building of Reference-LAPACK with recent gfortran
+ Added new environment variable `OPENBLAS_DEFAULT_NUM_THREADS`
+ Added a GEMV-based implementation of `GEMMT`
* Architecture x86_64:
+ Added autodetection of Intel Raptor Lake cpu models
+ Added SSCAL microkernels for Haswell and newer targets
+ Improved the performance of the Haswell DSCAL microkernel
+ Added CSCAL and ZSCAL microkernels for SkylakeX targets
+ Fixed detection of gfortran and Cray CCE compilers
+ Fixed runtime selection of COOPERLAKE in `DYNAMIC_ARCH` builds
+ Worked around gcc/llvm using risky FMA operations in
CSCAL/ZSCAL
* Architecture ARMV8:
+ Fixed cross-compilation to CortexA53 with CMake
+ Fixed compilation with CMake and "Arm Compiler for Linux 22.1"
+ Added cpu autodetection for Cortex X3 and A715
+ Fixed conditional compilation of SVE-capable targets in
`DYNAMIC_ARCH`
+ Sped up SVE kernels by removing unnecessary prefetches
+ Improved the GEMM performance of Neoverse V1
+ Added SVE kernels for SDOT and DDOT
+ Added an SBGEMM kernel for Neoverse N2
+ Improved cpu-specific compiler option selection for
Neoverse cpus
+ Added support for setting `CONSISTENT_FPCSR`
</description>
</patchinfo>