File _patchinfo of Package patchinfo.38884
<patchinfo incident="38884">
<issue tracker="cve" id="2025-43904"/>
<issue tracker="bnc" id="1243666">VUL-0: CVE-2025-43904: slurm,slurm_18_08,slurm_20_02,slurm_20_11,slurm_22_05,slurm_23_02,slurm_24_11,slurmlibs: Coordinators could promote users to Admin</issue>
<packager>eeich</packager>
<rating>important</rating>
<category>security</category>
<summary>Security update for slurm_24_11</summary>
<description>This update for slurm_24_11 fixes the following issues:
Update to version 24.11.5.
Security issues fixed:
- CVE-2025-43904: an issue with permission handling for Coordinators within the accounting system allowed Coordinators
to promote a user to Administrator (bsc#1243666).
Other changes and issues fixed:
- Changes from version 24.11.5
* Return an error to `scontrol reboot` on bad nodelists.
* `slurmrestd` - Report an error when QOS resolution fails for
v0.0.40 endpoints.
* `slurmrestd` - Report an error when QOS resolution fails for
v0.0.41 endpoints.
* `slurmrestd` - Report an error when QOS resolution fails for
v0.0.42 endpoints.
* `data_parser/v0.0.42` - Added `+inline_enums` flag which
modifies the output when generating OpenAPI specification.
It causes enum arrays to not be defined in their own schema
with references (`$ref`) to them. Instead they will be dumped
inline.
* Fix binding error with `tres-bind map/mask` on partial node
allocations.
* Fix `stepmgr` enabled steps being able to request features.
* Reject step creation if requested feature is not available
in job.
* `slurmd` - Restrict listening for new incoming RPC requests
further into startup.
* `slurmd` - Avoid `auth/slurm` related hangs of CLI commands
during startup and shutdown.
* `slurmctld` - Restrict processing new incoming RPC requests
further into startup. Stop processing requests sooner during
shutdown.
* `slurmctld` - Avoid `auth/slurm` related hangs of CLI commands
during startup and shutdown.
* `slurmctld` - Avoid race condition during shutdown or
reconfigure that could result in a crash due to delayed
processing of a connection while plugins are unloaded.
* Fix small memleak when getting the job list from the database.
* Fix incorrect printing of `%` escape characters when printing
stdio fields for jobs.
* Fix padding parsing when printing stdio fields for jobs.
* Fix printing `%A` array job id when expanding patterns.
* Fix reservations causing jobs to be held for `Bad Constraints`.
* `switch/hpe_slingshot` - Prevent potential segfault on failed
curl request to the fabric manager.
* Fix printing incorrect array job id when expanding stdio file
names. The `%A` will now be substituted by the correct value.
* `switch/hpe_slingshot` - Fix VNI range not updating on `slurmctld`
restart or reconfigure.
* Fix steps not being created when using certain combinations of
`-c` and `-n` lower than the job's requested resources, when
using stepmgr and nodes are configured with
`CPUs == Sockets*CoresPerSocket`.
* Permit configuring the number of retry attempts to destroy CXI
service via the new `destroy_retries` `SwitchParameters` option
(see the example after this list).
* Do not reset `memory.high` and `memory.swap.max` on `slurmd`
startup or reconfigure, as `slurmd` never actually modifies
these values.
* Fix reconfigure failure of slurmd when it has been started
manually and the `CoreSpecLimits` have been removed from
`slurm.conf`.
* Set or reset CoreSpec limits when slurmd is reconfigured and
it was started with systemd.
* `switch/hpe_slingshot` - Make sure `slurmctld` can free
step VNIs after the controller restarts or reconfigures while
the job is running.
* Fix backup `slurmctld` failure on 2nd takeover.
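A minimal `slurm.conf` sketch for the `destroy_retries` option mentioned above; the retry count of 3 is an illustrative assumption, not a value shipped by this update:
```
# slurm.conf sketch (assumed values, for illustration only)
SwitchType=switch/hpe_slingshot
# retry destroying the CXI service up to 3 times (illustrative count)
SwitchParameters=destroy_retries=3
```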
- Changes from version 24.11.4
* `slurmctld`,`slurmrestd` - Avoid possible race condition that
could have caused process to crash when listener socket was
closed while accepting a new connection.
* `slurmrestd` - Avoid race condition that could have resulted
in address logged for a UNIX socket to be incorrect.
* `slurmrestd` - Fix parameters in OpenAPI specification for the
following endpoints to have `job_id` field:
```
GET /slurm/v0.0.40/jobs/state/
GET /slurm/v0.0.41/jobs/state/
GET /slurm/v0.0.42/jobs/state/
GET /slurm/v0.0.43/jobs/state/
```
* `slurmd` - Fix tracking of thread counts that could cause
incoming connections to be ignored after a burst of simultaneous
incoming connections triggers the delayed response logic.
* Avoid unnecessary `SRUN_TIMEOUT` forwarding to `stepmgr`.
* Fix jobs being scheduled on higher weighted powered down nodes.
* Fix how backfill scheduler filters nodes from the available
nodes based on exclusive user and `mcs_label` requirements.
* `acct_gather_energy/{gpu,ipmi}` - Fix potential energy
consumption adjustment calculation underflow.
* `acct_gather_energy/ipmi` - Fix regression introduced in 24.05.5
(which introduced the new way of preserving energy measurements
through slurmd restarts) when `EnergyIPMICalcAdjustment=yes`.
* Prevent `slurmctld` deadlock in the assoc mgr.
* Fix memory leak when `RestrictedCoresPerGPU` is enabled.
* Fix preemptor jobs not entering execution due to wrong
calculation of accounting policy limits.
* Fix certain job requests that were incorrectly denied with
node configuration unavailable error.
* `slurmd` - Avoid crash when `slurmd` has a communications
failure with `slurmstepd`.
* Fix memory leak when parsing yaml input.
* Prevent `slurmctld` from showing error message about `PreemptMode=GANG`
being a cluster-wide option for `scontrol update part` calls
that don't attempt to modify partition PreemptMode.
* Fix setting `GANG` preemption on partition when updating
`PreemptMode` with `scontrol`.
* Fix `CoreSpec` and `MemSpec` limits not being removed
from previously configured slurmd.
* Avoid race condition that could lead to a deadlock when `slurmd`,
`slurmstepd`, `slurmctld`, `slurmrestd` or `sackd` have a fatal
event.
* Fix jobs using `--ntasks-per-node` and `--mem` staying pending
forever when the requested memory divided by the number of CPUs
exceeds the configured `MaxMemPerCPU`.
* `slurmd` - Fix address logged upon new incoming RPC connection
from `INVALID` to IP address.
* Fix memory leak when retrieving reservations. This affects
`scontrol`, `sinfo`, `sview`, and the following `slurmrestd`
endpoints:
`GET /slurm/{any_data_parser}/reservation/{reservation_name}`
`GET /slurm/{any_data_parser}/reservations`
* Log a warning instead of a `debugflags=conmgr` gated log message
when deferring new incoming connections because the number of
active connections exceeds `conmgr_max_connections`.
* Avoid race condition that could result in the worker thread pool
not activating all threads at once after a reconfigure, resulting
in lower utilization of available CPU threads until enough
internal activity wakes up all threads in the worker pool.
* Avoid theoretical race condition that could result in new
incoming RPC socket connections being ignored after reconfigure.
* `slurmd` - Avoid race condition that could result in a state
where new incoming RPC connections will always be ignored.
* Add `ReconfigFlags=KeepNodeStateFuture` to restore saved `FUTURE`
node state on restart and reconfigure instead of reverting to
`FUTURE` state (see the example after this list). This will be
made the default in 25.05.
* Fix case where hetjob submit would cause `slurmctld` to crash.
* Fix jobs using `--cpus-per-gpu` and `--mem` staying pending
forever when the requested memory divided by the number of CPUs
exceeds the configured `MaxMemPerCPU`.
* Enforce that jobs using `--mem` and several `--*-per-*` options
do not violate the `MaxMemPerCPU` in place.
* `slurmctld` - Fix use-cases where jobs were incorrectly held
pending when `--prefer` features are not initially satisfied.
* `slurmctld` - Fix jobs being incorrectly held when `--prefer` is
not satisfied in some use-cases.
* Ensure `RestrictedCoresPerGPU` and `CoreSpecCount` don't overlap.
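A minimal `slurm.conf` sketch for the `ReconfigFlags` setting referenced above; it simply sets the flag named in the changelog entry:
```
# slurm.conf sketch: keep saved FUTURE node state across slurmctld
# restart/reconfigure (planned default from 25.05 on)
ReconfigFlags=KeepNodeStateFuture
```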
- Changes from version 24.11.3
* Fix database cluster ID generation not being random.
* Fix a regression in which `slurmd -G` gave no output.
* Fix a long-standing crash in `slurmctld` after updating a
reservation with an empty nodelist. The crash could occur
after restarting slurmctld, or if downing/draining a node
in the reservation with the `REPLACE` or `REPLACE_DOWN` flag.
* Avoid changing the process name to "`watch`" from the original
daemon name, which could potentially break some monitoring scripts.
* Avoid `slurmctld` being killed by `SIGALRM` due to race condition
at startup.
* Fix race condition in slurmrestd that resulted in "`Requested
data_parser plugin does not support OpenAPI plugin`" error being
returned for valid endpoints.
* Fix race between `task/cgroup` CPUset and `jobacct_gather/cgroup`.
The former was removing the pid from the `task_X` cgroup directory,
causing memory limits to not be applied.
* If multiple partitions are requested, set the `SLURM_JOB_PARTITION`
output environment variable to the partition in which the job is
running for `salloc` and `srun`, in order to match the documentation
and the behavior of `sbatch` (see the example after this list).
* `srun` - Fix a wrongly constructed `SLURM_CPU_BIND` environment
variable that could get propagated to nested `srun` calls in certain
MPI environments, causing launch failures.
* Don't print misleading errors for stepmgr enabled steps.
* `slurmrestd` - Avoid connection to slurmdbd for the following
endpoints:
```
GET /slurm/v0.0.41/jobs
GET /slurm/v0.0.41/job/{job_id}
```
* `slurmrestd` - Avoid connection to slurmdbd for the following
endpoints:
```
GET /slurm/v0.0.40/jobs
GET /slurm/v0.0.40/job/{job_id}
```
* `slurmrestd` - Fix possible memory leak when parsing arrays with
`data_parser/v0.0.40`.
* `slurmrestd` - Fix possible memory leak when parsing arrays with
`data_parser/v0.0.41`.
* `slurmrestd` - Fix possible memory leak when parsing arrays with
`data_parser/v0.0.42`.
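A short sketch of the `SLURM_JOB_PARTITION` behavior described above; the partition names are hypothetical:
```
# Request two partitions; the variable now reports the partition the
# job actually runs in, matching sbatch (partition names are examples)
srun --partition=normal,highmem printenv SLURM_JOB_PARTITION
```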
- Changes from version 24.11.2
* Fix segfault when submitting `--test-only` jobs that can
preempt.
* Fix regression introduced in 23.11 that prevented the
following flags from being added to a reservation on an
update: `DAILY`, `HOURLY`, `WEEKLY`, `WEEKDAY`, and `WEEKEND`.
* Fix crash and issues when evaluating a job's suitability for
running on nodes that already have suspended job(s).
* `slurmctld` will ensure that healthy nodes are not reported as
`UnavailableNodes` in job reason codes.
* Fix handling of jobs submitted to a current reservation with
flags `OVERLAP,FLEX` or `OVERLAP,ANY_NODES` when it overlaps nodes
with a future maintenance reservation. When a job submission
had a time limit that overlapped with the future maintenance
reservation, it was rejected. Now the job is accepted but
stays pending with the reason "`ReqNodeNotAvail, Reserved for
maintenance`".
* `pam_slurm_adopt` - avoid errors when explicitly setting some
arguments to the default value.
* Fix QOS preemption with `PreemptMode=SUSPEND`.
* `slurmdbd` - When changing a user's name, update lineage at the
same time.
* Fix regression in 24.11 in which `burst_buffer.lua` does not
inherit the `SLURM_CONF` environment variable from `slurmctld` and
fails to run if slurm.conf is in a non-standard location.
* Fix memory leak in slurmctld if `select/linear` and the
`PreemptParameters=reclaim_licenses` options are both set in
`slurm.conf`. Regression in 24.11.1.
* Fix running jobs that requested multiple partitions potentially
being set to the wrong partition on restart.
* `switch/hpe_slingshot` - Fix compatibility with newer cxi
drivers, specifically when specifying `disable_rdzv_get`.
* Add `ABORT_ON_FATAL` environment variable to capture a backtrace
from any `fatal()` message (see the example after this list).
* Fix printing invalid address in rate limiting log statement.
* `sched/backfill` - Fix node state `PLANNED` not being cleared from
fully allocated nodes during a backfill cycle.
* `select/cons_tres` - Fix future planning of jobs with
`bf_licenses`.
* Prevent redundant "`on_data returned rc: Rate limit exceeded,
please retry momentarily`" error message from being printed in
slurmctld logs.
* Fix loading non-default QOS on pending jobs from pre-24.11
state.
* Fix pending jobs displaying `QOS=(null)` when not explicitly
requesting a QOS.
* Fix segfault issue from job record with no `job_resrcs`.
* Fix failing `sacctmgr delete/modify/show` account operations
with `where` clauses.
* Fix regression in 24.11 in which Slurm daemons started catching
and ignoring the `SIGTSTP`, `SIGTTIN` and `SIGUSR1` signals, which
they previously did not ignore. This also caused `slurmctld` to be
unable to shut down after a `SIGTSTP`, because `slurmscriptd` caught
the signal and stopped while `slurmctld` ignored it. Unify and fix
these situations and restore the previous behavior for these
signals.
* Document that `SIGQUIT` is no longer ignored by `slurmctld`,
`slurmdbd`, and slurmd in 24.11. As of 24.11.0rc1, `SIGQUIT` is
identical to `SIGINT` and `SIGTERM` for these daemons, but this
change was not documented.
* Fix the scheduler not considering nodes marked for reboot without
the ASAP flag.
* Remove the `boot^` state on unexpected node reboot after return
to service.
* Do not allow new jobs to start on a node which is being
rebooted with the flag `nextstate=resume`.
* Prevent a lower priority job from running after cancelling an ASAP
reboot.
* Fix `srun` jobs starting on nodes rebooting with `nextstate=resume`.
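A sketch of how the `ABORT_ON_FATAL` variable added above might be used; the value `1` and the foreground `-D` invocation are illustrative assumptions:
```
# Run slurmctld in the foreground with ABORT_ON_FATAL set so a fatal()
# aborts and a backtrace/core can be collected (assumed usage)
ABORT_ON_FATAL=1 slurmctld -D
```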
</description>
</patchinfo>