<patchinfo incident="8580">
<issue tracker="bnc" id="1103561">[HPC, slurmctld] slurmctld fails to start on boot when using a shared remote StateSaveLocation</issue>
<issue tracker="bnc" id="1084917">SLURM: not possible to start jobs via srun command when second node is offline</issue>
<issue tracker="fate" id="326642"/>
<category>recommended</category>
<rating>moderate</rating>
<packager>eeich</packager>
<description>This update for slurm provides version 17.11.9 and fixes the following issues:
- When using a remote shared StateSaveLocation, slurmctld needs to be started after
remote filesystems have become available. (bsc#1103561)
- Fix race in the slurmctld backup controller which prevents it from cleaning up allocations on nodes
properly after failing over. (bsc#1084917)
- Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle.
- Remove erroneous unlock in acct_gather_energy/ipmi.
- Enable support for hwloc version 2.0.1.
- Fix 'srun -q' (--qos) option handling.
- Fix socket communication issue that can lead to lost task completion messages, which
will cause a permanently stuck srun process.
- Avoid node layout fragmentation if running with a fixed CPU count but without Sockets and
CoresPerSocket defined.
- burst_buffer/cray: Fix datawarp swap default pool overriding jobdw.
- Fix incorrect job priority assignment for multi-partition job with different PriorityTier
settings on the partitions.
- Fix sinfo to print correct node state.
- Do not allocate nodes that were marked down due to the node not responding by ResumeTimeout.
- task/cray plugin: Search for "mems" cgroup information in the file "cpuset.mems" then fall
back to the file "mems".
- Fix ipmi profile debug uninitialized variable.
- PMIx: Fixed the direct connect inline msg sending.
- MySQL: Fix issue where not all fields were handled when loading an archive dump.
- Allow a job_submit plugin to change the admin_comment field during job_submit_plugin_modify().
- job_submit/lua: Fix access into reservation table.
- MySQL: Prevent deadlock caused by archive logic locking reads.
- Don't enforce MaxQueryTimeRange when requesting specific jobs.
- Modify --test-only logic to properly support jobs submitted to more than one partition.
- Prevent slurmctld from aborting when attempting to set a non-existent QOS as def_qos_id.
- Add new job dependency type of "afterburstbuffer". The pending job will be delayed until
the first job completes execution and its burst buffer stage-out is completed.
- Reorder proctrack/task plugin load in the slurmstepd to match that of slurmd and avoid the race
condition that calling task before proctrack can introduce.
- Prevent reboot of a busy KNL node when requesting inactive features.
- Reinitialize previously adjusted job members to their original values when validating
the job memory in multi-partition requests.
- Fix _step_signal() always returning SLURM_SUCCESS.
- Combine active and available node feature change logs on one line rather than one line per
node for performance reasons.
- Prevent occasionally leaking freezer cgroups.
- Fix potential segfault when closing the mpi/pmi2 plugin.
- Fix issues with --exclusive=[user|mcs] to work correctly with preemption or when job requests
a specific list of hosts.
- mpi/pmix: Fixed the collectives canceling.
- SlurmDBD: Improve error message handling on archive load failure.
- Fix incorrect locking when deleting reservations.
- Fix incorrect locking when setting up the power save module.
- Fix setting format output length for squeue when showing array jobs.
- Add xstrstr function.
- Fix printing of --hint options in the --help output of sbatch and salloc.
- Prevent possible divide by zero in _validate_time_limit().
- Add Delegate=yes to the slurmd.service file to prevent systemd from interfering with the jobs'
cgroup hierarchies.
- Change the backlog argument to the listen() syscall within srun to 4096 to match elsewhere in
the code, and avoid communication problems at scale.
- Recommend slurm-munge for slurm-slurmdbd.
</description>
<summary>Recommended update for slurm</summary>
</patchinfo>