Revisions of slurm

buildservice-autocommit accepted request 1161658 from Christian Goll (mslacken) (revision 293)
baserev update by copy to link target
Christian Goll (mslacken) accepted request 1161499 from Christian Goll (mslacken) (revision 292)
- removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
  as incorporated upstream
* Changes in Slurm 23.02.5
 * Add the JobId to debug() messages indicating when cpus_per_task/mem_per_cpu
   or pn_min_cpus are being automatically adjusted.
 * Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
   a node features plugin is configured.
 * Fix and prevent reoccurring reservations from overlapping.
 * job_container/tmpfs - Avoid attempts to share BasePath between nodes.
 * Change the log message warning for rate limited users from verbose to info.
 * With CR_Cpu_Memory, fix node selection for jobs that request gres and
   *-mem-per-cpu.
 * Fix a regression from 22.05.7 in which some jobs were allocated too few
   nodes, thus overcommitting cpus to some tasks.
 * Fix a job being stuck in the completing state if the job ends while the
   primary controller is down or unresponsive and the backup controller has
   not yet taken over.
 * Fix slurmctld segfault when a node registers with a configured CpuSpecList
   while slurmctld configuration has the node without CpuSpecList.
 * Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not
   registering by ResumeTimeout.
 * slurmstepd - Avoid cleanup of config.json-less containers spooldir getting
   skipped.
 * slurmstepd - Cleanup per task generated environment for containers in
   spooldir.
 * Fix scontrol segfault when 'completing' command requested repeatedly in
   interactive mode.
 * Properly handle a race condition between bind() and listen() calls in the
   network stack when running with SrunPortRange set.
 * Federation - Fix revoked jobs being returned regardless of the -a/--all
Egbert Eich (eeich) committed (revision 291)
    work correctly (boo#1204697).
buildservice-autocommit accepted request 1151965 from Egbert Eich (eeich) (revision 290)
baserev update by copy to link target
Egbert Eich (eeich) accepted request 1150524 from Egbert Eich (eeich) (revision 289)
- Update to version 23.11.03
  * slurmrestd - Reject single http query with multiple path
    requests.
  * Fix launching Singularity v4.x containers with
    `srun --container` by setting .process.terminal to true in
    generated `config.json` when step has pseudoterminal (`--pty`)
    requested.
  * Fix loading in `dynamic/cloud` node jobs after `net_cred`
    expired.
  * Fix cgroup null path error on `slurmd/slurmstepd` tear down.
  * `data_parser/v0.0.40` - Prevent failure if accounting is
    disabled, instead issue a warning if needed data from the
    database can not be retrieved.
  * `openapi/slurmctld` - Prevent failure if accounting is disabled.
  * Prevent `slurmscriptd` processing delays from blocking other
    threads in `slurmctld` while trying to launch various scripts.
    This is additional work for a fix in 23.02.6.
  * Fix memory leak when receiving alias addrs from controller.
  * `scontrol` - Accept `scontrol token lifespan=infinite` to
    create tokens that effectively do not expire.
  * Avoid errors when Slurmdb accounting disabled when `--json` or
    `--yaml` is invoked with CLI commands and `slurmrestd`. Add
    warnings when query would have populated data from Slurmdb
    instead of errors.
  * Fix `slurmctld` memory leak when running job with
    `--tres-per-task=gres:shard:#`
  * Fix backfill trying to start jobs outside of backfill window.
  * Fix oversubscription on partitions with `PreemptMode=OFF`.
  * Preserve node reason on power up if the node is downed
    or drained.
buildservice-autocommit accepted request 1141442 from Egbert Eich (eeich) (revision 288)
baserev update by copy to link target
Egbert Eich (eeich) committed (revision 287)
- Remove last change. This is not how it is intended to work
Christian Goll (mslacken) accepted request 1141020 from Dominique Leuenberger (dimstar) (revision 286)
- Fix dependency of testsuite when building without hdf5
  (have_hdf5=0). The previously used construct
  %{?have_hdf5:%ts_depends: does not behave as intended by the
  line's author: %{?…:} does not test the value of the variable,
  only whether it is defined or undefined.
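
The distinction described above can be sketched as a short spec fragment. This is an illustration only: the `BuildRequires: hdf5-devel` line is a hypothetical stand-in for the real testsuite dependency, and nothing here is copied from the actual slurm.spec.

```spec
# Assumption for illustration: the build is configured without hdf5.
%global have_hdf5 0

# Conditional expansion only checks that the macro is defined, so this
# line is still emitted even though the value is 0.
%{?have_hdf5:BuildRequires: hdf5-devel}

# Testing the value instead: the expression becomes "00" here, which is
# false, and plain "0" when the macro is undefined.
%if 0%{?have_hdf5}
BuildRequires: hdf5-devel
%endif
```

With `have_hdf5` set to 1, both forms would emit the dependency; they differ only when the macro is defined with a false value.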
Egbert Eich (eeich) committed (revision 285)
  CVE-2023-49933, CVE-2023-49934, CVE-2023-49935, CVE-2023-49936
  and CVE-2023-49937
  * Substantially overhauled the SlurmDBD association management
    code. For clusters updated to 23.11, account and user
    additions or removals are significantly faster than in prior
    releases.
  * Overhauled `scontrol reconfigure` to prevent configuration
    mistakes from disabling slurmctld and slurmd. Instead, an
    error will be returned, and the running configuration will
    persist. This does require updates to the systemd service
    files to use the `--systemd` option to `slurmctld` and `slurmd`.
  * Added a new internal `auth/cred` plugin - `auth/slurm`. This
    builds off the prior `auth/jwt` model, and permits operation
    of the `slurmdbd` and `slurmctld` without access to full
    directory information with a suitable configuration.
  * Added a new `--external-launcher` option to `srun`, which is
    automatically set by common MPI launcher implementations and
    ensures processes using those non-srun launchers have full
    access to all resources allocated on each node.
  * Reworked the dynamic/cloud modes of operation to allow for
    "fanout" - where Slurm communication can be automatically
    offloaded to compute nodes for increased cluster scalability.
  * Overhauled and extended the Reservation subsystem to allow
    for most of the same resource requirements as are placed on
    the job. Notably, this permits reservations to now reserve
    GRES directly.
  * Fix `scontrol update job=... TimeLimit+=/-=` when used with a
    raw JobId of job array element.
  * Reject `TimeLimit` increment/decrement when called on job with
    `TimeLimit=UNLIMITED`.
Egbert Eich (eeich) accepted request 1138332 from Christian Goll (mslacken) (revision 284)
- Update to 23.11.1 with following major improvements and fixing
  CVE-2023-49933, CVE-2023-49934, CVE-2023-49935, CVE-2023-49936 and
  CVE-2023-49937
  * Substantially overhauled the SlurmDBD association management code. For
    clusters updated to 23.11, account and user additions or removals are
    significantly faster than in prior releases.
  * Overhauled 'scontrol reconfigure' to prevent configuration mistakes from
    disabling slurmctld and slurmd. Instead, an error will be returned, and the
    running configuration will persist. This does require updates to the
    systemd service files to use the --systemd option to slurmctld and slurmd.
  * Added a new internal auth/cred plugin - "auth/slurm". This builds off the
    prior auth/jwt model, and permits operation of the slurmdbd and slurmctld
    without access to full directory information with a suitable configuration.
  * Added a new --external-launcher option to srun, which is automatically set
    by common MPI launcher implementations and ensures processes using those
    non-srun launchers have full access to all resources allocated on each
    node.
  * Reworked the dynamic/cloud modes of operation to allow for "fanout" - where
    Slurm communication can be automatically offloaded to compute nodes for
    increased cluster scalability.
  * Added initial official Debian packaging support.
  * Overhauled and extended the Reservation subsystem to allow for most of the
    same resource requirements as are placed on the job. Notably, this permits
    reservations to now reserve GRES directly.
- Details of changes:
  * Fix scontrol update job=... TimeLimit+=/-= when used with a raw JobId of job
    array element.
  * Reject TimeLimit increment/decrement when called on job with
    TimeLimit=UNLIMITED.
  * Fix issue with requesting a job with  *licenses as well as
buildservice-autocommit accepted request 1137045 from Egbert Eich (eeich) (revision 283)
baserev update by copy to link target
Egbert Eich (eeich) accepted request 1136624 from Egbert Eich (eeich) (revision 282)
- Update to 23.02.6 to fix (CVE-2023-49933 - bsc#1218046, CVE-2023-49935 -
  bsc#1218049, CVE-2023-49936 - bsc#1218050, CVE-2023-49937 - bsc#1218051,
  CVE-2023-49938 - bsc#1218053)
  * Security Fixes:
    + Add `JobAcctGatherParams=DisableGPUAcct` to disable gpu accounting.
    + `acct_gather_energy/ipmi` - Improve logging of DCMI issues.
    + `gpu/oneapi` - Add support for new env vars `ZE_FLAT_DEVICE_HIERARCHY`
      and `ZE_ENABLE_PCI_ID_DEVICE_ORDER`.
    + `data_parser/v0.0.39` - skip empty string when parsing QOS ids.
    + Remove error message from `assoc_mgr_update_assocs` when purposefully
      resetting the default QOS.
  * Bug Fixes:
    + `libslurm_nss` - Avoid causing glibc to assert due to an unexpected
      return from slurm_nss due to an error during lookup.
    + Fix job requests with `--tres-per-task` sometimes resulting in bad
      allocations that cannot run subsequent job steps.
    + Fix issue with `slurmd` where `srun` fails to be warned when a node
      prolog script runs beyond `MsgTimeout` set in `slurm.conf`.
    + `gres/shard` - Fix plugin functions to have matching parameter orders.
    + `gpu/nvml` - Fix issue that resulted in the wrong MIG devices being
      constrained to a job
    + `gpu/nvml` - Fix linking issue with MIGs that prevented multiple MIGs
      being used in a single job for certain MIG configurations
    + Fix file descriptor leak in slurmd when using `acct_gather_energy/ipmi`
      with DCMI devices.
    + `sview` - avoid crash when job has a node list string > 49 characters.
    + Prevent `slurmctld` crash during reconfigure when packing job start
      messages.
    + Preserve reason uid on reconfig.
    + Update node reason with updated `INVAL` state reason if different from
buildservice-autocommit accepted request 1130097 from Egbert Eich (eeich) (revision 281)
baserev update by copy to link target
Egbert Eich (eeich) accepted request 1130096 from Egbert Eich (eeich) (revision 280)
- Add missing service file for slurmrestd (boo#1217711).
Egbert Eich (eeich) accepted request 1129638 from Egbert Eich (eeich) (revision 279)
- Explicitly create an Obsoletes: entry for each package version
  that is obsoleted by the present version. These are all published
  versions of the last two major releases as well as all minor
  versions of the present release lower than the current one
  (bsc#1216869 2nd part).
  This prevents the current version from upgrading an old Slurm
  version for which no upgrade path exists.
buildservice-autocommit accepted request 1129192 from Factory Maintainer (factory-maintainer) (revision 278)
baserev update by copy to link target
Egbert Eich (eeich) committed (revision 277)
- On SLE-12 exclude build for s390x.
buildservice-autocommit accepted request 1123596 from Egbert Eich (eeich) (revision 276)
baserev update by copy to link target
Egbert Eich (eeich) accepted request 1123595 from Egbert Eich (eeich) (revision 275)
- Add missing dependencies to slurm-config to plugins package.
  These should help to tie down the slurm version and help to avoid
  a package mix (bsc#1216869).
buildservice-autocommit accepted request 1121548 from Factory Maintainer (factory-maintainer) (revision 274)
baserev update by copy to link target
Displaying revisions 1 - 20 of 293