Revisions of slurm
buildservice-autocommit
accepted
request 1161658
from
Christian Goll (mslacken)
(revision 293)
baserev update by copy to link target
Christian Goll (mslacken)
accepted
request 1161499
from
Christian Goll (mslacken)
(revision 292)
- removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch as incorporated upstream
  * Changes in Slurm 23.02.5
  * Add the JobId to debug() messages indicating when cpus_per_task/mem_per_cpu or pn_min_cpus are being automatically adjusted.
  * Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if a node features plugin is configured.
  * Fix and prevent reoccurring reservations from overlapping.
  * job_container/tmpfs - Avoid attempts to share BasePath between nodes.
  * Change the log message warning for rate limited users from verbose to info.
  * With CR_Cpu_Memory, fix node selection for jobs that request gres and *-mem-per-cpu.
  * Fix a regression from 22.05.7 in which some jobs were allocated too few nodes, thus overcommitting cpus to some tasks.
  * Fix a job being stuck in the completing state if the job ends while the primary controller is down or unresponsive and the backup controller has not yet taken over.
  * Fix slurmctld segfault when a node registers with a configured CpuSpecList while slurmctld configuration has the node without CpuSpecList.
  * Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not registering by ResumeTimeout.
  * slurmstepd - Avoid cleanup of config.json-less containers spooldir getting skipped.
  * slurmstepd - Cleanup per task generated environment for containers in spooldir.
  * Fix scontrol segfault when 'completing' command requested repeatedly in interactive mode.
  * Properly handle a race condition between bind() and listen() calls in the network stack when running with SrunPortRange set.
  * Federation - Fix revoked jobs being returned regardless of the -a/--all
Egbert Eich (eeich)
committed
(revision 291)
work correctly (boo#1204697).
buildservice-autocommit
accepted
request 1151965
from
Egbert Eich (eeich)
(revision 290)
baserev update by copy to link target
Egbert Eich (eeich)
accepted
request 1150524
from
Egbert Eich (eeich)
(revision 289)
- Update to version 23.11.03
  * slurmrestd - Reject single http query with multiple path requests.
  * Fix launching Singularity v4.x containers with `srun --container` by setting .process.terminal to true in generated `config.json` when step has pseudoterminal (`--pty`) requested.
  * Fix loading in `dynamic/cloud` node jobs after `net_cred` expired.
  * Fix cgroup null path error on `slurmd/slurmstepd` tear down.
  * `data_parser/v0.0.40` - Prevent failure if accounting is disabled, instead issue a warning if needed data from the database cannot be retrieved.
  * `openapi/slurmctld` - Prevent failure if accounting is disabled.
  * Prevent `slurmscriptd` processing delays from blocking other threads in `slurmctld` while trying to launch various scripts. This is additional work for a fix in 23.02.6.
  * Fix memory leak when receiving alias addrs from controller.
  * `scontrol` - Accept `scontrol token lifespan=infinite` to create tokens that effectively do not expire.
  * Avoid errors when Slurmdb accounting disabled when `--json` or `--yaml` is invoked with CLI commands and `slurmrestd`. Add warnings when query would have populated data from Slurmdb instead of errors.
  * Fix `slurmctld` memory leak when running job with `--tres-per-task=gres:shard:#`
  * Fix backfill trying to start jobs outside of backfill window.
  * Fix oversubscription on partitions with `PreemptMode=OFF`.
  * Preserve node reason on power up if the node is downed or drained.
buildservice-autocommit
accepted
request 1141442
from
Egbert Eich (eeich)
(revision 288)
baserev update by copy to link target
Egbert Eich (eeich)
committed
(revision 287)
- Remove last change. This is not how it is intended to work.
Christian Goll (mslacken)
accepted
request 1141020
from
Dominique Leuenberger (dimstar)
(revision 286)
- Fix dependency of testsuite when building without hdf5 (have_hdf5=0). The previously used construct %{?have_hdf5:%ts_depends: does not behave as its author intended: %{?…:} does not test the variable's value, only whether the variable is defined or undefined.
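The distinction drawn in the entry above, between testing whether a macro is defined and testing its value, can be sketched with a small model. This is not RPM code: `expand_if_defined`, `expand_if_true` and the text `testsuite-deps` are hypothetical names chosen for illustration, and the value test assumes the common `%if 0%{?name}` spec idiom, which the changelog does not say was the fix actually used.

```python
def expand_if_defined(macros, name, text):
    """Toy model of %{?name:text}: yields text whenever name is defined,
    regardless of the value it holds."""
    return text if name in macros else ""


def expand_if_true(macros, name, text):
    """Toy model of a value test in the style of '%if 0%{?name}':
    prefix the (possibly empty) value with 0 and compare against zero."""
    value = macros.get(name, "")
    return text if int("0" + value) != 0 else ""


# have_hdf5 is defined but set to 0, as in a build without hdf5.
macros = {"have_hdf5": "0"}
defined_result = expand_if_defined(macros, "have_hdf5", "testsuite-deps")
value_result = expand_if_true(macros, "have_hdf5", "testsuite-deps")
```

With `have_hdf5` defined as 0, the definedness test still emits the dependency text while the value test emits nothing, which is the mismatch the entry describes.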
Egbert Eich (eeich)
committed
(revision 285)
CVE-2023-49933, CVE-2023-49934, CVE-2023-49935, CVE-2023-49936 and CVE-2023-49937
  * Substantially overhauled the SlurmDBD association management code. For clusters updated to 23.11, account and user additions or removals are significantly faster than in prior releases.
  * Overhauled `scontrol reconfigure` to prevent configuration mistakes from disabling slurmctld and slurmd. Instead, an error will be returned, and the running configuration will persist. This does require updates to the systemd service files to use the `--systemd` option to `slurmctld` and `slurmd`.
  * Added a new internal `auth/cred` plugin - `auth/slurm`. This builds off the prior `auth/jwt` model, and permits operation of the `slurmdbd` and `slurmctld` without access to full directory information with a suitable configuration.
  * Added a new `--external-launcher` option to `srun`, which is automatically set by common MPI launcher implementations and ensures processes using those non-srun launchers have full access to all resources allocated on each node.
  * Reworked the dynamic/cloud modes of operation to allow for "fanout" - where Slurm communication can be automatically offloaded to compute nodes for increased cluster scalability.
  * Overhauled and extended the Reservation subsystem to allow for most of the same resource requirements as are placed on the job. Notably, this permits reservations to now reserve GRES directly.
  * Fix `scontrol update job=... TimeLimit+=/-=` when used with a raw JobId of job array element.
  * Reject `TimeLimit` increment/decrement when called on job with `TimeLimit=UNLIMITED`.
Egbert Eich (eeich)
accepted
request 1138332
from
Christian Goll (mslacken)
(revision 284)
- Update to 23.11.1 with the following major improvements and fixing CVE-2023-49933, CVE-2023-49934, CVE-2023-49935, CVE-2023-49936 and CVE-2023-49937
  * Substantially overhauled the SlurmDBD association management code. For clusters updated to 23.11, account and user additions or removals are significantly faster than in prior releases.
  * Overhauled 'scontrol reconfigure' to prevent configuration mistakes from disabling slurmctld and slurmd. Instead, an error will be returned, and the running configuration will persist. This does require updates to the systemd service files to use the --systemd option to slurmctld and slurmd.
  * Added a new internal auth/cred plugin - "auth/slurm". This builds off the prior auth/jwt model, and permits operation of the slurmdbd and slurmctld without access to full directory information with a suitable configuration.
  * Added a new --external-launcher option to srun, which is automatically set by common MPI launcher implementations and ensures processes using those non-srun launchers have full access to all resources allocated on each node.
  * Reworked the dynamic/cloud modes of operation to allow for "fanout" - where Slurm communication can be automatically offloaded to compute nodes for increased cluster scalability. Added initial official Debian packaging support.
  * Overhauled and extended the Reservation subsystem to allow for most of the same resource requirements as are placed on the job. Notably, this permits reservations to now reserve GRES directly.
- Details of changes:
  * Fix scontrol update job=... TimeLimit+=/-= when used with a raw JobId of job array element.
  * Reject TimeLimit increment/decrement when called on job with TimeLimit=UNLIMITED.
  * Fix issue with requesting a job with *licenses as well as
buildservice-autocommit
accepted
request 1137045
from
Egbert Eich (eeich)
(revision 283)
baserev update by copy to link target
Egbert Eich (eeich)
accepted
request 1136624
from
Egbert Eich (eeich)
(revision 282)
- Update to 23.02.6 to fix (CVE-2023-49933 - bsc#1218046, CVE-2023-49935 - bsc#1218049, CVE-2023-49936 - bsc#1218050, CVE-2023-49937 - bsc#1218051, CVE-2023-49938 - bsc#1218053)
  * Security Fixes:
    + Add `JobAcctGatherParams=DisableGPUAcct` to disable gpu accounting.
    + `acct_gather_energy/ipmi` - Improve logging of DCMI issues.
    + `gpu/oneapi` - Add support for new env vars `ZE_FLAT_DEVICE_HIERARCHY` and `ZE_ENABLE_PCI_ID_DEVICE_ORDER`.
    + `data_parser/v0.0.39` - skip empty string when parsing QOS ids.
    + Remove error message from `assoc_mgr_update_assocs` when purposefully resetting the default QOS.
  * Bug Fixes:
    + `libslurm_nss` - Avoid causing glibc to assert due to an unexpected return from slurm_nss due to an error during lookup.
    + Fix job requests with `--tres-per-task` sometimes resulting in bad allocations that cannot run subsequent job steps.
    + Fix issue with `slurmd` where `srun` fails to be warned when a node prolog script runs beyond `MsgTimeout` set in `slurm.conf`.
    + `gres/shard` - Fix plugin functions to have matching parameter orders.
    + `gpu/nvml` - Fix issue that resulted in the wrong MIG devices being constrained to a job
    + `gpu/nvml` - Fix linking issue with MIGs that prevented multiple MIGs being used in a single job for certain MIG configurations
    + Fix file descriptor leak in slurmd when using `acct_gather_energy/ipmi` with DCMI devices.
    + `sview` - avoid crash when job has a node list string > 49 characters.
    + Prevent `slurmctld` crash during reconfigure when packing job start messages.
    + Preserve reason uid on reconfig.
    + Update node reason with updated `INVAL` state reason if different from
buildservice-autocommit
accepted
request 1130097
from
Egbert Eich (eeich)
(revision 281)
baserev update by copy to link target
Egbert Eich (eeich)
accepted
request 1130096
from
Egbert Eich (eeich)
(revision 280)
- Add missing service file for slurmrestd (boo#1217711).
Egbert Eich (eeich)
accepted
request 1129638
from
Egbert Eich (eeich)
(revision 279)
- Explicitly create an Obsoletes: entry for each package version that is obsoleted by the present version. These are all published versions of the last two major releases as well as all minor versions of the present release lower than the current one (bsc#1216869 2nd part). This prevents the current version from upgrading an old Slurm version for which no upgrade path exists.
buildservice-autocommit
accepted
request 1129192
from
Factory Maintainer (factory-maintainer)
(revision 278)
baserev update by copy to link target
Egbert Eich (eeich)
committed
(revision 277)
- On SLE-12 exclude build for s390x.
buildservice-autocommit
accepted
request 1123596
from
Egbert Eich (eeich)
(revision 276)
baserev update by copy to link target
Egbert Eich (eeich)
accepted
request 1123595
from
Egbert Eich (eeich)
(revision 275)
- Add missing dependencies on slurm-config to the plugins package. These should help to tie down the slurm version and help to avoid a package mix (bsc#1216869).
buildservice-autocommit
accepted
request 1121548
from
Factory Maintainer (factory-maintainer)
(revision 274)
baserev update by copy to link target