File rh#2166967-0001-Fix-fencer-Prevent-double-g_source_remove-of-op_time.patch of Package pacemaker.32051
From db109219f03e8e3ef0539cc53b2976010aba169f Mon Sep 17 00:00:00 2001
From: Reid Wahl <nrwahl@protonmail.com>
Date: Fri, 3 Feb 2023 12:08:57 -0800
Subject: [PATCH] Fix: fencer: Prevent double g_source_remove of op_timer_one
QE observed a rarely reproducible core dump in the fencer during
Pacemaker shutdown, in which we try to g_source_remove() an op timer
that's already been removed.
free_stonith_remote_op_list()
-> g_hash_table_destroy()
-> g_hash_table_remove_all_nodes()
-> clear_remote_op_timers()
-> g_source_remove()
-> crm_glib_handler()
-> "Source ID 190 was not found when attempting to remove it"
The likely cause is that request_peer_fencing() doesn't set
op->op_timer_one to 0 after calling g_source_remove() on it, so if that
op is still in the stonith_remote_op_list at shutdown with the same
timer, clear_remote_op_timers() tries to remove the source for
op_timer_one again.
There are only five locations that call g_source_remove() on a
remote_fencing_op_t timer.
* Three of them are in clear_remote_op_timers(), which first 0-checks
the timer and then sets it to 0 after g_source_remove().
* One is in remote_op_query_timeout(), which does the same.
* The last is the one we fix here in request_peer_fencing().
I don't know all the conditions of QE's test scenario at this point.
What I do know:
* have-watchdog=true
* stonith-watchdog-timeout=10
* no explicit topology
* fence agent script is missing for the configured fence device
* requested fencing of one node
* cluster shutdown
Fixes RHBZ2166967
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
---
daemons/fenced/fenced_remote.c | 1 +
1 file changed, 1 insertion(+)
Index: pacemaker-2.1.2+20211124.ada5c3b36/daemons/fenced/fenced_remote.c
===================================================================
--- pacemaker-2.1.2+20211124.ada5c3b36.orig/daemons/fenced/fenced_remote.c
+++ pacemaker-2.1.2+20211124.ada5c3b36/daemons/fenced/fenced_remote.c
@@ -1622,6 +1622,7 @@ call_remote_stonith(remote_fencing_op_t
op->state = st_exec;
if (op->op_timer_one) {
g_source_remove(op->op_timer_one);
+ op->op_timer_one = 0;
}
if (!(stonith_watchdog_timeout_ms > 0 && (