File bsc#1173668-0001-Fix-attrd-prevent-leftover-attributes-of-shutdown-no.patch of Package pacemaker.34782
From a1a9c54cfc451e36f66db3738fd798b7464a1239 Mon Sep 17 00:00:00 2001
From: "Gao,Yan" <ygao@suse.com>
Date: Fri, 25 Sep 2020 15:47:23 +0200
Subject: [PATCH] Fix: attrd: prevent leftover attributes of shutdown node in
 cib
This commit prevents writing out of attributes from being triggered by
cib_replace event on a node that is requesting shutdown, so that it
prevents leftover attributes of the shutdown node in cib.
Race conditions were encountered on shutdown and startup of a node.
Pacemaker is v1.1.18+, but I think the latest revision should be
potentially impacted too.
Node2 was shutting down. When crmd was stopped from node2, node1 erased
all the transient attributes of node2 from cib:
```
Sep 21 08:33:49 [7849] node1       crmd:   notice: peer_update_callback:
Our peer on the DC (node2) is dead
Sep 21 08:33:49 [7844] node1        cib:     info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='node2']/transient_attributes: OK (rc=0,
origin=node1/crmd/114, version=0.649.144)
```
And node2 became the DC and did a cib_replace:
```
Sep 21 08:33:49 [7849] node1       crmd:     info: update_dc:     Set DC
to node1 (3.1.0)
Sep 21 08:33:49 [7849] node1       crmd:     info: do_dc_join_finalize:
join-2: Syncing our CIB to the rest of the cluster
Sep 21 08:33:49 [7844] node1        cib:     info: cib_process_replace:
Replaced 0.649.144 with 0.649.144 from node1
Sep 21 08:33:49 [7844] node1        cib:     info: cib_process_request:
Completed cib_replace operation for section 'all': OK (rc=0,
origin=node1/crmd/126, version=0.649.144)
```
Meanwhile cib and attrd daemons on node2 didn't receive SIGTERM yet and
were still running. Attrd reacted to the cib_replace and wrote all the
node attributes back into cib again including its "shutdown" attribute:
```
Sep 21 08:33:49 [4444] node2      attrd:   notice: attrd_cib_replaced_cb:
Updating all attributes after cib_refresh_notify event
Sep 21 08:33:49 [4441] node2        cib:     info: cib_perform_op:
++
/cib/status/node_state[@id='14548837']/transient_attributes[@id='14548837']/instance_attributes[@id='status-14548837']:
<nvpair id="status-14548837-shutdown" name="shutdown"
value="1600677133"/>
Sep 21 08:33:49 [4444] node2      attrd:     info: attrd_cib_callback:
Update 1103 for shutdown[node2]=1600677133: OK (0)
```
Later, attrd received SIGTERM and shut down:
```
Sep 21 08:33:49 [4439] node2 pacemakerd:   notice: stop_child:    Stopping
attrd | sent signal 15 to process 4444
Sep 21 08:33:49 [4444] node2      attrd:   notice: crm_signal_dinode1tch:
Caught 'Terminated' signal | 15 (invoking handler)
Sep 21 08:33:49 [4444] node2      attrd:     info: main:  Shutting down
attribute manager
```
When node2 started again, it cleared its node attributes from cib, but
cib of node1 didn't join yet by then:
```
Sep 21 08:42:46 [4844] node2      attrd:     info: attrd_erase_attrs:
Clearing transient attributes from CIB |
xpath=//node_state[@uname='node2']/transient_attributes
Sep 21 08:42:46 [4841] node2        cib:     info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='node2']/transient_attributes: OK (rc=0,
origin=node2/attrd/2, version=0.649.0)
```
Then cib of node1 joined:
```
Sep 21 08:42:47 [4841] node2        cib:     info: pcmk_cpg_membership:
Node 14548836 joined group cib (counter=1.0, pid=0, unchecked for
rivals)
```
Soon the node attributes of node2 got back into cib again by syncing of
cib from node1:
```
Sep 21 08:42:47 [4846] node2       crmd:     info: update_dc:     Set DC
to node1 (3.1.0)
Sep 21 08:42:48 [4841] node2        cib:     info: cib_process_replace:
Replaced 0.649.90 with 0.649.410 from node1
Sep 21 08:42:48 [4841] node2        cib:     info: cib_perform_op:
++
/cib/status/node_state[@id='14548837']/transient_attributes[@id='14548837']/instance_attributes[@id='status-14548837']:
<nvpair id="status-14548837-shutdown" name="shutdown"
value="1600677133"/>
```
So the "leftover" "shutdown" attribute of node2 caused it to shut down
again:
```
Sep 21 08:42:51 [4846] node2       crmd:    error: handle_request:      We
didn't ask to be shut down, yet our DC is telling us to.
```
---
 daemons/attrd/attrd_commands.c  | 11 +++++++++++
 daemons/attrd/attrd_utils.c     | 35 ++++++++++++++++++++++++++++++++-
 daemons/attrd/pacemaker-attrd.h |  3 +++
 3 files changed, 48 insertions(+), 1 deletion(-)
diff --git a/daemons/attrd/attrd_commands.c b/daemons/attrd/attrd_commands.c
index 96d992a20..deff82f47 100644
--- a/daemons/attrd/attrd_commands.c
+++ b/daemons/attrd/attrd_commands.c
@@ -886,6 +886,17 @@ attrd_peer_update(crm_node_t *peer, xmlNode *xml, const char *host, bool filter)
         v->current = (value? strdup(value) : NULL);
         a->changed = TRUE;
 
+        if (pcmk__str_eq(host, attrd_cluster->uname, pcmk__str_casei)
+            && pcmk__str_eq(attr, XML_CIB_ATTR_SHUTDOWN, pcmk__str_none)) {
+
+            if (!pcmk__str_eq(value, "0", pcmk__str_null_matches)) {
+                attrd_set_requesting_shutdown();
+
+            } else {
+                attrd_clear_requesting_shutdown();
+            }
+        }
+
         // Write out new value or start dampening timer
         if (a->timeout_ms && a->timer) {
             crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, attr);
diff --git a/daemons/attrd/attrd_utils.c b/daemons/attrd/attrd_utils.c
index 3b8446a9f..b7957fdd7 100644
--- a/daemons/attrd/attrd_utils.c
+++ b/daemons/attrd/attrd_utils.c
@@ -24,9 +24,42 @@
 
 cib_t *the_cib = NULL;
 
+static bool requesting_shutdown = FALSE;
 static bool shutting_down = FALSE;
 static GMainLoop *mloop = NULL;
 
+/*!
+ * \internal
+ * \brief  Set requesting_shutdown state
+ */
+void
+attrd_set_requesting_shutdown()
+{
+    requesting_shutdown = TRUE;
+}
+
+/*!
+ * \internal
+ * \brief  Clear requesting_shutdown state
+ */
+void
+attrd_clear_requesting_shutdown()
+{
+    requesting_shutdown = FALSE;
+}
+
+/*!
+ * \internal
+ * \brief Check whether we're currently requesting shutdown
+ *
+ * \return TRUE if requesting shutdown, FALSE otherwise
+ */
+gboolean
+attrd_requesting_shutdown()
+{
+    return requesting_shutdown;
+}
+
 /*!
  * \internal
  * \brief Check whether we're currently shutting down
@@ -191,7 +224,7 @@ attrd_cib_disconnect()
 void
 attrd_cib_replaced_cb(const char *event, xmlNode * msg)
 {
-    if (attrd_shutting_down()) {
+    if (attrd_requesting_shutdown() || attrd_shutting_down()) {
         return;
     }
 
diff --git a/daemons/attrd/pacemaker-attrd.h b/daemons/attrd/pacemaker-attrd.h
index b82455564..15e08f55e 100644
--- a/daemons/attrd/pacemaker-attrd.h
+++ b/daemons/attrd/pacemaker-attrd.h
@@ -20,6 +20,9 @@
 void attrd_init_mainloop(void);
 void attrd_run_mainloop(void);
 
+void attrd_set_requesting_shutdown(void);
+void attrd_clear_requesting_shutdown(void);
+gboolean attrd_requesting_shutdown(void);
 gboolean attrd_shutting_down(void);
 void attrd_shutdown(int nsig);
 void attrd_init_ipc(qb_ipcs_service_t **ipcs,
-- 
2.26.2