File bug-921102_pacemaker-crmd-reset-stonith-failcount.patch of Package pacemaker.9287

commit 53d7d54d5a33857c3331f0dbc44eadf6d092c90c
Author: Gao,Yan <ygao@suse.com>
Date:   Tue Mar 10 16:02:33 2015 +0100

    Fix: crmd: Reset stonith failcount to recover transitioner when the node rejoins
    
    CRMd transitioner could not recover from "Too many failures to fence".
    
    Steps to reproduce:
    
    1. Two-node cluster with stonith, for example using IPMI.
    2. Node-1 has a complete power outage for a couple of minutes. The
    IPMI device is also without power, which causes the fencing to fail.
    3. Node-2 tries to fence node-1 several times but fails.
    4. Node-2 reports "Too many failures to fence node-1 (11), giving up".
    5. The power returns and node-1 boots up normally.
    6. Node-1 rejoins the cluster, but resources are not started on it.
    
    Expected result:
    The stonith failcount for node-1 should be reset and resources should
    be started on node-1.
    
    Actual result:
    Node-2 still logs "Too many failures to fence" and resources are not
    started on node-1.
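
Illustration (not part of the upstream commit): the failure mode described
above can be modelled with a small per-node failure counter. The sketch
below is a simplified stand-in, not Pacemaker's implementation. All names
(sketch_fail_count_increment, sketch_too_many_failures,
sketch_fail_count_reset), the fixed-size table, and the threshold of 10 are
assumptions chosen only to match the "(11), giving up" message in step 4.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TRACKED_NODES  16   /* hypothetical table size */
#define MAX_FENCE_FAILURES 10   /* assumed threshold, see lead-in */

struct fence_failures {
    char uname[64];             /* node name; empty string marks a free slot */
    int count;                  /* consecutive failed fencing attempts */
};

static struct fence_failures failures[MAX_TRACKED_NODES];

/* Find the entry for uname; optionally claim a free slot if none exists. */
static struct fence_failures *lookup(const char *uname, int create)
{
    struct fence_failures *free_slot = NULL;

    if (uname == NULL || uname[0] == '\0') {
        return NULL;
    }
    for (int i = 0; i < MAX_TRACKED_NODES; i++) {
        if (failures[i].uname[0] == '\0') {
            if (free_slot == NULL) {
                free_slot = &failures[i];
            }
        } else if (strcmp(failures[i].uname, uname) == 0) {
            return &failures[i];
        }
    }
    if (create && free_slot != NULL) {
        snprintf(free_slot->uname, sizeof(free_slot->uname), "%s", uname);
        free_slot->count = 0;
        return free_slot;
    }
    return NULL;
}

/* Record one failed fencing attempt against uname. */
void sketch_fail_count_increment(const char *uname)
{
    struct fence_failures *entry = lookup(uname, 1);

    if (entry != NULL) {
        entry->count++;
    }
}

/* Return 1 once the failure count exceeds the threshold, else 0. */
int sketch_too_many_failures(const char *uname)
{
    struct fence_failures *entry = lookup(uname, 0);

    return entry != NULL && entry->count > MAX_FENCE_FAILURES;
}

/* Clear the count, e.g. when the node is seen alive in the cluster again. */
void sketch_fail_count_reset(const char *uname)
{
    struct fence_failures *entry = lookup(uname, 0);

    if (entry != NULL) {
        entry->count = 0;
    }
}
```

In this model, eleven calls to sketch_fail_count_increment("node-1") make
sketch_too_many_failures("node-1") return 1, and it keeps doing so after
the node rejoins unless sketch_fail_count_reset("node-1") is called. That
missing reset is the gap the one-line change in the hunk below closes.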

Index: pacemaker/crmd/callbacks.c
===================================================================
--- pacemaker.orig/crmd/callbacks.c
+++ pacemaker/crmd/callbacks.c
@@ -204,6 +204,9 @@ peer_update_callback(enum crm_status_typ
             if (alive && safe_str_eq(task, CRM_OP_FENCE)) {
                 crm_notice("Node return implies stonith of %s (action %d) completed", node->uname,
                          down->id);
+
+                st_fail_count_reset(node->uname);
+
                 erase_status_tag(node->uname, XML_CIB_TAG_LRM, cib_scope_local);
                 erase_status_tag(node->uname, XML_TAG_TRANSIENT_NODEATTRS, cib_scope_local);
                 /* down->confirmed = TRUE; Only stonith-ng returning should imply completion */