From c078591272d112f04813c615dce7058b20e78f28 Mon Sep 17 00:00:00 2001
From: Philipp Reisner <philipp.reisner@linbit.com>
Date: Mon, 26 Jul 2021 15:22:22 +0200
Subject: [PATCH] drbd: Fix abortion of a connect 2-phase-commit

There was a bug that triggered under the following conditions:
* two nodes connect and run the 2-phase-commit protocol
* the two nodes have the same value in the current-uuid slot
* the node with the lower node_id has a nonzero value in its
  bitmap slot for that peer
* the node with the higher node_id aborts the first 2-phase-commit
  transaction because the UUID changed
* a moment later it receives the abort command and calls
  the apply_connect() function with commit = true. This is the bug.

The effects of the bug are:
* the node with the lower node_id starts another 2-phase-commit
  transaction
* this time the passive node (higher node_id) does not run the
  detection of missed finished resyncs, because the
  INITIAL_STATE_PROCESSED bit is already set
* the two nodes establish the DRBD connection and disagree about the
  replication state: the node with the lower node_id does a resync,
  while the other node goes into the established replication state

Fix the bug by passing commit = false to apply_connect() when the
2-phase-commit transaction is aborted.
---
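Note: the change to the commit argument can be shown in isolation with
the stand-alone sketch below. It is not DRBD code. The flag values are
placeholders chosen for illustration (the real definitions live in the
DRBD headers), the two helper functions are hypothetical, and the sketch
assumes that an abort arrives with both CS_PREPARED and CS_ABORT set, as
the patched expression suggests.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only, not the real DRBD ones. */
enum {
	CS_PREPARE  = 1 << 0,
	CS_PREPARED = 1 << 1,
	CS_ABORT    = 1 << 2,
};

/* Hypothetical helper: commit argument as computed before this patch. */
static bool commit_before_fix(unsigned int flags)
{
	return flags & CS_PREPARED;
}

/* Hypothetical helper: commit argument as computed after this patch. */
static bool commit_after_fix(unsigned int flags)
{
	bool abort = flags & CS_ABORT;

	/* An aborted transaction must never be applied as a commit. */
	return (flags & CS_PREPARED) && !abort;
}
```

Under these assumptions, flags of CS_PREPARED | CS_ABORT yield commit =
true with the old expression and commit = false with the new one, while
a plain CS_PREPARED still yields commit = true in both.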
drbd/drbd_receiver.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 2f162a22..6f5cb201 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -6041,6 +6041,7 @@ change_connection_state(struct drbd_connection *connection,
long t = resource->res_opts.auto_promote_timeout * HZ / 10;
bool is_disconnect = false;
bool is_connect = false;
+ bool abort = flags & CS_ABORT;
struct drbd_peer_device *peer_device;
unsigned long irq_flags;
enum drbd_state_rv rv;
@@ -6060,7 +6061,7 @@ change_connection_state(struct drbd_connection *connection,
if (is_connect && connection->agreed_pro_version >= 118) {
if (flags & CS_PREPARE)
conn_connect2(connection);
- if (flags & CS_ABORT)
+ if (abort)
abort_connect(connection);
}
retry:
@@ -6076,7 +6077,7 @@ retry:
if (rv < SS_SUCCESS)
goto fail;
- if (reply) {
+ if (reply && !abort) {
u64 directly_reachable = directly_connected_nodes(resource, NEW) |
NODE_MASK(resource->res_opts.node_id);
@@ -6085,7 +6086,7 @@ retry:
}
if (is_connect && connection->agreed_pro_version >= 117)
- apply_connect(connection, flags & CS_PREPARED);
+ apply_connect(connection, (flags & CS_PREPARED) && !abort);
rv = end_state_change(resource, &irq_flags);
out:
--
2.16.4