File drbd-Fix-abortion-of-a-connect-2-phase-commit.patch of Package drbd
From c078591272d112f04813c615dce7058b20e78f28 Mon Sep 17 00:00:00 2001
From: Philipp Reisner <philipp.reisner@linbit.com>
Date: Mon, 26 Jul 2021 15:22:22 +0200
Subject: [PATCH] drbd: Fix abortion of a connect 2-phase-commit
There was a bug that triggered under the following conditions:
* two nodes connect and run the 2-phase-commit protocol
* the two nodes have the same value in the current-uuid slot
* the node with the lower node_id has a non-zero value in its
  bitmap slot for that peer
* the node with the higher node_id aborts the first 2-phase-commit
  transaction because the UUID changed
* a moment later it receives the abort command and calls
  the apply_connect() function with commit = true. This is the bug.
The effects of the bug are:
* the node with the lower node_id starts another 2-phase-commit transaction
* this time the passive node (higher node_id) does not execute the
  detection of missed finished resyncs, because the INITIAL_STATE_PROCESSED
  bit is already set
* the two nodes establish the DRBD connection but disagree about the
  replication state: the node with the lower node_id starts a resync,
  while the other node goes into the established replication state
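Fix the bug by passing commit = false to apply_connect() when the state
change carries CS_ABORT. Also skip the reply block (which computes the
directly reachable nodes) in that case.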
---
 drbd/drbd_receiver.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 2f162a22..6f5cb201 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -6041,6 +6041,7 @@ change_connection_state(struct drbd_connection *connection,
 	long t = resource->res_opts.auto_promote_timeout * HZ / 10;
 	bool is_disconnect = false;
 	bool is_connect = false;
+	bool abort = flags & CS_ABORT;
 	struct drbd_peer_device *peer_device;
 	unsigned long irq_flags;
 	enum drbd_state_rv rv;
@@ -6060,7 +6061,7 @@ change_connection_state(struct drbd_connection *connection,
 	if (is_connect && connection->agreed_pro_version >= 118) {
 		if (flags & CS_PREPARE)
 			conn_connect2(connection);
-		if (flags & CS_ABORT)
+		if (abort)
 			abort_connect(connection);
 	}
 retry:
@@ -6076,7 +6077,7 @@ retry:
 	if (rv < SS_SUCCESS)
 		goto fail;
 
-	if (reply) {
+	if (reply && !abort) {
 		u64 directly_reachable = directly_connected_nodes(resource, NEW) |
 			NODE_MASK(resource->res_opts.node_id);
 
@@ -6085,7 +6086,7 @@ retry:
 	}
 
 	if (is_connect && connection->agreed_pro_version >= 117)
-		apply_connect(connection, flags & CS_PREPARED);
+		apply_connect(connection, (flags & CS_PREPARED) && !abort);
 	rv = end_state_change(resource, &irq_flags);
 out:
 
-- 
2.16.4