From 48c9cdfdbd6df6c89ee605cedbdc86c5b1c91ada Mon Sep 17 00:00:00 2001
From: Joel Colledge <joel.colledge@linbit.com>
Date: Mon, 17 May 2021 08:42:05 +0200
Subject: [PATCH] drbd: fix race condition resetting resync_next_bit

The commit

  262103c65d28 drbd: serialize syncs from multiple sources

introduced a race condition which could cause syncs to stall. The
sequence was as follows:

1. drbd_start_resync changes the replication state to L_SYNC_TARGET.
2. make_resync_request runs, either due to a previous sync or due to the
   chain of calls finish_state_change => initialize_resync =>
   drbd_rs_controller_reset => post work MAKE_RESYNC_REQUEST.
3. make_resync_request uses the old value of resync_next_bit, finds no
   bits set after this point, and sets "bit" to the end of the bitmap.
4. drbd_start_resync sets "resync_next_bit" to 0.
5. make_resync_request sets "resync_next_bit" to "bit", that is, the end
   of the bitmap.

Now the sync stalls because "resync_next_bit" is at the end of the
bitmap, but there are no active requests and bits in the bitmap are
still set.

Fix this by resetting "resync_next_bit" earlier, in initialize_resync,
before make_resync_request could run for this sync.
---
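Note (not part of the commit message): the interleaving described above
can be sketched as a small single-threaded C model. This is only an
illustration under stated assumptions. Every name prefixed with toy_ and
the 8-bit bitmap size are hypothetical stand-ins, not DRBD code, and the
scan is reduced to "find the next set bit and store the result in the
cursor".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BM_BITS 8	/* toy bitmap size; an assumption for illustration */

/* Hypothetical stand-in for the per-peer resync state. */
struct toy_peer {
	bool bitmap[TOY_BM_BITS];	/* true = block still out of sync */
	size_t resync_next_bit;		/* where the next scan starts */
};

/* First set bit at or after "start", or TOY_BM_BITS if there is none. */
static size_t toy_find_next(const struct toy_peer *p, size_t start)
{
	assert(start <= TOY_BM_BITS);
	for (size_t i = start; i < TOY_BM_BITS; i++)
		if (p->bitmap[i])
			return i;
	return TOY_BM_BITS;
}

/* A peer whose cursor is left over from a previous sync (bit 6),
 * while the only out-of-sync block is at bit 2. */
static struct toy_peer toy_setup(void)
{
	struct toy_peer p = { .bitmap = { false }, .resync_next_bit = 6 };

	p.bitmap[2] = true;
	return p;
}

/* Steps 3 to 5 of the racy sequence, replayed in order. */
static size_t toy_racy_order(struct toy_peer *p)
{
	size_t bit = toy_find_next(p, p->resync_next_bit); /* 3: stale cursor */

	p->resync_next_bit = 0;		/* 4: reset arrives too late */
	p->resync_next_bit = bit;	/* 5: stale result overwrites the reset */
	return p->resync_next_bit;
}

/* Same work with the reset done before any scan can run. */
static size_t toy_fixed_order(struct toy_peer *p)
{
	p->resync_next_bit = 0;
	p->resync_next_bit = toy_find_next(p, p->resync_next_bit);
	return p->resync_next_bit;
}
```

In this model, toy_racy_order leaves the cursor at TOY_BM_BITS while bit 2
is still set, which corresponds to the stall. toy_fixed_order finds bit 2
because the reset happens before the scan.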
 drbd/drbd_sender.c | 8 ++------
 drbd/drbd_state.c  | 1 +
 2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/drbd/drbd_sender.c b/drbd/drbd_sender.c
index cc67f298..bd90e9c3 100644
--- a/drbd/drbd_sender.c
+++ b/drbd/drbd_sender.c
@@ -2225,12 +2225,8 @@ skip_helper:
 		     drbd_repl_str(repl_state),
 		     (unsigned long) peer_device->rs_total << (BM_BLOCK_SHIFT-10),
 		     (unsigned long) peer_device->rs_total);
-		if (side == L_SYNC_TARGET) {
-			peer_device->resync_next_bit = 0;
-			peer_device->use_csums = use_checksum_based_resync(connection, device);
-		} else {
-			peer_device->use_csums = false;
-		}
+		peer_device->use_csums = side == L_SYNC_TARGET ?
+			use_checksum_based_resync(connection, device) : false;
 
 		if ((side == L_SYNC_TARGET || side == L_PAUSED_SYNC_T) &&
 		    !(peer_device->uuid_flags & UUID_FLAG_STABLE) &&
diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
index 0724ffdd..c52681ce 100644
--- a/drbd/drbd_state.c
+++ b/drbd/drbd_state.c
@@ -2284,6 +2284,7 @@ static void initialize_resync(struct drbd_peer_device *peer_device)
 	unsigned long tw = drbd_bm_total_weight(peer_device);
 	unsigned long now = jiffies;
 
+	peer_device->resync_next_bit = 0;
 	peer_device->rs_failed = 0;
 	peer_device->rs_paused = 0;
 	peer_device->rs_same_csum = 0;
-- 
2.16.4