File drbd-fix-race-condition-resetting-resync_next_bit.patch of Package drbd.20953
From 48c9cdfdbd6df6c89ee605cedbdc86c5b1c91ada Mon Sep 17 00:00:00 2001
From: Joel Colledge <joel.colledge@linbit.com>
Date: Mon, 17 May 2021 08:42:05 +0200
Subject: [PATCH] drbd: fix race condition resetting resync_next_bit

The commit

  262103c65d28 drbd: serialize syncs from multiple sources

introduced a race condition which could cause syncs to stall. The
sequence was as follows:

1. drbd_start_resync changes the replication state to L_SYNC_TARGET.
2. make_resync_request runs, either due to a previous sync or due to the
   chain of calls finish_state_change => initialize_resync =>
   drbd_rs_controller_reset => post work MAKE_RESYNC_REQUEST.
3. make_resync_request uses the old value of resync_next_bit, finds no
   bits set after this point, and sets "bit" to the end of the bitmap.
4. drbd_start_resync sets "resync_next_bit" to 0.
5. make_resync_request sets "resync_next_bit" to "bit", that is, to the
   end of the bitmap.

Now the sync stalls: "resync_next_bit" is at the end of the bitmap, but
no requests are active and bits in the bitmap are still set.

Fix this by resetting "resync_next_bit" earlier, in initialize_resync,
before make_resync_request can run for this sync.
---
drbd/drbd_sender.c | 8 ++------
drbd/drbd_state.c | 1 +
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/drbd/drbd_sender.c b/drbd/drbd_sender.c
index cc67f298..bd90e9c3 100644
--- a/drbd/drbd_sender.c
+++ b/drbd/drbd_sender.c
@@ -2225,12 +2225,8 @@ skip_helper:
 		drbd_repl_str(repl_state),
 		(unsigned long) peer_device->rs_total << (BM_BLOCK_SHIFT-10),
 		(unsigned long) peer_device->rs_total);
-	if (side == L_SYNC_TARGET) {
-		peer_device->resync_next_bit = 0;
-		peer_device->use_csums = use_checksum_based_resync(connection, device);
-	} else {
-		peer_device->use_csums = false;
-	}
+	peer_device->use_csums = side == L_SYNC_TARGET ?
+		use_checksum_based_resync(connection, device) : false;
 
 	if ((side == L_SYNC_TARGET || side == L_PAUSED_SYNC_T) &&
 	    !(peer_device->uuid_flags & UUID_FLAG_STABLE) &&
diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
index 0724ffdd..c52681ce 100644
--- a/drbd/drbd_state.c
+++ b/drbd/drbd_state.c
@@ -2284,6 +2284,7 @@ static void initialize_resync(struct drbd_peer_device *peer_device)
 	unsigned long tw = drbd_bm_total_weight(peer_device);
 	unsigned long now = jiffies;
 
+	peer_device->resync_next_bit = 0;
 	peer_device->rs_failed = 0;
 	peer_device->rs_paused = 0;
 	peer_device->rs_same_csum = 0;
--
2.16.4