File drbd-fix-termination-of-verify-with-stop-sector.patch of Package drbd.28199

From 263ac567d1b4bb285c13dadaa7cb59b6a2dc8d48 Mon Sep 17 00:00:00 2001
From: Joel Colledge <joel.colledge@linbit.com>
Date: Fri, 28 May 2021 10:41:48 +0200
Subject: [PATCH] drbd: fix termination of verify with stop sector

Using the verify stop sector functionality could result in DRBD getting
stuck in the VerifyS/VerifyT states. The cause was a race condition,
although one that was reliably reproducible in some environments. When
this occurred, lines such as the following were repeatedly logged on the
VerifyS node:
drbd test/0 drbd1000 node: Retrying drbd_rs_del_all() later. refcnt=1

The root of the issue was a bug in make_ov_request which caused it to
always issue one more request even if the stop sector had been reached.
When the "last" OV request was done, make_ov_request was scheduled, as
well as drbd_resync_finished. These two functions raced:
make_ov_request locks a resync extent, while drbd_resync_finished
clears the resync extents. When make_ov_request won the race, the
resync extents could not be cleared and the verify operation could not
be finished.

With corking enabled and no other activity causing anything to be sent,
the extra resync extent remained locked indefinitely because the OV
request was never actually sent. With corking disabled, or with sender
activity of a sort that is not blocked by the locked resync extent
(e.g. writes to a different disk region), the verify operation did
continue to make slow progress until it eventually reached the end of
the disk.

Fix this by ensuring that at least one OV request is sent per verify
operation, rather than at least one every time make_ov_request is
called.
---
 drbd/drbd_sender.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
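
Note: the difference between the two conditions can be illustrated with
a small stand-alone model. This is only a sketch under simplifying
assumptions, not the DRBD code: struct ov_state, make_requests,
extra_requests_after_stop and SECTORS_PER_REQUEST are hypothetical
names, and the verify_can_do_stop_sector() check, resync extent locking
and corking are left out. It counts how many requests a second call
issues once the position has already reached the stop sector.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-peer verify state. */
struct ov_state {
	unsigned long start_sector;
	unsigned long stop_sector;
	unsigned long position;	/* next sector to be verified */
};

enum { SECTORS_PER_REQUEST = 8 };	/* assumed request size */

/*
 * Model of one make_ov_request-style call: issue up to "budget"
 * requests and return how many were issued.
 * fixed == false: "at least one request" is decided per call (i > 0).
 * fixed == true:  it is decided per verify run (sector > start_sector).
 */
static int make_requests(struct ov_state *s, int budget, bool fixed)
{
	int issued = 0;

	assert(budget > 0);
	for (int i = 0; i < budget; i++) {
		unsigned long sector = s->position;
		bool one_sent = fixed ? sector > s->start_sector : i > 0;
		bool stop_sector_reached = one_sent &&
			sector >= s->stop_sector;

		if (stop_sector_reached)
			break;
		s->position += SECTORS_PER_REQUEST;
		issued++;
	}
	return issued;
}

/*
 * Run a verify of sectors [0, 16) with two calls and return the
 * number of requests issued by the second call, which starts with
 * the position already at the stop sector.
 */
static int extra_requests_after_stop(bool fixed)
{
	struct ov_state s = {
		.start_sector = 0, .stop_sector = 16, .position = 0,
	};

	make_requests(&s, 4, fixed);	/* reaches the stop sector */
	return make_requests(&s, 4, fixed);
}
```

In this model, extra_requests_after_stop(false) reports one request
past the stop sector, which corresponds to the extra request described
in the commit message, while extra_requests_after_stop(true) reports
none.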

diff --git a/drbd/drbd_sender.c b/drbd/drbd_sender.c
index fac63d75..2dc1ba41 100644
--- a/drbd/drbd_sender.c
+++ b/drbd/drbd_sender.c
@@ -947,7 +947,7 @@ static int make_ov_request(struct drbd_peer_device *peer_device, unsigned int se
 		/* We check for "finished" only in the reply path:
 		 * w_e_end_ov_reply().
 		 * We need to send at least one request out. */
-		stop_sector_reached = i > 0
+		stop_sector_reached = sector > peer_device->ov_start_sector
 			&& verify_can_do_stop_sector(peer_device)
 			&& sector >= peer_device->ov_stop_sector;
 		if (stop_sector_reached)
-- 
2.16.4
