File spinlock_deadlock_dev_al_lock.patch of Package drbd.16013
commit 7ce7cac6a1901988caec429f8fb42874d44d7b44
Author: Lars Ellenberg <lars.ellenberg@linbit.com>
Date: Mon May 13 17:13:47 2019 +0200
drbd: fix potential spinlock deadlock on device->al_lock
kernel: [<ffffffffbab6b6e7>] _raw_spin_lock_irqsave+0x37/0x40
kernel: [<ffffffffc0a2a90f>] drbd_rs_complete_io+0x3f/0x160 [drbd]
kernel: [<ffffffffba67fc87>] bio_endio+0x67/0xb0
kernel: [<ffffffffba74fda7>] blk_mq_complete_request+0x27/0x30
kernel: [<ffffffffc05a2372>] nvme_process_cq+0xf2/0x1e0 [nvme]
kernel: [<ffffffffc05a2933>] nvme_irq+0x23/0x50 [nvme]
kernel: [<ffffffffba42e554>] handle_irq+0xe4/0x1a0
kernel: [<ffffffffbab7a59d>] do_IRQ+0x4d/0xf0
kernel: [<ffffffffbab6c362>] common_interrupt+0x162/0x162
kernel: [<ffffffffc0a208d9>] drbd_receiver+0x479/0x780 [drbd]
So drbd_receiver is in receive_Data() -> prepare_activity_log(),
holding device->al_lock without having disabled IRQs. It gets
interrupted by an NVMe completion, which finds its way into
drbd_rs_complete_io() and tries to take the same device->al_lock again.
Introduced while fixing a distributed resource starvation deadlock with:
2018-07-19 3d98754e drbd: protocol 114: fix distributed deadlock on secondary activity log
(released with 9.0.15)
Fix: use spin_lock_irq() in prepare_activity_log().
diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 3b4c6263..654aab61 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -2742,11 +2742,11 @@ prepare_activity_log(struct drbd_peer_request *peer_req)
 	 * See also drbd_request_prepare() for the "request" entry point. */
 	ecnt = atomic_add_return(nr_al_extents, &device->wait_for_actlog_ecnt);
 
-	spin_lock(&device->al_lock);
+	spin_lock_irq(&device->al_lock);
 	al = device->act_log;
 	nr = al->nr_elements;
 	used = al->used;
-	spin_unlock(&device->al_lock);
+	spin_unlock_irq(&device->al_lock);
 
 	/* note: due to the slight delay between being accounted in "used" after
 	 * being committed to the activity log with drbd_al_begin_io_commit(),