SUSE:SLE-12-SP5:Update
File bsc-1197431-drbd-fix-missing-set-out-of-sync-for-D_INCONSISTENT-.patch of Package drbd
From 204200aba5329e2d5918ce16f0543ac7b83cde16 Mon Sep 17 00:00:00 2001
From: Lars Ellenberg <lars.ellenberg@linbit.com>
Date: Wed, 25 Jul 2018 14:17:24 +0200
Subject: [PATCH] drbd: fix missing set-out-of-sync for < D_INCONSISTENT peer

This fixes missing resyncs (and thus data corruption)
in scenarios described below.

When changing the receiving side to use the activity log not only if
"the" Primary peer was diskless, but always, we lost the "set-out-of-sync"
bit to track changed blocks on behalf of the (not permanently, but
temporarily detached) diskless peer.

Which means that we don't know what to resync after a
"detach/keep modifying/attach" cycle on a Primary
besides what is covered by the activity log there.

Which is likely why this went undetected for so long: to notice,
modifications have to actually go to an area that was "cold" during the
detach.

A different scenario makes it immediately obvious, though:
"secondary/detach/primary/keep modifying/attach",
in which case on attach, no activity log is applied,
the bitmap(s) "seem" to be clean, and we don't do any resync at all.

Very bad, very embarrassing :-(
---
 drbd/drbd_debugfs.c  |  1 +
 drbd/drbd_int.h      |  7 +++++++
 drbd/drbd_receiver.c | 18 ++++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/drbd/drbd_debugfs.c b/drbd/drbd_debugfs.c
index ce6c3018..e35ae2bc 100644
--- a/drbd/drbd_debugfs.c
+++ b/drbd/drbd_debugfs.c
@@ -268,6 +268,7 @@ static void seq_print_peer_request_flags(struct seq_file *m, struct drbd_peer_re
 	seq_print_rq_state_bit(m, f & EE_IS_BARRIER, &sep, "barr");
 	seq_print_rq_state_bit(m, f & EE_SEND_WRITE_ACK, &sep, "C");
 	seq_print_rq_state_bit(m, f & EE_MAY_SET_IN_SYNC, &sep, "set-in-sync");
+	seq_print_rq_state_bit(m, f & EE_SET_OUT_OF_SYNC, &sep, "set-out-of-sync");
 	seq_print_rq_state_bit(m, (f & (EE_IN_ACTLOG|EE_WRITE)) == EE_WRITE, &sep, "blocked-on-al");
 	seq_print_rq_state_bit(m, f & EE_TRIM, &sep, "trim");
 	seq_print_rq_state_bit(m, f & EE_ZEROOUT, &sep, "zero-out");
diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
index 53aedf89..1d2a741e 100644
--- a/drbd/drbd_int.h
+++ b/drbd/drbd_int.h
@@ -453,8 +453,14 @@ struct drbd_peer_request {
  * non-atomic modification to ee->flags is ok.
  */
 enum {
+	/* If successfully written,
+	 * we may clear the corresponding out-of-sync bits */
 	__EE_MAY_SET_IN_SYNC,
 
+	/* Peer did not write this one, we must set-out-of-sync
+	 * before actually submitting ourselves */
+	__EE_SET_OUT_OF_SYNC,
+
 	/* This peer request closes an epoch using a barrier.
 	 * On successful completion, the epoch is released,
 	 * and the P_BARRIER_ACK send. */
@@ -509,6 +515,7 @@ enum {
 	__EE_IN_ACTLOG,
 };
 #define EE_MAY_SET_IN_SYNC (1<<__EE_MAY_SET_IN_SYNC)
+#define EE_SET_OUT_OF_SYNC (1<<__EE_SET_OUT_OF_SYNC)
 #define EE_IS_BARRIER (1<<__EE_IS_BARRIER)
 #define EE_TRIM (1<<__EE_TRIM)
 #define EE_ZEROOUT (1<<__EE_ZEROOUT)
diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index d1d1a337..5a8d1459 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -1543,6 +1543,10 @@ int drbd_submit_peer_request(struct drbd_device *device,
 	unsigned nr_pages = peer_req->page_chain.nr_pages;
 	int err = -ENOMEM;
 
+	if (peer_req->flags & EE_SET_OUT_OF_SYNC)
+		drbd_set_out_of_sync(peer_req->peer_device,
+				peer_req->i.sector, peer_req->i.size);
+
 	/* TRIM/DISCARD: for now, always use the helper function
 	 * blkdev_issue_zeroout(..., discard=true).
 	 * It's synchronous, but it does the right thing wrt. bio splitting.
@@ -2936,6 +2940,20 @@ static int receive_Data(struct drbd_connection *connection, struct packet_info *
 	if (err == DRBD_PAL_DISCONNECTED)
 		goto disconnect_during_al_begin_io;
 
+	/* Note: this now may or may not be "hot" in the activity log.
+	 * Still, it is the best time to record that we need to set the
+	 * out-of-sync bit, if we delay that until drbd_submit_peer_request(),
+	 * we may introduce a race with some re-attach on the peer.
+	 * Unless we want to guarantee that we drain all in-flight IO
+	 * whenever we receive a state change. Which I'm not sure about.
+	 * Use the EE_SET_OUT_OF_SYNC flag, to be acted on just before
+	 * the actual submit, when we can be sure it is "hot".
+	 */
+	if (peer_device->disk_state[NOW] < D_INCONSISTENT) {
+		peer_req->flags &= ~EE_MAY_SET_IN_SYNC;
+		peer_req->flags |= EE_SET_OUT_OF_SYNC;
+	}
+
 	atomic_inc(&connection->active_ee_cnt);
 
 	if (err == DRBD_PAL_QUEUE) {
-- 
2.34.1
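
Illustration (not part of the patch): the change splits the work into two steps. The decision is recorded in a flag when the data is received, and the out-of-sync bits are set just before submit. The userspace sketch below models only that pattern. Everything prefixed with toy_ or TOY_ is hypothetical, the disk-state enum is a simplified subset in which only the ordering matters, the flag bit positions are arbitrary, and one bit per block stands in for the real bitmap. It is a reading aid under those assumptions, not the DRBD implementation.

```c
#include <assert.h>

/* Hypothetical, simplified subset of disk states. Only the ordering is
 * modelled: anything below TOY_D_INCONSISTENT is treated as "peer has
 * no usable disk attached right now". */
enum toy_disk_state {
	TOY_D_DISKLESS,
	TOY_D_FAILED,
	TOY_D_INCONSISTENT,
	TOY_D_UP_TO_DATE,
};

/* Flag names follow the patch; the bit positions here are arbitrary. */
#define EE_MAY_SET_IN_SYNC (1u << 0)
#define EE_SET_OUT_OF_SYNC (1u << 1)

/* Receive time: record the decision based on the peer's disk state
 * as seen at this moment, and return the updated flags. */
static unsigned toy_receive_flags(unsigned flags, enum toy_disk_state peer_disk)
{
	if (peer_disk < TOY_D_INCONSISTENT) {
		flags &= ~EE_MAY_SET_IN_SYNC;
		flags |= EE_SET_OUT_OF_SYNC;
	}
	return flags;
}

/* Submit time: act on the recorded decision. Each bit of *oos_bitmap
 * stands in for one block of the out-of-sync bitmap. */
static void toy_submit(unsigned flags, unsigned block, unsigned long *oos_bitmap)
{
	if (flags & EE_SET_OUT_OF_SYNC)
		*oos_bitmap |= 1UL << block;
}

/* Walk one write through both steps for a detached and an attached peer. */
static void toy_demo(void)
{
	unsigned long oos_bitmap = 0;
	unsigned flags;

	/* Peer is detached: block 3 must be remembered for a later resync. */
	flags = toy_receive_flags(EE_MAY_SET_IN_SYNC, TOY_D_DISKLESS);
	toy_submit(flags, 3, &oos_bitmap);
	assert(oos_bitmap == (1UL << 3));

	/* Peer has a disk: nothing new is marked out of sync. */
	flags = toy_receive_flags(EE_MAY_SET_IN_SYNC, TOY_D_UP_TO_DATE);
	toy_submit(flags, 5, &oos_bitmap);
	assert(oos_bitmap == (1UL << 3));
}
```

Calling toy_demo() from a main() runs both cases. In the sketch, as in the patch, the peer's disk state is only consulted at receive time, so a later state change on the peer does not alter what happens at submit.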