File 5dbaf9b2-x86-mm-dont-re-set-PGT_pinned-on-partial-page.patch of Package xen.15650
# Commit bf656e02d8e7f49b484e2587aef4f18deda6e2ab
# Date 2019-10-31 16:11:46 +0100
# Author George Dunlap <george.dunlap@citrix.com>
# Committer Jan Beulich <jbeulich@suse.com>
x86/mm: Don't re-set PGT_pinned on a partially de-validated page
When unpinning pagetables, if an operation is interrupted,
relinquish_memory() re-sets PGT_pinned so that the un-pin will
pickedup again when the hypercall restarts.
This is appropriate when put_page_and_type_preemptible() returns
-EINTR, which indicates that the page is back in its initial state
(i.e., completely validated). However, for -ERESTART, this leads to a
state where a page has both PGT_pinned and PGT_partial set.
This happens to work at the moment, although it's not really a
"canonical" state; but in subsequent patches, where we need to make a
distinction in handling between PGT_validated and PGT_partial pages,
this causes issues.
Move to a "canonical" state by:
- Only re-setting PGT_pinned on -EINTR
- Re-dropping the refcount held by PGT_pinned on -ERESTART
In the latter case, the PGT_partial bit will be cleared further down
with the rest of the other PGT_partial pages.
While here, clean up some trainling whitespace.
This is part of XSA-299.
Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -112,7 +112,7 @@ static void play_dead(void)
* this case, heap corruption or #PF can occur (when heap debugging is
* enabled). For example, even printk() can involve tasklet scheduling,
* which touches per-cpu vars.
- *
+ *
* Consider very carefully when adding code to *dead_idle. Most hypervisor
* subsystems are unsafe to call.
*/
@@ -1840,9 +1840,34 @@ static int relinquish_memory(
break;
case -ERESTART:
case -EINTR:
+ /*
+ * -EINTR means PGT_validated has been re-set; re-set
+ * PGT_pinned again so that it gets picked up next time
+ * around.
+ *
+ * -ERESTART, OTOH, means PGT_partial is set instead. Put
+ * it back on the list, but don't set PGT_pinned; the
+ * section below will finish off de-validation. But we do
+ * need to drop the general ref associated with
+ * PGT_pinned, since put_page_and_type_preemptible()
+ * didn't do it.
+ *
+ * NB we can do an ASSERT for PGT_validated, since we
+ * "own" the type ref; but theoretically, the PGT_partial
+ * could be cleared by someone else.
+ */
+ if ( ret == -EINTR )
+ {
+ ASSERT(page->u.inuse.type_info & PGT_validated);
+ set_bit(_PGT_pinned, &page->u.inuse.type_info);
+ }
+ else
+ put_page(page);
+
ret = -ERESTART;
+
+ /* Put the page back on the list and drop the ref we grabbed above */
page_list_add(page, list);
- set_bit(_PGT_pinned, &page->u.inuse.type_info);
put_page(page);
goto out;
default:
@@ -2064,7 +2089,7 @@ void vcpu_kick(struct vcpu *v)
* pending flag. These values may fluctuate (after all, we hold no
* locks) but the key insight is that each change will cause
* evtchn_upcall_pending to be polled.
- *
+ *
* NB2. We save the running flag across the unblock to avoid a needless
* IPI for domains that we IPI'd to unblock.
*/