File 5dbafa5c-x86-mm-always-retain-general-ref-on-partial.patch of Package xen.25148
# Commit 18b0ab697830a46ce3dacaf9210799322cb3732c
# Date 2019-10-31 16:14:36 +0100
# Author George Dunlap <george.dunlap@citrix.com>
# Committer Jan Beulich <jbeulich@suse.com>
x86/mm: Always retain a general ref on partial
In order to allow recursive pagetable promotions and demotions to be
interrupted, Xen must keep track of the state of the sub-pages
promoted or demoted. This is stored in two elements in the page struct:
nr_entries_validated and partial_flags.
The rule is that entries [0, nr_entries_validated) should always be
validated and hold a general reference count. If partial_flags is
zero, then [nr_entries_validated] is not validated and no reference
count is held. If PTF_partial_set is set, then [nr_entries_validated]
is partially validated.
At the moment, a distinction is made between promotion and demotion
with regard to whether the entry itself "holds" a general reference
count: when entry promotion is interrupted (i.e., returns -ERESTART),
the entry is not considered to hold a reference; when entry demotion
is interrupted, the entry is still considered to hold a general
reference.
PTF_partial_general_ref is used to distinguish between these cases.
If clear, it's a partial promotion => no general reference count held
by the entry; if set, it's partial demotion, so a general reference
count held. Because promotions and demotions can be interleaved, this
value is passed to get_page_and_type_from_mfn and put_page_from_l*e,
to be able to properly handle reference counts.
Unfortunately, because a refcount is not held, it is possible to
engineer a situation where PFT_partial_set is set but the page in
question has been assigned to another domain. A sketch is provided in
the appendix.
Fix this by having the parent page table entry hold a general
reference count whenever PFT_partial_set is set. (For clarity of
change, keep two separate flags. These will be collapsed in a
subsequent changeset.)
This has two basic implications. On the put_page_from_lNe() side,
this mean that the (partial_set && !partial_ref) case can never happen,
and no longer needs to be special-cased.
Secondly, because both flags are set together, there's no need to carry over
existing bits from partial_pte.
(NB there is still another issue with calling _put_page_type() on a
page which had PGT_partial set; that will be handled in a subsequent
patch.)
On the get_page_and_type_from_mfn() side, we need to distinguish
between callers which hold a reference on partial (i.e.,
alloc_lN_table()), and those which do not (new_cr3, PIN_LN_TABLE, and
so on): pass a flag if the type should be retained on interruption.
NB that since l1 promotion can't be preempted, that get_page_from_l2e
can't return -ERESTART.
This is part of XSA-299.
Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
-----
* Appendix: Engineering PTF_partial_set while a page belongs to a
foreign domain
Suppose A is a page which can be promoted to an l3, and B is a page
which can be promoted to an l2, and A[x] points to B. B has
PGC_allocated set but no other general references.
V1: PIN_L3 A.
A is validated, B is validated.
A.type_count = 1 | PGT_validated | PGT_pinned
B.type_count = 1 | PGT_validated
B.count = 2 | PGC_allocated (A[x] holds a general ref)
V1: UNPIN A.
A begins de-validation.
Arrange to be interrupted when i < x
V1->old_guest_table = A
V1->old_guest_table_ref_held = false
A.type_count = 1 | PGT_partial
A.nr_validated_entries = i < x
B.type_count = 0
B.count = 1 | PGC_allocated
V2: MOD_L4_ENTRY to point some l4e to A.
Picks up re-validation of A.
Arrange to be interrupted halfway through B's validation
B.type_count = 1 | PGT_partial
B.count = 2 | PGC_allocated (PGT_partial holds a general ref)
A.type_count = 1 | PGT_partial
A.nr_validated_entries = x
A.partial_pte = PTF_partial_set
V3: MOD_L3_ENTRY to point some other l3e (not in A) to B.
Validates B.
B.type_count = 1 | PGT_validated
B.count = 2 | PGC_allocated ("other l3e" holds a general ref)
V3: MOD_L3_ENTRY to clear l3e pointing to B.
Devalidates B.
B.type_count = 0
B.count = 1 | PGC_allocated
V3: decrease_reservation(B)
Clears PGC_allocated
B.count = 0 => B is freed
B gets assigned to a different domain
V1: Restarts UNPIN of A
put_old_guest_table(A)
...
free_l3_table(A)
Now since A.partial_flags has PTF_partial_set, free_l3_table() will
call put_page_from_l3e() on A[x], which points to B, while B is owned
by another domain.
If A[x] held a general refcount for B on partial validation, as it does
for partial de-validation, then B would still have a reference count of
1 after PGC_allocated was freed; so B wouldn't be freed until after
put_page_from_l3e() had happend on A[x].
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -658,10 +658,11 @@ static int __get_page_type(struct page_i
* page->pte[page->nr_validated_entries]. See the comment in mm.h for
* more information.
*/
-#define PTF_partial_set (1 << 0)
-#define PTF_partial_general_ref (1 << 1)
-#define PTF_preemptible (1 << 2)
-#define PTF_defer (1 << 3)
+#define PTF_partial_set (1 << 0)
+#define PTF_partial_general_ref (1 << 1)
+#define PTF_preemptible (1 << 2)
+#define PTF_defer (1 << 3)
+#define PTF_retain_ref_on_restart (1 << 4)
static int get_page_and_type_from_mfn(
mfn_t mfn, unsigned long type, struct domain *d,
@@ -670,7 +671,11 @@ static int get_page_and_type_from_mfn(
struct page_info *page = mfn_to_page(mfn);
int rc;
bool preemptible = flags & PTF_preemptible,
- partial_ref = flags & PTF_partial_general_ref;
+ partial_ref = flags & PTF_partial_general_ref,
+ partial_set = flags & PTF_partial_set,
+ retain_ref = flags & PTF_retain_ref_on_restart;
+
+ ASSERT(partial_ref == partial_set);
if ( likely(!partial_ref) &&
unlikely(!get_page_from_mfn(mfn, d)) )
@@ -683,13 +688,15 @@ static int get_page_and_type_from_mfn(
* - page is fully validated (rc == 0)
* - page is not validated (rc < 0) but:
* - We came in with a reference (partial_ref)
+ * - page is partially validated (rc == -ERESTART), and the
+ * caller has asked the ref to be retained in that case
* - page is partially validated but there's been an error
* (page == current->arch.old_guest_table)
*
* The partial_ref-on-error clause is worth an explanation. There
* are two scenarios where partial_ref might be true coming in:
- * - mfn has been partially demoted as type `type`; i.e. has
- * PGT_partial set
+ * - mfn has been partially promoted / demoted as type `type`;
+ * i.e. has PGT_partial set
* - mfn has been partially demoted as L(type+1) (i.e., a linear
* page; e.g. we're being called from get_page_from_l2e with
* type == PGT_l1_table, but the mfn is PGT_l2_table)
@@ -712,7 +719,8 @@ static int get_page_and_type_from_mfn(
*/
if ( likely(!rc) || partial_ref )
/* nothing */;
- else if ( page == current->arch.old_guest_table )
+ else if ( page == current->arch.old_guest_table ||
+ (retain_ref && rc == -ERESTART) )
ASSERT(preemptible);
else
put_page(page);
@@ -1379,8 +1387,8 @@ static int put_page_from_l2e(l2_pgentry_
if ( (flags & (PTF_partial_set | PTF_partial_general_ref)) ==
PTF_partial_set )
{
- ASSERT(!(flags & PTF_defer));
- rc = _put_page_type(pg, PTF_preemptible, ptpg);
+ /* partial_set should always imply partial_ref */
+ BUG();
}
else if ( flags & PTF_defer )
{
@@ -1425,8 +1433,8 @@ static int put_page_from_l3e(l3_pgentry_
if ( (flags & (PTF_partial_set | PTF_partial_general_ref)) ==
PTF_partial_set )
{
- ASSERT(!(flags & PTF_defer));
- return _put_page_type(pg, PTF_preemptible, mfn_to_page(_mfn(pfn)));
+ /* partial_set should always imply partial_ref */
+ BUG();
}
if ( flags & PTF_defer )
@@ -1456,8 +1464,8 @@ static int put_page_from_l4e(l4_pgentry_
if ( (flags & (PTF_partial_set | PTF_partial_general_ref)) ==
PTF_partial_set )
{
- ASSERT(!(flags & PTF_defer));
- return _put_page_type(pg, PTF_preemptible, mfn_to_page(_mfn(pfn)));
+ /* partial_set should always imply partial_ref */
+ BUG();
}
if ( flags & PTF_defer )
@@ -1581,13 +1589,22 @@ static int alloc_l2_table(struct page_in
(rc = get_page_from_l2e(pl2e[i], pfn, d, partial_flags)) > 0 )
continue;
- if ( rc == -ERESTART )
- {
- page->nr_validated_ptes = i;
- /* Set 'set', retain 'general ref' */
- page->partial_flags = partial_flags | PTF_partial_set;
- }
- else if ( rc == -EINTR && i )
+ /*
+ * It shouldn't be possible for get_page_from_l2e to return
+ * -ERESTART, since we never call this with PTF_preemptible.
+ * (alloc_l1_table may return -EINTR on an L1TF-vulnerable
+ * entry.)
+ *
+ * NB that while on a "clean" promotion, we can never get
+ * PGT_partial. It is possible to arrange for an l2e to
+ * contain a partially-devalidated l2; but in that case, both
+ * of the following functions will fail anyway (the first
+ * because the page in question is not an l1; the second
+ * because the page is not fully validated).
+ */
+ ASSERT(rc != -ERESTART);
+
+ if ( rc == -EINTR && i )
{
page->nr_validated_ptes = i;
page->partial_flags = 0;
@@ -1596,6 +1613,7 @@ static int alloc_l2_table(struct page_in
else if ( rc < 0 && rc != -EINTR )
{
gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i);
+ ASSERT(current->arch.old_guest_table == NULL);
if ( i )
{
page->nr_validated_ptes = i;
@@ -1652,16 +1670,17 @@ static int alloc_l3_table(struct page_in
rc = get_page_and_type_from_mfn(
l3e_get_mfn(pl3e[i]),
PGT_l2_page_table | PGT_pae_xen_l2, d,
- partial_flags | PTF_preemptible);
+ partial_flags | PTF_preemptible | PTF_retain_ref_on_restart);
}
- else if ( (rc = get_page_from_l3e(pl3e[i], pfn, d, partial_flags)) > 0 )
+ else if ( (rc = get_page_from_l3e(pl3e[i], pfn, d,
+ partial_flags | PTF_retain_ref_on_restart)) > 0 )
continue;
if ( rc == -ERESTART )
{
page->nr_validated_ptes = i;
/* Set 'set', leave 'general ref' set if this entry was set */
- page->partial_flags = partial_flags | PTF_partial_set;
+ page->partial_flags = PTF_partial_set | PTF_partial_general_ref;
}
else if ( rc == -EINTR && i )
{
@@ -1822,14 +1841,15 @@ static int alloc_l4_table(struct page_in
i++, partial_flags = 0 )
{
if ( !is_guest_l4_slot(d, i) ||
- (rc = get_page_from_l4e(pl4e[i], pfn, d, partial_flags)) > 0 )
+ (rc = get_page_from_l4e(pl4e[i], pfn, d,
+ partial_flags | PTF_retain_ref_on_restart)) > 0 )
continue;
if ( rc == -ERESTART )
{
page->nr_validated_ptes = i;
/* Set 'set', leave 'general ref' set if this entry was set */
- page->partial_flags = partial_flags | PTF_partial_set;
+ page->partial_flags = PTF_partial_set | PTF_partial_general_ref;
}
else if ( rc < 0 )
{
@@ -1927,9 +1947,7 @@ static int free_l2_table(struct page_inf
else if ( rc == -ERESTART )
{
page->nr_validated_ptes = i;
- page->partial_flags = (partial_flags & PTF_partial_set) ?
- partial_flags :
- (PTF_partial_set | PTF_partial_general_ref);
+ page->partial_flags = PTF_partial_set | PTF_partial_general_ref;
}
else if ( rc == -EINTR && i < L2_PAGETABLE_ENTRIES - 1 )
{
@@ -1977,9 +1995,7 @@ static int free_l3_table(struct page_inf
if ( rc == -ERESTART )
{
page->nr_validated_ptes = i;
- page->partial_flags = (partial_flags & PTF_partial_set) ?
- partial_flags :
- (PTF_partial_set | PTF_partial_general_ref);
+ page->partial_flags = PTF_partial_set | PTF_partial_general_ref;
}
else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 )
{
@@ -2010,9 +2026,7 @@ static int free_l4_table(struct page_inf
if ( rc == -ERESTART )
{
page->nr_validated_ptes = i;
- page->partial_flags = (partial_flags & PTF_partial_set) ?
- partial_flags :
- (PTF_partial_set | PTF_partial_general_ref);
+ page->partial_flags = PTF_partial_set | PTF_partial_general_ref;
}
else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 )
{
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -167,22 +167,25 @@ struct page_info
* page.
*
* This happens:
- * - During de-validation, if de-validation of the page was
+ * - During validation or de-validation, if the operation was
* interrupted
* - During validation, if an invalid entry is encountered and
* validation is preemptible
* - During validation, if PTF_partial_general_ref was set on
- * this entry to begin with (perhaps because we're picking
- * up from a partial de-validation).
+ * this entry to begin with (perhaps because it picked up a
+ * previous operation)
*
- * When resuming validation, if PTF_partial_general_ref is clear,
- * then a general reference must be re-acquired; if it is set, no
- * reference should be acquired.
+ * When resuming validation, if PTF_partial_general_ref is
+ * clear, then a general reference must be re-acquired; if it
+ * is set, no reference should be acquired.
*
* When resuming de-validation, if PTF_partial_general_ref is
* clear, no reference should be dropped; if it is set, a
* reference should be dropped.
*
+ * NB at the moment, PTF_partial_set should be set if and only if
+ * PTF_partial_general_ref is set.
+ *
* NB that PTF_partial_set and PTF_partial_general_ref are
* defined in mm.c, the only place where they are used.
*