File gcc48-aarch64-hack-reload-alt.diff of Package cross-hppa-gcc48-icecream-backend

This is for bsc#1093797

A miscompile in kernels btrfs module in reada.c.  Preprocessed
testcase and receipt for reproduction in bug report.  In short,
we're starting with an asm with a =Q constraint:

(insn 186 185 188 18 (parallel [
      (set (mem/j:HI (plus:DI (reg/v/f:DI 162 [ fs_info ])
	                     (const_int 4416 [0x1140])) [0 MEM[(struct arch_spinlock_t *)_222].owner+0 S2 A32])
       (asm_operands/v:HI (".if 1 == 1 ....") ("=Q") ...))))

pseudo 162 (fs_info) didn't get a hardreg, constraint "Q" requires a memory
address in a single register.  The constant 4416 will turn out to be invalid
for an operand in any add instruction.

reload will register these reloads:
Reload 0: reload_in (DI) = (reg/v/f:DI 162 [ fs_info ])
        GENERAL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 0)
        reload_in_reg: (reg/v/f:DI 162 [ fs_info ])
Reload 1: reload_in (DI) = (plus:DI (reg/v/f:DI 162 [ fs_info ])
                                                    (const_int 4416 [0x1140]))
        GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 0), inc by 2
        reload_in_reg: (plus:DI (reg/v/f:DI 162 [ fs_info ])
                                                    (const_int 4416 [0x1140]))

Without changes these two reloads don't conflict (one is RELOAD_FOR_INPUT
the other RELOAD_FOR_INPUT_ADDRESS), and hence are getting allocated the same
reload reg (x4 in this case).  The first reload generates this insn:

(insn 836 185 838 18 (set (reg:DI 4 x4)
        (mem/c:DI (plus:DI (reg/f:DI 29 x29)
                (const_int 128 [0x80])) [0 %sfp+32 S8 A64])) spinlock.h:158 33 {*movdi_aarch64}
     (nil))

The second reload wants to generate this insn (in gen_reload):
  (insn 839 838 186 18 (set (reg:DI 4 x4)
        (plus:DI (reg:DI 4 x4) (const_int 4416 [0x1140]))))
which would be fine but emit_insn_if_valid_for_reload() finds out
correctly that this insn is invalid (because constant doesn't
match aarch64_plus_operand).  The code in gen_reload handles this situation
(see comments therein) by moving one operand into the reload register
(here it chooses the constant, if it chose the other operand (the register)
nothing would be solved, the constant would still be too large).
_BUT_ the reload register is x4, which already contains an input to this
insn.  So what is generated is:

(insn 838 836 839 18 (set (reg:DI 4 x4)
        (const_int 4416 [0x1140])) {*movdi_aarch64}
     (nil))
(insn 839 838 186 18 (set (reg:DI 4 x4)
        (plus:DI (reg:DI 4 x4)
            (reg:DI 4 x4))) {*adddi3_aarch64}
     (expr_list:REG_EQUIV (plus:DI (reg:DI 4 x4)
            (const_int 4416 [0x1140]))
        (nil)))

hence x4 overwritten.  There's commentary about this specific situation,
and how to avoid it.  The situation is: two reloads, one feeding the other
and the other not being able to be generated as one instruction.
The function gen_reload_chain_without_interm_reg_p specifically detects
this situation (and in this case indeed says that an interim register
is needed, i.e. that the reload_reg can't be the same for these two
reloads).  But that function is only used in a very specific case, namely
when looking for conflicts between two RELOAD_FOR_OPERAND_ADDRESS reloads.

In particular it's claimed (in reloads_conflict, reload_reg_free_p and
reload_reg_free_for_value_p) that a RELOAD_FOR_INPUT and a
RELOAD_FOR_INPUT_ADDRESS for the same operand never conflict.  Obviously
that's not true in this case on aarch64.  In fact I don't see how this
situation is avoided generally on other architectures.  Possibly address
forms with out-of-range offsets must be rejected such that the invalid
offset itself generates a reload (rejecting just the whole address is
equivalent to the above).

The patch below does something like this: it detects the situation
in the backends hinter for reloading addresses.  When it sees an
address consisting of a sum of a pseudo that didn't get a hardreg and a 
constant (or generally an operand) that's invalid for a single
instruction then it reloads both operands of the PLUS (the whole plus
is reloaded by the caller).

This should be relatively safe, but it's unclear if it deals with
all situation where this can occur as it doesn't touch reload itself.

We do add a safety assert in gen_reload, though.  That always trigger
when we're about to miscompile for the above reasons.  This assert
doesn't trigger in the GCC testsuite even without patch.

Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 238821)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -3747,6 +3758,22 @@ aarch64_legitimize_reload_address (rtx *
 		   BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
 		   opnum, (enum reload_type) type);
       return x;
+    }
+  
+  if (GET_CODE (x) == PLUS
+      && 1
+      && REG_P (XEXP (x, 0))
+      && !HARD_REGISTER_P (XEXP (x, 0))
+      && reg_renumber[REGNO (XEXP (x, 0))] < 0
+      && !aarch64_plus_operand (XEXP (x, 1), DImode))
+    {
+      push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
+		   BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
+		   opnum, (enum reload_type) type);
+      push_reload (XEXP (x, 1), NULL_RTX, &XEXP (x, 1), NULL,
+		   BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
+		   opnum, (enum reload_type) type);
+      return x;
     }
 
   /* We wish to handle large displacements off a base register by splitting
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	(revision 238821)
+++ gcc/reload1.c	(working copy)
@@ -8613,6 +8613,9 @@ gen_reload (rtx out, rtx in, int opnum,
 	      && !insn_operand_matches (code, 2, op1)))
 	tem = op0, op0 = op1, op1 = tem;
 
+      /* We must not clobber op1!  */
+      gcc_assert (!reg_overlap_mentioned_p (out, op1));
+
       gen_reload (out, op0, opnum, type);
 
       /* If OP0 and OP1 are the same, we can use OUT for OP1.
openSUSE Build Service is sponsored by