File math-remove-slow-path.patch of Package glibc.18276
2018-04-03 Wilco Dijkstra <wdijkstr@arm.com>
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Cleanup ifdefs.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c (__sincos): Refactor using the same
logic as sin and cos.
2018-04-03 Wilco Dijkstra <wdijkstr@arm.com>
* sysdeps/ieee754/dbl-64/s_sin.c (do_sin): Use TAYLOR_SIN for small
inputs. Return correct sign.
(do_sincos): Remove small input check before do_sin, let do_sin set
the sign.
(__sin): Likewise.
(__cos): Likewise.
2018-04-03 Wilco Dijkstra <wdijkstr@arm.com>
* sysdeps/ieee754/dbl-64/s_sin.c (TAYLOR_SLOW): Remove.
(do_cos_slow): Likewise.
(do_sin_slow): Likewise.
(reduce_and_compute): Likewise.
(slow): Likewise.
(slow1): Likewise.
(slow2): Likewise.
(sloww): Likewise.
(sloww1): Likewise.
(sloww2): Likewise.
(bslow): Likewise.
(bslow1): Likewise.
(bslow2): Likewise.
(cslow2): Likewise.
2018-04-03 Wilco Dijkstra <wdijkstr@arm.com>
* sysdeps/ieee754/dbl-64/s_sin.c (TAYLOR_SIN): Remove cor parameter.
(do_cos): Remove corp parameter and calculations.
(do_sin): Likewise.
(do_sincos): Remove cor variable.
(__sin): Use do_sincos for huge inputs.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
(reduce_and_compute_sincos): Remove unused function.
2018-04-03 Wilco Dijkstra <wdijkstr@arm.com>
* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_1): Rename to
reduce_sincos, improve accuracy to 136 bits.
(do_sincos_1): Rename to do_sincos, remove fallbacks to slow functions.
(__sin): Use improved reduction and simplified do_sincos calculation.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
2018-04-03 Wilco Dijkstra <wdijkstr@arm.com>
* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_2): Remove function.
(do_sincos_2): Likewise.
(__sin): Remove middle range reduction case.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Remove middle range
reduction case.
2018-04-03 Wilco Dijkstra <wdijkstr@arm.com>
* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
inputs.
(__cos): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
2018-02-12 Szabolcs Nagy <szabolcs.nagy@arm.com>
* manual/probes.texi: Remove slowexp probes.
* math/Makefile: Remove slowexp.
* sysdeps/generic/math_private.h (__slowexp): Remove.
* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Remove __slowexp and
document error bounds.
* sysdeps/i386/fpu/slowexp.c: Remove.
* sysdeps/ia64/fpu/slowexp.c: Remove.
* sysdeps/ieee754/dbl-64/slowexp.c: Remove.
* sysdeps/ieee754/dbl-64/uexp.h (err_0): Remove.
* sysdeps/m68k/m680x0/fpu/slowexp.c: Remove.
* sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c): Remove.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowexp-fma.
* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Remove.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Remove.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Remove.
* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Remove.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Remove.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Remove.
2018-02-12 Wilco Dijkstra <wdijkstr@arm.com>
[BZ #13932]
* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
* benchtests/pow-inputs: Update comment for slow path cases.
* manual/probes.texi (slowpow_p10): Delete removed probe.
(slowpow_p10): Likewise.
* math/Makefile: Remove halfulp.c and slowpow.c.
* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/generic/math_private.h (__exp1): Remove error argument.
(__halfulp): Remove.
(__slowpow): Remove.
* sysdeps/i386/fpu/halfulp.c: Delete file.
* sysdeps/i386/fpu/slowpow.c: Likewise.
* sysdeps/ia64/fpu/halfulp.c: Likewise.
* sysdeps/ia64/fpu/slowpow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
improve comments and add error analysis.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
(power1): Remove function:
(log1): Remove error argument, add error analysis.
(my_log2): Remove function.
* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
2018-02-07 Wilco Dijkstra <wdijkstr@arm.com>
* manual/probes.texi (slowlog): Delete documentation of removed probe.
(slowlog_inexact): Likewise
* sysdeps/ieee754/dbl-64/e_log.c (__ieee754_log): Remove slow paths.
* sysdeps/ieee754/dbl-64/ulog.h: Remove unused declarations.
Index: glibc-2.26/benchtests/pow-inputs
===================================================================
--- glibc-2.26.orig/benchtests/pow-inputs
+++ glibc-2.26/benchtests/pow-inputs
@@ -302,8 +302,7 @@
0x1.c004d2256a5b8p402, -0x1.a01df480fdcb7p98
0x1.52b9d41aaa1e9p-589, -0x1.292cb15f1459dp46
-0x1.ea9ca6fa0919ep-279, -0x1.601e44b6a588cp40
-# pow slow path at 240 bits
-# Implemented in sysdeps/ieee754/dbl-64/slowpow.c
+# old pow slow path at 240 bits
## name: 240bits
0x1.01fcd33493ea3p596, -0x1.724bd4e887783p-14
0x1.032ff59ab34fdp-540, -0x1.61e3632080b87p-24
@@ -405,8 +404,7 @@
0x1.fae913d4f952ep-809, -0x1.4b649402fce63p-6
0x1.fe6d725408f24p484, -0x1.25f4f6441d2e4p-12
0x1.ff6393f9150ccp-718, 0x1.a0cb50a9bf2f3p-31
-# pow slowest path at 768 bits
-# Implemented in sysdeps/ieee754/dbl-64/slowpow.c
+# old pow slowest path at 768 bits
## name: 768bits
1.0000000000000020, 1.5
0x1.006777b4b61dep843, -0x1.67e3145491872p-1
Index: glibc-2.26/manual/probes.texi
===================================================================
--- glibc-2.26.orig/manual/probes.texi
+++ glibc-2.26/manual/probes.texi
@@ -265,53 +265,6 @@ Unless explicitly mentioned otherwise, a
precision in the mantissa of the multiple precision number. Hence, a precision
level of 32 implies 768 bits of precision in the mantissa.
-@deftp Probe slowexp_p6 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-6. Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
-@deftp Probe slowexp_p32 (double @var{$arg1}, double @var{$arg2})
-This probe is triggered when the @code{exp} function is called with an
-input that results in multiple precision computation with precision
-32. Argument @var{$arg1} is the input value and @var{$arg2} is the
-computed output.
-@end deftp
-
-@deftp Probe slowpow_p10 (double @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
-This probe is triggered when the @code{pow} function is called with
-inputs that result in multiple precision computation with precision
-10. Arguments @var{$arg1} and @var{$arg2} are the input values,
-@code{$arg3} is the value computed in the fast phase of the algorithm
-and @code{$arg4} is the final accurate value.
-@end deftp
-
-@deftp Probe slowpow_p32 (double @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
-This probe is triggered when the @code{pow} function is called with an
-input that results in multiple precision computation with precision
-32. Arguments @var{$arg1} and @var{$arg2} are the input values,
-@code{$arg3} is the value computed in the fast phase of the algorithm
-and @code{$arg4} is the final accurate value.
-@end deftp
-
-@deftp Probe slowlog (int @var{$arg1}, double @var{$arg2}, double @var{$arg3})
-This probe is triggered when the @code{log} function is called with an
-input that results in multiple precision computation. Argument
-@var{$arg1} is the precision with which the computation succeeded.
-Argument @var{$arg2} is the input and @var{$arg3} is the computed
-output.
-@end deftp
-
-@deftp Probe slowlog_inexact (int @var{$arg1}, double @var{$arg2}, double @var{$arg3})
-This probe is triggered when the @code{log} function is called with an
-input that results in multiple precision computation and none of the
-multiple precision computations result in an accurate result.
-Argument @var{$arg1} is the maximum precision with which computations
-were performed. Argument @var{$arg2} is the input and @var{$arg3} is
-the computed output.
-@end deftp
-
@deftp Probe slowatan2 (int @var{$arg1}, double @var{$arg2}, double @var{$arg3}, double @var{$arg4})
This probe is triggered when the @code{atan2} function is called with
an input that results in multiple precision computation. Argument
Index: glibc-2.26/math/Makefile
===================================================================
--- glibc-2.26.orig/math/Makefile
+++ glibc-2.26/math/Makefile
@@ -110,9 +110,9 @@ type-ldouble-yes := ldouble
# double support
type-double-suffix :=
-type-double-routines := branred doasin dosincos halfulp mpa mpatan2 \
- mpatan mpexp mplog mpsqrt mptan sincos32 slowexp \
- slowpow sincostab k_rem_pio2
+type-double-routines := branred doasin dosincos mpa mpatan2 \
+ mpatan mpexp mplog mpsqrt mptan sincos32 \
+ sincostab k_rem_pio2
# float support
type-float-suffix := f
Index: glibc-2.26/sysdeps/aarch64/libm-test-ulps
===================================================================
--- glibc-2.26.orig/sysdeps/aarch64/libm-test-ulps
+++ glibc-2.26/sysdeps/aarch64/libm-test-ulps
@@ -1012,7 +1012,9 @@ ildouble: 2
ldouble: 2
Function: "cos":
+double: 1
float: 1
+idouble: 1
ifloat: 1
ildouble: 1
ldouble: 1
@@ -1932,7 +1934,9 @@ ildouble: 1
ldouble: 1
Function: "pow":
+double: 1
float: 1
+idouble: 1
ifloat: 1
ildouble: 2
ldouble: 2
@@ -1992,7 +1996,9 @@ ildouble: 2
ldouble: 2
Function: "sin":
+double: 1
float: 1
+idouble: 1
ifloat: 1
ildouble: 1
ldouble: 1
@@ -2022,7 +2028,9 @@ ildouble: 3
ldouble: 3
Function: "sincos":
+double: 1
float: 1
+idouble: 1
ifloat: 1
ildouble: 1
ldouble: 1
Index: glibc-2.26/sysdeps/generic/math_private.h
===================================================================
--- glibc-2.26.orig/sysdeps/generic/math_private.h
+++ glibc-2.26/sysdeps/generic/math_private.h
@@ -255,20 +255,17 @@ extern float __kernel_standard_f (float,
extern long double __kernel_standard_l (long double,long double,int);
/* Prototypes for functions of the IBM Accurate Mathematical Library. */
-extern double __exp1 (double __x, double __xx, double __error);
+extern double __exp1 (double __x, double __xx);
extern double __sin (double __x);
extern double __cos (double __x);
extern int __branred (double __x, double *__a, double *__aa);
extern void __doasin (double __x, double __dx, double __v[]);
extern void __dubsin (double __x, double __dx, double __v[]);
extern void __dubcos (double __x, double __dx, double __v[]);
-extern double __halfulp (double __x, double __y);
extern double __sin32 (double __x, double __res, double __res1);
extern double __cos32 (double __x, double __res, double __res1);
extern double __mpsin (double __x, double __dx, bool __range_reduce);
extern double __mpcos (double __x, double __dx, bool __range_reduce);
-extern double __slowexp (double __x);
-extern double __slowpow (double __x, double __y, double __z);
extern void __docos (double __x, double __dx, double __v[]);
#ifndef math_opt_barrier
Index: glibc-2.26/sysdeps/i386/fpu/halfulp.c
===================================================================
--- glibc-2.26.orig/sysdeps/i386/fpu/halfulp.c
+++ /dev/null
@@ -1 +0,0 @@
-/* Not needed. */
Index: glibc-2.26/sysdeps/i386/fpu/slowexp.c
===================================================================
--- glibc-2.26.orig/sysdeps/i386/fpu/slowexp.c
+++ /dev/null
@@ -1 +0,0 @@
-/* Not needed. */
Index: glibc-2.26/sysdeps/i386/fpu/slowpow.c
===================================================================
--- glibc-2.26.orig/sysdeps/i386/fpu/slowpow.c
+++ /dev/null
@@ -1 +0,0 @@
-/* Not needed. */
Index: glibc-2.26/sysdeps/ia64/fpu/halfulp.c
===================================================================
--- glibc-2.26.orig/sysdeps/ia64/fpu/halfulp.c
+++ /dev/null
@@ -1 +0,0 @@
-/* Not needed. */
Index: glibc-2.26/sysdeps/ia64/fpu/slowexp.c
===================================================================
--- glibc-2.26.orig/sysdeps/ia64/fpu/slowexp.c
+++ /dev/null
@@ -1 +0,0 @@
-/* Not needed. */
Index: glibc-2.26/sysdeps/ia64/fpu/slowpow.c
===================================================================
--- glibc-2.26.orig/sysdeps/ia64/fpu/slowpow.c
+++ /dev/null
@@ -1 +0,0 @@
-/* Not needed. */
Index: glibc-2.26/sysdeps/ieee754/dbl-64/e_exp.c
===================================================================
--- glibc-2.26.orig/sysdeps/ieee754/dbl-64/e_exp.c
+++ glibc-2.26/sysdeps/ieee754/dbl-64/e_exp.c
@@ -23,10 +23,10 @@
/* exp1 */
/* */
/* FILES NEEDED:dla.h endian.h mpa.h mydefs.h uexp.h */
-/* mpa.c mpexp.x slowexp.c */
+/* mpa.c mpexp.x */
/* */
/* An ultimate exp routine. Given an IEEE double machine number x */
-/* it computes the correctly rounded (to nearest) value of e^x */
+/* it computes an almost correctly rounded (to nearest) value of e^x */
/* Assumption: Machine arithmetic operations are performed in */
/* round to nearest mode of IEEE 754 standard. */
/* */
@@ -46,10 +46,6 @@
# define SECTION
#endif
-double __slowexp (double);
-
-/* An ultimate exp routine. Given an IEEE double machine number x it computes
- the correctly rounded (to nearest) value of e^x. */
double
SECTION
__ieee754_exp (double x)
@@ -93,17 +89,10 @@ __ieee754_exp (double x)
rem = (bet + bet * eps) + al * eps;
res = al + rem;
- cor = (al - res) + rem;
- if (res == (res + cor * err_0))
- {
- retval = res * binexp.x;
- goto ret;
- }
- else
- {
- retval = __slowexp (x);
- goto ret;
- } /*if error is over bound */
+ /* Maximum relative error is 7.8e-22 (70.1 bits).
+ Maximum ULP error is 0.500007. */
+ retval = res * binexp.x;
+ goto ret;
}
if (n <= smallint)
@@ -166,38 +155,22 @@ __ieee754_exp (double x)
if (ex >= -1022)
{
binexp.i[HIGH_HALF] = (1023 + ex) << 20;
- if (res == (res + cor * err_0))
- {
- retval = res * binexp.x;
- goto ret;
- }
- else
- {
- retval = __slowexp (x);
- goto check_uflow_ret;
- } /*if error is over bound */
+ /* Does not underflow: res >= 1.0, binexp >= 0x1p-1022
+ Maximum relative error is 7.8e-22 (70.1 bits).
+ Maximum ULP error is 0.500007. */
+ retval = res * binexp.x;
+ goto ret;
}
ex = -(1022 + ex);
binexp.i[HIGH_HALF] = (1023 - ex) << 20;
res *= binexp.x;
cor *= binexp.x;
- eps = 1.0000000001 + err_0 * binexp.x;
t = 1.0 + res;
y = ((1.0 - t) + res) + cor;
res = t + y;
- cor = (t - res) + y;
- if (res == (res + eps * cor))
- {
- binexp.i[HIGH_HALF] = 0x00100000;
- retval = (res - 1.0) * binexp.x;
- goto check_uflow_ret;
- }
- else
- {
- retval = __slowexp (x);
- goto check_uflow_ret;
- } /* if error is over bound */
- check_uflow_ret:
+ /* Maximum ULP error is 0.5000035. */
+ binexp.i[HIGH_HALF] = 0x00100000;
+ retval = (res - 1.0) * binexp.x;
if (retval < DBL_MIN)
{
double force_underflow = tiny * tiny;
@@ -210,10 +183,9 @@ __ieee754_exp (double x)
else
{
binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
- if (res == (res + cor * err_0))
- retval = res * binexp.x * t256.x;
- else
- retval = __slowexp (x);
+ /* Maximum relative error is 7.8e-22 (70.1 bits).
+ Maximum ULP error is 0.500007. */
+ retval = res * binexp.x * t256.x;
if (isinf (retval))
goto ret_huge;
else
@@ -233,13 +205,10 @@ ret:
strong_alias (__ieee754_exp, __exp_finite)
#endif
-/* Compute e^(x+xx). The routine also receives bound of error of previous
- calculation. If after computing exp the error exceeds the allowed bounds,
- the routine returns a non-positive number. Otherwise it returns the
- computed result, which is always positive. */
+/* Compute e^(x+xx). */
double
SECTION
-__exp1 (double x, double xx, double error)
+__exp1 (double x, double xx)
{
double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
mynumber junk1, junk2, binexp = {{0, 0}};
@@ -249,6 +218,7 @@ __exp1 (double x, double xx, double erro
m = junk1.i[HIGH_HALF];
n = m & hugeint; /* no sign */
+ /* fabs (x) > 5.551112e-17 and fabs (x) < 7.080010e+02. */
if (n > smallint && n < bigint)
{
y = x * log2e.x + three51.x;
@@ -276,11 +246,9 @@ __exp1 (double x, double xx, double erro
rem = (bet + bet * eps) + al * eps;
res = al + rem;
- cor = (al - res) + rem;
- if (res == (res + cor * (1.0 + error + err_1)))
- return res * binexp.x;
- else
- return -10.0;
+ /* Maximum relative error before rounding is 8.8e-22 (69.9 bits).
+ Maximum ULP error is 0.500008. */
+ return res * binexp.x;
}
if (n <= smallint)
@@ -318,6 +286,7 @@ __exp1 (double x, double xx, double erro
cor = (al - res) + rem;
if (m >> 31)
{
+ /* x < 0. */
ex = junk1.i[LOW_HALF];
if (res < 1.0)
{
@@ -328,34 +297,25 @@ __exp1 (double x, double xx, double erro
if (ex >= -1022)
{
binexp.i[HIGH_HALF] = (1023 + ex) << 20;
- if (res == (res + cor * (1.0 + error + err_1)))
- return res * binexp.x;
- else
- return -10.0;
+ /* Maximum ULP error is 0.500008. */
+ return res * binexp.x;
}
+ /* Denormal case - ex < -1022. */
ex = -(1022 + ex);
binexp.i[HIGH_HALF] = (1023 - ex) << 20;
res *= binexp.x;
cor *= binexp.x;
- eps = 1.00000000001 + (error + err_1) * binexp.x;
t = 1.0 + res;
y = ((1.0 - t) + res) + cor;
res = t + y;
- cor = (t - res) + y;
- if (res == (res + eps * cor))
- {
- binexp.i[HIGH_HALF] = 0x00100000;
- return (res - 1.0) * binexp.x;
- }
- else
- return -10.0;
+ binexp.i[HIGH_HALF] = 0x00100000;
+ /* Maximum ULP error is 0.500004. */
+ return (res - 1.0) * binexp.x;
}
else
{
binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
- if (res == (res + cor * (1.0 + error + err_1)))
- return res * binexp.x * t256.x;
- else
- return -10.0;
+ /* Maximum ULP error is 0.500008. */
+ return res * binexp.x * t256.x;
}
}
Index: glibc-2.26/sysdeps/ieee754/dbl-64/e_log.c
===================================================================
--- glibc-2.26.orig/sysdeps/ieee754/dbl-64/e_log.c
+++ glibc-2.26/sysdeps/ieee754/dbl-64/e_log.c
@@ -23,11 +23,10 @@
/* FUNCTION:ulog */
/* */
/* FILES NEEDED: dla.h endian.h mpa.h mydefs.h ulog.h */
-/* mpexp.c mplog.c mpa.c */
/* ulog.tbl */
/* */
/* An ultimate log routine. Given an IEEE double machine number x */
-/* it computes the correctly rounded (to nearest) value of log(x). */
+/* it computes the rounded (to nearest) value of log(x). */
/* Assumption: Machine arithmetic operations are performed in */
/* round to nearest mode of IEEE 754 standard. */
/* */
@@ -40,34 +39,26 @@
#include "MathLib.h"
#include <math.h>
#include <math_private.h>
-#include <stap-probe.h>
#ifndef SECTION
# define SECTION
#endif
-void __mplog (mp_no *, mp_no *, int);
-
/*********************************************************************/
-/* An ultimate log routine. Given an IEEE double machine number x */
-/* it computes the correctly rounded (to nearest) value of log(x). */
+/* An ultimate log routine. Given an IEEE double machine number x */
+/* it computes the rounded (to nearest) value of log(x). */
/*********************************************************************/
double
SECTION
__ieee754_log (double x)
{
-#define M 4
- static const int pr[M] = { 8, 10, 18, 32 };
- int i, j, n, ux, dx, p;
+ int i, j, n, ux, dx;
double dbl_n, u, p0, q, r0, w, nln2a, luai, lubi, lvaj, lvbj,
- sij, ssij, ttij, A, B, B0, y, y1, y2, polI, polII, sa, sb,
- t1, t2, t7, t8, t, ra, rb, ww,
- a0, aa0, s1, s2, ss2, s3, ss3, a1, aa1, a, aa, b, bb, c;
+ sij, ssij, ttij, A, B, B0, polI, polII, t8, a, aa, b, bb, c;
#ifndef DLA_FMS
- double t3, t4, t5, t6;
+ double t1, t2, t3, t4, t5;
#endif
number num;
- mp_no mpx, mpy, mpy1, mpy2, mperr;
#include "ulog.tbl"
#include "ulog.h"
@@ -101,7 +92,7 @@ __ieee754_log (double x)
if (w == 0.0)
return 0.0;
- /*--- Stage I, the case abs(x-1) < 0.03 */
+ /*--- The case abs(x-1) < 0.03 */
t8 = MHALF * w;
EMULV (t8, w, a, aa, t1, t2, t3, t4, t5);
@@ -118,50 +109,12 @@ __ieee754_log (double x)
polII *= w * w * w;
c = (aa + bb) + polII;
- /* End stage I, case abs(x-1) < 0.03 */
- if ((y = b + (c + b * E2)) == b + (c - b * E2))
- return y;
-
- /*--- Stage II, the case abs(x-1) < 0.03 */
-
- a = d19.d + w * d20.d;
- a = d18.d + w * a;
- a = d17.d + w * a;
- a = d16.d + w * a;
- a = d15.d + w * a;
- a = d14.d + w * a;
- a = d13.d + w * a;
- a = d12.d + w * a;
- a = d11.d + w * a;
-
- EMULV (w, a, s2, ss2, t1, t2, t3, t4, t5);
- ADD2 (d10.d, dd10.d, s2, ss2, s3, ss3, t1, t2);
- MUL2 (w, 0, s3, ss3, s2, ss2, t1, t2, t3, t4, t5, t6, t7, t8);
- ADD2 (d9.d, dd9.d, s2, ss2, s3, ss3, t1, t2);
- MUL2 (w, 0, s3, ss3, s2, ss2, t1, t2, t3, t4, t5, t6, t7, t8);
- ADD2 (d8.d, dd8.d, s2, ss2, s3, ss3, t1, t2);
- MUL2 (w, 0, s3, ss3, s2, ss2, t1, t2, t3, t4, t5, t6, t7, t8);
- ADD2 (d7.d, dd7.d, s2, ss2, s3, ss3, t1, t2);
- MUL2 (w, 0, s3, ss3, s2, ss2, t1, t2, t3, t4, t5, t6, t7, t8);
- ADD2 (d6.d, dd6.d, s2, ss2, s3, ss3, t1, t2);
- MUL2 (w, 0, s3, ss3, s2, ss2, t1, t2, t3, t4, t5, t6, t7, t8);
- ADD2 (d5.d, dd5.d, s2, ss2, s3, ss3, t1, t2);
- MUL2 (w, 0, s3, ss3, s2, ss2, t1, t2, t3, t4, t5, t6, t7, t8);
- ADD2 (d4.d, dd4.d, s2, ss2, s3, ss3, t1, t2);
- MUL2 (w, 0, s3, ss3, s2, ss2, t1, t2, t3, t4, t5, t6, t7, t8);
- ADD2 (d3.d, dd3.d, s2, ss2, s3, ss3, t1, t2);
- MUL2 (w, 0, s3, ss3, s2, ss2, t1, t2, t3, t4, t5, t6, t7, t8);
- ADD2 (d2.d, dd2.d, s2, ss2, s3, ss3, t1, t2);
- MUL2 (w, 0, s3, ss3, s2, ss2, t1, t2, t3, t4, t5, t6, t7, t8);
- MUL2 (w, 0, s2, ss2, s3, ss3, t1, t2, t3, t4, t5, t6, t7, t8);
- ADD2 (w, 0, s3, ss3, b, bb, t1, t2);
-
- /* End stage II, case abs(x-1) < 0.03 */
- if ((y = b + (bb + b * E4)) == b + (bb - b * E4))
- return y;
- goto stage_n;
+ /* Here b contains the high part of the result, and c the low part.
+ Maximum error is b * 2.334e-19, so accuracy is >61 bits.
+ Therefore max ULP error of b + c is ~0.502. */
+ return b + c;
- /*--- Stage I, the case abs(x-1) > 0.03 */
+ /*--- The case abs(x-1) > 0.03 */
case_03:
/* Find n,u such that x = u*2**n, 1/sqrt(2) < u < sqrt(2) */
@@ -203,58 +156,10 @@ case_03:
B0 = (((lubi + lvbj) + ssij) + ttij) + dbl_n * LN2B;
B = polI + B0;
- /* End stage I, case abs(x-1) >= 0.03 */
- if ((y = A + (B + E1)) == A + (B - E1))
- return y;
-
-
- /*--- Stage II, the case abs(x-1) > 0.03 */
-
- /* Improve the accuracy of r0 */
- EMULV (p0, r0, sa, sb, t1, t2, t3, t4, t5);
- t = r0 * ((1 - sa) - sb);
- EADD (r0, t, ra, rb);
-
- /* Compute w */
- MUL2 (q, 0, ra, rb, w, ww, t1, t2, t3, t4, t5, t6, t7, t8);
-
- EADD (A, B0, a0, aa0);
-
- /* Evaluate polynomial III */
- s1 = (c3.d + (c4.d + c5.d * w) * w) * w;
- EADD (c2.d, s1, s2, ss2);
- MUL2 (s2, ss2, w, ww, s3, ss3, t1, t2, t3, t4, t5, t6, t7, t8);
- MUL2 (s3, ss3, w, ww, s2, ss2, t1, t2, t3, t4, t5, t6, t7, t8);
- ADD2 (s2, ss2, w, ww, s3, ss3, t1, t2);
- ADD2 (s3, ss3, a0, aa0, a1, aa1, t1, t2);
-
- /* End stage II, case abs(x-1) >= 0.03 */
- if ((y = a1 + (aa1 + E3)) == a1 + (aa1 - E3))
- return y;
-
-
- /* Final stages. Use multi-precision arithmetic. */
-stage_n:
-
- for (i = 0; i < M; i++)
- {
- p = pr[i];
- __dbl_mp (x, &mpx, p);
- __dbl_mp (y, &mpy, p);
- __mplog (&mpx, &mpy, p);
- __dbl_mp (e[i].d, &mperr, p);
- __add (&mpy, &mperr, &mpy1, p);
- __sub (&mpy, &mperr, &mpy2, p);
- __mp_dbl (&mpy1, &y1, p);
- __mp_dbl (&mpy2, &y2, p);
- if (y1 == y2)
- {
- LIBC_PROBE (slowlog, 3, &p, &x, &y1);
- return y1;
- }
- }
- LIBC_PROBE (slowlog_inexact, 3, &p, &x, &y1);
- return y1;
+ /* Here A contains the high part of the result, and B the low part.
+ Maximum abs error is 6.095e-21 and min log (x) is 0.0295 since x > 1.03.
+ Therefore max ULP error of A + B is ~0.502. */
+ return A + B;
}
#ifndef __ieee754_log
Index: glibc-2.26/sysdeps/ieee754/dbl-64/e_pow.c
===================================================================
--- glibc-2.26.orig/sysdeps/ieee754/dbl-64/e_pow.c
+++ glibc-2.26/sysdeps/ieee754/dbl-64/e_pow.c
@@ -20,13 +20,9 @@
/* MODULE_NAME: upow.c */
/* */
/* FUNCTIONS: upow */
-/* power1 */
-/* my_log2 */
/* log1 */
/* checkint */
/* FILES NEEDED: dla.h endian.h mpa.h mydefs.h */
-/* halfulp.c mpexp.c mplog.c slowexp.c slowpow.c mpa.c */
-/* uexp.c upow.c */
/* root.tbl uexp.tbl upow.tbl */
/* An ultimate power routine. Given two IEEE double machine numbers y,x */
/* it computes the correctly rounded (to nearest) value of x^y. */
@@ -50,11 +46,8 @@
static const double huge = 1.0e300, tiny = 1.0e-300;
-double __exp1 (double x, double xx, double error);
-static double log1 (double x, double *delta, double *error);
-static double my_log2 (double x, double *delta, double *error);
-double __slowpow (double x, double y, double z);
-static double power1 (double x, double y);
+double __exp1 (double x, double xx);
+static double log1 (double x, double *delta);
static int checkint (double x);
/* An ultimate power routine. Given two IEEE double machine numbers y, x it
@@ -63,7 +56,7 @@ double
SECTION
__ieee754_pow (double x, double y)
{
- double z, a, aa, error, t, a1, a2, y1, y2;
+ double z, a, aa, t, a1, a2, y1, y2;
mynumber u, v;
int k;
int4 qx, qy;
@@ -100,7 +93,7 @@ __ieee754_pow (double x, double y)
not matter if |y| <= 2**-64. */
if (fabs (y) < 0x1p-64)
y = y < 0 ? -0x1p-64 : 0x1p-64;
- z = log1 (x, &aa, &error); /* x^y =e^(y log (X)) */
+ z = log1 (x, &aa); /* x^y =e^(y log (X)) */
t = y * CN;
y1 = t - (t - y);
y2 = y - y1;
@@ -111,9 +104,16 @@ __ieee754_pow (double x, double y)
aa = y2 * a1 + y * a2;
a1 = a + aa;
a2 = (a - a1) + aa;
- error = error * fabs (y);
- t = __exp1 (a1, a2, 1.9e16 * error); /* return -10 or 0 if wasn't computed exactly */
- retval = (t > 0) ? t : power1 (x, y);
+
+ /* Maximum relative error RElog of log1 is 1.0e-21 (69.7 bits).
+ Maximum relative error REexp of __exp1 is 8.8e-22 (69.9 bits).
+ We actually compute exp ((1 + RElog) * log (x) * y) * (1 + REexp).
+ Since RElog/REexp are tiny and log (x) * y is at most log (DBL_MAX),
+ this is equivalent to pow (x, y) * (1 + 710 * RElog + REexp).
+ So the relative error is 710 * 1.0e-21 + 8.8e-22 = 7.1e-19
+ (60.2 bits). The worst-case ULP error is 0.5064. */
+
+ retval = __exp1 (a1, a2);
}
if (isinf (retval))
@@ -218,33 +218,11 @@ __ieee754_pow (double x, double y)
strong_alias (__ieee754_pow, __pow_finite)
#endif
-/* Compute x^y using more accurate but more slow log routine. */
-static double
-SECTION
-power1 (double x, double y)
-{
- double z, a, aa, error, t, a1, a2, y1, y2;
- z = my_log2 (x, &aa, &error);
- t = y * CN;
- y1 = t - (t - y);
- y2 = y - y1;
- t = z * CN;
- a1 = t - (t - z);
- a2 = z - a1;
- a = y * z;
- aa = ((y1 * a1 - a) + y1 * a2 + y2 * a1) + y2 * a2 + aa * y;
- a1 = a + aa;
- a2 = (a - a1) + aa;
- error = error * fabs (y);
- t = __exp1 (a1, a2, 1.9e16 * error);
- return (t >= 0) ? t : __slowpow (x, y, z);
-}
-
/* Compute log(x) (x is left argument). The result is the returned double + the
- parameter DELTA. The result is bounded by ERROR. */
+ parameter DELTA. */
static double
SECTION
-log1 (double x, double *delta, double *error)
+log1 (double x, double *delta)
{
unsigned int i, j;
int m;
@@ -260,9 +238,7 @@ log1 (double x, double *delta, double *e
u.x = x;
m = u.i[HIGH_HALF];
- *error = 0;
- *delta = 0;
- if (m < 0x00100000) /* 1<x<2^-1007 */
+ if (m < 0x00100000) /* Handle denormal x. */
{
x = x * t52.x;
add = -52.0;
@@ -284,7 +260,7 @@ log1 (double x, double *delta, double *e
v.x = u.x + bigu.x;
uu = v.x - bigu.x;
i = (v.i[LOW_HALF] & 0x000003ff) << 2;
- if (two52.i[LOW_HALF] == 1023) /* nx = 0 */
+ if (two52.i[LOW_HALF] == 1023) /* Exponent of x is 0. */
{
if (i > 1192 && i < 1208) /* |x-1| < 1.5*2**-10 */
{
@@ -296,8 +272,8 @@ log1 (double x, double *delta, double *e
* (r7 + t * r8)))))
- 0.5 * t2 * (t + t1));
res = e1 + e2;
- *error = 1.0e-21 * fabs (t);
*delta = (e1 - res) + e2;
+ /* Max relative error is 1.464844e-24, so accurate to 79.1 bits. */
return res;
} /* |x-1| < 1.5*2**-10 */
else
@@ -316,12 +292,12 @@ log1 (double x, double *delta, double *e
t2 = ((((t - t1) + e) + (ui.x[i + 3] + vj.x[j + 2])) + e2 + e * e
* (p2 + e * (p3 + e * p4)));
res = t1 + t2;
- *error = 1.0e-24;
*delta = (t1 - res) + t2;
+ /* Max relative error is 1.0e-24, so accurate to 79.7 bits. */
return res;
}
- } /* nx = 0 */
- else /* nx != 0 */
+ }
+ else /* Exponent of x != 0. */
{
eps = u.x - uu;
nx = (two52.x - two52e.x) + add;
@@ -334,113 +310,13 @@ log1 (double x, double *delta, double *e
t2 = ((((t - t1) + e) + nx * ln2b.x + ui.x[i + 3] + e2) + e * e
* (q2 + e * (q3 + e * (q4 + e * (q5 + e * q6)))));
res = t1 + t2;
- *error = 1.0e-21;
- *delta = (t1 - res) + t2;
- return res;
- } /* nx != 0 */
-}
-
-/* Slower but more accurate routine of log. The returned result is double +
- DELTA. The result is bounded by ERROR. */
-static double
-SECTION
-my_log2 (double x, double *delta, double *error)
-{
- unsigned int i, j;
- int m;
- double uu, vv, eps, nx, e, e1, e2, t, t1, t2, res, add = 0;
- double ou1, ou2, lu1, lu2, ov, lv1, lv2, a, a1, a2;
- double y, yy, z, zz, j1, j2, j7, j8;
-#ifndef DLA_FMS
- double j3, j4, j5, j6;
-#endif
- mynumber u, v;
-#ifdef BIG_ENDI
- mynumber /**/ two52 = {{0x43300000, 0x00000000}}; /* 2**52 */
-#else
-# ifdef LITTLE_ENDI
- mynumber /**/ two52 = {{0x00000000, 0x43300000}}; /* 2**52 */
-# endif
-#endif
-
- u.x = x;
- m = u.i[HIGH_HALF];
- *error = 0;
- *delta = 0;
- add = 0;
- if (m < 0x00100000)
- { /* x < 2^-1022 */
- x = x * t52.x;
- add = -52.0;
- u.x = x;
- m = u.i[HIGH_HALF];
- }
-
- if ((m & 0x000fffff) < 0x0006a09e)
- {
- u.i[HIGH_HALF] = (m & 0x000fffff) | 0x3ff00000;
- two52.i[LOW_HALF] = (m >> 20);
- }
- else
- {
- u.i[HIGH_HALF] = (m & 0x000fffff) | 0x3fe00000;
- two52.i[LOW_HALF] = (m >> 20) + 1;
- }
-
- v.x = u.x + bigu.x;
- uu = v.x - bigu.x;
- i = (v.i[LOW_HALF] & 0x000003ff) << 2;
- /*------------------------------------- |x-1| < 2**-11------------------------------- */
- if ((two52.i[LOW_HALF] == 1023) && (i == 1200))
- {
- t = x - 1.0;
- EMULV (t, s3, y, yy, j1, j2, j3, j4, j5);
- ADD2 (-0.5, 0, y, yy, z, zz, j1, j2);
- MUL2 (t, 0, z, zz, y, yy, j1, j2, j3, j4, j5, j6, j7, j8);
- MUL2 (t, 0, y, yy, z, zz, j1, j2, j3, j4, j5, j6, j7, j8);
-
- e1 = t + z;
- e2 = ((((t - e1) + z) + zz) + t * t * t
- * (ss3 + t * (s4 + t * (s5 + t * (s6 + t * (s7 + t * s8))))));
- res = e1 + e2;
- *error = 1.0e-25 * fabs (t);
- *delta = (e1 - res) + e2;
- return res;
- }
- /*----------------------------- |x-1| > 2**-11 -------------------------- */
- else
- { /*Computing log(x) according to log table */
- nx = (two52.x - two52e.x) + add;
- ou1 = ui.x[i];
- ou2 = ui.x[i + 1];
- lu1 = ui.x[i + 2];
- lu2 = ui.x[i + 3];
- v.x = u.x * (ou1 + ou2) + bigv.x;
- vv = v.x - bigv.x;
- j = v.i[LOW_HALF] & 0x0007ffff;
- j = j + j + j;
- eps = u.x - uu * vv;
- ov = vj.x[j];
- lv1 = vj.x[j + 1];
- lv2 = vj.x[j + 2];
- a = (ou1 + ou2) * (1.0 + ov);
- a1 = (a + 1.0e10) - 1.0e10;
- a2 = a * (1.0 - a1 * uu * vv);
- e1 = eps * a1;
- e2 = eps * a2;
- e = e1 + e2;
- e2 = (e1 - e) + e2;
- t = nx * ln2a.x + lu1 + lv1;
- t1 = t + e;
- t2 = ((((t - t1) + e) + (lu2 + lv2 + nx * ln2b.x + e2)) + e * e
- * (p2 + e * (p3 + e * p4)));
- res = t1 + t2;
- *error = 1.0e-27;
*delta = (t1 - res) + t2;
+ /* Max relative error is 1.0e-21, so accurate to 69.7 bits. */
return res;
}
}
+
/* This function receives a double x and checks if it is an integer. If not,
it returns 0, else it returns 1 if even or -1 if odd. */
static int
Index: glibc-2.26/sysdeps/ieee754/dbl-64/halfulp.c
===================================================================
--- glibc-2.26.orig/sysdeps/ieee754/dbl-64/halfulp.c
+++ /dev/null
@@ -1,152 +0,0 @@
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-/************************************************************************/
-/* */
-/* MODULE_NAME:halfulp.c */
-/* */
-/* FUNCTIONS:halfulp */
-/* FILES NEEDED: mydefs.h dla.h endian.h */
-/* uroot.c */
-/* */
-/*Routine halfulp(double x, double y) computes x^y where result does */
-/*not need rounding. If the result is closer to 0 than can be */
-/*represented it returns 0. */
-/* In the following cases the function does not compute anything */
-/*and returns a negative number: */
-/*1. if the result needs rounding, */
-/*2. if y is outside the interval [0, 2^20-1], */
-/*3. if x can be represented by x=2**n for some integer n. */
-/************************************************************************/
-
-#include "endian.h"
-#include "mydefs.h"
-#include <dla.h>
-#include <math_private.h>
-
-#ifndef SECTION
-# define SECTION
-#endif
-
-static const int4 tab54[32] = {
- 262143, 11585, 1782, 511, 210, 107, 63, 42,
- 30, 22, 17, 14, 12, 10, 9, 7,
- 7, 6, 5, 5, 5, 4, 4, 4,
- 3, 3, 3, 3, 3, 3, 3, 3
-};
-
-
-double
-SECTION
-__halfulp (double x, double y)
-{
- mynumber v;
- double z, u, uu;
-#ifndef DLA_FMS
- double j1, j2, j3, j4, j5;
-#endif
- int4 k, l, m, n;
- if (y <= 0) /*if power is negative or zero */
- {
- v.x = y;
- if (v.i[LOW_HALF] != 0)
- return -10.0;
- v.x = x;
- if (v.i[LOW_HALF] != 0)
- return -10.0;
- if ((v.i[HIGH_HALF] & 0x000fffff) != 0)
- return -10; /* if x =2 ^ n */
- k = ((v.i[HIGH_HALF] & 0x7fffffff) >> 20) - 1023; /* find this n */
- z = (double) k;
- return (z * y == -1075.0) ? 0 : -10.0;
- }
- /* if y > 0 */
- v.x = y;
- if (v.i[LOW_HALF] != 0)
- return -10.0;
-
- v.x = x;
- /* case where x = 2**n for some integer n */
- if (((v.i[HIGH_HALF] & 0x000fffff) | v.i[LOW_HALF]) == 0)
- {
- k = (v.i[HIGH_HALF] >> 20) - 1023;
- return (((double) k) * y == -1075.0) ? 0 : -10.0;
- }
-
- v.x = y;
- k = v.i[HIGH_HALF];
- m = k << 12;
- l = 0;
- while (m)
- {
- m = m << 1; l++;
- }
- n = (k & 0x000fffff) | 0x00100000;
- n = n >> (20 - l); /* n is the odd integer of y */
- k = ((k >> 20) - 1023) - l; /* y = n*2**k */
- if (k > 5)
- return -10.0;
- if (k > 0)
- for (; k > 0; k--)
- n *= 2;
- if (n > 34)
- return -10.0;
- k = -k;
- if (k > 5)
- return -10.0;
-
- /* now treat x */
- while (k > 0)
- {
- z = __ieee754_sqrt (x);
- EMULV (z, z, u, uu, j1, j2, j3, j4, j5);
- if (((u - x) + uu) != 0)
- break;
- x = z;
- k--;
- }
- if (k)
- return -10.0;
-
- /* it is impossible that n == 2, so the mantissa of x must be short */
-
- v.x = x;
- if (v.i[LOW_HALF])
- return -10.0;
- k = v.i[HIGH_HALF];
- m = k << 12;
- l = 0;
- while (m)
- {
- m = m << 1; l++;
- }
- m = (k & 0x000fffff) | 0x00100000;
- m = m >> (20 - l); /* m is the odd integer of x */
-
- /* now check whether the length of m**n is at most 54 bits */
-
- if (m > tab54[n - 3])
- return -10.0;
-
- /* yes, it is - now compute x**n by simple multiplications */
-
- u = x;
- for (k = 1; k < n; k++)
- u = u * x;
- return u;
-}
Index: glibc-2.26/sysdeps/ieee754/dbl-64/s_sin.c
===================================================================
--- glibc-2.26.orig/sysdeps/ieee754/dbl-64/s_sin.c
+++ glibc-2.26/sysdeps/ieee754/dbl-64/s_sin.c
@@ -22,22 +22,11 @@
/* */
/* FUNCTIONS: usin */
/* ucos */
-/* slow */
-/* slow1 */
-/* slow2 */
-/* sloww */
-/* sloww1 */
-/* sloww2 */
-/* bsloww */
-/* bsloww1 */
-/* bsloww2 */
-/* cslow2 */
/* FILES NEEDED: dla.h endian.h mpa.h mydefs.h usncs.h */
-/* branred.c sincos32.c dosincos.c mpa.c */
-/* sincos.tbl */
+/* branred.c sincos.tbl */
/* */
-/* An ultimate sin and routine. Given an IEEE double machine number x */
-/* it computes the correctly rounded (to nearest) value of sin(x) or cos(x) */
+/* An ultimate sin and cos routine. Given an IEEE double machine number x */
+/* it computes sin(x) or cos(x) with ~0.55 ULP. */
/* Assumption: Machine arithmetic operations are performed in */
/* round to nearest mode of IEEE 754 standard. */
/* */
@@ -65,35 +54,11 @@
a - a^3/3! + a^5/5! - a^7/7! + a^9/9! + (1 - a^2) * da / 2
The constants s1, s2, s3, etc. are pre-computed values of 1/3!, 1/5! and so
- on. The result is returned to LHS and correction in COR. */
-#define TAYLOR_SIN(xx, a, da, cor) \
+ on. The result is returned to LHS. */
+#define TAYLOR_SIN(xx, a, da) \
({ \
double t = ((POLYNOMIAL (xx) * (a) - 0.5 * (da)) * (xx) + (da)); \
double res = (a) + t; \
- (cor) = ((a) - res) + t; \
- res; \
-})
-
-/* This is again a variation of the Taylor series expansion with the term
- x^3/3! expanded into the following for better accuracy:
-
- bb * x ^ 3 + 3 * aa * x * x1 * x2 + aa * x1 ^ 3 + aa * x2 ^ 3
-
- The correction term is dx and bb + aa = -1/3!
- */
-#define TAYLOR_SLOW(x0, dx, cor) \
-({ \
- static const double th2_36 = 206158430208.0; /* 1.5*2**37 */ \
- double xx = (x0) * (x0); \
- double x1 = ((x0) + th2_36) - th2_36; \
- double y = aa * x1 * x1 * x1; \
- double r = (x0) + y; \
- double x2 = ((x0) - x1) + (dx); \
- double t = (((POLYNOMIAL2 (xx) + bb) * xx + 3.0 * aa * x1 * x2) \
- * (x0) + aa * x2 * x2 * x2 + (dx)); \
- t = (((x0) - r) + y) + t; \
- double res = r + t; \
- (cor) = (r - res) + t; \
res; \
})
@@ -123,31 +88,15 @@ static const double
cs4 = -4.16666666666664434524222570944589E-02,
cs6 = 1.38888874007937613028114285595617E-03;
-static const double t22 = 0x1.8p22;
-
-void __dubsin (double x, double dx, double w[]);
-void __docos (double x, double dx, double w[]);
-double __mpsin (double x, double dx, bool reduce_range);
-double __mpcos (double x, double dx, bool reduce_range);
-static double slow (double x);
-static double slow1 (double x);
-static double slow2 (double x);
-static double sloww (double x, double dx, double orig, bool shift_quadrant);
-static double sloww1 (double x, double dx, double orig, bool shift_quadrant);
-static double sloww2 (double x, double dx, double orig, int n);
-static double bsloww (double x, double dx, double orig, int n);
-static double bsloww1 (double x, double dx, double orig, int n);
-static double bsloww2 (double x, double dx, double orig, int n);
int __branred (double x, double *a, double *aa);
-static double cslow2 (double x);
/* Given a number partitioned into X and DX, this function computes the cosine
of the number by combining the sin and cos of X (as computed by a variation
of the Taylor series) with the values looked up from the sin/cos table to
- get the result in RES and a correction value in COR. */
+ get the result. */
static inline double
__always_inline
-do_cos (double x, double dx, double *corp)
+do_cos (double x, double dx)
{
mynumber u;
@@ -157,60 +106,28 @@ do_cos (double x, double dx, double *cor
u.x = big + fabs (x);
x = fabs (x) - (u.x - big) + dx;
- double xx, s, sn, ssn, c, cs, ccs, res, cor;
+ double xx, s, sn, ssn, c, cs, ccs, cor;
xx = x * x;
s = x + x * xx * (sn3 + xx * sn5);
c = xx * (cs2 + xx * (cs4 + xx * cs6));
SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs);
cor = (ccs - s * ssn - cs * c) - sn * s;
- res = cs + cor;
- cor = (cs - res) + cor;
- *corp = cor;
- return res;
-}
-
-/* A more precise variant of DO_COS. EPS is the adjustment to the correction
- COR. */
-static inline double
-__always_inline
-do_cos_slow (double x, double dx, double eps, double *corp)
-{
- mynumber u;
-
- if (x <= 0)
- dx = -dx;
-
- u.x = big + fabs (x);
- x = fabs (x) - (u.x - big);
-
- double xx, y, x1, x2, e1, e2, res, cor;
- double s, sn, ssn, c, cs, ccs;
- xx = x * x;
- s = x * xx * (sn3 + xx * sn5);
- c = x * dx + xx * (cs2 + xx * (cs4 + xx * cs6));
- SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs);
- x1 = (x + t22) - t22;
- x2 = (x - x1) + dx;
- e1 = (sn + t22) - t22;
- e2 = (sn - e1) + ssn;
- cor = (ccs - cs * c - e1 * x2 - e2 * x) - sn * s;
- y = cs - e1 * x1;
- cor = cor + ((cs - y) - e1 * x1);
- res = y + cor;
- cor = (y - res) + cor;
- cor = 1.0005 * cor + __copysign (eps, cor);
- *corp = cor;
- return res;
+ return cs + cor;
}
/* Given a number partitioned into X and DX, this function computes the sine of
the number by combining the sin and cos of X (as computed by a variation of
the Taylor series) with the values looked up from the sin/cos table to get
- the result in RES and a correction value in COR. */
+ the result. */
static inline double
__always_inline
-do_sin (double x, double dx, double *corp)
+do_sin (double x, double dx)
{
+ double xold = x;
+ /* Max ULP is 0.501 if |x| < 0.126, otherwise ULP is 0.518. */
+ if (fabs (x) < 0.126)
+ return TAYLOR_SIN (x * x, x, dx);
+
mynumber u;
if (x <= 0)
@@ -218,85 +135,22 @@ do_sin (double x, double dx, double *cor
u.x = big + fabs (x);
x = fabs (x) - (u.x - big);
- double xx, s, sn, ssn, c, cs, ccs, cor, res;
+ double xx, s, sn, ssn, c, cs, ccs, cor;
xx = x * x;
s = x + (dx + x * xx * (sn3 + xx * sn5));
c = x * dx + xx * (cs2 + xx * (cs4 + xx * cs6));
SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs);
cor = (ssn + s * ccs - sn * c) + cs * s;
- res = sn + cor;
- cor = (sn - res) + cor;
- *corp = cor;
- return res;
-}
-
-/* A more precise variant of DO_SIN. EPS is the adjustment to the correction
- COR. */
-static inline double
-__always_inline
-do_sin_slow (double x, double dx, double eps, double *corp)
-{
- mynumber u;
-
- if (x <= 0)
- dx = -dx;
- u.x = big + fabs (x);
- x = fabs (x) - (u.x - big);
-
- double xx, y, x1, x2, c1, c2, res, cor;
- double s, sn, ssn, c, cs, ccs;
- xx = x * x;
- s = x * xx * (sn3 + xx * sn5);
- c = xx * (cs2 + xx * (cs4 + xx * cs6));
- SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs);
- x1 = (x + t22) - t22;
- x2 = (x - x1) + dx;
- c1 = (cs + t22) - t22;
- c2 = (cs - c1) + ccs;
- cor = (ssn + s * ccs + cs * s + c2 * x + c1 * x2 - sn * x * dx) - sn * c;
- y = sn + c1 * x1;
- cor = cor + ((sn - y) + c1 * x1);
- res = y + cor;
- cor = (y - res) + cor;
- cor = 1.0005 * cor + __copysign (eps, cor);
- *corp = cor;
- return res;
-}
-
-/* Reduce range of X and compute sin of a + da. When SHIFT_QUADRANT is true,
- the routine returns the cosine of a + da by rotating the quadrant once and
- computing the sine of the result. */
-static inline double
-__always_inline
-reduce_and_compute (double x, bool shift_quadrant)
-{
- double retval = 0, a, da;
- unsigned int n = __branred (x, &a, &da);
- int4 k = (n + shift_quadrant) % 4;
- switch (k)
- {
- case 2:
- a = -a;
- da = -da;
- /* Fall through. */
- case 0:
- if (a * a < 0.01588)
- retval = bsloww (a, da, x, n);
- else
- retval = bsloww1 (a, da, x, n);
- break;
-
- case 1:
- case 3:
- retval = bsloww2 (a, da, x, n);
- break;
- }
- return retval;
+ return __copysign (sn + cor, xold);
}
+/* Reduce range of x to within PI/2 with abs (x) < 105414350. The high part
+ is written to *a, the low part to *da. Range reduction is accurate to 136
+ bits so that when x is large and *a very close to zero, all 53 bits of *a
+ are correct. */
static inline int4
__always_inline
-reduce_sincos_1 (double x, double *a, double *da)
+reduce_sincos (double x, double *a, double *da)
{
mynumber v;
@@ -305,156 +159,54 @@ reduce_sincos_1 (double x, double *a, do
v.x = t;
double y = (x - xn * mp1) - xn * mp2;
int4 n = v.i[LOW_HALF] & 3;
- double db = xn * mp3;
- double b = y - db;
- db = (y - b) - db;
- *a = b;
- *da = db;
-
- return n;
-}
-
-/* Compute sin (A + DA). cos can be computed by passing SHIFT_QUADRANT as
- true, which results in shifting the quadrant N clockwise. */
-static double
-__always_inline
-do_sincos_1 (double a, double da, double x, int4 n, bool shift_quadrant)
-{
- double xx, retval, res, cor;
- double eps = fabs (x) * 1.2e-30;
-
- int k1 = (n + shift_quadrant) & 3;
- switch (k1)
- { /* quarter of unit circle */
- case 2:
- a = -a;
- da = -da;
- /* Fall through. */
- case 0:
- xx = a * a;
- if (xx < 0.01588)
- {
- /* Taylor series. */
- res = TAYLOR_SIN (xx, a, da, cor);
- cor = 1.02 * cor + __copysign (eps, cor);
- retval = (res == res + cor) ? res : sloww (a, da, x, shift_quadrant);
- }
- else
- {
- res = do_sin (a, da, &cor);
- cor = 1.035 * cor + __copysign (eps, cor);
- retval = ((res == res + cor) ? __copysign (res, a)
- : sloww1 (a, da, x, shift_quadrant));
- }
- break;
-
- case 1:
- case 3:
- res = do_cos (a, da, &cor);
- cor = 1.025 * cor + __copysign (eps, cor);
- retval = ((res == res + cor) ? ((n & 2) ? -res : res)
- : sloww2 (a, da, x, n));
- break;
- }
-
- return retval;
-}
-
-static inline int4
-__always_inline
-reduce_sincos_2 (double x, double *a, double *da)
-{
- mynumber v;
-
- double t = (x * hpinv + toint);
- double xn = t - toint;
- v.x = t;
- double xn1 = (xn + 8.0e22) - 8.0e22;
- double xn2 = xn - xn1;
- double y = ((((x - xn1 * mp1) - xn1 * mp2) - xn2 * mp1) - xn2 * mp2);
- int4 n = v.i[LOW_HALF] & 3;
- double db = xn1 * pp3;
- t = y - db;
- db = (y - t) - db;
- db = (db - xn2 * pp3) - xn * pp4;
- double b = t + db;
- db = (t - b) + db;
+ double b, db, t1, t2;
+ t1 = xn * pp3;
+ t2 = y - t1;
+ db = (y - t2) - t1;
+
+ t1 = xn * pp4;
+ b = t2 - t1;
+ db += (t2 - b) - t1;
*a = b;
*da = db;
-
return n;
}
-/* Compute sin (A + DA). cos can be computed by passing SHIFT_QUADRANT as
- true, which results in shifting the quadrant N clockwise. */
+/* Compute sin or cos (A + DA) for the given quadrant N. */
static double
__always_inline
-do_sincos_2 (double a, double da, double x, int4 n, bool shift_quadrant)
+do_sincos (double a, double da, int4 n)
{
- double res, retval, cor, xx;
+ double retval;
- double eps = 1.0e-24;
-
- int4 k = (n + shift_quadrant) & 3;
-
- switch (k)
- {
- case 2:
- a = -a;
- da = -da;
- /* Fall through. */
- case 0:
- xx = a * a;
- if (xx < 0.01588)
- {
- /* Taylor series. */
- res = TAYLOR_SIN (xx, a, da, cor);
- cor = 1.02 * cor + __copysign (eps, cor);
- retval = (res == res + cor) ? res : bsloww (a, da, x, n);
- }
- else
- {
- res = do_sin (a, da, &cor);
- cor = 1.035 * cor + __copysign (eps, cor);
- retval = ((res == res + cor) ? __copysign (res, a)
- : bsloww1 (a, da, x, n));
- }
- break;
-
- case 1:
- case 3:
- res = do_cos (a, da, &cor);
- cor = 1.025 * cor + __copysign (eps, cor);
- retval = ((res == res + cor) ? ((n & 2) ? -res : res)
- : bsloww2 (a, da, x, n));
- break;
- }
+ if (n & 1)
+ /* Max ULP is 0.513. */
+ retval = do_cos (a, da);
+ else
+ /* Max ULP is 0.501 if xx < 0.01588, otherwise ULP is 0.518. */
+ retval = do_sin (a, da);
- return retval;
+ return (n & 2) ? -retval : retval;
}
+
/*******************************************************************/
/* An ultimate sin routine. Given an IEEE double machine number x */
/* it computes the correctly rounded (to nearest) value of sin(x) */
/*******************************************************************/
-#ifdef IN_SINCOS
-static double
-#else
+#ifndef IN_SINCOS
double
SECTION
-#endif
__sin (double x)
{
- double xx, res, t, cor;
+ double t, a, da;
mynumber u;
- int4 k, m;
+ int4 k, m, n;
double retval = 0;
-#ifndef IN_SINCOS
SET_RESTORE_ROUND_53BIT (FE_TONEAREST);
-#endif
u.x = x;
m = u.i[HIGH_HALF];
@@ -464,56 +216,34 @@ __sin (double x)
math_check_force_underflow (x);
retval = x;
}
- /*---------------------------- 2^-26 < |x|< 0.25 ----------------------*/
- else if (k < 0x3fd00000)
- {
- xx = x * x;
- /* Taylor series. */
- t = POLYNOMIAL (xx) * (xx * x);
- res = x + t;
- cor = (x - res) + t;
- retval = (res == res + 1.07 * cor) ? res : slow (x);
- } /* else if (k < 0x3fd00000) */
-/*---------------------------- 0.25<|x|< 0.855469---------------------- */
+/*--------------------------- 2^-26<|x|< 0.855469---------------------- */
else if (k < 0x3feb6000)
{
- res = do_sin (x, 0, &cor);
- retval = (res == res + 1.096 * cor) ? res : slow1 (x);
- retval = __copysign (retval, x);
+ /* Max ULP is 0.548. */
+ retval = do_sin (x, 0);
} /* else if (k < 0x3feb6000) */
/*----------------------- 0.855469 <|x|<2.426265 ----------------------*/
else if (k < 0x400368fd)
{
-
t = hp0 - fabs (x);
- res = do_cos (t, hp1, &cor);
- retval = (res == res + 1.020 * cor) ? res : slow2 (x);
- retval = __copysign (retval, x);
+ /* Max ULP is 0.51. */
+ retval = __copysign (do_cos (t, hp1), x);
} /* else if (k < 0x400368fd) */
-#ifndef IN_SINCOS
/*-------------------------- 2.426265<|x|< 105414350 ----------------------*/
else if (k < 0x419921FB)
{
- double a, da;
- int4 n = reduce_sincos_1 (x, &a, &da);
- retval = do_sincos_1 (a, da, x, n, false);
+ n = reduce_sincos (x, &a, &da);
+ retval = do_sincos (a, da, n);
} /* else if (k < 0x419921FB ) */
-/*---------------------105414350 <|x|< 281474976710656 --------------------*/
- else if (k < 0x42F00000)
- {
- double a, da;
-
- int4 n = reduce_sincos_2 (x, &a, &da);
- retval = do_sincos_2 (a, da, x, n, false);
- } /* else if (k < 0x42F00000 ) */
-
-/* -----------------281474976710656 <|x| <2^1024----------------------------*/
+/* --------------------105414350 <|x| <2^1024------------------------------*/
else if (k < 0x7ff00000)
- retval = reduce_and_compute (x, false);
-
+ {
+ n = __branred (x, &a, &da);
+ retval = do_sincos (a, da, n);
+ }
/*--------------------- |x| > 2^1024 ----------------------------------*/
else
{
@@ -521,7 +251,6 @@ __sin (double x)
__set_errno (EDOM);
retval = x / x;
}
-#endif
return retval;
}
@@ -532,23 +261,17 @@ __sin (double x)
/* it computes the correctly rounded (to nearest) value of cos(x) */
/*******************************************************************/
-#ifdef IN_SINCOS
-static double
-#else
double
SECTION
-#endif
__cos (double x)
{
- double y, xx, res, cor, a, da;
+ double y, a, da;
mynumber u;
- int4 k, m;
+ int4 k, m, n;
double retval = 0;
-#ifndef IN_SINCOS
SET_RESTORE_ROUND_53BIT (FE_TONEAREST);
-#endif
u.x = x;
m = u.i[HIGH_HALF];
@@ -560,8 +283,8 @@ __cos (double x)
else if (k < 0x3feb6000)
{ /* 2^-27 < |x| < 0.855469 */
- res = do_cos (x, 0, &cor);
- retval = (res == res + 1.020 * cor) ? res : cslow2 (x);
+ /* Max ULP is 0.51. */
+ retval = do_cos (x, 0);
} /* else if (k < 0x3feb6000) */
else if (k < 0x400368fd)
@@ -569,43 +292,23 @@ __cos (double x)
y = hp0 - fabs (x);
a = y + hp1;
da = (y - a) + hp1;
- xx = a * a;
- if (xx < 0.01588)
- {
- res = TAYLOR_SIN (xx, a, da, cor);
- cor = 1.02 * cor + __copysign (1.0e-31, cor);
- retval = (res == res + cor) ? res : sloww (a, da, x, true);
- }
- else
- {
- res = do_sin (a, da, &cor);
- cor = 1.035 * cor + __copysign (1.0e-31, cor);
- retval = ((res == res + cor) ? __copysign (res, a)
- : sloww1 (a, da, x, true));
- }
-
+ /* Max ULP is 0.501 if xx < 0.01588 or 0.518 otherwise.
+ Range reduction uses 106 bits here which is sufficient. */
+ retval = do_sin (a, da);
} /* else if (k < 0x400368fd) */
-
-#ifndef IN_SINCOS
else if (k < 0x419921FB)
{ /* 2.426265<|x|< 105414350 */
- double a, da;
- int4 n = reduce_sincos_1 (x, &a, &da);
- retval = do_sincos_1 (a, da, x, n, true);
+ n = reduce_sincos (x, &a, &da);
+ retval = do_sincos (a, da, n + 1);
} /* else if (k < 0x419921FB ) */
- else if (k < 0x42F00000)
- {
- double a, da;
-
- int4 n = reduce_sincos_2 (x, &a, &da);
- retval = do_sincos_2 (a, da, x, n, true);
- } /* else if (k < 0x42F00000 ) */
-
- /* 281474976710656 <|x| <2^1024 */
+ /* 105414350 <|x| <2^1024 */
else if (k < 0x7ff00000)
- retval = reduce_and_compute (x, true);
+ {
+ n = __branred (x, &a, &da);
+ retval = do_sincos (a, da, n + 1);
+ }
else
{
@@ -613,304 +316,10 @@ __cos (double x)
__set_errno (EDOM);
retval = x / x; /* |x| > 2^1024 */
}
-#endif
return retval;
}
-/************************************************************************/
-/* Routine compute sin(x) for 2^-26 < |x|< 0.25 by Taylor with more */
-/* precision and if still doesn't accurate enough by mpsin or dubsin */
-/************************************************************************/
-
-static inline double
-__always_inline
-slow (double x)
-{
- double res, cor, w[2];
- res = TAYLOR_SLOW (x, 0, cor);
- if (res == res + 1.0007 * cor)
- return res;
-
- __dubsin (fabs (x), 0, w);
- if (w[0] == w[0] + 1.000000001 * w[1])
- return __copysign (w[0], x);
-
- return __copysign (__mpsin (fabs (x), 0, false), x);
-}
-
-/*******************************************************************************/
-/* Routine compute sin(x) for 0.25<|x|< 0.855469 by __sincostab.tbl and Taylor */
-/* and if result still doesn't accurate enough by mpsin or dubsin */
-/*******************************************************************************/
-
-static inline double
-__always_inline
-slow1 (double x)
-{
- double w[2], cor, res;
-
- res = do_sin_slow (x, 0, 0, &cor);
- if (res == res + cor)
- return res;
-
- __dubsin (fabs (x), 0, w);
- if (w[0] == w[0] + 1.000000005 * w[1])
- return w[0];
-
- return __mpsin (fabs (x), 0, false);
-}
-
-/**************************************************************************/
-/* Routine compute sin(x) for 0.855469 <|x|<2.426265 by __sincostab.tbl */
-/* and if result still doesn't accurate enough by mpsin or dubsin */
-/**************************************************************************/
-static inline double
-__always_inline
-slow2 (double x)
-{
- double w[2], y, y1, y2, cor, res;
-
- double t = hp0 - fabs (x);
- res = do_cos_slow (t, hp1, 0, &cor);
- if (res == res + cor)
- return res;
-
- y = fabs (x) - hp0;
- y1 = y - hp1;
- y2 = (y - y1) - hp1;
- __docos (y1, y2, w);
- if (w[0] == w[0] + 1.000000005 * w[1])
- return w[0];
-
- return __mpsin (fabs (x), 0, false);
-}
-
-/* Compute sin(x + dx) where X is small enough to use Taylor series around zero
- and (x + dx) in the first or third quarter of the unit circle. ORIG is the
- original value of X for computing error of the result. If the result is not
- accurate enough, the routine calls mpsin or dubsin. SHIFT_QUADRANT rotates
- the unit circle by 1 to compute the cosine instead of sine. */
-static inline double
-__always_inline
-sloww (double x, double dx, double orig, bool shift_quadrant)
-{
- double y, t, res, cor, w[2], a, da, xn;
- mynumber v;
- int4 n;
- res = TAYLOR_SLOW (x, dx, cor);
-
- double eps = fabs (orig) * 3.1e-30;
-
- cor = 1.0005 * cor + __copysign (eps, cor);
-
- if (res == res + cor)
- return res;
-
- a = fabs (x);
- da = (x > 0) ? dx : -dx;
- __dubsin (a, da, w);
- eps = fabs (orig) * 1.1e-30;
- cor = 1.000000001 * w[1] + __copysign (eps, w[1]);
-
- if (w[0] == w[0] + cor)
- return __copysign (w[0], x);
-
- t = (orig * hpinv + toint);
- xn = t - toint;
- v.x = t;
- y = (orig - xn * mp1) - xn * mp2;
- n = (v.i[LOW_HALF] + shift_quadrant) & 3;
- da = xn * pp3;
- t = y - da;
- da = (y - t) - da;
- y = xn * pp4;
- a = t - y;
- da = ((t - a) - y) + da;
-
- if (n & 2)
- {
- a = -a;
- da = -da;
- }
- x = fabs (a);
- dx = (a > 0) ? da : -da;
- __dubsin (x, dx, w);
- eps = fabs (orig) * 1.1e-40;
- cor = 1.000000001 * w[1] + __copysign (eps, w[1]);
-
- if (w[0] == w[0] + cor)
- return __copysign (w[0], a);
-
- return shift_quadrant ? __mpcos (orig, 0, true) : __mpsin (orig, 0, true);
-}
-
-/* Compute sin(x + dx) where X is in the first or third quarter of the unit
- circle. ORIG is the original value of X for computing error of the result.
- If the result is not accurate enough, the routine calls mpsin or dubsin.
- SHIFT_QUADRANT rotates the unit circle by 1 to compute the cosine instead of
- sine. */
-static inline double
-__always_inline
-sloww1 (double x, double dx, double orig, bool shift_quadrant)
-{
- double w[2], cor, res;
-
- res = do_sin_slow (x, dx, 3.1e-30 * fabs (orig), &cor);
-
- if (res == res + cor)
- return __copysign (res, x);
-
- dx = (x > 0 ? dx : -dx);
- __dubsin (fabs (x), dx, w);
-
- double eps = 1.1e-30 * fabs (orig);
- cor = 1.000000005 * w[1] + __copysign (eps, w[1]);
-
- if (w[0] == w[0] + cor)
- return __copysign (w[0], x);
-
- return shift_quadrant ? __mpcos (orig, 0, true) : __mpsin (orig, 0, true);
-}
-
-/***************************************************************************/
-/* Routine compute sin(x+dx) (Double-Length number) where x in second or */
-/* fourth quarter of unit circle.Routine receive also the original value */
-/* and quarter(n= 1or 3)of x for computing error of result.And if result not*/
-/* accurate enough routine calls mpsin1 or dubsin */
-/***************************************************************************/
-
-static inline double
-__always_inline
-sloww2 (double x, double dx, double orig, int n)
-{
- double w[2], cor, res;
-
- res = do_cos_slow (x, dx, 3.1e-30 * fabs (orig), &cor);
-
- if (res == res + cor)
- return (n & 2) ? -res : res;
-
- dx = x > 0 ? dx : -dx;
- __docos (fabs (x), dx, w);
-
- double eps = 1.1e-30 * fabs (orig);
- cor = 1.000000005 * w[1] + __copysign (eps, w[1]);
-
- if (w[0] == w[0] + cor)
- return (n & 2) ? -w[0] : w[0];
-
- return (n & 1) ? __mpsin (orig, 0, true) : __mpcos (orig, 0, true);
-}
-
-/***************************************************************************/
-/* Routine compute sin(x+dx) or cos(x+dx) (Double-Length number) where x */
-/* is small enough to use Taylor series around zero and (x+dx) */
-/* in first or third quarter of unit circle.Routine receive also */
-/* (right argument) the original value of x for computing error of */
-/* result.And if result not accurate enough routine calls other routines */
-/***************************************************************************/
-
-static inline double
-__always_inline
-bsloww (double x, double dx, double orig, int n)
-{
- double res, cor, w[2], a, da;
-
- res = TAYLOR_SLOW (x, dx, cor);
- cor = 1.0005 * cor + __copysign (1.1e-24, cor);
- if (res == res + cor)
- return res;
-
- a = fabs (x);
- da = (x > 0) ? dx : -dx;
- __dubsin (a, da, w);
- cor = 1.000000001 * w[1] + __copysign (1.1e-24, w[1]);
-
- if (w[0] == w[0] + cor)
- return __copysign (w[0], x);
-
- return (n & 1) ? __mpcos (orig, 0, true) : __mpsin (orig, 0, true);
-}
-
-/***************************************************************************/
-/* Routine compute sin(x+dx) or cos(x+dx) (Double-Length number) where x */
-/* in first or third quarter of unit circle.Routine receive also */
-/* (right argument) the original value of x for computing error of result.*/
-/* And if result not accurate enough routine calls other routines */
-/***************************************************************************/
-
-static inline double
-__always_inline
-bsloww1 (double x, double dx, double orig, int n)
-{
- double w[2], cor, res;
-
- res = do_sin_slow (x, dx, 1.1e-24, &cor);
- if (res == res + cor)
- return (x > 0) ? res : -res;
-
- dx = (x > 0) ? dx : -dx;
- __dubsin (fabs (x), dx, w);
-
- cor = 1.000000005 * w[1] + __copysign (1.1e-24, w[1]);
-
- if (w[0] == w[0] + cor)
- return __copysign (w[0], x);
-
- return (n & 1) ? __mpcos (orig, 0, true) : __mpsin (orig, 0, true);
-}
-
-/***************************************************************************/
-/* Routine compute sin(x+dx) or cos(x+dx) (Double-Length number) where x */
-/* in second or fourth quarter of unit circle.Routine receive also the */
-/* original value and quarter(n= 1or 3)of x for computing error of result. */
-/* And if result not accurate enough routine calls other routines */
-/***************************************************************************/
-
-static inline double
-__always_inline
-bsloww2 (double x, double dx, double orig, int n)
-{
- double w[2], cor, res;
-
- res = do_cos_slow (x, dx, 1.1e-24, &cor);
- if (res == res + cor)
- return (n & 2) ? -res : res;
-
- dx = (x > 0) ? dx : -dx;
- __docos (fabs (x), dx, w);
-
- cor = 1.000000005 * w[1] + __copysign (1.1e-24, w[1]);
-
- if (w[0] == w[0] + cor)
- return (n & 2) ? -w[0] : w[0];
-
- return (n & 1) ? __mpsin (orig, 0, true) : __mpcos (orig, 0, true);
-}
-
-/************************************************************************/
-/* Routine compute cos(x) for 2^-27 < |x|< 0.25 by Taylor with more */
-/* precision and if still doesn't accurate enough by mpcos or docos */
-/************************************************************************/
-
-static inline double
-__always_inline
-cslow2 (double x)
-{
- double w[2], cor, res;
-
- res = do_cos_slow (x, 0, 0, &cor);
- if (res == res + cor)
- return res;
-
- __docos (fabs (x), 0, w);
- if (w[0] == w[0] + 1.000000005 * w[1])
- return w[0];
-
- return __mpcos (x, 0, false);
-}
-
#ifndef __cos
weak_alias (__cos, cos)
# ifdef NO_LONG_DOUBLE
@@ -925,3 +334,5 @@ strong_alias (__sin, __sinl)
weak_alias (__sin, sinl)
# endif
#endif
+
+#endif
Index: glibc-2.26/sysdeps/ieee754/dbl-64/s_sincos.c
===================================================================
--- glibc-2.26.orig/sysdeps/ieee754/dbl-64/s_sincos.c
+++ glibc-2.26/sysdeps/ieee754/dbl-64/s_sincos.c
@@ -22,42 +22,9 @@
#include <math_private.h>
-#define __sin __sin_local
-#define __cos __cos_local
-#define IN_SINCOS 1
+#define IN_SINCOS
#include "s_sin.c"
-/* Consolidated version of reduce_and_compute in s_sin.c that does range
- reduction only once and computes sin and cos together. */
-static inline void
-__always_inline
-reduce_and_compute_sincos (double x, double *sinx, double *cosx)
-{
- double a, da;
- unsigned int n = __branred (x, &a, &da);
-
- n = n & 3;
-
- if (n == 1 || n == 2)
- {
- a = -a;
- da = -da;
- }
-
- if (n & 1)
- {
- double *temp = cosx;
- cosx = sinx;
- sinx = temp;
- }
-
- if (a * a < 0.01588)
- *sinx = bsloww (a, da, x, n);
- else
- *sinx = bsloww1 (a, da, x, n);
- *cosx = bsloww2 (a, da, x, n);
-}
-
void
__sincos (double x, double *sinx, double *cosx)
{
@@ -67,37 +34,62 @@ __sincos (double x, double *sinx, double
SET_RESTORE_ROUND_53BIT (FE_TONEAREST);
u.x = x;
- k = 0x7fffffff & u.i[HIGH_HALF];
+ k = u.i[HIGH_HALF] & 0x7fffffff;
if (k < 0x400368fd)
{
- *sinx = __sin_local (x);
- *cosx = __cos_local (x);
- return;
- }
- if (k < 0x419921FB)
- {
- double a, da;
- int4 n = reduce_sincos_1 (x, &a, &da);
-
- *sinx = do_sincos_1 (a, da, x, n, false);
- *cosx = do_sincos_1 (a, da, x, n, true);
-
- return;
- }
- if (k < 0x42F00000)
- {
- double a, da;
- int4 n = reduce_sincos_2 (x, &a, &da);
-
- *sinx = do_sincos_2 (a, da, x, n, false);
- *cosx = do_sincos_2 (a, da, x, n, true);
-
+ double a, da, y;
+ /* |x| < 2^-27 => cos (x) = 1, sin (x) = x. */
+ if (k < 0x3e400000)
+ {
+ if (k < 0x3e500000)
+ math_check_force_underflow (x);
+ *sinx = x;
+ *cosx = 1.0;
+ return;
+ }
+ /* |x| < 0.855469. */
+ else if (k < 0x3feb6000)
+ {
+ *sinx = do_sin (x, 0);
+ *cosx = do_cos (x, 0);
+ return;
+ }
+
+ /* |x| < 2.426265. */
+ y = hp0 - fabs (x);
+ a = y + hp1;
+ da = (y - a) + hp1;
+ *sinx = __copysign (do_cos (a, da), x);
+ *cosx = do_sin (a, da);
return;
}
+ /* |x| < 2^1024. */
if (k < 0x7ff00000)
{
- reduce_and_compute_sincos (x, sinx, cosx);
+ double a, da, xx;
+ unsigned int n;
+
+ /* If |x| < 105414350 use simple range reduction. */
+ n = k < 0x419921FB ? reduce_sincos (x, &a, &da) : __branred (x, &a, &da);
+ n = n & 3;
+
+ if (n == 1 || n == 2)
+ {
+ a = -a;
+ da = -da;
+ }
+
+ if (n & 1)
+ {
+ double *temp = cosx;
+ cosx = sinx;
+ sinx = temp;
+ }
+
+ *sinx = do_sin (a, da);
+ xx = do_cos (a, da);
+ *cosx = (n & 2) ? -xx : xx;
return;
}
Index: glibc-2.26/sysdeps/ieee754/dbl-64/slowexp.c
===================================================================
--- glibc-2.26.orig/sysdeps/ieee754/dbl-64/slowexp.c
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-/**************************************************************************/
-/* MODULE_NAME:slowexp.c */
-/* */
-/* FUNCTION:slowexp */
-/* */
-/* FILES NEEDED:mpa.h */
-/* mpa.c mpexp.c */
-/* */
-/*Converting from double precision to Multi-precision and calculating */
-/* e^x */
-/**************************************************************************/
-#include <math_private.h>
-
-#include <stap-probe.h>
-
-#ifndef USE_LONG_DOUBLE_FOR_MP
-# include "mpa.h"
-void __mpexp (mp_no *x, mp_no *y, int p);
-#endif
-
-#ifndef SECTION
-# define SECTION
-#endif
-
-/*Converting from double precision to Multi-precision and calculating e^x */
-double
-SECTION
-__slowexp (double x)
-{
-#ifndef USE_LONG_DOUBLE_FOR_MP
- double w, z, res, eps = 3.0e-26;
- int p;
- mp_no mpx, mpy, mpz, mpw, mpeps, mpcor;
-
- /* Use the multiple precision __MPEXP function to compute the exponential
- First at 144 bits and if it is not accurate enough, at 768 bits. */
- p = 6;
- __dbl_mp (x, &mpx, p);
- __mpexp (&mpx, &mpy, p);
- __dbl_mp (eps, &mpeps, p);
- __mul (&mpeps, &mpy, &mpcor, p);
- __add (&mpy, &mpcor, &mpw, p);
- __sub (&mpy, &mpcor, &mpz, p);
- __mp_dbl (&mpw, &w, p);
- __mp_dbl (&mpz, &z, p);
- if (w == z)
- {
- /* Track how often we get to the slow exp code plus
- its input/output values. */
- LIBC_PROBE (slowexp_p6, 2, &x, &w);
- return w;
- }
- else
- {
- p = 32;
- __dbl_mp (x, &mpx, p);
- __mpexp (&mpx, &mpy, p);
- __mp_dbl (&mpy, &res, p);
-
- /* Track how often we get to the uber-slow exp code plus
- its input/output values. */
- LIBC_PROBE (slowexp_p32, 2, &x, &res);
- return res;
- }
-#else
- return (double) __ieee754_expl((long double)x);
-#endif
-}
Index: glibc-2.26/sysdeps/ieee754/dbl-64/slowpow.c
===================================================================
--- glibc-2.26.orig/sysdeps/ieee754/dbl-64/slowpow.c
+++ /dev/null
@@ -1,125 +0,0 @@
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2017 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-/*************************************************************************/
-/* MODULE_NAME:slowpow.c */
-/* */
-/* FUNCTION:slowpow */
-/* */
-/*FILES NEEDED:mpa.h */
-/* mpa.c mpexp.c mplog.c halfulp.c */
-/* */
-/* Given two IEEE double machine numbers y,x , routine computes the */
-/* correctly rounded (to nearest) value of x^y. Result calculated by */
-/* multiplication (in halfulp.c) or if result isn't accurate enough */
-/* then routine converts x and y into multi-precision doubles and */
-/* calls to mpexp routine */
-/*************************************************************************/
-
-#include "mpa.h"
-#include <math_private.h>
-
-#include <stap-probe.h>
-
-#ifndef SECTION
-# define SECTION
-#endif
-
-void __mpexp (mp_no *x, mp_no *y, int p);
-void __mplog (mp_no *x, mp_no *y, int p);
-double ulog (double);
-double __halfulp (double x, double y);
-
-double
-SECTION
-__slowpow (double x, double y, double z)
-{
- double res, res1;
- mp_no mpx, mpy, mpz, mpw, mpp, mpr, mpr1;
- static const mp_no eps = {-3, {1.0, 4.0}};
- int p;
-
- /* __HALFULP returns -10 or X^Y. */
- res = __halfulp (x, y);
-
- /* Return if the result was computed by __HALFULP. */
- if (res >= 0)
- return res;
-
- /* Compute pow as long double. This is currently only used by powerpc, where
- one may get 106 bits of accuracy. */
-#ifdef USE_LONG_DOUBLE_FOR_MP
- long double ldw, ldz, ldpp;
- static const long double ldeps = 0x4.0p-96;
-
- ldz = __ieee754_logl ((long double) x);
- ldw = (long double) y *ldz;
- ldpp = __ieee754_expl (ldw);
- res = (double) (ldpp + ldeps);
- res1 = (double) (ldpp - ldeps);
-
- /* Return the result if it is accurate enough. */
- if (res == res1)
- return res;
-#endif
-
- /* Or else, calculate using multiple precision. P = 10 implies accuracy of
- 240 bits accuracy, since MP_NO has a radix of 2^24. */
- p = 10;
- __dbl_mp (x, &mpx, p);
- __dbl_mp (y, &mpy, p);
- __dbl_mp (z, &mpz, p);
-
- /* z = x ^ y
- log (z) = y * log (x)
- z = exp (y * log (x)) */
- __mplog (&mpx, &mpz, p);
- __mul (&mpy, &mpz, &mpw, p);
- __mpexp (&mpw, &mpp, p);
-
- /* Add and subtract EPS to ensure that the result remains unchanged, i.e. we
- have last bit accuracy. */
- __add (&mpp, &eps, &mpr, p);
- __mp_dbl (&mpr, &res, p);
- __sub (&mpp, &eps, &mpr1, p);
- __mp_dbl (&mpr1, &res1, p);
- if (res == res1)
- {
- /* Track how often we get to the slow pow code plus
- its input/output values. */
- LIBC_PROBE (slowpow_p10, 4, &x, &y, &z, &res);
- return res;
- }
-
- /* If we don't, then we repeat using a higher precision. 768 bits of
- precision ought to be enough for anybody. */
- p = 32;
- __dbl_mp (x, &mpx, p);
- __dbl_mp (y, &mpy, p);
- __dbl_mp (z, &mpz, p);
- __mplog (&mpx, &mpz, p);
- __mul (&mpy, &mpz, &mpw, p);
- __mpexp (&mpw, &mpp, p);
- __mp_dbl (&mpp, &res, p);
-
- /* Track how often we get to the uber-slow pow code plus
- its input/output values. */
- LIBC_PROBE (slowpow_p32, 4, &x, &y, &z, &res);
-
- return res;
-}
Index: glibc-2.26/sysdeps/ieee754/dbl-64/uexp.h
===================================================================
--- glibc-2.26.orig/sysdeps/ieee754/dbl-64/uexp.h
+++ glibc-2.26/sysdeps/ieee754/dbl-64/uexp.h
@@ -29,8 +29,7 @@
#include "mydefs.h"
-const static double zero = 0.0, hhuge = 1.0e300, tiny = 1.0e-300,
-err_0 = 1.000014, err_1 = 0.000016;
+const static double zero = 0.0, hhuge = 1.0e300, tiny = 1.0e-300;
const static int4 bigint = 0x40862002,
badint = 0x40876000,smallint = 0x3C8fffff;
const static int4 hugeint = 0x7FFFFFFF, infint = 0x7ff00000;
Index: glibc-2.26/sysdeps/ieee754/dbl-64/ulog.h
===================================================================
--- glibc-2.26.orig/sysdeps/ieee754/dbl-64/ulog.h
+++ glibc-2.26/sysdeps/ieee754/dbl-64/ulog.h
@@ -42,43 +42,6 @@
/**/ b6 = {{0x3fbc71c5, 0x25db58ac} }, /* 0.111... */
/**/ b7 = {{0xbfb9a4ac, 0x11a2a61c} }, /* -0.100... */
/**/ b8 = {{0x3fb75077, 0x0df2b591} }, /* 0.091... */
- /* polynomial III */
-#if 0
-/**/ c1 = {{0x3ff00000, 0x00000000} }, /* 1 */
-#endif
-/**/ c2 = {{0xbfe00000, 0x00000000} }, /* -1/2 */
-/**/ c3 = {{0x3fd55555, 0x55555555} }, /* 1/3 */
-/**/ c4 = {{0xbfd00000, 0x00000000} }, /* -1/4 */
-/**/ c5 = {{0x3fc99999, 0x9999999a} }, /* 1/5 */
- /* polynomial IV */
-/**/ d2 = {{0xbfe00000, 0x00000000} }, /* -1/2 */
-/**/ dd2 = {{0x00000000, 0x00000000} }, /* -1/2-d2 */
-/**/ d3 = {{0x3fd55555, 0x55555555} }, /* 1/3 */
-/**/ dd3 = {{0x3c755555, 0x55555555} }, /* 1/3-d3 */
-/**/ d4 = {{0xbfd00000, 0x00000000} }, /* -1/4 */
-/**/ dd4 = {{0x00000000, 0x00000000} }, /* -1/4-d4 */
-/**/ d5 = {{0x3fc99999, 0x9999999a} }, /* 1/5 */
-/**/ dd5 = {{0xbc699999, 0x9999999a} }, /* 1/5-d5 */
-/**/ d6 = {{0xbfc55555, 0x55555555} }, /* -1/6 */
-/**/ dd6 = {{0xbc655555, 0x55555555} }, /* -1/6-d6 */
-/**/ d7 = {{0x3fc24924, 0x92492492} }, /* 1/7 */
-/**/ dd7 = {{0x3c624924, 0x92492492} }, /* 1/7-d7 */
-/**/ d8 = {{0xbfc00000, 0x00000000} }, /* -1/8 */
-/**/ dd8 = {{0x00000000, 0x00000000} }, /* -1/8-d8 */
-/**/ d9 = {{0x3fbc71c7, 0x1c71c71c} }, /* 1/9 */
-/**/ dd9 = {{0x3c5c71c7, 0x1c71c71c} }, /* 1/9-d9 */
-/**/ d10 = {{0xbfb99999, 0x9999999a} }, /* -1/10 */
-/**/ dd10 = {{0x3c599999, 0x9999999a} }, /* -1/10-d10 */
-/**/ d11 = {{0x3fb745d1, 0x745d1746} }, /* 1/11 */
-/**/ d12 = {{0xbfb55555, 0x55555555} }, /* -1/12 */
-/**/ d13 = {{0x3fb3b13b, 0x13b13b14} }, /* 1/13 */
-/**/ d14 = {{0xbfb24924, 0x92492492} }, /* -1/14 */
-/**/ d15 = {{0x3fb11111, 0x11111111} }, /* 1/15 */
-/**/ d16 = {{0xbfb00000, 0x00000000} }, /* -1/16 */
-/**/ d17 = {{0x3fae1e1e, 0x1e1e1e1e} }, /* 1/17 */
-/**/ d18 = {{0xbfac71c7, 0x1c71c71c} }, /* -1/18 */
-/**/ d19 = {{0x3faaf286, 0xbca1af28} }, /* 1/19 */
-/**/ d20 = {{0xbfa99999, 0x9999999a} }, /* -1/20 */
/* constants */
/**/ sqrt_2 = {{0x3ff6a09e, 0x667f3bcc} }, /* sqrt(2) */
/**/ h1 = {{0x3fd2e000, 0x00000000} }, /* 151/2**9 */
@@ -87,14 +50,6 @@
/**/ delv = {{0x3ef00000, 0x00000000} }, /* 1/2**16 */
/**/ ln2a = {{0x3fe62e42, 0xfefa3800} }, /* ln(2) 43 bits */
/**/ ln2b = {{0x3d2ef357, 0x93c76730} }, /* ln(2)-ln2a */
-/**/ e1 = {{0x3bbcc868, 0x00000000} }, /* 6.095e-21 */
-/**/ e2 = {{0x3c1138ce, 0x00000000} }, /* 2.334e-19 */
-/**/ e3 = {{0x3aa1565d, 0x00000000} }, /* 2.801e-26 */
-/**/ e4 = {{0x39809d88, 0x00000000} }, /* 1.024e-31 */
-/**/ e[M] ={{{0x37da223a, 0x00000000} }, /* 1.2e-39 */
-/**/ {{0x35c851c4, 0x00000000} }, /* 1.3e-49 */
-/**/ {{0x2ab85e51, 0x00000000} }, /* 6.8e-103 */
-/**/ {{0x17383827, 0x00000000} }},/* 8.1e-197 */
/**/ two54 = {{0x43500000, 0x00000000} }, /* 2**54 */
/**/ u03 = {{0x3f9eb851, 0xeb851eb8} }; /* 0.03 */
@@ -114,43 +69,6 @@
/**/ b6 = {{0x25db58ac, 0x3fbc71c5} }, /* 0.111... */
/**/ b7 = {{0x11a2a61c, 0xbfb9a4ac} }, /* -0.100... */
/**/ b8 = {{0x0df2b591, 0x3fb75077} }, /* 0.091... */
- /* polynomial III */
-#if 0
-/**/ c1 = {{0x00000000, 0x3ff00000} }, /* 1 */
-#endif
-/**/ c2 = {{0x00000000, 0xbfe00000} }, /* -1/2 */
-/**/ c3 = {{0x55555555, 0x3fd55555} }, /* 1/3 */
-/**/ c4 = {{0x00000000, 0xbfd00000} }, /* -1/4 */
-/**/ c5 = {{0x9999999a, 0x3fc99999} }, /* 1/5 */
- /* polynomial IV */
-/**/ d2 = {{0x00000000, 0xbfe00000} }, /* -1/2 */
-/**/ dd2 = {{0x00000000, 0x00000000} }, /* -1/2-d2 */
-/**/ d3 = {{0x55555555, 0x3fd55555} }, /* 1/3 */
-/**/ dd3 = {{0x55555555, 0x3c755555} }, /* 1/3-d3 */
-/**/ d4 = {{0x00000000, 0xbfd00000} }, /* -1/4 */
-/**/ dd4 = {{0x00000000, 0x00000000} }, /* -1/4-d4 */
-/**/ d5 = {{0x9999999a, 0x3fc99999} }, /* 1/5 */
-/**/ dd5 = {{0x9999999a, 0xbc699999} }, /* 1/5-d5 */
-/**/ d6 = {{0x55555555, 0xbfc55555} }, /* -1/6 */
-/**/ dd6 = {{0x55555555, 0xbc655555} }, /* -1/6-d6 */
-/**/ d7 = {{0x92492492, 0x3fc24924} }, /* 1/7 */
-/**/ dd7 = {{0x92492492, 0x3c624924} }, /* 1/7-d7 */
-/**/ d8 = {{0x00000000, 0xbfc00000} }, /* -1/8 */
-/**/ dd8 = {{0x00000000, 0x00000000} }, /* -1/8-d8 */
-/**/ d9 = {{0x1c71c71c, 0x3fbc71c7} }, /* 1/9 */
-/**/ dd9 = {{0x1c71c71c, 0x3c5c71c7} }, /* 1/9-d9 */
-/**/ d10 = {{0x9999999a, 0xbfb99999} }, /* -1/10 */
-/**/ dd10 = {{0x9999999a, 0x3c599999} }, /* -1/10-d10 */
-/**/ d11 = {{0x745d1746, 0x3fb745d1} }, /* 1/11 */
-/**/ d12 = {{0x55555555, 0xbfb55555} }, /* -1/12 */
-/**/ d13 = {{0x13b13b14, 0x3fb3b13b} }, /* 1/13 */
-/**/ d14 = {{0x92492492, 0xbfb24924} }, /* -1/14 */
-/**/ d15 = {{0x11111111, 0x3fb11111} }, /* 1/15 */
-/**/ d16 = {{0x00000000, 0xbfb00000} }, /* -1/16 */
-/**/ d17 = {{0x1e1e1e1e, 0x3fae1e1e} }, /* 1/17 */
-/**/ d18 = {{0x1c71c71c, 0xbfac71c7} }, /* -1/18 */
-/**/ d19 = {{0xbca1af28, 0x3faaf286} }, /* 1/19 */
-/**/ d20 = {{0x9999999a, 0xbfa99999} }, /* -1/20 */
/* constants */
/**/ sqrt_2 = {{0x667f3bcc, 0x3ff6a09e} }, /* sqrt(2) */
/**/ h1 = {{0x00000000, 0x3fd2e000} }, /* 151/2**9 */
@@ -159,14 +77,6 @@
/**/ delv = {{0x00000000, 0x3ef00000} }, /* 1/2**16 */
/**/ ln2a = {{0xfefa3800, 0x3fe62e42} }, /* ln(2) 43 bits */
/**/ ln2b = {{0x93c76730, 0x3d2ef357} }, /* ln(2)-ln2a */
-/**/ e1 = {{0x00000000, 0x3bbcc868} }, /* 6.095e-21 */
-/**/ e2 = {{0x00000000, 0x3c1138ce} }, /* 2.334e-19 */
-/**/ e3 = {{0x00000000, 0x3aa1565d} }, /* 2.801e-26 */
-/**/ e4 = {{0x00000000, 0x39809d88} }, /* 1.024e-31 */
-/**/ e[M] ={{{0x00000000, 0x37da223a} }, /* 1.2e-39 */
-/**/ {{0x00000000, 0x35c851c4} }, /* 1.3e-49 */
-/**/ {{0x00000000, 0x2ab85e51} }, /* 6.8e-103 */
-/**/ {{0x00000000, 0x17383827} }},/* 8.1e-197 */
/**/ two54 = {{0x00000000, 0x43500000} }, /* 2**54 */
/**/ u03 = {{0xeb851eb8, 0x3f9eb851} }; /* 0.03 */
@@ -178,10 +88,6 @@
#define DEL_V delv.d
#define LN2A ln2a.d
#define LN2B ln2b.d
-#define E1 e1.d
-#define E2 e2.d
-#define E3 e3.d
-#define E4 e4.d
#define U03 u03.d
#endif
Index: glibc-2.26/sysdeps/m68k/m680x0/fpu/halfulp.c
===================================================================
--- glibc-2.26.orig/sysdeps/m68k/m680x0/fpu/halfulp.c
+++ /dev/null
@@ -1 +0,0 @@
-/* Not needed. */
Index: glibc-2.26/sysdeps/m68k/m680x0/fpu/slowexp.c
===================================================================
--- glibc-2.26.orig/sysdeps/m68k/m680x0/fpu/slowexp.c
+++ /dev/null
@@ -1 +0,0 @@
-/* Not needed. */
Index: glibc-2.26/sysdeps/m68k/m680x0/fpu/slowpow.c
===================================================================
--- glibc-2.26.orig/sysdeps/m68k/m680x0/fpu/slowpow.c
+++ /dev/null
@@ -1 +0,0 @@
-/* Not needed. */
Index: glibc-2.26/sysdeps/powerpc/power4/fpu/Makefile
===================================================================
--- glibc-2.26.orig/sysdeps/powerpc/power4/fpu/Makefile
+++ glibc-2.26/sysdeps/powerpc/power4/fpu/Makefile
@@ -2,6 +2,4 @@
ifeq ($(subdir),math)
CFLAGS-mpa.c += --param max-unroll-times=4 -funroll-loops -fpeel-loops
-CPPFLAGS-slowpow.c += -DUSE_LONG_DOUBLE_FOR_MP=1
-CPPFLAGS-slowexp.c += -DUSE_LONG_DOUBLE_FOR_MP=1
endif
Index: glibc-2.26/sysdeps/x86_64/fpu/libm-test-ulps
===================================================================
--- glibc-2.26.orig/sysdeps/x86_64/fpu/libm-test-ulps
+++ glibc-2.26/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1262,7 +1262,9 @@ ildouble: 1
ldouble: 1
Function: "cos":
+double: 1
float128: 1
+idouble: 1
ifloat128: 1
ildouble: 1
ldouble: 1
@@ -2464,8 +2466,10 @@ Function: "log_vlen8_avx2":
float: 2
Function: "pow":
+double: 1
float: 1
float128: 2
+idouble: 1
ifloat: 1
ifloat128: 2
ildouble: 1
@@ -2552,7 +2556,9 @@ Function: "pow_vlen8_avx2":
float: 3
Function: "sin":
+double: 1
float128: 1
+idouble: 1
ifloat128: 1
ildouble: 1
ldouble: 1
@@ -2602,7 +2608,9 @@ Function: "sin_vlen8_avx2":
float: 1
Function: "sincos":
+double: 1
float128: 1
+idouble: 1
ifloat128: 1
ildouble: 1
ldouble: 1
Index: glibc-2.26/sysdeps/x86_64/fpu/multiarch/Makefile
===================================================================
--- glibc-2.26.orig/sysdeps/x86_64/fpu/multiarch/Makefile
+++ glibc-2.26/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -4,9 +4,9 @@ libm-sysdep_routines += s_floor-c s_ceil
libm-sysdep_routines += e_exp-fma4 e_log-fma4 e_pow-fma4 s_atan-fma4 \
e_asin-fma4 e_atan2-fma4 s_sin-fma4 s_tan-fma4 \
- mplog-fma4 mpa-fma4 slowexp-fma4 slowpow-fma4 \
+ mplog-fma4 mpa-fma4 \
sincos32-fma4 doasin-fma4 dosincos-fma4 \
- halfulp-fma4 mpexp-fma4 \
+ mpexp-fma4 \
mpatan2-fma4 mpatan-fma4 mpsqrt-fma4 mptan-fma4
CFLAGS-doasin-fma4.c = -mfma4
@@ -16,7 +16,6 @@ CFLAGS-e_atan2-fma4.c = -mfma4
CFLAGS-e_exp-fma4.c = -mfma4
CFLAGS-e_log-fma4.c = -mfma4
CFLAGS-e_pow-fma4.c = -mfma4 $(config-cflags-nofma)
-CFLAGS-halfulp-fma4.c = -mfma4
CFLAGS-mpa-fma4.c = -mfma4
CFLAGS-mpatan-fma4.c = -mfma4
CFLAGS-mpatan2-fma4.c = -mfma4
@@ -26,14 +25,12 @@ CFLAGS-mpsqrt-fma4.c = -mfma4
CFLAGS-mptan-fma4.c = -mfma4
CFLAGS-s_atan-fma4.c = -mfma4
CFLAGS-sincos32-fma4.c = -mfma4
-CFLAGS-slowexp-fma4.c = -mfma4
-CFLAGS-slowpow-fma4.c = -mfma4
CFLAGS-s_sin-fma4.c = -mfma4
CFLAGS-s_tan-fma4.c = -mfma4
libm-sysdep_routines += e_exp-avx e_log-avx s_atan-avx \
e_atan2-avx s_sin-avx s_tan-avx \
- mplog-avx mpa-avx slowexp-avx \
+ mplog-avx mpa-avx \
mpexp-avx
CFLAGS-e_atan2-avx.c = -msse2avx -DSSE2AVX
@@ -44,7 +41,6 @@ CFLAGS-mpexp-avx.c = -msse2avx -DSSE2AVX
CFLAGS-mplog-avx.c = -msse2avx -DSSE2AVX
CFLAGS-s_atan-avx.c = -msse2avx -DSSE2AVX
CFLAGS-s_sin-avx.c = -msse2avx -DSSE2AVX
-CFLAGS-slowexp-avx.c = -msse2avx -DSSE2AVX
CFLAGS-s_tan-avx.c = -msse2avx -DSSE2AVX
endif
Index: glibc-2.26/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
===================================================================
--- glibc-2.26.orig/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
+++ glibc-2.26/sysdeps/x86_64/fpu/multiarch/e_exp-avx.c
@@ -1,6 +1,5 @@
#define __ieee754_exp __ieee754_exp_avx
#define __exp1 __exp1_avx
-#define __slowexp __slowexp_avx
#define SECTION __attribute__ ((section (".text.avx")))
#include <sysdeps/ieee754/dbl-64/e_exp.c>
Index: glibc-2.26/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
===================================================================
--- glibc-2.26.orig/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
+++ glibc-2.26/sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c
@@ -1,6 +1,5 @@
#define __ieee754_exp __ieee754_exp_fma4
#define __exp1 __exp1_fma4
-#define __slowexp __slowexp_fma4
#define SECTION __attribute__ ((section (".text.fma4")))
#include <sysdeps/ieee754/dbl-64/e_exp.c>
Index: glibc-2.26/sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c
===================================================================
--- glibc-2.26.orig/sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c
+++ glibc-2.26/sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c
@@ -1,6 +1,5 @@
#define __ieee754_pow __ieee754_pow_fma4
#define __exp1 __exp1_fma4
-#define __slowpow __slowpow_fma4
#define SECTION __attribute__ ((section (".text.fma4")))
#include <sysdeps/ieee754/dbl-64/e_pow.c>
Index: glibc-2.26/sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c
===================================================================
--- glibc-2.26.orig/sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c
+++ /dev/null
@@ -1,4 +0,0 @@
-#define __halfulp __halfulp_fma4
-#define SECTION __attribute__ ((section (".text.fma4")))
-
-#include <sysdeps/ieee754/dbl-64/halfulp.c>
Index: glibc-2.26/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
===================================================================
--- glibc-2.26.orig/sysdeps/x86_64/fpu/multiarch/slowexp-avx.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_avx
-#define __add __add_avx
-#define __dbl_mp __dbl_mp_avx
-#define __mpexp __mpexp_avx
-#define __mul __mul_avx
-#define __sub __sub_avx
-#define SECTION __attribute__ ((section (".text.avx")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
Index: glibc-2.26/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
===================================================================
--- glibc-2.26.orig/sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c
+++ /dev/null
@@ -1,9 +0,0 @@
-#define __slowexp __slowexp_fma4
-#define __add __add_fma4
-#define __dbl_mp __dbl_mp_fma4
-#define __mpexp __mpexp_fma4
-#define __mul __mul_fma4
-#define __sub __sub_fma4
-#define SECTION __attribute__ ((section (".text.fma4")))
-
-#include <sysdeps/ieee754/dbl-64/slowexp.c>
Index: glibc-2.26/sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c
===================================================================
--- glibc-2.26.orig/sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c
+++ /dev/null
@@ -1,11 +0,0 @@
-#define __slowpow __slowpow_fma4
-#define __add __add_fma4
-#define __dbl_mp __dbl_mp_fma4
-#define __mpexp __mpexp_fma4
-#define __mplog __mplog_fma4
-#define __mul __mul_fma4
-#define __sub __sub_fma4
-#define __halfulp __halfulp_fma4
-#define SECTION __attribute__ ((section (".text.fma4")))
-
-#include <sysdeps/ieee754/dbl-64/slowpow.c>