## Tang's exponential function

### The algorithm

In Table-Driven Implementation of the Exponential Function in IEEE Floating-Point Arithmetic, Ping Tak Peter Tang described an implementation of an almost correctly-rounded elementary function in single and double precision. John Harrison later did a complete formal proof in HOL Light of the implementation in Floating point verification in HOL Light: the exponential function.

Here we just focus on the tedious computation of the rounding error. We consider neither the argument reduction nor the reconstruction part (trivial anyway, excepted when the end result is subnormal). We want to prove that, in the C code below, the absolute error between e and the exponential E0 of R0 (the ideal reduced argument) is less than 0.54 ulp. Variable n is an integer and all the other variables are single-precision floating-point numbers.

r2 = -n * l2;
r = r1 + r2;
q = r * r * (a1 + r * a2);
p = r1 + (r2 + q);
s = s1 + s2;
e = s1 + (s2 + s * p);

Let us note R the computed reduced argument and S the stored value of the ideal constant S0. There are 32 such constants. For the sake of simplicity, we only consider the second constant: $2^{\frac{1}{32}}$. E is the value of the expression e computed with an infinitely precise arithmetic. Z is the absolute error between the polynomial $x + a_1 \cdot x^2 + a_2 \cdot x^3$ and $\exp(x) - 1$ for $|x| \le \frac{\log 2}{64}$.

### Gappa description

a1 = 8388676b-24;
a2 = 11184876b-26;
l2 = 12566158b-48;
s1 = 8572288b-23;
s2 = 13833605b-44;

r2 rnd= -n * l2;
r rnd= r1 + r2;
q rnd= r * r * (a1 + r * a2);
p rnd= r1 + (r2 + q);
s rnd= s1 + s2;
e rnd= s1 + (s2 + s * p);

R = r1 + r2;
S = s1 + s2;

E0 = S0 * (1 + R0 + a1 * R0 * R0 + a2 * R0 * R0 * R0 + Z);

{ Z in [-55b-39,55b-39] /\ S - S0 in [-1b-41,1b-41] /\ R - R0 in [-1b-34,1b-34] /\
R in [0,0.0217] /\ n in [-10176,10176]
->
e in ? /\ e - E0 in ? }

Please note the way Z is introduced. Its definition is backward: Z is a real number such that E0 is the product of S0 and the exponential of R0. It makes for clearer definitions and it avoids having to deal with divisions.

Results:
e in [4282253b-22 {1.02097, 2^(0.0299396)}, 8768135b-23 {1.04524, 2^(0.0638374)}]
e - E0 in [-13458043620277891b-59 {-0.023346, -2^(-5.42068)}, 3364512538651833b-57 {0.023346, 2^(-5.42068)}]

Gappa is able to bound both expressions. While the bounds for e seem sensible, the bounds for e - E0 are grossly overestimated. This overestimation comes from the difference between the structures of e and E0. To improve the bounds on the error e - E0, we split it into two parts: a round-off error and a term that combines both discretization and truncation errors. The round-off error is expressed by introducing a term E with the same structure as e but without any rounding operator.

E = s1 + (s2 + S * (r1 + (r2 + R * R * (a1 + R * a2))));

So the round-off error is e - E, while the other term is E - E0. As before, the expressions E and E0 are structurally different, so Gappa grossly overestimates the bounds of E - E0. This time, we introduce a term Er with the same structure as E0 but equal to E. Since Z has no corresponding term in E, we insert an artificial term 0 in Er to obtain the same structure.

Er = S * (1 + R + a1 * R * R + a2 * R * R * R + 0);

Finally, we tell Gappa how to compute e - E0 using E and Er.

e - E0 -> (e - E) + (Er - E0);

Note that, rather than using a hint, this equality could also have been indicated as a hypothesis of the logical formula.

### Full listing

@rnd = float< ieee_32, ne >;

a1 = 8388676b-24;
a2 = 11184876b-26;
l2 = 12566158b-48;
s1 = 8572288b-23;
s2 = 13833605b-44;

r2 rnd= -n * l2;
r rnd= r1 + r2;
q rnd= r * r * (a1 + r * a2);
p rnd= r1 + (r2 + q);
s rnd= s1 + s2;
e rnd= s1 + (s2 + s * p);

R = r1 + r2;
S = s1 + s2;

E = s1 + (s2 + S * (r1 + (r2 + R * R * (a1 + R * a2))));
Er = S * (1 + R + a1 * R * R + a2 * R * R * R + 0);
E0 = S0 * (1 + R0 + a1 * R0 * R0 + a2 * R0 * R0 * R0 + Z);

{ Z in [-55b-39,55b-39] /\ S - S0 in [-1b-41,1b-41] /\ R - R0 in [-1b-34,1b-34] /\
R in [0,0.0217] /\ n in [-10176,10176] ->
e in ? /\ e - E0 in ? }

e - E0 -> (e - E) + (Er - E0);

Gappa answers that the error is bounded by 0.535 ulp. This is consistent with the bounds computed by Tang and Harrison.

Results:
e in [8572295b-23 {1.0219, 2^(0.0312489)}, 4380173b-22 {1.04431, 2^(0.0625575)}]
e - E0 in [-75807082762648785b-80 {-6.27061e-08, -2^(-23.9268)}, 154166255364809243b-81 {6.37617e-08, 2^(-23.9027)}]