arf.h – arbitrary-precision floating-point numbers
A variable of type arf_t holds an arbitrary-precision binary floating-point number, i.e. a rational number of the form \(x \times 2^y\) where \(x, y \in \mathbb{Z}\) and \(x\) is odd, or one of the special values zero, plus infinity, minus infinity, or NaN (not-a-number).
The exponent of a finite and nonzero floating-point number can be defined in different ways: for example, as the component y above, or as the unique integer e such that \(x \times 2^y = m \times 2^e\) where \(1/2 \le |m| < 1\). The internal representation of an arf_t stores the exponent in the latter format.
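To make the two exponent conventions concrete, the following standalone C sketch (it does not use FLINT; the helper names are hypothetical) decomposes a finite, nonzero double both ways: as \(x \times 2^y\) with \(x\) odd, and, via the C library function frexp, as \(m \times 2^e\) with \(1/2 \le |m| < 1\). For example, \(12 = 3 \times 2^2 = 0.75 \times 2^4\), so \(y = 2\) while \(e = 4\).

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: write a finite, nonzero double d as d = x * 2^y
   with x odd (the "bottom" exponent convention). */
static void odd_mantissa_2exp(double d, int64_t *x, int *y)
{
    int e;
    double m = frexp(d, &e);               /* d = m * 2^e, 1/2 <= |m| < 1 */
    int64_t mant = (int64_t) ldexp(m, 53); /* exact: a double has at most 53 significant bits */
    int bexp = e - 53;

    while (mant % 2 == 0)                  /* strip trailing zero bits */
    {
        mant /= 2;
        bexp++;
    }
    *x = mant;
    *y = bexp;
}

/* Hypothetical helper: the "top" exponent e with d = m * 2^e, 1/2 <= |m| < 1,
   i.e. the convention that arf_t uses internally. */
static int top_exponent(double d)
{
    int e;
    (void) frexp(d, &e);
    return e;
}
```

Both exponents describe the same number; they differ by the bit length of the odd mantissa.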
The conventions for special values largely follow those of the IEEE floating-point standard. At the moment, there is no support for negative zero, unsigned infinity, or a NaN with a payload, though some of these might be added in the future.
Except where otherwise noted, the output of an operation is the floating-point number obtained by taking the inputs as exact numbers, in principle carrying out the operation exactly, and rounding the resulting real number to the nearest representable floating-point number whose mantissa has at most the specified number of bits, in the specified direction of rounding. Some operations are always or optionally done exactly.
The arf_t type is almost identical semantically to the legacy fmpr_t type, but uses a more efficient internal representation. The most significant differences that the user has to be aware of are:
- The mantissa is no longer represented as a FLINT fmpz, and the internal exponent points to the top of the binary expansion of the mantissa instead of the bottom. Code designed to manipulate components of an fmpr_t directly can be ported to the arf_t type by making use of arf_get_fmpz_2exp() and arf_set_fmpz_2exp().
- Some arf_t functions return an int indicating whether a result is inexact, whereas the corresponding fmpr_t functions return an slong encoding the relative exponent of the error.
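When porting code that works with an fmpr_t-style pair (mantissa x, bottom exponent y), the top exponent used by arf_t follows from the bit length of the mantissa: \(e = y + \mathrm{bits}(|x|)\), since \(2^{\mathrm{bits}(|x|)-1} \le |x| < 2^{\mathrm{bits}(|x|)}\). The plain-C sketch below shows the conversion in both directions; the helper names are hypothetical, mantissas are limited to 64 bits, and no FLINT functions are called.

```c
#include <stdint.h>

/* Number of bits in the binary expansion of a (0 for a == 0). */
static int bit_length(uint64_t a)
{
    int n = 0;
    while (a != 0)
    {
        a >>= 1;
        n++;
    }
    return n;
}

/* |x| as an unsigned value (also correct for INT64_MIN). */
static uint64_t abs_u64(int64_t x)
{
    return x < 0 ? -(uint64_t) x : (uint64_t) x;
}

/* Bottom exponent y (value = x * 2^y) -> top exponent e (value = m * 2^e, 1/2 <= |m| < 1). */
static int64_t bottom_to_top_exp(int64_t x, int64_t y)
{
    return y + bit_length(abs_u64(x));
}

/* Top exponent e -> bottom exponent y, for the same integer mantissa x. */
static int64_t top_to_bottom_exp(int64_t x, int64_t e)
{
    return e - bit_length(abs_u64(x));
}
```

For instance, \(-5 \times 2^{-1} = -0.625 \times 2^{2}\), so the bottom exponent \(-1\) corresponds to the top exponent \(2\).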
Types, macros and constants
- arf_struct
- arf_t
  An arf_struct contains four words: an fmpz exponent (exp), a size field tracking the number of limbs used (one bit of this field is also used for the sign of the number), and two more words. The last two words hold the value directly if there are at most two limbs, and otherwise contain one alloc field (tracking the total number of allocated limbs, not all of which might be used) and a pointer to the actual limbs. Thus, for mantissas of up to 128 bits on a 64-bit machine (64 bits on a 32-bit machine), no space outside of the arf_struct is used.
  An arf_t is defined as an array of length one of type arf_struct, permitting an arf_t to be passed by reference.
- arf_rnd_t
  Specifies the rounding mode for the result of an approximate operation.
- ARF_RND_DOWN
  Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards zero.
- ARF_RND_UP
  Specifies that the result of an operation should be rounded to the nearest representable number in the direction away from zero.
- ARF_RND_FLOOR
  Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards minus infinity.
- ARF_RND_CEIL
  Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards plus infinity.
- ARF_RND_NEAR
  Specifies that the result of an operation should be rounded to the nearest representable number, rounding to an even mantissa if there is a tie between two values. Warning: this rounding mode is currently not implemented (except for a few conversion functions where this is stated explicitly).
- ARF_PREC_EXACT
  If passed as the precision parameter to a function, indicates that no rounding is to be performed. This must only be used when it is known that the result of the operation can be represented exactly and fits in memory (the typical use case is working with small integer values). Note that, for example, adding two numbers whose exponents are far apart can easily produce an exact result that is far too large to store in memory.
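The directed rounding modes above can be illustrated without FLINT. The C sketch below (hypothetical names; it handles only integers with \(|v| < 2^{62}\) and prec ≥ 1) rounds an integer to the nearest value whose binary mantissa has at most prec bits, in the requested direction. Its nearest-mode branch breaks ties toward an even mantissa, which is a choice made for this sketch only; as noted above, ARF_RND_NEAR is not generally implemented for arf_t operations.

```c
#include <stdint.h>

typedef enum { RND_DOWN, RND_UP, RND_FLOOR, RND_CEIL, RND_NEAR } rnd_mode;

/* Round v to a value whose binary mantissa has at most prec bits.
   Assumes |v| < 2^62 and prec >= 1. */
static int64_t round_int_to_prec(int64_t v, int prec, rnd_mode rnd)
{
    uint64_t a = v < 0 ? -(uint64_t) v : (uint64_t) v;
    int bits = 0;
    for (uint64_t t = a; t != 0; t >>= 1)
        bits++;
    if (bits <= prec)
        return v;                                   /* already representable: exact */

    int shift = bits - prec;
    uint64_t q = a >> shift;                        /* truncated prec-bit mantissa */
    uint64_t r = a & ((UINT64_C(1) << shift) - 1);  /* discarded low bits */
    int away;                                       /* increase the magnitude? */

    switch (rnd)
    {
        case RND_DOWN:  away = 0; break;                     /* towards zero */
        case RND_UP:    away = (r != 0); break;              /* away from zero */
        case RND_FLOOR: away = (v < 0) && (r != 0); break;   /* towards -inf */
        case RND_CEIL:  away = (v > 0) && (r != 0); break;   /* towards +inf */
        default:
        {
            uint64_t half = UINT64_C(1) << (shift - 1);
            away = r > half || (r == half && (q & 1));       /* ties to even mantissa */
        }
    }

    uint64_t res = (q + away) << shift;
    return v < 0 ? -(int64_t) res : (int64_t) res;
}
```

For example, \(11 = 1011_2\) rounded to 2 bits gives 8 with RND_DOWN and 12 with RND_UP, while \(-11\) gives \(-12\) with RND_FLOOR and \(-8\) with RND_CEIL.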
Memory management
Special values
- int arf_is_nan(const arf_t x)
  Returns nonzero iff x is NaN.
- int arf_is_normal(const arf_t x)
  Returns nonzero iff x is a finite, nonzero floating-point value, i.e. not one of the special values 0, \(+\infty\), \(-\infty\), NaN.
- int arf_is_special(const arf_t x)
  Returns nonzero iff x is one of the special values 0, \(+\infty\), \(-\infty\), NaN, i.e. not a finite, nonzero floating-point value.
- int arf_is_finite(arf_t x)
  Returns nonzero iff x is a finite floating-point value, i.e. not one of the values \(+\infty\), \(-\infty\), NaN. (Note that this is not equivalent to the negation of arf_is_inf().)
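The same classification can be tried out on IEEE doubles with the C99 classification macros from <math.h>. The helpers below are hypothetical analogues written for this sketch, not FLINT functions. NaN is neither finite nor infinite, which is why "finite" is not the negation of "infinite". Note also that C's own isnormal() additionally rejects subnormal doubles, a distinction that only arises because the double exponent range is bounded.

```c
#include <math.h>

/* Analogue of "normal": finite and nonzero. */
static int d_is_normal(double x) { return isfinite(x) && x != 0.0; }

/* Analogue of "special": zero, +inf, -inf or NaN. */
static int d_is_special(double x) { return !d_is_normal(x); }

/* Analogue of "finite": anything except +inf, -inf and NaN. */
static int d_is_finite(double x) { return isfinite(x) != 0; }

/* Analogue of "infinite": +inf or -inf only (NaN is excluded). */
static int d_is_inf(double x) { return isinf(x) != 0; }
```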
Assignment, rounding and conversions
- int arf_set_round_fmpz(arf_t y, const fmpz_t x, slong prec, arf_rnd_t rnd)
  Sets y to x, rounded to prec bits in the direction specified by rnd.
- int arf_set_round_fmpz_2exp(arf_t y, const fmpz_t x, const fmpz_t e, slong prec, arf_rnd_t rnd)
  Sets y to \(x \times 2^e\), rounded to prec bits in the direction specified by rnd.
- void arf_get_fmpz_2exp(fmpz_t m, fmpz_t e, const arf_t x)
  Sets m and e to the unique integers such that \(x = m \times 2^e\) and m is odd, provided that x is a nonzero finite fraction. If x is zero, both m and e are set to zero. If x is infinite or NaN, the result is undefined.
- double arf_get_d(const arf_t x, arf_rnd_t rnd)
  Returns x rounded to a double in the direction specified by rnd. This method supports rounding to nearest with ARF_RND_NEAR. It also rounds correctly when overflowing or underflowing the double exponent range (this was not the case in an earlier version).
- int arf_get_mpfr(mpfr_t y, const arf_t x, mpfr_rnd_t rnd)
  Sets the MPFR variable y to the value of x. If the precision of y is too small to allow x to be represented exactly, the value is rounded in the specified MPFR rounding mode. The return value (-1, 0 or 1) indicates the direction of rounding, following the convention of the MPFR library.
- void arf_get_fmpz(fmpz_t z, const arf_t x, arf_rnd_t rnd)
  Sets z to x rounded to the nearest integer in the direction specified by rnd. If rnd is ARF_RND_NEAR, rounds to the nearest even integer in case of a tie. Aborts if x is infinite, NaN, or if the exponent is unreasonably large.
- slong arf_get_si(const arf_t x, arf_rnd_t rnd)
  Returns x rounded to the nearest integer in the direction specified by rnd. If rnd is ARF_RND_NEAR, rounds to the nearest even integer in case of a tie. Aborts if x is infinite, NaN, or the value is too large to fit in a slong.
- int arf_get_fmpz_fixed_si(fmpz_t y, const arf_t x, slong e)
  Converts x to a mantissa with predetermined exponent, i.e. computes an integer y such that \(y \times 2^e \approx x\), truncating if necessary. Returns 0 if exact and 1 if truncation occurred.
- void arf_ceil(arf_t y, const arf_t x)
  Sets y to \(\lceil x \rceil\). The result is always represented exactly, requiring no more bits to store than the input. To round the result to a floating-point number with a lower precision, call arf_set_round() afterwards.
Comparisons and bounds
- int arf_equal_si(const arf_t x, slong y)
  Returns nonzero iff x and y are exactly equal. This function does not treat NaN specially, i.e. NaN compares as equal to itself.
- int arf_cmp(const arf_t x, const arf_t y)
  Returns negative, zero, or positive, depending on whether x is respectively smaller, equal, or greater compared to y. Comparison with NaN is undefined.
- int arf_cmpabs_2exp_si(const arf_t x, slong e)
  Compares the absolute value of x with \(2^e\).
- int arf_sgn(const arf_t x)
  Returns \(-1\), \(0\) or \(+1\) according to the sign of x. The sign of NaN is undefined.
- void arf_max(arf_t z, const arf_t a, const arf_t b)
  Sets z to the maximum of a and b.
- slong arf_bits(const arf_t x)
  Returns the number of bits needed to represent the absolute value of the mantissa of x, i.e. the minimum precision sufficient to represent x exactly. Returns 0 if x is a special value.
- int arf_is_int_2exp_si(const arf_t x, slong e)
  Returns nonzero iff x equals \(n 2^e\) for some integer n.
- void arf_abs_bound_lt_2exp_fmpz(fmpz_t b, const arf_t x)
  Sets b to the smallest integer such that \(|x| < 2^b\). If x is zero, infinity or NaN, the result is undefined.
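Because arf_t stores the top exponent \(e\) with \(|x| = m \times 2^e\), \(1/2 \le |m| < 1\), comparisons with powers of two reduce to exponent arithmetic. Since \(2^{e-1} \le |x| < 2^e\), the smallest \(b\) with \(|x| < 2^b\) is \(e\) itself, and \(|x| = 2^k\) holds only when \(k = e - 1\) and \(|m| = 1/2\). The plain-C sketch below applies this reasoning to finite, nonzero doubles via frexp; the helper names are hypothetical and the code is not taken from FLINT.

```c
#include <math.h>

/* Sign of |x| - 2^k for finite, nonzero x, using only the top exponent. */
static int cmpabs_2exp(double x, int k)
{
    int e;
    double m = fabs(frexp(x, &e));   /* |x| = m * 2^e, 1/2 <= m < 1 */
    if (e <= k)
        return -1;                   /* |x| < 2^e <= 2^k */
    if (e >= k + 2)
        return 1;                    /* |x| >= 2^(e-1) > 2^k */
    return m == 0.5 ? 0 : 1;         /* e == k + 1: 2^k <= |x| < 2^(k+1) */
}

/* Smallest integer b with |x| < 2^b, for finite, nonzero x. */
static int abs_bound_lt_2exp(double x)
{
    int e;
    (void) frexp(x, &e);
    return e;                        /* 2^(e-1) <= |x| < 2^e */
}
```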
Magnitude functions
- void arf_get_mag_lower(mag_t y, const arf_t x)
  Sets y to a lower bound for the absolute value of x.
- void mag_fast_init_set_arf(mag_t y, const arf_t x)
  Initializes y and sets it to an upper bound for x. Assumes that the exponent of y is small.
- void arf_mag_set_ulp(mag_t z, const arf_t y, slong prec)
  Sets z to the magnitude of the unit in the last place (ulp) of y at precision prec.
Shallow assignment
- void arf_init_set_mag_shallow(arf_t z, const mag_t x)
  Initializes z to a shallow copy of x. A shallow copy just involves copying struct data (no heap allocation is performed).
  The target variable z may not be cleared or modified in any way (it can only be used as constant input to functions), and may not be used after x has been cleared. Moreover, after x has been assigned shallowly to z, no modification of x is permitted as long as z is in use.
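The restrictions on shallow copies follow from what a shallow copy is: the struct words are copied, but any heap pointer inside them is shared, not duplicated. The plain-C sketch below uses a hypothetical number type (not a FLINT type) to show that a shallow copy aliases the original's limbs. As a result, it observes every later modification of the original and dangles once the original is freed.

```c
#include <stdlib.h>

/* Hypothetical multi-limb number: a length and a heap-allocated limb array. */
typedef struct
{
    size_t n;
    unsigned long *limbs;
} bignum;

/* Allocate n zeroed limbs; returns 0 on allocation failure. */
static int bignum_init(bignum *x, size_t n)
{
    x->n = n;
    x->limbs = calloc(n, sizeof *x->limbs);
    return x->limbs != NULL;
}

static void bignum_clear(bignum *x)
{
    free(x->limbs);
    x->limbs = NULL;
    x->n = 0;
}

/* Shallow copy: copies the struct words only; no allocation is performed,
   so the result shares x->limbs with x. */
static bignum bignum_shallow_copy(const bignum *x)
{
    return *x;
}
```

After `bignum s = bignum_shallow_copy(&a);`, a write to `a.limbs[0]` is visible through `s`, and `bignum_clear(&a)` leaves `s.limbs` dangling. `s` itself must never be passed to `bignum_clear`, or the array would be freed twice.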
Random number generation
- void arf_randtest(arf_t x, flint_rand_t state, slong bits, slong mag_bits)
  Generates a finite random number whose mantissa has precision at most bits and whose exponent has at most mag_bits bits. The values are distributed non-uniformly: special bit patterns are generated with high probability in order to allow the test code to exercise corner cases.
- void arf_randtest_not_zero(arf_t x, flint_rand_t state, slong bits, slong mag_bits)
  Identical to arf_randtest(), except that zero is never produced as an output.
- void arf_randtest_special(arf_t x, flint_rand_t state, slong bits, slong mag_bits)
  Identical to arf_randtest(), except that the output is occasionally set to an infinity or NaN.
Input and output
Addition and multiplication
- int arf_neg_round(arf_t y, const arf_t x, slong prec, arf_rnd_t rnd)
  Sets \(y = -x\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact.
- int arf_mul_fmpz(arf_t z, const arf_t x, const fmpz_t y, slong prec, arf_rnd_t rnd)
  Sets \(z = x \times y\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact.
- int arf_add_fmpz(arf_t z, const arf_t x, const fmpz_t y, slong prec, arf_rnd_t rnd)
  Sets \(z = x + y\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact.
- int arf_add_fmpz_2exp(arf_t z, const arf_t x, const fmpz_t y, const fmpz_t e, slong prec, arf_rnd_t rnd)
  Sets \(z = x + y 2^e\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact.
- int arf_sub_fmpz(arf_t z, const arf_t x, const fmpz_t y, slong prec, arf_rnd_t rnd)
  Sets \(z = x - y\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact.
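The inexactness flag returned by these functions has a well-known counterpart for IEEE doubles, where prec is fixed at 53 bits and rounding is to nearest. Knuth's TwoSum transformation recovers the exact rounding error of a sum, and the addition was inexact iff that error is nonzero. The sketch below is plain C, not FLINT code, and the function name is hypothetical. It assumes strict double evaluation (FLT_EVAL_METHOD == 0, as on x86-64 with SSE2, and no -ffast-math) and no overflow.

```c
/* Compute *s = fl(a + b) and return nonzero iff the addition was inexact.
   TwoSum: a + b == *s + err holds exactly (round-to-nearest, no overflow). */
static int add_inexact(double *s, double a, double b)
{
    double sum = a + b;
    double bb = sum - a;                        /* the part of b that made it into sum */
    double err = (a - (sum - bb)) + (b - bb);   /* exact rounding error of a + b */
    *s = sum;
    return err != 0.0;
}
```

For instance, \(1 + 2\) is exact, while \(1 + 2^{-60}\) rounds to 1 and is flagged as inexact.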
Summation
- int arf_sum(arf_t s, arf_srcptr terms, slong len, slong prec, arf_rnd_t rnd)
  Sets s to the sum of the array terms of length len, rounded to prec bits in the direction specified by rnd. The sum is computed as if done without any intermediate rounding error, with only a single rounding applied to the final result. Unlike repeated calls to arf_add() with infinite precision, this function does not overflow if the magnitudes of the terms are far apart. Warning: this function is implemented naively, and the running time is quadratic with respect to len in the worst case.
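To see why avoiding intermediate rounding matters, the plain-C sketch below (not FLINT code; names hypothetical) sums doubles with Shewchuk's method of non-overlapping partials, the technique behind Python's math.fsum, and compares it with naive left-to-right addition. Unlike arf_sum, its final step simply adds the partials from largest to smallest, so it is not guaranteed to be correctly rounded in every halfway case. It also assumes no intermediate overflow and at most 64 terms.

```c
#include <math.h>
#include <stddef.h>

#define MAX_TERMS 64

/* Naive left-to-right summation: rounds after every addition. */
static double naive_sum(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Shewchuk-style summation: keeps the running sum exactly as non-overlapping
   partials of increasing magnitude and rounds only when adding them up.
   Each term adds at most one partial, so n <= MAX_TERMS bounds the array. */
static double partials_sum(const double *x, size_t n)
{
    double p[MAX_TERMS];
    size_t np = 0;

    if (n > MAX_TERMS)
        return NAN;                       /* outside the scope of this sketch */

    for (size_t i = 0; i < n; i++)
    {
        double v = x[i];
        size_t j = 0;
        for (size_t k = 0; k < np; k++)
        {
            double y = p[k];
            if (fabs(v) < fabs(y))
            {
                double t = v;
                v = y;
                y = t;
            }
            double hi = v + y;
            double lo = y - (hi - v);     /* exact error of hi = v + y (|v| >= |y|) */
            if (lo != 0.0)
                p[j++] = lo;
            v = hi;
        }
        p[j++] = v;
        np = j;
    }

    double s = 0.0;
    for (size_t k = np; k-- > 0; )        /* largest partial first */
        s += p[k];
    return s;
}
```

For the terms \(10^{100}, 1, -10^{100}\), naive_sum loses the 1 and returns 0, whereas partials_sum keeps it and returns 1.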
Division
Square roots
- int arf_sqrt_fmpz(arf_t z, const fmpz_t x, slong prec, arf_rnd_t rnd)
  Sets \(z = \sqrt{x}\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact. The result is NaN if x is negative.
- int arf_rsqrt(arf_t z, const arf_t x, slong prec, arf_rnd_t rnd)
  Sets \(z = 1/\sqrt{x}\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact. The result is NaN if x is negative, and \(+\infty\) if x is zero.
- int arf_root(arf_t z, const arf_t x, ulong k, slong prec, arf_rnd_t rnd)
  Sets \(z = x^{1/k}\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact. The result is NaN if x is negative. Warning: this function is a wrapper around the MPFR root function. It becomes slow and uses a lot of memory for large k.
Complex arithmetic
- int arf_complex_mul(arf_t e, arf_t f, const arf_t a, const arf_t b, const arf_t c, const arf_t d, slong prec, arf_rnd_t rnd)
- int arf_complex_mul_fallback(arf_t e, arf_t f, const arf_t a, const arf_t b, const arf_t c, const arf_t d, slong prec, arf_rnd_t rnd)
  Computes the complex product \(e + fi = (a + bi)(c + di)\), rounding both \(e\) and \(f\) correctly to prec bits in the direction specified by rnd. The first bit in the return code indicates inexactness of \(e\), and the second bit indicates inexactness of \(f\).
If any of the components a, b, c, d is zero, two real multiplications and no additions are done. This convention is used even if any other part contains an infinity or NaN, and the behavior with infinite/NaN input is defined accordingly.
The fallback version is implemented naively, for testing purposes. No squaring optimization is implemented.