The goal is to find a fast approximation of a smooth function, F(x), over the interval [-1,1]. The criterion is to minimize the maximum error. The function might have complex poles nearby.
An example would be the X-component of the normalized tangent to a parametric cubic polynomial curve. I actually use a simplified variant of this, F(x)=1/sqrt(0.01+x^2).
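To make "complex poles nearby" concrete for this F: the denominator vanishes at x = ±0.1i, only 0.1 away from the real interval. A quick check (in Python/NumPy, my tooling choice rather than the author's Maple):

```python
import numpy as np

def F(x):
    """The example function: well-behaved on the reals, but with
    complex poles where 0.01 + x^2 = 0, i.e. at x = +/- 0.1i."""
    return 1.0 / np.sqrt(0.01 + x**2)

poles = np.roots([1.0, 0.0, 0.01])    # roots of x^2 + 0.01
print(poles)                          # +/- 0.1i
print(F(0.0), F(1.0))                 # finite on all of [-1,1]; the peak is F(0) = 10
```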
I'm trying to solve a problem, and am not a theoretical mathematician. One reason for not citing many references here is that, after an extensive search, including asking several mathematicians, I haven't found much useful material. So, some of the material here was probably originally discovered decades ago. Citations of prior art are welcome.
Taylor series are almost always the wrong answer. Outside of Math I, who really wants a good fit to the N-th derivative at the origin? In any case, the nearby complex poles will cause a small radius of convergence. Therefore, several Taylor series over smaller intervals would be required.
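For this particular F the trouble is easy to demonstrate: F(x) = 10(1 + 100x^2)^(-1/2) is a binomial series in 100x^2, so its Taylor series at 0 converges only for |x| < 0.1, the distance to the poles. A sketch (Python; the helper name is mine):

```python
import numpy as np

def taylor_partial(x, n):
    """First n terms of the Taylor series of 1/sqrt(0.01 + x^2) about 0,
    written as 10 * sum_k binom(-1/2, k) * (100 x^2)^k."""
    total, coef = 0.0, 1.0            # coef = binom(-1/2, k), starting at k = 0
    for k in range(n):
        total += 10.0 * coef * (100.0 * x * x) ** k
        coef *= (-0.5 - k) / (k + 1)  # recurrence for the binomial coefficient
    return total

# Inside the radius of convergence (|x| < 0.1) the partial sums converge to F...
print(taylor_partial(0.05, 40), 1.0 / np.sqrt(0.01 + 0.05**2))
# ...but just outside it they blow up instead of approaching F(0.2) ~ 4.47:
print(taylor_partial(0.2, 10), taylor_partial(0.2, 40))
```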
A Chebyshev series always converges (for continuous F). The rate depends on the sum of the semi-axes of an ellipse with foci at the ends of the interval, passing thru the closest pole. This could be quite slow (many terms per digit). Nevertheless, the error oscillates uniformly.
Determining and using a Chebyshev approximation can be numerically tricky. E.g., the flip side of the error being uniformly minimized inside [-1,1] is that immediately outside the interval, the error grows the fastest of any approximating function (satisfying certain constraints).
A Pade approximation is a formal transformation of the first N terms of a series into a rational function. The Taylor approximation of that rational function agrees with the original series up to degree N. Therefore, the Pade approximation contains no more information than the original Taylor series.
Nevertheless, the Pade approximation sometimes is a better approximation to the original function than the Taylor approximation from which it was derived. However, the error is still not uniformly distributed.
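As an illustration of the formal transformation, here is the classical [2,2] Pade approximant of exp(x), computed from just its first five Taylor coefficients by a hand-rolled linear solve (SciPy's scipy.interpolate.pade does the same job):

```python
import numpy as np

# Taylor coefficients of exp(x): c_k = 1/k!
c = np.array([1.0, 1.0, 1/2, 1/6, 1/24])
M = N = 2

# Denominator q (normalized so q0 = 1): matching the series of p/q through
# x^(M+N) forces sum_j q_j c_{k-j} = 0 for k = M+1 .. M+N.
A = np.array([[c[2], c[1]],
              [c[3], c[2]]])
q1, q2 = np.linalg.solve(A, -np.array([c[3], c[4]]))
q = np.array([1.0, q1, q2])

# Numerator: p_k = sum_j q_j c_{k-j} for k = 0 .. M.
p = np.array([sum(q[j] * c[k - j] for j in range(min(k, N) + 1))
              for k in range(M + 1)])
print("p =", p)                       # [1, 1/2, 1/12]
print("q =", q)                       # [1, -1/2, 1/12]

# The [2,2] approximant is accurate well beyond what 5 Taylor terms suggest:
print(np.polyval(p[::-1], 1.0) / np.polyval(q[::-1], 1.0), np.exp(1.0))
```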
Pade approximations were much researched in the 19th century. One property is that the approximation to a given degree [M,N] is often degenerate (that may be the word I want): some high-order terms of the numerator or denominator may be zero, so the actual degrees are lower than requested.
A Chebyshev-Pade approximation is a Pade transformation applied to the first N terms of a Chebyshev series. Again, there is no new information added. Nevertheless, it can converge faster than either the Chebyshev or the Pade approximations.
A spline would work. However, the rate of convergence is quite slow, possibly because the spline does not use the information that we are approximating a single smooth, many-times-differentiable function, and not a random collection of data points.
A rational approximation can follow curves that are not essentially polynomial. This includes abs(x), tan(x), and the Heaviside (step) function. Polynomials handle these badly: the discontinuous step function has no uniform polynomial approximation at all, and abs(x), while it has one, converges only slowly.
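The abs(x) case is Newman's famous example: the best degree-n polynomial error decays only like 1/n, while degree-n rationals can achieve error like exp(-c sqrt(n)). The polynomial side is easy to check, using Chebyshev interpolation as a convenient stand-in for the true minimax polynomial:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def poly_err(n):
    """Max error of the degree-n Chebyshev interpolant of abs(x) on [-1,1]."""
    nodes = np.cos(np.pi * np.arange(n + 1) / n)
    coefs = C.chebfit(nodes, np.abs(nodes), n)
    xs = np.linspace(-1.0, 1.0, 4001)
    return np.max(np.abs(C.chebval(xs, coefs) - np.abs(xs)))

for n in (8, 32, 128):
    print(n, poly_err(n))
# The error shrinks only like O(1/n): quadrupling the degree roughly
# quarters the error, so each extra digit costs a 10x bigger degree.
```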
Even for very smooth functions, such as exp(x), the rational approximation is more economical than a Chebyshev approximation (which is better than the Taylor expansion).
A good ref for that is Donald J. Newman's monograph. (Ref to add).
The subject appears to be still largely at the existence-proof stage of development. There are few tools (essentially just the Remez algorithm), and no cookbooks.
Rational functions are not composed of the sum of a set of basis functions. This nonlinearity gives them their power, but makes working with them much harder.
Maple's minimax routine claims to find the minimax rational approximation to a given function. However, it often fails with an error message that the error curve doesn't oscillate sufficiently. Worse, sometimes when minimax does return a result, it is wrong. These problems can occur even with a [2,2] approximation. Increasing Digits didn't help.
When this was reported to Maple several years ago, they suggested a parameter tweak in the approximation routine so that it would not give up so quickly. That might perhaps let the degree get higher by one before failure.
Mathematica has a similar function, MiniMaxApproximation. It's no better than Maple's.
Computer Aided Design uses rational parametric curves a lot. They have several advantages.
A circle can be exactly represented by a quadratic rational parametric curve, but not by a polynomial parametric curve of any degree.
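For instance, the standard rational quadratic Bezier for a quarter of the unit circle uses control points (1,0), (1,1), (0,1) with weights 1, 1/sqrt(2), 1 (a textbook construction, sketched here in Python):

```python
import numpy as np

def quarter_circle(t):
    """Rational quadratic Bezier tracing the unit quarter circle for t in [0,1].
    Control points (1,0), (1,1), (0,1); weights 1, 1/sqrt(2), 1."""
    w = 1.0 / np.sqrt(2.0)
    b0, b1, b2 = (1 - t)**2, 2 * t * (1 - t) * w, t**2
    d = b0 + b1 + b2                       # common rational denominator
    x = (b0 * 1.0 + b1 * 1.0 + b2 * 0.0) / d
    y = (b0 * 0.0 + b1 * 1.0 + b2 * 1.0) / d
    return x, y

t = np.linspace(0.0, 1.0, 101)
x, y = quarter_circle(t)
print(np.max(np.abs(x**2 + y**2 - 1.0)))   # on the circle, up to rounding
```

No polynomial parameterization can do this exactly, for any degree.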
Perspective transformations preserve rationals, but not polynomials. Preserve means that the form of the curve is unchanged; obviously the coefficients change.
Homogeneous coordinates mesh nicely with rationals.
In homogeneous coordinates, the weight coordinate is useful for weighting the various control points differently.
However, the rational expressions are derived by techniques that do not try to minimize the maximum error. There seems to be no effort to do this. (Corrections accepted.) Therefore, rational curves in CAD don't seem to be relevant.
The Guide to Available Mathematical Software, from NIST, is a comprehensive index with some matches for "rational". However, they don't look directly relevant.
Lacking any existing packages to do this, I proceeded heuristically, in Maple, as follows.
Approximating a function looks too hard; I don't want to implement Remez if it can be avoided.
Therefore, pick a representative set of points, interpolate them, increase the sample set, and repeat.
The optimal set of points is not uniformly distributed. Rather, it should be at least as nonuniform as a Chebyshev set, cos(i*Pi/n), and possibly even more nonuniform; see Newman. Nevertheless, I'll start with a uniform set.
A Thiele interpolation is used to derive the rational function interpolating the N data points. It is a formal transformation. It is given in Abramowitz and Stegun, and implemented in Maple.
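In case it helps, here is the inverted-difference scheme in compact form (my Python rendering of the textbook algorithm, not Maple's code). Three points suffice to recover 1/(1+x) exactly, since it is itself rational:

```python
def thiele_coeffs(xs, fs):
    """Inverted differences a_k for the Thiele continued fraction
    R(x) = a0 + (x - x0)/(a1 + (x - x1)/(a2 + ...)).
    Can hit 0/0 for unlucky (e.g. exactly equally spaced) nodes."""
    n = len(xs)
    phi = [list(fs)]                      # level-k inverted differences
    for k in range(1, n):
        prev, row = phi[-1], [0.0] * n
        for i in range(k, n):
            row[i] = (xs[i] - xs[k - 1]) / (prev[i] - prev[k - 1])
        phi.append(row)
    return [phi[k][k] for k in range(n)]

def thiele_eval(xs, a, x):
    """Evaluate the continued fraction bottom-up."""
    r = a[-1]
    for k in range(len(a) - 2, -1, -1):
        r = a[k] + (x - xs[k]) / r
    return r

xs = [0.0, 1.0, 2.0]
fs = [1.0 / (1.0 + x) for x in xs]
a = thiele_coeffs(xs, fs)
print(thiele_eval(xs, a, 0.5))            # 1/(1 + 0.5) = 0.666...
```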
There is (at least) one problem with Maple's implementation of Thiele. When the X-values are equally spaced, Maple's formula has singularities, of the form 0/0, which cause it to fail. These singularities are artificial and removable since they are caused by common terms in both the numerator and denominator. If we could simplify the expression before evaluating it, then there would be no problem. However, when the algebraic expression, which is a continued fraction, is converted to a rational expression, it is too complicated. Neither does using l'Hopital's rule look too attractive.
Therefore I perturbed the X-values slightly to avoid singularities. This is ugly, but works.
The next problem is that the interpolated function may not be stable, i.e., may have singularities like 1/X. There are existence theorems about this in the literature. My solution is to add another data point near the singularity, and reinterpolate. The new function generally is continuous, altho a new singularity could occur later.
I implemented a testbed in Maple. With this I could plot error curves, add new points, and reinterpolate. Details later.
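The core of such a testbed might look like the following (a Python reconstruction of the loop described above, not the author's Maple code). The deliberately non-uniform starting nodes sidestep the equal-spacing 0/0 problem, and each round simply adds the point of worst error, which also chases any spurious pole, since the error blows up there:

```python
import numpy as np

def F(x):
    return 1.0 / np.sqrt(0.01 + x * x)

def thiele_coeffs(xs, fs):
    # Inverted differences a_k of the Thiele continued fraction.
    n = len(xs)
    phi = [list(fs)]
    for k in range(1, n):
        prev, row = phi[-1], [0.0] * n
        for i in range(k, n):
            row[i] = (xs[i] - xs[k - 1]) / (prev[i] - prev[k - 1])
        phi.append(row)
    return [phi[k][k] for k in range(n)]

def thiele_eval(xs, a, x):
    r = a[-1]
    for k in range(len(a) - 2, -1, -1):
        r = a[k] + (x - xs[k]) / r
    return r

# Deliberately NON-uniform, asymmetric nodes: equal spacing (plus the
# symmetry of this even F) is exactly what triggers the 0/0 failures.
xs = [-1.0, -0.7, -0.45, -0.2, 0.05, 0.3, 0.6, 0.9]
grid = np.linspace(-0.999, 0.999, 2001)

for step in range(3):
    a = thiele_coeffs(xs, [F(x) for x in xs])
    err = np.abs(np.array([thiele_eval(xs, a, x) for x in grid]) - F(grid))
    worst = float(grid[np.argmax(err)])
    print(step, len(xs), float(err.max()), worst)
    xs = sorted(xs + [worst])         # refine where the current fit is worst
```

A real testbed would also plot the error curve each round and watch for new poles, per the discussion above.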