Even faster asin() was staring right at me
def-pri-pub
107 points
50 comments
March 16, 2026
Related Discussions
- Faster asin() was hiding in plain sight def-pri-pub · 185 pts · March 11, 2026 · 89% similar
- Faster C software with Dynamic Feature Detection todsacerdoti · 61 pts · March 04, 2026 · 47% similar
- Java is fast, code might not be siegers · 192 pts · March 20, 2026 · 44% similar
- If you thought code writing speed was your problem you have bigger problems mooreds · 306 pts · March 17, 2026 · 42% similar
- We rewrote our Rust WASM parser in TypeScript and it got faster zahlekhan · 155 pts · March 20, 2026 · 41% similar
Discussion Highlights (11 comments)
fatih-erikli-cg
I think it is the `atan` function. Sin is almost a lookup query.
fhdkweig
This is a follow-up to a different post from the same domain (5 days ago, 134 comments): https://news.ycombinator.com/item?id=47336111
thomasahle
Did you try polynomial preprocessing methods, like Knuth's and Estrin's methods? https://en.wikipedia.org/wiki/Polynomial_evaluation#Evaluati... They let you compute polynomials with half the multiplications of Horner's method; I used them in the past to improve the speed of the exponential function in Boost.
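The tradeoff thomasahle describes can be sketched for the post's cubic. This is a hedged illustration, not the article's actual code: the coefficient values are the ones quoted elsewhere in the thread, and the function names are mine.

```cpp
#include <cmath>

// Cubic from the thread: P(x) = a0 + a1*x + a2*x^2 + a3*x^3
constexpr double a0 = 1.5707288;
constexpr double a1 = -0.2121144;
constexpr double a2 = 0.0742610;
constexpr double a3 = -0.0187293;

// Horner: 3 multiplies, 3 adds, but one serial dependency chain.
double poly_horner(double x) {
    return a0 + x * (a1 + x * (a2 + x * a3));
}

// Estrin: one extra multiply (x*x), but the two halves
// (a0 + a1*x) and (a2 + a3*x) can be evaluated in parallel.
double poly_estrin(double x) {
    double x2 = x * x;
    return (a0 + a1 * x) + x2 * (a2 + a3 * x);
}
```

For a cubic the multiply count is the same either way; the win Estrin's scheme offers here is a shorter dependency chain, which matters on wide out-of-order CPUs (and, as the reply below notes, less so on GPUs).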
jaen
Cool, although more ILP (instruction-level parallelism) might not necessarily be better on a modern GPU, which doesn't have much ILP, if any (instead it uses those resources to execute several threads in parallel). That might explain why the original Cg (a GPU programming language) code did not use Estrin's, since at least the code in the post does add 1 extra op (squaring `abs_x`). (AMD GPUs used to use VLIW (very long instruction word) which is "static" ILP).
jonasenordin
I haven't kept up with C++ in a few years - what does constexpr do for local variables?

  constexpr double a0 = 1.5707288;
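For a local arithmetic variable like this, `constexpr` guarantees the initializer is evaluated at compile time and lets the variable be used where a constant expression is required; for codegen it typically behaves the same as `const`. A minimal sketch (the wrapper function name is hypothetical):

```cpp
double f() {
    // Must be initialised by a constant expression, checked at compile time.
    constexpr double a0 = 1.5707288;
    // Because a0 is constexpr, it can feed other compile-time contexts:
    static_assert(a0 > 1.5, "evaluated during compilation");
    // For a plain double local, `const` would usually produce identical code;
    // constexpr adds the compile-time-initialisation guarantee.
    return a0;
}
```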
srean
A notable approximation of ~650 AD vintage, by Bhaskara: arccos(x) ≈ π √((1-x)/(4+x)). The search for better and better approximations led Indian mathematicians to independently develop branches of differential and integral calculus. This tradition came into its own as the Madhava school of mathematics from Kerala. https://en.wikipedia.org/wiki/Kerala_school_of_astronomy_and... Note the approximation is for 0 < x < 1; for the range [-1, 0] Bhaskara used symmetry. If I remember correctly, Aryabhata had derived a rational approximation about a hundred years before this. EDIT: https://doi.org/10.4169/math.mag.84.2.098
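Bhaskara's formula is easy to try out. A small sketch (function name is mine), using the symmetry acos(-x) = π - acos(x) mentioned above for the negative half of the domain:

```cpp
#include <cmath>

// Bhaskara's ~7th-century approximation, valid on [0, 1];
// the [-1, 0] half follows from acos(-x) = pi - acos(x).
double bhaskara_acos(double x) {
    const double pi = 3.14159265358979323846;
    if (x < 0.0) return pi - bhaskara_acos(-x);
    return pi * std::sqrt((1.0 - x) / (4.0 + x));
}
```

It is exact at x = 0, 1/2, and 1 (at x = 1/2 it gives π√(1/9) = π/3 precisely), and stays within roughly a couple of thousandths of a radian in between.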
ashdnazg
No idea if it's already optimised, but x2 could also be x*x instead of abs_x * abs_x, shifting the dependency chain earlier.
jagged-chisel
> It also gets in the way of elegance and truth.

That's quite subjective. I happen to find trigonometry to be elegant and true. I also agree that trigonometric functions lack efficiency in software.
coldcity_again
I've been thinking about this since [1] the other day, but I still love how rotation by small angles lets you drop trig entirely. Let α represent a roll rotation, and β a pitch rotation.

Let R(α) be:

  ( cos α   sin α   0)
  (-sin α   cos α   0)
  (   0       0     1)

Let R(β) be:

  (1     0       0   )
  (0   cos β  -sin β )
  (0   sin β   cos β )

Combine them:

  R(β)·R(α) =
  (     cos α           sin α          0   )
  (-sin α · cos β   cos α · cos β   -sin β )
  (-sin α · sin β   cos α · sin β    cos β )

But! For small α and β, just approximate:

  ( 1   α   0)
  (-α   1  -β)
  ( 0   β   1)

So now:

  x' = x + αy
  y' = y - αx - βz
  z' = z + βy

[1] https://news.ycombinator.com/item?id=47348192
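The small-angle form above can be checked against the exact composed rotation. A sketch with assumed type and function names, rotating first by α about z (roll), then by β about x (pitch), matching the product R(β)·R(α):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// First-order small-angle rotation from the comment:
// x' = x + a*y, y' = y - a*x - b*z, z' = z + b*y
Vec3 rotate_small(Vec3 v, double a, double b) {
    return { v.x + a * v.y,
             v.y - a * v.x - b * v.z,
             v.z + b * v.y };
}

// Exact composition R(b)*R(a), for comparison.
Vec3 rotate_exact(Vec3 v, double a, double b) {
    // roll about z
    Vec3 r{  std::cos(a) * v.x + std::sin(a) * v.y,
            -std::sin(a) * v.x + std::cos(a) * v.y,
             v.z };
    // pitch about x
    return { r.x,
             std::cos(b) * r.y - std::sin(b) * r.z,
             std::sin(b) * r.y + std::cos(b) * r.z };
}
```

For angles on the order of a milliradian the two agree to second order, i.e. the error per step is roughly α², which is why this trick works well for incremental updates.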
LegionMammal978
The coefficients given are indeed a near-optimal cubic minimax approximation for (π/2 - arcsin(x))/sqrt(1-x) on [0,1]. But those coefficients aren't actually optimal for approximating arcsin(x) itself.

For reference, the coefficients given are [1.5707288, -0.2121144, 0.0742610, -0.0187293]: if we optimize P(x) = (π/2 - arcsin(x))/sqrt(1-x) ourselves, we can extend them to double precision as [1.5707288189560218, -0.21211524058527342, 0.0742623449400704, -0.018729868776598532]. Increasing the precision reduces the max error, at x = 0, by 0.028%.

Adjusting our error function to optimize the absolute error of arcsin(x) = π/2 - P(x)*sqrt(1-x) on [0,1], we get the coefficients [1.5707583404833712, -0.2128751841625164, 0.07689738736091772, -0.02089203710669022]. The max error is reduced by 44%, from 6.75e-5 to 3.80e-5. If we plot the error function [0], we see that the new max error is achieved at five points, x = 0, 0.105, 0.386, 0.730, 0.967.

(Alternatively, adjusting our error function to optimize the relative error of arcsin(x), we get the coefficients [1.5707963267948966, -0.21441792645252514, 0.08365774237116316, -0.02732304481232744]. The max absolute error is 2.24e-4, but the max relative error is now 0.0181%, even in the vicinity of the root at x = 0. Though we'd almost certainly want to use a different formula to avoid catastrophic cancellation.)

So it goes to show, we can nearly double our accuracy, without modifying the code, just by optimizing for the right error metric.

[0] https://www.desmos.com/calculator/nj3b8rpvbs
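The comparison can be reproduced with a quick grid search. This is a sketch assuming the π/2 - P(x)·sqrt(1-x) form quoted in the comment; the function names are mine, and the coefficient lists are copied from the comment.

```cpp
#include <algorithm>
#include <cmath>

// asin(x) ≈ pi/2 - P(x) * sqrt(1 - x) on [0, 1], the form under discussion.
double approx_asin(double x, const double c[4]) {
    double p = c[0] + x * (c[1] + x * (c[2] + x * c[3]));
    return 1.5707963267948966 - p * std::sqrt(1.0 - x);
}

// Original coefficients from the post.
const double orig[4]  = { 1.5707288, -0.2121144, 0.0742610, -0.0187293 };
// Coefficients re-optimised for the absolute error of arcsin itself.
const double reopt[4] = { 1.5707583404833712, -0.2128751841625164,
                          0.07689738736091772, -0.02089203710669022 };

// Max absolute error against std::asin over a uniform grid on [0, 1].
double max_abs_error(const double c[4]) {
    double worst = 0.0;
    for (int i = 0; i <= 1000; ++i) {
        double x = i / 1000.0;
        worst = std::max(worst, std::fabs(approx_asin(x, c) - std::asin(x)));
    }
    return worst;
}
```

With the original coefficients the grid search finds the ~6.75e-5 peak at x = 0; with the re-optimised set the peak drops to roughly 3.8e-5, spread across the five equioscillation points listed above.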
3836293648
I'm surprised that the compiler couldn't see through this.