## Square Roots are slow?

Fun question on Stack Overflow just now. Someone wants to know how to fill a circle by setting pixels. In the interests of trying the simplest thing first, I suggested:

Bitmap bmp = new Bitmap(200, 200); int r = 50; // radius int ox = 100, oy = 100; // origin for (int x = -r; x < r ; x++) { int height = (int)Math.Sqrt(r * r - x * x); for (int y = -height; y < height; y++) bmp.SetPixel(x + ox, y + oy, Color.Red); } bmp.Save(@"c:\users\dearwicker\Desktop\circle.bmp");

As the questioner was already using the Bresenham algorithm to minimise the use of square-root calculation, my answer naturally attracted comments, because I’d completely ignored this issue.

To be honest, I probably would have leapt to the same conclusion. After all – `Math.Sqrt`

looks like some kind of hefty call to a method in the BCL. Who knows how many thousands of instructions we’re going to be wasting on it?

Actually, we can find out by looking in the disassembly window. So what exactly does `Math.Sqrt`

turn into when you run the program?

` for (int n = 0; n < reps; n++)`

00000098 mov dword ptr [rsp+2Ch],0

000000a0 jmp 00000000000000DD

c += (int) Math.Sqrt(n);

000000a2 mov rcx,7FF001D1FF8h

000000ac mov ecx,dword ptr [rcx]

000000ae cvtsi2sd xmm0,dword ptr [rsp+2Ch]

**000000b4 sqrtsd xmm0,xmm0**

000000b8 movsd mmword ptr [rsp+60h],xmm0

000000be cvttsd2si eax,mmword ptr [rsp+60h]

000000c4 add ecx,eax

000000c6 mov rax,7FF001D1FF8h

000000d0 mov dword ptr [rax],ecx

That’s right – it gets inlined into what is effectively a single x86 assembler instruction!

The above disassembly comes from a test program I wrote to compare the time taken to sum the results of lots of `Math.Sqrt`

calls, with the time taken to do *simple addition*. There is no difference – or at least, the difference is so small that it is lost in the noise.

So, no – square roots are not slow!

I readed somewhere that there is a C++ program that accelerates the square root calculation, because the x86 sqrtsd wasn’t fast enough for certain applications. I am certain that I readed it at both Euler Project’s forum and Stackoverflow, and that was made for the original Quake/Doom game.

After some search I found it:

http://www.codemaestro.com/reviews/9

It was the Quake engine, and the method was the REVERSE square root, which also has an entry in the x86 instruction set.

Note that on x86, instructions don’t take a fixed number cycles. sqrt is definitely heavy, but it is completely executed on cpu. This makes the calculation faster than using the same algorithm but hardcoded, but it is still heavy.

I tried to rewrite Sine(degree) a while ago in C#, and managed to get it only 2-3 times as slow as the native Sine function.

@Dykam – note that if I take out the SetPixel call, and then compare the speed of the loop including Math.Sqrt with the speed when Math.Sqrt is also removed, I can’t get a significantly different measurement, even with billions of iterations. That’s what I mean by “not slow”. These things are always relative. Math.Sqrt is not slow compared to a handful of arithmetic operations and some looping logic. It’s practically indetectable.

@Jader Dias – that’s the inverse of the square root, 1.0/sqrt(x) – the reverse of square root is squaring. 🙂