## Square Roots are slow?

Fun question on Stack Overflow just now. Someone wants to know how to fill a circle by setting pixels. In the interests of trying the simplest thing first, I suggested:

Bitmap bmp = new Bitmap(200, 200); int r = 50; // radius int ox = 100, oy = 100; // origin for (int x = -r; x < r ; x++) { int height = (int)Math.Sqrt(r * r - x * x); for (int y = -height; y < height; y++) bmp.SetPixel(x + ox, y + oy, Color.Red); } bmp.Save(@"c:\users\dearwicker\Desktop\circle.bmp");

As the questioner was already using the Bresenham algorithm to minimise the use of square-root calculation, my answer naturally attracted comments, because I’d completely ignored this issue.

To be honest, I probably would have leapt to the same conclusion. After all – `Math.Sqrt`

looks like some kind of hefty call to a method in the BCL. Who knows how many thousands of instructions we’re going to be wasting on it?

Actually, we can find out by looking in the disassembly window. So what exactly does `Math.Sqrt`

turn into when you run the program?

` for (int n = 0; n < reps; n++)`

00000098 mov dword ptr [rsp+2Ch],0

000000a0 jmp 00000000000000DD

c += (int) Math.Sqrt(n);

000000a2 mov rcx,7FF001D1FF8h

000000ac mov ecx,dword ptr [rcx]

000000ae cvtsi2sd xmm0,dword ptr [rsp+2Ch]

**000000b4 sqrtsd xmm0,xmm0**

000000b8 movsd mmword ptr [rsp+60h],xmm0

000000be cvttsd2si eax,mmword ptr [rsp+60h]

000000c4 add ecx,eax

000000c6 mov rax,7FF001D1FF8h

000000d0 mov dword ptr [rax],ecx

That’s right – it gets inlined into what is effectively a single x86 assembler instruction!

The above disassembly comes from a test program I wrote to compare the time taken to sum the results of lots of `Math.Sqrt`

calls, with the time taken to do *simple addition*. There is no difference – or at least, the difference is so small that it is lost in the noise.

So, no – square roots are not slow!