Home > Uncategorized > Square Roots are slow?

Square Roots are slow?

Fun question on Stack Overflow just now. Someone wants to know how to fill a circle by setting pixels. In the interests of trying the simplest thing first, I suggested:

    Bitmap bmp = new Bitmap(200, 200);

    int r = 50; // radius
    int ox = 100, oy = 100; // origin

    for (int x = -r; x < r ; x++)
    {
        int height = (int)Math.Sqrt(r * r - x * x);

        for (int y = -height; y < height; y++)
            bmp.SetPixel(x + ox, y + oy, Color.Red);
    }

    bmp.Save(@"c:\users\dearwicker\Desktop\circle.bmp");

As the questioner was already using the Bresenham algorithm to minimise the use of square-root calculation, my answer naturally attracted comments, because I’d completely ignored this issue.

To be honest, I probably would have leapt to the same conclusion. After all – Math.Sqrt looks like some kind of hefty call to a method in the BCL. Who knows how many thousands of instructions we’re going to be wasting on it?

Actually, we can find out by looking in the disassembly window. So what exactly does Math.Sqrt turn into when you run the program?

for (int n = 0; n < reps; n++)
00000098 mov dword ptr [rsp+2Ch],0
000000a0 jmp 00000000000000DD
c += (int) Math.Sqrt(n);
000000a2 mov rcx,7FF001D1FF8h
000000ac mov ecx,dword ptr [rcx]
000000ae cvtsi2sd xmm0,dword ptr [rsp+2Ch]
000000b4 sqrtsd xmm0,xmm0
000000b8 movsd mmword ptr [rsp+60h],xmm0
000000be cvttsd2si eax,mmword ptr [rsp+60h]
000000c4 add ecx,eax
000000c6 mov rax,7FF001D1FF8h
000000d0 mov dword ptr [rax],ecx

That’s right – it gets inlined into what is effectively a single x86 assembler instruction!

The above disassembly comes from a test program I wrote to compare the time taken to sum the results of lots of Math.Sqrt calls, with the time taken to do simple addition. There is no difference – or at least, the difference is so small that it is lost in the noise.

So, no – square roots are not slow!

Advertisements
Categories: Uncategorized Tags: ,
  1. August 3, 2009 at 1:10 pm

    I readed somewhere that there is a C++ program that accelerates the square root calculation, because the x86 sqrtsd wasn’t fast enough for certain applications. I am certain that I readed it at both Euler Project’s forum and Stackoverflow, and that was made for the original Quake/Doom game.

  2. August 3, 2009 at 1:23 pm

    After some search I found it:
    http://www.codemaestro.com/reviews/9
    It was the Quake engine, and the method was the REVERSE square root, which also has an entry in the x86 instruction set.

  3. August 3, 2009 at 6:30 pm

    Note that on x86, instructions don’t take a fixed number cycles. sqrt is definitely heavy, but it is completely executed on cpu. This makes the calculation faster than using the same algorithm but hardcoded, but it is still heavy.

    I tried to rewrite Sine(degree) a while ago in C#, and managed to get it only 2-3 times as slow as the native Sine function.

  4. earwicker
    August 6, 2009 at 9:44 am

    @Dykam – note that if I take out the SetPixel call, and then compare the speed of the loop including Math.Sqrt with the speed when Math.Sqrt is also removed, I can’t get a significantly different measurement, even with billions of iterations. That’s what I mean by “not slow”. These things are always relative. Math.Sqrt is not slow compared to a handful of arithmetic operations and some looping logic. It’s practically indetectable.

  5. earwicker
    August 6, 2009 at 10:05 am

    @Jader Dias – that’s the inverse of the square root, 1.0/sqrt(x) – the reverse of square root is squaring. 🙂

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: