-fomit-frame-pointer [was: Re: XFree86 host.def]

Matthias Benkmann matthias at winterdrache.de
Thu Oct 3 15:54:04 PDT 2002


On Thu, 03 Oct 2002 09:57:03 -0500 Bruce Dubbs <bdubbs at swbell.net> wrote:

> Excellent demonstration.  In my mind it brings up the next logical 
> question.  How significant are the three extra instructions in each 
> function call?  Using -fomit-frame-pointer omits three instructions 
> that will save both time and space with the tradeoff that debuggers will
> be more difficult to use on the code, but are the cpu cycles and size 
> going to be noticeable?

Aside from the few saved instructions, -fomit-frame-pointer frees up
another register the compiler can use to store commonly accessed data. On
the x86 architecture this can make a huge difference because it has so few
registers.
Look at the following program:

void foo(int* p1,char* p2)
{
  int a,b,c,d,e,f;
  c=0;
  d=10;
  e=5; 
  f=6;
  for (a=0; a<100000; ++a)
  for (b=33; b<20000; b+=c)
  {
    *p1=a;
    *p2=d;
    c+=(d-a-e);
    c=(c&1)+1; 
    d+=(b-a); 
    e+=d-b-f;
    f+=c;
  }
}

int main(void)
{
  char a;
  int b;
  foo(&b,&a);
  return 0;
}


It has a loop that, aside from 2 memory accesses, contains only integer
computations (it could be palette effects while blitting an RGB image or
some kind of decoding loop). Ideally, all variables would be kept in
registers, but x86 has too few registers for that.
With gcc -O2, the resulting inner loop is

.L10:
        movl    8(%ebp), %eax
        movl    %esi, (%eax)
        movl    12(%ebp), %eax
        movb    %bl, (%eax)
        movl    %ebx, %eax
        subl    %esi, %eax
        subl    -16(%ebp), %eax
        leal    (%eax,%ecx), %eax
        andl    $1, %eax
        leal    1(%eax), %ecx
        movl    %edx, %eax
        subl    %esi, %eax
        addl    %eax, %ebx
        movl    %ebx, %eax
        subl    %edx, %eax
        addl    %ecx, %edx
        subl    %edi, %eax
        addl    %ecx, %edi
        addl    %eax, -16(%ebp)
        cmpl    $19999, %edx
        jle     .L10

With gcc -O2 -fomit-frame-pointer the result is (shown as unified diff):

 .L10:
-       movl    8(%ebp), %eax
+       movl    20(%esp), %eax
        movl    %esi, (%eax)
-       movl    12(%ebp), %eax
+       movl    24(%esp), %eax
        movb    %bl, (%eax)
        movl    %ebx, %eax
        subl    %esi, %eax
-       subl    -16(%ebp), %eax
+       subl    %ebp, %eax
        leal    (%eax,%ecx), %eax
        andl    $1, %eax
        leal    1(%eax), %ecx
        movl    %edx, %eax
        subl    %esi, %eax
        addl    %eax, %ebx
        movl    %ebx, %eax
        subl    %edx, %eax
        addl    %ecx, %edx
        subl    %edi, %eax
        addl    %ecx, %edi
-       addl    %eax, -16(%ebp)
+       addl    %eax, %ebp
        cmpl    $19999, %edx
        jle     .L10

The first 2 changes have no effect because they replace a memory access
with another memory access. The last 2 changes, however, show how the
compiler uses the additional free register ebp to store a value that was
previously kept in memory. This removes 2 instructions with memory
operands from the loop (the subl reads the value, the addl reads and
writes it). The loop body is executed between roughly 1,000,000,000 and
2,000,000,000 times (100,000 outer passes, each with about 10,000 to
20,000 inner ones, because b advances by 1 or 2 per step). So
-fomit-frame-pointer replaces on the order of 2,000,000,000 to
4,000,000,000 memory-accessing instructions with register operations. Now
let's see how that translates into time saved (I ran the test several
times, of course, to verify the times):

/tmp> gcc -O2 temp2.c -o temp2
 
/tmp> time ./temp2

real    0m52.963s
user    0m52.960s
sys     0m0.000s

/tmp> gcc -O2 -fomit-frame-pointer temp2.c -o temp2

/tmp> time ./temp2

real    0m31.782s
user    0m31.780s
sys     0m0.000s

Okay. I think the numbers speak for themselves. But now for the real fun.
I know there are a lot of you optimization freaks out there who compile
everything with -O3. I've argued against this in the past
(http://archive.linuxfromscratch.org/mail-archives/lfs-support/2002/09/0357.html)
but I know no one's listening. Well, here's what you get:

/tmp> gcc -O3 temp2.c -o temp2

/tmp> time ./temp2

real    0m55.612s
user    0m55.610s
sys     0m0.000s

/tmp> gcc -O3 -fomit-frame-pointer temp2.c -o temp2

/tmp> time ./temp2

real    0m55.611s
user    0m55.610s
sys     0m0.000s


LMAO. Not only is -O3 almost 3 seconds slower than plain -O2, it even
loses all the benefits of -fomit-frame-pointer. I just love it. Now all of
you speed freaks will have to rebuild your LFSs or live with the feeling
that with all of the CFLAGS fiddling not only have you sacrificed
stability but you may even have lost speed. ROTFL. :-)
BTW, the numbers get only a little better with -march=k6 (and adding all
the flags suggested in the Mozilla hint). The whole situation may be
different on a Pentium or with a different GCC version (I have 3.2) but
the point is (and this has also been mentioned on the gcc mailing list,
btw) that -O3 may produce much worse code than -O2.
The interesting thing is that the code generated for foo() is the same
with -O3 -fomit-frame-pointer and -O2 -fomit-frame-pointer. The difference
is that -O3 inlines foo() into main() but because the frame pointer of
main() apparently can't be omitted, the inlined code is worse because of
fewer available registers. So this example debunks the common myth that
inlining is always a good idea.

But let's get back to -fomit-frame-pointer. Aside from the register-saving
effect there are the code savings. These are just a few instructions that
don't matter for most functions, but they do matter for small functions
such as foo(){return a;}. Trivial functions like this are not as rare as
they may seem. They often occur in C++ classes as accessor functions. So
let's look at a little C++ program:

struct Test
{
  char b;
  char x;
  char f;
  char foo();     
  char bar();
  char xyzzy();
  volatile char c;
  void main();
  Test():b(10),x(11),f(12){};
};
 
void Test::main()
{
  int i;
  for (i=0; i<100; ++i)
  {
    int i;
    for (i=0; i<5000000; ++i) c=foo()+bar()+xyzzy();
  }
}

inline char Test::foo(){  return f;};  
inline char Test::bar(){  return b;};
inline char Test::xyzzy(){  return x;};
 
int main()
{
  Test t;
  t.main();
}

/tmp> g++ -O2 test.cpp -o test
/tmp> time ./test

real    0m33.998s
user    0m33.990s
sys     0m0.010s

/tmp> g++ -O2 -fomit-frame-pointer test.cpp -o test
/tmp> time ./test

real    0m27.003s
user    0m27.000s
sys     0m0.010s


So the effect of -fomit-frame-pointer is very visible in this example,
too. A careful observer might notice that something's not quite right
here. The 3 accessor functions are declared inline, so how can
-fomit-frame-pointer have an effect at all? The answer is another example
of the subtleties of optimization. If we move the definition of
Test::main() after the definitions of the 3 accessors like this

inline char Test::foo(){  return f;};  
inline char Test::bar(){  return b;};
inline char Test::xyzzy(){  return x;};


void Test::main()
{
  int i;
  for (i=0; i<100; ++i)
  {
    int i;
    for (i=0; i<5000000; ++i) c=foo()+bar()+xyzzy();
  }
}


we get the following:

/tmp> g++ -O2 test.cpp -o test
/tmp> time ./test

real    0m2.022s
user    0m2.020s
sys     0m0.000s


Impressive, eh? I didn't change a single line of code. I just shuffled it
around and got a speed increase of about a factor of 17 (33.998s down to
2.022s). By letting gcc see the accessor functions before Test::main(), it
was able to inline them into the Test::main() code. Note that
-fomit-frame-pointer does not make a difference here because inlined
functions don't have stack frames.
Okay, the speed freaks are probably going to ask about -O3. Well, with -O3
gcc inlines the functions in this example regardless of placement,
achieving the same speed increase. Now what? Is -O3 good or is it bad? I'd
like to repeat what I said in my older message that I referred to above:

Optimization should be a conscious choice made for each package!
Unfortunately many people see it as a lifestyle instead. 

It is pointless (and, as demonstrated at the beginning, can even be
counterproductive) to override CFLAGS with "-O3 ..." for every package
without making actual measurements (and of course taking into account
whether you run the respective program often enough to make it worth the
trouble).

Okay, I guess that's enough wasted time for today. I'll go to bed now.

MSB

-- 
Digitize if possible - Eradicate if necessary!




More information about the blfs-dev mailing list