GCC Optimization

Amadeus amadeus.stevenson at gmail.com
Sat Feb 10 12:51:22 PST 2007

Athena P wrote:
> Hi All

I tried compiling the toolchain with various optimisations and further 
down the line something always broke, so now I have an unoptimised 
toolchain and base. I selectively recompile certain apps for 
"performance boosts" (such as the example in this email). I think this 
is a good approach.

I was going to post this a few weeks ago but have been busy, so I'm just 
going to post the results without any nice graphics or tables (sorry).

"I was wondering if anyone has done/is interested in running some 
comparisons of how different compiler flags affect performance/memory 
usage for various programs? Maybe this is off-topic but I only started 
getting interested in CFLAGS etc. while going through LFS... (and 
reading the Intel Compiler and optimising hints).

I put together a shell script to automate the process of compiling and 
"measuring" performance for various CFLAGS.

The attached output (values in bytes or seconds unless stated) is for 
lame compressing a large (150 MB) WAV file to MP3. The memory usage is 
taken from the Private values in /proc/PID/smaps and is the average of 
3 runs (as are the times). I saw the lame example that writes its output 
to /dev/null somewhere on the web.

While I did my best to make the results reliable (dropping into 
single-user mode to reduce background processes, and averaging), I'm not 
really sure whether smaps is a good measure of memory usage... and lame 
is of course only one very specific example on specific (P4 3 GHz) 
hardware.

It would be interesting to run these tests with icc and appropriate 
flags too, and to repeat them for other programs."
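
For anyone wanting to reproduce the memory figure, here is a minimal 
Python sketch of reading the Private values from smaps. It assumes that 
"Private" means the sum of all Private_Clean and Private_Dirty entries; 
the original script may have done this differently. The function names 
and the SAMPLE text are made up for illustration and are not real 
measurements.

```python
import re

# Matches lines such as "Private_Dirty:        12 kB" in /proc/<pid>/smaps.
_PRIVATE_LINE = re.compile(
    r"^(Private_Clean|Private_Dirty):\s+(\d+)\s+kB", re.MULTILINE
)


def private_kb(smaps_text):
    """Sum every Private_Clean and Private_Dirty value (in kB) in smaps text."""
    return sum(int(kb) for _, kb in _PRIVATE_LINE.findall(smaps_text))


def private_kb_of_pid(pid):
    """Read /proc/<pid>/smaps for a running process (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        return private_kb(f.read())


# Hypothetical two-mapping excerpt, only to show the arithmetic.
SAMPLE = """\
08048000-080b0000 r-xp 00000000 08:01 1234   /usr/bin/lame
Size:                416 kB
Rss:                 300 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:       300 kB
Private_Dirty:         0 kB
080b0000-080b1000 rw-p 00067000 08:01 1234   /usr/bin/lame
Size:                  4 kB
Rss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
"""

print(private_kb(SAMPLE))  # 300 + 0 + 0 + 4 = 304
```

The process has to be sampled while it is still running, so when the 
sample is taken will affect the number.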

I'm not attaching the shell script as it is a) pretty ugly and b) written 
for the lame compile method. If there is interest I will rework it and 
post it.
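
In the meantime, the general shape of such a driver can be sketched in 
Python. This is not the script used for the results below, only an 
illustration under stated assumptions: the flag sets mirror the ones 
listed in the results, the names cflags_matrix and average_times are 
hypothetical, and the rebuild step for each flag set is left out because 
it depends on the package. Timing relies on the Unix-only resource 
module.

```python
import resource
import subprocess
import time
from itertools import product


def cflags_matrix():
    """Return the flag sets to test; '' stands for the package's own defaults."""
    sets = [""]
    for opt in ("-O2", "-O3", "-Os"):
        for unroll, march in product((False, True), repeat=2):
            flags = [opt]
            if march:
                flags.append("-march=pentium4")
            if unroll:
                flags.append("-funroll-loops")
            sets.append(" ".join(flags))
    return sets


def average_times(cmd, runs=3):
    """Run cmd `runs` times and return the mean (real, user, sys) in seconds."""
    totals = [0.0, 0.0, 0.0]
    for _ in range(runs):
        before = resource.getrusage(resource.RUSAGE_CHILDREN)
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        real = time.perf_counter() - start
        after = resource.getrusage(resource.RUSAGE_CHILDREN)
        totals[0] += real
        totals[1] += after.ru_utime - before.ru_utime  # user CPU time of the child
        totals[2] += after.ru_stime - before.ru_stime  # system CPU time of the child
    return tuple(t / runs for t in totals)


if __name__ == "__main__":
    for flags in cflags_matrix():
        print(flags or "(defaults)")
```

After rebuilding with one flag set, a call such as 
average_times(["lame", "--silent", "-h", "test.wav", "/dev/null"]) would 
give the averaged Real/User/Sys values for that build (test.wav being a 
placeholder path).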

Lame's default optimisations include -ffast-math, which seems to be the 
biggest performance booster in this case.


Sun Jan 28 19:20:41 CET 2007
lame --silent -h /home/amadeus/test.wav /dev/null
* (not specified so lame uses its own defaults)
   text       data        bss        dec        hex    filename
 424573       2088     203968     630629      99f65    frontend/lame
Compiled file size: 458359
Private memory (kb): 1718
Real: 75.0033
User: 74.7767
Sys: 0.2
* -O2
   text       data        bss        dec        hex    filename
 268881       2136     203968     474985      73f69    frontend/lame
Compiled file size: 302570
Private memory (kb): 1640
Real: 79.9733
User: 79.7567
Sys: 0.19
* -O2 -march=pentium4
   text       data        bss        dec        hex    filename
 263500       2132     203968     469600      72a60    frontend/lame
Compiled file size: 296034
Private memory (kb): 1641
Real: 76.3167
User: 76.11
Sys: 0.173333
* -O2 -funroll-loops
   text       data        bss        dec        hex    filename
 400081       2136     203968     606185      93fe9    frontend/lame
Compiled file size: 433512
Private memory (kb): 1717
Real: 78.07
User: 77.77
Sys: 0.27
* -O2 -march=pentium4 -funroll-loops
   text       data        bss        dec        hex    filename
 383004       2132     203968     589104      8fd30    frontend/lame
Compiled file size: 415392
Private memory (kb): 1714
Real: 75.1833
User: 74.9433
Sys: 0.206667
* -O3
   text       data        bss        dec        hex    filename
 310769       2136     203968     516873      7e309    frontend/lame
Compiled file size: 342143
Private memory (kb): 1661
Real: 78.6433
User: 78.39
Sys: 0.22
* -O3 -march=pentium4
   text       data        bss        dec        hex    filename
 305308       2132     203968     511408      7cdb0    frontend/lame
Compiled file size: 336439
Private memory (kb): 1661
Real: 75.6267
User: 75.4067
Sys: 0.186667
* -O3 -funroll-loops
   text       data        bss        dec        hex    filename
 434033       2136     203968     640137      9c489    frontend/lame
Compiled file size: 465183
Private memory (kb): 1738
Real: 79.2233
User: 78.9433
Sys: 0.253333
* -O3 -march=pentium4 -funroll-loops
   text       data        bss        dec        hex    filename
 417100       2132     203968     623200      98260    frontend/lame
Compiled file size: 448631
Private memory (kb): 1718
Real: 75.49
User: 75.2233
Sys: 0.233333
* -Os
   text       data        bss        dec        hex    filename
 217389       2152     203968     423509      67655    frontend/lame
Compiled file size: 250278
Private memory (kb): 1608
Real: 84.3333
User: 84.1167
Sys: 0.183333
* -Os -march=pentium4
   text       data        bss        dec        hex    filename
 214957       2152     203968     421077      66cd5    frontend/lame
Compiled file size: 247846
Private memory (kb): 1610
Real: 80.6533
User: 80.4333
Sys: 0.186667
* -Os -funroll-loops
   text       data        bss        dec        hex    filename
 327917       2152     203968     534037      82615    frontend/lame
Compiled file size: 360676
Private memory (kb): 1685
Real: 80.74
User: 80.5167
Sys: 0.196667
* -Os -march=pentium4 -funroll-loops
   text       data        bss        dec        hex    filename
 317885       2152     203968     524005      7fee5    frontend/lame
Compiled file size: 351940
Private memory (kb): 1666
Real: 77.9133
User: 77.7
Sys: 0.183333
Sun Jan 28 20:27:11 CET 2007
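
Since there are no tables, a short Python sketch can make the Real times 
easier to compare. The numbers are copied from the listing above; the 
dictionary and function names are just illustrative. It prints each flag 
set's wall-clock time as a percentage difference from the default build, 
fastest first.

```python
# Real (wall-clock) seconds, copied from the results listing.
REAL_SECONDS = {
    "(lame defaults)": 75.0033,
    "-O2": 79.9733,
    "-O2 -march=pentium4": 76.3167,
    "-O2 -funroll-loops": 78.07,
    "-O2 -march=pentium4 -funroll-loops": 75.1833,
    "-O3": 78.6433,
    "-O3 -march=pentium4": 75.6267,
    "-O3 -funroll-loops": 79.2233,
    "-O3 -march=pentium4 -funroll-loops": 75.49,
    "-Os": 84.3333,
    "-Os -march=pentium4": 80.6533,
    "-Os -funroll-loops": 80.74,
    "-Os -march=pentium4 -funroll-loops": 77.9133,
}


def slowdown_vs_default(times, baseline="(lame defaults)"):
    """Return each entry's time as a percentage difference from the baseline."""
    base = times[baseline]
    return {flags: (t / base - 1.0) * 100.0 for flags, t in times.items()}


ranked = sorted(slowdown_vs_default(REAL_SECONDS).items(), key=lambda kv: kv[1])
for flags, pct in ranked:
    print(f"{pct:+6.2f}%  {flags}")
```

On these figures the default build is the fastest and plain -Os the 
slowest, at roughly 12% behind.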
