GCC Optimization

Ken Moffat ken at linuxfromscratch.org
Sat Feb 10 12:22:08 PST 2007

On Sat, Feb 10, 2007 at 07:18:04PM -0000, Athena P wrote:
> Hi Randy
> Thanks for your reply.
> Surely specifying the CPU architecture is still worth while. For example
> using this string must give speed improvements.
>  "-O3 -march=prescott -mtune=prescott -mmmx -msse -msse2 -msse3
> -m3dnow -pipe -mfpmath=sse -fomit-frame-pointer"
> Or am I missing something?
 Yes, I think you are.  I haven't played with toolchain optimization
for a *long* while, but I think you need to consider *what* is
important to you.  Among the possibilities are:

(i) How long it takes to compile - in general, adding extra
optimisations will slow down any particular compile.  So, the whole
system will take longer to build.

(ii) Execution speed - this might be how long the system takes to
build a particular package, or to run a particular task, or for
server applications it might be throughput.

(iii) Impact on your processor's caches - a bigger binary increases
the pressure on your caches, and may mean more pages have to be read
when a program or library is loaded.  For a desktop, it is sometimes
asserted that smaller binaries (smaller code, not removing the
symbols to give shorter files) will provide a more responsive system.

(iv) There might be other things that matter to some other people,
e.g. memory pressure in a heavily-used system, perhaps running
bloatware (OOo and a leaky firefox) while trying to do big compiles
and simultaneously encoding some media.

 For a developer, being able to debug problems is important - that
might constrain use of -fomit-frame-pointer.  The other options look
'mostly harmless', although you move towards less-tested territory.

 The best thing you can do is identify what you hope to achieve,
then come up with some repeatable testcases which actually measure
what you are interested in, then measure them, ideally several times
to remove random variation.  My personal view is that there is
enough variability in a running system to make a single short test
meaningless: it needs to be repeated several times, with a method
to handle the variation (average all results, or run x times,
eliminate the best and worst and average the others, or whatever).
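
 As a rough sketch of the "run x times, eliminate best and worst,
average the others" idea, something like the following Python could
be used.  The function names (trimmed_mean, time_command) and the
placeholder workload are my own assumptions for illustration, not
part of any existing tool; substitute the testcase you actually care
about.

```python
import statistics
import subprocess
import sys
import time


def trimmed_mean(samples):
    """Drop the single lowest and single highest sample, average the rest."""
    if len(samples) < 3:
        raise ValueError("need at least 3 samples to drop best and worst")
    ordered = sorted(samples)
    return statistics.mean(ordered[1:-1])


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Hypothetical placeholder workload: a Python process that does nothing.
    # Replace with e.g. the build or benchmark command being compared.
    samples = time_command([sys.executable, "-c", "pass"], runs=5)
    print("all runs (s):", [round(s, 4) for s in samples])
    print("trimmed mean (s):", round(trimmed_mean(samples), 4))
```

 This only measures wall-clock time for one command on one system; to
compare two sets of flags you would run it against each build under
the same conditions and compare the trimmed means.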

 In testing optimisations of the base system, not only do you need to
build two systems to compare, but you probably want to *run* them
from the same partition, and if doing file i/o (including compiles)
perhaps test on the same empty or pre-loaded-after-mkfs partition,
to eliminate variables.  All disks I've ever tested get slower the
further in you go - try making a few partitions and using hdparm on
the individual partitions to see this.  Sometimes the fall-off isn't
major.  Similarly, filesystem performance may vary according to the
filesystem's past history (e.g. where it puts a new file).

 Now you can maybe see why hardly anybody has performed meaningful
tests on toolchain optimisation - for most users there isn't enough
likely gain to make the testing worthwhile.  For those supporting a
package across many similar machines, testing optimisations for their
package is possible, but the host system will normally be a given.

 The worst thing about testing optimizations is that the results are
specific to a processor model and the toolchain.  Just because a
particular optimization is best today doesn't mean it will be best
in a year's time.  Mostly, optimization is based on assertion or gut
feelings, e.g. those package developers who throw in -O9 when their
users are likely to be using gcc, or people who claim that they can
see the difference with a particular optimization.

the first time as tragedy, the second time as farce
