Profiling Tools
The best known is the GCC profiling tool gprof: one compiles the program with the option -pg; running the program then generates a file gmon.out, which gprof transforms into human-readable form. One disadvantage is the extra compilation step needed to prepare the executable, which has to be statically linked. The method used here is compiler-generated instrumentation, which measures the call arcs between functions and the corresponding call counts, in conjunction with time-based sampling (TBS), which gives a histogram of the time distribution over the code. Using both pieces of information, it is possible to heuristically calculate the inclusive time of functions, i.e. the time spent in a function together with all functions called from it.
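The inclusive-time heuristic can be illustrated with a minimal sketch. This is not gprof's actual implementation: it assumes an acyclic call graph (no recursion), positive call counts, and that a callee's inclusive time is split among its callers in proportion to their call counts. The function name `estimate_inclusive` and the sample data are hypothetical.

```python
from collections import defaultdict


def estimate_inclusive(self_time, arcs):
    """Estimate inclusive time per function.

    self_time: {function: time sampled inside the function itself}
    arcs:      {(caller, callee): call count}
    Assumption: the call graph has no cycles and all counts are > 0.
    """
    calls_into = defaultdict(int)   # total number of calls into each callee
    callees = defaultdict(list)     # caller -> [(callee, count), ...]
    for (caller, callee), count in arcs.items():
        calls_into[callee] += count
        callees[caller].append((callee, count))

    inclusive = {}

    def visit(func):
        if func not in inclusive:
            total = self_time.get(func, 0.0)
            for callee, count in callees[func]:
                # Charge the caller a share of the callee's inclusive time,
                # proportional to how often this caller made the call.
                total += visit(callee) * count / calls_into[callee]
            inclusive[func] = total
        return inclusive[func]

    for func in set(self_time) | set(callees) | set(calls_into):
        visit(func)
    return inclusive


# Hypothetical measurements: self times in seconds and call arc counts.
self_time = {"main": 1.0, "a": 2.0, "b": 3.0, "helper": 4.0}
arcs = {("main", "a"): 1, ("main", "b"): 1,
        ("a", "helper"): 3, ("b", "helper"): 1}
result = estimate_inclusive(self_time, arcs)
```

Here `helper` is called three times from `a` and once from `b`, so three quarters of its time is attributed to `a`. The result is only an estimate, because the calls from `a` may in reality have been cheaper or more expensive than the call from `b`.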
For exact measurement of events, there are libraries with functions that read out hardware performance counters. The best known are the PerfCtr patch for Linux and the architecture-independent libraries PAPI and PCL. Still, exact measurement needs instrumentation of code, as stated above. One either uses the libraries directly or uses automatic instrumentation systems such as ADAPTOR (for FORTRAN source instrumentation) or DynaProf (code injection via DynInst).
OProfile is a system-wide profiling tool for Linux that uses sampling.
In many respects, a comfortable way of profiling is to use Cachegrind or Callgrind, which are simulators built on the runtime instrumentation framework Valgrind. Because there is no need to access hardware counters (often difficult with today's Linux installations), and binaries to be profiled can be left unmodified, it is a good alternative to other profiling tools. The disadvantage of simulation slowdown can be reduced by running the simulation only on the interesting program parts, and perhaps only on a few iterations of a loop. Without measurement/simulation instrumentation, running under Valgrind alone has a slowdown factor in the range of 3 to 5. And when only the call graph and call counts are of interest, the cache simulator can be switched off.
Cache simulation is the first step in approximating real times, since on modern systems runtime is very sensitive to the exploitation of so-called caches: small, fast buffers that accelerate repeated accesses to the same main memory cells. Cachegrind does cache simulation by catching memory accesses. The data produced includes the number of instruction/data memory accesses and first- and second-level cache misses, and relates them to the source lines and functions of the program that was run. By combining these miss counts with typical miss latencies, an estimate of the time spent can be given.
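As a rough illustration of combining miss counts with latencies, the following sketch weights each event count by an assumed cost in cycles. The penalties of 10 cycles per first-level miss and 100 cycles per second-level miss are illustrative assumptions, not measured values, and the function name and sample counts are hypothetical.

```python
def estimate_cycles(instr_fetches, l1_misses, l2_misses,
                    l1_miss_penalty=10, l2_miss_penalty=100):
    """Estimate cycles from simulated event counts.

    Each executed instruction is charged one cycle; each cache miss adds
    an assumed latency. The default penalties are illustrative only.
    """
    return (instr_fetches
            + l1_miss_penalty * l1_misses
            + l2_miss_penalty * l2_misses)


# Hypothetical per-function counts: (instructions, L1 misses, L2 misses).
counts = {
    "compute": (1_000_000, 2_000, 50),
    "copy_buffer": (200_000, 30_000, 4_000),
}
estimates = {name: estimate_cycles(*c) for name, c in counts.items()}
```

With these made-up numbers, `copy_buffer` executes far fewer instructions than `compute` but ends up with a similar estimate, which is the kind of effect a pure instruction count would hide.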
Callgrind is an extension of Cachegrind that builds up the call graph of a program on the fly, i.e. how the functions call each other and how many events happen while running a function. In addition, the profile data to be collected can be separated by threads and call chain contexts. It can provide profiling data at the instruction level to allow for annotation of disassembled code.
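The idea of building a call graph on the fly can be sketched with a toy model. This is not how Callgrind is implemented: the trace format (`"call"`, `"cost"`, `"ret"` tuples) and the function `build_call_graph` are hypothetical, and the trace is assumed to be well formed (every cost event occurs inside some function, and every return matches a call).

```python
from collections import defaultdict


def build_call_graph(trace):
    """Build call counts and event costs from a stream of trace events.

    trace: iterable of ("call", name), ("cost", amount) or ("ret",).
    Returns (call_counts, self_cost, inclusive_cost) as plain dicts.
    """
    stack = []                          # currently active functions
    call_counts = defaultdict(int)      # (caller, callee) -> number of calls
    self_cost = defaultdict(int)        # events inside the function itself
    inclusive_cost = defaultdict(int)   # events including all callees
    for event in trace:
        kind = event[0]
        if kind == "call":
            callee = event[1]
            if stack:
                call_counts[(stack[-1], callee)] += 1
            stack.append(callee)
        elif kind == "cost":
            amount = event[1]
            self_cost[stack[-1]] += amount
            # Charge every active function once, even if it is recursive.
            for func in set(stack):
                inclusive_cost[func] += amount
        elif kind == "ret":
            stack.pop()
    return dict(call_counts), dict(self_cost), dict(inclusive_cost)


# Hypothetical trace: main does some work and calls f twice.
trace = [
    ("call", "main"), ("cost", 2),
    ("call", "f"), ("cost", 5), ("ret",),
    ("call", "f"), ("cost", 5), ("ret",),
    ("cost", 1), ("ret",),
]
call_counts, self_cost, inclusive_cost = build_call_graph(trace)
```

Unlike the heuristic used with sampling, the inclusive cost here is exact for the observed run, because every event is attributed to the functions that were actually active when it happened.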