PHP Profiling with Cachegrind and XDebug

Often clients ask me to speed up their websites. In most cases I use a backend-to-frontend approach. To get a general overview of why a website might be slow I usually start by generating a code profile and analyzing it in WinCachegrind/KCachegrind. In my opinion this is the fastest way to isolate slow code sections.

To generate the code profile you need to have XDebug installed. The following example shows how to install and configure XDebug on Ubuntu. To keep things simple I assume PHP and Apache are already set up and running. If you are not running Ubuntu you might need to adjust these steps.

First install the XDebug PHP module:

root@michael:~# apt-get install php5-xdebug

The next step is to configure XDebug (don’t forget to restart Apache afterwards).

/etc/php5/conf.d/xdebug.ini (Ubuntu)

; Don't profile every request by default
xdebug.profiler_enable="0"
; Directory the cachegrind files are written to
xdebug.profiler_output_dir="/tmp/xdebug"
; %p gets replaced with the process id of the Apache process
xdebug.profiler_output_name="cachegrind.out.%p"
; Only profile requests that pass the XDEBUG_PROFILE trigger
xdebug.profiler_enable_trigger="1"
; Overwrite existing profile files instead of appending to them
xdebug.profiler_append="0"
; Write one profile per request instead of aggregating several requests
xdebug.profiler_aggregate="0"

Once this is done, add the XDEBUG_PROFILE parameter to the query string of the URL you want to profile:

http://example.com/test.php?XDEBUG_PROFILE=1
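
XDebug also accepts the XDEBUG_PROFILE trigger as a POST parameter or cookie, so you can kick off a profiling run from the command line as well, for example with curl:

root@michael:~# curl "http://example.com/test.php?XDEBUG_PROFILE=1" > /dev/null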

XDebug will now generate a code profile in /tmp/xdebug (the output directory configured above). The file will be called cachegrind.out.[ApacheProcessId]. If no profile shows up, double-check that the output directory exists and that Apache has write permissions to it.
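
If the directory doesn’t exist yet, create it and make sure Apache can write to it. On Ubuntu, Apache typically runs as the www-data user, so something along these lines should work (adjust the user if your setup differs):

root@michael:~# mkdir -p /tmp/xdebug
root@michael:~# chown www-data:www-data /tmp/xdebug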

I do most of my GUI stuff on Windows and use WinCachegrind to visualize my code profiles. WinCachegrind isn’t maintained anymore and its last release dates back to 2009, but it still does the trick. Another popular alternative is KCachegrind, which provides a neat visual call tree. I still prefer WinCachegrind though, as it makes it much easier to follow the code flow. The XDebug profiler documentation page maintains a full list of alternatives.

Here is a screenshot of WinCachegrind visualizing an example code profile.

WinCachegrind profiling session

Note: for security reasons, don’t enable XDebug profiling on live sites. Generating code profiles is very resource intensive, and leaving the profiling trigger enabled could allow an attacker to mount a DoS attack.

Timer class to profile code sections

To profile the code base of a large-scale website I wrote a simple helper class that measures the processing times of selected code sections. While I find XDebug and KCachegrind very helpful for in-depth code profiling, this class has proven very useful for realtime profiling.

<?php
// Define TIMER_RUNTIME_START for total script timing
// This will also automatically be defined on inclusion of
// class_timer.php
define('TIMER_RUNTIME_START', microtime(true));

require_once('class_timer.php');
$timer = Timer::singleton('/tmp/logfile', array('precision' => 2));

// Timers can be nested too
$timer->start('wrapper_timer');

$timer->start('first_timer');
// Do something slow
sleep(1);
$timer->stop('first_timer');

$timer->start('second_timer');
// Do something slow
usleep(200000);
$timer->stop('second_timer');

$timer->stop('wrapper_timer');
?>

The output can be logged to a dedicated file or to the standard error log. Here is the output of the example above:

=== 2012-11-29 10:05:46 ===
wrapper_timer => total 1.20, count 1, max. 1.20, avg. 1.20
first_timer => total 1.00, count 1, max. 1.00, avg. 1.00
second_timer => total 0.20, count 1, max. 0.20, avg. 0.20
=== 2012-11-29 10:05:46 ===

It is also possible to log each section’s percentage of the total script execution time. To do so, set the percent flag:

$timer = Timer::singleton('/tmp/logfile', array('percent' => true));

The output will look something like this:

=== 2012-11-29 10:32:14 ===
wrapper_timer => total 99.24%, count 1, max. 99.24%, avg. 99.24%
first_timer => total 82.62%, count 1, max. 82.62%, avg. 82.62%
second_timer => total 16.57%, count 1, max. 16.57%, avg. 16.57%
=== 2012-11-29 10:32:14 ===

To get the most precise output, define the TIMER_RUNTIME_START constant as early as possible in the script:

define('TIMER_RUNTIME_START', microtime(true));

If you don’t define this constant, it will be defined automatically when class_timer.php is included.
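
For illustration, here is a minimal sketch of how such a timer could be implemented. This is not the actual class_timer.php (use the download link below for that), just an illustrative toy that mimics the singleton/start/stop interface and the log format shown above:

<?php
// Illustrative sketch only, NOT the real class_timer.php.

if (!defined('TIMER_RUNTIME_START')) {
    define('TIMER_RUNTIME_START', microtime(true));
}

class Timer
{
    private static $instance = null;

    private $logfile;
    private $options;
    private $running = array(); // timer name => start time
    private $stats   = array(); // timer name => total/count/max

    public static function singleton($logfile, array $options = array())
    {
        if (self::$instance === null) {
            self::$instance = new self($logfile, $options);
        }
        return self::$instance;
    }

    private function __construct($logfile, array $options)
    {
        $this->logfile = $logfile;
        $this->options = $options + array('precision' => 2, 'percent' => false);
        // Write the report when the script finishes
        register_shutdown_function(array($this, 'flush'));
    }

    public function start($name)
    {
        $this->running[$name] = microtime(true);
    }

    public function stop($name)
    {
        if (!isset($this->running[$name])) {
            return;
        }
        $elapsed = microtime(true) - $this->running[$name];
        unset($this->running[$name]);

        if (!isset($this->stats[$name])) {
            $this->stats[$name] = array('total' => 0.0, 'count' => 0, 'max' => 0.0);
        }
        $this->stats[$name]['total'] += $elapsed;
        $this->stats[$name]['count'] += 1;
        $this->stats[$name]['max'] = max($this->stats[$name]['max'], $elapsed);
    }

    public function flush()
    {
        $runtime = microtime(true) - TIMER_RUNTIME_START;
        $out = '=== ' . date('Y-m-d H:i:s') . " ===\n";
        foreach ($this->stats as $name => $s) {
            $out .= sprintf("%s => total %s, count %d, max. %s, avg. %s\n",
                $name,
                $this->format($s['total'], $runtime),
                $s['count'],
                $this->format($s['max'], $runtime),
                $this->format($s['total'] / $s['count'], $runtime));
        }
        $out .= '=== ' . date('Y-m-d H:i:s') . " ===\n";
        // Message type 3 appends the string to the given file
        error_log($out, 3, $this->logfile);
    }

    private function format($seconds, $runtime)
    {
        if ($this->options['percent']) {
            return number_format($seconds / $runtime * 100, $this->options['precision']) . '%';
        }
        return number_format($seconds, $this->options['precision']);
    }
}
?>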

Download the code here.

Percona Toolkit Query Digest

When optimizing SQL queries (MySQL), developers often only look at the slow log. The slow log is a good place to get started, but it only marks the beginning of the optimization process. Depending on what long_query_time (/etc/mysql/my.cnf) is set to, the slow log only keeps track of really bad queries, and those are not necessarily the ones that make your application slow.
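
For reference, the relevant my.cnf settings look roughly like this (the variable names shown here match MySQL 5.5; older versions use slightly different names):

/etc/mysql/my.cnf

[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/lib/mysql/slow.log
long_query_time     = 0.5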

Let’s assume your long_query_time is set to 0.5s. You have a query that gets logged 5 times a day with an execution time of 1s. At the same time you have a query that finishes in 100ms but runs a million times a day. The first query would show up in the slow log, the second one wouldn’t. Yet 100ms only sounds fast: a million executions add up to roughly 28 hours of cumulative query time per day, compared to 5 seconds for the first query, so the second query is much worse for your application’s overall performance.

The guys from Percona developed the amazing Percona Toolkit, which helps you perform various MySQL-related tasks. For tracking down slow queries, the Query Digest tool (pt-query-digest) is hugely helpful.

To get a profile for your application you need to log all queries for a representative timeframe (depending on your traffic) and then run the Percona Query Digest tool against the generated logfile.

To create a full query dump you need to set the slow log query time to 0. You have two options for doing so:

  1. Set the long_query_time config option to 0 in my.cnf
  2. Execute SET GLOBAL long_query_time = 0; in the mysql cli

The second option is preferable because you don’t need to restart MySQL. Logging all queries will generate significant disk IO and slow down your site quite a bit, but to get realistic data you want to run this on your production site. If a temporary performance hit is not acceptable you can set up a replicated slave and log the queries there.
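
A minimal session for the second option could look like this. Note that changing the global value only affects connections opened after the change:

mysql> SET GLOBAL slow_query_log = 1;
mysql> SET GLOBAL long_query_time = 0;

Once you have logged for a representative timeframe, restore your previous value:

mysql> SET GLOBAL long_query_time = 0.5;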

Once you have logged queries for a while you can run the pt-query-digest tool against the generated log:

root@michael:/tmp/percona-toolkit-2.1.7/bin# ./pt-query-digest /var/lib/mysql/slow.log > log

The generated log file will contain an ordered list of slow queries and a detailed summary for each individual query. If you didn’t truncate your query log before switching long_query_time to 0, restrict the analysis to the relevant timeframe using the --since switch.
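
For example, to restrict the analysis to entries logged after a certain point in time (the timestamp is of course just a placeholder):

root@michael:/tmp/percona-toolkit-2.1.7/bin# ./pt-query-digest --since '2012-11-29 10:00:00' /var/lib/mysql/slow.log > log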

Note: MySQL ships with a similar tool called mysqldumpslow, but Percona Query Digest is much more refined.
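
For comparison, a typical mysqldumpslow call that prints the ten queries with the highest total execution time looks like this:

root@michael:~# mysqldumpslow -s t -t 10 /var/lib/mysql/slow.log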
