Optimizing PHP code

Source codes
Original image with source codes and execution times

Yesterday Frédéric Bisson posted on Twitter an image with comparison of execution time of the same algorithm (Scrabble helper) implemented in Haskell, Python and PHP. The indicated time was respectively 0.6s, 0.3s and 5s. I know that PHP can be slow, but 5s was sluggish even for PHP. I had to do something about it, so I spent some time optimizing the code and checking possible solutions.

I’ve contacted the author of the image, and apart from receiving input data file for the script, I got some information: first of all, he was using PHP 5.3, with XDebug enabled. Those two things alone showed that the initial time measurement is off scale. The author disabled XDebug, and the number went down to 1.7s. Still quite high. I’ve tested it on a stock installation of Ubuntu’s PHP 5.5 (all extensions disabled), and it was about 0.86s. Slowest of those three in comparison, but it was not that “off chart” initial number.

But I wanted to know if maybe there was something terribly slow in the code. I used XDebug in profiling mode to generate cachegrind output. Yes, there were many function calls, but no individual call was exceptionally slow.

2014-08-08_15-18-26

As you can see above, most called function was substr. The function itself is not slow, but it was called over 1.5M times. I knew that function calls are generally quite expensive in PHP, so I wanted to reduce it however I can. First fix was very simple. I’ve replaced

substr($word, 0, 1)

with array access:

$word[0]

That itself reduced substr function call count to less than 1M, and gave about 20% speed boost (current execution time: ~0.69s). I also removed temporary variables used only once, but that gave almost unnoticeable gain. (code of the current version)

2014-08-08_15-19-03

But that was still not enough. Knowing that function calls are expensive, you might thing about rewriting recursive algorithm to an iterative version. That might not be fair while using other algorithm in other languages, but hey, you have to know your language’s strengths and weaknesses. That change was done by the author of the initial code.

This change lowered execution time to ~0.55s, which is 19% gain from the previous iteration, and 36% total gain.

Also, it was possible to rewrite closure to a form of a generator. That part was provided by Filip Górny on PHPers group on Facebook (PHPers is a network of Polish PHP meetups).

This code allowed me to go below 0.5s – lowest run time that I achieved was about 0.49s. Not a big difference from previous version, but nevertheless noticeable. At this point I’ve concluded by toying with the code. I was thinking about looking at opcode level, but I decided that it was too much.

I finished modifying the code, but I was curious if the newest achievements in PHP core could be of any help. I downloaded and compiled the so-called php-ng, a branch of PHP with experimental core modifications focused on performance improvements. As it was to be expected, I didn’t disappoint. The last version of the code under php-ng was running in about 0.235s. Now that was the number that satisfied me.

But I also wanted to be fair and check also the other new player in PHP world – HHVM from Facebook. HHVM has a different paradigm than php-ng – it is a virtual machine with Just-In-Time compiler. But whatever it is, it’s also known to work faster than the vanilla PHP interpreter, and after tests it proved worthy of its opinion. The code was running even faster than php-ng, with some run times even below 0.2s

And that concluded my tests. Below you can find a table with run times for all executors, with 20 consecutive runs for each one.

php-5.5 php-ng hhvm
t1 0.633 0.248 0.262
t2 0.621 0.241 0.249
t3 0.530 0.233 0.203
t4 0.592 0.324 0.204
t5 0.557 0.240 0.202
t6 0.612 0.229 0.294
t7 0.539 0.219 0.202
t8 0.611 0.337 0.203
t9 0.558 0.220 0.204
t10 0.538 0.218 0.235
t11 0.531 0.220 0.228
t12 0.516 0.248 0.206
t13 0.540 0.217 0.202
t14 0.542 0.225 0.205
t15 0.564 0.219 0.237
t16 0.537 0.312 0.204
t17 0.553 0.218 0.207
t18 0.568 0.221 0.223
t19 0.548 0.221 0.258
t20 0.658 0.312 0.201
min 0.516 0.217 0.201
max 0.658 0.337 0.294
avg 0.567 0.246 0.221
median 0.555 0.227 0.206

Now it’s time for some conclusions.

First one is obvious: PHP can be as fast as other languages (or, better, interpreters/virtual machines of the other languages).

Second, know thy language. I know that it might not be very ingenious, but languages differ. Syntax is most obvious difference, but also some languages excel in different fields than the others. Some languages prefer one structure over another. Here it was seen that it’s better to avoid too many function calls, so iterative algorithms are favored over recursive ones. Also, you have to know the ecosystem to know that there are alternative interpreters or virtual machines.

Third and last conclusion is that this kind of benchmark-type comparisons are pointless (only worse are synthetic tests). It’s very easy to omit some small issue and language intricacies, and the results will differ greatly. PHP is not a language that will be chosen for a large scale data analysis, and it won’t be chosen for a reason – there are better languages for that task. Is this a problem? Definitely not. There are no languages that excel in each and every application, and there’s no need for them to. We have a lot of different languages to choose, so let’s choose wisely appropriate language for a given application.