Optimizing PHP code

Original image with source code and execution times

Yesterday Frédéric Bisson posted an image on Twitter comparing the execution time of the same algorithm (a Scrabble helper) implemented in Haskell, Python and PHP. The reported times were 0.6s, 0.3s and 5s respectively. I know that PHP can be slow, but 5s was sluggish even for PHP. I had to do something about it, so I spent some time optimizing the code and checking possible solutions.

I contacted the author of the image and, apart from receiving the input data file for the script, I got some information: he was using PHP 5.3 with XDebug enabled. Those two things alone showed that the initial measurement was way off. The author disabled XDebug, and the time went down to 1.7s. Still quite high. I tested it on a stock installation of Ubuntu’s PHP 5.5 (all extensions disabled), and it took about 0.86s. That is still the slowest of the three, but nowhere near the “off the chart” initial number.

But I wanted to know whether something in the code was terribly slow. I used XDebug in profiling mode to generate cachegrind output. Yes, there were many function calls, but no individual call was exceptionally slow.

According to the profile, the most frequently called function was substr. The function itself is not slow, but it was called over 1.5M times. I knew that function calls are generally quite expensive in PHP, so I wanted to reduce their number however I could. The first fix was very simple. I replaced

substr($word, 0, 1)

with array access:

$word[0]

That alone reduced the substr call count to less than 1M and gave about a 20% speed boost (current execution time: ~0.69s). I also removed temporary variables that were used only once, but the gain from that was barely noticeable. (code of the current version)

But that was still not enough. Knowing that function calls are expensive, you might think about rewriting the recursive algorithm as an iterative one. That might not be a fair comparison when the other languages use a different algorithm, but hey, you have to know your language’s strengths and weaknesses. That change was made by the author of the initial code.

This change lowered the execution time to ~0.55s, which is a 19% gain over the previous iteration and a 36% gain in total.
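The original PHP scripts are not reproduced here, so the following is only a sketch of the general idea, written in Python and using a hypothetical nested-dict trie (the names build_trie, words_recursive and words_iterative are made up for illustration). It shows the same traversal done once with one function call per node and once with an explicit stack and no recursion.

```python
def build_trie(words):
    """Build a nested-dict trie; the "$" key marks the end of a word (illustrative convention)."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = True
    return root


def words_recursive(node, prefix=""):
    """Collect all words below node, making one function call per trie node."""
    found = []
    if node.get("$"):
        found.append(prefix)
    for letter, child in node.items():
        if letter != "$":
            found.extend(words_recursive(child, prefix + letter))
    return found


def words_iterative(root):
    """Collect the same words with an explicit stack instead of recursive calls."""
    found = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if node.get("$"):
            found.append(prefix)
        for letter, child in node.items():
            if letter != "$":
                stack.append((child, prefix + letter))
    return found
```

Both functions return the same set of words, possibly in a different order; how much the iterative form gains depends on how expensive calls are in the given runtime.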

Also, it was possible to rewrite the closure as a generator. That part was provided by Filip Górny on the PHPers group on Facebook (PHPers is a network of Polish PHP meetups).

This code allowed me to go below 0.5s; the lowest run time I achieved was about 0.49s. Not a big difference from the previous version, but noticeable nevertheless. At this point I stopped toying with the code. I thought about looking at the opcode level, but decided that it was too much.

I finished modifying the code, but I was curious whether the newest developments in PHP core could be of any help. I downloaded and compiled the so-called php-ng, a branch of PHP with experimental core modifications focused on performance improvements. As expected, it didn’t disappoint. The last version of the code ran under php-ng in about 0.235s. Now that was a number that satisfied me.

But I also wanted to be fair and check the other new player in the PHP world: HHVM from Facebook. HHVM takes a different approach than php-ng: it is a virtual machine with a Just-In-Time compiler. Whatever it is, it’s also known to run faster than the vanilla PHP interpreter, and in my tests it lived up to its reputation. The code ran even faster than under php-ng, with some run times even below 0.2s.

And that concluded my tests. Below you can find a table with run times for all executors, with 20 consecutive runs for each one.

	php-5.5	php-ng	hhvm
t1	0.633	0.248	0.262
t2	0.621	0.241	0.249
t3	0.530	0.233	0.203
t4	0.592	0.324	0.204
t5	0.557	0.240	0.202
t6	0.612	0.229	0.294
t7	0.539	0.219	0.202
t8	0.611	0.337	0.203
t9	0.558	0.220	0.204
t10	0.538	0.218	0.235
t11	0.531	0.220	0.228
t12	0.516	0.248	0.206
t13	0.540	0.217	0.202
t14	0.542	0.225	0.205
t15	0.564	0.219	0.237
t16	0.537	0.312	0.204
t17	0.553	0.218	0.207
t18	0.568	0.221	0.223
t19	0.548	0.221	0.258
t20	0.658	0.312	0.201
min	0.516	0.217	0.201
max	0.658	0.337	0.294
avg	0.567	0.246	0.221
median	0.555	0.227	0.206
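The summary rows of such a table are easy to compute with a small script. This is a minimal sketch in Python, not the procedure used for the numbers above: time_runs measures a callable inside one process, whereas the table reports whole script runs, and both function names are made up for this example.

```python
import statistics
import time


def time_runs(func, runs=20):
    """Call func repeatedly and return the wall-clock duration of each run in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return the same summary rows as the table: min, max, avg and median."""
    return {
        "min": min(samples),
        "max": max(samples),
        "avg": statistics.mean(samples),
        "median": statistics.median(samples),
    }
```

The median is worth reporting next to the average because a few slow outlier runs pull the average up while barely moving the median.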

Now it’s time for some conclusions.

The first one is obvious: PHP can be as fast as other languages (or, better, as the interpreters/virtual machines of other languages).

Second, know thy language. I know that it might not be very ingenious, but languages differ. Syntax is the most obvious difference, but languages also excel in different fields. Some languages prefer one structure over another. Here we saw that it’s better to avoid too many function calls, so iterative algorithms are favored over recursive ones. You also have to know the ecosystem to be aware that alternative interpreters or virtual machines exist.

The third and last conclusion is that benchmark-type comparisons like this one are pointless (only synthetic tests are worse). It’s very easy to overlook some small issue or language intricacy, and the results will differ greatly. PHP is not a language that will be chosen for large-scale data analysis, and for a good reason: there are better languages for that task. Is this a problem? Definitely not. No language excels in each and every application, and there’s no need for one to. We have a lot of different languages to choose from, so let’s wisely choose the appropriate language for a given application.

[ruby] “remove_entry_secure does not work” for /tmp

If you get an error like this:

ArgumentError (parent directory is world writable, FileUtils#remove_entry_secure does not work; abort: "/tmp/gitosis20121120-26282-1q9qa73" (parent directory mode 40777)):

Check the permissions of the /tmp directory. remove_entry_secure accepts a world-writable directory in the path only if it’s /tmp, and only if that directory has 1777 permissions (world-writable with the sticky bit set).
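The check itself can be scripted. Here is a small sketch in Python (not Ruby, and not part of FileUtils) that reads the permission bits of a directory and compares them with 1777; the helper name has_mode_1777 is made up for this example.

```python
import os
import stat


def has_mode_1777(path):
    """Return True if path is world-writable with the sticky bit set (mode 1777)."""
    # S_IMODE strips the file-type bits and keeps only the permission bits
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o1777


if __name__ == "__main__":
    print("/tmp has mode 1777:", has_mode_1777("/tmp"))
```

If it reports False, restoring the mode with chmod 1777 /tmp (as root) is the usual fix.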

[PHP] Zend Server, PEAR, and PHAR error

Some time ago I was trying to install PEAR with a default installation of Zend Server. To do that, you have to run the following command:

"C:\Program Files (x86)\Zend\ZendServer\bin\go-pear.bat"

That resulted in a problem with the PHAR archive:

phar "C:\Program Files (x86)\Zend\ZendServer\bin\PEAR\go-pear.phar" does not have a signature
PHP Warning:  require_once(phar://go-pear.phar/index.php): failed to open stream: phar error: invalid url or non-existent phar "phar://go-pear.phar/index.php" in C:\Program Files (x86)\Zend\ZendServer\bin\PEAR\go-pear.phar on line 1236

To solve that problem, you can either modify the php.ini file, adding the following directive:

phar.require_hash=0

Or you can set that configuration option for a single run on the command line:

C:\Program Files (x86)\Zend\ZendServer\bin\PEAR> php -d phar.require_hash=0 go-pear.phar

The latter form is preferred, as permanently disabling signature checking is a security risk.

[python] libxml2 xpath on child node

Let’s say you have XML like:

<clients>
   <client>
     <name>foo</name>
     <address>...</address>
     <email>...</email>
     <orders>
       <order>
          <id>id1</id>
          <items>...</items>
       </order>
       <order>
          <id>id2</id>
          <items>...</items>
       </order>
     </orders>
   </client>
   <client>
   ...
   </client>
</clients>

And now you’d like to get order_id-client_name pairs. And you’d like to do it in an elegant way, using XPath rather than DOM navigation or, worse, a SAX parser. Getting all “client” nodes is easy:

import libxml2

doc = libxml2.parseFile('clients.xml')
ctxt = doc.xpathNewContext()
clients = ctxt.xpathEval('/clients/client')

# clean up nicely
doc.freeDoc()
ctxt.xpathFreeContext()

But now, how do you run an XPath query on every node you found to get the client name and orders? You have to tell the context object to change its context node, so that the next query is relative to the node you chose:

for client in clients:
    # make the following queries relative to this <client> node
    ctxt.setContextNode(client)
    client_name = ctxt.xpathEval('name')[0].getContent()

    orders = ctxt.xpathEval('orders/order')
    for order in orders:
        # narrow the context again, this time to a single <order>
        ctxt.setContextNode(order)
        order_id = ctxt.xpathEval('id')[0].getContent()
        print(order_id + " " + client_name)

And that’s it. I’m writing this because documentation for libxml2’s Python bindings is scarce, and it took me a while to find out about the setContextNode method.

Complete script:

import libxml2

doc = libxml2.parseFile('clients.xml')
ctxt = doc.xpathNewContext()
clients = ctxt.xpathEval('//client')

for client in clients:
    # make the following queries relative to this <client> node
    ctxt.setContextNode(client)
    client_name = ctxt.xpathEval('name')[0].getContent()

    orders = ctxt.xpathEval('orders/order')
    for order in orders:
        # narrow the context again, this time to a single <order>
        ctxt.setContextNode(order)
        order_id = ctxt.xpathEval('id')[0].getContent()
        print(order_id + " " + client_name)

# clean up nicely
doc.freeDoc()
ctxt.xpathFreeContext()
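For comparison, the same pairs can be collected with the standard library’s xml.etree.ElementTree instead of libxml2. This is a swapped-in technique, shown only as a sketch: ElementTree supports a limited XPath subset, and queries made on an element are already relative to it, so no context switching is needed. The SAMPLE document and the function name order_client_pairs are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample in the same shape as the clients document above
SAMPLE = """<clients>
  <client>
    <name>foo</name>
    <orders>
      <order><id>id1</id></order>
      <order><id>id2</id></order>
    </orders>
  </client>
</clients>"""


def order_client_pairs(xml_text):
    """Return a list of (order_id, client_name) tuples."""
    root = ET.fromstring(xml_text)  # root is the <clients> element
    pairs = []
    for client in root.findall("client"):
        # paths are evaluated relative to the element they are called on
        client_name = client.findtext("name")
        for order in client.findall("orders/order"):
            pairs.append((order.findtext("id"), client_name))
    return pairs
```

libxml2 remains the better choice when full XPath 1.0 is needed; ElementTree is enough for simple child and descendant paths like these.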

Zend PHP Certification

For some time I’d been thinking about taking the Zend PHP exam, and the chance to attend the Zend PHP 5.3 Certification tutorial session during the recent PHP/Zend Conference 2010 was the thing that pushed me to finally do it. I enrolled for the conference too late to get a free voucher for the examination, so I bought it myself after coming back home, and on Dec 13th I finally got the ZCE title (yay for me!).

What does it test?

At ZendCon it was said that the PHP 5.3 update of the exam was a huge leap forward in terms of quality and thoroughness of testing. Before this version, passing the exam more or less required memorizing the manual: function names, arguments and return values. As I didn’t take any previous version I can’t comment on this statement, but I know that my set of questions contained few that could be labeled as “manual questions”. I remember one “what is the output of the following code” question that was somewhat related to knowledge of function arguments, but it was enough to know the capabilities of the function rather than “is it ‘f($haystack, $needle)’ or ‘f($needle, $haystack)’”, and another about the name of a configuration option, but a very popular and important one.

I found the exam very interesting, as the questions were not only about “dry” PHP code, but also about broadly defined web-related technologies: databases, web security, web services, the HTTP protocol, etc. Zend’s training department strongly stated that the exam is not only about knowing function names, but about assessing whether the person has what it takes to be a good programmer. And I can confirm that: without proper experience in web development it is very difficult to pass the exam. I mean, you can memorize all the definitions of XSS, CSRF, SQL joins et al., but the questions are not about definitions; they are about a proper understanding of the concepts, like “would X solve the problem of CSRF”.

Is it difficult?

It depends. Even if you work with PHP applications on a daily basis, it’s still necessary to check the preparation guides. Why? For instance, I’ve been using PHP since PHP3, but I had never needed to use web services. And from the study guides and the tutorial I mentioned before I knew that even if web services are not the most important part of the examination, I needed to know at least something about every part of the test.

Most of the questions are simply “either you know the answer or you don’t”, and in that sense you can call the test easy. There are also some analytical questions, where you have to force yourself to do some thinking, but if you know how PHP works, getting the proper answer is only a matter of time.

The test consists of 70 questions, for which you have 90 minutes, so a little more than a minute per question. It’s not much, but as I’ve said before, for most of the questions either you know the answer or you don’t. That way you can go through the easy questions while marking the others for review (the examination software allows that), without wasting time on things you can’t answer off the top of your head.

Watch out for trick questions. This is especially important for “what does this code print” questions, where a single character can totally change the meaning of the code. If everything seems obvious, it usually isn’t. Look at return values, variable scope, function calls etc.

My strategy for this test was:

First pass: easy questions, test questions (generally those which took me just a few seconds)
Second pass: difficult, analytical questions
Third pass: questions I didn’t know the answer to, but where proper extrapolation or educated guessing was possible (questions I could only answer by blind guessing fell into the first category).

With this method it took me about one hour to fill in all the answers.

How to prepare

Zend offers courses preparing for the exam, but in my opinion that’s overkill. If you only need to refresh and organize your knowledge, $1000 for the course (well, minus $195 for the included exam voucher) is a bit too much. If you need more training (e.g. you don’t have enough experience with PHP), this course won’t help, as it is not designed to teach you PHP.

Everything you need to know about PHP to pass the exam can be found in the manual, which is great, but its content is organized in a way that is not very helpful when studying for the exam. After buying the voucher for the examination you should receive a PDF version of the Zend Certification Study Guide, which can help you plan your learning. Generally, I’d advise reading the whole basics section of the manual (variables, data types, control structures, OOP, new features of PHP 5.3 such as LSB and namespaces) and skimming through the index of some elementary functions (string, array) to know what is possible with PHP. Check the list of topics covered by the exam (to be found on the Zend page) and make sure that you know at least the basics of each entry.

One more word about the Study Guide provided by Zend: it’s poor. It’s poor mainly because it’s not finished, and in some places you can find placeholders instead of real information. It’s nice to get anything (especially the sample questions, which are very important to read as they give you an idea of what the test looks like), but a year after the test was updated I’d expect something better.

What to focus on?

According to the rules of Zend Certification I can’t leak any questions, but I can give some suggestions. Aside from the obvious, I’d especially recommend reviewing:

references
streams, contexts etc.
XML processing
OOP features such as inheritance, static methods, LSB
PDO

SQL-related questions are quite easy, so if you know how to do select, insert and how inner join works, you’re good.

Any benefits?

The value of the certificate itself is debatable. It’s hard to tell whether a potential employer will take that paper into consideration or not. Still, I think it can’t do any harm. It’s proof that you meet a certain level of PHP (and web development in general) skills. Just as an English language certificate will be verified at the first interview, your PHP skills will be verified sooner or later, but if you don’t have the proper entries in your résumé, you might not even be invited to the interview.

A few days ago there was a discussion about this on Twitter, and one of the guys said that having the ZCE title means that you are not creative and that you are wasting time which could otherwise be spent on open source projects. I think it’s a very radical opinion. Having a ZCE doesn’t exclude being involved in OSS projects. Also, OSS projects are a long-term involvement that can’t be compared to one evening spent on a short recap. My opinion is: don’t put all your eggs in one basket. You can’t rely solely on the certificate, just like you can’t rely solely on open source projects. Working on public projects can be beneficial to your skills (but, looking at some of the high-profile projects, it doesn’t have to be), and is a nice point on your CV, but just like with anything else, an employer might not give a crap about OSS (and saying “those companies that don’t take open-source into consideration are evil” is childish; recruitment procedures in big companies might be far away from the nearest person that knows anything about computers).

Preparation for the exam can be valuable in itself. It’s a motivation to look into subjects you didn’t need before (for me it was PDO; I was using abstraction layers, so I didn’t need it) and to review features and changes you possibly didn’t know about.

I decided to get the certification because I wanted to have some written proof of my skills, as even a list of prior projects does not say anything about the quality of those projects.

Should I re-test if I have PHP4/PHP5 cert?

Well, it’s up to you. If the certificate is for your self-esteem, go ahead. If it’s for improving your position on the market, it’s like with the value of the certificate in general: an employer might value you more if you have the newest version of the document, but doesn’t have to; they might not know the difference between PHP5 and PHP5.3 ;)

Conclusion

In my opinion, if you have some time and two hundred bucks to spare – go ahead :)

[PowerShell snippet] Resolving hostname

Recently I’ve started to embrace PowerShell’s great possibilities, and as a result I’ll be posting some of my “toys”.

On Unix-like operating systems I use the host command to resolve hostnames into IPs and the other way around. On Windows there’s the nslookup tool, which works just like its Unix equivalent, but to get accustomed to PowerShell I decided to write a function that uses .NET’s System.Net.Dns class to do the lookups.

Generally, to resolve a hostname you can use this one-liner:

[System.Net.Dns]::GetHostAddresses("devplant.net")

You can shorten it by declaring a short function.

function resolve( [string] $in ){
   [System.Net.Dns]::GetHostAddresses($in)
}

(When defining functions, remember not to collide with existing command or alias names.)

But this command works one way only. To resolve an IP address into a hostname, you have to use a different method:

[System.Net.Dns]::GetHostByAddress("72.21.210.250")

To combine both, we can extend our function, using a regular expression for naive recognition of IP addresses.

function resolve( [string] $in ){
	# naive IPv4 check, anchored so that a hostname merely containing such a pattern is not treated as an address
	if ($in -match '^(\d{1,3}\.){3}\d{1,3}$') {
		[System.Net.Dns]::GetHostByAddress($in)
	} else {
		[System.Net.Dns]::GetHostAddresses($in)
	}
}

Now here’s how you can use it:

PS C:\Users\leafnode> resolve devplant.net

IPAddressToString : 91.192.224.142
Address           : 2397093979
AddressFamily     : InterNetwork
ScopeId           :
IsIPv6Multicast   : False
IsIPv6LinkLocal   : False
IsIPv6SiteLocal   : False

PS C:\Users\leafnode> resolve 72.21.210.250

HostName                Aliases           AddressList
--------                -------           -----------
210-250.amazon.com      {}                {72.21.210.250}

PS C:\Users\leafnode>
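For readers who spend more time outside PowerShell, the same two-way lookup can be sketched in Python. This is an illustrative equivalent, not a port of the function above: it swaps the naive regular expression for the standard ipaddress module, so it also recognizes IPv6 literals, and the names looks_like_ip and resolve are made up for this example.

```python
import ipaddress
import socket


def looks_like_ip(text):
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def resolve(text):
    """Resolve a hostname to its IPv4 addresses, or an IP address to its hostname."""
    if looks_like_ip(text):
        # reverse lookup: returns (hostname, aliases, addresses)
        hostname, _aliases, _addresses = socket.gethostbyaddr(text)
        return [hostname]
    # forward lookup: gethostbyname_ex reports IPv4 addresses only
    _hostname, _aliases, addresses = socket.gethostbyname_ex(text)
    return addresses
```

Calling resolve("devplant.net") or resolve("72.21.210.250") needs a working network connection and raises an exception when the lookup fails.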

Where are you going, MySQL?

Recently Brian Aker announced that he will develop an RDBMS trimmed down especially for use with web apps. It will be named Drizzle.

Features of Drizzle

What will change in Drizzle compared to MySQL? First of all, the whole architecture will be changed. Drizzle will not be a monolithic chunk of software like its predecessor; it will be based on the microkernel idea. Most features will be moved from the core to optional modules. Those features, like triggers, views, or even the query cache, are standard for modern database servers, but are very rarely used in web apps (which I find very strange, but I’ll come back to this later). One of these modules will be the InnoDB engine (owned by Oracle, dual-licensed), which should make upgrading to the newest version of that engine easier. UTF-8 will be standard. Generally, it looks nice.

On the other hand, Windows users will be sad, because Drizzle will probably (but not for sure) be available only for Linux and Mac OS X. Maybe it’s not such a big deal, because most production servers run on non-Windows operating systems, but for development it would be nice to be able to test-install this RDBMS on Windows.

Defending PHP (or not)

Today I read the article “Defending PHP” by Jim R. Wilson. He begins by saying “Ugh. I am so tired of defending PHP.” And I’m saying “I am so tired of people defending PHP”. Why? First of all, if everything is OK, the language defends itself, and if a lot of people complain about it, maybe something really is wrong with PHP?

Syntax coloring / highlighting

Everyone knows that “higher level management” likes to look at colorful things, especially in PowerPoint presentations, and source code is one of the most boring things you can include in documentation. How to help that? You can colorize your code.

There are many software packages that can “beautify” source code. Most of them have one limitation that can ruin the whole experience: a small number of supported programming languages.

Annoying Eclipse

Auto-closing of brackets and strings in Eclipse is very useful, but it works well only when typing new code. When editing, it’s really annoying when you want to enclose an existing string in apostrophes and Eclipse inserts two marks instead of one.

A great thing about Eclipse is that it’s very configurable. The search option in the preferences dialog is also helpful. With these two features, it’s easy to find the option that disables auto-closing of braces and apostrophes. Choose Preferences from the Window menu, then navigate the tree to Java->Editor->Typing and untick the options in the Auto-close panel. That’s it.