A Few Tips for Writing Useful Libraries in PHP

(A month or more ago I started writing a blog entry, which became so long I decided to turn it into an article. However, life has gotten away from me, and I don’t know when I’ll get around to doing the clean-up to write that article, so here it is.)

Zend has an article, Writing Libraries in PHP (you’ll have to trust me on the title, as their CMS seems to be broken). It is a good article as far as it goes, but I think the title oversells it. That is a shame, because of all of PHP’s many faults and quirks, perhaps its most telling (and most crippling) is a cultural one: PHP programmers write applications, not libraries. I don’t consider myself a PHP guru, being more comfortable in Perl and Java, but I do consider myself a good software developer, and so I’ll try to capture a few of the lessons I learned when developing MagpieRSS, my PHP library for parsing RSS.

First, let’s identify the problem.

PHP, it’s a cultural thang

Python positions itself as distinct from Perl with the slogan “batteries included.” Whether you feel that is accurate is irrelevant, because PHP is the real “batteries included” language. PHP bends so far in this direction that you’ll find functions like pfpro_process for talking to the “VeriSign Payflow Pro” service in the core language. Besides the patent absurdity of this, it has also created a culture that sees only two forms of PHP: core language extensions, and applications.

Why is this a problem?

So what’s wrong with that? Code written for a particular application is often very difficult to reuse. Code reuse is one of the holy grails of open source; it’s how you leverage all the vaunted benefits of “lots of eyes make bugs shallow”, patches, and shared development. Code reuse also reduces development time, and reduces bugs in the applications that reuse the code. Unless, of course, to reuse your code I’ve had to dig into it and hack it to suit my purposes, in which case I’ve probably spent more time and introduced bugs into code I only half understand. I’ll be tempted to throw it away and start over, splitting time and development energy over multiple solutions rather than improving just one.

So what is the difference between an application and a library?

A little over a year ago, I wanted to syndicate the events from Protest.net to a website published with PHP. As we generate RSS feeds for our calendars, I thought this would be easy. I went looking for a tool to recommend for the website to use, and I came up short. I found many, many, many PHP applications that took an RSS file and generated HTML. These were applications for displaying RSS as HTML, but they weren’t libraries: they tried to do it all, and therefore couldn’t integrate with this website, which had its own way of wanting to use RSS and PHP. A key characteristic of an application versus a library is how many problems it tries to solve; solve too many problems in a single layer and you lose flexibility.

Tips for writing PHP libraries

A few of these tips are theoretical; some are very concrete. Some I’m not thrilled with, but they’re the best solutions I have to date. If you disagree, or have suggestions, please add them.

Do one thing, do it well. You aren’t building a CMS, you aren’t building an interface, you’re trying to support other people in those tasks.
Don’t echo or print content, return it! This is one of the key problems you see in much PHP code. If you echo the results of some function directly to the web page, then when I’m trying to use your code to write the output to a file, or to run it in a testing environment, I’m going to be frustrated. What if I’m trying to build an internationalized app, and you’re echoing out English? Don’t assume you know the context your code will run in; return objects or strings. Don’t print.
Return data, strings, or objects, not HTML. The corollary to the don’t-print rule, and abused nearly as often, if not more so, is don’t return formatted HTML (unless you are writing an HTML widget, in which case that is all you should be doing). If you return the results of your function as a formatted table, it might be pretty and easy to use, but I can’t pass those results to another function to sort them, or integrate them with my pure CSS layout. It is really common to see code like echo('$error_msg');. Don’t do it.
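To make the last two tips concrete, here is a minimal sketch contrasting a function that prints formatted HTML with one that returns plain data. Both function names are hypothetical, and `$items` is assumed to be a list of associative arrays with a `title` key.

```php
<?php
// Hypothetical names throughout; $items is assumed to be a list of
// associative arrays, each with a 'title' key.

// Hard to reuse: writes straight to the page, already wrapped in HTML.
function mylib_print_titles(array $items)
{
    echo '<table>';
    foreach ($items as $item) {
        echo '<tr><td>' . htmlspecialchars($item['title']) . '</td></tr>';
    }
    echo '</table>';
}

// Reusable: returns plain data. The caller decides whether to sort it,
// translate it, write it to a file, or format it for the page.
function mylib_get_titles(array $items)
{
    $titles = array();
    foreach ($items as $item) {
        $titles[] = $item['title'];
    }
    return $titles;
}
```

A caller can then run `sort()` on the result of `mylib_get_titles($items)` and wrap each title in whatever markup its own CSS layout calls for. The printing version leaves no such choice.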
Allow intelligent error handling. One of the most common reasons a library will print content directly to the web page is on encountering an error. This makes it very difficult for the application using your library to figure out what happened and respond accordingly. Don’t assume your code is the most important part of someone’s application. Maybe they don’t care if you failed? Or maybe they care a whole bunch, and want to totally change what they were doing?
Use an error() function. I’m still struggling to come up with the best way to do error handling in a PHP library. Once PHP 5 arrives and we have real exceptions, much of this will be irrelevant. In the meantime, something like the following has to do.

What I do with Magpie is provide an error() method for each part of the library.

error() takes a message and an optional error level, appends $php_errormsg if track_errors is on, prepends a string identifying the library that is throwing the error, sets a package/class variable to the resulting error, and, if debugging is on, triggers an error message.

Why is this good? Code using your library has an easy way to check for error conditions (`if ($lib->error) { ... do error handling ... }`). Error messages are complete and consistently formatted. When someone is developing with your library, they can easily find out what went wrong based on their php.ini settings. And lastly, if someone does need to hack or override your chosen behaviour, they only have to do it once.
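Here is a minimal sketch of the error() approach for current PHP versions. The class name MyLib, the `$debug` flag, and the message format are assumptions, not Magpie’s code. PHP 8 removed `$php_errormsg` and `track_errors`, so the sketch calls error_get_last() to recover the underlying PHP message instead.

```php
<?php
// Hypothetical library class illustrating a central error() method.
// error_get_last() stands in for $php_errormsg, which PHP 8 removed.
class MyLib
{
    public $error = '';     // last error message; '' when all is well
    public $debug = false;  // when true, also raise a PHP-level warning

    public function error($msg, $level = E_USER_WARNING)
    {
        // Append the underlying PHP error, if one was recorded.
        $last = error_get_last();
        if ($last !== null) {
            $msg .= ' (' . $last['message'] . ')';
            error_clear_last();
        }
        // Prefix the message so it is clear which library complained.
        $this->error = 'MyLib: ' . $msg;
        if ($this->debug) {
            trigger_error($this->error, $level);
        }
    }

    public function fetch($path)
    {
        $this->error = '';
        error_clear_last();                 // don't pick up stale errors
        $data = @file_get_contents($path);  // we report failures ourselves
        if ($data === false) {
            $this->error("could not read $path");
            return false;
        }
        return $data;
    }
}
```

Calling code then uses the `if ($lib->error) { ... }` check described above. A developer can set `$lib->debug = true` to see the same message through PHP’s normal error reporting.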
Allow simple configuration. If you don’t provide a way for people to change the behaviour of your library, then you force them to hack on it. If someone has had to go into your code and hack on it, then they’ll be resistant to upgrading as new versions become available, and any changes they make that you want to roll back into the core will be more difficult to apply.

Configuration can be parameters passed to a class’s constructor, set/get methods called later, or constants defined at runtime (more on this later).
Choose intelligent defaults. This is true of any application or library in any environment. It is particularly true with PHP, where a great number of your users aren’t programmers by profession or choice, but are just trying to get something working to support their real work.
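The following is a minimal sketch of the constructor and set/get styles of configuration, with intelligent defaults built in. The class name FeedReader and its option names are hypothetical and were chosen only for illustration.

```php
<?php
// Hypothetical class: every option has a sensible default, and users
// override only what they care about.
class FeedReader
{
    private $options = array(
        'cache_dir' => './cache',
        'cache_age' => 3600,   // seconds
        'use_gzip'  => true,
    );

    // Configuration via constructor parameters.
    public function __construct(array $options = array())
    {
        foreach ($options as $name => $value) {
            $this->set($name, $value);
        }
    }

    // Configuration via set/get methods called later.
    public function set($name, $value)
    {
        if (!array_key_exists($name, $this->options)) {
            throw new InvalidArgumentException("Unknown option: $name");
        }
        $this->options[$name] = $value;
    }

    public function get($name)
    {
        return array_key_exists($name, $this->options)
            ? $this->options[$name]
            : null;
    }
}
```

`new FeedReader()` works out of the box, while `new FeedReader(array('cache_age' => 60))` changes just the one setting a user cares about. Rejecting unknown option names catches typos that would otherwise be silently ignored.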
Break your library into multiple files. One way to simplify your code, and to encourage encapsulation and reuse, is to split your library into logical sub-pieces and move these pieces into their own files.

Something I didn’t do with Magpie, but wish I had, was store all the files in a lib/ directory. Having all your files in a single directory makes it much easier for people to install your code. (When it came time to bundle an external library, a modified version of Snoopy, I had learned my lesson and put it in extlib/.)

This tip is really an excuse for the next tip.
Don’t assume everyone’s PHP looks like yours. PHP has a lot of configuration options, runs on dozens of different platforms, and is used in all sorts of different ways. Keep that in mind when writing a library.

If you have multiple files, allow your user to define a base dir. (aka don’t make assumptions #1) This is a trick from Smarty that was pointed out to me.

If you have a core library file (e.g. class.inc) that will be including support files, don’t make assumptions about the PHP include path on the various machines where your library will be installed.

For example, assuming there is a constant MAGPIE_DIR defined, you can include the support libraries with:

```
require_once( MAGPIE_DIR . 'rss_parse.inc' );
require_once( MAGPIE_DIR . 'rss_cache.inc' );
```

This allows code that uses your library to inform your code about the local environment, rather than forcing your client code into contortions to match your expectations of how a PHP install should work.

MAGPIE_DIR (or YOURLIBRARY_DIR) might be set up with code like:

```
if (!defined('DIR_SEP')) {
    define('DIR_SEP', DIRECTORY_SEPARATOR);
}

if (!defined('MAGPIE_DIR')) {
    define('MAGPIE_DIR', dirname(__FILE__) . DIR_SEP);
}
```

This sets MAGPIE_DIR to the directory containing the library file (useful, as ‘.’ isn’t always in the include path), unless you override it before including the library with a statement like:

```
define('MAGPIE_DIR', '../../magpiefiles/');
```

Note the trailing slash: the library appends file names directly to MAGPIE_DIR. (More on using constants for configuration later.)

If you’re using a semi-obscure PHP extension, test that it has been compiled in. (aka don’t make assumptions #2)

This bit me hard when developing MagpieRSS. I added support for HTTP gzip encoding, and suddenly, for a small number of users, Magpie started failing. This was a surprisingly difficult bug to track down; I recommend avoiding it altogether. This is what PHP’s function_exists() function is for.

In code that uses gzinflate I might add a conditional like:

```
if ( function_exists('gzinflate') ) { /* ... */ }
```

Or, at the beginning of the Magpie RSS parser, I check to make sure PHP has been built with XML support with:

```
if ( !function_exists('xml_parser_create') ) { /* ... trigger error ... */ }
```
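Here is a slightly fuller sketch of the same idea. It asks the server for gzip-compressed responses only when this PHP build can decode them. The function names are hypothetical. The sketch uses gzdecode(), which, like gzinflate(), is available only when PHP’s zlib extension is compiled in.

```php
<?php
// Hypothetical helpers; both check for zlib instead of assuming it.

// Build request headers, advertising gzip only if we can decode it.
function mylib_request_headers()
{
    $headers = array('User-Agent: mylib');
    if (function_exists('gzdecode')) {
        $headers[] = 'Accept-Encoding: gzip';
    }
    return $headers;
}

// Decode a response body according to its Content-Encoding header.
// Returns false if the body is gzipped but zlib is unavailable.
function mylib_decode_body($body, $content_encoding)
{
    if (strtolower($content_encoding) !== 'gzip') {
        return $body;
    }
    if (!function_exists('gzdecode')) {
        return false;
    }
    return gzdecode($body);
}
```

Putting the check on the request side matters most. If the library never advertises gzip without zlib, servers never send it a body it cannot read.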

Don’t pollute the global namespace. All functions share a namespace in PHP. That means that if I have a parsefile() function in my library, and you have a parsefile() function in your library, PHP has no way of telling them apart: it stops with a fatal “Cannot redeclare” error, and we’ve got a serious problem. Classes help with this.

Another option is to prefix the functions in your library with a common string. Steve does this with Feed on Feeds, prefixing all his functions with fof, e.g. parsefile() becomes fofparsefile(). This cuts down on conflicts and increases readability.
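Here is a sketch of the two approaches just described: a common prefix and a class wrapper. The mylib prefix, the MyLibParser class, and the function bodies are placeholders for illustration.

```php
<?php
// Two ways to keep a generic name like parsefile() from colliding with
// another library's parsefile(). All names here are hypothetical.

// 1. Prefix every global function with a string unique to the library.
function mylib_parsefile($path)
{
    return basename($path);  // placeholder body
}

// 2. Make it a method, so only the class name is in the global namespace.
class MyLibParser
{
    public static function parsefile($path)
    {
        return basename($path);  // placeholder body
    }
}
```

With either approach, another library’s plain parsefile() can be loaded alongside these without a redeclaration error.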
If you use database tables, allow a table prefix. (aka don’t pollute the other global namespace) Most libraries aren’t going to work directly with a database; that is usually the province of applications. But if yours does, consider allowing the user to configure a table prefix, much like the function prefix from the previous tip. Many users are on low-end hosting platforms with only a single MySQL database, which creates another global namespace. If my library has a user table, and your library has a user table, and we have different schemas (almost a guarantee), then we’ve got a problem.
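Here is a minimal sketch that configures the prefix with a define(), the same way as the MAGPIE_DIR constant. MYLIB_TABLE_PREFIX and the helper functions are hypothetical.

```php
<?php
// Hypothetical table-prefix support. Users on a shared database can
// define('MYLIB_TABLE_PREFIX', 'blog2_'); before including the library.
if (!defined('MYLIB_TABLE_PREFIX')) {
    define('MYLIB_TABLE_PREFIX', 'mylib_');
}

// Every table name goes through this one function.
function mylib_table($name)
{
    return MYLIB_TABLE_PREFIX . $name;
}

// The '?' is a placeholder for a prepared statement (e.g. PDO or mysqli).
function mylib_user_query()
{
    return 'SELECT id, name FROM ' . mylib_table('user') . ' WHERE id = ?';
}
```

Routing every table name through one function means the prefix is applied consistently, and there is a single place to change if the scheme ever needs to.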
Provide a well-designed, object-oriented interface to your library that follows the above rules. This is outside the scope of these tips, but see the following corollary.
Provide a functional, PHP-like interface that builds on your OO interface. PHP is not a language for building airy, abstract object hierarchies. It is a quick and dirty language for throwing together webpages with a minimum of fuss. While it’s important to provide the more elegant interface for your advanced users, and to encourage proper design, the majority of your users are going to want something simpler.

With Magpie I provide an object-oriented RSS parsing class. I also provide rssfetch(), a simple one-function front end that is designed to be used directly from within a PHP page.

My design considerations for rssfetch were that fetching remote files is time-consuming, and parsing an XML file can be resource-intensive. In most languages/environments you would set up a cron script to run in the background handling these tasks and spitting out HTML fragments. This is not very PHP-like, and is often beyond the technical ability of many PHP users (or beyond what is offered in their hosting environment). So rssfetch transparently uses PHP’s serialize() and unserialize() to cache the results of the time/resource-intensive calls, and serves subsequent calls quickly.

This makes it very easy for the end users to do the right thing, while writing simple, idiomatic PHP. I think this is very important for writing useful PHP libraries, and is one of the challenges that comes with the territory.
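Here is a minimal sketch of that layering. An OO class does the work, and a single function wraps it with a file cache based on serialize(). This is not Magpie’s code. The class and function names, the cache file naming, and the one-hour lifetime are assumptions. A regular expression stands in for a real XML parser to keep the example short.

```php
<?php
// Hypothetical OO layer. A regex stands in for real XML parsing here.
class MyFeedParser
{
    public $items = array();

    public function __construct($xml)
    {
        preg_match_all('#<title>(.*?)</title>#s', $xml, $matches);
        $this->items = $matches[1];
    }
}

// Hypothetical one-function front end: fetch, parse, and cache.
// Fetching http:// URLs this way requires allow_url_fopen.
function mylib_fetch($url, $cache_dir = null, $max_age = 3600)
{
    if ($cache_dir === null) {
        $cache_dir = sys_get_temp_dir();
    }
    $cache_file = $cache_dir . DIRECTORY_SEPARATOR . 'mylib_' . md5($url);

    // Serve a fresh cached copy without fetching or parsing again.
    if (is_file($cache_file) && time() - filemtime($cache_file) < $max_age) {
        return unserialize(file_get_contents($cache_file),
                           array('allowed_classes' => array('MyFeedParser')));
    }

    $xml = @file_get_contents($url);
    if ($xml === false) {
        return false;
    }
    $feed = new MyFeedParser($xml);
    // Ignore write failures: a missing cache only costs speed.
    @file_put_contents($cache_file, serialize($feed));
    return $feed;
}
```

From a page, the whole thing is then a single call such as `$feed = mylib_fetch('http://example.com/index.rss');`, followed by a loop over `$feed->items`. Users who outgrow the function can still use MyFeedParser directly.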

Configuring the functional interface using defines. (Aside: by functional I mean as opposed to object-oriented, not as opposed to non-functional or dysfunctional.)

Choosing intelligent defaults and providing a simple front end means our functional interface should “just work” for many people. But when it doesn’t, it should be equally easy to configure without cluttering up the API.

What I’ve done with Magpie, and it’s worked well, is extend the technique used in “setting a base dir”.

Conditionalize your library’s behaviour on a set of constants (e.g. MAGPIE_USE_GZIP and MAGPIE_CACHE_ON). Document these constants. People can now change the behaviour of your library with some simple statements at the top of their PHP, like:

```
define('MAGPIE_CACHE_ON', false); // turn off caching
```

Then the first thing your function should do is call an `init()` function, which sets up those intelligent defaults we talked about:

```
function init () {
    if ( defined('MAGPIE_INITALIZED') ) { return; }

    if ( !defined('MAGPIE_CACHE_ON') ) {
        define('MAGPIE_CACHE_ON', true);
    }
    // ... other constants ...
    define('MAGPIE_INITALIZED', true);
}
```
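The following small runnable sketch shows the caller’s side of this technique. It uses hypothetical MYLIB_* constants in place of Magpie’s and follows the init() pattern described above. The front-end function calls its init function first, so any constant the user did not define falls back to a default.

```php
<?php
// Hypothetical init function following the init() pattern.
function mylib_init()
{
    if (defined('MYLIB_INITIALIZED')) {
        return;  // defaults already set up
    }
    if (!defined('MYLIB_CACHE_ON')) {
        define('MYLIB_CACHE_ON', true);
    }
    if (!defined('MYLIB_CACHE_AGE')) {
        define('MYLIB_CACHE_AGE', 60 * 60);  // seconds
    }
    define('MYLIB_INITIALIZED', true);
}

// Hypothetical front-end function whose behaviour depends on the constants.
function mylib_cache_description()
{
    mylib_init();  // always first, so every constant exists before use
    if (!MYLIB_CACHE_ON) {
        return 'caching disabled';
    }
    return 'caching for ' . MYLIB_CACHE_AGE . ' seconds';
}
```

A page that puts `define('MYLIB_CACHE_ON', false);` at the top gets `'caching disabled'`, and a page that defines nothing gets the defaults. PHP constants cannot be redefined, so the user’s define() must run before the library’s init function does.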

Provide examples. Always a good idea, but particularly important when distributing PHP libraries. Don’t assume people will read the documentation, or, if they do, that your programmer-esque understanding of your tool will be meaningful to them. What will make sense is an example.
Provide examples, carefully. Be careful what your examples look like, because whatever you do in your example is what 90% of your users will do in their scripts.

Test your examples, or your support queue will fill up with people who cut and pasted your code and found it isn’t working for them.

Make your examples as attractive as you can while still keeping them simple; otherwise you’ll be forced to look at your ugly HTML all over the web.

Show examples that demonstrate proper use of your library, including best practices like error handling. If your examples show how to use a feature, it will be used; if they don’t, most people won’t use it.
Document. Document, document, document. Provide inline documentation. Provide a README. Provide a FAQ. Provide a website. Provide hints of where to go looking for more info. Hints like “this class is used by rssfetch(); you can stop reading now if you just want to use the simple interface” can also be useful. (Of course, don’t sound too smug about it, as chances are people are there trying to find one of your bugs.)

Running code is good documentation.

If you’re going to provide code snippets, make them longer than seems necessary, as people will often have different instincts about what should come before and after the line in question. This is again the joy and trouble with providing a PHP library.

I personally like “cookbooks”, a hybrid of a FAQ and running code samples.
Use it Yourself, Use Consistent Names, Plan to Expand. And just to reiterate a few of Moran’s suggestions:

Until you really use your library, you’ll never know how well you’ve succeeded. When you write a library you shouldn’t conceive of yourself as the only user, but certainly as one of the users. Also, developing good relationships with the people who use your library will help it improve dramatically.

If you name half your methods getRSSFile() and the other half getcache_dir(), your users (and you yourself, in a few months) will be confused, and your code will look messy.

Moran says, “Never return a single value when you can return an array. Never return an array when you can return an object.”

I think that is overstating the case. Often you’ll want to return a single value for simplicity’s sake, but consider the cases where you might not want to. Often returning an object will make the most sense, as long as you clearly document how to use the object.
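Here is a minimal sketch of the object-returning style, with the documentation kept on the class itself. FeedInfo and mylib_feed_info() are hypothetical names. Later versions can add new properties without breaking callers that only read the existing ones.

```php
<?php
// Hypothetical result object; document each member where it is declared.
class FeedInfo
{
    /** @var string Title of the feed's channel. */
    public $title = '';

    /** @var string[] Titles of the items in the feed. */
    public $item_titles = array();

    /** @return int Number of items in the feed. */
    public function item_count()
    {
        return count($this->item_titles);
    }
}

// Returning an object rather than, say, just the title leaves room to
// grow: a later version can add a property without changing this signature.
function mylib_feed_info($title, array $item_titles)
{
    $info = new FeedInfo();
    $info->title = $title;
    $info->item_titles = $item_titles;
    return $info;
}
```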

PHP needs libraries

PHP can be a frustrating language to work in, but it’s also very rewarding. One of the things I find most rewarding about it is the chance to make an impact, and the easiest way to do that is to release well-written, well-architected libraries. The community is starting to pull together and provide some of this in the form of PEAR and PECL, but those projects have a long way to go before they are well documented, easy to use and easy to install. In the meantime, you can fill a critical need.

Resisting Temptation, Educating, and Refactoring

Because this lack of libraries is a deep-seated cultural problem, you’ll often find users writing to ask you to add feature X or feature Y to your library to make it work more like they want their application to work. Remember: you’re doing one thing, and doing it well. This is a chance for education. Explain that you’re building a library and the features they’ve asked for are more suited to an application. If you have the time, or are feeling particularly generous, or if a request comes up over and over, consider adding a code sample to your examples showing how to fulfill that particular feature request. Among other things, you might find it’s harder than it should be, and that might prompt a change in your library.

Thanks to Steve, Martin, Scott, and Evan for their feedback and suggestions.