One of the major pitfalls of the PHP LAMP stack is that run-time compilation does not scale very well. An experienced developer would lament over PHP bloat and the other downfalls that come with the ease and convenience of using a scripting language as the foundation for your application.
Over the last month or so, I have been working on creating a better version of addons.mozilla.org (AMO). It is a task that I enjoy, mostly because it isn’t everyday you get the great opportunity to work on a project that touches so many people and means so much.
My journey is similar to the journey taken by Drupal, PHPBB or MediaWiki developers. Given a great PHP product, you not only have to fulfill usability requirements, you also have to make the thing work for an insanely large user base. In my journey I have found an appreciation for the delicate balance between usability/features/design and the pain of having to maintain them. It’s the line between being a designer or marketing droid and being a core developer and a sysadmin.
I’ve learned that, in any successful project, you need to have someone who understands both sides. If you don’t, you run the risk of having one of two things:
- A great looking website that meets your users’ needs but doesn’t run and is not secure
- A fast and secure web app that nobody uses
A successful project usually lands somewhere inbetween. It’s a compromise between the best performance and the usability and flexibility most people demand — albeit users or administrators.
So when you’re faced with circumventing PHP bloat, you go through predictable steps:
- Try to run your site as-is, but then realize it fails miserably because of the database bottleneck
- Duplicate your database, splitting it between read/write and read-only — add a slave or multiple slaves via RRDNS to serve the bits you need
- When this fails, you start looking for output caching mechanisms that give your database a break
- You then realize that full output caching is unrealistic because your cache hits are not high enough
- You find yourself stuck, looking for answers — and you find ways to deliver features with the hardware you have, without making any large-scale app-level changes
When you’re at this point, there are two solutions:
- Optimize PHP somehow to reduce redundant compilation between requests
- Use some sort of granular caching to give your DB a break
In the next part of this post, I’ll talk about implementation of phpa/memcached across an LVS cluster and what a difference it made compared to altnernative solutions. I will post graphs explaining perf gains for AMO, and explain why we used phpa and memcached as the final solution.
Building scalable applications is challenging, but you’re not alone.