One of the major pitfalls of the PHP LAMP stack is that run-time compilation does not scale very well. An experienced developer would lament over PHP bloat and the other downfalls that come with the ease and convenience of using a scripting language as the foundation for your application.
Over the last month or so, I have been working on creating a better version of addons.mozilla.org (AMO). It is a task that I enjoy, mostly because it isn’t every day you get the opportunity to work on a project that touches so many people and means so much.
My journey is similar to the journey taken by Drupal, PHPBB or MediaWiki developers. Given a great PHP product, you not only have to fulfill usability requirements, you also have to make the thing work for an insanely large user base. In my journey I have found an appreciation for the delicate balance between usability/features/design and the pain of having to maintain them. It’s the line between being a designer or marketing droid and being a core developer and a sysadmin.
I’ve learned that, in any successful project, you need to have someone who understands both sides. If you don’t, you run the risk of having one of two things:
- A great looking website that meets your users’ needs but doesn’t run and is not secure
- A fast and secure web app that nobody uses
A successful project usually lands somewhere in between. It’s a compromise between the best performance and the usability and flexibility most people demand, whether they are users or administrators.
So when you’re faced with circumventing PHP bloat, you go through predictable steps:
- Try to run your site as-is, but then realize it fails miserably because of the database bottleneck
- Duplicate your database, splitting it between read/write and read-only: add a slave or multiple slaves via round-robin DNS (RRDNS) to serve the bits you need
- When this fails, you start looking for output caching mechanisms that give your database a break
- You then realize that full output caching is unrealistic because your cache hits are not high enough
- You find yourself stuck, looking for answers — and you find ways to deliver features with the hardware you have, without making any large-scale app-level changes
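The read/write split in the second step can be sketched roughly as follows. This is a minimal illustration, not AMO's code: the `ConnectionRouter` class and the host names are hypothetical, and the rotation across slaves is done in-process here, whereas the setup described above relies on RRDNS for that.

```python
import itertools


class ConnectionRouter:
    """Send writes to the master and spread reads across the slaves."""

    # Statements treated as writes (an assumption; extend as needed).
    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when no slaves are configured.
        self._slaves = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Return the host that should run this statement."""
        parts = sql.split(None, 1)
        if not parts:
            raise ValueError("empty SQL statement")
        if parts[0].upper() in self.WRITE_VERBS:
            return self.master
        # Reads rotate through the slaves in round-robin order.
        return next(self._slaves)


# Hypothetical host names, for illustration only.
router = ConnectionRouter("db-master", ["db-slave-1", "db-slave-2"])
write_host = router.route("UPDATE addons SET name = 'x' WHERE id = 1")
read_host = router.route("SELECT name FROM addons WHERE id = 1")
```

Routing by the first keyword is deliberately naive. A real application also has to deal with replication lag, for example by reading from the master right after a write.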
When you’re at this point, there are two solutions:
- Optimize PHP somehow to reduce redundant compilation between requests
- Use some sort of granular caching to give your DB a break
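To make the second option concrete, here is a minimal sketch of granular caching: instead of caching whole pages, individual query results are cached under their own keys with a time-to-live. The `TTLCache` class is a hypothetical in-process stand-in for a memcached-style store, and `cached_query` is an illustrative helper, not part of any library.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached-style key/value store."""

    def __init__(self, clock=time.monotonic):
        self._items = {}
        self._clock = clock  # injectable so expiry can be tested

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)


def cached_query(cache, key, ttl, run_query):
    """Return the cached result, or run the query and cache its result."""
    result = cache.get(key)
    if result is None:
        # Miss: only now does the database do any work.
        # Note that a query returning None is never cached in this sketch.
        result = run_query()
        cache.set(key, result, ttl)
    return result
```

Because each fragment has its own key and lifetime, a page that cannot be cached as a whole can still avoid most of its queries.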
In the next part of this post, I’ll talk about the implementation of phpa/memcached across an LVS cluster and what a difference it made compared to alternative solutions. I will post graphs explaining perf gains for AMO, and explain why we used phpa and memcached as the final solution.
Building scalable applications is challenging, but you’re not alone.
11 thoughts on “Scalable PHP with phpa, memcached and LVS (Part 1)”
I’m not alone? w00t!
Graphs eh? Excellent…
I bookmarked this presentation on the use of PHP at Yahoo a few months ago. It sounds like you already have an acceptable solution, but you might give it a glance in case you aren’t familiar with their setup. According to the presentation, Yahoo has over 400 million unique visitors every month, so there is no reason why it shouldn’t work for Mozilla too.
Interesting to know why you chose PHPA. It looked quite weak compared with eAccelerator and APC in my tests, and it’s binary-only, not open source. It also looks like development has stopped; the last release was in July 2005…
I suggest you look at APC and Memcached and avoid phpa and eAccelerator. APC is written by core PHP developers and is used on all of the Yahoo servers.
You can also avoid extra stat calls with the latest cvs version.
We use memcached and APC together to replicate master / slave in shared memory: we fetch items from the master memcache server and store them locally in the APC memory cache.
So if you throw together a master / slave database system with a master / slave granular memory cache, you’ll make progress.
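A minimal sketch of that two-tier lookup, under stated assumptions: plain dictionaries stand in for the per-server APC cache (`local`) and the shared memcached server (`shared`), and `two_tier_get` and `load` are hypothetical names. Falling back to the database on a shared-cache miss is an assumption, and expiry is left out entirely.

```python
def two_tier_get(key, local, shared, load):
    """Look in the local cache, then the shared cache, then the source."""
    if key in local:
        # Fastest path: already in this server's own memory.
        return local[key]
    if key in shared:
        value = shared[key]
    else:
        # Miss in both tiers: hit the source (e.g. a database query)
        # and publish the result for the other servers.
        value = load(key)
        shared[key] = value
    # Keep a local copy so later requests on this server skip the network.
    local[key] = value
    return value
```

With two web servers sharing one `shared` store, only the first request for a key reaches `load`; the second server fills its local copy from the shared tier.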
I’d also consider creating a separate server for static files, since each Apache process most likely includes modules that are not needed, such as PHP itself.
Thanks for the feedback — we will take another look at APC and eAccelerator. PHP optimization was a secondary priority — caching is the big gain for us at this point, because the database bottleneck was priority #1.
The app-level is less of a priority because we have the flexibility given to us by LVS. But if we could use APC and save some hardware… 🙂
I think that setting up a side-by-side comparison between php accelerators would be a fun thing to do this weekend. I might give it a shot.
Will at least get the phpa/memcache results part of this post up today. We’ll see how that compares to APC later this week. Thanks again for the feedback, guys. 🙂
Also see http://lerdorf.com/php/bm.html for a comparison performed last week. Several improvements to PHP have been made in CVS since then, and the benchmarks will be re-run soon.
PHP 4 is generally still faster.
I used eaccelerator for a small Drupal site and it made a significant improvement for page loading speed. Of course, I have no idea how well it scales. I think the real key is making your site really bad so nobody visits. Worked for me!
I guess not using PHP isn’t an option?
It was, but there are more factors than when a language gets compiled:
- Ability to find volunteers or hire contractors
We found that although PHP out-of-the-box on one server isn’t the best-case scenario if you are hosting millions of hits a day, it can be alleviated with good infrastructure, optimization and caching.
If you touch on scalability, then thanks to its large following PHP pretty much kicks ass in the other areas:
- Easy to learn, so a lot of people understand it and are willing to pick it up
- Lots of people out there already know PHP and do it for work / play, so there’s a decent pool of workers out there who don’t have to get paid six figures, or who can easily volunteer 5-10 hours a week and be immediately effective
- Between PEAR and PECL most of what you’d need has been done (I admit, not as great as CPAN, but close)
So not using PHP would be an option, but those were some of the reasons why we decided to stick with it a year ago.
I could see using Python or Ruby, but in terms of flexibility, ease-of-use, hiring and maintainability it’s hard to beat the well-roundedness of PHP at this point.
That’s not saying it won’t change. If the best tool for the job down the road is Ruby or Python, so be it. We’ll take a look at it when we get there.