Scalable PHP with APC, memcached and LVS (Part 2)

In part 1 of this post I talked about some of the challenges we encountered when trying to scale a LAMP application. It’s pretty much what you’d read on danga’s memcached site, just dumbed down.

So after some discussion, caffeine and Googling, you’ll probably end up knowing you’ll need:

Memcached!
An internal cache to speed up and optimize PHP across requests.
A habit of continually finding ways to slim down your application.
More caffeine.

I had originally intended this post to be a summary of test results, but I am beginning to realize that what you get out of apache bench or httperf isn’t really as important as how much thought you put into your application. If you think about it, all of these perf tests are just trying to quantify something qualitative, and the tests themselves are nowhere near as important as how you get there.

So instead of showing a lot of Excel or Omnigraffle graphs that won’t help you very much, I’d much rather spend this time talking about the process. That way, you might be able to learn from our mistakes, and not make them yourself.

We worked with the Mozilla infra team to put a lot of thought into this application, and that is what really made the biggest impact. In the end, the big win is just snappy pages in production, and we’ve achieved that. And since I’m a massive tool, I’ll draw a comparison between scalability and basketball.

For one, it takes teamwork and unselfishness to succeed. You need the sysadmins to be involved with the application developers from an early point, because they always ask the right questions, often the obvious ones developers miss. You need good coaches who know the game, and can direct on both sides of the coin. And after all is said and done on the performance side of things, you need your fans (the community) to gauge your overall success.

You hope along the way that when the game’s over, the score is in your team’s favor and the fans are cheering.

So when you’re planning your app, the best thing you can do is minimize your code by not including massive libraries or classes. Not to knock PEAR or overgeneralize things, but anytime you include a PEAR class, you have to be very careful. PEAR classes are often bloated and written to “just make it work”. They work well for your blog, or some weekend project, but if you need some serious performance, including a PEAR class is typically a bad decision.

Includes in PHP are a bit like interest rates — it may seem like a small sacrifice to just include something, but over time and over a lot of requests, it can amount to a huge loss. Imagine if you had a 1% fee every time you hit the ATM. Seems like a minor sacrifice, it’s just 1%, but everybody knows that you’d lose a lot over time. So why would you give up 1% over millions of PHP transactions? You should follow some simple rules when dealing with PHP includes in your application:

Make your includes modular. You should allow yourself the ability to mix-and-match includes or class definitions. Some may have dependencies, that’s fine, but you shouldn’t limit yourself by making everything dependent on your DBI, for example. You should think about what you’d do if you had a page that didn’t pull from the DB, and how your app would serve it up.
Use only what you need. It’s easy to throw everything into one init script, but you should only include what your page actually needs to compile. It’s like importing java.util.* instead of just java.util.List. Doesn’t make sense.
Make the most use of what PHP has to offer built-in, and when that fails, write your own wrappers if PECL doesn’t already have a solution. If you’re adventurous and have C experience, you could write your own PHP extension to avoid run-time compilation of common functions. We didn’t necessarily need to do this, but you might consider it if you have a specific need that isn’t addressed with any available PECL extension.
Ask yourself if you really need DB layer abstraction. DBI’s are great, but they are also huge. PEAR::DB is massive, and if your app isn’t going to be ported to other RDBMS’s, then you should really consider using your own wrapper for the mysql/mysqli functions built into PHP. In my experience, people hardly ever switch their DB layer over, and even if they did, if you write a clear and concise DB class, it is easy to switch out anyway. Abstraction here isn’t worth the overhead.
Ask yourself if you really need a template language with more rules and more syntax to mess up. PHP itself is a scripting language made to write web pages — so how much sense does Smarty make? Having been down the Smarty path, I’ve given it a shot, and I don’t think it’s worth it to replicate everything PHP does. If you’re doing it for design purposes, PHP syntax is already pretty simple, and most WYSIWYG editors have built-in escaping/tokenization for PHP’s syntax. If you’re using Smarty for separation of your view component, you can do the same thing in just PHP using app logic. And if you’re doing it so you can cache or precompile output, you’re just duplicating what memcached and APC would already offer you. If we could do it again, Smarty would not be worth the performance loss. So be wary of templating languages in such a high-level language. It’s usually a lose-lose.
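
To make the “use only what you need” rule a bit more concrete, here is a minimal sketch of lazy loading in plain PHP. It is not what AMO does: it assumes a modern PHP (7.1 or later) rather than the PHP 4 we were running, and registerLazyIncludes, the class map, and the Addon and Report classes are all made up for illustration. The only real API involved is PHP’s spl_autoload_register hook.

```php
<?php
// Hypothetical helper: map class names to files and load a file only
// when its class is first used, instead of including everything up front.
function registerLazyIncludes(array $classMap): void
{
    spl_autoload_register(function (string $class) use ($classMap): void {
        if (isset($classMap[$class])) {
            require_once $classMap[$class];
        }
    });
}

// Demo setup: write two throwaway class files to a temp directory.
$dir = sys_get_temp_dir() . '/lazy_includes_' . uniqid();
mkdir($dir);
file_put_contents("$dir/Addon.php", '<?php class Addon { public $name = "demo"; }');
file_put_contents("$dir/Report.php", '<?php class Report {}');

registerLazyIncludes([
    'Addon'  => "$dir/Addon.php",
    'Report' => "$dir/Report.php",
]);

// Only Addon.php is compiled for this "page"; Report.php is never touched.
$addon = new Addon();
$reportLoaded = class_exists('Report', false); // false = do not trigger autoload
```

A page that never mentions Report never pays for compiling Report.php, which is the whole point of keeping includes modular.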

At the app level, before you even get into server configuration or caching, you need to avoid violating the rules above. In our journey with addons.mozilla.org (AMO) we made some interesting group decisions a year ago that we regretted later:

PEAR::DB was unnecessarily large, and Smarty is just not worth it — it confuses the issue and redoes things PHP is already good at using arbitrarily complicated syntax. Any quick run through with something like the Zend Profiler or APD will tell you how much of a dog these things can be. If you haven’t already, I highly recommend profiling your app to see where you’re losing performance — I bet it’s mostly in includes.
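
If you don’t have a profiler handy, a crude first pass is to time an include yourself. The sketch below is an assumption-laden stand-in for real profiling, not a replacement for APD or the Zend Profiler: timeInclude and the throwaway library file are hypothetical, and it only uses PHP’s built-in microtime() and get_included_files().

```php
<?php
// Hypothetical helper: time a single include with microtime().
// Far cruder than a real profiler, but it needs no extension.
function timeInclude(string $file): float
{
    $start = microtime(true);
    require_once $file;
    return microtime(true) - $start;
}

// Demo: a throwaway file standing in for a large library include.
$lib = tempnam(sys_get_temp_dir(), 'lib');
file_put_contents($lib, '<?php function demo_helper() { return 42; }');

$seconds   = timeInclude($lib);
$fileCount = count(get_included_files());
printf("include took %.6f s, %d files loaded so far\n", $seconds, $fileCount);
```

Run it against your real init script with and without an opcode cache enabled and the numbers usually speak for themselves.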

For caching, we looked at:

Page/output caching
- Smarty (bad, bad, bad idea)
- Cache_Lite (slow)
- memcached (w00t)
Internal caching / optimization
- phpa (meh, and turning into a proprietary solution — double meh)
- APC 3.0.10 (w00t)
- A handful of other outdated and lesser internal PHP caches

For external caching, the clear choice was memcached. Used and designed for LiveJournal.com, it is a pretty standard way to provide key-based caching of any serialized data. It has APIs for almost every language used in web development, so it was an easy choice. It gave the other caching methods an ass whooping.
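
The basic pattern is simple: ask the cache for a key, and only do the expensive work on a miss. Here is a minimal sketch of that idea. So that it runs without a memcached server, it uses a hypothetical in-memory ArrayCache class as a stand-in; the cached() helper, the key name and the sample add-on data are also made up, and a real app would swap in an actual memcached client.

```php
<?php
// In-memory stand-in so the sketch runs without a memcached server.
// A real deployment would swap this for a memcached client.
class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        if (!isset($this->items[$key])) {
            return null;                      // miss
        }
        [$value, $expires] = $this->items[$key];
        if ($expires < time()) {
            unset($this->items[$key]);        // expired entry counts as a miss
            return null;
        }
        return unserialize($value);
    }

    public function set(string $key, $value, int $ttl): void
    {
        // Values are stored serialized, as they would be in an external cache.
        $this->items[$key] = [serialize($value), time() + $ttl];
    }
}

// Try the cache first, fall back to the expensive loader on a miss,
// then store the result for later requests. (A null result is never cached.)
function cached(ArrayCache $cache, string $key, int $ttl, callable $loader)
{
    $value = $cache->get($key);
    if ($value === null) {
        $value = $loader();
        $cache->set($key, $value, $ttl);
    }
    return $value;
}

$cache   = new ArrayCache();
$dbCalls = 0;
$loadAddon = function () use (&$dbCalls) {
    $dbCalls++;                               // stands in for a DB query
    return ['id' => 7, 'name' => 'Example Add-on'];
};

$first  = cached($cache, 'addon:7', 60, $loadAddon);
$second = cached($cache, 'addon:7', 60, $loadAddon);
```

The second call never touches the loader, which is exactly the database load you are trying to shed.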

Based on user comments in my previous post, we punted phpa and went for APC 3.0.x and we liked the results. Initially, using the default settings in APC.ini, we faced some performance losses. After some tweaking, though, APC showed about a 40% increase over the antiquated phpa. Just make sure to read the INSTALL doc. 🙂

AMO currently runs on a handful of memcached instances, feeding multiple LVS nodes configured to use APC 3.0.10. We can now easily handle release traffic and during peak hours the web app doesn’t even break a sweat. The database bottleneck is history.

So we are happy with the results, but they were achieved using methods that are still less than ideal. There are so many more things we can do to improve performance:

Remove SSL for non-sensitive pages
Remove PEAR::DB and Smarty so that compiling pages and setting them in the cache is less expensive.
Move away from page-level caching and get into object-level caching to replace DB calls with queries against memcached.
Improve the memcached implementation in the app to be truly decentralized with fallback. Currently it does not map a set key to a particular server. We still need to add a key->server hash so the app knows which server to try first per key. The trick then becomes combining failover with the hash, so the app can learn which server to hit if the first choice isn’t available and remember that choice. That is an interesting challenge in a stateless environment.
Make certain high-load pages purely static and avoid PHP altogether.
Additional tweaks and Apache config changes to improve performance.
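
For the key->server hash with fallback, one possible approach looks like the sketch below. To be clear, this is not what AMO runs today: pickServer, the server names and the $isAlive callback are all hypothetical, and it uses simple modulo hashing with PHP’s built-in crc32() rather than anything fancier.

```php
<?php
// Hypothetical helper: pick a server for a key, skipping servers that are down.
// $isAlive is a callable that reports whether a given server can be reached.
function pickServer(string $key, array $servers, callable $isAlive): ?string
{
    $count = count($servers);
    if ($count === 0) {
        return null;
    }
    // Mask the sign bit so the hash is non-negative on 32-bit builds too.
    $start = (crc32($key) & 0x7fffffff) % $count;
    // Walk the list from the hashed position until a live server turns up.
    for ($i = 0; $i < $count; $i++) {
        $candidate = $servers[($start + $i) % $count];
        if ($isAlive($candidate)) {
            return $candidate;
        }
    }
    return null; // every server is down; the caller should go to the database
}

$servers = ['cache1:11211', 'cache2:11211', 'cache3:11211']; // made-up hosts
$down    = [];
$isAlive = function (string $server) use (&$down): bool {
    return !in_array($server, $down, true);
};

$primary  = pickServer('addon:7', $servers, $isAlive);
$down[]   = $primary;                  // simulate the first choice failing
$fallback = pickServer('addon:7', $servers, $isAlive);
```

Because the probing order is deterministic, every request that sees the same servers down lands on the same fallback without any shared state, which sidesteps part of the “remember that choice” problem. The catch with modulo hashing is that adding or removing a server remaps most keys, so consistent hashing is worth a look if the pool changes often.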

Overall, I have to say it was a great ride and a good learning experience playing with these tools. Working with everyone on this was an exercise in open source development, and it showed us that with the right open source tools you can make some pretty decent enterprise-level web apps performance-wise. I hope that in reading this, you pick up a few things you can use in your next project. If you have any comments or suggestions I’d like to hear them.

Don’t just learn as much as you can from what others have tried — write and talk about it too.

23 thoughts on “Scalable PHP with APC, memcached and LVS (Part 2)”

ant

You could try using HTTP caching in dynamic pages if you aren’t already doing that – just figure out what to send for a Last-Modified header, check for an If-Modified-Since header with strtotime(), and return a 304 response or whatever. Saves some bandwidth plus you can avoid processing the rest of the script.
- April 14, 2006 at 3:46 pm
Tim

I realize you may be reluctant to switch caches yet again, but if not, I would highly suggest taking a look at “Eaccelerator”. It is the new name of one of the older PHP caches called “Turck Accelerator”. The author of Turck went to work for Zend, but eventually he gave his blessing and the OSS community was allowed to continue the project. They’ve made some great advancements in the past 6 months, and from most user comments, it seems to almost consistently beat every PHP cache short of Zend’s own (including APC). Just some food for thought.
- April 14, 2006 at 4:04 pm
morgamic

Ok – we will take a closer look at that. Do you know if they offer amd64 packages for Redhat?
- April 14, 2006 at 4:05 pm
Toe

Mike already knows this stuff, but for others reading, here’s a good article on why template systems like Smarty suck:
http://www.phppatterns.com/docs/design/templates_and_template_engines
- April 14, 2006 at 5:22 pm
kmike

another voice for eAccelerator here. it may have quirks with php 5.1.x, but since AMO is still on PHP 4.3.x, it’ll be rock solid.
- April 14, 2006 at 5:34 pm
Ara

I’ve heard the “but PHP is a templating language” argument before and certainly it’s true. But I maintain a pretty massive site with almost a hundred PHP pages, and measured by how easy it is to maintain, improve, and change, using Smarty versus not using Smarty was IMHO a net gain. (Note that I’m really talking about any templating system, not Smarty in particular.)

I think using Smarty successfully requires some discipline. It has way way too many features. I made it so my templates only use the most basic Smarty syntax like looping, and basic if/then/else and nothing else (no config files or other nonsense). Even then I use this syntax only to control display, not to mess with any variables or change any data. Doing this I found the separation of DB access/”business logic”/and display code/logic something I frankly would not want to do without anymore.

Certainly you could use PHP instead of Smarty in this manner and gain separation by careful use of include(). But being a contractor, having written a lot of code and having had to add to/fix a ton of code, I have yet to see anyone who has successfully maintained good separation between PHP for business logic and PHP for presentation. It’s usually a muddled mess that’s hard to read and even harder to alter/add to.

Also, I read that link Toe provided and I think it displays a stunning ignorance of just what it takes for multiple people to maintain a large web application for a long period of time. “Just use echo” and everything will be OK does not work in a large, complex web application.
- April 14, 2006 at 8:23 pm
Roger

Thanks for posting this stuff; I think a lot of people are in the same position and will benefit from this info. Just out of curiosity, do you have any before/after benchmarks … I mean FG% stats? 🙂 I’m always curious which optimizations make the biggest improvements.
- April 14, 2006 at 8:49 pm
alanjstr

One of the other considerations for Smarty was localization. Smarty enables us to use the same language library technology that the rest of mozilla.org does, gettext. Templating and localization should be discussed in bug 245948

https://bugzilla.mozilla.org/show_bug.cgi?id=245948
- April 14, 2006 at 9:56 pm
mikx

Excellent posting. Good to read also others came to the conclusion to drop Smarty and PEAR(::DB) again.

I think an unmentioned side effect of PHP bytecode caches is that they remove or at least minimize the risk of the fileserver being a bottleneck for PHP performance.

Before I used bytecode caches I ran into problems where a RAID-5 based fileserver became the bottleneck of the application as soon as a dozen webservers started pulling files from it under heavy load. I even talked to people who told me to use deployment software to put the code on the local filesystem of each webserver instead of using a centralized fileserver.

With a bytecode cache, load on the fileserver dropped significantly, because you check the file status and only transfer it when really needed.

Memcached looks pretty interesting, but right now the MySQL query cache does an outstanding job. With 95% read-only MyISAM tables the database has never been a bottleneck. But this is obviously a decision highly dependent on the application.
- April 15, 2006 at 11:51 am
morgamic

mikx – I think danga had some comments on the memcached homepage about the mysql cache. I agree — depends on the app and the scope of the app.

Ara – I agree that having a nice and simple template is a good thing to have. In a way, the separate syntax _forces_ you to honor the “no business logic in view component” rule. I might consider reevaluating ways to make Smarty smarter or more minimalistic to get the best of both worlds. Haven’t achieved that perfect balance yet, though — probably doesn’t exist. 🙂 Do you know of Smarty alternatives?

In my non-templated apps I have used heredoc to reduce sanity, even though it messed up indentation a bit.
- April 16, 2006 at 1:08 am
Tim

Mike, there might be precompiled RPM’s for Redhat, but I’m not positive. I’ve always installed from source, and it is exceedingly simple.

As for 64bit support, to my knowledge there are no current issues with the latest versions (0.9.5 beta2). Much like dovecot, don’t let the “beta2” scare you, it’s extremely stable in its current version.
- April 17, 2006 at 10:43 am
Hubert Roksor

Regarding eAccelerator vs APC, I’d definitely go with APC because that’s what Yahoo uses, and it is very well maintained by several PHP core developers. Rasmus Lerdorf posted a benchmark some time ago on the php-dev mailing list, you may want to take a look at it. Since it wasn’t meant to compare APC vs eAccelerator you can expect the numbers to be quite fair: http://marc.theaimsgroup.com/?l=php-dev&m=114222446428640&w=2 (read the entire thread for other benchmarks/data)
- April 17, 2006 at 1:50 pm
Tim

Interesting benchmark article. It seems that APC has REALLY increased in performance then since the last time I tried it. I’d been thinking of switching (or at least trying out) APC at some point in the future due to the fact that it will be incorporated into the PHP core for PHP 6 anyhow… but this might have sped up my curiosity a bit. Either way, as you can see, eAccelerator is NO SLOUCH though, and is always improving. I’m not sure of APC’s control mechanisms either, but eAccelerator does ship with a very nice (new) web accessible control panel for enabling/disabling caching or optimizations as well as flushing caches on the fly. Might not be worth switching after all… but still bears a look in my opinion.

Anyhow, keep up the great work with a.m.o.
- April 17, 2006 at 3:06 pm
vicaya

Thanks for sharing. Tried apc 3.0.10, found it a lot less stable (apache processes segfaulting under heavy load, restarting apache is required after that) than eaccelerator, which’s been rock solid since 0.93, at least for php 4.4.2 (all compiled from source on gentoo/amd64).

The admin interface for apc actually has more bells and whistles than eaccelerator’s simple one page.
- June 14, 2006 at 9:09 pm
Tony

Eaccelerator is dying, PHPA is proprietary and APC is very buggy and would need a complete rewrite.

The only decent PHP compiler/accelerator nowadays is Xcache : http://xcache.lighttpd.org

Stable, fast, portable, open-source, and you can use it to cache objects.
- September 3, 2006 at 12:39 pm
Oli

Thanks for this interesting article. I came across the PHP performance problem during my work for a commercial website which had some performance issues in the past.
I believe it is important for an application above a given size to have a clear and well defined structure. Usually, this includes an object model and a clear separation of view and logic. Unfortunately, with PHP it is quite hard to find a trade-off between a nice application design and the performance loss it causes. Every additional piece of PHP software slows down the application a bit. Therefore, it is not surprising that many developers choose to use simple PHP instead of making use of PHP libraries (PEAR, Smarty, etc.). On the other hand, these decisions might cause the application to die at a later time. There are so many projects (not only PHP ones) which become unmaintainable because they have grown larger without having a well defined structure from the beginning.
For the project I was working on, we decided to use the following libraries:
– Savant (http://www.phpsavant.com/): Savant is a template engine. Unlike Smarty templates, Savant templates are written in PHP. There is no special syntax which requires compilation of the template. Savant minimizes the overhead of separating the view from the logic.
– Propel (http://propel.phpdb.org/): A framework that allows you to define your object model mapping in a configuration (XML) file. It compiles your model to PHP classes. You then have your full object model and database abstraction without writing a single query. This is nice and lets you easily create a complex model within minutes instead of hours. It is clear that this type of database interaction is far from optimal. However, Propel internally uses Creole, which of course uses native database queries. If you like you can throw in your own query at each of those points. You can directly use Creole for your query or you can even use the native query methods of PHP. Therefore, it is up to you how much overhead is appropriate for the given page. Say you might want to use the object model on a page which gets called once every second, but you might directly use the database connection on pages which get called several times per second.
At the moment we are trying to get an opcode cache to work. APC seems to have some problems in our environment (especially with the Propel classes). Therefore, we might try XCache or eAccelerator.
Even though only part of the new code is deployed to the production servers, it seems to perform quite well. Some tests on the development server have shown that the overhead of Propel and Savant is not as big as it might seem.
- September 15, 2006 at 4:48 am
nihilist

Is there a diagram showing the layout of the LVS nodes, how memcached was set up, etc? I’m curious as to how the infrastructure is set up.
- September 25, 2006 at 12:55 pm
MikeFM

Instead of using a template engine like Smarty and database abstraction libraries I instead split my applications up into multiple parts. Each component speaks to other components via XML-RPC.

The UI layer processes user inputs into a usable form and outputs. It does templating and a few basic UI related tasks and that’s pretty much it. The UI component passes requests to an application logic component that does all of the real work. In some cases this component will call other components such as our enterprise back-end system, a remote system such as UPS, or a database – again always communicating in XML-RPC.

The code is kept clean and uncomplicated by only using one method to interface between different systems. Systems that don’t speak XML-RPC, such as our enterprise system, are wrapped so that they do. It also makes it very easy to integrate these components into other applications regardless of what language you’re programming them in and of course you have the option to write different components in different languages if you need to. By breaking code into components each component can stay very clean, short, and to the point.

The work required to encode and decode XML-RPC requests is minimal and the different components can be on the same network or server. Your http connections can be reused so that there is no need to open them again for each request. If load gets high you can load balance multiple copies of individual components across multiple machines easily. It’s not bad for security too as you can firewall off back-end systems and databases so that the only possible access to them is through your exposed XML-RPC interface.
- September 26, 2007 at 11:57 pm
Enzo

Maybe I missed it, but are you using Squid?

You should be using Squid to cache before you even hit the webservers. If there’s a cache miss there, you should hit Apache2/Lighttpd. If you have to do any dynamic PHP, then you should hit the APC cache first. If that is a miss and you have to generate a dynamic page using PHP, then hopefully MySQL Query Cache is also being used. Lastly, if all else fails, MySQL will actually have to do a disk read.

The point is that there should be a hierarchy of caches used.
- October 19, 2007 at 1:32 pm