The Add-ons Landscape

We’ve been a bit quiet lately because we’ve been cooking up something new. Shortly after April’s re-release of the addons.mozilla.org (AMO) front-end, Mike Shaver came on board. He’s given the project some direction and has been mixing things up a bit with new ideas. He’s helped us focus and move forward on:

  • Drafting policies
  • Gathering more complete requirements for future product releases
  • Making better performance decisions
  • Choosing correct data structures
  • Taking advantage of web frameworks and new technology

We’ve been able to work on all this and still manage to keep things running thanks to community volunteer efforts from people like Mel Reyes, Olive, Nitallica, Alan Starr, Wolf, Pontus Freyhult, Chris Blore, Lupine, Robert Marshall, Giorgio Maone, Mike Kroger, Ed Hume, Daniel Steinbrook, Sethnakht, and Cameron. They have working hard to review add-ons, work with developers, help users, report bugs and submit patches. We all owe them a pat on the back.

Another factor has been the addition of badly needed resources:

  • The attention and focus of Shaver, who genuinely cares about our project
  • Additional staff to help organize and speed up development — myself, Wil Clouser (clouserw), Andrei (sancus)
  • New developers and volunteers stepping up, like Cameron and fligtar (Justin Scott)
  • The ongoing support and direction of Mike Schroepfer (schrep)

With the next major release of Firefox and Thunderbird around the corner, one thing was certain for addons.mozilla.org — change. Lots and lots of yummy change.

Shaver coined “remora“, the shiny-sexy codename of AMO v3. We’ve set up a public wiki where we’ve gathered and cleaned up most of your requirements, and posted a project schedule. We even had one of our volunteers create an image for us (it’s giving birth to add-ons!):

Remora, aka the suckerfish

Our goals are clear:

  • Make finding and installing add-ons easier
  • Support localization of site pages and data
  • Reduce and simplify design and layout with a fresh new look
  • Take full advantage of our new hardware resources
  • Provide better support through forums and threaded discussions
  • Develop with a test-oriented mindset for a more robust and mature application
  • Revitalize and streamline our review process to ensure the quality of add-ons

All this should add up to a common goal: extend Mozilla products to make the web better. Period.

So things are looking up! Please read the wiki and join us in IRC if you have ideas or want to participate in the project:

We are looking for help with l10n support. If you are a translator, please find us in IRC! Thanks.

Enum to Int Conversion in MySQL

Thought I’d post this since it was interesting. If you convert an enum to an int (don’t ask) be aware of the following conversion:

mysql> create table foo ( bar enum('0','1') not null default '0' );
Query OK, 0 rows affected (0.01 sec)

mysql> insert into foo values('0');
Query OK, 1 row affected (0.00 sec)

mysql> insert into foo values('1');
Query OK, 1 row affected (0.00 sec)

mysql> select * from foo;
+-----+
| bar |
+-----+
| 0   |
| 1   |
+-----+
2 rows in set (0.00 sec)

mysql> alter table foo change bar bar tinyint(1) not null default 0;
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from foo;
+-----+
| bar |
+-----+
|   1 |
|   2 |
+-----+
2 rows in set (0.00 sec)

MySQL will assign int values corresponding to the non-zero-index of the enum. So, after the migration/conversion you can then use:

mysql> update foo set bar=bar-1;

Or, if you had a more complex enum you could update all corresponding indexes with their correct int values. At least they remain unique so you can adjust.

SQL is fun. For me to poop on.

Scalable PHP with APC, memcached and LVS (Part 2)

In part 1 of this post I talked about some of the challenges we encountered when trying to scale a LAMP application. It’s pretty much what you’d read on danga’s memcached site, just dumbed down.

So after some discussion, caffeine and Googling, you’ll probably end up knowing you’ll need:

  • Memcached!
  • An internal cache to speed up and optimize PHP across requests.
  • To continue to find ways to slim down your application.
  • Get more caffeine.

I had originally intended this post to be a summary of test results, but I am beginning to realize that what you get out of apache bench or httperf isn’t really as important as how much thought you put into your application. If you think about it, all of these perf tests are just trying to quantify something qualitative, and the tests themselves are nowhere near as important as how you get there.

So instead of showing a lot of Excel or Omnigraffle graphs that won’t help you very much, I’d much rather spend this time talking about the process. That way, you might be able to learn from our mistakes, and not make them yourself.

Together with the Mozilla infra team we worked together to put a lot of thought into this application, and that is what really made the biggest impact. In the end, the big win is just snappy pages in production — and we’ve achieved that. And since I’m a massive tool, I’ll draw a comparison between scalability and basketball.

For one, it takes teamwork and unselfishness to succeed. You need the sysadmins to be involved with the application developers from an early point, because they always ask the right questions — and often times the obvious ones developers miss. You need good coaches who know the game, and can direct on both sides of the coin. And after all is said and done on the performance side of things, you need your fans — the community — to gauge your overall success.

You hope along the way that when the game’s over, the score is in your team’s favor and the fans are cheering.

So when you’re planning your app, the best thing you can do is minimize your code by not including massive libraries or classes. Not to knock PEAR or overgeneralize things, but anytime you include a PEAR class, you have to be very careful. PEAR classes are often times bloated and written to “just make it work”. They work well for your blog, or some weekend project, but if you need some serious performance, including a PEAR class is typically a bad decision.

Includes in PHP are a bit like interest rates — it may seem like a small sacrifice to just include something, but over time and over a lot of requests, it can amount to a huge loss. Imagine if you had a 1% fee every time you hit the ATM. Seems like a minor sacrifice, it’s just 1%, but everybody knows that you’d lose a lot over time. So why would you give up 1% over millions of PHP transactions? You should follow some simple rules when dealing with PHP includes in your application:

  • Make your includes modular. You should allow yourself the ability to mix-and-match includes or class definitions. Some may have dependencies, that’s fine, but you shouldn’t limit yourself by making everything dependent on your DBI, for example. You should think about what you’d do if you had a page that didn’t pull from the DB, and how your app would serve it up.
  • Use only what you need. It’s easy to throw everything into one init script, but you should only include what your page actually needs to compile. It’s like importing java.util.* instead of just java.util.List. Doesn’t make sense.
  • Make the most use of what PHP has to offer built-in, and when that fails, write your own wrappers if PECL doesn’t already have a solution. If you’re adventurous and have C experience, you could write your own PHP extension to avoid run-time compilation of common functions. We didn’t necessarily need to do this, but you might consider it if you have a specific need that isn’t addressed with any available PECL extension.
  • Ask yourself if you really need DB layer abstraction. DBI’s are great, but hey are also huge. PEAR::DB is massive, and if your app isn’t going to be ported to other RDBMS’s, then you should really consider using your own wrapper for the mysql/mysqli functions built-in to PHP. In my experience, people hardly ever switch their DB layer over, and even if they did, if you write a clear and concise DB class, it is easy to switch out anyway. Abstraction here isn’t worth the overhead.
  • Ask yourself if you really need a template language with more rules and more syntax to mess up. PHP itself is a scripting language made to write web pages — so how much sense does Smarty make? Having been down the Smarty path, I’ve given it a shot, and I don’t think it’s worth it to replicate everything PHP does. If you’re doing it for design purposes, PHP syntax is already pretty simple, and most WYSIWYG editors have built-in escaping/tokenization for PHP’s syntax. If you’re using Smarty for separation of your view component, you can do the same thing in just PHP using app logic. And if you’re doing it so you can cache or precompile output, you’re just duplicating what memcached and APC would already offer you. If we could do it again, Smarty would not be worth the performance loss. So be wary of templating languages in such a high-level language. It’s usually a lose-lose.

At the app level, before you even get into server configuration or caching, you need to avoid violating the rules above. In our journey with addons.mozilla.org (AMO) we made some interesting group decisions a year ago that we regretted later:

PEAR::DB was unnecessarily large, and Smarty is just not worth it — it confuses the issue and redoes things PHP is already good at using arbitrarily complicated syntax. Any quick run through with something like the Zend Profiler or APD will tell you how much of a dog these things can be. If you haven’t already, I highly recommend profiling your app to see where you’re losing performance — I bet it’s mostly in includes.

For caching, we looked at:

  • Page/output caching
  • Internal caching / optimization
    • phpa (meh, and turning into a proprietary solution — double meh)
    • APC 3.0.10 (w00t)
    • A handful of other outdated and lesser internal PHP caches

For external caching, the clear choice was memcached. Used and designed for LiveJournal.com, it is a pretty standard way to provide key-based caching of any serialized data. It has APIs for almost every language used in web development, so it was an easy choice. It gave the other caching methods an ass whooping.

Based on user comments in my previous post, we punted phpa and went for APC 3.0.x and we liked the results. Initially, using the default settings in APC.ini, we faced some performance losses. After some tweaking, though, APC showed about a 40% increase over the antiquated phpa. Just make sure the read the INSTALL doc. :)

AMO currently runs on a handful of memcached instances, feeding multiple LVS nodes configured to use APC 3.0.10. We can now easily handle release traffic and during peak hours the web app doesn’t even break a sweat. The database bottleneck is history.

So we are happy with the results, but they were achieved using methods that are still less than ideal. There are so many more things we can do to improve performance:

  • Remove SSL for non-sensitive pages
  • Remove PEAR::DB and Smarty so when pages are compiled and set in the cache it is less expensive
  • Move away from page-level caching and get into object-level caching to replace DB calls with queries against memached.
  • Improve the memache implementation in the app to be truly decentralized with fallback. Currently it does not map a set key with a particular server. We still need to add a key->server hash so the app knows which server to try first per key. The trick there then becomes failover combined with the hash — so the app could learn which server to hit if the first choice wasn’t available and remember that choice. That is an interesting challenge in a stateless environment.
  • Make certain high-load pages purely static and avoid PHP altogether.
  • Additional tweaks and Apache config changes to improve performance.

Overall, I have to say it was a great ride and a good learning experience playing with these tools. Working with everyone on this was an exercise in open source development, and it showed us that with the right open source tools you can make some pretty decent enterprise-level web apps performance-wise. I hope that in reading this, you pick up a few things you can use in your next project. If you have any comments or suggestions I’d like to hear them.

Don’t just learn as much as you can from what others have tried — write and talk about it too.