Craigslist hearts Firefox

Standard

Maybe I missed the bus on this and it’s old news, but I haven’t used Craigslist a whole lot. Recently I posted a couple of listings because I wanted to sell some stuff locally and saw a “download firefox” link on Craigslist. Pretty cool.

Craigslist rocks!

Opening searches in a new tab

Standard

search box

When I search I have an inefficient habit: Ctrl-T + google.com + enter + type query + enter.

When I thought about it, I realized my habit exists because when you search using the search box Firefox provides, it opens the results in the current tab, which causes you to lose your place. So, since I had to open a new tab every time anyway, I would do it manually and skip the search box.

There is a pref browser.search.openintab that, when set to true, forces search results from the search box to open in a new tab. Dolske pointed this out to me today in IRC because I didn't feel like Googling it and he was nice enough to help me. Thanks, Dolske.
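
If you'd rather set it once than dig through about:config, you can drop it into the user.js file in your Firefox profile directory (standard pref syntax):

    // user.js -- Firefox applies these to prefs.js at startup
    user_pref("browser.search.openintab", true);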

This helps me because:

  • I typically don’t ever want to leave the page I’m looking at when I’m looking up a definition or term on Google or Wikipedia.
  • To avoid losing my place, I would always Ctrl-T then search.
  • I hardly ever used the search bar because Ctrl-T + google.com + enter + type query + enter was trivial for me; I can do it in about 1.5 seconds.

I will use Google before asking dumb questions. I will use Google before asking dumb questions.

My Rackathon Donation

Standard

The OSU Open Source Lab has been good for open source. In my time there I learned about what makes open source tick, and that it’s not some mysterious cloud of elitist developers, but rather a group of people just like you and me who work hard every day to keep things going and end up doing extraordinary things. Kind of reminds me of where I work now… 🙂

So it wasn’t hard to join the Rackathon and donate some bucks to help support all of the projects they host. It’s the least I can do. If you enjoy the projects they host, send some money their way and help support them.

Every little bit counts.

AMO Code Freeze

Standard

addons.mozilla.org code will be frozen for both the public pages (v2) and the developer control panel (v1) while we work on our planned rewrite of both tools over the next month and a half.

Security issues and major bugs will, of course, still be worked on, but for the most part we would like to focus on knocking this next version out of the park. Join us in #amo at irc.mozilla.org if you have questions about any outstanding bugs you think are of high importance, or if you want to discuss the next version.

More information:

Sometimes moving forward is the same thing as learning better habits.

Testing Habits Are Your Friend

Standard

As I’ve gone farther down the road I’ve learned the value of testing. My first introduction to unit testing was through JUnit in a Java project I worked on last year. Lately there has been a push for testing in PHP web apps, which have traditionally been homegrown in the worst ways, and the community needs to move past the typical “what, it works, shut up” approach to PHP testing.

Not testing is not healthy. Sooner or later you’ll be wrong, which will make you a huge jackass. And nobody likes being that guy. I know I have been on occasion. It sucks, and it can make people second-guess you, which sucks even more down the road.

So cover your ass by making a paradigm shift when it comes to your development habits and approach:

  • Create tests that you know would pass if you wrote your scripts right — as best you can, don’t go for 100% coverage, just get something in place that mimics the typical “yeah okay it works at least, but not quite” once-overs you already do (a quick sketch follows this list)
  • Assume what you’ve written is wrong
  • Run your tests, and see if they work
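
To make the list above concrete, here is a minimal sketch of the "assume it's wrong" loop using PHPUnit conventions. The require line and base class name vary by PHPUnit version, and slugify() is a hypothetical function under test, not anything from a real codebase:

    <?php
    require_once 'PHPUnit/Framework.php';

    // The code under test -- a hypothetical helper.
    function slugify($title) {
        $slug = strtolower(trim($title));
        $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
        return trim($slug, '-');
    }

    class SlugifyTest extends PHPUnit_Framework_TestCase {
        public function testBasicTitle() {
            $this->assertEquals('hello-world', slugify('Hello, World!'));
        }

        public function testEmptyString() {
            // Assume the edge cases are broken until a test proves otherwise.
            $this->assertEquals('', slugify(''));
        }
    }
    ?>

Write the test, assume slugify() is broken, run the suite, and fix whatever falls over.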

I’ve been convinced that this approach to programming is much healthier (thanks Shaver) because it forces people to think before writing the bulk of their code — possibly alleviating problems before they happen. Duh, right? Everybody knows that, right? Well, not everybody does it, and there’s a big difference.

I think that ideally, everybody would create tests for just about everything possible, but I do have some reservations when it comes to that.

For one, sometimes you just don’t have time. This is a terrible excuse, and I guess it depends somewhat on the scope and sensitivity of your project. But, if your project is planned right you should have the time and resources to get in a fair amount of code coverage without jeopardizing your timeline. And, arguably, if you’re already used to a test-oriented approach to development things might actually be faster.

Another thing I’ve tried to identify is when I’m overdoing it (this is more of a fear). So, okay, you want to test your code as much as you can, but there’s a line I wouldn’t want to cross. It’s the line between having a complete and working end product and having an incomplete product with complete and exhaustive tests. In that case, I’d vote to let some of the testing slide, but not all the way, in favor of a more complete product.

The long tail of development can pick up the slack for more exhaustive tests and bug fixes that you ideally would only fix once — write a test for the bug, fix the bug, done. Most of it would probably be doable during alpha or beta releases — it’s what they are for. I’d argue that it’s also more productive during that time because you might have a better knowledge of your app and be in a better position to spot unforeseen problems and write proper tests.

I’ll be honest; for me it’s been a bit of a learning experience. A welcome one, for sure, but frustrating at times because you’re always going to run into “oh shit, my bad” situations when you’re trying to change mindsets and unlearn bad habits. In PHP, I think this is probably a bigger issue than in other languages because it already lacks a bit of structure by nature. There’s also the programmer laziness hurdle to overcome. It’s a big one.

There are some decent testing tools out there that often gather dust, especially in the PHP world. But if PHP is going to break further into enterprise development, I think they will gain in popularity. PHPUnit and SimpleTest are good places to start looking.

So — PHP developers, it’s time to stop being lazy and take a serious look at this stuff. If you get an irritated feeling because I said that, it’s because you’re wrong and you’ve just gotten used to being wrong.

Comfortable and easy doesn’t get you anywhere in the long run.

The Add-ons Landscape

Standard

We’ve been a bit quiet lately because we’ve been cooking up something new. Shortly after April’s re-release of the addons.mozilla.org (AMO) front-end, Mike Shaver came on board. He’s given the project some direction and has been mixing things up a bit with new ideas. He’s helped us focus and move forward on:

  • Drafting policies
  • Gathering more complete requirements for future product releases
  • Making better performance decisions
  • Choosing correct data structures
  • Taking advantage of web frameworks and new technology

We’ve been able to work on all this and still manage to keep things running thanks to community volunteer efforts from people like Mel Reyes, Olive, Nitallica, Alan Starr, Wolf, Pontus Freyhult, Chris Blore, Lupine, Robert Marshall, Giorgio Maone, Mike Kroger, Ed Hume, Daniel Steinbrook, Sethnakht, and Cameron. They have been working hard to review add-ons, work with developers, help users, report bugs, and submit patches. We all owe them a pat on the back.

Another factor has been the addition of badly needed resources:

  • The attention and focus of Shaver, who genuinely cares about our project
  • Additional staff to help organize and speed up development — myself, Wil Clouser (clouserw), Andrei (sancus)
  • New developers and volunteers stepping up, like Cameron and fligtar (Justin Scott)
  • The ongoing support and direction of Mike Schroepfer (schrep)

With the next major release of Firefox and Thunderbird around the corner, one thing was certain for addons.mozilla.org — change. Lots and lots of yummy change.

Shaver coined “Remora”, the shiny-sexy codename of AMO v3. We’ve set up a public wiki where we’ve gathered and cleaned up most of your requirements, and posted a project schedule. We even had one of our volunteers create an image for us (it’s giving birth to add-ons!):

Remora, aka the suckerfish

Our goals are clear:

  • Make finding and installing add-ons easier
  • Support localization of site pages and data
  • Reduce and simplify design and layout with a fresh new look
  • Take full advantage of our new hardware resources
  • Provide better support through forums and threaded discussions
  • Develop with a test-oriented mindset for a more robust and mature application
  • Revitalize and streamline our review process to ensure the quality of add-ons

All this should add up to a common goal: extend Mozilla products to make the web better. Period.

So things are looking up! Please read the wiki and join us in IRC if you have ideas or want to participate in the project:

We are looking for help with l10n support. If you are a translator, please find us in IRC! Thanks.

Scalable PHP with APC, memcached and LVS (Part 2)

Standard

In part 1 of this post I talked about some of the challenges we encountered when trying to scale a LAMP application. It’s pretty much what you’d read on danga’s memcached site, just dumbed down.

So after some discussion, caffeine, and Googling, you’ll probably conclude that you need:

  • Memcached!
  • An internal cache to speed up and optimize PHP across requests.
  • To continue to find ways to slim down your application.
  • More caffeine.

I had originally intended this post to be a summary of test results, but I am beginning to realize that what you get out of apache bench or httperf isn’t really as important as how much thought you put into your application. If you think about it, all of these perf tests are just trying to quantify something qualitative, and the tests themselves are nowhere near as important as how you get there.

So instead of showing a lot of Excel or Omnigraffle graphs that won’t help you very much, I’d much rather spend this time talking about the process. That way, you might be able to learn from our mistakes, and not make them yourself.

We worked with the Mozilla infra team to put a lot of thought into this application, and that is what really made the biggest impact. In the end, the big win is just snappy pages in production — and we’ve achieved that. And since I’m a massive tool, I’ll draw a comparison between scalability and basketball.

For one, it takes teamwork and unselfishness to succeed. You need the sysadmins involved with the application developers from an early point, because they always ask the right questions — and often the obvious ones developers miss. You need good coaches who know the game and can direct play at both ends of the court. And after all is said and done on the performance side of things, you need your fans — the community — to gauge your overall success.

You hope along the way that when the game’s over, the score is in your team’s favor and the fans are cheering.

So when you’re planning your app, the best thing you can do is minimize your code by not including massive libraries or classes. Not to knock PEAR or overgeneralize, but any time you include a PEAR class, you have to be very careful. PEAR classes are often bloated and written to “just make it work”. They work well for your blog or some weekend project, but if you need serious performance, including a PEAR class is typically a bad decision.

Includes in PHP are a bit like interest rates — it may seem like a small sacrifice to just include something, but over time and over a lot of requests, it can amount to a huge loss. Imagine if you had a 1% fee every time you hit the ATM. It seems like a minor sacrifice, just 1%, but everybody knows you’d lose a lot over time. So why give up 1% over millions of PHP transactions? You should follow some simple rules when dealing with PHP includes in your application:

  • Make your includes modular. You should allow yourself the ability to mix and match includes or class definitions. Some may have dependencies, and that’s fine, but you shouldn’t limit yourself by making everything dependent on your DBI, for example. Think about what you’d do if you had a page that didn’t pull from the DB, and how your app would serve it up.
  • Use only what you need. It’s easy to throw everything into one init script, but you should only include what your page actually needs to compile. It’s like importing java.util.* instead of just java.util.List. Doesn’t make sense.
  • Make the most use of what PHP has to offer built-in, and when that fails, write your own wrappers if PECL doesn’t already have a solution. If you’re adventurous and have C experience, you could write your own PHP extension to avoid run-time compilation of common functions. We didn’t necessarily need to do this, but you might consider it if you have a specific need that isn’t addressed with any available PECL extension.
  • Ask yourself if you really need DB layer abstraction. DBIs are great, but they are also huge. PEAR::DB is massive, and if your app isn’t going to be ported to other RDBMSes, you should really consider writing your own wrapper for the mysql/mysqli functions built into PHP (see the sketch after this list). In my experience, people hardly ever switch their DB layer, and even if they did, a clear and concise DB class is easy to swap out anyway. Abstraction here isn’t worth the overhead.
  • Ask yourself if you really need a template language with more rules and more syntax to mess up. PHP itself is a scripting language made to write web pages — so how much sense does Smarty make? Having been down the Smarty path, I’ve given it a shot, and I don’t think it’s worth it to replicate everything PHP does. If you’re doing it for design purposes, PHP syntax is already pretty simple, and most WYSIWYG editors have built-in escaping/tokenization for PHP’s syntax. If you’re using Smarty for separation of your view component, you can do the same thing in just PHP using app logic. And if you’re doing it so you can cache or precompile output, you’re just duplicating what memcached and APC would already offer you. If we could do it again, Smarty would not be worth the performance loss. So be wary of templating languages in such a high-level language. It’s usually a lose-lose.
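
As mentioned in the DB abstraction point above, here is a minimal sketch of a thin wrapper over PHP's built-in mysql_* functions, as an alternative to pulling in all of PEAR::DB. The class is illustrative, not the one AMO uses, and error handling is deliberately skipped:

    <?php
    // A thin DB class wrapping the mysql_* extension directly.
    class DB {
        private $conn;

        public function __construct($host, $user, $pass, $name) {
            $this->conn = mysql_connect($host, $user, $pass);
            mysql_select_db($name, $this->conn);
        }

        // Run a query and return all rows as associative arrays.
        public function fetchAll($sql) {
            $result = mysql_query($sql, $this->conn);
            $rows = array();
            while ($row = mysql_fetch_assoc($result)) {
                $rows[] = $row;
            }
            return $rows;
        }

        // Escape user input for safe inclusion in a query.
        public function escape($value) {
            return mysql_real_escape_string($value, $this->conn);
        }
    }
    ?>

Fifty lines you fully understand beat five thousand you don't, and if you ever do switch databases, a class this small is easy to port.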

At the app level, before you even get into server configuration or caching, you need to avoid violating the rules above. In our journey with addons.mozilla.org (AMO) we made some interesting group decisions a year ago that we regretted later:

PEAR::DB was unnecessarily large, and Smarty is just not worth it — it confuses the issue and redoes things PHP is already good at, using arbitrarily complicated syntax. Any quick run-through with something like the Zend Profiler or APD will tell you how much of a dog these things can be. If you haven’t already, I highly recommend profiling your app to see where you’re losing performance — I bet it’s mostly in includes.

For caching, we looked at:

  • Page/output caching
  • Internal caching / optimization
    • phpa (meh, and turning into a proprietary solution — double meh)
    • APC 3.0.10 (w00t)
    • A handful of other outdated and lesser internal PHP caches

For external caching, the clear choice was memcached. Designed for and proven on LiveJournal.com, it is a pretty standard way to provide key-based caching of any serializable data. It has APIs for almost every language used in web development, so it was an easy choice. It gave the other caching methods an ass-whooping.
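
For the curious, here is roughly what using it looks like from PHP with the PECL memcache extension. The keys, hosts, and helper function here are hypothetical:

    <?php
    // Page-fragment caching with the PECL memcache client.
    $cache = new Memcache();
    $cache->addServer('10.0.0.1', 11211);
    $cache->addServer('10.0.0.2', 11211);

    $key  = 'addon_list_popular';
    $html = $cache->get($key);

    if ($html === false) {
        // Cache miss: do the expensive DB work, then store for 5 minutes.
        $html = build_popular_addons_page(); // hypothetical helper
        $cache->set($key, $html, 0, 300);
    }

    echo $html;
    ?>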

Based on user comments on my previous post, we punted phpa and went with APC 3.0.x, and we liked the results. Initially, using the default settings in apc.ini, we saw some performance losses. After some tweaking, though, APC showed about a 40% improvement over the antiquated phpa. Just make sure to read the INSTALL doc. 🙂
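
If you are tuning it yourself, a hypothetical starting point for apc.ini might look like this. The values are illustrative, not the ones we ended up with:

    ; apc.ini -- read the INSTALL doc for what each directive really does
    extension = apc.so
    apc.enabled = 1
    apc.shm_size = 64   ; shared memory for the opcode cache, in MB on 3.0.x
    apc.ttl = 7200      ; how long unused entries may linger before eviction
    apc.stat = 1        ; set to 0 in production to skip per-request stat() calls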

AMO currently runs on a handful of memcached instances, feeding multiple LVS nodes configured to use APC 3.0.10. We can now easily handle release traffic and during peak hours the web app doesn’t even break a sweat. The database bottleneck is history.

So we are happy with the results, but they were achieved using methods that are still less than ideal. There are so many more things we can do to improve performance:

  • Remove SSL for non-sensitive pages
  • Remove PEAR::DB and Smarty so when pages are compiled and set in the cache it is less expensive
  • Move away from page-level caching and get into object-level caching, replacing DB calls with queries against memcached
  • Improve the memcache implementation in the app to be truly decentralized with fallback. Currently it does not map a given key to a particular server. We still need to add a key->server hash so the app knows which server to try first for each key (see the sketch after this list). The trick there then becomes failover combined with the hash — so the app could learn which server to hit if the first choice wasn’t available and remember that choice. That is an interesting challenge in a stateless environment.
  • Make certain high-load pages purely static and avoid PHP altogether.
  • Additional tweaks and Apache config changes to improve performance.
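
Here is a naive sketch of the key->server mapping mentioned in the list above: a simple modulo hash with linear failover, not the consistent hashing you would eventually want (adding a server remaps most keys):

    <?php
    $servers = array('10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211');

    function pick_server($key, $servers, $down = array()) {
        $n = count($servers);
        $start = abs(crc32($key)) % $n;
        // Try the preferred server first, then walk the list as failover.
        for ($i = 0; $i < $n; $i++) {
            $candidate = $servers[($start + $i) % $n];
            if (!in_array($candidate, $down)) {
                return $candidate;
            }
        }
        return null; // every server is down
    }

    // pick_server('addon:1865', $servers) always returns the same server
    // for that key, so cache hits stay warm across web nodes.
    ?>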

Overall, I have to say it was a great ride and a good learning experience playing with these tools. Working with everyone on this was an exercise in open source development, and it showed us that with the right open source tools you can build some pretty decent enterprise-level web apps, performance-wise. I hope that in reading this, you pick up a few things you can use in your next project. If you have any comments or suggestions, I’d like to hear them.

Don’t just learn as much as you can from what others have tried — write and talk about it too.

AMO v2

Standard

The public rewrite of AMO was released today.

Fixed in this release:

  • Stuff
  • More stuff
  • Scalability
  • Other stuff

No, but seriously, you might find that your bookmarks are a little off, or _____. If so, find us on IRC in #umo at irc.mozilla.org and let us know — we aren’t at the “file a bug if it’s broke” stage yet.

Thanks to Wil Clouser, who worked on a large portion of the updates. Thanks to everyone who pointed out bugs and helped us fix stuff, and thanks most importantly to the reviewers who help us keep this ship afloat.

Speaking of which, please take some time to review our draft policy that we have been working on!

Better late than never, and better late than totally crappy.

Scalable PHP with phpa, memcached and LVS (Part 1)

Standard

One of the major pitfalls of the PHP LAMP stack is that run-time compilation does not scale very well. An experienced developer will lament PHP bloat and the other drawbacks that come with the ease and convenience of using a scripting language as the foundation of an application.

Over the last month or so, I have been working on creating a better version of addons.mozilla.org (AMO). It is a task that I enjoy, mostly because it isn’t every day you get the opportunity to work on a project that touches so many people and means so much.

My journey is similar to the one taken by Drupal, phpBB, or MediaWiki developers. Given a great PHP product, you not only have to fulfill usability requirements, you also have to make the thing work for an insanely large user base. Along the way I have found an appreciation for the delicate balance between usability/features/design and the pain of having to maintain them. It’s the line between being a designer or marketing droid and being a core developer and a sysadmin.

I’ve learned that, in any successful project, you need to have someone who understands both sides. If you don’t, you run the risk of having one of two things:

  • A great-looking website that meets your users’ needs but doesn’t stay up and is not secure
  • A fast and secure web app that nobody uses

A successful project usually lands somewhere in between. It’s a compromise between the best performance and the usability and flexibility most people demand, whether they are users or administrators.

So when you’re faced with circumventing PHP bloat, you go through predictable steps:

  • Try to run your site as-is, but then realize it fails miserably because of the database bottleneck
  • Duplicate your database, splitting it between read/write and read-only — add a slave or multiple slaves via RRDNS to serve the bits you need (see the sketch after this list)
  • When this fails, you start looking for output caching mechanisms that give your database a break
  • You then realize that full output caching is unrealistic because your cache hits are not high enough
  • You find yourself stuck, looking for answers — and you find ways to deliver features with the hardware you have, without making any large-scale app-level changes
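
For illustration, the read/write split from the second step above can be as simple as this sketch. The hostnames and credentials are hypothetical, and round-robin DNS would replace the array_rand() here:

    <?php
    // Send writes to the master, reads to a randomly chosen slave.
    function get_db($write = false) {
        static $master = null, $slave = null;

        if ($write) {
            if ($master === null) {
                $master = mysql_connect('db-master.example.com', 'amo', 'secret');
            }
            return $master;
        }
        if ($slave === null) {
            $slaves = array('db-ro1.example.com', 'db-ro2.example.com');
            $slave  = mysql_connect($slaves[array_rand($slaves)], 'amo', 'secret');
        }
        return $slave;
    }

    // reads:  mysql_query($sql, get_db());
    // writes: mysql_query($sql, get_db(true));
    ?>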

When you’re at this point, there are two solutions:

  • Optimize PHP somehow to reduce redundant compilation between requests
  • Use some sort of granular caching to give your DB a break

In the next part of this post, I’ll talk about implementing phpa/memcached across an LVS cluster and what a difference it made compared to alternative solutions. I will post graphs showing the perf gains for AMO, and explain why we used phpa and memcached as the final solution.

Building scalable applications is challenging, but you’re not alone.

The Danger of Mediocrity

Standard

Addons make the difference between a regular web browser and a browser that just got “pimped” (you can pimp your email client, too :D). Having a wide variety of extensions and themes helps developers and product managers focus on the core features the majority of users will need, without having to simply say, “sorry, nope, can’t do that”.

Okay, so maybe your average extension isn’t going to turn your world upside-down the way West Coast Customs spending a week with your Honda would… fine. But addons for and by the people are a vital and important bridge between a non-bloated application core and end users who want personalized functionality.

Once we understand the importance of addons, and the role addons.mozilla.org (AMO) plays for Mozilla products, we can start to focus on improving it.

So it’s great to see healthy discussions about the future of the service, and how it should be modified to ensure we ship addons with a level of quality comparable to Firefox or Thunderbird.

I recently read David Baron‘s excellent post about the danger of extensions and how an undersupported and relatively unregulated addon service could mean disaster for Mozilla products and Mozilla’s reputation in the community.

To say the least, I am now filled with uncertainty and doubt.

It’s not about the service itself, though, or whether the community would be able to recover from the first security hole found in an extension, or how extensions sit outside the normal Mozilla review standards.

I’ve got FUD about whether these products can thrive if they cannot change and adapt quickly to new trends, web services, and client-side tools available in other browsers.

Despite the theoretical holes in AMO, it’s there, it’s important, and it’s popular — for better or worse. It has many great extensions, some good ones, and many poor ones as well. It’s a distribution as diverse as the community, filled with the good, the bad, and the ugly.

And the dangerous? Maybe. David’s points are valid — I share his concerns as well — but the assumption that problems with extensions will translate into plummeting consumer confidence in an application isn’t necessarily such a straight line. The risks we take with AMO also need to be weighed against consumer confidence in a product that didn’t adapt and offer unique features not found anywhere else.

It’s clear — we’ll have to find the solution halfway. We need to improve the overall quality of addons by improving the review process, and making moves to encourage the same openness and exposure Firefox and Thunderbird get. Most of these changes start with an improved (and solidified) policy, which is currently a major focus.

The technical solution could involve CVS integration, stricter review requirements, or Bugzilla integration. Ideally, we would have all of those, and everybody would develop, test, and verify quality for their own extensions the way Mozilla does for Firefox and Thunderbird.

That all sounds great. Now, who’s going to do it? What would the practical schedule be for a new addon? A new extension version? When does the process of controlling the processes become simply overwhelming?

While I wish it were, I have to say the complete Mozilla review process isn’t a perfect fit for addons. It would delay addon releases, create a barrier between developers and the community, and create a lot of additional work for addon developers — who would most likely be satisfied with keeping their extension simply on their own site, in their own repository, with their own bug reporting software.

So how do we properly review this stuff, then? Myk Melez brought up some good ideas about a modified rating system to gauge the overall “trustability” of a given extension. I thought his approach would be a good one given the unique nature of the addon life cycle:

(Our current system) has a number of problems:

  1. it ignores other trust metrics that would refine our sense of each extension’s trustworthiness;
  2. there have never been enough reviewers, so extensions often wait days or weeks to get reviewed;
  3. generally only one reviewer reviews each extension, increasing the risk of human error;
  4. reviewers review extensions they aren’t personally interested in, so they exercise them less thoroughly than the ordinary users they approve them for;
  5. there’s little reward to being a reviewer, and the downside (being held responsible for approving a harmful extension) is significant.

An alternative approach is to publish each extension to the site the moment it is submitted but hide it from the majority of users until it reaches a trustworthiness threshold which takes into account all available trust metrics, including user feedback, code review, whether it is open-source, whether it is signed with a valid code-signing certificate, the trust score for other extensions by the same author, etc.

Until then, only specially designated “tester” users who have been apprised of the dangers and are willing to face them can access the extension. These testers, a much larger group than the current pool of reviewers, will provide feedback on such extensions like ordinary users, and that feedback will serve as one of the trust metrics through which an extension can reach the trustworthiness threshold required to make it available to the rest of the users.

This approach has several benefits over the current approach:

  1. testers try out extensions they’re interested in using, so they exercise them more thoroughly;
  2. multiple testers must look at and provide generally positive feedback for each extension before it satisfies its “user feedback” metric, so no one tester is solely responsible for deciding if an extension is good;
  3. each extension’s trust score comprises multiple trust metrics, increasing the quality of our decisions about each extension’s trustworthiness.

And, I strongly suspect, the mean time to publication will decrease significantly.

Myk’s idea is basically to properly weight the different metrics unique to the AMO process to determine whether or not an extension is trustworthy. It’s like an improved Digg mentality with a bit of moderation. While there is definitely more discussion needed in the next month, this would be a great place to start.
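
To make the idea concrete, here is a sketch of how a weighted trust score might be computed. The metric names, weights, and threshold are all hypothetical, not anything AMO has decided on:

    <?php
    // Combine several normalized metrics (each 0.0-1.0) into one score.
    function trust_score($metrics) {
        $weights = array(
            'tester_feedback' => 0.35,
            'code_review'     => 0.25,
            'author_history'  => 0.20, // trust earned by the author's other add-ons
            'open_source'     => 0.10,
            'signed_cert'     => 0.10,
        );
        $score = 0.0;
        foreach ($weights as $metric => $weight) {
            $value  = isset($metrics[$metric]) ? $metrics[$metric] : 0.0;
            $score += $weight * $value;
        }
        return $score;
    }

    $extension = array(
        'tester_feedback' => 0.7, // positive tester feedback so far
        'code_review'     => 0.0, // not reviewed yet
        'author_history'  => 0.6,
        'open_source'     => 1.0,
        'signed_cert'     => 0.0,
    );

    // 0.465 < 0.5, so this one stays visible to testers only for now.
    $visible = trust_score($extension) >= 0.5;
    ?>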

Sometimes pushing the envelope can give you a paper cut.