Friday, April 22, 2011

Captain Destructo Breaks Everything

Alternate title ideas: "It's not all s/Tryserver/Try"  or "What I should have done, and didn't"

I bet you get the point by now. Today I caused a fairly lengthy, unnecessary downtime on Try. Now that I'm writing this, things are under control again; there are a few small niggly bits left, but nothing that will keep me up at night.

It all started with a bug about graphserver posts from tests not getting through. The posts used MozillaTry (the tinderbox name for Try), but the graphserver only knew about Tryserver (the branch name for Try). Nothing was using plain Try (except the repo for Try), which is what everything ought to have been using in the first place.

Since I've been adding a lot of project branches in a short amount of time, that setup has become more streamlined, so I felt the best option was to go through and rename Tryserver/MozillaTry to Try everywhere, so that from the repo onward everything used the same name. This consistent naming has been working extremely well for our project branches and makes setup a snap.

Here's where it all broke. I approached this bug with a quick, superficial swipe and ended up causing some preventable burning. I shall now list for you (and future me) what I did and what I should have done:

Did:
  • hg rename on configs for desktop
  • branch configs for s/tryserver/try
  • updated graphserver branch name to Try
  • a quick downtime window from 10am - 12pm in order to prevent builds from getting split into two different upload dirs
Should have done:
  • hg rename on configs for mobile
  • grep of buildbotcustom for "tryserver" as we have special casing for it in several files
  • checked the log uploader and post_upload scripts to make sure everything about the try build was going to the right place
  • updated the dir permissions on ftp for the new upload location and ensured that the archive is on the nfs mount
  • edited cronjobs on staging to catch the new try builds
  • updated graphserver machines table for each try platform's builder name
  • given more notice for the downtime, with a 4-hour window that would have allowed a test push to make sure everything was wired up correctly
  • updated the treeclosure hook to include the new tinderbox page
Some of the things I should have done had no impact on the burning/try closure, but it's fair to say that if I had done a staging round of all my plans first, I would have caught more of the obvious things I missed. I would then have planned the downtime better and been prepared to keep the disturbance minimal, since this was, after all, a really low-priority bug.

Aki told me that he had a manager who said "you don't learn 'til you break something". Well, I broke everything try-related today, and here's hoping I have learned something, because the stress of this whole day is not something I want to experience often. It's that feeling you get when you realize you've started something you can't back out of, and there's no way to go but forward, even though everything in front of you now looks hopeless and messy.


So here are some lessons to take away:

  1. Staging is not to be underestimated, even for just renaming things that are already working
  2. Taking the time to search with grep/mxr for the terms you are replacing, before starting the upgrade in production, will help find wiring you might have overlooked in your preparations
  3. Prepare more thoroughly: have a clear idea of the environment you started in and what it will take to have that environment back when you're done. Leaving dangly bits is not ideal.
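As a rough illustration of lesson 2, here is a minimal Python sketch of that kind of search: it walks a checkout and lists every line that still mentions one of the old names. The two patterns are the names from this post; the function name `find_old_names` and the choice to skip dot-directories (such as `.hg`) are my own assumptions, and plain `grep -rni tryserver` or mxr does the same job.

```python
import os
import re

# Old names being retired (from this post); add any other spellings you replace.
OLD_NAMES = re.compile(r"tryserver|mozillatry", re.IGNORECASE)


def find_old_names(root, pattern=OLD_NAMES):
    """Return (path, line_number, line) for every line under root matching pattern."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .hg or .git so repo metadata is not searched.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if pattern.search(line):
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                # Unreadable files (permissions, broken symlinks) are skipped.
                continue
    return hits
```

Pointing it at a buildbotcustom checkout before the downtime, and printing each hit as `path:line: text`, is the kind of check that surfaces special casing like the "tryserver" cases in the list above.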


Happy Friday.
(and many thanks to Aki)

Wednesday, March 23, 2011

Hey BBC would you like to know how releasing software works?

Dear BBC,

Today on the front page of your technology section you said that downloads for Firefox 4 have been lower than they were for Firefox 3 and that:
The lower figure may be explained by the widespread availability of pre-release versions of Firefox 4 in the months ahead of its launch.

First of all, you forgot that we've had 3.5 and 3.6 between those two, so our users are now spread out across versions. Second, here's an overview of how we're organizing the release of Firefox 4:

  • We put out the RC and picked up users from outside of our usual beta testing pool in order to give our final candidate some solid tire kicking
  • Firefox 4 went live, but our users on 3.5 and 3.6 are not offered the update automatically yet; they must "Check for updates" in order to be asked if they want to upgrade to Firefox 4
  • Once the new release has had a couple of weeks of wider coverage and we are even more confident that we've got an amazing browser out there, we will turn on the Major Update notification, which will offer our 400+ million users the chance to come on up and experience the next level of the web

According to W3Schools' stats (which are measured by visits to their site), the browser distribution of their visitors looks like this:


Browser share, 2011:
Month      Internet Explorer  Firefox  Chrome  Safari  Opera
February   26.5%              42.4%    24.1%   4.1%    2.5%
January    26.6%              42.8%    23.8%   4.0%    2.5%

Firefox share by version, 2011:
Month      Firefox total  FF 4.0  FF 3.6  FF 3.5  FF 3.0  FF other
February   42.4%          1.9%    35.8%   2.9%    1.5%    0.3%
January    42.8%          1.5%    36.1%   3.1%    1.7%    0.4%

What this says to me is that our more than 8 million downloads since yesterday morning (PDT) only show how many people are paying attention to the fact that Firefox 4 has launched and is available for download. They are not representative of our 400+ million active daily users (the people who just use the browser but perhaps don't read your blog or mine). These people will soon learn about Firefox 4 through their browser's update notification window. We'll be seeing a spike in downloads in a couple of weeks, and I hope you'll report on that.

Wednesday, February 23, 2011

Bay Area Video Coalition - Teaching Open Video Part 1

Last night was the first meeting of The Factory and Mozilla. The partnership is a result of work between Mozilla Drumbeat and Web Made Movies. Ben Moskowitz and Brett Gaylor invited Atul Varma and me to what will be the first of three sessions helping teens learn about the budding open web technologies that can be integrated with video.

About 13 kids streamed into a computer lab at 4:30pm, and we began with introductions covering who we were and why this stuff is interesting. The group was very engaged and eager to dive right in with whatever we had for them. We started with Atul's Web X-Ray Goggles so the students could see exactly what the web was made of. The idea was to grab parts of whatever websites you liked and change them right on the page, so you could see how easy it is to "hack" the web. Some of the teens went even further and started building their own pages in Etherpad by grabbing snippets of code from sites. About 4 kids said they had done a View Source on a page before this class, and 30 minutes into the workshop they were all doing it like pros.

Once they had a chance to remix a web page, we moved on to the next exercise. The teens picked 4 popular sites (Etherpad democracy!), those sites were printed out on paper, and the teens were split into 3 teams, each of which paper-prototyped a new site using elements from the originals. I was very impressed with how the students took the idea and ran with it. Each team worked fast (they had 7 minutes) and no one hung back keeping their opinions to themselves. The teams produced 3 new site mock-ups, each with a very simple look and a video as the prominent element on the page, but they also took time to add site navigation and social media integration by putting Facebook and Twitter in the sidebar or footer.

In the last 30 minutes of our time, Brett and Ben demonstrated Butter, explaining how very "caveman" the technology is right now. With only a glance at the interface and a basic explanation of how it's wired up, the teens jumped right in with suggestions and ideas about what they would like to be able to do:
* Hide popcorn elements when nothing is showing in them
* Be able to zoom in or out on Google maps while the video is playing
* Click on something in the video (example: coffee mug) and have it trigger an event

Ben made a really great point about how it's also important to look at something like Butter and ask, "How can you go beyond the interface?" How do you make your story more interesting from the beginning, knowing you can use this tool throughout, instead of just tacking events and additional information onto a completed video made in a standard format?

We'll be working with them again tonight, with chunks of a film they made last summer called "The List". More updates and more potential bugs and feature requests coming soon.

Also, if anyone is planning to teach a class like this you might want a few things in a "kit":
* Portable printer (and paper)
* Scissors
* Tape or glue
* Handout with links to the tools/sites

Just to save some time :)

Monday, February 7, 2011

Volunteers needed for upcoming HTML5/Open Video tutorial

If you're reading this, I'm hoping you might be interested in volunteering this coming Saturday to help 12-16 year old girls at the upcoming Dare 2B Digital conference learn about HTML5 and open video. There's more information and background on what's happening on this wiki page.

Two kinds of volunteers needed:

1. Someone who is in the Bay Area and available this coming Saturday from 9am to 3:30pm to be on-site with us at the Computer History Museum in Mountain View, working hands-on with the girls to demo Miro Converter, Universal Subtitles, and a little bit of Popcorn.js.

2. Anyone, anywhere, who can do translations to any language and who is available on Saturday anywhere in the 10:15am-3pm PST window to do some 'live' subtitling and show the workshop participants how amazing the universal subtitles project is.

Please get in touch if you are interested/available. Or sign up on the wiki.

Thanks in advance!  I will be posting any demos, workshop materials, and an update post-event on how it went.

Thursday, February 3, 2011

Automated Try Results Posted to Bugzilla - A request for input on what the comment should contain

Lately I've been working on a script that checks your try syntax for a bug number and a --post-to-bugzilla setting. If you've provided both, your try server results can be posted directly to the bug. This is just part of a larger project to have patches submitted on a bug get automatically tried out and their results posted; at some point down the road they could even be pushed to trunk after a successful try run, so "look Ma! No hands".
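A minimal sketch of that check, assuming the bug number appears as "Bug NNNNNN" somewhere in the push comment and the opt-in is the literal token `--post-to-bugzilla`. The regex, the constant names and the function name `bug_to_post_to` are hypothetical, not the actual script.

```python
import re
from typing import Optional

# Assumed conventions (hypothetical, for illustration): the bug is referenced as
# "Bug 123456" (any case, optional space) and the opt-in is the literal token
# "--post-to-bugzilla" somewhere in the push comment.
BUG_RE = re.compile(r"\bbug\s*(\d+)", re.IGNORECASE)
POST_FLAG = "--post-to-bugzilla"


def bug_to_post_to(comment: str) -> Optional[int]:
    """Return the bug number to post try results to, or None if not requested."""
    if POST_FLAG not in comment.split():
        return None  # no opt-in flag: never post
    match = BUG_RE.search(comment)
    # Flag present but no bug reference: there is nowhere to post.
    return int(match.group(1)) if match else None


print(bug_to_post_to("Bug 430942 - post results try: -b o -p all --post-to-bugzilla"))  # 430942
```

Requiring both pieces keeps the default quiet: nothing is posted unless the developer explicitly asks for it.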

Today I have a script running in staging, polling for completed try runs and doing dry runs of posting to bugzilla, in log format only, so I can keep an eye out for unusual output. Already this has shown me a couple of bugs to sort out, and I anticipate having them ironed out very soon. However, before this lands I would like to get some feedback/ideas/suggestions on what the output to the bug should look like; there could even be a couple of options, chosen with a setting in your try syntax.

In my early testing I posted to our landfill bugzilla. Listing every success gave far too much info, so I took those out and only printed the warnings and failures, which works on small runs.

From what you see here, I'm sure you can imagine what it would look like if 145 builds all had warnings/failure combos.

So - what do you want to know in the bug? Let's keep it simple, ok? We can add more later and it's important not to create a bug-spammer here that folks will clamor to turn off soon after it goes live.

Off the top of my head, and after talking with Catlee today about it, I think it should look like this:
Try run for $revision with the following comment:
$try_syntax line
S:# W:# F:# (results total) builds complete from N total requests
S:# W:# F:# (results total) tests complete from N total requests
For more information please see http://tbpl.mozilla.org/?tree=MozillaTry&rev=$revision
This gives you a quick glance at total builds vs. total requests, so you can see that everything is accounted for and where things might have gone wrong in builds vs. tests. It doesn't list the failed/warning builder names, so you have to follow the link to get them. There might be interest in also printing what your try syntax request triggered, but I'm not sure that's useful in the bug comment, even though it's what the developer asked for when pushing to try. What do you think?
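To make the proposal concrete, here is a small Python sketch that renders the format above from lists of per-builder results. The result strings ("success", "warnings", "failure") and the function names are assumptions for illustration; any other status (an exception, say) counts toward the total but not toward S, W or F.

```python
from collections import Counter

TBPL_URL = "http://tbpl.mozilla.org/?tree=MozillaTry&rev={rev}"


def summary_line(results, requested, kind):
    """One 'S:# W:# F:#' line for builds or tests (assumed result strings)."""
    counts = Counter(results)  # statuses that never occur count as 0
    return (f"S:{counts['success']} W:{counts['warnings']} F:{counts['failure']} "
            f"({len(results)} total) {kind} complete from {requested} total requests")


def try_comment(revision, try_syntax, builds, build_requests, tests, test_requests):
    """Render the proposed bug comment for one try run."""
    return "\n".join([
        f"Try run for {revision} with the following comment:",
        try_syntax,
        summary_line(builds, build_requests, "builds"),
        summary_line(tests, test_requests, "tests"),
        "For more information please see " + TBPL_URL.format(rev=revision),
    ])


print(try_comment("abc123def456", "try: -b o -p all -u all -t none",
                  ["success", "success", "warnings"], 4, ["failure"], 2))
```

Keeping it to five lines regardless of how many builders ran is the point: 145 builds with mixed warnings and failures still produce a comment of the same size.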

It would be possible to break the results down further by platform, instead of or as well as by build/test. Any ideas on how to get that much information across without making a bug comment too verbose? All input is appreciated and will be considered. I'll be trying to land this in the next couple of weeks, so comment here, ping me on IRC (lsblakk), or comment in the bug.

Thursday, January 13, 2011

Try Server Road Map - Q1 2011




The other day, a post from Google with slides detailing how they sped up the Chrome release cycle was going around, and it mentioned how try and CI were key to their success. It got me thinking that it's time for another update about the upcoming improvements to our try server automation. Most of my Q1 work will be on the try server, with some time on Fennec Beta releases and a bit of time making it much easier to spin up disposable project branches.

The road map image above shows three areas of focus; here they are in a more detailed list with bug numbers attached:

Improving Current Automation
  • Bug 617321 tracks adding two new try buildbot master instances to our other buildbot-masters in the Santa Clara colo.  This gives us flexibility to have rolling downtimes on try (as we already have on the other masters) where we can update things behind the scenes and it also helps by adding redundancy to the try automation in case of a colo outage.
  • Bug 580346 is almost done; it adds Xserves to try, which gives us faster Mac OS X build power. That, along with about 40 ix builders for Windows/Linux builds, will crank out more try builds faster.
  • Bug 594236 is key to getting TryChooser syntax turned on as a default.  With an interactive hg prompt on push to try, you should be able to select what you want and/or have your syntax validated.  Khuey started something to do this and if anyone is up for taking it to the next level before I can get around to it, please please do.
Add Features
  • Bug 430942 is what I am actively working on today and the rest of this week. I'm about to have a second draft ready for review, so watch for this to land in the next few weeks. With or without try syntax, you will be able to specify the bug number in your push comment and have the results of your try run posted as a comment on that bug. Hopefully this will help development by letting people know where a patch stands if you are away when the results come in. It's also part of our goal to smack down old (pre-2010) bugs, so finishing it will be one more step toward meeting that carried-over goal.
  • Bug 615705 is tracking a few more tweaks to the try syntax that will give users more flexibility in the syntax and choice about what to build.
  • Bug 421895 is another old bug and Chris Atlee is close to getting it up for poking at. It provides a way for users to cancel their own try build requests without having to ping RelEng.
  • Bug 621681 addresses better threading headers, since the current headers only help with threading in some clients. When I first wrote it I was testing with Thunderbird, where it works as intended, but apparently Gmail and other clients need some help.
Future

Looking to the future, right now we have bug 625464, which covers setting up something to scan bugzilla for a flag on attachments that will trigger an automatic try run with that attachment against either tip of trunk or perhaps a user repo. The user repo case would require the other main future bug, 625463. With the ability to poll and run try on hg.m.o user repos, project branches (temporary branches that are loaned out to devs or teams to work on a particular project) could toggle a setting to have their pushes run through try instead of the main mozilla-central automation. This could be handy when you want to limit the machines you are building/testing on with the TryChooser syntax.

I hope anyone reading this will find the upcoming try work as exciting as I do.  Reading the Google slides, I couldn't help but sit up straighter at the mention of try being one of the reasons they were able to speed up their release cycle. I'm hoping we can get there too and that our try server will be more robust and ready to handle our soon-to-be-speedier release process too.

Friday, December 3, 2010

Please use TryChooser

Recently there were some improvements to the trychooser and the landing of those changes led to a couple of bugs[1][2] being discovered and quickly fixed.  It is thanks to those who are regularly using the trychooser that we are able to find bugs quickly and also continue to improve the tryserver.

Right now there are over 350 backed-up test/talos requests for the tryserver, and our report on trychooser usage shows that the share of users pushing with try syntax has fallen below 50%, down from closer to 60%.

I encourage you to please use the trychooser syntax as much as possible. If you do not need every single try result for your patch, do not just push to try and use up all the resources needlessly. Take a moment to insert some try syntax into your commit message.

See https://wiki.mozilla.org/Build:TryChooser for details and http://people.mozilla.org/~lsblakk/trychooser/ for a simple try syntax builder.
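For anyone who would rather script it than use the web page, here's a minimal sketch that assembles a try syntax line. It assumes the `-b`/`-p`/`-u`/`-t` flags and values described on the TryChooser wiki page ("d", "o" or "do" for builds, "all"/"none" for platforms and tests); the function name is hypothetical, and the platform and test names in the example are only illustrations, so check the wiki for the current lists.

```python
def try_syntax(builds="do", platforms=("all",), unittests=("all",), talos=("none",)):
    """Assemble a TryChooser-style line such as 'try: -b do -p all -u all -t none'.

    builds: "d" (debug), "o" (opt) or "do" (both), as assumed from the wiki page.
    platforms, unittests, talos: names joined with commas, or ("all",) / ("none",).
    """
    if builds not in ("d", "o", "do"):
        raise ValueError(f"unexpected build type: {builds!r}")
    return "try: -b {} -p {} -u {} -t {}".format(
        builds, ",".join(platforms), ",".join(unittests), ",".join(talos)
    )


# Opt builds on two platforms, reftests only, no talos.
print(try_syntax("o", ["linux", "macosx64"], ["reftest"]))  # try: -b o -p linux,macosx64 -u reftest -t none
```

Defaulting talos to "none" rather than "all" is deliberate here: asking only for what you need is exactly what keeps the test/talos queue from backing up.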

Thanks in advance.