Wednesday, May 26, 2010

Tryserver 2.0 - Fine tuning and learning the hard way

I really thought the try server as a branch was ready to roll out when I did it.  Seriously.  It took a couple of months to get it to the point where I felt it was ready for the public.  So I pushed it live last week - details here.

Immediately a few issues came up that ruined my vision of a smooth transition:

* Emails were going to the changeset author and not the person pushing the change. Turns out I had changed the default behaviour without knowing there was a reason why we used to grab the email address from the buildbot sendchange info instead of from the hg author.

* Packaged unittests were being merged.  This was quite a big deal for the first full day of being live because people were getting all kinds of strange emails about results that weren't theirs.  The reason this wasn't caught in testing was that there is no 1:1 mapping between a build and its packaged unit test results (except when you go look at tbpl). We should get one set up for MozillaTest where our staging results go in order to make sure something like this doesn't happen again.

* The URL included in an email regarding test results didn't contain a changeset - only the build and leak test builders were setting the information needed to grab this for the emails.  Thanks to dbaron for catching that quickly and bringing it to my attention.

* The try-mac slaves got delayed on their way to the new tryserver's slave pool because of some glitches with puppet. A huge backlog formed for OS X builds, and people thought those builds didn't even exist.  Because the backlog got so large (80+ full-length builds) I opted to restart the master, wiping those out of the queue in order to get the tryserver back to decent turnaround for all platforms.

I'm still ironing out some issues, tweaking the email results, and I've temporarily disabled the web interface as I work on getting it to use hg push so that all the inputs to tryserver arrive in the same way.  I'm also trying to get more slaves to add to the builder pool since the packaged unittest builders are builder hogs. 

Let me know if I've missed anything.  It's my goal this quarter to make tryserver as helpful as possible in keeping mozilla-central green. 

4 comments:

dave said...

Glitches maybe, but honestly, this thing is awesome. Great work, I'm using it a lot, and love it.

Dave

Robert said...

Yes, this is awesome.

It would be great if the emails for each changeset showed up as a single conversation or thread, especially now that a lot more emails are generated, although this might be tricky.

You could have one email when you submit a job that kicks off the thread, and make all other emails have the same subject and In-Reply-To with the right message ID (which could be based on the changeset ID). E.g.

Message-ID:
Subject: Try Server test ca25ac58f67d

Your build has been submitted. When complete, it will be available for download at
http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/rocallahan@mozilla.com-ca25ac58f67d

...

In-Reply-To:
Subject: Try Server test ca25ac58f67d

Your Try Server test (ca25ac58f67d) was successfully completed on win32 on builder: WINNT 5.2 tryserver debug test mochitests-1/5.

It should be available for download at http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/rocallahan@mozilla.com-ca25ac58f67d

Summary of test results:
TinderboxSummaryMessage: s: try-w32-slave15
mochitest-plain-1: 61110/0/786

Visit http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1274927639.1274929103.30298.gz to view the full logs.

...

The initial email would be useful in its own right because people often want to kick off a try server build and tell other people where the builds are going to show up.
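
The threading idea above can be sketched with Python's standard email library. This is only an illustration, not how tryserver builds its mail: the function names, the placeholder addresses, and the `tryserver.example.invalid` domain are all made up for the example. It derives one Message-ID from the changeset ID for the submission email, then points every result email at it through In-Reply-To and References while keeping the same subject.

```python
from email.message import EmailMessage

# Hypothetical values for illustration only; not real tryserver settings.
DOMAIN = "tryserver.example.invalid"
SENDER = "tryserver@example.invalid"


def thread_root_id(changeset):
    """Build a deterministic Message-ID from the changeset ID."""
    return "<try-%s@%s>" % (changeset, DOMAIN)


def make_submission_email(changeset, recipient):
    """First email of the thread: it owns the changeset-based Message-ID."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = recipient
    msg["Subject"] = "Try Server test %s" % changeset
    msg["Message-ID"] = thread_root_id(changeset)
    msg.set_content("Your build has been submitted.")
    return msg


def make_result_email(changeset, recipient, body):
    """Later emails: same subject, replying to the submission email."""
    root = thread_root_id(changeset)
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = recipient
    msg["Subject"] = "Try Server test %s" % changeset
    msg["In-Reply-To"] = root
    msg["References"] = root
    msg.set_content(body)
    return msg
```

Because the root ID is computed from the changeset alone, a result email can be threaded without the sender having to remember anything about the submission email. Whether a mail client groups the messages still depends on that client honouring these headers.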

Robert said...

Oops, Blogger ate my message IDs, but you get the idea.

Lukas Blakk said...

@Robert, the emails should be threaded since there are extra headers set for try emails that specify the revision id - see http://hg.mozilla.org/build/buildbotcustom/file/tip/misc.py#l454

If you find that is not the case, please file a bug.