Friday, December 11, 2009

So you want a new Talos suite, eh?

This quarter Alice and I have focused on trimming the list of pending test suites, and several new ones (bugs 419776, 524089, 515540, 506772) have been turned on in production.

The process for getting a new suite in has become a lot clearer, so we gave a presentation at the recent all-hands to help developers know what to do on their end and what RelEng can do for them once their test suite is ready for staging.

Here's what a developer needs to do:
  • Download and install Standalone Talos to test their suite in
  • Once they have established that the test works on at least one platform, write a patch against Talos in CVS
  • File a bug against RelEng in the General component and provide the following information:
    • Contact person who will work with us on getting the test suite enabled
    • What the test does, what the expected output should be, long name, short description
    • Which branches and platforms you want the test run on
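
One way to keep track of that information before filing the bug is a small checklist. The sketch below is only an illustration: the class name, the field names, and the example values are made up for this post and are not part of Talos or Bugzilla.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SuiteRequest:
    """Hypothetical checklist for a new-suite bug; all names are illustrative."""
    contact: str = ""
    what_it_does: str = ""
    expected_output: str = ""
    long_name: str = ""
    short_description: str = ""
    branches: list = field(default_factory=list)
    platforms: list = field(default_factory=list)


def missing_fields(request):
    """Return the names of checklist fields that are still empty."""
    return [f.name for f in fields(request) if not getattr(request, f.name)]


# Example values are placeholders, not a real request.
request = SuiteRequest(
    contact="dev@example.com",
    long_name="Example pageload suite",
    branches=["mozilla-central"],
)
print(missing_fields(request))
# ['what_it_does', 'expected_output', 'short_description', 'platforms']
```

An empty list from `missing_fields` would mean the bug has everything listed above.
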

RelEng will create the buildbot patches that enable the tests, insert the tests into graph server, and work with the contact person while the tests are in staging to make sure the expected outcome is reached. Once the tests run as expected, we can turn them on in production. In a perfect world, turnaround for this process is about a week and a half and involves a short Talos downtime.

The rest of the time allotted to our presentation was spent discussing where we should be setting our sights for Talos improvements. This looks to involve two relatively large undertakings:
  1. While it recently underwent some much needed improvements, the graph server still needs to be faster, more stable, scalable, and able to handle our ever-growing data sets. The blocker here is that no one really owns graph server and it's hard to know who should.
  2. Talos is barely holding up under the current load of tests, hardware, and infrastructure. It also works in such a way that a lot of manual involvement is required to add new tests. It would be awesome for it to work more like unittests where once individual tests are checked in, they would go into production immediately. It would then be possible for a developer to not only write a unittest for any bug fix, but also a performance test to go along with it.

Now this brings up the problem of what performance we want to measure and how we want to approach performance metrics in the long run. Alice made a great point when she stated that folks who are not trained and accustomed to doing QA might be challenged by trying to generate tests that actually create a good metric for the performance they wish to be testing. It's entirely possible to have tests that seem interesting on the surface but, when you drill down, don't provide any useful data for actually improving anything.

Do we want per-bug performance tests as we do with unittests? While it looks like this is a way to make a developer more accountable for their code, it's pretty obvious that this model wouldn't scale well at all with our current hardware and turnaround expectations. Imagine as many individual pageload tests as there are mochitests... I suspect no one wants to see that.

Performance testing would be better and more useful if it were targeted at specific features or areas of the product where someone is actually tracking the improvement/regression ranges as they are developed. That's a key part of Talos: a human is actually accessing the data, finding it useful, and making improvements to their feature/area as a result of this information.

While brainstorming with Aki on the potential of the graph server data, one idea really got me excited. Open up the data.

There's been a lot of hype lately about opening up data. In February of this year Tim Berners-Lee encouraged us to start thinking about open, linked data and how it could be the next round in how the Web helps us re-frame our world. In Canada the city of Vancouver opened up its data in the hopes of "improving liveability and governance" in the Metro area.


What if the Talos graph data was made available to the community and a challenge was created in the spirit of the marketing design challenges where we ask people to help us find new ways to view the data? I'd be really curious to see what kind of visualizations would come out of the larger community. RelEng doesn't have a very large community outside of employees, so this could be a great way to start working on creating one.

Friday, October 23, 2009

Upcoming improvements to Talos documentation and test suite creation

This quarter I'm going to be joining Alice in trying to improve the system for adding new suites to Talos. The current system involves a lot of hackery on our side and slows down our ability to get Talos suites up and running as quickly as might be desired.

So with John's help to create a prioritized list of suite requests, we will be doing a lot of communicating with developers in the coming months to get these suites up and running and to improve the process and documentation at the same time. There are currently 10 known suite requests waiting, and there may be others.

Part of the issue with adding new suites is that there is a lack of documentation and tools for developers.  Our new system will look more like this:

* A request is made for a new suite and a developer is attached to the request who will be the lead person for working with us to get the suite into production

* The dev will be able to use tools we provide (standalone talos, corral of staging-talos slaves) to do proof of concept on the suite so that it works and is ready to go up in staging when it's handed over to RelEng

* RelEng will enable the test suite in staging and verify that changes in staging work fine with the other existing jobs being run on the same machines. Once all is well, rollout to production will follow

As we progress through the suite requests, this process should get easier for all parties and more streamlined.  We hope that by the time we reach suite #10 it will be much easier and faster for developers and RelEng to get the proposed new Talos suites into production.

I mentioned the developers will have tools provided by us. We need to do a bit of work to make these tools usable by developers, and the first place to start is with our documentation of what Talos is and how it works. Beyond this, we have discussed having boilerplate code for creating each of the two styles of tests: startup or pageload. Also, it might be beneficial to have a corral of Talos machines that can be loaned out to a dev for a limited time in order to test a suite during creation and debugging. This corral could then be re-imaged and passed along to the next suite developer.

Here is the current documentation page.  Doesn't give you much to go on, right?

Well, this is about to change. Given my complete lack of Talos knowledge, I will be writing up what I learn about Talos as it's happening. Hopefully that will produce a more complete set of docs for the Talos neophyte, and folks who want to work with us to add new suites will benefit from this as well.

Here's the current list of the docs to be created based on what we think you might want to know:

* How Talos works and an overview of the development from past to present

* What preferences Talos runs with

* A description of each test suite, what each runs

* What the numbers mean

These are the things I don't know - is there anything you don't see listed here that you want to know more about?  Feel free to make suggestions in the comments.

Friday, September 11, 2009

Mozilla Service Week - Toronto Event

Yesterday I dropped off posters at the Parkdale Library for our Mozilla Service Week event, which will take place on Monday September 14th from 2 - 6pm. The Parkdale Library is a really lively branch, with about 10 computer stations that the neighbourhood folks use constantly. Parkdale is the oldest Toronto neighbourhood and though it has a reputation for being a "bad" neighbourhood, it's been heavily gentrified in the past 6 years or so. There are still a lot of people here who live below the poverty line though, and for whom the digital gap is a very real thing. It's also a neighbourhood full of recent immigrants who depend on the library for connections to learning English, finding work, accessing resources for new Canadians, and keeping in touch with family and friends in their home countries.

When I first moved to Parkdale in early 2001, I relied heavily on access to their computers for my internet needs since I didn't have a computer. It was always a stampede to get in the door and sign up for a time slot when the library opened its doors in the morning. They've recently undergone some renovations and now have added wireless as well as a few more stations. The Parkdale librarians are super friendly and encouraging of community (and noise, in a library!) and the building itself is used often for local activities and grassroots festivals. I'm excited that 8 years since arriving here, I'm in a position to give something back to this vibrant place.

Our event involves a table set up near the computers - "Ask a Geek" - where we can field questions about anything that will help improve their interactions with the web. I believe there will be lots of people interested in picking our brains. Of course, I'm also bringing lots of Mozilla swag to draw people over to the table and to use as ice-breakers :)

Anyone in Toronto who wants to participate - please come on by.

Details about this and other Toronto events HERE

Adding choices to Try Server web interface

I just put up my patches on bug 473184, which will allow folks who submit patches through the try server web interface to choose whether they want a build or unittest run and which platforms it should be run on.

Looks like this:


I've tested it successfully in the staging environment and I hope to get this rolled out in production before the end of the quarter.

Friday, July 24, 2009

New Branch Timeline: Electrolysis

Now, a week or so later, we are setting up a new project branch. Here's the breakdown:

2009-06-26 11:55 PDT

  • Bug requesting the branch [500755] was created

2009-07-09 11:13:41 PDT

  • Created a tinderbox page for the branch to report to [Tinderbox Page]

2009-07-14 17:44:27 PDT

  • Patches to add Electrolysis branch to buildbot are written and put up on staging-master. There are several compile failures.


2009-07-22 06:12:49 PDT

  • After testing and scheduling downtime, the config files are checked in and P-M is reconfigured - last minute patch to turn on debug builds is added as well when we realize that is not in the default template for turning on a new branch.

2009-07-22 13:31:59 PDT

  • Need graph server machine table updated for new branch - we did a bunch all at once in [504435]

2009-07-23 06:25:18 PDT

  • Nagios monitoring support file is checked in and a bug is filed requesting that IT turn it on. [505986]


What remains is to check in the debug builds patch. This should happen in a downtime next week, at which point the project branch request bug will be closed.

New Branch Timeline: Places

A brief rundown of what was involved in setting up the Places project branch. This is based on the time since the branch request was given the go-ahead, not when the bug was filed, since that happened quite a bit earlier.

2009-04-29 17:17:43 PDT

  • Bug requesting the branch [459269] was re-opened

2009-05-18 16:31:46 PDT

  • Created a tinderbox page for the branch to report to [Tinderbox Page]

2009-05-19 09:16 PDT

  • A separate bug was filed requesting a repo [493745]


2009-05-26 16:32:18 PDT

  • Repo is created and [493745] is closed as FIXED

2009-06-24 10:22 PDT

  • Patches submitted to update config files for Staging-Master and Production-Master

2009-06-30 12:51 PDT

  • After testing and patch updates, the config files are checked in and P-M is reconfigured


2009-07-01 08:49 PDT

  • Add Nagios monitoring support by filing a bug with IT [501710]

2009-07-08 08:04:40 PDT

  • [493740] is fixed to deal with the scheduler not picking up the new Places poller after a reconfig, only after a stop/start

2009-07-09 14:36 PDT - 2009-07-10 12:50 PDT

  • Patches submitted to turn on talos and graph server support for the new builds. The first set were not patches to Talos-Pool so a second set was required.

  • New row in graph server added for branch (bug 459269); IT (justdave) ran the INSERT statement against the production database

2009-07-13 11:18 PDT

  • Patches submitted to turn on leak testing debug builds. This was checked-in and P-M was reconfigured the same day.

2009-07-13 14:57 PDT

  • Bug closed - project branch is up and running on P-M
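
For anyone who wants to compare the two timelines, the elapsed time between the first and last listed events can be worked out with a few lines of Python. This is just a sketch: the helper names are made up, and it assumes every stamp is in PDT (true for both timelines here), so plain subtraction without timezone handling is good enough.

```python
from datetime import datetime

# Stamps appear with and without seconds; slash-separated dates are accepted too.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M:%S")


def parse_stamp(text):
    """Parse a timeline stamp, ignoring the trailing 'PDT' label."""
    cleaned = text.replace(" PDT", "").strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")


def elapsed(start, end):
    """Return the timedelta between two timeline stamps."""
    return parse_stamp(end) - parse_stamp(start)


# Places: bug re-opened -> bug closed.
places = elapsed("2009-04-29 17:17:43 PDT", "2009-07-13 14:57 PDT")
# Electrolysis: bug created -> last event listed so far.
electrolysis = elapsed("2009-06-26 11:55 PDT", "2009-07-23 06:25:18 PDT")

print(places.days, electrolysis.days)  # 74 26
```

So Places took about 74 full days from go-ahead to closed bug, while Electrolysis is at about 26 days with only the debug builds patch outstanding.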

Thursday, July 9, 2009

Celebrating Firefox 3.5 with sparkly accessories...

A month or so ago, when Firefox 3.5 was close to launching, I got in touch with some artist friends of mine who have a small jewellery making business to create some custom accessories for me in celebration. They have been making these awesome belt buckles, cuff links, magnets and many other items for years, and I have several of their pieces, including a custom "fancy deluxe" belt buckle with my hound dog on it surrounded by shiny Swarovski crystals. They did a great job with the Firefox logo and I am now the proud owner of

A belt buckle:

And cuff links (though I only have one shirt that uses them):


I'm looking forward to wearing them about town and spreading the word about Firefox 3.5. Many of my pals outside of tech circles have been excited about the new Firefox because they see how excited I am about it. For anyone else looking for a custom buckle or accessory of their own, get in touch with the folks at Barbie's Basement Jewelry. I hope you enjoy flashing the Firefox logo around in new and fashionable ways - it's always a conversation starter.