Tuesday, June 14, 2011

Tree Closing Downtime Notice - 4am - 8am PDT Thursday June 16, 2011

Trees will be closed for downtime so that we can land the following:

1. https://bugzilla.mozilla.org/show_bug.cgi?id=662396 -- Fix time on dm-wwwbuild01

2. https://bugzilla.mozilla.org/show_bug.cgi?id=600980 -- Set journal_mode = WAL for dirty places profiles -- This means new performance numbers will start on Thursday morning after the downtime

3. https://bugzilla.mozilla.org/show_bug.cgi?id=649123 -- Run ANALYZE on dirty places.sqlite files -- This means new performance numbers will start on Thursday morning after the downtime

4. https://bugzilla.mozilla.org/show_bug.cgi?id=663568 -- Reboot the DNS and DHCP servers in scl1 -- Rebooting these servers has been shown to burn builds in the past. A short (~5min) outage is required for the reboot so that updates can take effect.

5. https://bugzilla.mozilla.org/show_bug.cgi?id=663963 -- Change LDAP to see if that speeds up mercurial -- This change should be entirely transparent. Hg processes that are running at the time the change is made will have already loaded the NSS LDAP module and will continue to use it until they exit. The only issue to be aware of is that changes to hg access (group membership, or the creation of a new account) will not automatically propagate to the hg servers the way they do now. If any hg access changes need to be pushed urgently, we can do that manually.

If anyone has a reason not to proceed with this downtime, please let me know.

Thoughts on cultivating an "Everyone is Remote" attitude

As I write this I am working from Paris and our team timezone spread looks like this:
  •  Rangiora, New Zealand: UTC (+12)
  •  Bucharest, Romania: UTC (+3)
  •  Istanbul, Turkey: UTC (+3)
  •  Paris, France: UTC (+2) <--- ME!
  •  Ottawa, ON: UTC (-4)
  •  Toronto, ON: UTC (-4)
  •  Philadelphia, PA: UTC (-4)
  •  Clifton Park, NY: UTC (-4)
  •  Chicago, IL: UTC (-5)
  •  San Francisco, CA: UTC (-7)
  •  Mountain View, CA: UTC (-7)

I'm going to go out on a limb here and say this: Release Engineering does a good job of working remotely with each other. We are 15-16 people (with a few more contractors/FTEs on the way) and it doesn't matter where you live if you want to work with us. Here we are in our meeting yesterday:

Releng Weekly Meeting - June 2011

Quite the impressive Brady Bunch layout, right?

Here's what we do that I think works well for working remotely:

* We meet once per week as a whole group on Mondays. This starts the week off with a status update on our major projects and also a chance for individuals to speak up about anything they're working on that they'd like people to be aware of.

* We are always having conversations in IRC amongst ourselves and with others in several channels. We use #mozbuild as a backchannel for our inter-team discussion and #build for access to a larger group of fellow Mozillians (like philor, Kairo, and ted, who often need to liaise with us). We also frequent #developers, and there are some IT/mobile/QA/release-specific channels we hang out in as needed. I think this helps us have a presence in many areas of engineering/dev/IT and even with some of the non-technical teams at Mozilla where inter-team communication needs to happen. It keeps us in the loop on what various teams are up to and also provides the IRC equivalent of being able to overhear water-cooler chat and participate as well.

* We keep wiki pages for most everything. From "how-to" pages for our own release process, automation details, and project planning all the way to pages for outside-releng folks like the Try Syntax. While I find wikis frustrating the minute the information is out of date, the fact that I can update them and find them in my awesomebar quickly when I need them is very valuable to me.

* We email our group with important notices and changes to how things are done. It is rare for someone to say "Oh I didn't know about that" and get the response "It came up in the hallway when I was talking with so-and-so". More often than not, the person driving a particular upgrade or change to current practices will send out an email to the group with details of: a) what the change is, b) what it means going forward, c) how the message has been disseminated to a wider audience (if needed), and finally d) where the wiki pages (and bugs, if needed for reference) can be found. This allows any of us to find the information N time units later when the change actually comes up in our daily work and we're wondering "What was I supposed to do when trying to use the new X again?"

* We all meet up face to face approximately once per quarter. Twice a year for Releng work weeks and twice a year at Mozilla all-hands/summit gatherings. We take these as opportunities to discuss larger topics with lots of brainstorming, whiteboard scribbling, and animated opinion-sharing. Notes from meetings like this turn into wiki pages (often during the meeting itself) and those can become specs for projects/bugs to carry the work that needs doing to the next level.

I think that gives a good idea of our team practices. Now here are some thoughts I've been having lately about working remotely in Mozilla as a whole. It helps that I'm currently working in Paris and am pretty much completely opposite the PDT work day, but some of this was on my mind even when I was in SF.

I think Mozilla has an amazing opportunity to set trends in how to work with distributed teams. We already have people in every time zone! Even with the incredible advancements we've made with our use of video/audio/IRC tools (airmozilla/vidyo), there are some ways in which MV is still the eye of Mordor for the company.

I would like to see us shake that up so I think we should try:

* Not having meetings in large groups in MV (except at all-hands). Instead, put small groups of people in various rooms around the building so that "we are all remote" is a reality for everyone and the clarity of the communication channels is taken seriously. This means we all become just as invested as those who are not in MV in the quality of audio/video feeds, in using tools like Etherpad for public collaboration, and in advocating best practices for the speakers/presenters. I bet we'd see an increase in contributions to new tools & meeting practices if we were all experiencing meetings remotely on a regular basis.

* Rotate the hosting of the Monday meeting so that over a series of Mondays it would be run from various remote Mozilla offices. This would mean that it moves in time (which could be scheduled in advance), but it also means that all offices get a chance to feel special and be the center of attention. We'll have an opportunity to get to know our co-workers from other offices better as they present the meeting, and I even imagine some friendly competition could develop for who can run the most energetic and engaging meeting.

I'm really interested in trying that second one. The most MV-centric thing we do is have our 11am PDT meeting on Mondays be a locked-down time. What if it rotated around each week and just happened somewhere in the 9am-5pm spectrum of your timezone? We could create a schedule for it so folks could have lots of notice for scheduling their other Monday things around it. You might occasionally miss a Monday meeting because it's just not at a good time for you, but some of our remote workers might say that's just par for the course.
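
To make the fixed 11am PDT slot concrete, here is a rough Python sketch that converts it to local time for the offsets in my team list above and flags who lands inside a 9am-5pm day. The offsets are the ones from that list, treated as fixed (so daylight-saving changes are ignored), and the function names are just mine for illustration, not part of any scheduling tool we use.

```python
from datetime import datetime, timedelta, timezone

# UTC offsets in hours, copied from the team list in this post.
TEAM_OFFSETS = {
    "New Zealand": 12,
    "Bucharest": 3,
    "Istanbul": 3,
    "Paris": 2,
    "Ottawa": -4,
    "Toronto": -4,
    "Philadelphia": -4,
    "Clifton Park": -4,
    "Chicago": -5,
    "San Francisco": -7,
    "Mountain View": -7,
}


def local_times(meeting, offsets):
    """Return the meeting start as a local datetime for each place."""
    return {
        place: meeting.astimezone(timezone(timedelta(hours=hours)))
        for place, hours in offsets.items()
    }


def in_work_hours(local, start=9, end=17):
    """True if the meeting starts within [start, end) local time."""
    return start <= local.hour < end


# Monday, June 13, 2011 at 11am PDT (UTC-7).
meeting = datetime(2011, 6, 13, 11, 0, tzinfo=timezone(timedelta(hours=-7)))

for place, local in local_times(meeting, TEAM_OFFSETS).items():
    flag = "ok" if in_work_hours(local) else "outside 9-5"
    print(f"{place:15s} {local:%a %H:%M}  {flag}")
```

Swapping in a different start time for `meeting` shows who gets the friendly slot that week, which is the whole point of rotating it.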

I know the idea needs more work, but there's the nugget of it. Curious to know what others think. I'll be continuing to talk this up - maybe we can have a larger discussion at the all-hands in September. Eventually I'd like to see us get to a point where we all think of ourselves as remote. If you look at Mozilla as a whole, there does not really need to be a "hub" where one would be "local" compared to everyone else - there's just planning for timezones/meetings and then all the people we work with doing their amazing stuff.

Friday, June 10, 2011

Use Try? Read this.

Two updates to Try are about to go into effect: one enforces asking for what you want using the try syntax, and the other lets you configure how much email you get with your results. Read more below.

Bug 661409 - Now that this has landed, a push to try only generates email about a particular try builder's results if it does not succeed. You can make this more verbose by adding -e/--all-emails to your try syntax if you miss getting all those emails, or you can shut off the emails completely with -n/--no-emails in your commit syntax. Note that you must be using the "try: " syntax for these email flags to be picked up, which leads quite handily to...

Bug 649402 - Try syntax use will become mandatory as soon as this bug is fixed and the hg hook is enabled on the try repo. We're doing this to encourage developers who use try to take an extra moment and request only the resources they absolutely need on their push. This should reduce the test/talos load that has been increasing wait times across all branches during busy periods. One additional psychological change is that the "try: -a" syntax has been removed, and in order to ask for a mozilla-central matching run you must be more explicit: "try: -b do -p all -u all -t all". I've updated the docs to reflect this change as well as the TryChooser syntax helper webpage. We're really not trying to make your life harder with this change. Approximately 50-60% of pushes to try currently use the try syntax, and if you push to try without it you will get a helpful message pointing you to the docs and the syntax builder. Check with #developers for tips and tricks from the folks who've been using this since the beginning. I know they have many, including using the newly-minted Mozilla-Inbound repo, where a push will get the complete set of tests/talos if you'd like to let your patch bake for a bit after doing a selective try run.
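
If it helps to see the pieces laid out, here is a small Python sketch that assembles a try commit message from the flags mentioned in this post. To be clear, this is not TryChooser and not the hg hook: build_try_syntax and uses_try_syntax are hypothetical helpers, and the only flag values used are the ones quoted above ("do" and "all"). Use the TryChooser page to find the narrower values you actually want.

```python
def build_try_syntax(build="do", platforms="all", unittests="all",
                     talos="all", email=None):
    """Assemble a try commit-message string (hypothetical helper).

    email: None for the default (mail only when a builder does not succeed),
           "all" to add -e, "none" to add -n.
    """
    parts = ["try:", "-b", build, "-p", platforms, "-u", unittests, "-t", talos]
    if email == "all":
        parts.append("-e")
    elif email == "none":
        parts.append("-n")
    elif email is not None:
        raise ValueError("email must be None, 'all', or 'none'")
    return " ".join(parts)


def uses_try_syntax(commit_message):
    """Rough stand-in check; the real hook's logic is not shown in this post."""
    return "try: " in commit_message


# The explicit replacement for the removed "try: -a" shortcut.
full_run = build_try_syntax()
print(full_run)

# Same request, but with result emails switched off.
quiet_run = build_try_syntax(email="none")
print(quiet_run)
```

The resulting string goes in the commit message of the changeset you push to try.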