Thursday, December 18, 2008

Unittest Consolidation and You

This morning was the culmination of months of work from RelEng. We have finally turned on unittests as well as a11y tests on the production Buildbot master which was already doing the nightlies and l10n builds.

Mostly this change is to make our lives easier. Now we have one instance of Buildbot to pay attention to (disk space, waterfall page) with one pool of slaves that can do several types of builds (nightlies, unittest, tracemonkey, l10n). This will make replacing burned out slaves incredibly simple. Doing this consolidation also led to a great unittest factory (thanks to catlee) that will make adding unittests to other branches much easier to deploy.

What does this mean for you? Well, if you are someone who looks at unittest output regularly you are probably familiar with something like this:

Where each unittest build has its own waterfall column and you can gauge how the unittests are doing by comparing them to each other (changeset, start time, colour).

The new way will be different:

There will only be one column on the waterfall, with no machine name since the unittest build will be coming from a pool of over 14 slaves for that platform. When you look at this waterfall column you will see the colour (thus result) of the most recent build, but there may be several builds that start within moments of each other.

More investigation will be needed to look at changesets and start times with this new method.

Hopefully people will adjust to this, and we welcome your feedback about the new setup.

It's a very big step for us, one that we hope leads to parallelizing the unittest steps for quicker unittest results, and eventually to running unittests independently of builds. Right now the unittest run includes a build step every time; we would like to get to a place where unittests can be run on a completed build so that, for example, the suite of tests could be run repeatedly on the same build.

We'll wait a day or so to make sure that this consolidation is running smoothly and then the standalone production unittests will be turned off, the mac slave machines will be moved into the new pool and the linux/windows vms will be deleted and then new ones created for the production pool. Hopefully there won't be too much of a backlog on builds pending while we wait to get those new machines added.

Monday, December 8, 2008

Dubai and Dashboards

Two things on my mind these days:

Item One:

I will be attending the Education Without Borders conference as a delegate from Seneca College. Five students were chosen to represent Seneca from several fields.

My goal was to try and get a spot presenting about Mozilla's partnership with the Open Source curriculum that we have at school, taught by Chris Tyler and Dave Humphrey. Sadly, I was not selected to present. There were about 1000 attendees from all over the world submitting their proposals so I trust there was lots of competition and that the selected presentations will be mind-blowing. Many of the attendees will be grad students presenting academic papers. It will be great to be there regardless, and I will certainly be pitching Mozilla development and sharing the Seneca teaching model to anyone who will listen.

Item Two:

Q4 is rapidly approaching its end and while I am working on the consolidation of build and unittest to the same buildbot master (when I am not studying for my exams), I am also looking forward to Q1 where I would like to spend some time gathering requirements for a meaningful dashboard of the unittest information that would be useful to developers.

I'm not sure how one gets this kind of conversation started. It would be great to hear from people who care about unittest results and have opinions on what they use the information for. Would a questionnaire be useful? Should I start a forum discussion?

I'll continue to think on that and ask around for ideas on how to do this right. Has anyone been successful in creating a dashboard that is used frequently?

Thursday, November 20, 2008

Seneca and Mozilla in the news

So Armen and I were interviewed a while back for the school paper - The Senecan - about our work with Mozilla through Seneca's open source courses. The article just came out, Download PDF

Thursday, October 23, 2008

The TTC on the move...

This is an awesome video made with OpenGL by the guys who just did a PK on myttc.ca - an amazing visual of the transit system in Toronto throughout the day:


TTC Weekday Service (HD) from Kieran Huggins on Vimeo.

Ready, Set, FSOSS

Sitting in the workshop "ohai! art!" at FSOSS early on a Thursday morning.  I'm downloading pure data (extended) to participate.  Very curious to see what we'll do in this time.

Last night was the kick-off party at Mozilla for FSOSS speakers and also a time to do a dry run of the PK presentation that I'll be doing on Friday.  I got a sneak-peek at Gale Moore's slides and then did a run-through myself.  By setting the timing on the slides to 20 seconds I forced myself to stick to the format.  It's not hard to do this when you're really just talking about your life.  It's a whirlwind tour of what I call my "Zero to Hero Journey".  Come by tomorrow and see it.

The best part of last night though was informal discussions about two topics I love; activism and enjoyable work environments.  The former came up in discussion with Gale and Mike Hoye about the systems being used in schools.  Seneca uses BlackBoard which is akin to being drawn and quartered on a regular basis.  It's painful to use, ugly, has absolutely nothing to offer students, and even our teachers hardly touch it.  Gale wondered why students weren't rebelling or otherwise agitating for a better system...Moodle for example.  My take, granted that I am a bit different than the "average" student at Seneca, is that because of the decentralization of students - many live up to 3 hours away from campus - there is never going to be a decent level of student activism about this or any other issue because we don't spend time with each other outside of class.

On rare occasions, students may have time to compare notes with each other about their experience in their program - but without a common space for downtime, combined with an interest in making changes to the way things are, we will never join together to combat the system as it stands.

I'm 6.5 months away from being done school and it's not in my list of priorities to take on this kind of activity either.  So no complaining about BlackBoard for me.  I've got bigger fish to fry.

The second topic, the work environment, was fun because it wasn't just a Mozilla love-in (and there is so much to love) but instead I heard from someone in a completely different field how much their workplace supported and encouraged them, gave them a clear path to advancement and had a strong team with no weak links.  I'm often griping about the team work that we are obliged to do in school, mostly because the teams are often nothing like anything found in the "real world" and then also due to the fact that the assignments are often fictitious and therefore don't give us as much of a challenge. 

The open source class at Mozilla really raised the bar by giving us real work to do and that experience is invaluable.  Instead of wasting time with teammates dividing up work and trying to manage other students' work ethics, we were project leaders on our Mozilla work and we pulled people into our teams as needed to get the work done.  This kind of turns the model that other classes use on its ear.  Next term I'll be doing a research and methodologies course where our deliverable is a significant research paper on some facet of technology.  I intend to pitch an idea to research the current BSD program's project stream and how it currently leads to very little implementation.  Hopefully with a little research, we can look at ways to turn that around and give the next generation of BSD students a level of satisfaction that will enable them to recognize this in their future workplaces.

Tuesday, September 23, 2008

Third time's a charm - Unittest Production moves tomorrow morning

Wednesday September 24th at 6:00 PDT I'll be connecting the new slaves to Tinderbox:Firefox and taking down the old ones.

Fingers crossed that it really happens this time.

Postponed - Production MozCentral Unittest Moves - to the Build Network

As you may already know, there is some major outage (power out at the San Jose colo) this morning - now dubbed "Black Tuesday" by me. So the switch is postponed again.

New time and date forthcoming when the network is up and stable again.

Monday, September 22, 2008

Take 2 - mozilla-central unittest production waterfalls moves to build network

Hopefully things will still look good tomorrow morning because the new time of the move is now:

This will take place on Tuesday September 23rd, at 7am PDT

The new production buildbot will already be up and running smoothly
(currently reports to the UnitTest tinderbox tree) so there should
hopefully be very little impact when this switch over happens.

Basically, the new unittest buildbot slaves will start reporting to
the Firefox tinderbox tree, and the current slaves will stop reporting
there.

This means you will be looking at new slave names. There are 2
buildslaves for each platform and their names are as follows:

Linux mozilla-central moz2-linux-slave07 dep unit test
Linux mozilla-central moz2-linux-slave08 dep unit test
MacOSX Darwin 9.2.2 moz2-darwin8-slave01 dep unit test
MacOSX Darwin 9.2.2 moz2-darwin8-slave02 dep unit test
WINNT 5.2 mozilla-central moz2-win32-slave07 dep unit test
WINNT 5.2 mozilla-central moz2-win32-slave08 dep unit test

Friday, September 19, 2008

Postponed - Production MozCentral Unittest Moves - to the Build Network

Due to some glitches in the buildslaves, the move of the mozilla-central unittest build master has been postponed for now.

I will post again soon with a new time and date for the switch.

http://groups.google.com/group/mozilla.dev.planning/post

Wednesday, September 17, 2008

Production MozCentral Unittest Moves - to the Build Network

So the last task in the move from QA network to Build network is moving the production mozilla-central unittest boxes.

This will take place on Friday September 19th, at 7am PDT


The new production buildbot is already up and running smoothly (currently reports to the UnitTest tinderbox tree) so there should hopefully be very little impact when this switch over happens.

Basically, the new unittest buildbot slaves will start reporting to the Firefox tinderbox tree, and the current slaves will stop reporting there.

This means you will be looking at new slave names. There are 2 buildslaves for each platform and their names are as follows:

Linux mozilla-central moz2-linux-slave07 dep unit test
Linux mozilla-central moz2-linux-slave08 dep unit test
MacOSX Darwin 9.2.2 moz2-darwin8-slave01 dep unit test
MacOSX Darwin 9.2.2 moz2-darwin8-slave02 dep unit test
WINNT 5.2 mozilla-central moz2-win32-slave07 dep unit test
WINNT 5.2 mozilla-central moz2-win32-slave08 dep unit test

Thanks to Jesse Ruderman and others for working so hard on bug #450637, as it means we are now running unit tests on win32 VMs with very consistent results.

As soon as this is done and working well, I can tie up the loose ends, close out some bugs, update the config files and start looking at some new projects...like the test results - and try server unittests.

Thursday, August 14, 2008

Continuing saga of the 1.9 Unittest Move

When we left off, there was a check error happening across all Linux slaves and a reftest failure on the Win32 ones.

Update #1: A bug (450637) has been filed on that win32 failure, and also I brought the physical boxes back from sleep to be up on the new 1.9 master alongside their VM counterparts. We should know in the next hour or so if the reftest failure is consistent on all of them.

Update #2: The check error on Linux was due to the placement of a simple .sqlite file, bug-365166.sqlite to be specific. This file was in /tmp and not in the slave build dir and thus escaped the chown. Being owned by buildbot instead of cltbld was the cause of the access denied errors. Huge thanks to Cesar and Sdwilsh for looking at that test with me and for catching this anomaly. I've filed a bug (450665) to remove the offending placement so that this doesn't happen again in the future. Files shouldn't be getting created outside of the build dir; it creates a whole mess of problems.
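
A stray file like this can be spotted ahead of time by walking a directory and comparing owners. The following is only a minimal sketch of that idea, not something that was run on the slaves; the function name files_not_owned_by is made up for illustration, and it assumes a Unix-like system where file ownership is exposed as a numeric uid.

```python
import os

def files_not_owned_by(root, expected_uid):
    """Return paths under `root` whose owner differs from expected_uid.

    Hypothetical helper: the name and signature are illustrative only.
    """
    strays = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() reports on the entry itself, without following symlinks.
                if os.lstat(path).st_uid != expected_uid:
                    strays.append(path)
            except OSError:
                continue  # the file vanished or is unreadable; skip it
    return sorted(strays)
```

Calling something like files_not_owned_by("/tmp", uid_of_build_user) after a test run would list anything a test left behind under another account; how the uid is looked up is left open here.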

Speaking of mess:


Ew. That's all I can say. I've been watching this waterfall obsessively (more than usual) as it has displayed a bruised variety of colours, mostly *Not* green.

In other news, something I noticed while upgrading the windows slaves:


Really? I didn't know that people _chose_ IE. I thought it just came with the OS. I wish they would choose their words more carefully.

Back to the unittest trenches.

Wednesday, August 13, 2008

Update on the Unittest 1.9 move

In order to streamline the buildslave pool, the names of the following unittest 1.9 slaves were changed when we switched networks yesterday.

All of these machines now run Buildbot 0.7.7 and the latest Twisted & Python.

The Linux machines had their names changed and user changed - they are the same VMs as before:

qm-centos5-01 --> fx-linux-1.9-slave07
qm-centos5-02 --> fx-linux-1.9-slave08
qm-centos5-04 --> fx-linux-1.9-slave09

The Mac machines are the same ones as before, only a user change:

qm-xserve01 --> bm-xserve20
qm-xserve06 --> bm-xserve21

The two non-pgo windows machines are now VMs, the pgo box is the same VM that it was before - with a user change and a new 30GB fcal drive added for building on:

qm-win2k3-01 --> fx-win32-1.9-slave07
qm-win2k3-02 --> fx-win32-1.9-slave08
qm-win2k3-pgo01 --> fx-win32-1.9-slave09


At the moment all three Linux boxes are experiencing errors in Check :

gmake[2]: Leaving directory `/builds/slave_new/trunk_centos5_8/mozilla/objdir/storage/build'
gmake[2]: Entering directory `/builds/slave_new/trunk_centos5_8/mozilla/objdir/storage/test'
../../_tests/xpcshell-simple/test_storage/unit/test_bug-365166.js: FAIL
../../_tests/xpcshell-simple/test_storage/unit/test_bug-365166.js.log:
>>>>>>>
*** Storage Tests: Trying to close!
*** Storage Tests: Trying to remove file!
*** test pending
[Exception... "Component returned failure code: 0x80520015 (NS_ERROR_FILE_ACCESS_DENIED) [mozIStorageService.openDatabase]" nsresult: "0x80520015 (NS_ERROR_FILE_ACCESS_DENIED)" location: "JS frame :: ../../_tests/xpcshell-simple/test_storage/unit/test_bug-365166.js :: test :: line 22" data: no]
*** FAIL ***

<<<<<<<
../../_tests/xpcshell-simple/test_storage/unit/test_bug-393952.js: PASS
../../_tests/xpcshell-simple/test_storage/unit/test_bug-444233.js: PASS



And all three Win32 boxes are having the same 1 test fail in Reftest:

REFTEST UNEXPECTED FAIL: file:///E:/slave/trunk_2k3_8/mozilla/layout/reftests/bugs/212563-1.html

Please contact me if you have any ideas about what could be causing these.

-- Lukas

Tuesday, August 12, 2008

Welcome to Build, Ben says

Today was a big day for the Firefox 3.0 unittest set up. Since QA and Build have become separated, I have been working towards lining up all our unittest masters on the Build network. What used to be 10+ master addresses will be narrowed to 2 - you're either on staging-master or production master.

Easy.

No. It's actually not that easy. What I estimated would be 2 hours of downtime has turned into almost 8 hours (and counting) for many reasons, including the following:

* All the slave VMs had to have a new user created, one that is consistent with all our other Build machines. It makes sense to do this all at once, but it takes some time to get all the permissions and paths and ssh keys and other little details to line up properly

* In switching networks and users, the linux boxes were unreachable by VNC for some time until it was discovered (thanks to bhearsum & joduinn) that the xstartup in ~/.vnc was configured differently than the other linux boxes. I think it took almost an hour to get the fix on this figured out


All in all there were many little trips and glitches that made this process go for so long, and the fact that it can take over an hour to see if a build & test run is successful sucks. Thank you very much to all the Build Team who helped during this process.

At the time of writing this, I am only waiting on the pgo box to come back up on the new network with a 30GB disk partition added, and looking into a few compiler warnings on Mac and Windows. The PGO box didn't have an fcal disk partition for building on and I wonder if the issues in this bug are related to that. It would be a pretty great bonus if this switch turned up the fix for that machine.

The good news is that we are in the process of streamlining and making things more efficient for the future. All the build machines are getting closer every day to being interchangeable. The time it takes to get a new linux VM running is minuscule - and hopefully the same will be true of the other two platforms soon.

Things still to do:
* post about the new machine names of these VMs
* make sure that Nagios is clear about what it should be reporting on
* update the cron job that does the rsync of the buildmaster logs to the TB share
* file patches for 1.9 unittest's mozconfigs, master.cfg, mozbuild.py and killAndClobber.py

Back to watching the buildbot waterfall for green.

Monday, August 11, 2008

Scheduled Downtime Tues Aug 12 - 8:00 am PDT for Unittest network switch

Tomorrow there will be a ~2hr downtime starting at 8:00 am PDT as the 1.9 unittest master is moved over to the build network.

At the same time there will be a short interruption on the Mozilla2 production master.

If any issues arise, please comment in bug 450119.


Thursday, August 7, 2008

Looking for suggestions on dealing with lots of data

So I'm still plugging away at figuring out how to interpret the massive amounts of error log output that our unittest builds create.

As the test suites are being run, there is a steady stream of stdio being generated and logged. From this stdio, I gather up all the lines of output that contain "TEST-UNEXPECTED-FAIL" (thanks to Ted for unifying the output!).

Now I have files that look something like this:

linux-2 | 67 | 07/25/2008 | 06:40 | *** 61506 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_tree.xul | Error thrown during test: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.sendMouseScrollEvent]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: http://localhost:8888/tests/SimpleTest/EventUtils.js :: synthesizeMouseScroll :: line 273"  data: no] - got 0, expected 1
linux-2 | 67 | 07/25/2008 | 06:40 | *** 62352 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_tree_hier.xul | Error thrown during test: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.sendMouseScrollEvent]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: http://localhost:8888/tests/SimpleTest/EventUtils.js :: synthesizeMouseScroll :: line 273" data: no] - got 0, expected 1
linux-2 | 67 | 07/25/2008 | 06:40 | *** 63084 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_tree_hier_cell.xul | Error thrown during test: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.sendMouseScrollEvent]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: http://localhost:8888/tests/SimpleTest/EventUtils.js :: synthesizeMouseScroll :: line 273" data: no] - got 0, expected 1

Where the info is "|" delimited and goes like this:

PLATFORM | BUILD_NO | DATE | TIME | TEST-RESULT | TEST-NAME | TEST-OUTPUT


Approximately 7000 lines of error output for less than a month of constant testing.

I want to be able to know the following (at least):

* How many times has a particular test failed?
* On which platforms?
* How many times this week vs. last week?

That would be a start anyway.
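
One possible starting point for those three questions is sketched below. It assumes the "|"-delimited layout described above, with dates as MM/DD/YYYY and times as HH:MM; the function names parse_line and summarize are made up for illustration, and weeks are counted as ISO weeks, which is just one way to define "this week vs. last week".

```python
from collections import Counter
from datetime import datetime

def parse_line(line):
    """Split one '|'-delimited failure line into a dict, or return None."""
    # Only the first six separators delimit fields; the test output itself
    # may contain '|' characters, so cap the number of splits.
    parts = [p.strip() for p in line.split("|", 6)]
    if len(parts) < 7 or "TEST-UNEXPECTED-FAIL" not in parts[4]:
        return None
    platform, build_no, date, time_of_day, _result, test, output = parts
    try:
        when = datetime.strptime(date + " " + time_of_day, "%m/%d/%Y %H:%M")
        build = int(build_no)
    except ValueError:
        return None  # malformed date, time or build number
    return {"platform": platform, "build": build, "when": when,
            "test": test, "output": output}

def summarize(lines):
    """Count failures per test, per (test, platform) and per (test, year, week)."""
    by_test = Counter()
    by_test_platform = Counter()
    by_test_week = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        year, week, _weekday = rec["when"].isocalendar()
        by_test[rec["test"]] += 1
        by_test_platform[(rec["test"], rec["platform"])] += 1
        by_test_week[(rec["test"], year, week)] += 1
    return by_test, by_test_platform, by_test_week
```

With the counters in hand, by_test.most_common(10) gives the ten most frequently failing tests, and comparing two week keys for the same test answers the week-over-week question.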

I would love to be able to create a graph or something visual that shows peaks of test failures. Unfortunately I don't really know much about that area.

So I am asking for help/suggestions. If you had about 490,000 lines of errors (representing 3 platforms) in the above format - what would you do?

I can pretty easily add to the python script that greps for error output so that it creates sql insert statements instead of a text file and I would welcome tips that include creating/automating a database to hold all the error info. I've been thinking of setting something up with RoR to let people create their own views of the data depending on what they are looking for.
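
As a rough illustration of the database idea, the sketch below loads lines in the format above into SQLite using Python's built-in sqlite3 module. The table name, column names and function names are all assumptions made for this example, not an existing schema.

```python
import sqlite3

# Hypothetical schema: one column per '|'-delimited field.
SCHEMA = """
CREATE TABLE IF NOT EXISTS failures (
    platform TEXT, build_no INTEGER, run_date TEXT, run_time TEXT,
    result TEXT, test_name TEXT, test_output TEXT
)
"""

def load_failures(conn, lines):
    """Insert '|'-delimited failure lines; return how many rows were added."""
    conn.execute(SCHEMA)
    rows = []
    for line in lines:
        # Cap the splits so '|' characters inside the test output survive.
        parts = [p.strip() for p in line.split("|", 6)]
        if len(parts) == 7:
            rows.append(parts)
    # Parameterized inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO failures VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def failures_per_test(conn):
    """Return (test_name, platform, count) rows, most frequent first."""
    query = """
        SELECT test_name, platform, COUNT(*) FROM failures
        GROUP BY test_name, platform
        ORDER BY COUNT(*) DESC, test_name, platform
    """
    return conn.execute(query).fetchall()
```

Using parameterized inserts instead of generated insert statements sidesteps quoting problems, since the test output is full of quotes and brackets.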

Looking forward to your advice.

Wednesday, July 23, 2008

Grovelling isn't so bad

Been working on a couple of little utility scripts that I think are ready for public viewing. I'm interested in any tips on writing better code, or other ways to do what I'm doing that are more efficient.

The first one is cleanup.py which we need to be able to quickly get rid of old log files so that when we grovel through for errors, only the files of interest are being scanned.
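
For readers who want the general shape of such a cleanup, here is a minimal sketch. It is not the actual cleanup.py; the function name, the seven-day default and the "*-log*" file pattern are assumptions for illustration, and it defaults to a dry run so nothing is deleted by accident.

```python
import fnmatch
import os
import time

def remove_old_logs(directory, max_age_days=7, dry_run=True, pattern="*-log*"):
    """Delete (or, in a dry run, just list) old log files in `directory`.

    A file qualifies when its name matches `pattern` and its modification
    time is more than `max_age_days` days in the past.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue  # skip directories and symlinks
        if not fnmatch.fnmatch(entry.name, pattern):
            continue
        if entry.stat().st_mtime < cutoff:
            if not dry_run:
                os.remove(entry.path)
            removed.append(entry.path)
    return sorted(removed)
```

Running it once with dry_run=True and reading the returned list before switching to dry_run=False is a cheap safety check on a build master.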

Once you've got the old log files cleared out, you can use grovel.py to scan through for TEST-UNEXPECTED-FAIL. This script looks through each directory passed in from the command line, and prints all the failure lines to a .errors file for that directory - so the darwin log errors end up in a darwin_timestamp.errors file. The script also keeps a counter of TEST-PASS, TEST-KNOWN-FAIL, and TEST-PASS(EXPECTED RANDOM) and then prints the total tests run as well as these counters on the last line of the .errors file.
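
The behaviour described above can be sketched roughly as follows. This is an illustration, not the real grovel.py: the function name, the timestamp format in the output filename and the wording of the summary line are assumptions, and the "total tests" figure assumes each test contributes exactly one marker line.

```python
import os
import time

# The longer marker comes first so "TEST-PASS(EXPECTED RANDOM)" is not
# also counted as a plain "TEST-PASS".
COUNTED = ("TEST-PASS(EXPECTED RANDOM)", "TEST-KNOWN-FAIL", "TEST-PASS")

def grovel(log_dir, out_dir="."):
    """Write failure lines from every file in log_dir to a .errors file."""
    counts = dict.fromkeys(COUNTED, 0)
    failures = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as log:
            for line in log:
                if "TEST-UNEXPECTED-FAIL" in line:
                    failures.append(line.rstrip("\n"))
                    continue
                for marker in COUNTED:
                    if marker in line:
                        counts[marker] += 1
                        break
    label = os.path.basename(os.path.normpath(log_dir))
    stamp = time.strftime("%Y%m%d%H%M%S")
    out_path = os.path.join(out_dir, "%s_%s.errors" % (label, stamp))
    total = len(failures) + sum(counts.values())
    with open(out_path, "w") as out:
        for line in failures:
            out.write(line + "\n")
        summary = ", ".join("%s: %d" % (m, counts[m]) for m in COUNTED)
        out.write("total tests: %d, %s\n" % (total, summary))
    return out_path, counts
```

Writing the .errors file outside the log directory keeps a later run from scanning its own output.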

Next steps:

  • Add gathering up all the .errors files into a tarball

  • Set up a weekly cron job that will run these scripts and email the tarball

  • Create a database and insert results

  • Web interface for aforementioned db that will allow for searching



Even though these are pretty simple utility scripts, I'm excited because they will make my life a little easier and also because it's the first python I've written from scratch...oh, and it's not a school assignment :)

Monday, July 21, 2008

Discussing Data

Some general thoughts on the discussion of data, inspired by Mitchell's blog post.

When I first started using the internet with some regularity, about 13 years ago, I was suspicious about entering any personal information whatsoever. This was before identity theft was a common occurrence and before I had any money to worry about losing; I don't think I even had a credit card yet. Some of the fears were based on run of the mill rebellion against "The Man" but some of it was just a reaction to something new.

For many years, whenever prompted for personal information, I would look for a way around having to enter it. If I couldn't get under it or over it, I would make stuff up...or leave. Creating false accounts gets tiring, because then you have to remember all your lies. Firefox wasn't around yet to help me keep track of all my phony accounts. I sure do appreciate the password manager and extensions like BugMeNot.

Skipping forward to the present, I still look for a way out of having to enter any identifying data wherever possible. Something that continually annoys me is being required to choose between male and female on a form when I am making a purchase. This should NOT be required to buy a sticker, test beta software or sign up for a social networking site. I'd like to see the end of generalized marketing based on gender and find new ways of triangulating what cat owners are doing that is different than dog owners.

Back to the data...

Even though I hate the thought of anyone assuming they know me because of a few hastily checked radio buttons, I also want the freedom to go about my business on the internet as easily as I do in real life - with my driver's license and a bank card. I have proof of who I am and I have money - what more do you want? I should now get to do whatever it is I'm looking to do with as few clicks as possible.

So if the future web browser allows me to safely keep all the important stuff handy, to know that I am who I say I am, and let me skip the 3 page sign up process, this is a Good Thing.

How can we get to that kind of level without talking about data and all the good/bad/lukewarm associations we have with it?

I tell people all the time that they should be using Firefox because it is the safest. People care about safety, and this is what they need to hear. If Firefox starts to work with data, I trust that we will do so in the best interest of the people who came to us for safety. I'm excited to talk about data and what we can do with it.

My hope is that data collection will become less of a top-down "Tell me this information or you can't access {fabulous service name here}" and instead will become the equivalent of the clerk at Best Buy asking you for your postal code and being able to say "No, I don't want to give that information to you, but I will still buy Rock Band from you".




Thursday, July 17, 2008

Set the VNC Password for Mac's Remote Desktop in Terminal

I was stuck trying to access one of our xserve machines that just got moved from the QA network to the Build network. I could connect via ssh, and Justin could ping it but attempting to connect with VNC wasn't working. It wouldn't accept the usual passwords. Justin seemed to think that it was possible to change the VNC password through the command line, so I googled it and read a post from 2 years ago.

Something I've learned from reading "how-to" blogs is that you should always read the comments first. That's where the most up to date information will be, if there is any. The person who wrote the post used strange template structure that made his idea hard to read and understand. Anyone who didn't read the comments wouldn't know that kickstart now takes plain text passwords.

The long and short? If you want to change the VNC password do this:

sudo /System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/Resources/kickstart -configure -clientopts -setvnclegacy -vnclegacy yes -setvncpw -vncpw [newpassword]


Apparently you can enable VNC access and set the VNC password via the kickstart command. It isn't terribly well documented, but since it now accepts plain text passwords, I think that's a step in the right direction.

Wednesday, July 2, 2008

Chasing rainbows is easier

I was so thrilled to discover Splunk that I installed it on one of the buildbot masters - qm-rhel02 - without realizing that Splunk quickly eats up disk space and hogs memory. Yesterday afternoon some Talos boxes started to go down because of this, and once I stopped the Splunk server everything started to right itself.

Lessons learned:
Do not play with the buildbot master.
Do not look directly at the buildbot master.
Do not taunt the buildbot master.


So today's tasks include getting access to the samba share that was set up, creating a cron job that will rsync the buildbot master logs to said share and then finding a safe place to set up Splunk again.

We really need to have a way to look at data from the buildbot master over a long period of time - otherwise filing bugs on these intermittent failures is just a shot in the dark. Take yesterday for example. qm-win2k3-pgo01 is being "unreliable" and had the same errors in reftest for two consecutive builds. I file a bug, and the response is "grab me a copy of the offending objdir so we can poke at it". Wouldn't you know that the very next build does not have the same error output - this time it has mochitest issues that are seemingly unrelated. This morning I check again and it's had a compile failure, an exception (the most hideous purple) and then a completely green run.

Intermittent failures == needle in a haystack

Wednesday, June 25, 2008

sha1sum on Mac OSX

Getting ready to assemble the Firefox 3 CD and came upon the glitch that Mac OS X doesn't provide sha1sum tools. Quick Google search turned up a great comment on this blog post which suggests using openssl and by putting alias sha1sum="openssl dgst -sha1" in my .profile I can now do sha1sum $app_name.iso to my heart's content.
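
Where openssl is not handy either, the same digest can be computed with Python's standard hashlib module. This is a small sketch of an alternative, not what was used for the CD; the function name sha1sum is chosen here only to mirror the alias.

```python
import hashlib

def sha1sum(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat even for a full CD image.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is the bare hex digest, which can be compared directly against a published checksum.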

The Firefox 3 CD will now have all the supported locales on it, which is a step up from the Firefox 2 CD. Look for it in a Mozilla Store near you.

Thursday, June 19, 2008

Splunk - Where IT's at


So, you know that script I was working on to parse error logs? Well, it turns out that there is already an amazing, free, graphical program that does the work for me. Excellent.

It's called Splunk. John O'Duinn mentioned it in passing last night, and today I got it running on the unittest-staging build master in about 2 seconds flat.

Installation (on linux) is a breeze, simply:
wget 'http://www.splunk.com/index.php/download_track?file=3.2.6/linux/splunk-3.2.6-38259-Linux-i686.tgz&ac=&wget=true&name=wget&typed=releases'


Then unpack and bin/splunk start --accept-license

It creates and starts a server on port 8000 which you can then access to use all the graphical features as well as an admin dashboard. The site is clean and well laid out. I look forward to going deeper into what this app can offer us.




I also installed it locally on my Mac and the steps are exactly the same. After starting up the Splunk server, simply point it to the directory where your log files live and Bob's your uncle.



See how it beautifully transforms log files into searchable fields with a graphical display? This will be extremely useful as we shift our machines around, play with the difference between VMs and hardware as well as put all our unittest machines up to Buildbot 0.7.7.

With only a few minutes on the dashboard I found it easy to navigate, add several input streams from the various build slaves that run on unittest-staging and also noticed that you can create and save specific search requests.

Can't wait to see how much this helps others, now off to install it on the other masters.


Tuesday, June 17, 2008

Robot War - Firefox 3 vs The Other Browser

Just in time for Download Day, a little video produced and directed by Marcia Knous and edited by me. I'm posting it on several sites (YouTube, Flickr and blip.tv) - please link to it, digg it, favourite it, whathaveyou - just pass it along.









Wednesday, June 11, 2008

What did I do today?

Interning at Mozilla has so far provided me with many opportunities to learn (and re-learn) some of the finer points of the command line interface.

Today I have spent most of my time working on some scripts (both shell and python) that will assist in parsing error logs from the unittest buildbot masters.

Here's how I would like it to work:

  • The script will live in the master directory and when run followed by the 1...n slave directory names it will create a folder with files containing all lines which match particular error messages that are in the stdio log files. Each file is specific to the pattern searched (eg: reftest, browser, check)

  • A python script then reads through each of the files and breaks the lines of error statements up into sqlite insert statements and throws them into a sqlite db

  • The DB will have a front end which will allow for easy searching and sorting of the data to see if there is a particular test that fails more frequently than others, on which platforms, etc...



This is an interesting side project for me as we wait for release to start bringing the unittest boxes up to Buildbot 0.7.7. I'm enjoying writing python (esp. compared to shell scripting).

Finally - many thanks to Chris Tyler for today's little success. I needed to put the time and date of the log file into the pattern matches so that the database can be more useful, and I was having difficulty figuring it out. Chris is the expert at the one-line solution, which was (a trailing backslash means continued on next line):

for file in *-log* # traverse all files in $DIRNAME
do
  # prefix every matched line with the log file's date and time
  grep -HnA 2 "$string1" "$file" \
    | sed "s~^~|$(date -r "$file" '+%D|%T|')~"
done > "$OUTPUT_FILE.reftest" # redirect once, so each file doesn't overwrite the last

Friday, June 6, 2008

What's new in Firefox 3? Check out this demo!

A quick (< 4 minutes) overview of some of the new features in the soon-to-be-released Firefox 3. Check it out, then head over to http://www.spreadfirefox.com/worldrecord/ and sign up to be notified when the new version comes out!


Thursday, June 5, 2008

Parsing for errors in Buildbot log files

The other day two of our Moz2 unittest buildbots - one Linux and one Windows - were both failing tests intermittently. We have all these logs but no way to parse the data to look for patterns and try to figure out what is going on. In an attempt to scratch the surface of this issue, I was tasked to look at the error messages and put together something for bug 435064.

It took a little while to come up with the right approach but in the end I had some grep statements that did the trick. Thinking that this will come up again, I packaged them into a little shell script to do the dirty work for me next time:


#!/bin/bash
# parseError.sh
# simple script for gathering up errors in log files

if [ -z "$1" ]
then
  echo "Usage: '$0' [directory]"
  exit 1
fi

string1="UNEXPECTED FAIL"
string2="ERROR FAIL"
string3="command timed out"
string4="FAIL"

DIRNAME=$1
OUTPUT_FILE="../output"   # relative to $DIRNAME, since we cd into it below

echo "Looking for UNEXPECTED, ERROR and command time outs..."
cd "$DIRNAME" || exit 1   # stop here if the directory does not exist
for file in *-log* # traverse all log files in $DIRNAME
do
  grep -Hn "$string1" "$file" >> "$OUTPUT_FILE"
  grep -Hn "$string2" "$file" >> "$OUTPUT_FILE"
  grep -Hn "$string3" "$file" >> "$OUTPUT_FILE"
done

# these two searches include 5 lines of context
echo "Looking for Check and Browser Fails...."
grep -HnC 5 "$string4" *-log-check* >> "$OUTPUT_FILE"
grep -HnC 5 "$string4" *-log-browser* >> "$OUTPUT_FILE"

echo "Sorting......"
# drop the "--" separator lines that grep prints between context groups
sort -n "$OUTPUT_FILE" | sed '/^--$/d' > "$OUTPUT_FILE.sorted"

echo "Search complete."
exit 0



Thanks to dchen and humph for helping with the finer points of writing shell scripts.

Friday, May 30, 2008

Bets on Canada going over 100,000

Armen doesn't think that Canada will make it over the 100,000 mark - I would like to see him proven wrong, so please Canada - get on with the pledging!

This map is fascinating to watch and there are so many tangents you can follow with it. How amazing is it to see 14 pledges in Yemen? Why does Alaska get to be orange (50K+ pledges) when you know that there are not that many Alaskans signed up for Download Day (prove me wrong)? I think South Africa will be light blue soon, there are almost 1,000 pledges there.


It's a great distraction from buildbot...

Wednesday, May 28, 2008

Guinness Record for Most Downloads

Want to help Mozilla set a World Record? Join in Download Day and pledge to grab a copy of Firefox 3 in the first 24 hours of its release. By pledging, you'll get to see the number of pledgers in your country go up by one, and also you'll get a friendly reminder email on the day that FF3 is released.

Spread the word by putting an affiliate button on your site.


Thursday, May 22, 2008

Learning advanced Bugzilla

In our first year of Mozilla development at Seneca - we learned how to file basic bugs, how to upload patches and we followed a module owner or the like so we could see just how much bugmail a person can handle. Now that I'm a Build intern I am learning to use bugzilla on a few more levels.

This is just one (small) example:



These bugs will allow me to go through the process of creating, setting up and deploying 4 new VMs so that I'll be able to see if they can handle both build and unittest builds. In a perfect world, they will be able to do this and that means we will set up a pool of buildslaves for each platform and delegate work from both build and unittest as needed. The goal is to be scalable and to use as little hardware as possible.

The reason I have to test this is because it's possible that VMs cannot successfully run unittests as they are written now. We need to know that the unittests can and will run before choosing this method.

More on this soon. Back to bringing WinXP vms up to date with mozilla-build instead of ye olde Cygwin.

Tuesday, May 20, 2008

Vista Building with VC9

If you've been banging your head trying to build Firefox on Vista with the newest Visual Studio 2008 express edition (VC9) - know that it is possible now, and with minimal bruising of your forehead.

Two things you need to remember:

1. You have to run start-msvc9 in mozilla-build as administrator (if you do not your builds will fail with messages about bad file numbers and such)

2. You have to put the following four lines in your .mozconfig (for now - keep your eye out for a new rev of mozilla-build, currently at 1.2, which should fix this)
ac_add_options --disable-xpconnect-idispatch
ac_add_options --disable-activex
ac_add_options --disable-activex-scripting
ac_add_options --disable-accessibility


That's it - after bug 419665 is checked in, there should be no other obvious issues unless you forget to run as administrator or check out when the tree is burning.

Happy Vista building.

Wednesday, May 14, 2008

Clobbering buildbot run leaves no trace of history

Sadly, in my excitement to get a Mac buildbot slave up and running yesterday, I have overwritten my profile with a lot of file:/// addresses in the awesomebar and little else. The fabulous part of searching in the awesomebar is so much less when you have NO history.

This is because the master.cfg is set up to clobber and thus wipes clean the default profile as it does its thing. Not so bad for a VM unittesting machine, kind of terrible for a human with poor retention for URLs thanks to a dependence on the awesomebar.

/me writes on the board "Never use the default profile" * 1000

Monday, May 12, 2008

Week [1] - Learning to set up Buildbot

Okay, it's time for another update as to my activities in MV.

By the time Robcee left last Friday to go back to the picturesque province of New Brunswick, I had come pretty close to having Buildbot installed and ready to be deployed on my CentOS VM.

Today in between wrestling with trying to build on Vista with VC9 (I must remember to "run as administrator" on the mozilla-build shell) I have been learning to deploy my own local buildbot master and a linux slave.

In getting to this:


there were a couple of roadblocks.

1. The current master.cfg and mozbuild.py are written for Buildbot 0.7.5, so I had to make a couple of small changes as per the documentation to account for import changes (no more step, using steps.shell or just steps instead). Also, html.Waterfall is deprecated, so we should use html.WebStatus now.

2. It took a little while to figure out that the "force build" option that you can get when you click on a slave's name link in the waterfall is actually an option passed in to html.Waterfall as in html.Waterfall(http_port=2005, allowForce="true") and this gives you a nice little html form where you can force a build as you need one.

3. The last little glitch was just making sure that all the names matched up. The major lesson learned here was: Never follow directions. Just because someone says to do

buildbot create-slave slave localhost:9989 linux mozilla
doesn't mean that will work. You have to look beyond the literal instructions and realize that the slave name must match what is in auth.py and same with the password.

Next goal - learn how to kick unittest machines when they misbehave.

Wednesday, May 7, 2008

Wrestling with the CentOS ref platform and configurations

Today was supposed to be "learn all about unit tests" day and instead it was "configure CentOS until you drop" day.

Here's what I was working with:
Ref Platform VM (CentOS-5.0)
Ref Platform set up instructions
Install scripts that are supposed to help make it all much easier

So the day looked a little like this:

* Wrestling with Python for about 2 hours - after much trial and error with the python path I ended up deleting my messed up ref platform vm, and starting up a clean one. The install scripts caused a clean VM with a python version of 2.5.1 to become a busted, can't finish make, 2.4.3 version. What?!

* Scrap the scripts, instead follow the instructions as above from the wiki. This brought up a couple of issues. First of all, all the versions on the wiki are a lot older than the ones in the install scripts. With a clean VM you first have to set a symbolic link to gcc-4.1.1 otherwise compiling zope-interface will not work:

ln -s /tools/gcc-4.1.1 /tools/gcc


Now the steps for installing zope-interface work fine. However there is an issue where every time a symbolic link is created, a symbolic link is then also created inside of the symbolic link as a link back to the original item. This is extremely confusing - I just deleted the broken links without looking too deeply into why that was happening. It happened with zope-interface and the twisted/twistedcore.

There are also a few spelling errors in the wiki for the PATH settings, I'll go in and change those soon.

Now that I am trying to write down what happened, all the glitches seem minor. I swear there was a lot of "Why is this happening?" and studying PATH, PYTHONPATH and PYTHONHOME to see if they were configured properly.

Probably the next step is to study those scripts and see how they can be more easily run on the VM anytime, anywhere because right now - they are hit and miss.

Monday, May 5, 2008

Getting set up on Day[0]

First day of the internship, and things are going well. As my first build on the MBP is running in the background, let me lay out my set up so far. It will be familiar to many of you but I want a list for my own records.

Upon opening up the MBP with a clean install of Leopard, I am faced with the first task of downloading Firefox. What people choose to download is an interesting mix. While my supervisor John O'Duinn downloads the beta version when he needs a fresh build, both Armen and I leap headfirst into the latest trunk build, 3.0pre.

Next in line:
* Mozilla-VPN installer
* Tunnelblick
* Colloquy
* Quicksilver
* Chicken of the VNC
* Remote Desktop for Mac (beta)
* MacPorts
* XCode
* Thunderbird
* TextWrangler (a free Mac text editor that has command line integration - which is awesome!)
* VMware Fusion

I've done the build configuration and am just waiting to get my VPN set up properly. Everything here is a bug. As Sean from IT just told me - "If you think 'Should that be a bug?' that's probably a bug".

Speaking of bugs, I got assigned another small bite today, bug 432003, which is to enable source server on thunderbird windows builds. Two lines and a patch uploaded, nice to have such a minor bug to get back into the swing of things after my 2 week holiday from school.

Day[0] out.

Thursday, May 1, 2008

Packed and (almost) ready...

I've just finished attempt #3 to re-pack and prune what I'm bringing out west on Saturday. My bike is packed up in a box, I've booked an airport limo for 4:30 am and there are two 70lb bags of dog food in the basement to keep the hound in chow while I'm gone.

It's amazing to imagine 4 months living out of one suitcase. Kind of makes me want to donate all the rest of my clothes and shoes right now. I'll probably do a huge spring clean when I return and get rid of a ton of stuff.

It's been great to have Armen already out in Mountain View so that I can ping him and ask questions like "Is there an alarm clock in the apartment?". I need to know these things. Right now I'm praying there will be space in my carry on bag for a coffee grinder. Not because I drink coffee but because I use it to grind up flax, spices and oats for baking and smoothies.

There's not much going on in terms of open source work. The past couple of weeks have been filled with preparing to leave town, seeing friends and doing a bit of AMO editing. I'm happy with this past term I have to say - in the end I made the President's Honour list again. Now I just have to keep finding ways to turn that into potential funding for my last year. When I'm filling out applications it seems that the biggest factor is what kind of community and volunteer work you do.

Next year I'm going to volunteer with SOY where I would do 1 on 1 mentoring with a queer youth. It's hard to translate all the hours and effort that went into my time on the Trinity Square Video Board of Directors into funding applications and I've maxed out my 2 X 2 year terms. Mentoring a youth will probably look better and maybe I'll be matched up with a young'un who's into computers and/or film. The other options that interest me are to get involved with Sketch or Parkdale READ. I've been able to give at least 4 hours a month to TSV so hopefully I can make a similar commitment to the new organizations.

In other news, one of my films got into Frameline which is in San Francisco in June. Lucky for me, I'll be in the neighbourhood this summer and will be able to attend my screening. I was working on getting accredited as a distributor by representing my old workplace but as a filmmaker I will get treated much better and probably score some interesting shwag.

After the Images festival opening where G.B. Jones's feature premiered (I did a voice over for one of the main characters) I was inspired to make a feature on Super 8 film and I made a pact with two other filmmaker friends that we would each have a feature before ten years were up. It may be ambitious but my hope is that by the end of this summer I will have a working screenplay to start preparing a shooting schedule from. I'd prefer to have a feature finished in 3 years, not 10.

That's all the news fit to print right now. Time to take the dog out which I'm dreading because she ate 10 pull 'n peel licorices earlier. Ugh.

Friday, April 18, 2008

Final Demo of Source Server for Mozilla

This is the demonstration I showed today for the final wrap-up of my DPS911 class - "Open Source Project".

Friday, April 11, 2008

The Source Server will be ready for public consumption

Okay.

As I download VS Express so that I can test it in that environment, I've uploaded two patches in the continuing saga that is Source Server.

Here's what we've learned since my last post:

* cvs.exe that comes with mozilla-build has issues so it's necessary to point your path to a standalone version
* the tinderbox cvs_root uses private key access so we have to alter symbolstore.py to check for an environment variable of SRCSRV_ROOT which will be set in the tinder-config.pl file to the public :pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot.
* the reason that %fnchdir% wasn't working in my VS was because the srcsrv.dll it was using was an earlier version. once I copied the version from WinDBG over to the VS devenv folder, it worked. Still don't know where %fnchdir% came from (googling it only turns up myself) but I am glad it's working now and hopefully VS2008 express comes with this newer version - otherwise the documentation will be instructing people to do all sorts of extra work
* the last piece of the puzzle - Why Doesn't the Code Show Up After Downloading? - turns out there's a little checkbox in the options for debugging - something about requiring the exact file match...uncheck that and VOILA! Source Server worked in Visual Studio!!!
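The SRCSRV_ROOT check from the second point can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual symbolstore.py change: the function name `get_cvs_root` and its `environ` parameter are invented here so the lookup can be tried without touching the real environment.

```python
import os


def get_cvs_root(default_root, environ=None):
    """Return SRCSRV_ROOT when it is set and non-empty, else default_root."""
    env = os.environ if environ is None else environ
    return env.get("SRCSRV_ROOT") or default_root


if __name__ == "__main__":
    private_root = ":ext:ffxbld@cvs.mozilla.org:/cvsroot"
    public_root = ":pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot"
    # Without the variable, the root recorded at build time is kept.
    print(get_cvs_root(private_root, {}))
    # With the variable set (as tinder-config.pl would do), the public root wins.
    print(get_cvs_root(private_root, {"SRCSRV_ROOT": public_root}))
```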

This is awesome. I am so thrilled that when the two bugs are resolved, downloading a nightly and debugging it should be really really easy!

Time to update some screenshots :)

Wednesday, April 9, 2008

Tweaking locally - ftw

So in order to get the current version of nightly builds to work for me, I had to add a srcsrv.ini file in the same place as the srcsrv.dll and devenv.exe - on my computer this is C:\Program Files\Microsoft Visual Studio 8\Common7\IDE

In that srcsrv.ini the only lines needed are:

[variables]

MYSERVER=:pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot


The reason for this is that anything in your local srcsrv.ini will override what's in the pdb file. In the current situation, the pdb files for the nightly debug builds have MYSERVER=:ext:ffxbld@cvs.mozilla.org:/cvsroot since that is what the computer they were built on used. That cvspath does not work without a key though, so it is no good to the average user of source indexed builds.
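To make the override rule concrete, here is a small Python model of it. It only mimics the precedence described above (local srcsrv.ini values win over the values stored in the pdb) and says nothing about how srcsrv.dll is really implemented; the function name `effective_variables` is hypothetical.

```python
import configparser


def effective_variables(pdb_variables, ini_text):
    """Merge pdb variables with a local srcsrv.ini, letting the ini win."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep names such as MYSERVER in upper case
    parser.read_string(ini_text)
    merged = dict(pdb_variables)
    if parser.has_section("variables"):
        merged.update(dict(parser.items("variables")))
    return merged


if __name__ == "__main__":
    from_pdb = {"MYSERVER": ":ext:ffxbld@cvs.mozilla.org:/cvsroot"}
    local_ini = (
        "[variables]\n"
        "\n"
        "MYSERVER=:pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot\n"
    )
    print(effective_variables(from_pdb, local_ini))
```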

The second tweak is still making sure that the code is being located properly once it's downloaded. There is a way to change your registry to alter where Visual Studio looks for its cache of source code. I tried this and it worked but I'm not sure it was necessary. It's all still a bit of a grey area.

However, the silver lining? The patch does work - with the tweaks which are pretty simple - so source indexed builds exist and hopefully some people will start using them and familiarizing themselves with source server. In the long run, the more people using it, the more people who will be able to hack on it.

Thanks especially to this article which helped tonight as I tried to find the right combination of settings.

Now it's time to go write some things into the documentation.

Testing the indexed nightly

So the fix worked and now the Mac/Linux |make buildsymbols| functionality is working again. As well, the nightly debug build from last night had source server indexing.

I downloaded the windows nightly to test it - in both WinDBG and Visual Studio 2005.

All looks good on VStudio - the symbols download and then when I try to break debugging I get this:
"This is great!" I think to myself and I happily hit "Run" several times to the various prompts for cvs commands.

Then...NO Source code can be found anywhere. Why? Well, it didn't download with that cvs command. See how it's got ":ext:ffxbld@cvs.mozilla.org:/cvsroot"? Well that doesn't get me any code, what it does get is timeouts and errors like this.

So now what? I was able to run the command with :pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot and pull the code files to their default location - which by the way is not anywhere you would find it by accident. Here's what I found out about the location that SourceServer puts your code by default:

You may be wondering where Visual Studio puts the source server download cache. The default location is C:\Documents and Settings\Username\Local Settings\Application Data\SourceServer. This is great because with proper security in place only logged-in users can see that directory.

http://msdn2.microsoft.com/en-us/magazine/cc163563.aspx

So I ran the command in that directory and when VStudio is done asking me over and over again about running cvs commands - I do get the source file I checked out appearing in my VStudio solution.

What now? I don't know if this is something that needs to be changed in the mozilla code or if this is now about doing local changes with a srcsrv.ini file. Hopefully I will get some good advice about this soon so that I can demo it tomorrow without too much hacking and with a little bit of authority even :)

Monday, April 7, 2008

Linux headache...

So my patch was backed out because it broke make buildsymbols on the Mac and Linux platforms. From looking closer at the error message and the patch, I deduce that GetVCSFilename contains the offending line: it does return file where the caller expects a tuple (two return values in Python). So I change it to return (file, None), and after testing the patch on a clean Mac build I am presented with the successful making of buildsymbols.
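The failure mode is easy to reproduce in isolation. The functions below are simplified stand-ins, not the real symbolstore.py code (the actual GetVCSFilename signature is not shown here); they only illustrate why a caller that unpacks two values breaks when it gets a bare string back.

```python
def get_vcs_filename_broken(file):
    """Stand-in for the broken code path: returns a bare string."""
    return file


def get_vcs_filename_fixed(file):
    """Stand-in for the fix: always returns a (filename, root) pair."""
    return (file, None)


def index_file(get_vcs_filename, path):
    """Caller that expects two return values, as the patched code did."""
    filename, root = get_vcs_filename(path)
    return filename, root


if __name__ == "__main__":
    print(index_file(get_vcs_filename_fixed, "nsFoo.cpp"))
    try:
        index_file(get_vcs_filename_broken, "nsFoo.cpp")
    except ValueError as err:
        # Python tries to unpack the string character by character.
        print("broken version fails:", err)
```

Worth noting: a string of exactly two characters would unpack without any error, which is the kind of thing that makes this bug hard to spot on one platform only.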

Now enter evil Linux (cue evil music)- the one platform I have yet to build on in any serious fashion. I should test this patch on Linux to make sure that it works there too. Here's the bullet form of what happened (as best I can remember at this point in head bashing on keyboard):

  1. I ssh to 'Liberia', one of the Quads running Fedora in the ORI lab
  2. Check out, apply patch and build a debug build on it - fast! It takes ~3 mins to checkout and ~13 to build
  3. Run make buildsymbols and I get "make: nothing to do for 'buildsymbols'" (thanks.)
  4. Clobber build, tweak .mozconfig adding --enable-application=browser
  5. Build - run make buildsymbols - still nothing to do for buildsymbols
  6. After attempting to bike home, getting a flat tire from a massive pothole on Keele St and having to bus it I ssh into Liberia again and start from scratch. At this point both Armen and Dave start builds too, all of us using --enable-debug and other assorted .mozconfig settings
  7. No one can make buildsymbols. Without the patch there are errors and with the patch - same errors.
  8. In objdir/config/autoconf.mk I check and see that MOZ_CRASHREPORTER = 1 so make should be recognizing this.
  9. I try a clean checkout and can't even build because of errors with dump_symbols.cc
Swimming in a sea of red herrings is not fun. Tomorrow's another day and another stab at getting this going. At this point I'm convinced that there's something wrong with locally building and not in fact with the patch since it does not touch dump_symbols.cc and since today's Linux symbols are up and fine (ted checked) which means that make buildsymbols does in fact work somewhere...just not for me (or Dave).

Saturday, April 5, 2008

Build & Release - Part Two - more meetings...

Continuing along with the education of future Build and Release engineers, Armen and I were introduced to automation with Robert Helmer. Rob's leaving Mozilla in a couple of weeks and it's too bad we won't get to work with him this summer. He's really got this all figured out and most of his presentation was way over my head. I said to Ben Hearsum later that it feels like once I'm actually touching the build system it will probably make more sense, but just talking about it? That's a bit confusing. In any case, Rob presented for about an hour on the automation process from past to present and then there was a lively discussion about what's coming next.

The next day Ben Hearsum talked about Try server and showed us the current set up. I'm really excited to learn more about Try server, and also to having permissions to use it myself. Considering that my recent patch had to be backed out for breaking buildsymbols on the Mac and Linux platforms the Try server could really help me out. I'll be talking with Ben about how I can get in on it. I have an LDAP account now...

Following Ben, Nick Thomas showed us the flow of releases from tinderbox to virus scanning to their final resting place on the ftp servers. In his diagram you can see pre Sept 2007, how it is currently and then on the right - what we're aiming for.


The last presentation of the day was Rob Campbell with his presentation on unit tests. Again we were treated to a thorough explanation of how things were done, how they're being done now and where things are heading.

The most exciting part of the week was realising that there is still a lot of room in Mozilla to have an impact on how things are done. On Wednesday morning we went for breakfast and Armen and I were lucky enough to be sitting next to Mike Schroepfer - VP of Engineering for Mozilla. He shared a lot of information with us and also pointed out how every single person at the table (the Build, Release and Automation team) has contributed significantly. That Mozilla has been able to grow as quickly as it has without falling down is a testament to all their hard work (and that of many others) and I am inspired to be part of that.

Thanks to Dave and to John for helping organize this opportunity. I think that my transition to interning this summer will be much smoother as I have met many of the people with whom I'll be working, plus the overviews of each area of Build were invaluable.

Tuesday, April 1, 2008

Build & Release - Learning about Talos

Armen and I are in California attending Build and Release team meetings this week. Over the next two days we'll be introduced to the many facets of the Build and Release workflow.

Today's first session was about Talos with Alice.

Here is the diagram of Talos (copied from the diagram Alice drew - yes Talos is a robot):


Alice walked us through how Talos gets its information from browser builds: buildbot reads quickparse, which is a text file, to look for new builds. Buildbot has a script that knows what to look for in order to find new builds, and there is a 5 minute delay before Talos is deployed because a build can be listed in quickparse before it has finished, at which point the build does not technically exist yet.

Currently there are 30 Production and 20 Stage Talos machines running; this past December there was only 1 Production machine and the stage machines.

This huge increase of Talos machines has led to an insane amount of data being gathered and a database which is in serious need of some help.

After Alice's presentation we all tried a standalone Talos so we could see the tests at work.

Anyone can try them, just follow these directions. If you are using a recent nightly you might need to add security.fileuri.strict_origin_policy : false to your sample.config file preferences because of new security features. Also, you can comment out the tjss tests because those are kind of boring - the fun test is the svg since you'll see a lot of graphics tests running on your browser. This standalone runs on a new profile so it's okay if you already have Firefox running when you run this script.

So the information that's generated is good for recognizing regression like in this bug, where if you look at the graph you can see how the build was chugging along, something got checked in that affected performance and then it was backed out and the performance went back to normal.

Pic here, see bug #425941 for more info:


More information on Talos Machines.

Friday, March 28, 2008

Source Server Tweaks - Now with refactoring and a clean_root function

Things to remember:
1. When you post your patch you select review ? and not review + (I had been confused about why I couldn't put my reviewer's email address in next to the +)
2. When your previously working code stops working suddenly and print statements galore are not helping, and you know what part isn't working but not why...Stop poking at the code and go to MXR -- Thank you MXR for helping me catch (through a line by line comparison) that I had accidentally deleted a key line thinking it was my own addition. I think this calls for an editor that does coloring on text not for syntax but for diff'd text.

Tomorrow should be a big day. I'll be in a little box office at the Royal Cinema all day and night working for Cinéfranco - if anyone wants to see a French movie, I can hook you up - and I'll be anxiously awaiting a review result because this is it people, these are the tweaks that should net me a test version of a nightly build.

Fingers crossed that one particular Build guy will be working on Saturday...

Sunday, March 23, 2008

Some of my favourite things

I'm still learning how to get the most out of my system set-up. Since I've recently become an Editor for AMO, I've started a list of add-ons that are of interest or could be useful to me. Here's the list so far, in no particular order:



That last one, BugMeNot, really impressed me because it's such a great idea. I complain all the time about how many times I need to create a user account with a site I may never visit again, or a site to whom I do not want to provide my information. BugMeNot is a simple idea that has a huge impact. I get to share the username/password of "nobody" with everybody.

As well as add-ons, I've installed Quicksilver so that I can do more from the keyboard. I don't know why I never looked into this before but apparently you can go to the spotlight search with command + space...nice to know. Quicksilver is spotlight on steroids though, and I'm looking forward to exploring the capabilities. The only glitch I had on starting up was that the hotkey wasn't working how I thought it should. In the end, I set my preference for hotkey to modifier activation only, single, control. To pull up Quicksilver I just tap the control key twice. Perfect.

The last thing is BBEdit - I'm still trying to find a text editor that I feel comfortable with. For some reason XCode scares me...perhaps this summer I will try it again. My favourite editor on the PC was Crimson, which sadly has no Mac support. BBEdit looks like it's got everything I liked about Crimson, namely the ability to work on files over FTP/SFTP. This morning when I was working on a patch for tinder-config.pl though, it did some weird things to line breaks. So I'll have to check the preferences more closely before I use it for another patch.

Time to go fix that tinder-config.pl patch.

Patch updated - now with double slashes

On the edge of my seat as I wait to see if my patch to tinder-config.pl will be checked in and a source indexed build will come down the pipe. I just tweaked the one line addition with double \\ in the file path. Small things hold up big things...

Soon I will be testing the nightlies, grabbing screenshots and putting together my demo/docs.

Tuesday, March 18, 2008

Source Server Demo

So I am still waiting for the install of Windbg. The timing is kind of terrible because everyone's all a flutter with Firefox 3 and I'm twiddling my thumbs a bit. As I can't test the source server from the user end yet, I must turn my attention to documentation and also to planning my final "knock your socks off" demo. Instead of 2 demos which would be ridiculous in the short amount of time that is left, I will do one super demo where lots of people (you dear readers?) will come and test out the source indexed builds. This should happen in the week before exams, more info to come soon. If anyone is looking for contrib points (is that like karma?) please come to the demo.

Tuesday, March 11, 2008

It's quiet...too quiet

Things have calmed down significantly since the big check-in a couple of weeks ago. Time flies when you're waiting for someone to install on the ref platform for you. This is the part of Open Source development that I'm not so good at: patience.

What have I done? Well, I filed a bug asking for the install of Windbg on the ref platform/tinderboxen so that we can start pumping out source indexed nightly builds. Once those are coming down the pipe I can download and test them to be sure that everything is working as expected. Based on the few tests from last semester, I anticipate some tweaking will be needed to where the files are stored and possibly to the cvs set up.

In the meantime it's time to think about what the next few releases will look like. Oh, and I have a demo on Thursday (/me prays for the install to happen before then so there's something worth demo-ing).

Here's what I'm thinking:

0.8 - all tests and tweaks for the user side of source server builds
0.9 - docs and tutorial about the source server and how to use it
1.0 - ? need to think more about that.

In a perfect world 1.0 is a pdbstr replacement but that is starting to seem like something out of my league. I really don't even know where one begins with hacking into a microsoft hex file to deposit information.

Perhaps there will be unforeseen bugs or other features that can be added to make the 1.0 of this project sharp and professional.

Okay, it's time to go pester the build guys some more for that install.

Saturday, March 1, 2008

Google search FTW

Here's reason 1,000,001 why I love the internet:

The VMware Fusion Windows XP vm is pretty important to daily life for me and today it decided to hang. Halfway through a "Restoring Virtual Machine State", the progress bar just stopped moving and I had to force quit Fusion. After many restarts, and other experimenting, I was starting to worry that I was in for a long Saturday of re-creating an XP vm instead of working on a job I'm doing.

Instead, a quick Google search for "restoring virtual machine state hangs fusion" turned up this little nugget of wisdom that I bet will come in handy again someday for me, and maybe for anyone else out there who also depends heavily on Fusion.

Go to ~/Documents/Virtual Machine, right click on the .vmwarevm that is causing the issue and choose "Show package contents", then delete the .vmss file.


This gets rid of the session state and when you restart the vm it will boot up as if it was not saved in mid session.

Lose the session, no biggie - lose the vm...well that would be much much worse, right?

Tuesday, February 26, 2008

A day that shall go down in infamy...

Today at 6pm I finished the necessary changes to my Source Server patch so that it is ready to be checked in. This will be my first check in to the Mozilla code and I'm pretty psyched that after about 6 months of working on this, this particular chapter is about to close.

The last few changes today were quite interesting and good learning for me. Basically, I had to take my | if self.srcsrv | stuff and put it into a separate method. Ted pointed out that this was because it's win32 specific code, so there's no need for it to be in the general Dumper class.

Hello object-oriented programming? This is exactly how I need to learn this: in an applied manner. I will remember this lesson better than if I had just read it in a book.

So now there's a declared SourceServerIndexing method in the Dumper, and then a detailed method of the same name in the Win32 Dumper - which is where all the action is. I wonder if there's a chapter somewhere down the line where I will try to take this SourceIndexing into the Linux and Darwin dumpers.
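Here's a minimal sketch of that split. The class names, the constructor and the `indexed` list are simplified stand-ins of my own for illustration; the real symbolstore.py classes do a lot more.

```python
class Dumper:
    """Platform-neutral base class (simplified stand-in, not the real one)."""

    def __init__(self, srcsrv=False):
        self.srcsrv = srcsrv
        self.indexed = []  # hypothetical record of indexed files, for illustration

    def SourceServerIndexing(self, debug_file):
        # Declared here so every dumper has the method; does nothing by default.
        pass

    def Process(self, debug_file):
        # The general code only asks for indexing; it never needs to know how.
        if self.srcsrv:
            self.SourceServerIndexing(debug_file)


class Win32Dumper(Dumper):
    def SourceServerIndexing(self, debug_file):
        # Win32-specific work lives in the subclass (here it is only recorded).
        self.indexed.append(debug_file)
```

Calling `Process` on a plain `Dumper` does nothing, while the same call on a `Win32Dumper` runs the Windows-only version.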

When I first pulled everything out into the method, I had to wrap my head around the way that Python calls methods. Basically, if I call it from within Dumper, I need to do self.MethodName so that it looks for the method inside the Dumper class. Without the self. in front, I got an error message saying that a global name "MethodName" was not recognized. Also, when I was running | make buildsymbols | I wasn't source indexing the pdbs. So I put in some print statements to see if all the right steps were happening. They were. Confusing.
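The self. lesson can be reproduced in a few lines. This is a toy class, not the real Dumper, and the exact wording of the error message depends on the Python version.

```python
class Dumper:
    def SourceServerIndexing(self):
        return "indexed"

    def call_with_self(self):
        # Looks the method up on the instance, so this works.
        return self.SourceServerIndexing()

    def call_without_self(self):
        # Python searches local and global names here, not the class,
        # so this raises NameError when it runs.
        return SourceServerIndexing()


dumper = Dumper()
print(dumper.call_with_self())
try:
    dumper.call_without_self()
except NameError as error:
    print("NameError:", error)
```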

Eventually I noticed that there's a |shutil.copyfile(file, full_path)| which is what puts the pdb files in the place that I want them to be for indexing and I was calling my method before that. Quick switch and the lights turned on, the music played and my pdb files were indexed.
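A rough sketch of why the order mattered, assuming the indexing step simply needs the pdb to already be at its final path. `index_pdb` and the two wrapper functions are hypothetical stand-ins; only `shutil.copyfile` comes from the real code.

```python
import os
import shutil
import tempfile


def index_pdb(full_path):
    # Hypothetical stand-in for the indexing step: it can only work on
    # a file that already exists at its destination.
    return os.path.exists(full_path)


def store_then_index(file, full_path):
    shutil.copyfile(file, full_path)  # put the pdb where it belongs first
    return index_pdb(full_path)       # then index the copy


def index_then_store(file, full_path):
    indexed = index_pdb(full_path)    # too early: nothing is there yet
    shutil.copyfile(file, full_path)
    return indexed


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "example.pdb")
    with open(source, "wb") as handle:
        handle.write(b"placeholder")
    print(index_then_store(source, os.path.join(workdir, "early.pdb")))  # False
    print(store_then_index(source, os.path.join(workdir, "late.pdb")))   # True
```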

This has been a great project for teaching me how to set up a work flow as efficiently as possible. Now I'm at the point where I can get right into working on the code and testing frequently with minimal effort. The closer I got towards getting it all working, the easier the work flow got. Right now I can test this patch with just MingW32, a command shell and my Crimson Editor. I can tell within the first few seconds of |make buildsymbols| running if it has worked and then a ctrl + C to interrupt it and get back to the tweaking of code.

It feels great to have put this up for approval. Can't wait to see what happens next.

Monday, February 25, 2008

Temporary Geek Home

Went to the Toronto Mozilla office today. Spent the afternoon reviewing add-ons and I also got to put some faces to names. Since last Friday when we got training from Alex Polvi to be editors for AMO, I've reviewed 14 extensions and I'm learning a lot about the process.

Ted said he'd be reviewing my patch tomorrow and we're hoping to land it in the next few days. As far as I can tell right now, the only issue should be that my error handling is currently just printed to stderr and doesn't actually change the srcsrv flag.

So fingers crossed - there might be source server on the debug builds by the end of the week. This means the rest of term can be devoted to pdbstr.exe. I have yet to communicate with Timeless who is the recommended contact to discuss things of this ilk.

Over reading week I should be able to do some more digging into those hex dumps as well as start testing the source server once it's on Tinderbox.

Friday, February 22, 2008

Keeping busy

Well as reading week approaches and I wait (again) for feedback on my patch, I am trying to keep busy in Mozilla activities.

One of my recent activities is helping out with Live Chat user support. It's amazing! A very small team of dedicated people are helping individuals in real time with their wide variety of issues. I've been learning tons about the issues that users face - right now firewalls and security updates seem to be a frequent problem. I'm also jumping headlong into another thread of the vast Mozilla community, this one mostly volunteers, and getting to know some folks. The EST hours for Live Chat are from 4pm - 9pm and from 10pm - 12am. I mostly have been able to pitch in on the later shift because that's about when I am tired of doing my homework :)

Getting involved with Live Chat was super easy - I read the documentation, created an account and was shadowing experienced helpers less than an hour later. One week later and I'm in on a phone conference to discuss what the priorities are for the next 3 months. Again, I say amazing.

I encourage all Mozilla or open source involved Seneca students to jump into Live Chat and try helping out people in real time. It's intense, and can be challenging (I think my success rate is only about 50%) but it's an eye-opening experience for how the other half lives. The half what don't have #seneca for all their questions.

Sunday, February 17, 2008

Better late than never

My goal was to have the symbolstore.py patch approved for committal by week's end. I didn't quite make it, but I am very close. Today I spent about 5 hours tweaking the patch as per the most recent comments.

What was done:

* Makefile now checks for the environment variable of PDBSTR_PATH to be set before assigning the flag to source index
* Several minor logic changes to the symbolstore.py like removing redundant file path fixing
* I finally understood what Ted was saying about using a temp filename before checking for VCSinfo. This small change made it possible to remove 3 lines of work-arounds that I had been doing :)
* The hard-coding of the path to the pdbstr.exe executable is gone and has been replaced by whatever is in the environment variable

A few glitches:

* Making changes to the Makefile that resulted in errors left me with a Makefile that couldn't update itself. Ted pointed me to | make -f client.mk configure |, which regenerated the Makefiles and cleared that up
* I had to look into how to get the environment variable in python - thanks to web search I found the os.environ.get() function
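For anyone else looking for it, the environment variable lookup can be sketched like this. The helper names are my own for illustration and the patch itself may be structured differently; `os.environ.get()` and PDBSTR_PATH are the only parts taken from the work above.

```python
import os


def get_pdbstr_path(environ=os.environ):
    # os.environ.get() returns None instead of raising KeyError when the
    # variable is missing; an empty value is treated as unset too.
    path = environ.get("PDBSTR_PATH")
    return path or None


def should_source_index(environ=os.environ):
    # Only attempt source indexing when we know where pdbstr.exe lives.
    return get_pdbstr_path(environ) is not None


print(should_source_index())
```

Passing the environment in as a parameter makes the check easy to try with a plain dictionary instead of the real environment.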

Other than that, it's all working and I'm very happy to be awaiting another review.

Saturday, February 16, 2008

Smart Install Instructions

Today as I was setting up a mysql GUI I saw this:
Screenshot of the user friendly install instructions

It made me very happy to see that part about Ejecting and ridding yourself of the installer because a lot of folks still might not know that. I've definitely met some people who drag Firefox to their dock from the installer instead of copying it to their Applications folder. Every time they open the application from the dock, the installer has to mount and someday down the line they might delete it and then not know why the application no longer launches from the dock. OS X just gives the ever so helpful "?" and the user is left to wonder what happened. That far down the line - it's not so apparent that they should have dragged to the Applications folder and ejected the installer.

Most Mac installers these days do make it quite clear that you should drag to your install folder and I would love to see them add this tip too - just to help folks finish what they start.

Upcoming Source Server Demos

This is mostly a reminder for me, but if anyone is interested in seeing how the Source Server is working/coming along, my demos are scheduled for:

  • Thursday March 13th at 1:30pm in ORI

  • Thursday April 3rd at 1:30pm in ORI



For the first demo, the symbolstore.py patch should have been committed, and there should be a debug build that can be downloaded and attached to VStudio and/or WinDBG, where the source code is then pulled via the Source Server.

The second demo will hopefully be a look at a reverse engineered pdbstr.exe - something that can read and write the source indexing data block in pdb files.

Thursday, February 14, 2008

Now on Windows

Using StraceNT I was able to get this output.

Here's a snippet:


IntellectualHeaven (R) System Call Tracer for NT, 2K, XP, 2K3.
Copyright (C) Pankaj Garg. All rights reserved.

Tracing command: ["pdbstr" -r -p:accessiblemarshal.pdb -i:am3.stream -s:srcsrv]
[T3600] TlsGetValue(1, 0, 2bfef8, 182020, ...) = 2c7778
[T3600] EnterCriticalSection(77c61b30, 2c7778, 2bfed0, 77c3a03b, ...) = 0
[T3600] LeaveCriticalSection(77c61b30, 2bfed0, 77c3a0fa, d, ...) = 0
[T3600] EnterCriticalSection(77c61b18, 2c7778, 2bfed0, 77c3a06c, ...) = 0
[T3600] LeaveCriticalSection(77c61b18, 2bfed0, 77c3a108, c, ...) = 0
[T3600] HeapFree(2c0000, 0, 2c7778, 0, ...) = 1
[T3600] TlsSetValue(1, 0, 0, 2bfef8, ...) = 1
[T2556] LeaveCriticalSection(2c1fdc, 6f1c0, 77c2d154, 4, ...) = 0
[T2556] LeaveCriticalSection(2c7718, 6f1d8, 77c3b967, 13, ...) = 0
[T2556] HeapFree(2c0000, 0, 2caa48, 1058d24, ...) = 1


The results are certainly a bit clearer looking than the Linux/Wine results. I am still clueless, however, as to the deeper meaning. I've been told I need to talk to timeless on IRC, that he is the one with major knowledge on reverse engineering.

This is turning into quite the rabbit hole.

Tuesday, February 12, 2008

Exploring what pdbstr.exe actually does

As I wait to find out what else I can do to get the symbolstore.py patch commit-worthy, I thought I would start to look into reading and writing to pdbs as described here.

I downloaded Paws.exe which is a hex editor program and also installed Borland's C++ compiler so I could try to dump the contents of a pdb with Borland's tdump utility.

This netted me a whole lot of hex code and so I stared at that for a while, consulting Jeremy Gordon's information, and didn't really figure much out. I can see the data block that pdbstr writes to the file, it's just not clear after comparing the dump information of a couple of pdbs how or where the data block's write position is determined. This is a hex dump.
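Instead of eyeballing two dumps side by side, a small script could report where the data block sits in each file. This is only a sketch: the b"SRCSRV" marker is my assumption about how the block starts, the sample bytes are made up, and a real pdb may split the block across pages, so a match here only says where the text shows up.

```python
def find_marker(data, marker):
    """Return every offset at which marker occurs in data."""
    offsets = []
    start = data.find(marker)
    while start != -1:
        offsets.append(start)
        start = data.find(marker, start + 1)
    return offsets


def hex_line(data, offset, width=16):
    """Format one row of a hex dump starting at offset."""
    chunk = data[offset:offset + width]
    hex_part = " ".join(f"{byte:02x}" for byte in chunk)
    text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    pad = width * 3
    return f"{offset:08x}  {hex_part:<{pad}} {text_part}"


# Made-up bytes standing in for a pdb; a real file would be read with
# open(path, "rb").read().
sample = b"\x00" * 20 + b"SRCSRV: ini" + b"\x00" * 5
for offset in find_marker(sample, b"SRCSRV"):
    print(hex_line(sample, offset))
```

Running this over a couple of indexed pdbs and comparing the reported offsets would at least show whether the write position moves from file to file.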

For kicks my friend John Ford and I looked at pdbstr.exe through strace and Wine on his Ubuntu box. The results are here.

There is a StraceNT for Windows so I will try to run pdbstr with that and hopefully end up with results that are a bit clearer since they won't include the Wine calls.

Anyone with any ability to read into either of these dumps - feel free to comment and share your insights.

Monday, February 11, 2008

The Gap.

This morning I tried to explain to my Deaf-Blind student the usefulness of Bookmarks. What he and my friend's mom who I was helping with a website the other day have in common is that they both save web pages to their desktops. For the mom, this results in an mhtml file on her desktop that, once opened, doesn't help her know what website she is visiting...and she mistakenly thinks she is on the web when she looks at it because the "file:///" URI signifier means nothing to her.

When my student saves the page, I believe he is selecting a different option which creates a folder on his desktop containing as much of the page as Firefox will grab. Again, when he goes back into this folder later many things happen:

  1. He doesn't know why the folder is on his desktop

  2. The contents of the folder are all files that are unfamiliar (.css, .js and .html)

  3. There is no signifier of what the original web page was



So today I thought - okay, let's learn about bookmarks.

I showed him how to drag and drop to the Bookmarks Toolbar and how to Bookmark using the pulldown menu. I'd like to think he "gets it" but I know that it will take many more lessons for this to get across. What did get a little recognition of concept was that he would have access to the most recent content if he used bookmarks instead of saving a local copy. That was appealing.

All this comes to me now as I am reading these studies done on bookmarking habits, and reading into the logic behind Places and how the changes to the Home button are going to be initiated in Firefox 3, and I feel concerned.

I love the idea of bookmarking with tags and never having to scroll down a list again. I love the awesome bar's quick access to recent pages and I love that with minimal typing in the location bar I can see my history and get to previously visited pages quickly. Here's the thing - I'm able to take in the whole screen at once, I see little details quickly and I know what I'm looking for.

The mom, the student and, I'm willing to bet, a lot of unstudied people out there are not doing this and are way behind on the idea of tags, let alone how to use them.

I would love to do studies of web usage and get a really huge pool of participants because from what I've read so far, the largest group was 320 people at a tech conference and in my opinion that's a lazy study that will only confirm what the researchers and pushers of web 2.0 want to hear.

Moving forward is amazing and fun for me but I'm loath to leave all the people I know and love behind to wander around lost and confused. My mom said the other day that she now understands less than 50% of what I'm talking about when I describe the projects I'm working on. 50%?! That sucks! My mom is actually a very astute person who even took some computer programming back in the 80's and she is very competent with power tools and techno-gadgetry. I want to be at least 80% compatible with my mom when talking about my projects. It would be great if it was possible for her to participate in the discussion instead of just listening politely.

Today I'm lamenting that only a select few are steering the discussion about the future of bookmarking and the student and the mom are left on the other side of a widening gap. Their bookmarks and habits are just as, if not more, important.

Friday, February 8, 2008

Eric Raymond peeved me this morning...

I was on my morning commute to school, reading my ethics textbook, happily processing information about intellectual property and copyright law (my favourite things!) when I read this:

"Anybody who has studied software engineering knows that programmers do not actually spend most of their time originating software. They spend most of their time on service updates and maintenance. Nobody thinks about the implications of this: that the software industry is actually a service industry operating under the delusion that it is a manufacturing industry. Software producers are operating under a manufacturing and cost model, under which the way you make money is building a product and getting it out the door. Because they have this model of themselves as a manufacturing industry, all the bright people go to production and the dumb people go to the support desk. That's why when you call a vendor support line you have to fight your way through three layers of idiots to get down to anyone who knows anything. " - Eric Raymond in an interview.


Well, this got me a little angry.

Now, I won't say that I have never had to do any "idiot filtering" when calling a large company's support services. I blame this on the McJob model of customer service where companies hire people at low wages and then "train" them by giving them scripts to diagnose the simplest problems. Perhaps the company thinks it is also "idiot filtering" for the person who hasn't plugged in their modem and we are all losing with this approach.

This is not exactly what I feel Eric Raymond is saying though. What I read into his statement is that there is a natural filtering of "bright" people and "dumb" people - like natural selection or something and that is just plain wrong.

I'm in the BSD program at Seneca College largely because I would like to be able to better support the hundreds of people I know who need technical support on a semi-regular basis and at the same time be able to help the hundreds of thousands of people I don't know who also need support. I'm interested in technology and have always gravitated towards it while at the same time noticing that others do not. This puts me in a place where I could be a great "bridge" between techies and non-techies.

Why can't programmers and other technically-stimulated people understand that they are a minority? Most people need help to operate systems, use gadgets, understand software and troubleshoot their life-enhancing tech gear. To be someone who supports them does not mean that you are too "dumb" to be on the creative end (which he already acknowledges is a small portion of the industry).

I'm sick of people treating documentation like a hassle, like cops to paperwork, as though it doesn't matter...this leads to poor documentation done for documentation's sake and not for the end user. In my opinion, you have to be "bright" to create tech support documents that help a user continue to patronize your software and not just quit. From what I've learned as a customer service representative in many forms, people will share their bad experiences with approximately 10 times more people than they will share their good ones.

I take this to mean two things for how to go about life:

1. Share your good experiences often to try and make up for how often people share bad ones
2. Be someone's good experience so they have something positive to share

It's a challenge to me and to other programmers to push ourselves to keep things clear and easy to grasp for the lowest common denominator as much as possible. This doesn't mean the software is dumb or low level, nor does it mean that the user is. It means we will agree to speak a certain language together so the most people can benefit. People will clamour for software that does what they need with the least amount of work.

The article is good, I recommend reading it because Eric Raymond obviously knows his stuff and has some great insights to share. I'm inspired by his vision of alternative business models driven by open source. The quote just sparked something that was obviously bugging me already.

Back to reading up on ethics...