Crashing Into Open Source (without a paddle): databases

Thursday, August 7, 2008

Looking for suggestions on dealing with lots of data

So I'm still plugging away at figuring out how to interpret the massive amounts of error log output that our unittest builds create.

As the test suites are being run, there is a steady stream of stdio being generated and logged. From this stdio, I gather up all the lines of output that contain "TEST-UNEXPECTED-FAIL" (thanks to Ted for unifying the output!).

Now I have files that look something like this:

linux-2 | 67 | 07/25/2008 | 06:40 | *** 61506 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_tree.xul | Error thrown during test: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.sendMouseScrollEvent]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: http://localhost:8888/tests/SimpleTest/EventUtils.js :: synthesizeMouseScroll :: line 273"  data: no] - got 0, expected 1
linux-2 | 67 | 07/25/2008 | 06:40 | *** 62352 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_tree_hier.xul | Error thrown during test: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.sendMouseScrollEvent]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: http://localhost:8888/tests/SimpleTest/EventUtils.js :: synthesizeMouseScroll :: line 273"  data: no] - got 0, expected 1
linux-2 | 67 | 07/25/2008 | 06:40 | *** 63084 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_tree_hier_cell.xul | Error thrown during test: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.sendMouseScrollEvent]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: http://localhost:8888/tests/SimpleTest/EventUtils.js :: synthesizeMouseScroll :: line 273"  data: no] - got 0, expected 1</pre>

Where the info is "|" delimited and goes like this:

<pre>PLATFORM | BUILD_NO | DATE | TIME | TEST-RESULT | TEST-NAME | TEST-OUTPUT

Approximately 7000 lines of error output for less than a month of constant testing.

I want to be able to know the following (at least):

* How many times has a particular test failed?
* On which platforms?
* How many times this week vs. last week?

That would be a start anyway.

I would love to be able to create a graph or something visual that shows peaks of test failures. Unfortunately I don't really know much about that area.

So I am asking for help/suggestions. If you had about 490,000 lines of errors (representing 3 platforms) in the above format - what would you do?

I can pretty easily add to the python script that greps for error output so that it creates sql insert statements instead of a text file and I would welcome tips that include creating/automating a database to hold all the error info. I've been thinking of setting something up with RoR to let people create their own views of the data depending on what they are looking for.

Looking forward to your advice.

Wednesday, June 11, 2008

What did I do today?

Interning at Mozilla has so far provided me with many opportunities to learn (and re-learn) some of the finer points of the command line interface.

Today I have spent most of my time working on some scripts (both shell and python) that will assist in parsing error logs from the unittest builbot masters.

Here's how I would like it to work:

The script will live in the master directory and when run followed by the 1...n slave directory names it will create a folder with files containing all lines which match particular error messages that are in the stdio log files. Each file is specific to the pattern searched (eg: reftest, browser, check)

A python script then reads through each of the files and breaks the lines of error statements up into sqlite insert statements and throws them into a sqlite db

The DB will have a front end which will allow for easy searching and sorting of the data to see if there is a particular test that fails more frequently than others, on which platforms, etc...

This is an interesting side project for me as we wait for release to start bringing the unittest boxes up to Buildbot 0.7.7. I'm enjoying writing python (esp. compared to shell scripting).

Finally - many thanks to Chris Tyler for today's little success. I needed to put the time and date of the log file into the pattern matches so that the database can be more useful and I was having difficulty figuring it out, Chris is the expert at the one line solution which was ( >> means continued on next line):


for file in *-log*    # traverse all files in $DIRNAME
  do 
    grep -HnA 2 "$string1" $file >>
  | sed "s~^~|$(date -r $file '+$%D|%T|')~" >>
    > "$OUTPUT_FILE.reftest"
  done