Crashing Into Open Source (without a paddle): Looking for suggestions on dealing with lots of data

Thursday, August 7, 2008

Looking for suggestions on dealing with lots of data

So I'm still plugging away at figuring out how to interpret the massive amounts of error log output that our unittest builds create.

As the test suites are being run, there is a steady stream of stdio being generated and logged. From this stdio, I gather up all the lines of output that contain "TEST-UNEXPECTED-FAIL" (thanks to Ted for unifying the output!).

Now I have files that look something like this:

linux-2 | 67 | 07/25/2008 | 06:40 | *** 61506 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_tree.xul | Error thrown during test: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.sendMouseScrollEvent]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: http://localhost:8888/tests/SimpleTest/EventUtils.js :: synthesizeMouseScroll :: line 273"  data: no] - got 0, expected 1
linux-2 | 67 | 07/25/2008 | 06:40 | *** 62352 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_tree_hier.xul | Error thrown during test: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.sendMouseScrollEvent]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: http://localhost:8888/tests/SimpleTest/EventUtils.js :: synthesizeMouseScroll :: line 273"  data: no] - got 0, expected 1
linux-2 | 67 | 07/25/2008 | 06:40 | *** 63084 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_tree_hier_cell.xul | Error thrown during test: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.sendMouseScrollEvent]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: http://localhost:8888/tests/SimpleTest/EventUtils.js :: synthesizeMouseScroll :: line 273"  data: no] - got 0, expected 1</pre>

Where the info is "|" delimited and goes like this:

<pre>PLATFORM | BUILD_NO | DATE | TIME | TEST-RESULT | TEST-NAME | TEST-OUTPUT

Approximately 7000 lines of error output for less than a month of constant testing.

I want to be able to know the following (at least):

* How many times has a particular test failed?
* On which platforms?
* How many times this week vs. last week?

That would be a start anyway.

I would love to be able to create a graph or something visual that shows peaks of test failures. Unfortunately I don't really know much about that area.

So I am asking for help/suggestions. If you had about 490,000 lines of errors (representing 3 platforms) in the above format - what would you do?

I can pretty easily add to the python script that greps for error output so that it creates sql insert statements instead of a text file and I would welcome tips that include creating/automating a database to hold all the error info. I've been thinking of setting something up with RoR to let people create their own views of the data depending on what they are looking for.

Looking forward to your advice.

3 comments:

cozby said...: Don't bother with RoR, if you're already using Python give Django a shot.

Not sure if you could use this http://code.google.com/apis/chart/#audience but it could be of some use maybe?; August 7, 2008 at 2:17 PM
Rob Helmer said...: Sounds like you already have the right idea imho:

1) parse the data
2) insert into database
3) ???
4) profit

Seriously though, #3 is just something like "generate canned reports", then after that is working, maybe allow users to build their own queries with a web interface.

I'd focus on the canned reports first, so you can figure out what data is useful, and what is not, and optimize it. If you're only dealing with tens of thousands of rows it shouldn't be too bad at all to generate arbitrary reports though, you could probably get away with putting everything into one big table, to start.

Have you poked anyone on the webtools team about this? They've done some awesome, similar stuff with Breakpad reporting for example..; August 7, 2008 at 11:53 PM
dejan said...: Shouldn't be too difficult to parse, use awk in a bash script, a simple command like awk -F'|' '/linux-2/ {num++;print $5} END {print num}' log.txt
will print all the error messages for linux-2 platform. Something along these lines.; August 11, 2008 at 7:33 PM