Thursday, June 5, 2008

Parsing for errors in Buildbot log files

The other day, two of our Moz2 unittest buildbots (one Linux, one Windows) were failing tests intermittently. We have plenty of logs, but no way to parse them for patterns that might explain what is going on. As a first step, I was asked to pull the error messages out of those logs for bug 435064.

It took a little while to find the right approach, but in the end a handful of grep commands did the trick. Since this will probably come up again, I packaged them into a small shell script that does the dirty work for me next time:

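#!/bin/sh
# simple script for gathering up errors in Buildbot log files

if [ -z "$1" ]
then
    echo "Usage: $0 <directory>"
    exit 1
fi

DIRNAME="$1"
# results go to the directory the script is run from (file name is illustrative)
OUTPUT_FILE="$PWD/errors.txt"
: > "$OUTPUT_FILE"    # start each run with an empty results file

# string1 and string4 were not defined in the listing; these values are
# assumptions (UNEXPECTED matches the message echoed below)
string1="UNEXPECTED"
string2="ERROR FAIL"
string3="command timed out"
string4="FAIL"

cd "$DIRNAME" || exit 1

echo "Looking for UNEXPECTED, ERROR and command time outs..."
for file in *-log*    # traverse all log files in $DIRNAME
do
    grep -Hn "$string1" "$file" >> "$OUTPUT_FILE"
    grep -Hn "$string2" "$file" >> "$OUTPUT_FILE"
    grep -Hn "$string3" "$file" >> "$OUTPUT_FILE"
done

# these two searches include 5 lines of context
echo "Looking for Check and Browser Fails...."
grep -HnC 5 "$string4" *-log-check* >> "$OUTPUT_FILE"
grep -HnC 5 "$string4" *-log-browser* >> "$OUTPUT_FILE"

echo "Sorting......"
# drop only the "--" separator lines that grep -C prints between context groups
sort -n "$OUTPUT_FILE" | sed '/^--$/d' > "$OUTPUT_FILE.sorted"

echo "Search complete."
exit 0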
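Since the underlying goal was to spot patterns across runs, one possible follow-up is to collapse the three per-file greps into a single grep call (grep accepts repeated -e options) and tally the matching lines per log file. This is a sketch under assumptions, not part of the script above: the function name tally_errors is hypothetical, and it searches only the three fixed strings used in the loop. Unlike the loop, a line that matches more than one pattern is counted once.

```shell
#!/bin/sh
# Sketch: count matching lines per Buildbot log file.
# tally_errors is a hypothetical name; pass the log directory as $1.

tally_errors() {
    dir="$1"
    # Repeated -e patterns: a line matching any of them is printed once.
    # -H adds the file name, -n the line number, giving file:line:text.
    # cut keeps the file name (text before the first ':'), uniq -c counts
    # lines per file, and the final sort lists the noisiest logs first.
    grep -Hn -e "UNEXPECTED" -e "ERROR FAIL" -e "command timed out" \
        "$dir"/*-log* | cut -d: -f1 | sort | uniq -c | sort -rn
}
```

For example, `tally_errors /path/to/logs` prints one "count filename" line per log that had at least one hit. The cut step assumes log file names contain no colons.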

Thanks to dchen and humph for helping with the finer points of writing shell scripts.
