You are viewing dmalcolm

 
 
04 June 2009 @ 08:48 pm
squeal 0.4  

I've put together a tarball release of my SQL/command-line collision, formerly "show", now "squeal"

Tarball here:   https://fedorahosted.org/released/squeal/squeal-0.4.tar.gz
SRPM for Fedora/RHEL 5 here: http://people.redhat.com/dmalcolm/python/squeal-0.4-3.src.rpm

Here's what happened other than the name change:

New Data Sources

Text files and streams

Squeal can now deal with arbitrary text files, and on stdin (using "-").

It uses the first line of the input to determine the number of columns, giving you an input source of string columns named "col0", "col1", ..., up to N-1.

By default it splits columns on whitespace, but you can use -F a.la. "awk" to specify a field separator, e.g.:

   cat some-file.txt | squeal -F: --format=html -

to generate an HTML table from a colon-separated file.

You can also specify a regular expression containing groups, which will become the columns, e.g.:

   squeal col0, count from -r "(\<[^\>]+\>) (.*)" irc-log.txt group by col0 order by count desc

to figure out who's the chattiest in an IRC log.

Of course, if you need more complicated parsing, it's probably worth writing a dedicated data-source backend.

Archives

You can now issue SQL-like queries upon the contents of various kinds of archive files: .zip, .tar, .tar.bz2, .tar.gz2, and .rpm. For example, here's a query on the payload of an RPM file:

    $ squeal "total(size)" from ~/rpmbuild/RPMS/noarch/squeal-0.4-1.fc10.noarch.rpm
    total(size)|
    -----------+
       171407.0|

tcpdump/Wireshark files

I wrote a proof-of-concept backend for querying tcpdump files, e.g.:

$ squeal "count(*)", "total(length)", src_mac, dst_mac from test.pcap group by src_mac, dst_mac

to analyse the quantity of network traffic between pairs of machines.

Internally it's merely invoking tcpdump turn the file back into textual form, then carve up with regexps, so it's not at all robust yet.

/var/log/maillog* (sendmail and spamd)

I wrote an experimental parser for /var/log/maillog

Query Syntax Improvements

Squeal can split its own arguments, to minimize the amount of escaping that you have to do within the shell. You can now pass in mixed queries like:

      $ squeal "count(*), total(size) host from" /var/log/httpd/*error_log* \
      "order by total(size) desc"

where some of the arguments are split by the shell, and some by squeal's parser.

You can now type "count" rather than "count(*)", provided "count" isn't a column name (it's a pain to type, and to have to escape this from the shell). So the above query can be simplified further to:

      $ squeal "count, total(size) host from" /var/log/httpd/*error_log* \
      "order by total(size) desc"

Output

I added two new formatting options:

  • "text" : outputs as lines of space-separated fields
  • "table" : outputs an ascii-art table

Bugfixes

  • The httpd log backend now supports parsing logs containing usernames
  • The syslog backend can now deal with single-digit dates within a month(!)
  • Deal with absolute and relative paths when path-matching input filenames; only check for Augeas below /etc
  • Various internal cleanups
  • Started a unit test suite.
  • Support Python < 2.5 by using earlier versions of sqlite; runs on RHEL 5
  • Work around an issue seen sometimes with the RPM backend
  • Detect exceptions in execution of the internal sqlite queries, and log them
Enjoy!