24 June 2009 @ 09:57 am
The word "codebase" has been bugging me lately.  I've noticed myself using it, and it's starting to grate on me.

It appears to mean a collection of computer code, from an implementation perspective.  Users of the word might refer to the "foo-1.0 codebase" as opposed to the "foo-2.0 codebase, where we totally rewrote everything".

Assuming that this is what the word means, it's a useful one.  However, where did this word come from, and when?   Does it have a more precise meaning, or has it spread across the internet with the rough (but useful) meaning above?

The only definitions I can find are within Wikipedia or Wiktionary (neither of which I'm a fan of).  I don't see it in the Jargon File either (but ditto, frankly).  It also seems to have a (different) specific meaning for Java applets, and is the name of a proprietary database product.

Does anyone have an etymology for this word?
 
 
04 June 2009 @ 08:48 pm

I've put together a tarball release of my SQL/command-line collision, formerly "show", now "squeal".

Tarball here:   https://fedorahosted.org/released/squeal/squeal-0.4.tar.gz
SRPM for Fedora/RHEL 5 here: http://people.redhat.com/dmalcolm/python/squeal-0.4-3.src.rpm

Here's what happened other than the name change:

New Data Sources

Text files and streams

Squeal can now deal with arbitrary text files, and with stdin (using "-").

It uses the first line of the input to determine the number of columns, giving you an input source of string columns named "col0", "col1", ..., up to N-1.

By default it splits columns on whitespace, but you can use -F, à la "awk", to specify a field separator, e.g.:

   cat some-file.txt | squeal -F: --format=html -

to generate an HTML table from a colon-separated file.
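
A minimal sketch of that column-splitting rule in Python, for illustration only. This is not squeal's actual code: the function name rows_from_text is made up, and padding or truncating later rows to the first line's width is an assumption about how ragged input might be handled.

```python
def rows_from_text(lines, field_sep=None):
    """Split lines into string columns; the first line fixes the column count."""
    column_names = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        # With no separator given, split on runs of whitespace (awk's default)
        fields = line.split(field_sep) if field_sep else line.split()
        if column_names is None:
            # Name the columns col0 .. col(N-1) from the first line's width
            column_names = ["col%d" % i for i in range(len(fields))]
        # Assumption: pad short rows and truncate long ones to N columns
        n = len(column_names)
        fields = (fields + [""] * n)[:n]
        rows.append(tuple(fields))
    return column_names or [], rows


sample = ["root:x:0:0", "daemon:x:1:1"]
names, rows = rows_from_text(sample, field_sep=":")
print(names)
print(rows)
```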

You can also specify a regular expression containing groups, which will become the columns, e.g.:

   squeal col0, count from -r "(\<[^\>]+\>) (.*)" irc-log.txt group by col0 order by count desc

to figure out who's the chattiest in an IRC log.
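
The same idea can be sketched with Python's re and sqlite3 modules: each regex group becomes a column, and the grouping is left to SQL. This is a rough illustration rather than squeal's real implementation; the table name "lines" and the function name are invented, the pattern is written without the shell escaping, and skipping non-matching lines is an assumption.

```python
import re
import sqlite3


def query_by_regex(lines, pattern, sql):
    """Load regex groups into an in-memory table as col0..colN-1, then run sql."""
    regex = re.compile(pattern)
    cols = ["col%d" % i for i in range(regex.groups)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lines (%s)" % ", ".join(cols))
    placeholders = ", ".join("?" * regex.groups)
    for line in lines:
        m = regex.match(line)
        if m:  # assumption: lines that don't match are silently skipped
            conn.execute("INSERT INTO lines VALUES (%s)" % placeholders, m.groups())
    return conn.execute(sql).fetchall()


log = ["<alice> hi", "<bob> hello", "<alice> how are things?", "* bob waves"]
result = query_by_regex(
    log,
    r"(<[^>]+>) (.*)",
    "SELECT col0, count(*) AS n FROM lines GROUP BY col0 ORDER BY n DESC, col0",
)
print(result)
```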

Of course, if you need more complicated parsing, it's probably worth writing a dedicated data-source backend.

Archives

You can now issue SQL-like queries upon the contents of various kinds of archive files: .zip, .tar, .tar.bz2, .tar.gz, and .rpm. For example, here's a query on the payload of an RPM file:

    $ squeal "total(size)" from ~/rpmbuild/RPMS/noarch/squeal-0.4-1.fc10.noarch.rpm
    total(size)|
    -----------+
       171407.0|
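
For a feel of what an archive backend involves, here is a small sketch using Python's standard tarfile module, with a tarball built in memory so that it is self-contained. It is an illustration under assumptions, not squeal's code: the table name "members" and the helper name are made up, and the RPM case (which needs RPM-specific tooling) is not covered.

```python
import io
import sqlite3
import tarfile


def load_tar_members(conn, fileobj):
    """Create a table with one row (name, size) per member of a tar archive."""
    conn.execute("CREATE TABLE members (name TEXT, size INTEGER)")
    with tarfile.open(fileobj=fileobj) as tar:
        conn.executemany(
            "INSERT INTO members VALUES (?, ?)",
            ((m.name, m.size) for m in tar.getmembers()),
        )


# Build a tiny tarball in memory to query
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, payload in [("a.txt", b"hello"), ("b.txt", b"squeal!")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

conn = sqlite3.connect(":memory:")
load_tar_members(conn, buf)
# sqlite's total() always returns a float, hence the ".0" in output like the above
total = conn.execute("SELECT total(size) FROM members").fetchone()[0]
print(total)
```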

tcpdump/Wireshark files

I wrote a proof-of-concept backend for querying tcpdump files, e.g.:

$ squeal "count(*)", "total(length)", src_mac, dst_mac from test.pcap group by src_mac, dst_mac

to analyse the quantity of network traffic between pairs of machines.

Internally it merely invokes tcpdump to turn the file back into textual form, then carves it up with regexps, so it's not at all robust yet.

/var/log/maillog* (sendmail and spamd)

I wrote an experimental parser for /var/log/maillog.

Query Syntax Improvements

Squeal can split its own arguments, to minimize the amount of escaping that you have to do within the shell. You can now pass in mixed queries like:

      $ squeal "count(*), total(size) host from" /var/log/httpd/*error_log* \
      "order by total(size) desc"

where some of the arguments are split by the shell, and some by squeal's parser.
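
One way such mixed splitting could work, sketched in Python with the standard shlex module. This is a guess at the approach, not a description of squeal's parser: the function name split_args and the rule "anything that names an existing file, or contains no whitespace, is left alone" are assumptions made for the example.

```python
import os
import shlex


def split_args(argv, is_path=os.path.exists):
    """Flatten argv into tokens, re-splitting only the quoted query fragments."""
    tokens = []
    for arg in argv:
        if is_path(arg) or not any(ch.isspace() for ch in arg):
            # Already a single token (e.g. a filename expanded by the shell)
            tokens.append(arg)
        else:
            # A quoted fragment of the query: split it on whitespace ourselves
            tokens.extend(shlex.split(arg))
    return tokens


argv = [
    "count(*), total(size) host from",
    "/var/log/httpd/error_log",
    "order by total(size) desc",
]
tokens = split_args(argv, is_path=lambda p: False)
print(tokens)
```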

You can now type "count" rather than "count(*)", provided "count" isn't a column name (it's a pain to type, and to have to escape this from the shell). So the above query can be simplified further to:

      $ squeal "count, total(size) host from" /var/log/httpd/*error_log* \
      "order by total(size) desc"

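The "count" shorthand can be sketched as a simple rewrite step. Again, this is an illustration rather than squeal's code; expand_count is a hypothetical helper, and using a regular expression for the rewrite is an assumption.

```python
import re


def expand_count(expr, column_names):
    """Rewrite a bare "count" as "count(*)" unless a column has that name."""
    if "count" in column_names:
        return expr
    # Match "count" as a whole word that is not already followed by "("
    return re.sub(r"\bcount\b(?!\s*\()", "count(*)", expr)


print(expand_count("count, total(size)", ["host", "size"]))
print(expand_count("order by count desc", ["host", "size"]))
```
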
Output

I added two new formatting options:

  • "text" : outputs as lines of space-separated fields
  • "table" : outputs an ascii-art table

Bugfixes

  • The httpd log backend now supports parsing logs containing usernames
  • The syslog backend can now deal with single-digit dates within a month(!)
  • Deal with absolute and relative paths when path-matching input filenames; only check for Augeas below /etc
  • Various internal cleanups
  • Started a unit test suite.
  • Support Python < 2.5 by using earlier versions of sqlite; runs on RHEL 5
  • Work around an issue seen sometimes with the RPM backend
  • Detect exceptions in execution of the internal sqlite queries, and log them
Enjoy!
 
 
24 May 2009 @ 10:59 pm
My command-line/SQL hybrid needed a new name, since "show" was regarded as too generic.

Thanks to everyone who suggested a name, and for the feedback. "squeal" was my favourite, so "show" is now named squeal.

I'm working on a release; it's gained a few new features since I last blogged.

(Thanks to the Fedora Infrastructure team for doing the rename)
 
 
I've learned most of these the hard way:
  • The monitor is in power-saving mode
  • Your video card has two outputs, and the monitor is plugged into the wrong one.
  • The screensaver kicked in while you were talking to your boss, but your program has grabbed input focus.
  • You've forgotten to flip the display buffers. You're drawing everything OK, but not displaying it.
  • There's a bug in your memory management. You're rendering to a different area of video RAM than the one the monitor is reading from.
  • Your scene is too complex; either the CPU or GPU crashed before the first frame finished rendering.
  • The "near" clip plane is further away than the "far" clip plane
  • You're rejecting the wrong side of each line as you walk your BSP tree; your universe is hiding just beyond the corner of your eye.
  • You forgot to set up the bounding-boxes for your objects, and all of them are being rejected as too small
  • You have a bug in your matrix library. The entire universe has collapsed to (0,0,0).
  • You have a bug in your quaternion library. All of your rotated objects are collapsing to points.
  • You have a bug in your back-face culling. Every face is being culled.
  • Your camera is too narrow angle; the universe has collapsed to a single point directly ahead of you
  • Your camera's so wide-angle... (insert mother joke)
  • You've confused the origins of screen space and of the frame buffer, and all of the scene is being rendered off the side of the framebuffer (and culled per-fragment)
  • You're using index-colour, and you forgot to set up a palette: all colours are black.
  • Your scale is wildly wrong: the scene is vastly larger than you expect. You are viewing a single texel on a single triangle in the scene.
  • Your scale is wildly wrong: the scene is vastly smaller than you expect. The entire world has collapsed to a tiny point in the centre of the screen.
  • The camera is outside the scene and pointing the wrong way. Did you try looking behind you? Above you? etc
  • You forgot to add any lights to the scene. Darkness remains over the surface of the deep.
  • You forgot to set the textures. All of the scene is being rendered with a blank texture.
  • You forgot to set any UV coordinates. The entire scene is being rendered with the colour at (0,0) in their textures.
  • You're doing integer multiplication of fractions and everything is coming out as zero.
  • You've forgotten to offset the objects in the scene in world space and relative to the camera. You are viewing all of the visible objects in the scene from inside, and back-face culling ensures that nothing is visible. (Insert bad pun about "Lost In Translation")
  • You've forgotten to pop matrices from the transformation stack, and you overflowed the stack after a few frames. (You should check error codes at least once per frame)
  • Your collision algorithms aren't good enough. The object representing the camera has fallen through the floor and is hurtling at great speed downwards into nothingness whilst the world disappears high above.
  • Your geometry shader has a bug, and all geometry in the scene is appearing behind the camera
  • Your fragment shader failed to compile, and everything is coming out as black.
  • You forgot to clear the z-buffer, and all of your fragments are being rejected.
  • Your scene is foggier than a lazy TV remake of Sherlock Holmes.
  • You forgot to set up alpha values, and the entire world is fully-transparent.
  • You're standing in front of a black wall.
  • The game logic has decided that we're fading out the screen, in preparation for the next level.
  • You set up all of your data correctly, but a stray pointer is trashing one/all of the above.
  • The in-game menus have appeared, overlaid in front of your scene. Unfortunately, due to poor state management, they're far too big, and a single corner of one letter "A" is obscuring the entire screen with black.
  • Your radiosity renderer doesn't have enough photons.
  • Your volume shader is too opaque.
  • Your bidirectional reflectance distribution functions aren't reflective enough
  • Time is running at a different rate than it should be, and your simulation code has either destroyed all of the objects in the scene, or none have been created yet.
  • There's a FIXME in your code that really needs fixing...
 
 
01 April 2009 @ 05:23 pm


I'm hoping to get my SQL/command-line mashup into Fedora, but, alas, "show" is too generic to go into /usr/bin.

Some names have been suggested:
- "squeal", a mispronunciation of "sql" (thanks Nalin and Jeremy).
- shelect

Anyone got any other ideas?

My current favourite name is squeal.

(The current project location is here and the package review request is here)
 
 
30 March 2009 @ 05:17 pm
I wholeheartedly agree with Seth's comment here.
 
 
24 March 2009 @ 10:04 pm
I've done a bit more hacking on my command-line/SQL mashup, currently called "show".

It can now handle /var/log/messages, /var/log/secure (and the rotated logs), so you can issue a command like this:
  $ show /var/log/secure* where message like \"%authentication failure%\"

and browse the results

For example, here's a query with aggregation:
$ show "count(*)", source from /var/log/messages group by source order by "count(*)" desc limit 5
count(*)|source        |
--------+--------------+
1635    |kernel        |
1398    |NetworkManager|
98      |ntpd          |
70      |avahi-daemon  |
63      |dhclient      |
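
To give an idea of the parsing behind a query like this, here is a small Python sketch that carves syslog-style lines up with a regular expression and counts them by source. It is not the backend's actual code: the regex is an assumption about the classic "Mon DD HH:MM:SS host source[pid]: message" layout, and the sample lines are invented.

```python
import re
from collections import Counter

SYSLOG_RE = re.compile(
    r"(?P<time>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "  # day may be space-padded
    r"(?P<host>\S+) "
    r"(?P<source>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)"
)


def top_sources(lines, limit=5):
    """Count log lines per source, most frequent first."""
    counts = Counter()
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m:  # assumption: unparseable lines are ignored
            counts[m.group("source")] += 1
    return counts.most_common(limit)


sample = [
    "Mar 24 22:04:01 brick kernel: usb 1-1: new high speed USB device",
    "Mar  4 09:15:22 brick ntpd[2301]: synchronized to 10.0.0.1",
    "Mar 24 22:04:05 brick kernel: eth0: link up",
]
print(top_sources(sample))
```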


Going beyond log files, I used the rather wonderful Augeas library to get parsers for many of the files in /etc, and wrote a backend to leverage this, so you can write things like:
  $ show /etc/passwd where shell !=\'/sbin/nologin\'

and
  $ show /etc/yum.repos.d/*.repo where gpgcheck != \'"1"\'

(it's a little dumb about string vs numeric types, and shell escaping requires lots of quotes here)

I extended the ncurses table-browsing UI so that you can scroll horizontally as well as vertically, which helps when the columns are wide.

The Fedora infrastructure team set up a hosted project for me, so you can see the source here:
https://fedorahosted.org/show/browser (thanks!)

An up-to-date SRPM can be grabbed from here:
http://people.redhat.com/dmalcolm/show-0.3-1.fc10.src.rpm

and you can grab the source via git here:
$ git clone git://git.fedorahosted.org/show.git

Thanks to everyone for the great feedback on my previous post.

I suspect some kind of integration with Func for running queries over groups of machines would be a good next step for this tool (oh, and fixing up the Trac instance)

Is /usr/bin/show too generic?
 
 
22 March 2009 @ 07:31 pm
I was analyzing some Apache log files the other day, and found myself wanting a SQL interface to the logs.

Now, it's possible to send the logs directly into a database, but that wasn't how the machine was configured.

This got me thinking. We have many different log formats, and many different sources of data. All of our tools seem to have different interfaces.

For example, why should I write regular expressions and shell pipelines to get at my logs?
Why do I have to learn a custom syntax ("rpm -qa --queryformat='various things'") for looking at the software I have installed? Why does e.g. the audit subsystem have its own query format?

Why can't I just use SQL, and write SELECT statements to drill down into all of this data?

So I started writing a SELECT statement for the command line.

I didn't want to use SELECT as caps are a recipe for RSI, and annoyingly, "select" is a bash builtin.

So I've picked "show" as the command; it doesn't seem to be taken by anything on my system. (it's a SQL command, but hopefully that's not going to be too confusing)

The idea is that it looks at the FROM part of the query, and looks up a data source based on this. For example, if you write

$ show host, "count(*)", "total(size)" from /var/log/httpd/*access_log* group by host;

it figures out that you're looking at apache logs, looks up an appropriate backend, and makes a table "on the fly", so that it can run the query and give you the results:

     host|count(*)|total(size)|
---------+--------+-----------+
127.0.0.1|      10|    27679.0|


(You have to either use quotes or escaping to deal with parentheses and * characters from the shell)
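
The overall flow (match the filename to a backend, build a table on the fly, run the query) can be sketched in a few lines of Python with sqlite3. This is a simplified illustration under assumptions, not the tool's real code: the BACKENDS registry, the table name "t", the function names and the log-line regex are all invented for the example.

```python
import fnmatch
import os
import re
import sqlite3

# Assumed layout: host ident user [time] "request" status size
ACCESS_RE = re.compile(r'(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" (\d{3}) (\d+|-)')


def parse_access_log(lines):
    """Yield (host, request, status, size) tuples from access-log lines."""
    for line in lines:
        m = ACCESS_RE.match(line)
        if m:
            host, request, status, size = m.groups()
            yield host, request, int(status), 0 if size == "-" else int(size)


# Hypothetical registry: filename pattern -> (column names, parser)
BACKENDS = [
    ("*access_log*", ("host", "request", "status", "size"), parse_access_log),
]


def find_backend(path):
    """Pick a backend by matching the file's basename against each pattern."""
    for pattern, columns, parser in BACKENDS:
        if fnmatch.fnmatch(os.path.basename(path), pattern):
            return columns, parser
    raise LookupError("no backend for %s" % path)


def run_query(select_clause, path, lines, tail=""):
    """Build an in-memory table for the data source, then run the SELECT."""
    columns, parser = find_backend(path)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (%s)" % ", ".join(columns))
    placeholders = ", ".join("?" * len(columns))
    conn.executemany("INSERT INTO t VALUES (%s)" % placeholders, parser(lines))
    return conn.execute("SELECT %s FROM t %s" % (select_clause, tail)).fetchall()


sample = [
    '127.0.0.1 - - [22/Mar/2009:19:31:00 +0000] "GET / HTTP/1.1" 200 2326',
    '127.0.0.1 - - [22/Mar/2009:19:31:01 +0000] "GET /favicon.ico HTTP/1.1" 404 288',
]
result = run_query("host, count(*), total(size)",
                   "/var/log/httpd/access_log", sample, "GROUP BY host")
print(result)
```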

You can use filters using "WHERE":

$ show distinct request from /var/log/httpd/*access_log* where status = 404
request                  |
-------------------------+
GET /favicon.ico HTTP/1.1|


If you specify more than one filename it adds a "filename" column:

# show filename, "count(*)", "total(size)" from /var/log/httpd/*access_log* group by filename order by "total(size)" desc
                       filename|count(*)|total(size)|
-------------------------------+--------+-----------+
/var/log/httpd/ssl_access_log.4|    1921| 12824849.0|
    /var/log/httpd/access_log.3|     222|  6207367.0|
/var/log/httpd/ssl_access_log.3|     741|  2210799.0|
    /var/log/httpd/access_log.4|     268|   626711.0|
/var/log/httpd/ssl_access_log.1|       8|    13351.0|
/var/log/httpd/ssl_access_log.2|       5|     7305.0|
    /var/log/httpd/access_log.2|       4|     6995.0|
    /var/log/httpd/access_log.1|       2|      288.0|

Naturally, this isn't just for apache logs.

Here I'm querying the yum logs. The backend code deals with the changes that happened in the logging format, without me having to worry about this in my query:

[root@brick select]# show from /var/log/yum.log* where 'name like "kernel%"' limit 5
           time|    event|           name|  arch|epoch|  version|     release|        filename|
---------------+---------+---------------+------+-----+---------+------------+----------------+
Feb 14 20:00:03|Installed|kernel-firmware|noarch| None|2.6.27.12|170.2.5.fc10|/var/log/yum.log|
Feb 14 20:00:28|  Updated| kernel-headers|  i386| None|2.6.27.12|170.2.5.fc10|/var/log/yum.log|
Feb 14 20:15:11|Installed|   kernel-devel|  i686| None|2.6.27.12|170.2.5.fc10|/var/log/yum.log|
Feb 14 21:05:53|Installed|         kernel|  i686| None|2.6.27.12|170.2.5.fc10|/var/log/yum.log|
Feb 14 21:12:41|Installed|     kernel-PAE|  i686| None|2.6.27.12|170.2.5.fc10|/var/log/yum.log|

I also wrote "rpm" and "proc" backends:

[david@brick ~]$ show name, "count(*)" from rpm group by name having "count(*)>1"
                     name|count(*)|
-------------------------+--------+
               gpg-pubkey|       4|
jakarta-commons-validator|       2|
 java-1.6.0-openjdk-devel|       2|
                   kernel|       3|
             kernel-devel|       2|
               kernel-xen|       3|
                  libgnat|       2|
                  openssl|       2|


Looking in RPM database by vendor:

[david@brick select]$ show vendor, "count(*)" from rpm group by vendor
            vendor|count(*)|
------------------+--------+
              None|      12|
    Fedora Project|    2042|
              Koji|       2|


It got tiresome typing "*", and "from" all the time, so you can omit these:

# show rpm where release not like \'%fc%\' order by name limit 4
    name|epoch|version|release|  arch|        vendor|
--------+-----+-------+-------+------+--------------+
 MAKEDEV| None|   3.24|      1|  i386|Fedora Project|
   PyXML| None|  0.8.4|     10|  i386|Fedora Project|
  autofs|    1|  5.0.3|     36|  i386|Fedora Project|
automake| None| 1.10.1|      2|noarch|Fedora Project|


There's a --format option, which can currently be used to emit tables in HTML format. Other formats could be written e.g. xml, json, yaml, odf spreadsheets, etc.
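
An output format is essentially a function from column names and rows to text. As a rough sketch (not the tool's actual formatter; the function name is hypothetical), an HTML emitter in Python might look like this, using html.escape from the standard library so that values containing markup don't break the table:

```python
from html import escape


def to_html_table(column_names, rows):
    """Render a header row and data rows as a plain HTML table."""
    out = ["<table>"]
    out.append("<tr>%s</tr>" % "".join(
        "<th>%s</th>" % escape(str(c)) for c in column_names))
    for row in rows:
        out.append("<tr>%s</tr>" % "".join(
            "<td>%s</td>" % escape(str(v)) for v in row))
    out.append("</table>")
    return "\n".join(out)


print(to_html_table(["name", "count(*)"], [("kernel", 3), ("a<b", 1)]))
```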

For bonus points, I wrote an ncurses UI for browsing tabular results. The command detects if stdout is connected to a tty, and if so, goes into the UI, otherwise it sends text, so you can use it for shell pipelines etc.
Here's a screenshot from
show rpm order by name
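
The tty check itself is a one-liner in Python. A minimal sketch (the function name and the "curses"/"text" labels are made up for illustration):

```python
import io
import sys


def choose_output(stream=None):
    """Return "curses" for an interactive terminal, "text" for pipes and files."""
    stream = sys.stdout if stream is None else stream
    return "curses" if stream.isatty() else "text"


# An in-memory stream behaves like a pipe: it is not a tty
print(choose_output(io.StringIO()))
```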



I'm already finding this useful.

I'm hoping to host this as a Fedora project. For now, you can grab the code from my Red Hat page here:
http://people.redhat.com/dmalcolm/show-0.2-1.fc10.src.rpm

There's plenty of scope for writing new table backends for other data sources/file formats, improving the UI, writing new output formats, etc. Any ideas?
 
 
I implemented XInclude support for  docbook-lint just now, or rather, a first pass at it.  Most documents I've been pointing it at turned out to be built from a short "core" document that has a collection of XInclude elements pointing at the rest of the content, so this was a must-have feature.

Seems to work, but needs testing.  Hopefully I'll have a chance to try hooking it up with Publican soon (dear lazyweb: it's not clear how to get at the source directly on that page), and maybe make the toolchain a little better.
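
For readers who haven't met XInclude: Python's standard library can resolve it via xml.etree.ElementInclude. The sketch below shows the general mechanism on a toy "core" document; it is not necessarily how docbook-lint does it, and the file names and the in-memory DOCS dictionary are invented so that the example runs without touching disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# In-memory stand-ins for files on disk (hypothetical names)
DOCS = {
    "book.xml": '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
                '<title>Guide</title><xi:include href="chapter1.xml"/></book>',
    "chapter1.xml": "<chapter><title>Intro</title></chapter>",
}


def loader(href, parse, encoding=None):
    """Return a parsed element for parse="xml", or raw text otherwise."""
    if parse == "xml":
        return ET.fromstring(DOCS[href])
    return DOCS[href]


root = ET.fromstring(DOCS["book.xml"])
# Replace each xi:include element in place with the content it points at
ElementInclude.include(root, loader=loader)
tags = [child.tag for child in root]
print(tags)
```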
 
 
13 April 2008 @ 07:24 pm
I've been avoiding this for a while, but thanks to Jeremy's prodding,  I've started this blog.  I've got a few coding projects in-flight at the moment, and it's about time I let the world know about them.   So far I've set up a home page for docbook-lint, but there are some others on my hard drive that ought to see the light of day.  More to follow... I hope.
 
 
12 April 2008 @ 11:55 am
I've been a fan of DocBook and "semantic markup" for some time now; I think it's a great way to represent complicated computer documentation, but alas the toolchain is often lacking.  Most DocBook production pipelines just check validity against a DTD, followed by human inspection of generated HTML and PDFs.  This sucks.

About a year ago, I ran into some problems in some documentation built using DocBook, and realized that there are plenty of things a computer can check for, rather than leaving them to a human proof-reader.
 
So I built a small Python program to do this: docbook-lint

It now has its own hosting on Fedora's web site, so I thought it was time to blog about it:
https://fedorahosted.org/docbook-lint/

If you use DocBook, please give it a try, and let me know if you find it useful.  Any ideas for improving it?
 
 
 
 
