
Speed Tables is a high-performance, memory-resident database. The Speed Tables compiler reads a table definition and generates a set of C access routines to create, manipulate, and search tables containing millions of rows. It is currently oriented towards Tcl.


Why Speedtables/CTables?

Tcl is not well known for its ability to represent complex data structures. Yes, it has lists, associative arrays and, in Tcl 8.5, dicts. Yes, object systems such as Incr Tcl provide a way to create somewhat complex data structures, and the BLT toolkit, among others, offers certain representations (a vector type, for instance) that are more efficient than those available by default. And yes, you can play games with "upvar" and namespaces to create relatively complex structures.

However, there are three typical problems with rolling your own complex data structures out of lists, arrays, and upvar, or out of object systems such as Itcl or OTcl:

The first is that they are memory-inefficient. Tcl objects use substantially more memory than native C types. An integer stored as a Tcl object carries, on top of the integer itself, all the overhead of a Tcl object: 24 bytes at minimum, and often far more. Building values into lists adds further overhead, and the list structures themselves consume memory, sometimes a surprising amount, because Tcl tries to avoid allocating memory on the fly by allocating more than you need, and sometimes much more than you need. Tcl arrays also store the field name alongside each value, which their design requires but which compounds the inefficiency. It is not uncommon for the Tcl objects, lists, and arrays holding a dataset to consume ten or twenty times the space of the data itself; even on a modern machine, using 20 gigabytes of memory to store a gigabyte of data is at best wasteful and at worst makes the solution unusable.

The second problem with rolling your own complex structures is that they are computationally inefficient. Constructing complicated structures out of lists and arrays, then traversing and updating them, is relatively CPU-intensive.

Finally, such approaches are often clumsy and obtuse. A combination of upvar, namespaces, lists, and arrays used to represent a complex structure is a relatively opaque way of expressing and manipulating that structure, making the code twisted, hard to follow, hard to teach, hard to modify, and hard to hand off.

CTables reads a structure definition and emits C code to define that structure. It generates a full-fledged C extension that manages rows as native C structs, along with subroutines for manipulating those rows efficiently. Memory efficiency is high because per-row overhead is very low: only a hashtable entry beyond the size of the struct itself. Computational efficiency is high because the generated code is reasonably clever about storing and fetching values, particularly when populating tables from PostgreSQL query results, reading tab-separated data from a Tcl channel, writing it back out tab-separated, and locating, updating, and counting rows, as well as importing and exporting by other means.
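As a rough sketch of what that looks like in practice, a table definition and a few operations might resemble the following. This is loosely modeled on examples in the Speed Tables documentation; the package, table, and field names here are illustrative, and exact syntax may differ between versions:

```tcl
package require speedtable

# Compile (and cache) a C extension named Friends from this definition.
# Each field becomes a member of a native C struct.
CExtension Friends 1.0 {
    CTable friend_table {
        varstring name
        varstring phone
        int age
    }
}

# Load the generated extension and create a table instance.
package require Friends
friend_table create f

# Insert rows; each row is a C struct plus one hashtable entry.
f set karl name "Karl" phone "555-1212" age 47
f set joe  name "Joe"  age 63

# Searches run in compiled C: find everyone over 50.
f search -compare {{> age 50}} -array_get row -code {
    puts $row
}
```

The key point is that the Tcl-level commands are thin wrappers over generated C routines, so the per-row cost is the struct itself rather than a constellation of Tcl objects.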

We also maintain a "null value" bit per field and provide ways to distinguish null from non-null values, similar to SQL databases, providing a ready bridge between those databases and our tables.
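Assuming a speedtable instance named `f` with an `age` field (the names are illustrative), null handling might look like the sketch below. The `null` and `notnull` comparison operators reflect my reading of the Speed Tables search documentation and may vary by version:

```tcl
# A field that has never been set remains null, which is distinct
# from zero or the empty string.
f set joe name "Joe"   ;# age is null for this row

# Search for rows where age is null / not null, SQL-style.
f search -compare {{null age}}    -key k -code { puts "$k: age unknown" }
f search -compare {{notnull age}} -key k -code { puts "$k: age known" }
```

This is what makes round-tripping rows to and from an SQL database practical: a NULL column value stays NULL rather than silently becoming a default.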

More Information

If you're ready to learn more about how to use Speed Tables, try the following resources: