11 - Speed Tables Internals

This chapter describes internal implementation details of Speed Tables. You can skip this section unless you're interested in finding out how Speed Tables work internally.

C Code Generated and C Routines Created

(There is a better interface than this for all but the lowest-level access code. You can interact with any speed table, regardless of its composition, by making standardized C calls via the speed table's methods and speed table's creator table structures. It's not documented yet but you can study speedtable_search.c, where it is used extensively, and speedtable.h, where those structures are defined.)

The row format is not guaranteed to be the same between point releases of speed tables. However, fields you define will be accessible with the name you defined for them and of the data type corresponding to what you defined, regardless of the release, from the first version to the present and for the foreseeable future.

For the above cable_info table defined, the following C struct is created:

struct cable_info {
    TAILQ_ENTRY(cable_info) _link;
    struct in_addr ip;
    struct ether_addr  mac;
    char  *name;
    int  _nameLength;
    char  *address;
    int  _addressLength;
    char  *addressNumber;
    int  _addressNumberLength;
    char  *geos;
    int  _geosLength;
    int  i;
    int  j;
    long ij;
    struct Tcl_Obj  *extraStuff;
    unsigned int _ipIsNull:1;
    unsigned int _macIsNull:1;
    unsigned int _nameIsNull:1;
    unsigned int _addressIsNull:1;
    unsigned int _addressNumberIsNull:1;
    unsigned int _geosIsNull:1;
    unsigned int _iIsNull:1;
    unsigned int _jIsNull:1;
    unsigned int _ijIsNull:1;
    unsigned int _extraStuffIsNull:1;
};

Note that varstrings are char * pointers. We allocate the space for whatever string is stored and store the address of that allocated space. Fixed-length strings are generated inline.

The null field bits and booleans are all generated together and should be stored efficiently by the compiler. We rely on the C compiler to do the right thing with regards to word-aligning fields as needed for efficiency.

You can examine the C code generated -- it's quite readable. Possibly too readable: several times when working on speed tables I've started editing the generated code rather than the code that's generating it, by mistake.

Each table-defining command created has a CTableCreatorTable associated with it, for example:

struct CTableCreatorTable {
    Tcl_HashTable     *registeredProspeed tablePtr;
    long unsigned int  nextAutoCounter;
    int                nFields;
    int                nLinkedLists;
    CONST char       **fieldNames;
    Tcl_Obj          **nameObjList;
    int               *fieldList;
    enum ctable_types *fieldTypes;
    int               *fieldsThatNeedQuoting;
    struct ctableFieldInfo **fields;
    void              *(*make_empty_row) ();
    int               (*set) (Tcl_Interp *interp, struct CTableTable *ctable,
                              Tcl_Obj *dataObj, void *row, int field,
                              int indexCtl);
... and other accessor functions ...
};

The registered proc table is how we handle registering methods, and the nextAutoCounter is how we can generate unique names for instances of the table when using "#auto".

nFields is the number of fields defined for the row, while nLinkedLists says how many doubly linked lists are included in each row. (The first doubly linked list is used by Speed Tables to link all rows of a table together; the rest are created for linking into index entries for each field that is defined as indexable.)

fieldNames is a pointer to an array of pointers to the name of each field, while nameObjList is a pointer to an array of pointers of Tcl objects containing the names of each field. By generating these once in the meta table, they can be used all over the place, by each speed table created by the meta table in many places, sharing these objects and neither incurring the memory or CPU overhead of constantly instantiating new Tcl objects from the name string whenever field names are needed.

fieldList is a pointer to an array of integers corresponding to the field numbers. Guess what? If there are six fields it will contain {0, 1, 2, 3, 4, 5}. The thing is we can feed it to routines we have that take such a list when the user has not told us what fields they want. fieldTypes are an array of data type numbers for each field. (Data type numbers are defined in speed table.h.) fieldsThatNeedQuoting is an array of ints, one for each field, saying if it needs quoting or not.

A number of the fields defined above are being consolidated into the CTableFieldInfo struct, which is defined for each field and contains the field name, name object, field number, type number, whether or not it needs quoting, its compare function (for indexing the the like, something we generate for each field), and its index number (which index of the array of doubly linked list elements built into each row), if indexed, else -1.

Finally, a number of pointers to functions to do things to the speed table are defined. This is cool stuff. As I began to code the complex sorting and indexing code, it started getting hard to keep my head wrapped around it all. Trying to custom-generate all that search code made complicated code even more complicated. Standardizing the search code to not be custom generated at all and to access the custom-generated aspects of the different Speed Tables through these function pointers.

Function pointers are provided to create an empty row, set a field of a row to a value, set a field of a row to null, get the native value of a field from a row as a Tcl object, and get a string representation of a field from a row. Additional function pointers are provided to get the contents of a row as a Tcl list and as a key-value Tcl list, with or without null values, to append the contents of a field to a list, to append the name of a field and the contents of a row's field to a list, and some other stuff like that.

Each instance of the table created with "create" has a CTableTable associated with it:

struct CTableTable {
    struct CTableCreatorTable *creatorTable;
    Tcl_HashTable             *keyTablePtr;
    Tcl_Command                commandInfo;
    long                       count;
    jsw_skip_t               **skipLists;
    struct ctable_baseRow     *ll_head;
    int                        nLinkedLists;
};

This contains a pointer to the meta table (creatorTable), a hash table that we use to store and fetch keys, a command info struct that we use to delete our created command from the Tcl interpreter when it's told to destroy itself, the row count, a pointer to an array of pointers to skip lists, one for each field that has an index defined for it (it's NULL otherwise).

A skip list for an indexed field can be walked to do a walk ordered by that field, as opposed to the pseudo-random ordering provided by walking the hash table or the last-thing-added-is-at-the-front ordering of "linked list zero", the linked list that all rows in a table are in.

Next, the number of fields is defined, the field names as an array of pointers to character strings and an enumerated type definition of the fields:

#define CABLE_INFO_NFIELDS 10
static CONST char *cable_info_fields[] = {
    "ip",
    "mac",
    "name",
    "address",
    "addressNumber",
    "geos",
    "i",
    "j",
    "ij",
    "extraStuff",
    (char *) NULL
};

enum cable_info_fields {
    FIELD_CABLE_INFO_IP,
    FIELD_CABLE_INFO_MAC,
    FIELD_CABLE_INFO_NAME,
    FIELD_CABLE_INFO_ADDRESS,
    FIELD_CABLE_INFO_ADDRESSNUMBER,
    FIELD_CABLE_INFO_GEOS,
    FIELD_CABLE_INFO_I,
    FIELD_CABLE_INFO_J,
    FIELD_CABLE_INFO_IJ,
    FIELD_CABLE_INFO_EXTRASTUFF
};
The types of each field are emitted as an array and whether or not fields need quoting:
enum ctable_types cable_info_types[] = {
    CTABLE_TYPE_INET,
    CTABLE_TYPE_MAC,
    CTABLE_TYPE_VARSTRING,
    CTABLE_TYPE_VARSTRING,
    CTABLE_TYPE_VARSTRING,
    CTABLE_TYPE_VARSTRING,
    CTABLE_TYPE_INT,
    CTABLE_TYPE_INT,
    CTABLE_TYPE_LONG,
    CTABLE_TYPE_TCLOBJ
};

int cable_info_needs_quoting[] = {
    0,
    0,
    1,
    1,
    1,
    1,
    0,
    0,
    0,
    1
};

A setup routine is defined that is automatically run once when the extension is loaded, for example, cable_info_setup creates some Tcl objects containing the names of all of the fields and stuff like that.

An init routine, for example, cable_info_init, is defined that will set a newly malloc'ed row to default values (Defaults can be specified for most fields. If a field does not have a default, that field's null bit is set to true.)

For efficiency's sake, we have a base copy that we initialize the first time the init routine is called and then for subsequent calls to initialize a row we merely do a structure copy to copy that base copy to the pointer to the row passed.

A delete routine is defined, for instance, cable_info_delete, that will take a pointer to the defined structure and free it. The thing here is that it has to delete any varstrings defined within the row prior to freeing the row itself.

*_find takes a pointer to the StructTable corresponding to the speed table, for instance, cable_infoStructTable and a char * containing the key to be looked up, and returns a pointer to the struct (in the example, a struct ctable_info *) containing the matching row, or NULL if none is found.

*_find_or_create takes a pointer to the StructTable, a char * containing the key to be looked up or created, and a pointer to an int. If the key is found, a pointer to its row is returned and the pointed-to int is set to zero. If it is not found, a new entry for that name is created, an instance of the structure is allocated and initialized, the pointed-to int is set to one, and the pointer to the new row is returned.

A *_obj_is_null routine is defined, for instance cable_info_obj_is_null that will return a 1 if the passed Tcl object contains a null value and zero otherwise.

*_genlist (cable_info_genlist), given a pointer to a Tcl interpreter and a pointer to a row of the corresponding structure type will generate a list of all of the fields in the table into the Tcl interpreter's result object.

*_gen_keyvalue_list does the same thing except includes the names of all the fields paired with the values.

*_gen_nonuull_keyvalue_list does the same thing as *_gen_keyvalue_list except that any null values do not have their key-value pair emitted.

*_set (cable_info_set) can be used from your own C code to set values in a row. It takes a Tcl interpreter pointer, a pointer to a Tcl object containing the value you want to set, a pointer to the corresponding structure, and a field number from the enumerated list of fields.

It handles detecting and setting the null bit as well.

*_set_fieldobj is like *_set except the field name is contained in a Tcl object and that field name is extracted and looked up from the field list to determine the field number used by *_set.

*_set_null takes a row pointer and a field number and sets the null bit for that field to true. Note there is no way to set it to false except to set a value into a field as simply clearing the bit would be an error unless some value was written into the corresponding field.

*_get fetches a field from a table entry and returns a Tcl object containing that field. It takes a pointer to the Tcl interpreter, a pointer to a row of the structure, and a field number. If the null bit is set, the null value is returned.

Even though it is returning Tcl objects, it's pretty efficient as it passes back the same null object over and over for null values and uses the correct Tcl_New*Obj for the corresponding data type, hence ints are generated with Tcl_NewIntObj, varstrings with Tcl_NewStringObj, etc.

*_get_fieldobj works like *_get except the field name is contained in the passed-in field object and looked up.

*_lappend_fieldobj and *_lappend_field_and_nameobj append the specified field from the pointed-to row and append the field name (via a continually reused name object) and value, respectively.

*_lappend_nonull_field_and_nameobj works just like *_lappend_field_and_nameobj except that it doesn't append anything when the specified field in the pointed-to row is null.

*_get_string - This is particularly useful for the C coder. It takes a pointer to an instance of the structure, a field number, a pointer to an integer, and a pointer to a Tcl object and returns a string representation of the requested field. The Tcl object is used for certain conversions and hence can be considered a reusable utility object. The length of the string returned is set into the pointed-to integer.

Example:

CONST char *cable_info_get_string (
    struct cable_info *cable_info_ptr,
    int field,
    int *lengthPtr,
    Tcl_Obj *utilityObj)
{
    ...
}

For fixed strings and varstrings, no copying is performed -- a pointer to the row's string is returned. Hence they must be considered to be constants by any of your code that retrieves them.

*_delete_all_rows - give a pointer to the StructTable for an instance, delete all the rows.

Interfacing with Speed Tables From C

The only supported way, other than modifying SpeedTables yourself, is the cfilter hook. Since the ctable row itself is a C structure you can read any ctable field by using row->fieldname in the cfilter code fragment.

See chapter 3, 5, 6, and 13 for information on how to use cfilter.