Speed Tables - Methods

5 - Methods for Manipulating Speed Tables

This chapter enumerates all of the defined methods that are available to interact with Speed Tables, with examples.

Speedtables (Global) Functions

All Speed Tables are defined in a speedtables block. Inside this block you can have any number of common tables and helper functions.

speedtables {block}

Top level block, starts definition of a Speed Table C Extension.

ccode {block}

Raw C code passed through to the compiler.

table {block}

Structured Speed Table definition, see Chapter 3 for data types, Chapter 4 for example.

CTableBuildPath directory

The source code to the speedtable (generated C code and included .c and .h files) and the compiled object and shared library are stored in a subdirectory of the current directory at the time the Speed Table package is invoked. If this directory is not appropriate, use the CTableBuildPath procedure to change it immediately after including the speedtables package. For example:

package require speedtable
CTableBuildPath ~/.ctables

Meta-table (Class) Methods

The following methods are available as arguments to the meta-table (class) of the Speed Table:

create table_name ?master|reader ?name value?...?: Create a new table instance. The syntax for master or reader shared memory tables are described in chapter 8
package: dump the code used to define the Speed Tables Package that created the table.
info: Reserved for future use
method method_name method_proc: Register a new method into the class. When the table is called with the named method, the arguments are passed to the named proc. The proc will be invoked with the table as its first argument, followed by any method arguments.
null_value ?value?: Define or fetch the null value for a class.

Table (Object) Methods

The following built-in methods are available as arguments to each instance of a speed table:

get, set, array_get, array_get_with_nulls, exists, delete, count, foreach, type, import, import_postgres_result, export, fields, fieldtype, needs_quoting, names, reset, destroy, statistics, write_tabsep, read_tabsep, key, makekey, store, share, getprop, attach, cursors

For the examples, assume we have done a "cable_info create x"

set ?-nocomplain? key field value ?field value...?

set ?-nocomplain? key keyValueList

The key is required and unique. It can contain anything you want. It's not an element of the table, though it may appear as a field or the pseudo-field _key.

There is provision for tables to have automatically numbered keys inserted with the store method and updated with set. There may be provision in the future for the key to be a true OID, or to be any field type rather than a simple string, and also to allow more than one key. But for now this is how it works. For more than one key, you can create some kind of compound key the same way you do with Tcl hashes.

% anim_characters set fred age 37 name "Fred Flintstone" coolness 5 show "The Flintstones"

In the above example, we create a row in the anim_characters table named "fred", with age 37, coolness 5, and a name of "Fred Flintstone". All fields in the row that have not been set will be marked as null. (Also any field set with the null value will also be marked as null.)

% set values {age 37 name "Fred Flintstone" coolness 5 show "The Flintstones"}
% anim_characters set fred $list

In this example, we specify the value as a list of key-value pairs. This is a natural way to pull an array into a speed table row:

% anim_characters set fred [array get new_character]

By default it is an error to attempt to set a field in a row that does not exist. However, if -nocomplain is specified, such errors are suppressed, all matching fields are set and any keys that do not exist in the table are silently ignored. This is useful when an array contains some fields that you want to store in a speedtable row but may contain additional fields that you do not want to store but which, without -nocomplain, you'd have to remove from the array prior to invoking set.

store ?-nocomplain? field value ?field value?

store ?-nocomplain? keyValueList

Store is similar to "set", but extracts the key from the provided fields. If the table does not have a field explicitly designated as a key, then the pseudo-field "_key" is used. If the key is not present in the list, then the next autogenerated value (see read_tabsep) will be used.

Store returns the key used to store the list.

makekey field value ?field value?

makekey keyValueList

This simply calculates what the appropriate key value for the list would be.

For example, for a table where the field "ip" was a key:

x makekey {ip 10.2.3.1 name host1}

would return "10.2.3.1"

key

Returns the name of the key field specified for the table, or "_key" if none were specified.

fields

"fields" returns a list of defined fields, in the order they were defined.

% x fields
ip mac name address addressNumber geos i j ij

field field_name proplist

field field_name properties

field field_name getprop propname

"field" returns information about the values that defined the field. You can use this command to retrieve all the key-value pairs that define a field.

Since we accept (and ignore) arguments to field definitions for keys we don't recognize, you can define your own key-value pairs in field definitions inside of speed table definitions and access them using this method.

Following the name of the field should be one of the keywords getprop, properties, or proplist. properties will return the names of all of the properties as a Tcl list. proplist will return the names and values of all the properties as a Tcl list, in what we would call "array set" format. getprop will return the value associated with the key passed as an argument.

% $ctable field $fieldName proplist
default 1 name alive type boolean
% $ctable field $fieldName properties
default name type
% $ctable field $fieldName getprop default

get key ?field_name...?

Get fields. Get specified fields, or all fields if none are specified, returning them as a Tcl list.

% x get peter
127.0.0.1 {} {Peter da Silva} {} {} {} 501 {} {}
% x get peter ip name
127.0.0.1 {Peter da Silva}

array_get key ?field_name...?

Get specified fields, or all fields if none are specified, in "array get" (key-value pair) format. Note that if a field is null, it will not be fetched.

% x array_get peter
ip 127.0.0.1 name {Peter da Silva} i 501
% x array_get peter ip name mac
ip 127.0.0.1 name {Peter da Silva}

array_get_with_nulls key ?field_name...?

Get specified fields, or all fields if none are specified, in "array get" (key-value pair) format. If a field contains the null value, it is fetched anyway. (Yes this should probably be an option switch to array_get instead of its own method.)

% x array_get_with_nulls peter
ip 127.0.0.1 mac {} name {Peter da Silva} address {} addressNumber ...
% x array_get_with_nulls peter ip name mac
ip 127.0.0.1 name {Peter da Silva} mac {}

Note that if the null value has been set, that value will be returned other than the default null value of an empty Tcl object.

% cable_info null_value \\N
% x array_get_with_nulls peter
ip 127.0.0.1 mac \N name {Peter da Silva} address \N addressNumber ...
% x array_get_with_nulls peter ip name mac
ip 127.0.0.1 name {Peter da Silva} mac \N

exists key

Return 1 if the specified key exists, 0 otherwise.

% x exists peter
1
% x exists karl
0

delete key

Delete the specified row from the table. Returns 1 if the row existed, 0 if it did not.

% x delete karl
0
% x set karl
% x delete karl
1
% x delete karl
0

count

Return a count the number of rows in the table.

% x count
1

batch command_list

Take a list of speed table commands (minus the table name, as that's implicit), and invoke each element of the list as a method invocation on the current speed table.

A result list is constructed.

As each command within the batch is invoked, if the invocation is successful and no value is returned, nothing is added to the result list.

If the invocation is successful and a value is returned, a list is added to the result list containing two elements: the number of the element of the batch list and a sublist containing the Tcl result code (0) and whatever the result was that was returned.

If the invocation failed, a list is added to the result list, containing the element index, as above, but with the Tcl result code set to TCL_ERROR (1) and the result portion is the error message returned.

% x batch {{set dean age 17} {incr dean age 1} {incr brock age foo}}
{{1 {0 18}} {2 {1 {expected integer but got "foo" while converting age ...

In this example, setting Dean's age to 17 produced no result. Incrementing it returned the incremented value (18), and trying to set Brock's age to a non-integer value recorded an error.

Note that errors in batched commands do not cause batch to return an error. It is up to the caller to examine the result of the batch command to see what happened.

"batch" will return an error in the event of bad arguments passed to it, the batch list being unparseable as a list, etc.

search -option value ?-option value?...

Search for matching rows and take actions on them, with optional sorting. Search exploits indexes on fields when available, or performs a brute force search if there are no indexed fields available in the compare list. These indexes are implemented using skip lists.

The result of a search is the number of rows matched by the search, unless a -code body executes a return.

Brute-Force Search Is Brutally Fast

Search can perform brute-force multivariable searches on a speed table and take actions on matching records, without any scripting code running on an every-row basis.

On a typical Intel or AMD machine from 2006, speed table search can perform, for example, unanchored string match searches at a rate of sixteen million rows per CPU second (around 60 nanoseconds per row).

On the other hand, skip lists point to a future where there isn't any key that's external to the row -- that is, what would have been the external key would exist as a normal field in the row.

Whether you should use indexes (skiplists) or not depends on the characteristics of the table. On one of our test systems, inserting a row into the table takes about 2.3 microseconds, but a single index increases this to about 7 microseconds. On the other hand, an indexed search on that field may be O(logN) on the number of rows in the table.

Search is a powerful element of the speed tables tool that can be leveraged to do a number of the things traditionally done with database systems that incur much more overhead.

The full list of search options:

$table search \
    ?-sort {?-?field..}? ?-fields fieldList? ?-glob pattern? \
    ?-compare list? ?-filter list? ?-offset offset? ?-limit limit? \
    ?-code codeBody? ?-key keyVar? ?-get varName? \
    ?-array_get varName? ?-array_get_with_nulls varName? \
    ?-array varName? ?-array_with_nulls varName? ?-index field? \
    ?-write_tabsep channel? ?-tab string? ?-with_field_names 0|1? \
    ?-nokeys 0|1? ?-null string? \
    ?-delete 0|1? ?-buffer 0|1? ?-update {field value}? \
    ?-poll_interval interval? ?-poll_code codeBody? \
    ?-cursor name?

Search options:

-sort fieldList

Sort results based on the specified field or fields. If multiple fields are specified, they are applied in order, the first field is the primary sort field, followed by the second and so on.

If you want to sort a field in descending order, put a dash in front of the field name.

Bug: Speed tables are currently hard-coded to sort null values "high". As this is not always what one wants, an ability to specify whether nulls are to sort high or low will likely be added in the future.

-fields fieldList

Restrict search results to the specified fields.

If you have a lot of fields in your table and only need a few, using -fields to restrict retrieval to the specified fields will provide a nice performance boost.

Fields that are used for sorting and/or for comparison expressions do not need to be included in -fields in order to be examined.

-glob pattern

Perform a glob-style comparison on the key, excluding the examination of rows not matching.

-offset offset

If specified, begins actions on search results at the "offset" row found. For example, if offset is 100, the first 100 matching records are bypassed before the search action begins to be taken on matching rows.

-limit limit

If specified, limits the number of rows matched to "limit".

Even if used with -countOnly, -limit still works, so if, for example, you want to know if there are at least 10 matching records in the table but you don't care what they contain or if there are more than that many, you can search with -countOnly 1 -limit 10 and it will return 10 if there are ten or more matching rows.

-write_tabsep channel

Matching rows are written tab-separated to the file or socket (or postgresql database handle) "channel".

-tab string

Specify the separator string for write_tabsep (default "\t").

-null string

Specify the string to be used for null values for write_tabsep

-with_field_names 1

If you are doing -write_tabsep, -with_field_names 1 will cause the first line emitted to be a tab-separated list of field names.

-quote type

When performing -write_tabsep, use the quoting strategy type:

none - default
uri - URI-encoding, percent followed by two hex digits (eg %0a).
escape - backslash-escape compatible with PostgreSQL (eg \n).

-key keyVar

-get listVar

-array arrayName

-array_with_nulls arrayName

-array_get listVar

-array_get_with_nulls listVar

-code codeBody

Run scripting code on matching rows.

If -key is specified, the key value of each matching row is written into the variable specified as the argument that follows it.

If -get is specified, the fields of the matching row are written into the variable specified as the argument to -get. If -fields is specified, you get those fields in the same order. If -fields is not specified, you get all the fields in the order they were defined. If you have any question about the order of the fields, just ask the speed table with $table fields.

-array_get works like -get except that the field names and field values are written into the specified variable as a list, in a manner that array get can load into an array. I call this "array set" format. Fields that are null are not retrieved with -array_get.

-array_get_with_nulls pulls all the fields, substituting the null value (by default, an empty string) for any fields that are null.

Note it is a common bug to use -array_get in a -code loop, array set the returned list of key-value pairs into an array, and not unset the array before resuming the loop, resulting in null variables not being unset -- that is, from a previous row match, field x had a value, and in the current row, it doesn't.

If you haven't unset your array, and you "array get" the new result into the array, the previous value of x will still be there. So either unset (-nocomplain is a useful, not widely known optional argument to unset) or use array_get_with_nulls.

Better yet would be to just use -array or -array_with_nulls, both of which directly put the stuff in an array on your behalf and do the right thing with respect to null values.

-array sets field names and field values into the named array. Any fields that are null are specifically removed (unset) from the array.

Thus, if you use -array to and you with to access a field can be null, you need to check to see if the field exists (using [info exists array(fieldName)], etc) before trying to look at its value.

If you don't want to do that, consider using -array_with_nulls instead.

-array_with_nulls sets field names and field values into the named array. Any fields that are null are set into the array as the null value (by default, an empty string), as set by the null_value method of the creator table.

-compare list

Perform a comparison to select rows.

Compare expressions are specified as a list of lists. Each list consists of an operator and one or more arguments.

When the search is being performed, for each row all of the expressions are evaluated left to right and form a logical "and". That is, if any of the expressions fail, the row is skipped.

Here's an example:

$table search -compare {{> coolness 50} \
{> hipness 50}} ...

In this case you're selecting every row where coolness is greater than 50 and hipness is greater than 50.

Here are the available expressions:

{false field}

Expression compares true if field's value is false. (For booleans, false. For shorts, ints and wides, false is 0 and anything else is true.)

{true field}

Expression compares true if field is true.

{null field}

Expression compares true if field is null.

{notnull field}

Expression compares true if field is not null.

{< field value}

Expression compares true if field less than value. This works with both strings and numbers, and yes, compares the numbers as numbers and not strings.

{<= field value}

Expression compares true if field is less than or equal to value.

{= field value}

Expression compares true if field is equal to value.

{!= field value}

Expression compares true if field is not equal to value.

{>= field value}

Expression compares true if field is greater than or equal to value.

{> field value}

Expression compares true if field is greater than value.

{match field expression}

Expression compares true if field matches glob expression. Case is insensitive.

{match_case field expression}

Expression compares true if field matches glob expression, case-sensitive.

{notmatch field expression}

Expression compares true if field does not match glob expression. Case is insensitive.

{notmatch_case field expression}

Expression compares true if field does not match glob expression, case-sensitive.

{range field low hi}

Expression compares true if field is within the range of low <= field < hi.

{in field valueList}

Expression compares true if the field's value appears in the value list.

The "in" search expression has very high performance, in particular with client-server ctables, as it is much faster to go find many rows in one query than to repeatedly cause a TCP/IP command/response roundtrip on a per-row basis.

-filter list

Filter the search results through a C filter function defined in the speedtable definition. The parameter is a list of name-value pairs, the name being the name of the filter, and the value being the filter value passed to the filter function. If a filter function needs more than one parameter, they will be passed as a single list. If the filter function needs no parameters, pass an empty list.

Filters are distinct from comparison expressions because no field is required... the filter operates on the entire row and can perform complex multi-field operations in one call. The fields relevant to the filter are determined by the filter's code.

ctable point {
  varstring id
  double    latitude
  double    longitude
  bool      is_valid
  ...
}

$points search -filter [list [list distance [concat $fieldLatLong 150]]] -code { ... }

This example uses a distance filter (see examples) that selects all valid points within a certain range of an airfield, implicitly using two fields in the ctable

-index fieldName

Use the index on the named field to search. Note that this is just a hint to the query optimizer.

-buffer 1

Modifying the index being followed from the code body of a search can have undefined behavior. You may miss entries, visit entries twice, or possibly even crash the program. to avoid this, you can use a buffered search. The -buffer option transactionalizes the operation, at the cost of allocating a temporary internal array for the matched rows.

The -delete and -update options are automatically transactionalized when necessary. The -sort option uses the transaction buffer for sorting, so there is no additional overhead for -buffer on a sorted search unless the sort is optimized out by the query optimizer. Searches on shared memory tables are always buffered.

Note: It is not safe to delete rows from the code body of a search, whether buffered or not, because only the index being traversed is buffered, the row itself is freed immediately and removed from other indexes by the delete operation. Use the -delete option instead. Attempting to perform a delete inside the code body of a search on the same table is treated as an error.

Bug: Indexes should be checked and the indexed field should be locked, so that modifying it unbuffered is defined as an error.

-delete 1

Delete the matched records, safely.

-update {field value}

Update the named field to a new value.

-poll_interval interval

Perform an update operation every interval rows, to allow background processing to work in the background while a long search is going on.

-poll_code codeBody

Perform the specified code every -poll_interval rows. Errors from the code will be handled by the bgerror mechanism. If no poll interval is specified then a default (1024) is used.

-countOnly 1

countOnly is deprecated, it only exists for legacy reasons.

-cursor name

-cursor #auto

Create a named cursor. This can not be combined with any other active commands, it creates a new object that holds the results of the search so it can be processed separately from the search command. The restun value of the search will be the name of the cursor rather than a count of matched rows.

Cursor commands:

$cursor destroy: Delete the named cursor.
$cursor reset: Position the cursor at the beginning of the search results.
$cursor next: Step to the next result in the list. Returns the index of the result or -1 if past the end of the results.
$cursor index: Returns the index of the result or -1 if past the end of the results.
$cursor get ?field?...: Return the current row, or the fields names from the current row, as a list.
$cursor array_get ?field?...: Return the current row, or the fields names from the current row, as a key value list suitable for passing to array_set. Null values are omitted from the list
$cursor array_get_with_nulls ?field?...: Return the current row, or the fields names from the current row, as a key value list suitable for passing to array_set Null values are returns as empty strings.
$cursor set ?field value?...: Set the named fields in the current row to the provided values.
$cursor count: Return the number of rows in the cursor.
$cursor at_end: Return true if the cursor is at the end of the results.
$cursor key: Return the key for the current row.

Note that the cursor results are buffered. Multiple cursors can exist and it is safe to change the results of cursors while other cursors exist, but the results may be surprising so it's not recommended. It is not possible to delete rows from the speedtable while cursors are in use.

Search Examples:

Write everything in the table tab-separated to channel $channel

$table search -write_tabsep $channel

Write everything in the table with coolness > 50 and hipness > 50:

$table search -write_tabsep $channel \
  -compare {{> coolness 50} {> hipness 50}}

Create some code to populate a Tcl array for every record in the table matching above:

set fp [open oldschool.tcl w]
$table search \
  -compare {{> coolness 50} {> hipness 50}} \
  -key key -array_get data -code {
      puts $fp [concat [list array set array($key)] $data]
  }
close $fp

cursors

Return a list of all cursors open on the speedtable.

cursors destroy

Destroy all cursors open on the speedtable.

incr

Increment the specified numeric values, returning a list of the new incremented values

% x incr $key a 4 b 5

...will increment $key's a field by 4 and b field by 5, returning a list containing the new incremented values of a and b.

type

Return the "type" of the object, i.e. the name of the meta-table (class) that created it.

% x type
cable_info

import_postgres_result handle ?-nokeys? ?-nocomplain? ?-poll_interval count? ?-poll_code code? ?-foreground? ?-info arrayName? ?-dirty 0|1?

Given a Pgtcl result handle, import_postgresql_result will iterate over all of the result rows and create corresponding rows in the table, matching the SQL column names to the field names.

the "-info arrayname" option can be used to return additional informationfrom the request. Currently the only value provided is arrayName(numTuples).

If the "-nocomplain" option is specified unknown columns in the result will be ignored.

If the "-nokeys" option is specified the key is derived from the key column specified for the table, or autogenerated as described in read_tabsep.

If the "-dirty boolean" option is specified then the defauult dirty flag is set to the provided value. If true, rows are marked dirty even if nothing was changed. This is useful, for example, for tracking stale rows.

If the "poll_interval" option is provided, it will call update every count rows to keep the event loop alive

If the "poll_code" option is provided, it will call the specified code block instead of calling update

Normally the poll code is treated as background processing errors are logged and do not interrupt the import, and breaks and returns are ignored. If the "foreground" option is present, then you can break out of the import in the poll code using "break", and errors will terminate the import.

This is extremely fast as it does not do any intermediate Tcl evaluation on a per-row basis.

The result handle can come from any Pgtcl source, such as pg_exec. You use pg_result to check if the request was successful.

set res [pg_exec $connection "select * from mytable"]
if {[pg_result $res -status] != "PGRES_RESULT_OK"} {
	... error handling ...
} else {
	# pg_result $res -numTuples contains the number of rows imported
	x import_postgres_result $res
}
pg_result $res -clear

Importing PostgreSQL Results Is Pretty Fast

On a 2 GHz AMD64 we are able to import about 200,000 10-element rows per CPU second, i.e. around 5 microseconds per row. Importing goes more slowly if one or more fields of the speed table has had an index created for it.

import_postgres_result connection -rowbyrow ?-nokeys? ?-nocomplain? ?-poll_interval count? ?-poll_code code? ?-foreground? ?-info arrayName?

For large imports, where perfomance is critical, the "rowbyrow" option can be provided. What this does is allow "import_postgres_result" to collect rows directly from PostgreSQL, rather than having them buffered into a complete atomic result by the PgSQL library. Note that this requires careful error handling, because it's theoretically possible for some rows to be returned and included in the table before the error is discovered.

In this case the data source is a Pgtcl connection token, rather than a result, because the PgSQL "row by row" option must be set on the connection immediately after the request before any calls are made, and then it generates multiple request objects.

Typical use of this would be:

pg_sendquery $connnection "select * from mytable"
if [catch {x import_postgres_result -rowbyrow $connnection -info stat} err] {
	... error handling ...
	# Check for stat(numTuples), because if it's present and non-zero then you have actually
	# recieved data despite the error!
} else {
	... normal code ...
	# at this point stat(numTuples) contains the number of rows imported
}
# There's no pg_result to clear because it's been consumed by import_postgres_result

We have achieved an overall 25% speedup in an application (including the other work the application is doing, the actual performance improvement of the import itself is greater) using this approach, because it allows for greater concurrency. You have to balance this performance advantage against the inherent complexity of this approach.

fieldtype fieldName

Return the datatype of the named field.

foreach field [x fields] {
 puts "$field type is [x fieldtype $field]"
 }
ip type is inet
mac type is mac
name type is varstring
address type is varstring
addressNumber type is varstring
geos type is varstring
i type is int
j type is int
ij type is long

needs_quoting fieldName

Given a field name, return 1 if it might need quoting. For example, varstrings and strings may need quoting as they can contain any characters, while integers, floats, IP addresses, MAC addresses, etc, do not, as their contents are predictable and their input routines do not accept tabs.

names

Return a list of all of the keys in the table. This is fine for small tables but can be inefficient for large tables as it generates a list containing each key, so a 650K table will generate a list containing 650K elements -- in such a case we recommend that you use search instead.

This should probably be deprecated.

null key

Return a list of the fields for the row with the key key that are null.

% x null 1337
foo

null key field ?field?...

Set the named fields for the indicated row to null

% x null 1337 baz
% x null 1337
foo baz

isnull key field ?field?...

Test if the field(s) for the indicated row are null.

% x isnull 1337 foo bar
1 0

reset

Clear everything out of the table. This deletes all of the rows in the table, freeing all memory allocated for the rows, the rows' hashtable entries, indexes, etc.... returns the table to the state it was created in. Note that you will need to recreate all the indexes on the table.

% x count
652343
% x reset
% x count
0

destroy

Delete all the rows in the table, free all of the memory, and destroy the object.

% x destroy
% x asdf
 invalid command name "x"

getprop ?name?...

Extract certain static information about a speedtable. If no names a provided a complete name-value list of the properties will be returned, otherwise a list of the named properties will be returned.

type - the meta-table or class of the table.
extension - the package name for loading the table.
key - the name of the key field of the table.
quote - a list of quoting strategies supported by the table.

attach

Share accesses information about the underlying shared memory associated with a shared memory table Attach creates an attachment for a shared reader table in a shared master table. Returns a set of create parameters to use to complete the attachment. See chapter 8 for more information on these commands.

statistics

Report information about the hash table such as the number of entries, number of buckets, bucket utilization, etc. It's fairly useless, but can give you a sense that the hash table code is pretty good.

% x statistics
1000000 entries in table, 1048576 buckets
number of buckets with 0 entries: 407387
number of buckets with 1 entries: 381489
number of buckets with 2 entries: 182642
number of buckets with 3 entries: 59092
number of buckets with 4 entries: 14490
number of buckets with 5 entries: 2944
number of buckets with 6 entries: 462
number of buckets with 7 entries: 63
number of buckets with 8 entries: 6
number of buckets with 9 entries: 0
number of buckets with 10 or more entries: 1
average search distance for entry: 1.5

write_tabsep channel ?-option?... ?fieldName?...

Deprecated: use search -write_tabsep.

Options are: -quote type, -glob pattern, -nokeys, -with_field_names, -tab string, and -null string.

Write the table tab-separated to a channel, with the names of desired fields specified, else all fields if none are specified.

set fp [open /tmp/output.tsv w]
x write_tabsep $fp
close $fp

If the glob pattern is specified and the key of a row does not match the glob pattern, the row is not written.

The first field written will be the key, unless -nokeys is specified and the key value is not written to the destination.

If -with_field_names is specified, then the names of the fields will be the first row output.

If -tab is specified then the string provided will be used as the tab.

If -null is specified then the string provided will be used as the null.

The -quote type options are compatible with search -write_tabsep

For compatibility, the default quoting for tabs and newlines inside a field is "none". It is recommended that -quote escape or -quote uri is used for writing and reading tab-separated files.

read_tabsep channel ?-option?... ?fieldName?...

Options are: -glob pattern, -nokeys, -with_field_names, -tab string, -quote type, -with_nulls -null string, -nocomplain, -dirty, -skip pattern, -term pattern, -poll_interval count, -poll_code code, and -foreground.

Read tab-separated entries from a channel, with a list of fields specified, or all fields if none are specified.

set fp [open /tmp/output.tsv r]
x read_tabsep $fp
close $fp

The first column is normally expected to be the key, unless -nokeys is specified, or the key pseudo-field (_key or a "key" type) is in the field list. If you name five fields, then, each row in the input file (or socket or whatever) should contain six elements.

It's an error if the number of fields read doesn't match the number expected.

If the -glob pattern is defined, it's applied to the key and if it doesn't match, the row is not inserted.

If -tab string is specified, then the string provided will be used as the tab separator. There is no explicit limit on the length of the string, so you can use something like -tab {%JULIE@ANDREWS%} with read_tabsep and write_tabsep (or search -write_tabsep) to reduce the possibility of a conflict.

if -skip pattern is specified, then lines matching that pattern are ignored. This is sometimes necessary for files containing comments.

If -with_field_names is specified, the first row read is expected to be a tab-separated list of field names, and Speedtables will read that line and use its contents to determine which fields each of the following lines of tab-separated values will be stored as. (This is the counterpart to the -with_field_names argument to speedtables's search method when invoked with the -write_tabsep option.)

If -nokeys is specified, the first field of each row is not used as the key -- rather, the key is taken from the provided fields (as if makekey was called for each row), and if there is no key it is automatically created as an ascending integer starting from 0. The last key generated will be returned as the value of read_tabsep.

If you subsequently do another read_tabsep with -nokeys specified, the auto key will continue from where it left off. If you invoke the table's reset method, the auto key will reset to zero.

If you later want to insert at the end of the table, you need to use store rather than set.

read_tabsep stops when it reaches end of file OR when it reads an empty line: You can indicate end of input with an empty line and then do something else with the data that follows. Since you must have a key and at least one field, this is safe. However it might not be safe with -nokeys.

With -term pattern then read_tabsep will instead terminate on a line that matches pattern. For example, to emulate SQL "COPY FROM stdin" you could use -term "\\.".

With -nocomplain then the file being read is allowed to specify columns that do not match fields

Blank or missing fields literally insert empty strings unless -with_nulls is set, indicating that the file was created with null values.

With -dirty then rows read from tabsep are marked dirty even if nothing was changed. This is useful, for example, for tracking stale rows.

The -quote type options are compatible with search -write_tabsep

For compatibility, the default quoting for tabs and newlines inside a field is "none". It is recommended that -quote escape or -quote uri is used for writing and reading tab-separated files.

The -null string allows you to specify the string expected to be seen for a null. The null string must be an entire field, and it is evaluated before quoting is applied.

With -poll_interval count, it will call update every count rows to keep the event loop alive

With -poll_code code, it will call the specified code block instead of calling update. This code is background code, errors are logged and do not interrupt the import, and break and return is ignored.

the -foreground option is present, then you can break out of the import in the poll code using break, and errors will terminate the import.

index create fieldName

index drop fieldName

index count fieldName

index span fieldName

index dump fieldName

index indexable

index indexed

Index is used to create skip list indexes on fields in a table, which can be used to greatly speed up certain types of searches.

x index create foo 24

...creates a skip list index on field "foo" and sets it to for an optimal size of 2^24 rows. The size value is optional. (How this works will be improved/altered in a subsequent release.) It will index all existing rows in the table and any future rows that are added. Also if a set, read_tabsep, etc, causes a row's indexed value to change, its index will be updated.

If there is already an index present on that field, does nothing.

x index drop foo

....drops the skip list on field "foo." if there is no such index, does nothing.

x index dump foo

...dumps the skip list for field "foo". This can be useful to help understand how they work and possibly to look for problems.

x index count foo

...returns a count of the skip list for field "foo". This number should always match the row count of the table (x count). If it doesn't, there's a bug in index handling.

x index span foo

...returns a list containing the lexically lowest entry and the lexically highest entry in the index. If there are no rows in the table, an empty list is returned.

x index indexable

...returns a (potentially empty) list of all of the field names that can have indexes created for them. Fields must be explicitly defined as indexable when the field is created with indexed 1 arguments. (This keeps us from incurring a lot of overhead creating various things to be ready to index any field for fields that just couldn't ever reasonably be used as an index anyway.

x index indexed

...returns a (potentially empty) list of all of the field names in table x that current have an index in existence for them, meaning that index create has been invoked on that field.

Back Index Next