6 - Special Notes On How To Make Searches Go Fast

This chapter explains how to make Speed Table searches go as fast as possible.

An example of brute force searching that there isn't much getting around without adding fancy full-text search is unanchored text search. Even in this case, with our fast string search algorithm and quick traversal during brute-force search, we're seeing 60 nanoseconds per row or searching about sixteen million rows per CPU second on circa-2006 AMD64 machines.

Although many optimizations are being performed by the speed table compiler, further performance improvements can be made without introducing huge new complexities, perturbations, etc.

If you need to search for ranges of things, partial matches, straight equality of a field other than the key field, etc, you can use indexes and the "range", "=", and "in" compare functions to obtain huge search performance improvements over brute force, if the table has an index created on that field using $speedtable index create $fieldName, or the search involves an exact match for the key. Indexes may also be used for some relative and case-sensitive matches. For example:

# make sure we have an index
anim_characters index create name

# fast search on name index
anim_characters search -compare {{match_case name "Space*"}} ...

# brute-force search - the index is on the actual string, so it's not used in
# case-insensitive match.
anim_characters search -compare {{match name "Space*"}} ...

# brute-force search - the index can only be used 
anim_characters search -compare {{match_case name "*Ghost"}} ...

If you need to do a case-independent search on the table, create a new field that has the "search name" in it. To enforce this, create a method for setting the name:

proc set_name {table key name} {
    $table set $key name $name search_name [string tolower $name]
}
[anim_characters type] method name set_name

// ...

anim_characters name fred "Fred Flintstone"

You can also improve search performance by using the -index option to search to bias the query optimizer to prefer a specific index to scan

Speed Table Query Optimizer

The Speed Table Query Optimizer is integrated in search. When performing a search the order of operrations in the "-compare" list is treated as a hint. The "best" field in the query is used as the index, in this order:

"in" has the highest priority, and works for both the key and indexes.

"=" has the next highest priority, and works for both the key and indexes.

"range" on indexes come next.

Anchored "match" on indexed strings come next.

"<", "<=", or ">=" on indexes come next.

">" on indexes comes after these

All other searches are last priority.

In an ordered search, an ascending sort field gets chosen when this makes it possible to avoid manually sorting the results after finding them. This may save allocating a transaction buffer, and avoids a separate sort operation after the search is complete.

Any field specified in the -index search option is tried first, then the compare list is checked in order, so if you have multiple fields that might be used as an index put the ones that will have the most effect on reducing the size of the search first.

# Most characters have coolness of at least 5, but few are younger than 10,
# so check age first
set fp [open cool_kids.tsv w]
anim_characters search -compare {{< age 10} {>= coolness 5}} -write_tabsep $fp
close $fp
Native C Search Filters

Finally, if all else fails, you can define a "C" language filter that can perform native tests on a whole row.

   cfilter latorlong args {double target} code {
     if(row->latitude == target || row->longitude == target) return TCL_OK;
     return TCL_CONTINUE;
   }

...

t search -filter [list latorlong $cord] -array_get a -code { puts $a }