Discussion:
Virtual table optimization using xBestIndex/xFilter
Sören Brunk
2010-12-19 12:21:03 UTC
Permalink
Hi,

I'm using the SQLite virtual table mechanism to create an SQL interface
for a specialized relational database that doesn't speak SQL natively.
Thanks to Jay Kreibich's great "Using SQLite", the implementation has
been pretty straightforward so far.

Now I'm working on the xBestIndex/xFilter functions to make use of
available indexes for better performance.
In xBestIndex, each constraint contains the column on the left-hand side
(iColumn) and the operator (op). If I understood correctly, xFilter gets
the value on the right-hand side of each constraint you chose to use.
But I need to know not only that value in xFilter, but also the column
on the left-hand side and the operator.

I'm wondering if there is any way to pass that additional information to
xFilter besides encoding it into idxNum/idxStr somehow.

Thanks for your help.
Regards,
Sören
Roger Binns
2010-12-19 12:43:05 UTC
Permalink
Post by Sören Brunk
I'm wondering if there is any way to pass that additional information to
xFilter besides encoding it into idxNum/idxStr somehow.
That is the mechanism to use. Remember that internally SQLite uses only
one index, hence xBestIndex to find "the" index and xFilter not needing
much extra information to work with the selected index.

You could probably do something like have the hex address of a data
structure as the string.

Roger
Jay A. Kreibich
2010-12-19 18:24:40 UTC
Permalink
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Post by Sören Brunk
I'm wondering if there is any way to pass that additional information to
xFilter besides encoding it into idxNum/idxStr somehow.
That is the mechanism to use. Remember that internally SQLite uses only
one index, hence xBestIndex to find "the" index and xFilter not needing
much extra information to work with the selected index.
The "one-index-per-table-per-query" rule of thumb is a limitation on
the B-Tree storage system and standard indexes, not SQL or SQLite
inherently.

Because virtual tables use their own storage systems (by layering
together multiple standard tables, or using some external data source),
the rule does not apply unless the VT chooses to make it apply in its
xBestIndex() function. For example, the whole point of the R-Tree
virtual table is that it can use multiple internal "indexes" to
quickly find what it is looking for.

The name of the xBestIndex() function is also a bit confusing. It is
really about processing query constraints, not indexes. It just
happens that, within SQL, the standard tool to efficiently process
constraints is to use indexes. But it is really about constraints,
and if you're dealing with a VT that might have multiple shadow tables
(possibly with multiple indexes) or is using an external data store,
there is really no say about how many "indexes" may or may not be used
to process a series of constraints within a single query upon a VT.

What it all boils down to is using constraint information to
short-cut as much data processing as possible, which is more or less
saying that every performance-related VT needs to implement its own
custom optimizer. That can get pretty complex, but that's why many
VTs tend to be target a very specific style of data storage and
query, and why the best known VTs are products like FTS and R-Tree.

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson
Jay A. Kreibich
2010-12-19 18:03:07 UTC
Permalink
Post by Sören Brunk
In xBestIndex, each constraint contains the column on the left-hand side
(iColumn) and the operator (op). If I understood correctly, xFilter gets
the value on the right-hand side of each constraint you chose to use.
Correct.
Post by Sören Brunk
But I need to know not only that value in xFilter, but also the column
on the left-hand side and the operator.
Column indexes are not passed into xFilter(). From inside xFilter(),
there is no way to match a given value to a specific column. You
have to know the order that the column values that are passed in.

That ordering is determined by your code in xBestIndex(). Inside
xBestIndex(), the values you set in
idxinfo->aConstraintUsage[x].argvIndex will determine where the
idxinfo->aConstraint[x].iColumn value is passed into xFilter().

So the two functions must agree on how and where all those values are
setup, or you must somehow convey the information via idxNum/idxStr.
Post by Sören Brunk
I'm wondering if there is any way to pass that additional information to
xFilter besides encoding it into idxNum/idxStr somehow.
Not easily.

There may be multiple tables in-flight at a given time, so any kind
of global storage outside the function parameters can get quite
tricky. By far, the easiest thing is to pack everything into those
two parameters.

xBestIndex() can be called multiple times to test multiple query
plans. This means anything you allocate inside of xBestIndex() may
get abandoned. Again, without some complex logic to deal with
external memory stores, the safest thing to do is single allocations
with sqlite3_malloc() that are passed via idxStr. If the memory is
allocated with sqlite3_malloc(), then SQLite will properly deallocate
any abandoned query plans. But the allocation must be one big chunk
(it cannot be a multi-level or "deep" data structure with pointers to
other data structures) because SQLite will only call sqlite3_free()
on the top level idxStr pointer.

As explained on page 267, the most typical usage is for "internal"
modules that are using normal SQL statements on "shadow tables" to
implement the virtual table. In that case, xBestIndex() can build
SQL statement strings, complete with indexed SQL statement
parameters, such as "?1", "?2", etc. These SQL command strings can
be passed via idxStr, and the parameter indexes can be matched to
table column indexes by setting the argvIndex numbers. Then,
xFilter() only need to prepare the given idxStr command strings and
bind argv[0] to parameter index 1, argv[1] to parameter index 2,
and so on (remember that parameter indexes start with 1), and run
the statements. In that situation, most of the actual logic happens
in xBestIndex().

This doesn't work so well with "external" modules, where you often
need to pass a great deal more state than a simple string. If you
need to pass a data structure, you can use the idxStr pointer. Just
remember that you should allocate the structure as a single call to
sqlite3_malloc(). That means keeping the data structure "flat",
either by using static sizes or hand-packing the memory.

You also can't bind every column value to its own column index via the
argvIndex (that is, make all the argv[] indexes match the column
indexes), because the argvIndex values are not allowed to have gaps.

I'd love to point you at some examples, but I'm not sure there are
any significant "external" style modules that provide solid examples
of xBestIndex() and xFilter(). External modules are typically fairly
custom in nature, and I'm not sure there is a very established design
pattern, as there is with "internal" style modules.

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson
Sören Brunk
2010-12-20 14:13:59 UTC
Permalink
Thanks Jay and Roger, for your explanations and ideas.

I did't think of passing my own data structure to xFilter using idxStr before,
but in my case that seems to be the easiest way.

In xBestIndex I'm allocating enough memory to hold information about all
usable constraints, using sqlite3_malloc. Then I'm packing a struct containing
iColumn and op into that memory for each usable constraint and I'm using
idxinfo->aConstraintUsage[x].argvIndex to get the right-hand side value for
each constraint in the same order as the constraint information in xFilter.
Finally I'm setting needToFreeIdxStr to make sure all memory is freed.
It seems to work fine that way.

Regards,
Sören

Loading...