Discussion:
[sqlite] Fastest way to search json array values?
Wout Mertens
2016-11-08 19:44:49 UTC
Permalink
I'm using the JSON1 extension, and I want to find items by json array
contents.

This means a query like `*SELECT foo.json FROM foo, json_each(foo.json)
WHERE json_each.value = "bar";*`, so basically scanning the entire table.

I understand that virtual tables can not be indexed, so what other
approaches would be possible here to prevent such a slow operation? There
doesn't seem to be a way to generate multiple values for a single row in an
index…

Wout.
Clemens Ladisch
2016-11-09 10:42:45 UTC
Permalink
Post by Wout Mertens
I'm using the JSON1 extension, and I want to find items by json array
contents.
This means a query like `SELECT foo.json FROM foo, json_each(foo.json)
WHERE json_each.value = "bar";`, so basically scanning the entire table.
I understand that virtual tables can not be indexed, so what other
approaches would be possible here to prevent such a slow operation?
Indexes (even expression indexes) require a 'real' table.

You'd have to put the results of json_each() into a separate table that
can then be indexed.

Or extend the JSON1 extension to do this behind the scenes.


Regards,
Clemens
Wout Mertens
2016-11-09 11:04:51 UTC
Permalink
Post by Clemens Ladisch
Post by Wout Mertens
I'm using the JSON1 extension, and I want to find items by json array
contents.
This means a query like `SELECT foo.json FROM foo, json_each(foo.json)
WHERE json_each.value = "bar";`, so basically scanning the entire table.
I understand that virtual tables can not be indexed, so what other
approaches would be possible here to prevent such a slow operation?
Indexes (even expression indexes) require a 'real' table.
You'd have to put the results of json_each() into a separate table that
can then be indexed.
I assume I can use some sort of trigger setup to automatically update that
derived table when a row gets inserted, deleted or updated? Any pointers on
how to do that?

Another approach I thought of is to assume I will only have a certain
number of array items (let's say 5), generate an "where is not null"
expression index for each of them, and then query them with 5 OR clauses.
Would that use more or less resources?

Finally, I suppose I could use LIKE on the json field to see if it includes
the string, and then further filter from there. However, creating an index
on the json field does not seem to help with LIKE.

Or extend the JSON1 extension to do this behind the scenes.


I'm going to leave that option on the shelf for now :)

Wout.
Clemens Ladisch
2016-11-09 11:20:14 UTC
Permalink
Post by Wout Mertens
I assume I can use some sort of trigger setup to automatically update that
derived table when a row gets inserted, deleted or updated? Any pointers on
how to do that?
Something like this:
http://www.sqlite.org/fts5.html#external_content_tables
Post by Wout Mertens
Another approach I thought of is to assume I will only have a certain
number of array items (let's say 5), generate an "where is not null"
expression index for each of them, and then query them with 5 OR clauses.
Would that use more or less resources?
Check with EXPLAIN QUERY PLAN whether the indexes are used at all.
It's possible you'll have to use UNION instead.


Regards,
Clemens
Wout Mertens
2016-11-09 13:23:59 UTC
Permalink
Post by Clemens Ladisch
Post by Wout Mertens
Another approach I thought of is to assume I will only have a certain
number of array items (let's say 5), generate an "where is not null"
expression index for each of them, and then query them with 5 OR clauses.
Would that use more or less resources?
Check with EXPLAIN QUERY PLAN whether the indexes are used at all.
It's possible you'll have to use UNION instead.
No:
sqlite> create table t(json JSON);
sqlite> create index foo on t(json);
sqlite> explain query plan select * from t where json = '5';
0|0|0|SEARCH TABLE t USING COVERING INDEX foo (json=?)
sqlite> explain query plan select * from t where json like '5';
0|0|0|SCAN TABLE t

Thanks for the FTS5 link!

Wout
Jens Alfke
2016-11-09 21:26:15 UTC
Permalink
Post by Wout Mertens
Post by Clemens Ladisch
Indexes (even expression indexes) require a 'real' table.
You'd have to put the results of json_each() into a separate table that
can then be indexed.
I assume I can use some sort of trigger setup to automatically update that
derived table when a row gets inserted, deleted or updated? Any pointers on
how to do that?
I’ve done similar things, in implementing map/reduce on top of SQLite, but I don’t think the approach I used will help you because it relied on some specifics of the schema I was using, and it also wasn’t as fast as it could have been. I need to revisit it soon, as I’m rewriting that same code.

The trigger approach looks promising. One complication, that doesn’t show up in that snippet on the FTS5 page, is that a row in the source table can produce multiple rows in the derived table. So the derived table would need a foreign key referencing the rowid in the source table, so that when a source row is updated or deleted all the derived rows can be replaced.

(If you get this working, I’d be interested to see the SQL code for it.)

—Jens

Loading...