Discussion:
performance regression: sqlite-3.8.4.3 is not using an automatic covering index when joining with a where condition
Hinrichsen, John
2014-05-07 20:51:24 UTC
Permalink
$ sqlite3
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> CREATE TABLE x AS SELECT 1 AS a, 1 AS b;
sqlite> CREATE INDEX ix ON x (a);
sqlite> CREATE TABLE y AS SELECT 1 AS b;
sqlite> EXPLAIN QUERY PLAN SELECT * FROM x INNER JOIN y ON x.b=y.b;
0|0|0|SCAN TABLE x (~1000000 rows)
0|1|1|SEARCH TABLE y USING AUTOMATIC COVERING INDEX (b=?) (~7 rows)
sqlite> EXPLAIN QUERY PLAN SELECT * FROM x INNER JOIN y ON x.b=y.b WHERE
x.a = 1;
0|0|0|SEARCH TABLE x USING INDEX ix (a=?) (~10 rows)
0|1|1|SEARCH TABLE y USING AUTOMATIC COVERING INDEX (b=?) (~7 rows)
sqlite>

$ sqlite3
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE x AS SELECT 1 AS a, 1 AS b;
sqlite> CREATE INDEX ix ON x (a);
sqlite> CREATE TABLE y AS SELECT 1 AS b;
sqlite> EXPLAIN QUERY PLAN SELECT * FROM x INNER JOIN y ON x.b=y.b;
0|0|0|SCAN TABLE x
0|1|1|SEARCH TABLE y USING AUTOMATIC COVERING INDEX (b=?)
sqlite> EXPLAIN QUERY PLAN SELECT * FROM x INNER JOIN y ON x.b=y.b WHERE
x.a = 1;
0|0|0|SEARCH TABLE x USING INDEX ix (a=?)
0|1|1|SCAN TABLE y
sqlite>
--
This message contains confidential information and is intended only for the
individual named. If you are not the named addressee, you should not
disseminate, distribute, alter or copy this e-mail. Please notify the
sender immediately by e-mail if you have received this e-mail by mistake
and delete this e-mail from your system. E-mail transmissions cannot be
guaranteed to be secure or without error as information could be
intercepted, corrupted, lost, destroyed, or arrive late or incomplete. The
sender, therefore, does not accept liability for any errors or omissions in
the contents of this message which arise during or as a result of e-mail
transmission. If verification is required, please request a hard-copy
version. This message is provided for information purposes and should not
be construed as a solicitation or offer to buy or sell any securities or
related financial instruments in any jurisdiction.
Richard Hipp
2014-05-07 21:21:07 UTC
Permalink
Post by Hinrichsen, John
$ sqlite3
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> CREATE TABLE x AS SELECT 1 AS a, 1 AS b;
sqlite> CREATE INDEX ix ON x (a);
sqlite> CREATE TABLE y AS SELECT 1 AS b;
sqlite> EXPLAIN QUERY PLAN SELECT * FROM x INNER JOIN y ON x.b=y.b;
0|0|0|SCAN TABLE x (~1000000 rows)
0|1|1|SEARCH TABLE y USING AUTOMATIC COVERING INDEX (b=?) (~7 rows)
sqlite> EXPLAIN QUERY PLAN SELECT * FROM x INNER JOIN y ON x.b=y.b WHERE
x.a = 1;
0|0|0|SEARCH TABLE x USING INDEX ix (a=?) (~10 rows)
0|1|1|SEARCH TABLE y USING AUTOMATIC COVERING INDEX (b=?) (~7 rows)
sqlite>
$ sqlite3
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE x AS SELECT 1 AS a, 1 AS b;
sqlite> CREATE INDEX ix ON x (a);
sqlite> CREATE TABLE y AS SELECT 1 AS b;
sqlite> EXPLAIN QUERY PLAN SELECT * FROM x INNER JOIN y ON x.b=y.b;
0|0|0|SCAN TABLE x
0|1|1|SEARCH TABLE y USING AUTOMATIC COVERING INDEX (b=?)
sqlite> EXPLAIN QUERY PLAN SELECT * FROM x INNER JOIN y ON x.b=y.b WHERE
x.a = 1;
0|0|0|SEARCH TABLE x USING INDEX ix (a=?)
0|1|1|SCAN TABLE y
sqlite>
Do you have a database file where the 3.8.4.3 query plan really is slower?
Can you please run ANALYZE on that database and send us the content of the
"sqlite_stat1" table?
--
D. Richard Hipp
drh-CzDROfG0BjIdnm+***@public.gmane.org
Hinrichsen, John
2014-05-07 22:58:00 UTC
Permalink
Post by Richard Hipp
Do you have a database file where the 3.8.4.3 query plan really is slower?
Can you please run ANALYZE on that database and send us the content of the
"sqlite_stat1" table?
It is true that if we add the analyze, the query does use the automatic
covering index. The analyze wasn't necessary with sqlite-3.7.17.

The following will demonstrate the performance regression:

CREATE TABLE x AS WITH RECURSIVE t(n) AS ( VALUES (1) UNION ALL SELECT n+1
FROM t WHERE n < 50000 ) SELECT 1 AS a, n AS b FROM t;
CREATE TABLE y AS SELECT b FROM x;
CREATE INDEX ix ON x(a);
SELECT COUNT(*) FROM (SELECT * FROM x INNER JOIN y ON x.b=y.b WHERE x.a=1);

Although you can't execute the first statement under sqlite-3.7.17, you can
save the db after creating it.
--
This message contains confidential information and is intended only for the
individual named. If you are not the named addressee, you should not
disseminate, distribute, alter or copy this e-mail. Please notify the
sender immediately by e-mail if you have received this e-mail by mistake
and delete this e-mail from your system. E-mail transmissions cannot be
guaranteed to be secure or without error as information could be
intercepted, corrupted, lost, destroyed, or arrive late or incomplete. The
sender, therefore, does not accept liability for any errors or omissions in
the contents of this message which arise during or as a result of e-mail
transmission. If verification is required, please request a hard-copy
version. This message is provided for information purposes and should not
be construed as a solicitation or offer to buy or sell any securities or
related financial instruments in any jurisdiction.
Richard Hipp
2014-05-08 00:30:30 UTC
Permalink
Post by Richard Hipp
Post by Richard Hipp
Do you have a database file where the 3.8.4.3 query plan really is
slower?
Post by Richard Hipp
Can you please run ANALYZE on that database and send us the content of
the
Post by Richard Hipp
"sqlite_stat1" table?
It is true that if we add the analyze, the query does use the automatic
covering index. The analyze wasn't necessary with sqlite-3.7.17.
The query planner in 3.7.17 was not nearly as clever as the 3.8.0+ query
planner. It got the right answer given wrong information by dumb luck.
See http://www.sqlite.org/queryplanner-ng.html and especially
http://www.sqlite.org/queryplanner-ng.html#howtofix for further information.

Also, it is generally considered good practice to create sufficient indices
to avoid having to use an automatic index. Using an automatic index will
make a two-way join O(NlogN). That's better than the O(N*N) that would
occur without the automatic index, but you could have O(logN) if an
appropriate persistent index is available. I know that there may arise
cases where the query is sufficiently infrequent and the size of the
necessary index is sufficiently large, that you may want to deliberately
make use of a transient automatic index. But those cases are rare. SQLite
comes with instrumentation (specifically the SQLITE_STMTSTATUS_AUTOINDEX
verb for sqlite3_stmt_status() -
http://www.sqlite.org/c3ref/stmt_status.html) that can be used to detect
when automatic indices are used and alert the developer through a back
channel to this fact so that she can fix the problem with an appropriate
CREATE INDEX. In other words, SQLite provides you with the tools to help
you detect and eliminate the use of automatic indices. Just saying....
--
D. Richard Hipp
drh-CzDROfG0BjIdnm+***@public.gmane.org
Hinrichsen, John
2014-05-08 18:48:51 UTC
Permalink
These are all good points.

Have you considered implementing hash joins for tables that join on columns
that are not indexed? Typical hash joins (using the equality operator) can
be performed in O(N) time without indexes. Because hash joins evaluate
each row just once, they might also permit us to make calls to scalar
functions more efficiently within the context of the join.
Post by Richard Hipp
Post by Richard Hipp
Post by Richard Hipp
Do you have a database file where the 3.8.4.3 query plan really is
slower?
Post by Richard Hipp
Can you please run ANALYZE on that database and send us the content of
the
Post by Richard Hipp
"sqlite_stat1" table?
It is true that if we add the analyze, the query does use the automatic
covering index. The analyze wasn't necessary with sqlite-3.7.17.
The query planner in 3.7.17 was not nearly as clever as the 3.8.0+ query
planner. It got the right answer given wrong information by dumb luck.
See http://www.sqlite.org/queryplanner-ng.html and especially
http://www.sqlite.org/queryplanner-ng.html#howtofix for further information.
Also, it is generally considered good practice to create sufficient indices
to avoid having to use an automatic index. Using an automatic index will
make a two-way join O(NlogN). That's better than the O(N*N) that would
occur without the automatic index, but you could have O(logN) if an
appropriate persistent index is available. I know that there may arise
cases where the query is sufficiently infrequent and the size of the
necessary index is sufficiently large, that you may want to deliberately
make use of a transient automatic index. But those cases are rare. SQLite
comes with instrumentation (specifically the SQLITE_STMTSTATUS_AUTOINDEX
verb for sqlite3_stmt_status() -
http://www.sqlite.org/c3ref/stmt_status.html) that can be used to detect
when automatic indices are used and alert the developer through a back
channel to this fact so that she can fix the problem with an appropriate
CREATE INDEX. In other words, SQLite provides you with the tools to help
you detect and eliminate the use of automatic indices. Just saying....
--
D. Richard Hipp
_______________________________________________
sqlite-users mailing list
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
--
This message contains confidential information and is intended only for the
individual named. If you are not the named addressee, you should not
disseminate, distribute, alter or copy this e-mail. Please notify the
sender immediately by e-mail if you have received this e-mail by mistake
and delete this e-mail from your system. E-mail transmissions cannot be
guaranteed to be secure or without error as information could be
intercepted, corrupted, lost, destroyed, or arrive late or incomplete. The
sender, therefore, does not accept liability for any errors or omissions in
the contents of this message which arise during or as a result of e-mail
transmission. If verification is required, please request a hard-copy
version. This message is provided for information purposes and should not
be construed as a solicitation or offer to buy or sell any securities or
related financial instruments in any jurisdiction.
Loading...