Discussion:
performance of select in transactions
Mike Shal
2009-10-02 18:39:02 UTC
Permalink
It seems to be common knowledge that running selects inside a
transaction should offer no performance benefit (for example, this
thread has a number of replies to that effect:
http://www.mail-archive.com/sqlite-users-CzDROfG0BjIdnm+***@public.gmane.org/msg41699.html).
However, I noticed a curious behavior in my application. I was running
a large number of selects on my database, and when I removed the
seemingly superfluous begin/commit statements around the selects, the
application slowed down. I was able to reproduce this using the
sqlite3 command line:

$ time sqlite3 tmpdb '.read transaction.sql' > /dev/null

real 0m2.014s
user 0m1.985s
sys 0m0.030s

$ time sqlite3 tmpdb '.read notransaction.sql' > /dev/null

real 0m2.800s
user 0m2.222s
sys 0m0.578s

$ diff -au notransaction.sql transaction.sql
--- notransaction.sql 2009-10-02 14:12:12.000000000 -0400
+++ transaction.sql 2009-10-02 14:12:12.000000000 -0400
@@ -1,3 +1,4 @@
+begin;
select * from foo where entry='A';
select * from foo where entry='a';
select * from foo where entry='aa';
@@ -49998,3 +49999,4 @@
select * from foo where entry='degraded';
select * from foo where entry='degradedly';
select * from foo where entry='degradedness';
+commit;

Granted, it's not an order-of-magnitude difference, but ~800 ms out of
2800 ms seems like a pretty significant savings for just adding a
begin/commit around a bunch of select statements. These results were
fairly consistent with SQLite 3.6.18, 3.6.11, and 3.6.5 built from
source (I tried a few different versions to see whether it might have
been a regression).

Does anybody know why just adding the begin/commit here improves
performance? If I have to do a large number of selects like this in my
application, should I always wrap it in a transaction?

If you want to try it out yourself, I used these scripts to create a
database of words from /usr/share/dict/words, and then generate the
sql to test:

#### create.sh (overwrites 'tmpdb' and 'create.sql' in the current directory)
#! /bin/sh

if [ $# -lt 1 ]; then
    echo "Usage: $0 num" 1>&2
    exit 1
fi
rm -f tmpdb
(
    echo 'create table foo (id integer primary key not null, entry varchar(100), unique(entry));'
    echo 'begin;'
    for i in `cat /usr/share/dict/words | head -n $1`; do
        echo "insert into foo(entry) values('$i');"
    done
    echo 'commit;'
) > create.sql
echo "Create:"
time sqlite3 tmpdb '.read create.sql'
#### end create.sh

#### read.sh (overwrites 'transaction.sql' and 'notransaction.sql' in the current directory)
#! /bin/sh

if [ $# -lt 1 ]; then
    echo "Usage: $0 num" 1>&2
    exit 1
fi
rm -f transaction.sql
rm -f notransaction.sql
for i in `cat /usr/share/dict/words | head -n $1`; do
    echo "select * from foo where entry='$i';" >> notransaction.sql
done
(echo 'begin;'; cat notransaction.sql; echo 'commit;') > transaction.sql

sync
echo "Without transaction:"
time sqlite3 tmpdb '.read notransaction.sql' > /dev/null

echo "Transaction:"
time sqlite3 tmpdb '.read transaction.sql' > /dev/null
#### end read.sh

The parameter is the size of the table to create / number of entries to read:
$ sh create.sh 50000
$ sh read.sh 50000

Thanks,
-Mike
Pavel Ivanov
2009-10-02 18:53:14 UTC
Permalink
Post by Mike Shal
Does anybody know why just adding the begin/commit here improves
performance? If I have to do a large number of selects like this in my
application, should I always wrap it in a transaction?
This looks like overhead in your file system. When you don't put
begin/commit around the selects, every select is its own read-only
transaction: before each select SQLite takes a read lock on the
database, and after executing the select it releases the lock. If you
put begin/commit around all the selects, SQLite takes the read lock
only once at the beginning and releases it once at the end. So what
you have measured is how long it takes to acquire and release the
read lock on the database 50000 times. If your application really
does need to issue that many selects, and needs to do them in the
minimum amount of time, then starting a transaction can indeed be a
good solution. But be aware that by doing this you're blocking any
other instance of your application that wants to write to the
database; it won't be able to do so until you commit your read-only
transaction. If that's okay for you, then why not.
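
For example, here is a rough two-shell sketch of that blocking side
effect, using the tmpdb built by Mike's scripts (the inserted value is
just a placeholder):

#### shell 1: open a read transaction and leave it sitting there
$ sqlite3 tmpdb
sqlite> begin;
sqlite> select count(*) from foo;
sqlite> -- no commit yet: the read (SHARED) lock stays held

#### shell 2: a writer now fails with SQLITE_BUSY ("database is locked")
#### until shell 1 issues a commit
$ sqlite3 tmpdb "insert into foo(entry) values('blocked-until-commit');"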


Pavel
Mike Shal
2009-10-02 19:40:45 UTC
Permalink
Post by Pavel Ivanov
Post by Mike Shal
Does anybody know why just adding the begin/commit here improves
performance? If I have to do a large number of selects like this in my
application, should I always wrap it in a transaction?
This looks like overhead in your file system. When you don't put
begin/commit around the selects, every select is its own read-only
transaction: before each select SQLite takes a read lock on the
database, and after executing the select it releases the lock. If you
put begin/commit around all the selects, SQLite takes the read lock
only once at the beginning and releases it once at the end. So what
you have measured is how long it takes to acquire and release the
read lock on the database 50000 times. If your application really
does need to issue that many selects, and needs to do them in the
minimum amount of time, then starting a transaction can indeed be a
good solution. But be aware that by doing this you're blocking any
other instance of your application that wants to write to the
database; it won't be able to do so until you commit your read-only
transaction. If that's okay for you, then why not.
Ahh, ok - that makes sense. Does this locking overhead not occur on
other platforms (I've only tried linux -- gentoo and ubuntu), or if
sqlite is built differently? It seems a bit disingenuous to claim
there will be no performance gain by putting selects in a transaction,
when sqlite clearly does less work with the transaction (in the form
of not getting the read lock multiple times).

Thanks,
-Mike
Pavel Ivanov
2009-10-02 19:51:09 UTC
Permalink
Post by Mike Shal
It seems a bit disingenuous to claim
there will be no performance gain by putting selects in a transaction,
when sqlite clearly does less work with the transaction (in the form
of not getting the read lock multiple times).
It's pretty ingenuous, in fact, to silently assume that nobody wants
to do 50000 selects at once. Lower that to 5 (already a huge amount of
work for "one request to the database", and one that blocks out any
writers) or to 1 (the realistic number when somebody asks about "the
difference of putting a select inside a transaction") and the
difference drops to 0.08 ms or 0.016 ms, which is a small enough
number to ignore compared to the time actually spent reading the data
from disk for the select.
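
For what it's worth, those numbers are just the ~800 ms difference
above spread over the 50000 selects (plain shell arithmetic, nothing
SQLite-specific):

$ echo 'scale=3; 800 / 50000' | bc      # lock overhead per select, in ms
.016
$ echo 'scale=3; 5 * 800 / 50000' | bc  # for a batch of 5 selects
.080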

Pavel
Mike Shal
2009-10-02 20:20:11 UTC
Permalink
Post by Pavel Ivanov
Post by Mike Shal
It seems a bit disingenuous to claim
there will be no performance gain by putting selects in a transaction,
when sqlite clearly does less work with the transaction (in the form
of not getting the read lock multiple times).
It's pretty ingenuous, in fact, to silently assume that nobody wants
to do 50000 selects at once. Lower that to 5 (already a huge amount of
work for "one request to the database", and one that blocks out any
writers) or to 1 (the realistic number when somebody asks about "the
difference of putting a select inside a transaction") and the
difference drops to 0.08 ms or 0.016 ms, which is a small enough
number to ignore compared to the time actually spent reading the data
from disk for the select.
Sorry, I was not aware of those implicit assumptions. Nor did I mean
to imply that you were being disingenuous -- you answered in 10
minutes what took me several hours to narrow down to my original
example. I guess my usage is a bit peculiar: the results of a single
select must be analyzed to determine what to select next, and this
can potentially cascade into many other selects, which must also be
analyzed, and so on. During this time no writes are allowed (due to
other requirements I have), so for me blocking writers isn't a factor
in deciding whether or not to use a transaction.

Thanks for your help,
-Mike
Simon Slavin
2009-10-02 21:40:24 UTC
Permalink
Post by Mike Shal
Ahh, ok - that makes sense. Does this locking overhead not occur on
other platforms (I've only tried linux -- gentoo and ubuntu), or if
sqlite is built differently?
It's to do with how the platform (OS and hard disk format) supports
file locking. Those two things together determine what really happens
when SQLite uses the C call that locks a file. So even saying
'Gentoo' is too general: you also want to know what filesystem the
disk your database is on uses.
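
On Linux the call in question is fcntl(). If you have strace handy, a
rough way to see the extra locking work is to compare the system-call
summaries of Mike's two runs:

$ strace -c sqlite3 tmpdb '.read notransaction.sql' > /dev/null
$ strace -c sqlite3 tmpdb '.read transaction.sql' > /dev/null

The no-transaction run should show a far higher fcntl count (plus
extra journal checks at the start of each read transaction), since
every select locks and unlocks the database file on its own.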
Post by Mike Shal
It seems a bit disingenuous to claim
there will be no performance gain by putting selects in a transaction,
when sqlite clearly does less work with the transaction (in the form
of not getting the read lock multiple times).
I've used SQLite for multivariate analysis. The sort of thing where
you collect data on many variables for lots of cases, then analyse
every possible combination of variables because you are looking for
/anything/: you didn't have a strong idea of what to look for when
you set out, you just want to know whether there's anything
interesting in the data you got.

A census of 25 questions gives you 25 * 24 = 600 SELECT commands.
That's the first simple step, and you're unlikely to find anything you
didn't already suspect. Then you try every combination of three
variables: 25 * 24 * 23 = 13800 SELECT commands. If you still get
nothing significant and are scared you're going to lose your funding,
you try 25 * 24 * 23 * 22 = 303600. And all this data is unchanging:
it might as well be on a data DVD (and DVD drives don't implement
file locking!).
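
To make that concrete, the sort of throwaway query generator I mean
might look something like this (hypothetical 'census' table with
columns q1..q25, in the same style as Mike's scripts):

(
    echo 'begin;'
    for a in `seq 1 25`; do
        for b in `seq 1 25`; do
            [ "$a" = "$b" ] && continue
            echo "select q$a, q$b, count(*) from census group by q$a, q$b;"
        done
    done
    echo 'commit;'
) > pairs.sql    # 25 * 24 = 600 selects
sqlite3 censusdb '.read pairs.sql' > /dev/null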

So yes, thousands of consecutive SELECT commands is a perfectly normal
use of SQLite. And until you raised the subject it hadn't occurred to
me that it was necessary to lock on a SELECT, but thinking about it,
it's obvious. Duh.

Simon.
