Discussion:
sqlite in-memory database far too slow in my use case
Ron Arts
2009-10-10 17:24:33 UTC
Hi,

I am building a libevent-based application that must be
able to handle tens of thousands of requests per second.

Each request needs multiple database lookups. Almost
all requests do the lookups on the primary key of the tables
only. So far I have been using Hash Tables from the glib2
library. But now I need SQL access to the same data and I don't
want to write my own SQL engine, so I'm thinking of adding
sqlite to the mix (in memory mode only).

I've added a virtual table driver and it does seem to work
but it's not an ideal solution, because C structures are pretty
inflexible (can't add columns without recompiling), and
most of the in-memory data is loaded from an external SQL
database, which routinely changes the database layout.

So I am wondering if I can drop the glib Hash Tables, and
go sqlite all the way. But I'm afraid the process of
constructing SQL queries / parsing them by sqlite, and
interpreting the results in my app, multiple times per
event will be too slow.

So now my question:

- can I implement/add a hash index in sqlite?
- can I access the table directly without the overhead of
constructing an SQL query/decomposing the result?

Thanks for any hints and tips,
Ron
Roger Binns
2009-10-10 18:04:02 UTC
Post by Ron Arts
So I am wondering if I can drop the glib Hash Tables, and
go sqlite all the way. But I'm afraid the process of
constructing SQL queries / parsing them by sqlite, and
interpreting the results in my app, multiple times per
event will be too slow.
That is pure guesswork. You will only know by building a representative
benchmark using data substantially similar to what you expect on machines
similar to what you expect in real world usage of the code. (ie anyone
else's benchmark data isn't too useful to you)

Secondly have a look at virtual tables. They will let you store data in any
way you deem fit while still being able to offer a SQL interface to them.

Roger
Simon Slavin
2009-10-10 18:38:08 UTC
Post by Roger Binns
Post by Ron Arts
So I am wondering if I can drop the glib Hash Tables, and
go sqlite all the way. But I'm afraid the process of
constructing SQL queries / parsing them by sqlite, and
interpreting the results in my app, multiple times per
event will be too slow.
Don't forget to use transactions, even when you are just doing
SELECTs without changing any data.
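A minimal sketch of that pattern (assuming an already-open sqlite3 *db
handle; error checking omitted):

    sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
    /* ... run the SELECTs here, e.g. via prepared statements ... */
    sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);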
Post by Roger Binns
That is pure guesswork. You will only know by building a representative
benchmark using data substantially similar to what you expect on machines
similar to what you expect in real world usage of the code. (ie anyone
else's benchmark data isn't too useful to you)
Secondly have a look at virtual tables. They will let you store data in any
way you deem fit while still being able to offer a SQL interface to them.
Also, don't assume that holding your table in memory is faster than
just using a normal disk file and letting your operating system do
caching and swapping as appropriate. If you have enough data in your
entire database (data /including/ indexes) to force paging/swapping
then this may actually be faster.

As Roger wrote, we can't guess much more without having an idea of the
size of your database and what kind of SELECT commands you're doing.

Simon.
Ron Arts
2009-10-10 19:42:44 UTC
Ok,

I just finished writing a test program. It creates an SQLite memory table
and inserts 500000 records, then it selects 500000 times on a random key.

After that it uses hash memory tables to do the same thing. Here is the
test output:

sqlite3 insert 500000 records time: 17.21 secs
sqlite3 select 500000 records time: 18.59 secs

glib2 hash tables insert 500000 records time: 0.68 secs
glib2 hash tables lookup 500000 records time: 0.24 secs

Considering the fact that I need to do on average 5 lookups
per network request, I need 18.59 seconds for 100000 requests,
i.e. 5379 requests/sec.

Using hash tables I can do 100000 requests in 0.24 seconds,
meaning around 400000 req/sec.

This is a problem in my case. By the way, the database will
always fit in memory, so swapping is no problem.

Virtual tables are a partial solution and I tried that, but
as I said, the underlying C structures are pretty static, and I can't recompile
(and worse: restart) the application every time some programmer
adds a database column.

So I assume there is no direct access method for the in-memory
table structures? Or a way I can add my own hashing? Or did you
mean that when you said using virtual tables?

Can I add a hashed index on an existing sqlite memory table this way?

Thanks,
Ron

PS: I *am* impressed by sqlite's speed.
Roger Binns
2009-10-10 20:26:49 UTC
Post by Ron Arts
Using hash tables I can do 100000 requests in 0.24 seconds,
meaning around 400000 req/sec.
If you are just doing simple lookups (eg doing equality on a single column)
then a hash table will always beat going through SQLite. But if you are
doing more complex queries SQLite will beat the pants off any custom
expression evaluator you have unless you spend a lot of time optimising it.
Post by Ron Arts
Virtual tables are a partial solution and I tried that, but
as I said, the underlying C structures are pretty static, and I can't recompile
(and worse: restart) the application every time some programmer
adds a database column.
I am confused by this. Are columns added while running or is this a
development time issue? Why do virtual tables require a recompile or
restart? (Hint: you can drop and recreate a table.)
Post by Ron Arts
So I assume there is no direct access method for the in-memory
table structures?
It uses btrees and there is no stable API to them, nor are they exposed in
any way. See http://www.sqlite.org/arch.html
Post by Ron Arts
Or did you mean that when you said using virtual tables?
Can I add a hashed index on an existing sqlite memory table this way?
The point of virtual tables is to let you provide a SQL API to your
underlying data without exposing in any way how that data is structured or
optimised. In the architecture diagram above your code provides the blue
"backend" box.

If all you have at the end of the day is a single simple flat list then this
won't provide much benefit. If you have multiple lists and need to evaluate
expressions over them then virtual tables are great. If you have an
underlying index then the virtual table BestIndex method lets SQLite use
it; otherwise you return nothing and SQLite will iterate over all relevant rows.

Roger
Jay A. Kreibich
2009-10-10 20:27:33 UTC
Post by Simon Slavin
Post by Ron Arts
So I am wondering if I can drop the glib Hash Tables, and
go sqlite all the way. But I'm afraid the process of
constructing SQL queries / parsing them by sqlite, and
interpreting the results in my app, multiple times per
event will be too slow.
Don't forget to use transactions, even when you are just doing
SELECTs without changing any data.
Using transactions speeds up a long series of SELECTs because it
eliminates the need to re-acquire a read-only file-lock for each
individual SELECT.

Since in-memory databases have no file locks, I'm not sure that is
relevant to this specific case.

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have
a protractor." "I'll go home and see if I can scrounge up a ruler
and a piece of string." --from Anathem by Neal Stephenson
Simon Slavin
2009-10-10 20:54:04 UTC
Post by Jay A. Kreibich
Post by Simon Slavin
Don't forget to use transactions, even when you are just doing
SELECTs without changing any data.
Using transactions speeds up a long series of SELECTs because it
eliminates the need to re-acquire a read-only file-lock for each
individual SELECT.
Since in-memory databases have no file locks, I'm not sure that is
relevant to this specific case.
I wasn't sure about that. It could still be slower, even if the
individual transactions don't eventually result in any calls to file-
handling routines. At what point does the code decide whether a lock
is needed? Does it get dropped almost immediately, or does it go
through multiple levels of calls before some low-level code says "iff
disk-database then lock"? Sorry, I've never looked at the source.

Simon.
Alexey Pechnikov
2009-10-11 07:49:57 UTC
Hello!
Post by Simon Slavin
Post by Jay A. Kreibich
Using transactions speeds up a long series of SELECTs because it
eliminates the need to re-acquire a read-only file-lock for each
individual SELECT.
Since in-memory databases have no file locks, I'm not sure that is
relevant to this specific case.
I wasn't sure about that. It could still be slower
You can check it very easily.

In transactions:
***@veter-laptop:/tmp$ ./test.tcl
19968119 microseconds per iteration
25649514 microseconds per iteration

Without transactions:
***@veter-laptop:/tmp$ ./test.tcl
35586024 microseconds per iteration
28630785 microseconds per iteration


$ cat ./test.tcl
#!/usr/bin/tclsh8.5

package require sqlite3
sqlite3 db :memory:

set limit 500000
db eval {create table test(id int primary key, value text)}

puts [time {
    db transaction {
        for {set i 0} {$i<$limit} {incr i} {
            set value "value $i"
            db eval {insert into test (value) values ($value)}
        }
    }
}]

puts [time {
    db transaction {
        for {set i 0} {$i<$limit} {incr i} {
            set rowid [expr round(rand()*$limit)]
            db onecolumn {select value from test where rowid=$rowid}
        }
    }
}]


Best regards, Alexey Pechnikov.
http://pechnikov.tel/
Jay A. Kreibich
2009-10-11 18:52:29 UTC
Post by Alexey Pechnikov
Hello!
Post by Simon Slavin
Post by Jay A. Kreibich
Using transactions speeds up a long series of SELECTs because it
eliminates the need to re-acquire a read-only file-lock for each
individual SELECT.
Since in-memory databases have no file locks, I'm not sure that is
relevant to this specific case.
I wasn't sure about that. It could still be slower
You can check it very easy.
19968119 microseconds per iteration
25649514 microseconds per iteration
35586024 microseconds per iteration
28630785 microseconds per iteration
A bit to my surprise, the difference is even more significant using
prepared statements in a C program. For a half-million selects over a
similar table in a :memory: database, there is a 20% speed-up by
wrapping all the selects in a transaction (vs the 10% you're seeing).
It's averaging about 4.3 seconds in auto-commit mode, and 3.4 seconds
in an explicit transaction.

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have
a protractor." "I'll go home and see if I can scrounge up a ruler
and a piece of string." --from Anathem by Neal Stephenson
Alexey Pechnikov
2009-10-11 21:14:43 UTC
Hello!
Post by Jay A. Kreibich
A bit to my surprise, the difference is even more significant using
prepared statements in a C program. For a half-million selects over a
similar table in a :memory: database, there is a 20% speed-up by
wrapping all the selects in a transaction (vs the 10% you're seeing).
It's averaging about 4.3 seconds in auto-commit mode, and 3.4 seconds
in an explicit transaction.
The Tcl program makes a copy of the selected rows. Maybe your C program
doesn't copy the selected data?

The modified Tcl test script below shows how a transaction increases the
speed of data extraction. The test program still copies the data, but no
data is actually extracted from the table when we use the "select NULL"
construction.

The transaction improves performance by ~10% when we extract the table
data, and by ~3% when we don't.
====================================
$ ./test.tcl
insert transaction 500000 rows
21233766 microseconds per iteration

select 500000 rows
28164019 microseconds per iteration

select without extract 500000 rows
26379441 microseconds per iteration

select transaction 500000 rows
25749923 microseconds per iteration

select transaction without extract 500000 rows
25644248 microseconds per iteration

====================================
$ cat ./test.tcl
#!/usr/bin/tclsh8.5
package require sqlite3
sqlite3 db :memory:
set limit 500000

# id must be INTEGER PRIMARY KEY so it is an alias for the rowid; with
# "int primary key" the id column would never be populated by the inserts
# below and the "where id=..." selects would find no rows
db eval {create table test(id INTEGER PRIMARY KEY, value text)}

puts "insert transaction $limit rows"
puts [time {
    db transaction {
        for {set i 0} {$i<$limit} {incr i} {
            set value "value $i"
            db eval {insert into test (value) values ($value)}
        }
    }
}]

puts "\nselect $limit rows"
puts [time {
    for {set i 0} {$i<$limit} {incr i} {
        set rowid [expr round(rand()*$limit)]
        db onecolumn {select value from test where id=$rowid}
    }
}]

puts "\nselect without extract $limit rows"
puts [time {
    for {set i 0} {$i<$limit} {incr i} {
        set rowid [expr round(rand()*$limit)]
        db onecolumn {select NULL from test where id=$rowid}
    }
}]

puts "\nselect transaction $limit rows"
puts [time {
    db transaction {
        for {set i 0} {$i<$limit} {incr i} {
            set rowid [expr round(rand()*$limit)]
            db onecolumn {select value from test where id=$rowid}
        }
    }
}]

puts "\nselect transaction without extract $limit rows"
puts [time {
    db transaction {
        for {set i 0} {$i<$limit} {incr i} {
            set rowid [expr round(rand()*$limit)]
            db onecolumn {select NULL from test where id=$rowid}
        }
    }
}]
====================================

Best regards, Alexey Pechnikov.
http://pechnikov.tel/
Jay A. Kreibich
2009-10-10 20:23:30 UTC
Post by Ron Arts
I'm afraid the process of
constructing SQL queries / parsing them by sqlite, and
interpreting the results in my app, multiple times per
event will be too slow.
There should be no need to construct and parse queries with each
interaction. Assuming the queries are fairly well known, you should
be able to prepare them once and then keep using them over and over.
This should save a noticeable amount of time.

Make sure you're using the prepare/bind/step/reset/finalize
interfaces, rather than exec or get_table.
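A sketch of that life cycle - one prepare, many bind/step/reset cycles,
one finalize (error checks omitted; the query text, ids[] and use_name()
are illustrative stand-ins):

    sqlite3_stmt *stmt;
    int i;
    sqlite3_prepare(db, "SELECT name FROM company WHERE id = ?", -1, &stmt, NULL);
    for (i = 0; i < n; i++) {
        sqlite3_bind_int(stmt, 1, ids[i]);
        while (sqlite3_step(stmt) == SQLITE_ROW)
            use_name(sqlite3_column_text(stmt, 0));  /* column 0 = name */
        sqlite3_reset(stmt);  /* ready for the next bind */
    }
    sqlite3_finalize(stmt);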

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have
a protractor." "I'll go home and see if I can scrounge up a ruler
and a piece of string." --from Anathem by Neal Stephenson
Ron Arts
2009-10-10 21:57:30 UTC
Post by Jay A. Kreibich
Post by Ron Arts
I'm afraid the process of
constructing SQL queries / parsing them by sqlite, and
interpreting the results in my app, multiple times per
event will be too slow.
There should be no need to construct and parse queries with each
interaction. Assuming the queries are fairly well known, you should
be able to prepare them once and then keep using them over and over.
This should save a noticeable amount of time.
Make sure you're using the prepare/bind/step/reset/finalize
interfaces, rather than exec or get_table.
-j
Thanks Jay,

I'm expanding my benchmark to test just that, but I'm running into a problem.
Here's my code (well, part of it):

sqlite3_stmt *stmt;
rc = sqlite3_prepare(db, "select name from company where id = '?'", -1, &stmt, NULL);
if (rc != SQLITE_OK) {
    fprintf(stderr, "sqlite3_prepare SQL error: %d\n", rc);
    exit(1);
}

for (i = 1; i < count; i++) {
    rc = sqlite3_bind_int(stmt, 1, rand() % count);
    if (rc != SQLITE_OK) {
        fprintf(stderr, "sqlite3_bind_int SQL error: %d\n", rc);
        exit(1);
    }
    while (1) {
        rc = sqlite3_step(stmt);
        if (rc == SQLITE_DONE) {
            sqlite3_reset(stmt);
            break;
        }
        if (rc != SQLITE_ROW) {
            fprintf(stderr, "sqlite3_step SQL error: %d\n", rc);
            exit(1);
        }
    }
}
sqlite3_finalize(stmt);

The sqlite3_bind_int immediately gives me a RANGE_ERROR (25).
Is there some obvious thing I'm doing wrong?

Thanks,
Ron
Jay A. Kreibich
2009-10-10 22:13:43 UTC
Post by Ron Arts
I'm expanding my benchmark to test just that, but I'm running into a problem.
sqlite3_stmt *stmt;
rc = sqlite3_prepare(db, "select name from company where id = '?'", -1, &stmt, NULL);
The sqlite3_bind_int immediately gives me a RANGE_ERROR (25).
Is there some obvious thing I'm doing wrong?
Remove the single quotes. They aren't required for parameters, even
if you're binding a text value.

As written, that statement has a single character string literal and
no parameters.
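For reference, the corrected prepare call - the same statement with the
quotes removed, so the ? is parsed as a parameter:

    rc = sqlite3_prepare(db, "select name from company where id = ?",
                         -1, &stmt, NULL);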

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have
a protractor." "I'll go home and see if I can scrounge up a ruler
and a piece of string." --from Anathem by Neal Stephenson
Ron Arts
2009-10-10 22:28:46 UTC
Post by Jay A. Kreibich
Post by Ron Arts
I'm expanding my benchmark to test just that, but I'm running into a problem.
sqlite3_stmt *stmt;
rc = sqlite3_prepare(db, "select name from company where id = '?'", -1, &stmt, NULL);
The sqlite3_bind_int immediately gives me a RANGE_ERROR (25).
Is there some obvious thing I'm doing wrong?
Remove the single quotes. They aren't required for parameters, even
if you're binding a text value.
As written, that statement has a single character string literal and
no parameters.
-j
Yep that did it, sorry, I'm pretty new to sqlite.

Here's my new benchmark output:

sqlite3 insert 500000 records time: 17.19 secs
sqlite3 select 500000 records time: 18.57 secs
sqlite3 prepared select 500000 records time: 3.27 secs
glib2 hash tables insert 500000 records time: 0.38 secs
glib2 hash tables lookup 500000 records time: 0.24 secs

The prepared select indeed speeds up things tremendously, a 5-fold increase.
Now I can do around 30000 requests per second; of course, I need to keep
prepared versions of all the SQL statements I use.
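One way to keep those prepared versions around (a sketch; the enum slot
names and query text are illustrative):

    enum { Q_NAME_BY_ID, Q_COUNT };        /* one slot per query */
    static sqlite3_stmt *stmts[Q_COUNT];
    static const char *sql[Q_COUNT] = {
        "select name from company where id = ?",
    };

    /* at startup, prepare each statement once: */
    for (int q = 0; q < Q_COUNT; q++)
        sqlite3_prepare(db, sql[q], -1, &stmts[q], NULL);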

This is beginning to look feasible now! I'll have to investigate my
own processing a bit more, see if my assumption of 5 queries
per request is realistic.

Thanks so far,

Ron
Olaf Schmidt
2009-10-10 23:26:37 UTC
Post by Ron Arts
sqlite3 insert 500000 records time: 17.19 secs
sqlite3 select 500000 records time: 18.57 secs
sqlite3 prepared select 500000 records time: 3.27 secs
glib2 hash tables insert 500000 records time: 0.38 secs
glib2 hash tables lookup 500000 records time: 0.24 secs
The prepared select indeed speeds up things tremendously,
a 5-fold increase.
Now do the same thing (prepared commands) for
the sqlite3 inserts too ... wrapped in a transaction.

Against an in-memory table I reach about
120000 inserts per second here (filling up and inserting
"mixed values" into an 8-column table).
That's on a 1.8 GHz mobile Intel CPU.

As long as the table definition you benchmark (your insert)
doesn't have many more columns than the 8 mentioned above,
you should see something like a factor of 5 there too.

With a small two-column table (a prepared two-column insert command)
I see about 360000 inserts per second, depending somewhat
on the datatypes used (integers and doubles work a bit faster of
course than inserting the same number of columns as text values).

Another reason for your bad insert performance could of
course be that you have already defined an index on the table
in question (or that your ID field is defined as
INTEGER PRIMARY KEY *and* you're filling in
new IDs in non-consecutive order).
In that case your current results seem a bit more reasonable.

If you don't have an index created yet (on your "hash-key"
ID column), or you haven't mapped your ID field to
SQLite's rowid yet (via INTEGER PRIMARY KEY),
then you should try to create one - if possible, after
the main bulk of your fill-up is done. That adds some
time to the overall data-preparation effort for your
"table list" - but it will benefit your single-record lookups
based on your "... WHERE ID = ?".
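A minimal sketch of that combination - one prepared INSERT reused inside
a single transaction (error checking trimmed; name_for() is a hypothetical
helper producing the text value for row i, and the table matches the
company table used elsewhere in this thread):

    sqlite3_stmt *ins;
    sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
    sqlite3_prepare(db, "INSERT INTO company (id, name) VALUES (?, ?)",
                    -1, &ins, NULL);
    for (i = 0; i < count; i++) {
        sqlite3_bind_int(ins, 1, i);
        sqlite3_bind_text(ins, 2, name_for(i), -1, SQLITE_TRANSIENT);
        sqlite3_step(ins);       /* returns SQLITE_DONE for an INSERT */
        sqlite3_reset(ins);      /* make the statement reusable */
    }
    sqlite3_finalize(ins);
    sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);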


Olaf Schmidt
Ron Arts
2009-10-11 08:04:37 UTC
Post by Olaf Schmidt
Now do the same thing (prepared commands) for
the sqlite3 inserts too ... wrapped in a transaction.
[snip]
If you don't have an index created yet (on your "hash-key"
ID column), or you haven't mapped your ID field to
SQLite's rowid yet (via INTEGER PRIMARY KEY),
then you should try to create one - if possible, after
the main bulk of your fill-up is done.
Olaf,

I tried it, and indeed, this speeds up inserts tremendously as well,
but in fact I'm not at all concerned about insert speed, but much more about
select speed. I use the following queries:

CREATE TABLE company(id INTEGER PRIMARY KEY ASC, name)

Then I insert 500000 records like this:

INSERT INTO company (id, name) VALUES ('1', 'Company name number 1')

(with consecutive values for the id value.)

do you expect the id column now to be mapped to the Row ID, so
this results in the fastest possible way of selecting by id?

I now get this:

sqlite3 prepared insert in trx 500000 records time: 5.08 secs
sqlite3 select 500000 records time: 19.28 secs
sqlite3 prepared select 500000 records time: 3.47 secs
glib2 hash tables insert 500000 records time: 0.37 secs
glib2 hash tables lookup 500000 records time: 0.25 secs

But I'm still looking to speed up selects.

Thanks,
Ron
Alexey Pechnikov
2009-10-11 08:21:32 UTC
Hello!
Post by Ron Arts
CREATE TABLE company(id INTEGER PRIMARY KEY ASC, name)
INSERT INTO company (id, name) VALUES ('1', 'Company name number 1')
(with consecutive values for the id value.)
do you expect the id column now to be mapped to the Row ID, so
this results in the fastest possible way of selecting by id?
Yes, the id is an alias for the rowid in your table. Check it with:
sqlite> explain query plan select name from company where id=1;
Post by Ron Arts
sqlite3 prepared insert in trx 500000 records time: 5.08 secs
sqlite3 select 500000 records time: 19.28 secs
sqlite3 prepared select 500000 records time: 3.47 secs
glib2 hash tables insert 500000 records time: 0.37 secs
glib2 hash tables lookup 500000 records time: 0.25 secs
But I'm still looking to speed up selects.
Hm... I think you cannot open more than 8000 TCP/IP sockets per second
in the common case, so SQLite's speed is good enough. Why do you write
about having "to handle tens of thousands of requests per second"?

Best regards, Alexey Pechnikov.
http://pechnikov.tel/
Ron Arts
2009-10-11 11:04:51 UTC
Post by Alexey Pechnikov
Hello!
Post by Ron Arts
CREATE TABLE company(id INTEGER PRIMARY KEY ASC, name)
INSERT INTO company (id, name) VALUES ('1', 'Company name number 1')
(with consecutive values for the id value.)
do you expect the id column now to be mapped to the Row ID, so
this results in the fastest possible way of selecting by id?
Yes, the id is an alias for the rowid in your table. Check it with:
sqlite> explain query plan select name from company where id=1;
Post by Ron Arts
sqlite3 prepared insert in trx 500000 records time: 5.08 secs
sqlite3 select 500000 records time: 19.28 secs
sqlite3 prepared select 500000 records time: 3.47 secs
glib2 hash tables insert 500000 records time: 0.37 secs
glib2 hash tables lookup 500000 records time: 0.25 secs
But I'm still looking to speed up selects.
Hm... I think you cannot open more than 8000 TCP/IP sockets per second
in the common case, so SQLite's speed is good enough. Why do you write
about having "to handle tens of thousands of requests per second"?
Well, in my case there can be tens of thousands of connections open
at the same time, where each connection can last for days.
Each connection can spit out multiple messages per second, and each
message needs around 5 SQL queries.

Ron
Simon Slavin
2009-10-11 13:22:46 UTC
Post by Ron Arts
CREATE TABLE company(id INTEGER PRIMARY KEY ASC, name)
INSERT INTO company (id, name) VALUES ('1', 'Company name number 1')
(with consecutive values for the id value.)
I think you can remove the single quotes from around your value for
the id column, because both the column definition and the values
you're supplying for it are integers rather than strings. This might
speed up your INSERT even more.
Post by Ron Arts
more about
select speed
When testing SELECT speeds to work out how best to use the library,
include code to take the values that are returned and put them into
some variables in whatever language you're using. In some languages,
using some compilers, and with some optimisations turned on, the
SELECT command itself executes quickly but extracting the values to
variables takes longer.

Also, note that if you don't use the values from the variables some
compilers (e.g. recent versions of gcc) will optimise out the routines
which are meant to set the values. So if you're running comparative
speed tests and getting weird results try using the variables, for
example printing them to /dev/null.
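A sketch of that (the copy keeps the fetch from being optimised away;
devnull is assumed to be a FILE* opened on /dev/null):

    char buf[256];
    const unsigned char *name = sqlite3_column_text(stmt, 0);
    snprintf(buf, sizeof buf, "%s", name ? (const char *)name : "");
    fputs(buf, devnull);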

Simon.
Olaf Schmidt
2009-10-11 14:55:27 UTC
Post by Ron Arts
I tried it, and indeed, this speeds up inserts tremendously as well,
but in fact I'm not at all concerned about insert speed, but much more about
CREATE TABLE company(id INTEGER PRIMARY KEY ASC, name)
Not sure if an explicit "text-storage-class" hint for your name
column will help to speed things up a bit more, but try:
CREATE TABLE company(id INTEGER PRIMARY KEY, name TEXT)
Post by Ron Arts
INSERT INTO company (id, name) VALUES ('1', 'Company name number 1')
You should prepare the statement like this:
INSERT INTO company (id, name) VALUES (?, ?)

And then treat the (now rowid-mapped) integer ID as an
integer, not as a string.
Just use the correctly "typed" binding calls.

And then wrap the insert loop within a transaction.
Post by Ron Arts
But I'm still looking to speed up selects.
With regard to selects (searching for random "single IDs"), you
have probably already reached the maximum (provided you've made
no mistakes with the type binding).

Olaf
Pavel Ivanov
2009-10-11 16:34:36 UTC
Post by Ron Arts
CREATE TABLE company(id INTEGER PRIMARY KEY ASC, name)
I'm not sure how SQLite treats this table definition but probably
because of your ASC it could decide that id shouldn't be a synonym for
rowid which will make at least inserts slower.
Post by Ron Arts
But I'm still looking to speed up selects.
Other conditions that can speed up selects:
- Depending on the relation between the size of your database and the
size of the database cache, selects can work faster once the whole
database is cached. So, for example, in a just-started application the
first set of selects will be slower than the second. But if the full
database cannot fit into the cache, then different sets of selects
will have the same performance on average.
- If you don't care about your database being changed concurrently by
other processes, you can place all your inserts in one transaction, or,
in your real application, just start a transaction at the beginning
and commit/rollback it at the end.

Tell us if it still doesn't satisfy you.


Pavel
Ron Arts
2009-10-11 16:39:32 UTC
Post by Pavel Ivanov
Post by Ron Arts
CREATE TABLE company(id INTEGER PRIMARY KEY ASC, name)
I'm not sure how SQLite treats this table definition but probably
because of your ASC it could decide that id shouldn't be a synonym for
rowid which will make at least inserts slower.
Post by Ron Arts
But I'm still looking to speed up selects.
- Depending on the relation between the size of your database and the
size of the database cache, selects can work faster once the whole
database is cached. So, for example, in a just-started application the
first set of selects will be slower than the second. But if the full
database cannot fit into the cache, then different sets of selects
will have the same performance on average.
- If you don't care about your database being changed concurrently by
other processes, you can place all your inserts in one transaction, or,
in your real application, just start a transaction at the beginning
and commit/rollback it at the end.
Tell us if it still doesn't satisfy you.
Pavel,

does the cache work for memory databases too?

Thanks,
Ron
Pavel Ivanov
2009-10-12 23:34:55 UTC
Post by Ron Arts
Pavel,
does the cache work for memory databases too?
Doh, missed the fact that it's a memory database. I believe an in-memory
database is in fact just a database cache that never deletes its pages
from memory and never spills them to disk, although anything about the
size of the database cache will not be applicable here...
But wrapping all selects into one transaction will work with in-memory
databases too (somewhere in another thread there were even benchmarks
showing the effect of this). And maybe you never even need to
commit or roll back the transaction in an in-memory database, because
there's no other connection which will need to get access to the database
and to the committed data...

BTW, another option for speeding up selects is to recompile SQLite
without multi-threading support, and maybe even without some other
stuff that you never use. At least the lack of multi-threading support
can make a significant difference in performance.
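For example, building the amalgamation with threading support compiled
out (SQLITE_THREADSAFE is a documented SQLite compile-time option; the
rest of the command line is just illustrative):

    gcc -O2 -DSQLITE_THREADSAFE=0 -c sqlite3.c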

Pavel
Ron Arts
2009-10-11 16:48:12 UTC
Are there compile-time switches which I can use to speed up
selects on in-memory databases? Will the amalgamated version be faster
than linking the lib at runtime?

Thanks,
Ron
Roger Binns
2009-10-11 17:07:44 UTC
Post by Ron Arts
Will the amalgamated version be faster
than linking the lib at runtime?
The SQLite website quotes a 10% performance improvement for the
amalgamation. The reason for the improvement is that the compiler gets to
see all the SQLite code at once and so makes good decisions about inlining
and other optimizations. (Note that in some cases the library was built
from the amalgamation, so there won't be that much difference.)

Roger
sub sk79
2009-10-12 08:41:31 UTC
Hi!

Optimizing by hand is one way to go, but it can get tedious with
multiple SQL statements requiring carefully sequenced prepares, binds,
transactions, pragmas, commits, exception-handling, compiler options
etc.

For automated optimization, you can try StepSqlite
(https://www.metatranz.com/stepsqlite/) - a commercial PL/SQL compiler
for SQLite which does several standard optimizations automatically for
you without compromising ACID. If needed, you can always do more
optimizations by hand on top of this.
One major benefit of going automated: your code remains clean.

Inlined below are results of the benchmark program compiled using StepSqlite:
For disk-based db:
INSERT 500000 records time: 6.18 secs
SELECT 500000 records time: 1.87 secs

Benchmark code in PL/SQL:
=======
create table parts(part_no integer, quantity integer, part_name
varchar, part_desc varchar);

PACKAGE BODY MyPackage IS
    total integer := 0;  -- initialise: NULL + quantity would stay NULL
BEGIN

    for i in 1 .. 500000 loop
        insert into parts (part_no, part_name, quantity) values(i, 'Name'||i, 2);
    end loop;

    for partsRec in (select quantity from parts)
    loop
        total := total + partsRec.quantity;
    end loop;

    dbms_output.put_line('Total Quantity:' || total);

    commit;
END;
========

Regards,
SK
Ron Arts
2009-10-18 07:37:37 UTC
Post by Pavel Ivanov
Post by Ron Arts
CREATE TABLE company(id INTEGER PRIMARY KEY ASC, name)
I'm not sure how SQLite treats this table definition but probably
because of your ASC it could decide that id shouldn't be a synonym for
rowid which will make at least inserts slower.
Post by Ron Arts
But I'm still looking to speed up selects.
- Depending on the relation between the size of your database and the
size of the database cache, selects can work faster once the whole
database is cached. So, for example, in a just-started application the
first set of selects will be slower than the second. But if the full
database cannot fit into the cache, then different sets of selects
will have the same performance on average.
- If you don't care about your database being changed concurrently by
other processes, you can place all your inserts in one transaction, or,
in your real application, just start a transaction at the beginning
and commit/rollback it at the end.
Tell us if it still doesn't satisfy you.
Pavel,

I formulated a followup question in a new thread, which was silly, so I'll
repeat it here.

Is there a way to bypass the virtual machine altogether and reach directly
into the btree and just retrieve one record by its oid (primary integer key),
and return it in a form that would allow taking out the column values by name?

I don't care if it's a bit dirty, but I really need to speed up record retrieval,
and I know beforehand that 99% of my queries involve just retrieving one record by
primary integer key. I also know that the entire database will fit in memory.

I'll tell you the general flow of processing in my program: on startup, it copies
a PostgreSQL database into a SQLite memory database, so I need to do a lot of inserts,
though not very fast, as PostgreSQL cannot supply the records very fast.
Then my program opens a socket and starts accepting connections; those connections
are long-lasting, and send messages that need a fast reply. Many of the messages result
in messages being sent to all other clients. The messages require on average 10 lookups
in the memory db, each by oid. Very occasionally I get a message that needs more,
and for that I need to use an SQL statement internally. I also receive SQL queries
from clients occasionally.

My application needs to scale beyond tens of thousands of clients, and should also
communicate with similar apps running on other machines for high availability
and geographical separation.

Thanks,
Ron
Simon Slavin
2009-10-18 15:10:20 UTC
Post by Ron Arts
Is there a way to bypass the virtual machine altogether and reach directly
into the btree and just retrieve one record by its oid (primary integer key),
and return it in a form that would allow taking out the column
values by name?
The primary integer key column can always be referred to as the
special name ROWID, even if you have assigned it a column name of your
own. So you can do

SELECT ROWID,myCol1,myCol2 FROM myTable

as long as you don't explicitly declare a primary integer key column
and then change the values in it.

Simon.
Ron Arts
2009-10-18 15:37:57 UTC
Very true Simon,

this has been the fastest way so far and I can do around
350000 selects/second this way, using prepared statements
(on my machine at least), but I need more speed.

That's why I want to skip the SQL processing entirely
and write a C function that reaches directly into the
internal memory structures to get my record from there.

thanks,
Ron
Simon Slavin
2009-10-18 15:51:01 UTC
Post by Ron Arts
I want to skip the SQL processing entirely
and write a C function that reaches directly into the
internal memory structures to get my record from there.
I assume that you've already tested the fastest way of doing this that
the standard library allows: prebinding using _prepare, etc.

You could download the source code for SQLite, take a look at how the
SELECT command is implemented, and write your own customised code
which knows exactly which fields and index you want and how your table
is structured. I have no idea what the results would look like or
whether the developer forum would be a better place to discuss it.
Start by using EXPLAIN to look at the code generated from your SELECT
command.
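For example, with the illustrative names from earlier in the thread:

EXPLAIN SELECT myCol1 FROM myTable WHERE ROWID = 42;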

<http://www.sqlite.org/lang_explain.html>
<http://www.sqlite.org/opcode.html>

Simon.
P Kishor
2009-10-18 16:29:17 UTC
Permalink
Post by Ron Arts
Very true Simon,
this has been the fastest way so far and I can do around
350000 selects/second this way, using prepared statements
(on my machine at least), but I need more speed.
That's why I want to skip the SQL processing entirely
and write a C function that reaches directly into the
internal memory structures to get my record from there.
I might have missed the discussion, but... why don't you ditch SQLite
and use something like Berkeley DB? Sounds to me like you need a hash db
instead of an RDBMS, especially since you have no need for SQL.
--
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================
Sent from Madison, WI, United States
Ron Arts
2009-10-18 18:23:56 UTC
Permalink
Post by P Kishor
Post by Ron Arts
Very true Simon,
this has been the fastest way so far and I can do around
350000 selects/second this way, using prepared statements
(on my machine at least), but I need more speed.
That's why I want to skip the SQL processing entirely
and write a C function that reaches directly into the
internal memory structures to get my record from there.
I might have missed the discussion, but... why don't you ditch SQLite
and use something like Berkeley DB? Sounds to me like you need a hash db
instead of an RDBMS, especially since you have no need for SQL.
Well, that's what I use at the moment (I use glib2 g_hash_table()),
but now the requirement to accept SQL queries for that database
has come up. And I don't want to write my own SQL parser.

Next I tried to add a virtual table driver to SQLite for my database.
That worked, but because the application is evolving, columns
get added/changed on a regular basis. Currently this means adding
the new columns to my C-structures, writing access functions, and
recompiling. I don't want to do that, because this means my application *must*
be replaced on every database change, and I'd like to be able to
run different versions of it in the wild. I was hoping to make it
more dynamic and adaptive to database layout changes.

Thanks,
Ron
Simon Slavin
2009-10-18 18:33:38 UTC
Permalink
Post by Ron Arts
because the application is evolving, columns
get added/changed on a regular basis. Currently this means adding
the new columns to my C-structures, writing access functions, and
recompiling. I don't want to do that, because this means my application *must*
be replaced on every database change, and I'd like to be able to
run different versions of it in the wild. I was hoping to make it
more dynamic and adaptive to database layout changes.
Then what you need is a flexible system that can cope with schema and
index changes. And that's SQLite. Whatever you write yourself, if it
has to have the same flexibility as SQLite does, won't be much faster
than SQLite.

Simon.
Pavel Ivanov
2009-10-19 12:07:42 UTC
Permalink
Post by Ron Arts
Currently this means adding
the new columns to my C-structures, writing access functions, and
recompiling. I don't want to do that, because this means my application *must*
be replaced on every database change, and I'd like to be able to
run different versions of it in the wild. I was hoping to make it
more dynamic and adaptive to database layout changes.
Sorry, Ron, but I don't get it. You're saying that adding more columns
means adding more fields to your structure and making your
application know about those fields, and that means you need to
recompile, which is a bad thing. But if you're working with SQLite's
in-memory database, then you have to create all tables in that database
at startup, so you need an SQL statement in your application
containing all the columns you use - so again, adding a column means
you need to recompile, right? Or if you generate your CREATE TABLE
statement dynamically from postgresql, then how does your code know
about newly added fields? How does it work with them? Are your newly
added columns used only from SQL coming directly from clients?
Maybe you just need to re-think your database schema so that you need
only one select statement per network request, so your requirements
on SQL engine speed would be lowered by an order of magnitude?
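
(For instance - with a purely hypothetical schema - instead of ten
lookups like

SELECT * FROM clients WHERE oid = ?;

one statement such as

SELECT c.* FROM messages m JOIN clients c ON c.oid = m.recipient_oid
WHERE m.oid = ?;

could fetch everything one request needs in a single step-loop.)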

Pavel
Ron Arts
2009-10-21 13:10:11 UTC
Permalink
Post by Pavel Ivanov
[...]
Maybe you just need to re-think your database schema so that you need
only one select statement per network request, so your requirements
on SQL engine speed would be lowered by an order of magnitude?
Pavel,

You are right, and I'll look into rewriting a bigger part of our application.

Thanks,
Ron
Kees Nuyt
2009-10-18 18:33:26 UTC
Permalink
On Sun, 18 Oct 2009 17:37:57 +0200,
Post by Ron Arts
Very true Simon,
this has been the fastest way so far and I can do around
350000 selects/second this way, using prepared statements
(on my machine at least), but I need more speed.
That's why I want to skip the SQL processing entirely
and write a C function that reaches directly into the
internal memory structures to get my record from there.
You will have to implement BTree code to walk the index
pages. You'll have a hard time making your code more
efficient than the SQLite BTree code.

Luckily the BTree code and the Virtual machine interpreter
are in the SQLite source repository, which is public domain.
EXPLAIN SELECT colX FROM tableY WHERE tableY.id = <your key>;
yields the VM instructions.

So, yes, you can do it.

Note: the SQLite BTree code may change without notice,
because it isn't a public API.

In your case I'd either go for a hash table, without SQL, or
solve the speed problem with hardware.
Post by Ron Arts
thanks,
Ron
--
( Kees Nuyt
)
c[_]
Olaf Schmidt
2009-10-19 00:10:29 UTC
Permalink
Post by Ron Arts
Then my program opens a socket, and starts accepting connections,
those connections are long lasting, and send messages that need
a fast reply. Many of the messages result in messages being sent
to all other clients. The messages require on average 10 lookups
in the memory db, each by oid.
Is the "socket-listener-thread" already decoupled from the
thread which hosts your sqlite-connection-handle?

If not done already, you should try it (that will not speed up
the sqlite-performance itself, but the overall performance of your
"broadcasting-dispatcher-app").
Additionally, you should decouple the "sqlite-thread" from the
"reply-sender-threads" as well (placing the sqlite-query-results
in structures where the sender-threads can find them).

That would ensure that the sqlite-engine can always run at full
speed, never waiting for potentially slow or blocking
socket-transfers.

In such a design you could also try another thing which might
speed up your selects: "oID-aggregation".

If your socket-listener-thread receives approx. 50000 requests
per second (and nothing will interrupt this receiver-thread now,
since the sqlite-queries run elsewhere), then we are talking
about 50 incoming messages per millisecond. Since the
sqlite-thread is running elsewhere already, why not aggregate the
incoming oIDs into a comma-separated list (in a simple
charbuffer, shared with the sqlite-thread and flagged with a
"next-job" descriptor)?

Every millisecond (to keep the latency low), you stop gathering
oIDs into the current "next-job" charbuffer and set the
finalized flag in its job-descriptor structure (after which the
listener-thread can immediately continue gathering oIDs into a
fresh charbuffer).

The sqlite-thread looks for charbuffers flagged "ready to
proceed" on its own and then works in a more aggregated fashion:
the engine overhead may shrink if sqlite now performs *one*
(larger) select per millisecond, returning more than a single
record from its step-loop, i.e. per:

SELECT * FROM Table WHERE oID IN (YourGathered_IDList)

Just an idea - I've not yet tested whether the throughput would
really be better this way than with single-record selects only.
You probably lose the advantage of the precompiled
single-record statement, but you could gain overall once you
reach the step-loop, which then delivers more than one record
with probably less overhead per record.

Maybe that's worth a try.
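
A rough sketch of the sqlite-thread side (buffer handling and names are
invented here; the id list is built from internal integer oIDs, so
interpolating it into the SQL string is safe in this case):

#include <stdio.h>
#include <sqlite3.h>

/* Run one aggregated select for a finished charbuffer,
   e.g. idlist = "17,42,108". */
static void run_batch(sqlite3 *db, const char *idlist)
{
    char sql[4096];
    sqlite3_stmt *stmt;
    snprintf(sql, sizeof sql,
             "SELECT oID, myCol1 FROM myTable WHERE oID IN (%s)", idlist);
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK)
        return;
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        sqlite3_int64 oid = sqlite3_column_int64(stmt, 0);
        (void)oid;   /* hand the row over to the reply-sender-threads here */
    }
    sqlite3_finalize(stmt);
}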

Olaf
Simon Slavin
2009-10-10 22:19:42 UTC
Permalink
Post by Ron Arts
The sqlite3_bind_int immediately gives me a RANGE_ERROR (25).
Is there some obvious thing I'm doing wrong?
I notice that your _prepare call puts single quotes around the
variable, whereas you are binding an integer to it. But that's
probably not what's causing your problem.
Post by Ron Arts
rc = sqlite3_bind_int(stmt, 1, rand()%count);
rand()%count ?

Just for debugging purposes, split this into two: assign rand()%count
to an integer variable in one line, then use it in the next line. Try
spitting out the variable value to a debugger or console.
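
Something like this (stmt, rc and count being the names from your test
program; fprintf needs <stdio.h>, rand() needs <stdlib.h>):

int key = rand() % count;   /* now you can print it or break on it */
fprintf(stderr, "binding key = %d\n", key);
rc = sqlite3_bind_int(stmt, 1, key);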

Simon.
Naveen Grover -TP
2009-10-12 10:28:10 UTC
Permalink
Is there a way to prepare a query and save the compiled form so that we can share it between multiple connections?

Regds,
Naveen

-----Original Message-----
From: sqlite-users-bounces [...] On Behalf Of Jay A. Kreibich
Sent: Sunday, October 11, 2009 1:54 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] sqlite in-memory database far too slow in my use case
Post by Ron Arts
I'm afraid the process of
constructing SQL queries / parsing them by sqlite, and
interpreting the results in my app, multiple times per
event will be too slow.
There should be no need to construct and parse queries with each
interaction. Assuming the queries are fairly well known, you should
be able to prepare them once and then keep using them over and over.
This should save a noticeable amount of time.

Make sure you're using the prepare/bind/step/reset/finalize
interfaces, rather than exec or get_table.
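
Roughly, the lifecycle looks like this (a sketch with invented names):

#include <sqlite3.h>

static sqlite3_stmt *lookup;            /* prepared once, at startup */

int init_lookup(sqlite3 *db) {
    return sqlite3_prepare_v2(db,
        "SELECT myCol1 FROM myTable WHERE ROWID = ?",
        -1, &lookup, NULL);
}

int fetch(sqlite3_int64 oid, int *out) {   /* called for every event */
    int rc;
    sqlite3_bind_int64(lookup, 1, oid);
    rc = sqlite3_step(lookup);
    if (rc == SQLITE_ROW) *out = sqlite3_column_int(lookup, 0);
    sqlite3_reset(lookup);                 /* statement is reusable again */
    return rc == SQLITE_ROW ? SQLITE_OK : rc;
}

/* sqlite3_finalize(lookup) once, at shutdown */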

-j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have
a protractor." "I'll go home and see if I can scrounge up a ruler
and a piece of string." --from Anathem by Neal Stephenson
Pavel Ivanov
2009-10-12 23:23:30 UTC
Permalink
Post by Naveen Grover -TP
Is there a way to prepare a query and save the compiled form so that we can share it between multiple connections?
Yes, there is: http://sqlite-consortium.com/products/sse.

Pavel
Jay A. Kreibich
2009-10-13 02:58:08 UTC
Permalink
Post by Pavel Ivanov
Post by Naveen Grover -TP
Is there a way to prepare a query and save the compiled form so that
we can share it between multiple connections?
Yes, there is: http://sqlite-consortium.com/products/sse.
I realize this may be a general question, but given that the topic of
the thread was in-memory databases, I'd just point out that there is
no such thing as "multiple connections" to an in-memory DB. Even
within the same process, you cannot open another database handle to
an in-memory DB.
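
For instance, these two opens produce two completely separate, empty
databases:

#include <sqlite3.h>

int main(void) {
    sqlite3 *db1, *db2;
    sqlite3_open(":memory:", &db1);   /* one private, empty database */
    sqlite3_open(":memory:", &db2);   /* a second, unrelated one */
    /* tables created through db1 are invisible through db2 */
    sqlite3_close(db1);
    sqlite3_close(db2);
    return 0;
}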

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have
a protractor." "I'll go home and see if I can scrounge up a ruler
and a piece of string." --from Anathem by Neal Stephenson