Discussion:
insert speeds slowing down as database size increases (newb)
Julian Bui
2008-10-29 08:59:54 UTC
Permalink
Hi everyone,

First off, I'm a database and SQLite newbie. I'm inserting many, many
records and indexing over one of the double attributes. I am seeing
my insert times slowly degrade as the database grows in size until
they're unacceptable - less than 1 write per millisecond (other databases
have scaled well). I'm using an Intel Core 2 Duo with 2 GB of RAM and
an ordinary HDD.

I am trying to figure out why some of the other databases (Firebird,
MySQL, Berkeley DB) have provided constant insert speeds whereas
SQLite has not. Now I do not mean to say anything negative about
SQLite, because there's a good chance I'm not doing something right.
Aside from not having the db perfectly tuned, I think I probably have
bigger problems, like not having a suitable index or database
structure.

I don't know much about how SQLite does its indexing. My index,
(CREATE INDEX IF NOT EXISTS IDX_DDD ON MY_TABLE(ddd)), does not really
tell the database how to make use of my data. Since records are
inserted into the database so that ddd is already sorted (record i's ddd
value is less than record i+1's), it seems like the database ought to be
able to exploit that ordering to index quickly. I figure a clustered
index would help, but I haven't seen any way to do this in SQLite.

It would really help me if someone could explain some basic
lower-level details about how SQLite does its indexing, or provide
some other insight into why insert times may be slowing down as the
database grows.

Thanks in advance,
Julian

<code>
// ps is a prepared statement
ps = conn.prepareStatement(
    "CREATE TABLE IF NOT EXISTS MY_TABLE(aaa SMALLINT NOT NULL, " +
    "bbb BIGINT NOT NULL, ccc SMALLINT, ddd DOUBLE, eee CHAR(8))");
ps.execute();

....

ps = conn.prepareStatement(
    "CREATE INDEX IF NOT EXISTS IDX_DDD ON MY_TABLE(ddd)");
ps.execute();

...

dataInsertPs = conn.prepareStatement(
    "INSERT INTO MY_TABLE(aaa, bbb, ccc, ddd, eee) VALUES (?, ?, ?, ?, ?)");
// every dataInsertPs gets added to a batch and committed every 1000 records

</code>
Neville Franks
2008-10-29 09:09:05 UTC
Permalink
The most common reason, which comes up here time and again, is that the
inserts are not wrapped in a transaction. See the BEGIN and END statements
in the docs. You haven't mentioned whether you are using a transaction, so
I may be misguided in my reply - but the sample code doesn't show one!

Wednesday, October 29, 2008, 7:59:54 PM, you wrote:

JB> Hi everyone,

JB> First off, I'm a database and SQLite newbie. I'm inserting many, many
JB> records and indexing over one of the double attributes. I am seeing
JB> my insert times slowly degrade as the database grows in size until
JB> they're unacceptable - less than 1 write per millisecond (other databases
JB> have scaled well). I'm using an Intel Core 2 Duo with 2 GB of RAM and
JB> an ordinary HDD.
JB> ...

--
Best regards,
Neville Franks, http://www.surfulater.com http://blog.surfulater.com
Julian Bui
2008-10-29 16:42:08 UTC
Permalink
Thanks for replies everyone.

Actually, I didn't include that code, but I did make a brief mention of
using batch inserts with a transaction ("> //every dataInsertPs gets added to
a batch and committed every 1000 records").

I am using JDBC, so I do not use BEGIN and END statements. Do I need to use
BEGIN and END *ALONG WITH* the JDBC API transaction commands? I don't think
I do, since using the JDBC transaction objects produces different insert
times than not using them.
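
For reference, here is a rough sketch of the batching pattern I mean (the
Record class and the records collection are just placeholders for my real
code, and I'm assuming the driver maps setAutoCommit(false)/commit() onto
BEGIN/COMMIT under the hood):

<code>
// Sketch only - Record is a hypothetical holder for one row's values.
conn.setAutoCommit(false);                       // start an explicit transaction
PreparedStatement dataInsertPs = conn.prepareStatement(
    "INSERT INTO MY_TABLE(aaa, bbb, ccc, ddd, eee) VALUES (?, ?, ?, ?, ?)");
int count = 0;
for (Record r : records) {
    dataInsertPs.setShort(1, r.aaa);
    dataInsertPs.setLong(2, r.bbb);
    dataInsertPs.setShort(3, r.ccc);
    dataInsertPs.setDouble(4, r.ddd);
    dataInsertPs.setString(5, r.eee);
    dataInsertPs.addBatch();
    if (++count % 1000 == 0) {                   // flush and commit every 1000 records
        dataInsertPs.executeBatch();
        conn.commit();
    }
}
dataInsertPs.executeBatch();                     // flush and commit the remainder
conn.commit();
</code>

Is that roughly what the BEGIN/END advice amounts to, or is something else
needed on top of it?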

Please let me know.

-Julian

On Wed, Oct 29, 2008 at 2:09 AM, Neville Franks <sqlite-A+NshVhb9/C+XT7JhA+***@public.gmane.org> wrote:

> The most common reason, which comes up here time and again, is that the
> inserts are not wrapped in a transaction. See the BEGIN and END statements
> in the docs. You haven't mentioned whether you are using a transaction, so
> I may be misguided in my reply - but the sample code doesn't show one!
>
> ...
John Stanton
2008-10-29 12:10:56 UTC
Permalink
Look up the implications of SQLite's ACID features and the use of
transactions. COMMITs are tied to disk rotation speed. On our SQLite
databases where we look for performance we use 15,000 RPM disks and are
diligent about wrapping INSERTs, UPDATEs and DELETEs in transactions, and
we get very good results.

The SQLite B-tree indices do slow down on insertion as extra levels are
created in the index as it grows large. That is an inherent feature of
such structures.
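To put rough numbers on that (a back-of-the-envelope sketch, assuming
1 KB pages holding on the order of 100 index entries each): two levels
cover roughly 100^2 = 10,000 entries, three levels about 10^6, four
about 10^8. Depth, and therefore the number of pages touched per insert,
grows with log(n), and once the index no longer fits in the page cache
those extra levels turn into extra disk reads - which is when the
slowdown becomes very visible.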
JS

Julian Bui wrote:
> Hi everyone,
>
> First off, I'm a database and SQLite newbie. I'm inserting many, many
> records and indexing over one of the double attributes. I am seeing
> my insert times slowly degrade as the database grows in size until
> they're unacceptable - less than 1 write per millisecond (other databases
> have scaled well). I'm using an Intel Core 2 Duo with 2 GB of RAM and
> an ordinary HDD.
> ...
Eduardo Morras
2008-10-29 12:18:28 UTC
Permalink
At 13:10 29/10/2008, you wrote:
>Look up the implications of Sqlite's ACID feature and the use of
>transactions. COMMITs are tied to disk rotation speed. On our Sqlite
>databases where we look for performance we use 15,000 rpm disks and are
>diligent in wrapping INSERTs, UPDATEs and DELETEs in transactions and
>get very good results.
>
>The Sqlite B-Tree indices do slow down on insertion as extra levels are
>created in the index as it grows large. That is an inherent feature of
>such structures.
>JS

I think the problem is that he creates the index before inserting the
data. He should insert the data inside BEGIN..COMMIT and create the
index after the inserts are done. Otherwise the index must be updated
for every insert, and that is where the slowdown happens.
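
Roughly like this through JDBC (just a sketch, reusing the table and index
names from the original post; worth measuring whether it actually helps):

<code>
// Sketch of "load first, index later", reusing names from the original post.
Statement st = conn.createStatement();
st.execute("DROP INDEX IF EXISTS IDX_DDD");   // no index to maintain while loading

conn.setAutoCommit(false);
// ... run all the batched INSERTs here, committing every N rows ...
conn.commit();
conn.setAutoCommit(true);

// Build the index once, after the bulk load.
st.execute("CREATE INDEX IF NOT EXISTS IDX_DDD ON MY_TABLE(ddd)");
st.close();
</code>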

HTH
John Elrick
2008-10-29 12:48:46 UTC
Permalink
Eduardo Morras wrote:
> ...
>
> I think the problem is that he creates the index before inserting the
> data. He should insert the data inside BEGIN..COMMIT and create the
> index after the inserts are done. Otherwise the index must be updated
> for every insert, and that is where the slowdown happens.
>
FWIW,

I've done some experiments in the past with large numbers of inserts,
trying it both ways, with the indexes intact during the inserts and with
them created after the fact, and I did not see any significant difference
in time. Your mileage may vary, as the type of work I am doing may simply
not benefit, but my experience tells me that deferring index creation is
not a guaranteed way of improving performance.


John
Jay A. Kreibich
2008-10-29 14:53:14 UTC
Permalink
On Wed, Oct 29, 2008 at 06:10:56AM -0600, John Stanton scratched on the wall:

> The Sqlite B-Tree indices do slow down on insertion as extra levels are
> created in the index as it grows large. That is an inherent feature of
> such structures.

This can often be mitigated by increasing the size of the page cache.

A large page cache will speed up the creation of new indexes
considerably, especially for data that is not already sorted.
A larger cache can also help with frequent inserts on existing indexes.

See "PRAGMA cache_size". If you're working on a modern desktop with
a comfortable amount of RAM, it isn't unreasonable to increase the
cache size by an order of magnitude or two (default is 2000).

-j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have
a protractor." "I'll go home and see if I can scrounge up a ruler
and a piece of string." --from Anathem by Neal Stephenson
Julian Bui
2008-10-29 16:44:00 UTC
Permalink
>
> See "PRAGMA cache_size". If you're working on a modern desktop with
> a comfortable amount of RAM, it isn't unreasonable to increase the
> cache size by an order of magnitude or two (default is 2000).
>

I forgot to mention that I use JDBC to access SQLite from a Java app. Is
there an equivalent line of code I can execute from Java/JDBC to do the
same thing, or a configuration setting somewhere?
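
My guess (unverified - it presumably depends on the driver) is that the
PRAGMA can simply be issued as an ordinary statement once per connection,
something like:

<code>
// Assumption: the JDBC driver passes PRAGMAs straight through to SQLite.
Statement st = conn.createStatement();
st.execute("PRAGMA cache_size = 20000");   // roughly 10x the 2000-page default
st.close();
</code>

Can anyone confirm whether that is the right way to do it through JDBC?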

On Wed, Oct 29, 2008 at 7:53 AM, Jay A. Kreibich <jay-m2zl/UuQNF/***@public.gmane.org> wrote:

> ...
MikeW
2008-10-29 17:35:24 UTC
Permalink
Julian Bui <***@...> writes:

>
> >
> > See "PRAGMA cache_size". If you're working on a modern desktop with
> > a comfortable amount of RAM, it isn't unreasonable to increase the
> > cache size by an order of magnitude or two (default is 2000).
> >
>
> I forgot to mention I use JDBC to access sqlite from a java app. Is there
> an equivalent line of code to execute from within java/jdbc to do the same
> thing? Or a configuration somewhere?

see here http://article.gmane.org/gmane.comp.db.sqlite.general/42177
MikeW
Alex Scotti
2008-10-31 01:11:48 UTC
Permalink
On Oct 29, 2008, at 4:59 AM, Julian Bui wrote:

> Hi everyone,
>
> First off, I'm a database and SQLite newbie. I'm inserting many, many
> records and indexing over one of the double attributes. I am seeing
> my insert times slowly degrade as the database grows in size until
> they're unacceptable - less than 1 write per millisecond (other databases
> have scaled well). I'm using an Intel Core 2 Duo with 2 GB of RAM and
> an ordinary HDD.
>
> I am trying to figure out why some of the other databases (Firebird,
> MySQL, Berkeley DB) have provided constant insert speeds whereas
> SQLite has not.

I can tell you firsthand that Berkeley DB does not provide anything
like constant time for random inserts into a B-tree as its size grows.
Julian Bui
2008-10-31 03:57:13 UTC
Permalink
Unless I did something wrong, I did observe constant-time inserts in
Berkeley DB. Is it possible that I had constant-time inserts into a B-tree
as my DB grew because of the nature of my data? I was inserting records in
the order in which they would be sorted by the index; in other words, every
inserted record's indexed field was greater than the previous one's. Perhaps
the DB took advantage of this because I set sortedDuplicates() on, which
allows for a clustered index.
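
(My rough understanding - please correct me if this is wrong - is that when
keys arrive in strictly ascending order every insert lands in the rightmost
leaf, page splits are comparatively rare, and the path down to that leaf
stays in cache, so the per-insert cost can look nearly flat even though the
tree is log(n) levels deep. Random-order keys would pay the full traversal,
and eventually real disk reads, on every insert.)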

On Thu, Oct 30, 2008 at 6:11 PM, Alex Scotti <alex-***@public.gmane.org> wrote:

>
> ...
>
> I can tell you firsthand that Berkeley DB does not provide anything
> like constant time for random inserts into a B-tree as its size grows.
John Stanton
2008-10-31 13:41:52 UTC
Permalink
You cannot have constant-time inserts into a B-tree because of the
inherent nature of the algorithm. Berkeley DB offers either B-tree or
hashed indices; the unordered hashed indices are possibly what you
measured. Note that B-trees have the additional property that they
maintain an order and are thus more useful in a database.
JS

Julian Bui wrote:
> Unless I did something wrong, I did observe constant-time inserts in
> Berkeley DB. Is it possible that I had constant-time inserts into a B-tree
> as my DB grew because of the nature of my data?
> ...