Discussion:
How to set memory usage as high as possible -- but not too high?
Ralf Junker
2008-04-13 22:53:07 UTC
I need to create a huge database (about 6 GB, more than 6 million records, blobs, and FTS text) in as little time as possible. Since memory is the key to speed, I try to use as much memory as is available. However, there is the danger of running out of memory. This is where memory usage control comes into play. I can see three options:


* OPTION 1: PRAGMA cache_size = 10000000;

Advantage: SQLite will use ample memory, but no more than that.

Disadvantage: It is difficult to establish the exact memory requirements in advance. The help states that "Each page uses about 1.5K of memory.", but I found this not to be accurate: memory usage obviously depends on the page size, and my measurements show an additional small overhead of undocumented size. Is there a formula to calculate the required memory for a cache_size of x?
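
For a rough idea only -- SQLite documents no exact formula -- the cache footprint can be estimated as cache_size times the page size plus a guessed per-page overhead. A minimal sketch in C, where PAGE_OVERHEAD_GUESS is an assumption and not a documented value:

  #include <stdio.h>

  /* Rough estimate of page-cache memory. SQLite does not document an
   * exact formula; the per-page overhead below is a guess covering the
   * page header and allocator bookkeeping. */
  #define PAGE_OVERHEAD_GUESS 512LL   /* bytes per cached page, assumed */

  static long long estimate_cache_bytes(long long cache_size_pages,
                                        long long page_size_bytes)
  {
      return cache_size_pages * (page_size_bytes + PAGE_OVERHEAD_GUESS);
  }

  int main(void)
  {
      /* With 1 KB pages, 180000 cached pages come out near 270 MB here,
       * roughly matching the "about 1.5K of memory" figure in the help. */
      printf("%lld bytes\n", estimate_cache_bytes(180000, 1024));
      return 0;
  }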


* OPTION 2: sqlite3_soft_heap_limit(max_mem_bytes);

Advantage: Memory limit can be set to a known value (amount of free memory as returned from the OS).

Disadvantage: My tests indicate that SQLite slows down drastically when it hits the memory limit. Inserts drop from a few hundred per second to just one or two per second.
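
A minimal sketch of this option; get_free_ram_bytes() is a hypothetical placeholder for an OS-specific query, not a real SQLite or OS function:

  #include "sqlite3.h"

  /* Placeholder for an OS-specific query (GlobalMemoryStatusEx on
   * Windows, sysinfo() on Linux, ...); returns a made-up value here. */
  static long long get_free_ram_bytes(void)
  {
      return 512LL * 1024 * 1024;   /* pretend 512 MB are free */
  }

  void apply_soft_limit(void)
  {
      long long budget = get_free_ram_bytes() / 2;  /* leave some headroom */
      if (budget > 0x7fffffff) budget = 0x7fffffff; /* the API takes an int */
      sqlite3_soft_heap_limit((int)budget);
  }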


* OPTION 3: Catch out-of-memory errors and reduce cache_size accordingly (untested scenario).

Advantage: Uses memory up to the last bit available.

Disadvantage: How to avoid data loss after the out-of-memory error? Can I just call sqlite3_release_memory(some_bytes) and then sqlite3_step() again and again until it succeeds without the out-of-memory error?
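
Spelled out, the untested retry loop in question would look roughly like the sketch below; whether re-stepping after SQLITE_NOMEM is safe is exactly the open question, so this is an illustration, not a recommendation:

  #include "sqlite3.h"

  /* Sketch of the untested "option 3": on SQLITE_NOMEM, ask SQLite to
   * give back some cache memory and retry the statement. */
  int step_with_retry(sqlite3_stmt *stmt)
  {
      int rc = sqlite3_step(stmt);
      while (rc == SQLITE_NOMEM) {
          if (sqlite3_release_memory(1024 * 1024) == 0)
              break;               /* nothing more to release, give up */
          sqlite3_reset(stmt);     /* rewind the statement before retrying */
          rc = sqlite3_step(stmt);
      }
      return rc;
  }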


This raises a few questions:

* Does sqlite3_soft_heap_limit(), "PRAGMA cache_size=x;", or both establish SQLite's upper memory limit? Do they work independently of each other, i.e. does the lower limit always kick in first?

* Does "PRAGMA cache_size=some_smaller_value;" cause SQLite to free used pages and release their memory straight away?

* Is there another runtime -- important! -- setting to establish a maximum memory limit, possibly undocumented?


In the end this boils down to a simple problem:

* How to keep SQLite's memory usage as close as possible to, but not exceeding, the memory available to the application?

I will be very grateful for any suggestion!

Many thanks,

Ralf
Roger Binns
2008-04-14 02:04:22 UTC
Are you using a 32-bit or 64-bit process? Also, is there a requirement
to create the database in the filesystem? If not, you could ensure your
swap is sufficiently large (I use a minimum of 16GB on my machines :-),
create the database in a tmpfs filesystem, and then copy it to
persistent storage when you are done.

You also didn't list turning off synchronous etc while creating the
database and turning it back on when done. I am curious why you think
memory is the bottleneck anyway!

Roger
Ralf Junker
2008-04-14 07:38:50 UTC
Post by Roger Binns
Are you using a 32-bit or 64-bit process?
32, but it does not matter to the problem.
Post by Roger Binns
Also is there a requirement to create the database in the filesystem?
Yes.
Post by Roger Binns
If not, you could ensure your swap is sufficiently large (I use a minimum of 16GB on my machines :-), create the database in a tmpfs filesystem, and then copy it to
persistent storage when you are done.
The aim is to avoid slow swap memory but use fast RAM only.
Post by Roger Binns
You also didn't list turning off synchronous etc while creating the database and turning it back on when done.
Performance settings are:

* PRAGMA locking_mode=exclusive;
* PRAGMA synchronous=off;
* Disable journal file :-)
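
Applied through the C API, these settings would look roughly like the sketch below. The journal was apparently disabled by other means in the original setup (the journal_mode pragma mentioned later in the thread was not yet released at the time); it is shown here only as the later equivalent:

  #include "sqlite3.h"

  /* The bulk-load settings listed above, applied via sqlite3_exec()
   * (error handling omitted). journal_mode=OFF stands in for whatever
   * mechanism was used to disable the journal at the time. */
  static void apply_bulk_load_settings(sqlite3 *db)
  {
      sqlite3_exec(db, "PRAGMA locking_mode=EXCLUSIVE;", 0, 0, 0);
      sqlite3_exec(db, "PRAGMA synchronous=OFF;",        0, 0, 0);
      sqlite3_exec(db, "PRAGMA journal_mode=OFF;",       0, 0, 0);
  }
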
Post by Roger Binns
I am curious why you think memory is the bottleneck anyway!
It has often been pointed out on this list that inserts into indexed tables (regular or FTS) run faster with a high page cache. My own tests second this. A few hundred MB more or less can make a difference of more than 100%.

Ralf
Jay A. Kreibich
2008-04-14 15:56:06 UTC
Post by Ralf Junker
Post by Dan
Post by Ralf Junker
* OPTION 1: PRAGMA cache_size = 10000000;
Advantage: SQLite will use ample memory, but no more than that.
Disadvantage: Difficulty to establish exact memory requirements in
advance. The help states that "Each page uses about 1.5K of
memory.", but I found this to be wrong. Memory usage obviously
depends on the page size, and my measurement shows that there is an
additional small overhead of undocumented size. Is there a formula
to calculate the required memory for a cache_size of x?
I'd be curious if you know an answer to this, too?
Given that the default page size is 1K, it seems like the statement
"Each page uses about 1.5K of memory" would account for most of the
overhead.
Post by Ralf Junker
Post by Dan
Post by Ralf Junker
* OPTION 2: sqlite3_soft_heap_limit(max_mem_bytes);
Disadvantage: My tests indicate that SQLite slows down drastically
when it hits the memory limit. Inserts drop from a few hundred per
second to just one or two per sec.
That is an odd result. How did you test it?
I set up a high cache_size and a lower sqlite3_soft_heap_limit() and
started inserting blobs.
Be aware that the default setting for this is "unlimited", so any
value set is only going to reduce the amount of available memory.
I understand you might try to do this to keep the whole memory image
in RAM, but as long as the page cache is something realistic, I'm not
sure any additional limits are advisable.

Also, I'm not sure I would consider this test valid. If the system
runs up against the soft heap limit, it is going to recover memory from
the page cache. Setting up a large page cache and a low soft heap
limit is going to cause the memory allocator and page cache to fight
with each other.

Things may really go south in this specific case, as the first pages
the recovery system is going to go after are those that don't require
a sync, i.e. those pages that have been read but not written to.
That would include most of the internal B-Tree index pages that are
providing your performance gain for the indexed inserts.


===================================================
Post by Ralf Junker
Post by Roger Binns
Are you using a 32-bit or 64-bit process?
32, but it does not matter to the problem.
When you give examples like "PRAGMA cache_size = 10000000;", or 10M
pages which would take something on the order of 15 GB of memory to
service with the default 1K page size, a few of us start to wonder.

Not only is that impossible with a 32-bit system, if your database is
only 6GB, that cache setting is much larger than the database itself,
which only has about six million pages.
Post by Ralf Junker
Post by Roger Binns
Also is there a requirement to create the database in the filesystem?
Yes.
Post by Roger Binns
If not, you could ensure your swap is sufficiently large (I use a
minimum of 16GB on my machines :-), create the database in a tmpfs
filesystem, and then copy it to persistent storage when you are done.
The aim is to avoid slow swap memory but use fast RAM only.
Yes, but that's impossible with a 6GB database (and a 32bit system).
Some of it is going to sit on disk. The big advantage of using tmpfs
is that it is linked directly to the operating system, so it will use
as much RAM as possible (but never more) and require very little tuning.
Post by Ralf Junker
Post by Roger Binns
I am curious why you think memory is the bottleneck anyway!
It has often been pointed out on this list that inserts into
indexed tables (regular or FTS) run faster with a high page cache.
My own tests second this. A few hundred MB more or less can make a
difference of more than 100%.
Given that the default page cache is 2000 pages, or on the order of
3MB, it seems that you're hitting some serious limits. If hundreds
of megabytes (!) is giving you a return on the order of 2x, then there
is no magic bullet -- you aren't going to find a setting that suddenly
gives you a 10x speedup. You're hitting diminishing returns in a
serious kind of way.

Personally, I'd pick a number, like half your RAM size or ~1.5GB*
(whichever is smaller), set the page cache, and be done with it.
It sounds like you've already found most of the other PRAGMAs that
are going to get you something. You might be able to tune the size
of your INSERT transactions, but if you're around 1000 or so, going
higher isn't likely to buy you too much.

* This upper limit is OS-dependent. I'd use 1.5GB on Windows and older
Linux systems, 2.5GB for some of the more modern Linux systems, 3.5GB
for Mac OS X.
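
A sketch of that approach, turning a byte budget into a cache_size value with the "about 1.5K per page" figure quoted earlier; the budget itself (half of RAM, or one of the OS-specific ceilings above) is the application's choice:

  #include "sqlite3.h"

  /* Convert a memory budget in bytes into a cache_size setting and
   * apply it. 1536 bytes per page follows the "about 1.5K of memory"
   * figure and assumes the default 1 KB page size. */
  static void set_cache_budget(sqlite3 *db, long long budget_bytes)
  {
      long long pages = budget_bytes / 1536;
      char sql[64];
      sqlite3_snprintf((int)sizeof(sql), sql, "PRAGMA cache_size=%lld;", pages);
      sqlite3_exec(db, sql, 0, 0, 0);
  }

  /* e.g. set_cache_budget(db, 1536LL * 1024 * 1024);  -- roughly the
   * 1.5 GB ceiling suggested above for 32-bit Windows processes */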

Like all things performance tuning, unless you have a really good
idea of how the system is working, you're just shooting in the dark.

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
- "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
Ralf Junker
2008-04-24 18:28:08 UTC
Post by Jay A. Kreibich
Post by Ralf Junker
Post by Roger Binns
Are you using a 32-bit or 64-bit process?
32, but it does not matter to the problem.
When you give examples like "PRAGMA cache_size = 10000000;", or 10M
pages which would take something on the order of 15 GB of memory to
service with the default 1K page size, a few of us start to wonder.
Good point, but I believe you misunderstood my intention. I was not interested in absolute but in relative numbers, regardless of a 32- or 64-bit system. All I need to know relates to the amount of RAM available when the application starts. The aim is to use as much RAM as possible, but never more than is reasonably available. The last bit is the difficult one.
Post by Jay A. Kreibich
Post by Ralf Junker
Post by Roger Binns
I am curious why you think memory is the bottleneck anyway!
It has often been pointed out on this list that inserts into
indexed tables (regular or FTS) run faster with a high page cache.
My own tests second this. A few hundred MB more or less can make a
difference of more than 100%.
Given that the default page cache is 2000 pages, or on the order of
3MB, it seems that you're hitting some serious limits. If hundreds
of megabytes (!) is giving you a return on the order of 2x, then there
is no magic bullet-- you aren't going to find a setting that suddenly
gives you a 10x speedup. You're hitting diminishing returns in a
serious kind of way.
Some numbers: a 32-bit app inserting 6 million records into each of two tables (text, FTS, and blob). The database finally grows to over 6 GB in size. As the last step, a simple index is created on one text field.

With the default cache size of 2000 pages (1 KB page size), this takes about a full day or more. Raising the page cache to some 180000 pages uses about 270 MB of memory but brings the time down to less than one hour.

My testing shows that inserts which cause lots of random disk seeks (indexes, FTS) benefit hugely from a large cache size, for the simple reason that it reduces disk IO.
Post by Jay A. Kreibich
Personally, I'd pick a number, like half your RAM size or ~1.5GB*
(whichever is smaller), set the page cache, and be done with it.
That's what I ended up doing. In addition, I regularly check sqlite3_memory_used() and reduce the page cache if it exceeds the limit. Then I call sqlite3_release_memory() repeatedly until the memory usage has dropped sufficiently.
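
In rough outline, that check might look like the sketch below; the budget and the shrunken cache_size value are application-specific choices, not values taken from SQLite:

  #include "sqlite3.h"

  /* If SQLite's own accounting says usage is over budget, lower the
   * cache ceiling and hand memory back until usage drops or nothing
   * more can be freed. mem_limit is the budget chosen at startup. */
  static void enforce_budget(sqlite3 *db, sqlite3_int64 mem_limit)
  {
      if (sqlite3_memory_used() <= mem_limit) return;

      /* Keep the cache from growing straight back (example value). */
      sqlite3_exec(db, "PRAGMA cache_size=20000;", 0, 0, 0);

      while (sqlite3_memory_used() > mem_limit) {
          if (sqlite3_release_memory(1024 * 1024) == 0)
              break;    /* SQLite has nothing left it is willing to free */
      }
  }
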
Post by Jay A. Kreibich
It sounds like you've already found most of the other PRAGMAs that
are going to get you something. You might be able to tune the size
of your INSERT transactions, but if you're around 1000 or so, going
higher isn't likely to buy you too much.
Currently I use just a single transaction for all inserts into a newly created database. This reduces the cache flushes to a single one at the end, when all data has been inserted and the used memory is freed anyway.
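
A stripped-down illustration of that pattern; the table and the bound value are made up, and error handling is omitted -- the point is only the single BEGIN/COMMIT around all inserts:

  #include "sqlite3.h"

  /* One transaction around the whole load. Table name and the bound
   * value are hypothetical; error handling omitted for brevity. */
  static void bulk_insert(sqlite3 *db, int n_records)
  {
      sqlite3_stmt *stmt;
      int i;

      sqlite3_exec(db, "BEGIN;", 0, 0, 0);
      sqlite3_prepare_v2(db, "INSERT INTO docs(body) VALUES(?)", -1, &stmt, 0);

      for (i = 0; i < n_records; i++) {
          sqlite3_bind_text(stmt, 1, "example row", -1, SQLITE_STATIC);
          sqlite3_step(stmt);
          sqlite3_reset(stmt);
      }

      sqlite3_finalize(stmt);
      sqlite3_exec(db, "COMMIT;", 0, 0, 0);  /* one flush, at the very end */
  }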

As another optimization I am looking forward to the new journal pragma, which will hopefully let me avoid the journal file altogether.

Thanks for the feedback and my apologies for the late response,

Ralf

Dan
2008-04-14 03:58:36 UTC
Post by Ralf Junker
I need to create a huge database (about 6 GB, more than 6 million
records, blobs, and FTS text) in as little time as possible. Since
memory is the key to speed, I try to use as much memory as is
available. However, there is the danger of running out of memory.
This is where memory usage control comes into play. I can see three options:
* OPTION 1: PRAGMA cache_size = 10000000;
Advantage: SQLite will use ample memory, but no more than that.
Disadvantage: Difficulty to establish exact memory requirements in
advance. The help states that "Each page uses about 1.5K of
memory.", but I found this to be wrong. Memory usage obviously
depends on the page size, and my measurement shows that there is an
additional small overhead of undocumented size. Is there a formula
to calculate the required memory for a cache_size of x?
* OPTION 2: sqlite3_soft_heap_limit(max_mem_bytes);
Advantage: Memory limit can be set to a known value (amount of free
memory as returned from the OS).
Disadvantage: My tests indicate that SQLite slows down drastically
when it hits the memory limit. Inserts drop from a few hundred per
second to just one or two per sec.
That is an odd result. How did you test it? What was the memory
limit? Any chance the machine started using swap space?
Post by Ralf Junker
* OPTION 3: Catch out-of-memory errors and reduce cache_size
accordingly (untested scenario).
Advantage: Use memory up to the least bits available.
Disadvantage: How to avoid data loss after the out-of-memory error.
Can I just call sqlite3_release_memory(some_bytes) and sqlite3_step
again and again until it passes without the out-of-memory error?
* Do sqlite3_soft_heap_limit(), or "PRAGMA cache_size=x;", or both
establish SQLite's upper memory limit? Do they work independently
of each other, i.e. does the lower limit always kick in first?
Both limits can be used simultaneously. The cache_size limit is
per database cache; soft_heap_limit() sets a global parameter
that governs all SQLite connections opened by the process.
Post by Ralf Junker
* Does "PRAGMA cache_size=some_smaller_value;" cause SQLite to free
used pages and release their memory straight away?
No. If the cache_size parameter is set to a value that
is less than the number of pages currently allocated for the
cache, no more pages will be allocated. But no existing
pages will be freed.
Post by Ralf Junker
* Is there another runtime -- important! -- setting to establish a
maximum memory limit, possibly undocumented?
There is the SQLITE_MEMORY_SIZE option. But that's not really
useful for the very large memory limits you're talking about.
So soft_heap_limit() and pragma cache_size are it.
Post by Ralf Junker
* How to keep SQLite's memory usage as close as possible to, but not
exceeding, the memory available to the application?
It's not really that simple. On a workstation, not all memory is
equal. The maximum amount of memory available to an application
is all of the RAM + all of the swap space. Best performance
probably comes by using up all of the RAM and never using the swap.

Realistically, you should probably just set a large cache_size as
in option 1. Does SQLite really run faster with 1GB available than
it would with 100MB?

Dan.
Ralf Junker
2008-04-14 07:51:40 UTC
Dan,

many thanks for the quick response and detailed answers. However, a question or two still puzzle me.
Post by Dan
Post by Ralf Junker
* OPTION 1: PRAGMA cache_size = 10000000;
Advantage: SQLite will use ample memory, but no more than that.
Disadvantage: Difficulty to establish exact memory requirements in
advance. The help states that "Each page uses about 1.5K of
memory.", but I found this to be wrong. Memory usage obviously
depends on the page size, and my measurement shows that there is an
additional small overhead of undocumented size. Is there a formula
to calculate the required memory for a cache_size of x?
I'd be curious if you know an answer to this, too?
Post by Dan
Post by Ralf Junker
* OPTION 2: sqlite3_soft_heap_limit(max_mem_bytes);
Disadvantage: My tests indicate that SQLite slows down drastically
when it hits the memory limit. Inserts drop from a few hundred per
second to just one or two per sec.
That is an odd result. How did you test it?
I set up a high cache_size and a lower sqlite3_soft_heap_limit() and started inserting blobs.
Post by Dan
What was the memory limit? Any chance the machine started using swap space?
I will test again and let you know.
Post by Dan
Post by Ralf Junker
* Does "PRAGMA cache_size=some_smaller_value;" cause SQLite to free
used pages and release their memory straight away?
No. If the cache_size parameter is set to a value that
is less than the number of pages currently allocated for the
cache, no more pages will be allocated. But no existing
pages will be freed.
Good to know. So I would reduce the cache_size and then use sqlite3_release_memory() to free memory, right?

Maybe this is worth documenting?
Post by Dan
Does SQLite really run faster with 1GB available than it would with 100MB?
Yes. If there are indexes (regular or FTS) on the table, SQLite needs quick access to lots of pages for searching and rearranging b-tree entries. My timings show that 100MB or 500MB can sometimes make a difference of more than 100%.

Richard recently talked about upcoming indexing performance improvements. I wonder if they are part of the performance refactoring due with the next release? :-)

Ralf