Discussion:
Why is VACUUM so slow?
Nemanja Corlija
2006-11-18 14:35:52 UTC
I have a db with one table that has a text primary key and 16 text
columns in total.
After importing data from a CSV file, the db had 5M rows and the file
size was 833MB. After some big DELETEs the db had around 3M rows, and
500MB after "VACUUMing".
Running VACUUM for more than an hour filled the new db with ~300MB worth
of data. I then aborted that and did something like this:

PRAGMA page_size=4096;
PRAGMA synchronous=OFF;
PRAGMA cache_size=100000;
PRAGMA temp_store=MEMORY;
CREATE TABLE tbl1(same structure as in original db);
ATTACH "original.db3" AS old;
INSERT INTO main.tbl1 SELECT * FROM old.tbl1;

This finished in less than 15 minutes. The timings are obviously very
rough here, but the difference is clear.

Except for page_size, VACUUM had the same PRAGMAs applied.

Isn't what I did manually above very similar to what VACUUM does
behind the scenes?
If so, why is there such a big difference?

One observation, though: while VACUUM seemed to be completely I/O
bound, the INSERT seemed to be CPU bound, which is not surprising given
that it was redoing the uniqueness check for the PRIMARY KEY. I
guess VACUUM doesn't have to do that, since it's copying from an existing
db that already had uniqueness enforced.

This was on Win2000, SQLite 3.3.7 via sqlite3.exe.

Has anyone experienced a similar performance difference?
--
Nemanja Corlija <chorlya-***@public.gmane.org>

Mark Richards
2006-11-18 14:52:11 UTC
Post by Nemanja Corlija
I have a db with one table that has a text primary key and 16 text
columns in total.
After importing data from a CSV file, the db had 5M rows and the file
size was 833MB. After some big DELETEs the db had around 3M rows, and
500MB after "VACUUMing".
Running VACUUM for more than an hour filled the new db with ~300MB worth
PRAGMA page_size=4096;
PRAGMA synchronous=OFF;
PRAGMA cache_size=100000;
PRAGMA temp_store=MEMORY;
CREATE TABLE tbl1(same structure as in original db);
ATTACH "original.db3" AS old;
INSERT INTO main.tbl1 SELECT * FROM old.tbl1;
This finished in less than 15 minutes. The timings are obviously very
rough here, but the difference is clear.
Except for page_size, VACUUM had the same PRAGMAs applied.
Isn't what I did manually above very similar to what VACUUM does
behind the scenes?
If so, why is there such a big difference?
One observation, though: while VACUUM seemed to be completely I/O
bound, the INSERT seemed to be CPU bound, which is not surprising given
that it was redoing the uniqueness check for the PRIMARY KEY. I
guess VACUUM doesn't have to do that, since it's copying from an existing
db that already had uniqueness enforced.
This was on Win2000, SQLite 3.3.7 via sqlite3.exe.
Has anyone experienced a similar performance difference?
Yes. I had to build some code to export the data, rebuild the database,
and import. This wasn't a big deal to run on a daily basis, and it spared
me the horror of losing the database entirely, which, on the embedded
platform it runs on, could not be recovered in situ due to memory
constraints.

In the end, switching to sqlite3 and using the auto_vacuum pragma
resolved the problem.
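For reference, a minimal sketch of that setup (placeholder schema; auto_vacuum
has to be enabled while the database is still empty, before the first table is
created):

PRAGMA auto_vacuum = 1;   -- 1 = full auto-vacuum; only takes effect on a new, empty db
CREATE TABLE readings(id INTEGER PRIMARY KEY, value TEXT);   -- placeholder table
-- Pages freed by DELETEs are now moved to the end of the file and the file is
-- truncated at commit, so no separate VACUUM pass is needed to shrink it
-- (this shrinks the file but does not defragment it).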

/mark richards
--
"I'm convinced that the universe has been created by someone
with a particularly vile sense of humor. I would like to
propose a theory of "intelligent malicious asshole design."
- anon

P Kishor
2006-11-18 14:53:51 UTC
didn't try any of your tricks, but can confirm that VACUUM is very
slow on a similar db I have...

table1 -- 210k rows x 6 cols, 4 indexes, 1 pk
table2 -- 36k rows x 6 cols, 4 indexes, 1 pk
table3 -- 16k rows x 6 cols, 4 indexes, 1 pk
table4 -- 5M rows x 4 cols, 2 indexes, 1 pk

Total size on disk: 1.95 GB.

VACUUM takes over an hour on a quad 3 GHz Xeon, 4 GB RAM, Win
XP, SQLite 3.3.7.
Post by Nemanja Corlija
I have a db with one table that has a text primary key and 16 text
columns in total.
After importing data from a CSV file, the db had 5M rows and the file
size was 833MB. After some big DELETEs the db had around 3M rows, and
500MB after "VACUUMing".
Running VACUUM for more than an hour filled the new db with ~300MB worth
PRAGMA page_size=4096;
PRAGMA synchronous=OFF;
PRAGMA cache_size=100000;
PRAGMA temp_store=MEMORY;
CREATE TABLE tbl1(same structure as in original db);
ATTACH "original.db3" AS old;
INSERT INTO main.tbl1 SELECT * FROM old.tbl1;
This finished in less than 15 minutes. The timings are obviously very
rough here, but the difference is clear.
Except for page_size, VACUUM had the same PRAGMAs applied.
Isn't what I did manually above very similar to what VACUUM does
behind the scenes?
If so, why is there such a big difference?
One observation, though: while VACUUM seemed to be completely I/O
bound, the INSERT seemed to be CPU bound, which is not surprising given
that it was redoing the uniqueness check for the PRIMARY KEY. I
guess VACUUM doesn't have to do that, since it's copying from an existing
db that already had uniqueness enforced.
This was on Win2000, SQLite 3.3.7 via sqlite3.exe.
Has anyone experienced a similar performance difference?
--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation https://edu.osgeo.org/
-------------------------------------------------------------------------------------------------------
collaborate, communicate, compete
====================================================

Nemanja Corlija
2006-11-18 17:52:51 UTC
Post by P Kishor
didn't try any of your tricks, but can confirm that VACUUM is very
slow on a similar db I have...
Since you obviously have some CPU cycles and RAM to spare, in my
experience at least, you'll benefit greatly by doing the copy yourself
instead of letting VACUUM do all the work.

If you do try it, it would be very interesting to hear about your experiences.

I should say that I have no indexes in my db other than the primary key.
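For what it's worth, the rough shape of the manual rebuild looks like this
(untested sketch; table, index and column names are placeholders for your own
schema, and the indexes are deliberately created only after the copy):

PRAGMA page_size=4096;
PRAGMA synchronous=OFF;
PRAGMA cache_size=100000;
PRAGMA temp_store=MEMORY;
ATTACH 'original.db3' AS old;
CREATE TABLE table1(same columns and PRIMARY KEY as in old.table1);
INSERT INTO main.table1 SELECT * FROM old.table1;
CREATE INDEX table1_idx1 ON table1(col1);   -- rebuild each index in one pass
-- repeat CREATE TABLE / INSERT / CREATE INDEX for the remaining tables
DETACH old;

The idea is the same copy that VACUUM performs, except that the destination
indexes are built in one pass at the end instead of being updated row by row.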
--
Nemanja Corlija <chorlya-***@public.gmane.org>

Joe Wilson
2006-11-18 17:59:28 UTC
Add a print statement to vacuum.c to see what SQL statements
are actually executed during VACUUM:


diff -u -3 -p -r1.64 vacuum.c
--- src/vacuum.c 10 Oct 2006 13:07:36 -0000 1.64
+++ src/vacuum.c 18 Nov 2006 17:18:07 -0000
@@ -26,6 +26,7 @@
 */
 static int execSql(sqlite3 *db, const char *zSql){
   sqlite3_stmt *pStmt;
+  printf("%s\n", zSql);
   if( SQLITE_OK!=sqlite3_prepare(db, zSql, -1, &pStmt, 0) ){
     return sqlite3_errcode(db);
   }
$ ./sqlite3.exe v.db vacuum
ATTACH 'C:\TMP\etilqs_SOVEJE7Rni84Zzy' AS vacuum_db;
PRAGMA vacuum_db.synchronous=OFF
BEGIN EXCLUSIVE;
CREATE TABLE vacuum_db.t1(a, b, primary key(b, a))
CREATE TABLE vacuum_db.t2(c, d)
CREATE INDEX vacuum_db.t2i on t2(d, c)
INSERT INTO vacuum_db.'t1' SELECT * FROM 't1';
INSERT INTO vacuum_db.'t2' SELECT * FROM 't2';
INSERT INTO vacuum_db.sqlite_master SELECT type, name, tbl_name, rootpage, sql FROM sqlite_master WHERE type='view' OR type='trigger' OR (type='table' AND rootpage=0)


----- Original Message ----
From: Nemanja Corlija <***@gmail.com>
To: sqlite-users-CzDROfG0BjIdnm+***@public.gmane.org
Sent: Saturday, November 18, 2006 9:35:52 AM
Subject: [sqlite] Why is VACUUM so slow?

I have a db with one table that has a text primary key and 16 text
columns in total.
After importing data from a CSV file, the db had 5M rows and the file
size was 833MB. After some big DELETEs the db had around 3M rows, and
500MB after "VACUUMing".
Running VACUUM for more than an hour filled the new db with ~300MB worth
of data. I then aborted that and did something like this:

PRAGMA page_size=4096;
PRAGMA synchronous=OFF;
PRAGMA cache_size=100000;
PRAGMA temp_store=MEMORY;
CREATE TABLE tbl1(same structure as in original db);
ATTACH "original.db3" AS old;
INSERT INTO main.tbl1 SELECT * FROM old.tbl1;

This finished in less than 15 minutes. The timings are obviously very
rough here, but the difference is clear.

Except for page_size, VACUUM had the same PRAGMAs applied.

Isn't what I did manually above very similar to what VACUUM does
behind the scenes?
If so, why is there such a big difference?

One observation, though: while VACUUM seemed to be completely I/O
bound, the INSERT seemed to be CPU bound, which is not surprising given
that it was redoing the uniqueness check for the PRIMARY KEY. I
guess VACUUM doesn't have to do that, since it's copying from an existing
db that already had uniqueness enforced.

This was on Win2000, SQLite 3.3.7 via sqlite3.exe.

Has anyone experienced a similar performance difference?
--
Nemanja Corlija <chorlya-***@public.gmane.org>

Joe Wilson
2006-11-18 21:25:11 UTC
$ ./sqlite3.exe v.db vacuum
ATTACH 'C:\TMP\etilqs_SOVEJE7Rni84Zzy' AS vacuum_db;
PRAGMA vacuum_db.synchronous=OFF
BEGIN EXCLUSIVE;
CREATE TABLE vacuum_db.t1(a, b, primary key(b, a))
CREATE TABLE vacuum_db.t2(c, d)
CREATE INDEX vacuum_db.t2i on t2(d, c)
INSERT INTO vacuum_db.'t1' SELECT * FROM 't1';
INSERT INTO vacuum_db.'t2' SELECT * FROM 't2';
INSERT INTO vacuum_db.sqlite_master SELECT type, name, tbl_name, rootpage, sql FROM sqlite_master WHERE type='view' OR type='trigger' OR (type='table' AND rootpage=0)



Strange, I thought INDEXes were created after the INSERTs in a VACUUM.

If indexes were created after the INSERTs, perhaps the resulting index pages
in the newly vacuumed database would be closer together, and queries would run
quicker when the disk cache is cold.

Is this untested patch worth a try?



Index: src/vacuum.c
===================================================================
RCS file: /sqlite/sqlite/src/vacuum.c,v
retrieving revision 1.65
diff -u -3 -p -r1.65 vacuum.c
--- src/vacuum.c 18 Nov 2006 20:20:22 -0000 1.65
+++ src/vacuum.c 18 Nov 2006 21:14:14 -0000
@@ -143,14 +143,6 @@ int sqlite3RunVacuum(char **pzErrMsg, sq
     " AND rootpage>0"
   );
   if( rc!=SQLITE_OK ) goto end_of_vacuum;
-  rc = execExecSql(db,
-      "SELECT 'CREATE INDEX vacuum_db.' || substr(sql,14,100000000)"
-      " FROM sqlite_master WHERE sql LIKE 'CREATE INDEX %' ");
-  if( rc!=SQLITE_OK ) goto end_of_vacuum;
-  rc = execExecSql(db,
-      "SELECT 'CREATE UNIQUE INDEX vacuum_db.' || substr(sql,21,100000000) "
-      " FROM sqlite_master WHERE sql LIKE 'CREATE UNIQUE INDEX %'");
-  if( rc!=SQLITE_OK ) goto end_of_vacuum;
 
   /* Loop through the tables in the main database. For each, do
   ** an "INSERT INTO vacuum_db.xxx SELECT * FROM xxx;" to copy
@@ -166,6 +158,18 @@ int sqlite3RunVacuum(char **pzErrMsg, sq
   );
   if( rc!=SQLITE_OK ) goto end_of_vacuum;
 
+  /* Create indexes after the INSERTs to keep their pages closer
+  ** together.
+  */
+  rc = execExecSql(db,
+      "SELECT 'CREATE INDEX vacuum_db.' || substr(sql,14,100000000)"
+      " FROM sqlite_master WHERE sql LIKE 'CREATE INDEX %' ");
+  if( rc!=SQLITE_OK ) goto end_of_vacuum;
+  rc = execExecSql(db,
+      "SELECT 'CREATE UNIQUE INDEX vacuum_db.' || substr(sql,21,100000000) "
+      " FROM sqlite_master WHERE sql LIKE 'CREATE UNIQUE INDEX %'");
+  if( rc!=SQLITE_OK ) goto end_of_vacuum;
+
   /* Copy over the sequence table
   */
   rc = execExecSql(db,

Joe Wilson
2006-11-19 18:44:07 UTC
Post by P Kishor
didn't try any of your tricks, but can confirm that VACUUM is very
slow on a similar db I have...
table1 -- 210k rows x 6 cols, 4 indexes, 1 pk
table2 -- 36k rows x 6 cols, 4 indexes, 1 pk
table3 -- 16k rows x 6 cols, 4 indexes, 1 pk
table4 -- 5M rows x 4 cols, 2 indexes, 1 pk
total size on file 1.95 Gb
VACUUM takes an hour+ on a 3 GHz x 4 Xeon (quad Xeon), 4 Gb RAM, Win
XP, SQLite 3.3.7
I'd be curious to learn whether the timings of VACUUM on this database are any
different from the following:


sqlite3 your.db .dump | sqlite3 new.db


.dump outputs CREATE INDEX statements after the table INSERT statements,
while VACUUM issues the CREATE INDEX statements before the table INSERT
statements. Sure, there's SQL statement interpretation overhead with the .dump
technique, but some time might be saved by creating the indexes one by one after
the tables are populated (and the resulting index pages should also be contiguous).
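
Schematically, the difference in ordering looks like this (using the toy t2/t2i
schema from the trace earlier in the thread; not actual output):

-- .dump ordering: table data first, index built in one pass at the end
CREATE TABLE t2(c, d);
INSERT INTO "t2" VALUES(...);        -- one INSERT per row
CREATE INDEX t2i on t2(d, c);

-- VACUUM (3.3.7) ordering: empty index created first, then updated
-- row by row while the table is copied
CREATE TABLE vacuum_db.t2(c, d);
CREATE INDEX vacuum_db.t2i on t2(d, c);
INSERT INTO vacuum_db.'t2' SELECT * FROM 't2';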

Note: the above command will only work with SQLite version 3.3.7 or earlier,
because 3.3.8 does not .dump CREATE INDEX statements:
http://www.sqlite.org/cvstrac/tktview?tn=2072

Joe Wilson
2006-11-19 20:34:51 UTC
I forgot that the SQLite page_size setting is lost during a .dump.
The command should actually be:


(echo "PRAGMA page_size=8192;" ; sqlite3 your.db .dump) | sqlite3 new.db


On my machine, with a large database, this runs 4 times faster than the command
below, which uses the default page_size of 1024. Replace 8192 with the desired page size.
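
The other PRAGMAs from earlier in the thread could presumably be prepended the
same way, as extra echo lines in front of the .dump (untested; they only affect
the connection that builds new.db):

PRAGMA synchronous=OFF;     -- skip fsyncs during the rebuild; delete new.db and retry if it crashes
PRAGMA cache_size=100000;   -- larger page cache while importing
PRAGMA temp_store=MEMORY;   -- keep temporary structures in RAM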


----- Original Message ----
I'd be curious to learn if the timings of VACUUM on this database are any different
than the following:


sqlite3 your.db .dump | sqlite3 new.db







