Discussion:
[sqlite] Performance of writing blobs
Dominique Devienne
7 years ago
I'm surprised by SQLite's commit time when writing blobs is involved.
Can anybody shed light on this? Below is a description of what I do,
with the results.

I'm exporting data into SQLite, spread across several tables.
I process only about 240,000 rows, and write around 1GB in 20,000 blobs.

I process the data in two passes:
1) write the rows, recording the rowid when a row has a blob, and binding
zeroblob(N) during the insert to leave room for Incremental Blob I/O in
the 2nd pass (sketched below).
2) process the blobs only
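
In API terms, pass 1 looks roughly like this (a simplified sketch in the same
spirit as the snippet further down: error handling removed; db, name and
length assumed to be in scope; table/column names made up):

// Insert the row with a zero-filled placeholder for the blob, then
// remember its rowid so the blob can be written in the 2nd pass.
sqlite3_stmt* stmt = nullptr;
sqlite3_prepare_v2(db, "INSERT INTO entity(name, data) VALUES (?, ?)",
                   -1, &stmt, nullptr);
sqlite3_bind_text(stmt, 1, name, -1, SQLITE_TRANSIENT);
sqlite3_bind_zeroblob(stmt, 2, static_cast<int>(length)); // reserves N bytes
sqlite3_step(stmt);
sqlite3_finalize(stmt);
sqlite3_int64 rowid = sqlite3_last_insert_rowid(db); // recorded for pass 2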

Then I compare this with writing the blobs to HDF5 instead: rather than
writing the blob itself in-row, I just insert its size into SQLite.

(HDF5 is a "filesystem in a file", non-transactional, with "files" being
N-dimensional typed arrays, thus a 1D array of bytes approximates a "file")

Results writing both rows and blobs in SQLite:
Write ROWs = 6.968s (18.7%) (includes the zeroblob(N))
Write BLOBs = 2.764s ( 7.4%)
Commit TX = 10.566s (28.4%)

The result is a single 1,110,253,568-byte SQLite DB file.

Now writing only rows to SQLite and blobs to HDF5:
Write ROWs = 4.265s (16.6%)
Write BLOBs = 4.010s (15.6%)
Commit TX = 0.195s ( 0.8%) (includes flushing the HDF5 file too)

The result is two files: a 64,475,136-byte SQLite DB and a 1,039,652,416-byte
HDF5 file.

SQLite (version 3.19.3) is used in journal mode, and the DB is written "from
scratch", so there's not much to put in the journal, AFAIK. My assumption
was that after the zeroblob(N), there was enough room in the main DB's
pages, such that the subsequent blob open+write+close did not need to
generate any "page churn" (i.e. journal activity) and could write directly
to the pages created on the initial insert. But what surprised me was the
commit time jumping to 10s, i.e. 50x more than when not writing the blobs.

After the initial zeroblob(N) + sqlite3_last_insert_rowid(), which is part of
"Write ROWs", the code writing the blobs is below (simplified, error handling
removed).

// Open a blob handle on the zeroblob() placeholder written in pass 1,
// overwrite it in one go, then close the handle.
sqlite3_blob* p_blob = nullptr;
sqlite3_blob_open(db, "main", tab, col, rowid, /*read-write*/1, &p_blob);
assert(sqlite3_blob_bytes(p_blob) == length); // placeholder reserved exactly N bytes
sqlite3_blob_write(p_blob, buffer, static_cast<int>(length), /*offset*/0);
sqlite3_blob_close(p_blob);

So is the code above the right way to write blobs?

I don't yet have the buffers to write the blobs directly at-row-insert-time,
for reasons beyond this particular problem. Please assume this is by-design.

I can understand that 1GB of I/O takes more time than 64MB, and that's
reflected in the 4.2s writing just the rows versus the 7s + 2.7s = 9.7s when
writing both rows and blobs, but why does the commit time jump from 0.2s to
10.5s? Thanks for any insights on this.

Thanks, --DD
Clemens Ladisch
7 years ago
Post by Dominique Devienne
My assumption
was that after the zeroblob(N), there was enough room in the main DBs
pages, such that the subsequent blob open+write+close did not need to
generate any "page churn" (i.e. journal activity) and could write directly
to the pages created on initial insert.
It does write to the same pages, but those pages must be copied to the
rollback journal so that they can be restored if the transaction is
rolled back. (Or are the two passes inside the same transaction?)

Don't insert the zero blobs in the first pass; this still rewrites all
rows, but there is less data that might need to be rolled back.

Consider moving the blobs to a separate table.
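
For example, such a table could be as simple as this (just a sketch; the
table and column names are illustrative):

// A table that holds only the blob values, keyed by which
// (table, column, row) each blob belongs to.
sqlite3_exec(db,
    "CREATE TABLE blobs ("
    "  tab TEXT, col TEXT, row INTEGER, val BLOB,"
    "  PRIMARY KEY (tab, col, row))",
    nullptr, nullptr, nullptr);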


Regards,
Clemens
Dominique Devienne
7 years ago
...
They are part of the same transaction. So the no-rewrite-of-pages argument
should still apply, no?
Post by Clemens Ladisch
Don't insert the zero blobs in the first pass; this still rewrites all
rows, but there is less data that might need to be rolled back.
Consider moving the blobs to a separate table.
Is that really/still relevant, considering the above? Thanks, --DD
Clemens Ladisch
7 years ago
Post by Dominique Devienne
Post by Clemens Ladisch
It does write to the same pages, but those pages must be copied to the
rollback journal so that they can be restored if the transaction is
rolled back. (Or are the two passes inside the same transaction?)
They are part of the same transaction. So the no-rewrite-of-pages argument
should still apply, no?
Indeed. I'd guess that just writing all those zero-filled pages is too
much overhead.

And the incremental blob I/O functions are intended for accessing parts of a
blob; when you write the entire value at once, just use SQL.
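
I.e. something along these lines (a sketch only, reusing the names from your
snippet and an illustrative (tab, col, row, val) blob table):

// Write the whole value with one INSERT instead of zeroblob() + blob handle.
sqlite3_stmt* stmt = nullptr;
sqlite3_prepare_v2(db, "INSERT INTO blobs(tab, col, row, val) VALUES (?,?,?,?)",
                   -1, &stmt, nullptr);
sqlite3_bind_text (stmt, 1, tab, -1, SQLITE_STATIC);
sqlite3_bind_text (stmt, 2, col, -1, SQLITE_STATIC);
sqlite3_bind_int64(stmt, 3, rowid);
sqlite3_bind_blob (stmt, 4, buffer, static_cast<int>(length), SQLITE_STATIC);
sqlite3_step(stmt);
sqlite3_finalize(stmt);
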
Post by Dominique Devienne
Post by Clemens Ladisch
Don't insert the zero blobs in the first pass; this still rewrites all
rows, but there is less data that might need to be rolled back.
Consider moving the blobs to a separate table.
Is that really/still relevant, considering the above?
Well, there's only one way to find out ... ;-)


Regards,
Clemens
Dominique Devienne
7 years ago
...
OK! Output below should be self-explanatory.

Blobs are still saved in a 2nd pass, still in the same transaction as the 1st
pass inserting the rows, but now the blobs are inserted directly
(sqlite3_bind_blob) into a separate (tab, col, row, val) "blob table",
as suggested. Still in JOURNAL mode.

Bottom line is that not calling zeroblob(N) helps, but the commit time (in
Epilog) is still around 10s :(

I guess my mental model of JOURNAL mode and heavy BLOB insert is not
correct...

In WAL mode, new data goes to the WAL file, and is then checkpointed later
into the DB file.
In JOURNAL mode, new data goes to the DB file directly, and modified pages go
to the JOURNAL file.
And since this is INSERT-only, from empty tables, I assumed the pages copied
to the JOURNAL file should be minimal.

So what's going on during the commit that takes longer than writing all the
row *AND* blob data combined?

That commit time represents 30% of my ETL, and that 1GB project export is a
"small" one...

FWIW, here's a rough "histogram" of blob sizes and max size (1/2 MB) for
this small project:
Read 20771 LOBs; min-max: 3-508032 bytes; 1333 <1KB; 3625 <8KB; 10599
<64KB; 5214 <512KB;

D:\>... --export-sqlite GL-blobs-inline.db ... --blobs-as-hdf5=false
--blobs-inline=true
...
Total = 37.586s
...
Write ROWs = 7.068s (18.8%)
Write BLOBs = 3.016s ( 8.0%)
Epilog = 10.637s (28.3%)

D:\>... --export-sqlite GL-blobs-outofline.db ... --blobs-as-hdf5=false
--blobs-inline=false
...
Total = 34.609s
...
Write ROWs = 4.358s (12.6%)
Write BLOBs = 3.130s ( 9.0%)
Epilog = 10.221s (29.5%)

D:>... --export-sqlite GL-blobs-ashdf5.db ... --blobs-as-hdf5=true
--blobs-inline=false
...
Total = 25.646s
...
Write ROWs = 4.368s (17.0%)
Write BLOBs = 4.179s (16.3%)
Epilog = 0.461s ( 1.8%)

D:\> dir GL*
...
06/12/2018 10:51 AM 64,475,136 GL-blobs-ashdf5.db
06/12/2018 10:51 AM 1,039,652,416 GL-blobs-ashdf5.h5
06/12/2018 10:53 AM 1,110,253,568 GL-blobs-inline.db
06/12/2018 10:50 AM 1,111,736,320 GL-blobs-outofline.db
Clemens Ladisch
7 years ago
In JOURNAL mode, new data goes to DB file directly, and modified pages go to the JOURNAL file.
And since here this is INSERT-only, from empty tables, I assumed pages copied to the JOURNAL
file should be minimal.
Yes. You can check the journal size with PRAGMA journal_mode = PERSIST.
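
I.e. something like this before the run (with PERSIST the journal file is not
deleted at commit, only its header is zeroed, so you can look at the size of
the *-journal file on disk afterwards):

// Keep the rollback journal file on disk after commit so its size can be
// inspected once the export has finished.
sqlite3_exec(db, "PRAGMA journal_mode = PERSIST", nullptr, nullptr, nullptr);
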
... --blobs-as-hdf5=false --blobs-inline=false
Write ROWs = 4.358s (12.6%)
Write BLOBs = 3.130s ( 9.0%)
Epilog = 10.221s (29.5%)
... --blobs-as-hdf5=true --blobs-inline=false
Write ROWs = 4.368s (17.0%)
Write BLOBs = 4.179s (16.3%)
Epilog = 0.461s ( 1.8%)
This looks as if HDF5 does not do fsync().

Compare with PRAGMA synchronous = OFF, or run
https://docs.microsoft.com/en-us/sysinternals/downloads/sync
after you've finished with the HDF5 file.
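
For the SQLite side of the comparison, the quick test is something like:

// Tell SQLite not to fsync() at all: unsafe on power loss or OS crash,
// but comparable to a write path that never syncs.
sqlite3_exec(db, "PRAGMA synchronous = OFF", nullptr, nullptr, nullptr);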


Regards,
Clemens
Dominique Devienne
7 years ago
...
You're right of course. Thank you Clemens.

With synchronous = OFF, which suits my use-case here, the commit time just
vanishes, and SQLite even out-performs HDF5 now (see below). I might still
prefer HDF5, mainly because the N-d typed arrays are self-describing and thus
viewable with Hdf5View for example, instead of completely opaque in SQLite,
but now I know that SQLite in non-durable mode is as fast as or even faster
than HDF5 for these smallish blob sizes. I'll need to double-check for larger
sizes (individual blobs get upward of 1GB in size), with my large 31GB, 200K
blobs, 1.2M entities project.

So thanks again Clemens for the guidance. Amazing job from SQLite, as
usual. --DD

D:\>... --export-sqlite GL-blobs-outofline-sync-OFF.db ...
--blobs-as-hdf5=false --blobs-inline=false
...
Total = 23.455s
...
Write ROWs = 4.110s (17.5%)
Write BLOBs = 2.822s (12.0%)
Epilog = 0.005s ( 0.0%)

D:\>... --export-sqlite GL-blobs-ashdf5-sync-OFF.db ...
--blobs-as-hdf5=true --blobs-inline=false
...
Total = 25.238s
...
Write ROWs = 4.464s (17.7%)
Write BLOBs = 4.358s (17.3%)
Epilog = 0.020s ( 0.1%)

06/12/2018 01:49 PM 64,475,136 GL-blobs-ashdf5-sync-OFF.db
06/12/2018 01:49 PM 1,039,652,416 GL-blobs-ashdf5-sync-OFF.h5
06/12/2018 01:48 PM 1,111,736,320 GL-blobs-outofline-sync-OFF.db
Eduardo Morras
7 years ago
On Tue, 12 Jun 2018 14:13:33 +0200
...
I'm joining this thread late, but here are some comments; I use sqlite in
environments similar to yours:

- If you need more speed, don't use synchronous=OFF; use pragma
journal_mode=MEMORY or pragma journal_mode=OFF instead. They aren't any
safer, but they are faster,

- As you noted, don't preallocate a zero-blob; insert the blob directly,

- Don't delete/update blobs; instead, mark the row as deprecated/invalid
and insert a new one,

- If you have blobs larger than 2GB you'll need to split them, since sqlite
can't work with blobs larger than 2GB (see the sketch after this list),

- Use the misc/unionvtab extension if you need a safe environment; it's a
bit tricky to use, but you easily get a RAIDB0/JBODB0 (a RAID 0/JBOD-style
DB split across many files) and it's faster,

- Compile sqlite with SQLITE_DIRECT_OVERFLOW_READ; with this, your blob
reads don't pollute the sqlite page cache,

- Compress the blobs with the misc/compress extension while writing; the
less you write, the faster you read. See the sqlar project
(http://sqlite.org/sqlar).
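
Regarding the 2GB limit, a rough sketch of the chunking I mean (hypothetical
chunks table, arbitrary chunk size; buffer, total and blob_id are assumed to
be in scope, with buffer a const char*):

// Hypothetical layout: CREATE TABLE chunks(id INTEGER, seq INTEGER,
//                                          data BLOB, PRIMARY KEY(id, seq));
const size_t kChunk = 256 * 1024 * 1024; // 256MB per chunk, arbitrary choice
sqlite3_stmt* stmt = nullptr;
sqlite3_prepare_v2(db, "INSERT INTO chunks(id, seq, data) VALUES (?,?,?)",
                   -1, &stmt, nullptr);
for (size_t off = 0, seq = 0; off < total; off += kChunk, ++seq) {
    const size_t n = (total - off < kChunk) ? (total - off) : kChunk;
    sqlite3_bind_int64(stmt, 1, blob_id);
    sqlite3_bind_int64(stmt, 2, static_cast<sqlite3_int64>(seq));
    sqlite3_bind_blob64(stmt, 3, buffer + off, n, SQLITE_STATIC);
    sqlite3_step(stmt);
    sqlite3_reset(stmt);
    sqlite3_clear_bindings(stmt);
}
sqlite3_finalize(stmt);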

HTH

--- ---
Eduardo Morras <***@yahoo.es>
