Discussion:
[sqlite] High performance and concurrency
Shevek
2018-03-01 07:24:05 UTC
Hi,

I would like to have truly concurrent access to an sqlite database, that
is, the ability for multiple connections to read from the database
simultaneously. I'm using Java with xerial's sqlite-jdbc, customized to
let me mmap the entire database into RAM, and with additional debugging
symbols for perf. The database is about 30 GB, fully read-only, and the
connections are opened as such.

What I think is happening is that either a pthread mutex or a database
lock is serializing the accesses, so each thread blocks the others.

Queries are taking a few seconds, even with covering indexes, and I have
the RAM bandwidth available, so I'd really like to use it.

Any pointers?

Thank you.

S.
Simon Slavin
2018-03-01 07:45:35 UTC
Post by Shevek
What I think is happening is that either a pthread mutex or a database lock is serializing the accesses, so each thread blocks the others.
What journal mode are you using ?

<https://sqlite.org/pragma.html#pragma_journal_mode>

If it's not WAL, try WAL. If it's currently WAL, try DELETE. Once you've changed it see if this changes how your program behaves.
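
Before switching modes, it helps to confirm which journal mode the database is actually in. The following is a minimal Python sketch using the standard sqlite3 module rather than the Java driver from the original post; the scratch file demo.db, the table t and the helper journal_mode() are made up for illustration.

```python
import os
import sqlite3
import tempfile

def journal_mode(conn):
    """Return the journal mode of the main database as a lowercase string."""
    return conn.execute("PRAGMA journal_mode").fetchone()[0].lower()

# Hypothetical scratch database, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
conn.commit()

before = journal_mode(conn)  # mode of the freshly created database
# Setting the pragma returns the mode that is in effect afterwards.
after = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0].lower()
conn.close()

print(before, "->", after)
```

Note that switching to WAL writes to the database file, so it cannot be done through a connection that was opened read-only.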

Simon.
Shevek
2018-03-01 08:10:02 UTC
Post by Shevek
What I think is happening is that either a pthread mutex or a database lock is serializing the accesses, so each thread blocks the others.
To be specific, what I'm concerned about is the line
sqlite3_mutex_enter(db->mutex) at the top of sqlite3_step(). Since my
queries are spending all their time in sqlite3VdbeExec(), which is
reached through that path, I assume db->mutex is preventing concurrency.

Our main hotspots in the query are sqlite3VdbeExec() and updating the
btree pointer to point to a new page (I forget the call name). We can't
do much about the cost of execution; we've mmap'd everything to avoid
the I/O, we're using covering indexes to help with locality, we've
sorted our query keys to attempt to reduce index page seeks, and now we
want to use concurrency, splitting the logic in our query across
threads, to exploit memory bandwidth.

Now I've traced this again, I'm looking warily at SQLITE_OPEN_NOMUTEX
because we need thread-safety, as in, sqlite's internal data structures
must be handled correctly in the presence of multiple threads or passing
a connection between threads (safely in the JMM); we just don't need
serialization of database reads and writes, because nothing we do has a
serializable side-effect. Is SQLITE_OPEN_NOMUTEX the answer?
Post by Simon Slavin
What journal mode are you using ?
I'm fairly sure journal mode is NONE for our readonly database. Anyway,
readonly shouldn't write to a journal. We have confirmed that the md5sum
of the database file is unchanged during and after the execution of our
application.

S.
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Hick Gunter
2018-03-01 09:24:49 UTC
Use one connection for each thread. Sharing a connection between threads may cause transactions to be larger than each thread thinks.

___________________________________________
Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

Shevek
2018-03-01 19:43:41 UTC
Post by Hick Gunter
Use one connection for each thread. Sharing a connection between threads may cause transactions to be larger than each thread thinks.
Why would I have a transaction of non-zero size on a read-only connection?

It looks from the source as if having bCoreMutex=true and
bFullMutex=false will allow us the concurrency we need. I'm going to try
again in a couple of days.
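
As a rough way to check which threading mode a given SQLite build defaults to, the compile options can be queried. This is a Python sketch, not the Java setup described above. The MODES table pairing each THREADSAFE value with bCoreMutex and bFullMutex reflects a reading of the SQLite source and should be treated as an assumption; the helper name compiled_threading_mode() is made up. An application can still change the mode at run time via sqlite3_config() or per connection via the open flags, so the result is only the compiled default.

```python
import sqlite3

# THREADSAFE value -> (mode name, bCoreMutex default, bFullMutex default).
# Assumed mapping, based on a reading of the SQLite source.
MODES = {
    0: ("single-thread", False, False),
    1: ("serialized", True, True),
    2: ("multi-thread", True, False),
}

def compiled_threading_mode():
    """Return the THREADSAFE compile option as an int, or None if not reported."""
    conn = sqlite3.connect(":memory:")
    try:
        for (opt,) in conn.execute("PRAGMA compile_options"):
            if opt.startswith("THREADSAFE="):
                return int(opt.split("=", 1)[1])
    finally:
        conn.close()
    return None

mode = compiled_threading_mode()
if mode in MODES:
    name, core, full = MODES[mode]
    print(f"THREADSAFE={mode}: {name}, bCoreMutex={core}, bFullMutex={full}")
else:
    print("threading mode not reported by this build")
```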

Our journal_mode is OFF.

We use HikariCP, so a connection is in use by one thread at a time with
JMM-safe handoff, and they all share the mmap region.

S.
Clemens Ladisch
2018-03-02 10:03:51 UTC
Post by Shevek
Why would I have a transaction of non-zero size on a read-only connection?
What do you mean by "size"?

A read-only transaction still puts a shared lock on the database file.

A read-only transaction will not change the DB file, but SQLite has lots of
internal data structures in memory, and those can change.

To give each thread its own data structures, use separate connections.
Post by Shevek
I assume db->mutex is preventing concurrency
"db" is the connection object; each one has its own mutex.


Regards,
Clemens
Rowan Worth
2018-03-02 10:08:15 UTC
Post by Shevek
We use HikariCP, so a connection is in use by one thread at a time with
JMM-safe handoff, and they all share the mmap region.
What I think is happening is that either a pthread mutex or a database
lock is serializing the accesses, so each thread blocks the others.

I'm not familiar with HikariCP, but if it's handing the connection around to
a single thread at a time, it sounds like database accesses are serialised
long before sqlite becomes a factor.

-Rowan
