Discussion:
[sqlite] Regarding the effects of the COMMIT keyword
Jim Borden
2016-09-29 07:39:04 UTC
Permalink
I found the following snippet from https://www.sqlite.org/lockingv3.html

➢ The SQL command "COMMIT" does not actually commit the changes to disk. It just turns autocommit back on. Then, at the conclusion of the command, the regular autocommit logic takes over and causes the actual commit to disk to occur.

Does that mean that the actual commit happens after the execution of “COMMIT” (i.e. it is in a sense asynchronous)?

Could that explain some odd behavior that I am seeing? I have two connections to a database file and reports that writing to one and then immediately querying from another causes the second connection to show the entry as missing. I was working under the assumption that after COMMIT finished that everything would be visible to all connections but is this a false assumption? The suspiciousness is also due in part to this only happening when:

1) There are more than ~1200 entries in the table
2) The conclusion of the write and the read happen <= ~500ms apart (not very accurate timing, going by log timestamps)

Further queries after the initial one return the entry as expected. Also, if either of the above conditions are not met then even the initial one is ok. The database file is operating in WAL mode on version 3.8.10.2 (technically it is SQLCipher) with no pragma modifications. All access is on the same physical disk coming from the same process, but with two different threads (one writing and one reading). Really I’d just like to know if I’m barking up the wrong tree here trying to explain why this odd behavior occurs.


Jim Borden
Software Engineer

***@couchbase.com
Clemens Ladisch
2016-09-29 07:51:07 UTC
Permalink
Post by Jim Borden
I found the following snippet from https://www.sqlite.org/lockingv3.html
The SQL command "COMMIT" does not actually commit the changes to disk.
It just turns autocommit back on. Then, at the conclusion of the
command, the regular autocommit logic takes over and causes the actual
commit to disk to occur.
Does that mean that the actual commit happens after the execution of
“COMMIT” (i.e. it is in a sense asynchronous)?
No, "at the conclusion" happens _before_ the execution has finished.
Post by Jim Borden
I have two connections to a database file and reports that writing to
one and then immediately querying from another causes the second
connection to show the entry as missing. [...]
The database file is operating in WAL mode
In WAL mode, the writer and the reader do not block each other. This
implies that the reader sees the old state of the database.

What exactly does "immediately" mean? Is there some synchronization
mechanism that actually ensures that the second transaction is started
_after_ the first one has finished? (And when doing multiple SELECTs,
it's easy to have them in a single transaction because the first one
isn't finalized early enough.)

What is the PRAGMA synchronous setting?


Regards,
Clemens
Jim Borden
2016-09-29 07:59:46 UTC
Permalink
There is a web API, and the application flow is sending a PUT request, which stores the data and then returns a successful status. After that status is received, a GET request is sent. The way the write connection works is that everything is pumped through a single thread and all other threads must wait their turn while the single thread executes the work one by one (the exception being, as I found out painfully a few months ago, stepping through prepared statements). If a transaction is entered, then any further transactions requested by the transaction are handled immediately as save points. Only when the outermost transaction commits may the next work item be executed. So by the time the successful HTTP response code is returned the data should be saved and committed. That is why it is so puzzling that the next request (which does not use explicit transactions, just executes a few SELECT statements) returns 404. To me it seemed obvious that it was seeing an old state, but the question was why is it seeing an old state?

PRAGMA synchronous is 2 (NORMAL, I believe?)

Jim Borden
Software Engineer

***@couchbase.com
<http://www.couchbase.com/connect>
Post by Jim Borden
I found the following snippet from https://www.sqlite.org/lockingv3.html
The SQL command "COMMIT" does not actually commit the changes to disk.
It just turns autocommit back on. Then, at the conclusion of the
command, the regular autocommit logic takes over and causes the actual
commit to disk to occur.
Does that mean that the actual commit happens after the execution of
“COMMIT” (i.e. it is in a sense asynchronous)?
No, "at the conclusion" happens _before_ the execution has finished.
Post by Jim Borden
I have two connections to a database file and reports that writing to
one and then immediately querying from another causes the second
connection to show the entry as missing. [...]
The database file is operating in WAL mode
In WAL mode, the writer and the reader do not block each other. This
implies that the reader sees the old state of the database.

What exactly does "immediately" mean? Is there some synchronization
mechanism that actually ensures that the second transaction is started
_after_ the first one has finished? (And when doing multiple SELECTs,
it's easy to have them in a single transaction because the first one
isn't finalized early enough.)

What is the PRAGMA synchronous setting?


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
sqlite-***@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Simon Slavin
2016-09-29 08:32:30 UTC
Permalink
Post by Jim Borden
There is a web API
If you're using a conventional server as the front end to your web service (e.g. Apache, with your code written in PHP/Python/C/whatever) then the server spawns a new process to handle each incoming request. So it's possible for two calls to execute at the same time and you do have to worry about multiprocessing.

However, there is a question of how SQLite connections are maintained. Does the web service open some connections when it is started and maintain them throughout its life, or does it create a new connection to answer each PUT or GET ? The answer is important because a savepoint is handled by a specific connection. Close the connection and your savepoint vanishes.

Also, depending on how your code is written the server may be sending back the acknowledgement for the PUT first, and then doing the database operations while your program is already moving on to do something else. But that would make it impossible to return a different HTTP response code if the PUT fails. It's more likely that the PUT operation waits until the database connection is finished before returning its HTTP response code, to allow it to report errors.

In which case I don't see how your problem could occur. But someone else might.

Simon.
Jim Borden
2016-09-29 08:50:49 UTC
Permalink
It’s less complicated than a web service. There is no “server” per se, only a lightweight listener object that can accept and respond to HTTP requests (C# HttpListener class). The short explanation is that the library I develop (Couchbase Lite) has a replication function that allows it to communicate with an endpoint that implements the CouchDB sync protocol. As part of that, in order to enable device to device replication, there is also a listener component distributed as an optional part of Couchbase Lite which embeds a listener for REST requests into the process. So yes, multiple threads are at play here, but in the scenario there would be three at most: The thread that requested the PUT, the thread that wrote to the database, and the thread the requested the GET. All write requests to the database are moderated through the write connection thread and will block the calling thread until the write operation is complete. So, as you noted, this is why I am able to return the correct status for any given operation (almost, aside from this odd 404 issue). Also, this situation is relatively rare which makes it more annoying.

All of the code is in one process compiled together (the listener, the storage API, the app, etc), and written all in C# with interop calls to C.

Thanks for your input. The clarification about COMMIT was enough for me to focus my attention elsewhere and has been helpful.

Jim Borden
Software Engineer
Post by Jim Borden
There is a web API
If you're using a conventional server as the front end to your web service (e.g. Apache, with your code written in PHP/Python/C/whatever) then the server spawns a new process to handle each incoming request. So it's possible for two calls to execute at the same time and you do have to worry about multiprocessing.

However, there is a question of how SQLite connections are maintained. Does the web service open some connections when it is started and maintain them throughout its life, or does it create a new connection to answer each PUT or GET ? The answer is important because a savepoint is handled by a specific connection. Close the connection and your savepoint vanishes.

Also, depending on how your code is written the server may be sending back the acknowledgement for the PUT first, and then doing the database operations while your program is already moving on to do something else. But that would make it impossible to return a different HTTP response code if the PUT fails. It's more likely that the PUT operation waits until the database connection is finished before returning its HTTP response code, to allow it to report errors.

In which case I don't see how your problem could occur. But someone else might.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-***@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Simon Slavin
2016-09-29 07:52:37 UTC
Permalink
Post by Jim Borden
I found the following snippet from https://www.sqlite.org/lockingv3.html
➢ The SQL command "COMMIT" does not actually commit the changes to disk. It just turns autocommit back on. Then, at the conclusion of the command, the regular autocommit logic takes over and causes the actual commit to disk to occur.
Does that mean that the actual commit happens after the execution of “COMMIT” (i.e. it is in a sense asynchronous)?
No. It happens before the API call you're doing finishes. Remember that the SQLite library is just a collection of procedures which you run inside your program. SQLite does not run in a different thread or on a different computer while your program does its own thing.
Post by Jim Borden
I have two connections to a database file and reports that writing to one and then immediately querying from another causes the second connection to show the entry as missing.
Are you using multiple processes or threads ?

Are you using any PRAGMAs which look like they might speed up SQLite operations ?

Are the two connections accessing the file using identical file specifications ?

Are you forgetting to finalize a _prepare,_step,_finalize sequence ?

Simon.
Jim Borden
2016-09-29 08:09:46 UTC
Permalink
I am using multiple threads, but in this instance just 2 inside of one process. I do not change any PRAGMA settings other than user_version and journal_mode. The two connections differ only by the fact that one is read only and one is read-write. It’s possible that I’ve forgotten a finalize somewhere, etc, but unlikely because that stuff gets processed in a pretty centralized way when the object holding the sqlite3 object is disposed (C# term).

But in the end, my original understanding should hold that after the COMMIT execution finishes all data should be immediately visible to other connections from that point forward?

Jim Borden
Software Engineer
Post by Jim Borden
I found the following snippet from https://www.sqlite.org/lockingv3.html
➢ The SQL command "COMMIT" does not actually commit the changes to disk. It just turns autocommit back on. Then, at the conclusion of the command, the regular autocommit logic takes over and causes the actual commit to disk to occur.
Does that mean that the actual commit happens after the execution of “COMMIT” (i.e. it is in a sense asynchronous)?
No. It happens before the API call you're doing finishes. Remember that the SQLite library is just a collection of procedures which you run inside your program. SQLite does not run in a different thread or on a different computer while your program does its own thing.
Post by Jim Borden
I have two connections to a database file and reports that writing to one and then immediately querying from another causes the second connection to show the entry as missing.
Are you using multiple processes or threads ?

Are you using any PRAGMAs which look like they might speed up SQLite operations ?

Are the two connections accessing the file using identical file specifications ?

Are you forgetting to finalize a _prepare,_step,_finalize sequence ?

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-***@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Hick Gunter
2016-09-29 09:01:12 UTC
Permalink
I am using multiple threads, but in this instance just 2 inside of one process. I do not change any PRAGMA settings other than user_version and journal_mode. The two >connections differ only by the fact that one is read only and one is read-write. It’s possible that I’ve forgotten a finalize somewhere, etc, but unlikely because that stuff gets >processed in a pretty centralized way when the object holding the sqlite3 object is disposed (C# term).
But in the end, my original understanding should hold that after the COMMIT execution finishes all data should be immediately visible to other connections from that point >forward?
No. After COMMIT finishes, any TRANSACTION that is STARTED LATER willl see the new data.

Reading "stale" data (i.e. the DB state at the beginning of a transaction) is at least almost always caused by indvertently leaving a transaction open. Setting the journal mode to WAL hides this problem, because the writer is no longer blocked by the reader's transaction. Disable WAL mode and you will probably find that the writer will find that the "database is locked". When this occurs, you can use then sqlite3_next_stmt() interface (on the readonly connection) to locate any unfinished statements. If these are prepared via sqlite3_prepare_V2, you can retrieve the SQL using sqlite3_sql().


___________________________________________
Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: ***@scigames.at

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.
Clemens Ladisch
2016-09-29 09:09:39 UTC
Permalink
Post by Hick Gunter
Reading "stale" data (i.e. the DB state at the beginning of a transaction)
is at least almost always caused by indvertently leaving a transaction
open. Setting the journal mode to WAL hides this problem, because the
writer is no longer blocked by the reader's transaction. Disable WAL mode
and you will probably find that the writer will find that the "database is
locked".
Alternatively, put explicit BEGIN/COMMITs around all reads; then you'll
see if you accidentally try to nest transactions.


Regards,
Clemens
Jim Borden
2016-09-30 01:59:40 UTC
Permalink
Thanks for all the input, all of your comments put me on exactly the right track. I was too focused on the behavior of the writes and I didn’t consider the behavior of the reads. I reviewed the logs again and it turns out there was a longer running query that surrounded the entire PUT / GET sequence, and since the same connection was being used for both queries (long, and GET) the GET was being held back on the outer result set. To add to the confusion, another one of these long queries had just started before the previous one finished and so it appeared that it ran and finished quickly after the GET when in reality it was another starting and the previous one finishing.

Jim Borden
Software Engineer
Post by Hick Gunter
Reading "stale" data (i.e. the DB state at the beginning of a transaction)
is at least almost always caused by indvertently leaving a transaction
open. Setting the journal mode to WAL hides this problem, because the
writer is no longer blocked by the reader's transaction. Disable WAL mode
and you will probably find that the writer will find that the "database is
locked".
Alternatively, put explicit BEGIN/COMMITs around all reads; then you'll
see if you accidentally try to nest transactions.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
sqlite-***@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Continue reading on narkive:
Loading...