[sqlite] update or replace ...

Discussion:

Olivier Mascia

2016-06-29 16:56:15 UTC

Dear all,

What's the expected behavior of statement "update or replace ..."? (http://sqlite.org/lang_update.html)

create table T(K integer primary key, V text);
update or replace T set V='data' where K=1;

Clearly it does nothing.
Does this fit the intended behavior?
Would that mean 'or replace' in the context of an update statement is a void operation?

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om

R Smith

2016-06-29 17:18:40 UTC

Permalink

Post by Olivier Mascia
Dear all,
What's the expected behavior of statement "update or replace ..."? (http://sqlite.org/lang_update.html)
create table T(K integer primary key, V text);
update or replace T set V='data' where K=1;
Clearly it does nothing.
Does this fit the intended behavior?
Would that mean 'or replace' in the context of an update statement is a void operation?

...OR REPLACE here indicates a Conflict-resolution algorithm. Anything
after the OR in an UPDATE OR xxx statement relates to the conflict
resolution algorithm - i.e. What must be done if this change causes a
constraint to fail?

In all the next examples, assume two rows with Primary key IDs 1 and 2
already exist in the Database:

UPDATE OR FAIL SET ID = 1 WHERE ID = 2;
- indicates that the executions should stop immediately and produce a
Constraint-failed error without touching any transaction mechanics.

UPDATE OR ROLLBACK SET ID = 1 WHERE ID = 2;
- indicates that the executions should stop immediately and roll back
the entire active transaction.

UPDATE OR ABORT SET ID = 1 WHERE ID = 2;
- indicates that the executions should stop immediately and roll back
the current statement, but leave prior changes in the transaction as is.

UPDATE OR IGNORE SET ID = 1 WHERE ID = 2;
- indicates that the the update should be skipped. It's the "Do
nothing" check.

UPDATE OR REPLACE SET ID = 1 WHERE ID = 2;
- indicates that the Row with ID: 1 which already exists must be
deleted and then this row's ID should be set to 1. Dangerous!

Hope that clears it up,
Ryan

Simon Slavin

2016-06-29 17:42:10 UTC

Permalink

Post by Olivier Mascia
What's the expected behavior of statement "update or replace ..."? (http://sqlite.org/lang_update.html)
create table T(K integer primary key, V text);
update or replace T set V='data' where K=1;
Clearly it does nothing.
Does this fit the intended behavior?
Would that mean 'or replace' in the context of an update statement is a void operation?

For the UPDATE OR REPLACE command, the REPLACE part comes into play only if an UPDATE would cause some sort of conflict. Since the UPDATE command does nothing in this case, there will be no conflict, so there will be no REPLACE.

This command will return SQLITE_OK. It is an entirely legal command which does nothing in this situation.

Under different circumstances (i.e. if the table already held data) the same command might result in a REPLACE.

Simon.

Olivier Mascia

2016-06-30 07:24:36 UTC

Permalink

Post by Olivier Mascia
Clearly it does nothing.
Does this fit the intended behavior?
Would that mean 'or replace' in the context of an update statement is a void operation?

Thanks Ryan and Simon for your answers. Indeed I forgot the OR REPLACE|FAIL|IGNORE|... in SQLite is related to the ON CONFLICT clause.

I'd love to have some equivalent to the UPDATE OR INSERT statement (or variation on it) that some other engines expose. But clearly building it on top of a syntax using OR wouldn't be a nice idea: it would bring multiple interpretations to the 'OR' in this context, sometimes ON CONFLICT resolution and sometimes 'ON NOTHING DONE' resolution.

Of course writing straight code in C/C++ it's rather simple to emulate situations where you want to update some values in an existing row, creating the row if needed. Though in larger programming tasks that ends up being rather tedious for business logic programmers. We hide this inside our C++ shell around SQLite C API and the solution is not bad but had some challenges for handling parameters to the statement(s) without risking them to be evaluated twice when re-using them for two distinct SQLite statements (while there is only one 'upsert' at the C++ logical level).

All in all, I hope SQLite could bring up some sort of such UPDATE OR INSERT (which would be more easily named UPSERT in SQLite existing syntax), one day. Does adding such a new statement would fit the 'small' within "Small. Fast. Reliable. Choose any three." I'm not sure, I don't yet have enough background with it.

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om

n***@null.net

2016-06-30 08:06:23 UTC

Permalink

Post by Olivier Mascia
I'd love to have some equivalent to the UPDATE OR INSERT statement
(or variation on it) that some other engines expose. But clearly

Does the "INSERT OR REPLACE" syntax not provide what you are looking
for?

--
Mark Lawrence

Olivier Mascia

2016-06-30 08:54:57 UTC

Permalink

Post by n***@null.net

Post by Olivier Mascia
I'd love to have some equivalent to the UPDATE OR INSERT statement
(or variation on it) that some other engines expose. But clearly

Does the "INSERT OR REPLACE" syntax not provide what you are looking
for?

Absolutely not. "OR REPLACE", as Ryan and Simon pointed out clearly, is a conflict resolution mechanism.

1) The insert would have to trigger a constraint violation to execute the replace.
2) The replace *do* delete the row and insert a new one: this would have effect on related data (on delete cascade for instance).
3) The typical use case for some form of "UPSERT" (whatever the name and syntax it takes) is to generally update a row, assumed to be existing, inserting it at that occasion if needed.

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om

R Smith

2016-06-30 11:34:18 UTC

Permalink

Post by Olivier Mascia

Post by n***@null.net
Does the "INSERT OR REPLACE" syntax not provide what you are looking
for?

Absolutely not. "OR REPLACE", as Ryan and Simon pointed out clearly, is a conflict resolution mechanism.
1) The insert would have to trigger a constraint violation to execute the replace.
2) The replace *do* delete the row and insert a new one: this would have effect on related data (on delete cascade for instance).
3) The typical use case for some form of "UPSERT" (whatever the name and syntax it takes) is to generally update a row, assumed to be existing, inserting it at that occasion if needed.

This is true, but the UPSERT is a lofty beast. May I remind people of
the intense bulk of SQL you have to type to make it happen in MSSQL and
the like? here is a minimal set:

MERGE dbo.xxx AS T
USING dbo.yyy AS S
ON T.SomeID = S.SomeID
WHEN MATCHED AND T.ValueThatNeedsChanging = Oldvalue // or whatever
checking is relevant
THEN UPDATE SET T.ValueThatNeedsChanging = NewValue
WHEN NOT MATCHED
THEN INSERT (SomeID, ValueThatNeedsChanging, AndAllOtherColumns)
VALUES () / SELECT clauses
;

That's twice the code needed to do the things others suggested (INSERT
IGNORE followed by UPDATE) - and no, I'm not allergic to typing code,
but can you imagine the SQL engine changes that would be required to
parse and understand all that and describe it in the current
prepared-statement structure? I believe that would take a good ol' bite
out of the "Lite" claim.

This is not to say it shouldn't be done, nor an excuse for it not to be
done (I'm merely pointing out why it mightn't have been done yet, as
some asked that question), but I think the weight of the decision would
be Gain vs. Effort / Size increase / Processing increase (in parser terms).

If it can help 1% of queries to run a little bit faster during a job
which is already blisteringly fast - is it really worth the effort?.
Granted, 1% of all SQlite queries in the World running more efficiently
would probably amount to measurable global energy saving, but then 1% is
probably overestimating by a lot.
Someone mentioned having to climb an Index twice - fair enough, but most
queries do this a zillion times for look-ups, it's hardly a chore,
that's why we have indices.

Again, no reason not to do it, but I would schedule the implementation
of MERGE right after adding Checksums to pages. :)

Olivier Mascia

2016-06-30 12:48:53 UTC

Permalink

Post by R Smith
MERGE dbo.xxx AS T
USING dbo.yyy AS S
ON T.SomeID = S.SomeID
WHEN MATCHED AND T.ValueThatNeedsChanging = Oldvalue // or whatever checking is relevant
THEN UPDATE SET T.ValueThatNeedsChanging = NewValue
WHEN NOT MATCHED
THEN INSERT (SomeID, ValueThatNeedsChanging, AndAllOtherColumns)
VALUES () / SELECT clauses
;

I, hopefully, never used such a piece of coding in my whole life and I know, now, why all my human body cells refrained me to ever even approach MSSQL. :)

The - useful - "upsert" which I can make good use of is the simpler one you can find in FirebirdSQL, where it is called UPDATE OR INSERT with pretty much the same syntax as an insert.

Easy to read, useful and effective for what use cases it is designed for.

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia (from mobile device), integral.be/om

Paul Egli

2016-06-30 13:17:19 UTC

Permalink

Post by R Smith

Post by R Smith
MERGE dbo.xxx AS T
USING dbo.yyy AS S
ON T.SomeID = S.SomeID
WHEN MATCHED AND T.ValueThatNeedsChanging = Oldvalue // or whatever

checking is relevant

Post by R Smith
THEN UPDATE SET T.ValueThatNeedsChanging = NewValue
WHEN NOT MATCHED
THEN INSERT (SomeID, ValueThatNeedsChanging, AndAllOtherColumns)
VALUES () / SELECT clauses
;

I, hopefully, never used such a piece of coding in my whole life and I
know, now, why all my human body cells refrained me to ever even approach
MSSQL. :)
The - useful - "upsert" which I can make good use of is the simpler one
you can find in FirebirdSQL, where it is called UPDATE OR INSERT with
pretty much the same syntax as an insert.
Easy to read, useful and effective for what use cases it is designed for.

I've used MERGE INTO a lot and sometimes wished that SQLite had it, but i
can understand why it's not a priority. To me, the real value of MERGE is
the atomicity of the operation. You can perform inserts, updates, and
deletes all in one statement--No chance for race conditions if data changes
between the operations.

In SQLite where only a single writer is allowed at a time and the only
isolation level available Serializable, all you need to do is BEGIN
IMMEDIATE and you get the atomicity you need, even if it seems unnatural
(to me) to use two or three different statements to do what is logically
one operation.

Paul Egli

2016-06-30 13:29:35 UTC

Permalink

No chance for race conditions if data changes between the operations.

I should say "if data changes *or a read happens* between the operations.
:-)

R Smith

2016-06-30 14:07:02 UTC

Permalink

Post by Olivier Mascia

I, hopefully, never used such a piece of coding in my whole life and I know, now, why all my human body cells refrained me to ever even approach MSSQL. :)

Ha, ok, but I wasn't really complaining about the way it is done in
MSSQL, I was merely saying what is needed for it to work, and I see now
you mean more of a INSERT OR UPDATE the way MySQL does it. Don't
disregard the above MERGE statement though, it is a very powerful piece
of SQL and I quite like to use it. Very often (if not mostly) you want
to update and/or insert only in certain cases (WHERE modified = 1) -
taking values from a table (rather than just VALUES like I did in my
examples) and for that this MERGE mechanism is an amazing device. It's
basically UPSERT with expressions, filtering and sub-queries allowed.

If we /are/ going to go full regalia in SQLite, I'd push for MERGE[1]
actually rather than just INSERT - UPDATE, but the caveats I mentioned
earlier remain.

[1] Perhaps borrowing from the PostGres MERGE implementation rather...

Simon Slavin

2016-06-30 09:01:39 UTC

Permalink

Post by Olivier Mascia
Of course writing straight code in C/C++ it's rather simple to emulate situations where you want to update some values in an existing row, creating the row if needed.

Clemens Ladisch

2016-06-30 09:17:07 UTC

Permalink

Post by Simon Slavin

Post by Olivier Mascia
Of course writing straight code in C/C++ it's rather simple to emulate
situations where you want to update some values in an existing row,
creating the row if needed.

1) INSERT without the REPLACE
2) UPDATE
When step 1 fails because the key values already exist you trap this
and explicitly ignore it in your code (commented, because you're being
nice to other programmers).

Trapping only the conflict violation while properly handling any other
errors makes this even more complex.

A better way is to try the UPDATE first, and if the data was not found
(number of affected rows is zero), do the INSERT. Doesn't even need
a comment.

Regards,
Clemens

Olivier Mascia

2016-06-30 10:00:00 UTC

Permalink

Post by Clemens Ladisch
A better way is to try the UPDATE first, and if the data was not found
(number of affected rows is zero), do the INSERT. Doesn't even need
a comment.

Post by Olivier Mascia
Of course writing straight code in C/C++ it's rather simple to emulate situations where you want to update some values in an existing row, creating the row if needed. Though in larger programming tasks that ends up being rather tedious for business logic programmers. We hide this inside our C++ shell around SQLite C API and the solution is not bad but had some challenges for handling parameters to the statement(s) without risking them to be evaluated twice when re-using them for two distinct SQLite statements (while there is only one 'upsert' at the C++ logical level).

Besides, if you ever have to write a scripts for the command line tool, you're stuck trying to emulate that semantic. Or can we check by script the outcome of the UPDATE?

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om

R Smith

2016-06-30 12:13:43 UTC

Permalink

Post by Olivier Mascia
Besides, if you ever have to write a scripts for the command line tool, you're stuck trying to emulate that semantic. Or can we check by script the outcome of the UPDATE?

There is no way to catch the outcome in the CLI that I know of, but that
doesn't matter. if you are writing scripts for the CLi and not
programming it, you can't possibly have SPEED as a paramount
consideration, and if that's the case, the simple usual upserts in the
CLI would work just dandy (and the order doesn't matter much for
lookups, but it is very slightly more efficient to do the Update first
as sometimes there's nothing to update - if the update is done second,
there's always something to update.

Here is a performance measurement of a typical script that would run
just fine in the CLi merrily doing Upserts wihout needing to check any
operation result (i.e the slowest possible way to do it).
It's run in SQLitespeed (but will work perfectly in the CLI too) so I
can measure the Virtual Machine operations count and compare with the
same script but without any of the unnecessary steps that makes upserts
(this second one has half the SQL operations and will even be much more
efficient than a MERGE statement).

Note the results in VM Steps measured in both cases. To my mind, the
difference is not worth fretting over if you make scripts for the CLI.
(Note: The time improvement has more to do with caching than efficiency,
but the VM steps don't lie).

-- SQLite version 3.9.2 [ Release: 2015-11-02 ] on SQLitespeed
version 2.0.2.4.

-- Script Items: 10 Parameter Count: 0
-- 2016-06-30 13:47:04.423 | [Info] Script Initialized,
Started executing...
--
================================================================================================

CREATE TABLE t(ID INTEGER PRIMARY KEY, A, B);

INSERT INTO t(A, B) VALUES
('John' , 'Smith'),
('Jerry' , 'Jones'),
('James' , 'Smith'),
('Jimmy' , 'Jones');

SELECT * FROM t;

-- ID | A | B
-- ------------ | ------- | -------
-- 1 | John | Smith
-- 2 | Jerry | Jones
-- 3 | James | Smith
-- 4 | Jimmy | Jones

UPDATE t SET A='Jenny', B='Smith' WHERE ID=2; -- UPSERT Type
1(best)
INSERT OR IGNORE INTO t(ID,A,B) VALUES (2,'Jenny','Smith');

INSERT OR IGNORE INTO t(ID,A,B) VALUES (3,'Jenna','Jones');-- UPSERT Type 2
UPDATE t SET A='Jenna', B='Jones' WHERE ID=2;

UPDATE t SET A='J.K.', B='Johnson' WHERE ID=99; -- UPSERT Type 1 again
INSERT OR IGNORE INTO t(ID,A,B) VALUES (99,'J.K.','Johnson');

SELECT * FROM t;

-- ID | A | B
-- ------------ | ------- | ---------
-- 1 | John | Smith
-- 2 | Jenna | Jones
-- 3 | James | Smith
-- 4 | Jimmy | Jones
-- 99 | J.K. | Johnson

-- Script Stats: Total Script Execution Time: 0d 00h 00m and
00.069s
-- Total Script Query Time: 0d 00h 00m and
00.037s
-- Total Database Rows Changed: 7
-- Total Virtual-Machine Steps: 233
-- Last executed Item Index: 10
-- Last Script Error:
--
------------------------------------------------------------------------------------------------

-- SQLite version 3.9.2 [ Release: 2015-11-02 ] on SQLitespeed
version 2.0.2.4.

-- Script Items: 7 Parameter Count: 0
-- 2016-06-30 13:53:38.403 | [Info] Script Initialized,
Started executing...
--
================================================================================================

CREATE TABLE t(ID INTEGER PRIMARY KEY, A, B);

INSERT INTO t(A, B) VALUES
('John' , 'Smith'),
('Jerry' , 'Jones'),
('James' , 'Smith'),
('Jimmy' , 'Jones');

SELECT * FROM t;

-- ID | A | B
-- ------------ | ------- | -------
-- 1 | John | Smith
-- 2 | Jerry | Jones
-- 3 | James | Smith
-- 4 | Jimmy | Jones

UPDATE t SET A='Jenny', B='Smith' WHERE ID=2;

UPDATE t SET A='Jenna', B='Jones' WHERE ID=2;

INSERT OR IGNORE INTO t(ID,A,B) VALUES (99,'J.K.','Johnson');

SELECT * FROM t;

-- ID | A | B
-- ------------ | ------- | ---------
-- 1 | John | Smith
-- 2 | Jenna | Jones
-- 3 | James | Smith
-- 4 | Jimmy | Jones
-- 99 | J.K. | Johnson

-- Script Stats: Total Script Execution Time: 0d 00h 00m and
00.031s
-- Total Script Query Time: -- --- --- ---
--.----
-- Total Database Rows Changed: 7
-- Total Virtual-Machine Steps: 194
-- Last executed Item Index: 7
-- Last Script Error:
--
------------------------------------------------------------------------------------------------

Cheers,
Ryan

Olivier Mascia

2016-06-30 13:01:39 UTC

Permalink

Thanks Ryan,

Speed concern while using it in scripts is indeed not my concern. I just don't like to have to repeat myself twice... to my computer, especially when I have to tell things first using either the insert or update syntax then say it again using the other syntax. :) And *that* is my concern when having to write a piece of script for some quick data fix. :)

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om

Olivier Mascia

2016-06-30 09:51:18 UTC

Permalink

Post by Simon Slavin

Post by Olivier Mascia
Of course writing straight code in C/C++ it's rather simple to emulate situations where you want to update some values in an existing row, creating the row if needed.

1) INSERT without the REPLACE
2) UPDATE
When step 1 fails because the key values already exist you trap this and explicitly ignore it in your code (commented, because you're being nice to other programmers). Then it's always the UPDATE which updates the fields you care about.

So
INSERT OR IGNORE ...
followed by
UPDATE ...

The (possibly) annoying thing behind this is that the most logical use case of an "UPDATE OR INSERT"/"UPSERT"/whatever-name is to update a row and in the rare case it might not already exist, to insert it.

I have not yet deep enough knowledge of inner details and optimizations inside sqlite.c but I tend to think it might be more costly to attempt insert first, having it fail (ignored) most of the times, and then only do the update.

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om

Simon Slavin

2016-06-30 09:56:35 UTC

Permalink

Post by Olivier Mascia
INSERT OR IGNORE ...
followed by
UPDATE ...

Of course. That's what I actually did when I did it, but I'd forgotten the best way. Your way means you don't have to check any error codes. Thanks for reminding me.

Simon.

Paul

2016-06-30 10:03:53 UTC

Permalink

Whatever the case, it will be, on average, 1.5x time less efficient than a 'I-Hope-Would-Be' UPSERT. It all depends on the probability of INSERT to fail.
In the scenario, where updates are frequent and inserts are rare, INSERT + UPDATE approach takes ~2x performance overhead. In most cases we iterate B-Tree 2 times: once for INSERT, that will fail and second time for UPDATE.
In reverse scenario, where inserts are frequent and updates are rare, INSERT + UPDATE approach will have almost no overhead.
But how would you know which scenario would take place, to adjust the order of calls? Especially when you write a generic Insert-Or-Update method? So, whatever strategy you choose (either UPDATE or INSERT is first), statistically, on average, you have 50% of cases where UPDATE would be efficient if performed first and 50% of cases where the reverse is true.

If implemented inside the SQLite engine, overhead can be removed. It is possible to descend B-Tree once, either to find the row to UPDATE or a hint where to INSERT new row. The only thing that puzzles me is why it wasn't implemented years ago.

Post by Olivier Mascia
Of course writing straight code in C/C++ it's rather simple to emulate situations where you want to update some values in an existing row, creating the row if needed.

The standard way of doing this is to do two commands:

1) INSERT without the REPLACE
2) UPDATE

When step 1 fails because the key values already exist you trap this and explicitly ignore it in your code (commented, because you're being nice to other programmers). Then it's always the UPDATE which updates the fields you care about.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-***@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users