Discussion:
[sqlite] "BIGINT" doesn't act like INTEGER on table creation [Bug]
Peter Halasz
2017-11-24 00:14:07 UTC
Permalink
BIGINT has a "resulting affinity" of INTEGER (
https://sqlite.org/datatype3.html) but cannot be used in its place in the
following example:

CREATE TABLE [FailTable] ( [id] BigInt PRIMARY KEY AUTOINCREMENT )

...as it gives this error:

AUTOINCREMENT is only allowed on an INTEGER PRIMARY KEY

Manually replacing BIGINT with INTEGER leads to success:

CREATE TABLE [OkTable] ( [id] Integer PRIMARY KEY AUTOINCREMENT )

I believe BigInt should work here the same as Integer, as it does in other
contexts. I feel like this is a bug that could be fixed in SQLite. Or
otherwise you might wish to mention the odd exception of AUTOINCREMENT
fields explicitly in the type documentation -- but it seems to me like this
is more something that should be fixed in the code.

Sorry if this is a duplicate, known or already patched bug. It was
difficult to search the bug tracker.

Thanks for taking the time to look into this.

For background, how I came across this: Linq2db, a library for C#,
generates bad SQL in this style and fails. Clearly that library needs to
fix its SQL generator to work better with SQLite (and I've posted a bug
report on their github). However, it also seems odd the SQL should produce
an error at all. If BigInt and Integer were treated synonymously, as
documented, It would be valid. So I felt the bug should be brought to the
attention of the SQLite developers. Hopefully it is something that can be
fixed.

Cheers
Keith Medcalf
2017-11-24 00:35:08 UTC
Permalink
That is correct. You cannot spell "INTEGER PRIMARY KEY", which is an explicit alias for the rowid, as anything other than that one particular correct spelling.

Declaring "bigint primary key" (ie, using an incorrect spelling) defines a column that IS NOT an explicit alias for the rowid. It is merely an integer column, completely separate from the rowid, whcih you happen to want to be the primary key. Since the AUTOINCREMENT only applies to the rowid, you must spell the declaration correctly if you wish to (a) alias the rowid and (b) apply the AUTOINCREMENT option to the rowid.

And no, "BIGINT PRIMARY KEY" is not an alias for "INTEGER PRIMARY KEY"

https://sqlite.org/autoinc.html
https://sqlite.org/datatype3.html
https://www.sqlite.org/lang_createtable.html see "ROWIDs and the INTEGER PRIMARY KEY"

Personally, I have never found a use for the AUTOINCREMENT option. Why is it being used?

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
-----Original Message-----
From: sqlite-users [mailto:sqlite-users-
Sent: Thursday, 23 November, 2017 17:14
Subject: [sqlite] "BIGINT" doesn't act like INTEGER on table creation
[Bug]
BIGINT has a "resulting affinity" of INTEGER (
https://sqlite.org/datatype3.html) but cannot be used in its place in
the
CREATE TABLE [FailTable] ( [id] BigInt PRIMARY KEY AUTOINCREMENT
)
AUTOINCREMENT is only allowed on an INTEGER PRIMARY KEY
CREATE TABLE [OkTable] ( [id] Integer PRIMARY KEY AUTOINCREMENT )
I believe BigInt should work here the same as Integer, as it does in
other
contexts. I feel like this is a bug that could be fixed in SQLite. Or
otherwise you might wish to mention the odd exception of
AUTOINCREMENT
fields explicitly in the type documentation -- but it seems to me
like this
is more something that should be fixed in the code.
Sorry if this is a duplicate, known or already patched bug. It was
difficult to search the bug tracker.
Thanks for taking the time to look into this.
For background, how I came across this: Linq2db, a library for C#,
generates bad SQL in this style and fails. Clearly that library needs
to
fix its SQL generator to work better with SQLite (and I've posted a
bug
report on their github). However, it also seems odd the SQL should
produce
an error at all. If BigInt and Integer were treated synonymously, as
documented, It would be valid. So I felt the bug should be brought to
the
attention of the SQLite developers. Hopefully it is something that
can be
fixed.
Cheers
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Peter Halasz
2017-11-24 03:23:24 UTC
Permalink
Thank you. It looks like I was looking at the wrong documentation page.

So I will reiterate and clarify my advice that the datatype documentation (
https://sqlite.org/datatype3.html), within the section on "Determination Of
Column Affinity", should more clearly state the fact that the affinity
rules do not apply for the special "INTEGER PRIMARY KEY" type.

Perhaps "INTEGER PRIMARY KEY" could be a special rule #0, as otherwise the
impression is that BigInt and Integer should act the same in all
circumstances, which I have learned now is clearly not the case.

I do notice now there is a cryptic link to "INTEGER PRIMARY KEY
<https://sqlite.org/lang_createtable.html#rowid>" in the section 2, before
type affinity is introduced, but it gives no reason for the reader to think
it links to special rules about how type affinity may or may not work. i.e.
it says "Any column in an SQLite version 3 database, except an INTEGER
PRIMARY KEY column, may be used to store a value of any storage class."

So I hope this documentation page can be made clearer for future devs.

As for whether I need to use AUTOINCREMENT, it seemed like a good idea to
avoid rowid reuse, but I can avoid using it for the sake of optimization,
so probably will.

Thanks again.
Britton Kerin
2017-11-24 06:19:26 UTC
Permalink
I just want to note here that I'm uncomfortable with the docs on type
affinity in a general sort of way. I've read that section several
times in the hope of feeling more comfortable, but so far I still
don't. Unfortunately I can't say how I would improve them exactly, I
just know they leave me with an unhappy feeling of not knowing exactly
what's going on. Perhaps someone with more knowledge could improve
them somehow.

Britton
Post by Peter Halasz
Thank you. It looks like I was looking at the wrong documentation page.
So I will reiterate and clarify my advice that the datatype documentation (
https://sqlite.org/datatype3.html), within the section on "Determination Of
Column Affinity", should more clearly state the fact that the affinity
rules do not apply for the special "INTEGER PRIMARY KEY" type.
Perhaps "INTEGER PRIMARY KEY" could be a special rule #0, as otherwise the
impression is that BigInt and Integer should act the same in all
circumstances, which I have learned now is clearly not the case.
I do notice now there is a cryptic link to "INTEGER PRIMARY KEY
<https://sqlite.org/lang_createtable.html#rowid>" in the section 2, before
type affinity is introduced, but it gives no reason for the reader to think
it links to special rules about how type affinity may or may not work. i.e.
it says "Any column in an SQLite version 3 database, except an INTEGER
PRIMARY KEY column, may be used to store a value of any storage class."
So I hope this documentation page can be made clearer for future devs.
As for whether I need to use AUTOINCREMENT, it seemed like a good idea to
avoid rowid reuse, but I can avoid using it for the sake of optimization,
so probably will.
Thanks again.
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
R Smith
2017-11-24 08:51:30 UTC
Permalink
Post by Peter Halasz
As for whether I need to use AUTOINCREMENT, it seemed like a good idea to
avoid rowid reuse, but I can avoid using it for the sake of optimization,
so probably will.
I agree with Keith and has many times mentioned this before (apologies
to others for the déjà vu). You as the programmer / designer should
always be in control of how and why a new ID is assigned.

It feels like a saving to let the DB engine do it for you, but it isn't
really. What you save in a bit of code that decides the new ID
before-hand (which can be as simple as SELECT MAX(id)+1 FROM t) you
typically have to catch up afterward in code to do the usual
get-LastInsertedID and then pop that in wherever stuff wants to link to
the new item. It's a zero-sum gain really.

I'm not even a big fan of Integer IDs, I think codes / UUIDs are best,
but obviously the speed/size gain with an integer key (especially
INTEGER PRIMARY KEY row-id alias in SQLite) can't be ignored.

Disclaimer: This is only my opinion, well, I'm not completely alone in
it, but it is still an opinion and not a general SQL prescription.
Dominique Devienne
2017-11-24 09:01:01 UTC
Permalink
I'm not even a big fan of Integer IDs, I think codes / UUIDs are best, but
obviously the speed/size gain with an integer key (especially INTEGER
PRIMARY KEY row-id alias in SQLite) can't be ignored.
Disclaimer: This is only my opinion, well, I'm not completely alone in it,
but it is still an opinion and not a general SQL prescription.
Many opinions on this everywhere, and discussed a few times on this list as
well.
Here's one among a million:
https://blog.codinghorror.com/primary-keys-ids-versus-guids/

--DD

PS: I use guids are surrogate keys (always with a companion natural key,
aka unique index, often composite).
Richard Damon
2017-11-24 17:33:26 UTC
Permalink
Post by R Smith
Post by Peter Halasz
As for whether I need to use AUTOINCREMENT, it seemed like a good idea to
avoid rowid reuse, but I can avoid using it for the sake of
optimization,
so probably will.
I agree with Keith and has many times mentioned this before (apologies
to others for the déjà vu). You as the programmer / designer should
always be in control of how and why a new ID is assigned.
I would disagree here, in many cases the ID (rowid) is a purely internal
attribute with the purpose of accessing the data. It may have no problem
domain significance. If the primary key's purpose is purely data access,
then letting the data access layer handle it makes sense. The one case
where it makes sense for the programmer / designer to take control of
the PK is if the domain naturally has an identifier that would be
suitable for the key (an reasonable sized integer that is naturally
unique), The designer should also be fairly certain that it will remain so.
Post by R Smith
It feels like a saving to let the DB engine do it for you, but it
isn't really. What you save in a bit of code that decides the new ID
before-hand (which can be as simple as SELECT MAX(id)+1 FROM t) you
typically have to catch up afterward in code to do the usual
get-LastInsertedID and then pop that in wherever stuff wants to link
to the new item. It's a zero-sum gain really.
The big issues with this method is if two processes both try to create a
new item at the same time, one of them is going to get an error and have
to redo its work. If you start by creating the record with an
autoincrement id, and then getting the ID used, then you remove the need
to handle the error on the simultaneous creation.
Post by R Smith
I'm not even a big fan of Integer IDs, I think codes / UUIDs are best,
but obviously the speed/size gain with an integer key (especially
INTEGER PRIMARY KEY row-id alias in SQLite) can't be ignored.
Making a UUID or other 'big' key the primary access key will increase
the cost of looking up a record for ALL purposes. For small tables, it
might not be measurable, but small tables are less apt to need that sort
of PK either. It can make sense to use a key like that as an External
Key to describe the record to the outside world. Perhaps if the ONLY
accesses to a table are via this 'big' key, and very rarely by some
other key/field, making the big key the primary key would make sense.
Post by R Smith
Disclaimer: This is only my opinion, well, I'm not completely alone in
it, but it is still an opinion and not a general SQL prescription.
--
Richard Damon
Stephen Chrzanowski
2017-11-24 18:20:41 UTC
Permalink
Obviously, this is a design time factor, but, in my applications, I always
use integer IDs if I'm ever going to bring the info that row contains (And
other relevant info) to the UI. I have never had a solid reason to use
GUIDs or UUIDs or whatever.

Any time I'm adding something to a listbox, combo box, or whatever it is,
that element of that lists Object (RE: tStringList) gets the ID that is in
the database. Using a UID would not work, unless I spend CPU cycles
converting a 32 character string to bytes, then to 128-bit numbers. But
then, my compiler is 32-bit only, but can emulate 64-bit numbers.

I can see the reasoning why a UUID is appealing, but, an ID is an ID. It
doesn't matter what it is. When you use INTEGER PRIMARY KEY, you get from
1 to 2^64-1 numbers to play with at LEAST. I don't know if SQLite will go
into 128 or 256bit integers. With UUID, you're looking at a chance of
collision. Small, yes. But its there. With INTEGER PRIMARY KEY, you're
going up by one each time. Since I will never care what that ID is, as a
developer or as a user, Integer IDs are perfect.
Post by Richard Damon
Post by Peter Halasz
As for whether I need to use AUTOINCREMENT, it seemed like a good idea to
avoid rowid reuse, but I can avoid using it for the sake of optimization,
so probably will.
I agree with Keith and has many times mentioned this before (apologies to
others for the déjà vu). You as the programmer / designer should always be
in control of how and why a new ID is assigned.
I would disagree here, in many cases the ID (rowid) is a purely internal
attribute with the purpose of accessing the data. It may have no problem
domain significance. If the primary key's purpose is purely data access,
then letting the data access layer handle it makes sense. The one case
where it makes sense for the programmer / designer to take control of the
PK is if the domain naturally has an identifier that would be suitable for
the key (an reasonable sized integer that is naturally unique), The
designer should also be fairly certain that it will remain so.
It feels like a saving to let the DB engine do it for you, but it isn't
really. What you save in a bit of code that decides the new ID before-hand
(which can be as simple as SELECT MAX(id)+1 FROM t) you typically have to
catch up afterward in code to do the usual get-LastInsertedID and then pop
that in wherever stuff wants to link to the new item. It's a zero-sum gain
really.
The big issues with this method is if two processes both try to create a
new item at the same time, one of them is going to get an error and have to
redo its work. If you start by creating the record with an autoincrement
id, and then getting the ID used, then you remove the need to handle the
error on the simultaneous creation.
I'm not even a big fan of Integer IDs, I think codes / UUIDs are best,
but obviously the speed/size gain with an integer key (especially INTEGER
PRIMARY KEY row-id alias in SQLite) can't be ignored.
Making a UUID or other 'big' key the primary access key will increase the
cost of looking up a record for ALL purposes. For small tables, it might
not be measurable, but small tables are less apt to need that sort of PK
either. It can make sense to use a key like that as an External Key to
describe the record to the outside world. Perhaps if the ONLY accesses to a
table are via this 'big' key, and very rarely by some other key/field,
making the big key the primary key would make sense.
Disclaimer: This is only my opinion, well, I'm not completely alone in
it, but it is still an opinion and not a general SQL prescription.
--
Richard Damon
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Eduardo
2017-11-27 09:28:12 UTC
Permalink
On Thu, 23 Nov 2017 17:35:08 -0700
Post by Keith Medcalf
That is correct. You cannot spell "INTEGER PRIMARY KEY", which is an
explicit alias for the rowid, as anything other than that one particular
correct spelling.
Personally, I have never found a use for the AUTOINCREMENT option. Why is it being used?
If you declare a INTEGER PRIMARY KEY alone, sqlite could reuse rowid number
from deleted rows. Using AUTOINCREMENT forces sqlite to use a rowid bigger than
the last one inserted/updated(1), so you can use it to know if one row is older
or newer than others, without autoincrement no. You can do similar behavior
with a trigger, for example for TEXT columns or automatically with DATE current
time.

(1) You can change the integer primary key to whatever value you want.
Post by Keith Medcalf
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.
.... See you there then? ;)
--
Eduardo <***@mobelservices.com>
Hick Gunter
2017-11-27 10:19:15 UTC
Permalink
Sorry to rain on your parade, but " The sqlite_sequence table does not track ROWID changes associated with UPDATE statement, only INSERT statements.", so if you change the (auto-generated) rowid, your "row age to row id" relation no longer holds. Same goes for updating the sqlite_sequence entry associated with the table.

Indeed, if you immediately update the sqlite_sequence entry for your INTEGER PRIMARY KEY to MAXINT64, you will achieve "random" rowids.


BTW: Consider the following transaction to implement LRU via rowids:

BEGIN
-- retrieve the rowid for the target row (store as rowid in calling program)
SELECT rowid,... FROM mytable WHERE ...;

-- retrieve the sequence value for the table (store as seq in calling program)
SELECT seq+1 FROM SQLITE_SEQUENCE WHERE name='mytable';

-- update values and set new rowid
UPDATE mytable SET rowid=?seq, ... WHERE rowid=?rowid;

-- update sequence value
UPDATE SQLITE_SEQUENCE SET seq=?seq WHERE name='mytable';

COMMIT;

Yes, this will probably cause foreign keys to break or require execssive work (ON UPDATE CASCADE).

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:sqlite-users-***@mailinglists.sqlite.org] Im Auftrag von Eduardo
Gesendet: Montag, 27. November 2017 10:28
An: sqlite-***@mailinglists.sqlite.org
Betreff: [EXTERNAL] Re: [sqlite] "BIGINT" doesn't act like INTEGER on table creation [Bug]

On Thu, 23 Nov 2017 17:35:08 -0700
Post by Keith Medcalf
That is correct. You cannot spell "INTEGER PRIMARY KEY", which is an
explicit alias for the rowid, as anything other than that one
particular correct spelling.
Personally, I have never found a use for the AUTOINCREMENT option.
Why is it being used?
If you declare a INTEGER PRIMARY KEY alone, sqlite could reuse rowid number from deleted rows. Using AUTOINCREMENT forces sqlite to use a rowid bigger than the last one inserted/updated(1), so you can use it to know if one row is older or newer than others, without autoincrement no. You can do similar behavior with a trigger, for example for TEXT columns or automatically with DATE current time.

(1) You can change the integer primary key to whatever value you want.
Post by Keith Medcalf
The fact that there's a Highway to Hell but only a Stairway to Heaven
says a lot about anticipated traffic volume.
.... See you there then? ;)
--
Eduardo <***@mobelservices.com>
_______________________________________________
sqlite-users mailing list
sqlite-***@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
Keith Medcalf
2017-11-24 19:16:54 UTC
Permalink
You are missing the point.

The rowid is assigned automatically if it is not specified (that is, if it is null on insert). This is the behaviour of the rowid. In all databases and filesystems ever invented anywhere in the multiverse by any carbon (even non-carbon) based lifeform, whether an ugly bag of mostly water or not.

The AUTOINCREMENT keyword is an entirely different beast that I have never seen actually required for any useful purpose (except that there appears to be a number of folks who seem to think that the rowid is not automatically generated without that keyword because they never RTFM or have some form of cognitive disability).

It has nothing to do with choosing a primary key, a surrogate key, or an alternate key.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
-----Original Message-----
From: sqlite-users [mailto:sqlite-users-
Sent: Friday, 24 November, 2017 10:33
Subject: Re: [sqlite] "BIGINT" doesn't act like INTEGER on table
creation [Bug]
Post by R Smith
Post by Peter Halasz
As for whether I need to use AUTOINCREMENT, it seemed like a good
idea to
avoid rowid reuse, but I can avoid using it for the sake of
optimization,
so probably will.
I agree with Keith and has many times mentioned this before
(apologies
Post by R Smith
to others for the déjà vu). You as the programmer / designer should
always be in control of how and why a new ID is assigned.
I would disagree here, in many cases the ID (rowid) is a purely
internal
attribute with the purpose of accessing the data. It may have no
problem
domain significance. If the primary key's purpose is purely data
access,
then letting the data access layer handle it makes sense. The one
case
where it makes sense for the programmer / designer to take control of
the PK is if the domain naturally has an identifier that would be
suitable for the key (an reasonable sized integer that is naturally
unique), The designer should also be fairly certain that it will
remain so.
Post by R Smith
It feels like a saving to let the DB engine do it for you, but it
isn't really. What you save in a bit of code that decides the new
ID
Post by R Smith
before-hand (which can be as simple as SELECT MAX(id)+1 FROM t) you
typically have to catch up afterward in code to do the usual
get-LastInsertedID and then pop that in wherever stuff wants to
link
Post by R Smith
to the new item. It's a zero-sum gain really.
The big issues with this method is if two processes both try to
create a
new item at the same time, one of them is going to get an error and
have
to redo its work. If you start by creating the record with an
autoincrement id, and then getting the ID used, then you remove the
need
to handle the error on the simultaneous creation.
Post by R Smith
I'm not even a big fan of Integer IDs, I think codes / UUIDs are
best,
Post by R Smith
but obviously the speed/size gain with an integer key (especially
INTEGER PRIMARY KEY row-id alias in SQLite) can't be ignored.
Making a UUID or other 'big' key the primary access key will increase
the cost of looking up a record for ALL purposes. For small tables,
it
might not be measurable, but small tables are less apt to need that
sort
of PK either. It can make sense to use a key like that as an External
Key to describe the record to the outside world. Perhaps if the ONLY
accesses to a table are via this 'big' key, and very rarely by some
other key/field, making the big key the primary key would make sense.
Post by R Smith
Disclaimer: This is only my opinion, well, I'm not completely alone
in
Post by R Smith
it, but it is still an opinion and not a general SQL prescription.
--
Richard Damon
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Keith Medcalf
2017-11-24 19:25:34 UTC
Permalink
Actually a UUID or a GUID has a 100% certainty of a collision, not just a possibility of a collision. Just as all hash algorithms which take something and generate a shorter "hash" or "checksum" will always have collisions. Without exception and as an absolute 100% certainty. There is no way to avoid this mathematical certainty.

However, whether the absolute and unmitigatable certainty of a collision is of any import or not is an entirely different matter.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
-----Original Message-----
From: sqlite-users [mailto:sqlite-users-
Sent: Friday, 24 November, 2017 11:21
To: SQLite mailing list
Subject: Re: [sqlite] "BIGINT" doesn't act like INTEGER on table
creation [Bug]
Obviously, this is a design time factor, but, in my applications, I
always
use integer IDs if I'm ever going to bring the info that row contains
(And
other relevant info) to the UI. I have never had a solid reason to
use
GUIDs or UUIDs or whatever.
Any time I'm adding something to a listbox, combo box, or whatever it
is,
that element of that lists Object (RE: tStringList) gets the ID that
is in
the database. Using a UID would not work, unless I spend CPU cycles
converting a 32 character string to bytes, then to 128-bit numbers.
But
then, my compiler is 32-bit only, but can emulate 64-bit numbers.
I can see the reasoning why a UUID is appealing, but, an ID is an ID.
It
doesn't matter what it is. When you use INTEGER PRIMARY KEY, you get
from
1 to 2^64-1 numbers to play with at LEAST. I don't know if SQLite
will go
into 128 or 256bit integers. With UUID, you're looking at a chance
of
collision. Small, yes. But its there. With INTEGER PRIMARY KEY,
you're
going up by one each time. Since I will never care what that ID is,
as a
developer or as a user, Integer IDs are perfect.
family.org>
Post by Richard Damon
Post by R Smith
Post by Peter Halasz
As for whether I need to use AUTOINCREMENT, it seemed like a good
idea to
Post by Richard Damon
Post by R Smith
Post by Peter Halasz
avoid rowid reuse, but I can avoid using it for the sake of
optimization,
Post by Richard Damon
Post by R Smith
Post by Peter Halasz
so probably will.
I agree with Keith and has many times mentioned this before
(apologies to
Post by Richard Damon
Post by R Smith
others for the déjà vu). You as the programmer / designer should
always be
Post by Richard Damon
Post by R Smith
in control of how and why a new ID is assigned.
I would disagree here, in many cases the ID (rowid) is a purely
internal
Post by Richard Damon
attribute with the purpose of accessing the data. It may have no
problem
Post by Richard Damon
domain significance. If the primary key's purpose is purely data
access,
Post by Richard Damon
then letting the data access layer handle it makes sense. The one
case
Post by Richard Damon
where it makes sense for the programmer / designer to take control
of the
Post by Richard Damon
PK is if the domain naturally has an identifier that would be
suitable for
Post by Richard Damon
the key (an reasonable sized integer that is naturally unique), The
designer should also be fairly certain that it will remain so.
Post by R Smith
It feels like a saving to let the DB engine do it for you, but it
isn't
Post by Richard Damon
Post by R Smith
really. What you save in a bit of code that decides the new ID
before-hand
Post by Richard Damon
Post by R Smith
(which can be as simple as SELECT MAX(id)+1 FROM t) you typically
have to
Post by Richard Damon
Post by R Smith
catch up afterward in code to do the usual get-LastInsertedID and
then pop
Post by Richard Damon
Post by R Smith
that in wherever stuff wants to link to the new item. It's a zero-
sum gain
Post by Richard Damon
Post by R Smith
really.
The big issues with this method is if two processes both try to
create a
Post by Richard Damon
new item at the same time, one of them is going to get an error and
have to
Post by Richard Damon
redo its work. If you start by creating the record with an
autoincrement
Post by Richard Damon
id, and then getting the ID used, then you remove the need to
handle the
Post by Richard Damon
error on the simultaneous creation.
Post by R Smith
I'm not even a big fan of Integer IDs, I think codes / UUIDs are
best,
Post by Richard Damon
Post by R Smith
but obviously the speed/size gain with an integer key (especially
INTEGER
Post by Richard Damon
Post by R Smith
PRIMARY KEY row-id alias in SQLite) can't be ignored.
Making a UUID or other 'big' key the primary access key will
increase the
Post by Richard Damon
cost of looking up a record for ALL purposes. For small tables, it
might
Post by Richard Damon
not be measurable, but small tables are less apt to need that sort
of PK
Post by Richard Damon
either. It can make sense to use a key like that as an External Key
to
Post by Richard Damon
describe the record to the outside world. Perhaps if the ONLY
accesses to a
Post by Richard Damon
table are via this 'big' key, and very rarely by some other
key/field,
Post by Richard Damon
making the big key the primary key would make sense.
Post by R Smith
Disclaimer: This is only my opinion, well, I'm not completely
alone in
Post by Richard Damon
Post by R Smith
it, but it is still an opinion and not a general SQL prescription.
--
Richard Damon
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
users
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Richard Damon
2017-11-24 21:38:35 UTC
Permalink
Post by Keith Medcalf
Actually a UUID or a GUID has a 100% certainty of a collision, not just a possibility of a collision. Just as all hash algorithms which take something and generate a shorter "hash" or "checksum" will always have collisions. Without exception and as an absolute 100% certainty. There is no way to avoid this mathematical certainty.
However, whether the absolute and unmitigatable certainty of a collision is of any import or not is an entirely different matter.
Absolutely incorrect, for a UID. Perhaps if you are talking about actual
hashes, you can say that there are always multiple (at least potential)
messages that will generate the same hash value (but for a good hash,
the likelihood that one of them is sensible or even actually generated
is minuscule). for a UID, while they are typically created by a hash of
various information, including something that varies each time a given
generator is used, what those inputs actually are is generally
unimportant, but are mostly provided to help make the 'randomness' of
the choice more 'random'. We are never really concerned with 'potential'
collisions, only actual collisions.

One proof of the falsehood of your assertion is that we CAN fill a
database with some data using UIDs, and we will almost certainly not get
a collision, while you assertion we will.

Yes, there is a limit on how many entries we can generate before we will
likely, or even certainly hit a collision, but if that number is
significantly higher than the number of records we will generate (or
even CAN generate), we can assume relative safety. With a 128 bit UID,
the approximate point we need to worry about random collisions is the
order of 2^64, I suspect that creating an SQLite database with 2^64
non-trivial records in a single table is likely going to have other
issues besides unique key collisions.
--
Richard Damon
Jean-Christophe Deschamps
2017-11-24 22:26:07 UTC
Permalink
Post by Richard Damon
One proof of the falsehood of your assertion is that we CAN fill a
database with some data using UIDs, and we will almost certainly not
get a collision, while you assertion we will.
This is an attempt at "proof by example". Keith is perfectly right
--mathematically speaking-- and your "proof" doesn't hold water, I mean
as a formal proof. The best proof that your "proof" isn't a proof is
that you feel obliged to add "almost certainly".

If SQLite was coded so that "SELECT 3.1415926" would "almost certainly"
return the expected value you just wouldn't use it. Same thing can be
applied to, say, a the classical Hello world! program and a C compiler,
but in this case experience teaches us to be even much more careful!

I agree that you or anyone else _may_ consider the odds of UUID
collision(s) rare enough to ignore the issue and this is examplified in
practice by a huge number of systems using UUIDs or similar things. But
this doesn't make a proof of anything. That's the difference between
theory and practice. Search for a good quote in this list about theory
and practice ;-)

JcD
Richard Damon
2017-11-24 22:49:26 UTC
Permalink
Post by Jean-Christophe Deschamps
Post by Richard Damon
One proof of the falsehood of your assertion is that we CAN fill a
database with some data using UIDs, and we will almost certainly not
get a collision, while you assertion we will.
This is an attempt at "proof by example". Keith is perfectly right
--mathematically speaking-- and your "proof" doesn't hold water, I
mean as a formal proof.  The best proof that your "proof" isn't a
proof is that you feel obliged  to add "almost certainly".
DISproof by example is a perfectly valid method. If someone makes a
claim that something is ALWAYS true, ONE counter example IS a disproof.
I said almost certainly as the chance of a collision isn't 0 (to be able
to say with certainty) but is most defintely less than the 100% claimed.
Post by Jean-Christophe Deschamps
If SQLite was coded so that "SELECT 3.1415926" would "almost
certainly" return the expected value you just wouldn't use it. Same
thing can be applied to, say, a the classical Hello world! program and
a C compiler, but in this case experience teaches us to be even much
more careful!
I agree that you or anyone else _may_ consider the odds of UUID
collision(s) rare enough to ignore the issue and this is examplified
in practice by a huge number of systems using UUIDs or similar things.
But this doesn't make a proof of anything. That's the difference
between theory and practice. Search for a good quote in this list
about theory and practice ;-)
Considering that for a 'reasonable' number of records, (maybe even into
billions), the odds of a collision are probably on the order of once in
the life of the universe (I would need to run the math to get the exact
number, but it is minuscule in the period of the lifetime of a typical
computer), it seems to be a reasonable assumption. A big part might be
the consequences of a collision.
Post by Jean-Christophe Deschamps
JcD
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
--
Richard Damon
Jean-Christophe Deschamps
2017-11-25 01:58:02 UTC
Permalink
Post by Richard Damon
Post by Jean-Christophe Deschamps
Post by Richard Damon
One proof of the falsehood of your assertion is that we CAN fill a
database with some data using UIDs, and we will almost certainly not
get a collision, while you assertion we will.
This is an attempt at "proof by example". Keith is perfectly right
--mathematically speaking-- and your "proof" doesn't hold water, I
mean as a formal proof. The best proof that your "proof" isn't a
proof is that you feel obliged to add "almost certainly".
DISproof by example is a perfectly valid method. If someone makes a
claim that something is ALWAYS true, ONE counter example IS a
disproof. I said almost certainly as the chance of a collision isn't 0
(to be able to say with certainty) but is most defintely less than the
100% claimed.
You're confusing one mathematical theorem and one practical statement.
The first is the _mathematical_ fact that any PRNG (using any fixed
number of random bits, which is what xUIDs are) will provide an
infinite number of collisions with probability 1. This is definitely
true. Of course here, the number of samples is implicitely infinite.

Your practical statement is that you can "most certainly" ignore the
possibility of collision when feeding 2^N xUIDs into a unique column
without loosing sleep. That's good enough in practice. The issue with
your "demonstration" is that 2^N is bounded, whatever finite N you
choose. Hence you don't contradict what Keith said, you just say
something different applying to restricted cases. You're speaking about
practice, while Keith told about math. You're both right, each from his
own point of view. But you can't claim to disproof a trivially true
theorem this way, by changing its premices.

An event with probability 10^-100000...000 (any finite number of
zeroes) will occur at least once, provided you run enough tries. It'll
occur an infinite number of times if you run an infinite number of
tries. Else its probability would be zero.
Your "disproof" amounts to say that 10^-100000...000 = 0

And neither Keith nor I ever said that an xUID collision will occur
with probability 1 after 2^64 samples. That would be false and that's
why people feel free to use xUIDs _AND_ sleep quietly.

JcD
Richard Damon
2017-11-25 03:11:57 UTC
Permalink
Post by Jean-Christophe Deschamps
Post by Richard Damon
Post by Jean-Christophe Deschamps
Post by Richard Damon
One proof of the falsehood of your assertion is that we CAN fill a
database with some data using UIDs, and we will almost certainly
not get a collision, while you assertion we will.
This is an attempt at "proof by example". Keith is perfectly right
--mathematically speaking-- and your "proof" doesn't hold water, I
mean as a formal proof.  The best proof that your "proof" isn't a
proof is that you feel obliged  to add "almost certainly".
DISproof by example is a perfectly valid method. If someone makes a
claim that something is ALWAYS true, ONE counter example IS a
disproof. I said almost certainly as the chance of a collision isn't
0 (to be able to say with certainty) but is most defintely less than
the 100% claimed.
You're confusing one mathematical theorem and one practical statement.
The first is the _mathematical_ fact that any PRNG (using any fixed
number of random bits, which is what xUIDs are) will provide an
infinite number of collisions with probability 1. This is definitely
true. Of course here, the number of samples is implicitely infinite.
Your practical statement is that you can "most certainly" ignore the
possibility of collision when feeding 2^N xUIDs into a unique column
without loosing sleep. That's good enough in practice. The issue with
your "demonstration" is that 2^N is bounded, whatever finite N you
choose. Hence you don't contradict what Keith said, you just say
something different applying to restricted cases. You're speaking
about practice, while Keith told about math. You're both right, each
from his own point of view. But you can't claim to disproof a
trivially true theorem this way, by changing its premices.
An event with probability 10^-100000...000 (any finite number of
zeroes) will occur at least once, provided you run enough tries. It'll
occur an infinite number of times if you run an infinite number of
tries. Else its probability would be zero.
Your "disproof" amounts to say that 10^-100000...000 = 0
And neither Keith nor I ever said that an xUID collision will occur
with probability 1 after 2^64 samples. That would be false and that's
why people feel free to use xUIDs _AND_ sleep quietly.
JcD
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
The statement, and I quote, was

Actually a UUID or a GUID has a 100% certainty of a collision, not just
a possibility of a collision. Just as all hash algorithms which take
something and generate a shorter "hash" or "checksum" will always have
collisions. Without exception and as an absolute 100% certainty. There
is no way to avoid this mathematical certainty.

The claim makes a positive statement of a 100% chance of collision, and
makes an argument about it being a hash, and that it isn't just a chance.

Something happening at least one in an extremely large number of trials
was NEVER a 100% probability in any form of math I have heard of. The
number of records needed to have a significant chance of the collision,
is beyond any practical usage. If the claim was a non-zero chance, then
it would be true. Probability has always been related to times happened
/ times tried. So your claim is that 10^-10000...  == 1, saying
something will EVENTUAL happen is a claim on non-zero probability, not
of 100% probability.

If you mean that a finite width field can't hold unique values for an
infinite of records, well, duh, why all the irrelevant references to
hashes or even that it is a UUID or a GUID.

If it was meant that EVENTUAL, after an impossibly long time, you will
get a collision, that is being absurd. I can say with better certainty
that long before that happens, the computers running this database are
going to break, so we will never get to the point of the collision.
--
Richard Damon
J. King
2017-11-25 03:54:19 UTC
Permalink
Version 1 UUIDs only use a random number (16 bits) in the case of an uninitialized clock sequence (a case which, ideally, should only occur the first time a device generates a UUID). Version 1 UUIDs especially avoid using random numbers; they are also not a shortening of longer input.

In short, version 1 UUIDs are not a PRNG scheme, nor are they the same as hash functions.

I'm not a mathematician, and it's been a while since I've read the relevant RFC, but I believe collisions in a proper, strict implementation of V1 UUIDs are impossible by design until either a) the 60-bit timestamp overflows, or b) the MAC address namespace is exhausted. It's not a matter of probability, and it's only "a certainty" after the end of their design lifetime.

Of course, UUIDs being of finite size, they will eventually be exhausted, and a single machine may only generate 65536 identifiers in a 100-nanosecond span of time. They will not, however, collide.
Post by Jean-Christophe Deschamps
Post by Richard Damon
Post by Jean-Christophe Deschamps
Post by Richard Damon
One proof of the falsehood of your assertion is that we CAN fill a
database with some data using UIDs, and we will almost certainly not
get a collision, while you assertion we will.
This is an attempt at "proof by example". Keith is perfectly right
--mathematically speaking-- and your "proof" doesn't hold water, I
mean as a formal proof. The best proof that your "proof" isn't a
proof is that you feel obliged to add "almost certainly".
DISproof by example is a perfectly valid method. If someone makes a
claim that something is ALWAYS true, ONE counter example IS a
disproof. I said almost certainly as the chance of a collision isn't 0
(to be able to say with certainty) but is most defintely less than the
100% claimed.
You're confusing one mathematical theorem and one practical statement.
The first is the _mathematical_ fact that any PRNG (using any fixed
number of random bits, which is what xUIDs are) will provide an
infinite number of collisions with probability 1. This is definitely
true. Of course here, the number of samples is implicitely infinite.
Your practical statement is that you can "most certainly" ignore the
possibility of collision when feeding 2^N xUIDs into a unique column
without loosing sleep. That's good enough in practice. The issue with
your "demonstration" is that 2^N is bounded, whatever finite N you
choose. Hence you don't contradict what Keith said, you just say
something different applying to restricted cases. You're speaking about
practice, while Keith told about math. You're both right, each from his
own point of view. But you can't claim to disproof a trivially true
theorem this way, by changing its premices.
An event with probability 10^-100000...000 (any finite number of
zeroes) will occur at least once, provided you run enough tries. It'll
occur an infinite number of times if you run an infinite number of
tries. Else its probability would be zero.
Your "disproof" amounts to say that 10^-100000...000 = 0
And neither Keith nor I ever said that an xUID collision will occur
with probability 1 after 2^64 samples. That would be false and that's
why people feel free to use xUIDs _AND_ sleep quietly.
JcD
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Darko Volaric
2017-11-25 05:29:20 UTC
Permalink
What about invalid and reused MAC addresses and devices with no MAC address at all?
What about time resets to the epoch which are not restored, user time changes, daylight saving or leap seconds?

It sounds even more probabilistic than the probabilistic methods. Does anyone actually use it?
Post by J. King
Version 1 UUIDs only use a random number (16 bits) in the case of an uninitialized clock sequence (a case which, ideally, should only occur the first time a device generates a UUID). Version 1 UUIDs especially avoid using random numbers; they are also not a shortening of longer input.
In short, version 1 UUIDs are not a PRNG scheme, nor are they the same as hash functions.
I'm not a mathematician, and it's been a while since I've read the relevant RFC, but I believe collisions in a proper, strict implementation of V1 UUIDs are impossible by design until either a) the 60-bit timestamp overflows, or b) the MAC address namespace is exhausted. It's not a matter of probability, and it's only "a certainty" after the end of their design lifetime.
Of course, UUIDs being of finite size, they will eventually be exhausted, and a single machine may only generate 65536 identifiers in a 100-nanosecond span of time. They will not, however, collide.
Post by Jean-Christophe Deschamps
Post by Richard Damon
Post by Jean-Christophe Deschamps
Post by Richard Damon
One proof of the falsehood of your assertion is that we CAN fill a
database with some data using UIDs, and we will almost certainly not
get a collision, while you assertion we will.
This is an attempt at "proof by example". Keith is perfectly right
--mathematically speaking-- and your "proof" doesn't hold water, I
mean as a formal proof. The best proof that your "proof" isn't a
proof is that you feel obliged to add "almost certainly".
DISproof by example is a perfectly valid method. If someone makes a
claim that something is ALWAYS true, ONE counter example IS a
disproof. I said almost certainly as the chance of a collision isn't 0
(to be able to say with certainty) but is most defintely less than the
100% claimed.
You're confusing one mathematical theorem and one practical statement.
The first is the _mathematical_ fact that any PRNG (using any fixed
number of random bits, which is what xUIDs are) will provide an
infinite number of collisions with probability 1. This is definitely
true. Of course here, the number of samples is implicitely infinite.
Your practical statement is that you can "most certainly" ignore the
possibility of collision when feeding 2^N xUIDs into a unique column
without loosing sleep. That's good enough in practice. The issue with
your "demonstration" is that 2^N is bounded, whatever finite N you
choose. Hence you don't contradict what Keith said, you just say
something different applying to restricted cases. You're speaking about
practice, while Keith told about math. You're both right, each from his
own point of view. But you can't claim to disproof a trivially true
theorem this way, by changing its premices.
An event with probability 10^-100000...000 (any finite number of
zeroes) will occur at least once, provided you run enough tries. It'll
occur an infinite number of times if you run an infinite number of
tries. Else its probability would be zero.
Your "disproof" amounts to say that 10^-100000...000 = 0
And neither Keith nor I ever said that an xUID collision will occur
with probability 1 after 2^64 samples. That would be false and that's
why people feel free to use xUIDs _AND_ sleep quietly.
JcD
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Peter Da Silva
2017-11-25 13:06:11 UTC
Permalink
Ah, I see someone else has already brought up version 1 UUIDs.
Post by Darko Volaric
What about invalid and reused MAC addresses and devices with no MAC address at all?
Not an issue with SQLite since that part of the UUID is a constant within a given database. It would be reasonable to simply pick "0" then set the low bit of the first octet to 1 in accordance with the standard for non-MAC node IDs. The standard suggests using a hash or a cryptographically secure random number generator, setting the
Post by Darko Volaric
What about time resets to the epoch which are not restored, user time changes,
I know some systems at least increment the node each time a time change is detected. It will take 2^47 time changes to roll over. Since the node part is not relevant to SQLite, this is perfectly safe.
Post by Darko Volaric
daylight saving or leap seconds?
Not relevant to the timestamp format, since it's an epoch time.
Post by Darko Volaric
It sounds even more probabilistic than the probabilistic methods. Does anyone actually use it?
It's Microsoft's standard GUID/UUID format and is also used by a lot of other systems for compatibility.
Peter Da Silva
2017-11-25 13:09:28 UTC
Permalink
Post by Peter Da Silva
Post by Darko Volaric
What about time resets to the epoch which are not restored, user time changes,
I know some systems at least increment the node each time a time change is detected. It will take 2^47 time changes to roll over. Since the node part is not relevant to SQLite, this is perfectly safe.
Also, the UUID clock doesn't need to be the system clock, so you can simply ignore backwards changes in the system clock (or maintain a common offset that gets updated whenever a backwards change is detected in the system clock). Over time this may trim a few decades off the 3000+ year life of the format.
Wout Mertens
2017-11-26 08:21:33 UTC
Permalink
The one thing that saddens me is that the clock is not the full first
part of the UUID, so it's not a proxy for sorting by creation date. I
often wonder why they did that, they must have done it on purpose.
Post by Peter Da Silva
Post by Peter Da Silva
Post by Darko Volaric
What about time resets to the epoch which are not restored, user time changes,
I know some systems at least increment the node each time a time change is
detected. It will take 2^47 time changes to roll over. Since the node part
is not relevant to SQLite, this is perfectly safe.
Also, the UUID clock doesn't need to be the system clock, so you can simply
ignore backwards changes in the system clock (or maintain a common offset
that gets updated whenever a backwards change is detected in the system
clock). Over time this may trim a few decades off the 3000+ year life of the
format.
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Peter Da Silva
2017-11-25 12:54:32 UTC
Permalink
Are people here talking about UUIDs or things that just look like UUIDs? It sounds like the latter. UUIDs are actually structured objects, with embedded type bits. There are multiple UUID generation schemes, one of which is based on random numbers, others are based on hashes, and there is the common ugly workaround of generating a 128 bit hash and calling it a UUID.

If you use version 1 UUIDs you're mathematically guaranteed to avoid collisions. At least for the next 3000 years:

Version 1 UUIDs are based on a node address (MAC), a 60 bit clock, and a node-specific sequence number. You can generate 163 billion version 1 UUIDs per second and they won't roll over until 5236 AD.
J Decker
2017-11-24 22:56:43 UTC
Permalink
Post by Keith Medcalf
Actually a UUID or a GUID has a 100% certainty of a collision, not just a
possibility of a collision. Just as all hash algorithms which take
something and generate a shorter "hash" or "checksum" will always have
collisions. Without exception and as an absolute 100% certainty. There is
no way to avoid this mathematical certainty.
However, whether the absolute and unmitigatable certainty of a collision
is of any import or not is an entirely different matter.
*I*t's also 100% certainty that collision will occur with a 32 or 64 bit
row id before a UUID collision occurs.

For the truly paranoid UUID can be made CUID (
https://github.com/ericelliott/cuid ) by using seconds for 32 of the 128
bits.

" 1 billion UUIDs per second for about 85 years, and a file containing this
many UUIDs, at 16 bytes per UUID, would be about 45 exabytes, many times
larger than the largest databases currently in existence, which are on the
order of hundreds of petabytes. "
85 years in seconds is 2 680 560 000, which is about 1/2 of a 32 bit
number, so you can add a factor of 170 to that ... or 14,450 years before
50% collision probability. (generating a billion a second mind you for
14,000 years)
Peter Halasz
2017-11-24 23:13:40 UTC
Permalink
Sorry to steer the conversation back to the topic.

Looks like I do need to use AUTOINCREMENT after all, otherwise the
framework I'm using doesn't appear to recognize that INTEGER PRIMARY KEY
columns are allowed a NULL value on insert. That's a shame.

Is there a way to query a schema to get a direct answer to whether table
has a column which acts as the ROW ID alias? I'd like to improve linq2db's
code generation if I can, but I can't see any straightforward way to do it
as the ROW ID information is hidden (in SQLite's API, not just its
documentation)

I found someone asking the same question on this mailing list in 2010 who
was told to: [1]

"look at column 'sql' of TABLE sqlite_master and parse the creation
statement"

Which is frankly ridiculous.

I'm hoping there's an actual answer 7 years later that doesn't involve
implementing a SQL parser? (* please *don't even make suggestions on how to
do this)

I can't find anything in the pragma documentation. [2] The closest I could
see is "*PRAGMA **index_xinfo(**index-name**);*" which will give
information on whether an index references a rowid. I tried creating an
index for a column just to test if it was the ROW ID but it doesn't
actually work that way.

On the page https://www.sqlite.org/rowidtable.html it states:

-

All of the complications above (and others not mentioned here) arise
from the need to preserve backwards compatibility to the tens of billions
of SQLite database files in circulation. In a perfect world, there would be
no such thing as a "rowid" and all tables would following the standard
semantics implemented as WITHOUT ROWID tables, only without the extra
"WITHOUT ROWID" keywords. Unfortunately, life is messy. The designer of
SQLite offers his sincere apology for the current mess.

So it appears ROWID is sticking around. Why is it hidden in the
documentation (not listed on on pages like the datatype page), and hidden
in the pragma interface as well?

Is there an API call or PRAGMA statement that gives this information
directly?

Thanks.


[1]
http://sqlite.1065341.n5.nabble.com/Introspection-and-RowID-INTEGER-PRIMARY-KEY-td60462.html
[2] https://sqlite.org/pragma.html
J Decker
2017-11-25 00:16:45 UTC
Permalink
Post by Peter Halasz
Sorry to steer the conversation back to the topic.
Looks like I do need to use AUTOINCREMENT after all, otherwise the
framework I'm using doesn't appear to recognize that INTEGER PRIMARY KEY
columns are allowed a NULL value on insert. That's a shame.
Is there a way to query a schema to get a direct answer to whether table
has a column which acts as the ROW ID alias? I'd like to improve linq2db's
code generation if I can, but I can't see any straightforward way to do it
as the ROW ID information is hidden (in SQLite's API, not just its
documentation)
I found someone asking the same question on this mailing list in 2010 who
was told to: [1]
"look at column 'sql' of TABLE sqlite_master and parse the creation
statement"
Which is frankly ridiculous.
*shrug* it probably is ridiculous, but I did anyway.... XDataTable extends
DataTable and adds foriegn key relations; otherwise you need the full
DataSet to deal with some table-oriented things (like foreign keys)
https://sourceforge.net/p/xperdex/mercurial/ci/default/tree/xperdex.classes.2/XDataTable.cs

This is a parser that can either generate a create table statement from a
datatable, or a datatable from a create table statement. (Or merge a data
table filled with existing columns and merge it with what's already in a
database)

https://sourceforge.net/p/xperdex/mercurial/ci/default/tree/xperdex.classes.2/SQL_Utilities.cs

I use this to parse the create table statement into reasonable tokens.
https://sourceforge.net/p/xperdex/mercurial/ci/default/tree/xperdex.classes.2/Types/XString.cs

The above can probably be disected from the full surrounding library with a
little work... Some features of XDataTable should probably be removed...
the first version automatically synced to the database when rows were
added/deleted/modified.... I eventually moved that behind an option 'live'
that if 'live' do the inserts directly; otherwise wait, and then later, a
full dataset (XDataSet) can sync all changes in a dataset in appropriate
order so foriegn key parent keys are inserted first.

It has a personality selection also, based on information from
DsnConnection class, which tells it whether it is SQL Server, MySQL or
Sqlite, so generation of types for columns in the database form the types
in the DataTable change depending on the flavor of database, and
constraint/foriegn key generation changes also....
Post by Peter Halasz
I'm hoping there's an actual answer 7 years later that doesn't involve
implementing a SQL parser? (* please *don't even make suggestions on how to
do this)
I can't find anything in the pragma documentation. [2] The closest I could
see is "*PRAGMA **index_xinfo(**index-name**);*" which will give
information on whether an index references a rowid. I tried creating an
index for a column just to test if it was the ROW ID but it doesn't
actually work that way.
-
All of the complications above (and others not mentioned here) arise
from the need to preserve backwards compatibility to the tens of billions
of SQLite database files in circulation. In a perfect world, there would be
no such thing as a "rowid" and all tables would following the standard
semantics implemented as WITHOUT ROWID tables, only without the extra
"WITHOUT ROWID" keywords. Unfortunately, life is messy. The designer of
SQLite offers his sincere apology for the current mess.
So it appears ROWID is sticking around. Why is it hidden in the
documentation (not listed on on pages like the datatype page), and hidden
in the pragma interface as well?
Is there an API call or PRAGMA statement that gives this information
directly?
Thanks.
[1]
http://sqlite.1065341.n5.nabble.com/Introspection-and-
RowID-INTEGER-PRIMARY-KEY-td60462.html
[2] https://sqlite.org/pragma.html
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
petern
2017-11-25 00:56:45 UTC
Permalink
Apparently you would query pragma_table_info for "INTEGER" PK columns and
then ask through a trivial extension function about the other column
meta-data:

https://www.sqlite.org/c3ref/table_column_metadata.html

The basic plot is illustrated below:

sqlite> .load column-meta-data.so
sqlite> SELECT *,isRowId(name) FROM pragma_table_info('the_table') WHERE pk
AND type='INTEGER'

cid,name,type,notnull,dflt_value,pk,"isRowId('the_table',name)"
...1
...etc

Your only serious problem, I think, would be to distribute the
column-meta-data.c extension with your tool.
Post by Peter Halasz
Sorry to steer the conversation back to the topic.
Looks like I do need to use AUTOINCREMENT after all, otherwise the
framework I'm using doesn't appear to recognize that INTEGER PRIMARY KEY
columns are allowed a NULL value on insert. That's a shame.
Is there a way to query a schema to get a direct answer to whether table
has a column which acts as the ROW ID alias? I'd like to improve linq2db's
code generation if I can, but I can't see any straightforward way to do it
as the ROW ID information is hidden (in SQLite's API, not just its
documentation)
I found someone asking the same question on this mailing list in 2010 who
was told to: [1]
"look at column 'sql' of TABLE sqlite_master and parse the creation
statement"
Which is frankly ridiculous.
I'm hoping there's an actual answer 7 years later that doesn't involve
implementing a SQL parser? (* please *don't even make suggestions on how to
do this)
I can't find anything in the pragma documentation. [2] The closest I could
see is "*PRAGMA **index_xinfo(**index-name**);*" which will give
information on whether an index references a rowid. I tried creating an
index for a column just to test if it was the ROW ID but it doesn't
actually work that way.
-
All of the complications above (and others not mentioned here) arise
from the need to preserve backwards compatibility to the tens of billions
of SQLite database files in circulation. In a perfect world, there would be
no such thing as a "rowid" and all tables would following the standard
semantics implemented as WITHOUT ROWID tables, only without the extra
"WITHOUT ROWID" keywords. Unfortunately, life is messy. The designer of
SQLite offers his sincere apology for the current mess.
So it appears ROWID is sticking around. Why is it hidden in the
documentation (not listed on on pages like the datatype page), and hidden
in the pragma interface as well?
Is there an API call or PRAGMA statement that gives this information
directly?
Thanks.
[1]
http://sqlite.1065341.n5.nabble.com/Introspection-and-
RowID-INTEGER-PRIMARY-KEY-td60462.html
[2] https://sqlite.org/pragma.html
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Peter Halasz
2017-11-25 01:57:21 UTC
Permalink
Post by petern
sqlite> .load column-meta-data.so
Sorry I'm at a loss to find this extension? Google gives me nothing related
to SQLite for "isRowId", "column-meta-data.so", "column-meta-data.c", etc.
petern
2017-11-25 03:58:26 UTC
Permalink
Well, first you imagine you're back at the office in Santa Clara or Redmond
in the early 1990's.
Then take a belt of whisky, cross your eyes, and paste from doc to
clipboard.... a few edits and voila:

#include "sqlite3ext.h"
#include <string.h>
SQLITE_EXTENSION_INIT1
static struct metadata {
char const *zDataType; /* OUTPUT: Declared data type */
char const *zCollSeq; /* OUTPUT: Collation sequence name */
int NotNull; /* OUTPUT: True if NOT NULL constraint exists */
int PrimaryKey; /* OUTPUT: True if column part of PK */
int Autoinc;/* OUTPUT: True if column is auto-increment */
} md;
static void initmd(sqlite3_context *context, int argc, sqlite3_value
**argv) {
sqlite3_table_column_metadata(
sqlite3_context_db_handle(context), /* Connection handle */
0, /* Database name or NULL */
(char const*)sqlite3_value_text(argv[0]), /* Table name */
(char const*)sqlite3_value_text(argv[1]), /* Column name */
&md.zDataType, /* OUTPUT: Declared data type */
&md.zCollSeq, /* OUTPUT: Collation sequence name */
&md.NotNull, /* OUTPUT: True if NOT NULL constraint exists */
&md.PrimaryKey, /* OUTPUT: True if column part of PK */
&md.Autoinc/* OUTPUT: True if column is auto-increment */
);
}
static void collseq(sqlite3_context *context, int argc, sqlite3_value
**argv) {
initmd(context,argc,argv);
if (!md.zCollSeq) return;
sqlite3_result_text(context, md.zCollSeq, strlen(md.zCollSeq),
SQLITE_TRANSIENT);
}
static void autoinc(sqlite3_context *context, int argc, sqlite3_value
**argv) {
initmd(context,argc,argv);
sqlite3_result_int(context, md.Autoinc);
}
#ifdef _WIN32
__declspec(dllexport)
#endif
int sqlite3_colmetadata_init(sqlite3 *db, char **pzErrMsg, const
sqlite3_api_routines *pApi) {
SQLITE_EXTENSION_INIT2(pApi);
int rc = sqlite3_create_function(db, "collseq", 2, SQLITE_UTF8, 0,
collseq, 0, 0);
if (SQLITE_OK == rc) sqlite3_create_function(db, "autoinc", 2,
SQLITE_UTF8, 0, autoinc, 0, 0);
return rc;
}

Linux box compile is something like this:

gcc -I<sqlite build dir> -fPIC -lm -shared colmetadata.c -o colmetadata.so

Then test it out:

sqlite> .load colmetadata.so
sqlite> SELECT
*,collseq('sqlite_master',name)collseq,autoinc('sqlite_master',name)autoinc
FROM pragma_table_info('sqlite_master');
cid,name,type,notnull,dflt_value,pk,collseq,autoinc
0,type,text,0,,0,BINARY,0
1,name,text,0,,0,BINARY,0
2,tbl_name,text,0,,0,BINARY,0
3,rootpage,integer,0,,0,BINARY,0
4,sql,text,0,,0,BINARY,0

https://www.sqlite.org/c3ref/table_column_metadata.html

So armed with the above document and newly exposed pk,collseq, and autoinc
info you can deduce the rowid aliases.
Feel free to add more functions and clean up the error handling to suit
your needs.
Post by Peter Halasz
Post by petern
sqlite> .load column-meta-data.so
Sorry I'm at a loss to find this extension? Google gives me nothing related
to SQLite for "isRowId", "column-meta-data.so", "column-meta-data.c", etc.
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Peter Halasz
2017-11-25 04:20:33 UTC
Permalink
Is this a joke?
J. King
2017-11-25 05:12:24 UTC
Permalink
Peter (that is, the other Peter) is being more than a little flippant, yes, and at least a little obscure (I don't get the joke, either), but the substance appears quite serious.

His prior message suggested using what I can only assume would be a trivial extension to SQLite to do what you want to do---this extension, however, does not (already) exist.

His last message provided what I can only assume is an example implementation of such an extension.
Post by Peter Halasz
Is this a joke?
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Jean-Christophe Deschamps
2017-11-25 02:32:05 UTC
Permalink
Post by Peter Halasz
Looks like I do need to use AUTOINCREMENT after all, otherwise the
framework I'm using doesn't appear to recognize that INTEGER PRIMARY KEY
columns are allowed a NULL value on insert.
I can't answer about just INTEGER PRIMARY KEY columns, but any table
with an AUTOINCREMENT column has an entry in table sqlite_sequence,
something much easier to deal with than digging into sqlite_master.

JcD
Keith Medcalf
2017-11-25 03:04:57 UTC
Permalink
Actually, it is entirely possible to generate two and only two random xUID side by each and have them be duplicates. Such is the nature of randomness.

The only way to ensure that there is no collisions is to check whether the xUID is already in use/seen within the domain where it is used.

A rowid generated by the ordinary method (if no records then 1 else max(rowid)+1) cannot have a key collision within its domain.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
-----Original Message-----
From: sqlite-users [mailto:sqlite-users-
Sent: Friday, 24 November, 2017 14:39
Subject: Re: [sqlite] "BIGINT" doesn't act like INTEGER on table
creation [Bug]
Post by Keith Medcalf
Actually a UUID or a GUID has a 100% certainty of a collision, not
just a possibility of a collision. Just as all hash algorithms which
take something and generate a shorter "hash" or "checksum" will
always have collisions. Without exception and as an absolute 100%
certainty. There is no way to avoid this mathematical certainty.
Post by Keith Medcalf
However, whether the absolute and unmitigatable certainty of a
collision is of any import or not is an entirely different matter.
Absolutely incorrect, for a UID. Perhaps if you are talking about
actual
hashes, you can say that there are always multiple (at least
potential)
messages that will generate the same hash value (but for a good hash,
the likelihood that one of them is sensible or even actually
generated
is minuscule). for a UID, while they are typically created by a hash
of
various information, including something that varies each time a
given
generator is used, what those inputs actually are is generally
unimportant, but are mostly provided to help make the 'randomness' of
the choice more 'random'. We are never really concerned with
'potential'
collisions, only actual collisions.
One proof of the falsehood of your assertion is that we CAN fill a
database with some data using UIDs, and we will almost certainly not
get
a collision, while you assertion we will.
Yes, there is a limit on how many entries we can generate before we
will
likely, or even certainly hit a collision, but if that number is
significantly higher than the number of records we will generate (or
even CAN generate), we can assume relative safety. With a 128 bit
UID,
the approximate point we need to worry about random collisions is the
order of 2^64, I suspect that creating an SQLite database with 2^64
non-trivial records in a single table is likely going to have other
issues besides unique key collisions.
--
Richard Damon
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Keith Medcalf
2017-11-25 05:18:58 UTC
Permalink
Only if you assume a monotonic clock ...


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
-----Original Message-----
From: sqlite-users [mailto:sqlite-users-
Sent: Friday, 24 November, 2017 20:54
To: SQLite mailing list
Subject: Re: [sqlite] "BIGINT" doesn't act like INTEGER on table
creation [Bug]
Version 1 UUIDs only use a random number (16 bits) in the case of an
uninitialized clock sequence (a case which, ideally, should only
occur the first time a device generates a UUID). Version 1 UUIDs
especially avoid using random numbers; they are also not a shortening
of longer input.
In short, version 1 UUIDs are not a PRNG scheme, nor are they the
same as hash functions.
I'm not a mathematician, and it's been a while since I've read the
relevant RFC, but I believe collisions in a proper, strict
implementation of V1 UUIDs are impossible by design until either a)
the 60-bit timestamp overflows, or b) the MAC address namespace is
exhausted. It's not a matter of probability, and it's only "a
certainty" after the end of their design lifetime.
Of course, UUIDs being of finite size, they will eventually be
exhausted, and a single machine may only generate 65536 identifiers
in a 100-nanosecond span of time. They will not, however, collide.
On November 24, 2017 8:58:02 PM EST, Jean-Christophe Deschamps
Post by Jean-Christophe Deschamps
Post by Richard Damon
Post by Jean-Christophe Deschamps
Post by Richard Damon
One proof of the falsehood of your assertion is that we CAN fill
a
Post by Jean-Christophe Deschamps
Post by Richard Damon
Post by Jean-Christophe Deschamps
Post by Richard Damon
database with some data using UIDs, and we will almost certainly
not
Post by Jean-Christophe Deschamps
Post by Richard Damon
Post by Jean-Christophe Deschamps
Post by Richard Damon
get a collision, while you assertion we will.
This is an attempt at "proof by example". Keith is perfectly right
--mathematically speaking-- and your "proof" doesn't hold water, I
mean as a formal proof. The best proof that your "proof" isn't a
proof is that you feel obliged to add "almost certainly".
DISproof by example is a perfectly valid method. If someone makes a
claim that something is ALWAYS true, ONE counter example IS a
disproof. I said almost certainly as the chance of a collision
isn't 0
Post by Jean-Christophe Deschamps
Post by Richard Damon
(to be able to say with certainty) but is most defintely less than
the
Post by Jean-Christophe Deschamps
Post by Richard Damon
100% claimed.
You're confusing one mathematical theorem and one practical
statement.
Post by Jean-Christophe Deschamps
The first is the _mathematical_ fact that any PRNG (using any fixed
number of random bits, which is what xUIDs are) will provide an
infinite number of collisions with probability 1. This is definitely
true. Of course here, the number of samples is implicitely infinite.
Your practical statement is that you can "most certainly" ignore the
possibility of collision when feeding 2^N xUIDs into a unique column
without loosing sleep. That's good enough in practice. The issue
with
Post by Jean-Christophe Deschamps
your "demonstration" is that 2^N is bounded, whatever finite N you
choose. Hence you don't contradict what Keith said, you just say
something different applying to restricted cases. You're speaking
about
Post by Jean-Christophe Deschamps
practice, while Keith told about math. You're both right, each from
his
Post by Jean-Christophe Deschamps
own point of view. But you can't claim to disproof a trivially true
theorem this way, by changing its premices.
An event with probability 10^-100000...000 (any finite number of
zeroes) will occur at least once, provided you run enough tries.
It'll
Post by Jean-Christophe Deschamps
occur an infinite number of times if you run an infinite number of
tries. Else its probability would be zero.
Your "disproof" amounts to say that 10^-100000...000 = 0
And neither Keith nor I ever said that an xUID collision will occur
with probability 1 after 2^64 samples. That would be false and
that's
Post by Jean-Christophe Deschamps
why people feel free to use xUIDs _AND_ sleep quietly.
JcD
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Loading...