Discussion:
[sqlite] TEXT columns with Excel/VBA
Erwin Kalvelagen
2015-12-02 16:34:41 UTC
Permalink
Good morning.

I wrote a little tool to dump certain data sets into a SQLite database. A
user suggested that I should not use type TEXT but rather type VARCHAR for
character columns, due to some issue with Excel/VBA. See the comments in:
http://yetanothermathprogrammingconsultant.blogspot.com/2013/10/gdx2sqlite.html


I could not find a reference to this using Google. So my question is: Is
this a known problem? I would like to understand a little bit better what
this is about.

Thanks, Erwin


----------------------------------------------------------------
Erwin Kalvelagen
Amsterdam Optimization Modeling Group
***@amsterdamoptimization.com
http://amsterdamoptimization.com.
----------------------------------------------------------------
Simon Slavin
2015-12-02 16:43:56 UTC
Permalink
Post by Erwin Kalvelagen
I could not find a reference to this using Google. So my question is: Is
this a known problem? I would like to understand a little bit better what
this is about.
The bug, if there is a bug, must be in Excel/VBA. SQLite will accept either 'TEXT' or 'VARCHAR' and do identical things no matter which you use. This is because SQLite doesn't have a VARCHAR type, it just pretends it saw 'TEXT' instead.

<https://www.sqlite.org/datatype3.html>

I'm not sure what problem Excel/VBA has with it. Maybe someone else is, or maybe another list would be a better place to ask. Or perhaps the problem is with the program they used "SQLite Database Browser" which is not part of SQLite and may have its own bugs.

Simon.
Richard Hipp
2015-12-02 16:44:25 UTC
Permalink
Post by Erwin Kalvelagen
Good morning.
I wrote a little tool to dump certain data sets into a SQLite database. A
user suggested that I should not use type TEXT but rather type VARCHAR for
http://yetanothermathprogrammingconsultant.blogspot.com/2013/10/gdx2sqlite.html
I could not find a reference to this using Google. So my question is: Is
this a known problem? I would like to understand a little bit better what
this is about.
SQLite handles TEXT and VARCHAR equally well. There are no issues.

Perhaps a 3rd-party GUI shell wrapper around SQLite is having problems
with TEXT?
--
D. Richard Hipp
***@sqlite.org
Bernardo Sulzbach
2015-12-02 17:08:42 UTC
Permalink
On Wed, Dec 2, 2015 at 2:34 PM, Erwin Kalvelagen
A user suggested that I should not use type TEXT but rather type VARCHAR for character columns, due to some issue with Excel/VBA.
If he or she turns out to be correct, do it. As Slavin and Hipp
mentioned, SQLite won't care about the change. However, it is easier
on the eyes (at least mine) to see TEXT used with SQLite queries.
--
Bernardo Sulzbach
R Smith
2015-12-03 00:29:14 UTC
Permalink
Post by Erwin Kalvelagen
Good morning.
I wrote a little tool to dump certain data sets into a SQLite database. A
user suggested that I should not use type TEXT but rather type VARCHAR for
http://yetanothermathprogrammingconsultant.blogspot.com/2013/10/gdx2sqlite.html
I could not find a reference to this using Google. So my question is: Is
this a known problem? I would like to understand a little bit better what
this is about.
The reason you are not finding a reference, is because it isn't true.
SQLite (as others have mentioned) is ambivalent to those types - either
will do.

Personally I use VARCHAR(Len) in table column definitions - simply
because my schema is then directly interchangeable with MySQL/PostGres
and the SQLite query planner sometimes notes that length when
considering data shape - but for data purposes, SQLite doesn't care and
neither do any wrappers I know of. Also, I can't imagine Excel would
have an issue, whether referencing a data object via VB Script or OLE DB
table import etc, I have never seen it matter in Excels 2003 through 2013.

My guess is your friend probably heard of some peculiarity and then
possibly misheard or misinterpreted it to be related to Varchar vs.
Text. Note that in MySQL/PostGres/MSSQL/Oracle there is a very big
difference. Text (and its sub-types such as mediumtext, longtext etc.)
usually is stored as a kind of character blob while Varchar(n) is more
like a string with a length constraint.

It's perhaps also prudent to note that in SQLite, Varchar(10) is just a
Text type, and won't actually limit data entries into that field to only
10 characters.

Good luck!
Ryan
Bart Smissaert
2015-12-03 00:45:24 UTC
Permalink
and the SQLite query planner sometimes notes that length when considering
data shape

In what situations does that happen?

RBS
Post by Erwin Kalvelagen
Good morning.
I wrote a little tool to dump certain data sets into a SQLite database. A
user suggested that I should not use type TEXT but rather type VARCHAR for
http://yetanothermathprogrammingconsultant.blogspot.com/2013/10/gdx2sqlite.html
I could not find a reference to this using Google. So my question is: Is
this a known problem? I would like to understand a little bit better what
this is about.
The reason you are not finding a reference, is because it isn't true.
SQLite (as others have mentioned) is ambivalent to those types - either
will do.
Personally I use VARCHAR(Len) in table column definitions - simply because
my schema is then directly interchangeable with MySQL/PostGres and the
SQLite query planner sometimes notes that length when considering data
shape - but for data purposes, SQLite doesn't care and neither do any
wrappers I know of. Also, I can't imagine Excel would have an issue,
whether referencing a data object via VB Script or OLE DB table import etc,
I have never seen it matter in Excels 2003 through 2013.
My guess is your friend probably heard of some peculiarity and then
possibly misheard or misinterpreted it to be related to Varchar vs. Text.
Note that in MySQL/PostGres/MSSQL/Oracle there is a very big difference.
Text (and its sub-types such as mediumtext, longtext etc.) usually is
stored as a kind of character blob while Varchar(n) is more like a string
with a length constraint.
It's perhaps also prudent to note that in SQLite, Varchar(10) is just a
Text type, and won't actually limit data entries into that field to only 10
characters.
Good luck!
Ryan
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Richard Hipp
2015-12-03 01:33:18 UTC
Permalink
Post by R Smith
and the SQLite query planner sometimes notes that length when considering
data shape
In what situations does that happen?
CREATE TABLE ex1(a INTEGER, b VARCHAR(5), c VARCHAR(5000));
CREATE INDEX ex1b ON ex1(b);
CREATE INDEX ex1c ON ex1(c);

SELECT * FROM ex1 WHERE b=?1 AND c=?2;

The query planner is faced with the decision of whether to use the
ex1b or ex1c index. Statistics gathered by ANALYZE would normally
break this tie, but suppose ANALYZE has not been run, or suppose both
indexes are equally selective. In that case, SQLite would choose ex1b
since it guesses the keys would be shorter and will compare faster and
the fanout will be greater, and hence extb can be searched using fewer
CPU cycles.
--
D. Richard Hipp
***@sqlite.org
Bart Smissaert
2015-12-03 10:52:18 UTC
Permalink
So, it will be quite a rare occurrence then that this could be of any
benefit.
Still nice to know this.

RBS
Post by R Smith
Post by R Smith
Post by R Smith
and the SQLite query planner sometimes notes that length when
considering
Post by R Smith
data shape
In what situations does that happen?
CREATE TABLE ex1(a INTEGER, b VARCHAR(5), c VARCHAR(5000));
CREATE INDEX ex1b ON ex1(b);
CREATE INDEX ex1c ON ex1(c);
SELECT * FROM ex1 WHERE b=?1 AND c=?2;
The query planner is faced with the decision of whether to use the
ex1b or ex1c index. Statistics gathered by ANALYZE would normally
break this tie, but suppose ANALYZE has not been run, or suppose both
indexes are equally selective. In that case, SQLite would choose ex1b
since it guesses the keys would be shorter and will compare faster and
the fanout will be greater, and hence extb can be searched using fewer
CPU cycles.
--
D. Richard Hipp
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Scott Hess
2015-12-03 01:04:54 UTC
Permalink
Personally I use VARCHAR(Len) in table column definitions - simply because
my schema is then directly interchangeable with MySQL/PostGres and the
SQLite query planner sometimes notes that length when considering data
shape - but for data purposes, SQLite doesn't care and neither do any
wrappers I know of. Also, I can't imagine Excel would have an issue,
whether referencing a data object via VB Script or OLE DB table import etc,
I have never seen it matter in Excels 2003 through 2013.
I discourage this kind of usage because it means that in some distant
future when someone has to make things work with a different database
engine, they have to grind through and check every weirdo VARCHAR(73) and
MEDIUMBIGINT declaration someone put in, because none of them have ever
been tested with range enforcement enabled. So where someone meant
"VARCHAR(256)" but actually typed "VARCHAR(6)", will your code suddenly
start throwing clean errors which immediately suggest where to look, or
will it just muddle through corrupting your data? There can certainly be
subtle issues in any type, but my experience is that when you're trying to
decode someone's code, it's easiest when the code says what is actually
happening, not what it wishes were happening!

Of course, if you are _currently_ writing cross-platform code, absolutely
write the cross-platform code! And I will agree that the above position
has some issues when faced with things such as INT being 32 bits on some
platforms, which I guess would argue for using BIGINT and BIGTEXT. Unless
you need VARCHAR(MAX) or LONGTEXT, but maybe MEDIUMTEXT is more portable
... and pretty soon you give up and just circle back to not decorating with
unused/unchecked type annotations.

-scott
R Smith
2015-12-03 12:49:41 UTC
Permalink
Post by Scott Hess
I discourage this kind of usage because it means that in some distant
future when someone has to make things work with a different database
engine, they have to grind through and check every weirdo VARCHAR(73) and
MEDIUMBIGINT declaration someone put in, because none of them have ever
been tested with range enforcement enabled. So where someone meant
"VARCHAR(256)" but actually typed "VARCHAR(6)", will your code suddenly
start throwing clean errors which immediately suggest where to look, or
will it just muddle through corrupting your data? There can certainly be
subtle issues in any type, but my experience is that when you're trying to
decode someone's code, it's easiest when the code says what is actually
happening, not what it wishes were happening!
I posit that a column declared as:
col VARCHAR(32) NOT NULL
says a whole lot more about what is actually happening than:
col TEXT NOT NULL

And sure, I agree a mistyped word can be hard to decode by a future
programmer, but that is a statistical probability in any case.
Post by Scott Hess
Of course, if you are _currently_ writing cross-platform code, absolutely
write the cross-platform code! And I will agree that the above position
has some issues when faced with things such as INT being 32 bits//.... etc.
I *ALWAYS* write cross-platform code as far as SQL is concerned. I even
think in this day and age every programmer should, or is there a case
for the opposite?
Simon Slavin
2015-12-03 13:00:11 UTC
Permalink
I *ALWAYS* write cross-platform code as far as SQL is concerned. I even think in this day and age every programmer should, or is there a case for the opposite?
If cross-platform code worked identically cross-platform I'd rest more easily. The case in question is a good example.

Technically if you provide a long string for a SQL column VARCHAR(6) the SQL engine should silently truncate it to 6 characters before storing it. SQL engines which actually support VARCHAR do this correctly. SQLite doesn't. So although your code executes without error messages in both SQLite and PostgreSQL, it will do different things if your software passes along a seven character string.

There are numerous other examples of this in SQL, including when constraint checking is done, the results of errors triggering ROLLBACK, how values are sorted if you put numbers into string columns, and how NULLs are handled in sorting and comparisons.

There are arguments for and against what you're doing and I don't intend to take a stance. Just to keep readers aware of the problems.

Simon.
Scott Hess
2015-12-03 18:49:23 UTC
Permalink
Post by R Smith
Post by Scott Hess
I discourage this kind of usage because it means that in some distant
future when someone has to make things work with a different database
engine, they have to grind through and check every weirdo VARCHAR(73) and
MEDIUMBIGINT declaration someone put in, because none of them have ever
been tested with range enforcement enabled. So where someone meant
"VARCHAR(256)" but actually typed "VARCHAR(6)", will your code suddenly
start throwing clean errors which immediately suggest where to look, or
will it just muddle through corrupting your data? There can certainly be
subtle issues in any type, but my experience is that when you're trying to
decode someone's code, it's easiest when the code says what is actually
happening, not what it wishes were happening!
col VARCHAR(32) NOT NULL
col TEXT NOT NULL
I'm saying that:
col TEXT NOT NULL
fairly describes what is actually happening, but:
col VARCHAR(32) NOT NULL
implies that things are happening which are not happening. CHAR is kind of
happening, in that it is character data, VAR is not happening in that it
can store more data than a 1-byte prefix can describe, and 32 is definitely
not happening.

And sure, I agree a mistyped word can be hard to decode by a future
Post by R Smith
programmer, but that is a statistical probability in any case.
For most engines if you type VRCHAR(32) you'll get an error. In SQLite
you'll get a field with TEXT affinity. If you typed VRCHR(32) SQLite will
give you a field with no affinity, which will work just fine for the most
part. If you develop code with VARCHAR(6) on another database, you'll
notice that your new code isn't storing your 60-character strings pretty
quickly, but if it's three years later and you're porting, you might _not_
notice the problem unless you have a good test suite in place.

[I'd _love_ something like SQLITE_ENABLE_PEDANTIC and "PRAGMA pedantic =
true" to provide an extra layer of checking on these things. When you find
that someone made a mistake in shipping code, you have to decide whether to
risk fixing it, or just to leave it be, and I'd rather have stronger
assertions about this kind of thing up front.]

Of course, if you are _currently_ writing cross-platform code, absolutely
Post by R Smith
Post by Scott Hess
write the cross-platform code! And I will agree that the above position
has some issues when faced with things such as INT being 32 bits//.... etc.
I *ALWAYS* write cross-platform code as far as SQL is concerned. I even
think in this day and age every programmer should, or is there a case for
the opposite?
My experience has always been that cross-platform code written to run on a
single platform turns out to not be very cross-platform at all. Changing
TEXT to VARCHAR as appropriate is the kind of thing which is generally
pretty easy to do, the hard parts will be the things that nobody even
realized had platform issues, like INT storing 32 bits rather than 64 bits,
or subtle differences in treatment of NULL values.

-scott
R Smith
2015-12-03 19:12:40 UTC
Permalink
Post by R Smith
I posit that a column declared as: col VARCHAR(32) NOT NULL says a
whole lot more about what is actually happening than: col TEXT NOT NULL
col TEXT NOT NULL
col VARCHAR(32) NOT NULL
implies that things are happening which are not happening. CHAR is kind of
happening, in that it is character data, VAR is not happening in that it
can store more data than a 1-byte prefix can describe, and 32 is definitely
not happening.
Ok, I see now more clear what you meant, but I don't agree. I write SQL
to follow in form and function what *I MYSELF* intend for it to do - I
do not adjust it to better describe what the specific SQL flavour engine
actually ends up doing with the code.

If I write:
col VARCHAR(32) PRIMARY KEY
It is only because it is my specific intent for that column to contain
no more than 32 characters and have no multiple null values. Some
engines may understand what I mean, some may not, but my code must
always reflect what I intended. That is to me the very best
future-proofing and future-understanding concept. Tomorrow, SQLite might
improve on its interpretation... or add a PRAGMA strict_sql=ON mode as
most of us wish for, and then how sad will you be that your entire code
base contains "TEXT" everywhere where you could really have specified
actual lengths the way you first intended?

Not to mention... as a bonus my schema will compute correctly when moved
to PostGres/MSSQL/MySQL/etc.

To recode my SQL so it only says what the specific /SQL engine du jour/
can interpret because I /might/ make a spelling error and then cause
myself later headaches... I don't know, that seems like a double
cop-out. Besides, my OCD will never allow things like VRCHAR(6) when I
intended VARCHAR(60). (And no, I'm not special, I assume that is true
for 90% of the people here).

Shape the code to what you INTEND.
Don't make mistakes.
Double check your code.


I see your point, but I'm definitely sticking to my way this time. :)

Cheers,
Ryan
Bernardo Sulzbach
2015-12-03 19:32:07 UTC
Permalink
I understand Smith's point. But when I am sure that some schema is
going to stay in SQLite for the foreseeable future, I like using text.
It is simple, adequate, and expresses my intent completely: TEXT.
That's what the column has, TEXT, not 40, not 60, not less than 30,
just freaking text.

Lastly, reading VARCHAR(40) makes my mind expect a programmatic limit
(not on the business logic) that does not exist. This tingles.

Everyone that commented on this secondary issue (about using TEXT or
VARCHAR for clarity of intent) has solid arguments, and any decision
then is mostly a matter of taste.
Post by R Smith
Don't make mistakes.
I try not to. But it happens.

Loading...