Discussion:
blob vs. string in bound parameters
Andy Ross
2006-10-23 20:23:31 UTC
Permalink
I'm working with the sqlite3 bindings to my "nasal" languages
(http://plausible.org/nasal, if anyone is curious) and I'm having a
problem with type conversion.

Take a look at the attached sample code. What it basically does is to
try to extract a row from a table with a bound parameter:

"select val from tab where val = ?"

The val column in the table is declared as a "text" column. But
because I'm working in a language that doesn't distinguish between
strings and byte arrays, I have to do the binding with
sqlite3_bind_blob() instead of sqlite3_bind_text().

Problem is, the *blob* value of "testval" does not, apparently, equal
the *text* value of "testval" in the database.

The workaround right now is to always define the columns as type blob,
never text. But this strikes me as pretty unambiguously a bug.
Clearly an ASCII string should be equal in either representation: what
possible blob value of "testval" could there be except a 7 byte
string: {'t','e','s','t','v','a','l'}?

Any ideas, or have I misunderstood something?

Andy
Andy Ross
2006-10-23 20:38:32 UTC
Permalink
Post by Andy Ross
Problem is, the *blob* value of "testval" does not, apparently, equal
the *text* value of "testval" in the database.
Just to head off the inevitable reply: no, this isn't an encoding issue.
The result of "pragma encoding" on the database file is UTF-8, and obviously
the transformation between UTF-8 and an ASCII string is the identity
transform.

Andy



-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
Dennis Cote
2006-10-23 20:59:37 UTC
Permalink
Post by Andy Ross
Post by Andy Ross
Problem is, the *blob* value of "testval" does not, apparently, equal
the *text* value of "testval" in the database.
Just to head off the inevitable reply: no, this isn't an encoding issue.
The result of "pragma encoding" on the database file is UTF-8, and obviously
the transformation between UTF-8 and an ASCII string is the identity
transform.
Andy,

You have discovered a "quirk" in SQLite. I'm not sure if it should be
called a bug, or not.

The following log shows that sqlite does indeed distinguish between a
text field and a blob with the same content. It also shows you a
workaround. You simply need to cast your fields to blobs before you
compare them to a variable that is bound to a blob.

sqlite> create table t (a text);
sqlite> insert into t values (X'74657374');
sqlite> insert into t values ('test');
sqlite> select * from t;
test
test
sqlite> select count(*) from t where a = X'74657374';
1
sqlite> select count(*) from t where a = X'7465737400';
0
sqlite> select count(*) from t where a = 'test';
1
sqlite> select typeof(a) from t;
blob
text
sqlite> select count(*) from t where cast(a as blob) = 'test';
0
sqlite> select count(*) from t where cast(a as blob) = X'74657374';
2

HTH
Dennis Cote

-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
Andy Ross
2006-10-23 21:36:21 UTC
Permalink
Post by Dennis Cote
The following log shows that sqlite does indeed distinguish
between a text field and a blob with the same content. It also
shows you a workaround. You simply need to cast your fields to
blobs before you compare them to a variable that is bound to a
blob.
But I can't do that. I'm writing a language binding, not an
application*, so changes to the query text are outside the realm
of what I can play with. But the fact that it exists is actually
really helpful, because I can just write it into the docs as a
known gotcha without chucking my design entirely, and especially
because it focuses my earlier question better:

If [1] SQLite automatically casts between types as needed in most
cases (e.g. real to/from string) and [2] the cast() syntax
already works to convert between strings and blobs in
the "obvious" way, then: why is the cast automatic for numberic
values, but not for strings/blobs? Isn't that a
non-orthogonality design flaw?

Andy

* ..er, well, I'm writing a language binding *and* an app at the
moment. But I'm much more concerned about the problems with
the bindings. As I mentioned earlier, I can already work
around the issue by making sure all columns that will be
queried with parameters are specified as blobs.

-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
John Stanton
2006-10-23 22:22:08 UTC
Permalink
The method I wrote about earlier is part of a language binding.

I can show you the code if you are interested.
Post by Andy Ross
Post by Dennis Cote
The following log shows that sqlite does indeed distinguish
between a text field and a blob with the same content. It also
shows you a workaround. You simply need to cast your fields to
blobs before you compare them to a variable that is bound to a
blob.
But I can't do that. I'm writing a language binding, not an
application*, so changes to the query text are outside the realm
of what I can play with. But the fact that it exists is actually
really helpful, because I can just write it into the docs as a
known gotcha without chucking my design entirely, and especially
If [1] SQLite automatically casts between types as needed in most
cases (e.g. real to/from string) and [2] the cast() syntax
already works to convert between strings and blobs in
the "obvious" way, then: why is the cast automatic for numberic
values, but not for strings/blobs? Isn't that a
non-orthogonality design flaw?
Andy
* ..er, well, I'm writing a language binding *and* an app at the
moment. But I'm much more concerned about the problems with
the bindings. As I mentioned earlier, I can already work
around the issue by making sure all columns that will be
queried with parameters are specified as blobs.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
Andy Ross
2006-10-23 22:37:19 UTC
Permalink
Post by John Stanton
The method I wrote about earlier is part of a language binding.
I can show you the code if you are interested.
I think we must be talking about different things, then. A language
binding that allows the user to write their own SQL cannot
automatically insert cast() calls into the queries without parsing the
SQL and making a runtime determination about the context in which each
bound parameter will be evaluated. And that's the halting problem. :)

Maybe you're talking about something like rails, which abstracts away
the SQL from the user API and generates its own queries? Then sure.
But that's not the level of abstraction I'm working with.

But regardless: sure, I'd be curious to see how you are dealing with
the same issue. What I've done for now is just change the default
from bind_blob() to bind_text(), and add some language to the docs
warning users that they need to add a cast-to-text when writing
expressions involving blob columns. Since comparing blob values is
obviously rare, this seems to me like an acceptable compromise.

Andy

-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
John Stanton
2006-10-23 23:11:25 UTC
Permalink
Actually I was talking about an application language which lets users
incorporate their own SQL and which binds to the SQL at run time rather
than being like embedded SQL.

It doesn't need to parse SQL, it just uses the existing SQL API which
provides all the necessary capability.

The application language compiler does not need to parse the SQL.
When I was writing the compiler I thought of looking at the SQL but soon
realized that it was not necessary, the Sqlite API had enough
functionality to make that unecessary.

As I said before, the users of my application language declare Sqlite
columns to be anything their application would like to see. The binding
logic retrieves that declared type (e.g. DECIMAL(5,2)) and gets the
actual type (which may be TEXT) and performs the appropriate move, in
that case a TEXT string of digits into a fixed point, display format
number.

Sqlite's "manifest typing" has been carefully thought through and is
almost all things for all people.
Post by Andy Ross
Post by John Stanton
The method I wrote about earlier is part of a language binding.
I can show you the code if you are interested.
I think we must be talking about different things, then. A language
binding that allows the user to write their own SQL cannot
automatically insert cast() calls into the queries without parsing the
SQL and making a runtime determination about the context in which each
bound parameter will be evaluated. And that's the halting problem. :)
Maybe you're talking about something like rails, which abstracts away
the SQL from the user API and generates its own queries? Then sure.
But that's not the level of abstraction I'm working with.
But regardless: sure, I'd be curious to see how you are dealing with
the same issue. What I've done for now is just change the default
from bind_blob() to bind_text(), and add some language to the docs
warning users that they need to add a cast-to-text when writing
expressions involving blob columns. Since comparing blob values is
obviously rare, this seems to me like an acceptable compromise.
Andy
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
d***@public.gmane.org
2006-10-23 20:53:37 UTC
Permalink
Post by Andy Ross
Problem is, the *blob* value of "testval" does not, apparently, equal
the *text* value of "testval" in the database.
There is good reason for this, actually. TEXT values
can sort in many different orders, depending on what
collating sequence is used. Different languages use
different collating sequences. Some languages have
multiple collating sequences. I'm told, for example,
that there are two common ways to short Chinese and
several other less common ways. The database needs
to be able to handle all of these. But BLOBs always
sort in memcmp() order.

The second point is that TEXT can be converted from
UTF-8, UTF-16be, and UTF-16le. So, for example, if
the a user stores a UTF-16le string on an x86 machine,
the moves the database file over to a ppc machine and
reads out the same string, it comes out as UTF-16be.
Or if one user writes a UTF-8 string and another requests
a UTF-16 string, the conversion is automatic. But
with a BLOB, no conversions ever occur.

Because TEXT does not have a uniform representation,
and because the sort order can be different for each
column, it is difficult to know how to compart TEXT
and BLOBs. So SQLite takes the approach of always
making every BLOB larger than every TEXT string.
That is simple and unambiguous.
--
D. Richard Hipp <***@hwaci.com>


-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
Andy Ross
2006-10-23 21:30:26 UTC
Permalink
Post by d***@public.gmane.org
There is good reason for this, actually.
But I'm not sorting, nor doing a < or > comparsion. In fact none of
this has to do with comparison at all, but *identity*, which is a much
simpler test. Are you really arguing that there are situations where
a string value (in arbitrary encoding -- whatever is stored) and a
blob cannot be tested for equality unambiguously?
Post by d***@public.gmane.org
The second point is that TEXT can be converted from UTF-8, UTF-16be,
and UTF-16le.
Sure. But as I mentioned it's UTF-8. Certainly it can't be a problem
to compare a blob and a string for bytewise equality? Do you have a
real world use case for a situation where this actually causes
Post by d***@public.gmane.org
SQLite takes the approach of always making every BLOB larger than
every TEXT string. That is simple and unambiguous.
...but it's a disaster for people like me trying to actually take
advantage of the manifest typing feature. I have an (IMHO really
pleasing) API that looks like this:

sqlite.exec(db, "select * from X where Y = ?", ...bind params...)

This obviously presumes that types are convertable at runtime, which
in SQLite they *mostly* are; I can pass in the string "12" and get a
valid comparison to an integer or real value, etc... But I *can't*
know when binding the parameter whether the context in which it will
be used is a blob or a string. So what I do is use bind_blob()
universally for all string objects. I was led to believe by your
documentation on manifest typing and by analogy to your automatic
numeric conversions that this sort of conversion was legal.

But if that does not work, then this whole API design is shot. I'd
have to expose the *explicit* typing of all the values to the
user-level API, so they can make the call as to whether to query for a
blob or a string. That seems like a mess to me, and very much *out*
of keeping with the manifest typing philosophy.

Seriously: what's wrong with just (1) converting a string to a blob by
exposing the literal byte in whatever encoding it was stored in, and
(2) converting a blob to a string by interpreting the bytes literally
in the current "pragma encoding" environment? Sure, the user can
shoot herself in the foot with that, but it works unambiguously in the
only sane case: where the user-side string environment and the
database schema are written to use the same encoding.

Do you have any other suggestions, or is the clean/simple API choice
above just not something you want to suppot with SQLite? It seems
like many other language bindings are going to have the same issue...
I think what I'll have to do in the interrim is default to bind_text()
instead of bind_blob(). Does that work if the data has embedded nuls?

Also, can you be more specific about exactly why you made this design
choice? This seems to me like a situation where simplicity of
implementation (punting on the issue of comparing strings and blobs)
got mixed up with simplicity of design (making user-visible type
conversion as automatic and error-proof as possible). I've hit that
on a bunch of occasions with Nasal, and sometimes it helps to step
back a bit and look at the problem from the user's perspective.

Andy



-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
d***@public.gmane.org
2006-10-23 23:22:09 UTC
Permalink
Post by d***@public.gmane.org
SQLite takes the approach of always making every BLOB larger than
every TEXT string. That is simple and unambiguous.
....but it's a disaster for people like me trying to actually take
advantage of the manifest typing feature. I have an (IMHO really
sqlite.exec(db, "select * from X where Y = ?", ...bind params...)
This obviously presumes that types are convertable at runtime, which
in SQLite they *mostly* are; I can pass in the string "12" and get a
valid comparison to an integer or real value, etc... But I *can't*
know when binding the parameter whether the context in which it will
be used is a blob or a string. So what I do is use bind_blob()
universally for all string objects.
Have you looked at the TCL bindings? The above would be

db eval {select * from X where Y=$parameter} {... code here ...}

Notice that the variable to be bound is specify by name directly
in the SQL. So if you are binding multiple parameters (as I
often do) there is no danger of miscounting the number of "?"
and getting the bind parameters wrong on the end. And when
reading, you do not have to move your eye out to the "bind parameters"
section to figure out what the "?" means.

At runtime, SQLite determines the internal data representation
of the $parameter variable, then uses bind_text, bind_blob,
bind_int, or bind_whatever as appropriate. And this works very,
very well in practice.
I was led to believe by your
documentation on manifest typing and by analogy to your automatic
numeric conversions that this sort of conversion was legal.
I am sorry that the documentation misled you. I'll work on
improving it.
Seriously: what's wrong with just (1) converting a string to a blob by
exposing the literal byte in whatever encoding it was stored in, and
(2) converting a blob to a string by interpreting the bytes literally
in the current "pragma encoding" environment? Sure, the user can
shoot herself in the foot with that, but it works unambiguously in the
only sane case: where the user-side string environment and the
database schema are written to use the same encoding.
Well, I suppose we might have made that choice when we were
defining the interface for SQLite version 3 - if you had
brought it up then. But we did not. And the interface is
now frozen is not going to change regardless of whether or
not your system is better than the one that is implemented.
SQLite takes backwards compatibility very seriously, and
so we are not going to make a change of this magnitude
without a very good reason.

You might not think SQLite's backwards compatibility
pledge is a good thing now, but if you continue using
SQLite then someday you will likely thank me.
Do you have any other suggestions, or is the clean/simple API choice
above just not something you want to suppot with SQLite? It seems
like many other language bindings are going to have the same issue...
I would rather support the thousands of applications that
I know are already using SQLite successfully than break
all those other application in order to support a single
language that I have never heard of before. You are the
first person to complain about this in 2.5 years since
the beginning of version 3 and I am aware of at least 2
dozen other language bindings that already exist and are
fully functional, so I really do not think it will become
a widespread problem.
I think what I'll have to do in the interrim is default to bind_text()
instead of bind_blob(). Does that work if the data has embedded nuls?
I believe you can bind_text with embedded '\000' characters,
as long as you explicitly specify the length of the text, of
course. This is not something that is covered in the regression
tests, that I recall, so you might run into problems. But if
you do, I would consider them bugs and will fix them.

--
D. Richard Hipp <***@hwaci.com>


-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
Andy Ross
2006-10-24 00:02:59 UTC
Permalink
Post by d***@public.gmane.org
Have you looked at the TCL bindings? The above would be
db eval {select * from X where Y=$parameter} {... code here ...}
I do have a wrapper that uses a hash literal for exactly this
(although in practice I find it actually easier to read a bunch of
positional parameters on the argument list). It's actually possible
in Nasal for a function to inspect the namespace of its caller,
although I haven't bothered to implement something like that here.
Post by d***@public.gmane.org
At runtime, SQLite determines the internal data representation of
the $parameter variable, then uses bind_text, bind_blob, bind_int,
or bind_whatever as appropriate.
I'm not quite sure I see how. This is actually exactly what I'm
doing: Nasal has two meaningful datatypes in this context: double
precision numbers and "strings", which are internally arbitrary byte
arrays. A double gets set using bind_double(), and the string using
bind_blob() (originally: now bind_text()). But if the *column* (Y, in
your example above) against which the parameter is compared has type
"text" and not "blob", that comparison always fails.

And TCL (the last I used it, which was in the mid-90's) has exactly
the same data representation as Nasal: there is no internal
distinction between a string and a byte array. It would have exactly
the same issue that I am seeing; I suspect that the only reason you
don't hit more often it is that the TCL bindings picked the opposite
convention to the one I (originally) did: they call bind_text() for
all strings instead of bind_blob().
Post by d***@public.gmane.org
Well, I suppose we might have made that choice when we were defining
the interface for SQLite version 3 - if you had brought it up then.
Is the implication then that no planning is being done for version 4? :)

This is a straw man argument you are making: I consider this issue a
bug, not an interface change, and made that case as such. You
disagree, and of course your opinion takes precedence. But to argue
that *all* such changes to query behavior are "interface changes" and
therefore unfixable seems to stretch the point of what "compatibility"
means past a reasonable breaking point. Clearly you have and will
continue to make changes that cause the results from queries to
change; you simply call them "fixes" instead of "changes". And
honestly, your implication that my asking that this issue be fixed
implies a lack of understanding about compatibility and release
management is a little insulting.
Post by d***@public.gmane.org
I would rather support the thousands of applications that I know are
already using SQLite successfully than break all those other
application
Um, break *all* of them? Now you are simply exagerating. I find it
difficult to believe that there are thousands of applications whose
correct behavior depends on blobs being != to text when used with
bound parameters. I can't even think of even a *theoretical*
situation where this might happen. Absence of evidence is not
evidence of absence, obviously, but nonetheless, I find it more likely
that the actual number of vulnerable applications is zero. You
Post by d***@public.gmane.org
You are the first person to complain about this in 2.5 years since
the beginning of version 3 and I am aware of at least 2 dozen other
language bindings that already exist and are fully functional
I strongly suspect those other languages made the same choice that the
TCL bindings did: they use bind_text() always, and never bind_blob()
except where the language or exposed API makes the distinction
explicit. And because almost no one writes schemas where comparing
blobs is required, none of these hit the issue. But for whatever
reason, I picked blob as the default; and as described, blob works
very poorly as a default, because your strings won't compare against
other strings already in the database.

I'm actually reasonably happy with this workaround from Dennis, so I'm
going to disappear from the list and not argue the case further. But
I will admit to being a little puzzled that you seem to understand the
issue yet don't find it worth fixing, even in a future verison.

Andy

-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
Dennis Cote
2006-10-23 21:07:37 UTC
Permalink
Post by d***@public.gmane.org
There is good reason for this, actually.
And that pretty definitively answers the question of whether or not this
is a bug. :-)

Dennis Cote

-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
John Stanton
2006-10-23 22:17:01 UTC
Permalink
Sqlite has declared types and actual types. Both can be accessed
through the API.

What I do is look at the declared type, which defines the data and then
look at the actual type to determine how to process it.

Youn can declare the type to be anything you want. A name such a GEORGE
or INVOICE-DATE are each accpetable.
Post by Andy Ross
I'm working with the sqlite3 bindings to my "nasal" languages
(http://plausible.org/nasal, if anyone is curious) and I'm having a
problem with type conversion.
Take a look at the attached sample code. What it basically does is to
"select val from tab where val = ?"
The val column in the table is declared as a "text" column. But
because I'm working in a language that doesn't distinguish between
strings and byte arrays, I have to do the binding with
sqlite3_bind_blob() instead of sqlite3_bind_text().
Problem is, the *blob* value of "testval" does not, apparently, equal
the *text* value of "testval" in the database.
The workaround right now is to always define the columns as type blob,
never text. But this strikes me as pretty unambiguously a bug.
Clearly an ASCII string should be equal in either representation: what
possible blob value of "testval" could there be except a 7 byte
string: {'t','e','s','t','v','a','l'}?
Any ideas, or have I misunderstood something?
Andy
------------------------------------------------------------------------
#include <stdio.h>
#include <string.h>
#include <sqlite3.h>
//
// rm -f test.db
// echo "create table tab (val text);" | sqlite3 test.db
// echo "insert into tab values ('testval');" | sqlite3 test.db
//
// Validate: (prints 'testval' as expected)
//
// echo "select val from tab where val = 'testval';" | sqlite3 test.db
//
//
// gcc -o test test.c -lsqlite3 && ./test
//
//
// The blob apparently tests as not equal to the identitcal string,
// and the query returns zero rows.
#define DB "test.db"
#define QUERY "select val from tab where val = ?"
#define FIELD "testval"
#define PERR(msg) { printf(msg); printf("%s\n", sqlite3_errmsg(db)); }
int main()
{
int stat, cols, i;
sqlite3 *db;
sqlite3_stmt *stmt;
const char *tail;
if(sqlite3_open(DB, &db)) {
PERR("open failure\n");
return 1;
}
if(sqlite3_prepare(db, QUERY, strlen(QUERY), &stmt, &tail)) {
PERR("prepare failure\n");
return 1;
}
if(sqlite3_bind_blob(stmt, 1, FIELD, strlen(FIELD), SQLITE_TRANSIENT)) {
PERR("bind failure\n");
return 1;
}
while((stat = sqlite3_step(stmt)) != SQLITE_DONE) {
cols = sqlite3_column_count(stmt);
for(i=0; i<cols; i++) {
if(i != 0) printf("|");
fwrite(sqlite3_column_blob(stmt, i),
sqlite3_column_bytes(stmt, i), 1, stdout);
}
}
return 0;
}
------------------------------------------------------------------------
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
To unsubscribe, send email to sqlite-users-unsubscribe-CzDROfG0BjIdnm+***@public.gmane.org
-----------------------------------------------------------------------------
Continue reading on narkive:
Search results for 'blob vs. string in bound parameters' (Questions and Answers)
51
replies
Can someone offer me absolute proof that God exists?
started 2009-08-14 13:34:48 UTC
religion & spirituality
Loading...