Post by Cecil WesterhofPost by Richard HippPost by Cecil WesterhofSELECT *
FROM <TABLE>
WHERE STRINGENCODE(<COLUMN>) LIKE '%\\u%'
Is something like this also possible with SQLite?
SELECT * FROM <table>
WHERE <column> GLOB ('*[^'||char(1,45,127)||']*');
I tried it and it looks like it works. But I do not understand how it
works. Could you enlighten me?
char() returns text with 1 (unicode) char per unicode code point integer
used as argument to this function.
So char(1,45,127) return first-ascii-char - (i.e. dash) last-ascii-char
(unicode is a superset of ascii).
So the same way a-z is the character range between a and z, i.e. all
lower-case letters
NUL - DEL is the full range of ascii characters. (char(1) = NUL, char(127)
= DEL)
[^...] is the negation of the characters inside.
So [^NUL-DEL] is all non-ascii (unicode) characters. It matches a single
character.
And *[^NULL-DEL]* allows that 1 character to be matched anywhere.
* is the kleen star, which matches 0 or more arbitrary characters.
Note that glob above is not unicode aware per se. It operates on UTF-8
encoded strings,
and in that encoding, all non-ascii characters are encoded into multiple
bytes, which (the bytes) all are
in the 128-255 range. So it works indeed.
Post by Cecil WesterhofNote, however, that the "*[" combination in a GLOB pattern in SQLite
Post by Richard Hippis inefficient. So the above might be slow for a large amount of text.
because the kleen star is "griddy" by default.
Post by Cecil WesterhofPost by Richard HippPerhaps a better approach would be to create an application-defined
function to do the search.
SELECT * FROM <table> WHERE contains_non_ascii(<column>);
I am working (in this case) with bash. That is not possible then I
think.Or is it?
You can create a C shared library SQLite extension, that you .load in the
sqlite3 shell.
So that works in bash too, but you need to do some C programming. --DD