Discussion:
[sqlite] Fwd: How to correctly display unicode characters in Windows 10 / cmd.exe / sqlite3.exe?
Shane Dev
2017-06-04 20:05:48 UTC
Hello,

After logging in to Windows 10, I open a command prompt (cmd.exe) and
change the code page to Unicode (UTF-8)
chcp 65001
Active code page: 65001

Then I test this with a UTF-8 file:
type utf8test.txt
néo66€

Next I run sqlite-tools-win32-x86-3190200\sqlite3.exe and check the
encoding:

sqlite> pragma encoding;
UTF-8

Then I try to print some characters to the screen:

sqlite> select char(0x006e); --Unicode LATIN SMALL LETTER N
n
sqlite> select char(0x00e9); --Unicode LATIN SMALL LETTER E WITH ACUTE

sqlite> select char(0x006f); --Unicode LATIN SMALL LETTER O
o
sqlite> select char(0x20ac); --Unicode EURO SIGN
?

Only the ASCII characters are displayed correctly. Next I test writing
the euro sign to a file:

sqlite> .once eurotest.txt
sqlite> select char(0x20ac); --unicode EURO SIGN
sqlite> .quit

... and then, from the command prompt:
type eurotest.txt


Why can't I display these Unicode characters from the sqlite3 command-line
utility?

P.S. I have a similar problem in PowerShell, but not in Ubuntu.
Simon Slavin
2017-06-05 15:40:30 UTC
Post by Shane Dev
Why can't I display these Unicode characters from sqlite3 command line
utility?
Windows Console doesn’t support multibyte characters:

<https://msdn.microsoft.com/en-us/library/windows/desktop/dd317752(v=vs.85).aspx>

"many legacy applications continue to use character sets based on code pages. Even new applications sometimes have to work with code pages, often for one of the following reasons:
• To communicate with legacy applications.
• To communicate with older mail and news servers, which might not always support Unicode.
• To communicate with the Windows Console, which does not support Unicode."

See the third bullet point.

PowerShell won’t display Unicode correctly because it is limited to fonts which don’t have Unicode glyphs. Even if you were to hack it to display in other fonts, I don’t know whether PowerShell handles multibyte characters correctly internally. (The current version of PowerShell might have fixed this. Does anyone know?)

Note also that other parts of Windows do not handle multibyte characters correctly either. For instance, some versions of Windows convert Unicode to ASCII in pipes, so you could get errors even when piping output from sqlite3 to a file.

I can confirm that your examples work fine on a Macintosh, outputting the Unicode characters you’d expect to see, just like Linux.

Simon.
David Raymond
2017-06-05 16:10:19 UTC
For the command prompt, the best I can suggest is changing the font to one of the TrueType fonts listed there, like Lucida Console, and then not touching the code page. Doing chcp 65001 used to help, but then in some release something was added to the CLI to try to handle that automatically. With the examples you gave I'm still not seeing the euro sign, but at least I'm seeing the é.

I haven't played with it in a while, but also note that you can run into issues when using Alt plus four digits to enter a character on the command line if you've changed the code page. You can type Alt+0233 to get é, for example, and the glyph on the screen will show the accent; if you then run a select statement it will show the same accented glyph, but the data that actually got inserted was something different. The code page just interpreted the "incorrect" raw data as what you were expecting to see. If there's ever any doubt, you can use the unicode() function to get the code point value. If the console can't display the resulting digits 0-9 correctly, then there's a bigger problem to worry about.
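To cross-check what unicode() reports, it can help to decode the stored bytes by hand. The following is a minimal sketch, assuming the data is meant to be UTF-8; utf8_first_codepoint() is a hypothetical helper, not SQLite's own implementation of unicode(). It returns the first code point of a byte string, so é stored as UTF-8 (C3 A9) decodes to 233, while a lone Latin-1 byte E9 is rejected as malformed UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: decode the first UTF-8 sequence in s.
   On success, store the code point in *cp and return the number of
   bytes consumed (1-4). Return 0 if the bytes are not well-formed UTF-8
   (bad lead byte, missing continuation byte, overlong form, UTF-16
   surrogate, or a value above U+10FFFF). */
size_t utf8_first_codepoint(const unsigned char *s, unsigned long *cp)
{
    unsigned long c;
    size_t len, i;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }           /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
    else return 0;                 /* stray continuation byte or invalid lead */

    for (i = 1; i < len; i++) {
        /* Each following byte must look like 10xxxxxx; this also stops
           safely at the NUL terminator of a truncated sequence. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates and out-of-range values. */
    if ((len == 2 && c < 0x80) || (len == 3 && c < 0x800) ||
        (len == 4 && c < 0x10000) || (c >= 0xD800 && c <= 0xDFFF) ||
        c > 0x10FFFF)
        return 0;

    *cp = c;
    return len;
}
```

Calling it on "\xE2\x82\xAC" yields 0x20AC (EURO SIGN), the value unicode(char(0x20ac)) should also report if the stored bytes are correct.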


D:\>chcp
Active code page: 437

D:\>sqlite3
SQLite version 3.19.2 2017-05-25 16:50:27
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> select char(233);
char(233)
é

sqlite> select unicode(char(233));
unicode(char(233))
233

sqlite> select unicode('é');--input with alt+0233
unicode('é')
233

sqlite> .exit

D:\>chcp 65001
Active code page: 65001

D:\>sqlite3
SQLite version 3.19.2 2017-05-25 16:50:27
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> select char(233);
char(233)


sqlite> select unicode(char(233));
unicode(char(233))
233

sqlite> select unicode('é');--input with alt+0233, crashes back to command prompt


D:\>


_______________________________________________
sqlite-users mailing list
sqlite-***@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Shane Dev
2017-06-05 18:44:36 UTC
Hi David,

I am already using Lucida Console font. Without chcp 65001, sqlite3.exe
will display LATIN SMALL LETTER E WITH ACUTE (U+00E9) correctly. But I
can't figure out how to display EURO SIGN (U+20AC). Lucida Console
definitely has the glyph, otherwise my test "type utf8test.txt" would have
failed. Also REGISTERED SIGN (U+00AE) is displayed as "r" (i.e. not
enclosed in a circle). Simon Slavin pointed out that Windows Console
doesn’t support multibyte characters. Despite that, other console programs
I use are able to display these characters correctly with code page 65001
active; even a simple C program with printf("héllo wo®ld€"); works.

Does anyone know how to display characters like these in the sqlite3.exe
command line utility under Windows 10 (either cmd.exe or powershell.exe)?
Olivier Mascia
2017-06-05 22:20:11 UTC
Post by Shane Dev
I am already using Lucida Console font. Without chcp 65001, sqlite3.exe
will display LATIN SMALL LETTER E WITH ACUTE (U+00E9) correctly. But I
can't figure out how to display EURO SIGN (U+20AC). Lucida Console
definitely has the glyph, otherwise my test "type utf8test.txt" would have
failed. Also REGISTERED SIGN (U+00AE) is displayed as "r" (i.e. not
enclosed in a circle). Simon Slavin pointed out that Windows Console
doesn’t support multibyte characters. Despite that, other console programs
I use are able to display these characters correctly with code page 65001
active - even a simple c program with printf("héllo wo®ld€"); works.
Does anyone know how to display characters like these in the sqlite3.exe
command line utility under Windows 10 (either cmd.exe or powershell.exe)?
Patch shell.c so that it writes to the console using WriteConsoleW() rather than fputs().

For a start, see the function utf8_printf() in shell.c (3.19.2), around line 372.
The intent is right (Windows needs some help to output UTF-8), but the implementation is not.
For proper Unicode console output on Windows you need to use WriteConsoleW(), not the C library.


shell.c
@@ -380,10 +380,12 @@
va_start(ap, zFormat);
if( stdout_is_console && (out==stdout || out==stderr) ){
char *z1 = sqlite3_vmprintf(zFormat, ap);
- char *z2 = sqlite3_win32_utf8_to_mbcs_v2(z1, 0);
+ WCHAR *z2 = sqlite3_win32_utf8_to_unicode(z1);
sqlite3_free(z1);
- fputs(z2, out);
- sqlite3_free(z2);
+ DWORD sout;
+ WriteConsoleW(GetStdHandle((out == stdout) ? STD_OUTPUT_HANDLE : STD_ERROR_HANDLE),
+ z2, wcslen(z2), &sout, 0);
+ sqlite3_free(z2);
}else{
vfprintf(out, zFormat, ap);
}

And there you go:

sqlite> select char(0x20ac);
char(0x20ac)

sqlite>

The important thing here is that this works as long as the font selected in your command prompt has a glyph for code point U+20AC, no matter whether your Windows is 'western-', 'eastern-', or whatever-based. Also, the CHCP command has NO effect on output produced by WriteConsoleW().

(This is just a quick-and-dirty patch. I find it ugly to call wcslen() to get the number of wide chars in z2 when this information was known by the API that converted from UTF-8 to wide and could have been preserved. The GetStdHandle() calls would also be better made once, not on every call. But the simpler the patch, the easier it is to get the idea and experiment for yourself.)
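The two cleanups mentioned above (look the handle up once, and reuse the length the conversion already computed) might be sketched as follows. This is a hedged illustration, not a drop-in replacement for utf8_printf(): utf8_to_utf16_n() and console_write_utf8() are hypothetical names, the converter is a portable stand-in for sqlite3_win32_utf8_to_unicode() rather than that function's real code, and only stdout is handled. The Windows-only part is guarded by _WIN32.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an SQLite API): convert a NUL-terminated UTF-8
   string to a malloc'd, NUL-terminated UTF-16 buffer and report the number
   of UTF-16 code units, excluding the terminator, through *pnUnits.
   Returns NULL on malformed UTF-8 or allocation failure; free() the result. */
uint16_t *utf8_to_utf16_n(const char *zIn, size_t *pnUnits)
{
    const unsigned char *s = (const unsigned char *)zIn;
    /* UTF-8 never needs more UTF-16 units than it has bytes (+1 for NUL). */
    uint16_t *out = malloc((strlen(zIn) + 1) * sizeof *out);
    size_t n = 0;

    if (out == NULL) return NULL;
    while (*s) {
        uint32_t c;
        int len, i;
        if (*s < 0x80)                { c = *s;        len = 1; }
        else if ((*s & 0xE0) == 0xC0) { c = *s & 0x1F; len = 2; }
        else if ((*s & 0xF0) == 0xE0) { c = *s & 0x0F; len = 3; }
        else if ((*s & 0xF8) == 0xF0) { c = *s & 0x07; len = 4; }
        else goto fail;                              /* invalid lead byte */
        for (i = 1; i < len; i++) {
            if ((s[i] & 0xC0) != 0x80) goto fail;    /* missing continuation */
            c = (c << 6) | (s[i] & 0x3F);
        }
        if ((len == 2 && c < 0x80) || (len == 3 && c < 0x800) ||
            (len == 4 && c < 0x10000) || (c >= 0xD800 && c <= 0xDFFF) ||
            c > 0x10FFFF)
            goto fail;                 /* overlong, surrogate or out of range */
        if (c >= 0x10000) {            /* outside the BMP: surrogate pair */
            c -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (c >> 10));
            out[n++] = (uint16_t)(0xDC00 | (c & 0x3FF));
        } else {
            out[n++] = (uint16_t)c;
        }
        s += len;
    }
    out[n] = 0;
    if (pnUnits) *pnUnits = n;   /* length is a by-product: no wcslen() needed */
    return out;

fail:
    free(out);
    return NULL;
}

#ifdef _WIN32
#include <windows.h>

/* Sketch of the output side, stdout only: fetch the handle once and pass
   the unit count from the conversion straight to WriteConsoleW(). */
void console_write_utf8(const char *z)
{
    static HANDLE hOut = NULL;
    size_t n;
    DWORD written;                 /* plain local out-parameter: nothing to free */
    uint16_t *w = utf8_to_utf16_n(z, &n);

    if (w == NULL) return;
    if (hOut == NULL) hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    WriteConsoleW(hOut, w, (DWORD)n, &written, NULL);
    free(w);
}
#endif
```

Note that the count is in UTF-16 units, which is what WriteConsoleW() expects: the euro sign is one unit, but a character outside the Basic Multilingual Plane such as U+1D11E takes two.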

Then, for input handling (keyboard), you'll need an alternate implementation of local_getline(), line 509 onward. The current implementation uses the C library to read the console and then attempts to convert whatever MBCS it thinks it has to UTF-8. It really needs to be reworked around ReadConsoleW() when standard input is detected to actually be the console.
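On the input side, ReadConsoleW() hands back UTF-16 units, which must be converted to UTF-8 before SQLite sees them. A minimal portable sketch of that conversion step follows; utf16_to_utf8() is a hypothetical helper (on Windows, WideCharToMultiByte() with CP_UTF8 can do the same job), and the ReadConsoleW() call itself, line editing and history are deliberately left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of SQLite or Win32): convert nUnits UTF-16
   code units, e.g. a buffer filled by ReadConsoleW(), into a malloc'd,
   NUL-terminated UTF-8 string. Returns NULL on an unpaired surrogate or on
   allocation failure; the caller frees the result with free(). */
char *utf16_to_utf8(const uint16_t *w, size_t nUnits)
{
    /* Worst case is 3 bytes per unit (BMP characters >= U+0800);
       a surrogate pair is 2 units -> 4 bytes. Plus 1 byte for the NUL. */
    unsigned char *out = malloc(nUnits * 3 + 1);
    size_t i = 0, n = 0;

    if (out == NULL) return NULL;
    while (i < nUnits) {
        uint32_t c = w[i++];
        if (c >= 0xD800 && c <= 0xDBFF) {            /* high surrogate */
            if (i >= nUnits || w[i] < 0xDC00 || w[i] > 0xDFFF) goto fail;
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(w[i++] - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {     /* lone low surrogate */
            goto fail;
        }
        if (c < 0x80) {                              /* 1 byte: ASCII */
            out[n++] = (unsigned char)c;
        } else if (c < 0x800) {                      /* 2 bytes */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {                    /* 3 bytes */
            out[n++] = (unsigned char)(0xE0 | (c >> 12));
            out[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {                                     /* 4 bytes */
            out[n++] = (unsigned char)(0xF0 | (c >> 18));
            out[n++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return (char *)out;

fail:
    free(out);
    return NULL;
}
```

Converting from the wide buffer, rather than from whatever MBCS the C runtime guessed, means é typed at the keyboard arrives as the two UTF-8 bytes C3 A9 regardless of the active code page.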


With these kinds of changes, and assuming this content is properly saved as UTF-8 in a file (unicode.sql, for instance):

create table €cole(proverb text);
insert into €cole values('鱼与熊掌');

These commands work as expected, as long as your console font is set to something more complete than the usual Consolas (here I'm using NSimSun on Server 2016 for this purpose):

sqlite3 école.db ".read unicode.sql"

And then:

sqlite3 école.db
SQLite version 3.19.2 2017-05-25 16:50:27
Enter ".help" for usage hints.
sqlite> select * from €cole;
proverb
鱼与熊掌
sqlite>
--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia, http://integral.software
David Raymond
2017-06-09 17:55:02 UTC
Non C programmer question: In your patch there, does sout have to be freed or anything, or am I missing something?


Wenbo Zhao
2017-06-09 19:38:59 UTC
Post by David Raymond
Non C programmer question: In your patch there, does sout have to be freed
or anything, or am I missing something?
No. The 4th arg of WriteConsoleW is lpNumberOfCharsWritten.
--
Best regards,
Wenbo Zhao
=========
Olivier Mascia
2017-06-09 19:45:20 UTC
Post by Wenbo Zhao
Post by David Raymond
Non C programmer question: In your patch there, does sout have to be freed
or anything, or am I missing something?
No. The 4th arg of WriteConsoleW is lpNumberOfCharsWritten.
--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia, http://integral.software
Olivier Mascia
2017-06-09 20:02:59 UTC
Post by Wenbo Zhao
Post by David Raymond
Non C programmer question: In your patch there, does sout have to be freed
or anything, or am I missing something?
No. The 4th arg of WriteConsoleW is lpNumberOfCharsWritten.
Sorry for the other (empty) post: mistakenly hit send early. :(

Indeed, as Wenbo points out, that parameter is the address of an unsigned 32-bit integer which receives the count of 'characters' (the count of 16-bit units) that were actually written. Nothing to be freed.
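For readers less familiar with C, the out-parameter pattern looks like this. fake_write_utf16() is a hypothetical stand-in with the same shape as WriteConsoleW()'s lpNumberOfCharsWritten argument, not a real API: the caller owns an ordinary local variable, passes its address, and the callee stores the count there, so nothing is allocated and nothing needs freeing. The example also shows that the count is in 16-bit units, not characters.

```c
#include <stdint.h>

/* Hypothetical stand-in, NOT a Win32 function: it only mimics how
   WriteConsoleW() reports its result through lpNumberOfCharsWritten. */
int fake_write_utf16(const uint16_t *buf, uint32_t nUnits, uint32_t *pWritten)
{
    (void)buf;                          /* a real function would output buf */
    if (pWritten) *pWritten = nUnits;   /* count of 16-bit units written */
    return 1;                           /* nonzero = success, like a Win32 BOOL */
}

uint32_t demo_written_count(void)
{
    /* U+20AC EURO SIGN (1 unit) followed by U+1D11E MUSICAL SYMBOL G CLEF,
       which needs a surrogate pair (2 units): 2 characters, 3 units. */
    const uint16_t text[] = { 0x20AC, 0xD834, 0xDD1E };
    uint32_t written;                   /* lives on the stack: no malloc, no free */

    fake_write_utf16(text, 3, &written);
    return written;                     /* 3 */
}
```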

As I said (clearly, I hope) in my original post, this is a quick-n-dirty patch meant only to show one possible approach. Please do not use it as is, except to experiment. Calling GetStdHandle() right there is not a good idea; it should be done once per shell run. Calling wcslen() to get the count of WCHARs is not appropriate either. If I were to rewrite it, I'd probably rework sqlite3_win32_utf8_to_unicode() (introducing a different version, perhaps not yet another public SQLite API but one local to the shell) so that it returns the count of WCHARs in the conversion result, which could then be reused (that length is a by-product of the conversion function anyway).

All of this was just a quick draft to make the difference easy to see. It is in no way close to production-quality code by SQLite coding standards.
--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia, http://integral.software
David Raymond
2017-06-06 14:12:07 UTC
Nice. Thank you for that patch info.

