Discussion:
[sqlite] bug: failure to write journal reported as "disk I/O error"
KRECKEL Richard (AREVA)
2017-09-25 11:39:48 UTC
Remove the write permission of a SQLite database's journal file. Then, try write-accessing the database. The error reported is "disk I/O error". (This happened to me when two users tried to share a DB and had their umask set wrong.)



The error message reported by SQLite is inappropriate. A "permission denied" would be much better and guide the user towards fixing the problem (instead of scaring the hell out of the poor sysadmin who suspects a filesystem corruption might be going on.)



I'm using SQLite 3.19.3.



All my best,

-rbk.
Jens Alfke
2017-09-26 15:22:59 UTC
Post by KRECKEL Richard (AREVA)
Remove the write permission of a SQLite database's journal file. Then, try write-accessing the database. The error reported is "disk I/O error". (This happened to me when two users tried to share a DB and had their umask set wrong.)
The basic error code is SQLITE_IOERR, which just means "Some kind of disk I/O error occurred” according to the comment. Which is true in this case; an I/O operation returned an error.

If you want more detailed info, use extended error codes by calling sqlite3_extended_result_codes() or sqlite3_extended_errcode(). Then you’ll get a more specific error; in your situation probably SQLITE_IOERR_ACCESS.

—Jens
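
For anyone following along, here is a minimal sketch of how the extended codes relate to the basic one: the basic code is the low 8 bits of the extended code. The constants are copied from sqlite3.h to the best of my knowledge, and the helper names primary_code and is_io_error are made up for illustration; they are not part of SQLite or of any binding.

```python
# Values mirrored from sqlite3.h; defined locally so the sketch runs
# without depending on any particular SQLite binding.
SQLITE_IOERR = 10
SQLITE_IOERR_WRITE = SQLITE_IOERR | (3 << 8)
SQLITE_IOERR_ACCESS = SQLITE_IOERR | (13 << 8)


def primary_code(extended_code: int) -> int:
    """Return the basic result code: the low 8 bits of an extended code."""
    return extended_code & 0xFF


def is_io_error(extended_code: int) -> bool:
    """True if the code belongs to the SQLITE_IOERR family (hypothetical helper)."""
    return primary_code(extended_code) == SQLITE_IOERR
```

In C, the extended value itself would come from sqlite3_extended_errcode(db), or from the ordinary return value once sqlite3_extended_result_codes(db, 1) has been called.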
Guy Harris
2017-09-26 19:47:47 UTC
Post by Jens Alfke
The basic error code is SQLITE_IOERR, which just means "Some kind of disk I/O error occurred” according to the comment. Which is true in this case; an I/O operation returned an error.
But the *disk* didn't - the *operating system* did, so if SQLITE_IOERR really means "Some kind of disk I/O error occurred", it's *not* the right error to return for a *permission* error.

And, on UN*X, a write() call can return ENOSPC; a write() is an I/O operation, and "returns -1 with errno set to ENOSPC" is an error, but that presumably gets reported as SQLITE_FULL, not as SQLITE_IOERR.

Sadly, the name chosen for that error code

1) suggests an "I/O error" in the sense of "a device reported an error trying to read or write it"

and

2) is probably part of the API and thus unchangeable.

However, if SQLITE_IOERR is returned for *anything* other than, on UN*X, an EIO errno:

1) The documentation should *really really really really really* avoid calling it an "I/O error", as "I/O error" has a connotation of "the device reported an error" (which is what EIO signifies) rather than "an I/O operation got some sort of error, not necessarily an error from the device from which we were trying to read data or to which we were trying to write data".

2) The documentation should tell people *always* to use sqlite3_system_errno() after an SQLITE_IOERR and report the error based on *that*, not just by reporting an "I/O error". Yes, that means writing platform-dependent code; if you want to allow platform-independent code to be written atop SQLite, stuff the platform dependency inside SQLite, by providing some API to get errors such as, for example, "permission denied" or "disk quota exceeded" or "an actual disk I/O error occurred" rather than "write() got some error other than ENOSPC". (Yes, you *can* get "permission denied", e.g. in an NFSv2/NFSv3 write to a file to which you had write permission when you opened it but to which you no longer have write permission, and, yes, if, for example, you're in the remote file system group at Apple, with a home directory on an NFS server, you can have an SQLite database being accessed over NFS.)
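
A rough sketch of the kind of platform-independent classification being asked for in point 2, assuming the OS error number is available to the caller (for example from sqlite3_system_errno()). The category strings and the function name classify_os_error are hypothetical; this is not an existing SQLite API, only an illustration of where the platform dependency could be hidden.

```python
import errno

# EDQUOT is not guaranteed to exist on every platform, so look it up defensively.
_EDQUOT = getattr(errno, "EDQUOT", None)


def classify_os_error(err: int) -> str:
    """Map an OS errno value to a platform-independent category (hypothetical)."""
    if err in (errno.EACCES, errno.EPERM):
        return "permission-denied"
    if err == errno.ENOSPC:
        return "filesystem-full"
    if _EDQUOT is not None and err == _EDQUOT:
        return "quota-exceeded"
    if err == errno.EROFS:
        return "read-only-filesystem"
    if err == errno.EFBIG:
        return "file-too-large"
    if err == errno.EIO:
        return "device-io-error"
    return "other"
```

With something like this, only "device-io-error" would ever be reported as an actual disk I/O error; everything else gets its own name.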
Post by Jens Alfke
If you want more detailed info, use extended error codes by calling sqlite3_extended_result_codes() or sqlite3_extended_errcode(). Then you’ll get a more specific error; in your situation probably SQLITE_IOERR_ACCESS.
Perhaps, in that particular code path, the permission problem would show up in an xAccess method call, so that this would happen to be able to give you a better error.

However, what matters isn't "what operation got the error?", it's "what non-file-system-full error did you get?", and the extended error code won't help for errors other than ENOSPC and EIO returned by write().
Simon Slavin
2017-09-26 20:05:04 UTC
Post by Guy Harris
Post by Jens Alfke
The basic error code is SQLITE_IOERR, which just means "Some kind of disk I/O error occurred” according to the comment. Which is true in this case; an I/O operation returned an error.
But the *disk* didn't - the *operating system* did, so if SQLITE_IOERR really means "Some kind of disk I/O error occurred", it's *not* the right error to return for a *permission* error.
Those error codes were devised in a day when OS error codes were more simple. Also please note that those error codes are addressed to programmers. Your users should never see the text explanation of the number. Because your users wouldn’t know what to do about them. At most the user can be shown the number returned so they can quote it in a support call.

Can you find out which extended result code is returned ?

<https://www.sqlite.org/c3ref/extended_result_codes.html>

<https://www.sqlite.org/c3ref/c_abort_rollback.html>

That will let us know what’s really going on.

Simon.
Guy Harris
2017-09-26 20:17:18 UTC
Post by Simon Slavin
Post by Guy Harris
Post by Jens Alfke
The basic error code is SQLITE_IOERR, which just means "Some kind of disk I/O error occurred” according to the comment. Which is true in this case; an I/O operation returned an error.
But the *disk* didn't - the *operating system* did, so if SQLITE_IOERR really means "Some kind of disk I/O error occurred", it's *not* the right error to return for a *permission* error.
Those error codes were devised in a day when OS error codes were more simple.
EDQUOT was introduced in 1982, with 4.2BSD; when was SQLITE_IOERR devised?
Post by Simon Slavin
Also please note that those error codes are addressed to programmers. Your users should never see the text explanation of the number. Because your users wouldn’t know what to do about them.
A user wouldn't know what to do with "you've exceeded your stored data quota"? If so, your site has failed to explain to the users that they've been given a quota, limiting the amount of space on the server that they can use, and that if they exceed their quota, they either need to delete stuff they no longer need, move stuff they might *someday* need but don't need *now* to some archival medium, or ask their system administrator to increase their quota?
Post by Simon Slavin
At most the user can be shown the number returned so they can quote it in a support call.
The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"? (No cheating by looking it up in a man page or include file!)

And, yes, there needs to be *some* way to get the underlying problem reported to somebody in a position to do something about it - where "the underlying problem" includes "what did the OS say?" as much as it includes "what SQLite operation got the error?".
Jens Alfke
2017-09-26 20:37:42 UTC
A user wouldn't know what to do with "you've exceeded your stored data quota”?
A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages are not localized.) And there are plenty of messages that are much less understandable to a lay user than the one you picked out.
The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"? (No cheating by looking it up in a man page or include file!)
On the contrary, error numbers are a lot easier for support. They’re independent of locale, they don’t get re-worded from one version of the app to the next, and they’re very short and easy to dictate over the phone. Of course, these shouldn’t be the primary error information given to the user! But the user-level error message should be something specific to the application, like “an unexpected database error occurred (19)” instead of "Abort due to constraint violation”. The number would appear only for support purposes.

I say this as someone who’s worked on a number of end-user GUI applications over the years.

—Jens
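
A small sketch of the pattern described above: the application owns the user-facing text, in the user's language, and the SQLite result code only appears in parentheses for support. The message catalog, the locale keys and the function name user_message are all invented for the example; 19 is SQLITE_CONSTRAINT, as in the message quoted above.

```python
# Hypothetical application-level message catalog keyed by locale.
MESSAGES = {
    "en": "An unexpected database error occurred ({code})",
    "de": "Ein unerwarteter Datenbankfehler ist aufgetreten ({code})",
}


def user_message(result_code: int, locale: str = "en") -> str:
    """Build the application's own message; fall back to English if untranslated."""
    template = MESSAGES.get(locale, MESSAGES["en"])
    return template.format(code=result_code)
```

The SQLite-provided English text never reaches the user here; only the number does.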
Guy Harris
2017-09-26 20:57:36 UTC
Post by Jens Alfke
A user wouldn't know what to do with "you've exceeded your stored data quota”?
A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages are not localized.)
Which means "for stuff that would be shown to the user, for the user to read, either localize your error messages, or make sure your API returns error codes that the application can turn into localized error messages".

And none of this argues against presenting to the user, in their native language, a message saying "you've exceeded your file system quota", if that is, in fact, what happened.
Post by Jens Alfke
And there are plenty of messages that are much less understandable to a lay user than the one you picked out.
"I got a permission error trying to write to the journal" isn't something you'd directly say to the lay user, but *don't* tell the user anything that might convince them that their disk is failing if you didn't get EIO or the equivalent on some other OS - and don't tell them something that, when relayed to tech support, would lead the support person to believe that, either.

I.e., Richard Kreckel is 100% correct when he says that "disk I/O error" is an inappropriate message for a permission error - the *disk* had no problem, the *OS* had a problem when the disk returned file system data that, among other things, indicated that the user didn't have permission to do something. Replacing the disk and restoring from a backup probably won't fix that problem (unless the user had that permission when the backup was done).
Post by Jens Alfke
The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"? (No cheating by looking it up in a man page or include file!)
On the contrary, error numbers are a lot easier for support. They’re independent of locale,
But the error reported by sqlite3_system_errno() isn't independent of the OS on which the user is running, so *that* error wouldn't be easy for support. You'd need a platform-independent error code, meaning, in this case, one supplied by SQLite, not by the OS.
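
To illustrate the portability problem: as far as I know EDQUOT is 122 on Linux but 69 on macOS and FreeBSD, so a bare number means nothing without knowing the OS. A sketch of one workaround, assuming the errno value is resolved on the machine where the error happened, is to log the symbolic name instead of the number. The function name portable_errno_label is hypothetical.

```python
import errno
import os


def portable_errno_label(err: int) -> str:
    """Turn a platform-specific errno number into a name support can look up."""
    name = errno.errorcode.get(err)
    if name is None:
        return f"unknown errno {err}"
    # The symbolic name is the same on every OS; the text after it is not.
    return f"{name} ({os.strerror(err)})"
```

The symbolic name (EIO, EACCES, EDQUOT) is stable across systems even though the number and the strerror() text are not.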
Simon Slavin
2017-09-26 21:16:46 UTC
Post by Guy Harris
Post by Jens Alfke
A user wouldn't know what to do with "you've exceeded your stored data quota”?
A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages are not localized.)
Which means "for stuff that would be shown to the user, for the user to read, either localize your error messages, or make sure your API returns error codes that the application can turn into localized error messages".
No. It means that you should present /your/ error messages to your users, not error messages generated by SQLite. SQLite is a programmer’s tool. Its users are programmers, and that’s who its error messages are addressed to. You should not be letting your users see error messages intended for you, and you should not be making your users worry about what to do about them.

If your software wants to react to a SQLite result code by presenting one of its own error messages to its users, that’s fine.

Simon.
Guy Harris
2017-09-26 21:35:52 UTC
Post by Simon Slavin
Post by Guy Harris
Post by Jens Alfke
A user wouldn't know what to do with "you've exceeded your stored data quota”?
A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages are not localized.)
Which means "for stuff that would be shown to the user, for the user to read, either localize your error messages, or make sure your API returns error codes that the application can turn into localized error messages".
No. It means that you should present /your/ error messages to your users, not error messages generated by SQLite. SQLite is a programmer’s tool. Its users are programmers, and that’s who its error messages are addressed to. You should not be letting your users see error messages intended for you, and you should not be making your users worry about what to do about them.
"You" in "either localize your error messages, *or* make sure your API returns error codes that the application can turn into localized error messages", refers to SQLite. It ultimately doesn't *need* to have error messages - it could leave that entirely up to the application - but it provides them nonetheless.

And there's an "or" in my statement; providing a way to get error codes more fine-grained than SQLITE_IOERR - so that you don't say "disk I/O error" for errors that have nothing to do with a disk reporting an I/O error - is something that the application would need in order to provide an appropriate error to end users and to the people to whom the end user might report an error. And, no, "that error occurred on this operation" is not the sort of fine-grained information to which I'm referring.

So just provide a way to get an indication of what *particular* type of error generated SQLITE_IOERR - permission error, quota error, actual disk I/O error, etc. - and recommend that this *always* be used for SQLITE_IOERR.
Jens Alfke
2017-09-26 21:22:30 UTC
Post by Guy Harris
Which means "for stuff that would be shown to the user, for the user to read, either localize your error messages, or make sure your API returns error codes that the application can turn into localized error messages".
Um, that’s what I said.
Post by Guy Harris
And none of this argues against presenting to the user, in their native language, a message saying "you've exceeded your file system quota", if that is, in fact, what happened.
This thread isn’t about filesystem quotas. Why do you keep bringing them up as an example?
Post by Guy Harris
*don't* tell the user anything that might convince them that their disk is failing if you didn't get EIO or the equivalent on some other OS - and don't tell them something that, when relayed to tech support, would lead the support person to believe that, either.
As we’ve been saying, error messages produced by SQLite are not meant to be shown to end users, for all the reasons previously discussed.

SQLite’s error numbers ought to be sufficiently detailed once you enable extended error codes, and/or get the OS errno. The original set of error codes is inadequate to be sure, for historical reasons, but compatibility rules out breaking that API; that’s why the extended error codes exist.

—Jens
Guy Harris
2017-09-26 21:53:59 UTC
Post by Jens Alfke
Post by Guy Harris
Which means "for stuff that would be shown to the user, for the user to read, either localize your error messages, or make sure your API returns error codes that the application can turn into localized error messages".
Um, that’s what I said.
Post by Guy Harris
And none of this argues against presenting to the user, in their native language, a message saying "you've exceeded your file system quota", if that is, in fact, what happened.
This thread isn’t about filesystem quotas. Why do you keep bringing them up as an example?
Because the thread brings up the general question of folding multiple types of errors into a single error code, and because it's an example of an error you *would* want to show to the user, just as SQLITE_FULL is.
Post by Jens Alfke
Post by Guy Harris
*don't* tell the user anything that might convince them that their disk is failing if you didn't get EIO or the equivalent on some other OS - and don't tell them something that, when relayed to tech support, would lead the support person to believe that, either.
As we’ve been saying, error messages produced by SQLite are not meant to be shown to end users, for all the reasons previously discussed.
SQLite’s error numbers ought to be sufficiently detailed once you enable extended error codes, and/or get the OS errno. The original set of error codes is inadequate to be sure, for historical reasons, but compatibility rules out breaking that API; that’s why the extended error codes exist.
Yes, which is why I wasn't suggesting changing the error codes.

I *would* suggest an additional API to get a *separate* extended error code, so that if, for example, a write() fails and that failure is turned into SQLITE_IOERR, you can get something that distinguishes EIO from other errors such as EFBIG, EDQUOT, etc. I would also suggest that the documentation say that, if you don't have to run on a version of SQLite that doesn't support the new API, the new API be used by applications and libraries running atop SQLite in their error-reporting code, rather than, for example, just using sqlite3_errstr().
Simon Slavin
2017-09-26 22:11:57 UTC
Post by Guy Harris
I *would* suggest an additional API to get a *separate* extended error code, so that if, for example, a write() fails and that failure is turned into SQLITE_IOERR, you can get something that distinguishes EIO from other errors such as EFBIG, EDQUOT, etc.
You know about this, right ?

<https://www.sqlite.org/c3ref/extended_result_codes.html>

<https://www.sqlite.org/c3ref/c_abort_rollback.html>

Simon.
Guy Harris
2017-09-26 22:17:50 UTC
Post by Simon Slavin
Post by Guy Harris
I *would* suggest an additional API to get a *separate* extended error code, so that if, for example, a write() fails and that failure is turned into SQLITE_IOERR, you can get something that distinguishes EIO from other errors such as EFBIG, EDQUOT, etc.
You know about this, right ?
<https://www.sqlite.org/c3ref/extended_result_codes.html>
<https://www.sqlite.org/c3ref/c_abort_rollback.html>
Yes. I do.

You know about this, right?

https://www.sqlite.org/rescode.html#ioerr_access

It shows a whole bunch of codes, none of which are "something that distinguishes EIO from other errors such as EFBIG, EDQUOT, etc.".

I'm not asking for something that indicates what xXYZZY method reported the error. I'm asking for something that indicates what the underlying problem causing the I/O error is, to the extent that information is available from the OS, i.e. *why* did the I/O operation not succeed?
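
To make the distinction concrete, here is a sketch that decodes an extended SQLITE_IOERR code into its name. The sub-code table is a subset copied from sqlite3.h as I understand it, and describe_ioerr is a made-up helper. Note what the decoded name tells you: which operation failed (READ, WRITE, FSYNC, ...), not why it failed.

```python
SQLITE_IOERR = 10

# Subset of the SQLITE_IOERR_* sub-codes from sqlite3.h; each extended code is
# SQLITE_IOERR | (sub_code << 8).
IOERR_OPERATIONS = {
    1: "READ",
    3: "WRITE",
    4: "FSYNC",
    6: "TRUNCATE",
    13: "ACCESS",
}


def describe_ioerr(extended_code: int) -> str:
    """Name the operation encoded in an extended SQLITE_IOERR code (hypothetical)."""
    if extended_code & 0xFF != SQLITE_IOERR:
        raise ValueError("not an SQLITE_IOERR code")
    sub = extended_code >> 8
    op = IOERR_OPERATIONS.get(sub)
    return f"SQLITE_IOERR_{op}" if op else f"SQLITE_IOERR (sub-code {sub})"
```

A failed write() decodes to SQLITE_IOERR_WRITE here regardless of whether the OS said EIO, EFBIG or EDQUOT, which is exactly the gap being described.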
Jens Alfke
2017-09-27 03:49:14 UTC
Post by Guy Harris
It shows a whole bunch of codes, none of which are "something that distinguishes EIO from other errors such as EFBIG, EDQUOT, etc.".
I'm not asking for something that indicates what xXYZZY method reported the error. I'm asking for something that indicates what the underlying problem causing the I/O error is, to the extent that information is available from the OS, i.e. *why* did the I/O operation not succeed?
Yes, you’re right — I hadn’t looked at the definitions of those extended codes, and they seem … um, not super useful. As a client of SQLite, I want to know what specifically went wrong, not which internal bit of SQLite reported the error.

—Jens
Nico Williams
2017-09-27 04:17:11 UTC
Post by Jens Alfke
A user wouldn't know what to do with "you've exceeded your stored data quota”?
A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages
are not localized.) And there are plenty of messages that are much
less understandable to a lay user than the one you picked out.
They could be. And regardless, more detail in the error _code_ is
better for the application developer.

EIO is definitely an I/O error. Could be all sorts of things. E.g.,
you're using iSCSI and the network is timing out.

ENOSPC is very, very different. Reporting ENOSPC as an I/O error means
that the app or the user must now use df(1) or strace(1) or similar to
work it out, when SQLite3 could just have reported that the FS is full.
Ditto EDQUOT.

EROFS is also very different.

And so on.

These are ancient error codes.
Post by Jens Alfke
The *number* might annoy the support staff; right off the top of
your head, what's the error number for "file system quota exceeded"
or "I/O error"? (No cheating by looking it up in a man page or
include file!)
On the contrary, error numbers are a lot easier for support. They’re
independent of locale, they don’t get re-worded from one version of
the app to the next, and they’re very short and easy to dictate over
the phone. Of course, these shouldn’t be the primary error information
given to the user! But the user-level error message should be
something specific to the application, like “an unexpected database
error occurred (19)” instead of "Abort due to constraint violation”.
The number would appear only for support purposes.
As long as you can resolve them to symbolic names and/or messages.

Nico
--
Simon Slavin
2017-09-26 20:43:43 UTC
Post by Guy Harris
The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"? (No cheating by looking it up in a man page or include file!)
My support staff are allowed to look things up.

My users, when faced with a result which means "permission error" will probably grant all permissions to all apps and all users because that’s the simplest way to make a permission error message go away. My users don’t understand the Posix permission model, because they’re not computer experts, they are financial sector specialists, or psychologists, or tailors. I don’t want them thinking about computer problems. If they knew enough about computer problems to fix a permission problem the right way, they wouldn’t be paying me.

Simon.
Guy Harris
2017-09-26 21:18:18 UTC
Post by Simon Slavin
Post by Guy Harris
The *number* might annoy the support staff; right off the top of your head, what's the error number for "file system quota exceeded" or "I/O error"? (No cheating by looking it up in a man page or include file!)
My support staff are allowed to look things up.
Just don't force them to ask, *before* they look it up, whether the user's running Linux or macOS or FreeBSD or Solaris or Windows.
Post by Simon Slavin
My users, when faced with a result which means "permission error" will probably grant all permissions to all apps and all users because that’s the simplest way to make a permission error message go away. My users don’t understand the Posix permission model, because they’re not computer experts, they are financial sector specialists, or psychologists, or tailors. I don’t want them thinking about computer problems. If they knew enough about computer problems to fix a permission problem the right way, they wouldn’t be paying me.
And, when faced with a result that says "disk I/O error", your users will probably think their disk is broken and take it in to be fixed.

So:

for errors where the user *can* perhaps fix the problem, such as "out of file system space" (which already has its own error) and "out of disk quota" (which doesn't, and which is different from "out of file system space"), tell the user what the problem is (and, at the application level, offer a suggestion such as "delete some of those cat videos you've saved");

for errors where the user probably *can't* fix the problem, tell them that there's a problem for which they need to talk to support, and tell them what to say to the support staff so that the support staff knows that, for example, a disk hasn't gone bad.

(And there are places where "you don't have permission to do that" *is* the appropriate thing to tell the user, e.g. if they're trying to open a document to which they haven't been given read permission, or trying to write to a document to which they haven't been given write permission, etc. I suspect your support staff have better things to do with their time than explain to a user that they're not allowed to read somebody else's private files.)
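
The two cases above could be sketched as a small policy table at the application level. Everything here is hypothetical: the category names, the wording of the messages and the function name advise are invented for illustration, and a real application would pick its own split between "user can fix" and "call support".

```python
# Hypothetical policy: category -> (can the user fix it themselves?, what to show).
POLICY = {
    "filesystem-full": (True, "The disk is full. Free some space and try again."),
    "quota-exceeded": (True, "You have exceeded your storage quota. Delete or archive some files."),
    "permission-denied": (False, "Please contact support and mention: permission problem on the database files."),
    "device-io-error": (False, "Please contact support and mention: the storage device reported an I/O error."),
}

# Anything unclassified goes to support with a neutral description.
DEFAULT = (False, "Please contact support and mention: unexpected database error.")


def advise(category: str) -> tuple:
    """Return (user_fixable, message) for an error category."""
    return POLICY.get(category, DEFAULT)
```

Only the "device-io-error" entry tells support that a disk may have gone bad.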
Scott Robison
2017-09-26 21:08:51 UTC
There are physical errors and there are logical errors. If an error is
generated from write, it's not unreasonable to classify it as an
"output error". From read as an "input error".

There is a lot of sqlite source code that already exists and has been
written to work with the current interface. That's probably one of the
reasons why extended errors were created, to provide finer
granularity. Regardless of whether it is ideal or not, changing sqlite
in a way that would break existing code is unlikely to happen.

Ultimately it doesn't matter when error codes were added to a given
operating system or which predates what. A decision was made in the
past. The options are to live with decisions that were made in the
past (one I've seen espoused multiple times in this mailing list),
come up with an approach that allows old code to work but exposes new
information (probably the genesis of extended error codes), or break
older code (which I've not seen done deliberately).

I'm not trying to tell you that your point is invalid. It makes sense
in many ways. Short of a time machine I doubt anything will change
(though those decisions are above my pay grade).

That being said, I don't know any non-technical users who are going to
panic that IOERR means their hard drive is dying specifically because
of that text being displayed. Panic perhaps, but not that a hard drive
is about to die. Most people I know don't have that level of
understanding to correlate IO / ERR / hard drive failure rates. They
just think the stupid program is broken and not letting them get their
work done. As for the experienced technical people I know (or at least
me), their first thought would be to investigate the problem, not to
assume their hard drive is failing.
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
--
Scott Robison
Guy Harris
2017-09-26 21:43:27 UTC
Post by Scott Robison
There are physical errors and there are logical errors. If an error is
generated from write, it's not unreasonable to classify it as an
"output error". From read as an "input error".
"Output error", yes, although it'd be useful to provide more information.

"Disk I/O error", no; it'd be unreasonable to classify "out of file system free space", "over quota", "permission error", "file bigger than 2GB-1 bytes", etc. as "disk I/O errors".
Post by Scott Robison
There is a lot of sqlite source code that already exists and has been
written to work with the current interface. That's probably one of the
reasons why extended errors were created, to provide finer
granularity. Regardless of whether it is ideal or not, changing sqlite
in a way that would break existing code is unlikely to happen.
I was not suggesting that. I didn't suggest adding SQLITE_OVERQUOTA or SQLITE_WRITE_PERMISSION_ERROR.
Post by Scott Robison
Ultimately it doesn't matter when error codes were added to a given
operating system or which predates what. A decision was made in the
past. The options are to live with decisions that were made in the
past (one I've seen espoused multiple times in this mailing list),
come up with an approach that allows old code to work but exposes new
information (probably the genesis of extended error codes), or break
older code (which I've not seen done deliberately).
I'm advocating a better version of the second of those choices than the current "here's the raw operating system error code" version that's currently provided. (sqlite3_system_errno() also has the problem that if SQLITE_IOERR is provided for something *other* than a failure that provides a system errno value, it doesn't do the job.)
Post by Scott Robison
That being said, I don't know any non-technical users who are going to
panic that IOERR means their hard drive is dying specifically because
of that text being displayed. Panic perhaps, but not that a hard drive
is about to die. Most people I know don't have that level of
understanding to correlate IO / ERR / hard drive failure rates.
They don't treat "disk I/O error" as an indication that their disk is having a problem? That doesn't need an understanding of hard drive failure rates.

I have no reason to dismiss the original writer's notion that "disk I/O error" might "[scare] the hell out of the poor sysadmin who suspects a filesystem corruption might be going on".
Post by Scott Robison
They
just think the stupid program is broken and not letting them get their
work done. As for the experienced technical people I know (or at least
me), their first thought would be to investigate the problem, not to
assume their hard drive is failing.
Less investigative work is needed if the software gives a more detailed error report.
Keith Medcalf
2017-09-27 13:58:05 UTC
Well, the terminology is correct. These *ARE* I/O Errors. The system attempted I/O. It failed. Hence the term I/O Error. It is irrelevant whether the error was caused because the heads on the tape drive need cleaning, access was denied to spool storage, the disk was full, someone yanked the cable out of the disk drive, or the card reader got jammed up.

The program attempted to perform an I/O operation (of some kind).
That operation failed.

Now it is up to you, the application programmer, to figure out what to do. There are quite a few facilities available to help you do this. SQLite itself has Extended error codes that can help point to where the trouble is. You can ask the Operating System for its abend code. You can sacrifice chickens or babies or perhaps read the tea leaves.

Personally I think we need a reversion to the old days when there were only four status codes: OK, What?, How?, and Where?

This is far more effective than niggling over what an error code means. It means there was an error. Full-stop end of sentence, paragraph, page, chapter, section, story and book. There are more than adequate ways of determining the nature and localization of the error. Use them. Love them.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
Guy Harris
2017-09-27 16:39:12 UTC
Permalink
Post by Keith Medcalf
Well, the terminology is correct. These *ARE* I/O Errors. The system attempted I/O. It failed. Hence the term I/O Error.
Just don't call it a "disk I/O error".
Post by Keith Medcalf
It is irrelevant whether the error was caused because the heads on the tape drive need cleaning, access was denied to spool storage, the disk was full, someone yanked the cable out of the disk drive, or the card reader got jammed up.
I.e., SQLITE_IOERR is equivalent to -1 as a return from various UN*X system calls, so that, when a program sees it, it needs to get further error information, such as an errno value, to deal with the error and, if necessary, to report it.

So it *is* relevant to what to do next.
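
The analogy with a -1 return plus errno can be sketched in Python, where a failed system call raises `OSError` and the `errno` attribute carries the reason. This is only an illustration of the two-step pattern ("it failed", then "why"); `try_open` is a hypothetical helper, not part of any SQLite API.

```python
import errno
import os
import tempfile


def try_open(path):
    """Return None on success, else (symbolic errno name, OS message)."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # exc.errno plays the role of errno after a -1 return in C.
        return errno.errorcode.get(exc.errno, "UNKNOWN"), os.strerror(exc.errno)
    os.close(fd)
    return None


# A path that cannot exist: a fresh temporary directory has no files in it.
missing = os.path.join(tempfile.mkdtemp(), "no-such-file.db")
print(try_open(missing))
```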
Keith Medcalf
2017-09-27 17:00:57 UTC
Permalink
Post by Guy Harris
Post by Keith Medcalf
Well, the terminology is correct. These *ARE* I/O Errors. The
system attempted I/O. It failed. Hence the term I/O Error.
Just don't call it a "disk I/O error".
Well, maybe. However the I/O that had the error was associated with a disk operation (as opposed to a "Video I/O Error", or a "Cardpunch I/O Error", "Printer I/O Error", etc.).
Post by Guy Harris
Post by Keith Medcalf
It is irrelevant whether the error was caused because the heads on
the tape drive need cleaning, access was denied to spool storage, the
disk was full, someone yanked the cable out of the disk drive, or the
card reader got jammed up.
I.e., SQLITE_IOERR is equivalent to -1 as a return from various UN*X
system calls, so that, when a program sees it, it needs to get
further error information, such as an errno value, to deal with the
error and, if necessary, to report it.
Yes. An I/O operation of some sort was attempted. That I/O operation involved some sort of "disk" access. That operation failed with an error.
Post by Guy Harris
So it *is* relevant to what to do next.
Well, in the same sort of way as the message from attempting to send Snail mail "Mail Undeliverable" is relevant to what to do next. You know that the error was related to the delivery of the postal item just as the "Disk I/O Error" indicates that an I/O operation that involved a disk operation failed with an error.

In both cases you need to query for the underlying error condition in order to determine what to do. So in that sense it is relevant to what to do next -- you need to query for more particulars. This is opposed to say a "Syntax Error" in which it is pretty clear that the error is a mis-formed statement.

In both cases only the underlying error code from the "Operating System" can assist you in what to do next. In the case of Snail Mail, the underlying error code of "No Such Address" entails a completely different response than "Delivery Vehicle Exploded and Your Message Burned Up en route to Delivery" or "Delivery Location Not Found -- Destroyed by Hurricane". Similarly the return code from the OS in the case of an I/O error is relevant to determining the next step in recovery -- "Device Not Found" is different from "Filesystem is Corrupt" which is different from "Access is Denied" which is different from "General Failure Reading ..." (who is General Failure and why is he trying to read my files ... I should hope that such attempts fail :) )
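
A minimal sketch of that "different code, different response" idea, in Python. The table of hints is entirely hypothetical and would differ per application; only the errno constants themselves come from the operating system.

```python
import errno

# Hypothetical recovery hints, keyed by errno value.
NEXT_STEP = {
    errno.EACCES: "check file and directory permissions (and umask)",
    errno.ENOSPC: "free up space on the volume",
    errno.ENOENT: "check that the path exists",
    errno.EROFS: "remount the file system read-write or move the database",
    errno.EIO: "suspect the device; check system logs",
}


def next_step(exc):
    """Map an OSError to a suggested action, falling back to the raw message."""
    hint = NEXT_STEP.get(exc.errno)
    if hint is None:
        return f"unrecognised error: {exc.strerror or exc}"
    return hint
```

For example, `next_step(OSError(errno.ENOSPC, "No space left on device"))` yields the "free up space" hint, while an errno that is not in the table falls through to the operating system's own message.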
Guy Harris
2017-09-27 17:12:56 UTC
Permalink
Post by Keith Medcalf
Post by Guy Harris
Post by Keith Medcalf
Well, the terminology is correct. These *ARE* I/O Errors. The
system attempted I/O. It failed. Hence the term I/O Error.
Just don't call it a "disk I/O error".
Well, maybe. However the I/O that had the error was associated with a disk operation (as opposed to a "Video I/O Error", or a "Cardpunch I/O Error", "Printer I/O Error", etc.).
Actually, if it had occurred on my machine, it wouldn't have been associated with a disk operation; should the application check where the data is stored and say "flash memory I/O" error if appropriate? :-)

The point is that the *disk* isn't particularly relevant to some possible errors - the problem isn't with the *disk*, which reported no error, the problem is with something in the *file system*, such as the amount of space available, the permissions on files, etc.
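
To make the file-system side concrete, here is a small Python sketch that looks at the two things named above, permissions and free space, for a database's rollback journal. It assumes the default journal name (the database filename with "-journal" appended); `inspect_journal` is a hypothetical helper, and `os.access` only approximates what an actual write would do.

```python
import os
import shutil


def inspect_journal(db_path):
    """Collect file-system facts about a database's rollback journal."""
    journal = db_path + "-journal"  # SQLite's rollback-journal naming
    directory = os.path.dirname(os.path.abspath(db_path))
    exists = os.path.exists(journal)
    return {
        "journal_exists": exists,
        # None when there is no journal file to test.
        "journal_writable": os.access(journal, os.W_OK) if exists else None,
        # SQLite must be able to create and delete the journal in this directory.
        "directory_writable": os.access(directory, os.W_OK),
        "free_bytes": shutil.disk_usage(directory).free,
    }
```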
Post by Keith Medcalf
Post by Guy Harris
Post by Keith Medcalf
It is irrelevant whether the error was caused because the heads on
the tape drive need cleaning, access was denied to spool storage, the
disk was full, someone yanked the cable out of the disk drive, or the
card reader got jammed up.
I.e., SQLITE_IOERR is equivalent to -1 as a return from various UN*X
system calls, so that, when a program sees it, it needs to get
further error information, such as an errno value, to deal with the
error and, if necessary, to report it.
Yes. An I/O operation of some sort was attempted. That I/O operation involved some sort of "disk" access. That operation failed with an error.
...and the next step is to determine what the exact error was.
Post by Keith Medcalf
Post by Guy Harris
So it *is* relevant to what to do next.
Well, in the same sort of way as the message from attempting to send Snail mail "Mail Undeliverable" is relevant to what to do next. You know that the error was related to the delivery of the postal item just as the "Disk I/O Error" indicates that an I/O operation that involved a disk operation failed with an error.
In both cases you need to query for the underlying error condition in order to determine what to do.
Well, in the first case, the postal service may well say more than just "Mail undeliverable", such as "no such person at that address", "no such address", etc.
Post by Keith Medcalf
So in that sense it is relevant to what to do next -- you need to query for more particulars. This is opposed to say a "Syntax Error" in which it is pretty clear that the error is a mis-formed statement.
Yes, but even in *that* case, it's often possible to say, for example, "there's no operator between the operands "foo" and "bar"" rather than just "syntax error".
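
For what it's worth, SQLite's own syntax errors already carry some of that detail: the message names the token near which parsing failed. A short Python sketch, using an in-memory database; `syntax_error_message` is a hypothetical helper.

```python
import sqlite3


def syntax_error_message(sql):
    """Return SQLite's error text for a statement, or None if it runs cleanly."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(sql)
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return None


# Two operands with no operator between them.
print(syntax_error_message("SELECT 1 2"))
```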
Don V Nielsen
2017-09-27 18:07:38 UTC
Permalink
I'm sorry gentlemen, but the argument has gotten thick and petulant.

Every complaint and response is resolving down to a mainframe line of
thought (thank God), which few today are willing to accept. That is, the
SQLite software is kept compatible with its root. How many System 370 Cobol
programs can run on today's hyper-tech mainframes? All of them. SQLite
was inspired by a need and built at a time when PC's and O/S's were more
primitive. It has some flaws from then that are still with us today. Why?
Because of compatibility. It is more important for this product to be
compatible with its origin because people and machines are dependent on it
being that way.

The error system is what it is because it worked back then. Efforts have
been made to improve things as far as giving the developer more information
to work with and figure things out. The developer knows their version of
SQLite and their operating system(s). It's the developer's responsibility
to match what SQLite provides given the values available in the environment
that it exists in. If the developer's application is going to run atop of
Linux, and Windows, and Android, it is the developer's job to create their
application in a way that is sensitive to them.

SQLite is capable of running anywhere. It is not its responsibility for
knowing exactly where it is being run. It doesn't function at that layer.
If the codes are not enough, the amalgamation is out there. Get a copy and
build into it a new layer of error interpretation logic and have it return
what is needed by the O/S that is specific to the application's needs and
wants.
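
Short of patching the amalgamation, the same kind of interpretation layer can live in the application. The following Python sketch wraps statement execution and re-raises failures with extra context. `DetailedDatabaseError` and `execute_checked` are hypothetical names, and the `sqlite_errorname` attribute is only present on Python 3.11 or later.

```python
import sqlite3


class DetailedDatabaseError(Exception):
    """Hypothetical application-level error carrying extra context."""


def execute_checked(conn, sql, params=()):
    """Run a statement; on failure, re-raise with the statement and any SQLite code name."""
    try:
        return conn.execute(sql, params)
    except sqlite3.Error as exc:
        name = getattr(exc, "sqlite_errorname", "unknown code")  # Python 3.11+
        raise DetailedDatabaseError(f"{name}: {exc} (while running: {sql})") from exc
```

Chaining with `from exc` keeps the original sqlite3 exception available to callers that want the raw details.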

If I'm wrong, I'm sorry. But I got the feeling the original post (the very
first) was a tantrum, and nothing anyone does to soothe the situation
is working. It is only getting worse.

Again, I apologize for losing my control.