[sqlite] possible bug: select clause column alias shadowing

Discussion:

Moritz Bruder

7 years ago

Hi,

I just came across a strange behaviour in sqlite3 (3.22.0 2018-01-22
18:45:57
0c55d179733b46d8d0ba4d88e01a25e10677046ee3da1d5b1581e86726f2171d).
Consider the following test case:

    CREATE TABLE test (name varchar);
    INSERT INTO test VALUES ("foo"),("bar");

-- Returns a single row with a single column: 'foo!'
SELECT (test.name || '!') AS tname
FROM test
WHERE tname = 'foo!'

    --Returns an empty result.
    SELECT (test.name || '!') AS name
    FROM test
    WHERE name = 'foo!';

What happens is that the identifier "name", defined in the
SELECT-clause, gets shadowed by the table's column "name". I'm not
exactly sure what the SQL standard says but it is wrong in my opinion. I
expect it to be the other way round.Let me know whether you consider it
a bug.

Best wishes,

Moritz

Simon Slavin

7 years ago

Permalink

I'm not exactly sure what the SQL standard says

As best I can find, SQL92 does not specify what happens when you choose an AS clause giving a value name the same as a column. It doesn't go into much detail at all about applying "AS" to a value.

It does go into more detail about applying "AS" to a table name. Section 5.4 paragraph 12 it says

An <identifier> that is a <correlation name> is associated with
a table within a particular scope. The scope of a <correlation
name> is either a <select statement: single row>, <subquery>, or
<query specification> (see Subclause 6.3, "<table reference>").
Scopes may be nested. In different scopes, the same <correlation
name> may be associated with different tables or with the same
table.

We can apply this to a 'correlation name' for a value rather than a table. If I read this right, it suggests you're right: the use of 'name' should equate to your definition '(test.name || '!')', not to 'test.name'.

However if I see code where the programmer picked an alias the same as a real column name I'd question the quality of that programmer's thinking. I get a "Do not do this." feeling.

Cue comment from the development team about backward compatibility.

Simon.

Clemens Ladisch

7 years ago

Permalink

Post by Simon Slavin
As best I can find, SQL92 does not specify what happens when you choose
an AS clause giving a value name the same as a column.

| 7.3 <table expression>
|
| Function
|
| Specify a table or a grouped table.
|
| Format
|
| <table expression> ::=
| <from clause>
| [ <where clause> ]
| [ <group by clause> ]
| [ <having clause> ]
|
| [...]
| General Rules
|
| 1) If all optional clauses are omitted, then the result of the <ta-
| ble expression> is the same as the result of the <from clause>.
| Otherwise, each specified clause is applied to the result of
| the previously specified clause and the result of the <table ex-
| pression> is the result of the application of the last specified
| clause.
|
| [...]
|
| 7.9 <query specification>
|
| Function
|
| Specify a table derived from the result of a <table expression>.
|
| Format
|
| <query specification> ::=
| SELECT [ <set quantifier> ] <select list> <table expression>
|
| <select list> ::=
| <asterisk>
| | <select sublist> [ { <comma> <select sublist> }... ]
|
| <select sublist> ::=
| <derived column>
| | <qualifier> <period> <asterisk>
|
| <derived column> ::= <value expression> [ <as clause> ]
|
| Syntax Rules
|
| 1) Let T be the result of the <table expression>.
| [...]
|
| 6) Each <column reference> directly contained in each <value ex-
| pression> ... shall unambiguously reference a column of T.
|
| [...]
| General Rules
|
| 1) a) ii) [...] each <value expression> is applied to each row of T
| yielding a table of M rows, where M is the cardinality of T.
| The i-th column of the table contains the values derived by
| the evaluation of the i-th <value expression>.

In other words, aliases in the SELECT clause are evaluated _after_ the
FROM and WHERE clauses are done.

The order of the SELECT/WHERE clauses in the SQL syntax is misleading;
the actual behaviour would be better represented by something like this:

( FROM test
WHERE name = 'foo!' )
SELECT test.name || '!' AS name;

Regards,
Clemens

Jean-Christophe Deschamps

7 years ago

Permalink

Post by Clemens Ladisch
In other words, aliases in the SELECT clause are evaluated _after_ the
FROM and WHERE clauses are done.
The order of the SELECT/WHERE clauses in the SQL syntax is misleading;
( FROM test
WHERE name = 'foo!' )
SELECT test.name || '!' AS name;

I must be misinterpreting:

create temp table t (a int);
insert into t values (1), (2), (6);
select a int, printf('<%5i>', a) fmt from t where fmt like '%>';

int fmt
1 < 1>
2 < 2>
6 < 6>

Here WHERE understands what fmt refers to.

Clemens Ladisch

7 years ago

Permalink

Post by Jean-Christophe Deschamps

Post by Clemens Ladisch
In other words, aliases in the SELECT clause are evaluated _after_ the
FROM and WHERE clauses are done.

I was talking about the SQL standard. (I might have mentioned that somewhere ...)

Post by Jean-Christophe Deschamps
select a int, printf('<%5i>', a) fmt from t where fmt like '%>';
Here WHERE understands what fmt refers to.

SQLite tries to be helpful. But when in doubt (i.e., when an alias tries to
shadow a real column), it chooses the standard-conforming interpretation.

Regards,
Clemens

Igor Tandetnik

7 years ago

Permalink

Post by Moritz Bruder
    CREATE TABLE test (name varchar);
    INSERT INTO test VALUES ("foo"),("bar");
-- Returns a single row with a single column: 'foo!'
SELECT (test.name || '!') AS tname
FROM test
WHERE tname = 'foo!'
    --Returns an empty result.
    SELECT (test.name || '!') AS name
    FROM test
    WHERE name = 'foo!';
What happens is that the identifier "name", defined in the SELECT-clause, gets shadowed by the table's column "name".

If I recall correctly, SQL standard doesn't allow aliases from SELECT to be used in WHERE clause, only in ORDER BY and, possibly, HAVING (I'm not sure of the latter). SQLite allows aliases in WHERE as an extension, but prefers the real column name in case of conflict, so as to match the behavior of other DBMS.

--
Igor Tandetnik

petern

7 years ago

Permalink

Compared to PostgreSQL, SQLite does a better job here when there is no
input column collision.

The column collision case below returns no rows in both SQLite and
PostgreSQL:

WITH t(a) AS (VALUES ('foo')) SELECT a||'!' AS a FROM t WHERE a='foo!';

But the following edit with intermediating alias column b produces 'ERROR:
column "b" does not exist' in PostgreSQL:

sqlite> WITH t(a) AS (VALUES ('foo')) SELECT a||'!' AS b FROM t WHERE
b='foo!';
b
foo!

A safer coding style would be to use an intermediating query/view/cte when
any input column's meaning is being modified:

sqlite> WITH t(a) AS (VALUES ('foo')), u AS (SELECT a||'!' AS a FROM t)
SELECT a FROM u WHERE a='foo!';
a
foo!

Peter

...