Discussion:
Index on BOOLEAN field
François
2011-09-13 10:23:53 UTC
Permalink
Hello,

Let us consider a table "ITEM" with a BOOLEAN field "FLAG". This table
may contain up to 100 000 entries and those entries can be selected
using "WHERE ITEM.FLAG = ?" conditions.

Is then a good or a bad practice to add an index on this field if we
want to improve SELECT time execution ?

Please note that it concerns an iPhone app.

Thank you !

Best Regards,

François
Richard Hipp
2011-09-13 11:33:28 UTC
Permalink
Post by François
Hello,
Let us consider a table "ITEM" with a BOOLEAN field "FLAG". This table
may contain up to 100 000 entries and those entries can be selected
using "WHERE ITEM.FLAG = ?" conditions.
Is then a good or a bad practice to add an index on this field if we
want to improve SELECT time execution ?
Bad practice. It will likely make things worse, not better.
Post by François
Please note that it concerns an iPhone app.
Thank you !
Best Regards,
François
_______________________________________________
sqlite-users mailing list
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
--
D. Richard Hipp
drh-CzDROfG0BjIdnm+***@public.gmane.org
François
2011-09-13 12:07:32 UTC
Permalink
Hello Richard, and thank you for this fast answer.

Can we consider that creating an index for an integer field is a good
practice if this field has at least 3 possible values ? Or more ?

Best Regards,

François
Post by François
Hello,
Let us consider a table "ITEM" with a BOOLEAN field "FLAG". This table
may contain up to 100 000 entries and those entries can be selected
using "WHERE ITEM.FLAG = ?" conditions.
Is then a good or a bad practice to add an index on this field if we
want to improve SELECT time execution ?
Bad practice.  It will likely make things worse, not better.
Post by François
Please note that it concerns an iPhone app.
Thank you !
Best Regards,
François
_______________________________________________
sqlite-users mailing list
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
--
D. Richard Hipp
_______________________________________________
sqlite-users mailing list
Simon Slavin
2011-09-13 12:15:11 UTC
Permalink
Post by François
Can we consider that creating an index for an integer field is a good
practice if this field has at least 3 possible values ? Or more ?
It's about chunkiness, and which of the values you're looking for.

If all possible values are equally distributed, and you frequently look for a particular value, the index will help even if you have only two possible values. If you have almost all '2011' rows, and you're searching for '2011', then the index won't help much.

One nice thing about SQL is the command syntax doesn't change when you make or delete indexes. So without having to chance your app you can write your code, get it working, then experiment with CREATE and DROPping indexes and see whether this improves speed.

Simon.
Igor Tandetnik
2011-09-13 12:23:57 UTC
Permalink
Post by Simon Slavin
Post by François
Can we consider that creating an index for an integer field is a good
practice if this field has at least 3 possible values ? Or more ?
It's about chunkiness, and which of the values you're looking for.
If all possible values are equally distributed, and you frequently look for a particular value, the index will help even if you
have only two possible values. If you have almost all '2011' rows, and you're searching for '2011', then the index won't help
much.
Actually, the break-even point is roughly 1/10: an index helps if you are selecting 10% or fewer of the records in the table; otherwise, a linear scan is faster.
--
Igor Tandetnik
Petite Abeille
2011-09-13 18:22:22 UTC
Permalink
Post by Simon Slavin
It's about chunkiness, and which of the values you're looking for.
Chunkiness? Surely you mean selectivity, no?
Simon Slavin
2011-09-13 20:16:32 UTC
Permalink
Post by Petite Abeille
Post by Simon Slavin
It's about chunkiness, and which of the values you're looking for.
Chunkiness? Surely you mean selectivity, no?
I'm sorry, but I've failed to find a better word. You can have a table with ten thousand rows. One column of the table has a thousand different values. Another column of the same table has only four different values, and one of them appears only once. That column is more chunky.

Simon.
Jay A. Kreibich
2011-09-13 20:24:11 UTC
Permalink
Post by Simon Slavin
Post by Petite Abeille
Post by Simon Slavin
It's about chunkiness, and which of the values you're looking for.
Chunkiness? Surely you mean selectivity, no?
I'm sorry, but I've failed to find a better word.
Clumpy, not chunky. 8-)

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson
François
2011-09-14 21:11:18 UTC
Permalink
Thank you all
Post by Simon Slavin
Post by Petite Abeille
Post by Simon Slavin
It's about chunkiness, and which of the values you're looking for.
Chunkiness? Surely you mean selectivity, no?
I'm sorry, but I've failed to find a better word.
  Clumpy, not chunky.  8-)
   -j
--
"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
Igor Tandetnik
2011-09-13 12:03:19 UTC
Permalink
Post by François
Let us consider a table "ITEM" with a BOOLEAN field "FLAG". This table
may contain up to 100 000 entries and those entries can be selected
using "WHERE ITEM.FLAG = ?" conditions.
Is then a good or a bad practice to add an index on this field if we
want to improve SELECT time execution ?
This may help if and only if a) you have many more records with FLAG=1 than with FLAG=0 (or vice versa); and b) most of the time, you are looking up the records belonging to the small subset. For example, if there's a small number of "active" or recent records that need to be processed, and a large archive of "processed" records.

However, in such a case, you might be even better off splitting the small subset into its own separate table.
--
Igor Tandetnik
François
2011-09-13 12:12:58 UTC
Permalink
Thank you Igor!.

Igor, your a) and b) cases do concern me. But reading your both
answers I am quite confused now :-)

Best Regards,

François
Post by Igor Tandetnik
Post by François
Let us consider a table "ITEM" with a BOOLEAN field "FLAG". This table
may contain up to 100 000 entries and those entries can be selected
using "WHERE ITEM.FLAG = ?" conditions.
Is then a good or a bad practice to add an index on this field if we
want to improve SELECT time execution ?
This may help if and only if a) you have many more records with FLAG=1 than with FLAG=0 (or vice versa); and b) most of the time, you are looking up the records belonging to the small subset. For example, if there's a small number of "active" or recent records that need to be processed, and a large archive of "processed" records.
However, in such a case, you might be even better off splitting the small subset into its own separate table.
--
Igor Tandetnik
_______________________________________________
sqlite-users mailing list
François
2011-09-13 12:21:45 UTC
Permalink
Richard, Igor,

I made some tests based on a) and b) cases described by Igor.

I can see much faster SELECT with an index than without it: so Igor
seems to be right. Richard, can you tell me in what cases performance
will be less good?

Thank you,

Best Regards,

François
Post by François
Thank you Igor!.
Igor, your a) and b) cases do concern me. But reading your both
answers I am quite confused now :-)
Best Regards,
François
Post by Igor Tandetnik
Post by François
Let us consider a table "ITEM" with a BOOLEAN field "FLAG". This table
may contain up to 100 000 entries and those entries can be selected
using "WHERE ITEM.FLAG = ?" conditions.
Is then a good or a bad practice to add an index on this field if we
want to improve SELECT time execution ?
This may help if and only if a) you have many more records with FLAG=1 than with FLAG=0 (or vice versa); and b) most of the time, you are looking up the records belonging to the small subset. For example, if there's a small number of "active" or recent records that need to be processed, and a large archive of "processed" records.
However, in such a case, you might be even better off splitting the small subset into its own separate table.
--
Igor Tandetnik
_______________________________________________
sqlite-users mailing list
_______________________________________________
sqlite-users mailing list
Richard Hipp
2011-09-13 12:24:39 UTC
Permalink
Post by François
Richard, Igor,
I made some tests based on a) and b) cases described by Igor.
I can see much faster SELECT with an index than without it: so Igor
seems to be right. Richard, can you tell me in what cases performance
will be less good?
CREATE TABLE t1(x,y,z BOOLEAN);
SELECT * FROM t1 WHERE z;

The SELECT above will generally be much faster without an index, unless most
values of z are false.
Post by François
Thank you,
Best Regards,
François
Post by François
Thank you Igor!.
Igor, your a) and b) cases do concern me. But reading your both
answers I am quite confused now :-)
Best Regards,
François
Post by Igor Tandetnik
Post by François
Let us consider a table "ITEM" with a BOOLEAN field "FLAG". This
table
Post by François
Post by Igor Tandetnik
Post by François
may contain up to 100 000 entries and those entries can be selected
using "WHERE ITEM.FLAG = ?" conditions.
Is then a good or a bad practice to add an index on this field if we
want to improve SELECT time execution ?
This may help if and only if a) you have many more records with FLAG=1
than with FLAG=0 (or vice versa); and b) most of the time, you are looking
up the records belonging to the small subset. For example, if there's a
small number of "active" or recent records that need to be processed, and a
large archive of "processed" records.
Post by François
Post by Igor Tandetnik
However, in such a case, you might be even better off splitting the
small subset into its own separate table.
Post by François
Post by Igor Tandetnik
--
Igor Tandetnik
_______________________________________________
sqlite-users mailing list
sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Post by François
_______________________________________________
sqlite-users mailing list
sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
--
D. Richard Hipp
drh-CzDROfG0BjIdnm+***@public.gmane.org
Continue reading on narkive:
Loading...