Post by Keith Medcalf
Post by Jay A. Kreibich
each column is usually undesirable. A given SELECT can usually only
use one index per query (or sub-query), so it rarely makes sense to
stack up the indexes... adding unused indexes only slows down
insert/update/deletes, as well as makes the file size much larger.
Generally speaking indexes should be treated as a performance trade-off.
Ideally they're a performance gain. That's kind of the point. If
they're not an overall gain, you likely shouldn't be using them.
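To make the one-index-at-a-time point concrete, you can ask SQLite
itself with EXPLAIN QUERY PLAN. A rough sketch with a made-up table
(the exact plan wording varies by version), where an AND-ed filter
over two indexed columns still only gets one index:

  -- hypothetical schema, purely for illustration
  CREATE TABLE parts ( id INTEGER PRIMARY KEY, vendor TEXT, status TEXT );
  CREATE INDEX parts_vendor_idx ON parts ( vendor );
  CREATE INDEX parts_status_idx ON parts ( status );

  -- both columns are filtered, but the plan normally reports a
  -- SEARCH through only one of the two indexes, with row lookups
  -- to check the other condition
  EXPLAIN QUERY PLAN
  SELECT * FROM parts WHERE vendor = 'acme' AND status = 'active';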
Post by Keith Medcalf
When you create an index (other than a UNIQUE index used to enforce a
constraint, or an index on a parent or child key in a foreign-key
relationship, where such an index may greatly increase INSERT or UPDATE
performance) you are "moving" execution time from the retrieval
processing to the maintenance processing of your application.
Yes and no. If you view an index as an optimization, then the idea
is usually to increase overall performance, so that there is a net win.
It is true that an index will (hopefully) increase the performance
of many queries at the possible cost of additional maintenance processing,
but the net change depends on the application's read/write ratios,
and the maintenance patterns.
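(To pick made-up numbers: if an index shaves 5 ms off each of 1,000
SELECTs per hour but adds 1 ms to each of 100 INSERTs per hour, that's
5,000 ms saved against 100 ms spent-- a clear win. Flip the ratio to
100 SELECTs and 10,000 INSERTs and the very same index is a net loss.)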
In that sense, I'd say the indexes do not move execution *time*, so
much as move and adjust execution *cost*. If the application's needs
and access patterns are such that the refactored set of costs is lower,
the indexes are usually seen as a correct and positive optimization.
If the new set of costs results in lower overall performance, the
index is seen as a negative thing.
There is also nothing special about an automatic UNIQUE index.
The costs are the same. The only difference is the motivation for
creating it, accepting the write performance cost as the price of
enforcing the constraint. Adding a UNIQUE constraint (and the index
that goes with it) is not an optimization, but an application requirement.
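To be clear, by "automatic" I mean the index SQLite builds on its own
to back the constraint, as in this hypothetical table:

  -- the UNIQUE constraint makes SQLite create an internal index
  -- (named sqlite_autoindex_<table>_<n>) to enforce it; the write
  -- cost is the same as an explicit CREATE UNIQUE INDEX would impose
  CREATE TABLE users (
      id    INTEGER PRIMARY KEY,
      email TEXT UNIQUE
  );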
Post by Keith Medcalf
When you add an index, you are (usually) optimizing retrieval and
query operations. The execution time saved during such query
operations does not disappear (it is not recovered). These processor
cycles and I/O operations are "removed" from retrieval operations
and "spent" when you perform updates to the database to maintain
the indexes.
I disagree with this idea, as it implies there is a 1:1 exchange
between read performance gains and write performance losses. That is very
rarely the case. Even the relative costs of a single read vs a
single write are not linear or particularly related. All of these
things are highly dependent on the table size, the different indexes,
and the operation being performed.
It isn't about moving costs from category A to category B; it is
about finding efficient workflows that run faster and better. We
already know there are some situations when an index will provide
significant performance benefits, and other cases when the same index
may slow things down. A big part of what the query optimizer must do
is identify these situations and pick the best use of the available
resources-- in many situations that may be to ignore the index.
That's not about shifting execution time, it is about getting rid of
it, and picking the fastest path for the situation at hand. The
balance is that an index opens up more options to the query
optimizer, but there is also an associated cost-- and that cost
should only be paid if the index is used from time to time.
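You can actually watch the optimizer make that call. Sketching with
the made-up "parts" table from above, and assuming the statistics say
most rows share one status value, the plan may well come back as a
plain scan even though an index is sitting right there:

  -- give the planner statistics to work with
  ANALYZE;

  -- if 'active' covers most of the table, the optimizer may decide
  -- the index isn't worth the extra lookups and report a plain
  -- "SCAN" instead of a "SEARCH ... USING INDEX"
  EXPLAIN QUERY PLAN
  SELECT * FROM parts WHERE status = 'active';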
This balance is true of all operations, not just SELECTs. For
example, a good index can make UPDATE and DELETE operations faster,
just as it may make SELECTs faster. That kind of blows the theory of
moving execution time around, since there can be advantages in both
types of operations.
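Sticking with the same made-up table: the index on "vendor" helps the
statements below exactly the way it helps a SELECT, because the rows
to change have to be located before they can be touched:

  -- the WHERE clause can be satisfied through parts_vendor_idx, so
  -- the affected rows are found without a full table scan (the
  -- status index still has to be maintained by the UPDATE, of course)
  UPDATE parts SET status = 'retired' WHERE vendor = 'acme';
  DELETE FROM parts WHERE vendor = 'acme';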
Index optimization is in no way a zero-sum game. The bag of water is
*not* fixed, and it is just as possible to remove water as it is to
add it.
Post by Keith Medcalf
Post by Jay A. Kreibich
Additionally, indexing any column (even one used heavily in a query
filter) is not useful if the index doesn't reduce the working data
set to ~10% of the rows. In other words, having an index on a column
that has all one value (be it NULL or anything else) accomplishes
nothing but significantly slowing down queries that use the index.
Same is true, for example, of a True/False column with an even
distribution. Generally, unless the index can be used to eliminate
the vast majority of rows, it will slow down, rather than speed up, a
query.
The exception to this is, of course, where the index created is a
covering index because using a covering index, while it may not
necessarily reduce the number of rows significantly, eliminates the
accesses to the base table to retrieve data that might be being used
in further operations.
Hence the "generally." Covering indexes are a somewhat special case,
and not every database can use them. Normally they still don't result
in much savings unless a noticeable percentage of rows are eliminated
(or very expensive columns are eliminated, such as large BLOBs), but
the break-even point is definitely lower than a normal index, with the
worst case being no worse than a normal table scan. All of this is
true for read operations only, of course. Covering indexes offer no
advantage over normal indexes in the case of UPDATE and DELETE
operations (and no disadvantages either; they can still be used to
speed up these operations, just no more than a "normal" index). And,
of course, covering indexes often have a higher maintenance cost than
normal indexes, since they typically contain more columns.
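As a rough sketch of the difference, again with made-up tables: the
first query below filters on a 50/50 flag, so the single-column index
barely helps (if it is used at all), while the two-column index lets
the second query be answered from the index alone-- the plan should
say something like "USING COVERING INDEX":

  CREATE TABLE orders ( id INTEGER PRIMARY KEY, shipped INTEGER, total REAL );

  -- low selectivity: roughly half the table matches, so this index
  -- mostly adds maintenance cost without buying the query much
  CREATE INDEX orders_shipped_idx ON orders ( shipped );
  EXPLAIN QUERY PLAN
  SELECT total FROM orders WHERE shipped = 0;

  -- covering: both the filter column and the selected column live in
  -- the index, so the base table never needs to be visited
  CREATE INDEX orders_ship_total_idx ON orders ( shipped, total );
  EXPLAIN QUERY PLAN
  SELECT total FROM orders WHERE shipped = 0;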
Post by Keith Medcalf
Care needs to be taken to not prematurely add indexes that will add
maintenance cost but not significantly improve query performance (ie,
be careful not to just add water into the bag -- the objective is to
poke it around, not just add more water).
If it is possible to add water, it is also possible to take it away.
As I said, this is not a zero-sum game. The bag of water is not fixed.
I agree that premature optimization is a bad idea-- not just with
indexes, but with just about any aspect of software development.
Unless you know the patterns and needs of your application, you're
optimizing guesses, which is always a bad thing.
In the end, a big part of the reason index optimization is so
difficult is because it makes the cost structure much more complex.
Adding an index is not as simple as making every query 50% faster
while making every insert/update/delete 50% slower. If it were that easy,
the query optimizer would be much better at suggesting indexes (and/or
automatically creating them, as it does create temporary indexes
for some queries).
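You can see those temporary ones in the query plan, too. Assuming a
join column with no index on it, something along these lines will
often show an entry like "AUTOMATIC COVERING INDEX" for the inner
table:

  -- no index on line_items.order_id, so the planner may build a
  -- transient index just for this one statement and throw it away
  CREATE TABLE line_items ( id INTEGER PRIMARY KEY, order_id INTEGER, qty INTEGER );

  EXPLAIN QUERY PLAN
  SELECT o.id, li.qty
    FROM orders AS o JOIN line_items AS li ON li.order_id = o.id;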
Rather, the cost/benefit requirements of an index are much more
complex, and often have to do with the number of rows being operated
on, as well as the total number of rows in the table. This is why
it is important to understand how indexes work, and how they provide
(or fail to provide) advantages to the application.
"Using SQLite" (http://shop.oreilly.com/product/9780596521196.do)
has a very lengthy discussion of indexes and how they work,
specifically because it is difficult to generalize the use of
indexes. One must really look at each application and each database
(and the data in that database!) with a solid knowledge of what an
index can-- or cannot-- provide in order to find places where an
index will make a positive difference.
-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson