Saturday, May 13, 2017

Is my H2 query slow because of a missing index?

Making sure the proper indexes are in place is one of the first things you should do when trying to speed up query execution.

This github repo has several running examples of performance defects that you can run on your own machine.  Test 05b is a demonstration of a slow H2 query because it is missing an index.

One reason you might expect that a particular query was slow is if you saw something fishy in your very nice SQL response time metrics from glowroot.org.  In following screenshot, note that the query in the top row is more than 10 times slower than the other two.


Here is the query -- its average response time is 358.5ms.
SELECT hid from history WHERE aid = ?


So to find out whether the right indexes are in place for this query, you need to get the query's execution plan.  To get this, you first load the H2 administrative console.  My H2 DB is running on my local box, so here is the URL:  http://localhost:8082/

Then, you have to submit a particular query to get the execution plan.  All you need to do is prefix your query with the word 'explain' and then you need to replace all your bind variable question marks with actual (or at least legal) values.  I also had to prefix my table name with the schema name.  Here is the result:
EXPLAIN SELECT hid from s01.history WHERE aid = 235




Towards the bottom of the above screenshot, you will see this text:


/* S02.HISTORY.tableScan */

The text "tableScan" indicates that no index was used to, probably, process the WHERE clause, which is "WHERE aid = 235".  The next question to ask is whether any index exists on the column aid?  And sure enough, on the left below my DDL is missing an index on the aid column, and the DDL on the right contains the new index that fixes the problem.


CREATE TABLE HISTORY (
       hid bigint  not null,
       tid int ,
       bid int,
       aid int,
       delta int,
       mtime timestamp,
       filler char(22) );
ALTER TABLE HISTORY add primary key (hid);
CREATE TABLE HISTORY (
       hid bigint  not null,
       tid int ,
       bid int,
       aid int,
       delta int,
       mtime timestamp,
       filler char(22) );
ALTER TABLE HISTORY add primary key (hid);
CREATE INDEX HIST_ACCOUNT ON HISTORY(aid);
Once the new DDL has been committed, the execution plan should specify the name of the index it uses to process the WHERE aid = ? instead of saying "tableScan".  So, the "HIST_ACCOUNT" index name used above (in yellow) should show up in the execution plan.  Let's check.


And sure enough, its there!  The "tableScan" has been replaced with this:


/* S02.HIST_ACCOUNT: AID = 235 */
Now, lets see whether the new index improved performance.  I ran the test again, and here is the glowroot data:
Sure enough, the "SELECT hid from HISTORY" query now has 0.21ms response time.   Before adding the index, it was 358.5ms, which is more than 350ms slower.

If getting rid of the tableScan does not improve query response time, check out this H2 performance page.

So here is the summary, and whether you are using H2, Oracle, DB2, SQL Server Postgres or whatever, the steps are roughly the same.
  1. Run the "EXPLAIN" to get the execution plan of the query.
  2. Check for any "table scan" or "full table scan" in the execution plan.
  3. Find what index is missing, and add it in the DDL and re-run the DDL to recreate the table.  Columns in the WHERE clause and in any JOIN clauses are good candidates for indexes.
  4. Re-run the "EXPLAIN" to make sure the "tableScan" is replaced by the name of the index that you added.
  5. Re-test your query to see if execution response time improves.
NOTE:  The schema names in my screenshots and text are a bit messed up (sometimes s01, sometimes s02) because of some oddities in my environment.