Saturday, October 18, 2014

Chatty SQL Performance Anti-Patterns

Year after year, these three SQL "data access" anti-patterns cause a large percentage of the performance problems we see as performance engineers.

Anti Pattern 1: "SELECT N+1"

ANTI-PATTERN: When code, instead of joining two tables, iterates through the result set of the parent table and executes SELECTs to the child table.  Much has been written on this topic
ResultSet parentRs = myStatement.executeQuery(strParentSELECT);
while(rs.next()) {
    String myId = rs.getString(1);
    ResultSet childRs = statement.executeQuery("select * from child where id = " + myId); 
    //do something with childRs
TO REFACTOR (Option 1): Join the parent and child tables.  Here is one way to tackle this with Hibernate, if the # of rows is reasonable small enough (perhaps < 200) to be retrieved in one trip.
If more than 200 rows are to be returned, consider using hibernate batch size parameter.
TO REFACTOR (Option 2): Instead of JOINing, use a sub-select, like this:
select * from CHILD where CHILD.MY_KEY in (select MY_PARENT_KEY from parent);

This describes the relationship between just two tables.   Often we see a parent table driving the results of an entire pattern of table accesses.  Instead of SELECT N+1, we have SELECT N(x) +1, where (x) is the pattern of SQL statements instead of a single child table.
This diagram (from wuqiSpank) shows a pattern (in yellow) of SQL statements that is repeatedly executed.

Anti Pattern 2: UnCached - Static

Static tables are those that generally get less than 1 update an hour during an average day of production traffic.  All tables in a schema should be categorized as either static or dynamic, as seen here.

ANTI-PATTERN:  Source code repeatedly SELECTs static data, even though that data rarely changes.

TO REFACTORUse EHCache (which includes Hibernate integration).  The goal is to enable EHCache to essentially 'interrupt'  the jdbc request and instead return results from an in-memory cache.  EHCache manages a configurable cache refresh process.

Anti Pattern 3: UnCached - Dynamic

ANTI-PATTERN:  Source code retrieves dynamic data (perhaps customer or account data) from db and re-queries that same data before the business process has completed.

TO REFACTOR: Temporarily store the initial query results so they are available to subsequent code.  For example, say that two web service calls are made after a user clicks a button.  Both calls require the same five SQL queries for initialization.  Solution:  Refactor code into a single web service call instead of two.

References

The following show that "SELECT N+1" is a widely problematic.

No comments: