Sunday, November 16, 2014

The Worst Server Side Performance Problem

I've been working in software performance for the last 8 years, and by far the worst performance problem has been "too many database calls."  But don't take my word for it.  Here are some outside opinions from people who should know.

Martin Fowler has written two books on the "Most Influential Book Every Programmer Should Read" list. In 2003, he wrote,

"Try to pull back multiple rows at once. In particular,
 never do repeated queries on the same table to get multiple rows."  (link). 

He used the "never" word, but more than 10 years later, it is still common place for code to hit the same table multiple times when the end user makes a single click.

Fast forward to 2010.  DynaTrace is a .NET and Java monitoring tool that has seen lots of production action.  Andi Grabner of DynaTrace put "too many database calls" at number one on the list of worst performance problems.

Just a few years later in 2012, this problem shows up as #4 on DZone's list of biggest performance headaches:
"Too many external system calls are performed in a synchronous and sequential manner."

When DynaTrace revised their list in 2013, client side performance problems were now the largest problem, but "too many database calls" remained the largest problem on the server side.

The "SELECT N+1" is a specialization of this problem.  Also in 2013, AppDynamics wrote up the SELECT N+1 in their "Common Performance Problems"series.  here's part of their post:

Here’s how it usually goes down: You have two database tables with a parent/child relationship (like blogs and posts, or products and line items), and you want to iterate through all of them. So you do this:
SELECT id FROM Parentand then executing a query for each record:
SELECT * FROM Child WHERE parent_id = ?
 See how they talk about parent and child relationships?  It is quite ironic that their own blog surfaces this issue, because their monitoring tool does nothing to help you identify those parent-child relationships.  Nothing.

The following year, in 2014, Steven Haines, author of Pro Java EE 5 Performance Management and Optimization, wrote a piece for AppDynamics.  The SELECT N+1 issue shows up as the number one issues on this updated list.  This problem just isn't going away!

This year, in 2015, people are still talking about the same problem.  Check out this survey question, from  "DZone Guide to Performance and Monitoring" (2015 Edition), where a few hundred IT professionals chimed on on the question.  "Too many DB queries" shows up as the number two root cause of performance problems.


Near the beginning of “Java Performance: TheDefinitive Guide”, Scott Oaks has a peculiar little section that is aptly named “Look Elsewhere: The Database Is Always the Bottleneck.” It is really just a page or two that cautions against blaming Java for all your performance woes, because database performance issues, regardless of the programming language, will very likely eclipse any vanilla Java performance issues.

Slow individual db queries are the highest reported performance problem (54.8%) in this 2015 report from the JRebel/XRebel people.  The amorphous category of "inefficient app code" is 2nd (51.5%), and "too many db calls" is third with 38.6%.



So we have famed and published Java architects and performance engineers (Martin Fowler, Scott Oaks, Steven Haines), we have two of this biggest application performance monitoring tools in the market (Dynatrace and AppDynamics) and relatively current surveys (DZone) from in the field.  For this important group of experts, no other performance problem bubbles to the top ten list of server-side performance issues more than "too many database calls."

Performance engineers deal with dozens upon dozens of SQL statements in every trace session.  We need a tool that quickly identifies those relationships from all of that SQL text.