Tuesday, April 30, 2013

Database roles - 2. Big...Mistakes?

Looks like my Friday post about BigData got some attention. In that case - let's continue! While looking through tons of articles about Big-anything, I've noticed a consistent pattern - everybody is talking about multi-node clusters of database servers: you constantly hear about horizontal scalability, sharding, node failure tolerance, etc. But let's ask the question - is it only me who feels that we are over-complicating our development patterns? Maybe we just don't know how to properly use the hardware resources already available?

Before going any further - let me tell you a "war story": a couple of years ago, a system built by us at Dulcian was compared to a similar system (both in scope and functionality). At one of the tender kick-off meetings our competitors loudly accused us of lying in the proposal! The reasoning was very simple - we claimed that our current hardware requirement was one database server (16 cores total) utilized at 15% and two application servers (4 cores each) utilized at 5%. For our competitors such a footprint was plainly impossible - to do exactly the same job they were burning through a couple of racks of servers!
Yeah, sure - upon request our servers were counted, and the count matched what was in the proposal!

That's the price to pay for architectural mistakes: for "them" it was the "new norm" to require a separate application server for every 100 users - and for years nobody questioned that ratio! So, let me ask - did our IT industry fall into the same trap and become accustomed to bad performance? Did we forget to wonder WHY we need that extra app server for every 100 users?

From what I see while attending different conferences, current software/database architects prefer to have the same problems as their neighbors (a pretty good "get-out-of-jail" card to show to management, isn't it?) rather than doing proper tuning of their systems. It is very rare for a company to do a real, honest-to-God performance review - and even if it happens, it usually comes at the wrong time and against the will of the current IT staff.

Recently I talked to a number of top specialists who make a living fixing other people's systems - and all of them repeat the same story:
  • Once upon a time there was an existing IT system that performed badly.
  • A "contemporary architect" proposed a complete rewrite of the whole system using "the new way".
  • Management looked at the cost of the rewrite and went ballistic!
  • Somebody on the IT staff finally proposed hiring a performance tuning expert/guru-DBA - at least to be able to say next time that every possible avenue had been explored!
  • The "hired gun" found a lot of "interesting things" within a couple of weeks (wasting about 3/4 of that time fighting uncooperative locals). Some of those issues were fixable - some were not. But the system still started working faster (often faster than anybody expected).
  • The "hired gun" got a big check (and curses behind their back from the IT staff).
  • The "contemporary architect" got a pink slip (and also curses behind their back from IT management).
As you can see, nobody is happy in the end! But the whole situation started because local IT specialists were not able to correctly evaluate existing resources and solutions. Yes, we would all like to be trend-compliant, but jumping into a system overhaul without a good reason is as dangerous as falling behind the technology curve.

Summary: your IT may not be as "big" as you think!
