Blog Moved

Future posts related to technology are directly published to LinkedIn
https://www.linkedin.com/today/author/prasadchitta

Saturday, April 30, 2011

SQL performance tuning

Having seen several performance problems within IT systems, I have a methodology for performance tuning. When it comes to a SQL query tuning, it should be slightly different.

9 out of 10 cases of performance problems on relational database systems relate to a bad SQL programming. Even with the latest "optimizers" within the commercial database management core execution engines, it is the skill of the developer to make use of the facilities effectively to get the best out of the RDBMS.

I have recently came across a typical problem with a "junction table" design.
A set of tables represent USER and all are connected by a USERID
Another set of tables represent ACCOUNT and all are connected by ACCOUNTID

The software product implements the one to many relationship using a junction table called USER_ACCOUNTS which contains (USERID, ACCOUNTID) with the composite primary key USERID, ACCOUNTID.

Now there are 30K users 120K accounts and 120K USER_ACCOUNTS and a query that need to get data from USER tables involving some outer joins on itself and ACCOUNT tables which joins multiple tables to get various attributes; all these tables linked in a join using the junction table. That query runs for 18 hours.

When the query is split into two inline views with all the required filtering in each side of data access on USER and ACCOUNT individually and then joined using the junction table it completes in 43 seconds.

So, FILTER and then JOIN is better than JOIN and then FILTER in terms of resource consumption. Hence the performance tuning is all about doing the sequence of actions in the right order to minimize the consumption of resources to perform the job!

Tuesday, April 26, 2011

Data Serialization, Process Parallization!

There is a lot of buzz around: going away from traditional data processing i.e., a relational database and persistent data in relational form being processed by a set of processes that capture, process (validate, summarize, re-format etc.,) and present (display on multiple format displays over multiple channels in verbal and multimedia formats) that data.

But where are we going? Object orientation of encapsulating data with its own operations to make loosely coupled application services those can be orchestrated to form business services with in an enterprise.... Those enterprise business services further choreographed to form a business to business flows across common interfacing models...

The traditional computer architecture that has a Processor that can process the data which is stored in a distinct Memory of the computer. The processor and memory are two distinct components of the basic architecture of the modern computer. When an "object" needs to be stored or shared between two different applications one should "serialize" we have Hibernate and JSON etc., formats developed for this data serialization...

At the same time, there is a trend that takes over to process the data more and more in parallel streams in the shared nothing style clusters to break the typical task into smaller pieces and summarize the results in a hierarchical fashion to arrive at final result. This can happen when the data becomes more and more unstructured with the help of objects!

Overall the trend means we are slowly going away from structured data stores in traditional relational databases and going nearer to natural language, fault tolerant and predictive data capture and processing (e.g., you can type any spelling on Google and it will return results for the right word!) and more visual and multimedia presentation of the information (in mashups, maps) with bi-directional interaction (like social, I can "like", "comment" etc on the presented data as feedback!)

That is just my view.... Businesses have to gear-up quickly to adapt to these trends!