Blog Moved

Future posts related to technology are directly published to LinkedIn
https://www.linkedin.com/today/author/prasadchitta

Saturday, May 31, 2014

Data and Analytics - some thoughts on consulting

On this technical blog I have not been very regular nowadays, primarily due to my other writing engagements on artha SAstra and on Medium.

Yesterday I was asked to address a group of analytics enthusiasts in an interactive session arranged by NMIMS at Bangalore on the theme "Visualize to Strategise". Over the course of around three hours, we discussed several topics on Analytics, and specifically on Visual Analytics.

I thought of writing about two aspects of Analytics that I have observed in the past few months, to give a little food for thought to those who are consulting on Analytics.

Let "data" speak.
A few weeks back, one of my customers had a complaint about a database. The customer said they had allocated a large amount of storage to the database, and in one month's time all the space was consumed. As per the customer's IT department, a maximum of 30K business transactions had been performed, by a user group of 50, on the application this database supports. So they concluded there was something wrong with the database, and escalated it to me to look into.

I suspected an interfacing schema that could be storing CLOB/BLOB type data with a missing cleanup job, and asked my DBA to give me a tablespace growth trend. The growth was in the transaction schema, spread across multiple transaction tables. With this observation I ruled out abnormal allocation on a single object.
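A growth trend like the one my DBA produced can be summarised from periodic tablespace-size snapshots. A minimal sketch — the snapshot data, tablespace names, and sizes below are all hypothetical, not the customer's actual schema:

```python
from datetime import date

# Hypothetical daily tablespace-size snapshots (name, day, size in GB),
# as a DBA might extract from the database's historical usage views.
snapshots = [
    ("TXN_DATA",  date(2014, 4, 1), 120), ("TXN_DATA",  date(2014, 4, 30), 480),
    ("IFACE_LOB", date(2014, 4, 1),  40), ("IFACE_LOB", date(2014, 4, 30),  42),
    ("INDEXES",   date(2014, 4, 1),  60), ("INDEXES",   date(2014, 4, 30),  95),
]

def growth_trend(snapshots):
    """Return per-tablespace growth (GB) between first and last snapshot."""
    by_ts = {}
    for name, day, size_gb in snapshots:
        by_ts.setdefault(name, []).append((day, size_gb))
    trend = {}
    for name, points in by_ts.items():
        points.sort()  # chronological order
        trend[name] = points[-1][1] - points[0][1]
    return trend

trend = growth_trend(snapshots)
# Sorting by growth points straight at the transaction tablespace,
# not at the interfacing LOB schema one might first suspect.
for name, grew in sorted(trend.items(), key=lambda kv: -kv[1]):
    print(name, grew)
```

In this made-up data the transaction tablespace dominates the growth, which is the shape of the observation that ruled out a single abnormal object.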

We thought of running some simple analytics on the transaction data, looking at the creating user on those transactions, to verify whether someone had run a migration script that pulled a huge amount of data into the transaction tables, or made some other human error.
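That check amounts to grouping transactions by their creating user and looking for a single dominant account. A sketch with made-up rows — the column shape and user names are illustrative only:

```python
from collections import Counter

# Illustrative transaction rows: (transaction_id, creating user).
transactions = [
    (1, "alice"), (2, "bob"), (3, "alice"), (4, "carol"),
    (5, "migration_job"), (6, "bob"), (7, "alice"),
]

by_user = Counter(user for _, user in transactions)

# If one account created the bulk of the rows, a script or batch job is
# the likely source; many small counts point to genuine interactive users.
top_user, top_count = by_user.most_common(1)[0]
print(top_user, top_count, len(by_user), "distinct creators")
```

A handful of accounts each with modest counts is the "human" signature; a migration run would show up as one account with a huge count.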

To our surprise, we saw 1,100 active users who had created 600,000+ transactions in the database, spread across different times and mostly in a regular working-day, working-hour pattern. No nightly batch or migration user had created the data. We went ahead with detailed analytics on the data, which mapped all the users across the geography of the country of operation.
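The working-day, working-hour pattern can be checked by bucketing creation timestamps by hour: interactive users cluster in office hours, while batch loads spike at night. A sketch with synthetic timestamps (a real check would read them from the database):

```python
from collections import Counter
from datetime import datetime

# Synthetic creation timestamps; entirely made up for illustration.
timestamps = [
    datetime(2014, 4, 7, 9, 15), datetime(2014, 4, 7, 10, 40),
    datetime(2014, 4, 7, 11, 5), datetime(2014, 4, 8, 14, 20),
    datetime(2014, 4, 8, 16, 55), datetime(2014, 4, 9, 2, 30),  # lone night entry
]

by_hour = Counter(ts.hour for ts in timestamps)
office = sum(n for hour, n in by_hour.items() if 8 <= hour <= 18)
share = office / len(timestamps)

# A high office-hours share suggests interactive users rather than a
# nightly batch; here 5 of the 6 entries fall in working hours.
print(f"office-hours share: {share:.0%}")
```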

We created a simple drill-down visualization of the data and submitted it to the business and IT groups at the customer, with the conclusion that the data was indeed valid, created by their users, and that there was no problem with the system.

So the data spoke for itself. The customer's business team told the IT team that they had rolled the system out across the country over the previous month, and all those users had been updating transactions on it. The IT team was not aware of this fact; they still thought the system was running in pilot mode with one location and 50 users.

Let the data speak. Let it show itself to those who need it for decision making. Democratize the data.

The second point that came up clearly yesterday was:

"If you torture your data long enough, it will confess anything"
Do not try to prove a known hypothesis with the help of data; that is not the purpose of analytics. With data and statistics you can infer almost anything. Any bias towards a specific result defeats the purpose of analytics.

So, in the process of decision making and strategising, let the data, with its modern visualization capabilities, be an unbiased representative that shows the recorded history of the business — with all its deficiencies, all its recording errors, and all its possible quality problems.

I hope I have made my two points on consulting on Analytics clear.