Sunday, 23 September 2012

Hadoop, Sql Azure the BI and Analytics Dilemma

MSFT teams in Redmond haven’t made any official statement on Analysis Service (SSAS) for Sql Azure. With Azure making big commitment on Hadoop the dilemma becomes two fold building BI & Analytical capabilities for Sql Azure and Hadoop. I’m sure countless discussions have happened in those Redmond buildings deciding the same. Some guess work around this “MSFT will release Analysis Services for Sql Azure early next year”. The Analysis Service should involve support for both Sql and No Sql.
I started with a simple application in the financial industry vertical in cloud. I soon reached a point where I was pushed to corner to make some hard decision on cloud as the application besides the transactional features required tons of BI and Analytics spread across both structured and unstructured world.
Sql Azure unwillingly became the choice of structured database and Hadoop and Azure became willing choice for unstructured data. Hadoop provide very quick and optimal search across terra bytes of data  and Sql Azure with its limited on cloud offering gave transactional. The BI and Analytical capability became a nightmare.
The MSFT teams on Sql Azure are tight lipped about an “Analysis framework in Sql Azure which can do both structured & un structured”. For right now I have a Sql 2012 Analysis Service running on Azure virtual machines which did pretty much what I wanted. But then “unwillingly I have to say SSAS of Sql 2012 is coming in Sql Azure” pure guesswork.
The application as of current looks like this
What below is my thought process pure guesswork? The shift to No Sql is evident MSFT – Azure platform has to embrace the No Sql platform completely this involves extending the BI and Analytics. What seems to be emergent is an Analysis Service architecture is expected to support both Sql and No Sql running out of Sql Azure platform. The Sql Server 2012Analysis Service -  xvelocity an in memory analytics engine is step to move into the PaaS model for Sql Azure.This new engine is delivered within the following modules
  • xVelocity for Data Warehousing: is a memory optimized columnstore index for high speed data querying (relational queries).
  • xVelocity for Business Intelligence: is the in-memory analytics engine for Analysis Services (Tabular Model) and PowerPivot.
As the name in-memory engine implies all the data is stored in memory. Although todays computer systems are equipped with gigabytes of memory, memory still is an expensive resource. Therefore we need to be able to analyze the memory usage of the Analysis Services in-memory engine to understand how much memory is consumed by the different applications. The SQL Azure Analysis Service Framework should deliver a massively parallel processing infrastructure with a software solution that embeds both SQL and MapReduce analytic processing for deeper analytic insights on multi-structured data and new analytic capabilities driven by data science. Analysis service is most likely to uses an integrated MapReduce analytics engine for embedded analytic processing, simplifying enterprise access for big data analytics. Sql Server Analysis Service formally supported only SQL so that any business intelligence tool that generates standard SQL or any business analyst that knows SQL can immediately invoke the power of data science without having to learn programming languages or new interfaces. What is expected Sql Analysis Services Framework expected to look like is image While writing my applications I have found pushing data into Hadoop Azure cumbersome. I have resorted to writing data poll mechanism which will pull data from the Azure Blob storage push to Hadoop Head Node. This is not a documented way, it works. Find the code here.