Wednesday, 6 November 2013

Azure SDK 2.2 Features & Migration


Brief Synopsis

SDK 2.2 is not a major upgrade, but it brings welcome features: remote debugging in the cloud, which has been a big ask from developers until now, and the Windows Azure Management Libraries. The Windows Azure Service Bus can now partition queues and topics across multiple message brokers, which improves availability: previously each queue or topic was assigned to a single message broker, which was a single point of failure; with the new feature, multiple message brokers can be assigned to a queue or topic. Please read the Q&A at the end to understand the nuances, along with the approach to move to Azure SDK 2.2 from a generic project standpoint.

At a high level, the following are the new features:

  • Visual Studio 2013 Support
  • Integrated Windows Azure Sign-In support within Visual Studio
  • Remote Debugging Cloud Services with Visual Studio – very relevant to developers
  • Firewall Management support within Visual Studio for SQL Databases
  • Visual Studio 2013 RTM VM Images for MSDN Subscribers
  • Windows Azure Management Libraries for .NET – very relevant to the deployment team
  • Updated Windows Azure PowerShell Cmdlets and ScriptCenter
  • Topology Blast – relevant to the deployment team
  • Windows Azure Service Bus – partition queues and topics across multiple message brokers – relevant to developers. All Service Bus based projects should move ASAP.

Only the highlighted areas are covered below.

Remote Debugging Cloud Resources within Visual Studio

Today’s Windows Azure SDK 2.2 release adds support for remote debugging many types of Windows Azure resources. With live, remote debugging support from within Visual Studio, you now have more visibility than ever before into how your code is operating live in Windows Azure.  Let’s walk through how to enable remote debugging for a Cloud Service:

Remote Debugging of Cloud Services

Note: To be debugged, the web or worker role must be built against Azure SDK 2.2.

To enable remote debugging for your cloud service, select Debug as the Build Configuration on the Common Settings tab of your Cloud Service’s publish dialog wizard:


Then click the Advanced Settings tab and check the Enable Remote Debugging for all roles checkbox:


Once your cloud service is published and running live in the cloud, simply set a breakpoint in your local source code:


Then use Visual Studio’s Server Explorer to select the Cloud Service instance deployed in the cloud, and use the Attach Debugger context menu on the role or on a specific VM instance of it:


Once the debugger attaches to the Cloud Service, and a breakpoint is hit, you’ll be able to use the rich debugging capabilities of Visual Studio to debug the cloud instance remotely, in real-time, and see exactly how your app is running in the cloud.


Today’s remote debugging support is super powerful, and makes it much easier to develop and test applications for the cloud.  Support for remote debugging Cloud Services is available as of today, and we’ll also enable support for remote debugging Web Sites shortly.

Windows Azure Management Libraries for .NET (Preview) – Automation Beyond PowerShell

Windows Azure Management Libraries are in Preview!

What do the Azure Management Libraries provide? Control over the creation, deployment, and tear-down of resources, which previously was only available at the PowerShell level, is now available in code.

Having the ability to automate the creation, deployment, and tear down of resources is a key requirement for applications running in the cloud.  It also helps immensely when running dev/test scenarios and coded UI tests against pre-production environments.

These new libraries make it easy to automate tasks using any .NET language (e.g. C#, VB, F#, etc).  Previously this automation capability was only available through the Windows Azure PowerShell Cmdlets or to developers who were willing to write their own wrappers for the Windows Azure Service Management REST API.

Modern .NET Developer Experience

We’ve worked to design easy-to-understand .NET APIs that still map well to the underlying REST endpoints, making sure to use and expose the modern .NET functionality that developers expect today:

  • Portable Class Library (PCL) support targeting applications built for any .NET Platform (no platform restriction)
  • Shipped as a set of focused NuGet packages with minimal dependencies to simplify versioning
  • Support async/await task based asynchrony (with easy sync overloads)
  • Shared infrastructure for common error handling, tracing, configuration, HTTP pipeline manipulation, etc.
  • Factored for easy testability and mocking
  • Built on top of popular libraries like HttpClient and Json.NET

Below is a list of a few of the management client classes that are shipping with today’s initial preview release:

.NET Class Name          | Supports Operations for these Assets (and potentially more)
ComputeManagementClient  | Hosted Services; Virtual Machines; Virtual Machine Images & Disks
StorageManagementClient  | Storage Accounts
WebSiteManagementClient  | Web Sites; Web Site Publish Profiles; Usage Metrics

Automating the Creation of a Virtual Machine using .NET

Let’s walk through an example of how we can use the new Windows Azure Management Libraries for .NET to fully automate creating a Virtual Machine. I’m deliberately showing a scenario with a lot of custom options configured – including VHD image gallery enumeration, attaching data drives, and network endpoint + firewall rule setup – to show off the full power and richness of what the new library provides.

We’ll begin with some code that demonstrates how to enumerate through the built-in Windows images within the standard Windows Azure VM Gallery.  We’ll search for the first VM image that has the word “Windows” in it and use that as our base image to build the VM from.  We’ll then create a cloud service container in the West US region to host it within:
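The original post showed this step as a code screenshot, which is not reproduced here. Below is a sketch of roughly what that code looks like against the preview Management Libraries; the class and parameter names follow the preview API and may change, `credentials` stands for an already-constructed SubscriptionCloudCredentials instance, and the service name is a placeholder:

```csharp
// Sketch only: assumes the preview Microsoft.WindowsAzure.Management.Compute
// NuGet package and valid subscription credentials.
var computeClient = CloudContext.Clients.CreateComputeManagementClient(credentials);

// Enumerate the VM image gallery and pick the first image labeled "Windows".
var images = await computeClient.VirtualMachineImages.ListAsync();
var baseImage = images.First(img => img.Label.Contains("Windows"));

// Create a cloud service container in the West US region to host the VM.
await computeClient.HostedServices.CreateAsync(new HostedServiceCreateParameters
{
    ServiceName = "mytestvm1",      // hypothetical service name
    Location = LocationNames.WestUS
});
```

The same `computeClient` is then reused for the configuration and creation steps that follow.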


We can then customize some options on it such as setting up a computer name, admin username/password, and hostname.  We’ll also open up a remote desktop (RDP) endpoint through its security firewall:


We’ll then specify the VHD host and data drives that we want to mount on the Virtual Machine, and specify the size of the VM we want to run it in:


Once everything has been set up, the call to create the virtual machine is executed asynchronously:
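The original screenshot of that call is not reproduced here; under the same preview-API assumptions as above (names may differ slightly, and `vmRole` stands for the role configured in the earlier steps), the shape of the final call is roughly:

```csharp
// Sketch only: names follow the preview Management Libraries.
var deploymentParams = new VirtualMachineCreateDeploymentParameters
{
    Name = "myvmdeployment",                 // hypothetical deployment name
    Label = "myvmdeployment",
    DeploymentSlot = DeploymentSlot.Production,
    Roles = { vmRole }                       // the role configured above
};

// The create call is asynchronous; awaiting it returns once the
// provisioning request has been accepted by the platform.
await computeClient.VirtualMachines.CreateDeploymentAsync("mytestvm1", deploymentParams);
```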


In a few minutes we’ll then have a completely deployed VM running on Windows Azure with all of the settings (hard drives, VM size, machine name, username/password, network endpoints + firewall settings) fully configured and ready for us to use:



Topology Blast

This new functionality allows Windows Azure to communicate topology changes to all instances of a service at one time instead of walking upgrade domains. The feature is exposed via the topologyChangeDiscovery setting in the Service Definition (.csdef) file and the Simultaneous* events and classes in the Service Runtime library.
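A minimal sketch of what the .csdef setting looks like (the service and role names are placeholders; attribute placement follows the SDK 2.2 schema):

```xml
<ServiceDefinition name="MyService"
                   xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition"
                   topologyChangeDiscovery="Blast">
  <WorkerRole name="MyWorkerRole" vmsize="Small">
    <!-- With topologyChangeDiscovery="Blast", topology changes are delivered to
         all role instances at once; handle RoleEnvironment.SimultaneousChanging
         and SimultaneousChanged in code instead of Changing/Changed. -->
  </WorkerRole>
</ServiceDefinition>
```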

Windows Azure Service Bus – partition queues and topics across multiple message brokers

Service Bus employs multiple message brokers to process and store messages. Each queue or topic is assigned to one message broker. This mapping has the following drawbacks:

· The message throughput of a queue or topic is limited to the messaging load a single message broker can handle.

· If a message broker becomes temporarily unavailable or overloaded, all entities that are assigned to that message broker are unavailable or experience low throughput.


Q. Can I use Azure SDK 2.2 to debug a Web Role or Worker Role built with an earlier SDK?

A. No, you need to have your roles migrated to SDK 2.2. For older roles you can only get diagnostic information out of Visual Studio if 2.2 is installed.


Q. What are typical issues while migrating from 1.8 to 2.2?

Worker Roles and Web Role Recycling

I have 3 worker roles and a web role in my project and I upgraded it to the new 2.2 SDK (required in VS2013). Ever since the upgrade, all of the worker roles are failing and they instantly recycle as soon as they're started.

Post can be found here-

Not able to update the Role after upgrading

I recently worked on an issue where an error was thrown while deploying an upgraded role to Windows Azure: you upgrade the SDK to 2.1 or 2.2 and start getting the error while deploying the role.

Link to the post

Q. What are the steps to migrate to Azure SDK 2.2?

A. Open the Azure project in Visual Studio 2012, then:

  1. Upgrade your project via the Properties window of the Cloud project.
  2. Follow through the upgrade process and fix the errors.
  3. Run the project locally and fix any errors that show up.
  4. Check in the code once all fixes are done.
  5. Test in the Dev environment to see if anything is breaking. There is a potential chance of breakage due to dependencies.

Note: The web & worker roles tend to go into an inconsistent state due to library dependency mismatches. This will have to be fixed.

Generic Migration to Azure SDK 2.2- High Level Approach

The suggested approach is to start with one component, one web role, and one WCF REST worker role, see the impact in terms of issues, and then decide the timelines for the others. The POC will be done in 1 sprint; the candidates are the following:

  • Component – Reusable Component
  • Web Role – Portal Web
  • Worker Role – Portal Worker



· Installation of Azure SDK 2.2 -

Wednesday, 2 October 2013

Cloud–Azure Downtime --A Reality and It Hurts….

Cloud is a technological innovation which has been accepted in the mainstream. The cloud platform is constantly evolving and still in its infancy; it will take some time to mature. The non-functional “-ilities” (availability, reliability, and the like) are something that needs a lot more data and understanding, and most of them come with very one-sided T&Cs carrying the fine print “Conditions Apply”.

Cloud downtime is a tricky topic and needs a lot more understanding. It’s only when we run into a real situation that we start reading the fine print.

Microsoft Azure Compute ran into hot water on 27 Sep 2013 at 6:34 AM UTC in the North Central US datacenter. The status documentation mentions partial performance degradation: “The repair steps have been successfully executed and validated. Full Compute functionality has been restored in the affected North Central US sub-region. We apologize for any inconvenience this has caused our customers.”

What does the Cloud Services SLA on the Microsoft site say?

Cloud Services, Virtual Machines and Virtual Network

  • For Cloud Services, we guarantee that when you deploy two or more role instances in different fault and upgrade domains, your Internet facing roles will have external connectivity at least 99.95% of the time.
  • For all Internet facing Virtual Machines that have two or more instances deployed in the same Availability Set, we guarantee you will have external connectivity at least 99.95% of the time.
  • For Virtual Network, we guarantee a 99.9% Virtual Network Gateway availability
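To put those percentages in perspective, here is the arithmetic (mine, not from the SLA document) for the downtime these guarantees permit per 30-day month:

```csharp
// Allowed downtime per 30-day month for a given availability guarantee.
double MinutesDownPerMonth(double sla) => 30 * 24 * 60 * (1 - sla);

// MinutesDownPerMonth(0.9995) ≈ 21.6 minutes (Cloud Services / VMs at 99.95%)
// MinutesDownPerMonth(0.999)  ≈ 43.2 minutes (Virtual Network Gateway at 99.9%)
```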

What is the SLA around Fault Domain?

We all tend to think the fault domain will just be God-sent, but in actuality a fault domain is a set of computers on some rack in the same data center, which is also liable to go down.  If the entire data center goes down there is no failover.  Moreover, even if the downtime is unplanned and over the documented limits, there is no region-wise failover.

<Correction> Fault domains don't exist across data centers, as pointed out by Wood - Thanks </Correction>

For more documentation on Fault Domain refer here.

Can one find out which fault domains their instances are placed in?

No. Apparently MSFT doesn't indicate the fault domain for instances, at least on the management portal. However, the Windows Azure SDK provides some properties you can use to query fault domain and upgrade domain information. The RoleInstance class has a property called FaultDomain that you can read to find out in which fault domain your role instance is running. There is a catch though: querying the FaultDomain property will return either 1 (one) or 2 (two), because you are entitled to only 2 fault domains for your application. If your application is deployed across more fault domains, you will not be able to determine this using the FaultDomain property.
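A minimal sketch of reading this from inside a running role (requires a reference to Microsoft.WindowsAzure.ServiceRuntime, and only works when executing in the Azure role environment):

```csharp
using Microsoft.WindowsAzure.ServiceRuntime;

// Inside a running web/worker role: which fault domain and upgrade domain
// is this instance placed in? As noted above, FaultDomain only ever
// reports 1 or 2.
if (RoleEnvironment.IsAvailable)
{
    RoleInstance me = RoleEnvironment.CurrentRoleInstance;
    System.Diagnostics.Trace.TraceInformation(
        "Instance {0}: fault domain {1}, upgrade domain {2}",
        me.Id, me.FaultDomain, me.UpdateDomain);
}
```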

Can one really rely on the Fault Domain?

My take: given the limited documentation, Fault Domain is more of an abstract term which features in the SLA and T&C. Consider the recent increase in compute & SQL reporting downtime:

  • Sept 27 – 2013 – North Central US , 4:36 AM, 5:15 AM, 6:34 AM.
  • Sept 28-2013 -Compute : Partial Performance Degradation [North Central US]
    • 28 Sep 2013 5:15 PM UTC
    • 28 Sep 2013 5:00 PM UTC
    • 28 Sep 2013 4:20 PM UTC
    • 28 Sep 2013 3:20 PM UTC
  • Sept 29 -Compute, Storage and SQL Reporting : Performance Degradation [Southeast Asia]
    • 29 Sep 2013 4:44 PM UTC
    • 29 Sep 2013 4:07 PM UTC
    • 29 Sep 2013 2:07 PM UTC
    • 29 Sep 2013 1:07 PM UTC

This string of episodes around Compute did impact a lot of customers in multiple regions, and most of them, I presume, had fault domains (minimum 2 instances); still, this resulted in disruption of service. My take: if downtime can result in business loss, then don’t depend on fault domains; look for alternatives.

What alternatives does the customer have in case they don’t want to rely on Fault Domains?

Azure does provide Traffic Manager, which is in CTP. It can be used, but please evaluate a fully working POC before using this as an option.

Traffic Manager allows you to load balance incoming traffic across multiple hosted Windows Azure services whether they’re running in the same datacenter or across different datacenters around the world. By effectively managing traffic, you can ensure high performance, availability and resiliency of your applications. Traffic Manager provides you a choice of three load balancing methods: performance, failover, or round robin.

Use Traffic Manager to:

Ensure high availability for your applications

Traffic Manager enables you to improve the availability of your critical applications by monitoring your hosted services in Windows Azure and providing automatic failover capabilities when a service goes down.

Run responsive applications

Windows Azure allows you to run services in datacenters located around the globe. By serving end-users with the hosted service that is closest to them in terms of network latency, Traffic Manager can improve the responsiveness of your applications and content delivery times.

Note: The customer pays for the extra set of instances running in a different data center. The data synchronization methods have to be built over and above Azure Sync. The deployment from the customer's development environment has to go to both production and failover simultaneously. The customer will incur additional costs.

What are the dates around Traffic Manager GA?

No dates have been announced in public by MSFT; the assumption is early March 2014.

What does the SLA for Compute mention?

Find the Compute SLA here. The important part of the document is the SLA Exclusions:

a. SLA Exclusions

i. This SLA and any applicable Service Levels do not apply to any performance or availability issues:

1. Due to factors outside Microsoft’s reasonable control (for example, a network or device failure at the Customer site or between the Customer and our data center). <Important> Does a natural disaster classify as beyond reasonable control? </Important>

2. That resulted from Customer’s or third party hardware or software. This includes VPN devices that have not been tested and found to be compatible by Microsoft. The list of compatible VPN devices is available at

3. When Customer uses versions of operating systems in either Virtual Machines or Cloud Services that have not been tested and found to be compatible by Microsoft. The Virtual Machines list of compatible Microsoft software and Windows versions is available at The Virtual Machines list of compatible Linux software and versions is available at The Cloud Services list of compatible operating systems is available at

4. That resulted from actions or inactions of Customer or third parties;

5. Caused by Customer’s use of the Service after Microsoft advised Customer to modify its use of the Service, if Customer did not modify its use as advised;

6. During Previews (e.g., technical previews, betas, as determined by Microsoft);


7. Attributable to the acts or omissions of Customer or Customer’s employees, agents, contractors, or vendors, or anyone gaining access to Microsoft’s Service by means of Customer’s passwords or equipment.

This is a lot of legal jargon, but my 2 cents is: if your business cannot afford downtime, please read the SLAs; considering cloud is evolving, the SLAs can only get better. In the meantime, the customer will have to look for options other than MSFT.

Additional Notes

How to find out the Fault Domain? -

Microsoft Azure Real-time Service Status

Wednesday, 4 September 2013

Intelligence As A Service–Codename Zephyr

Intelligence As A Service
How often have you searched for a product on the internet and not found a decent comparative analysis that pushed you closer to a purchase? How often have you responded to a job advertisement without a complete analysis of the company, the role, the probable salary, and what the generic responses from HR folks indicate? More importantly, whom should you connect with in the industry to better understand the opportunity you are applying for? Big Data is coming into the mainstream in everything. The next big thing I am working on is an intelligent analyser for anything, tagged as open source, Codename Zephyr. More to follow.

Sunday, 14 July 2013

Big Data–Back to the Basics


The Big Data landscape is forever changing. While studying what's in the market and how to really get a handle on understanding Big Data, I came to find that every 2 weeks there is a new name in the landscape. On the flip side, there are names which just vanish in a short time. It would be good to get a basic understanding of Big Data.  In this post I don't talk about a specific technology; I'm just trying to get the understanding right. Big Data, if you see, is classified around 3 areas:

  • Batch
  • Interactive
  • Real-time Tools.

One really has a challenge in terms of how to understand Big Data and what the base classification of the Big Data space is. This is what the landscape looked like as of early January 2013; it has since changed.


Broadly, this gives an idea of all the different things in the big data landscape. The only way one can begin to understand the Big Data space is via the following high-level concepts.

Batch Processing

The data provided needs to be processed: large amounts of data need to be processed fairly quickly. This is typically seen in the world of Hadoop, or HDInsight on Windows Azure. Essentially, what does it entail?

  • Data is spread over n disks on each of the nodes, with a distributed cluster to process that data.
  • The volume of data is generally TBs to PBs.
  • The primary programming model used is MapReduce: essentially we are mapping out operations to each of the machines, and after that there is aggregation using the reduce function.

The MapReduce functions typically have to be written by a developer. The original project was started by Google (MapReduce). Currently there are 2 open source projects: Hadoop & Spark. Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly, much more quickly than with disk-based systems like Hadoop MapReduce.
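As a toy illustration of the map-then-reduce idea (an in-process LINQ analogue of my own, not real Hadoop code): map emits a (word, 1) pair per word, and reduce sums the pairs per key.

```csharp
using System;
using System.Linq;

class WordCountSketch
{
    static void Main()
    {
        string[] lines = { "big data is big", "data is everywhere" };

        // "Map" phase: emit a (word, 1) pair for every word in every line.
        var pairs = lines.SelectMany(line => line.Split(' '))
                         .Select(word => (Word: word, Count: 1));

        // "Reduce" phase: group by key and aggregate the counts.
        var counts = pairs.GroupBy(p => p.Word)
                          .ToDictionary(g => g.Key, g => g.Sum(p => p.Count));

        Console.WriteLine(counts["big"]);  // 2
        Console.WriteLine(counts["data"]); // 2
    }
}
```

In real Hadoop the pairs are shuffled across machines between the two phases; the aggregation logic is the same.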

Interactive Analysis

A large set of data needs to be analysed interactively. Two techniques are generally employed here. The first is column-based databases with sequentially indexed data, which are capable of doing table scans pretty quickly. The second is to keep as much data as possible in an in-memory cache. This is a pure-play interactive platform, able to analyse large sets of data at very low latency. One good example is Palantir’s Project Horizon, largely built for interactive analysis of big data for the US government. Below is a cool video which explains it in more depth. What seems to be important in these kinds of implementations:

  • Data is never duplicated.
    • Compact in-memory representation – the in-memory data representation needs a really small memory footprint (up to 16 GB). Compression must be lightweight: dictionary & prefix based compression and localized block based schemes are effective.
  • The analyse functionality here is not business specific; it's more like "analyse any kind of data".
  • Partitioning of processing: shared-nothing / sharded architecture.
  • Partition IDs for objects would be a good idea, but sending half a billion partition IDs from the client to the server will not be. Another way is to hash partition object IDs into subsets and use the hash for the query.
  • Other options for interactive analysis are Drill, Shark, Impala & HBase. This space originally started with Google Dremel.

<Video on Palantir’s Interactive Analysis Platform – Project Horizon>

Stream Processing

Hadoop’s batch-oriented processing is sufficient for many use cases, especially where the frequency of data reporting doesn’t need to be up-to-the-minute. However, batch processing isn’t always adequate, particularly when serving online needs such as mobile and web clients, or markets with real-time changing conditions such as finance and advertising.

The real-time use case is an obvious one. If you need to respond or be warned in real-time or near real-time, for example, security breaches or a service impacting event on a VoIP or video call, the high initial latency of batch oriented data stores such as Hadoop is not sufficient.

Moreover, the data is not valuable without analysis. In a typical real-time scenario, data is fed from multiple sources, and analysing this data on the fly is seen as a business requirement in multiple industry segments.

Streaming Big Data analytics needs to address two areas. First, the obvious use case, monitoring across all input data streams for business exceptions in real-time. This is a given. But perhaps more importantly, much of the data held in Big Data repositories is of little or no business value, and will never end up in a management report. Sensor networks, IP telecommunications networks, even data center log file processing – all examples where a vast amount of ‘business as usual’ data is generated. It’s therefore important to understand what’s being stored, and only persist what’s important (which admittedly, in some cases, may be everything). For many applications, streaming data can be filtered and aggregated prior to storing, significantly reducing the Big Data burden, and significantly enhancing the business value of the stored data.

Stream processing was popularized by a project called Storm at Twitter; the need there was to be able to store data like images and feeds, and to do aggregation very quickly.

Storm and Apache Kafka are some of the stream processing platforms.

Quick Comparison Sheet


The No SQL Paradigm

The relational database, built on Codd's model, came into existence in the 70's, championed by IBM, with multiple query language variants.

In the 80's we saw a lot of applications being built, and the need for more speed became important. Ingres came up with the idea of saving some part of the big database in another small area, termed an index, which improved performance.

Then came along the web, which changed the dynamics of the database, and that is where the concept of NoSQL came from. NoSQL means Not Only SQL.

The focus in any relational database design is on “how can we store” without duplication; very little focus is given to how we use it. In current times the focus is more on “how do I use this data”, obviously with performance and scalability at the center of the conversation. NoSQL is focused on “how do I use this data”: for example, is the data going to be used for a job queue, a shopping cart, a CMS, or other usage scenarios? Below is the JSON of a shopping cart item, which consists of id, user id, and line items all in one place, which is how it actually gets used:

{
  id : 3,
  user_id : 25,
  line_items : [
    { sku : '123', price : 1000, name : 'Nunemaker Autograph' },
    { sku : '124', price : 1000, name : 'Banker Autograph' }
  ],
  shipping_address : {
    street : '123 Some St.',
    city : 'South Bend',
    state : 'IN',
    zip : '11216'
  },
  subtotal : 2000,
  tax : 140,
  total : 2140
}
It's best to keep all related data in one place, because that's the way it's going to be used. In the relational world the emphasis is on how we are going to store it and separate it, whereas in the NoSQL world it's kept all together and usage becomes far simpler.

NoSQL is about data, and about how you use the data. The debate of SQL vs. NoSQL is never really about scalability, but when you get to the point of scaling, it's far easier to scale in the NoSQL world.

NoSQL is analogous to OLTP.

In the coming sections I will concentrate on some of the important interactive analysis & streaming platforms.

HBase - Interactive Platform

HBase is used by Facebook: all the messages that one sends on Facebook are actually handled through HBase. It's all about high volume, super fast INSERTs. It's also good at volatile READs. HBase was designed to be a highly transactional system, owing to the fact that the data is pretty much in memory.

HBase is an in-memory, column store database. This has to be understood properly: a column store is not the same as an RDBMS column. It is a very efficient INSERT (write) engine. Also, the definition of a database for HBase is different.


HBase relies on Hadoop for its persistent storage, which pretty much explains the rest of the figure.

Zookeeper is a distributed coordination service: it keeps track of all the HRegion Servers and makes sure whatever is written into memory is also written to HDFS. By definition, the way HDFS works is that if you lose a node you already have 3 replicas.

Cassandra - Interactive Platform

Cassandra is very similar to HBase in terms of functionality. It's got a SQL-like query language called CQL. Both HBase and Cassandra are very good at writes, super fast, and both are in-memory.

Facebook initially started out with Cassandra and then moved on to HBase. The real reason for moving to HBase is not very clear.


Drill - Interactive Platform

Drill is an Apache incubation project inspired by Dremel, designed to scale to 10,000 servers and query petabytes in minutes. A traditional petabyte MapReduce job takes hours today; Drill apparently takes minutes, and in some cases seconds. It is an open source reimplementation of Dremel.

A little background on Dremel

Some Google terminologies which are associated with Drill are Big Data & BigQuery. Big Data in the industry, by definition, is about 500 million rows and above, with a limitation on the size of a column at 64 KB.

Big Data at Google – what does it mean in numbers?

  • 60 hours of YouTube video uploaded per minute
  • 100 million gigabytes search Index / Analysis of 2010
  • 425 million Gmail users.

Looking at the size of the data, a relational database is an obvious no. That leaves us with the option of doing a full table scan, which can be expensive in the relational world; that's where Dremel was born.

Big Query is the externalization of this technology.

What does a BigQuery query really look like, e.g. finding the top installed market apps?

SELECT top(appId, 20) AS app, count(*) AS count FROM installing.2012


Results in less than 20 seconds.

Where can we use Big Query?

  • Game and social media analytics
  • Infrastructure monitoring
  • Advertising campaign optimization
  • Sensor data analysis.

Apache Drill is an incubation project around interactive analysis of large datasets. MapReduce is a batch-mode tool and there is latency associated with it. There are cases where one would like real-time data faster; some of the scenarios are:

  • Ad hoc analysis with interactive tools
  • Real-time dashboards
  • Event/trend detection and analysis
    • Network intrusion
    • Fraud
    • Failures

The key point in Dremel is that it uses a nested data model.

Apache Drill is likewise a system designed to support nested data.

  • Flat records are the simplest case of nested data (root only).
  • It supports schema-based (Protocol Buffers, Apache Avro) and schema-less (JSON, BSON) formats.

What nested query languages are supported by Drill?

  • DRQL
    • SQL like query language for nested data
    • Compatible with Google BigQuery/Dremel
    • Designed to support efficient column based processing
  • Mongo Query Language
  • Other languages / programming models can plug in.



The data model is nested: a Document is split across basic document data and URL entries. The query is very SQL-like.

How does Data Flow work with Drill?

Data is loaded into the Hadoop cluster by one of many mechanisms, i.e. Hive, the HDFS command line, Map/Reduce, or the NFS interface. The data in Hadoop is stored in row fashion. The Drill loader is responsible for converting the row-based data into columnar form.

Alternatively, a first-time row-based query is allowed, which in turn helps in creating a columnar copy of the data.



What does the Query Execution of Drill look like at high level?


At a very high level, query execution involves the following steps:

  • The Driver submits the query (text) to the Parser. The Parser parses it, builds the abstract syntax tree, and hands it over to the Compiler.
  • The Compiler does the optimization and builds the execution plan.
  • The Execution Engine is responsible for scheduling this against Storage, which can be on any server.

What are the typical SQL query components supported?

Query Components

  • FROM
  • (JOIN)

Key logical operators

  • Scan
  • Filter
  • Aggregate
  • (Join)

What is so unique about the SCAN logical operator?

One of the architecture goals for Drill has been to support multiple formats; that is achieved by having a SCAN operator for each format. So, for example, if the query has a WHERE clause over JSON data, the query would involve

SELECT Json(data URI)

Fields and predicates are pushed down into the scan operator.

What’s actually involved in the Execution Engine?

Drill's execution engine has 2 layers:

  • The Operator layer, which is serialization-aware: this is where individual records are processed. For example, for a count on a table, a local table scan is done followed by local aggregation, and at the global aggregation a sum is done.
  • The Execution layer is not serialization-aware: all it does is transfer blobs across nodes and the cluster. This layer is responsible for communication between nodes, dependencies (what has to finish before what), and fault tolerance.

A Complete Video on Drill can be found here

Introduction to Apache Drill


Impala - Interactive Analysis

Impala solves much the same problem as Drill, i.e. moving beyond Map/Reduce batch processing and its latency problems.  It has very similar columnar storage and works much like Drill, so for the sake of simplicity I will just get to the points of differentiation here.

However, it would be unfair of me to compare the two in detail in terms of maturity and functionality at the present time. As of October 29th, 2012, the Drill source code repository at [1] has code for a query parser and a plan parser which includes a reference plan evaluator which can perform scans against JSON-formatted data in flat files. Impala's tree at [2] includes a distributed query execution engine with support for cancellation, failure-detection, data modification via INSERT, integration with HDFS and HBase, JIT-compiled execution fragments via LLVM and a bunch of other stuff.

Impala is completely dependent on Hadoop and utilizes Hive QL. Impala is progressing towards becoming an MPP (Massively Parallel Processing) architecture.

The query processing is similar to Drill.


The high drive from MSFT towards Impala is pretty evident: they want a good interactive tool in this arena, and without one they would be doomed.


Storm- Stream Analysis

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

Storm integrates with the queuing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. Read more in the tutorial.
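To make the spout/bolt model concrete, here is a plain-Python simulation of a word-count topology. This is not Storm's actual API (which is JVM-based); the function names and data are purely illustrative of how tuples flow from a spout through successive bolts.

```python
# Plain-Python simulation of Storm's spout/bolt model (NOT the real
# Storm API): a spout emits tuples, and bolts process them stage by
# stage, mimicking a word-count topology.
from collections import Counter

def sentence_spout():
    # The stream source: each emitted value is a "tuple" in Storm terms.
    for s in ["the cow", "the cat"]:
        yield s

def split_bolt(stream):
    # First processing stage: split sentences into words.
    for sentence in stream:
        for word in sentence.split():
            yield word

def count_bolt(stream):
    # Second stage: aggregate running counts per word.
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

counts = count_bolt(split_bolt(sentence_spout()))
```

In real Storm the stream between stages is unbounded and the stages run on different nodes, with the framework handling repartitioning, acking and fault tolerance.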

Closing Notes……………

There is a lot happening in this space. The need for interactive analysis and stream processing is a must for any big data implementation.

Google has pretty much been the innovator here, with Map/Reduce way back in 2003 and then Dremel. They are pretty much leading this space, with the open source world really quick to capitalize on the same and bring it to market. On the contrary, the rest of the non-Google world (Microsoft, Oracle, IBM…) has a long way to sprint to keep up. The easiest way for them is to accept an open source implementation and roll it out quickly, for example HDInsight.

It's very clear that MSFT is pushing Impala in the interactive space alongside HDInsight.



Sunday, 23 June 2013

IaaS Inherent Part of the PaaS Architecture


We wish for a perfect world; honestly, this exists only in utopian terms, and so an Architect realizes while architecting a solution for Cloud PaaS that there is no perfect architecture. There is an architecture which fits the bill for the moment, given the shortcomings.

Normally, while migrating an on-premise application to Cloud PaaS, we run into multiple scenarios where PaaS is not a perfect fit, and one soon starts to ponder the alternatives. The easiest is "have that application or component built and deployed on IaaS", as this would be very similar to what we already have on premise.

So IaaS is something which is a solution in the short term. With Windows Azure correcting their mistake and bringing persistent IaaS in 2012, and providing better basic features like clustering and support early this year, IaaS does become an attractive choice.

So what does Persistent VM in Azure really have to offer?

  • Storage: Persistent storage – easily add new storage.
  • Deployment: Build the VHD in the cloud, or build on premise and deploy.
  • Networking: Internal endpoints are open by default. Access control with the firewall or guest OS. Input endpoints are controlled through the management portal, services and API.
  • Primary Use: Applications that require persistent storage run easily on Windows Azure.

What OS images Azure IaaS comes with?

  • Windows Server 2008 R2
  • Windows Server 2008 R2 with Sql Server 2012 Evaluation
  • Windows Server 2008 R2 with BizTalk Server 2012.
  • Windows Server 2012
  • Open SUSE 12.1
  • CentOS 6.2
  • Ubuntu 12.04
  • Suse Linux Enterprise Server SP2.

Which key Server Applications does Azure IaaS support?

  • Sql Server 2008, 2008 R2, 2012 ---> Note: Sql Azure comes with a very stripped-down version of Sql Server, so in case one is planning on using anything beyond transactional workloads, one has to look at Sql Server on IaaS (SSAS, SSRS, SSIS…).
  • SharePoint 2010, 2013 (assuming) –> Note: Given SharePoint Online is a stripped-down version of SharePoint, one will have to look at SharePoint Server on IaaS for more functionality.
  • BizTalk Server 2010 – The BizTalk PaaS is in its infancy, with very limited EDI / EAI integration features. A complete reference can be found here.
  • Windows Server 2008 R2, 2012

* The biggest workload on the cloud for any enterprise application is Sql Server on IaaS. This list is going to grow over time. There is also customer support for the above list.

What is the difference between Virtual Machines and Cloud Services?

The Virtual Machines that one creates are implicitly inside Cloud Services. The Cloud Services may appear to be segregated from VMs, but they are not.

To explain things better, let's take an example.

Let's assume we have a cloud service with a Web Role (3 instances) and a Worker Role (3 instances). The Cloud Service acts more like a container consisting of the Web and Worker Role:

  • It's a management container: when one deletes/updates the cloud service, all the entities in it are deleted/updated.
  • It's also a security boundary, i.e. roles in the same cloud service can interact with one another, which cannot be done across cloud services unless they explicitly allow it.
  • It's a network boundary: each of the roles is visible to the others on the network.



When one creates a Virtual Machine (which is a role with exactly one instance), it is placed in an implicit cloud service.


When one creates a VM, it appears in the VM section of the management portal and not under cloud services. The implicit cloud service is the DNS name which has been assigned to the virtual machine. So, for example, if one has created the first virtual machine with the name mymachinedemo, and then creates a second virtual machine choosing the option to connect it with an existing virtual machine, the portal will give a list of existing virtual machines to attach to. So essentially the cloud service acts as a container for the virtual machines.


When one creates multiple virtual machines via the option "connect to an existing virtual machine", the new virtual machine is placed under the same cloud service, and then the DNS name will start showing in the list of cloud services.


The hiding of the cloud service only happens in the portal.

Images and Disks, What are these?

Images are the base images provided by the create-from-gallery functionality, where one has a bunch of pre-existing operating system images. Post creation from an image, what you get is an OS disk (your specific operating system disk), with data disks associated with it. By the way, the disks are writable disks for Virtual Machines. The VM sizes supported by MSFT currently are subject to change; additionally, you have 28 & 56 GB RAM sizes as well.


A data disk can go up to 1 TB in size. One can attach multiple data disks to one VM.

Images and disks are stored as Windows Azure Storage blobs; data is triplicated, i.e. 3 copies are kept. Disk caching (read and read/write) is also supported.

The OS disk size is about 127 GB.

What is availability story around virtual machines?

The service level agreement is 99.95% for multiple role instances (web and worker), which amounts to 4.38 hours of downtime/year. Multiple role instances ideally means 2 VMs in the same role, so the idea is to have a minimum of 2 VMs in a role. What's included in the 99.95% is:

  • Compute hardware failure (disk, CPU, memory)
  • Data center failure – network and power failure
  • Hardware upgrades, software maintenance and host OS updates

What is not included? VM container crashes and guest OS updates.

What does this SLA mean for a VM?

It means if one deploys 2 instances of the same virtual machine in the same cloud service (DNS name) and the same availability set, then one gets a 99.95% SLA.
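The 4.38 hours figure follows directly from the 99.95% number; a quick arithmetic check:

```python
# Sanity check of the downtime figure implied by a 99.95% SLA.
sla = 0.9995
hours_per_year = 24 * 365.25          # 8766 hours
downtime_hours = (1 - sla) * hours_per_year
print(round(downtime_hours, 2))       # -> 4.38
```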


What is the concept of availability set?

By default, for every role which has 2 instances, Windows Azure spreads those instances across fault domains and update domains (i.e. if you have defined fault and update domains). A fault domain gets defined on the basis of a single point of failure; in this case it's the top-of-rack router.

A Fault Domain represents a group of resources anticipated to fail together, i.e. the same rack or same server. The fabric spreads instances across at least 2 fault domains.

An Update Domain represents a group of resources that can be updated together.

The availability set comes with the same fault and update domain concept. So, for example, if you had 2 instances of the same VM defined in an availability set, those 2 instances are spread across separate fault domains and separate update domains as a bare minimum.
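The spreading described above can be sketched as a simple round-robin placement. This is only an illustration of the idea, not the fabric controller's actual algorithm, and the domain counts are assumptions.

```python
# Illustrative only: round-robin placement of availability-set
# instances across fault domains and update domains, roughly how a
# fabric controller might spread a role's instances.
def place_instances(n, fault_domains=2, update_domains=5):
    placement = []
    for i in range(n):
        placement.append({
            "instance": i,
            "fault_domain": i % fault_domains,
            "update_domain": i % update_domains,
        })
    return placement

# Two VMs in one availability set land in different fault domains,
# so a single rack failure cannot take both down.
p = place_instances(2)
```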


The story would be incomplete without the networking capability of Azure IaaS. What are the options?

So what has MSFT done for Azure IaaS networking? Some of the features include:

  • Full control over machine names
  • Windows Azure-provided DNS – resolves VMs by name within the same cloud service. Machine names are modelled and explicitly published in the DNS service.
  • Use an on premise DNS Server.

Note: In PaaS, web and worker communication happens via messaging; in the VM world it's DNS lookup.

Protocols Supported

  • UDP traffic supported – load balancing of incoming traffic, and outbound traffic is allowed.
  • Support for all IP-based protocols (VM-to-VM communication) – instance-to-instance communication over TCP, UDP & ICMP.
  • Port forwarding – direct communication to multiple VMs in the same cloud service.
  • Custom load balancer health probes – health checks with probe timeouts. HTTP-based probing, allowing granular control of health checks.

Load Balanced Sets for IaaS

Similar to availability sets are Load Balanced Sets, which allow a set of VMs within the same cloud service to be load balanced.


Load Balance with Custom Probes

In IaaS there are no agents installed on the VM, so there was a requirement to define a point which could be probed, for example /health.aspx as the probe path; if the load balancer gets an HTTP 200, it assumes everything is healthy.
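A minimal sketch of such a probe endpoint, using Python's standard http.server purely for illustration (a real probe target would typically be a page in the application itself, such as the /health.aspx example above):

```python
# Minimal sketch of a custom probe endpoint: the load balancer GETs a
# known path and treats HTTP 200 as "healthy", anything else as "out
# of rotation". The /health.aspx path is just the example from the
# text; any path can be configured on the load-balanced set.
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading, urllib.request

def app_is_healthy():
    # Check real dependencies (database, disk, queues) here.
    return True

class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health.aspx" and app_is_healthy():
            self.send_response(200)   # stays in rotation
        else:
            self.send_response(503)   # taken out of rotation
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # suppress per-request console logging

server = HTTPServer(("127.0.0.1", 0), ProbeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status = urllib.request.urlopen(
    "http://127.0.0.1:%d/health.aspx" % server.server_port).getcode()
server.shutdown()
```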


Cross Premise Connectivity

For connecting with on-premise Active Directory or connecting to an on-premise network, Windows Azure Connect has been around, but has not been very well accepted. Windows Azure Connect uses an IPsec tunnelling concept, with an agent hosted on both machines which need to communicate. If one is doing a domain join with an Azure VM, the problem has been having to install the agent on the DC, which has not gone down well with many.

Alternatively, Site-to-Site Connectivity – Windows Azure Virtual Network – came into play. It provides a virtual network and a gateway, with the gateway terminating on a standard VPN device.


What would one need to take care when migrating application to Azure IaaS?

  • Sql Server installed on IaaS needs to be clustered, so having 2 instances of Sql Server in the same cloud service will help. One may need to size up the data disks required.
  • Built-in load balancing support: if you deploy a web application on IaaS, the load balancing is taken care of.
  • Integrated management and monitoring is provided by Azure itself for VMs.
  • Fault and upgrade domains for all VMs are a must.
  • Windows Azure Network can be used to connect with an on-premise application, e.g. for a domain join.
  • Hourly billing support: in addition to making it easier and faster to get started, these SQL Server and BizTalk Server images also enable an hourly billing model, which means you don't have to pay for an upfront license of these server products. Instead, you can deploy the images and pay an additional hourly rate above the standard OS rate for the hours you run the software. This provides a very flexible way to get started with no upfront costs (instead you pay only for what you use). You can learn more about the hourly rates.
  • Workloads to be shifted to cloud need to be looked at from a compute, storage & networking standpoint.


The philosophy for the cloud world is to lift and shift workloads.

As of 2013 we still need to move a lot of applications to cloud as-is, and IaaS is an inherent part of the overall PaaS architecture. Maybe in years to come it will be a complete PaaS architecture.


Wednesday, 22 May 2013

Mobile Devices Application Architecture

Mobile platforms are increasingly finding their rightful place in the enterprise. Mobile platforms vary from insignificant non-UI devices to full-fledged tablets and SMART TVs. Compute and storage are a fast-moving segment of mobile device capabilities, and building applications for these platforms can be a daunting task for developers. Most mobile devices have connectivity in some form or the other, and applications built for devices connect with services which are hosted on premise or in the cloud. Concentrating on the mobile device architecture, it would be good to have some guidance for developers on what constitutes a typical mobile architecture.


High Level Device Architecture

Since most current mobile devices are UI intensive, the starting point is the UI architecture.

The high-level mobile/device application architecture includes the following components. It's typically a layered architecture.


Presentation Layer

The UI architecture is heavily driven by certain patterns:

  • MVC – The Model View Controller is an architectural pattern that divides an interactive application into 3 components. The model contains the core functionality and data. The view displays information to the user. The controller handles the user input. View and controller together comprise the user interface. A change propagation mechanism ensures consistency between the user interface and the model. This architectural pattern first came up with Smalltalk-80; since then a large number of UI frameworks have been built around MVC, and it has now become a de facto standard for UI development. For more on MVC refer here.
  • Delegation – Most user interfaces on mobile devices are rich from a functionality standpoint and use delegation, i.e. they transfer information, data and processing to another object. The delegation pattern is a design pattern in object-oriented programming where an object, instead of performing one of its stated tasks, delegates that task to an associated helper object. There is an inversion of responsibility, in which the helper object, known as a delegate, is given the responsibility to execute a task for the delegator. Delegation is one of the fundamental abstraction patterns that underlie other software patterns such as composition (also referred to as aggregation), mixins and aspects. For more on delegation refer here.
  • Target-action – The user interface is divided up into views and controllers, which dynamically establish relationships by telling each other which object they should target and what action or message to send to that target when an event occurs. This is especially useful when implementing graphical user interfaces, which are by nature event-driven. Most UIs on devices are event-driven: based on the user input, the desired event is raised and executed by the associated event handler.
  • Block objects – Most device-based applications interact with services or other applications for data services and more. Asynchronous callbacks help save compute. The services have to be modelled differently for devices; that's a whole different topic and will be covered separately.
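The delegation pattern described above can be sketched in a few lines. The class and method names here are purely illustrative, not taken from any platform SDK:

```python
# Delegation sketch: a view hands work off to a helper "delegate"
# object instead of doing it itself (all names are illustrative).
class DownloadDelegate:
    def download_finished(self, data):
        # The delegate decides what to do with the result.
        self.result = "rendered: " + data

class ImageView:
    def __init__(self, delegate):
        self.delegate = delegate  # inversion of responsibility

    def finish(self, data):
        # The view doesn't know *how* the data is handled;
        # it simply forwards the event to its delegate.
        self.delegate.download_finished(data)

delegate = DownloadDelegate()
view = ImageView(delegate)
view.finish("photo.png")
```

Because the view only knows the delegate's interface, the same view class can be reused with different delegates supplying different behaviour.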

Management Layer

Management is a crucial piece in devices, ranging from memory management and state management all the way to device management. Below are some of the key components.

Memory Management-

Devices have less usable memory and storage in comparison to a desktop computer; all applications built for them need to be very aggressive about deleting unneeded objects and lazy about creating objects.

Foreground and Background Application Management

Applications on a device have to be managed differently when in the foreground and in the background. The operating system limits what your application can do in the background in order to improve battery life and the user experience with the foreground application. The OS notifies your application when it moves from background to foreground, which requires special handling for data loading and UI refresh. So typically, for transitions between these states, what does one really have to take care of?

  • Moving to foreground: respond appropriately to the state transitions that occur. Not handling these transitions properly can lead to data loss and a bad user experience.
  • Moving to background: make sure your app adjusts its behaviour appropriately. Devices which support multitasking have this option; in other cases the application is terminated. The elementary steps are taking a snapshot image of the current user interface, saving user data and information, and freeing up as much memory as possible. Background applications continue to stay in memory until a low-memory situation occurs and the OS decides to kill your application. Practically speaking, your app should remove strong references to objects as soon as they are no longer needed. Removing strong references gives the compiler the ability to release the objects right away so that the corresponding memory can be reclaimed. However, if you want to cache some objects to improve performance, you can wait until the app transitions to the background before removing references to them.

Examples of objects that you should remove strong references to as soon as possible include:

  • Image objects
  • Large media or data files that you can load again from disk
  • Any other objects that your app does not need and can recreate easily later
Handling Interrupts

Devices are more complicated than one might think; a classic scenario is handling interrupts like an incoming phone call. When an alert-based interruption occurs, such as an incoming phone call, the app moves temporarily to the inactive state so that the system can prompt the user about how to proceed. The app remains in this state until the user dismisses the alert. At this point, the app either returns to the active state or moves to the background state. On most devices, in this state the application doesn't receive notifications and other types of events. There needs to be some nature of application state management.

State Management

Irrespective of what state the application is in (foreground, background or suspended), the application's data has to be stored and restored. Even if your app supports background execution, it cannot run forever. At some point, the system might need to terminate your app to free up memory for the current foreground app. However, the user should never have to care whether an app is already running or was terminated. From the user's perspective, quitting an app should just seem like a temporary interruption. When the user returns to an app, that app should always return the user to the last point of use, so that the user can continue with whatever task was in progress. Preserving and restoring the view controllers and views is something which has to be implemented per application. State preservation needs to be considered for the following:

  • Application delegate object, which manages the top level state
  • View Controller object which manages the overall state of the app’s user interface.
  • Custom View, custom data.

State preservation and restoration is an opt-in feature and requires help from your app to work. When thinking about state preservation and restoration, it helps to separate the two processes. State preservation occurs when your app moves to the background. The restoration process uses the preserved data to reconstitute your interface; the creation of the actual objects is handled by your state management.
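A minimal sketch of the preserve/restore cycle described above, assuming a simple JSON file as the persistent store (the file name and state keys are illustrative):

```python
# Sketch of opt-in state preservation: on moving to the background the
# app serializes a small state dictionary; on relaunch it restores the
# last point of use. File name and keys are illustrative.
import json, os, tempfile

STATE_FILE = os.path.join(tempfile.gettempdir(), "app_state.json")

def preserve_state(state):
    # Called when the app transitions to the background.
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def restore_state():
    # Called on launch to return the user to the last point of use.
    if not os.path.exists(STATE_FILE):
        return {}  # first launch: nothing to restore
    with open(STATE_FILE) as f:
        return json.load(f)

preserve_state({"screen": "cart", "scroll": 120})
restored = restore_state()
```

In a real app the state dictionary would capture the view-controller hierarchy and per-view data, but the save-on-background / restore-on-launch shape is the same.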

Core Data Management

The Model in the MVC of the Presentation Layer holds references to business data which may be displayed in the views. Models can end up holding a lot of data, which can become a major performance issue. Models should ideally load only the data most relevant to the scenario, with support for lazy loading. The responsibility for loading core business data into the business entities is handled by Core Data Management. Core Data is generally schema-driven object graph management and a persistence framework. Fundamentally, Core Data helps save model objects and retrieve model objects for the business layer.

  • Core Data provides an infrastructure for managing all the changes to your model objects. This gives you automatic support for undo and redo, and for maintaining reciprocal relationships between objects.
  • It allows you to keep just a subset of your model objects in memory at any given time.
  • It uses a schema to describe the model objects.
  • It allows you to maintain disjoint sets of edits of your objects. This is useful if you want to, for example, allow the user to make edits in one view that may be discarded without affecting data displayed in another view.
  • It has an infrastructure for data store versioning and migration. This lets you easily upgrade an old version of the user’s file to the current version.
  • It interacts with Services or Business Level API to perform CRUD on the business entities.

Core Data Management runs on background thread(s).
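The lazy-loading idea mentioned above can be sketched as a model object that defers fetching its detail data until first access. All names here are illustrative:

```python
# Sketch of lazy loading in a model object: expensive detail data is
# fetched only on first access, keeping memory usage low.
class Order:
    def __init__(self, order_id, fetch_items):
        self.order_id = order_id
        self._fetch_items = fetch_items
        self._items = None  # not loaded yet

    @property
    def items(self):
        if self._items is None:          # load on first access only
            self._items = self._fetch_items(self.order_id)
        return self._items

calls = []
def fetch(order_id):
    # Stand-in for a Core Data / service fetch; records each call.
    calls.append(order_id)
    return ["widget", "gadget"]

order = Order(7, fetch)
_ = order.items
_ = order.items   # second access is served from memory, no new fetch
```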

Application Resource Management

Aside from images and media files, most devices have a lot more capability, ranging from accelerometer, camera, Bluetooth, GPS, gyroscope, location services, telemetry, magnetometer, microphone, telephony and Wi-Fi. Most of these have platform-based APIs to access them. In certain cases there may be a requirement to manage these resources, for example a video telephony module. The APIs provided by the platform give direct access to these resources.

Services Helper Layer

The Services Helper Layer on the device serves the purpose of providing caching, API, logging or notification services. The Services Helper Layer functions more as an assistant to the device, hiding the complexity of the inner workings and thus simplifying the programming model for the developer, who does not have to worry about low-level functions.

Caching Services

The application running on the device has limited memory and compute. Many UI elements, pages and data may be required to be in memory while not being displayed. An efficient disk- and memory-based caching strategy needs to be implemented for the application for better performance.
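One common eviction policy for such a memory-constrained cache is least-recently-used (LRU); a minimal sketch:

```python
# Sketch of a small in-memory LRU cache, the kind of eviction policy a
# device-side caching layer might use when memory is tight.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)   # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # touch "a" so "b" becomes the eviction victim
cache.put("c", 3)     # evicts "b"
```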

API Services

In the cloud world or the on-premise world, devices are likely to talk via services. These services are in fact well-defined APIs of the business. APIfying the services is a core concept on the cloud which will be covered later.

The API Services helper will interact with the Services API and manage the lifecycle of the service call asynchronously. It will manage aspects such as format conversion, lazy loading and much more.

Logging Services

This is a generic service providing application-level and business-level logging. The data collected is sent back to the services and serves multi-purpose use.

Notification Services

This is an asynchronous module which manages application-based notifications and events from the services and other applications.