Saturday, 5 May 2012

PaaS (Platform as a Service)–The Choice for New Applications on Cloud

PaaS, or Platform as a Service, has been well received as a concept; however, one really needs to understand when it is likely to hit the mainstream. In this post I will start with the basics of PaaS and IaaS, dig deeper into PaaS, add notes on the Windows Azure PaaS programming model and, lastly, look at what the roadmap of Windows Azure really looks like. As usual, a disclaimer: this post reflects my personal views; I don’t write for MSFT. A humble request to readers: I would really love to have some feedback. Happy reading…

Cloud platform technologies are broadly divided into two categories: PaaS and IaaS. Amazon Web Services (AWS) Elastic Compute Cloud (EC2) was the first to hit the market in the IaaS segment. We are given to believe that PaaS will hit the mainstream soon; the question is when, and with the new developments I see the timeline only stretching.

The key point is that IaaS is dominant in the market, with about 10 times the market share of PaaS (courtesy: Gartner Inc.). It may sound a little disruptive, but Azure is adding true IaaS support by the end of this year.

If we look at Windows Azure today, which is purely PaaS, what exists or has been announced in the Azure platform is:

  • Web/Worker roles
  • Persistent VM Roles, expected to hit the market later this year. (VM Roles already exist today but are not very useful.) The Persistent VM Role is true IaaS functionality.
  • Web Sites, expected to hit the market later this year.

 

Getting Definitions right….

Understanding IaaS: let us understand IaaS through a scenario.

image

As an example to explain IaaS: a developer is running a multi-tier application and has to deploy it on a cloud. This would include the following steps:

  • Choose a pre-installed VM which includes the OS and the database.
  • Choose a pre-installed VM which includes the OS and application support such as IIS.
  • Provision the database, create the tables and add data.
  • Install the application.
  • Configure the load balancer.
  • From time to time, manage the VMs and the DBMS from a patch management point of view.

 

Understanding PaaS

If one needs to deploy the same application on a PaaS platform, it would look somewhat like the picture below. The PaaS platform comes pre-installed with the database, application support and load balancer.

image

The steps involved in deploying the application are only two:

    • Provision the database, create the tables and add data
    • Deploy the application

From the scenarios above, PaaS seems much simpler, and this simplicity will drive the usage of PaaS in the future.

Benefits of PaaS 

  • PaaS is faster
    • Reason: There is less work for developers to do
    • Benefit: Applications can go from idea to availability more quickly
  • PaaS is cheaper
    • Reason: There’s less administrative work to do
    • Benefit: Organizations spend less supporting applications
  • PaaS is lower risk
    • Reason: The platform predefines so much that the window for error is reduced
    • Benefit: Creating and running applications becomes more reliable

* With all these benefits, IaaS is still 10 times more popular. How come? The answer is fairly complex, and I will explain it in the remainder of the post.

Drawbacks of PaaS

  • Unfamiliar for developers
    • It’s harder to adopt because they must learn the PaaS platform
  • Developers have less control
    • They must work within the constraints of the PaaS technology. Each PaaS technology is different from the others; Azure and AWS, for example, are quite different. There is no standardization, so moving across PaaS platforms can become very difficult.
  • PaaS isn’t identical to an existing on-premise environment
    • This can raise fears of vendor lock-in. For example, the Salesforce.com PaaS is completely different, and building an application on it can mean being married to it for life.
    • Moving an existing on-premise application to PaaS can be hard. There can be a considerable amount of rewriting involved.
  • PaaS supports fewer useful scenarios than IaaS. IaaS in its current form is much more flexible in allowing on-premise applications to move to the cloud. Let’s take a quick comparison between PaaS and IaaS from a scenario standpoint.

 

Scenario | IaaS | PaaS
Running New Cloud Native Application | Yes | Yes
High Performance Computing and Big Data | Yes | Probably
Running a Standard Database | Yes | No
VMs for a Dev/Test Lab | Yes | No
Running Existing Web Apps/Sites | Yes | Maybe
Running Standard Packaged Apps | Yes | No
Virtual Data Center (VMs for On-Demand Use) | Yes | No
Disaster Recovery Similar to the On-Premises World | Yes | No

  • Running a new cloud-native application works fine on PaaS as long as one does not have issues with vendor lock-in.
  • HPC and Big Data are well established in the IaaS world; PaaS is still getting there. Again, moving an existing on-premise HPC workload to PaaS may not be possible.
  • Running a standard database such as SQL Server or Oracle is not supported by PaaS as of now.
  • VMs for a Dev/Test lab are not possible on PaaS.
  • Running existing web apps/sites on PaaS is not possible as of today without modification.
  • Running standard packaged applications such as SAP or SharePoint on PaaS is not possible.
  • The virtual data center, a fantastic offering from IaaS, is not possible on PaaS.
  • Disaster recovery: IaaS can be a good foundation that replicates the on-premise world in the cloud. PaaS, however, cannot do that.

IaaS addresses a lot more scenarios than PaaS. On the other hand, there is still an argument in favour of PaaS, namely cost of operation vs. abstraction.

Cost of Operation Vs. Abstraction

From a cost-of-operation standpoint, physical machines are the most costly and offer the lowest level of abstraction. Then came virtual machines, which brought down the cost and increased the level of abstraction. Subsequently came IaaS, which reduced the cost of operation further and increased the level of abstraction further. Finally came PaaS, which reduces the cost of operation and increases the level of abstraction yet again. How long will it take for enterprises to move to PaaS? There is no correct answer.

 

Benefits of PaaS – A Closer Look

Delving into the benefits of PaaS, the platform on which the conclusions are drawn is Windows Azure. We look at the following key parameters:

  • Application Design
  • Application Development
  • Application Test
  • Application Deployment
  • Storage
  • Administration & Management

 

Application Design

  • The starting point for application design on PaaS is at a much higher level; from a design point of view there are fewer things to do.
  • Virtualized images, which are important in IaaS, are not something one needs to bother about in PaaS. So, in a way, one need not look at security in as much depth when designing for PaaS.
  • Designing for redundancy at the VM level is not required, as PaaS manages it internally.

 

Application Development

  • PaaS provides a lot more services than IaaS, so a developer needs to write less code.
  • PaaS hides most of the configuration-related details, and the developer has very little to do. In a scenario where teams work globally, integration problems stemming from diverse environments are reduced, as there is very little to configure and there is a single environment (Azure).

 

Application Testing

  • As there is less code to write, there is less code to test
  • Azure provides a single environment to test in
    • Teams don’t need their own test platform
    • Test teams don’t need to understand and track configuration changes

Application Deployment

  • One key thing in PaaS is that a developer hands the tested code to the PaaS platform (assuming role-level segregation) and the platform is responsible for deploying it, so the deployment timeline comes down. In contrast, deployment on IaaS is the same as an on-premise deployment.
  • Another important feature of PaaS is “in-place update without downtime”. Updated applications can be deployed in place without any downtime. Again, this is a platform feature.
  • Caching and storage are inbuilt features of PaaS; developers can use them in their code without really bothering about setup or configuration details.

Storage

  • Assuming one is using cloud storage, there is zero administration.
  • HA comes automatically.
  • Data is replicated automatically: doing backups solely to recover from failure is less necessary.

 

Administration and Management

  • No need for administrators
  • No need for a management team

 

*In my next post I will publish a comparison of actual timeline data for building an application on-premise vs. on a PaaS platform, and the complexities associated with each.

 

 Getting Into Windows Azure Programming Model

Why is there a need to create a new programming model?

The PaaS platform comes with a lot of pre-canned features, and in order to use it effectively one has to follow a certain discipline, which eventually amounts to a new programming model.

PaaS sets some ground rules. They are:

  • Role Segregation: PaaS ideally segregates the application into roles, e.g. web role and worker role
    • Web Role: accepts requests from users (a Web Role is synonymous with IIS)
    • Worker Role: runs code
  • Multiple Instances: A PaaS application runs multiple instances of each role. PaaS has an SLA of 24x7 availability, so the bare-bones requirement is 2 instances per role to manage HA. It is not mandatory to have 2 instances of each role; an application can run with a single instance, but then it functions without HA.
  • Application Behavior: If one of the role instances on which the application is hosted fails, the application should still behave correctly. The application has to survive the failure of any instance. This is a hard rule. What does it mean?
    • Storage must be external to the Web/Worker role instance. An instance shouldn’t store data locally. It should use SQL Azure, tables or blobs to store state. Most of us may think along the lines of components being stateless, but stateless is a confusing term.
    • Interaction between roles should be generic: in other words, a Web/Worker role should not care which instance of another role it interacts with. For example, a Web role instance may open a TCP/IP connection to a specific worker role instance and hope that it continues to live in the bigger scheme of things. That is the wrong expectation: communication across roles needs to be loosely coupled, and expecting that the web role will connect to the same worker role instance next time is not appropriate, because the worker role may have been recycled and all its state lost. Go with the basic assumption that any role instance can fail at any time; that is the way the PaaS platform wants you to build.
    • No sticky sessions in PaaS: A client shouldn’t assume that all of its requests will be handled by the same Web Role instance.
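
To make the ground rules above a little more concrete, here is a minimal sketch in Python. It is not Azure code: `ExternalStore`, `WebRoleInstance` and the round-robin picker are hypothetical stand-ins I made up for illustration, with a plain in-memory dictionary playing the part of SQL Azure, tables or blobs. The point is only that no instance keeps session state, so any instance can serve any request and a recycled instance loses nothing.

```python
import itertools


class ExternalStore:
    """Stand-in for durable storage that lives outside every role instance."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


class WebRoleInstance:
    """A role instance that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        # Read the state from the shared store, change it, write it back.
        cart = list(self.store.get(session_id, []))
        cart.append(item)
        self.store.put(session_id, cart)
        return cart


def make_round_robin(instances):
    """Hypothetical load balancer: any instance may serve any request."""
    cycle = itertools.cycle(instances)
    return lambda: next(cycle)


store = ExternalStore()
instances = [WebRoleInstance("web_0", store), WebRoleInstance("web_1", store)]
pick_instance = make_round_robin(instances)

pick_instance().add_to_cart("session-42", "book")  # served by web_0
pick_instance().add_to_cart("session-42", "lamp")  # served by web_1

# A freshly started replacement instance sees the same cart,
# because the cart never lived inside an instance.
replacement = WebRoleInstance("web_0_replacement", store)
final_cart = replacement.add_to_cart("session-42", "pen")
print(final_cart)
```

If the cart were kept in a field of `WebRoleInstance` instead, the second request would land on an instance that has never heard of the session, which is exactly the sticky-session problem described above.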

There are constraints on how you build an application that needs to run on PaaS, and there is a rationale for why these constraints have come into existence.

Fabric Controller – A Background

Most PaaS implementations have a component called the Fabric Controller (FC), which owns all the machines in a particular data center.

  • It creates and monitors role instances on those machines.
  • It starts new instances when a new application is deployed, when a running application fails, or when it needs to update system software in an instance’s virtual or physical machine.
  • The FC is smart enough not to assign instances of the same role of an application to the same physical machine.

Fabric Controller 101….

Let’s say we have a set of computers, 40 of them to be exact, each with 4 cores. We have a total of 160 cores at our disposal. There is a need to run a variety of applications on these cores. So, architecturally speaking, I would need central software, which we call a Controller, and I would need Agents installed on all the computers.

1. An application run request comes to the Controller. The controller has a complete inventory of which computer and which core has been assigned to which application.

2. The controller finds an appropriate computer and passes the application binaries to the agent on that computer, which in turn has running virtual instances of Windows Server 2008.

3. The agent picks one of the virtual instances and hands the binaries over to it.

4. The application binaries are scanned for the type of role. If it’s a web role, the binaries are copied to c:\inetpub\wwwroot\, a virtual directory and application are created, and the endpoint is sent back to the agent.

5. The agent in turn sends the physical endpoint to the controller.

6. The controller registers the endpoint in some kind of registry. The logical endpoint is what is given to the end user.

7. The FC can kill any running instance at any point in time.
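
The 101 above can be sketched as a toy controller in Python. This is only an illustration under my own assumptions, not how the real Fabric Controller is built: the `Controller` class, the placement rule (pick the machine with the most free cores) and the endpoint naming are all hypothetical. It keeps the inventory of free cores, never puts two instances of the same role on one machine, records physical endpoints behind a logical one, and starts a replacement elsewhere when an instance disappears.

```python
from collections import defaultdict


class Controller:
    """Toy controller that keeps an inventory of cores and role instances."""

    def __init__(self, machine_count, cores_per_machine):
        self.free_cores = {f"machine_{i}": cores_per_machine for i in range(machine_count)}
        self.placements = defaultdict(list)  # (app, role) -> machines hosting it
        self.registry = defaultdict(list)    # logical endpoint -> physical endpoints

    def total_free_cores(self):
        return sum(self.free_cores.values())

    def _place(self, app, role, exclude=()):
        # Never put two instances of the same role on one physical machine.
        used = set(self.placements[(app, role)]) | set(exclude)
        candidates = [m for m, free in self.free_cores.items() if free > 0 and m not in used]
        if not candidates:
            raise RuntimeError(f"no machine available for {app}/{role}")
        # Assumed placement rule: prefer the machine with the most free cores.
        machine = max(candidates, key=lambda m: self.free_cores[m])
        self.free_cores[machine] -= 1
        self.placements[(app, role)].append(machine)
        # Hypothetical naming: one logical endpoint, one physical endpoint per instance.
        self.registry[f"{app}.example"].append(f"{machine}/{app}/{role}")
        return machine

    def deploy(self, app, role, instances):
        return [self._place(app, role) for _ in range(instances)]

    def replace_instance(self, app, role, failed_machine):
        """An instance can vanish at any time; start a replacement elsewhere."""
        self.placements[(app, role)].remove(failed_machine)
        self.registry[f"{app}.example"].remove(f"{failed_machine}/{app}/{role}")
        self.free_cores[failed_machine] += 1
        return self._place(app, role, exclude={failed_machine})


controller = Controller(machine_count=40, cores_per_machine=4)
print(controller.total_free_cores())  # 160
web = controller.deploy("shop", "web", 2)
worker = controller.deploy("shop", "worker", 2)
replacement = controller.replace_instance("shop", "web", web[0])
print(web, worker, replacement)
```

The agents are left out on purpose; in this sketch the controller talks to its own dictionaries, where a real system would talk to an agent on each machine.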

Somewhere in this description one will soon realize there is also a service bus, initially called the Internet Service Bus.

 

Microsoft’s Fabric Controller

Microsoft’s data centers host all the data in Windows Azure storage and all Windows Azure applications. The Windows Azure Fabric Controller manages the servers, that is, the set of machines dedicated to Windows Azure, and the software that runs on them in the Microsoft data center. The Fabric Controller is a distributed application that is replicated across a group of machines. It has its own resources in its own environment, such as computers, load balancers and switches. It can communicate with the fabric agent on each machine, and it keeps track of every Windows Azure application in the fabric.

image

This helps the Windows Azure Fabric Controller perform useful activities such as monitoring all running applications. The Fabric Controller decides where new applications will run and selects the physical servers so that hardware is utilized optimally. It achieves this using the configuration information uploaded with each Windows Azure application. The configuration file is an XML file that describes the application’s roles, including the number of instances (virtual machines) to be created for each.
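
To give a feel for what a controller could read out of such a file, here is a small Python sketch. The XML below is a simplified, hypothetical fragment that I wrote for illustration; it is only loosely modelled on a service configuration and leaves out namespaces and settings, so do not treat it as the actual Azure schema. The role names and counts are made up.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical configuration: two roles with their instance counts.
CONFIG_XML = """
<ServiceConfiguration serviceName="shop">
  <Role name="WebRole">
    <Instances count="2" />
  </Role>
  <Role name="WorkerRole">
    <Instances count="3" />
  </Role>
</ServiceConfiguration>
"""


def instances_per_role(xml_text):
    """Return {role name: number of instances to create} from the XML text."""
    root = ET.fromstring(xml_text.strip())
    return {
        role.get("name"): int(role.find("Instances").get("count"))
        for role in root.findall("Role")
    }


wanted = instances_per_role(CONFIG_XML)
print(wanted)
print("virtual machines to create:", sum(wanted.values()))
```

With this information in hand the controller knows how many virtual machines to start for each role before it decides where to place them.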

The OpenStack compute fabric controller is called Nova.

Windows Azure is a 1 million core machine, and I’m assuming the FC itself runs on server farms, both local and distributed.

 

Interacting with the Operating System

In PaaS your code never interacts with the operating system directly; the FC owns the OS and updates each instance’s OS when necessary. Any changes made from a configuration standpoint have to be reapplied each time an instance starts. What does one do when there is a requirement for software that is not already there at the platform level? This can be done in more than one way. Let’s say you need Telerik support on your web role: you need to install it every time the role starts up. If there are too many things to install, the time to get started will be too long and this solution may not look feasible.

This is the scenario where we can use the current VM Role provided by Azure, where the developer gets to supply the image. However, any changes to the VM are lost at every restart. This is a problem, hence MSFT will be including a Persistent VM Role where the state is stored in a blob.

Summarizing- PAAS Programming Model

  • Applications are more available and cheaper to run on PaaS
  • What it offers
    • Protection against hardware failures
    • Protection against software failures
    • No-downtime application updates
      • With a single-step update called the VIP swap
      • With a rolling update using update domains
    • No-downtime system software updates
    • No administrative effort
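
The rolling update idea can be sketched in a few lines of Python. This is my own simplified model, not the platform’s implementation: the function names, the round-robin assignment of instances to update domains and the version labels are assumptions. What it shows is why the application stays up: only one update domain is offline at a time, so the rest of the instances keep serving.

```python
def assign_update_domains(instance_names, domain_count):
    """Spread instances round-robin across update domains."""
    domains = [[] for _ in range(domain_count)]
    for index, name in enumerate(instance_names):
        domains[index % domain_count].append(name)
    return domains


def rolling_update(instance_names, domain_count, new_version):
    """Update one domain at a time; return final versions and the lowest availability seen."""
    versions = {name: "v1" for name in instance_names}
    min_available = len(instance_names)
    for domain in assign_update_domains(instance_names, domain_count):
        # Instances in this domain are offline while they are being updated.
        available = len(instance_names) - len(domain)
        min_available = min(min_available, available)
        for name in domain:
            versions[name] = new_version
    return versions, min_available


names = ["web_0", "web_1", "web_2", "web_3"]
versions, min_available = rolling_update(names, domain_count=2, new_version="v2")
print(versions)
print("instances still serving at the worst moment:", min_available)
```

With 4 instances in 2 update domains, 2 instances are always available during the update. With a single instance there is nothing left to serve while it is being updated, which is the same reason the 2-instance minimum matters for HA.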

Moving Applications to Windows Azure PaaS

  • An ASP.NET application with multiple load-balanced instances that share state stored in SQL Server
    • An easy move
    • Perfect fit for the PaaS platform
  • An ASP.NET application that runs multiple instances that maintain per-instance state and rely on sticky sessions
    • Requires some work
  • A client accessing WCF services running in a middle tier
    • If the services don’t maintain per-client state between calls, an easy move
    • Otherwise some redesign effort is required
  • An application with a single instance running on Windows Server that maintains state on its own machine
    • Some redesign needed
    • This application might run well in a Persistent VM Role.

 

Innovative Business Idea: Writing a Migration Tool for on-premise Windows applications to Azure….

Introducing Web Sites in Azure

PaaS and IaaS are cloud platform technologies. Cloud computing and hosting, which were two different worlds a few years ago, are no longer separate. Customers can now buy a wide range of platform offerings from various service providers, which include IaaS and PaaS, or could be buying hosted servers by the month.

Cloud categorization has predominantly been understood as IaaS, PaaS and SaaS. A cleaner way to look at it is SaaS, IaaS, cloud platforms like Azure and AWS, and private cloud.

Hosting - Common Technology options today

image

Difference Between Hosting to Cloud Computing

image

Hosting & Cloud Computing- Categorizing options

image

 

Azure is likely to offer monthly pricing for the Persistent VM Role and Web Sites, which are actually a part of Windows Azure. Microsoft will also make the Web Sites software available to service providers.

Windows Azure Web Sites provides shared hosting for web sites. Applications can also access other Windows Azure roles.

Web Sites are different from Web/Worker Roles on the following counts:

  • Web Sites provides a standard IIS web environment and supports sticky sessions. Web Roles are for stateless, low-administration applications.
  • Web Sites will help in running existing web applications unchanged on Azure, whereas a Web Role mandates changes depending on how the application is written.
  • Web Sites share the same virtual instance; in contrast, a Web Role is dedicated to a virtual instance.
  • Web Sites are best suited for new and existing small to medium web sites and apps; Web Roles, on the contrary, are meant for large cloud apps.
  • Application deployment for Web Sites is like creating a new site on an existing VM; for a Web Role it means a new VM.
  • Web Sites allow deployment of updates without downtime, same as Web Roles.

 

Web Sites in Azure will help bring customers onto Azure with less effort; this is a well-thought-out strategy for the long term.

Windows Azure provides multiple choices for customers to move to the cloud, and it is worth the effort in terms of reduced costs.

 

 

 

 

 

 

Tuesday, 17 April 2012

Amazon–Microsoft SharePoint on AWS–Reference Architecture

Amazon AWS seems to have caught the imagination of a lot of folks lately with its “increasing love for Microsoft products”. AWS provides a complete set of services and tools for deploying Windows workloads, including Microsoft SharePoint. This is not a satirical comment: new out of the Amazon yard is “Microsoft SharePoint on AWS – Reference Architecture”.

AWS and MSFT have partnered to enable customers to deploy enterprise-class workloads involving Windows Server and Microsoft SQL Server on a pay-as-you-go, on-demand elastic infrastructure, thereby eliminating the capital cost of server hardware and greatly reducing the provisioning time required to create or extend a SharePoint Server farm. The on-demand elastic infrastructure on Amazon is somewhat the icing on the cake, and one needs to understand how it can really help. The pay-as-you-go model is interesting, as is the question of how it differs from Microsoft SharePoint Online.

Amazon seems to have used SharePoint for its own corporate intranet; it is interesting to see that the “dogfood before you sell” mantra may really work.

Thursday, 15 March 2012

Talk over coffee- Hybrid Cloud Intro series….

 

Walk into an organization’s IT department and we find a large number of applications that support the business: some critical by nature, others not so critical. We often get into this debate with the customer: “I’m really not ready to move the complete stack into the cloud; I’d prefer a fair mix of private and public cloud.” The customer may come back saying, “I only want to use the compute of the public cloud in certain peak situations.” This discussion is fair, as this is a real problem, and the solution we have is the hybrid cloud. Hybrid cloud is a mix of both; we have yet to get to clear boundaries on how much of each goes into the mix.

Getting Definition of Hybrid Cloud hopefully Correct….

So by definition what do I mean by hybrid cloud? As defined by NIST…

Hybrid cloud. The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

Some real disruptive thinking………………………

Using two sets of criteria to define cloud deployment models roots inconsistency and ambiguity.

As defined in SP 800-145, a hybrid cloud is a composition of infrastructures, yet at the same time a private cloud and a public cloud are defined according to their intended audiences. The change of criteria in classifying a hybrid cloud roots inconsistency and ambiguity in the deployment models presented in SP 800-145. Forming a concept with two sets of criteria is simply a confusing way to describe an already very confusing subject like cloud computing.

"Hybrid cloud" is an ambiguous, confusing, and frequently misused term.

A hybrid cloud is a composition of two or more distinct cloud infrastructures (private, community, or public) as stated in SP 800-145. That is to say that a hybrid cloud can be a composition of private/private, private/community, private/public, etc. From a consumer’s point of view, they are in essence a private cloud, a private cloud, and a public or private cloud respectively. Regardless how a hybrid cloud is constructed, if it is intended for public consumption it is a public cloud, and if for a particular group of people it is then a private cloud according to SP 800-145. Essentially the composition of clouds is still a cloud and it is an either public or private cloud, and cannot be both at the same time.

For many enterprise IT professionals, a hybrid cloud means an on-premise private cloud connected with some off-premise resources. Notice that these off-premise resources are not necessarily, in reality, a cloud. In such a case, it is simply a private cloud with some extended boundaries. A cloud is a set of capabilities and must be referenced in the context of the delivered application. Just placing a VM in the cloud or referencing a database placed in the cloud does not make the VM or the database itself a public cloud application.

The key is that a hybrid cloud is a derived concept of clouds. Namely, a hybrid can be integrations, modifications, extensions, or a combination of all of cloud infrastructures. A hybrid is nevertheless not a new concept or a different deployment model and should not be classified as a unique deployment model in addition to the two essential ones, i.e. the public and private cloud models. A cloud is either public or private and there isn’t a third kind of cloud deployment model based on the intended users.

“Hybrid cloud” is perhaps a great catchy marketing term. For many, a hybrid seems to suggest it is advanced, leading edge, and magical, and therefore better and preferred. The truth is "hybrid cloud" is an ambiguous, confusing, and frequently misused term. It confuses people, injects noise into a conversation, and only further confirms the state of confusion and the inability to clearly understand what cloud computing is.

 

Hybrid Cloud 101….

I needed a basic 101 on hybrid cloud. This is a hybrid cloud video by VMware; it does not require VMware knowledge, the idea is to get the concept clear.

Cloud, like any other IT trend, needs a long-term vision and a short-term adaptive strategy that changes based on market requirements, while the long-term vision continues to guide the overall direction. The hybrid cloud is a reality and does fit into the long-term vision; however, the short-term strategy is skewed, with each vendor using the hybrid space as a sales pitch, for example MSFT, Amazon and VMware.

 

Understanding the Hybrid Cloud Play……

image

Referring to the diagram above, most organizations start their private cloud experiment with virtualization: the first step is a virtualized data center, next to come into the private cloud are the line-of-business applications, and finally data storage.

A community cloud is the sharing of a private cloud within or between organizations.

The public cloud pretty much follows the NIST guidelines, so I won’t get too much into that. What we are trying to focus on is the hybrid strategy, and the public cloud picture is drawn with respect to it. The key element in the public cloud is the business strategy (in conjunction with the hybrid cloud only).

The hybrid cloud is the entire picture, inclusive of the public and the private cloud.

 

Precision Questioning for the Hybrid Cloud…

Getting past the definition and 101 phase, the next question that pops up is what to use the hybrid cloud for, and when. Precision questioning can help you navigate the hybrid cloud discussion better.

1. What new non-mission-critical functionality is planned to be built in the next 2 years? This is a very good candidate for the cloud: new functionality can be a candidate for PaaS or SaaS depending on the nature of the workload. This is typically a low-risk, low-return item that can be tackled very easily. Driving the value proposition for this question is not difficult.

2. Can some applications directly benefit from virtualization? This can be a quick answer to the question of “maximizing the use of resources”. However, it does not bring down the overall cost of operation, whether public or private. This is typically low-hanging fruit.

3. What new mission-critical functionality, or functionality accessing private data, is planned to be built in the next 2 years? Example: a payroll system. This conversation can be fragile and will most probably be driven into the organization’s private cloud. This is high return and low risk if hosted on a private cloud, as most of the fearful queries are directly addressed by “we are in a private cloud”.

4. Do you see any applications that may need to scale for an event or during a specific period of the year? Example: tax filing applications. The idea here is to use the cloud to scale the existing workload.

5. Refactoring existing functionality: this is where the existing ESB is taken and made cloud ready. This is a difficult conversation to have with the customer; it depends completely on the ESB’s state of readiness to move to the cloud. Depending on how the discussion goes, this will most probably turn out to be a low priority.

6. The final key question: “How much of the existing private data would you want to move to the cloud?” This is a high-risk and very high-return item. The most that can work here is a private cloud.

7. “Migration of existing functionality, i.e. existing line-of-business applications, to the cloud”.
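
Question 4, scaling for a peak period, is the classic bursting case, and the arithmetic behind it is simple enough to sketch in Python. The demand figures, the capacity number and the `split_load` helper are all hypothetical; the sketch only shows the rule that the private side serves what it can and the overflow goes to the public cloud.

```python
def split_load(demand, private_capacity):
    """Return (units served on premises, units burst to the public cloud)."""
    on_premises = min(demand, private_capacity)
    burst = max(0, demand - private_capacity)
    return on_premises, burst


# Hypothetical monthly demand for a tax filing application, in server units.
monthly_demand = {"January": 20, "February": 35, "March": 90, "April": 140, "May": 25}
PRIVATE_CAPACITY = 40

plan = {month: split_load(demand, PRIVATE_CAPACITY) for month, demand in monthly_demand.items()}
for month, (on_premises, burst) in plan.items():
    print(f"{month}: {on_premises} on premises, {burst} in the public cloud")
```

In this made-up example the organization sizes its own data center for the quiet months and rents public capacity only around the filing deadline, instead of owning 140 units all year.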

image

Architecting Solutions that span Private & Public Clouds

  • On Premises IT with Off Premises Cloud.
  • On Premise IT with Multiple Off Premises Cloud
  • Cloud Striping
  • Distributing Community Cloud
  • Lift and Shift
  • Batch at Scale/ Bursting
  • Images in CDN
  • Adding Odata
  • Shopping Cart
  • Compliance
  • Partner + PaaS
  • Cloud + Optimized
  • Outsourcing
  • Synchronization
  • Scaling and Caching
  • Pull back
  • Big Data
  • Consumerization

Research Speak……

NIST and Gartner have come out with models of the hybrid cloud. The NIST model seems well defined and easy to understand in terms of what it means by hybrid.

Below is a representation of the NIST definition.

image

The Gartner definition revolves around what they call Hybrid IT, which acts as a broker and switches between internal and external clouds.

image

The Gartner definition is very sketchy and offers no clear guidelines. The IT organization acting as a broker is a noble concept, but putting it into practice will call for a very different conversation once we look at the overall Hybrid IT strategy from a true implementation point of view.

The Gartner report can be found here https://skydrive.live.com/redir.aspx?cid=b4c4034e55e4a63f&resid=B4C4034E55E4A63F!598&parid=root.

 

Implementation excerpt from the Gartner Report…..

Hybrid IT is likely to rely on hybrid clouds. Hybrid clouds are a connection or integration between two clouds – usually between an internal private cloud and an external public cloud. Hybrid clouds are constructed by using software or hardware appliances that enable applications and data to more easily migrate among connected clouds. For example, many applications are dependent on identity management systems to authenticate users or to consume terabytes of data, or they have deterministic input/output (I/O) latency requirements. These dependencies often prevent applications from migrating to the external cloud. Hybrid cloud solutions solve each of these dependencies in unique ways.
Essentially, two types of hybrid clouds exist:
■ Service interface-based: The service interface-based hybrid cloud utilizes an appliance to present a list of cloud services to the end user (i.e., the cloud consumer). When the user selects a cloud service, the appliance redirects the user to an internal or external cloud service based on the consumer's identity.
■ Infrastructure-based: The infrastructure-based hybrid cloud is essentially a software or appliance bridge designed to augment internal IT resources and integrate two clouds by connecting the back-end infrastructure of an internal cloud to one or more external cloud services.
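
The service interface-based type in the excerpt can be pictured as a small broker. The Python below is my own reading of it, not something taken from the Gartner report: the catalogue, the endpoint URLs, the group names and the rule for who may be sent outside are all hypothetical. It lists services to the consumer and then redirects to an internal or external endpoint based on the consumer’s identity.

```python
# Hypothetical catalogue: each service has an internal endpoint and maybe an external one.
CATALOGUE = {
    "payroll": {"internal": "https://intranet.example/payroll", "external": None},
    "crm": {"internal": "https://intranet.example/crm", "external": "https://cloud.example/crm"},
}

# Hypothetical rule: consumer groups that may be redirected to an external cloud.
EXTERNAL_ALLOWED_GROUPS = {"sales", "partners"}


def list_services():
    """What the appliance presents to the cloud consumer."""
    return sorted(CATALOGUE)


def redirect(service, consumer_group):
    """Pick the internal or external endpoint based on who is asking."""
    endpoints = CATALOGUE[service]
    external = endpoints["external"]
    if external is not None and consumer_group in EXTERNAL_ALLOWED_GROUPS:
        return external
    return endpoints["internal"]


print(list_services())
print(redirect("crm", "sales"))
print(redirect("crm", "finance"))
print(redirect("payroll", "sales"))
```

Payroll never leaves the building in this sketch because it has no external endpoint at all, which matches the earlier point that private data tends to stay in the private cloud.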

Closing Comments…….

By definition, no one has built a pure hybrid cloud component. Taking the example of a large multinational, what they have is an on-premise and off-premise kind of IT infrastructure, which complements the Gartner story. Revisiting the definition, this is what comes to my mind:

“An integrated infrastructure whose unique assets, although separated by well defined boundaries are connected via a standardized proprietary technology to broker data and application interoperability in order to optimize computing resources, increase shareholder values and reduce risks”. 

A more detailed hybrid cloud post is coming next: applying the NIST cloud attributes to the hybrid cloud is what I plan to write about.

 

 

 

 

Saturday, 10 March 2012

Cloud Based Application Architecture–What is it really boiling down to….… CQRS!!!


My earlier post on Architecting the Private Cloud shed light on a real-life scenario, based on customer experiences, covering what to ask, the steps for moving to a Private Cloud, and an example of mapping services to cloud attributes. In the cloud we are governed by the cloud attributes to a large extent. In this post I want to dig into what goes through an average developer's or architect's mind when moving an on-premise application to the cloud. Whether the cloud is private or public doesn't matter; the assumption that the private cloud architecture is in line with the earlier post holds here.
What I am addressing in this post is “What are the software patterns & frameworks that are emerging in the cloud scenario?” and what customers do in practice.
As new technologies come in, they challenge existing thinking and established patterns of writing software. In the pre-cloud world, a standard distributed application had the following traits:
  • Synchronous: Does Request/Response ring a bell? Not long ago this was the de facto standard in the web world, and everyone just loved programming around this pattern.
  • Dependency
  • Tightly coupled architecture
There is effectively no absolute right or wrong; it is relatively right or wrong. What stood as good design practice a decade ago is now probably bad, relatively speaking. In the cloud world, what makes relative sense is:
  • Asynchronous
  • Independent Layers
  • Loosely coupled architecture 
We have seen the above patterns over the past couple of years and have used them in some form or other; SOA and ESB pretty much live on these patterns. Frankly speaking, what has been extended to the cloud is the next generation of SOA and ESB.
So what is happening differently here is the “asynchronous nature”: the asynchronous programming paradigm is becoming a standard for Windows 8.
In the past we had a Presentation Layer tightly coupled to the Business Layer. A lot of the architecture guidance has been around moving this to a true n-tier architecture, which involved decoupling the Presentation and Business Layers by introducing a Services Layer, and decoupling the Business and Data Layers using something like an ORM. Essentially this is horizontal decoupling. Most of our current-day applications exist in this form.
The question really boils down to: “Is this architecture good enough for the cloud?”
How can this be taken to a cloud deployment, be it Amazon (AWS) or Windows Azure? This is what it looks like:
image

Depending on the technology you can choose either of them; the above is an Amazon stack. At a high level, the Amazon application architecture would look something like the below.
In Azure the presentation layer would be assigned to a web role and the business/application tier would go into a worker role. The database would be standard SQL Azure or blob storage. In the case of Amazon, the presentation and application tiers would run on what they call Elastic Beanstalk, which is similar to the web and worker roles combined, and the database options are:
  • RDS – Relational Database Service
  • SimpleDB
  • S3 – Blob
image
Above is a visual representation of a Windows Azure deployment.
The above architecture can scale out with more web or worker roles. This currently works on the cloud, but there is a reason why we have to take another look at this architecture.

The Problem Statement Defined…

In the above architecture, when it comes to presenting or distributing information to multiple viewers, it is not easy to scale the read portion of the solution without affecting the write portion. Inserts, updates and reads all share the same path, which is an overhead, so there is a need to isolate the reads from the writes. There needs to be a segregated IO channel for each. Typically, writes involve record locking.
image
This is not new. We have seen architectural solutions such as ORMs that introduce a cache, which reduces the read latency of the overall application at the cost of memory. The caching mechanism in most scenarios is an afterthought.
As always there will be new ways to solve the issue; the new term here is CQRS (Command Query Responsibility Segregation).

Command Query Responsibility Segregation to your Rescue….

CQRS was first described by Greg Young; a complete post on it is available from Martin Fowler.
Getting the Basics on CQRS
From an architectural standpoint, the problem statement is to segregate the reads from the writes. At its simplest, CQRS is the notion that you can use a different model to update information than the model you use to read information.
We are used to thinking of CRUD against a single datastore. By this I mean that we have a mental model of some record structure where we can create new records, read records, update existing records, and delete records when we're done with them. In the simplest case, our interactions are all about storing and retrieving these records.
Today's needs are more sophisticated. We may want to look at the information in a different way from the record store, perhaps collapsing multiple records into one, or forming virtual records by combining information from different places. On the update side we may find validation rules that only allow certain combinations of data to be stored, or may even infer data to be stored that's different from what we provide.

Since information requests are so varied, there is a need for multiple representations of the same information. When users interact with the information they use various presentations of it, each of which is a different representation. Developers typically build their own conceptual model which they use to manipulate the core elements of the model. If you're using a Domain Model, then this is usually the conceptual representation of the domain. You typically also make the persistent storage as close to the conceptual model as you can.
This structure of multiple layers of representation can get quite complicated, but when people do this they still resolve it down to a single conceptual representation which acts as a conceptual integration point between all the presentations.
The change that CQRS introduces is to split that conceptual model into separate models for update and display, which it refers to as Command and Query respectively following the vocabulary of CommandQuerySeparation. The rationale is that for many problems, particularly in more complicated domains, having the same conceptual model for commands and queries leads to a more complex model that does neither well.

By separate models we most commonly mean different object models, probably running in different logical processes, perhaps on separate hardware. A web example would see a user looking at a web page that's rendered using the query model. If they initiate a change that change is routed to the separate command model for processing, the resulting change is communicated to the query model to render the updated state.
Write operations become commands that are put in a queue; read operations are essentially queries.
CQRS as a concept is a set of principles, a way of thinking about software architecture.
As a pattern, it is a way of designing & developing scalable, robust enterprise solutions where reads are independent from the writes.
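A minimal Python sketch of this split, assuming a hypothetical customer record kept in an in-memory dict. The names `RenameCustomer`, `CustomerCommandModel` and `CustomerQueryModel` are illustrative only and are not taken from any CQRS framework. Both models share one store here, which is the simplest of the possible arrangements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameCustomer:
    """A command: an intent to change state (hypothetical example)."""
    customer_id: int
    new_name: str


class CustomerCommandModel:
    """Write side: validates commands and updates the store."""

    def __init__(self, store):
        self._store = store

    def handle(self, command):
        name = command.new_name.strip()
        if not name:
            raise ValueError("name must not be empty")
        self._store[command.customer_id] = {"name": name}


class CustomerQueryModel:
    """Read side: returns display-ready data and never modifies the store."""

    def __init__(self, store):
        self._store = store

    def display_name(self, customer_id):
        record = self._store.get(customer_id)
        return record["name"].upper() if record else None


store = {}
commands = CustomerCommandModel(store)
queries = CustomerQueryModel(store)
commands.handle(RenameCustomer(customer_id=1, new_name=" Asha "))
```

The validation rule lives only on the command side and the formatting only on the query side, so either can change or scale without touching the other.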
Benefits of CQRS
  • Scalability: Ability to increase or reduce the number of resources without affecting the end user experience.
  • Speed: Faster delivery of information is crucial
  • Reduced Complexity: Improved Maintainability.
In my opinion, in addition to the above, the segregated pricing of cloud reads and writes could also be a key driving factor for CQRS.

CQRS & the data stores story…

With the read and write separation, the data store can be implemented in several ways:
  • Separate data stores: writes go to an on-premises store, which replicates to a read store on the cloud.
  • The same data store with 2 different object models.
  • The same data store with the same object model implementing the query & command interfaces.

How Does CQRS really solve our Big Problem Statement..

Coming back to the n-tier architecture in the cloud, here is how CQRS addresses it:
image
Below is a conceptual understanding of the CQRS implementation:
  • When a user performs some operation, that operation becomes a command, and the command is stored in the command queue.
  • There is some kind of background process, typically a worker role, in which the command handler runs. The command handler looks at the queue of commands and processes them individually; when it has finished processing a command, it raises an event that typically ends up in an event queue.
  • At the very same moment there may be a dispatcher running. The dispatcher, also known as an event processor, looks at the event queue, processes the events and dispatches them to a set of readable data stores, so that readers have the data in the format they need.
Commands are used for writes; events are used to feed read operations. The command handler and the dispatcher can be in different worker processes or in the same one.
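The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: in-process `queue.Queue` objects stand in for cloud queues, plain dicts stand in for the write and read stores, and the order fields (`order_id`, `quantity`) and the `OrderPlaced` event type are hypothetical.

```python
import queue

command_queue = queue.Queue()   # stand-in for a cloud command queue
event_queue = queue.Queue()     # stand-in for a cloud event queue
write_store = {}                # primary store, updated by the command handler
read_store = {}                 # denormalized store, updated by the dispatcher


def submit(command):
    """A user operation becomes a command on the command queue."""
    command_queue.put(command)


def run_command_handler():
    """Process each queued command, apply the write, then raise an event."""
    while not command_queue.empty():
        cmd = command_queue.get()
        write_store[cmd["order_id"]] = cmd["quantity"]
        event_queue.put({"type": "OrderPlaced", **cmd})


def run_dispatcher():
    """Process each event and update the read store in a reader-friendly shape."""
    while not event_queue.empty():
        event = event_queue.get()
        read_store["total_quantity"] = read_store.get("total_quantity", 0) + event["quantity"]
        read_store["order_count"] = read_store.get("order_count", 0) + 1


submit({"order_id": "A1", "quantity": 2})
submit({"order_id": "A2", "quantity": 5})
run_command_handler()
run_dispatcher()
```

Between `run_command_handler()` and `run_dispatcher()` the read store lags behind the write store, which is the staleness that a CQRS design has to accept and manage.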
CQRS requires some other patterns to support the overall construct; they are described below.

Domain Driven Design (DDD)

The Command and Query end up operating on an object model. From an implementation standpoint they can have different object models or the same one; it doesn’t really matter. The intent is to keep the object model very similar to the persistent model.
The model can be based on a complex domain design. The focus of any model has to be on the domain and domain logic, and it should be technology agnostic to a large extent.

Event Sourcing

Cloud applications have to follow the asynchronous pattern. Given that we are in a disconnected, stateless world, we need some mechanism to communicate between components, and event sourcing is one option for this. Event sourcing is a simple concept: it is like recording all the user actions as a time-sequenced list.
  • Captures all changes to an application state as a sequence of events.
  • Allows developers to determine how a given state was reached.
  • Also allows developers to reconstruct past states.
We can query an application's state to find out the current state of the world, and this answers many questions. However there are times when we don't just want to see where we are, we also want to know how we got there.
Event Sourcing ensures that all changes to application state are stored as a sequence of events. Not just can we query these events, we can also use the event log to reconstruct past states, and as a foundation to automatically adjust the state to cope with retroactive changes.
The key to Event Sourcing is that we guarantee that all changes to the domain objects are initiated by the event objects. This leads to a number of facilities that can be built on top of the event log:
  • Complete Rebuild: We can discard the application state completely and rebuild it by re-running the events from the event log on an empty application.
  • Temporal Query: We can determine the application state at any point in time. Notionally we do this by starting with a blank state and rerunning the events up to a particular time or event. We can take this further by considering multiple time-lines (analogous to branching in a version control system).
  • Event Replay: If we find a past event was incorrect, we can compute the consequences by reversing it and later events and then replaying the new event and later events. (Or indeed by throwing away the application state and replaying all events with the correct event in sequence.) The same technique can handle events received in the wrong sequence - a common problem with systems that communicate with asynchronous messaging
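The three facilities above can be illustrated with a small Python sketch. It assumes a hypothetical account ledger whose events carry a sequence number, an account name and a signed amount; none of these names come from a real event store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    sequence: int   # position in the event log
    account: str
    amount: int     # positive = deposit, negative = withdrawal


def replay(events, up_to=None):
    """Rebuild balances from an empty state, optionally stopping at a sequence number."""
    balances = {}
    for event in sorted(events, key=lambda e: e.sequence):
        if up_to is not None and event.sequence > up_to:
            break
        balances[event.account] = balances.get(event.account, 0) + event.amount
    return balances


log = [
    Event(1, "alice", 100),
    Event(2, "bob", 50),
    Event(3, "alice", -30),
]

current = replay(log)                                   # complete rebuild
as_of_2 = replay(log, up_to=2)                          # temporal query
corrected = replay([Event(1, "alice", 120)] + log[1:])  # replay with event 1 corrected
```

Sorting by sequence number inside `replay` also covers the case of events that arrive out of order.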

Cloud Scenario #1 – Important concepts of the cloud are scalability & elasticity. The requirement really boils down to applications built from components and baked into service units. These service units have an SLA; a service unit can conceptually be a line-of-business application or a functional module, and it has to be stateless. A hypothetical situation: the fabric controller is running an optimization algorithm and decides to shut down your running application and rebuild it on another instance. Event Sourcing is much needed here to do a complete rebuild.
Structuring the Event Handler Logic
There are a number of choices about where to put the logic for handling events. The primary choice is whether to put the logic in Transaction Scripts or Domain Model. As usual Transaction Scripts are better for simple logic and a Domain Model is better when things get more complicated.
In general I have noticed a tendency to use Transaction Scripts with applications that drive changes through events or commands. Indeed some people believe that this is a necessary way of structuring systems that are driven this way. This is, however, an illusion.
A good way to think of this is that there are two responsibilities involved. Processing domain logic is the business logic that manipulates the application. Processing selection logic is the logic that chooses which chunk of processing domain logic should run depending on the incoming event. You can combine these together, essentially this is the Transaction Script approach, but you can also separate them by putting the processing selection logic in the event processing system, and it calls a method in the domain model that contains the processing domain logic.
Once you've made that decision, the next is whether to put the processing selection logic in the event object itself, or have a separate event processor object. The problem with the processor is that it necessarily runs different logic depending on the type of event, which is the kind of type switch that is abhorrent to any good OOer. All things being equal you want the processing selection logic in the event itself, since that's the thing that varies with the type of event.
Of course all things aren't always equal. One case where having a separate processor can make sense is when the event object is a DTO which is serialized and de-serialized by some automatic means that prohibits putting code into the event. In this case you need to find another home for the selection logic for the event. My inclination would be to avoid this if at all possible; if you can't, then treat the DTO as a hidden data holder for the event and still treat the event as a regular polymorphic object. In this case it's worth doing something moderately clever to match the serialized event DTOs to the actual events using configuration files or (better) naming conventions.
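A minimal Python sketch of the preferred arrangement: the selection logic sits in the event objects themselves (polymorphism instead of a type switch) and the domain logic stays in the domain model. The `Account`, `DepositEvent` and `WithdrawalEvent` classes are hypothetical examples.

```python
class Account:
    """Domain model: holds the processing domain logic."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class DepositEvent:
    def __init__(self, amount):
        self.amount = amount

    def process(self, account):
        # Selection logic: this event type knows which domain method to call.
        account.deposit(self.amount)


class WithdrawalEvent:
    def __init__(self, amount):
        self.amount = amount

    def process(self, account):
        account.withdraw(self.amount)


def apply_all(account, events):
    """No type switch here: each event dispatches itself."""
    for event in events:
        event.process(account)
    return account


account = apply_all(Account(), [DepositEvent(100), WithdrawalEvent(40)])
```

Adding a new kind of event means adding a new class with a `process` method; `apply_all` does not change.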

Challenges with CQRS

  • Separating reads from writes introduces challenges.
  • Data staleness needs special handling:
    • CQRS doesn’t consider staleness an exceptional case; it is expected.
    • CQRS does not use a record locking mechanism.
The SLA of keeping data staleness near zero will drive a lot of design decisions, e.g. in banking applications.
Eventual synchronization of any primary data repository and any caching mechanism must be guaranteed.
Probable Solutions for Data Staleness in CQRS
  • Changes made to the primary data store while processing a command and the publishing of the resulting events should be performed together in one transaction.
  • If using a message bus, this can be achieved with transactional queues.
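As a rough illustration of the first point, the sketch below uses Python's built-in `sqlite3` module and keeps the events in a table next to the primary data, so that one database transaction covers both. The `accounts` and `outbox` tables and the `crash_before_event` flag are assumptions made for the example; a real deployment would use its own store and a transactional queue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, event TEXT NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()


def withdraw(conn, account_id, amount, crash_before_event=False):
    """Apply the write and record its event in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        if crash_before_event:
            # Hypothetical failure between the write and the event publication.
            raise RuntimeError("simulated failure between write and event")
        conn.execute(
            "INSERT INTO outbox (event) VALUES (?)",
            (f"Withdrawn:{account_id}:{amount}",),
        )


withdraw(conn, "alice", 30)
try:
    withdraw(conn, "alice", 50, crash_before_event=True)
except RuntimeError:
    pass

balance = conn.execute("SELECT balance FROM accounts WHERE id = 'alice'").fetchone()[0]
events = [row[0] for row in conn.execute("SELECT event FROM outbox ORDER BY seq")]
```

The failed withdrawal leaves neither a balance change nor an event behind, so the read side can never see an event for a write that did not happen, or miss one for a write that did.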

Some of the frameworks on CQRS…

Lokad.CQRS – based on .NET and meant for Windows Azure. Get the binaries here - https://github.com/lokad/lokad-cqrs/
  • Managing of message contracts, serialization formats, and transport envelopes
  • Sending messages to supported queues: In memory and Azure queues
  • Message scheduling for the delayed delivery of messages to the recipients
  • Message routing for implementing load balancing and partitioning
This is a good place to start.
Ncqrs – Find the binaries here - http://ncqrs.org/ (main features below)
  • Command handling, including servicing & execution
  • Event Sourcing support
  • Domain modeling support
  • Additional support for
    • NServiceBus
    • SQLite
    • StructureMap
    • RavenDB
What's happening in the CQRS community?

CQRS is still evolving. It is an important pattern and involves many supplemental patterns for supporting cloud-based application architectures.

Saturday, 3 March 2012

Architecting the Private Cloud

 

Private Cloud is the next most abused term of current times after SOA. When we get into a conversation with a CIO about “let’s move to a private cloud”, the prompt response is “let’s get a hypervisor and an orchestrator and we are done”. Be it Cisco, VMware or MSFT, they all talk the same jargon. I’d like to differ from the standard jargon and take a step back. I have written this post based on actual customer interactions in which I followed this process. The article flows from questioning your customer, to the steps involved in architecting a private cloud, to putting that knowledge to use in an example. This article is a simple read, so chill and read on…..

Gaining Clarity by Precision Questioning

First and foremost, I would ask the customer some questions to understand the context correctly and get the conversation going in the right direction.

1. What do you expect to obtain from a private cloud? A typical response is “I need resiliency; the current disaster recovery data center is dysfunctional”. From this conversation one gets to understand which key attributes pushed the conversation towards a private cloud, and at the same time that there is indeed a business problem.

2. What happens to your IT organization when you deploy a private cloud? The usual response is “haven’t thought about that”. Delving deeper into this gives an idea of what kind of skepticism one is going to be met with, and what the probable impediments are at an early stage and a later stage from an organizational dynamics standpoint.

3. Does Investment = Efficiency? What are the steps to IT efficiency? This question is not about Quality of Service; it is more about the efficiency of IT once it moves to a private cloud. Some typical responses are “does it mean buying a new piece of software or getting additional resources trained on private cloud?”

4. Do you think Virtualization = Private Cloud? This is one question where most responses are “Yes”, and it gives an idea of how much real education on cloud the customer has had, or whether they have only heard a marketing pitch from MSFT or VMware.

5. Does Automation = Efficiency? Define services first or automate first? This is a tricky question with no straightforward answer. Some customers may say “automate first, then test” and some may say “I want to build the catalog of services first, then automate”. Either approach is fine.

6. When a user selects one more .. what happens? A typical example is “I need one more SAP connection”. Is it expected that the private cloud will auto-provision and fulfil the service request in a matter of minutes, or will it take longer, as in “I may need to order more hardware for every 10 additional SAP connections”? In short, we need a definition of the service unit: what is this “one more”? There is equal responsibility on the end user making the request and the data center fulfilling it; the user must be aware that there is a cost attached in terms of compute, storage & network.

7. Does a central console give you in-depth, end-to-end monitoring of your applications?

Understanding & Drafting The Private Cloud Strategy

 

image

In green we have a stack of cloud attributes, simplified from NIST. At the end is a link to the NIST guidelines. In the scenario as it exists today, the virtualized instance is provided by the hypervisor; however, the entities that provide the self-service portal, service mapping, automatic provisioning, automation and resiliency are a collection of the application stack and the management stack. More importantly, this is a total mess. They are not segmented properly as one might expect. In practice, at some levels there is no fit; for example, the application stack cannot provide automated provisioning because the components in the underlying architecture are not quite there. In some cases service mapping may exist but be too complicated. The utopian ask is an application unit of deployment that can be deployed as is and use all the entities as is; this is what we call a mature private cloud. We hope that, once we are at a mature private cloud state, we will be in a position to move to a public cloud more easily.

So what is the strategy today? It is a purchasing strategy, where I buy technologies that probably support the entities. In the near future it will be a deployment strategy, assuming we have figured out which type of unit is deployed on which kind of stack. Azure should eventually be an application strategy.

Speaking purely from the entities standpoint, it may not be correct to tell the IT staff to go ahead and buy the stack. We need to go through a discovery process, as in some cases the application stack will have to be rewritten, while in others the entities fit the bill from a feature supportability standpoint.

Steps to Build a Private Cloud

image

The 3 key benefits of a private cloud are Automation, Virtualization & Integration.

First Step – Standardization

It is a requirement to standardize the unit of deployment at a component level. Components essentially become configuration units, and a collection of configuration units becomes a service unit. The service unit is defined by the end user. For example, for IT staff a service unit could be bare metal services; if the end user is a business user, it may mean a fully provisioned user with applications, or a single application. The service unit is what we finally offer; it gets metered and has an SLA. An example is in the figure below.

image

Define clear configuration units that can be aggregated to create service units.

image
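The relationship between configuration units and service units can be sketched as a small data model. This is an illustration only: the class names, the example components and the cost figures are hypothetical, and metering is reduced to summing a monthly cost per configuration unit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConfigurationUnit:
    """A standardized component, e.g. a VM image or a database instance."""
    name: str
    monthly_cost: float


@dataclass
class ServiceUnit:
    """What is finally offered to the end user: metered and covered by an SLA."""
    name: str
    sla_availability: float                       # e.g. 0.999
    components: list = field(default_factory=list)

    def metered_cost(self, months=1):
        # Metering aggregates the cost of the underlying configuration units.
        return months * sum(c.monthly_cost for c in self.components)


web_app = ServiceUnit(
    name="line-of-business web app",
    sla_availability=0.999,
    components=[
        ConfigurationUnit("virtual machine", 80.0),
        ConfigurationUnit("database", 120.0),
        ConfigurationUnit("load balancer", 25.0),
    ],
)
two_month_charge = web_app.metered_cost(months=2)
```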

Second Step – Baseline it.

Baselining essentially means understanding the health of each individual service unit, which means doing statistical analysis on why each service unit broke and how many times. From this we understand the ways in which failures occurred, including in monitoring and health management. This adds a procedure of statistical analysis to the datacenter monitoring. The benefits of deep application monitoring and end-to-end heterogeneous application management:

image
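The kind of statistical baseline described in this step might look like the following Python sketch. The incident records (service unit, hour of failure, cause) are made-up sample data, and the three statistics are just one possible choice.

```python
from collections import Counter
from statistics import mean

# Hypothetical incident records: (service unit, hour of failure, cause)
incidents = [
    ("sap-connection", 10, "disk"),
    ("sap-connection", 34, "network"),
    ("sap-connection", 82, "disk"),
    ("web-portal", 50, "memory"),
]


def baseline(incidents):
    """Per service unit: failure count, most common cause, mean hours between failures."""
    by_unit = {}
    for unit, hour, cause in incidents:
        by_unit.setdefault(unit, []).append((hour, cause))

    report = {}
    for unit, rows in by_unit.items():
        hours = sorted(hour for hour, _ in rows)
        gaps = [later - earlier for earlier, later in zip(hours, hours[1:])]
        report[unit] = {
            "failures": len(rows),
            "top_cause": Counter(cause for _, cause in rows).most_common(1)[0][0],
            "mean_hours_between_failures": mean(gaps) if gaps else None,
        }
    return report


report = baseline(incidents)
```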

How does one achieve deep application monitoring? Firstly, there is a requirement for a tool which deeply understands the architecture & dependencies of the different layers of the application. It would discover the application dependencies, which in turn would enable monitoring of the end-user experience and the application components. In case of an alert, the information produced at a micro-component level helps isolate the root cause, which helps in remediation.

image
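To make the root-cause idea concrete, here is a small Python sketch over a hypothetical dependency map. It assumes a simple rule: an alerting component is a likely root cause if none of its direct dependencies is also alerting. Real monitoring tools use far richer models; the component names are invented.

```python
# Hypothetical dependency map: component -> components it directly depends on
dependencies = {
    "web-portal": ["app-service"],
    "app-service": ["database", "message-queue"],
    "database": ["storage"],
    "message-queue": [],
    "storage": [],
}


def root_causes(dependencies, alerting):
    """Return alerting components whose direct dependencies are all healthy."""
    alerting = set(alerting)
    return sorted(
        component
        for component in alerting
        if not any(dep in alerting for dep in dependencies.get(component, []))
    )


causes = root_causes(dependencies, ["web-portal", "app-service", "database"])
```

Here the portal and the application service alert only because the database below them is failing, so the database is the place to start remediation.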

Third Step – Service Management

This step is drafting SLAs for the services: step 1 defines what services are required, and step 2 defines how the services are going to remain healthy.

image

SLA management and processes are very similar. But it is all about the service templates that you apply; this is where you grab the configuration units and put them together into service units.

image

Fourth Step – Process Engineering & Pre-Production

Quality Assurance for Pre-production. Every single process in the environment has to be understood down to the last level of detail & documented. The themes for this step are quality assurance & pre-production practices. Pre-production practices make sure everything that gets into production goes through an exhaustive pre-production stage.

image

Fifth Step – Automation & Orchestration

This step decides which service units are to be automated and orchestrated. The benefits of process automation and orchestration:

image

The first question that comes up is who orchestrates: should it be the developers or the datacenter system administrators? In other words, the central monitoring console will need developers to script, and they will also be required to do the statistical analysis as well as the automation. In the process we keep refining the automation until we finally write the playbook.

Sixth Step – Service Lifecycle

ISO 9000, MOF and ITIL pretty much come in here.

image

Seventh Step – Self Service

The self-service entity is enabled and access is handed over to the end user. Self service should also be in a position to manage capacity.

 

Services Layer in Standard IT Organization – Putting Principles to Practice……

The Services Layer is important, so I wanted to take an example of a standard IT organization and see how it maps onto the services layer.

Starting with the layers of a Private Cloud, the figure indicates what it looks like. This may not be 100% complete, but it is good for a starter conversation. If I grab the layers from the NIST model, we start at the physical/hardware resources, which include Real Estate, Power, Cooling, Bandwidth, Server, Storage, Network & Fabric.

The next layer is the Virtualization Extraction, which is the virtualization layer covering virtualization and networking. The next one is Management, which goes from Monitoring all the way to Security.

Then comes Infrastructure as a Service / Service Delivery, which is self-explanatory. The last is the Access layer.

image

Sample of an IT organizational chart

Referring to the org chart below, on the extreme left “Distributed Systems Management” is the area that takes care of desktops and other devices such as tablets and phones, and in addition the call center & desktop support. The next vertical is Application Management (Development & Support) for the business applications, which could include developers as well. The next is Data Center Management, which includes a whole lot of things; the interesting aspects to note are Business Support, Financials & Chargebacks, Portfolio Management, Process & Compliance, SLA Mgmt, and Business Continuity & Disaster Management.

The next vertical is Security, which consists of identity management, security & access management, and process & compliance. The next is Architecture/Engineering, which includes standards, research & configuration management.

Last but not least is Business Relationship Management, which includes business architect, portfolio management & business architecture. Below is a depiction of a standard IT organization.

image

What can really be outsourced in the above diagram?

  • Distributed System – Outsourced
  • Application Development & Support – Generally Kept in House
  • Data Centre – pretty much everything Outsourced, except Business Support, which is Kept in House
  • Security- Outsourced
  • Architecture/Engineering – Kept in House
  • Business Relationship Management – Kept in House

So the basic question that comes up is “How does a private cloud fit here? Which processes/roles are added and which change?” or “How is it going to affect my operations?”

So, going back to the cloud attributes, here is the cross-referencing of attributes vs. service layers.

Access Anywhere impacts the top 2 layers: Access & Infrastructure as a Service / Service Delivery. Scalable, Control and Elastic impact the bottom layers.

image

Self Service touches everything from the Management layer all the way up to Access. Multi-tenancy & Customizable affect all the layers; that is the basic concept of outsourcing.

Failure Resistant can mean many things: it can mean at the application level, at the data center level, or as a cloud strategy. Either way it touches all the layers. At the end of the day it impacts the complete stack and is by and large the most expensive attribute of the overall cloud strategy. Most of the time this doesn’t really work. It is a key deciding factor for the cloud strategy.

Metered by usage – If your architecture is not built along the lines of componentization & service orientation, then metering is an impossible attribute to meet. This impacts Access, Infrastructure as a Service, and Management & Security.

Highly Automated & On Demand impact all.
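The cross-reference above can be captured as a simple lookup table, which makes it easy to ask which attributes touch a given layer. The sketch below is one reading of the text, with two loud assumptions: "bottom layers" is taken to mean the Physical and Virtualization layers, and "Management & Security" is treated as the single Management layer.

```python
LAYERS = ["Physical", "Virtualization", "Management", "Service Delivery", "Access"]
BOTTOM = {"Physical", "Virtualization"}   # assumption: what "bottom layers" means
ALL = set(LAYERS)

ATTRIBUTE_IMPACT = {
    "Access Anywhere": {"Access", "Service Delivery"},
    "Scalable": BOTTOM,
    "Control": BOTTOM,
    "Elastic": BOTTOM,
    "Self Service": {"Management", "Service Delivery", "Access"},
    "Multi-tenancy": ALL,
    "Customizable": ALL,
    "Failure Resistant": ALL,
    "Metered by Usage": {"Access", "Service Delivery", "Management"},
    "Highly Automated": ALL,
    "On Demand": ALL,
}


def attributes_for(layer):
    """List the cloud attributes that impact the given service layer."""
    return sorted(name for name, layers in ATTRIBUTE_IMPACT.items() if layer in layers)


physical = attributes_for("Physical")
access = attributes_for("Access")
```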

So what happens when we apply the cloud attributes to a sample IT organizational chart?

Access Management – the ones in green are directly impacted. This one impacts the edges of the service layer.

image

 

Scalable is thought about mostly inside the core architecture of the services; this is more about the design of the service.

image

Control

Control has to be ingrained in every single service of the architecture.

image

Elastic is purely a data center play. It depends on how central IT architects the services.

Self Service – it is a combination which spans the whole organization, where control is given to the application. The delegation is something which needs to be given some thought.

 

image

Multi-tenancy & Customization are pure outsourcing thoughts.

Failure Resistant – this one should be thought of from an application recovery point of view and is pretty much decided at an architectural level.

image

Metered by use is again at a component level.

Highly Automated is at every level.

Coming in the next post… Architecting the Hybrid Cloud ……