atmos Blog


Cloud Storage: What Indy Doesn't Understand About Modern Archives

  
  
  

[Image: Raiders warehouse]

Guest post by Mark O’Connell

What do you think of when you think of an archive?  For years, the final scene in Raiders of the Lost Ark represented the state of the art in archiving technologies.  In case you don’t remember the movie, Indiana “Indy” Jones, after being chased by scores of Nazis, manages to defeat them and recover the Ark of the Covenant, an incredibly powerful religious artifact.  The US government seizes the artifact and assures Indiana that it is being studied by “top men”, while the final scene of the movie shows the Ark being archived: shut in a wooden crate and carted into a giant storage room filled to the brim with hundreds of thousands of similar crates, never to be studied because no one could ever hope to find it again.

If you’ve ever tried to recover information from banks and banks of labeled tapes, you might know this feeling.  The information is there, somewhere, just waiting to be discovered, like a precious gem buried deep within the earth.  In such an environment, the archive isn’t quite dead, though it does represent quite a treasure hunt!  However, you certainly have more efficient ways to use your time, and there are certainly more efficient ways to leverage your business critical historical information.

This world began to change in 2002 when EMC introduced the Centera platform, the first storage platform designed specifically to address the challenges of long-term storage of fixed-content archive data.  Because Centera stored the data on disk instead of tape or optical platters, information retrieval time immediately improved by more than a factor of 10 and was suddenly measured in seconds or fractions of a second instead of minutes or worse.

With such a momentous change in the accessibility of information, the old paradigms were no longer sufficient.  Companies began transferring their data from primary storage to the archive more and more quickly, sometimes even bypassing primary storage altogether and immediately writing to the Centera.  As the average age of archived data grew younger and younger, the retrieval demands for the information grew exponentially, and suddenly sub-second retrieval times were no longer a glorious luxury but an absolute business necessity.

To say that such changes were unanticipated by the Centera engineers would be an understatement.  But the Centera engineers responded, and Centera today remains the premier platform for scale-out, long-term storage of fixed content and compliant archive information.

However, Centera remains designed around the storage of fixed content data.  With a treasure trove of historical information at their fingertips, or available over the web, businesses began to want to derive more and more value from it by running analytics, by annotating the data, and even by moving away from fixed content to storing mutable business content on the archive.  To meet the growing demands for these next generation archives, EMC introduced the Atmos platform.  Learning from the Centera experience, Atmos was designed to support both mutable and fixed content data, to support annotations and indexing natively, and to natively support multiple methods of accessing the data, including firewall-friendly and mobile-device-friendly REST protocols.

As with Centera, customers have taken the base support in Atmos and used it to drive active archives in directions that would have been unimaginable even a few years ago.  For customers whose primary concern is space efficiency, Atmos provides GeoParity, which offers distributed erasure encoding for data, maintaining full read/write access to data while also giving protection against disk, node, and site failures for only 33% overhead.  For customers who have existing applications using a filesystem interface, Atmos GeoDrive enables these applications to leverage scale-out cloud storage without any changes required.  For customers with global data storage needs, Atmos supports a single system image across geographically dispersed systems, with support to minimize read latencies by reading from the location which is closest to the requesting client.  And for customers with a mixed legacy of applications and use cases, Atmos multi-tenancy allows the customization of the storage policies, self-service access, data indices, and workflows such that each application has the illusion of a storage platform tuned precisely to its needs, while the system administrator has the ease of use associated with a single system with per-application personalities.
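To make the 33% figure concrete, here is a minimal sketch of the arithmetic behind erasure-coding overhead. It assumes an object is split into k data fragments plus m coded fragments, so overhead is m/k. The 9+3 layout is an assumption picked only because it yields 33%; the post does not say which fragment counts GeoParity uses, and the function names are illustrative.

```python
def overhead(data_fragments, coded_fragments):
    """Extra capacity consumed by protection, as a fraction of the data size."""
    return coded_fragments / data_fragments


def raw_capacity_needed(usable_tb, data_fragments, coded_fragments):
    """Raw capacity (TB) required to hold usable_tb of protected data."""
    total_fragments = data_fragments + coded_fragments
    return usable_tb * total_fragments / data_fragments


if __name__ == "__main__":
    # Assumed layout: 9 data fragments + 3 coded fragments.
    print(f"overhead: {overhead(9, 3):.0%}")
    print(f"raw TB for 90 usable TB: {raw_capacity_needed(90, 9, 3)}")
    # For comparison, keeping two full copies is 100% overhead.
    print(f"two full copies: {overhead(1, 1):.0%}")
```

Under these assumptions, 90 TB of data needs 120 TB of raw capacity, versus 180 TB if two full copies were kept.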

Now imagine Indiana Jones in this new world: all the wooden boxes, containing treasures from all over the world, are open and their contents visible, but using only a fraction of the space that used to be required.  Any item can be found in under a second using a well-known, unique metadata identifier.  Similar items are grouped, so he can find “all arks from antiquity” or “items found by Indiana Jones” in a heartbeat.  Research notes can be added to any item and are instantly visible to all users, multiplying the value of the information and bringing insights that had never before been possible.  Archeologists would see the collection organized by era or empire, while linguistics professors would see the collection organized by hieroglyphic or alphabetic family.  Indiana Jones could only dream of living in such a world, while for us it is the new business reality, driven by the changes initiated by Centera and continued in Atmos.

Photo courtesy of Lucasfilm Ltd.

When Big Data Meets Small Business, Cloud Storage Can Help

  
  
  

 

Since last week's announcements with Iomega, we've been seeing Big Data from a completely different point of view.

While we (EMC, the parent company) have calculated the zettabytes of Big Data for Big Business, it's been a real eye-opener to see how those same bytes can "bite" Small Business.

If you look closely enough, you can see it. Lots of small gets big, really fast.


 

 

Sure, we've heard testimonials of how Atmos cloud storage helps major hospitals like Beth Israel Deaconess and Kettering, but now we see the same challenges of storing and scaling... on a small scale.

Guess what? It can be just as painful, if not more painful, especially since in a small clinic, doctor's office, or lab there is no storage team to scream at when you need more storage.

As the small bucket fills with more BIG DATA:

  • Patient records can't be searched 
  • Past studies can't be retrieved
  • The staff on-site spends more time with provisioning tasks, and less time with patient tasks

 

And in the BIG picture, it's not just health care settings: almost any small business these days has to "fight with the bytes" of managing unstructured content.

With cloud storage connected to those small sites, there's now a better, easier way to scale from small to BIG.

 

 

 

 

So,

Are you the car in front?

Do you have an Iomega box and want to take a cloud storage test drive yourself?

 

Are you the car in back?

Want to offer connections to your public cloud service?

Or offer connections for your branch offices inside your private cloud?

See how easy it can be, and Contact Us for a free trial cloud storage account.

 

 

Like the Ferraris (either size)? See more of Steve Brandon's collection.

Big Data Is Really About Small Data

  
  
  

1 30 12

"Big data" is deceptively self-descriptive.  Big data is not simply about storing large quantities of data.  Big data is about what you do with large quantities of data and how you manage it.  That's a subtle, but important distinction. Let me explain.

Big data sets are difficult to manage and understand because the data is usually stored raw and unfiltered.  The process of sifting through these data sets usually produces much smaller data sets that serve as summaries that are easier to consume.

Big data is about making meaning out of large, unwieldy data sets.  It's about the insight gained because of fundamental changes in the technology used to store and manage content, as well as the dramatically reduced costs.

Big data places a bet against the high cost of compute and storage resources because critical insight across disparate data isn't practical until the costs of storing and managing data approach zero. 

One of the most common big data use cases is related to measuring website page views by collecting raw web server log files and processing them to aggregate the log data into more meaningful results.  The input is a "big data" set of log files and the output is a "small data" set that consists of a summary.  The output in this case is the aggregated set of results that describes the actual number of page views, unique visitors, etc. That's the classic Hadoop use case.
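The log-aggregation use case can be sketched in a few lines. This is a single-process illustration of the "big data in, small data out" idea, not a Hadoop job; the sample log lines and the `summarize` function are made up for the example, and the parsing assumes the common web server log layout where the client address is the first field and the request path is the seventh.

```python
from collections import Counter

# Hypothetical raw input: a few lines in common log format.
SAMPLE_LOG = [
    '203.0.113.5 - - [30/Jan/2012:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '203.0.113.9 - - [30/Jan/2012:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512',
    '203.0.113.5 - - [30/Jan/2012:10:00:05 +0000] "GET /about.html HTTP/1.1" 200 128',
    '198.51.100.7 - - [30/Jan/2012:10:00:09 +0000] "GET /index.html HTTP/1.1" 200 512',
]


def summarize(lines):
    """Reduce raw log lines to page views per path and a unique-visitor count."""
    page_views = Counter()
    visitors = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed lines
        client, path = fields[0], fields[6]
        page_views[path] += 1
        visitors.add(client)
    return dict(page_views), len(visitors)


if __name__ == "__main__":
    views, unique_visitors = summarize(SAMPLE_LOG)
    print(views)
    print(unique_visitors)
```

The input grows with every request served; the output stays roughly the size of the site's page list, which is the "small data" the post is describing.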

Big data for the masses wasn't possible when the storage and compute costs required to gain insight were exorbitantly high.  As those costs fell, the software being used was maturing and improving to make it more broadly accessible, but more importantly the software was developing around the notion of squeezing efficiencies out of the now lower cost hardware.  It was a perfect storm of sorts leading to the ability to do more with less.

So, yes:  big data is really about small data.  And the costs of storing and managing big data finally make it feasible to create small data. 

 

The Importance of REST and Mobile (and how Instagram scales)

  
  
  

 

I read a post recently about Instagram, the mobile photo sharing phenomenon, written by their engineering team that describes how the service is built and how it scales. Instagram allows users to take photos on their mobile phones, add a filter to the image to change the appearance, upload it to the Instagram service and then share it on various social platforms such as Facebook and Twitter. 

Instagram encountered many technical scaling issues early in the life of the application because of the incredible user and data growth.  They're nearing 20 million users and 50 objects created per second. That sort of growth exposes a thorny set of issues that most platforms won't ever encounter.  For instance, simply creating a unique ID to track each object in a distributed system with that sort of volume becomes a challenge. 
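One common way to generate unique IDs without a central coordinator is to pack a timestamp, a shard number, and a per-shard sequence into a single integer. The sketch below illustrates that general idea only; the bit widths, names, and injected clock are assumptions for the example, not a description of what Instagram runs.

```python
import itertools

SHARD_BITS = 13  # assumed: up to 8192 shards
SEQ_BITS = 10    # assumed: up to 1024 IDs per shard per millisecond


def make_id_generator(shard_id, clock_ms):
    """Return a function that yields IDs unique across shards.

    clock_ms is a callable returning the current time in milliseconds; it is
    passed in so the example can be run with a fixed clock.
    """
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    counter = itertools.count()

    def next_id():
        # The sequence wraps, so IDs repeat if one shard issues more than
        # 2**SEQ_BITS IDs within the same millisecond.
        seq = next(counter) % (1 << SEQ_BITS)
        return (clock_ms() << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq

    return next_id


if __name__ == "__main__":
    import time

    next_id = make_id_generator(shard_id=5, clock_ms=lambda: int(time.time() * 1000))
    print(next_id(), next_id())
```

Because the timestamp occupies the high bits, IDs from one shard sort by creation time, and no two shards can collide even when they generate IDs in the same millisecond.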

One of the most interesting parts of that post to me, unsurprisingly, was about how the actual photographs are stored. 

Here's what they had to say:

"The photos themselves go straight to Amazon S3, which currently stores several terabytes of photo data for us."

In other words, an application server running in a compute instance isn't a gateway (or a bottleneck) to the storage layer.  The Instagram app (on the iPhone perhaps) communicates directly with the storage layer using REST over the Internet.

It wouldn't work as well any other way. 

RESTful web services are fundamental to the mobile Internet because users themselves are more mobile now than ever.  Traditional ways of thinking about storage are almost irrelevant in the mobile Internet, particularly where unstructured data is concerned. 

The design pattern of mobile apps that access RESTful webservices will not just continue, but it will increase. 

HTTP is the language of the Internet, and of mobile apps in particular.  It's the protocol that apps want to speak.  And it's why we built Atmos.

(Photograph of the MIT Stata Center in Cambridge, MA taken with Instagram)

 

Cloud Storage: Take A Vacation from "Some Assembly Required"

  
  
  

 

Regardless of what holiday you observe this time of year, I’m sure there’s some form of gift giving involved.

Whether you’re a parent of small children or, like me, find yourself providing “mission critical” IT services to your parents, like programming whatever new gadgets they get, the three little words “Some Assembly Required” strike fear into the heart of anyone responsible for providing the gift or service.

Do you really want to stay up all night assembling a bike? Or a doll house? Or pinning the desktop resolution to 800x600 for your folks? Probably not.

Life is short. Holiday time is precious. God Particle or not, some laws of physics still apply and we can't bend the space-time continuum.

Liberation from tedious assembly has value.

Your time is probably better spent with family, friends, or significant others. If you’re passionate about your work, maybe you’d rather spend your time building a new app.

Having things thoughtfully pre-assembled for you can eliminate the effort of assembly and can give you the time to add a creative personal touch on a gift, have a 2nd cup of coffee or adult beverage with a friend, or just get more sleep and be less stressed out.

We feel the same way about storage. 

When we came to market a decade ago with EMC Centera, we quickly saw it solve difficult problems of storing unstructured content. “Assembling” your own capabilities like compliance, content authenticity and low-touch operation at cloud scale, across multiple sites, on a file system took a lot of effort and a lot of money.

In the time since, we've continued to innovate and bake in the components you'll need for storage as a service.

So just in time for the final holiday rush, here's your check-list of capabilities we've built in that you’d otherwise have to assemble yourself:

With Atmos you can:

Or better yet?

So,

  • If you’re trying to manage mobile apps or get control over an explosion of unstructured content, or
  • Want to quickly stand up all the software needed to deliver software as a service, or build the next killer app yourself and skip the tedious steps, watch the video
  • To see how easy it is to build the next cool cloud app (that I may one day have to configure for my folks), take a workshop
  • See for yourself how easy storage as a service can be for your customers, developers or end-users

Freedom from tedious assembly has value. Since the first Centera, we’ve exhaustively worked to drive the complexity and tedium out of storing, managing and protecting content - regardless of where it comes from, where it needs to be stored, and whatever its business or clinical value.

In return, you can free yourself from the ‘some assembly required’ complexity at work. 

Now, with the time you've saved, and tedium you’ve avoided, please invest that time by spending it with those you care about.

Happy Holidays






JavaScript, HTML5, Node.js, and Cloud Storage

  
  
  

 

"Developers, developers, developers!"  That refrain from Steve Ballmer from a few years ago accurately describes our focus with Atmos to provide application developers with a scalable storage back-end using modern architectures and protocols. 

We're constantly thinking about how developers can utilize Atmos with their applications in an easy and efficient manner.  Part of that thinking involves pattern recognition and being able to peer around corners to see how developers will build and deliver their applications to users and how we can help. 

Clearly, application development in the mobile market using platform-specific tools (iOS, Android, etc.) has a great deal of developer mind share.  We already have tools that make it easy to build on those platforms, but developers are also building similarly rich experiences using technologies such as HTML5 and JavaScript without being limited to or dependent on a single client-side platform.  By using HTML5 and various mobile toolkits such as Sencha and jQuery, developers can build thin client applications in the browser that have similar characteristics to their thicker client counterparts. 

Developers are also starting to build server-side web applications using JavaScript with Node.js.  Node's primary value is that request handling is non-blocking.  Requests are processed asynchronously, so resources are not tied up simply waiting for other requests, or the I/O behind them, to complete.  The net result is that more requests can be served because resources are being used more efficiently. 
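The effect of non-blocking request handling can be shown with a small sketch. It uses Python's asyncio event loop as a stand-in for Node's, and `asyncio.sleep` as a stand-in for a slow storage or database call; the function names and timings are illustrative assumptions.

```python
import asyncio
import time


async def handle_request(request_id, io_delay):
    """Handle one request; the await yields control while the 'I/O' is pending."""
    await asyncio.sleep(io_delay)  # stand-in for a storage or database call
    return f"response {request_id}"


async def serve(request_count, io_delay):
    """Handle all requests concurrently on a single thread."""
    tasks = (handle_request(i, io_delay) for i in range(request_count))
    return await asyncio.gather(*tasks)


def run_demo(request_count=10, io_delay=0.1):
    start = time.perf_counter()
    responses = asyncio.run(serve(request_count, io_delay))
    return responses, time.perf_counter() - start


if __name__ == "__main__":
    responses, elapsed = run_demo()
    print(len(responses), "responses in", round(elapsed, 2), "seconds")
```

Handled one at a time, ten requests with a 0.1 second wait each would take about a second; handled asynchronously, the waits overlap and the total is close to a single wait.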

In all of these scenarios, applications still need access to a storage infrastructure designed with mobility and scale in mind.  We recently released Atmos support for JavaScript by developing a wrapper that performs the heavy lifting required to communicate with our REST API.  The wrapper signs HTTP requests, sends them to Atmos, and parses the responses.  The wrapper supports asynchronous I/O and event-driven programming, so it follows the same paradigms developers already expect when working with AJAX, Node, etc. 

The wrapper is being released under the open source BSD license; use it, modify it, and perhaps build on it too.  You can get it by checking out a copy from the Google Code SVN repository.
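For readers unfamiliar with request signing, the general idea looks like the sketch below: client and service share a secret, the client computes a keyed hash over selected parts of the request, and the service recomputes it to verify. This is a generic HMAC illustration only. The fields that are signed, the hash function, and the function names are all assumptions, and it is not the actual Atmos signature scheme or the wrapper's API.

```python
import base64
import hashlib
import hmac


def sign_request(method, path, date, shared_secret):
    """Return a base64 signature over a hypothetical canonical request string."""
    string_to_sign = "\n".join([method.upper(), path, date])
    digest = hmac.new(shared_secret, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(method, path, date, shared_secret, signature):
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(method, path, date, shared_secret)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    secret = b"example-shared-secret"  # placeholder, not a real credential
    date = "Mon, 30 Jan 2012 10:00:00 GMT"
    signature = sign_request("GET", "/objects/123", date, secret)
    print(verify_request("GET", "/objects/123", date, secret, signature))
    print(verify_request("GET", "/objects/456", date, secret, signature))
```

Because the signature covers the method, path, and date, a captured signature cannot be reused for a different object or operation.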

So, yes:  developers, developers, developers.  That's our focus.  And we're continuing to innovate to make it easier for developers to implement cloud storage. 

 

Everyone Is A Service Provider

  
  
  


We announced an update to our Atmos Cloud Delivery Platform (ACDP) today.  ACDP is a software layer deployed on Atmos that allows service providers to deliver metered cloud storage resources in a self-service manner. 

Who's a service provider these days?  Everyone is a service provider.  Historically, we only thought of service providers as organizations that operated as public utilities, whose mission was to serve otherwise unrelated third parties. 

But that's changing; nearly every category of customer wants to deliver metered, monitored, and scalable resources to users in a self-service manner.  This includes "internal customers" of an IT organization and "external customers" of a public utility. 

Why would an enterprise IT organization need the same infrastructure tools as a public utility?  Because enterprises want to bill internal departments only for the resources that they use --- just like a public utility does with external customers. 

It's more efficient for departments to pay only for the resources they consume rather than over-provisioning resources up front.  IT gets a better understanding of which applications and departments consume resources and can charge back appropriately.
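A minimal sketch of usage-based chargeback, assuming IT already has metered consumption per department. The department names, usage figures, and rate are invented for illustration; amounts are kept in whole cents to avoid floating-point rounding.

```python
def chargeback(usage_gb_by_department, rate_cents_per_gb):
    """Bill each department only for the capacity it actually consumed."""
    return {
        department: used_gb * rate_cents_per_gb
        for department, used_gb in usage_gb_by_department.items()
    }


if __name__ == "__main__":
    # Hypothetical metered usage for one billing period.
    usage = {"radiology": 1200, "billing": 300, "research": 0}
    bills = chargeback(usage, rate_cents_per_gb=10)
    for department, cents in bills.items():
        print(f"{department}: ${cents / 100:.2f}")
```

A department that consumed nothing pays nothing, which is the contrast with paying up front for capacity provisioned "just in case".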

Enterprise IT gets many of the benefits of operating a public utility while still maintaining control.  This is IT-as-a-service.  Because today everyone is a service provider. 

 

Product DNA

  
  
  


I've been thinking a lot lately about the genesis of technology products, the teams that build them, and the pace at which they're delivered to end-users.  Part of what's prompting my focus on product evolution stems from an observation of an industry-wide attempt by technology companies to re-architect legacy products for this new era of cloud that we're experiencing.  It's not going to work well.  Let me explain. 

Products need to be purposefully designed from the beginning to deliver predictable results in the cloud.  Legacy products can have fundamental architecture issues that make them incompatible with the cloud.  Certain foundational aspects of products designed specifically for cloud can't be added as an afterthought without significant effort, and even then the result can feel unnatural.  Multi-tenancy is a great example.  Multi-tenant architectures allow users and applications to be logically segregated at the infrastructure layer for purposes of security, isolation, consumption-based billing, management, etc. 

Adding multi-tenancy to an existing product is akin to adding a foundation to a house.  It just can't be done seamlessly without re-designing (and rebuilding) significant parts of the entire structure. One can add piers to an insufficient foundation, but it's merely a temporary fix:  the house will still have fundamental issues at the core.

Atmos is an organic product developed within EMC specifically for the Cloud.  I like to think of Atmos as being 'net native because of the individual foundational elements that make up the product.

When you sequence the Atmos DNA it looks something like this:

  • Speaks HTTP natively
  • Multi-tenant foundation
  • Scale-out architecture
  • Distributed metadata
  • Pragmatic design

Grafting prosthetic extremities onto technology products tends to produce odd results.  That's because technology products have personality traits and lineage.  One could even say that products have DNA.  And our DNA was clearly designed for the Cloud.


 



Understanding Cloud Storage Economics

  
  
  

One of the most compelling, yet challenging, aspects of cloud storage is the new economic paradigm it presents.  This paradigm promises tremendous cost savings and efficiencies, but delivering on these promises demands a keen understanding of the underlying tenets that make them possible and of the critical technological and process drivers necessary to exploit them.  In the world of Atmos, we have synthesized these tenets into three key principles: Scale, Utilization and Variable Cost, and we’ve internalized them as the foundation on which we build our technology.

 

[Figure: cost to serve]

 

Scale

The concept of scale focuses on the notion that every environment has an inevitable amount of fixed costs, be it facilities, labor, core infrastructure, etc.  By scaling resources around them (in this case storage), customers are able to spread these fixed costs over an enormous amount of capacity, thereby achieving “economies of scale” as the fixed cost carried by each incremental resource approaches zero.  While such scale may appear easy enough to achieve (to start, one simply needs a blank check), in reality scale is often limited by the technology one chooses to deploy.

In Atmos, scale is primarily attacked along two dimensions.  First, Atmos focuses on automation, system healing and self-service functionality to reduce the number one cost in any environment, human capital, by dramatically increasing the amount of storage a single admin can manage.  Second, Atmos’ support of a vast array of access methods, all through a single, flexible API, means customers can build a ubiquitous storage layer, across a variety of use cases, that allows them to achieve a scale unmatched by otherwise siloed, purpose-built solutions.
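The economies-of-scale argument reduces to simple arithmetic: cost per unit of capacity is the fixed cost divided by capacity, plus whatever each unit costs on its own. The sketch below uses invented dollar figures purely to show the shape of the curve.

```python
def cost_per_tb(fixed_cost, capacity_tb, variable_cost_per_tb):
    """Average cost of one TB: its share of fixed costs plus its own variable cost."""
    return fixed_cost / capacity_tb + variable_cost_per_tb


if __name__ == "__main__":
    # Hypothetical figures: $1,000,000 of fixed costs, $100 per TB variable cost.
    for capacity_tb in (100, 1_000, 10_000):
        print(capacity_tb, "TB:", cost_per_tb(1_000_000, capacity_tb, 100))
```

With these assumed numbers, the fixed-cost share per TB falls from $10,000 at 100 TB to $100 at 10,000 TB, while the variable cost per TB stays constant.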

 

Utilization

While a large, well-scaled environment is a critical first step, ultimately its value is lost if not fully utilized.  The concept of utilization focuses on minimizing unused, or overhead, capacity with the knowledge that every dollar spent on such resources is a dollar lost to the bottom line.  While this may seem rather straightforward and measurable, in truth an environment’s utilization is quietly under attack at every turn, from the amount of overhead required to protect one’s data to wasted capacity provisioned to users or held back as system, or site, reserve.

In the case of Atmos, utilization is first addressed through a true multi-tenant and multi-site architecture, which allows the pooling of storage resources across countless users and numerous data centers.  Second, by offering highly efficient, object-level protection schemas such as Atmos GeoProtect®, customers have powerful tools to optimize the protection they place on their data.  Finally, by designing Atmos as a scale-out, node-based architecture, customers can expand their environment quickly and efficiently, thereby allowing them to realize a just-in-time operational model.

 

Variable Cost

Once an environment is both scaled and fully utilized, a customer has reached a point where fixed costs are immaterial and the marginal cost of adding the next unit of capacity is all that remains.  This final cost element is truly variable in that it grows as capacity grows, and thus it defines a customer’s long-term potential for economic success.  While it may be tempting to avoid grappling with such long-term issues up front, once a large deployment exists, customers are all but stuck with the economics they have wrought.  Thus understanding the inherent variable cost implications of a given technology is a critical day-one requirement.

Atmos is designed to reward customers for achieving scale, by both directly and indirectly lowering their variable costs.  By tiering its pricing as capacity grows, Atmos enables customers to see direct evidence of variable cost declines with each new investment.  At the same time, by developing the intelligence necessary to automate the entire storage user lifecycle, customers indirectly gain variable cost efficiencies by avoiding incremental investments in human capital and management overhead as their environments grow.  
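Tiered pricing can be illustrated with a short calculation. The tier boundaries and prices below are invented for the example and are not Atmos pricing; the point is only that the average cost per TB declines as a deployment grows into cheaper tiers.

```python
# Hypothetical price tiers: (tier size in TB, price per TB in dollars).
# A size of None means the tier has no upper bound.
TIERS = [(100, 500), (900, 400), (None, 300)]


def total_cost(capacity_tb, tiers=TIERS):
    """Price capacity tier by tier, filling the most expensive tier first."""
    remaining = capacity_tb
    total = 0
    for tier_size, price_per_tb in tiers:
        if remaining <= 0:
            break
        in_tier = remaining if tier_size is None else min(remaining, tier_size)
        total += in_tier * price_per_tb
        remaining -= in_tier
    return total


if __name__ == "__main__":
    for capacity_tb in (50, 1_000, 2_000):
        cost = total_cost(capacity_tb)
        print(capacity_tb, "TB:", cost, "total,", cost / capacity_tb, "per TB")
```

Under these assumed tiers, the average price per TB drops from $500 at 50 TB to $410 at 1,000 TB and $355 at 2,000 TB.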

While these three principles were not born in the cloud, leveraging technologies that were provides unique opportunities to exploit and reshape them.  Ultimately, when customers, partners and vendors embrace these concepts, put them into practice and measure their impact, the true power and economic benefits of the cloud are unleashed.

 

[Figure: scale, utilization, variable cost]

 

Scott is a Principal Marketing Business Analyst with EMC’s Cloud Infrastructure Group.  He’s worked with Atmos for the past 4 years developing both business and operational models for cloud storage offerings.


Siri and Cloud: Resource Disintermediation

  
  
  

 

Apple released Siri on the iPhone 4S this past Friday.  Siri is an automated, voice-directed assistant that can schedule reminders, find restaurants, and generally answer questions posed in natural language.  I spent a few minutes on Friday evening asking Siri questions that I would normally have asked Google on my mobile. 

Siri largely disintermediates Google search, giving Apple a direct relationship with the user.  Apple is removing the layers between the user's questions and the related answers with a neat interface that does natural language processing.  But this post isn't necessarily meant to be about Apple; it's about resource disintermediation. 

Cloud is also about the direct-to-consumer model.  Cloud removes the barriers to efficiently allocating resources so developers can build and deploy applications quickly.  No longer are developers constrained by the time it takes to set up and tear down compute and storage resources. They can self-provision resources using point-and-click portals rather than submit requests for resources that could take weeks and perhaps months to build and deploy. 

In the case of cloud, the consumers are mostly developers who are not building applications based on IT practices of the past.  Developers are not writing boilerplate code when building web and mobile applications; they're using frameworks that provide a massive amount of functionality at the outset.  Signing up to use an API is the new partnering model: they're not waiting around for business development people to make decisions in order to begin hacking together apps to prove out concepts. 

Inefficient layers are being removed to provide quicker access to resources in intuitive ways.  We're living in the direct-to-consumer era.  Self-service is the new normal. 

 

 

About Atmos Online
Atmosonline.com lets us share deep insight about EMC Atmos. We will cover application development, high scale architectures, and other topics around the design and use of cloud storage, with as many actual real world scenarios as possible. Atmosonline.com is also a portal to our Atmos Online storage as a service test and dev environment.

Disclaimer: "The opinions expressed in our blog are the personal opinions of the authors. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC nor does it constitute any official communication of EMC."