Cloudera is taking aim at the complexity of deploying and managing Hadoop in the cloud with a new product called Cloudera Director, announced today at the Strata + Hadoop World 2014 event in conjunction with the launch of Cloudera Enterprise 5.2.

If there is one persistent complaint about Apache Hadoop, the popular big data storage and processing platform, it is that it can be complex to deploy. Commercial Hadoop distributions like those offered by Cloudera, Hortonworks and MapR have helped address, but not eliminate, this complexity by offering pre-packaged bundles of software that are known to work together and can be effectively supported within an enterprise.

Hosted Hadoop offerings such as Amazon Web Services’ (AWS) Elastic MapReduce, on the other hand, simplify things even further but require enterprises to give up a degree of control in order to use them.

“Hadoop users want a cloud-centric experience, including elasticity, self-service, tracking and accountability,” said Matt Brandwein, director of product marketing at Cloudera. “But they still need the enterprise qualities like security, governance and supportability that they get from an enterprise Hadoop distribution.”

Cloudera Director seeks to fill the gap between these two types of offerings by making it easier for enterprises to use Cloudera’s Hadoop distribution (CDH) on top of cloud infrastructure.

Director accomplishes this by providing self-service tools for operators and end-users that allow them to provision and deploy CDH within infrastructure-as-a-service (IaaS) environments.

Cloudera Director provides:

  • An administrative console that IT users can use to create resource pools and establish user and group quotas and entitlements. Administrators may also report on resource utilization and employ chargeback/showback to distribute infrastructure costs to users.
  • An end-user portal that allows teams working with Hadoop to create and scale Hadoop clusters on-demand.

Once the infrastructure has been provisioned and CDH has been deployed, users manage the Hadoop environment itself through the existing Cloudera Manager console.


Cloudera Director uses IaaS APIs to orchestrate and manage the underlying IaaS environment. Initially only AWS is supported, but the company’s aspirations are broader and it plans to support additional environments in the future. It envisions future support for hybrid and multi-cloud deployments (because what cloud announcement would be complete without a mention of hybrid cloud?) to allow enterprises to easily migrate between different cloud providers.

The ultimate utility of Cloudera Director will be determined in large part by its ability to strike a balance between ease-of-use, control, and flexibility in allowing users to configure and deploy Hadoop clusters to meet different needs.

“Hadoop deployment is especially difficult because of its heterogeneous cluster layouts and large number of different components,” said Jonathan Gray, founder and CEO of Cask, whose Coopr product is open source software based on Chef that also aims to simplify cluster provisioning and management on public and private clouds.

Coopr was originally developed out of the company’s own need for a fast, self-service way to provision Hadoop clusters, but the software can now provision any software stack. It supports multiple tenants, each with their own administrators, users, cluster templates, and clusters, and supports various IaaS providers, including AWS, Google Compute Engine, OpenStack, Rackspace, and Joyent.

As cloud and big data continue to converge, simplifying the delivery of big data services on top of cloud infrastructure becomes increasingly important. I’ve long contended that what’s needed to address the complexity of deploying and managing big data and Hadoop is something akin to a platform-as-a-service (PaaS) layer. Cloudera Director doesn’t go all the way there, but it’s a good start for Cloudera and Cloudera Enterprise users.
