Cloud Got You Locked-in? Avoid it by Choosing an Open PaaS

With Google App Engine (GAE) raising its prices last week, there has been discussion all around about the advantages and disadvantages of developing to a particular Platform-as-a-Service. At OpenShift, we believe in Choice, Freedom, and never being locked in. As thousands of developers and their associated businesses discovered, being locked in can be frustrating and expensive!

Clouds can change direction, go out of business, raise prices, change security policies, remove features and functions, change service levels, have outages, fail to live up to their advertised service level, be acquired by other companies, move your applications and data to places you don't like, and the list goes on. At this early stage in cloud computing, although the benefits are clear and unmistakable, you don't want to get locked in.

With this in mind, here is our guide to avoiding lock-in and preserving your Freedom & Choice:

Write to a standard open source datastore

If your application is coded to a proprietary datastore that only runs on one cloud, then you will be stuck in that cloud. If something you don't like happens to the cloud, you might be able to export or back up your data, but you have nowhere to move it to since nobody else runs that datastore. This is the case with BigTable, SimpleDB, Force.com's Database Services, and others. OpenShift offers MySQL, MongoDB & Membase, all of which you can run in your own datacenter, on your own laptop, and on a variety of cloud providers, with or without OpenShift.
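As a rough illustration of what "writing to a standard datastore" can look like in code, here is a minimal Python sketch that talks to the database only through the standard DB-API and plain SQL. It uses the built-in sqlite3 module as a stand-in so it runs anywhere; the function name, table, and `placeholder` parameter are our own assumptions for the example. Note that the parameter placeholder is one of the few things that varies between drivers (sqlite3 uses `?`, while MySQL drivers typically use `%s`), so the sketch keeps it configurable.

```python
import sqlite3


def save_and_count_users(conn, names, placeholder="?"):
    """Insert names and return the row count, using only portable SQL.

    `conn` is any DB-API 2.0 connection. `placeholder` is the driver's
    parameter marker (an assumption you must set per driver).
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS users (name VARCHAR(100) NOT NULL)")
    insert_sql = "INSERT INTO users (name) VALUES (%s)" % placeholder
    for name in names:
        # Values are passed as parameters, never formatted into the SQL text.
        cur.execute(insert_sql, (name,))
    conn.commit()
    cur.execute("SELECT COUNT(*) FROM users")
    return cur.fetchone()[0]


if __name__ == "__main__":
    # sqlite3 stands in for whatever standard SQL database you deploy against.
    connection = sqlite3.connect(":memory:")
    print(save_and_count_users(connection, ["ada", "linus"]))
```

Because the function only sees a connection object, moving to another SQL database should mostly come down to changing how the connection is created and which placeholder is passed in.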

Write to open source middleware APIs and frameworks

If your application is coded to a framework that is proprietary, or that has been modified or forked to work in a special way in the PaaS, then your code will have to be redeveloped if you want to run it somewhere else. You need to be careful that the APIs you are using aren't specific to the PaaS and are provided by the standard open source version of the framework or middleware. Or even better, write to APIs that are standardized and supported by multiple vendors, such as Java EE 6 and AMQP.

When using native cloud storage or other native IaaS resources, use an abstraction API

Generally a Platform-as-a-Service will take care of provisioning of compute and networking resources for you. But sometimes you want to interact with the resources directly. Most commonly this is done for storing data in an object store such as Amazon S3. While some clouds are adopting or supporting the S3 API, not all of them support it. And few support anything beyond the simplest S3 API. The safest thing to do is to use an abstraction API such as DeltaCloud, jclouds, libcloud, SimpleCloud, or one of the several others to make sure that your code is portable from cloud to cloud. This is especially true if you're interacting with the compute API for some reason (but as a developer, why would you want to?).
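To make the idea of an abstraction API concrete, here is a small Python sketch of the pattern. It is not the API of DeltaCloud, jclouds, libcloud, or SimpleCloud; the `ObjectStore`, `LocalDirStore`, and `archive_report` names are hypothetical, and the only backend shown writes to a local directory. The point is that application code depends on the narrow interface, so a cloud-specific backend can be swapped in without touching it.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ObjectStore(ABC):
    """Hypothetical minimal interface the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class LocalDirStore(ObjectStore):
    """Backend that keeps objects as files under a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        # Keys may contain slashes, so create intermediate directories.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def archive_report(store: ObjectStore, report_id: str, body: bytes) -> str:
    """Application code: knows about ObjectStore, not about any cloud."""
    key = f"reports/{report_id}.txt"
    store.put(key, body)
    return key


if __name__ == "__main__":
    store = LocalDirStore(tempfile.mkdtemp())
    key = archive_report(store, "2011-09", b"monthly totals")
    print(key, store.get(key))
```

In practice you would likely reach for one of the existing abstraction libraries rather than write your own interface, but the shape of the application code stays the same.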

Use standard stuff and open source libraries

AMQP runs on many clouds and has many providers. Amazon SQS and Azure Service Bus do not. SQL (and in particular MySQL) runs on many clouds and has hosted providers within PaaS and outside of PaaS. So do MongoDB and other NoSQL options. Most data-storage-as-a-service options lock you in. Spring, Java EE, Django, Rails, Symfony, Twisted, Sinatra, RichFaces, and all the other open source frameworks run in a variety of PaaS. Proprietary or forked frameworks don't. Even cloud object storage is often not necessary when you could use a standard Unix/POSIX filesystem: they're widely available, and some even scale up or distribute out. (For example, OpenShift gives you a standard Unix filesystem to store data. Crazy talk, right?) Do you really need to lock yourself in? Could you implement what you need using an open source library on top of standard middleware (for example, Hibernate in a standard Java web container), or do you really need the proprietary library or service? What is the extra convenience or speed going to cost you down the line? Could you start an open source project that does the same, or contribute to one which is getting there?

Avoid native libraries if possible, especially those which aren't in standard OS distros

One of the areas where PaaS offerings differ, and one which many are not open about, is the Application Binary Interface (ABI) on which they're running. The ABI is what an operating system provides to middleware and native libraries (.so and .dll and the like). One of the most frustrating compatibility headaches to run into is when your application depends on a native library (such as for rendering graphics) and that library is not available on the PaaS where you want to run. Most PaaS don't let you upload your own native libraries (OpenShift does). Most PaaS don't tell you what ABI your native library needs to be compiled to (OpenShift runs on RHEL, the most widely supported ABI). Some PaaS providers want to be able to switch out their underlying ABI, or don't want to say what it is, since they may not be able to support their own operating system, or they may be getting their operating system from another vendor (such as us). Some PaaS may have patched their operating system, making it non-standard and making it impossible for you to compile native libraries that will work on the PaaS, if they even allow their use.
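If you do depend on native libraries, one cheap safeguard is to check for them at startup rather than discovering the gap at the first request. The Python sketch below uses the standard library's `ctypes.util.find_library` for that; the `missing_native_libraries` helper and the library names in the usage example are our own illustrative choices. Keep in mind that this only tells you whether a library can be located on the current system, not whether it was built against a compatible ABI.

```python
from ctypes.util import find_library


def missing_native_libraries(names):
    """Return the subset of `names` that cannot be located on this system.

    Names are given without the platform prefix or suffix, e.g. "png"
    rather than "libpng.so".
    """
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    # Hypothetical dependency list for a graphics-rendering application.
    required = ["png", "jpeg", "freetype"]
    missing = missing_native_libraries(required)
    if missing:
        print("Missing native libraries:", ", ".join(missing))
    else:
        print("All native libraries found.")
```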

Your options if you need native libraries are to either use a PaaS based on a standard OS that is easy to get and understand (such as OpenShift, which is based on unmodified Red Hat Enterprise Linux), or to be ready to move your application onto your own PaaS that you construct from IaaS…which gets to our next point:

Use PaaS not IaaS

It's glorious and geek-cool to install your own packages, write your own sysadmin scripts and configuration automation, do your own OS patches, configure your own backups and load balancers… oh wait, it was cool 5 years ago. Now it's much cooler to spend the time getting your app to work on a mobile browser, or to integrate with Facebook, or to graph its user footprint on Google Maps. When the PaaS can do all the undifferentiated work for you, you're just asking for hurt when you drop down to the OS to write your own deployment scripts and web server configs. Now you're the one who has to ensure API compatibility from patch to patch. Why not leave that to us? Did you know we have literally millions of lines of test code that we run to make sure we keep that compatibility?

Besides the admin overhead, IaaS has another problem: it's too tempting to hard-code some piece of config that will be hard to find and difficult to port if you want to move. You'll need to remember to factor out IP addresses, file system locations (not every IaaS puts your drives in the same place), network assumptions, security configs, etc., because they're different from one IaaS to another. Oh, and you most likely won't be able to move your VM images from cloud to cloud without some kind of translation step (that will most likely require manual intervention).
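One common way to factor out those environment-specific values is to read them from environment variables in a single place, with defaults for local development. The following Python sketch shows the idea; the `APP_DB_HOST`, `APP_DB_PORT`, and `APP_DATA_DIR` variable names and the defaults are hypothetical, not variables that any particular PaaS or IaaS is known to set.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    data_dir: str


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    Defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    return Settings(
        db_host=env.get("APP_DB_HOST", "127.0.0.1"),
        db_port=int(env.get("APP_DB_PORT", "3306")),
        data_dir=env.get("APP_DATA_DIR", "/tmp/app-data"),
    )


if __name__ == "__main__":
    settings = load_settings()
    print(settings)
```

With this in place, moving the application means changing the environment it runs in rather than hunting through the code for an IP address or a mount point.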

Stick with a PaaS, open source frameworks, standard APIs, open source data stores, and libraries that don't have native code, and you'll be able to git push or REST upload your app from here to there as you need. You'll also gain the advantage of new features in the PaaS layer without changing your code or configuration, and new cloud support if your PaaS is multi-cloud. Now that's Freedom & Choice.