What is Cloud Computing to me?

Monday, July 12th, 2010

When I look at cloud computing, the primary differentiator that keeps jumping out at me is the ability to quickly recover from failure. Since I have a group of servers that host various sites, I can fully understand what the benefits of cloud computing would mean for me.

Going back to the ability to recover quickly from a failure, let’s look at the tried and trusted method of recovering from the failure of a dedicated server. Let me preface this by saying that dedicated servers have proven to be an excellent platform for hosting sites both large and small. They give you complete control, you have 100% of the server’s resources available to you, and you are completely isolated from other websites. However, in the event of a failure, the restoration process can be tedious at best. In a perfect world your dedicated server would have a RAID configuration, and if you lost a hard drive, the system would automatically fail over to the second drive and notify you that the other drive had failed and needed replacement. This gives you the opportunity to swap the drive in a very controlled manner during a maintenance window. When that safety net is missing, you are left restoring from backup. On paper the restore process is fairly straightforward, and it has been done thousands upon thousands of times by various providers, with varying degrees of success depending on conditions. In practice, backup and restore can be a tricky process, and we are often at the mercy of the companies that develop the software and hardware for backup systems.

Initially the problem must be identified, and in this case let’s assume that it is a failed primary hard drive. The server has to be powered down and the failed hard drive has to be swapped. This can go quickly or slowly depending on various circumstances and conditions. Then the server has to be brought back online and the restore from the backup system is initiated. This step is relatively quick, and provided there are no errors along the way, the restore should begin without incident. This is where it gets tricky, though, because depending on how much data you have, the restore can either finish quickly or take a very long time. If you have a simple Linux server with a few gigs of data, that should restore very quickly. However, if you have, for example, a Windows server running SQL Server with several terabytes of data to be restored, that might take a while. The real problem is that your server is down during the restore and will be unavailable to your clients until the restore has completed and the server has gone through a final reboot and system check. This is where cloud computing kills the dedicated server, in my opinion.
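To put rough numbers on that gap, here is a back-of-the-envelope sketch. The 100 MB/s sustained restore speed is purely an assumption for illustration, as is the helper name restore_hours; real throughput depends on your backup system, network and disks.

```python
def restore_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate the hours needed to copy data_gb back at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (data_gb * 1000) / throughput_mb_per_s  # treating 1 GB as 1000 MB
    return seconds / 3600


# Assumed figure, not a measurement: 100 MB/s sustained restore speed.
ASSUMED_THROUGHPUT_MB_PER_S = 100

small = restore_hours(5, ASSUMED_THROUGHPUT_MB_PER_S)     # "a few gigs"
large = restore_hours(3000, ASSUMED_THROUGHPUT_MB_PER_S)  # 3 TB

print(f"5 GB: {small * 60:.1f} minutes")
print(f"3 TB: {large:.1f} hours")
```

Under that assumption the small server is back in under a minute of copy time, while the 3 TB server spends a bit over eight hours just moving data, before the final reboot and system check.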

Now let me outline the restore process for cloud computing. We refer to the backups in cloud computing as snapshots. The reason for this is that a normal backup typically does either a file-by-file or block-by-block backup of the entire hard drive or drives. Not only does this take a while, but the resulting files, which are more than likely highly compressed, are in a format specific to your backup system, the one it requires to perform a successful restore. A snapshot, on the other hand, is literally just that: it’s like a photograph was taken of your hard drive in its current state and moved to a storage device. That snapshot is not a highly compressed and highly modified version of your data and operating system; it is a fully functioning duplicate that, in the event of a primary failure, can simply be booted up. So the restore process is reduced from a series of steps that require lots of manual intervention, and maybe even a technician to pull your server and do physical work on it, to you simply clicking a button that says “restore this snapshot”. Let me make sure that you understand this, because even though this is an incredibly simple concept, people often still don’t get it. The system takes a snapshot of your cloud computing environment and instantly stores that snapshot on a storage device. When the system fails for whatever reason, whether it was hacked beyond recognition, an angry ex-employee went in and deleted all of your content, or whatever the case may be, you instruct the system to restore whichever snapshot you want, and all it does is boot up that snapshot and your environment is restored. How cool is that!
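If it helps to see the idea in miniature, here is a toy model of snapshot and restore. It is only a sketch: ToyCloudServer and its methods are hypothetical names made up for this post, the "disk" is just a Python dictionary, and no real cloud provider's API is being shown.

```python
import copy


class ToyCloudServer:
    """Hypothetical in-memory stand-in for a cloud server and its snapshots."""

    def __init__(self, files):
        self.files = dict(files)  # current state of the "disk"
        self.snapshots = {}       # snapshot name -> frozen copy of the disk

    def take_snapshot(self, name):
        # Capture the whole disk exactly as it is right now.
        self.snapshots[name] = copy.deepcopy(self.files)

    def restore_snapshot(self, name):
        # "Boot" from the chosen snapshot; the stored snapshot stays intact.
        self.files = copy.deepcopy(self.snapshots[name])


server = ToyCloudServer({"index.html": "<h1>Welcome</h1>", "site.db": "orders"})
server.take_snapshot("monday")

server.files.clear()               # simulate the angry ex-employee
server.restore_snapshot("monday")  # the one-click "restore this snapshot"
print(sorted(server.files))
```

The point of the toy is that the restore is a single call against a complete, ready-to-run copy, and the snapshot is still there afterwards if you need it again.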

The other benefits of cloud computing are very obvious, but the ability to recover quickly and completely from any type of failure is what really jumps out at me. Cloud computing is still in its infancy, but the writing is on the wall: the upside is crystal clear, and I predict that eventually everyone will hop on the cloud.

~ Till next time
