Thursday, August 19, 2010

Notes on Amazon EC2 Image Creation Failures for Windows Machines

Disclaimer:
  • These are my notes on the topic, not necessarily a complete solution. In fact, they probably contain some very inaccurate information (but you can help correct this via your comments).
  • This article pertains to EBS-backed Windows EC2 instances.
Once we have a stable machine running, one of the things we should all do is save an image of it, so that if for some reason we completely trash the running instance, we have something to go back to without spending a lot of time rebuilding the machine from scratch.

I have tried this with varying degrees of success, and information on the subject is scattered around the Amazon AWS forums and the web. The most common error is that image creation begins and then, after a long while, reports "Failed" without much explanation of why. Some people on the net report a success rate of only 40%. I was actually relieved to read this, because I thought I was the only one suffering such a high failure rate!
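When an image does fail, EC2 may record a StateReason (a code and a message) on the image, which is sometimes more informative than a bare "Failed". The sketch below polls an image with the AWS SDK for Python (boto3) rather than Elasticfox and raises that reason if the image leaves the "pending" state without becoming "available". It assumes `ec2` is a boto3 EC2 client (`boto3.client("ec2")`) or any object with a compatible `describe_images` method; the function name and the timing defaults are illustrative choices, not anything from Amazon.

```python
import time


def wait_for_image(ec2, image_id, poll_seconds=15, timeout_seconds=3600, sleep=time.sleep):
    """Poll an AMI until it is 'available'; raise with EC2's StateReason otherwise.

    `ec2` is assumed to be a boto3 EC2 client or a compatible stand-in.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    waited = 0
    while True:
        image = ec2.describe_images(ImageIds=[image_id])["Images"][0]
        state = image["State"]
        if state == "available":
            return state
        if state != "pending":
            # StateReason is where EC2 puts whatever explanation it has.
            reason = image.get("StateReason", {})
            raise RuntimeError(
                f"{image_id} ended in state {state!r}: "
                f"{reason.get('Code', '?')} {reason.get('Message', '')}".rstrip()
            )
        if waited >= timeout_seconds:
            raise TimeoutError(f"{image_id} still pending after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Usage would look like `wait_for_image(boto3.client("ec2"), "ami-...")`, with the real AMI id filled in.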

The Amazon documentation is not very clear to me about what to do and how to do it. It seems to assume you have gained more "under-the-hood" knowledge than I have, and as just a casual user of the system I don't expect to have to learn everything in order to use it.

In addition, as a "basic" user I don't get any direct support from Amazon. That requires Silver or Gold support at $100/month or much more, even in situations like these where the failure (or even just the lack of an explanation of what to do) is on their side.

I also highly recommend installing Elasticfox in your browser, since the following descriptions are based on how I do things in Elasticfox.

My Findings on Creating a Copy of a Running Instance
  • To do this in Elasticfox, go to the Instances list, right-click the instance, and select the Create Images (AMI) item.
  • Many people have reported that creating an image from a running instance stops and restarts the instance. This does happen. Worse yet, the instance stays stopped for the entire time the snapshot is being taken, which, depending on the instance, could be tens of minutes. You'll probably want to schedule downtime. I should have done this as soon as I created the initial instance, before it went into production. Elasticfox does have a menu item to create the image without rebooting the instance, but I have not yet tried it.
  • Some people say it has to be a running instance, but that doesn't seem to be true. I can create an image from a stopped instance as well, and I have done so successfully via Elasticfox. In fact, that has had a higher success rate for me than imaging a running instance.
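The same request can be made outside Elasticfox. Here is a minimal sketch using boto3's `create_image`, assuming `ec2` is a boto3 EC2 client; the helper name and the description text are hypothetical. Elasticfox's "don't reboot" menu item appears to correspond to the API's `NoReboot` flag: with `NoReboot=False` (the default) EC2 shuts the instance down while it snapshots and then starts it again, and with `NoReboot=True` there is no restart, but Amazon does not guarantee file system integrity in the resulting image. The state check reflects the observation above that both running and stopped instances can be imaged.

```python
def create_backup_image(ec2, instance_id, name, no_reboot=False):
    """Request an AMI from an EBS-backed instance and return its ImageId.

    `ec2` is assumed to be a boto3 EC2 client; `name` must be a unique AMI name.
    no_reboot=False: EC2 stops and restarts a running instance (plan for downtime).
    no_reboot=True: no restart, but the image may not be file-system consistent.
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    state = reservations[0]["Instances"][0]["State"]["Name"]
    # Only settled states; "stopping" or "pending" should be waited out first.
    if state not in ("running", "stopped"):
        raise ValueError(f"{instance_id} is {state!r}; wait until it is running or stopped")
    response = ec2.create_image(
        InstanceId=instance_id,
        Name=name,
        Description=f"Backup of {instance_id} (was {state})",
        NoReboot=no_reboot,
    )
    return response["ImageId"]
```

For a stopped instance the `NoReboot` setting makes no practical difference, since there is nothing running to restart.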

Some of the Issues I Have Discovered That Cause Failures
  • Again, copying the running instance will cause a restart, so be prepared for the downtime.
  • If you have already stopped the instance, imaging is at least more likely to succeed. There are a few caveats, though. At least in my case, stopping the instance disassociates its Elastic IP (I wish it didn't, at least for a short while), and the machine's local addresses change after restarting. This means that if you have systems pointing at a local address, for example SQL, that address has to be reconfigured in each application.
  • Processes like SQL or other heavy I/O or CPU consumers should be stopped. In fact, you should pare the running services down to the essentials (stopping things like SQL, agents, browsers, etc.) before taking the image copy. The one exception: do not stop the Amazon EC2 services that came with the original Windows image, whichever ones exist.
  • Having other volumes mounted also leads to failures. Stop all services and applications you don't need, dismount any extra "drives", and then make the image copy. This does not seem to be an issue when imaging a stopped instance.
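Since imaging a stopped instance worked better for me, the stop, image, start routine can be scripted, including putting the Elastic IP back. This is a hedged sketch with boto3, assuming `ec2` is a boto3 EC2 client; the function name is hypothetical, and the waiter names (`instance_stopped`, `image_available`, `instance_running`) are boto3's built-in EC2 waiters. The instance is restarted in a `finally` block so that a failed image does not leave the machine down. It only re-associates addresses that are actually missing after the restart, because whether an Elastic IP survives a stop depends on the environment. It does not handle dismounting drives or stopping services inside Windows, and it does not fix up the changed local addresses mentioned above.

```python
def image_via_stop(ec2, instance_id, name):
    """Stop the instance, create an AMI, start it again, and restore its Elastic IPs.

    `ec2` is assumed to be a boto3 EC2 client. The instance is restarted even
    if image creation fails. Returns the new AMI id.
    """
    by_instance = [{"Name": "instance-id", "Values": [instance_id]}]
    # Remember which Elastic IPs are attached before the stop.
    addresses = ec2.describe_addresses(Filters=by_instance)["Addresses"]

    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    try:
        image_id = ec2.create_image(InstanceId=instance_id, Name=name)["ImageId"]
        # Imaging can take tens of minutes; allow up to 15 s * 240 = 1 hour.
        ec2.get_waiter("image_available").wait(
            ImageIds=[image_id], WaiterConfig={"Delay": 15, "MaxAttempts": 240}
        )
    finally:
        ec2.start_instances(InstanceIds=[instance_id])
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
        still_attached = {
            a["PublicIp"] for a in ec2.describe_addresses(Filters=by_instance)["Addresses"]
        }
        for addr in addresses:
            if addr["PublicIp"] in still_attached:
                continue  # this address survived the stop/start
            if "AllocationId" in addr:  # VPC-style address
                ec2.associate_address(InstanceId=instance_id, AllocationId=addr["AllocationId"])
            else:  # EC2-Classic-style address
                ec2.associate_address(InstanceId=instance_id, PublicIp=addr["PublicIp"])
    return image_id
```

If the image fails, the waiter raises after the instance has been started and its addresses restored, so the error still surfaces but the machine is back up.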
Well, good luck! Whether you find this article correct or wrong, please let me know in the comments so that I can improve it.
