Azure Service Deployment Package Inefficiency (Bug) - Content Size Added Twice
I spent this past week migrating the companion site for my book (www.silverlightbusinessintelligence.com) to the Azure cloud (http://slbusinessintelligence.cloudapp.net/). The site has a lot of demos and source code associated with it, so the Azure Service deployment package (cspkg file) was pretty big.
Problem: I ran into the maximum allowed size for an Azure service package, which is 600 megs. (I find that odd, since you can easily get around the limitation by adding static content to the web role afterwards, either by remoting in or by doing a web deploy.) I deployed to an "Extra Small" instance, which is all I really need for a simple site like this, and you get a lot of storage with it (shown below). The reason Microsoft caps the deployment at 600 megs is that you're deploying to the E:\ logical drive, which has only 1 gig of space. However, as you can see, there is PLENTY of space on the d:\ logical drive. I know space is required for the OS, future patches, changes etc., but this limitation struck me as odd.
My three logical drives on the Azure Web Role:
Solution: One of the obvious solutions was to move the three source code zip files (which are over 100 megs) to Azure blob storage (which costs extra) and serve the links from there. My static source code zip files now live in Azure blob storage, for example: http://bartczernickiblogs.blob.core.windows.net/sourcecode/SilverlightBI_SourceCode_Version1.zip.
Another nice benefit of moving your static content to blob storage (and enabling public read access) is that pages can load faster, since the content comes from a different domain and browsers limit the number of simultaneous connections per domain. This helps, for example, if your web site has a lot of images to load.
Azure Deployment Package Inefficiency: After doing this, I was curious why the Azure service package size was so high, as I didn't think I had over 600 megs of code, XAP files or content. Granted, my three different versions of the source code are 137 megs in total, and the total Azure service package was 630 megs. However, after I removed the three source code zip files (figuring it would make the package 137 megs smaller), the new package came to a total of 351 megs, which is 279 megs smaller! That is roughly twice the size of the files I removed. So it looks like Azure deployment packaging is adding the content twice!?
Digging in... an Azure Service Deployment package (cspkg file) is just a zip file. Inside it you can find the main content file, which is stored inside a file with the cssx extension. The cssx file is just another zip file that is encrypted. To get an unencrypted one, add the _CSPACK_FORCE_NOENCRYPT_ environment variable, set it to true and build the package again (follow this blogpost in detail). I decrypted my package and, lo and behold, the entire site is replicated basically twice, in two folders, APPROOT and SITEROOT (shown below):
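The arithmetic behind that suspicion is easy to check. Here is a minimal sketch using only the sizes quoted above (the variable names are mine); it ignores compression effects and just compares the drop in package size with the size of the files removed.

```python
# Sizes quoted in the post, in megabytes.
source_zips_mb = 137      # three source code zip files, combined
package_before_mb = 630   # package with the zip files included
package_after_mb = 351    # package after removing the zip files

# How much the package actually shrank, and how many "copies" of the
# removed files that shrinkage corresponds to.
observed_drop_mb = package_before_mb - package_after_mb
copies_removed = observed_drop_mb / source_zips_mb

print(f"Observed drop: {observed_drop_mb} MB")
print(f"Copies of the zip files removed: {copies_removed:.2f}")
```

A ratio close to 2 rather than 1 is what you would expect if each file is stored in the package twice.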
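If you would rather not unzip everything by hand, the nested zip structure can be inspected with a short script. This is only a sketch under a few assumptions: the package was built unencrypted, the inner role file is a plain zip, and the role file can be recognized by its extension, which you pass in as `role_ext`. The function names are mine, not part of any Azure tooling.

```python
import io
import zipfile
from collections import defaultdict


def sizes_by_top_folder(zf):
    """Sum uncompressed file sizes, grouped by the first path segment."""
    totals = defaultdict(int)
    for info in zf.infolist():
        if info.is_dir():
            continue
        # Normalize separators, then take the top-level folder name.
        top = info.filename.replace("\\", "/").split("/", 1)[0].lower()
        totals[top] += info.file_size
    return dict(totals)


def inspect_package(package, role_ext):
    """Return {role file name: {top folder: bytes}} for an unencrypted package.

    `package` is a path or file-like object for the outer zip (the cspkg).
    `role_ext` is the extension of the inner role file (an assumption you
    supply; check what your own package contains).
    """
    report = {}
    with zipfile.ZipFile(package) as outer:
        for name in outer.namelist():
            if name.lower().endswith(role_ext.lower()):
                # Note: this reads the whole inner file into memory.
                inner_bytes = io.BytesIO(outer.read(name))
                with zipfile.ZipFile(inner_bytes) as inner:
                    report[name] = sizes_by_top_folder(inner)
    return report
```

If the totals for the two root folders come out nearly the same, that matches what the screenshots below show.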
AppRoot source code files of the Azure Deployment package:
SiteRoot source code files of the Azure Deployment package:
There is obviously a technical reason it was done this way, with the same files duplicated in both the APPROOT and SITEROOT folders. However, I see this as a big optimization opportunity for the next version of the Azure SDK. For example, perhaps Microsoft could derive APPROOT or SITEROOT from the other after the package is deployed to the web role. In my opinion, if Microsoft is going to limit the Azure deployment package size to 600 megs, static content should not be added twice!
My example is not very common, but there are good reasons to make this more efficient:
- Bandwidth costs: Packages that carry the content twice mean twice the upload, which costs both the Azure user and Microsoft money. For large packages like this one, in the new world of bandwidth caps, the upload also counts against my own cap.
- Deployment time: Uploading large packages like this takes time. It took almost 45 minutes from upload to instantiation of the package. Making the packages smaller and more efficient will speed up the deployment process. One of the biggest selling points of AppHarbor (which is a .NET cloud on Amazon EC2) is how much faster it is to deploy to the cloud... and it is faster by factors of 10x or more!
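To put a number on how much could be saved, you can compare the two extracted folders file by file. The sketch below is my own illustration, not anything the Azure SDK does: it hashes every file under two directory trees and adds up the bytes of files that sit at the same relative path with identical content. The directory arguments are whatever folders you extracted the package into.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map relative path -> (SHA-256 hex digest, size in bytes) under root."""
    root = Path(root)
    digests = {}
    for path in root.rglob("*"):
        if path.is_file():
            data = path.read_bytes()
            rel = path.relative_to(root).as_posix()
            digests[rel] = (hashlib.sha256(data).hexdigest(), len(data))
    return digests


def duplicated_bytes(root_a, root_b):
    """Total size of files present in both trees with identical content."""
    a = file_digests(root_a)
    b = file_digests(root_b)
    return sum(size for rel, (digest, size) in a.items()
               if b.get(rel) == (digest, size))
```

The result is an upper bound on what a smarter packager could save by storing shared files once; it reads each file fully into memory, which is fine for a one-off check.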