Challenges for Serverless Blue-Green Deployment
Serverless blue-green deployment is a nice practice. However, it doesn't come without challenges.
I blogged about serverless blue-green deployment some time ago, and I have used it a lot in practice with very promising results. But there are challenges as well.
First, let's briefly summarize the idea: we want to deploy a serverless service (a stack of resources) separately and test it in isolation from other resources. Typically, we can achieve this through multi-account deployment, where the stages are separated into accounts (e.g. dev-test-prod, standing for development, testing and production). We deploy changes into the dev stage, which is a kind of playground for development, and then into the test stage, where the tests are executed. If testing is successful, the service is deployed into the prod account.
This strategy works well until we take a look at the bill. Continuous development teaches us to have all the stages as similar as possible, ideally identical. If you're using expensive resources, your bill could be three (or more) times bigger.
Serverless blue-green deployment is a way to save some money: we deploy only the resources really needed for the test and reuse already deployed and tested dependencies within a single account.
Deploying any change results in a new stack of resources (blue) that exists in parallel with the previous version (green). When the tests succeed, the blue and green stacks are switched: the blue stack becomes green and the duplicated resources are removed.
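The cycle described above can be sketched as a small orchestration function. This is only an illustration, not the tooling used in practice: the four injected steps (`deployBlue`, `runTests`, `promoteBlue`, `removeStack`) are hypothetical names, and both the handling of a failed test run (discarding the blue stack) and the removal of the previous green stack after the switch are assumptions.

```javascript
// Hypothetical orchestration of one blue-green cycle. All four steps are
// injected, so the sketch assumes nothing about the actual deployment tool.
async function blueGreenCycle({ deployBlue, runTests, promoteBlue, removeStack }) {
  const blue = await deployBlue();          // new stack, parallel to green
  const passed = await runTests(blue);      // tests target the blue stack only
  if (!passed) {
    await removeStack(blue);                // assumption: discard a failed blue stack
    return { promoted: false };
  }
  const oldGreen = await promoteBlue(blue); // switch: blue becomes the new green
  await removeStack(oldGreen);              // assumption: the old green is the duplicate
  return { promoted: true };
}

// Usage with stand-in steps that only record what would happen.
const steps = [];
blueGreenCycle({
  deployBlue: async () => { steps.push('deploy blue'); return 'blue-stack'; },
  runTests: async (stack) => { steps.push(`test ${stack}`); return true; },
  promoteBlue: async (stack) => { steps.push(`promote ${stack}`); return 'old-green-stack'; },
  removeStack: async (stack) => { steps.push(`remove ${stack}`); },
}).then((result) => console.log(result, steps));
```

Injecting the steps keeps the order of operations visible without tying the sketch to a particular deployment tool.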
Challenge: Unwanted Interactions
Consider a situation where a blue stack is triggered by some green resource. For example, a transformer listens on a topic of upload events. Such "green" events could disturb our testing (our test emits "blue" test events) and devalue the test results. What's more, if the transformation results are saved in storage such as Amazon S3, the clean-up after testing can be hard (it is not possible to delete a non-empty bucket).
Solution: Conditional Flags
Of course, it is not difficult to find a workaround for each such use case (like emptying the whole bucket, or distinguishing "green" events from "blue" ones by an identifier, prefix or special attribute), but all of these bring too much knowledge into the tests: knowledge we don't want to have and deal with.
A solution is to use conditional flags in the stack template. For example, with AWS CloudFormation, it can look like this (YAML):

    Parameters:
      GreenDeployment:
        Type: String
        Description: "This is a green deployment"
        AllowedPattern: "true|false"
        ConstraintDescription: "A boolean value"
        Default: "false"

    Conditions:
      GreenDeploy: !Equals [ !Ref GreenDeployment, "true" ]

    Resources:
      # The subscription to the topic is created only when deployed as "green"
      UploadEventsSubscription:
        Type: AWS::SNS::Subscription
        Condition: GreenDeploy
        ...
For our example, we simply don't subscribe to the upload events topic when the stack is not being deployed as green.
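How the flag gets its value depends on the deployment tooling. As a minimal sketch, assuming the stack is created through the CloudFormation CreateStack API, the request input could be built like this. The function name `createStackInput`, the `color` argument and the stack name are made up for illustration; only the `GreenDeployment` parameter comes from the template above.

```javascript
// Build the input for a CloudFormation CreateStack call.
// `color` is an illustrative argument: 'blue' for a stack under test,
// 'green' for a stack deployed as the live version.
function createStackInput(stackName, templateBody, color) {
  if (color !== 'blue' && color !== 'green') {
    throw new Error(`unknown deployment color: ${color}`);
  }
  return {
    StackName: stackName,
    TemplateBody: templateBody,
    Parameters: [
      {
        ParameterKey: 'GreenDeployment',
        // CloudFormation parameter values are strings, hence 'true' / 'false'.
        ParameterValue: String(color === 'green'),
      },
    ],
  };
}

// Hypothetical stack name and a placeholder template body.
const blueInput = createStackInput('transformer-blue', 'template body goes here', 'blue');
console.log(JSON.stringify(blueInput.Parameters));
```

The tests themselves never see this parameter; only the deployment step does.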
Challenge: Expensive and Slow Resources
Even when created only for the duration of testing, some resources can be really expensive or extremely slow to create, which slows down testing and the whole deployment process. A typical example of both is Elasticsearch in AWS: creating such a service is expensive and very slow. So, how do we deal with this?
Solution: Green Resources
In many cases using green resources (index clusters, databases) does no harm, especially if we clean up afterwards.
We can use a condition to skip creating the expensive resource and inject a green dependency instead (AWS CloudFormation):
ResourceRef: !If [ GreenDeploy, !Ref MyExpensiveResource, !Ref GreenResourceRef ]
Solution: Fake Resources
In testing we use test-doubles for expensive or slow services, and we can do the same here:
    # BlueDeploy is assumed to be defined as the negation of GreenDeploy:
    #   BlueDeploy: !Not [ !Condition GreenDeploy ]
    ExpensiveFn:
      Condition: GreenDeploy
      Type: AWS::Lambda::Function
      Properties:
        FunctionName: expensive-function
        Runtime: nodejs8.10
        Handler: index.handler
        Role: !GetAtt ExpensiveFnRole.Arn
        Code:
          ZipFile: |
            exports.handler = async event => {
              // do something very expensive and slow
              // ...
              return 'success'
            }

    # Test double, created only for a blue deployment
    FakeFn:
      Condition: BlueDeploy
      Type: AWS::Lambda::Function
      Properties:
        FunctionName: fake-function
        Runtime: nodejs8.10
        Handler: index.handler
        Role: !GetAtt FakeFnRole.Arn
        Code:
          ZipFile: |
            exports.handler = async event => 'success'

    MyFn:
      Type: AWS::Lambda::Function
      Properties:
        FunctionName: my-function
        Runtime: nodejs8.10
        Handler: index.handler
        Role: !GetAtt MyFnRole.Arn
        Environment:
          Variables:
            # Name of the real function when green, of the fake one otherwise
            PROCESSING_FN: !If [ GreenDeploy, !Ref ExpensiveFn, !Ref FakeFn ]
        Code:
          ZipFile: |
            exports.handler = async event => {
              // call the processing function
              // ...
            }
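How `my-function` might use the injected `PROCESSING_FN` variable can be sketched as follows. This is an assumption about the elided handler body, not its actual code: `makeHandler` and the `invoke` argument are hypothetical. In a deployed function, `invoke` would wrap an AWS SDK Lambda client; here it is a local stand-in so the sketch runs on its own.

```javascript
// Factory that builds a handler around an injectable invoker.
// `invoke(functionName, payload)` is a hypothetical abstraction over the
// actual Lambda invocation, which is not shown here.
function makeHandler(invoke, env = process.env) {
  return async (event) => {
    // The template sets PROCESSING_FN to the real or the fake function name.
    const functionName = env.PROCESSING_FN;
    if (!functionName) {
      throw new Error('PROCESSING_FN is not set');
    }
    return invoke(functionName, event);
  };
}

// Local stand-in for the invoker, mirroring the fake function's response.
const localInvoke = async (functionName, payload) => 'success';
const handler = makeHandler(localInvoke, { PROCESSING_FN: 'fake-function' });
handler({ id: 1 }).then((result) => console.log(result));
```

The handler never knows whether it talks to the expensive function or to the fake one; that decision stays in the template.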
All of the solutions listed above are implemented in the stack preparation phase, so the tests stay completely agnostic, without any additional knowledge.
Happy deploying!