In Parts 1 and 2 of this series we shared some of the technical challenges we faced when building a Serverless car insurance platform. In this post we are going to look at some ways in which you can help guide your team through their transition to Serverless.
In the 1970s and 80s, the expression “nobody ever got fired for buying IBM” was coined to illustrate IBM's utter dominance of the IT industry and how executives, who were playing it safe, kept buying IBM. Rather than look at the competition, which could potentially save them time and money in the long run, they opted for the perceived safe choice.
I have seen a similar trend with Serverless. Many teams are reluctant to adopt a Serverless-first approach, despite the benefits being clear and proven:
- No infrastructure provisioning or maintenance
- Automatic scaling
- Pay for what you use
- Highly available and secure
There are potentially many reasons for this lack of adoption, but what I want to focus on is fear.
Fear when it comes to the developer experience, fear when it comes to modelling databases, fear when it comes to cost and scaling.
When I started on the Stroll project I was comfortable with Lambda functions and message queues, but honestly I was afraid of DynamoDB. When you work with DynamoDB you are told that “you have to know all your access patterns up front”. As an engineer you know how easily requirements can change, so needing to know your access patterns in advance seemed like a huge risk.
I wanted to see if this fear was just something I experienced, or whether it had affected other teams as well. So I asked a few of my fellow Serverless AWS Community Builders a question:
When you first started your Serverless journey what were you, your team or your company afraid of, and how did you overcome it?
And I got some great answers:
We were afraid that getting up to speed with the Serverless paradigm would take ages and that we'd end up investing a lot over a long time before we could reap any benefits. We overcame this by breaking things down and achieving small wins. Getting the first Lambda in production only took a couple of days and felt great! From there, we kept on adding the next small building block, learning what we needed to know as we went.
On the company that I was working at that time the main fears/blockers were: Questioning if Serverless would provide value to our systems and customers. The misconceptions: Serverless is more expensive than X, Serverless is harder to test than X or Serverless is more complex than X. Education, not only in Serverless but in distributed systems in general.
This was late 2016/early 2017: vendor lock-in (got over it quickly though), that Serverless wouldn't gain market share, and that we'd end up with an architecture with no community backing which we'd struggle hiring for.
If I circle back to my own fears around DynamoDB, the solution to that problem was education. I educated myself on DynamoDB (Alex DeBrie's book is a great starting point), eliminated the fears that I had, and came to the realisation that planning, discovering and knowing your access patterns upfront is actually a good thing that leads to a better architecture overall.
One of the things that was key to our success was that we had a confident Serverless team. This confidence wasn't something we started out with; we had to eliminate the fears and concerns that the team had along the way.
One of the best things about our jobs is that we start with an empty git repository and turn it into a business for someone. That experience never gets old. What we have learnt is that you can do this better and faster with Serverless.
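To make “knowing your access patterns upfront” a little more concrete, here is a minimal sketch in Python. The entities (customers and policies), the `PK`/`SK` attribute names and the key formats are all hypothetical and are not taken from the Stroll data model. The `query` function is an in-memory stand-in that only mimics the shape of a DynamoDB Query: an exact match on the partition key, a begins-with condition on the sort key, and results ordered by sort key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyCondition:
    pk: str         # exact partition key value
    sk_prefix: str  # sort key must begin with this prefix


# Each access pattern is written down first, then mapped to a key condition.
def customer_profile(customer_id: str) -> KeyCondition:
    """Access pattern 1 (hypothetical): get a customer's profile."""
    return KeyCondition(pk=f"CUSTOMER#{customer_id}", sk_prefix="PROFILE")


def customer_policies(customer_id: str) -> KeyCondition:
    """Access pattern 2 (hypothetical): list all policies for a customer."""
    return KeyCondition(pk=f"CUSTOMER#{customer_id}", sk_prefix="POLICY#")


def query(items: list[dict], cond: KeyCondition) -> list[dict]:
    """In-memory stand-in for a Query: PK equality, SK prefix match, sorted by SK."""
    matches = [
        item for item in items
        if item["PK"] == cond.pk and item["SK"].startswith(cond.sk_prefix)
    ]
    return sorted(matches, key=lambda item: item["SK"])


# Made-up sample items, all stored in one table.
ITEMS = [
    {"PK": "CUSTOMER#42", "SK": "PROFILE", "name": "Ada"},
    {"PK": "CUSTOMER#42", "SK": "POLICY#2023-001", "status": "expired"},
    {"PK": "CUSTOMER#42", "SK": "POLICY#2024-007", "status": "active"},
    {"PK": "CUSTOMER#99", "SK": "POLICY#2024-003", "status": "active"},
]
```

Calling `query(ITEMS, customer_policies("42"))` returns the two policies for customer 42 and nothing else. The point of the exercise is that every question the application needs to ask has a key design that answers it before any table is created.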
Encouraging the Serverless mindset
One of the first things we had to learn as a team was this notion of a Serverless mindset.
And if you’re not really in the Serverless world (drinking the Serverless Kool-Aid, as one of my colleagues would say), I'm sure that “Serverless mindset” isn't even a term you would be familiar with. I think the origin of this comes from a blog article written by Ben Kehoe entitled “Serverless is a State of Mind”. What it really boils down to is that if you want to succeed with Serverless technology you need to think differently about how and why you build things, and focus on delivering business value more than on the underlying technology you are using.
Encouraging this Serverless mindset is key to building confidence.
If you really want to encourage this mindset, it can't start at an individual level; it has to come from an organisational level.
At Instil, this idea of adopting a Serverless mindset, or going Serverless-first as some may say, started from the top. We have an engineering strategy which lays out a number of things that we want to focus on from an engineering perspective, and in the “cloud” section this was the first point:
In the last few years, Serverless has exploded and dramatically changed how applications are built - no longer do we need to worry about capacity planning, patching servers or wasting money on under utilised resources, we now have (virtually) infinite computing power at our disposal ready to scale up massively in a fraction of a second. We want to embrace Serverless but this requires a cultural change in how we approach building software. We all understand that code is a liability and in the Serverless world, our focus shifts to connecting events and services through configuration while reducing the amount of code we write. Right now, the Functionless (aka Serverfull) movement is also gaining popularity driven in part by AWS enabling application developers to ditch Lambda functions in favour of direct integrations between managed services.
That’s a confidence booster right there: if leadership is encouraging a Serverless-first approach, that should help teams take that first step. But it's one thing to say that you want to take this approach, and a very different thing to put it into action.
The Serverless mindset was definitely something that our team struggled with. At the very beginning we were treating Lambda functions as just another way to get some compute in the cloud. As developers we focused too much on what Serverless meant for us, and I think when you take that narrow-minded approach you can come up short on good reasons for choosing Serverless:
- Is it easier to deploy? Perhaps…
- Is it easier to test? Not if you try to do it locally.
- Is it easier to code? Only if you educate yourself first.
Nothing really stands out from that list.
In fact I know that early on in the project there were people who doubted whether Serverless was a good choice. The team needed to change their mindset: instead of focusing solely on the technology and how it made their lives as developers easier, they needed to look at the bigger picture. What does this mean for Instil, and what does it mean for our customer?
One of the first steps on this journey is enlightening developers that writing less code is better.
Enlightening developers that writing less code is better
One of the key steps to instilling confidence in our Serverless team was to learn to write the code that really matters. In Part 2 we shared how our team came to this realisation.
And when we think about how we encourage the Serverless mindset in our teams, we need to have some key people to drive that forward. The lead engineers and architects of the team need to have the confidence that Serverless is the right choice. You can't expect the rest of the team to succeed if the people making the decisions have their doubts. I certainly had my doubts, but my confidence came from educating myself. When I first joined Instil, Garth, our Director of Training, shared a quote:
Being a software engineer is just agreeing to do homework for the rest of your life.
So if you are a lead engineer and you want to encourage the Serverless mindset in your team, you first have to educate yourself and build up your own confidence.
At Instil we strive for engineering excellence, and that kind of culture attracts people who love to write code. So how do you convince a team of die-hard programmers to write less code? As developers we're told that “code is a liability”. But let's be honest with ourselves: do we really believe that? Or do we say to ourselves, their code is a liability, but mine? Mine is perfect!
We wanted to encourage the engineers to really own the problems they were trying to solve and to do that we needed to create a safe space for them to experiment, propose new ideas, get feedback from their peers and then be given the opportunity to actually implement the changes.
Now I need to make a disclaimer: I’m a huge Apple fanboy, and secretly my dream job is to be an independent iOS developer living off my App Store profits. If you’re familiar with iOS development you'll know that the main language you can use is Swift. For new Swift language features there is a process called Swift Evolution. It’s quite a big template with a few headers that enables people to propose new features. Now I’m not saying that Apple invented this. I know that Kotlin has a similar process and I’m sure other languages do too, but remember, Apple fanboy, so history and facts are not important. This is where I drew inspiration from.
So we created a process called Stroll Evolution. A colleague summarised it like this:
Stroll Evolution, in this sense, includes a proposed solution to achieve some goal (motivation), with information about the proposed design, what effects there would be, as well as alternatives considered. It’s always good when you can answer “but why didn’t you do it some other way?” At least even for yourself in the future.
The template itself is simple, with the following headers:
- Introduction
- Motivation
- Proposed solution
- Detailed design
- Effect on Web App
- Effect on Mobile App
- Alternatives Considered
- Decision
We created a way for engineers to own a problem and help find a solution. This process helped us to:
- Create our first DynamoDB data model, following the steps laid out in Alex DeBrie’s DynamoDB book to work out our access patterns and a primary key design that supports them.
- Plan out what our first Step Functions workflow would look like and why we wanted to use Step Functions in the first place.
- Decide how we process insurance policy documents and what the best AWS messaging service was for solving that problem.
So not only did it help us solve project-specific problems, it gave team members a way to educate themselves on the Serverless offerings of AWS. It let them see for themselves that they can either spend weeks building a thing, or just connect services together and focus on actually delivering business value instead. Now I’m not saying that this exact process is going to work for your team; I’m just saying that there needs to be a way for engineers to own the problems they are trying to solve.
On our team, anyone can come up with a Stroll Evolution, from senior engineers to apprentice software engineers. And with Serverless, engineers need to own more than they might in other architectures. A Serverless engineer can't just write some code and chuck it over the wall for someone else to deploy; they need to understand what that looks like and own the end-to-end solution. In my opinion that's a good thing. But if we want engineers to own the end-to-end solution, they also need to take ownership of deploying it to production.
Enabling production deployments with confidence
There’s one final point I want to make about building confidence in your Serverless teams, and that’s around production deployments, specifically verifying that things are working as expected. We wanted everyone on the team to have the confidence to kick off a production deployment. But for a period of time this wasn't the case. Deployments were left for senior engineers to kick off, which increased our lead time for delivering new features and reduced our deployment frequency.
No one actually came out and said “only this group of people can manage deployments”; it was just something that happened naturally because people were worried about breaking something. The underlying reason was a gap in our testing approach.
Some of you may be familiar with the testing triangle, where you might have E2E tests at the top, integration tests in the middle and unit tests at the bottom.
On Stroll, however, we were missing the bit in the middle. We had plenty of unit tests; in fact I would say we had too many. Also, because we encouraged developers to use things like direct service integrations and Step Functions, unit testing became less useful. We had some E2E tests that used the UI to drive the test, but there were no integration tests.
The reason being that with Serverless, integration testing is hard.
You have two choices: try to run everything locally by simulating AWS on your developer machine, or deploy to AWS and test in the cloud.
But this was a gap in our verification process that we needed to plug. So (using a Stroll Evolution) we put together a plan to write some integration tests and from that plan we came up with a definition of what integration testing meant for the project:
An integration test (on this project) is defined as one that tests integrations with AWS services, e.g. DynamoDB, S3, SQS. But does not test integrations with third parties, these should be mocked out instead.
The aim of these tests was firstly to verify that the workload is functioning correctly, but also that you have the correct IAM permissions and that the necessary resources have deployed correctly.
If you’re deploying a workload to AWS it is really important for everyone involved in the project to be confident that it is running correctly. We have found that writing tests for each step of the process has been immensely helpful in giving us this confidence, and as a result production deployments are thankfully not left to an individual but are instead the responsibility of the entire team.
In our next post a colleague will be sharing how we ensured our Step Function State Machines were executing correctly.
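As a rough illustration of that definition, here is a minimal sketch in Python of how such a test might be shaped. It is not the Stroll test suite. `issue_policy`, the pricing client and the item layout are hypothetical names made up for this example. `InMemoryTable` is only a stand-in so the sketch runs without AWS; it mirrors the `put_item(Item=...)` and `get_item(Key=...)` calls of a boto3 DynamoDB `Table` resource. In a real integration test you would pass in the deployed table instead, and that is what would exercise IAM permissions and resource deployment.

```python
from unittest.mock import Mock


def issue_policy(policy_id, table, pricing_client):
    """Hypothetical workload: price a policy via a third party, then persist it."""
    quote = pricing_client.get_quote(policy_id)  # third-party call, mocked in tests
    key = {"PK": f"POLICY#{policy_id}"}
    table.put_item(Item={**key, "premium": quote["premium"]})
    # Read the item back so the test verifies what was actually stored.
    return table.get_item(Key=key)["Item"]


class InMemoryTable:
    """Stand-in for the deployed table (assumption: keyed on PK only)."""

    def __init__(self):
        self._items = {}

    def put_item(self, Item):
        self._items[Item["PK"]] = dict(Item)

    def get_item(self, Key):
        item = self._items.get(Key["PK"])
        return {"Item": item} if item is not None else {}


def test_issue_policy_persists_quoted_premium():
    table = InMemoryTable()  # swap for the real deployed table when testing in the cloud
    pricing = Mock()         # the third party is mocked out, per the definition above
    pricing.get_quote.return_value = {"premium": 350}

    item = issue_policy("abc", table, pricing)

    assert item == {"PK": "POLICY#abc", "premium": 350}
    pricing.get_quote.assert_called_once_with("abc")
```

The design choice worth noting is that the table and the third-party client are both passed in. That keeps the third party easy to mock, while the AWS side can be the real deployed resource when the test runs against the cloud.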