Infrastructure as Code is the Best
Infrastructure as code (IaC) is the practice of managing hardware and service provisioning using code. This includes provisioning compute hardware, creating databases, defining pipelines, managing DNS, choosing service types and allocations for managed services, orchestrating these services, and more!
In my opinion, it is the most important innovation in cloud service management, and I will never go back to clicking through the web consoles of cloud providers like AWS. Here is why...
Types of IaC Tooling
There are a few different types of infrastructure as code that I think about when I hear the term. They all share the concept of stacks composed of constructs. A stack is a deployment concept: a group of related services that are meant to be deployed together. A construct can be as simple as the definition of a single service and its configuration, or it can be more complex, representing a group of services and configurations that is meant to be used as a whole unit and that exposes a simpler set of parameters to the developer than the lower-level constructs it is built from. When it comes to implementing these concepts, I think about two types of IaC:
Configuration-based tooling
Configuration-based tooling includes tools like CloudFormation (CFN) and Terraform. Strictly speaking, these are not code, since they describe infrastructure declaratively: CFN templates are written in JSON or YAML, and Terraform configurations are written in HashiCorp Configuration Language (HCL) or JSON. CloudFormation is the AWS service that takes CFN templates and creates all of the AWS resources they describe. Terraform, as another example, is a tool from HashiCorp that provides the same capability across cloud providers (AWS, Azure, GCP, Oracle, etc.).
These configuration tools serve as the baseline that code-based tools are built on top of. They are the primitives that the provisioning engine (CloudFormation itself, or Terraform and its provider plugins) turns into the infrastructure and service definitions the developers want.
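To make "describes infrastructure" concrete, here is a sketch of a minimal CloudFormation template. It is built as a Python dict only so it can be printed as JSON; normally you would write this JSON or YAML file by hand. The logical ID `AssetsBucket` and the versioning setting are illustrative choices, not anything specific to a real project.

```python
import json

# A minimal CloudFormation template: one S3 bucket with versioning turned on.
# "AssetsBucket" is a made-up logical ID used only for this example.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "A single versioned S3 bucket",
    "Resources": {
        "AssetsBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        },
    },
}

# This JSON text is what CloudFormation would receive; every value is a plain string.
template_json = json.dumps(template, indent=2)
print(template_json)
```

Note that nothing checks these strings locally: misspelling `AWS::S3::Bucket` or `Enabled` would only surface when CloudFormation rejects the template.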
Cloud Development Kits
The more complex (in function) tooling is called Cloud Development Kits (CDKs). These are type-safe code libraries that let the developer use a real programming language like TypeScript or Python to produce the appropriate configuration files (from above). Beyond the library itself, they come with CLI and cloud-based tooling that takes care of producing the underlying configuration files (like CFN YAML) along with packaging code, uploading runtime code artifacts, uploading static assets for websites, and anything else needed to actually deploy your application.
Since they provide typing (assuming you use a language that leverages it), they drastically reduce your chance of error compared to pure configuration-based tooling: your IDE helps you write the code, and the build-time tooling that produces the configuration stops many types of errors before anything is uploaded. As a simple example, typos disappear because you can use the library's enums to set the right value instead of typing strings. In more complex cases, a service's configuration includes mutually exclusive options across parameters. Here the build-time tooling can catch the conflict and throw an error with a helpful message before you upload the template, rather than leaving you to debug the failed deployment against the in-depth docs. This build-time checking, together with the ability to read through the CDK library's source, is extremely helpful for speeding up IaC development.
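The snippet below is not the AWS CDK API. It is a hypothetical plain-Python helper (`sqs_queue` and `QueueType` are made-up names) that mimics the two build-time checks described above for an `AWS::SQS::Queue`: an enum instead of free-form strings, and an error for an invalid option combination. The constraint it enforces is real: SQS only supports content-based deduplication on FIFO queues, and FIFO queue names must end in `.fifo`.

```python
from enum import Enum


class QueueType(Enum):
    """Allowed queue kinds; a typo like QueueType.FIFOO raises AttributeError instead of producing a bad template."""
    STANDARD = "standard"
    FIFO = "fifo"


def sqs_queue(name: str, queue_type: QueueType, content_based_dedup: bool = False) -> dict:
    """Hypothetical helper: validate options, then emit an AWS::SQS::Queue resource."""
    if not isinstance(queue_type, QueueType):
        raise TypeError(f"queue_type must be a QueueType, got {queue_type!r}")
    if content_based_dedup and queue_type is not QueueType.FIFO:
        # SQS only supports content-based deduplication on FIFO queues.
        raise ValueError("content_based_dedup requires QueueType.FIFO")
    properties: dict = {}
    if queue_type is QueueType.FIFO:
        # FIFO queue names must end in ".fifo", so append it if missing.
        properties["QueueName"] = name if name.endswith(".fifo") else f"{name}.fifo"
        properties["FifoQueue"] = True
        if content_based_dedup:
            properties["ContentBasedDeduplication"] = True
    else:
        properties["QueueName"] = name
    return {"Type": "AWS::SQS::Queue", "Properties": properties}


orders = sqs_queue("orders", QueueType.FIFO, content_based_dedup=True)

try:
    sqs_queue("events", QueueType.STANDARD, content_based_dedup=True)
except ValueError as err:
    # The mistake is caught locally, before any template is uploaded.
    print(f"synth failed: {err}")
```

A real CDK does this kind of checking inside its construct classes, with far more rules; the point here is only where the error surfaces: on your machine, at build time.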
Additionally, since it is code, you can use things like for-loops to create repeated patterns of infrastructure. In YAML, you have to copy/paste and then remember to update every copy whenever you make a sweeping change, which breaks the DRY principle. You can even define your own constructs by combining different primitives while exposing only the parameters that matter for that specific pattern, which lets a whole organization reuse vetted building blocks.
My personal preference is to use CDKs since they give you everything you need to manage your infrastructure, reduce development mistakes through type enforcement and build-time errors, and make it easy to create your own constructs.
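Here is a rough sketch of both ideas in plain Python that emits CloudFormation JSON (again, not a CDK library). `queue_with_dead_letter_queue` is a hypothetical custom construct: callers pick only a logical ID and a retry limit, and the wiring between a queue and its dead-letter queue is written once. A for-loop then stamps out three copies, which in raw YAML would be three hand-maintained blocks. The queue names are invented for the example.

```python
import json


def queue_with_dead_letter_queue(logical_id: str, max_receive_count: int = 3) -> dict:
    """Hypothetical custom construct: one SQS queue plus its dead-letter queue.

    Only the name and the retry limit are exposed; the redrive wiring is fixed here.
    """
    dlq_id = f"{logical_id}DeadLetter"
    return {
        dlq_id: {
            "Type": "AWS::SQS::Queue",
            # Keep failed messages for 14 days (1,209,600 s), the SQS maximum.
            "Properties": {"MessageRetentionPeriod": 1209600},
        },
        logical_id: {
            "Type": "AWS::SQS::Queue",
            "Properties": {
                "RedrivePolicy": {
                    "deadLetterTargetArn": {"Fn::GetAtt": [dlq_id, "Arn"]},
                    "maxReceiveCount": max_receive_count,
                }
            },
        },
    }


resources: dict = {}
# One loop replaces three near-identical blocks of YAML.
for name in ["Orders", "Invoices", "Shipments"]:
    resources.update(queue_with_dead_letter_queue(f"{name}Queue"))

template = {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}
print(json.dumps(template, indent=2))
```

Changing the retry policy for every queue is now a one-line edit to the construct rather than a find-and-replace across the template.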
Benefits of IaC
Infrastructure as code is code and has the benefits of code. This means:
- Version control infrastructure: Check in infrastructure changes with commit messages that describe the purpose of each diff. You can even change infrastructure alongside application code and deploy both together in pipelines.
- Review infrastructure: Since it is code, changes can be reviewed as diffs by a wider audience before deployment and can go through your team's code review process, with all of its benefits.
- Duplicate stacks easily: It is easy to deploy a beta or personal test version of a stack. Code lets you pass other identifiers as arguments to create new test stacks, so each developer can iterate independently on their own replica. Test and integration environments are just as easy to create.
- Security enforcement mechanism: Your company's security practices can be applied across the organization very easily. You can enforce that certain infrastructure is created in certain ways through custom constructs that define infrastructure according to your practices.
- Build a library of examples: You can build a library of example architectures that makes it faster to start new projects. Since it is code, you can copy/paste and modify an existing architecture for a new service. Additionally, since these tools have been around for a long time, you can draw on the many examples available on the internet!
- More easily manage complexity: Architecture can get very complex. Infrastructure as code gives you a mechanism to break it down (through stacks and constructs). It also gives new developers and auditors a place to ramp up on what actually exists in the otherwise nebulous "cloud".
- GUIs change: No matter how good your infrastructure creation runbook is, AWS, Google Cloud, etc., all change their GUIs over time, so your screenshots, and maybe even the configuration options they show, go out of date. Code libraries, on the other hand, usually mark features as deprecated before removing them, which tells you when a service's configuration is changing and gives you time to adapt instead of relying on outdated runbooks for infrastructure deployments.
- AI and IaC: Since infrastructure as code is...code, you can use AI coding agents to help write it! In my experience, AI is quite good at writing the majority of the IaC I have thrown at it, and even AI tools used ad hoc, outside of an agent, are great at writing infrastructure code. Once you have a baseline, you won't need to change it often. One word of caution, though: review the code and make sure you fully understand what it is doing so you do not end up with a huge cloud bill. And set up alerts in your billing tool to notify you if you exceed your spending thresholds.
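The "duplicate stacks" and "security enforcement" points combine naturally. The sketch below is plain Python emitting CloudFormation JSON, with hypothetical names throughout (`secure_bucket`, `web_app_stack`, the `WebApp-` prefix, and the developer alias `alice`). A shared construct always turns on default encryption and blocks public access, and a stage argument produces an isolated copy of the same stack for prod, beta, or an individual developer.

```python
def secure_bucket() -> dict:
    """Hypothetical organization-wide construct: every bucket is encrypted and private.

    Teams call this instead of declaring AWS::S3::Bucket directly,
    so the security settings cannot be forgotten.
    """
    return {
        "Type": "AWS::S3::Bucket",
        "Properties": {
            "BucketEncryption": {
                "ServerSideEncryptionConfiguration": [
                    {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                ]
            },
            "PublicAccessBlockConfiguration": {
                "BlockPublicAcls": True,
                "BlockPublicPolicy": True,
                "IgnorePublicAcls": True,
                "RestrictPublicBuckets": True,
            },
        },
    }


def web_app_stack(stage: str) -> tuple[str, dict]:
    """Return (stack name, template); `stage` might be "prod", "beta", or a developer alias."""
    stack_name = f"WebApp-{stage}"
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {"AssetsBucket": secure_bucket()},
        # Ref on an S3 bucket resolves to the bucket's name.
        "Outputs": {"BucketName": {"Value": {"Ref": "AssetsBucket"}}},
    }
    return stack_name, template


# The same code yields isolated copies: one for prod, one for beta, one per developer.
stacks = dict(web_app_stack(stage) for stage in ["prod", "beta", "alice"])
print(sorted(stacks))  # ['WebApp-alice', 'WebApp-beta', 'WebApp-prod']
```

Because each copy gets its own stack name, deleting a developer's stack never touches prod, and every copy inherits the same security defaults.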
Having the ability to track, review, duplicate, and more easily understand infrastructure by reading code is extremely powerful. And if you use pipelines or a CDK, you can track (in version control) and deploy runtime code alongside the infrastructure it runs on, which is a very powerful tool for managing complexity in cloud services. With this form of automation, you cannot forget to set some vital piece of configuration, as can happen when you follow manual runbook instructions to set up a service or make a configuration change during deployment. And the ability to build a library of useful (or security-critical) constructs is extremely helpful in an enterprise organization.
Downsides of Using IaC
So, why not use infrastructure as code? As an experienced cloud software developer, I find it hard to advocate against IaC given the variety of benefits listed above. That said, there are some downsides:
Learning Curve
I think the #1 reason you might not want to use IaC is its learning curve, which is quite steep. Architecture as a whole can be complicated, since cloud providers now offer seemingly endless options for every type of service. Because you are writing code to codify these choices, understanding the options available to you is a learning curve all on its own.
On top of this, the CDK workflow is a departure from other development and testing workflows. The same can be said for the workflow of a custom-built solution on top of the template-based tools. You now have more options for how to build and test software (locally, in personal test stacks, and in integration/prod environments), each with its own state. While IaC gives you the tools to enable this, your own project's needs will vary and may require other tooling that is not easily compatible with a CDK system.
Deployment Hardships
This is sort of an extension of the learning curve above, but it is a problem specific to IaC. Mechanically, deploying via IaC is easy: you either generate or write a couple of configuration files, upload them and their artifacts to your cloud provider, and it creates the resources! In practice, it is not so trivial.
Remember that stacks are groups of resources that are meant to be deployed together, and within a stack, these resources might be tightly coupled. Maybe you later find a reason to decouple them. Once a resource has been deployed, you cannot simply move it between stacks as easily as you can move the code without running into issues. One common example: if you manage DNS through your CDK code, you will want to do this in its own stack and export the certificate and DNS records from that stack. That way, any number of other stacks can use them without being coupled to each other, relying only on the DNS stack to exist. So you can have stack #1 that serves your website and stack #2 that serves your API, both relying on the DNS record and certificate, and either one can be deleted or arbitrarily changed without affecting the other or deleting your certificate. If one of these stacks owned the DNS record instead, it could not be deleted without breaking the other stack, which would now depend on that deployment unit. So you have to pay a lot of attention to how you couple resources together, and if you are like me, you will make tons of mistakes, especially while you are learning these tools.
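In raw CloudFormation terms, the DNS split described above relies on stack exports: the DNS stack publishes values under an export name in its `Outputs`, and other stacks read them with `Fn::ImportValue`. The sketch below builds those templates as Python dicts. The export names, domain, and CNAME targets are placeholders, and a real setup would likely use alias records rather than CNAMEs; the certificate ARN would be imported by consumers the same way as the zone ID.

```python
ZONE_EXPORT = "DnsStack-HostedZoneId"   # hypothetical export names
CERT_EXPORT = "DnsStack-CertificateArn"


def dns_stack(domain: str) -> dict:
    """Owns the hosted zone and certificate and exports both for other stacks."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "Zone": {"Type": "AWS::Route53::HostedZone", "Properties": {"Name": domain}},
            "Certificate": {
                "Type": "AWS::CertificateManager::Certificate",
                "Properties": {
                    "DomainName": domain,
                    "ValidationMethod": "DNS",
                    # Lets CloudFormation create the DNS validation record in the zone.
                    "DomainValidationOptions": [
                        {"DomainName": domain, "HostedZoneId": {"Ref": "Zone"}}
                    ],
                },
            },
        },
        "Outputs": {
            "HostedZoneId": {"Value": {"Ref": "Zone"}, "Export": {"Name": ZONE_EXPORT}},
            "CertificateArn": {"Value": {"Ref": "Certificate"}, "Export": {"Name": CERT_EXPORT}},
        },
    }


def record_stack(subdomain: str, domain: str, target: str) -> dict:
    """A website or API stack: it only imports the zone, so it owns nothing shared."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "Record": {
                "Type": "AWS::Route53::RecordSet",
                "Properties": {
                    "HostedZoneId": {"Fn::ImportValue": ZONE_EXPORT},
                    "Name": f"{subdomain}.{domain}",
                    "Type": "CNAME",
                    "TTL": "300",
                    "ResourceRecords": [target],
                },
            }
        },
    }


dns = dns_stack("example.com")
website = record_stack("www", "example.com", "d111111abcdef8.cloudfront.net")  # placeholder target
api = record_stack("api", "example.com", "abc123.execute-api.us-east-1.amazonaws.com")  # placeholder target
```

One consequence worth knowing: CloudFormation will not let you delete the DNS stack, or change an exported value, while another stack still imports it. That is exactly the dependency direction you want, since the website and API stacks stay free to come and go.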
Adoption of Tooling
CDKs may not be easy to adopt in your enterprise. Maybe you have a mix of cloud and self-hosted infrastructure, or you use some services that have no analogous concepts in CFN. My advice in these situations is to adopt what you can. My very first infrastructure as code project involved a small Python job that ran hourly and processed some data from a monolithic application that was not using traditional AWS tooling. Partial adoption like this will not let you realize all of the benefits of IaC, but at least you can start somewhere! Depending on how complex your situation is, even this might be hard.
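As an illustration of that kind of partial adoption, here is one way a small hourly job could be described in CloudFormation while the rest of the system stays as it is. This assumes the job is packaged as an AWS Lambda function (the original setup is not specified), and the runtime, handler name, artifact bucket/key, and role ARN are all hypothetical inputs. The template is again built as a Python dict.

```python
def hourly_job_template(code_bucket: str, code_key: str, role_arn: str) -> dict:
    """Sketch: run a Python job every hour next to an existing, non-IaC system.

    Assumes a Lambda function; bucket, key and role ARN are caller-supplied placeholders.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "Job": {
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "Runtime": "python3.12",
                    "Handler": "job.handler",  # hypothetical module/function name
                    "Role": role_arn,
                    "Timeout": 300,
                    "Code": {"S3Bucket": code_bucket, "S3Key": code_key},
                },
            },
            "HourlySchedule": {
                "Type": "AWS::Events::Rule",
                "Properties": {
                    "ScheduleExpression": "rate(1 hour)",
                    "State": "ENABLED",
                    "Targets": [{"Arn": {"Fn::GetAtt": ["Job", "Arn"]}, "Id": "HourlyJob"}],
                },
            },
            # EventBridge needs explicit permission to invoke the function.
            "AllowSchedule": {
                "Type": "AWS::Lambda::Permission",
                "Properties": {
                    "Action": "lambda:InvokeFunction",
                    "FunctionName": {"Ref": "Job"},
                    "Principal": "events.amazonaws.com",
                    "SourceArn": {"Fn::GetAtt": ["HourlySchedule", "Arn"]},
                },
            },
        },
    }


job = hourly_job_template(
    "my-artifacts-bucket", "job.zip", "arn:aws:iam::123456789012:role/hourly-job-role"
)
```

Only this one job is under IaC, but that is already enough to version, review, and redeploy it reliably.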
Example
If you would like an example, I have an AI-generated technical architecture article, based on my own development preferences, that shows how I like to adopt this tech in a monorepo. Keep the infrastructure in its own module/package and let it depend on the compute package for your backend, since it needs the built compute artifacts. The client artifacts can then also be used by the infrastructure package, in this example to serve static files through S3. This is the basic setup I like to start with on a new web service project.
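The dependency direction in that layout decides the build order: the infrastructure package can only be synthesized after the backend and client artifacts exist. Here is a small sketch of that ordering using Python's standard `graphlib`. The package names are hypothetical stand-ins for whatever your monorepo calls them, and a real repo would usually encode this in its workspace or build tool rather than a script.

```python
from graphlib import TopologicalSorter

# Hypothetical monorepo packages mapped to the packages they depend on.
dependencies = {
    "backend": set(),                         # compute code, built into deployable artifacts
    "client": set(),                          # static site assets, later served from S3
    "infrastructure": {"backend", "client"},  # IaC package that deploys both artifacts
}

# static_order() yields every package after all of its dependencies.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)
```

Whatever order `backend` and `client` come out in, `infrastructure` is always last, which matches the rule of keeping infrastructure in its own package that consumes the others' outputs.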
Use Infrastructure as Code!
Despite the challenges that infrastructure as code presents, IaC is an incredibly powerful tool for managing the complexity of cloud services both within a single provider and across cloud providers. It allows for enforcement of your enterprise strategy towards architecture and security while enabling new ways to develop and test software across your organization. While there is a learning curve to using the tooling and to cloud architecture as a whole, the benefits to developing infrastructure and replicating it compound for you over time. Every developer working with cloud providers should be using this powerful tool.