Implementing GitOps: Version-Controlled Infrastructure for Scalable DevOps
GitOps is an operational model that brings DevOps principles to infrastructure automation using Git as the source of truth. It combines the practices of Git-based version control and Infrastructure as Code (IaC) to manage infrastructure in a scalable, consistent, and automated manner.
Introduction: What is GitOps and Why It's Crucial for Modern DevOps?
With GitOps, infrastructure changes follow the same collaborative, reviewable, and auditable workflows as software development, enabling teams to automate infrastructure deployments, recover quickly from failures, and scale their environments effortlessly.
In this post, we will dive deep into implementing GitOps with a focus on IaC, using tools like Terraform, Ansible, and Kubernetes. You'll learn how to version-control your infrastructure, automate deployments, and ensure consistency across environments, all while maintaining a scalable DevOps practice.
Why Infrastructure as Code (IaC) is Essential for GitOps
Before diving into GitOps, it’s important to understand the role of Infrastructure as Code (IaC). IaC is the practice of managing and provisioning computing infrastructure through machine-readable configuration files rather than through manual processes. With IaC, teams can codify infrastructure configurations, allowing them to version control, automate, and share infrastructure setups across teams.
GitOps relies heavily on IaC to manage infrastructure declaratively. By storing infrastructure configurations in Git, you gain the ability to track changes, roll back to previous versions, and review every change before it’s applied to production. This ensures your infrastructure remains consistent, reproducible, and auditable.
Check out our document on deploying an app in EC2 using Terraform.
Key Concepts of GitOps with IaC
- Declarative Infrastructure: Infrastructure is defined using declarative code (e.g., Terraform, Kubernetes manifests), meaning you declare the desired state of the infrastructure.
- Version-Controlled Changes: All infrastructure changes are committed to a Git repository, which acts as the source of truth.
- Automated Sync: An automated process (such as continuous integration or deployment pipelines) ensures that the actual state of your infrastructure matches the desired state defined in Git.
- Auditable and Reversible: Every infrastructure change is tracked, auditable, and reversible, as every change goes through Git version control.
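The automated sync idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how any particular GitOps tool is implemented; the function name `diff_states` and the dictionary shapes are assumptions made for the example.

```python
def diff_states(desired, actual):
    """Compare desired state (from Git) with actual state (from the platform).

    Both arguments map a resource name to its configuration. The result lists
    the resource names to create, update, and delete so that the actual state
    ends up matching the desired state.
    """
    to_create = sorted(name for name in desired if name not in actual)
    to_delete = sorted(name for name in actual if name not in desired)
    to_update = sorted(
        name for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


# Hypothetical example: Git declares two resources, the platform has drifted.
desired = {"web": {"replicas": 3}, "cache": {"replicas": 1}}
actual = {"web": {"replicas": 2}, "legacy": {"replicas": 1}}
print(diff_states(desired, actual))
```

A real tool runs this kind of comparison continuously and then acts on the result, which is what keeps the environment aligned with Git over time.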
Setting Up GitOps with Terraform
Terraform is one of the most popular IaC tools that allows you to define, provision, and manage cloud infrastructure using declarative code. GitOps enhances Terraform by making every infrastructure change trackable and reviewable in Git.
Step 1: Writing Your Terraform Configuration
Let’s start by writing a basic Terraform configuration that provisions an AWS EC2 instance:
# providers.tf
provider "aws" {
  region = "us-east-1"
}

# instance.tf
resource "aws_instance" "example" {
  # Example AMI ID. AMI IDs are region-specific, so substitute one that exists in your region.
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  tags = {
    Name = "ExampleInstance"
  }
}
In this configuration:
- provider "aws": Defines the AWS provider and sets the region to "us-east-1".
- resource "aws_instance": Provisions an EC2 instance using a specified Amazon Machine Image (AMI) and instance type.
Step 2: Version-Controlled Infrastructure
Next, commit your configuration to a Git repository:
$ git init
$ git add .
$ git commit -m "Initial commit: Terraform configuration for EC2 instance"
With the configuration in Git, you can require that every infrastructure change go through a version-controlled pull request (PR), for example by protecting the main branch. New infrastructure changes can then be reviewed, approved, and tested before they are deployed, which makes the workflow safer and more collaborative.
Step 3: Applying Infrastructure Changes
To deploy the EC2 instance, run the following Terraform commands:
$ terraform init
$ terraform apply
The terraform init command initializes your working directory, while terraform apply applies the configuration and provisions the resources on AWS. Terraform then tracks the state of your infrastructure, ensuring that future changes are applied incrementally.
Step 4: Integrating GitOps Workflows
Now, let’s set up a CI/CD pipeline (e.g., GitLab CI) to automate infrastructure changes:
# .gitlab-ci.yml
stages:
  - plan
  - apply

plan:
  stage: plan
  script:
    - terraform init
    - terraform plan -out=plan.tfout
  artifacts:
    paths:
      - plan.tfout

apply:
  stage: apply
  script:
    # Each job starts in a fresh environment, so initialize again before applying.
    - terraform init
    - terraform apply plan.tfout
  when: manual
In this pipeline:
- Plan Stage: Runs terraform plan, which calculates and outputs the proposed changes without applying them. This output is stored as an artifact.
- Apply Stage: Once the plan is reviewed and approved, the terraform apply command applies the changes to your infrastructure. The when: manual setting ensures that changes must be manually triggered after approval.
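To make the review step more concrete, here is a minimal Python sketch that summarizes a saved plan before anyone triggers the apply job. It assumes the plan has been exported with `terraform show -json plan.tfout` and relies on the `resource_changes` list and its `change.actions` field from that JSON output. The function name `summarize_plan` and the sample data are illustrative, not part of the pipeline above.

```python
import json
from collections import Counter


def summarize_plan(plan_json):
    """Count planned actions per type from Terraform's JSON plan representation."""
    plan = json.loads(plan_json)
    counts = Counter()
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        # A replacement is reported as two actions, e.g. ["delete", "create"].
        label = "replace" if len(actions) > 1 else actions[0]
        counts[label] += 1
    return dict(counts)


# Hand-written sample in the shape of `terraform show -json` output (illustrative only).
sample = json.dumps({
    "resource_changes": [
        {"address": "aws_instance.example", "change": {"actions": ["create"]}},
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}},
        {"address": "aws_instance.old", "change": {"actions": ["delete", "create"]}},
    ]
})
print(summarize_plan(sample))
```

A summary like this could be posted as a comment on the PR so reviewers see at a glance whether a change creates, replaces, or destroys resources.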
Implementing GitOps with Kubernetes Manifests
When working with Kubernetes, GitOps enables you to manage your Kubernetes infrastructure in a declarative, version-controlled manner. In a GitOps workflow, Kubernetes manifests (YAML files) are committed to a Git repository, and a tool like ArgoCD or Flux ensures that the cluster state always matches the desired state stored in Git.
Step 1: Defining Kubernetes Manifests
Let’s create a simple Kubernetes Deployment manifest that runs NGINX pods:
# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.17.10
          ports:
            - containerPort: 80
This YAML file defines:
- A deployment named nginx-deployment.
- A replica set of 3 NGINX pods.
- The container image to use (nginx:1.17.10) and the port (80).
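One detail worth checking before a manifest like this is committed: the Deployment's selector has to match the labels on its pod template, otherwise Kubernetes rejects the object. The following Python sketch shows that check on a dictionary version of the manifest. The function name `selector_matches_template` is made up for this example, and it only handles matchLabels, not matchExpressions.

```python
def selector_matches_template(deployment):
    """Return True if every selector label appears on the pod template with the same value."""
    selector = deployment["spec"]["selector"].get("matchLabels", {})
    labels = deployment["spec"]["template"]["metadata"].get("labels", {})
    return all(labels.get(key) == value for key, value in selector.items())


# The Deployment from this section, written as a Python dict.
nginx = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx-deployment"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "nginx"}},
        "template": {
            "metadata": {"labels": {"app": "nginx"}},
            "spec": {"containers": [
                {"name": "nginx", "image": "nginx:1.17.10",
                 "ports": [{"containerPort": 80}]}
            ]},
        },
    },
}
print(selector_matches_template(nginx))  # True
```

A check along these lines fits naturally into the CI stage of a GitOps workflow, so that a broken manifest is caught in the PR rather than at sync time.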
Step 2: Storing Kubernetes Manifests in Git
As with Terraform, you’ll commit the Kubernetes manifests to a Git repository:
$ git add nginx-deployment.yaml
$ git commit -m "Add NGINX deployment manifest"
By committing the manifest, you can now manage Kubernetes infrastructure using the same GitOps principles: version control, reviewable changes, and automated syncing.
Step 3: Automating Kubernetes Sync with ArgoCD
ArgoCD is a GitOps tool that continuously monitors a Git repository and ensures the desired Kubernetes state (as defined by the manifests) is applied to the cluster.
To set up ArgoCD, you’ll first deploy it to your Kubernetes cluster:
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
Next, configure ArgoCD to monitor your Git repository:
# argocd-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nginx-app
  namespace: argocd
spec:
  destination:
    namespace: default
    server: https://kubernetes.default.svc
  source:
    path: ./manifests
    repoURL: 'https://github.com/your-repo.git'
    targetRevision: HEAD
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
In this configuration:
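- repoURL: The Git repository URL containing your Kubernetes manifests.
- path: The directory inside that repository that ArgoCD reads manifests from. With ./manifests, nginx-deployment.yaml must be committed under the manifests directory.
- syncPolicy: Enables automatic syncing, pruning of resources that are no longer defined in Git, and self-healing (any manual change in the cluster is reverted to match the Git state).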
Now, ArgoCD will automatically synchronize your Kubernetes cluster with the desired state defined in your Git repository.
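The effect of the prune and selfHeal flags is easier to see in a small model. The Python sketch below is a deliberately simplified picture of automated sync, not ArgoCD's actual implementation; the function `plan_sync` and its arguments are hypothetical.

```python
def plan_sync(desired, last_synced, live, prune=False, self_heal=False):
    """Decide which actions an automated sync would take (simplified model).

    desired:      resources declared in Git right now
    last_synced:  resources as declared at the last successful sync
    live:         resources currently in the cluster
    """
    actions = []
    git_changed = desired != last_synced
    for name, spec in sorted(desired.items()):
        if live.get(name) == spec:
            continue  # already in the desired state
        # Drift without a new commit is only corrected when self-healing is on.
        if git_changed or self_heal:
            actions.append(("apply", name))
    for name in sorted(live):
        # Resources no longer declared in Git are removed only when pruning is on.
        if name not in desired and prune:
            actions.append(("delete", name))
    return actions


# Someone scaled the deployment by hand; Git itself has not changed.
desired = {"nginx-deployment": {"replicas": 3}}
live = {"nginx-deployment": {"replicas": 5}}
print(plan_sync(desired, desired, live))                  # []
print(plan_sync(desired, desired, live, self_heal=True))  # [('apply', 'nginx-deployment')]
```

Both flags default to off in this model, which mirrors the cautious default: deleting resources and overriding manual changes are things you opt into.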
Managing Multi-Cloud Infrastructure with GitOps and Terraform
One of the most powerful aspects of GitOps is the ability to manage multi-cloud infrastructure. Using tools like Terraform, you can define cloud resources for multiple cloud providers (e.g., AWS, Azure, Google Cloud) in the same repository and enforce consistency across them.
Step 1: Writing Multi-Cloud Terraform Configuration
Let’s extend the earlier Terraform configuration to include resources on both AWS and Azure.
# aws.tf
provider "aws" {
  region = "us-east-1"
}

# S3 buckets are private by default. The inline acl argument is deprecated in
# recent AWS provider versions, so it is omitted here.
resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-unique-bucket-name"
}

# azure.tf
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "East US"
}
In this setup:
- aws_s3_bucket: Provisions an S3 bucket on AWS.
- azurerm_resource_group: Creates a resource group on Azure.
Step 2: GitOps for Multi-Cloud
Commit the configuration to Git:
$ git add aws.tf azure.tf
$ git commit -m "Add multi-cloud Terraform configuration"
Next, use a CI/CD pipeline (similar to the earlier GitLab CI example) to automate the management of your multi-cloud infrastructure. The pipeline plans changes for both AWS and Azure from the Terraform configurations stored in Git and applies them once the manual apply job is triggered.
Here’s an extended CI/CD pipeline configuration that can handle both AWS and Azure infrastructure changes:
# .gitlab-ci.yml
stages:
  - init
  - plan
  - apply

init:
  stage: init
  script:
    - terraform init

plan:
  stage: plan
  script:
    # Without a cache or artifact for .terraform, each job must run init itself.
    - terraform init
    - terraform plan -out=plan.tfout
  artifacts:
    paths:
      - plan.tfout

apply:
  stage: apply
  script:
    - terraform init
    - terraform apply plan.tfout
  when: manual
With this pipeline in place, any changes committed to the aws.tf or azure.tf files will be automatically detected, and the infrastructure can be updated across multiple cloud providers once the changes are approved.
Code Review and GitOps Best Practices for IaC
Managing infrastructure with GitOps brings a wealth of benefits, but it also requires adherence to best practices to ensure everything runs smoothly. Here are some key considerations:
1. Infrastructure Modularity
Modularize your Terraform or Ansible code to make it reusable and scalable. By creating reusable modules, you avoid redundancy, simplify maintenance, and ensure that different parts of your infrastructure are consistent.
Here’s a basic example of a Terraform module for creating an S3 bucket:
# modules/s3/main.tf
# The deprecated inline acl argument is omitted; S3 buckets are private by default.
resource "aws_s3_bucket" "my_bucket" {
  bucket = var.bucket_name
}

# modules/s3/variables.tf
variable "bucket_name" {
  description = "The name of the S3 bucket"
  type        = string
}

# main.tf (calling the module)
module "s3_bucket" {
  source      = "./modules/s3"
  bucket_name = "my-unique-bucket"
}
With this approach, you can easily reuse the S3 bucket module across multiple environments or teams by simply passing different variables.
2. Pull Requests for Infrastructure Changes
All changes to infrastructure must be done via pull requests (PRs). This allows for review, testing, and approval before deployment. For example, if a team wants to update the instance type of an EC2 machine, they should:
- Create a branch with the updated instance.tf file.
- Open a PR for the update.
- Let the CI pipeline run terraform plan to verify the changes.
- Have peers review the changes.
- Merge the PR after approvals.
This workflow ensures infrastructure changes are transparent, reviewable, and thoroughly tested.
3. Enforce Policies with Sentinel or Open Policy Agent (OPA)
In large-scale environments, enforcing compliance and security policies is critical. Tools like HashiCorp Sentinel or Open Policy Agent (OPA) can help enforce guardrails by allowing or denying infrastructure changes based on predefined policies.
For example, you can write a Sentinel policy that only allows EC2 instances to be provisioned in specific regions:
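# resource_changes is provided by the tfplan/v2 import.
import "tfplan/v2" as tfplan

allowed_regions = ["us-east-1", "us-west-2"]

main = rule {
    all tfplan.resource_changes as _, rc {
        # Resources other than EC2 instances pass. This assumes the planned
        # instance exposes a region attribute, which depends on the provider version.
        rc.type is not "aws_instance" or
        rc.change.after.region in allowed_regions
    }
}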
This policy will block any infrastructure change that tries to provision an EC2 instance outside of the allowed regions.
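If it helps to see the same rule outside of a policy language, the Python sketch below applies it to a list shaped like the `resource_changes` section of a JSON plan. It is neither Sentinel nor OPA, just the logic spelled out. The `region` key mirrors the policy example and is an assumption; depending on the provider version, the region may have to be derived from somewhere else.

```python
ALLOWED_REGIONS = frozenset({"us-east-1", "us-west-2"})


def violations(resource_changes, allowed_regions=ALLOWED_REGIONS):
    """Return addresses of aws_instance changes planned outside the allowed regions."""
    bad = []
    for rc in resource_changes:
        if rc["type"] != "aws_instance":
            continue  # the rule only constrains EC2 instances
        after = rc["change"].get("after") or {}  # `after` is null for deletions
        if after and after.get("region") not in allowed_regions:
            bad.append(rc["address"])
    return bad


# Hand-written sample changes (illustrative only).
changes = [
    {"address": "aws_instance.web", "type": "aws_instance",
     "change": {"after": {"region": "us-east-1"}}},
    {"address": "aws_instance.eu", "type": "aws_instance",
     "change": {"after": {"region": "eu-west-1"}}},
    {"address": "aws_s3_bucket.my_bucket", "type": "aws_s3_bucket",
     "change": {"after": {"bucket": "my-unique-bucket-name"}}},
    {"address": "aws_instance.gone", "type": "aws_instance",
     "change": {"after": None}},
]
print(violations(changes))  # ['aws_instance.eu']
```

Note that resources of other types and deletions are skipped rather than rejected, so the rule blocks only what it is meant to block.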
4. Implementing Rollbacks in GitOps
One of the major advantages of GitOps is its ability to easily roll back infrastructure changes. If a deployment goes wrong, you can simply revert to a previous commit in Git, and the infrastructure will automatically sync to the previous state.
For example, to roll back a change:
- Identify the commit that introduced the problematic change.
- Revert the commit using Git:
  git revert <commit-hash>
- Push the change to the repository. The CI/CD pipeline or GitOps tool (e.g., ArgoCD or Flux) will automatically detect the rollback and apply the necessary changes to the infrastructure.
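It is worth being clear about what a revert does to history: it adds a new commit that undoes the earlier one, and nothing is deleted. The Python sketch below models that with a toy history of (message, state) pairs. The `revert` function is a simplified stand-in for illustration and ignores merge conflicts.

```python
def revert(history, bad_index):
    """Append a commit that undoes the change made by history[bad_index].

    history is a list of (message, state) tuples, oldest first. Only the keys
    touched by the bad commit are restored; later changes are kept.
    """
    if bad_index < 1:
        raise ValueError("cannot revert the initial commit in this sketch")
    message, bad_state = history[bad_index]
    _, before = history[bad_index - 1]
    _, head = history[-1]
    new_state = dict(head)
    for key in set(before) | set(bad_state):
        if before.get(key) == bad_state.get(key):
            continue  # this key was not touched by the bad commit
        if key in before:
            new_state[key] = before[key]
        else:
            new_state.pop(key, None)
    return history + [(f'Revert "{message}"', new_state)]


history = [
    ("Initial commit", {"instance_type": "t2.micro"}),
    ("Switch to t2.large", {"instance_type": "t2.large"}),
    ("Add Name tag", {"instance_type": "t2.large", "tag": "ExampleInstance"}),
]
for message, state in revert(history, 1):
    print(message, state)
```

Because the original commits stay in place, the audit trail shows both the problematic change and its rollback.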
Monitoring and Observability in GitOps-Managed IaC
Once your GitOps setup is running, you’ll want to ensure that the infrastructure and GitOps processes are being continuously monitored. Observability tools are essential in large-scale DevOps setups to provide real-time insights into infrastructure state, performance, and potential issues.
1. Infrastructure Monitoring
Tools like Prometheus and Grafana are commonly used in GitOps pipelines to monitor infrastructure. These tools can be configured to alert you of issues such as resource exhaustion (e.g., CPU, memory), network latency, or even failures in your GitOps sync processes.
For example, you can set up a Prometheus alert for high CPU usage on your EC2 instances:
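- alert: HighCpuUsage
  # node_cpu_seconds_total is a counter, so take the rate of idle time and invert it.
  expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High CPU usage detected"
    description: "CPU usage has been over 90% for the last 5 minutes."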
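CPU metrics such as node_cpu_seconds_total are cumulative counters, so utilization is usually derived from how fast the idle counter grows over a window. The Python sketch below shows that arithmetic on two samples. The function name `cpu_utilization` and the sample numbers are made up for illustration.

```python
def cpu_utilization(idle_start, idle_end, elapsed_seconds, cores=1):
    """Fraction of CPU time spent non-idle between two samples.

    idle_start / idle_end are cumulative idle CPU seconds summed over all
    cores at the start and end of the window, the kind of value an idle-mode
    CPU counter reports.
    """
    idle_fraction = (idle_end - idle_start) / (elapsed_seconds * cores)
    return 1.0 - idle_fraction


# Over a 5-minute window on a 2-core machine, idle time grew by only 30 seconds.
usage = cpu_utilization(idle_start=1000.0, idle_end=1030.0, elapsed_seconds=300, cores=2)
print(usage > 0.9)  # True: this window would satisfy a 90% threshold
```

Averaging the raw counter value instead of its rate of change would give a number that only ever grows, which is why alert expressions work on rates.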
2. GitOps Sync Monitoring
Tools like ArgoCD come with built-in monitoring features that track the synchronization state of your cluster. ArgoCD, for instance, can alert you when the actual state of the infrastructure deviates from the desired state defined in Git.
You can set up notifications that alert your team when:
- A sync fails (e.g., due to an invalid manifest).
- Infrastructure drifts from the desired state.
- A manual change in the Kubernetes cluster is detected (in a self-healing setup).
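As a rough sketch of how such notifications might be derived, the Python function below maps an application's sync status and last operation phase to alert messages. The status strings are modeled on the values ArgoCD reports for an Application ("OutOfSync", "Failed", "Error"), but the function `sync_alerts` and the message wording are hypothetical, and in practice you would configure this through the tool's own notification features.

```python
def sync_alerts(sync_status, operation_phase):
    """Translate an application's sync state into human-readable alerts."""
    alerts = []
    if operation_phase in ("Failed", "Error"):
        alerts.append("sync failed")
    if sync_status == "OutOfSync":
        alerts.append("drift detected: live state differs from Git")
    return alerts


print(sync_alerts("Synced", "Succeeded"))
print(sync_alerts("OutOfSync", "Failed"))
```

Keeping the mapping this explicit makes it easy to decide which conditions should page someone and which should only post to a channel.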
Conclusion: The Benefits of Implementing GitOps with IaC
By implementing GitOps with Infrastructure as Code, you bring a new level of automation, control, and scalability to your DevOps practice. With version-controlled infrastructure, teams can ensure consistency, enforce compliance, and recover from failures quickly. The principles of GitOps help teams achieve:
- Increased visibility and collaboration: Every change is tracked in Git, allowing for greater transparency and collaboration among teams.
- Enhanced security and compliance: Policies can be enforced at the code level, ensuring that infrastructure adheres to organizational standards.
- Scalable and flexible infrastructure: By using tools like Terraform and Kubernetes, infrastructure can be easily scaled or modified as business needs change.
Ultimately, GitOps with IaC allows DevOps teams to manage complex infrastructures efficiently, reduce human error, and ensure that their environments are always in a desired, predictable state.
Final Thoughts
GitOps is revolutionizing how infrastructure is managed in modern DevOps practices. With the power of IaC, teams can automate every aspect of their infrastructure management, making deployments faster, safer, and more reliable. Whether you are working in a single-cloud, multi-cloud, or hybrid environment, GitOps with IaC provides a clear, auditable, and scalable approach to infrastructure management.
Start small by implementing GitOps for a single service or environment, and gradually expand it to cover all aspects of your infrastructure. The future of DevOps is declarative, version-controlled, and automated, and GitOps is leading the way.