Gruntwork Blog | Atlantis or Terragrunt Scale: Which is best for you?

Infrastructure teams adopt tools like Terragrunt Scale and Atlantis because OpenTofu and Terraform workflows become more complex as infrastructure scales: more modules, more environments, more cloud accounts, more reviewers, and more opportunities for drift or out-of-order changes.

Atlantis is a self-hosted automation server that runs plans and applies through PR comments. It is flexible, broadly compatible, and intentionally narrow in scope.

Terragrunt Scale is a commercial, Terragrunt-native orchestration layer that runs inside GitHub Actions or GitLab CI and adds opinionated automation for multi-account orchestration, dependency-aware runs, drift detection, and apply-after-merge workflows.

Both Terragrunt Scale and Atlantis are capable tools for bringing IaC into a PR-driven workflow, so how do you choose which is best for your organization?

On the surface, it might come down to a critical feature that only one supports. However, your choice really should depend less on features and more on what kind of operating model your team wants. If you need a vendor-neutral PR automation layer that can fit a mixed IaC environment, Atlantis may be the better fit. If you are already using Terragrunt, or you want an opinionated workflow for scaling infrastructure across many accounts and environments, Terragrunt Scale may remove a lot of custom glue.

Let’s take a look at both, particularly the areas that matter most to platform and DevOps teams: workflow model, hosting, scalability, drift detection, policy controls, multi-account orchestration, dependency management, VCS support, and operational overhead.

Here’s the TL;DR:

Terragrunt Scale is best if you need:

Built-in drift detection
High-availability/multi-instance operation, scaling across your CI runners rather than an individual server
Vendor support and backwards-compatibility guarantees
DAG-aware multi-account orchestration
OIDC role isolation for multiple cloud accounts
Apply-after-merge GitOps rather than comment-driven applies
Horizontal scaling across very large estates

Choose Atlantis if you need:

VCS support other than GitHub and GitLab
A vendor-neutral tool that works well in an environment with a mix of IaC tools (Atlantis runs whatever you put in a run step, so even custom CLI tools can be integrated into your workflow)
Explicit comment-driven plan/apply separated from the merge event

Two different models for PR-driven infrastructure automation

The two tools are easiest to compare if you start with what each one assumes about your infrastructure workflow.

Terragrunt Scale turns Terragrunt into an operating model

Terragrunt Scale is explicitly built around Terragrunt and expects your IaC structure to follow Terragrunt's mental model: units for deployable infrastructure, stacks for repeatable patterns, and a dependency graph (DAG) that describes how those units relate to each other. If your team already organizes infrastructure this way, Terragrunt Scale can use that structure directly instead of forcing you to recreate it in one-off CI scripts.

That is the main difference from a generic PR automation tool. Terragrunt Scale is not just running `plan` and `apply`; it uses Terragrunt's dependency graph to decide what changed, what needs to run, what order those runs should happen in, and which cloud role each unit should assume. That gives platform teams a more complete workflow for multi-account infrastructure: CI/CD, drift detection, version updates, OIDC-based authentication, and dependency-aware orchestration all follow the same model.

Terragrunt Scale runs within your GitHub Actions or GitLab CI runners, so there's no separate execution service or long-running automation server to maintain. The work happens inside your CI environment, using your runners, your repository permissions, your branch protection, your audit trail, and your cloud accounts.

Where Atlantis still makes sense

Atlantis is built to address a single problem: making Terraform changes visible on pull requests. It follows the Unix philosophy: a single daemon with a single, well-defined job (brokering plan/apply through PR comments) that it aims to do well, leaving everything else to other tools or your own custom scripts. The Atlantis community has built an ecosystem around this to help fill gaps, e.g. Docker images, OPA integration patterns, and custom workflows designed to extend the Atlantis core.

Atlantis listens for VCS webhooks, runs terraform plan, import, and apply remotely, and posts the results as PR comments. It's only opinionated about the PR comment workflow; state layout, directory structure, dependency ordering, and so on are left to you.

How the tradeoffs compare

Self-managed server or CI-native automation?

Terragrunt Scale runs as CI actions within your existing pipeline (including self-hosted GitHub/GitLab Enterprise and air-gapped environments), so there's no dedicated server for you to maintain and no execution layer for Gruntwork to host. Gruntwork does host a dev portal to distribute proprietary reusable workflows, e.g. gruntwork-io/pipelines-workflows.

Atlantis is self-hosted only, and runs as a single Go binary or a Docker image (ghcr.io/runatlantis/atlantis). Unlike Terragrunt Scale's ephemeral CI runners, the Atlantis daemon is a long-running process. Official deployment options include a raw Kubernetes StatefulSet, a Helm chart, a GCE module, an ECS module, or bare Docker. Because it runs as a single process, there's no high-availability/clustering option. Persistent disks are recommended to preserve locks and plan files across restarts.

Multi-account operations: Native OIDC role isolation or custom workflow glue?

Terragrunt Scale has built-in support for multiple environments and accounts. You can configure it via the Pipelines HCL. For example, you could define prod with something like this:

environment "prod" {
    filter { paths = ["prod/**"] }
    authentication { aws_oidc { ... } }
}
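
To make the path filter concrete, here is a minimal Python sketch of how a changed unit path could be matched to an environment. The ENVIRONMENTS table, the staging entry, and the environment_for helper are hypothetical, and fnmatch-style globbing is an assumption; the matching rules Pipelines actually applies may differ.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical environment table mirroring filter { paths = [...] } blocks.
ENVIRONMENTS = {
    "prod": ["prod/**"],
    "staging": ["staging/**"],
}


def environment_for(unit_path: str) -> Optional[str]:
    """Return the first environment whose path patterns match the unit path."""
    for name, patterns in ENVIRONMENTS.items():
        if any(fnmatchcase(unit_path, pattern) for pattern in patterns):
            return name
    # No environment claims this path.
    return None
```

In this sketch a unit under prod/ resolves to the prod environment, and a path outside every pattern resolves to no environment at all.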

Terragrunt's terragrunt.stack.hcl allows you to reuse patterns across different environments. Pipelines will generate units from these during plan/apply.

Atlantis doesn't have an internal concept of environment or account. Environments can be modeled through projects in atlantis.yaml and workspace-specific tfvars (Atlantis checks for env/{workspace}.tfvars). Cross-project ordering is determined by execution_order_group (lower numbers run first) and depends_on, and parallel_plan and parallel_apply respect those groups. Different cloud roles per environment are implemented through env blocks or custom run steps per workflow.

Scaling execution: Ephemeral CI runners or a long-running daemon?

Terragrunt Scale inherits the scalability of your existing GitHub Actions or GitLab CI runners. Every run is a fresh, ephemeral container. Terragrunt 1.0 introduced a Runner Pool concurrency model that uses a shared OpenTofu provider plugin cache. Dependency groups are built automatically based on DAG constraints to run in parallel.

Since Atlantis is a single daemon, it can only really be scaled vertically by increasing the CPU/RAM/disk capacity of the server it runs on. It does have configurable per-repo parallelism (parallel_plan: true, parallel_apply: true) and it respects execution_order_group. It doesn't have a built-in queue manager beyond the per-directory lock. Locks are kept on the server's local disk by default, so without a persistent volume they can be lost if the process restarts during an apply.

Drift detection: Automated remediation or DIY detection?

Like all actions in Terragrunt Scale, drift detection runs as a GitHub Actions or GitLab CI workflow. You can run it on a cron schedule, or manually via workflow_dispatch. It runs terragrunt plan on every unit by default, or only on a filtered subset, e.g. management/**.

When it detects drift, it opens a remediation PR/MR on a drift-detection branch. Then you can merge the PR to re-apply IaC to overwrite the drift, or update the IaC to reflect the drifted state. Recent releases support Terragrunt stacks natively, so it will auto-generate Stack-derived units before planning, and it also supports advanced filtering (comma-separated path patterns and an ignore list).

Atlantis doesn't have built-in drift detection. It can be handled manually, typically by running a cron job that triggers atlantis plan by pushing commits or via the API.
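
As a rough illustration of what a DIY check involves, the Python sketch below runs a plan with the CLI's -detailed-exitcode flag, where exit code 0 means no changes, 2 means changes are present, and 1 means the plan failed. It assumes the scheduled job calls the terraform binary directly rather than going through Atlantis, and the check_drift and classify_plan_exit helpers are hypothetical names.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map a `plan -detailed-exitcode` exit code to a drift status."""
    if code == 0:
        return "in_sync"  # plan succeeded with no changes
    if code == 2:
        return "drift"    # plan succeeded and found changes
    return "error"        # 1 (or anything else) means the plan failed


def check_drift(directory: str, binary: str = "terraform") -> str:
    """Run a read-only plan in `directory` and report its drift status."""
    result = subprocess.run(
        [binary, "plan", "-detailed-exitcode", "-input=false", "-lock=false"],
        cwd=directory,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

A cron job could call check_drift for each project directory on the default branch and open an issue or PR whenever the result is "drift"; that reporting step is the part you would still need to build yourself.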

User interface: Dashboard or CI-native visibility?

Terragrunt Scale has no dedicated UI. Instead, you work with it through GitHub Actions or GitLab CI (run history and job logs), along with the substantial PR comments it generates. The Gruntwork dev portal handles account/subscription management as well as an early-preview "infrastructure insights" view based on metadata collected during pipeline runs. Terragrunt does offer a Catalog TUI for scaffolding new units.

Atlantis has a simple built-in web UI served on port 4141, though its scope is limited. It shows recent activity and active locks, as well as enabling you to manually unlock, but isn't designed to convey information about your IaC. Auth is handled via basic auth or fronted by an SSO proxy.

Notifications and alerting: Built-in hooks or CI platform signals?

Terragrunt Scale integrates directly into your CI platform. So if your GitHub Actions are set up for Slack notifications, then Terragrunt Scale will deliver notifications via Slack just like your other PRs with all the necessary details in the comments. For example, drift detection notifications show up as opened PRs/MRs on a drift-detection branch. Terragrunt Scale has also recently introduced built-in hooks allowing for custom notifications and other automation on various points of the execution lifecycle.

Atlantis has built-in Slack webhooks, configured under webhooks: in the server-side config with regex filters for event, workspace, and branch. You can add custom run steps to send other notifications to any destination reachable from a shell command. You can also set up post-workflow hooks or scrape Prometheus metrics.

Secrets management: Static host credentials or dynamic runtime authentication?

Terragrunt Scale operates on the principle of dynamically resolving credentials at runtime rather than storing anything long-term. It uses OIDC for cloud auth, then exposes credentials to Terragrunt as environment variables, like GOOGLE_OAUTH_ACCESS_TOKEN or ARM_CLIENT_ID. If you need to fetch secrets from another source, there's a custom authentication block that will run whatever shell command you supply in auth_provider_cmd. That command emits an envs JSON object.
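
A script used as the auth_provider_cmd could look roughly like the Python sketch below. It assumes the expected output is a JSON document with a top-level "envs" object, as described above; the EXAMPLE_SECRET_TOKEN variable and the build_auth_payload helper are hypothetical, and the exact schema should be checked against the Terragrunt and Pipelines documentation.

```python
import json
import os
from typing import Dict


def build_auth_payload(env_vars: Dict[str, str]) -> str:
    """Serialize credentials as a JSON document with a top-level "envs" object."""
    return json.dumps({"envs": env_vars})


def main() -> None:
    # Hypothetical secret source: a real script would call a vault or cloud CLI here.
    token = os.environ.get("EXAMPLE_SECRET_TOKEN", "")
    print(build_auth_payload({"GOOGLE_OAUTH_ACCESS_TOKEN": token}))


if __name__ == "__main__":
    main()
```

Because the script only prints to stdout, the secret is resolved at run time and never has to be written into the repository or the CI configuration.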

Since Atlantis runs as a daemon on a server you provide, it inherits whatever cloud credentials or instance profile are present on the host. VCS auth secrets can be passed via env vars or Kubernetes secrets. There's no out-of-the-box support for things like AWS Secrets Manager or Azure Key Vault, but any secret that can be retrieved by a shell command can be fetched via a run command.

State management: Same backend, different credential model

Both Terragrunt Scale and Atlantis manage state through your OpenTofu or Terraform backend, which you'll need to configure separately.

Terragrunt Scale uses OIDC, so each runner gets an ephemeral IAM role per unit. Auth blocks in .gruntwork/*.hcl map filesystem paths to AWS accounts, Azure subscriptions, or GCP service accounts via aws_oidc, Azure federated identity, or GCP Workload Identity Federation respectively. Since this all runs on your own CI pipeline, no Gruntwork service has access to those credentials.

Atlantis gets its state credentials from the server it runs on, usually from IRSA on EKS or an instance profile on EC2. State locking is still handled by the backend, and Atlantis adds its own directory-level locking on top.

Policy controls: Conftest/OPA Integrations

Terragrunt Scale offers Conftest/OPA integration scoped at Terragrunt's native units, stacks, etc. Both advisory (warn) and blocking (deny) enforcement modes allow gradual adoption. At a unit level, policies validate what's being created, and are targeted via filter blocks using environments and labels. At a stack level (i.e. orchestrated group of units), policies validate how things relate, and are implicitly scoped via stages = ["post-stack-plan"], which runs once per stack. For environments (i.e. deployment contexts like dev, staging, and prod), policies enforce risk and governance tiers. They're explicitly targeted via environments in the filter block. And at the broadest scope, the repo level, a policy_assignment with no filter block applies to every unit in the repo.

Policies are git-backed, and can be stored as a central monorepo with a subdirectory per bundle, one repo per bundle, or co-located for local ownership. In the case of a false positive or intentional, documented deviation from policy, a GitOps-native approval workflow also allows designated users to manually override deny policies, keeping the audit trail fully within version control. Gruntwork ships several drop-in starter policy bundles for things like encryption requirements, IAM hygiene, network guardrails (no public S3 buckets, no inbound port 22/3389, etc.), and cost controls.

Atlantis has native, server-side Conftest/OPA integration. Enabling --enable-policy-checks and configuring a policies: block with policy_sets, owners, and approve_count causes Atlantis to run Rego policies against the plan output's SHOWFILE (the terraform show -json output). Failing a policy blocks apply until either (a) a successive commit fixes it or (b) a top-level policy owner uses atlantis approve_policies. Policies can be local, S3-hosted (via Conftest's go-getter), or fetched from any go-getter-compatible source. A custom_policy_check flag allows running non-Conftest tools. RBAC in Atlantis is coarse; control is at the VCS layer (who can comment on the PR) plus the policy owners list.
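
To show the kind of check a policy performs against the terraform show -json output, here is a small Python stand-in. Real policies for Conftest are written in Rego, not Python; the deny_messages function and the specific rule (rejecting public S3 bucket ACLs) are illustrative assumptions, and only the resource_changes layout of the plan JSON is taken from the Terraform plan format.

```python
from typing import List

# Canned S3 ACLs that expose a bucket publicly.
PUBLIC_ACLS = {"public-read", "public-read-write"}


def deny_messages(plan: dict) -> List[str]:
    """Return one message per planned S3 bucket ACL that would be public."""
    messages = []
    for change in plan.get("resource_changes", []):
        # "after" holds the planned attribute values; it is null on deletes.
        after = (change.get("change") or {}).get("after") or {}
        if change.get("type") == "aws_s3_bucket_acl" and after.get("acl") in PUBLIC_ACLS:
            messages.append(
                f"{change['address']}: public ACL '{after['acl']}' is not allowed"
            )
    return messages
```

An empty list corresponds to a passing policy; any returned message corresponds to a deny result that would block the apply until it is fixed or approved.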

Dependency management: DAG-aware orchestration or manually ordered projects?

Terragrunt Scale builds a DAG of dependencies from Terragrunt's dependency block. This block lets one unit consume another unit's outputs as its inputs. Terragrunt 1.0 uses a Runner Pool model that can run multiple units concurrently as permitted by the defined dependencies.
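
The idea of deriving run order from declared dependencies can be sketched in a few lines of Python. This is a simplified, wave-by-wave grouping for illustration only, not the Runner Pool implementation; the run_groups function and the unit names in the usage note are hypothetical.

```python
from typing import Dict, List


def run_groups(dependencies: Dict[str, List[str]]) -> List[List[str]]:
    """Group units into waves; each unit's dependencies sit in an earlier wave."""
    # Collect every unit, including ones that only appear as a dependency.
    units = set(dependencies)
    for deps in dependencies.values():
        units.update(deps)
    remaining = {unit: set(dependencies.get(unit, [])) for unit in units}

    groups = []
    while remaining:
        # Units with no unfinished dependencies can run concurrently.
        ready = sorted(unit for unit, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle detected")
        groups.append(ready)
        for unit in ready:
            del remaining[unit]
        for deps in remaining.values():
            deps.difference_update(ready)
    return groups
```

Given a vpc unit, db and cache units that depend on it, and an app unit that depends on both, the function yields three waves: vpc first, then cache and db together, then app.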

Atlantis dependencies are defined declaratively in atlantis.yaml. You set execution_order_group and depends_on. The order group is an integer where lower numbers run first, so it's up to you to manage order, unlike Terragrunt Scale's automatic DAG generation. Also, each project is an isolated Terraform invocation, so there's no out-of-the-box support for passing outputs of one project to another.
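
By contrast, manual ordering amounts to sorting projects by a number you assign yourself. The Python sketch below illustrates that idea under stated assumptions: projects are represented as plain dictionaries, a missing execution_order_group is treated as 0, and the order_groups helper is hypothetical rather than Atlantis code.

```python
from itertools import groupby
from typing import Dict, List


def _group_of(project: Dict) -> int:
    # Assumption for this sketch: projects without a group fall into group 0.
    return project.get("execution_order_group", 0)


def order_groups(projects: List[Dict]) -> List[List[str]]:
    """Return project names batched by ascending execution_order_group."""
    ordered = sorted(projects, key=_group_of)  # stable sort keeps file order
    return [[p["name"] for p in batch] for _, batch in groupby(ordered, key=_group_of)]
```

Nothing in this model checks that the numbers reflect real dependencies: if a project reads another project's state but is given a lower group number, it simply runs first.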

CI/CD integration: Replace part of the pipeline or extend it?

Terragrunt Scale is designed entirely around integration with your existing CI. You add gruntwork-io/pipelines-workflows, gruntwork-io/pipelines-execute, etc. to your existing .github/workflows/*.yml or .gitlab-ci.yml. All of your existing CI (secret stores, custom steps, etc.) remains in place.

Atlantis is the executor, so it replaces a portion of your existing CI/CD. Think of Atlantis as your new CI for Terraform. Integrating Atlantis into your workflow typically involves removing or reducing the existing Terraform stages in your GitHub Actions or GitLab pipelines, and instead routing those operations through Atlantis. An alpha API is available for invoking Atlantis through existing pipelines, or Atlantis can be invoked by PR events.

VCS flexibility: Broad compatibility or GitHub/GitLab-focused depth?

Terragrunt Scale follows the "apply-after-merge" pattern. Opening a PR runs terragrunt run --all plan on all affected units. Merging to the deploy branch (typically main) triggers terragrunt run --all apply. The plan output appears as a PR comment, with optional in-place comment updates (new_comment_per_push = false) so previous outputs are preserved in GitHub's edit history. Pipelines does not implement its own lock manager; it relies on the Terragrunt CLI plus VCS branch protection. There is a pipelines-unlock.yml workflow for releasing stuck state. Approvals are enforced via standard GitHub/GitLab branch protection rules.

When a PR is opened, Atlantis receives a webhook, auto-discovers (or reads from atlantis.yaml) which projects were modified, and runs atlantis plan automatically. Reviewers see the plan inline in the PR; an approver comments atlantis apply to apply. Commands include atlantis plan, atlantis apply, atlantis unlock, atlantis import, atlantis state rm, atlantis approve_policies, and atlantis version. Each command accepts -d (directory), -p (project), -w (workspace) and --auto-merge-disabled flags. The default workflow is plan-on-PR / apply-on-comment; auto-merge after apply is also supported (automerge: true).

Atlantis locks a directory/workspace until the PR is merged or the lock is manually deleted, ensuring serial application. The repo_locks setting (default on_plan) can be changed to on_apply or disabled. Locks are held by the Atlantis server itself (BoltDB on disk by default; Redis optional), and external analyses note that "if the process restarts mid-apply, locks are lost", which is a real operational consideration.
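
The locking behavior can be pictured as a table keyed by directory and workspace, with each entry owned by one pull request. The Python sketch below is an in-memory illustration of that model only; the ProjectLocks class and its method names are hypothetical and do not reflect how Atlantis stores locks.

```python
from typing import Dict, List, Tuple

LockKey = Tuple[str, str]  # (directory, workspace)


class ProjectLocks:
    """Toy lock table: one pull request may hold each directory/workspace pair."""

    def __init__(self) -> None:
        self._held: Dict[LockKey, int] = {}

    def try_lock(self, directory: str, workspace: str, pull_number: int) -> bool:
        """Take the lock if it is free; succeed again if this PR already holds it."""
        holder = self._held.setdefault((directory, workspace), pull_number)
        return holder == pull_number

    def release_pull(self, pull_number: int) -> List[LockKey]:
        """Release every lock held by a PR, e.g. when it is merged or closed."""
        released = [key for key, holder in self._held.items() if holder == pull_number]
        for key in released:
            del self._held[key]
        return released
```

A second PR that touches the same directory and workspace is refused until the first PR releases its lock, which is what forces changes to the same project to apply one at a time.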

The apply_requirements array supports approved (a non-author has approved), mergeable (PR passes branch protection / status checks), and undiverged (project files are not behind base). There is also a targeted undiverged mode that uses when_modified patterns so monorepos do not block unrelated applies.
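
The semantics of these three requirements can be summarized in a short Python sketch. The pull-request fields and the unmet_requirements helper are hypothetical names chosen for illustration; Atlantis derives the equivalent facts from the VCS provider.

```python
from typing import Dict, List


def unmet_requirements(requirements: List[str], pull: Dict[str, bool]) -> List[str]:
    """Return the apply requirements that the pull request does not yet satisfy."""
    checks = {
        # A reviewer other than the author has approved the PR.
        "approved": pull["approved_by_non_author"],
        # The PR passes branch protection and required status checks.
        "mergeable": pull["mergeable"],
        # The project files are not behind the base branch.
        "undiverged": not pull["behind_base"],
    }
    return [name for name in requirements if not checks[name]]
```

An apply would proceed only when the returned list is empty; otherwise the remaining names tell the author what still has to happen on the PR.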

Most teams need a DevOps operating system, not a tool

Atlantis remains a strong choice for teams that want an open source, vendor-neutral way to run Terraform or OpenTofu from pull requests, especially when they need broad VCS support or a mixed IaC environment. But as infrastructure estates grow across accounts, environments, and interdependent modules, the operational burden often shifts from "how do we run plan and apply?" to "how do we coordinate, govern, detect drift, and scale the workflow safely?"

That is where Terragrunt Scale is designed to fit. For teams already using Terragrunt, or teams ready to adopt its opinionated model, Terragrunt Scale turns that structure into a CI-native workflow with dependency-aware runs, drift detection, OIDC-based account isolation, and vendor-backed automation.

The right choice depends on your team's constraints. Choose Atlantis when flexibility and tool neutrality matter most. Choose Terragrunt Scale when you want a more complete operating model for managing infrastructure at scale.

Terragrunt Scale offers a free tier for up to 25 infrastructure units, and paid plans for larger projects. Learn more and sign up here.
