github-workflows-dojo360-terraform-ops

Terraform state management and troubleshooting operations including state lock resolution and force unlock capabilities

Status: active
IDE: claude, codex, vscode
Version: 1.0.0
Owner: pcorazao
Tags: github-actions, workflow, dojo360

Terraform Operations Skill

Overview

The Terraform Operations workflow enables state management and troubleshooting for Terraform deployments across multi-cloud environments. This workflow is essential for resolving state lock issues that can occur when Terraform operations fail or are interrupted, leaving the state file locked and blocking subsequent operations.

Primary Use Cases:

  • State Lock Resolution: Unlock Terraform state files when operations are interrupted
  • Force Unlock: Forcefully remove locks when automatic resolution fails
  • State Troubleshooting: Diagnose and resolve state file conflicts
  • Emergency Recovery: Restore Terraform operations after failures

When to Use This Workflow:

  • Terraform operations are blocked with "state locked" errors
  • Previous Terraform runs failed and left state locks behind
  • Manual intervention in state management is needed
  • Multi-environment deployment conflicts need troubleshooting

Workflow Reference

Repository: dojo360/pipelines-workflows
Workflow: .github/workflows/terraform-ops.yml
Version: v2.0.0 (stable) or @beta (latest)
Documentation: terraform-ops/index.md

Key Features

State Lock Management

  • Force Unlock: Remove persistent state locks using unique lock IDs
  • Multi-Cloud Support: AWS (awsOptum, awsChc20), Azure (azureOptum), GCP
  • Backend Compatibility: Works with azurerm, S3, GCS, and TFE backends
  • Safe Operations: Requires explicit lock ID to prevent accidental unlocks

Enterprise Integration

  • OIDC Authentication: Keyless authentication for AWS and Azure
  • Metadata API Integration: Automatic configuration from Dojo360 metadata
  • Runner Selection: Automatic runner assignment based on cloud-type
  • Terraform Mirroring: Uses enterprise Terraform provider mirrors

State File Operations

  • Lock Identification: Retrieve and identify specific lock IDs
  • State Inspection: View current state lock status
  • TFE Support: Compatible with Terraform Enterprise workspaces
  • Backend Flexibility: Supports multiple backend configurations

Prerequisites

Before using this workflow, ensure:

  1. Metadata API Onboarding

  2. OIDC Configuration (Required for OIDC cloud types)

  3. Lock ID Identification

    • Obtain the lock ID from error messages or state backend
    • Lock ID format varies by backend type
  4. Access Permissions

    • Appropriate cloud permissions to modify state files
    • GitHub repository permissions for workflows

Requirements

Terraform and Provider Versions

  • Terraform: ~> 1.9.x (default: 1.9.2)
  • AWS Provider: ~> 5.xx (for AWS operations)
  • AzureRM Provider: ~> 3.xx (for Azure operations)
  • GCP Provider: ~> 6.xx (for GCP operations)

GitHub Actions Permissions

permissions:
  actions: read
  contents: write
  id-token: write  # Required for OIDC authentication
  pull-requests: write
  security-events: write

Input Reference

Required Inputs

Input | Description | Example
aide-id | AideId from aide.optum.com for metadata | 12345
cloud-type | Target cloud platform (supported types) | awsOptum, azureOptum, gcp, awsChc20
domain | Domain name for metadata lookup | platform-engineering
environment | Deployment environment (defines approval gates) | dev, qa, stage, prod
lock-id | Unique identifier of the Terraform state lock to remove | abc123-def456-ghi789
team-name | Team name for metadata lookup | infrastructure-team

Optional Inputs

Input | Default | Description
backend-type | azurerm | Backend type: azurerm, s3, gcs, tfe
ref | HEAD | Branch, tag, or SHA to checkout
remote-state-file-name | '' | Filename of remote state file (when not using env vars)
runner-labels | '' | Comma-separated custom runner labels
terraform-directory | . | Directory path relative to repo root containing Terraform code
terraform-logging | off | Terraform logging level: off, trace, debug, info, warn, error
terraform-provider-network-mirror | https://repo1.uhc.com/artifactory/api/terraform/terraform-virtual/providers/ | Terraform provider mirror URL
terraform-version | 1.9.2 | Terraform version to use
tfe-hostname | '' | Terraform Enterprise hostname
tfe-organization | '' | Terraform Enterprise organization
tfe-workspace | '' | Terraform Enterprise workspace name

Backend-Specific Inputs

For AzureRM Backend:

  • State configuration from metadata or explicit backend config

For S3 Backend:

  • S3 bucket and key configuration from metadata

For GCS Backend:

  • GCS bucket and prefix configuration from metadata

For TFE Backend:

  • Requires tfe-hostname, tfe-organization, tfe-workspace
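
The backend rules above can be checked before the workflow is dispatched. The following is a minimal sketch, not part of the workflow itself: the function name `check_backend_inputs` and its argument order are assumptions made for illustration.

```shell
# Hypothetical pre-flight check: confirm the inputs required by a backend-type are set.
# Arguments: backend-type, tfe-hostname, tfe-organization, tfe-workspace.
check_backend_inputs() {
  case "${1:-}" in
    azurerm|s3|gcs)
      # State location is resolved from metadata; no extra inputs needed here.
      return 0 ;;
    tfe)
      if [ -z "${2:-}" ] || [ -z "${3:-}" ] || [ -z "${4:-}" ]; then
        echo "tfe backend needs tfe-hostname, tfe-organization and tfe-workspace" >&2
        return 1
      fi ;;
    *)
      echo "unsupported backend-type: ${1:-<empty>}" >&2
      return 1 ;;
  esac
}

# Usage:
#   check_backend_inputs tfe app.terraform.io my-org my-workspace && echo "inputs look complete"
```

The check only validates presence, not correctness; a wrong hostname or workspace name would still surface later as a TFE connection error.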

Secrets Management

Required Secrets

Secret | Description | Scope
GH_TOKEN | Classic GitHub Personal Access Token (PAT) with SSO authorization | Repository or Organization

GH_TOKEN Requirements:

  • Must have SSO authorization to Dojo360 and your GitHub organization
  • Minimum scopes: repo (all) and workflow
  • Used to read GitHub environment variables during workflow runs

Usage Examples

Example 1: Basic State Lock Removal (AWS)

name: Unlock Terraform State

on:
  workflow_dispatch:
    inputs:
      lock-id:
        description: 'Lock ID to remove'
        required: true
        type: string
      environment:
        description: 'Environment to unlock'
        required: true
        type: choice
        options:
          - dev
          - qa
          - stage
          - prod

permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write

jobs:
  unlock-state:
    # No runs-on here: a job that calls a reusable workflow cannot set it; the called workflow selects the runner
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      # Required inputs
      aide-id: '<change me>'
      cloud-type: 'awsOptum'
      domain: '<change me>'
      environment: ${{ inputs.environment }}
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      
      # Optional inputs
      terraform-directory: 'terraform/infrastructure'
      backend-type: 's3'
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}

Example 2: Azure State Unlock with AzureRM Backend

name: Unlock Azure Terraform State

on:
  workflow_dispatch:
    inputs:
      lock-id:
        description: 'Lock ID from error message'
        required: true
      environment:
        description: 'Environment'
        required: true
        default: 'dev'

permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write

jobs:
  azure-unlock:
    # No runs-on here: a job that calls a reusable workflow cannot set it
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      aide-id: '<change me>'
      cloud-type: 'azureOptum'
      domain: '<change me>'
      environment: ${{ inputs.environment }}
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      
      # Azure-specific configuration
      backend-type: 'azurerm'
      terraform-directory: '.'
      terraform-version: '1.9.2'
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}

Example 3: GCP State Unlock with GCS Backend

name: Unlock GCP Terraform State

on:
  workflow_dispatch:
    inputs:
      lock-id:
        description: 'GCS state lock ID'
        required: true
      environment:
        description: 'Target environment'
        required: true

permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write

jobs:
  gcp-unlock:
    # No runs-on here: a job that calls a reusable workflow cannot set it
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      aide-id: '<change me>'
      cloud-type: 'gcp'
      domain: '<change me>'
      environment: ${{ inputs.environment }}
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      
      backend-type: 'gcs'
      terraform-directory: 'gcp/terraform'
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}

Example 4: Terraform Enterprise (TFE) Workspace Unlock

name: Unlock TFE Workspace State

on:
  workflow_dispatch:
    inputs:
      lock-id:
        description: 'TFE lock identifier'
        required: true
      workspace:
        description: 'TFE workspace name'
        required: true

permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write

jobs:
  tfe-unlock:
    # No runs-on here: a job that calls a reusable workflow cannot set it
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      aide-id: '<change me>'
      cloud-type: 'awsOptum'
      domain: '<change me>'
      environment: 'prod'
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      
      # TFE-specific configuration
      backend-type: 'tfe'
      tfe-hostname: 'app.terraform.io'
      tfe-organization: '<change me>'
      tfe-workspace: ${{ inputs.workspace }}
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}

Example 5: Custom Runner Labels and Advanced Configuration

name: Advanced State Unlock

on:
  workflow_dispatch:
    inputs:
      lock-id:
        description: 'Lock ID'
        required: true
      environment:
        description: 'Environment'
        required: true
      enable-logging:
        description: 'Enable Terraform debug logging'
        type: boolean
        default: false

permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write

jobs:
  advanced-unlock:
    # No runs-on here: use the runner-labels input to request specific runners
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      aide-id: '<change me>'
      cloud-type: 'awsOptum'
      domain: '<change me>'
      environment: ${{ inputs.environment }}
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      
      # Advanced configuration
      backend-type: 's3'
      terraform-directory: 'infrastructure/terraform'
      terraform-version: '1.9.5'
      terraform-logging: ${{ inputs.enable-logging && 'debug' || 'off' }}
      runner-labels: 'uhg-runner,large-runner'
      terraform-provider-network-mirror: 'https://repo1.uhc.com/artifactory/api/terraform/terraform-virtual/providers/'
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}

Example 6: Multi-Environment Lock Resolution with Conditional Logic

name: Emergency State Unlock

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to unlock'
        required: true
        type: choice
        options:
          - dev
          - qa
          - stage
          - prod
      lock-id:
        description: 'Lock ID (from error message)'
        required: true
      confirm-unlock:
        description: 'Type "CONFIRM" to proceed'
        required: true

permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write

jobs:
  validate-confirmation:
    runs-on: uhg-runner
    outputs:
      confirmed: ${{ steps.check.outputs.confirmed }}
    steps:
      - name: Validate Confirmation
        id: check
        env:
          # Pass the input through the environment instead of interpolating it into the script
          CONFIRM_UNLOCK: ${{ inputs.confirm-unlock }}
        run: |
          if [ "$CONFIRM_UNLOCK" != "CONFIRM" ]; then
            echo "❌ Confirmation failed. You must type 'CONFIRM' to unlock state."
            exit 1
          fi
          echo "confirmed=true" >> "$GITHUB_OUTPUT"

  emergency-unlock:
    needs: validate-confirmation
    # A job that calls a reusable workflow cannot set runs-on or environment;
    # the environment input below defines the approval gate
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      aide-id: '<change me>'
      cloud-type: 'awsOptum'
      domain: '<change me>'
      environment: ${{ inputs.environment }}
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      
      backend-type: 's3'
      terraform-directory: '.'
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}

  notify-team:
    needs: emergency-unlock
    runs-on: uhg-runner
    if: always()
    steps:
      - name: Notify Team
        env:
          TARGET_ENV: ${{ inputs.environment }}
          LOCK_ID: ${{ inputs.lock-id }}
          UNLOCK_RESULT: ${{ needs.emergency-unlock.result }}
        run: |
          echo "🔓 State unlock operation completed for $TARGET_ENV"
          echo "Lock ID: $LOCK_ID"
          echo "Status: $UNLOCK_RESULT"

How to Obtain Lock ID

From Terraform Error Messages

When Terraform operations fail due to state locks, the error message includes the lock ID:

Error: Error acquiring the state lock

Error message: ConditionalCheckFailedException: The conditional request failed
Lock Info:
  ID:        abc123-def456-ghi789-jkl012
  Path:      terraform.tfstate
  Operation: OperationTypePlan
  Who:       [email protected]
  Version:   1.9.2
  Created:   2026-01-16 10:30:00.000000 UTC
  Info:      

Terraform acquires a state lock to protect the state from being written
by multiple users at the same time. Please resolve the issue above and try
again.

Copy the ID value from the error message: abc123-def456-ghi789-jkl012
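
Copying the ID by hand is where stray spaces creep in, so it can help to extract it from saved output instead. This is a minimal sketch assuming the Terraform output was captured to a file or pipe; the function name `extract_lock_id` and the log file name are illustrative only.

```shell
# Print the lock ID from Terraform output given as file arguments or on stdin.
# Looks for a field that is exactly "ID:" and prints the field after it, so it
# also works when each line carries a leading box-drawing or "|" prefix.
extract_lock_id() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "ID:") { print $(i + 1); exit } }' "$@"
}

# Usage (terraform-plan.log is a hypothetical file name):
#   terraform plan 2>&1 | tee terraform-plan.log
#   extract_lock_id terraform-plan.log
```

Only the first match is printed, which corresponds to the ID line under "Lock Info:" in the error shown above.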

From AWS S3 Backend

# List DynamoDB lock table entries
aws dynamodb scan \
  --table-name terraform-state-lock \
  --region us-east-1

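# Query a specific lock. With the S3 backend the lock item's LockID is "<bucket>/<key>";
# the "<bucket>/<key>-md5" item holds the state checksum, not the lock.
aws dynamodb get-item \
  --table-name terraform-state-lock \
  --key '{"LockID": {"S": "<bucket>/<key>"}}' \
  --region us-east-1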
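
The item returned by DynamoDB does not show the lock ID as a top-level attribute. As a sketch, assuming the lock item is keyed by the state path and carries an `Info` attribute holding the lock details as a JSON string, the ID can be pulled out with `sed`. The function name `lock_id_from_info`, the table name, and the key are placeholders.

```shell
# Extract the "ID" field from the lock-info JSON read on stdin.
# Uses sed so that no JSON tool has to be installed on the runner.
lock_id_from_info() {
  sed -n 's/.*"ID"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (table name and key are placeholders; Item.Info.S is an assumption
# about how the lock item is laid out):
#   aws dynamodb get-item \
#     --table-name terraform-state-lock \
#     --key '{"LockID": {"S": "<bucket>/<key>"}}' \
#     --query 'Item.Info.S' --output text \
#     --region us-east-1 | lock_id_from_info
```

If the command prints nothing, either no lock is held or the item has a different layout; fall back to the ID shown in the Terraform error message.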

From Azure Blob Storage

# Check blob lease status
az storage blob show \
  --account-name <storage-account> \
  --container-name <container> \
  --name terraform.tfstate \
  --query "properties.lease"

# Lease ID is the lock ID for Azure

From GCS Backend

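# Check the lock object; the GCS backend keeps the lock in a separate
# <prefix>/<workspace>.tflock object next to the state file
gsutil stat gs://<bucket>/<prefix>/default.tflock

# The Generation value of the lock object is the lock ID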

Best Practices

1. Verify Before Unlocking

Always verify that no Terraform operations are actively running before force unlocking:

  • Check CI/CD pipelines for running jobs
  • Confirm with team members that no one is actively deploying
  • Review recent workflow runs in GitHub Actions
  • Verify that the operation that created the lock has fully terminated
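
Part of this checklist can be scripted from a workstation. The sketch below assumes the GitHub CLI (`gh`) is installed and authenticated; the function names are illustrative. It counts only runs in the `in_progress` state, so queued runs and deployments started outside GitHub Actions still need a manual check.

```shell
# Count in-progress workflow runs for a repository ("owner/name").
active_run_count() {
  gh run list --repo "$1" --status in_progress --json databaseId --jq 'length'
}

# Exit 0 when nothing is running, 1 when runs are active, 2 when gh itself failed.
safe_to_unlock() {
  local count
  count=$(active_run_count "$1") || return 2
  [ "$count" -eq 0 ]
}

# Usage (my-org/my-repo is a placeholder):
#   safe_to_unlock my-org/my-repo && echo "no active runs found"
```

Run this before dispatching the unlock workflow rather than from inside it, since the unlock run itself would be counted as in progress.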

2. Use Confirmation Gates

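Implement manual approval or confirmation steps. The environment key is valid only on a regular job (one with steps), not on a job that calls a reusable workflow, so set it on a gate job that runs before the unlock job:

environment: production  # Requires manual approval when the environment has required reviewers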

3. Document Lock Incidents

After unlocking state:

  • Document the incident and root cause
  • Update runbooks if the issue recurs
  • Review infrastructure logs for failures
  • Consider implementing retry logic in Terraform

4. Backend-Specific Considerations

AWS S3 + DynamoDB:

  • Ensure DynamoDB table exists and has correct permissions
  • Verify S3 bucket access permissions
  • Check DynamoDB table capacity

Azure Blob Storage:

  • Verify storage account access
  • Check blob lease status before unlocking
  • Review Storage Account firewall rules

GCS:

  • Confirm bucket permissions
  • Verify service account has storage.objects.update permission

5. Production Safety

For production environments:

  • Always require manual approval via GitHub environments
  • Implement confirmation inputs (e.g., typing "CONFIRM")
  • Use validation jobs before unlock operation
  • Notify teams via Slack/Teams after unlock operations
  • Log all unlock operations for audit purposes

6. Automation Guidelines

  • Schedule regular state cleanup if locks persist frequently
  • Implement timeout policies for Terraform operations
  • Use consistent backend configurations across environments
  • Monitor state lock metrics to identify patterns

7. Emergency Procedures

In case of critical production blocks:

  1. Verify lock ID from error message
  2. Confirm no active operations
  3. Use emergency unlock workflow with approval
  4. Validate state consistency after unlock
  5. Document incident for post-mortem
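
For step 4, one way to validate state consistency is a read-only plan with `-detailed-exitcode`, which exits 0 when there are no changes, 2 when changes are pending, and 1 on error. This is a sketch under the assumption that the directory is already initialized and the same backend credentials are available; the function name is illustrative.

```shell
# Run a plan in the given directory and summarize the result on stdout.
# Plan output is sent to stderr so the summary line stays easy to capture.
check_state_after_unlock() {
  local dir="${1:-.}" rc=0
  terraform -chdir="$dir" plan -detailed-exitcode -input=false -lock-timeout=60s 1>&2 || rc=$?
  case "$rc" in
    0) echo "clean: state matches configuration" ;;
    2) echo "drift: plan shows pending changes; review before applying" ;;
    *) echo "error: plan failed (exit $rc)"; return 1 ;;
  esac
}

# Usage:
#   check_state_after_unlock terraform/infrastructure
```

A "drift" result is not necessarily a problem: it is expected when the interrupted run had not finished applying. An "error" result that mentions the lock means the unlock did not take effect.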

8. Lock ID Management

  • Store lock IDs in incident tickets
  • Maintain a log of all force unlocks
  • Track lock patterns to identify infrastructure issues
  • Correlate locks with deployment failures
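
A force-unlock log can be as simple as an append-only, tab-separated file. The sketch below is one possible shape; the function name, field order, and log location are assumptions, and `GITHUB_ACTOR` is used when the function runs inside GitHub Actions.

```shell
# Append one audit record: UTC timestamp, environment, lock ID, actor, ticket.
# Arguments: log file, environment, lock ID, optional ticket reference.
log_force_unlock() {
  local log_file="$1" environment="$2" lock_id="$3" ticket="${4:-none}"
  printf '%s\t%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$environment" "$lock_id" \
    "${GITHUB_ACTOR:-$(id -un)}" "$ticket" >> "$log_file"
}

# Usage (file name and ticket are placeholders):
#   log_force_unlock force-unlocks.tsv prod abc123-def456-ghi789 INC-0000
```

Keeping one record per line makes it easy to count unlocks per environment later and to correlate them with failed deployments.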

Troubleshooting

Issue 1: Lock ID Not Found

Symptoms:

  • Workflow completes but lock persists
  • Error: "Lock ID does not match"

Solutions:

  • Verify lock ID is copied correctly (no extra spaces)
  • Check if lock has already been released
  • Confirm backend type matches state file location
  • Try retrieving lock ID again from error message
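
The "extra spaces" case can be caught before the workflow runs. A minimal sketch, with an illustrative function name, that trims a pasted lock ID and rejects values that are empty or still contain whitespace:

```shell
# Trim leading and trailing whitespace from a candidate lock ID.
# Prints the cleaned ID, or fails when the result is empty or contains whitespace.
normalize_lock_id() {
  local id
  id=$(printf '%s' "${1:-}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  case "$id" in
    ''|*[[:space:]]*)
      echo "invalid lock ID: '${1:-}'" >&2
      return 1 ;;
  esac
  printf '%s\n' "$id"
}

# Usage:
#   lock_id=$(normalize_lock_id "  abc123-def456-ghi789 ") || exit 1
```

The check is deliberately format-agnostic because the lock ID format varies by backend type.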

Issue 2: Permission Denied

Symptoms:

  • Cannot access state backend
  • OIDC authentication fails

Solutions:

  • Verify cloud permissions for state management
  • Check OIDC configuration for cloud-type
  • Confirm IAM roles have state file access
  • Review GitHub runner permissions

Issue 3: Backend Configuration Mismatch

Symptoms:

  • Cannot find state file
  • Backend initialization fails

Solutions:

  • Verify backend-type matches actual backend
  • Check terraform-directory path is correct
  • Confirm remote state configuration in metadata
  • Review backend configuration in Terraform code

Issue 4: TFE Workspace Errors

Symptoms:

  • Cannot connect to Terraform Enterprise
  • Workspace not found

Solutions:

  • Verify tfe-hostname is correct
  • Confirm tfe-organization and tfe-workspace names
  • Check TFE API token permissions
  • Review network connectivity to TFE

Issue 5: Multiple Environments Locked

Symptoms:

  • Locks across multiple environments
  • Cascading lock failures

Solutions:

  • Identify root cause of initial failure
  • Unlock environments in reverse deployment order (prod → stage → qa → dev)
  • Review shared infrastructure dependencies
  • Implement circuit breakers for multi-environment deployments
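
When several environments are locked at once, the unlock order above can be derived from whatever list of environments is affected. A small sketch with an illustrative function name; environment names outside dev, qa, stage, and prod are ignored.

```shell
# Print the given environments in unlock order: prod, stage, qa, dev.
unlock_order() {
  local name candidate
  for name in prod stage qa dev; do
    for candidate in "$@"; do
      if [ "$candidate" = "$name" ]; then
        printf '%s\n' "$name"
        break
      fi
    done
  done
}

# Usage:
#   unlock_order dev prod qa    # prints prod, qa, dev on separate lines
```

Each printed environment would then be unlocked with its own workflow run and its own lock ID, since lock IDs are not shared between environments.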


Version: 1.0.0
Last Updated: January 16, 2026
Maintained By: Platform Engineering Team
