Vintner

Scale-to-Zero

Lambda-based auto-scaler that scales Tendril ECS services to zero when idle.

Cloud-hosted Tendrils support automatic scale-to-zero. When no jobs are queued, the Lambda scaler scales ECS services down to 0 desired tasks. When a job arrives, it scales back to 1. This eliminates Fargate costs during idle periods.

How It Works

A Lambda function runs on a 1-minute schedule via EventBridge. It checks the Trellis database for queued jobs and adjusts ECS service desired counts accordingly.

Scale-Up Logic

Every 1 minute:
  1. Query Supabase: SELECT count(*) FROM provision_jobs WHERE status = 'QUEUED'
  2. For each registered Tendril ECS service:
     - If queued > 0 AND current desired = 0:
       → Scale UP to 1
       → Reset idle counter

Scale-Down Logic

  3. For each registered Tendril ECS service:
     - If queued = 0 AND current desired > 0:
       → Increment idle counter
       → If idle counter >= 5 (5 consecutive checks = 5 minutes):
         → Scale DOWN to 0
     - If queued > 0:
       → Reset idle counter

The 5-check threshold (5 minutes) prevents flapping — a brief gap between jobs won't cause a scale-down followed by an immediate scale-up.
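The scale-up and scale-down rules above can be condensed into one pure function per service. This is a minimal sketch, not the scaler's actual code: the names `decide`, `Decision`, and `IDLE_THRESHOLD` are hypothetical, and how the idle counter is stored between Lambda invocations is not specified in this page, so the counter is simply passed in and returned.

```python
from dataclasses import dataclass

IDLE_THRESHOLD = 5  # consecutive idle checks before scaling down (from the text)


@dataclass
class Decision:
    desired: int      # desired count to apply to the ECS service
    idle_checks: int  # idle counter to carry into the next run


def decide(queued: int, desired: int, idle_checks: int) -> Decision:
    """Return the next desired count and idle counter for one service."""
    if queued > 0:
        # Work is waiting: reset the counter and make sure one task runs.
        return Decision(desired=max(desired, 1), idle_checks=0)
    if desired == 0:
        # Already at zero with nothing queued: nothing to do.
        return Decision(desired=0, idle_checks=0)
    idle_checks += 1
    if idle_checks >= IDLE_THRESHOLD:
        # Idle for the full threshold: scale down (resetting the counter
        # here is an assumption of this sketch).
        return Decision(desired=0, idle_checks=0)
    return Decision(desired=desired, idle_checks=idle_checks)
```

Keeping the decision separate from the ECS and Supabase calls makes the flapping behavior easy to check in isolation: four idle checks leave the service running, and the fifth scales it to zero.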

Infrastructure

Lambda Function

Property      Value
Runtime       Python 3.12
Architecture  ARM64
Memory        128 MB
Timeout       30 seconds
Trigger       EventBridge rule (every 1 minute)

IAM Permissions

The Lambda function needs:

  • ecs:DescribeServices — read current desired count
  • ecs:UpdateService — change desired count
  • Network access to Supabase REST API (for job count query)
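As an illustration of where the two ECS permissions are used, the helpers below wrap the corresponding boto3 ECS client calls. The helper names are hypothetical, and the client is passed in as a parameter so the sketch does not assume how the real function creates it.

```python
def get_desired_count(ecs, cluster: str, service: str) -> int:
    """Read a service's desired task count (needs ecs:DescribeServices)."""
    resp = ecs.describe_services(cluster=cluster, services=[service])
    if not resp.get("services"):
        raise LookupError(f"service {service!r} not found in cluster {cluster!r}")
    return resp["services"][0]["desiredCount"]


def set_desired_count(ecs, cluster: str, service: str, count: int) -> None:
    """Change a service's desired task count (needs ecs:UpdateService)."""
    ecs.update_service(cluster=cluster, service=service, desiredCount=count)
```

In a Lambda, `ecs` would typically be created with `boto3.client("ecs", region_name=...)`, one client per region listed in the workers configuration.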

Environment Variables

Variable                   Purpose
SUPABASE_URL               Supabase project URL
SUPABASE_SERVICE_ROLE_KEY  Service role key for admin API access
WORKERS                    JSON array of ECS service configurations

Workers Configuration

[
  {
    "region": "eu-west-1",
    "cluster": "trellis-prod-tendril-eu",
    "service": "trellis-prod-tendril-eu-service"
  },
  {
    "region": "us-east-1",
    "cluster": "trellis-prod-tendril-us",
    "service": "trellis-prod-tendril-us-service"
  }
]

Each entry maps to one ECS service running a Tendril. The scaler handles multiple regions independently.
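A minimal sketch of how the WORKERS value could be parsed and validated at startup. The function name `load_workers` and the validation rules are assumptions for illustration; only the three keys come from the configuration shown above.

```python
import json

REQUIRED_KEYS = ("region", "cluster", "service")


def load_workers(raw: str) -> list[dict]:
    """Parse the WORKERS JSON array and check each entry has the expected keys."""
    workers = json.loads(raw)
    if not isinstance(workers, list):
        raise ValueError("WORKERS must be a JSON array")
    for i, worker in enumerate(workers):
        if not isinstance(worker, dict):
            raise ValueError(f"WORKERS[{i}] must be a JSON object")
        missing = [k for k in REQUIRED_KEYS if k not in worker]
        if missing:
            raise ValueError(f"WORKERS[{i}] is missing {missing}")
    return workers
```

It would be called as `load_workers(os.environ["WORKERS"])`. Failing fast on a malformed entry avoids a run that silently skips one region.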

Job Count Query

The Lambda queries Supabase's REST API:

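import os
import urllib.request

SUPABASE_URL = os.environ["SUPABASE_URL"]
SUPABASE_KEY = os.environ["SUPABASE_SERVICE_ROLE_KEY"]
# select=id keeps rows small; Prefer: count=exact puts the total in Content-Range
url = f"{SUPABASE_URL}/rest/v1/provision_jobs?status=eq.QUEUED&select=id"
req = urllib.request.Request(url, headers={
    'apikey': SUPABASE_KEY,
    'Authorization': f'Bearer {SUPABASE_KEY}',
    'Prefer': 'count=exact',
})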

It reads the total from the Content-Range response header (for example, 0-4/5 means 5 matching rows), so the Lambda never has to count rows in the response body.
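A small sketch of extracting the count from a Content-Range value. The helper name `parse_total` is hypothetical; the header format (`<range>/<total>`, with `*` standing in for an unknown part) is the standard Content-Range convention.

```python
def parse_total(content_range: str) -> int:
    """Extract the total row count from a Content-Range value such as '0-4/5'."""
    _range_part, sep, total = content_range.rpartition("/")
    if not sep or not total.isdigit():
        # A total of '*' means the server did not report an exact count.
        raise ValueError(f"no exact count in Content-Range: {content_range!r}")
    return int(total)
```

With the request shown earlier, this would be used roughly as `with urllib.request.urlopen(req, timeout=10) as resp: queued = parse_total(resp.headers["Content-Range"])`.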

Cold Start Behavior

When a Tendril is scaled from 0 → 1:

  1. ECS task launch — ~30 seconds (Fargate cold start, image pull)
  2. Tendril boot — ~5 seconds (binary startup, API registration)
  3. First heartbeat — ~5 seconds after boot
  4. First job claim — next poll cycle (≤ 10 seconds)

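Total cold start: ~50 seconds from scale-up to job start. Because the scaler checks once a minute, a job may also wait up to 60 seconds in the queue before the scale-up begins.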

During this time, the job remains in QUEUED status. The user sees "Waiting for Tendril..." in the UI.
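The ~50-second figure is the sum of the four stages listed above. The worst-case line below adds one full scaler interval, on the reasoning that a job enqueued just after a check is not seen until the next one; that bound is inferred from the 1-minute schedule and is not a measured number.

```python
# Stage durations in seconds, taken from the list above.
STAGES = {
    "ecs_task_launch": 30,
    "tendril_boot": 5,
    "first_heartbeat": 5,
    "first_job_claim": 10,  # upper bound: next poll cycle
}
SCALER_INTERVAL = 60  # the Lambda runs once a minute

cold_start = sum(STAGES.values())
worst_case_wait = SCALER_INTERVAL + cold_start

print(f"cold start after scale-up: ~{cold_start}s")
print(f"worst case from enqueue:   ~{worst_case_wait}s")
```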

Cost Savings

Without scale-to-zero, a single Tendril running 24/7 on Fargate costs:

  • 1 vCPU + 4 GiB memory ≈ $40/month per Tendril

With scale-to-zero, you pay only for:

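  • Lambda invocations: 43,200/month (1/min) × 128 MB × 1 s ≈ 5,400 GB-seconds, under $0.10/month at on-demand rates
  • Fargate: only while a Tendril task is running (job time plus the 5-minute idle window before scale-down)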

For workloads with sporadic provisioning (a few deploys per week), this reduces Tendril costs by ~99%.
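To see where a figure like ~99% comes from, the sketch below pro-rates the $40/month always-on cost by the hours a task actually runs. The workload (3 deploys a week, about 15 minutes of task time each) and the 730-hour month are illustrative assumptions, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730   # average month length in hours (assumption)
ALWAYS_ON_COST = 40.0   # USD/month for one always-on Tendril, from the text


def monthly_fargate_cost(busy_hours: float) -> float:
    """Pro-rate the always-on cost by the hours a task is actually running."""
    return ALWAYS_ON_COST * busy_hours / HOURS_PER_MONTH


def savings_pct(busy_hours: float, lambda_cost: float = 0.0) -> float:
    """Percentage saved versus running the Tendril 24/7."""
    scaled = monthly_fargate_cost(busy_hours) + lambda_cost
    return 100 * (1 - scaled / ALWAYS_ON_COST)


# Hypothetical workload: 3 deploys a week, about 15 minutes of task time each.
busy = 3 * 0.25 * 52 / 12  # 3.25 hours a month
print(f"{savings_pct(busy):.1f}% saved")
```

The savings shrink linearly as busy hours grow, so the estimate applies to sporadic workloads only; a Tendril that is busy most of the day saves little.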

Terraform Configuration

The scaler module lives in infra/platform/scaler/ and is instantiated from the platform configuration:

module "scaler" {
  source = "./scaler"

  name_prefix               = local.name_prefix
  supabase_url              = var.supabase_url
  supabase_service_role_key = var.supabase_service_role_key

  workers = [
    for name, w in module.tendril : {
      region  = var.tendrils[name].region
      cluster = w.cluster_name
      service = w.service_name
    }
  ]
}

The scaler automatically discovers all Tendril deployments from the module.tendril outputs.
