iToverDose / Software · 11 May 2026 · 16:05

How a self-service DevOps sandbox tamed Azure’s deployment chaos

A developer built a miniature Heroku with auto-destroying environments and chaos testing—only to spend more time fighting Azure’s firewall and SSH keys than writing code.

DEV Community · 5 min read

Building a self-service DevOps sandbox that auto-destroys environments and simulates outages sounds straightforward—until Azure’s hidden firewall blocks every port and SSH keys refuse to cooperate.

The problem every DevOps team faces

Every developer on a team needs an isolated environment to test code changes, but spinning them up manually is slow and error-prone. Worse, environments often linger long after their purpose is served, wasting cloud resources. Even when environments exist, teams rarely test how systems behave under real-world stress. That’s the gap my team set out to fill: a self-service platform where developers could spin up disposable environments, deploy apps automatically, simulate failures, and let the system clean up after itself.

A miniature Heroku built on chaos principles

The result is a self-service DevOps sandbox platform—effectively a miniature internal Heroku—that runs entirely within a single Azure VM. Users can spin up isolated environments, deploy applications instantly, monitor health every 30 seconds, inject controlled chaos like crashes or network failures, and auto-destroy everything when the time-to-live expires. All of it starts with one command: make up.
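The article doesn't show the make up entry point itself, but as a hedged sketch, a Makefile along these lines could wire the pieces together (the target names, docker compose usage, and log path are illustrative assumptions, not the project's actual file):

```make
# Illustrative sketch only — not the project's actual Makefile
up:    ## build and start Nginx, the control API, and the cleanup daemon
	docker compose up -d --build
	nohup ./platform/cleanup_daemon.sh >> logs/cleanup.log 2>&1 &

down:  ## stop everything and remove containers
	docker compose down --remove-orphans
```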

Five core components power the platform

  • Nginx as the front door

Each environment gets its own auto-generated Nginx configuration file, written to nginx/conf.d/ and reloaded with nginx -s reload. Traffic is routed by hostname, ensuring clean isolation between environments.
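As a hedged illustration, one of those generated files might look like this (the hostname, upstream name, and app port are invented for the example; the real template lives inside create_env.sh):

```nginx
# nginx/conf.d/env-demo.conf — auto-generated, one file per environment
server {
    listen 8080;
    server_name demo.sandbox.local;        # traffic is routed by hostname

    location / {
        proxy_pass http://demo-app:3000;   # the environment's app container
        proxy_set_header Host $host;
    }
}
```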

  • FastAPI control API

A RESTful API with seven endpoints wraps all the underlying Bash scripts. Functions include creating, listing, and destroying environments; fetching logs; checking health; and triggering outages. Swagger documentation is available at /docs.
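Under the hood, each endpoint ultimately shells out to one of the Bash scripts. A minimal sketch of that wrapper pattern (the script path and argument order here are assumptions, not the platform's exact signatures):

```python
import subprocess

def run_script(script, *args):
    """Run one of the platform's Bash scripts and return its stdout."""
    result = subprocess.run(
        ["bash", script, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def create_env(name, ttl_minutes=60):
    # Hypothetical call: assumes create_env.sh takes a name and a TTL
    return run_script("scripts/create_env.sh", name, str(ttl_minutes))
```

A FastAPI route handler can then return run_script's output as JSON, keeping the API layer a thin shell around the scripts.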

  • Bash engine scripts

Four scripts handle the heavy lifting:

  • create_env.sh spins up containers, networks, Nginx configs, and log shipping
  • destroy_env.sh tears everything down, archives logs, and cleans state
  • simulate_outage.sh triggers chaos scenarios like crashes, pauses, network loss, recovery, and CPU stress
  • cleanup_daemon.sh runs every 60 seconds to auto-destroy expired environments
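The daemon's core expiry check is simple enough to sketch in a few lines (the JSON field names are assumptions based on the description above):

```python
import time

def is_expired(env, now=None):
    """True once an environment's time-to-live has elapsed."""
    now = time.time() if now is None else now
    return now >= env["created_at"] + env["ttl_seconds"]

# cleanup_daemon.sh would apply this test to every file in envs/
# on each 60-second pass and call destroy_env.sh for the expired ones.
```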

  • Health monitor

A Python script polls each active environment’s /health endpoint every 30 seconds. Three consecutive failures mark the environment as degraded, enabling proactive responses.
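The three-consecutive-failures rule can be sketched as a small counter (a simplified sketch of the failure-counting described above, not the monitor's actual code):

```python
class HealthTracker:
    """Flags an environment as degraded after N consecutive failed checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def degraded(self):
        return self.failures >= self.threshold

    def record(self, ok):
        # A single success resets the streak; failures accumulate.
        self.failures = 0 if ok else self.failures + 1
```

The polling loop would call record() with the result of an HTTP GET against each environment's /health endpoint every 30 seconds.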

  • State management

JSON files in envs/ store environment metadata. All writes use atomic temp-file + mv operations to prevent corruption during concurrent operations.
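The temp-file + mv pattern looks roughly like this in Python, where os.replace provides the same atomic rename the Bash scripts get from mv:

```python
import json
import os
import tempfile

def write_state(path, data):
    """Write JSON atomically: readers see either the old file or the new one."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # atomic on POSIX when both paths share a filesystem
    except BaseException:
        os.unlink(tmp)  # never leave a half-written temp file behind
        raise
```

Writing the temp file in the same directory as the target matters: a rename is only atomic within one filesystem.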

From prototype to deployment: four Azure wars

The platform prototype came together quickly: one command to start everything, environments spinning up in seconds, chaos simulation working perfectly. But deployment on Azure turned into a series of technical battles that had nothing to do with the code.

Battle 1: The SSH key mismatch

The first attempt to SSH into the VM failed with Permission denied (publickey). The .pem file path in the command was wrong, and the actual key file—hng5-vm_key.pem—sat unnoticed in the Downloads folder. Even after correcting the path, authentication still failed because the key didn’t match the VM’s registered SSH public key. The fix required resetting the SSH key directly in the Azure portal under Connect → Reset SSH public key, costing 20 minutes of debugging time. Lesson learned: always verify that your SSH key matches the VM it was created for—Azure allows easy resets, but the clock is ticking on your downtime.

Battle 2: The silent firewall blockade

The API was live on port 5000 inside the VM, confirmed by running ss -tlnp, but browsers couldn’t reach it. Adding inbound port rules in the Azure Network Security Group (NSG) for port 5000 had no effect. Routing through Nginx on port 8080, trying proxy containers, and testing various IP addresses like 172.17.0.1 and 127.0.0.1 all returned 502 Bad Gateway. The root cause? Azure enforces an additional firewall layer beyond NSG rules. The solution was to run the API container with --network host instead of the default bridge network:

docker run -d \
  --name sandbox-api \
  --network host \
  -v $(pwd):/app \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -w /app \
  devops-sandbox-api \
  python3 platform/api.py

Host network mode binds directly to the VM’s network interface, bypassing Docker’s bridge and Azure’s hidden firewall. Suddenly, port 5000 became reachable from outside. Lesson learned: Docker bridge networking can be blocked by Azure’s internal firewall even when NSG rules look correct; host network mode is the escape hatch.

Battle 3: Reserved module names strike again

With host networking working, the API still refused to start, throwing ERROR: Could not import module "platform.api". The issue? platform is a built-in Python standard library module. Uvicorn was importing Python’s built-in platform instead of our platform/api.py file. The fix was to run the API directly as a Python script:

python3 platform/api.py

instead of through Uvicorn’s module import. Lesson learned: never name your application directory the same as a Python standard library module. Names like platform, json, os, and sys are off-limits; rename to app, api, or src instead.
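When in doubt, a two-liner shows which module Python actually resolves for a contested name (a generic diagnostic, not taken from the original write-up):

```python
import importlib.util

spec = importlib.util.find_spec("platform")
print(spec.origin)  # a stdlib path here means your platform/ directory lost the name
```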

Battle 4: The disappearing EOF delimiter

Writing Nginx configuration files directly in the terminal using heredoc syntax kept producing malformed output. The EOF delimiter was being swallowed or the content duplicated silently. For example, this command failed without warning:

# This kept failing silently
cat > nginx/conf.d/api.conf << 'EOF'
server {
    ...
}
EOF  # ← this line was the problem

The terminal interpreted EOF as part of the previous command rather than a delimiter. The fix was to use tee with explicit redirection:

tee nginx/conf.d/api.conf > /dev/null << 'EOF'
server {
    listen 8080;
    location / {
        proxy_pass http://127.0.0.1:5000;
    }
}
EOF

Always verify config files immediately after writing them in the terminal—for example with nginx -t before reloading—to prevent silent failures downstream.

What comes next for self-service DevOps sandboxes

The platform now runs reliably on Azure, offering developers a true self-service sandbox where environments are ephemeral by design, health is monitored continuously, and controlled chaos is just a command away. While the deployment battles were painful, they reinforced a hard truth: the biggest challenges in building internal platforms often live outside the code—in cloud networking quirks, security policies, and infrastructure assumptions. As DevOps teams push further into automation and isolation, tools like this miniature Heroku will become essential, but the real work begins the moment the first make up command runs.

