Multi-cloud deployment lessons: FastAPI on Cloud Run, Railway & Oracle

A single FastAPI project taught me more about cloud deployment than a dozen tutorials ever could. After pushing the same code to Google Cloud Run, Railway, and Oracle Cloud’s Always Free tier, I discovered that real-world challenges rarely match the marketing promises. Here’s what actually happened behind the scenes—and how to avoid the same pitfalls.

The first deployment: Cloud Run’s startup trap

Google Cloud Run promised seamless scaling and zero maintenance, but my initial attempts ended in repeated failures. After 20+ deployments, the pattern was clear: the build and the image push succeeded, but the container never became ready. The startup health check timed out because nothing was listening on port 8080 yet.

The culprit wasn't the cloud provider; it was my own code. The FastAPI startup sequence ran blocking I/O before the server bound to its port: initializing the database, sending a Telegram message, and starting a scheduler. Until all of that finished, nothing answered on 8080, so Cloud Run's startup check treated the slow boot as a failure.
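Cloud Run's default startup check for a service is a TCP probe on the container port, so one way to reproduce this failure locally is to time how long the app takes to start accepting connections. A minimal sketch, assuming the app runs on 127.0.0.1:8080; `wait_for_port` is a hypothetical helper written for this illustration, not part of any Google tooling:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    This mirrors the idea of a TCP startup probe: the container only
    counts as started once something is listening on the port.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.1)  # nothing listening yet; retry shortly
    return False
```

Start the container, then call `wait_for_port("127.0.0.1", 8080, timeout=30)` and time it with `time.monotonic()`. If it only returns `True` after the Telegram message has gone out, startup is still blocking on I/O.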

Rethinking the startup sequence

Moving heavy initialization out of the startup phase solved the problem. Instead of making the server wait on synchronous I/O before it could bind to the port, I implemented lazy initialization: the application starts listening immediately and defers non-critical tasks until the first real request arrives.

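import asyncio

from fastapi import FastAPI, Request

app = FastAPI()

# telegram_client and scheduler are assumed to be created elsewhere in the app.
_initialized = False
_init_lock = asyncio.Lock()  # stops concurrent first requests from initializing twice

async def lazy_init():
    """Run one-time side effects on the first request instead of at startup."""
    global _initialized
    if _initialized:
        return
    async with _init_lock:
        if _initialized:  # another request finished initializing while we waited
            return
        await telegram_client.send_message("Application started")
        scheduler.start()
        _initialized = True  # set only after success, so a failed init is retried

@app.post("/webhook")
async def webhook(request: Request):
    await lazy_init()
    return {"status": "processed"}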

This approach reduced startup time from over 60 seconds to under 100 milliseconds. Cloud Run’s health checks passed on the first try, and the deployment stabilized.

Key takeaway: Start with the minimal viable endpoint. Deploy a basic / route, confirm it works, then layer on features such as health checks and logging one at a time, so that each phase needs only a single deployment to verify.

Railway’s hidden costs and automation risks

Railway marketed itself as the simplest way to deploy, and for quick projects, it delivered. A single Git push triggered an auto-deployment that completed in under 30 seconds. The built-in PostgreSQL and Redis instances eliminated the need to manage separate databases.

But simplicity came with trade-offs. The first surprise was a $25 monthly bill instead of the expected $10. The culprit? Railway bills for the resources a service uses over time, and a 1 vCPU, 512MB instance that ran 24/7 without ever scaling down accrued charges around the clock while its memory usage kept climbing. Bandwidth overages piled on top, turning a budget-friendly option into an unexpected expense.

Memory leaks and rollback challenges

A seemingly minor RSS feed crawler triggered a memory leak that grew from 150MB to 260MB over four hours. Without proactive monitoring, the application would crash from an out-of-memory error before anyone noticed. Railway’s built-in dashboard provided basic metrics, but pinpointing the leak required SSH access and manual inspection.
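The leak was found by hand, but Python's standard-library tracemalloc module can narrow this kind of growth down locally before deploying. A minimal sketch, assuming the suspect code can be called in a loop on a development machine; `measure_growth` and the crawler example in the docstring are hypothetical, not the actual crawler from this project:

```python
import tracemalloc
from typing import Callable


def measure_growth(fn: Callable[[], object], iterations: int = 50) -> tuple[int, list]:
    """Call fn repeatedly; return (net bytes still allocated, top 5 lines).

    Memory that grows with every iteration and is never freed is the
    signature of a leak, e.g. a crawler that appends every parsed feed
    entry to a module-level list.
    """
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        fn()  # warm-up, so one-time caches and lazy imports are not counted
        before = tracemalloc.take_snapshot()
        for _ in range(iterations):
            fn()
        after = tracemalloc.take_snapshot()
    finally:
        if started_here:
            tracemalloc.stop()
    # Exclude allocations made by tracemalloc itself (e.g. the first snapshot).
    ignore = [tracemalloc.Filter(False, tracemalloc.__file__)]
    diff = after.filter_traces(ignore).compare_to(
        before.filter_traces(ignore), "lineno"
    )
    return sum(stat.size_diff for stat in diff), diff[:5]
```

Calling `measure_growth(crawl_once)`, where `crawl_once` stands in for one pass of the crawler, and printing the returned statistics shows which source lines keep holding on to memory. A total that stays positive run after run points to the same kind of growth seen on Railway.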

Auto-deployment also introduced risks. Changes pushed to the main branch went live immediately, bypassing any staging or testing environment. Rolling back meant reverting commits, which could introduce new issues if the team wasn’t prepared.

Operational discipline became essential:

Run unit tests with pytest before pushing to main
Lint code with pylint to catch style issues
Build and test the Docker image locally before deployment
Only push to main after all checks pass
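The checklist above can be enforced by a small script rather than by memory. A minimal sketch, assuming pytest, pylint, and Docker are installed locally; the package name `app`, the image tag `myapp`, and `run_checks` itself are placeholders:

```python
import subprocess
import sys
from typing import Optional, Sequence

# Hypothetical check list: "app" and "myapp" are placeholder names.
CHECKS: list[list[str]] = [
    ["pytest", "-q"],
    ["pylint", "app"],
    ["docker", "build", "-t", "myapp", "."],
]


def run_checks(checks: Sequence[Sequence[str]]) -> Optional[list[str]]:
    """Run each command in order; return the first failing one, or None."""
    for cmd in checks:
        print("->", " ".join(cmd), file=sys.stderr, flush=True)
        try:
            result = subprocess.run(list(cmd))
        except OSError:  # e.g. the tool is not installed: treat as a failure
            return list(cmd)
        if result.returncode != 0:
            return list(cmd)
    return None
```

Calling `run_checks(CHECKS)` from a git pre-push hook and exiting with a non-zero status whenever it returns a command blocks the push until every check passes.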

Oracle Cloud’s free tier: power with operational overhead

Oracle Cloud’s Always Free tier offered 4 CPUs, 24GB of RAM, and 200GB of storage at no cost. For a low-traffic project, the resources were more than sufficient. But accessing that power required navigating a steeper learning curve.

Memory constraints during setup

The first hurdle appeared during package installation. On one of the tier's 1GB micro VMs, a full pip install ran out of memory and crashed. The solution? Add swap space or trim dependencies.

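sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile   # mkswap warns if the swap file is world-readable
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # keep swap after a reboot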

Alternatively, installing only essential packages with --no-cache-dir prevented memory exhaustion. For lighter workloads, removing version pins from requirements.txt let pip resolve compatible versions automatically.

Docker vs local environment mismatches

Local development used a pre-installed version of the anthropic library, but the Docker container installed everything fresh from requirements.txt. When langchain-anthropic required a newer anthropic version, pip could not resolve the conflict, even though the same code ran fine locally. The two environments were effectively resolving dependencies differently, and that mismatch created unnecessary friction.

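Best practice: Let the dependency resolver choose compatible versions instead of hardcoding pins copied from a local environment, and pin only when absolutely necessary. Whichever you choose, make development and production install from the same requirements file.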
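To see exactly where two environments diverge, it helps to print the installed versions of the packages involved in both places and compare them. A minimal sketch using the standard library's importlib.metadata; `installed_versions`, `report`, and the script name `check_versions.py` are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# The two packages from the conflict described above.
WATCHED = ["anthropic", "langchain-anthropic"]


def installed_versions(names: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if absent)."""
    found: dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(names: list[str]) -> str:
    """One 'name==version' line per package, sorted so outputs diff cleanly."""
    return "\n".join(
        f"{name}=={ver or 'MISSING'}"
        for name, ver in sorted(installed_versions(names).items())
    )


if __name__ == "__main__":
    print(report(WATCHED))
```

Saving this as `check_versions.py`, running it locally and again inside the image (for example `docker run --rm myapp python check_versions.py`, assuming the script is copied into the image), then diffing the two outputs surfaces a mismatch like the anthropic one before pip fails mid-build. Diffing full `pip freeze` output works too, at the cost of more noise.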

SSH deployment demands automation

Manual deployments via SSH were error-prone and time-consuming: every update required logging in, pulling the latest code, and restarting the service. A simple GitHub Actions workflow automated the process; its core step is a single SSH command:

ssh -i "$key" oracle@"$ip" "cd /opt/ai-lifelogger && git pull && sudo systemctl restart ai-lifelogger"
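The same command can also be built from Python, which keeps the quoting correct and leaves room for extra safety checks. A minimal sketch, assuming the deploy user has passwordless sudo for systemctl; `deploy_command` and `deploy` are hypothetical helpers, and `--ff-only` plus the `is-active` check are additions that are not part of the original command:

```python
import shlex
import subprocess

# Values mirror the SSH command above; key path and host are supplied at call time.
APP_DIR = "/opt/ai-lifelogger"
SERVICE = "ai-lifelogger"


def deploy_command(key_path: str, host: str, user: str = "oracle") -> list[str]:
    """Build the ssh argv; the remote script stops at the first failing step."""
    remote = " && ".join([
        f"cd {shlex.quote(APP_DIR)}",
        "git pull --ff-only",  # refuse to create merge commits on the server
        f"sudo systemctl restart {shlex.quote(SERVICE)}",
        f"systemctl is-active --quiet {shlex.quote(SERVICE)}",  # fail if it did not come back
    ])
    return ["ssh", "-i", key_path, f"{user}@{host}", remote]


def deploy(key_path: str, host: str) -> None:
    """Run the deployment; raises CalledProcessError if any remote step fails."""
    subprocess.run(deploy_command(key_path, host), check=True)
```

A workflow step could call `deploy(key_path, host)` with the key and IP taken from repository secrets; because `check=True` turns a non-zero exit into an exception, a failed pull or restart fails the job instead of passing silently.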

Automating deployments reduced human error and freed up time for more critical tasks.

Performance and cost at a glance

After three months of running each platform in production, the differences became clear. Cloud Run offered solid stability but required careful startup design. Railway provided speed and simplicity, though costs and memory leaks demanded vigilance. Oracle Cloud delivered raw power for free, but its operational overhead suited only those willing to manage it.

| Metric | Google Cloud Run | Railway | Oracle Cloud Always Free |
|-----------------|------------------|---------|--------------------------|
| Deploy time | 2–3 minutes | 30 seconds | 5 minutes |
| Cold start | 3–5 seconds | 0 seconds | <1 second |
| Monthly cost | $15 | $25 | $0 |
| CPU limit | 2 cores | 1 core | 4 cores |
| RAM limit | 2GB | 512MB | 24GB |
| Stability | ✅ Reliable | ⚠️ Prone to leaks | ✅ Reliable |

Choosing the right platform for your project

The ideal cloud provider depends on the project’s scale, traffic, and budget.

High-traffic applications benefit from Cloud Run’s auto-scaling and 24/7 availability. The startup latency can be mitigated with lazy loading patterns.
Medium-traffic prototypes thrive on Railway’s simplicity and integrated services. Just budget for memory usage and implement monitoring.
Low-traffic or experimental projects can leverage Oracle Cloud’s free tier. Accept the extra setup work in exchange for zero costs.

Regardless of the choice, always test deployments locally first. A simple Docker workflow ensures consistency before pushing to the cloud.

# Build and run locally
docker build -t myapp .
docker run -p 8080:8080 myapp
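Once the container is running, a quick scripted request confirms that the basic / route answers before anything is pushed. A minimal sketch using only the standard library; `smoke_check` is a hypothetical helper, and it assumes the app returns HTTP 200 on /:

```python
import urllib.error
import urllib.request


def smoke_check(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url answers with HTTP 200 within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx responses (HTTPError).
        return False
```

`smoke_check("http://localhost:8080/")` should return `True` while the container is running. Pointing the same call at the deployed URL gives a one-line check for each phase of a rollout.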

Monitoring and maintenance: the non-negotiable reality

Cloud providers offer dashboards, but they’re not substitutes for active monitoring.

Cloud Run: Use Google Cloud’s Logs Explorer and Cloud Monitoring for granular insights.
Railway: Rely on its built-in dashboard for basic metrics, but supplement with external logging for leak detection.
Oracle Cloud: SSH into the instance and use journalctl -u ai-lifelogger -f (for the systemd service) or tail -f on the application's log file to follow logs in real time.
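For the external memory logging suggested for Railway, one low-effort option is to have the app log its own resident memory periodically, so growth like the 150MB to 260MB climb shows up in the platform's log output. A minimal Linux-only sketch that reads /proc/self/status; all names are hypothetical, and it assumes logging is configured (for example with `logging.basicConfig(level=logging.INFO)`) so INFO records reach the standard streams the platform collects:

```python
import logging
import threading
from typing import Optional

log = logging.getLogger("memory")


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Extract VmRSS (resident memory, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def current_rss_kb() -> Optional[int]:
    """Resident memory of this process on Linux; None where /proc is unavailable."""
    try:
        with open("/proc/self/status") as fh:
            return parse_vmrss_kb(fh.read())
    except OSError:
        return None


def start_rss_logger(interval_s: float = 60.0) -> threading.Event:
    """Log RSS every interval_s seconds from a daemon thread; set the event to stop."""
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout, True once stop.set() is called.
        while not stop.wait(interval_s):
            log.info("rss_kb=%s", current_rss_kb())

    threading.Thread(target=loop, daemon=True, name="rss-logger").start()
    return stop
```

Calling `start_rss_logger(60)` once when the app starts produces one `rss_kb` line per minute; a series that only ever rises is the leak signal worth alerting on.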

Without monitoring, memory leaks and hidden costs can spiral out of control before anyone notices.

The real lesson: constraints shape better design

There’s no universally perfect cloud platform—only the right fit for a specific use case. Cloud Run’s strict startup requirements forced me to decouple initialization from deployment. Railway’s memory limits highlighted the importance of efficient resource usage. Oracle Cloud’s free tier taught the value of automation and careful dependency management.

The 20+ failed Cloud Run deployments weren't setbacks; they were lessons. Each error revealed a flaw in design or process and became a building block for a more resilient system. Today, all three deployments run in production, each optimized for its cloud's strengths and weaknesses.

As cloud technologies evolve, the lesson remains constant: understand the platform’s constraints, design around them, and automate everything else. The cloud is a tool—not a magic wand—and mastery comes from working within its boundaries.
