A single FastAPI project taught me more about cloud deployment than a dozen tutorials ever could. After pushing the same code to Google Cloud Run, Railway, and Oracle Cloud’s Always Free tier, I discovered that real-world challenges rarely match the marketing promises. Here’s what actually happened behind the scenes—and how to avoid the same pitfalls.
The first deployment: Cloud Run’s startup trap
Google Cloud Run promised seamless scaling and zero maintenance, but my initial attempts ended in repeated failures. After 20+ deployments, the pattern was clear: the build and push succeeded, but the container never started. Health checks timed out because the application never began listening on port 8080.
The culprit wasn’t the cloud provider—it was my own code. The FastAPI startup sequence included blocking I/O operations that prevented the service from binding to the port immediately. Every time the application tried to initialize the database, send a Telegram message, or start a scheduler, Cloud Run’s health checks interpreted the delay as a failure.
Rethinking the startup sequence
Moving heavy initialization outside the startup phase solved the problem. Instead of blocking the port with synchronous I/O calls, I implemented lazy loading. The application now returns a response immediately, deferring non-critical tasks until the first actual request arrives.
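```python
from fastapi import FastAPI, Request

app = FastAPI()
_initialized = False


async def lazy_init():
    """Run one-time setup on the first request instead of at startup."""
    global _initialized
    if _initialized:
        return
    _initialized = True  # set first so later requests skip the setup
    # telegram_client and scheduler are assumed to be created elsewhere in the app
    await telegram_client.send_message("Application started")
    scheduler.start()


@app.post("/webhook")
async def webhook(request: Request):
    await lazy_init()
    return {"status": "processed"}
```

This approach reduced startup time from over 60 seconds to under 100 milliseconds. Cloud Run’s health checks passed on the first try, and the deployment stabilized.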
Key takeaway: Start with the minimal viable endpoint. Deploy a basic / route, confirm it works, then gradually layer on features like health checks and logging. Each phase should require only one deployment test.
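One way to "confirm it works" is a small smoke test that polls the endpoint until it answers. The following is a minimal sketch using only the standard library; the function name `wait_for_http` and the default timings are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=30.0, interval=0.5):
    """Poll url until it answers with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not accepting connections yet, or a non-200 answer; retry
        time.sleep(interval)
    return False
```

For a container started locally on port 8080, `wait_for_http("http://localhost:8080/")` returns `True` as soon as the basic route responds, which gives a rough feel for how long the service takes to bind to its port.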
Railway’s hidden costs and automation risks
Railway marketed itself as the simplest way to deploy, and for quick projects, it delivered. A single Git push triggered an auto-deployment that completed in under 30 seconds. The built-in PostgreSQL and Redis instances eliminated the need to manage separate databases.
But simplicity came with trade-offs. The first surprise arrived in the form of a $25 monthly bill instead of the expected $10. The culprit? A 1 vCPU, 512MB instance that runs 24/7 has no cold starts, but it also never stops accruing usage charges. Bandwidth overages piled on top, turning a budget-friendly option into an unexpected expense.
Memory leaks and rollback challenges
A seemingly minor RSS feed crawler triggered a memory leak that grew from 150MB to 260MB over four hours. Without proactive monitoring, the application would crash from an out-of-memory error before anyone noticed. Railway’s built-in dashboard provided basic metrics, but pinpointing the leak required SSH access and manual inspection.
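The numbers above allow a rough estimate of how much time was left before a crash. This sketch assumes the leak grows linearly and that the 512MB instance size acts as the hard limit; both are simplifying assumptions, and the function names are illustrative.

```python
def growth_rate_mb_per_hour(start_mb, end_mb, hours):
    """Average memory growth over the observation window."""
    return (end_mb - start_mb) / hours


def hours_until_limit(current_mb, limit_mb, rate_mb_per_hour):
    """Time until the limit is reached, assuming growth stays linear."""
    if rate_mb_per_hour <= 0:
        return float("inf")  # no growth means the limit is never reached
    return max(limit_mb - current_mb, 0) / rate_mb_per_hour


rate = growth_rate_mb_per_hour(150, 260, 4)    # 27.5 MB per hour
remaining = hours_until_limit(260, 512, rate)  # roughly 9.2 hours
```

Under those assumptions, a process already at 260MB would hit the limit a little over nine hours later, which is short enough to happen overnight without anyone watching.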
Auto-deployment also introduced risks. Changes pushed to the main branch went live immediately, bypassing any staging or testing environment. Rolling back meant reverting commits, which could introduce new issues if the team wasn’t prepared.
Operational discipline became essential:
- Run unit tests with `pytest` before pushing to main
- Lint code with `pylint` to catch style issues
- Build and test the Docker image locally before deployment
- Only push to main after all checks pass
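The checklist above can be collapsed into a single script that stops at the first failing step. This is a sketch rather than the project's actual tooling: the package name `app` and the image tag `myapp` are placeholders, and the three commands are assumed to be installed locally.

```python
import subprocess

# Commands mirror the checklist; "app" and "myapp" are placeholder names.
CHECKS = [
    ["pytest"],
    ["pylint", "app"],
    ["docker", "build", "-t", "myapp", "."],
]


def run_checks(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        print("running:", " ".join(command))
        try:
            result = subprocess.run(command, check=False)
        except FileNotFoundError:
            print("command not found:", command[0])
            return False
        if result.returncode != 0:
            print("failed:", " ".join(command))
            return False
    return True
```

Calling `run_checks(CHECKS)` from a Git pre-push hook, and exiting non-zero when it returns `False`, is one way to make the "only push after all checks pass" rule hard to skip.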
Oracle Cloud’s free tier: power with operational overhead
Oracle Cloud’s Always Free tier offered 4 CPUs, 24GB of RAM, and 200GB of storage at no cost. For a low-traffic project, the resources were more than sufficient. But accessing that power required navigating a steeper learning curve.
Memory constraints during setup
The first hurdle appeared during package installation. A 1GB instance couldn’t handle a full pip install without crashing. The solution? Introducing swap space or trimming dependencies.
```bash
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile  # restrict access; swapon warns about looser permissions
sudo mkswap /swapfile
sudo swapon /swapfile
```

Alternatively, installing only essential packages with `pip install --no-cache-dir` prevented memory exhaustion. For lighter workloads, removing version pins from requirements.txt let pip resolve compatible versions automatically.
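Before retrying a large install, it can help to confirm that the swap space is actually active. On Linux, `/proc/meminfo` reports it; the sketch below parses that format, with `parse_meminfo` and `swap_total_gb` being illustrative helper names rather than anything from the original setup.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for memory fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_total_gb(meminfo_text):
    """Return configured swap in GiB, or 0.0 if none is reported."""
    return parse_meminfo(meminfo_text).get("SwapTotal", 0) / (1024 * 1024)
```

On the instance itself, reading the file with `open("/proc/meminfo").read()` and passing the text to `swap_total_gb` should report a value close to 8 once the swap file is enabled.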
Docker vs local environment mismatches
Local development used a pre-installed version of the anthropic library, but the Docker container started fresh from requirements.txt. When langchain-anthropic required a newer version, pip failed to resolve the conflict. Pinning specific versions in development while omitting them in production created unnecessary friction.
Best practice: Let dependency management tools resolve versions in production. Avoid hardcoding versions unless absolutely necessary.
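Whichever pinning policy is chosen, the mismatch between a local environment and a fresh container is easier to catch when it is checked explicitly. The following sketch compares exact pins in a requirements file against what is installed, using `importlib.metadata` from the standard library. It only understands simple `name==version` lines, and the helper names are assumptions for illustration.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Extract exact pins (name==version) from requirements.txt-style text."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "==" in line:
            name, version = line.split("==", 1)
            # ignore environment markers such as '; python_version < "3.9"'
            pins[name.strip().lower()] = version.split(";", 1)[0].strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(requirements_text, lookup=installed_version):
    """List (name, pinned, installed) where the environment differs from the pins."""
    mismatches = []
    for name, pinned in parse_pins(requirements_text).items():
        installed = lookup(name)
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

Running `find_mismatches(open("requirements.txt").read())` locally before building the image would flag a pre-installed library whose version differs from what the container will install from scratch.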
SSH deployment demands automation
Manual deployments via SSH were error-prone and time-consuming. Every update required logging in, pulling the latest code, and restarting the service. A simple GitHub Actions workflow streamlined the process:
```bash
ssh -i "$key" oracle@"$ip" "cd /opt/ai-lifelogger && git pull && systemctl restart ai-lifelogger"
```

Automating deployments reduced human error and freed up time for more critical tasks.
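The same SSH step can also be wrapped in a small Python helper, for example to call it from a CI job or a local script. This is a sketch under assumptions: the function names and the injectable `runner` parameter are illustrative, and the remote user defaults to `oracle` only because the command above uses it.

```python
import shlex
import subprocess


def build_deploy_command(key_path, host, app_dir, service, user="oracle"):
    """Build the ssh argv for: cd app_dir && git pull && systemctl restart service."""
    remote = " && ".join([
        f"cd {shlex.quote(app_dir)}",
        "git pull",
        f"systemctl restart {shlex.quote(service)}",
    ])
    return ["ssh", "-i", key_path, f"{user}@{host}", remote]


def deploy(key_path, host, app_dir, service, runner=subprocess.run):
    """Run the deploy command; subprocess.run raises CalledProcessError on failure."""
    command = build_deploy_command(key_path, host, app_dir, service)
    return runner(command, check=True)
```

Passing the command as an argument list avoids local shell interpolation, and `shlex.quote` protects the remote side if a directory or service name ever contains unusual characters.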
Performance and cost at a glance
After three months of running each platform in production, the differences became clear. Cloud Run offered solid stability but required careful startup design. Railway provided speed and simplicity, though costs and memory leaks demanded vigilance. Oracle Cloud delivered raw power for free, but its operational overhead suited only those willing to manage it.
| Metric       | Google Cloud Run | Railway           | Oracle Cloud Always Free |
|--------------|------------------|-------------------|--------------------------|
| Deploy time  | 2–3 minutes      | 30 seconds        | 5 minutes                |
| Cold start   | 3–5 seconds      | 0 seconds         | <1 second                |
| Monthly cost | $15              | $25               | $0                       |
| CPU limit    | 2 cores          | 1 core            | 4 cores                  |
| RAM limit    | 2GB              | 512MB             | 24GB                     |
| Stability    | ✅ Reliable      | ⚠️ Prone to leaks | ✅ Reliable              |
Choosing the right platform for your project
The ideal cloud provider depends on the project’s scale, traffic, and budget.
- High-traffic applications benefit from Cloud Run’s auto-scaling and 24/7 availability. The startup latency can be mitigated with lazy loading patterns.
- Medium-traffic prototypes thrive on Railway’s simplicity and integrated services. Just budget for memory usage and implement monitoring.
- Low-traffic or experimental projects can leverage Oracle Cloud’s free tier. Accept the extra setup work in exchange for zero costs.
Regardless of the choice, always test deployments locally first. A simple Docker workflow ensures consistency before pushing to the cloud.
```bash
# Build and run locally
docker build -t myapp .
docker run -p 8080:8080 myapp
```

Monitoring and maintenance: the non-negotiable reality
Cloud providers offer dashboards, but they’re not substitutes for active monitoring.
- Cloud Run: Use Google Cloud’s Logs Explorer and Cloud Monitoring for granular insights.
- Railway: Rely on its built-in dashboard for basic metrics, but supplement with external logging for leak detection.
- Oracle Cloud: SSH into the instance and use `journalctl` or `tail -f` to track application logs in real time.
Without monitoring, memory leaks and hidden costs can spiral out of control before anyone notices.
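For leak detection inside the application itself, Python's standard `tracemalloc` module can show which source lines are accumulating memory between two points in time. The sketch below is one possible approach, not what any of the three deployments actually ran; `top_allocation_growth` and `measure` are illustrative names.

```python
import tracemalloc


def top_allocation_growth(before, after, limit=3):
    """Return the source lines whose allocated memory grew most between snapshots."""
    stats = after.compare_to(before, "lineno")
    growing = [stat for stat in stats if stat.size_diff > 0]
    growing.sort(key=lambda stat: stat.size_diff, reverse=True)
    return growing[:limit]


def measure(func):
    """Run func() between two snapshots and report the largest growth."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = func()  # keep the result alive so its memory is still counted
    after = tracemalloc.take_snapshot()
    if started_here:
        tracemalloc.stop()
    return result, top_allocation_growth(before, after)
```

In a long-running service, the same comparison could run on a timer, logging the top entries every few minutes so that a slowly growing crawler shows up in the logs well before the process runs out of memory. Tracing adds overhead, so it is probably best enabled only while investigating.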
The real lesson: constraints shape better design
There’s no universally perfect cloud platform—only the right fit for a specific use case. Cloud Run’s strict startup requirements forced me to decouple initialization from deployment. Railway’s memory limits highlighted the importance of efficient resource usage. Oracle Cloud’s free tier taught the value of automation and careful dependency management.
The 20+ failed Cloud Run deployments weren’t setbacks; they were lessons. Each error revealed a flaw in design or process, turning into a building block for a more resilient system. Today, all three deployments run in production, each optimized for its respective cloud’s strengths and weaknesses.
As cloud technologies evolve, the lesson remains constant: understand the platform’s constraints, design around them, and automate everything else. The cloud is a tool—not a magic wand—and mastery comes from working within its boundaries.