In a world where social platforms reward volume over perfection, content creators face an impossible choice: spend hours meticulously editing every frame or churn out dozens of raw clips hoping a few go viral. This dilemma led to the creation of ClipFarmer, a video processing SaaS built in Senegal that replaces manual labor with open-source computer vision models.
Unlike many "AI tools" marketed across West Africa, which are often little more than Telegram bots wrapping ChatGPT, ClipFarmer relies entirely on self-hosted machine learning pipelines. The system doesn't call external APIs for tasks like speech-to-text or object detection. Instead, it runs OpenAI's Whisper, YOLO, Detectron2, MediaPipe, and OpenCV directly on its own infrastructure, ensuring full control over processing speed and costs.
From raw footage to viral-ready clips
The core challenge ClipFarmer solves is automating the tedious parts of video editing that consume hours of creators’ time. Most manual workflows involve:
- Scanning hours of footage to identify scene changes
- Generating accurate subtitles for accessibility and platform algorithms
- Resizing and reframing content for vertical formats like TikTok or Instagram Reels
- Applying consistent visual effects and transitions
The platform’s processing pipeline begins with scene detection using YOLO and OpenCV, which identifies natural breakpoints where scenes transition—far more precise than splitting videos at fixed intervals. For dialogue-heavy content, Whisper transcribes speech locally to generate subtitles without per-minute fees or API bottlenecks. MediaPipe then analyzes body and facial landmarks to keep subjects centered when converting widescreen footage to vertical formats, while Detectron2 handles background removal for clean masking effects.
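ClipFarmer's detector combines YOLO and OpenCV; as a minimal illustration of the underlying idea, scene breakpoints can be found by comparing histograms of consecutive frames (the function name and threshold below are illustrative, not ClipFarmer's actual code; in production, OpenCV would supply the decoded frames):

```python
import numpy as np

def detect_scene_cuts(frames, threshold=0.5):
    """Return indices where a scene change likely occurs.

    frames: iterable of grayscale frames as 2-D uint8 arrays.
    A cut is flagged when the histogram distance between
    consecutive frames exceeds `threshold`.
    """
    cuts = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=64, range=(0, 256))
        hist = hist / hist.sum()  # normalize so the metric is resolution-independent
        if prev_hist is not None:
            # Total variation distance between the two histograms, in [0, 1]
            diff = 0.5 * np.abs(hist - prev_hist).sum()
            if diff > threshold:
                cuts.append(i)
        prev_hist = hist
    return cuts
```

Cutting at these indices yields clips aligned with actual content changes rather than arbitrary fixed intervals.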
Each processing stage runs in isolation within dedicated Docker containers to prevent dependency conflicts, particularly between Whisper, Detectron2, and MediaPipe, which require incompatible Python environments. The result is a fully automated workflow where raw footage enters the system and emerges as a polished, platform-optimized clip within minutes.
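One way to keep incompatible models in separate interpreters, sketched below, is a thin dispatcher that shells out into each model's own Conda environment inside the worker container. The `conda run` invocation and the JSON-over-stdout contract are illustrative assumptions, not ClipFarmer's actual interface:

```python
import json
import subprocess

def build_command(env_name, script, payload):
    """Assemble a `conda run` invocation for one isolated model environment."""
    return ["conda", "run", "-n", env_name, "python", script, json.dumps(payload)]

def run_in_env(env_name, script, payload):
    """Execute `script` in its own Conda environment and parse its JSON stdout.

    Keeps Whisper, Detectron2, and MediaPipe in separate interpreters so
    their conflicting dependency pins never share a process.
    """
    proc = subprocess.run(
        build_command(env_name, script, payload),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```

The cost is a process launch per invocation, but that is negligible next to model inference time, and it eliminates an entire class of dependency conflicts.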
Parallel processing with Celery and optimized pipelines
Building the effects and transitions system proved the most technically demanding part of the project. Unlike simple cuts or crossfades, ClipFarmer employs advanced frame-level processing:
```python
from celery import chord

workflow = chord(
    splitter_clip.s(job.job_id, input_path),
    workflow_tasks_parallel.s()
)
task_result = workflow()
```

The architecture leverages Celery's chord functionality to split videos into clips first, then process subtitles, effects, and transitions in parallel across worker nodes. This parallelization cuts total processing time dramatically compared to sequential frame handling.
Key optimizations emerged from painful trial and error:
- Frame batching: Processing frames individually crippled performance. Batch reading and writing reduced processing time by over 60% for 10-minute videos
- Direct uploads via presigned URLs: Streaming large video files through FastAPI caused crashes. Switching to MinIO’s presigned uploads shifted bandwidth demands to clients
- Docker networking quirks: A missing environment variable in the FastAPI container prevented Celery workers from connecting to RabbitMQ, costing days of debugging
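The frame-batching optimization from the first point can be sketched as a generator that groups frames before applying a vectorized effect. The batch size and the brightness effect are illustrative assumptions; the point is that per-frame Python overhead is paid once per batch, and memory stays bounded to one batch:

```python
import numpy as np

def batched_frames(frames, batch_size=64):
    """Yield frames in fixed-size stacked batches instead of one at a time."""
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield np.stack(batch)
            batch = []
    if batch:
        yield np.stack(batch)  # final partial batch

def apply_brightness(batch, delta=20):
    # Vectorized effect over a whole batch at once; clip to valid pixel range.
    return np.clip(batch.astype(np.int16) + delta, 0, 255).astype(np.uint8)
```

Writing batches out in the same grouped fashion gives the corresponding speedup on the encode side.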
The backend stack reflects these lessons: FastAPI handles jobs and metadata, Celery manages distributed tasks, RabbitMQ coordinates queues, and Redis caches intermediate results. PostgreSQL stores user data and processing metadata, while React-based frontends provide an intuitive interface built with Vite and TailwindCSS.
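The presigned-upload fix mentioned above maps onto MinIO's `presigned_put_object` API; a minimal sketch, where the bucket and object names are illustrative and `client` is a configured MinIO client:

```python
from datetime import timedelta

def presigned_upload_url(client, bucket, object_name, minutes=15):
    """Ask object storage for a short-lived direct-upload URL.

    `client` is expected to expose MinIO's `presigned_put_object`.
    The browser then PUTs the video straight to storage, so large
    files never stream through the FastAPI process.
    """
    return client.presigned_put_object(
        bucket, object_name, expires=timedelta(minutes=minutes)
    )
```

FastAPI only hands out the URL and records the job; the multi-gigabyte payload goes client-to-storage, which is what moved the bandwidth cost off the API servers.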
Serving West African creators with mobile-first payments
ClipFarmer’s deployment reflects the realities of digital payment systems in West Africa, where credit cards remain uncommon and mobile money dominates transactions. The platform natively integrates Wave and Orange Money, removing friction for local creators who might otherwise be priced out of premium editing tools.
This local-first approach addresses a critical gap in the market. Many "AI" tools available regionally are either outright scams—posing as advanced solutions while merely repackaging ChatGPT—or inaccessible due to high costs or bandwidth requirements. ClipFarmer’s self-hosted model eliminates per-minute billing and ensures creators maintain full control over their intellectual property.
Lessons for video processing at scale
The journey from prototype to production revealed unexpected technical hurdles that could derail similar projects:
- Dependency isolation isn’t optional: Attempting to run Whisper, Detectron2, and MediaPipe in shared environments led to cryptic errors. Separate Conda environments within worker containers became the only viable solution
- Memory management matters: Long videos (20+ minutes) exhausted RAM during frame processing. Implementing chunked frame batches prevented crashes and kept processing times predictable
- Network configuration is critical: Docker networking behaves differently than expected. Missing environment variables in one container can silently break others, requiring exhaustive testing
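A cheap guard against the silent-breakage failure mode in the last point is to fail fast at startup when the broker address is missing. The `CELERY_BROKER_URL` variable name follows a common Celery convention, and the exit-on-missing behavior is a suggestion, not ClipFarmer's actual code:

```python
import os
import sys

def broker_url():
    """Resolve the Celery broker URL, failing loudly instead of silently.

    A missing environment variable in one container is exactly the kind
    of bug that costs days, so refuse to start without it rather than
    letting workers fail to connect at some later point.
    """
    url = os.environ.get("CELERY_BROKER_URL")
    if not url:
        sys.exit("CELERY_BROKER_URL is not set; check the container's environment")
    return url
```

Applying the same check to every cross-container address (Redis, MinIO, PostgreSQL) turns a silent runtime failure into an obvious startup error.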
ClipFarmer now processes hundreds of videos weekly, with creators using the platform to repurpose long-form content into dozens of social-ready clips. The system continues to evolve, with ongoing optimizations for frame batching and memory usage on extended footage.
For other developers building video processing pipelines, the project demonstrates that open-source models can deliver professional-grade results without relying on black-box APIs. The real challenge lies in orchestrating these models efficiently and serving users in regions where infrastructure constraints demand creative solutions.
What aspect of AI-powered video editing would make you abandon manual workflows entirely?
AI summary
Creating video clips manually for TikTok and Instagram takes hours. ClipFarmer automates the process so creators can produce more viral videos.