GitHub’s engineering team recently confronted a problem most tech organizations dread: a circular dependency deep enough to disrupt its own operations. Hosting the company’s source code on github.com is a point of pride, but it comes with an ironic twist: if the platform ever went down, the team couldn’t deploy the fixes needed to restore it. That realization prompted a broader investigation into deployment risks and led to a solution that uses eBPF (extended Berkeley Packet Filter) to break these dependencies without compromising system stability.
The hidden risks of circular dependencies in deployments
Circular dependencies aren’t just theoretical problems; they can paralyze critical operations during incidents. Imagine a scenario where a MySQL outage cripples GitHub’s ability to serve release data. To resolve the issue, engineers need to roll out a configuration change to affected MySQL nodes. The deployment script responsible for this fix might inadvertently create its own circular trap:
- Direct dependency: The script attempts to fetch an open-source tool from GitHub to complete the deployment. If GitHub is unreachable, the script fails, delaying recovery.
- Hidden dependency: The script relies on a tool already installed on the node, but that tool checks GitHub for updates. If GitHub is down, the tool may hang or behave unpredictably, sabotaging the deployment.
- Transient dependency: The script calls an internal service (e.g., a migrations service) via API, which in turn tries to download a binary from GitHub. A single outage can cascade through the system, compounding the problem.
Traditionally, teams mitigated these risks by reviewing deployment scripts manually—a process that often fails to catch hidden or transient dependencies until an incident occurs. The result? Slower incident response and increased downtime.
eBPF as a surgical tool for deployment safety
GitHub’s solution hinged on isolating deployment scripts at the network level, so that they couldn’t inadvertently depend on services like GitHub itself during critical operations. Enter eBPF, a Linux kernel technology that lets developers load custom programs into the kernel and attach them to hooks in core subsystems such as networking.
The team zeroed in on the BPF_PROG_TYPE_CGROUP_SKB program type, which can filter the network egress traffic of specific cGroups (control groups). cGroups are Linux primitives that enforce resource limits and isolation for sets of processes; they are commonly used in containerized environments but are not exclusive to them. The key insight? By placing only the deployment script into a dedicated cGroup, the team could selectively restrict its outbound network access without affecting other services.
This approach offered a critical advantage: stateful hosts (machines serving production traffic) could continue operating normally while deployment scripts remained isolated. There was no need to block GitHub for the entire host, which would have disrupted production workloads. Instead, GitHub engineered a targeted solution that addressed the root cause of circular dependencies.
A proof of concept in Go and C
To validate their approach, GitHub built a proof of concept using the cilium/ebpf library, a pure-Go toolkit that simplifies eBPF program development. The project demonstrated how to compile eBPF programs, load them into the kernel, and attach them to kernel hooks, providing a blueprint for teams looking to implement similar safeguards.
The Go snippet below loads a compiled eBPF program, attaches it to the egress hook of a cGroup, and reports the packet count once per second:

```go
package main

import (
	"log"
	"time"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

//go:generate go tool bpf2go -tags linux bpf cgroup_skb.c -- -I../headers

func main() {
	// Load pre-compiled programs and maps into the kernel.
	objs := bpfObjects{}
	if err := loadBpfObjects(&objs, nil); err != nil {
		log.Fatalf("loading objects: %v", err)
	}
	defer objs.Close()

	// Link the count_egress_packets program to the cgroup.
	l, err := link.AttachCgroup(link.CgroupOptions{
		Path:    "/sys/fs/cgroup/system.slice",
		Attach:  ebpf.AttachCGroupInetEgress,
		Program: objs.CountEgressPackets,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer l.Close()

	log.Println("Counting packets...")

	// Once per second, read the packet counter from the map and log it.
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		var value uint64
		if err := objs.PktCount.Lookup(uint32(0), &value); err != nil {
			log.Fatalf("reading map: %v", err)
		}
		log.Printf("number of packets: %d\n", value)
	}
}
```

The accompanying eBPF program in C defines the logic for counting egress packets and interacting with kernel maps:
```c
//go:build ignore

#include "common.h"

char __license[] SEC("license") = "Dual MIT/GPL";

/* Single-slot array map holding the running packet count. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, u32);
	__type(value, u64);
	__uint(max_entries, 1);
} pkt_count SEC(".maps");

SEC("cgroup_skb/egress")
int count_egress_packets(struct __sk_buff *skb)
{
	u32 key = 0;
	u64 init_val = 1;

	u64 *count = bpf_map_lookup_elem(&pkt_count, &key);
	if (!count) {
		/* No entry yet: start the counter at 1. */
		bpf_map_update_elem(&pkt_count, &key, &init_val, BPF_ANY);
		return 1;
	}

	/* Atomically increment the counter. */
	__sync_fetch_and_add(count, 1);

	/* Returning 1 lets the packet through. */
	return 1;
}
```

This prototype showed that eBPF can provide the granular control needed to prevent circular dependencies in deployments. The counting program is only a first step: it observes egress traffic from the cGroup without blocking any of it. By isolating deployment scripts and then selectively filtering their network access, GitHub could ensure that critical fixes proceed even when core services are down.
A new standard for deployment safety
GitHub’s experiment with eBPF highlights a broader shift in how engineering teams approach deployment reliability. Rather than relying on reactive measures or manual reviews, organizations can now implement proactive, kernel-level safeguards that prevent circular dependencies before they cause incidents.
The lessons from this project extend beyond GitHub’s infrastructure. Teams deploying stateful hosts, running rolling updates, or managing internal services can adopt similar strategies to harden their deployment pipelines. With tools like eBPF becoming more accessible, the future of deployment safety may no longer hinge on luck—but on precision.