For decades, economists have relied on physical exports, patent filings, and academic research to measure a nation’s economic complexity. Yet these traditional metrics overlook one of the most critical drivers of modern growth: software. A groundbreaking study published in Research Policy now demonstrates how open-source software production—tracked through developer activity on GitHub—can reveal a nation’s "digital complexity," offering unprecedented insights into GDP, income inequality, and even emissions that conventional data often misses.
Rethinking economic complexity through open-source data
The paper, co-authored by researchers from Corvinus University of Budapest, Maastricht University, and the Toulouse School of Economics, introduces a novel approach to measuring economic complexity by analyzing software development patterns. Traditional methods focus on tangible goods and intellectual property, but software operates in a borderless digital space, making its production nearly invisible in standard economic models.
Sándor Juhász, a research fellow at Corvinus University, highlights the limitations of current approaches: "For the past 15 years, economists have measured economic complexity by tracking physical exports, patents, and research publications. These methods are remarkably effective at predicting growth and inequality, but they completely miss software—the invisible engine of modern economies."
Johannes Wachs, Associate Professor at Corvinus and Director of the Center for Collective Learning, compares software to "digital dark matter," an essential yet undetected component of economic activity. He explains that unlike physical goods, software crosses borders instantly through version-control commands such as git push, cloud platforms, and package repositories, leaving little trace in traditional trade or patent databases.
From programming languages to economic indicators
To quantify this digital complexity, the research team leveraged the GitHub Innovation Graph, a dataset tracking developer activity across 163 economies and 150 programming languages from 2020 to 2023. However, individual languages don’t tell the full story—most software projects combine multiple languages into coherent technology stacks.
To address this, Juhász and his colleagues used the GitHub GraphQL API to identify which languages frequently co-occur within the same repositories. They then applied hierarchical clustering to group these languages into 59 distinct "software bundles," each representing a common technology stack. For example:
- A web application might bundle HTML, CSS, and JavaScript.
- A machine learning project typically combines Python with Jupyter Notebook.
- Systems programming often pairs C with Assembly.
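The bundling step can be illustrated with a small, hypothetical sketch. It assumes a precomputed language-to-language similarity table and uses average-linkage agglomerative clustering that stops once a chosen number of clusters remains. The article does not say which linkage or stopping rule the authors used, so both are assumptions, as are the function name `cluster_languages` and the toy similarity values mentioned below.

```python
def cluster_languages(sim, langs, n_clusters):
    """Average-linkage agglomerative clustering of languages (illustrative sketch).

    sim: dict mapping (lang_a, lang_b) -> similarity (higher = more alike),
         defined for every ordered pair of languages in langs.
    Repeatedly merges the two most similar clusters until n_clusters remain.
    """
    clusters = [frozenset([lang]) for lang in langs]

    def linkage(c1, c2):
        # Average similarity over all cross-cluster language pairs
        return sum(sim[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        # Pick the pair of clusters with the highest average similarity
        i, j = max(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        # Drop the two merged clusters and append their union
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

With toy similarities of 0.8 between languages in the same stack from the list above and 0.1 across stacks, asking for three clusters returns {HTML, CSS, JavaScript}, {Python, Jupyter Notebook} and {C, Assembly}. This loop recomputes every pairwise linkage on each merge, which is fine for a handful of languages; a real pipeline over 150 languages would more likely use an optimized library routine.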
Jermain Kaminski, Assistant Professor at Maastricht University, explains the methodology: "We computed cosine similarity between languages based on their co-occurrence patterns, then normalized the data to prevent polyglot repositories—those using many languages—from skewing the results. This allowed us to map the software landscape into meaningful technology stacks."
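Kaminski's description can be sketched as follows, assuming each repository is represented by the set of languages it uses. The polyglot normalization here is a guess: each repository spreads a total weight of 1 evenly over its languages, so a ten-language repository contributes less to any single language pair than a two-language one. The paper's exact weighting and the name `language_similarity` are assumptions.

```python
import math
from collections import defaultdict


def language_similarity(repos):
    """Cosine similarity between languages based on repository co-occurrence.

    repos: list of sets of language names, one set per repository.
    Assumed (hypothetical) normalization: each repository gives weight
    1 / (number of its languages) to every language it contains, so that
    polyglot repositories do not dominate the similarity scores.
    """
    # Language -> {repository index: weight}; a sparse vector over repositories
    vectors = defaultdict(dict)
    for i, langs in enumerate(repos):
        if not langs:
            continue
        weight = 1.0 / len(langs)
        for lang in langs:
            vectors[lang][i] = weight

    norms = {lang: math.sqrt(sum(w * w for w in vec.values())) for lang, vec in vectors.items()}
    names = sorted(vectors)
    sim = {}
    for a in names:
        for b in names:
            # Only repositories containing both languages contribute to the dot product
            shared = vectors[a].keys() & vectors[b].keys()
            dot = sum(vectors[a][r] * vectors[b][r] for r in shared)
            sim[a, b] = dot / (norms[a] * norms[b])
    return sim
```

The result is keyed by language pairs, which is the shape a clustering step over language similarities would consume.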
The team then computed the Economic Complexity Index (ECI) over these software bundles. The calculation starts from how disproportionately each country specializes in each stack relative to global averages: countries specialized in many stacks, especially stacks that few other countries specialize in, score higher, while those specializing only in ubiquitous bundles rank lower. The results showed that software ECI adds predictive power for GDP per capita and income inequality, even after accounting for traditional economic indicators.
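For readers curious about the mechanics, here is a minimal sketch of the standard ECI recipe from the economic complexity literature, assuming the paper follows it: compute revealed comparative advantage (RCA) from country-by-bundle activity, mark a country as specialized where RCA ≥ 1, and take the second eigenvector of the country-to-country matrix built from diversity (how many bundles a country specializes in) and ubiquity (how many countries specialize in a bundle). The threshold, the function names `rca` and `eci`, and the toy input are illustrative assumptions, not the authors' code.

```python
import math
import random
import statistics


def rca(x):
    """Balassa revealed comparative advantage for a country-by-bundle count matrix."""
    total = sum(sum(row) for row in x)
    col_totals = [sum(col) for col in zip(*x)]
    result = []
    for row in x:
        row_total = sum(row)
        result.append([
            (v / row_total) / (col_totals[p] / total) if row_total and col_totals[p] else 0.0
            for p, v in enumerate(row)
        ])
    return result


def eci(m, iters=1000):
    """ECI from a binary specialization matrix m (rows: countries, columns: bundles).

    Assumes every country and every bundle has at least one 1 in m.
    """
    n, k = len(m), len(m[0])
    div = [sum(row) for row in m]          # diversity of each country
    ubi = [sum(col) for col in zip(*m)]    # ubiquity of each bundle
    # Symmetric S[c][d] = sum_p m[c][p] * m[d][p] / ubiquity_p, so that M~ = D^-1 S
    s = [[sum(m[c][p] * m[d][p] / ubi[p] for p in range(k)) for d in range(n)] for c in range(n)]
    # A = D^-1/2 S D^-1/2 shares M~'s eigenvalues and is symmetric positive semidefinite
    a = [[s[c][d] / math.sqrt(div[c] * div[d]) for d in range(n)] for c in range(n)]
    # A's leading eigenvector (eigenvalue 1) is proportional to sqrt(diversity)
    norm1 = math.sqrt(sum(div))
    u1 = [math.sqrt(dc) / norm1 for dc in div]
    rng = random.Random(0)
    x = [rng.random() for _ in range(n)]
    for _ in range(iters):
        # Power iteration restricted to the subspace orthogonal to u1
        proj = sum(xi * ui for xi, ui in zip(x, u1))
        x = [xi - proj * ui for xi, ui in zip(x, u1)]
        x = [sum(a[c][d] * x[d] for d in range(n)) for c in range(n)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    # Map back to an eigenvector of M~, then standardize
    v = [xi / math.sqrt(dc) for xi, dc in zip(x, div)]
    mu, sd = statistics.mean(v), statistics.pstdev(v)
    z = [(vi - mu) / sd for vi in v]
    # Sign convention: ECI correlates positively with diversity
    mean_div = statistics.mean(div)
    if sum(zi * (dc - mean_div) for zi, dc in zip(z, div)) < 0:
        z = [-zi for zi in z]
    return z
```

In use, the binary matrix would come from `[[1 if r >= 1 else 0 for r in row] for row in rca(counts)]`. For a perfectly nested toy matrix, where country A specializes in all four bundles, B in three, C in two and D only in the most ubiquitous one, `eci` returns scores falling from A to D (about 1.34, 0.45, -0.45 and -1.34).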
The principle of relatedness in software ecosystems
Beyond measuring complexity, the study explored how nations diversify their software capabilities. Just as countries in the physical economy expand into related product categories, the research found that nations tend to move into software stacks that align with their existing expertise.
Johannes Wachs elaborates: "We discovered that countries don’t randomly jump between software specializations. Instead, they follow a principle of relatedness, much like in traditional economies. If a country excels in web development, it’s more likely to expand into adjacent areas like backend frameworks or cloud infrastructure."
This finding suggests that software-driven economic growth isn’t random but follows predictable patterns, offering governments and policymakers a roadmap for strategic investment in digital infrastructure and education.
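Relatedness is commonly operationalized in this literature with a proximity score between specializations and a "density" score per country. The sketch below assumes that approach: the proximity between two bundles is the smaller of the two conditional probabilities that a country specialized in one is also specialized in the other, and density is the proximity-weighted share of a bundle's neighbours that a country already holds. The article does not state the authors' exact measure, so the formulas and the names `proximity` and `density` are assumptions.

```python
def proximity(m):
    """Proximity between bundles from a binary country-by-bundle matrix m.

    phi[p][q] = min(P(q | p), P(p | q)) = co-specializations / max(ubiquity_p, ubiquity_q)
    """
    k = len(m[0])
    ubi = [sum(col) for col in zip(*m)]
    phi = [[0.0] * k for _ in range(k)]
    for p in range(k):
        for q in range(k):
            both = sum(row[p] * row[q] for row in m)
            if ubi[p] and ubi[q]:
                phi[p][q] = both / max(ubi[p], ubi[q])
    return phi


def density(m, phi):
    """Relatedness density of each bundle for each country (diagonal excluded).

    density[c][p] = sum_q m[c][q] * phi[p][q] / sum_q phi[p][q], over q != p.
    A bundle with no related neighbours gets density 0.
    """
    out = []
    for row in m:
        k = len(row)
        out.append([
            sum(row[q] * phi[p][q] for q in range(k) if q != p)
            / (sum(phi[p][q] for q in range(k) if q != p) or 1.0)
            for p in range(k)
        ])
    return out
```

With a toy matrix over web, backend, cloud and machine-learning bundles, where countries doing web and backend work usually also do cloud work, a web-and-backend country gets a higher density for cloud than for the unrelated machine-learning bundle, mirroring Wachs's example.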
César Hidalgo, Professor at the Toulouse School of Economics and Director of the Center for Collective Learning, provides a simple analogy: "Think of countries like kitchens. Some kitchens can cook anything because they have diverse ingredients and tools. Similarly, countries with broad software capabilities can adapt to new technologies faster and innovate across multiple domains."
What’s next for digital economic measurement?
The implications of this research extend far beyond academic curiosity. By demonstrating that open-source software activity can serve as a proxy for economic complexity, the study opens new avenues for policymakers, investors, and researchers to assess national competitiveness in the digital age.
For instance, governments could use software ECI to identify underserved technology niches, prioritize STEM education initiatives, or design targeted incentives for emerging tech sectors. Investors might leverage these insights to spot early-stage markets with untapped potential.
As the global economy becomes increasingly software-driven, tools like the GitHub Innovation Graph are poised to play a crucial role in shaping economic policy and strategy. The next frontier may involve integrating real-time developer activity data into broader economic forecasting models, ensuring that nations don’t just measure their growth but actively cultivate the digital foundations of their future prosperity.