In the rush to ship features, developers sometimes embed API keys or credentials directly into code to test functionality. While the intent may be temporary, the consequences linger indefinitely once those secrets enter version control. A 2023 report from GitGuardian identified more than 10 million exposed secrets in public GitHub repositories, and the volume continues to rise. The damage isn't just embarrassment: compromised keys can trigger massive cloud bills on AWS or grant attackers access to sensitive payment data on Stripe. Cleaning secrets from history requires rewriting commits, force-pushing, and rotating every exposed credential. Prevention, therefore, must happen before the push.
How pre-commit hooks stop secrets before they reach history
Git’s pre-commit hook runs automatically before each commit. If the hook exits with a non-zero status, the commit is blocked entirely—effectively stopping secrets from ever entering the repository. The solution involves scanning staged files for patterns that resemble API keys, tokens, or credentials. When a match is found, the developer is prompted to remove the secret or suppress the flag before proceeding.
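Git only runs a hook that sits at the expected path and is marked executable; a hook file without the executable bit is skipped. As a minimal sketch of installation, assuming a standard clone where .git is a directory and the hook script is kept somewhere in the project (the function name install_hook and any source path passed to it are hypothetical, not part of Git):

```shell
#!/bin/sh
# install_hook SRC [REPO_ROOT]
# Copies a hook script into .git/hooks/pre-commit and marks it executable.
# The function name is hypothetical and chosen for illustration only.
install_hook() {
    src="$1"
    repo_root="${2:-.}"
    hooks_dir="$repo_root/.git/hooks"

    if [ ! -f "$src" ]; then
        echo "install_hook: $src not found" >&2
        return 1
    fi
    # Assumption: .git is a directory (not a linked worktree or submodule).
    if [ ! -d "$repo_root/.git" ]; then
        echo "install_hook: $repo_root is not the top of a Git working tree" >&2
        return 1
    fi

    mkdir -p "$hooks_dir"
    cp "$src" "$hooks_dir/pre-commit"
    # Without the executable bit, Git ignores the hook.
    chmod +x "$hooks_dir/pre-commit"
}
```

Because .git/hooks is not part of the tracked tree, each clone needs to run a step like this once; the hook does not travel with the repository.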
To implement this, teams add a script to .git/hooks/pre-commit. The hook filters staged files, skips binaries, and uses git show ":$file" to read only the staged version—not the working copy—ensuring consistency and preventing false negatives from partial staging:
#!/bin/sh
# Pre-commit hook: block secrets from entering git history.
# check_patterns must be defined earlier in this file, before the loop runs.
set -e
STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM)
if [ -z "$STAGED_FILES" ]; then
    exit 0
fi
FOUND=0
# Note: word splitting here breaks on paths that contain spaces.
for file in $STAGED_FILES; do
    # Skip binary files: --numstat reports "-" line counts for binary content
    if git diff --cached --numstat -- "$file" | grep -q '^-'; then
        continue
    fi
    # Read only staged content
    CONTENT=$(git show ":$file" 2>/dev/null) || continue
    # check_patterns reads $CONTENT and returns non-zero when it finds a match
    if ! check_patterns "$file"; then
        FOUND=1
    fi
done
if [ "$FOUND" -eq 1 ]; then
    echo "COMMIT BLOCKED: potential secrets detected."
    echo "Use a suppression comment to bypass false positives."
    exit 1
fi
Matching patterns that reveal real-world secrets
The most effective hooks include a curated set of regular expressions derived from actual secret formats. These patterns target common providers like AWS, Stripe, and GitHub, as well as generic high-entropy strings that often represent API keys or tokens:
check_patterns() {
    # Scans $CONTENT (set by the caller); returns 1 if any pattern matched, 0 otherwise.
    file="$1"
    matched=0
    # AWS Access Key ID
    if printf '%s\n' "$CONTENT" | grep -nE 'AKIA[0-9A-Z]{16}' | filter_suppressed; then
        echo " [AWS] $file: AWS Access Key ID"
        matched=1
    fi
    # Stripe secret keys
    if printf '%s\n' "$CONTENT" | grep -nE 'sk_(live|test)_[0-9a-zA-Z]{24,}' | filter_suppressed; then
        echo " [STRIPE] $file: Stripe secret key"
        matched=1
    fi
    # Stripe restricted keys
    if printf '%s\n' "$CONTENT" | grep -nE 'rk_(live|test)_[0-9a-zA-Z]{24,}' | filter_suppressed; then
        echo " [STRIPE] $file: Stripe restricted key"
        matched=1
    fi
    # GitHub personal access tokens
    if printf '%s\n' "$CONTENT" | grep -nE 'ghp_[0-9a-zA-Z]{36}' | filter_suppressed; then
        echo " [GITHUB] $file: GitHub PAT"
        matched=1
    fi
    # Generic long quoted strings (a length check that stands in for entropy)
    if printf '%s\n' "$CONTENT" | grep -nE "['\"][0-9a-zA-Z]{32,}['\"]" | filter_suppressed; then
        echo " [ENTROPY] $file: high-entropy string (>=32 chars)"
        matched=1
    fi
    return $matched
}
The generic check for quoted alphanumeric strings of 32 or more characters is especially valuable because it catches tokens and keys that don’t match known vendor prefixes. Despite its label, it tests length and character set rather than measuring entropy, so it also flags legitimate values such as hex-encoded hashes or UUIDs written without hyphens, which is where suppression becomes essential.
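The generic pattern in check_patterns matches on length and character class only; it does not compute entropy. As a rough sketch of what an actual entropy score could look like, the helper below computes Shannon entropy in bits per character with awk. The name shannon_entropy is an assumption made for illustration and is not part of the hook described here.

```shell
#!/bin/sh
# shannon_entropy STRING
# Prints the Shannon entropy of STRING in bits per character, to two decimals.
# Hypothetical helper: only the first line of input is scored.
shannon_entropy() {
    printf '%s\n' "$1" | awk '
    NR == 1 {
        n = length($0)
        if (n == 0) exit
        # Count how often each character occurs.
        for (i = 1; i <= n; i++) count[substr($0, i, 1)]++
        # H = -sum(p * log2(p)) over the observed characters.
        for (c in count) {
            p = count[c] / n
            h -= p * log(p) / log(2)
        }
        printf "%.2f\n", h
    }'
}
```

A hex digest tops out at 4 bits per character because it draws on only 16 symbols, while a random mixed-case alphanumeric token of the same length usually scores higher. Any cutoff is therefore a tuning decision rather than a fixed rule.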
Suppressing false positives with intentional comments
No pattern scanner is perfect. Hashes, encoded public keys, or long IDs may trigger false alarms. To avoid disabling the hook entirely, teams use a suppression pragma: pii-ok. If a line contains this marker, the scanner skips it. This balance keeps the hook effective while minimizing interruptions:
// Test fixture containing a SHA-256 hash - no sensitive data
const EXPECTED_HASH = 'a1b2c3d4e5f6...'; // pii-ok
// This WILL be blocked (no suppression comment)
const STRIPE_KEY = 'sk_live_abc123...';
The rule is straightforward: if a value is confirmed non-sensitive, add pii-ok. If unsure, leave it unsuppressed and let the hook flag it. The minor friction of a false positive pales in comparison to the cost of a leaked secret.
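check_patterns pipes every match through a filter_suppressed helper that is not shown here. One plausible shape, offered as an assumption inferred from how it is called rather than as the original implementation, is a filter that drops any matched line carrying the pii-ok marker and succeeds only if something is left:

```shell
#!/bin/sh
# filter_suppressed: hypothetical helper, inferred from its use in check_patterns.
# Reads `grep -n` output on stdin, drops lines that contain the pii-ok marker,
# and prints the remaining lines.
# Exit status is 0 when at least one unsuppressed match remains, 1 otherwise,
# which is the status grep -v reports when it selects no lines.
filter_suppressed() {
    grep -v 'pii-ok'
}
```

In a pipeline such as `grep -nE PATTERN | filter_suppressed`, the shell reports the exit status of the last command, so the surrounding `if` branch runs only when an unsuppressed match survives. Since the marker works anywhere on a line, new pii-ok comments are worth a look during code review.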
Extending protection to .env and .htaccess files
Secrets aren’t limited to source code. Environment files and web server configurations frequently contain credentials. Teams should extend pre-commit checks to block .env files entirely and flag .htaccess entries that embed real values:
# Both checks run inside the hook's per-file loop.
# Block .env files
if echo "$file" | grep -qE '\.env$'; then
    echo " [ENV] $file: .env files must be .gitignored"
    FOUND=1
    continue
fi
# Flag SetEnv in .htaccess with real values (POSIX classes instead of \s and \S)
if echo "$file" | grep -qE '\.htaccess$'; then
    if printf '%s\n' "$CONTENT" | grep -nE 'SetEnv[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]+' | filter_suppressed; then
        echo " [HTACCESS] $file: SetEnv with real values"
        FOUND=1
    fi
fi
The convention is simple: commit sanitized templates like .env.example with placeholder values. The real .env file remains in .gitignore. The same principle applies to .htaccess: keep production credentials out of version control.
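One way to produce such a template is to strip the values from the real file while keeping the variable names. The sketch below assumes simple KEY=value lines, one per line; the function name make_env_example is hypothetical, and quoted multi-line values or lines with an export prefix would need extra handling.

```shell
#!/bin/sh
# make_env_example SRC DEST
# Writes a copy of SRC to DEST with every value removed.
# Keys, comments, and blank lines pass through unchanged, so comments
# should be checked by hand for anything sensitive.
make_env_example() {
    src="$1"
    dest="$2"
    sed -E 's/^([A-Za-z_][A-Za-z0-9_]*)=.*/\1=/' "$src" > "$dest"
}
```

Running `make_env_example .env .env.example` would leave a file that is safe to commit, provided .env itself is listed in .gitignore.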
Beyond regex: the role of AI in secret detection
Regular expressions excel at catching known patterns, but they struggle with obfuscated or novel secrets. Emerging AI-powered tools analyze context, entropy, and distribution to flag unusual strings that regex might miss. These systems can detect database connection strings, hardcoded JWTs, or custom token formats without relying solely on predefined patterns. Integrating such tools with pre-commit hooks provides a layered defense—combining immediate prevention with adaptive detection.
By implementing a pre-commit hook today, teams can shift from reactive damage control to proactive protection, ensuring secrets never make it into version control in the first place.