Most AI systems today behave like forgetful assistants: they discard every conversation the moment it ends. This stateless design forces developers to resend the entire history with each query, which wastes tokens and eventually runs into context-window limits. The solution lies in stateful agents: systems that persist knowledge, refine skills, and evolve alongside users.
Below, we explore the architecture behind Hermes Agent, a stateful AI framework that mirrors how professionals retain expertise. From core identity files to dynamic skill libraries, we’ll break down the engineering patterns that enable true autonomy—and provide a Python implementation to build your own self-improving agent.
The Stateless Trap: Why Most AI Tools Fail to Adapt
Stateless AI systems function like vending machines—they process inputs in isolation, with no recollection of past interactions or user preferences. Every request triggers a fresh context load, forcing developers to reconstruct conversation histories repeatedly.
Consider this Python function, a textbook example of statelessness:
    import datetime

    def parse_date(date_string: str) -> datetime.datetime:
        return datetime.datetime.strptime(date_string, "%Y-%m-%d")

This routine has no memory of previous inputs or user habits. It doesn’t adapt to regional date formats or optimize based on repeated use. When developers attempt to simulate continuity by stitching chat histories together, three critical flaws emerge:
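- Context overload: Because the full history is resent on every request, the prompt grows with each turn and the cumulative token count grows roughly quadratically with conversation length, straining budgets and model performance.
- Memory erosion: Context windows are finite, so older interactions that may contain crucial context are eventually truncated.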
- Knowledge fragmentation: Lessons learned in one session vanish entirely in the next, forcing repeated discovery of solutions.
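As a rough illustration of the first flaw, the sketch below adds up the prompt tokens sent when the whole history is resent on every request. The figure of 100 tokens per turn is an arbitrary assumption chosen for easy arithmetic, not a measurement, and `resend_cost` is a hypothetical helper name.

```python
def resend_cost(turn_tokens: list[int]) -> tuple[list[int], int]:
    """Return the prompt size of each request and the total tokens sent,
    assuming the full history is resent with every request."""
    per_request = []
    history = 0
    for tokens in turn_tokens:
        history += tokens            # the new turn joins the history
        per_request.append(history)  # the whole history is sent again
    return per_request, sum(per_request)


per_request, total = resend_cost([100] * 10)
print(per_request[-1])  # 1000 tokens in the tenth request
print(total)            # 5500 tokens sent across all ten requests
```

Ten turns of 100 tokens hold only 1,000 tokens of actual conversation, yet 5,500 tokens are sent in total, and the gap widens with every additional turn.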
Stateful agents eliminate these issues by treating memory as a persistent, evolving system—akin to a skilled artisan carrying forward lessons from past projects.
Designing Stateful Intelligence: Memory, Identity, and Skills
The Hermes Agent framework structures statefulness into three interconnected layers that mirror human cognitive organization:
1. Soul: The Agent’s Core Identity
The SOUL.md file acts as the agent’s foundational constitution, defining its communication style, ethical boundaries, and operational principles. Rather than dynamic data, this document serves as immutable guidance injected into every system prompt.
For example, a developer-focused agent might include:
- Preferred coding standards (e.g., PEP 8 compliance)
- Response tone (concise technical explanations)
- Security protocols (no direct file modifications)
A helper class reads this markdown file and embeds it into the LLM’s system prompt, ensuring consistent behavior regardless of task complexity.
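One possible shape for such a helper is sketched below. The class name `Soul` and its methods are assumptions made for illustration, not the framework's actual API. Only the `~/.hermes/SOUL.md` default path comes from the implementation later in this article.

```python
from pathlib import Path


class Soul:
    """Loads an identity file and prepends it to every system prompt (hypothetical helper)."""

    def __init__(self, path: str = "~/.hermes/SOUL.md"):
        self.path = Path(path).expanduser()

    def text(self) -> str:
        # Fall back to an empty identity if the file does not exist yet.
        if not self.path.exists():
            return ""
        return self.path.read_text(encoding="utf-8").strip()

    def build_system_prompt(self, task_context: str = "") -> str:
        # The identity always comes first, followed by any task-specific context.
        parts = [self.text(), task_context.strip()]
        return "\n\n".join(part for part in parts if part)
```

Because the file is re-read on every call, edits to SOUL.md take effect on the next request without restarting the agent. Caching the text would be a reasonable alternative if the file rarely changes.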
2. Memory: Episodic and Semantic Knowledge
Memory in stateful agents splits into two curated stores:
- USER.md: Static user attributes like programming languages, OS preferences, or working hours. This file grows only with deliberate updates.
- MEMORY.md: Dynamic episodic knowledge such as database ports, recurring bugs, or user-specific workflows. This store evolves organically during tasks.
A MemoryStore class manages these files, enabling the agent to:
- Retrieve context for new queries (e.g., "Remember the user prefers pytest over unittest")
- Append new insights (e.g., "Fixed the Docker network conflict by restarting the container")
- Prune irrelevant data automatically
Semantic search tools enhance retrieval by mapping natural language queries to stored memories, avoiding the need to sift through raw transcripts.
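The three operations can be sketched in a few lines. This is a simplified stand-in rather than the framework's implementation: it assumes MEMORY.md lives at `~/.hermes/MEMORY.md` and stores one memory per bullet, it prunes by simply keeping the newest `max_entries` items, and it ranks memories by keyword overlap where a real system would use embeddings. All method names here are hypothetical.

```python
import re
from pathlib import Path


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class MemoryStore:
    """Keeps one memory per markdown bullet in a single file (illustrative sketch)."""

    def __init__(self, path: str = "~/.hermes/MEMORY.md", max_entries: int = 200):
        self.path = Path(path).expanduser()
        self.max_entries = max_entries  # assumed to be at least 1

    def entries(self) -> list[str]:
        if not self.path.exists():
            return []
        lines = self.path.read_text(encoding="utf-8").splitlines()
        return [line[2:].strip() for line in lines if line.startswith("- ")]

    def append(self, insight: str) -> None:
        items = self.entries()
        text = " ".join(insight.split())  # keep each memory on one line
        if text and text not in items:
            items.append(text)
        # Naive pruning: keep only the newest max_entries memories.
        items = items[-self.max_entries:]
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text("".join(f"- {item}\n" for item in items), encoding="utf-8")

    def search(self, query: str, top_k: int = 3) -> list[str]:
        # Keyword overlap stands in for semantic similarity here.
        query_words = _words(query)
        scored = []
        for item in self.entries():
            overlap = len(query_words & _words(item))
            if overlap:
                scored.append((overlap, item))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:top_k]]
```

Swapping the body of `search` for an embedding-based ranking would leave the rest of the class unchanged, which is the main reason to hide retrieval behind a single method.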
3. Skills: Reusable Procedural Knowledge
Skills represent the agent’s "how-to" knowledge—packaged toolkits for recurring tasks. In Hermes, each skill is a self-contained directory within ~/.hermes/skills/, containing:
- Configuration files defining trigger phrases (e.g., "deploy to staging")
- Python modules with reusable functions (e.g., database_backup())
- Documentation explaining usage and dependencies
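The article does not specify the configuration format, so the sketch below assumes each skill directory holds a config.json with a `"triggers"` list of phrases. Under that assumption, matching a user message to a skill could look like this (`load_triggers` and `match_skill` are hypothetical names):

```python
import json
from pathlib import Path
from typing import Optional


def load_triggers(skills_dir: str = "~/.hermes/skills") -> dict[str, list[str]]:
    """Map each skill name to the trigger phrases listed in its config.json.

    Assumes a hypothetical config layout: {"triggers": ["deploy to staging", ...]}.
    """
    triggers: dict[str, list[str]] = {}
    root = Path(skills_dir).expanduser()
    if not root.is_dir():
        return triggers
    for config_path in sorted(root.glob("*/config.json")):
        config = json.loads(config_path.read_text(encoding="utf-8"))
        phrases = [phrase.lower() for phrase in config.get("triggers", [])]
        triggers[config_path.parent.name] = phrases
    return triggers


def match_skill(message: str, triggers: dict[str, list[str]]) -> Optional[str]:
    """Return the first skill whose trigger phrase appears in the message."""
    text = message.lower()
    for skill, phrases in triggers.items():
        if any(phrase in text for phrase in phrases):
            return skill
    return None
```

Plain substring matching is brittle. In practice the LLM itself, or an embedding comparison, would likely decide which skill applies, with trigger phrases serving as hints.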
Skills operate like macros, allowing the agent to:
- Chain operations (e.g., "Backup database → Run migrations → Deploy app")
- Share expertise across user sessions
- Update toolkits independently of the agent’s core code
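Chaining can be sketched as a small runner that executes steps in order and stops at the first failure. The assumptions here are that every step is a callable taking a parameter dict and returning a dict with a `"status"` key, and that later steps may read what earlier ones returned. `run_chain` and the stand-in steps are hypothetical.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_chain(steps: list[tuple[str, Step]], params: dict) -> dict:
    """Run named steps in order, stopping at the first one that does not succeed."""
    completed = []
    for name, step in steps:
        result = step(params)
        if result.get("status") != "success":
            return {"status": "failed", "failed_step": name, "completed": completed}
        completed.append(name)
        # Let later steps see what earlier steps produced.
        params = {**params, **result}
    return {"status": "success", "completed": completed}


# Stand-in steps for illustration only; real skills would do actual work.
def backup(params: dict) -> dict:
    return {"status": "success", "backup_file": "backup.sql"}


def migrate(params: dict) -> dict:
    # Refuse to migrate unless a backup was produced earlier in the chain.
    if "backup_file" not in params:
        return {"status": "error"}
    return {"status": "success"}


print(run_chain([("backup", backup), ("migrate", migrate)], {}))
# {'status': 'success', 'completed': ['backup', 'migrate']}
```

Stopping at the first failure keeps a broken backup from being followed by a migration, which is usually the safer default for operational chains.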
For instance, a docker_debug skill might include:
    # ~/.hermes/skills/docker_debug/actions.py
    import subprocess

    def inspect_container(container_name: str) -> str:
        """Return the last 50 log lines of a container."""
        # Pass arguments as a list (no shell) so the name cannot inject commands.
        cmd = ["docker", "logs", "--tail", "50", container_name]
        result = subprocess.run(cmd, capture_output=True, text=True)
        # docker logs relays the container's stderr stream on stderr.
        return result.stdout + result.stderr

Building Your Own Stateful Agent: A Step-by-Step Guide
Implementing statefulness requires three core components:
1. Initialize the Memory System
Start with a MemoryManager class (playing the MemoryStore role described earlier) that creates the storage directories and loads an embedding model for semantic indexing:
    from pathlib import Path
    import json

    from sentence_transformers import SentenceTransformer

    class MemoryManager:
        def __init__(self, memory_dir: str = "~/.hermes/memory"):
            self.memory_dir = Path(memory_dir).expanduser()
            # Embedding model used for semantic indexing.
            self.model = SentenceTransformer('all-MiniLM-L6-v2')
            self._ensure_directories()

        def _ensure_directories(self):
            self.memory_dir.mkdir(parents=True, exist_ok=True)
            (self.memory_dir / "episodic").mkdir(exist_ok=True)
            (self.memory_dir / "semantic").mkdir(exist_ok=True)

2. Design the Skill Architecture
Structure skills as modular packages with a standardized interface:
    ~/.hermes/skills/
    ├── docker_debug/
    │   ├── __init__.py
    │   ├── actions.py
    │   └── config.json
    └── git_workflow/
        ├── __init__.py
        ├── actions.py
        └── README.md

Each skill’s __init__.py should expose a run() function that accepts a dict of JSON-serializable parameters:
    # docker_debug/__init__.py
    from .actions import inspect_container

    def run(params: dict) -> dict:
        container = params.get("container_name", "app")
        logs = inspect_container(container)
        return {"status": "success", "logs": logs}

3. Integrate with an LLM
Use the agent’s SOUL.md to guide the LLM’s responses while delegating stateful operations to your memory and skill systems:
    import json
    from pathlib import Path

    from openai import OpenAI

    class HermesAgent:
        def __init__(self, soul_path: str = "~/.hermes/SOUL.md"):
            # expanduser() resolves "~"; read_text() would fail on the raw path.
            self.soul = Path(soul_path).expanduser().read_text()
            self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
            self.memory = MemoryManager()
            # SkillRegistry is not defined in this article; supply your own.
            self.skills = SkillRegistry()

        def process_query(self, query: str) -> str:
            # Retrieve relevant context. load_user_preferences() and
            # search_episodic() are not shown above and must be added
            # to MemoryManager before this method will run.
            user_prefs = self.memory.load_user_preferences()
            episodic_memories = self.memory.search_episodic(query)

            # Inject context into system prompt
            system_prompt = f"""{self.soul}
    User preferences: {json.dumps(user_prefs)}
    Relevant memories: {episodic_memories}
    """

            # Generate response
            response = self.client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": query}
                ]
            )
            return response.choices[0].message.content

The Future of Stateful AI: Beyond Static Prompts
Stateful agents represent a paradigm shift from prompt engineering to persistent intelligence. By decoupling memory, identity, and skills into modular systems, developers can create AI assistants that:
- Reduce token waste by eliminating redundant context
- Maintain continuity across sessions and projects
- Evolve without requiring full retraining
- Operate with human-like adaptability
The Hermes Agent framework demonstrates that statefulness isn’t just an architectural choice—it’s the foundation for AI systems that can truly collaborate. As memory systems grow more sophisticated and skills become shareable across users, we’re moving closer to agents that don’t just answer questions but grow alongside their human counterparts.