Domain setup scans your repo and builds initial configuration. Task execution decomposes intent into atomic steps with verification gates. Enforcement operates at the tool-call boundary. Every action is audited, and state persists across sessions.
The Problem
Mid-task, agents forget what they were doing. They skip steps they read minutes ago. They rationalize shortcuts as "more efficient." The model has already proven it will read the instruction, agree, and then violate it. Telling it harder doesn't work. Asking nicely doesn't work. The fix has to be structural.
The Loop
session-start, anchor, work, complete. The same cycle, every session. Anchors fire every N actions. Failures trigger a mandatory learn step. State survives restarts. Context survives compression. The loop itself is hook-enforced. Specific rules within it land in one of two tiers: hooks where they can, protocol re-read at every anchor where they can't.
session-start -> anchor -> WORK -----------------> complete
                   ^        |                       ^
                   `-- every 10 actions <-----------'
                            |
                        failure? -> fix -> learn (MANDATORY)
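The cycle above can be sketched as a small state machine. This is a minimal illustration, not the kernel's code: `SessionState`, `anchor`, `learn`, `work`, and the `ANCHOR_INTERVAL` constant are hypothetical names, and the anchor step is stubbed where the real one would re-read the protocol and lessons.

```python
from dataclasses import dataclass, field

ANCHOR_INTERVAL = 10  # assumption: the "every N actions" value shown in the diagram


@dataclass
class SessionState:
    actions_since_anchor: int = 0
    needs_learn: bool = False
    log: list[str] = field(default_factory=list)


def anchor(state: SessionState) -> None:
    """Stub for re-reading protocol and lessons; resets the action counter."""
    state.log.append("anchor")
    state.actions_since_anchor = 0


def learn(state: SessionState, lesson: str) -> None:
    """Record a lesson and clear the flag that blocks further work."""
    state.log.append(f"learn: {lesson}")
    state.needs_learn = False


def work(state: SessionState, action: str, succeeded: bool = True) -> None:
    """Perform one action, applying the learn and anchor rules first."""
    if state.needs_learn:
        raise PermissionError("learn step is mandatory before further work")
    if state.actions_since_anchor >= ANCHOR_INTERVAL:
        anchor(state)
    state.log.append(f"work: {action}")
    state.actions_since_anchor += 1
    if not succeeded:
        # A failure does not stop the session, but it blocks the next action.
        state.needs_learn = True
```

The point of the sketch is that neither rule depends on the caller remembering it: the counter and the flag are checked inside `work` itself.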
Domain Setup
The domain-setup command scans your repository structure, discovers file patterns, naming conventions, and existing code organization, then generates an initial protocol configuration. This configuration becomes the baseline for governance. Optional domain specs layer industry-specific knowledge on top of the discovered patterns.
Discovers file structure, naming patterns, testing conventions, build patterns, and existing code organization across the repository.
Identifies conventions in use: test frameworks, dependency patterns, file naming, module structure, and architectural patterns already present in the codebase.
Generates an initial protocol that encodes discovered patterns as rules. The protocol becomes the baseline that governance enforces during execution.
Optional domain specs are drop-in skill folders that add industry-specific knowledge. They merge with the discovered patterns to create domain-aware governance without manual configuration.
Task Execution
The execute-pipeline command transforms written intent into executable tasks. The task-builder decomposes intent into atomic steps, each with a single action and measurable acceptance criteria. Each step includes a gate contract that defines preconditions and success criteria. The run-task.sh executor cycles through tasks with retry logic, state persistence, and failure recording.
Captures intent as a backlog item. Records the user's raw input and the resulting specification for auditability and change tracking.
Breaks the goal into atomic tasks. Each task has one primary action, measurable acceptance criteria, and dependencies. Task count is tracked to size execution windows appropriately.
Defines BUILD gates (verification tests) and TEST gates (integration validation). Each gate must PASS before the task is marked complete. Failed gates trigger re-execution with lesson recording.
run-task.sh cycles through tasks sequentially. Each task runs as a fresh session with full context recovery. Failed tasks retry up to N times, then skip with audit trail preservation.
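The task, gate, and retry behavior described above could be modeled roughly as follows. This is a sketch in Python rather than the actual run-task.sh; the `Task` fields, the `DONE`/`SKIPPED` status strings, and the default of three retries are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    action: Callable[[], None]        # the single primary action
    gates: list[Callable[[], bool]]   # every gate must return True to pass
    status: str = "PENDING"
    attempts: int = 0
    failures: list[str] = field(default_factory=list)


def run_task(task: Task, max_retries: int = 3) -> str:
    """Run one task until every gate passes or retries are exhausted."""
    while task.attempts < max_retries:
        task.attempts += 1
        try:
            task.action()
            if all(gate() for gate in task.gates):
                task.status = "DONE"
                return task.status
            task.failures.append(f"attempt {task.attempts}: gate failed")
        except Exception as exc:
            task.failures.append(f"attempt {task.attempts}: {exc}")
    # Retries exhausted: skip, but keep the recorded failures for the audit trail.
    task.status = "SKIPPED"
    return task.status


def run_queue(tasks: list[Task], max_retries: int = 3) -> dict[str, str]:
    """Run tasks in order and report the final status of each."""
    return {task.name: run_task(task, max_retries) for task in tasks}
```

A task is only marked `DONE` when its gates pass, never because the action merely ran without raising.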
Autonomous Cycling
The autonomous-cycle command executes task queues sequentially while maintaining full governance enforcement. State persists between sessions: completed tasks are tracked, failed tasks are logged with failure details, and skipped tasks include audit records. The agent picks up exactly where it left off: what task failed, why, and what was attempted.
Session state, workflow state, and intent chains survive process restarts. The next session reads this state and resumes mid-task without re-explaining context or repeating completed work.
Every task is tracked: completed (DONE), failed (with retry count and error details), or skipped (with skip reason and timestamp). This tracking enables deterministic resumption and accurate progress reporting.
Failed tasks are retried up to a configured limit. After N consecutive failures, the task is skipped and execution continues. Failure details are recorded in the audit trail for later analysis.
Governance enforcement continues throughout cycling. Anchors fire every N actions. Learn cycles trigger on test failures. State validation gates block writes until preconditions are met. The cycle itself is hook-enforced.
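Resumption from persisted state might look something like this sketch. The JSON layout, the helper names, and the limit of three consecutive failures are assumptions for illustration; the source does not show the real workflow state format.

```python
import json
from pathlib import Path
from typing import Optional


def load_state(path: Path, task_names: list[str]) -> dict:
    """Load workflow state, or start fresh if no state file exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"tasks": task_names, "completed": [], "skipped": [], "failed": {}}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, to reduce the risk of a half-written file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)


def next_task(state: dict) -> Optional[str]:
    """Return the first task that is neither completed nor skipped."""
    finished = set(state["completed"]) | set(state["skipped"])
    for name in state["tasks"]:
        if name not in finished:
            return name
    return None


def record(state: dict, name: str, ok: bool, error: str = "", max_failures: int = 3) -> None:
    """Record one attempt; skip the task once it has failed max_failures times."""
    if ok:
        state["completed"].append(name)
        state["failed"].pop(name, None)
        return
    entry = state["failed"].setdefault(name, {"retries": 0, "errors": []})
    entry["retries"] += 1
    entry["errors"].append(error)
    if entry["retries"] >= max_failures:
        state["skipped"].append(name)
```

Because `next_task` is computed from the file alone, a fresh process that calls `load_state` resumes at the same task the previous process stopped on, with the earlier error messages still attached.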
SDD Architecture
Each component addresses a specific failure mode. Together they reduce the surface area where the agent must be trusted to follow text it has already proven it will violate.
Hooks fire at the tool-call boundary. They block writes when state preconditions fail and append every action to an audit log. For rules wired to a hook, the hook isn't reading instructions: it's reading bytes. Rules that can't be reduced to a check live in the protocol tier instead.
Commands are the kernel operations the agent must invoke: anchor, learn, complete, backlog, execute-pipeline, autonomous-cycle. They are skill-based, so the agent calls them through a Skill interface. Each command enforces template structure and consistency.
Skills are multi-step capabilities: autonomous cycling, task building, domain setup, production testing, audit workflows. Each skill is an indexed file structure: entry point plus step-by-step references the agent reads at execution time.
State covers session state, workflow state, intent chains, anchor logs, and lesson files. It is append-only where it matters (audit trail) and mutable where it doesn't (session counters). It is hash-signed for cross-machine verifiability and integrity checking.
Design Decision
Hooks alone are too rigid. Protocol alone is too soft. Both are necessary. The lesson file is exactly the seam between them: every entry is either already a hook or a candidate to become one.
Hooks fire at the tool-call boundary. If state says needs_learn: true, every write is blocked until /kernel/learn runs. No talking past it. No prompt engineering past it. No editing state files to bypass it: that path is itself a hook-blocked write.
Every 10 actions, the anchor command forces the agent to re-read the protocol, re-read the lessons, and apply specific rules to its next action with concrete verbs. Protocol drift is a guaranteed failure mode in long sessions. The anchor reverses it on a schedule.
Hooks alone would require encoding every conceivable rule in code: impossible for a self-improving system. Protocol alone would rely on the agent following text it has already proven it will violate. Together: hooks for the rules you can enforce mechanically, protocol for the rules you can't.
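The hook tier can be pictured as a check that looks only at state and the pending tool call, never at the agent's reasoning. The sketch below is an assumption-laden illustration: the tool names in `WRITE_TOOLS`, the function names, and the audit record fields are hypothetical and are not taken from the kernel or from any specific agent runtime.

```python
import json
import time

WRITE_TOOLS = {"Write", "Edit"}  # hypothetical names for write-capable tools


def check_tool_call(state: dict, tool: str, target: str) -> tuple[bool, str]:
    """Decide from state alone whether a tool call may proceed."""
    if tool not in WRITE_TOOLS:
        return True, "not a write"
    if state.get("needs_learn"):
        # Applies to every write target, including the state files themselves.
        return False, f"needs_learn is set; run /kernel/learn before writing {target}"
    return True, "ok"


def audited_call(state: dict, audit: list[str], tool: str, target: str) -> bool:
    """Apply the check and append the decision to the audit log either way."""
    allowed, reason = check_tool_call(state, tool, target)
    audit.append(json.dumps({
        "ts": time.time(),
        "tool": tool,
        "target": target,
        "allowed": allowed,
        "reason": reason,
    }))
    return allowed
```

Nothing in `check_tool_call` takes free text from the agent as input, which is what makes this tier resistant to persuasion; rules that cannot be expressed as a check like this stay in the protocol and are re-read at each anchor.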
Specialized Skills
Beyond core governance, specialized skills extend the kernel for domain-specific and operational use cases. Each skill encodes patterns, workflows, and conventions for a specific capability. All skills run under kernel governance: the enforcement loop applies regardless of what the skill does.
Assembles a test environment, copies the codebase, runs L1/L2/L3 tests (existence → execution → correctness), collects results, and cleans up infrastructure. Runs as a spawned sub-process under governance.
Scans kernel infrastructure (commands, skills, hooks, protocol, state, testing) for gaps, generates fix tasks, and cycles through execution. Produces a gap report and auto-remediation tasks.
Clones a website structure via Playwright, reproduces HTML/CSS/JS locally, generates a static archive. Useful for testing and local development of web-based tools.
Orchestrates multiple independent agents across separate task folders. Each agent maintains its own state and session. The parent agent monitors, aggregates results, and handles inter-agent coordination.
Cross-Session Persistence
Multiple state files work together to preserve execution context across process restarts. Session state holds anchor status and action counts. Workflow state tracks completed tasks and current task. Intent chains hash user input and generated specifications. Anchor logs record violations. Lesson files document learned rules. Together, they enable deterministic resumption from exactly where execution was interrupted.
Session state holds the current session ID, anchor timestamp, actions since last anchor, protocol hash, and pending anchor tokens. It tracks whether the agent is currently anchored and ready to work.
Workflow state holds the current task, completed task list, skipped task list, total task count, and current action counts. It lets cycling resume at the exact next task without repeating completed work.
Intent chains store the SHA-256 hash of the raw user input and of the resulting specification file. They are append-only and immutable, so they prove what the user requested and what was produced, with cryptographic integrity.
Anchor logs record violations and actions reviewed. Lesson files document failures and what changed. All append-only, time-stamped, and searchable. Enable recurrence detection and escalation triggers.
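One way an append-only, verifiable intent chain could work is sketched below. The `raw_input_hash` and `backlog_hash_after` field names follow the record shown on this page; the `prev_hash` and `entry_hash` fields and both function names are assumptions added to illustrate chaining and are not confirmed by the source.

```python
import hashlib
import json


def sha256_hex(text: str) -> str:
    """SHA-256 of a string, as a lowercase hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], raw_input: str, spec_text: str, timestamp: str) -> dict:
    """Append one record; prev_hash links it to the record before it."""
    entry = {
        "rev": len(chain) + 1,
        "timestamp": timestamp,
        "raw_input_hash": sha256_hex(raw_input),
        "backlog_hash_after": sha256_hex(spec_text),
        "prev_hash": chain[-1]["entry_hash"] if chain else "",  # assumed field
    }
    # Hash the record before the entry_hash key is added to it.
    entry["entry_hash"] = sha256_hex(json.dumps(entry, sort_keys=True))
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every entry hash and check each link to its predecessor."""
    prev = ""
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev:
            return False
        if entry["entry_hash"] != sha256_hex(json.dumps(body, sort_keys=True)):
            return False
        prev = entry["entry_hash"]
    return True
```

With chaining of this kind, changing any earlier record invalidates its own hash and every link after it, so a third party holding only the chain can detect tampering without trusting the machine that wrote it.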
Results
Numbers from the workspace this page was built in. Receipts in the linked repos.
Produced Harnesses
Each harness below demonstrates the kernel applied to a specific domain. All use the same governance layer and execution model with domain-specific patterns and conventions encoded in optional domain specs.
Stack: TypeScript, Playwright Test
What: Web UI and API test automation harness with kernel-enforced 5-layer architecture
View on GitHub (6 stars) ↗

Stack: Python, Selenium, pytest
What: AI-powered Selenium test automation harness
View on GitHub (4 stars) ↗

Stack: Docker, pytest, Python
What: AI-powered Docker container validation harness
View on GitHub ↗

Stack: Python, SSH, compliance frameworks
What: SSH image testing platform with compliance gates
View on GitHub ↗

Stack: Python, DeepEval
What: LLM output evaluation, 5-layer architecture, 29+ metrics
View on GitHub ↗

Stack: Claude Code, kernel
What: AI-powered app builder for non-technical founders
View on GitHub ↗

Receipts
An anchor catches a self-detected violation. The intent chain hashes the user's words. A recorded lesson flags a candidate for mechanical enforcement after recurrence, if the rule can be hooked at all. All three are append-only artifacts from this workspace.
{
  "anchor_timestamp": "2026-06-08T06:45:00Z",
  "actions_count": 48,
  "violations_found": 1,
  "violation_details": [
    "Backlog 129 created via direct intent.py call, bypassing /kernel/backlog skill."
  ],
  "needs_learn_set": true
}

Every anchor archives the actions reviewed and any violations found. When backlog 129 was created by bypassing the /kernel/backlog skill, the next anchor flagged it and set needs_learn: true. The agent could not write again until the lesson was recorded.
{
  "rev": 1,
  "timestamp": "2026-06-08T07:24:57+00:00",
  "raw_input_hash": "0c43483511fb6ee241b34fc58c294751...",
  "backlog_hash_after": "6f9ebac571cb3369ee518f1d04432458..."
}

Every backlog item is hash-signed. raw_input_hash is the SHA-256 of the user's literal words at /kernel/backlog invocation time. backlog_hash_after is the SHA-256 of the resulting file. Both go into an append-only chain a third party can verify.
## 2026-06-08 Bypassed /kernel/backlog
- Root cause: batched "yes" trigger, agent optimized middle item for speed
- Recurrence: 2nd time recorded (first: 2026-04-25 with backlog 047)
- Escalation: mechanical enforcement via PreToolUse hook now overdue

When the /kernel/backlog bypass happened a second time, the lesson was updated with a recurrence count and an escalation note: mechanical enforcement via PreToolUse hook is now overdue. Two failures of the same rule means human discipline lost.
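Recurrence detection over lesson entries could be as simple as grouping by rule and counting. In this sketch the `Lesson` fields, the rule identifiers, and the threshold of two are assumptions; the threshold is chosen only because the lesson above escalates on the second occurrence.

```python
from collections import defaultdict
from dataclasses import dataclass

ESCALATION_THRESHOLD = 2  # assumption: escalate on the second recorded failure


@dataclass
class Lesson:
    date: str        # ISO date, so string order is chronological order
    rule: str        # short identifier for the rule that was broken
    root_cause: str


def recurrences(lessons: list[Lesson]) -> dict[str, list[str]]:
    """Group lesson dates by rule, in chronological order."""
    by_rule = defaultdict(list)
    for lesson in sorted(lessons, key=lambda item: item.date):
        by_rule[lesson.rule].append(lesson.date)
    return dict(by_rule)


def escalation_candidates(lessons: list[Lesson]) -> list[str]:
    """Rules broken often enough to be candidates for a hook."""
    return sorted(
        rule
        for rule, dates in recurrences(lessons).items()
        if len(dates) >= ESCALATION_THRESHOLD
    )
```

Whether a candidate actually becomes a hook still depends on whether the rule can be reduced to a mechanical check; the count only says that the protocol tier has failed to hold it.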
Full attestation chain at /attestation.html.
Who This Is For
You've felt the drift. Agents that read the rule then break it. You want governance, not vibes. The kernel is the execution layer under the agents.
You deploy agents into customer environments. You need the runtime layer to be inspectable, governable, and auditable. This is that layer.
You think about the seam between behavioral instructions and mechanical enforcement. Open-source kernel, real audit trail, recurrence detection, lesson loop. All inspectable.
Or email alain@isagawa.co directly.