Isagawa | Kernel

The Problem

Agents drift. Forget. Skip steps. Prompts don't fix it.

Mid-task they forget what they were doing. They skip steps they read minutes ago. They rationalize shortcuts as "more efficient." The model has already proven it will read the instruction, agree, and then violate it. Telling it harder doesn't work. Asking nicely doesn't work. The fix has to be structural.

How do you stop an agent from skipping a checkpoint it just read about?
How do you make the system catch its own failures, even when the agent thinks it didn't fail?
How do you keep behavior consistent across hour-long sessions and context compression?

The Loop

Every session runs the same loop. No step can be skipped.

session-start, anchor, work, complete. The same cycle, every session. Anchors fire every N actions. Failures trigger a mandatory learn step. State survives restarts. Context survives compression. The loop itself is hook-enforced. Specific rules within it land in one of two tiers: hooks where they can, protocol re-read at every anchor where they can't.

session-start -> anchor -> WORK -----------------> complete
                   ^         |                       ^
                   `-- every 10 actions <-----------'
                             |
                   failure? -> fix -> learn (MANDATORY)
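
The diagram can be read as a small state machine. The sketch below is an illustration only, not the kernel's code: the `Session` class, its method names, and the interval constant are assumptions chosen to mirror the diagram, and the real kernel enforces these steps through hooks and persisted state rather than in-process checks.

```python
from dataclasses import dataclass, field

ANCHOR_INTERVAL = 10  # assumed value, taken from "every 10 actions" in the diagram


@dataclass
class Session:
    """Hypothetical in-memory session; the real kernel persists this state."""
    actions_since_anchor: int = 0
    needs_learn: bool = False
    log: list = field(default_factory=list)

    def anchor(self) -> None:
        # Re-reading the protocol and lessons is elided; only the reset is shown.
        self.log.append("anchor")
        self.actions_since_anchor = 0

    def learn(self) -> None:
        # Recording the lesson is elided; learning clears the blocking flag.
        self.log.append("learn")
        self.needs_learn = False

    def act(self, name: str, ok: bool = True) -> None:
        if self.needs_learn:
            raise RuntimeError("blocked: learn step is mandatory after a failure")
        if self.actions_since_anchor >= ANCHOR_INTERVAL:
            self.anchor()  # the anchor fires before the next action proceeds
        self.log.append(name)
        self.actions_since_anchor += 1
        if not ok:
            self.needs_learn = True


def run_session(actions) -> list:
    """Drive one session over (name, ok) pairs and return the ordered log."""
    session = Session()
    session.log.append("session-start")
    session.anchor()
    for name, ok in actions:
        session.act(name, ok)
        if session.needs_learn:
            session.log.append("fix")
            session.learn()
    session.log.append("complete")
    return session.log
```

Calling `act` again after a failed action, without `learn` in between, raises instead of proceeding. That is the in-process analogue of a blocked write.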

Domain Setup

Configuration from code structure.

The domain-setup command scans your repository structure, discovers file patterns, naming conventions, and existing code organization, then generates an initial protocol configuration. This configuration becomes the baseline for governance. Optional domain specs layer industry-specific knowledge on top of the discovered patterns.

1

Repository Scan

Discovers file structure, naming patterns, testing conventions, build patterns, and existing code organization across the repository.

2

Pattern Extraction

Identifies conventions in use: test frameworks, dependency patterns, file naming, module structure, and architectural patterns already present in the codebase.

3

Protocol Generation

Generates an initial protocol that encodes discovered patterns as rules. The protocol becomes the baseline that governance enforces during execution.

4

Domain Specs (Optional)

Drop-in skill folders that add industry-specific knowledge. Merge with discovered patterns to create domain-aware governance without requiring manual configuration.

Task Execution

From intent to verified completion.

The execute-pipeline command transforms written intent into executable tasks. The task-builder decomposes intent into atomic steps, each with a single action and measurable acceptance criteria. Each step includes a gate contract that defines preconditions and success criteria. The run-task.sh executor cycles through tasks with retry logic, state persistence, and failure recording.

1

/kernel/backlog

Captures intent as a backlog item. Records the user's raw input and the resulting specification for auditability and change tracking.

2

Task Decomposition

Breaks the goal into atomic tasks. Each task has one primary action, measurable acceptance criteria, and dependencies. Task count is tracked to size execution windows appropriately.

3

Gate Contract

Defines BUILD gates (verification tests) and TEST gates (integration validation). Each gate must PASS before the task is marked complete. Failed gates trigger re-execution with lesson recording.

4

Execution & Cycling

run-task.sh cycles through tasks sequentially. Each task runs as a fresh session with full context recovery. Failed tasks retry up to N times, then skip with audit trail preservation.
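
One way to picture steps 2 and 3 is as plain data plus a completion check. The sketch below is an assumption about shape, not the task-builder's actual schema: `Task`, `Gate`, and the helper functions are hypothetical names. It shows the two properties the text describes: a task cannot start before its dependencies, and it is not complete until every gate passes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Gate:
    name: str
    kind: str                  # "BUILD" or "TEST", as named in the text
    check: Callable[[], bool]  # returns True when the gate passes


@dataclass
class Task:
    task_id: str
    action: str                # exactly one primary action
    acceptance: str            # a measurable acceptance criterion
    depends_on: list = field(default_factory=list)
    gates: list = field(default_factory=list)


def evaluate_gates(task: Task) -> list:
    """Return the names of failed gates; an empty list means every gate passed."""
    return [gate.name for gate in task.gates if not gate.check()]


def runnable(task: Task, done: set) -> bool:
    """A task may start only once all of its dependencies are complete."""
    return all(dep in done for dep in task.depends_on)


def complete(task: Task, done: set) -> bool:
    """Mark the task complete only if it is runnable and all gates pass."""
    if not runnable(task, done) or evaluate_gates(task):
        return False
    done.add(task.task_id)
    return True
```

Keeping gates as callables means the same `complete` check works whether a gate runs a unit test, an integration suite, or a file-existence probe.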

Autonomous Cycling

Task queues across sessions.

The autonomous-cycle command executes task queues sequentially while maintaining full governance enforcement. State persists between sessions: completed tasks are tracked, failed tasks are logged with failure details, and skipped tasks include audit records. The agent picks up exactly where it left off: what task failed, why, and what was attempted.

State Persistence

Session state, workflow state, and intent chains survive process restarts. The next session reads this state and resumes mid-task without re-explaining context or repeating completed work.

Execution Tracking

Every task is tracked: completed (DONE), failed (with retry count and error details), or skipped (with skip reason and timestamp). This tracking enables deterministic resumption and accurate progress reporting.

Retry Logic

Failed tasks are retried up to a configured limit. After N consecutive failures, the task is skipped and execution continues. Failure details are recorded in the audit trail for later analysis.
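
Persistence, tracking, and retry fit together roughly as follows. This is a sketch under assumptions, not run-task.sh: the `run_queue` function, the JSON file layout, and the field names are hypothetical. The status values DONE, FAILED, and SKIPPED follow the text above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def run_queue(tasks, state_path: Path, max_retries: int = 3) -> dict:
    """Run (task_id, callable) pairs in order, persisting status after every task."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    for task_id, run in tasks:
        record = state.setdefault(
            task_id, {"status": "PENDING", "attempts": 0, "errors": []}
        )
        if record["status"] in ("DONE", "SKIPPED"):
            continue  # resume: never repeat finished or skipped work
        while record["attempts"] < max_retries and record["status"] != "DONE":
            record["attempts"] += 1
            try:
                run()
                record["status"] = "DONE"
            except Exception as exc:  # record the failure, then retry
                record["status"] = "FAILED"
                record["errors"].append(str(exc))
        if record["status"] != "DONE":
            # Retries exhausted: skip, but keep the evidence for later analysis.
            record["status"] = "SKIPPED"
            record["skip_reason"] = f"failed {record['attempts']} times"
            record["skipped_at"] = datetime.now(timezone.utc).isoformat()
        state_path.write_text(json.dumps(state, indent=2))
    return state
```

Because the state file is rewritten after each task, a second invocation with the same queue skips everything already DONE or SKIPPED and continues from the first unfinished task.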

Governance During Cycling

Governance enforcement continues throughout cycling. Anchors fire every N actions. Learn cycles trigger on test failures. State validation gates block writes until preconditions are met. The cycle itself is hook-enforced.

SDD Architecture

Four moving parts. One governance layer.

Each component addresses a specific failure mode. Together they reduce the surface area where the agent must be trusted to follow text it has already proven it will violate.

1

Hooks

Fire at the tool-call boundary. Block writes when state preconditions fail. Append every action to an audit log. For rules wired to a hook, the hook isn't reading instructions: it's reading bytes. Rules that can't be reduced to a check live in the protocol tier instead.

2

Commands

Kernel operations the agent must invoke: anchor, learn, complete, backlog, execute-pipeline, autonomous-cycle. Skill-based, so the agent calls them through a Skill interface. Each command enforces template structure and consistency.

3

Skills

Multi-step capabilities: autonomous cycling, task building, domain setup, production testing, audit workflows. Each skill is an indexed file structure: entry point plus step-by-step references the agent reads at execution time.

4

State

Session state, workflow state, intent chains, anchor logs, lesson files. Append-only where it matters (audit trail). Mutable where it doesn't (session counters). Hash-signed for cross-machine verifiability and integrity checking.

Design Decision

Two-tier governance. Mechanical and behavioral.

Hooks alone are too rigid. Protocol alone is too soft. Both are necessary. The lesson file is exactly the seam between them: every entry is either already a hook or a candidate to become one.

Tier 1: Mechanical

Hooks fire at the tool-call boundary. If state says needs_learn: true, every write is blocked until /kernel/learn runs. No talking past it. No prompt engineering past it. No editing state files to bypass it: that path is itself a hook-blocked write.
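
The mechanical tier reduces to a check that runs before every tool call. The sketch below is illustrative only: `pre_tool_use`, the tool names in `WRITE_TOOLS`, and the audit record layout are assumptions, not the kernel's hook code or any specific runtime's hook API. How /kernel/learn itself clears the flag is outside the sketch.

```python
import json

WRITE_TOOLS = {"Write", "Edit"}  # assumed tool names, for illustration only


def pre_tool_use(tool: str, target: str, state: dict, audit_log: list) -> bool:
    """Decide whether a tool call may proceed, and log the decision either way."""
    allowed = True
    reason = "ok"
    if tool in WRITE_TOOLS and state.get("needs_learn", False):
        # Applies to every write target, state files included, so the flag
        # cannot be cleared by editing the state file directly.
        allowed = False
        reason = "needs_learn is set: run /kernel/learn first"
    # Append-only audit entry: one JSON line per tool call, allowed or not.
    audit_log.append(json.dumps(
        {"tool": tool, "target": target, "allowed": allowed, "reason": reason}
    ))
    return allowed
```

The check never inspects what the agent said or intended. It reads a flag and a tool name, which is why it cannot be talked past.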

Tier 2: Behavioral

Every 10 actions, the anchor command forces the agent to re-read the protocol, re-read the lessons, and apply specific rules to its next action with concrete verbs. Protocol drift is a guaranteed failure mode in long sessions. The anchor reverses it on a schedule.

Hooks alone would require encoding every conceivable rule in code: impossible for a self-improving system. Protocol alone would rely on the agent following text it has already proven it will violate. Together: hooks for the rules you can enforce mechanically, protocol for the rules you can't.

Specialized Skills

Purpose-built execution patterns.

Beyond core governance, specialized skills extend the kernel for domain-specific and operational use cases. Each skill encodes patterns, workflows, and conventions for a specific capability. All skills run under kernel governance: the enforcement loop applies regardless of what the skill does.

Production Testing

Assembles a test environment, copies the codebase, runs L1/L2/L3 tests (existence → execution → correctness), collects results, and cleans up infrastructure. Runs as a spawned sub-process under governance.
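
The L1/L2/L3 ladder can be sketched for a single script. What each level checks here is an assumption made for illustration: L1 is "the file exists", L2 is "it exits with code 0", and L3 is "its output matches an expected string". The `run_levels` helper is hypothetical and leaves out environment assembly and cleanup.

```python
import subprocess
import sys
from pathlib import Path


def run_levels(script: Path, expected_stdout: str) -> dict:
    """Run L1 (existence), L2 (execution), L3 (correctness); stop at the first failure."""
    results = {"L1": script.is_file(), "L2": False, "L3": False}
    if not results["L1"]:
        return results
    # Run the script in a child process with the current interpreter.
    proc = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True, text=True, timeout=30,
    )
    results["L2"] = proc.returncode == 0
    if results["L2"]:
        results["L3"] = proc.stdout.strip() == expected_stdout
    return results
```

Ordering the levels this way keeps failures cheap to diagnose: a missing file is reported as an L1 failure rather than as a confusing correctness error.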

Audit Workflow

Scans kernel infrastructure (commands, skills, hooks, protocol, state, testing) for gaps, generates fix tasks, and cycles through execution. Produces a gap report and auto-remediation tasks.

Website Cloner

Clones a website structure via Playwright, reproduces HTML/CSS/JS locally, generates a static archive. Useful for testing and local development of web-based tools.

Agent Swarm

Orchestrates multiple independent agents across separate task folders. Each agent maintains its own state and session. The parent agent monitors, aggregates results, and handles inter-agent coordination.

Cross-Session Persistence

State that survives restarts.

Multiple state files work together to preserve execution context across process restarts. Session state holds anchor status and action counts. Workflow state tracks completed tasks and current task. Intent chains hash user input and generated specifications. Anchor logs record violations. Lesson files document learned rules. Together, they enable deterministic resumption from exactly where execution was interrupted.

Session State

Current session ID, anchor timestamp, actions since last anchor, protocol hash, and pending anchor tokens. Tracks whether the agent is currently anchored and ready to work.

Workflow State

Current task, completed task list, skipped task list, total task count, and current action counts. Enables cycling to resume at the exact next task without repeating completed work.

Intent Chains

SHA-256 hash of raw user input and resulting specification file. Append-only and immutable. Proves what the user requested and what was produced, with cryptographic integrity.

Audit Artifacts

Anchor logs record violations and actions reviewed. Lesson files document failures and what changed. All append-only, time-stamped, and searchable. Enable recurrence detection and escalation triggers.

Results

130+ Backlog items processed

29 Lessons recorded

122 Pipeline runs completed

39 Repos across both orgs

Numbers from the workspace this page was built in. Receipts in the linked repos.

Produced Harnesses

Same kernel. Different stacks.

Each harness below demonstrates the kernel applied to a specific domain. All use the same governance layer and execution model with domain-specific patterns and conventions encoded in optional domain specs.

platform-playwright

Stack: TypeScript, Playwright Test

What: Web UI and API test automation harness with kernel-enforced 5-layer architecture

View on GitHub (6 stars) ↗

platform-selenium

Stack: Python, Selenium, pytest

What: AI-powered Selenium test automation harness

View on GitHub (4 stars) ↗

platform-docker

Stack: Docker, pytest, Python

What: AI-powered Docker container validation harness

View on GitHub ↗

platform-ssh

Stack: Python, SSH, compliance frameworks

What: SSH image testing platform with compliance gates

View on GitHub ↗

test-platform-deepeval

Stack: Python, DeepEval

What: LLM output evaluation, 5-layer architecture, 29+ metrics

View on GitHub ↗

vibe-coder-agent

Stack: Claude Code, kernel

What: AI-powered app builder for non-technical founders

View on GitHub ↗

Receipts

The system writes down its own failures.

An anchor catches a self-detected violation. The intent chain hashes the user's words. A recorded lesson flags a candidate for mechanical enforcement after recurrence, if the rule can be hooked at all. All three are append-only artifacts from this workspace.

.claude/state/anchor-logs/2026-06-08/06-45-00Z.json

{
  "anchor_timestamp": "2026-06-08T06:45:00Z",
  "actions_count": 48,
  "violations_found": 1,
  "violation_details": [
    "Backlog 129 created via direct intent.py call, bypassing /kernel/backlog skill."
  ],
  "needs_learn_set": true
}

Every anchor archives the actions reviewed and any violations found. When backlog 129 was created by bypassing the /kernel/backlog skill, the next anchor flagged it and set needs_learn: true. The agent could not write again until the lesson was recorded.

.claude/state/intents/131-intent-chain.jsonl

{
  "rev": 1,
  "timestamp": "2026-06-08T07:24:57+00:00",
  "raw_input_hash":     "0c43483511fb6ee241b34fc58c294751...",
  "backlog_hash_after": "6f9ebac571cb3369ee518f1d04432458..."
}

Every backlog item is hash-signed. raw_input_hash is the SHA-256 of the user's literal words at /kernel/backlog invocation time. backlog_hash_after is the SHA-256 of the resulting file. Both go into an append-only chain a third party can verify.
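
A third party can check a record like this by recomputing both hashes. The sketch below uses the field names from the excerpt, but the helper functions, the UTF-8 encoding of the input, and the in-memory list standing in for the .jsonl file are assumptions. Note that a matching hash shows the inputs are unchanged; it does not by itself show who wrote them.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def make_entry(rev: int, raw_input: str, backlog_bytes: bytes) -> dict:
    """Build one chain record with the field names shown in the excerpt."""
    return {
        "rev": rev,
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "raw_input_hash": sha256_hex(raw_input.encode("utf-8")),  # encoding assumed
        "backlog_hash_after": sha256_hex(backlog_bytes),
    }


def append_entry(chain_lines: list, entry: dict) -> None:
    """Append-only: records are added as JSON lines and never rewritten."""
    chain_lines.append(json.dumps(entry, sort_keys=True))


def verify_entry(line: str, raw_input: str, backlog_bytes: bytes) -> bool:
    """Recompute both hashes from the claimed inputs and compare to the record."""
    entry = json.loads(line)
    return (
        entry["raw_input_hash"] == sha256_hex(raw_input.encode("utf-8"))
        and entry["backlog_hash_after"] == sha256_hex(backlog_bytes)
    )
```

Changing a single character in either the user's words or the resulting file makes `verify_entry` return False.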

.claude/lessons/kernel-compliance.md

## 2026-06-08 Bypassed /kernel/backlog
- Root cause: batched "yes" trigger, agent
  optimized middle item for speed
- Recurrence: 2nd time recorded
  (first: 2026-04-25 with backlog 047)
- Escalation: mechanical enforcement via
  PreToolUse hook now overdue

When the /kernel/backlog bypass happened a second time, the lesson was updated with a recurrence count and an escalation note: mechanical enforcement via PreToolUse hook is now overdue. Two failures of the same rule means human discipline lost.
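
Recurrence detection of this kind needs little more than grouping lessons by rule. The sketch below assumes lessons have already been reduced to (date, rule) pairs; the `find_escalations` helper and the threshold of two are illustrative choices that match the example above, not the kernel's parser.

```python
from collections import defaultdict


def find_escalations(lessons, threshold: int = 2) -> dict:
    """Map each rule to its sorted occurrence dates once it has recurred enough."""
    dates_by_rule = defaultdict(list)
    for date, rule in lessons:
        dates_by_rule[rule].append(date)
    return {
        rule: sorted(dates)
        for rule, dates in dates_by_rule.items()
        if len(dates) >= threshold
    }
```

Any rule returned here is a candidate for promotion from protocol text to a hook, provided the rule can be reduced to a mechanical check.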

Full attestation chain at /attestation.html.

Who This Is For

ENGINEERING LEADER / FOUNDATION

Engineering leaders shipping agents

You've felt the drift. Agents that read the rule then break it. You want governance, not vibes. The kernel is the execution layer under the agents.

FDE / FORWARD-DEPLOYED ENGINEER

FDEs deploying lab-built agents

You deploy agents into customer environments. You need the runtime layer to be inspectable, governable, and auditable. This is that layer.

RESEARCHER / HARNESS DESIGN

Researchers in harness design

You think about the seam between behavioral instructions and mechanical enforcement. Open-source kernel, real audit trail, recurrence detection, lesson loop. All inspectable.

Governed agent runtime with hook-enforced execution.
