AI工程SynapseB类

[EN] 让 AI 系统自我进化:用 audit log 构建行为约束的反馈闭环

TL;DR

  • Governance rules without execution records are effectively unenforced
  • A PreToolUse hook intercepts every tool call and writes to ceo-guard-audit.log
  • Weekly log analysis surfaces repeat violation patterns → new decision_rules.yaml entries
  • Rules require timestamps and an entropy budget to prevent the governance file from rotting
  • The audit log is the training signal — constraints get sharper the more the system runs

When we launched a session this morning, the CEO Guard startup hook surfaced this immediately:

CEO-GUARD: WARNING - 405 high-risk tool direct call(s) detected

405 calls. Not a one-off slip — a systemic pattern. Our AI CEO (Lysander) operates under a clear execution chain: all write operations must be dispatched to sub-Agents; the main conversation is restricted to five tools (Read, Skill, Agent, Glob, Grep). Bash, Edit, Write, and similar tools are explicitly blacklisted. The rule was in CLAUDE.md. Everyone “knew.” And yet.

The problem wasn’t that the model ignored the rule. It’s that the rule had no runtime enforcement loop. Under high-pressure tasks, the model exhibits a consistent shortcut tendency: when the sub-Agent dispatch chain adds even a small amount of friction, and the task looks simple enough, it calls the tool directly — then files a report afterward. No single call looks egregious in isolation. Only when you aggregate across dozens of sessions does the pattern become obvious: certain task types (single-file config patches, quick append writes) trigger direct calls at a stable rate above 60%.

CEO Guard fills that gap. As a PreToolUse hook, it fires before every tool call and writes the caller, tool name, timestamp, and context snippet to logs/ceo-guard-audit.log. It doesn’t block — it records. That record is the data. Every Sunday at 23:00 Dubai time, harness_engineer and execution_auditor run a 6-dimension review of that log, and high-frequency violation patterns get promoted directly into agent-CEO/config/decision_rules.yaml as explicit task-type → dispatch-route mappings. decision_engine.py reads that config at session init and injects the patterns into live interception logic. The loop is closed.

To keep the governance file itself from becoming an unmaintainable black box, CLAUDE.md enforces a hard line count budget (320 lines at current phase), every rule requires a # [ADDED: YYYY-MM-DD] timestamp, and anything not reaffirmed within 180 days enters deprecation review automatically.


Key Takeaways

  • If you’re setting behavioral boundaries for an AI system, add a PreToolUse-level runtime hook alongside the rule text — rules describe intent, hooks capture deviation, neither is sufficient alone.
  • If individual violations seem explainable in context, aggregate across sessions by task type; patterns with >60% trigger rates belong in config, not in prompts you’ll repeat indefinitely.
  • If you maintain a governance config file, give it a hard line limit and mandatory timestamps — without an entropy budget, it becomes untouchable within three months.
  • If you’re designing automated review cycles, route output directly to executable config changes, not recommendation documents — recommendation docs have a half-life of about two weeks.

Read the Full Article (Chinese)

This is an abstract. The full technical walkthrough — including the YAML rule schema, the weekly review checklist, and the 5-scenario bypass regression suite — is in the original Chinese article above.