Lesson 6: Infinite Conversations with a Three-Layer Context Compression Strategy

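The context window is finite, but an Agent's tasks are not. Three layers of compression (micro_compact for silent trimming, auto_compact for automatic summarization, and the compact tool for manual triggering) keep the Agent from running out of context in large projects.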

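Series guide

This is Lesson 6 of the **Dissecting the Claude Code Architecture in 12 Lessons** series and the closing lesson of **Phase 2: Planning and Knowledge**.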

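Over the first five lessons we built an Agent with a tool chain, a planning system, subagent isolation, and skill loading. But a time bomb has been ticking the whole time: the context window will eventually fill up.

The motto of Lesson 6:

"The context will always fill up; you need a way to make room."

In this lesson we give the Agent a three-layer compression strategy so that, in theory, a conversation can go on indefinitely.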

Context explosion: an arithmetic problem

Claude's context window is 200k tokens. That sounds large, but do the math:

Operation	Token cost
Read a 1,000-line source file	~4,000 tokens
One `ls -la` output	~500 tokens
One pytest run's output	~2,000 tokens
A medium-sized JSON config	~1,500 tokens

A typical "help me refactor this module" task:

Read 5 files                     → 20,000 tokens
Run 10 commands                  → 15,000 tokens
Write 3 files                    → 12,000 tokens
8 rounds of dialogue             →  8,000 tokens
System prompt + tool definitions →  5,000 tokens
─────────────────────────────────────────────────
Total:                             ~60,000 tokens

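A single task eats 30% of the window. Without compression, the Agent is full after three or four tasks: it either errors out or is forced to drop the earliest messages, which may be exactly the context that matters.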
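The arithmetic above can be reproduced in a few lines. This is only a sketch that reuses the rough per-operation figures from the worked example; the variable names are illustrative and not taken from the course code.

```python
# Token costs copied from the worked example above (all approximate).
CONTEXT_WINDOW = 200_000

task_costs = {
    "read 5 files": 20_000,
    "run 10 commands": 15_000,
    "write 3 files": 12_000,
    "8 rounds of dialogue": 8_000,
    "system prompt + tool definitions": 5_000,
}

per_task = sum(task_costs.values())           # tokens used by one task
share = per_task / CONTEXT_WINDOW             # fraction of the window
tasks_that_fit = CONTEXT_WINDOW // per_task   # whole tasks before overflow

print(f"per task: {per_task} tokens ({share:.0%} of the window)")
print(f"whole tasks that fit without compression: {tasks_that_fit}")
```

Three whole tasks fit; the fourth overflows, which is where the "three or four tasks" estimate comes from.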

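Worse, even before the window fills, a longer context spreads the model's attention thinner. With `cat` output from thirty turns ago still sitting in the message list, the model is looking for a needle in a haystack.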

Figure: context explosion, where a few tasks are enough to fill the window

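Three-layer compression architecture: overview

The solution is not one-size-fits-all but three progressive layers of increasing aggressiveness: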

┌──────────────────────────────────────────────────────────────────────┐
│                              Agent Loop                              │
│                                                                      │
│  every turn          ──→ Layer 1: micro_compact (silent trimming)    │
│                          replace old tool_result with placeholders   │
│                          ↓                                           │
│  tokens > threshold  ──→ Layer 2: auto_compact (auto summary)        │
│                          save transcript, then LLM summarizes        │
│                          ↓                                           │
│  model calls it      ──→ Layer 3: compact tool (manual trigger)      │
│                          same summary mechanism, on demand           │
│                                                                      │
│  aggressiveness:  low ─────────────────────────────────────→ high    │
└──────────────────────────────────────────────────────────────────────┘

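Each layer has its own job:

Layer	Trigger	Aggressiveness	Information loss
Layer 1 micro_compact	Automatic, every turn	Low	Almost none
Layer 2 auto_compact	Token count exceeds threshold	Medium	Details compressed into a summary
Layer 3 compact tool	Called by the model	Medium	Same as Layer 2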

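Layer 1: micro_compact, silent trimming

Core insight

The model most likely no longer needs to read, line by line, the contents of a file you ran `cat` on three turns ago. It has already "read" it and retains an impression. Those old tool_result blocks take up space without providing new information.

Implementation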

def micro_compact(messages: list, keep_recent: int = 3):
    """Keep the most recent keep_recent turns intact and replace
    older tool_result contents with a placeholder (in place)."""
    # Each assistant message marks one turn.
    assistant_indices = [
        i for i, m in enumerate(messages)
        if m["role"] == "assistant"
    ]
    if len(assistant_indices) <= keep_recent:
        return

    # Everything before this index is older than the last keep_recent turns.
    cutoff_index = assistant_indices[-keep_recent]

    for i, msg in enumerate(messages):
        if i >= cutoff_index:
            break
        if msg["role"] != "user":
            continue
        content = msg.get("content", [])
        if not isinstance(content, list):
            continue
        for j, block in enumerate(content):
            if (isinstance(block, dict)
                    and block.get("type") == "tool_result"):
                # _find_tool_name (defined elsewhere) looks up the
                # tool_use block that produced this result.
                tool_name = _find_tool_name(
                    messages, block["tool_use_id"]
                )
                content[j] = {
                    **block,
                    "content": f"[Previous: used {tool_name}]",
                }

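In turn 1, `cat main.py` returns 200 lines of code (~800 tokens); by turn 5 it has been replaced with "[Previous: used bash]" (~10 tokens). One replacement saves about 790 tokens, and 10 old results save close to 8,000 tokens.

When the model sees the placeholder, it knows the tool was used earlier and that its output has been compressed. Need to look again? Just `cat` the file once more.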

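Layer 2: auto_compact, automatic summarization

Trigger condition

micro_compact is a slow, steady trickle, but if the conversation gets long enough, replacing old outputs is not sufficient. auto_compact fires when the total token count exceeds a threshold and compresses more aggressively: it condenses the entire conversation into a single summary.

Two steps: save, then summarize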

import json
import time
from pathlib import Path


def save_transcript(messages: list) -> Path:
    """Save the full conversation to .transcripts/ as a pre-compression snapshot."""
    transcript_dir = Path(".transcripts")
    transcript_dir.mkdir(exist_ok=True)
    filepath = transcript_dir / f"transcript_{int(time.time())}.jsonl"
    with open(filepath, "w", encoding="utf-8") as f:
        for msg in messages:
            # default=str keeps SDK content blocks serializable.
            f.write(json.dumps(msg, ensure_ascii=False, default=str) + "\n")
    return filepath


def summarize_conversation(client, messages: list) -> str:
    """Ask the LLM to compress the full conversation into a structured summary."""
    # MODEL and format_messages_for_summary are defined elsewhere.
    response = client.messages.create(
        model=MODEL, max_tokens=4000,
        system=(
            "Compress the conversation into a structured summary. "
            "Keep: 1) the user's original goal 2) completed steps "
            "3) key findings and decisions 4) current to-dos. "
            "Drop: specific file contents, command output, "
            "and intermediate debugging."
        ),
        messages=[{"role": "user",
                    "content": format_messages_for_summary(messages)}],
    )
    return response.content[0].text


def auto_compact(client, messages: list) -> list:
    """Save the transcript, generate a summary, and return the new message list."""
    filepath = save_transcript(messages)
    summary = summarize_conversation(client, messages)
    return [{
        "role": "user",
        "content": (
            f"[Context compacted. Full transcript: {filepath}]\n\n"
            f"## Conversation Summary\n\n{summary}\n\n"
            "Continue from where we left off."
        ),
    }]

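The information is not lost; it moves from the active context to disk. Tens of thousands of tokens before compression become a single summary message of 1,000 to 2,000 tokens afterwards, a compression ratio of roughly 10:1 to 50:1.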
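summarize_conversation passes the history through `format_messages_for_summary`, which the listing leaves undefined. One plausible shape for it is sketched below. The function body, the `max_chars` parameter, and the choice to keep the tail of the text are all assumptions made for illustration.

```python
import json


def format_messages_for_summary(messages: list, max_chars: int = 80_000) -> str:
    """Flatten a message list into plain text for the summarizer prompt.

    Hypothetical helper; max_chars is an assumed safety cap.
    """
    lines = []
    for msg in messages:
        content = msg.get("content", "")
        if not isinstance(content, str):
            # default=str keeps non-JSON objects (e.g. SDK blocks) printable.
            content = json.dumps(content, ensure_ascii=False, default=str)
        lines.append(f"{msg['role']}: {content}")
    text = "\n".join(lines)
    # Keep the tail: recent turns matter most for "where we left off".
    if len(text) > max_chars:
        text = text[len(text) - max_chars:]
    return text


sample = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": [{"type": "text", "text": "hi"}]},
]
print(format_messages_for_summary(sample))
```

Truncating from the front is a design choice: it can cut off the user's original goal in a very long history, so a real implementation may prefer to keep both the head and the tail.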

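Layer 3: the compact tool, triggered by the model

Why a manual layer is needed

auto_compact depends on a threshold. But sometimes the model itself knows that "the current context is too messy; I need to tidy up before continuing." The compact tool gives the model that autonomy.

Tool definition and handler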

COMPACT_TOOL = {
    "name": "compact",
    "description": (
        "Compress the conversation history into a summary. "
        "Use when the context is getting long."
    ),
    "input_schema": {"type": "object", "properties": {}},
}

def handle_compact(client, messages: list) -> tuple:
    """Manual trigger; reuses the auto_compact summarization flow."""
    new_messages = auto_compact(client, messages)
    return new_messages, "Context compacted successfully."

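It is essentially the same summarization flow as Layer 2. The only difference is who triggers it: the system does so automatically in Layer 2, and the model does so on its own initiative in Layer 3.

Integration: where the three layers sit in the loop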

TOKEN_THRESHOLD = 100_000

def agent_loop(client, messages: list, tools: list):
    while True:
        micro_compact(messages)                    # ① Layer 1

        if estimate_tokens(messages) > TOKEN_THRESHOLD:
            # Slice-assign so the caller's list is updated too.
            messages[:] = auto_compact(client, messages)  # ② Layer 2

        response = client.messages.create(
            model=MODEL, system=SYSTEM,
            messages=messages, tools=tools, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return

        results = []
        compact_requested = False
        for block in response.content:
            if block.type != "tool_use":
                continue
            if block.name == "compact":            # ③ Layer 3
                compact_requested = True
                output = "Compressing..."
            else:
                output = dispatch_tool(block)
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": output,
            })
        messages.append({"role": "user", "content": results})

        # Compact only after the tool_result is appended, so no
        # tool_result is left without its matching tool_use.
        if compact_requested:
            new_messages, _ = handle_compact(client, messages)
            messages[:] = new_messages

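Layer 1: runs at the start of every loop iteration, silently and without the model noticing
Layer 2: checked before each request to the model, fires only above the threshold
Layer 3: requested during tool dispatch, at the model's own discretion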
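The loop calls `estimate_tokens` without defining it. A cheap approximation is enough for a threshold check; the sketch below assumes the common rule of thumb of roughly 4 characters per token for English text. That ratio is a heuristic, not an exact count, and it will be off for code-heavy or CJK content.

```python
import json


def estimate_tokens(messages: list) -> int:
    """Rough token estimate: serialized length / 4 characters per token.

    Assumed heuristic; good enough to decide when to compact.
    """
    serialized = json.dumps(messages, ensure_ascii=False, default=str)
    return len(serialized) // 4


TOKEN_THRESHOLD = 100_000

demo = [{"role": "user", "content": "x" * 4000}]
estimate = estimate_tokens(demo)
print(estimate, "tokens (estimated); compact now:", estimate > TOKEN_THRESHOLD)
```

Because the estimate only gates a fairly conservative threshold (100,000 against a 200k window), a rough figure leaves plenty of headroom. An exact count would require the provider's tokenizer or token-counting endpoint.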

Figure: integration points of the three compression layers in the loop

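Insights

Information is not lost, only moved out of the active context

The old outputs that micro_compact replaced? The model can rerun the command to get them back. The full conversation that auto_compact saved? It sits on disk in .transcripts/, ready to be reviewed at any time. The LLM summary keeps the goal, the progress, and the key decisions, which is what the work actually needs in order to continue.

Human memory works the same way: you do not need to remember every line of code you read last Tuesday, but you do remember that "that module has a race condition and needs a lock." A summary is a more efficient carrier of information than the raw data.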

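Why three layers and not one

One layer is not enough, because compression has a cost:

Layer	Cost	Frequency
micro_compact	Near zero (string replacement)	Every turn
auto_compact	One LLM call + disk I/O	Occasional
compact tool	Same as auto_compact	Rare

With auto_compact alone, every compression costs an LLM call for the summary. micro_compact uses nearly free string replacement to postpone the moment a summary is needed: never spend money where you do not have to.

With micro_compact alone, a truly long conversation eventually overwhelms it: the residue of old messages (assistant replies, placeholders) accumulates and will fill the window too.

The three layers work like a progressive garbage collector: the first layer is incremental collection, the second is a full GC, and the third is a manually invoked GC.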

Up and running in five minutes

# Clone the repository
git clone https://github.com/shareAI-lab/learn-claude-code
cd learn-claude-code

# Install dependencies
pip install -r requirements.txt

# Configure the API key
cp .env.example .env
# Edit .env and fill in your ANTHROPIC_API_KEY and MODEL_ID

# Start Lesson 6
python agents/s06_context_compact.py

After startup you will see the s06 >> prompt. Below is a real run log:

Task 1: reading many files triggers automatic compression

s06 >> Read every Python file in the agents/ directory one by one
> bash: (list all files under agents/)
> read_file: __init__.py
> read_file: s01_agent_loop.py
> read_file: s02_tool_use.py
> read_file: Error: No such file: 'agents/s03_context.py'
> bash: (rerun ls to confirm the file name)
> read_file: s03_todo_write.py
> read_file: s04_subagent.py
> read_file: s05_skill_loading.py
> read_file: Error: No such file: 'agents/s06_memory.py'
> bash: (rerun ls to confirm)
> read_file: s06_context_compact.py
> read_file: s07_task_system.py
> read_file: s08_background_tasks.py
> read_file: Error: No such file: 'agents/s09_guardrails.py'
> bash: (third ls to confirm)
> read_file: s09_agent_teams.py
> read_file: s10_team_protocols.py
> read_file: s11_autonomous_agents.py
> read_file: s12_worktree_task_isolation.py
> read_file: s_full.py

[auto_compact triggered]
[transcript saved: .transcripts/transcript_1775572349.jsonl]

I have full context of the entire codebase.
13 Python files in agents/ — a progressive tutorial:
| s01 | Basic chat loop | s07 | File-based task board |
| s02 | Tool use | s08 | Background execution |
| s03 | TodoWrite | s09 | Multi-agent messaging |
| ... | ... | s12 | Git worktree isolation |

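Two things to note:

First, the model guessed some file names wrong (s03_context.py, s06_memory.py, s09_guardrails.py), which shows it was relying on inference rather than exact recall. After each wrong guess it reran ls and corrected itself: this is the Agent's self-repair ability.
Second, auto_compact fired automatically after the 13 files were read, since the contents of 13 source files easily exceed the token threshold. After compression, tens of thousands of tokens of file content became a concise summary table, and the full conversation was written to .transcripts/.

Task 2: keep reading files until compression triggers again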

s06 >> Keep reading files until compression triggers automatically
> bash: (list all files)
> read_file: hello.py
> read_file: LICENSE
> read_file: requirements.txt
> read_file: mypackage/utils.py
> read_file: mypackage/__init__.py
> read_file: mypackage/tests/test_utils.py
> read_file: README.md
> read_file: README-zh.md
> read_file: greet.py
> bash: ls -la agents/
> read_file: s01_agent_loop.py
> read_file: s02_tool_use.py
> read_file: s03_todo_write.py
> read_file: s04_subagent.py
> read_file: s05_skill_loading.py
> read_file: s06_context_compact.py
> read_file: s07_task_system.py
> read_file: s08_background_tasks.py
> read_file: s09_agent_teams.py
> read_file: s10_team_protocols.py
> read_file: s11_autonomous_agents.py
> read_file: s12_worktree_task_isolation.py
> bash: ls docs/

[auto_compact triggered]
[transcript saved: .transcripts/transcript_1775572551.jsonl]

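Second round of compression. Starting from the summary left by the previous compression, the Agent read another 20+ files and triggered auto_compact again. After each compression the Agent picks up from the summary and keeps working, instead of being cut off by a full context.

Task 3: triggering compression manually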

s06 >> Use the compact tool to manually compress the conversation
> compact:
Compressing...
[manual compact]
[transcript saved: .transcripts/transcript_1775572604.jsonl]

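The model calls the compact tool to trigger compression itself. This is Layer 3: when the model decides its context needs tidying, it does not have to wait for the threshold and can clean up at any time.

Look in the .transcripts/ directory and you will find several JSONL backups of full conversations. A snapshot is saved before every compression, so the information is not lost, only moved from the active context to disk.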
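Since each snapshot is a JSONL file with one message per line, reading one back takes only a few lines. The helpers below are hypothetical (they are not part of the course code) and assume the `.transcripts/` layout and timestamped file names shown in the run logs.

```python
import json
from pathlib import Path


def load_transcript(path) -> list:
    """Read a JSONL transcript back into a list of message dicts."""
    messages = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                messages.append(json.loads(line))
    return messages


def latest_transcript(directory=".transcripts"):
    """Return the newest snapshot by file name, or None if there is none.

    Assumes names embed a Unix timestamp of equal length, so
    lexicographic order matches chronological order.
    """
    files = sorted(Path(directory).glob("*.jsonl"))
    return files[-1] if files else None


if __name__ == "__main__":
    newest = latest_transcript()
    if newest is not None:
        print(newest, "holds", len(load_transcript(newest)), "messages")
```

Content blocks that were serialized with a string fallback come back as plain strings, so a restored transcript is suited to inspection rather than to being replayed verbatim to the API.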

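Summary: what you just built

Component	Before (s05)	After (s06)
Context management	None; messages only ever grow	Three-layer compression strategy
Old tool results	Occupy the context permanently	Replaced with placeholders by micro_compact
Long conversations	Error once the window is full	Summarized automatically by auto_compact
Model autonomy	Cannot manage its own context	Can trigger the compact tool itself
Conversation backup	None	Saved in full under .transcripts/
Theoretical conversation length	Limited by the window	Unlimited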

Figure: three-layer compression, from finite to unlimited

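Next lesson preview

Lesson 6 solved "what if the conversation gets too long?" But one problem remains: what if the conversation is cut off?

Close the terminal and all context is gone. Open it the next day and the Agent knows nothing about yesterday's work.

Lesson 7: Task System. Persist task state to disk so the Agent can be interrupted, resumed, and continue working across sessions, turning a "one-off conversation" into a "persistent workflow."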

# Preview: task persistence in s07
task = TaskManager.load("task_001")  # restore from disk
task.status = "in_progress"
# ... the Agent keeps working ...
task.save()  # save progress at any time

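Close the terminal without worry. The tasks are still there.

This is Lesson 6 of the series Dissecting the Claude Code Architecture in 12 Lessons: Mastering Agent Harness Engineering from Scratch. Follow Claw开发者 so you do not miss later updates.

Full code and interactive learning platform: github.com/shareAI-lab/learn-claude-code

If you found this article helpful, feel free to share it with your engineering team.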

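Series contents

Lesson 1: Build Your First AI Agent in 20 Lines of Python
Lesson 2: Giving the Agent Tools: The Dispatch Map Pattern in Detail
Lesson 3: TodoWrite: Making the Agent Think Before Acting with a Planning System
Lesson 4: Subagent: Breaking Down Large Tasks with Context Isolation
Lesson 5: Loading Domain Knowledge on Demand: The Skill Mechanism
Lesson 6: Infinite Conversations with a Three-Layer Context Compression Strategy (this article)
Lesson 7: Task Persistence: A File-Based DAG Task Graph
Lesson 8: Background Execution: Async Tasks and Notification Queues
Lesson 9: Agent Teams: Multi-Agent Collaboration with Teams and a Mailbox System
Lesson 10: Team Protocols: State-Machine-Driven Negotiation
Lesson 11: Autonomous Agents: Self-Organized Task Claiming
Lesson 12: Ultimate Isolation: Parallel Execution with Worktrees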
