
Mechanistic Interpretability Pyvene Documentation


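# pyvene Reference Documentation

This directory contains comprehensive reference material for pyvene.

## Contents

- [api.md](api.md) - Complete API reference for IntervenableModel, intervention types, and configuration
- [tutorials.md](tutorials.md) - Step-by-step tutorials for causal tracing, activation patching, and trainable interventions

## Quick Links

- **Official documentation**: https://stanfordnlp.github.io/pyvene/
- **GitHub repository**: https://github.com/stanfordnlp/pyvene
- **Paper**: https://arxiv.org/abs/2403.07809 (NAACL 2024)

## Installation

```bash
pip install pyvene
```

## Basic Usage

```python
import pyvene as pv
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Define the intervention: swap the output of transformer block 5
config = pv.IntervenableConfig(
    representations=[
        pv.RepresentationConfig(
            layer=5,
            component="block_output",
            intervention_type=pv.VanillaIntervention,
        )
    ]
)

# Wrap the model so that it can be intervened on
intervenable = pv.IntervenableModel(config, model)

# Run the intervention (swap activations from the source run into the base run)
base_inputs = tokenizer("The cat sat on the", return_tensors="pt")
source_inputs = tokenizer("The dog ran through the", return_tensors="pt")

# The second element of the returned pair holds the intervened outputs
_, outputs = intervenable(
    base=base_inputs,
    sources=[source_inputs],
)
```

## Key Concepts

### Intervention Types

- **VanillaIntervention**: Swaps activations between runs
- **AdditionIntervention**: Adds the source activations to the base activations
- **ZeroIntervention**: Zeroes out activations (ablation)
- **CollectIntervention**: Collects activations without modifying them
- **RotatedSpaceIntervention**: Trainable intervention for causal discovery

### Component Targets

Target specific parts of the model:

- `block_input`, `block_output`
- `mlp_input`, `mlp_output`, `mlp_activation`
- `attention_input`, `attention_output`
- `query_output`, `key_output`, `value_output`

### HuggingFace Integration

Save and load intervention configurations through the HuggingFace Hub for reproducibility.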
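
Returning to the intervention types listed under the key concepts: the sketch below may help build intuition for what swapping, adding, zeroing, and collecting activations mean. It is a toy illustration in plain Python, not pyvene's implementation. `toy_model`, its `intervene` hook, and the helper functions are all hypothetical names invented for this example, and plain lists stand in for activation tensors. The trainable RotatedSpaceIntervention is left out because it needs an optimization loop.

```python
# Toy stand-ins for four intervention types. Hypothetical helpers, not pyvene code.

def vanilla(base, source):
    """Interchange: replace the base activation with the source activation."""
    return list(source)

def addition(base, source):
    """Add the source activation element-wise to the base activation."""
    return [b + s for b, s in zip(base, source)]

def zero(base):
    """Ablation: replace the activation with zeros of the same length."""
    return [0.0 for _ in base]

def make_collector(store):
    """Collect: record the activation in `store` and pass it through unchanged."""
    def collect(hidden):
        store.append(list(hidden))
        return hidden
    return collect

def toy_model(x, intervene=None):
    """A two-step toy 'network'; `intervene` may rewrite the hidden state."""
    hidden = [2.0 * v for v in x]      # first layer: double every input
    if intervene is not None:
        hidden = intervene(hidden)     # intervention site
    return sum(hidden) + 1.0           # readout: sum plus a bias of 1

base_x = [1.0, 2.0]
source_x = [10.0, 20.0]

# Step 1: run the source input and collect its hidden activation.
collected = []
toy_model(source_x, intervene=make_collector(collected))
source_hidden = collected[0]

# Step 2: run the base input while intervening on the hidden state.
swapped = toy_model(base_x, intervene=lambda h: vanilla(h, source_hidden))
added = toy_model(base_x, intervene=lambda h: addition(h, source_hidden))
ablated = toy_model(base_x, intervene=zero)

print(swapped, added, ablated)
```

Because the swap replaces the entire hidden state here, `swapped` equals the output of running the toy model on the source input directly. In a real model an intervention usually targets only one layer, component, and set of positions, so the result is a mix of base and source behavior.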

Source: claude-code-templates (MIT). The Chinese translation was generated by AI.
