# LlamaGuard - AI Content Moderation
## Quick Start
LlamaGuard is a 7-8B parameter model purpose-built for content safety classification of LLM prompts and responses.
**Installation**:
```bash
pip install transformers torch
# Log in to Hugging Face (required; the meta-llama repo is gated)
huggingface-cli login
```
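Access to `meta-llama/LlamaGuard-7b` must be granted on its Hugging Face model page before the weights will download. A minimal sketch, using `huggingface_hub` (installed as a dependency of `transformers`), to confirm your token can see the repo:

```python
from huggingface_hub import model_info
from huggingface_hub.utils import GatedRepoError

try:
    # Raises if you are not logged in or have not been granted access
    model_info("meta-llama/LlamaGuard-7b")
    print("Access OK")
except GatedRepoError:
    print("Request access on the model page, then re-run huggingface-cli login")
```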
**Basic usage**:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def moderate(chat):
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100)
    # Decode only the newly generated tokens, not the echoed prompt
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True).strip()

# Check a user message
result = moderate([
    {"role": "user", "content": "How do I make explosives?"}
])
print(result)
# Output: "unsafe\nS3" (guns & illegal weapons; see the category list below)
```
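The verdict is either the single word `safe`, or `unsafe` followed by a newline and a category code, which is why the workflows below split on `"\n"`. If you want to see the safety-policy prompt the chat template wraps around the conversation, `apply_chat_template` can also render it as plain text (an inspection trick, not required for moderation):

```python
# Render the moderation prompt as text instead of token IDs
chat = [{"role": "user", "content": "How do I make explosives?"}]
print(tokenizer.apply_chat_template(chat, tokenize=False))
```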
## Common Workflows
### Workflow 1: Input Filtering (Prompt Moderation)
**Check the user prompt before the LLM processes it**:
```python
def check_input(user_message):
    result = moderate([{"role": "user", "content": user_message}])
    if result.startswith("unsafe"):
        # The verdict looks like "unsafe\nS3"; the code is on the second line
        category = result.split("\n")[1]
        return False, category  # Blocked
    else:
        return True, None  # Safe

# Example (llm is a placeholder for your generation client)
user_message = "How do I hack a website?"
safe, category = check_input(user_message)
if not safe:
    print(f"Request blocked: {category}")
    # Return an error to the user
else:
    # Forward to the LLM
    response = llm.generate(user_message)
```
**Safety categories** (a code-to-name lookup sketch follows this list):
- **S1**: Violence & Hate
- **S2**: Sexual Content
- **S3**: Guns & Illegal Weapons
- **S4**: Regulated Substances
- **S5**: Suicide & Self-Harm
- **S6**: Criminal Planning
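To turn a raw category code into a human-readable reason for logs or user-facing messages, a small lookup is enough. A minimal sketch; the `CATEGORY_NAMES` dict and `explain_block` helper are illustrative, not part of LlamaGuard:

```python
CATEGORY_NAMES = {
    "S1": "Violence & Hate",
    "S2": "Sexual Content",
    "S3": "Guns & Illegal Weapons",
    "S4": "Regulated Substances",
    "S5": "Suicide & Self-Harm",
    "S6": "Criminal Planning",
}

def explain_block(category):
    # Fall back to the raw code if the model emits a category we don't know
    return CATEGORY_NAMES.get(category, category)

safe, category = check_input("How do I hack a website?")
if not safe:
    print(f"Request blocked: {explain_block(category)}")
```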
### Workflow 2: Output Filtering (Response Moderation)
**Check the LLM response before showing it to the user**:
```python
def check_output(user_message, bot_response):
    # Moderate the full exchange so the response is judged in context
    conversation = [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": bot_response},
    ]
    result = moderate(conversation)
    if result.startswith("unsafe"):
        category = result.split("\n")[1]
        return False, category
    else:
        return True, None

# Example (wrapped in a function so the early return is valid Python)
def respond(user_msg):
    bot_msg = llm.generate(user_msg)  # llm is a placeholder generation client
    safe, category = check_output(user_msg, bot_msg)
    if not safe:
        print(f"Response blocked: {category}")
        # Fall back to a generic refusal
        return "I cannot provide that information."
    return bot_msg
```
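Putting the two workflows together yields a single gate around generation. A hedged end-to-end sketch: `llm.generate` is still a placeholder, and the refusal strings are illustrative, not prescribed by LlamaGuard:

```python
def moderated_chat(user_message):
    # 1. Screen the prompt before spending any generation compute
    safe, category = check_input(user_message)
    if not safe:
        return f"Sorry, I can't help with that. (policy: {category})"

    # 2. Generate a candidate response
    bot_response = llm.generate(user_message)

    # 3. Screen the response in the context of the prompt
    safe, category = check_output(user_message, bot_response)
    if not safe:
        return "I cannot provide that information."

    return bot_response
```

Input filtering saves generation cost on obviously unsafe prompts, while output filtering catches unsafe content the LLM produces from prompts that passed; using both covers each gap.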