# HQQ Troubleshooting Guide
## Installation Issues
### Package Not Found
**Error**: `ModuleNotFoundError: No module named 'hqq'`
**Fix**:
```bash
pip install hqq

# Verify the installation
python -c "import hqq; print(hqq.__version__)"
```
### Missing Backend Dependencies
**Error**: `ImportError: Cannot import marlin backend`
**Fix**:
```bash
# Install a specific backend
pip install hqq[marlin]

# Or install all backends
pip install hqq[all]

# Install BitBLAS
pip install bitblas

# Install TorchAO
pip install torchao
```
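To check at a glance which of these packages are actually importable in the current environment, a small probe like the following can help (a sketch; the module names are the ones installed above, adjust to the backends you use):

```python
import importlib.util

def probe_backends(modules):
    # Report importability without actually importing the packages,
    # so a broken backend cannot crash the probe itself.
    return {m: importlib.util.find_spec(m) is not None for m in modules}

status = probe_backends(["hqq", "bitblas", "torchao"])
for name, ok in status.items():
    print(f"{name}: {'available' if ok else 'MISSING'}")
```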
### CUDA Version Mismatch
**Error**: `RuntimeError: CUDA error: no kernel image is available`
**Fix**:
```bash
# Check the CUDA version
nvcc --version
python -c "import torch; print(torch.version.cuda)"

# Reinstall a PyTorch build that matches your CUDA version (cu121 shown here)
pip install torch --index-url https://download.pytorch.org/whl/cu121

# Then reinstall hqq
pip install hqq --force-reinstall
```
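The "no kernel image" error typically means the installed PyTorch wheel was not compiled for your GPU's compute capability. You can compare `torch.cuda.get_device_capability()` against `torch.cuda.get_arch_list()`; the comparison logic is sketched below so it can be read without a GPU attached:

```python
def arch_supported(capability, arch_list):
    # capability is a (major, minor) tuple, e.g. (8, 6) for an RTX 3090 Ti;
    # arch_list entries look like "sm_80", "sm_86", ...
    target = f"sm_{capability[0]}{capability[1]}"
    return any(arch.endswith(target) for arch in arch_list)

# On a CUDA machine the real check would be:
#   import torch
#   cap = torch.cuda.get_device_capability(0)
#   print(arch_supported(cap, torch.cuda.get_arch_list()))
print(arch_supported((8, 6), ["sm_80", "sm_86", "sm_90"]))  # True
```

If this returns `False` for your GPU, the wheel simply does not ship kernels for your architecture, and reinstalling a matching build (as above) is the fix.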
## Quantization Errors
### Out of Memory During Quantization
**Error**: `torch.cuda.OutOfMemoryError`
**Solutions**:
1. **Use CPU offloading**:
```python
from transformers import AutoModelForCausalLM, HqqConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    quantization_config=HqqConfig(nbits=4, group_size=64),
    device_map="auto",           # spill layers to CPU/disk when the GPU fills up
    offload_folder="./offload",
)
```
2. **Quantize layer by layer**:
```python
import torch
from transformers import AutoModelForCausalLM
from hqq.models.hf.base import AutoHQQHFModel
from hqq.core.quantize import BaseQuantizeConfig

# Load the full-precision model on CPU first, then let HQQ quantize it
# in place, one layer at a time, moving each layer to the GPU as it goes
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.float16
)
AutoHQQHFModel.quantize_model(
    model,
    quant_config=BaseQuantizeConfig(nbits=4, group_size=64),
    compute_dtype=torch.float16,
    device="cuda",
)
```
3. **Reduce the group size**:
```python
config = HqqConfig(
    nbits=4,
    group_size=32,  # smaller groups use less memory during quantization
)
```
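Note the trade-off: smaller groups lower the peak working memory during quantization, but each group stores its own scale and zero point, so the quantized model itself gets slightly larger. Assuming 16-bit scales and zero points (HQQ can also quantize this metadata, which shrinks the overhead), the effective bits per weight work out to:

```python
def effective_bits(nbits, group_size, meta_bits=16):
    # Payload bits plus amortized per-group scale and zero point
    return nbits + 2 * meta_bits / group_size

print(effective_bits(4, 64))  # 4.5 bits per weight
print(effective_bits(4, 32))  # 5.0 bits per weight
```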
### NaN Values After Quantization
**Error**: `RuntimeWarning: invalid value encountered`, or NaN in the output
**Solutions**:
1. **Check for outliers**:
```python
import torch

def check_weight_stats(model):
    # Flag any parameter tensor containing NaN or Inf values
    for name, param in model.named_parameters():
        if param.numel() > 0:
            has_nan = torch.isnan(param).any().item()
            has_inf = torch.isinf(param).any().item()
            if has_nan or has_inf:
                print(f"{name}: NaN={has_nan}, Inf={has_inf}")
                print(f"  min={param.min():.4f}, max={param.max():.4f}")

check_weight_stats(model)
```
2. **Use higher precision for the problematic layers**:
```python
from hqq.core.quantize import BaseQuantizeConfig

layer_configs = {
    "problematic_layer": BaseQuantizeConfig(nbits=8, group_size=128),
    "default": BaseQuantizeConfig(nbits=4, group_size=64),
}
```
3. **Skip the embedding layer**:
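A minimal sketch of this, using the `skip_modules` argument of `transformers`' `HqqConfig` (module-name substrings left in full precision; `"lm_head"` is the default). The name `"embed_tokens"` is an assumption typical of Llama-style checkpoints; check yours with `model.named_modules()`:

```python
from transformers import HqqConfig

# Leave the output head and the token embeddings unquantized;
# these layers are often the most sensitive to low-bit quantization.
config = HqqConfig(
    nbits=4,
    group_size=64,
    skip_modules=["lm_head", "embed_tokens"],
)
```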