
Multimodal Segment Anything: Advanced Usage


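# Segment Anything Advanced Usage Guide

## SAM 2 (Video Segmentation)

### Overview

SAM 2 extends SAM to video segmentation with a streaming memory architecture:

```bash
pip install git+https://github.com/facebookresearch/segment-anything-2.git
```

### Video Segmentation

```python
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "sam2_hiera_large.pt")

# Initialize the inference state from the video
state = predictor.init_state(video_path="video.mp4")

# Add a point prompt on the first frame (label 1 = foreground)
predictor.add_new_points(
    inference_state=state,
    frame_idx=0,
    obj_id=1,
    points=[[100, 200]],
    labels=[1]
)

# Propagate through the video
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
    # mask_logits holds one mask per tracked object; threshold at 0 to binarize
    masks = (mask_logits > 0.0).cpu().numpy()
    # process_frame is a placeholder for your own per-frame handling
    process_frame(frame_idx, masks)
```

### SAM 2 vs. SAM

| Feature | SAM | SAM 2 |
|---------|-----|-------|
| Input | Images only | Images + video |
| Architecture | ViT + decoder | Hiera + memory |
| Memory | Per image | Streaming memory bank |
| Tracking | No | Yes, across frames |
| Models | ViT-B/L/H | Hiera-T/S/B+/L |

## Grounded SAM (Text-Prompted Segmentation)

### Setup

```bash
pip install groundingdino-py
pip install git+https://github.com/facebookresearch/segment-anything.git
```

### Text-to-Mask Workflow

```python
from groundingdino.util.inference import load_model, predict
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np

# Load Grounding DINO (config path first, then checkpoint path)
grounding_model = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")

# Load SAM
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

def text_to_mask(image, text_prompt, box_threshold=0.3, text_threshold=0.25):
    """Generate masks from a text description."""
    # Get bounding boxes from the text prompt.
    # Note: Grounding DINO's predict expects the preprocessed image tensor
    # returned by its load_image helper, while SAM's set_image expects an
    # RGB uint8 array; prepare both forms of the image as needed.
    boxes, logits, phrases = predict(
        model=grounding_model,
        image=image,
        caption=text_prompt,
        box_threshold=box_threshold,
        text_threshold=text_threshold
    )

    # Generate masks with SAM
    predictor.set_image(image)
    h, w = image.shape[:2]

    masks = []
    for box in boxes:
        # Grounding DINO returns normalized (cx, cy, w, h) boxes;
        # SAM expects pixel (x0, y0, x1, y1) boxes
        cx, cy, bw, bh = box.cpu().numpy() * np.array([w, h, w, h])
        box_pixels = np.array([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2])
        mask, score, _ = predictor.predict(
            box=box_pixels,
            multimask_output=False
        )
        masks.append(mask[0])

    return masks, boxes, phrases

# Usage
image = cv2.imread("image.jpg")
image = cv
```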
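
The box-conversion step inside `text_to_mask` is easy to get wrong, so here is a minimal standalone sketch of it. It assumes the detector returns boxes as normalized (cx, cy, w, h) values in [0, 1] and that the segmenter wants pixel (x0, y0, x1, y1) corners. The helper name `cxcywh_norm_to_xyxy_pixels` is hypothetical and is not part of either library.

```python
import numpy as np

def cxcywh_norm_to_xyxy_pixels(boxes, image_width, image_height):
    """Convert normalized (cx, cy, w, h) boxes to pixel (x0, y0, x1, y1) boxes.

    Hypothetical helper for illustration; assumes inputs lie in [0, 1].
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    scale = np.array(
        [image_width, image_height, image_width, image_height], dtype=np.float32
    )
    # Scale to pixels, then split into center and size columns
    cx, cy, bw, bh = (boxes * scale).T
    xyxy = np.stack(
        [cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2], axis=1
    )
    # Clip corners to the image bounds so downstream code never sees
    # coordinates outside the frame
    xyxy[:, [0, 2]] = np.clip(xyxy[:, [0, 2]], 0, image_width)
    xyxy[:, [1, 3]] = np.clip(xyxy[:, [1, 3]], 0, image_height)
    return xyxy

# A box centered in a 640x480 image, covering half of each dimension
example = cxcywh_norm_to_xyxy_pixels(
    [[0.5, 0.5, 0.5, 0.5]], image_width=640, image_height=480
)
print(example)  # [[160. 120. 480. 360.]]
```

Multiplying by `[w, h, w, h]` alone only rescales the box; the center-to-corner shift is what turns it into the corner format a box prompt typically expects.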

Source: claude-code-templates (MIT); the Chinese translation was AI-generated. See About Us for details.
