BA

seggpt-vit-large

Otherby Beijing Academy of Artificial Intelligence·Model page

BAAI's 371M-parameter ViT-Large model for universal in-context visual segmentation across images and videos.

Share:

Model Description

The SegGPT model was proposed in SegGPT: Segmenting Everything In Context by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang.

Model description

SegGPT employs a decoder-only (GPT-like) Transformer that can generate a segmentation mask given an input image, a prompt image and its corresponding prompt mask. The model achieves remarkable one-shot results with 56.1 mIoU on COCO-20 and 85.6 mIoU on FSS-1000.

Intended uses & limitations

You can use the raw model for one-shot image segmentation.

How to use

Here's how to use the model for one-shot semantic segmentation:

import torch
from datasets import load_dataset
from transformers import SegGptImageProcessor, SegGptForImageSegmentation

model_id = "BAAI/seggpt-vit-large"
image_processor = SegGptImageProcessor.from_pretrained(checkpoint)
model = SegGptForImageSegmentation.from_pretrained(checkpoint)

dataset_id = "EduardoPacheco/FoodSeg103"
ds = load_dataset(dataset_id, split="train")
# Number of labels in FoodSeg103 (not including background)
num_labels = 103

image_input = ds[4]["image"]
ground_truth = ds[4]["label"]
image_prompt = ds[29]["image"]
mask_prompt = ds[29]["label"]

inputs = image_processor(
    images=image_input, 
    prompt_images=image_prompt,
    prompt_masks=mask_prompt, 
    num_labels=num_labels,
    return_tensors="pt"
)

with torch.no_grad():
    outputs = model(**inputs)

target_sizes = [image_input.size[::-1]]
mask = image_processor.post_process_semantic_segmentation(outputs, target_sizes, num_labels=num_labels)[0]

BibTeX entry and citation info

@misc{wang2023seggpt,
      title={SegGPT: Segmenting Everything In Context}, 
      author={Xinlong Wang and Xiaosong Zhang and Yue Cao and Wen Wang and Chunhua Shen and Tiejun Huang},
      year={2023},
      eprint={2304.03284},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Author
BA
Beijing Academy of Artificial Intelligence
Organization
BAAI
Details
Downloads6.3K
Likes5
AccessOpen Source
Parameters371M
Licenseapache-2.0
Librarytransformers
CreatedNov 30, 2023
UpdatedFeb 22, 2024
View on Hugging Face
Get the full context.

Sign up to read complete case studies, access detailed metrics, and unlock all use cases.

seggpt-vit-large — AI Model Details | Applied