DeepSeek-OCR-2

DeepSeek-OCR-2 is a 3.4B-parameter multilingual OCR vision-language model by DeepSeek for extracting text from images.

Model Card


Usage

Inference with Hugging Face transformers on NVIDIA GPUs. Requirements were tested with Python 3.12.9 and CUDA 11.8:

torch==2.6.0
transformers==4.46.3
tokenizers==0.20.3
einops
addict
easydict

pip install flash-attn==2.7.3 --no-build-isolation

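Because the requirements pin exact versions, it can help to confirm the environment before loading the model. The following is a minimal sketch, not part of the official instructions: `PINNED` and `find_mismatches` are hypothetical names, and the check relies only on the standard-library `importlib.metadata`.

```python
from importlib import metadata

# Pins copied from the requirements listed above.
PINNED = {
    "torch": "2.6.0",
    "transformers": "4.46.3",
    "tokenizers": "0.20.3",
    "flash-attn": "2.7.3",
}


def find_mismatches(pinned=PINNED, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every unmet pin."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore local build tags such as "+cu118" when comparing versions.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package matches. The unpinned packages (einops, addict, easydict) are not checked here.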
import os

import torch
from transformers import AutoModel, AutoTokenizer

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # use the first GPU only
model_name = "deepseek-ai/DeepSeek-OCR-2"

# trust_remote_code=True loads the custom model code (including infer) from the model repo.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    _attn_implementation="flash_attention_2",
    trust_remote_code=True,
    use_safetensors=True,
)
model = model.eval().cuda().to(torch.bfloat16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = "your_image.jpg"
output_path = "your/output/dir"

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=768,
    crop_mode=True,
    save_results=True,
)
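
To run the same call over a folder of images, one option is to build the keyword arguments for each file first and then loop. This is a sketch under assumptions: `plan_batch`, `IMAGE_SUFFIXES`, and the one-folder-per-image output layout are hypothetical, and only the `infer` arguments shown above are used. Whether `infer` creates missing output directories is not stated here, so create them beforehand if needed.

```python
from pathlib import Path

# Assumption: adjust this set to the image formats you actually have.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def plan_batch(image_dir, output_root):
    """Return one dict of infer() keyword arguments per image, sorted by path."""
    jobs = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        jobs.append({
            "image_file": str(path),
            # Hypothetical layout: one output folder per input image.
            "output_path": str(Path(output_root) / path.stem),
            "base_size": 1024,
            "image_size": 768,
            "crop_mode": True,
            "save_results": True,
        })
    return jobs


# Usage, with `model`, `tokenizer`, and `prompt` defined as in the snippet above:
# for job in plan_batch("scans", "ocr_out"):
#     model.infer(tokenizer, prompt=prompt, **job)
```

Keeping the planning step separate from the GPU call makes it easy to inspect or filter the job list before any inference runs.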

vLLM

Refer to the project's GitHub repository for guidance on accelerating inference, processing PDFs, and more.

Supported Modes

  • Dynamic resolution
    • Default: (0-6)×768×768 + 1×1024×1024 image views, which yields (0-6)×144 + 256 visual tokens ✅
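
The token formula above can be worked through for each crop count. The sketch below assumes the natural reading of that formula: each 768×768 view contributes 144 visual tokens and the single 1024×1024 view contributes 256. The function and constant names are illustrative, not part of the model's API.

```python
TOKENS_PER_768_VIEW = 144   # per 768x768 view, from the formula above
TOKENS_1024_VIEW = 256      # the single 1024x1024 view
MAX_768_VIEWS = 6           # the default mode allows 0 to 6 of them


def visual_token_count(num_768_views):
    """Return the visual token count for a given number of 768x768 views."""
    if not 0 <= num_768_views <= MAX_768_VIEWS:
        raise ValueError(f"expected 0..{MAX_768_VIEWS} views, got {num_768_views}")
    return num_768_views * TOKENS_PER_768_VIEW + TOKENS_1024_VIEW


for n in range(MAX_768_VIEWS + 1):
    print(n, visual_token_count(n))
```

Under this reading, the default mode spends between 256 visual tokens (no 768×768 views) and 6×144 + 256 = 1120 visual tokens (six views) per image.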

Main Prompts

# document: <image>\n<|grounding|>Convert the document to markdown.
# without layouts: <image>\nFree OCR.
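
If both prompts are used in the same script, a small lookup keeps the strings in one place. This is a convenience sketch only: `PROMPTS` and `get_prompt` are hypothetical names, and the strings (including the trailing space) are copied from the usage snippet earlier on this page.

```python
PROMPTS = {
    # Layout-aware conversion of a document page to markdown.
    "document": "<image>\n<|grounding|>Convert the document to markdown. ",
    # Plain text extraction without layout information.
    "free_ocr": "<image>\nFree OCR. ",
}


def get_prompt(mode):
    """Return the prompt string for a mode, failing loudly on unknown modes."""
    try:
        return PROMPTS[mode]
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}; choose from {sorted(PROMPTS)}") from None
```

For example, `get_prompt("free_ocr")` returns the string to pass as `prompt` when layout is not needed.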

Acknowledgement

We would like to thank DeepSeek-OCR, Vary, GOT-OCR2.0, MinerU, and PaddleOCR for their valuable models and ideas.

We also appreciate the OmniDocBench benchmark.

Citation

@article{wei2025deepseek,
  title={DeepSeek-OCR: Contexts Optical Compression},
  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
  journal={arXiv preprint arXiv:2510.18234},
  year={2025}
}
@article{wei2026deepseek,
  title={DeepSeek-OCR 2: Visual Causal Flow},
  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
  journal={arXiv preprint arXiv:2601.20552},
  year={2026}
}

Author

DeepSeek · Organization ✓ · deepseek-ai

Details

  • Downloads: 2.2M
  • Likes: 990
  • Access: Open Source
  • Task: image-text-to-text
  • Parameters: 3.4B
  • License: apache-2.0
  • Library: transformers
  • Created: Jan 27, 2026
  • Updated: Feb 3, 2026
  • Languages: multilingual