Unlimited-OCR-GGUF
Sahil Chachra's GGUF quantization of Baidu's Unlimited-OCR, a multilingual vision-language model for document parsing.
Model Description
GGUF quantizations of baidu/Unlimited-OCR, a 3B vision-language OCR model that pushes DeepSeek-OCR one step further (one-shot, long-horizon document parsing). This repo contains a full spread of K-quants and i-quants of the language model plus the vision projector (mmproj) needed for image input.
⚠️ Requires a DeepSeek-OCR-aware llama.cpp build (PR #17400). Unlimited-OCR uses the DeepSeek-OCR architecture (a SAM+CLIP DeepEncoder vision tower with a DeepSeek-V2 MoE text decoder). Support is not yet merged into upstream `main`, so stock llama.cpp will not load these files. Build the PR branch (instructions below).
Files
Every run needs two files: one language model GGUF (pick a quant) plus the shared vision projector. The projector is fp16 and identical for all quants.
| File | Quant | Bits | Size | Notes |
|---|---|---|---|---|
| Unlimited-OCR-BF16.gguf | BF16 | 16 | 5.47 GiB | Full-precision conversion. The base every quant is made from; reference quality. |
| Unlimited-OCR-Q8_0.gguf | Q8_0 | 8 | 2.91 GiB | Near-lossless. Best quality short of BF16; recommended if you have the disk/RAM. |
| Unlimited-OCR-Q6_K.gguf | Q6_K | 6 | 2.43 GiB | Very high quality, essentially indistinguishable from Q8_0 for OCR. |
| Unlimited-OCR-Q5_K_M.gguf | Q5_K_M | 5 | 2.07 GiB | High quality. Great balance when you can spare a bit more than Q4. |
| Unlimited-OCR-Q5_K_S.gguf | Q5_K_S | 5 | 1.95 GiB | High quality, slightly smaller than Q5_K_M. |
| Unlimited-OCR-Q4_K_M.gguf | Q4_K_M | 4 | 1.82 GiB | Recommended default: best overall size/quality trade-off. |
| Unlimited-OCR-Q4_K_S.gguf | Q4_K_S | 4 | 1.68 GiB | Slightly smaller than Q4_K_M with a small quality cost. |
| Unlimited-OCR-Q3_K_M.gguf | Q3_K_M | 3 | 1.45 GiB | Compact. Usable when memory is tight; some quality loss. |
| Unlimited-OCR-IQ4_XS.gguf | IQ4_XS | 4 | 1.53 GiB | i-quant: smaller than Q4_K_S at similar quality (built with imatrix). |
| Unlimited-OCR-IQ4_NL.gguf | IQ4_NL | 4 | 1.59 GiB | i-quant (non-linear): 4-bit tuned for ARM/edge; good on Jetson/Apple. |
| Unlimited-OCR-IQ3_M.gguf | IQ3_M | 3 | 1.35 GiB | i-quant: solid 3-bit quality for the size (imatrix). |
| Unlimited-OCR-IQ3_XXS.gguf | IQ3_XXS | 3 | 1.24 GiB | i-quant: very small 3-bit; noticeable quality loss but runnable. |
| Unlimited-OCR-IQ2_M.gguf | IQ2_M | 2 | 1.15 GiB | i-quant: smallest here; experimental, lowest quality. For tight memory only. |
Vision projector (required for all of the above):
| File | Type | Size |
|---|---|---|
| mmproj-Unlimited-OCR-F16.gguf | F16 | 774.27 MiB |
Sizes are the on-disk GGUF sizes. The vision encoder is kept at F16 (not quantized) — it is small and quantizing it hurts OCR accuracy. i-quants were built with an importance matrix (imatrix) computed from a general-text calibration set.
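As a rough aid for choosing a file, the sketch below picks the largest quant whose on-disk size plus the projector fits a given budget. The sizes are copied from the Files section; the function name and the idea of budgeting by file size are illustrative assumptions. Actual memory use will be higher than the file sizes, because the KV cache and compute buffers are not counted here.

```python
from typing import Optional

# On-disk sizes in GiB, copied from the Files table of this card.
QUANT_SIZES_GIB = {
    "BF16": 5.47, "Q8_0": 2.91, "Q6_K": 2.43, "Q5_K_M": 2.07, "Q5_K_S": 1.95,
    "Q4_K_M": 1.82, "Q4_K_S": 1.68, "IQ4_NL": 1.59, "IQ4_XS": 1.53,
    "Q3_K_M": 1.45, "IQ3_M": 1.35, "IQ3_XXS": 1.24, "IQ2_M": 1.15,
}
# The shared projector is listed as 774.27 MiB; convert to GiB.
MMPROJ_GIB = 774.27 / 1024


def largest_quant_that_fits(budget_gib: float) -> Optional[str]:
    """Return the largest quant whose file plus the projector fits the budget.

    Hypothetical helper: it compares file sizes only and ignores runtime
    overhead such as the KV cache.
    """
    fitting = {
        name: size
        for name, size in QUANT_SIZES_GIB.items()
        if size + MMPROJ_GIB <= budget_gib
    }
    if not fitting:
        return None
    return max(fitting, key=fitting.get)
```

For example, `largest_quant_that_fits(2.6)` selects `Q4_K_M`, since 1.82 GiB plus the projector stays under 2.6 GiB while Q5_K_S does not.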
Build llama.cpp with DeepSeek-OCR support
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
git fetch origin pull/24975/head:pr24975 && git checkout pr24975
cmake -B build -DCMAKE_BUILD_TYPE=Release # add -DGGML_CUDA=ON for NVIDIA
cmake --build build -j --target llama-mtmd-cli llama-server
Quick start
Download one quant + the projector (you always need both):
huggingface-cli download sahilchachra/Unlimited-OCR-GGUF \
--include "Unlimited-OCR-Q4_K_M.gguf" "mmproj-Unlimited-OCR-F16.gguf" --local-dir ./uocr
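The same download can be scripted from Python. This is a minimal sketch assuming the `huggingface_hub` package is installed; `files_for` and `download` are hypothetical helper names, and the file naming pattern is taken from the Files section.

```python
from typing import List

REPO_ID = "sahilchachra/Unlimited-OCR-GGUF"
MMPROJ_FILE = "mmproj-Unlimited-OCR-F16.gguf"


def files_for(quant: str) -> List[str]:
    """Return the two files every run needs: one quant plus the projector."""
    return [f"Unlimited-OCR-{quant}.gguf", MMPROJ_FILE]


def download(quant: str = "Q4_K_M", local_dir: str = "./uocr") -> List[str]:
    """Fetch the quant and the projector, returning their local paths."""
    # Imported lazily so files_for() works without the package installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
        for name in files_for(quant)
    ]
```

Calling `download("Q4_K_M")` should leave both files under `./uocr`, matching the paths used in the commands in this card.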
Run it on an image:
./build/bin/llama-mtmd-cli \
-m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
--mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \
--image document.png \
-p "<|grounding|>Convert the document to markdown." \
--temp 0
Use `--temp 0` for OCR (deterministic). Add `-n 4096` (or more) for long/dense documents.
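For batch jobs it can help to drive the CLI from a script. The sketch below builds the same command as above and runs it with `subprocess`; the helper names are hypothetical, and it assumes the recognized text arrives on stdout while logs go to stderr.

```python
import subprocess
from typing import List


def build_ocr_command(image: str, prompt: str, model: str, mmproj: str,
                      cli: str = "./build/bin/llama-mtmd-cli",
                      max_tokens: int = 4096) -> List[str]:
    """Assemble the llama-mtmd-cli invocation with deterministic decoding."""
    return [
        cli,
        "-m", model,
        "--mmproj", mmproj,
        "--image", image,
        "-p", prompt,
        "--temp", "0",            # deterministic output for OCR
        "-n", str(max_tokens),    # generation budget for long/dense pages
    ]


def run_ocr(image: str, prompt: str, model: str, mmproj: str) -> str:
    """Run the CLI once and return whatever it printed to stdout."""
    cmd = build_ocr_command(image, prompt, model, mmproj)
    # check=True raises CalledProcessError if the CLI exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with the `<|grounding|>` prefix, which contains pipe characters.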
Prompting guide
Unlimited-OCR uses the DeepSeek-OCR prompt vocabulary. The prompt is just an instruction;
prefix it with <|grounding|> whenever you also want bounding boxes for what was read.
| Task | Prompt (`-p`) |
|---|---|
| Document → Markdown (layout-aware, with boxes) | `<\|grounding\|>Convert the document to markdown.` |
| Plain text OCR (just the text, no layout) | `Free OCR.` |
| OCR with bounding boxes | `<\|grounding\|>` + the OCR instruction |
| Native Unlimited-OCR parse | `document parsing.` |
| Parse a figure / chart / diagram | `Parse the figure.` |
| Describe the image (general VQA) | `Describe this image in detail.` |
| Find specific text (referring grounding) | `<\|grounding\|>Locate <\|ref\|>Invoice Number<\|/ref\|> in the image.` |
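The prefix rule can be captured in a small prompt builder. This sketch uses only prompt strings that appear in this card; the task keys and function names are hypothetical.

```python
GROUNDING = "<|grounding|>"

# Instruction strings as they appear in this card; the keys are illustrative.
PROMPTS = {
    "markdown": "Convert the document to markdown.",
    "free_ocr": "Free OCR.",
    "parse": "document parsing.",
    "figure": "Parse the figure.",
    "describe": "Describe this image in detail.",
}


def make_prompt(task: str, grounding: bool = False) -> str:
    """Return the instruction for a task, optionally asking for boxes."""
    text = PROMPTS[task]
    return GROUNDING + text if grounding else text


def locate_prompt(target: str) -> str:
    """Build a referring-grounding prompt for one piece of text."""
    return f"{GROUNDING}Locate <|ref|>{target}<|/ref|> in the image."
```

`make_prompt("markdown", grounding=True)` reproduces the prompt used in the Quick start, and `locate_prompt("Invoice Number")` reproduces the one used for locating a string.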
Worked examples
1) Document → clean Markdown (tables, headings, reading order):
./build/bin/llama-mtmd-cli -m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
--mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \
--image invoice.png --temp 0 -n 4096 \
-p "<|grounding|>Convert the document to markdown."
2) Just the raw text, no layout / no boxes:
./build/bin/llama-mtmd-cli -m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
--mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \
--image receipt.jpg --temp 0 -p "Free OCR."
3) Locate a specific string and get its box:
./build/bin/llama-mtmd-cli -m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
--mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \
--image form.png --temp 0 \
-p "<|grounding|>Locate <|ref|>Invoice Number<|/ref|> in the image."
Understanding the output (grounding tokens)
With <|grounding|>, the model interleaves the recognized text with detection boxes:
<|det|>title [37, 64, 464, 132]<|/det|>INVOICE #2026-0623
<|det|>text [37, 194, 350, 247]<|/det|>Bill To: Sahil Chachra
<|det|>text [37, 483, 329, 543]<|/det|>Total Due: $44.00
Each [x1, y1, x2, y2] is the bounding box (top-left → bottom-right) of that span, in the
coordinate space of the model's input image. Drop the <|det|>...<|/det|> tags if you only
want the text, or parse them to overlay boxes / build a layout. Without <|grounding|> you get
plain text (or Markdown) with no box tags.
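A minimal parsing sketch for this output format follows. It assumes each span looks like the sample above (a label, four integer coordinates, then the text up to the end of the line); the `Span` type and function names are hypothetical, and real output may need a more forgiving pattern.

```python
import re
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    label: str
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2)
    text: str


# Matches: <|det|>label [x1, y1, x2, y2]<|/det|>text-until-next-tag-or-newline
DET_RE = re.compile(
    r"<\|det\|>\s*(\w+)\s*"
    r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\s*"
    r"<\|/det\|>(.*?)(?=<\|det\|>|\n|\Z)"
)


def parse_grounded(output: str) -> List[Span]:
    """Extract (label, box, text) spans from grounded model output."""
    spans = []
    for m in DET_RE.finditer(output):
        box = tuple(int(m.group(i)) for i in range(2, 6))
        spans.append(Span(m.group(1), box, m.group(6).strip()))
    return spans


def strip_det_tags(output: str) -> str:
    """Drop the <|det|>...<|/det|> tags and keep only the text."""
    return re.sub(r"<\|det\|>.*?<\|/det\|>", "", output)
```

The parsed boxes can then be drawn over the image or sorted by `y1` to rebuild a reading order, while `strip_det_tags` gives the plain text.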
Tip (long documents): Unlimited-OCR targets one-shot long-horizon parsing. For multi-page scans, run page-by-page and concatenate. If output ever repeats/loops on a dense page, add a mild repetition penalty, e.g. `--repeat-penalty 1.05`, and keep `--temp 0`.
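The page-by-page approach with a retry on looping output could be sketched as follows. The loop check is a crude heuristic of my own (the same line repeated several times at the end of the output), and `ocr_page` is a hypothetical callable that you would implement around `llama-mtmd-cli` or the server; a penalty of 1.0 is assumed to mean no penalty.

```python
from typing import Callable, Iterable, List


def looks_looped(text: str, min_repeats: int = 4) -> bool:
    """Heuristic: True if the output ends with the same line repeated."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < min_repeats:
        return False
    tail = lines[-min_repeats:]
    return all(ln == tail[0] for ln in tail)


def ocr_document(pages: Iterable[str],
                 ocr_page: Callable[[str, float], str]) -> str:
    """OCR each page, retrying once with a mild penalty if it loops.

    ocr_page(page, repeat_penalty) is a hypothetical function returning
    the recognized text for one page image.
    """
    outputs: List[str] = []
    for page in pages:
        text = ocr_page(page, 1.0)       # first pass without a penalty
        if looks_looped(text):
            text = ocr_page(page, 1.05)  # mild penalty, as suggested above
        outputs.append(text.strip())
    return "\n\n".join(outputs)
```

Pages that legitimately end with identical lines (for example a column of repeated values) would trigger an unnecessary retry, so treat the threshold as something to tune.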
Serving (OpenAI-compatible API)
./build/bin/llama-server \
-m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
--mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \
-c 8192 --host 0.0.0.0 --port 8080
Call it with an image (base64 data URL):
IMG=$(base64 -w0 document.png)
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"temperature": 0,
"messages": [{ "role": "user", "content": [
{ "type": "text", "text": "<|grounding|>Convert the document to markdown." },
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,'"$IMG"'" } }
]}]
}'
Python (OpenAI SDK) is identical — point base_url at http://localhost:8080/v1, send a
text part with the prompt above and an image_url part with the data URL.
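A minimal sketch of that Python call, assuming the `openai` package is installed and the server from the command above is running. The helper names, the placeholder API key, and the model name are assumptions; a single-model `llama-server` started without an API key is assumed not to depend on either value.

```python
import base64
from typing import Any, Dict, List


def build_messages(image_bytes: bytes, prompt: str,
                   mime: str = "image/png") -> List[Dict[str, Any]]:
    """Build one user message with a text part and a data-URL image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }]


def ocr_via_server(image_path: str, prompt: str,
                   base_url: str = "http://localhost:8080/v1") -> str:
    """Send one image to the local server and return the reply text."""
    # Imported lazily so build_messages() works without the package.
    from openai import OpenAI  # pip install openai

    client = OpenAI(base_url=base_url, api_key="not-needed")  # placeholder key
    with open(image_path, "rb") as fh:
        messages = build_messages(fh.read(), prompt)
    response = client.chat.completions.create(
        model="unlimited-ocr",  # placeholder name, assumed to be ignored
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content
```

For example, `ocr_via_server("document.png", "<|grounding|>Convert the document to markdown.")` mirrors the curl request above.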
About the model
- Architecture: `DeepseekOCRForCausalLM`. DeepEncoder vision (SAM-ViT-B + CLIP-L/14, 1024×1024 input, 16× downsample) → linear projector → DeepSeek-V2 MoE text decoder (12 layers, hidden 1280, 64 routed + 2 shared experts, 6 experts/token).
- Task: multilingual OCR / document parsing for single images, multi-page documents, and PDFs (one-shot long-horizon parsing). The original supports gundam (crop) and base resolution modes.
- License: MIT (inherited from the base model).
How these were made
- Converted `baidu/Unlimited-OCR` to GGUF with the PR #17400 `convert_hf_to_gguf.py`. The converter targets DeepSeek-OCR, so the config's top-level `architectures` was set to `DeepseekOCRForCausalLM` and `language_config.architectures` to `DeepseekV2ForCausalLM` (the model otherwise matches DeepSeek-OCR's tensor layout).
- Exported the text decoder (BF16) and the vision tower (`--mmproj`, F16) separately.
- Built an importance matrix from a general-text corpus and produced the K-/i-quants with `llama-quantize`.
- Verified: the BF16 GGUF + mmproj correctly OCR a test document (text + grounding boxes) via `llama-mtmd-cli` before quantizing.
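The config change in the first step could look roughly like this. It is a sketch, not the script actually used: it assumes `architectures` holds a list of class names, as is conventional in Hugging Face `config.json` files, and the function names and file path are hypothetical.

```python
import copy
import json
from typing import Any, Dict


def patch_config_for_converter(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config with the architecture names the
    DeepSeek-OCR converter expects."""
    patched = copy.deepcopy(config)
    patched["architectures"] = ["DeepseekOCRForCausalLM"]
    # Create language_config if missing, then set its architecture list.
    language_config = patched.setdefault("language_config", {})
    language_config["architectures"] = ["DeepseekV2ForCausalLM"]
    return patched


def patch_config_file(path: str) -> None:
    """Rewrite a config.json in place (path is supplied by the caller)."""
    with open(path, "r", encoding="utf-8") as fh:
        config = json.load(fh)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(patch_config_for_converter(config), fh, indent=2)
```

Working on a copy keeps the original dictionary untouched, so the unmodified values can still be inspected or restored.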
Limitations
- Needs the PR #17400 llama.cpp build until DeepSeek-OCR support lands in `main`.
- Very low-bit i-quants (IQ3_XXS, IQ2_M) trade real accuracy for size; prefer Q4_K_M or higher for production OCR.
- The vision encoder runs in fp16 regardless of the chosen text quant.
Credits
- Base model: baidu/Unlimited-OCR (MIT) — builds on deepseek-ai/DeepSeek-OCR.
- GGUF / DeepSeek-OCR llama.cpp support: ggml-org/llama.cpp#17400.
- Quantized by sahilchachra.