Gemma4-12B-QAT-Uncensored-HauhauCS-Balanced
Variante GGUF cuantizada QAT sin censura de Google Gemma 4 12B por HauHau, optimizada para visión, programación y tareas creativas.
Modelo base
Descripción del Modelo
Join the Discord for updates, roadmaps, projects, or just to chat.
Gemma4-12B (QAT) uncensored by HauhauCS. 0/465 Refusals*
About
No changes to datasets or capabilities — fully functional, 100% of what the original authors intended, just without the refusals. Built from the official QAT weights, so the 4-bit quant stays close to full-precision quality.
Balanced
The Balanced variant (recommended — 99%+ of users will be happy here) uses optimized full uncensoring tuned especially for agentic coding, reasoning, creative writing and reliability-critical tasks. It reasons before answering and stays dependable and on-instruction. An Aggressive variant, for cases where Balanced still deflects too much, after current testing is not required.
~60% faster with MTP
Ships with an MTP (multi-token-prediction) draft head for speculative decoding — roughly 60% faster generation with identical output (the model verifies every drafted token, so quality is unchanged — pure speed). This release is tuned to pair well with the included MTP head.
llama.cpp:
llama-server \
-m Gemma4-12B-QAT-Uncensored-HauhauCS-Balanced-Q4_K_M.gguf \
-md mtp-gemma-4-12B-it.gguf --spec-type draft-mtp \
-ngl 99 -fa on
Note: the MTP speedup was currently tested by me through llama.cpp (llama-server / llama-cli).
Downloads
| File | Type | Size |
|---|---|---|
Gemma4-12B-QAT-Uncensored-HauhauCS-Balanced-Q4_K_M.gguf |
Q4_K_M (text) | 6.9 GB |
mmproj-Gemma4-12B-QAT-Uncensored-HauhauCS-Balanced-BF16.gguf |
mmproj (vision) | 168 MB |
mtp-gemma-4-12B-it.gguf |
MTP speculative drafter | 242 MB |
Why only Q4_K_M? Gemma 4 is quantization-aware-trained for ~4-bit, so Q4_K_M is the sweet spot — higher-precision quants add size with no real quality gain. Carefully quantized for best quality at 4-bit.
Vision
Load the mmproj alongside the model for image input:
llama-server -m Gemma4-12B-QAT-Uncensored-HauhauCS-Balanced-Q4_K_M.gguf \
--mmproj mmproj-Gemma4-12B-QAT-Uncensored-HauhauCS-Balanced-BF16.gguf -ngl 99 -fa on
Recommended sampling
These are dialed in specifically for this HauhauCS build — use them for the intended behaviour and quality:
temperature 0.6top_k 64top_p 0.9min_p 0.05repeat_penalty 1.1
This release is tuned end-to-end as its own thing; the settings above are part of that and aren't the stock Gemma defaults.
Specs
- 12B dense · 256K (262144) context
- Vision (image input) via mmproj
- Based on Gemma 4 12B by Google DeepMind
Compatibility
- Works with llama.cpp, LM Studio, Jan, koboldcpp, and other GGUF runtimes.
- Multi-GPU + LM Studio: I've personally noticed Gemma 4 can crash under LM Studio's tensor-split mode — use a single GPU (layer-split or priority order) for this model.
Acknowledgements
- Google DeepMind — Gemma 4.
- The included
mtp-gemma-4-12B-it.ggufspeculative draft head comes from Unsloth's Gemma 4 release — many thanks to the Unsloth team for it.
* Tested with both automated and manual refusal benchmarks — none have been found in standard use. A small number of edge-case prompts deflect on the first ask but comply on a re-ask or strategic framing. If you hit one that's actually obstructive to your use case, join the Discord and flag it so I can work on it in a future revision.
Regístrate para leer casos de estudio completos, acceder a métricas detalladas y recibir todos los reportes.
Regístrate para leer casos de estudio completos, acceder a métricas detalladas y recibir todos los reportes.