How many parameters does Gemma-4-26B-A4B-StyleTune have?

Gemma-4-26B-A4B-StyleTune has approximately 26.5 billion parameters.

Who created Gemma-4-26B-A4B-StyleTune?

Gemma-4-26B-A4B-StyleTune was published by Gryphe Padar on Hugging Face.

Gemma-4-26B-A4B-StyleTune

Name: Gemma-4-26B-A4B-StyleTune
Author: Gryphe Padar

Gryphe's Gemma 4 26B fine-tune for roleplay and creative writing.

Base model

google/gemma-4-26B-A4B-it

Model Card

Note: this version has been superseded by 26B-A4B V2, which I highly recommend you use instead.

Now available in 26B-A4B flavour upon popular request! The text below is mostly recycled from the 31B Style Tune, though statistics have been adjusted accordingly.

A happy accident in surgical finetuning - 54% fewer clichés, an entirely new writing style, and the same Gemma 4 26B-A4B you already know underneath. One tensor changed out of 659.

Also available in a 31B version!

What is a style tune?

Normally when I finetune a model I train as much of it as possible, loading every tensor and transforming it to better approximate whatever's in my data. Not this time. This time I trained precisely one tensor: the lm_head output projection - the layer that decides which token to emit. Literally the last stop before text appears on your screen.

This specific tensor has a massive influence on a model's writing style, something I first discovered building MythoMax years ago. Gemma 31B (the first style tune) is a VRAM-hungry monster, so the question became: how do I have the maximum impact with the minimum hardware requirements?

The answer: freeze everything else. All 30 transformer layers, all the attention heads, all the MLPs — completely untouched. Only lm_head trains, which means VRAM requirements drop dramatically, training completes in a single overnight run on consumer hardware, and every single one of Gemma's capabilities remains fully intact. The model hasn't changed. Only the voice has, and it's done so in the best way possible. (Obligatory disclaimer: I might be biased towards my own data.)
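As a rough illustration of the idea (not the actual training script), freezing everything except the output projection comes down to toggling `requires_grad` by parameter name. The sketch below assumes the model exposes `named_parameters()` the way a `torch.nn.Module` does and that the output projection's parameter name contains `lm_head`. The helper `freeze_all_but` and the `ToyModel` stand-in are hypothetical names used only for this example.

```python
from types import SimpleNamespace


def freeze_all_but(model, keyword="lm_head"):
    """Leave only parameters whose name contains `keyword` trainable.

    `model` is any object with a named_parameters() method that yields
    (name, param) pairs, as torch.nn.Module does. Returns the names of
    the parameters that stay trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
        if param.requires_grad:
            trainable.append(name)
    return trainable


class ToyModel:
    """Hypothetical stand-in so the sketch runs without PyTorch installed."""

    def __init__(self):
        self._params = {
            "layers.0.attn.weight": SimpleNamespace(requires_grad=True),
            "layers.0.mlp.weight": SimpleNamespace(requires_grad=True),
            "lm_head.weight": SimpleNamespace(requires_grad=True),
        }

    def named_parameters(self):
        return self._params.items()


toy = ToyModel()
trainable_names = freeze_all_but(toy)
print(trainable_names)  # ['lm_head.weight']
```

With a real PyTorch model, the optimizer would then be given only the parameters that still have `requires_grad=True`, which is where the VRAM savings come from: no gradients or optimizer state are kept for the frozen layers.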

I used the same data I had on me for my last Pantheon Reasoning release, with one notable exception - no instruct 24k set. 100% narrative data, certified cliché free.

What changed?

Benchmarked on 200 diverse roleplay prompts against the base instruct model:

54% fewer clichés per 100 words (1.141 → 0.528)
Only 18.3% shared trigram vocabulary - the model reaches for an almost entirely different set of phrases, with responses feeling much less sloppy as a result.
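For readers who want to run similar comparisons on their own outputs, here is a minimal sketch of how the two statistics above could be computed. The card does not specify the exact method, so the details are assumptions: clichés are counted as occurrences of phrases from a user-supplied list, and "shared trigram vocabulary" is taken to be the Jaccard overlap between the sets of word trigrams in two bodies of text. All function names are hypothetical.

```python
import re


def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


def word_trigrams(text):
    """Set of lowercase word trigrams in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def shared_trigram_fraction(text_a, text_b):
    """Jaccard overlap of the two trigram sets (assumed definition)."""
    a, b = word_trigrams(text_a), word_trigrams(text_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cliches_per_100_words(text, cliche_phrases):
    """Occurrences of listed phrases per 100 words (assumed definition)."""
    lowered = text.lower()
    n_words = len(re.findall(r"[a-z']+", lowered))
    hits = sum(lowered.count(phrase.lower()) for phrase in cliche_phrases)
    return 100.0 * hits / n_words if n_words else 0.0


# The reported rates, 1.141 -> 0.528, give the quoted reduction after rounding.
print(f"{relative_reduction(1.141, 0.528):.1%}")  # 53.7%
```

The arithmetic check confirms that the reported per-100-word rates are consistent with the headline figure of roughly 54%.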

Considering we're talking about narrative data, it's hard to provide many other meaningful statistics - it's one of those "try it to understand it" kinda situations.

What didn't change?

Everything else. All the reasoning capability, world knowledge, instruction following, and language understanding are completely intact - none of those live in lm_head. This isn't a full finetune. It's a targeted style replacement on a single tensor.

Inference

Whatever you prefer; Gemma seems remarkably flexible in that regard. I run with temperature 1.0, MinP 0.10, and the DRY sampler.
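To make the MinP setting concrete, here is a small pure-Python sketch of what min-p filtering does to a token distribution: after temperature scaling, any token whose probability falls below `min_p` times the top token's probability is dropped, and the rest are renormalized. This is an illustration of the general technique only. The ordering of temperature and min-p differs between inference backends, the DRY sampler is not covered, and `min_p_filter` is a hypothetical name.

```python
import math


def min_p_filter(logits, temperature=1.0, min_p=0.10):
    """Return renormalized probabilities after temperature and min-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Softmax with the max subtracted for numerical stability.
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep tokens at least min_p times as likely as the most likely token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


# Toy distribution: probabilities 0.60, 0.30, 0.07, 0.03 expressed as logits.
logits = [math.log(p) for p in (0.60, 0.30, 0.07, 0.03)]
filtered = min_p_filter(logits, temperature=1.0, min_p=0.10)
print([round(p, 3) for p in filtered])
```

With the top token at 0.60, the cutoff is 0.06, so only the 0.03 token is removed. Because the threshold scales with the model's confidence, fewer tokens survive when one candidate dominates and more survive when the distribution is flat.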

Prompt Format

Gemma 4's native chat template applies automatically.

Notes

For all I know this might only genuinely work for Gemma 4 specifically, but I'll certainly be poking other models if people enjoy this release. Feedback is, as always, very welcome!

Credits

Everyone from Anthracite! Hi, guys!
Latitude, for whom I am still producing finetunes on a regular basis, helping me keep my skills sharp and up-to-date!
All the folks I chat with on a daily basis on Discord! You know who you are.
Anyone I forgot to mention, just in case!

Author

Gryphe Padar

User

Gryphe

Details

Downloads: 432
Likes: 44
Access: Open Source
Task: text-generation
Parameters: 26.5B
Trending: 42
License: apache-2.0
Created: Jun 14, 2026
Updated: Jun 20, 2026
