zephyr-7b-gemma-sft-v0.1

LLM by Hugging Face H4 · Model page

An 8.5B-parameter Zephyr chat model from HuggingFaceH4, built by supervised fine-tuning of Gemma 7B on the Deita 10k dataset.

Base model

google/gemma-7b

Model Card

This model is a fine-tuned version of google/gemma-7b on the HuggingFaceH4/deita-10k-v0-sft dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9732

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 16
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
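
The two totals in the list above follow from the per-device values, and the scheduler settings describe a warmup followed by cosine decay. The sketch below reproduces that arithmetic in plain Python. The helper `lr_at` is hypothetical, the step count of 897 is taken from the training results table, and the formula is the commonly used linear-warmup-plus-cosine form, assumed here rather than confirmed as the exact schedule the trainer applied.

```python
import math

# Values copied from the hyperparameter list.
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
num_devices = 16
gradient_accumulation_steps = 2
peak_lr = 2e-05
warmup_ratio = 0.1

# Effective batch sizes: evaluation does not use gradient accumulation.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def lr_at(step, total_steps, peak=peak_lr, ratio=warmup_ratio):
    """Illustrative schedule: linear warmup to `peak`, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)  # assumed rounding rule
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 64
for step in (0, 45, 90, 500, 897):
    print(step, lr_at(step, total_steps=897))
```

Under these assumptions the learning rate reaches its peak of 2e-05 after about 90 of the 897 optimizer steps and decays toward zero by the end of the third epoch.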

Training results

Training Loss  Epoch  Step  Validation Loss
0.9482         1.0    299   0.9848
0.8139         2.0    599   0.9610
0.722          2.99   897   0.9732
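
The results above can be read programmatically to see where validation loss bottoms out. The sketch below hard-codes the three reported rows; the variable names are illustrative. With these numbers the lowest validation loss falls at the end of the second epoch while training loss keeps decreasing, which may hint at mild overfitting in the third epoch. The evaluation loss of 0.9732 quoted in the model card matches the final row.

```python
# Rows copied from the training results:
# (training_loss, epoch, step, validation_loss)
rows = [
    (0.9482, 1.0, 299, 0.9848),
    (0.8139, 2.0, 599, 0.9610),
    (0.722, 2.99, 897, 0.9732),
]

# Checkpoint with the lowest validation loss.
best = min(rows, key=lambda r: r[3])

# Gap between validation and training loss at the final step.
final = rows[-1]
gap = final[3] - final[0]

print(f"lowest validation loss at step {best[2]} (epoch {best[1]})")
print(f"final validation/training gap: {gap:.4f}")
```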

Framework versions

  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.1

Author: Hugging Face H4
Organization: HuggingFaceH4

Details

  • Downloads: 114
  • Likes: 13
  • Access: Open Source
  • Task: text-generation
  • Parameters: 8.5B
  • License: other
  • Library: transformers
  • Created: Mar 1, 2024
  • Updated: Mar 1, 2024
  • Languages: en