starchat2-15b-sft-v0.1

LLM by Hugging Face H4 · Model page

HuggingFaceH4's 15.9B code chat model built on StarCoder2 via supervised fine-tuning on coding, math, and instruction datasets.

Base model

bigcode/starcoder2-15b

Model Card

This model is a fine-tuned version of bigcode/starcoder2-15b, trained on the HuggingFaceH4/airoboros-3.2, HuggingFaceH4/Code-Feedback, HuggingFaceH4/orca-math-word-problems-200k, HuggingFaceH4/SystemChat, and HuggingFaceH4/capybara datasets. It achieves the following results on the evaluation set:

  • Loss: 0.6614

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
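
The relationship between these values can be checked with a short sketch. It assumes no gradient accumulation (the card does not list it), takes the total step count of 2730 from the training results, and uses the common linear-warmup-then-cosine-decay formula; the exact schedule implementation used for this run is not stated in the card, and the helper names below are illustrative only.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 2e-5
PER_DEVICE_BATCH = 8
NUM_DEVICES = 16
WARMUP_RATIO = 0.1
# Final step count reported in the training results (3 epochs x 910 steps).
TOTAL_STEPS = 2730
# Assumption: gradient accumulation is not listed in the card, so use 1.
GRAD_ACCUM_STEPS = 1


def effective_batch_size(per_device, num_devices, grad_accum=1):
    """Sequences consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def warmup_steps(total_steps, warmup_ratio):
    """Number of optimizer steps spent in linear warmup."""
    return round(total_steps * warmup_ratio)


def lr_at(step, peak_lr=LEARNING_RATE, total_steps=TOTAL_STEPS,
          warmup_ratio=WARMUP_RATIO):
    """Learning rate at a step: linear warmup, then cosine decay to zero."""
    warmup = warmup_steps(total_steps, warmup_ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(PER_DEVICE_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))
    print("warmup steps:", warmup_steps(TOTAL_STEPS, WARMUP_RATIO))
    for step in (0, 273, 910, 1820, 2730):
        print(step, lr_at(step))
```

Under these assumptions, 8 sequences per device on 16 devices reproduces the listed total_train_batch_size of 128, and a warmup ratio of 0.1 corresponds to the first 273 of 2730 steps.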

Training results

Training Loss   Epoch   Step   Validation Loss
0.6422          1.0     910    0.6910
0.5701          2.0     1820   0.6639
0.5227          3.0     2730   0.6614
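
A few derived quantities make these results easier to read. The sketch below computes the epoch-to-epoch change in validation loss, the gap between validation and training loss, and a rough count of sequences per epoch. The sequence count assumes every optimizer step consumed a full batch of 128, and it counts sequences as fed to the model (which may differ from raw dataset examples if packing was used); the function names are illustrative.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the results.
RESULTS = [
    (0.6422, 1.0, 910, 0.6910),
    (0.5701, 2.0, 1820, 0.6639),
    (0.5227, 3.0, 2730, 0.6614),
]
# total_train_batch_size from the hyperparameter list.
TOTAL_TRAIN_BATCH_SIZE = 128


def val_loss_deltas(rows):
    """Change in validation loss from each epoch to the next."""
    return [round(b[3] - a[3], 4) for a, b in zip(rows, rows[1:])]


def train_val_gaps(rows):
    """Validation loss minus training loss at the end of each epoch."""
    return [round(r[3] - r[0], 4) for r in rows]


def approx_sequences_per_epoch(rows, batch_size):
    """Steps in the first epoch times batch size (assumes full batches)."""
    return rows[0][2] * batch_size


if __name__ == "__main__":
    print("validation loss deltas:", val_loss_deltas(RESULTS))
    print("train/validation gaps:", train_val_gaps(RESULTS))
    print("approx. sequences per epoch:",
          approx_sequences_per_epoch(RESULTS, TOTAL_TRAIN_BATCH_SIZE))
```

Validation loss improves far less between epochs 2 and 3 than between epochs 1 and 2, while the gap to training loss keeps widening, which suggests most of the benefit on held-out data was reached by the second epoch.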

Framework versions

  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.1

Author

Hugging Face H4 (organization: HuggingFaceH4)

Details

  • Downloads: 46
  • Likes: 5
  • Access: Open Source
  • Task: text-generation
  • Parameters: 16B
  • License: bigcode-openrail-m
  • Library: transformers
  • Created: Mar 12, 2024
  • Updated: Mar 12, 2024