
Step 3.7 Flash

Multimodal · by StepFun · Model page

StepFun's multimodal reasoning model with a 256K-token context for text, image, and video inputs.


Model Card

Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...

Author: StepFun
Organization: stepfun
Details
Access: Open Source
Context: 256K tokens
Input price: $0.20 / 1M tokens
Output price: $1.15 / 1M tokens
Created: May 28, 2026
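To make the listed prices concrete, here is a minimal sketch of a per-request cost estimate. It uses only the two prices shown in the details above ($0.20 per 1M input tokens, $1.15 per 1M output tokens). The function name `estimate_cost` and the example token counts are illustrative assumptions, not part of any StepFun API, and real billing may differ (for example, for image or video inputs or cached tokens).

```python
# Prices copied from the listing above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 1.15


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed prices.

    Hypothetical helper for illustration; it assumes simple linear
    per-token pricing with no discounts or extra fees.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    cost = estimate_cost(100_000, 2_000)
    print(f"Estimated cost: ${cost:.4f}")
```

At these rates, output tokens cost 5.75 times as much as input tokens, so long generations dominate the bill sooner than long prompts do.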
Benchmarks
Intelligence: 29.7
Coding: 37.3
Agentic: 21.5
