GLM 4.5V

Multimodal · by Z.ai · Model page

Z.ai's vision-language model for image and text understanding with a 65k-token context.

Model Card

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding,...

Author: Z.ai
Organization: z-ai
Details
Access: Open Source
Context: 66K tokens
Input price: $0.6 / 1M tokens
Output price: $1.8 / 1M tokens
Knowledge cutoff: Dec 31, 2024
Created: Aug 11, 2025
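
The listed prices translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the function name and the example token counts are hypothetical, and it assumes billing is linear in token count at the rates shown above. How image inputs are converted to tokens is not stated on this page, so the caller is expected to supply the final token counts.

```python
# Prices as listed on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.6
OUTPUT_PRICE_PER_M = 1.8


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes cost scales linearly with token count at the listed rates.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 50,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost(50_000, 2_000):.4f}")
```

At these rates, output tokens cost three times as much as input tokens, so long generated responses affect the bill more than long prompts of the same length.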