
paligemma-3b-pt-224

Multimodal · by Google

Google's 3B-parameter pre-trained vision-language model. It combines a SigLIP image encoder with a Gemma text decoder and takes 224×224-pixel input images.
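The following sketch shows what the 224-pixel resolution means for the decoder's input. It assumes that SigLIP cuts the image into non-overlapping 14×14-pixel patches, which this page does not state. Under that assumption, a 224×224 image becomes (224 / 14)² = 16 × 16 = 256 patches. In PaliGemma's design, each patch is projected into one image token that precedes the text prompt. This is a minimal sketch of the arithmetic, not code from the model's release; `num_image_tokens` is a hypothetical helper.

```python
def num_image_tokens(image_size: int, patch_size: int = 14) -> int:
    """Count the patch tokens a ViT-style encoder emits for a square image.

    Assumes non-overlapping square patches (patch_size=14 is an assumption
    for SigLIP, not a value taken from this model page).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size  # 224 // 14 = 16
    return patches_per_side * patches_per_side   # 16 * 16 = 256


print(num_image_tokens(224))  # 256
```

Each prompt therefore carries a fixed budget of image tokens before any text tokens. That budget grows quadratically with input resolution.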

Author
Google (verified organization; Hugging Face ID: google)
Details
Downloads: 652.8K
Likes: 479
Access: Open Source
Task: image-text-to-text
Parameters: 2.9B
License: gemma
Library: transformers
Created: May 12, 2024
Updated: Sep 21, 2024
