GLM-4.6V

GLM-4.6V is Z.ai's full-scale, 106B-parameter vision-language foundation model. It has a 128K-token context window and supports native multimodal function calling, interleaved image-text generation, and pixel-accurate frontend replication from screenshots.

Vision (Image), File Input, Reasoning, Tool Use, Implicit Caching

index.ts

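import { streamText } from 'ai'

// Request a streamed completion; the model is selected by its ID string.
const result = streamText({
  model: 'zai/glm-4.6v',
  prompt: 'Why is the sky blue?'
})

// Print each text chunk as it arrives.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}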
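Since Vision (Image) is among the listed capabilities, a prompt can presumably carry an image alongside text. The following is a minimal sketch of building such a message. The helper name `buildImageQuestion`, the local type definitions, and the example URL are hypothetical; the message shape is assumed to match the text and image content parts that the AI SDK accepts in user messages.

```typescript
// Local stand-ins for the message shape (assumed, not imported from a library).
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: string }
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> }

// Hypothetical helper: pairs a question with an image URL in a single user message.
function buildImageQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      { type: 'image', image: imageUrl }
    ]
  }
}

// Placeholder URL for illustration only.
const messages: UserMessage[] = [
  buildImageQuestion('What does this screenshot show?', 'https://example.com/screenshot.png')
]
```

Under these assumptions, the array would be passed as `messages` in place of `prompt` in the `streamText` call, with `model: 'zai/glm-4.6v'` unchanged.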


More models by Z.ai

Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Release Date
205K	0.3s	130tps	$1.30/M	$4.30/M	$0.26/M	—	04/07/2026
203K	3.8s	59tps	$1.20/M	$4.00/M	$0.24/M	—	03/15/2026
203K	0.3s	102tps	$0.80/M	$2.56/M	$0.16/M	—	02/12/2026
205K	0.1s	857tps	$2.25/M	$2.75/M	$2.25/M	—	12/22/2025
205K	0.4s	246tps	$0.60/M	$2.20/M	$0.11/M	—	09/30/2025
200K	0.3s	79tps	$0.07/M	$0.40/M	$0.01/M	—	
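The per-million-token prices listed above can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the function name `estimateCostUSD` and the `Pricing` type are hypothetical, and it assumes that cached input tokens are billed at the cache read rate instead of the regular input rate, which the listing does not state. The example rates are those of the first listed entry ($1.30/M input, $4.30/M output, $0.26/M cache read).

```typescript
// Prices in USD per one million tokens.
type Pricing = { inputPerM: number; outputPerM: number; cacheReadPerM: number }

// Rates copied from the first listed entry.
const exampleRates: Pricing = { inputPerM: 1.30, outputPerM: 4.30, cacheReadPerM: 0.26 }

// Hypothetical estimator. ASSUMPTION: cached input tokens are charged at the
// cache read rate, and the remaining input tokens at the regular input rate.
function estimateCostUSD(
  pricing: Pricing,
  inputTokens: number,
  cachedInputTokens: number,
  outputTokens: number
): number {
  if (cachedInputTokens > inputTokens) {
    throw new RangeError('cachedInputTokens cannot exceed inputTokens')
  }
  const freshInputTokens = inputTokens - cachedInputTokens
  const totalMicroUSD =
    freshInputTokens * pricing.inputPerM +
    cachedInputTokens * pricing.cacheReadPerM +
    outputTokens * pricing.outputPerM
  // Convert from "per million tokens" units to dollars.
  return totalMicroUSD / 1_000_000
}

// Example: 1M input tokens (half served from cache) and 1M output tokens.
const estimate = estimateCostUSD(exampleRates, 1_000_000, 500_000, 1_000_000)
```

With these rates, the example works out to 0.65 + 0.13 + 4.30, or about $5.08, compared with about $5.60 when nothing is cached.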
