GLM-4.7-Flash is a brand new member of the GLM-4.7 family and targets developers who need strong coding and reasoning performance in a model that is practical to run locally. Zhipu AI (Z.ai) describes GLM-4.7-Flash as a 30B-A3B MoE model and presents it as the strongest model in the 30B class, designed for lightweight deployment where performance and efficiency both matter.

Model class and position inside the GLM-4.7 family

GLM-4.7-Flash is a text generation model with 31B parameters, BF16 and F32 tensor types, and the architecture tag glm4_moe_lite. It supports English and Chinese, and it is configured for conversational use. GLM-4.7-Flash sits in the GLM-4.7 collection alongside the larger GLM-4.7 and GLM-4.7-FP8 models.

Z.ai positions GLM-4.7-Flash as a free tier and lightweight deployment option relative to the full GLM-4.7 model, while still targeting coding, reasoning, and general text generation tasks. This makes it interesting for developers who cannot deploy a 358B class model but still want a modern MoE design and strong benchmark results.

Architecture and context length

In a Mixture of Experts architecture of this kind, the model stores more parameters than it activates for each token. That allows specialization across experts while keeping the effective compute per token closer to that of a smaller dense model.
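
To make the routing idea concrete, here is a toy sketch of top-k expert selection. The expert count, hidden size, and gating details are invented for illustration and do not reflect GLM-4.7-Flash's actual configuration.

```python
import numpy as np

# Toy sketch of top-k MoE routing (illustrative, not GLM's actual code).
# The layer stores num_experts expert networks but routes each token to
# only top_k of them, so per-token compute stays close to a small dense model.

rng = np.random.default_rng(0)
num_experts, top_k, hidden = 64, 2, 8  # made-up toy sizes

def route(token_state, router_weights):
    logits = token_state @ router_weights        # (num_experts,) router scores
    top = np.argsort(logits)[-top_k:]            # indices of the top_k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                         # softmax over the selected experts
    return top, gates

token = rng.standard_normal(hidden)
router = rng.standard_normal((hidden, num_experts))
experts, gates = route(token, router)
print(experts, gates)  # only 2 of 64 experts fire for this token
```

The ratio of activated to stored parameters is what the "A3B" in 30B-A3B expresses: roughly 3B parameters active per token out of about 30B stored.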

GLM-4.7-Flash supports a context length of 128k tokens and achieves strong performance on coding benchmarks among models of similar scale. This context size is suitable for large codebases, multi-file repositories, and long technical documents, where many existing models would need aggressive chunking.
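
As a quick sanity check before sending a large codebase, you can count tokens against the window. This sketch assumes the tokenizer loads from the zai-org/GLM-4.7-Flash repo via AutoTokenizer, and repo_dump.txt is a hypothetical concatenated code dump.

```python
from transformers import AutoTokenizer

# Check whether a long document fits in the 128k-token window,
# leaving some room for the model's own output.
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.7-Flash")

CONTEXT_LIMIT = 131_072      # 128k tokens
RESERVED_FOR_OUTPUT = 4_096  # arbitrary headroom for the reply

with open("repo_dump.txt") as f:  # hypothetical concatenated codebase
    text = f.read()

n_tokens = len(tokenizer.encode(text))
budget = CONTEXT_LIMIT - RESERVED_FOR_OUTPUT
print(f"{n_tokens} tokens; fits: {n_tokens <= budget}")
```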

GLM-4.7-Flash uses a standard causal language modeling interface and a chat template, which allows integration into existing LLM stacks with minimal changes.
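
A minimal Transformers sketch of that interface, assuming the repo id from the model card; the loading flags (dtype, device_map) and generation length here are illustrative choices, not prescribed settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-4.7-Flash"  # repo id from the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# The chat template turns a message list into the model's expected prompt format.
messages = [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```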

Benchmark performance in the 30B class

The Z.ai team compares GLM-4.7-Flash with Qwen3-30B-A3B-Thinking-2507 and GPT-OSS-20B. GLM-4.7-Flash leads or is competitive across a mix of math, reasoning, long-horizon, and coding agent benchmarks.

(Benchmark comparison table: see the model card at https://huggingface.co/zai-org/GLM-4.7-Flash)

The comparison table linked above shows why GLM-4.7-Flash is one of the strongest models in the 30B class, at least among the models included in this comparison. The important point is that GLM-4.7-Flash is not only a compact deployment of GLM but also a high performing model on established coding and agent benchmarks.

Evaluation parameters and thinking mode

For most tasks, the default settings are: temperature 1.0, top-p 0.95, and max new tokens 131,072. This defines a relatively open sampling regime with a large generation budget.

For Terminal Bench and SWE-bench Verified, the configuration uses temperature 0.7, top-p 1.0, and max new tokens 16,384. For τ²-Bench, the configuration uses temperature 0 and max new tokens 16,384. These stricter settings reduce randomness for tasks that need stable tool use and multi-step interaction.
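
Collected in one place, those reported settings map naturally onto vLLM's SamplingParams. The dictionary keys below are convenience labels of our own, not official names.

```python
from vllm import SamplingParams

# The evaluation settings reported above, one entry per task family.
EVAL_CONFIGS = {
    "default":            SamplingParams(temperature=1.0, top_p=0.95, max_tokens=131_072),
    "terminal_bench":     SamplingParams(temperature=0.7, top_p=1.0,  max_tokens=16_384),
    "swe_bench_verified": SamplingParams(temperature=0.7, top_p=1.0,  max_tokens=16_384),
    "tau2_bench":         SamplingParams(temperature=0.0,             max_tokens=16_384),
}
```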

The Z.ai team also recommends turning on Preserved Thinking mode for multi-turn agentic tasks such as τ²-Bench and Terminal Bench 2. This mode preserves internal reasoning traces across turns, which is useful when you build agents that need long chains of function calls and corrections.
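
If your serving stack exposes the model's reasoning separately, one hedged way to approximate this behavior in your own agent loop is simply to keep that reasoning in the message history rather than dropping it each turn. The reasoning_content field below is illustrative only; check the model card and your server's documentation for the actual flag and message schema.

```python
# Sketch of a "preserved thinking" conversation history: earlier turns'
# reasoning stays in the message list instead of being stripped, so later
# turns can build on it.

history = []

def add_assistant_turn(history, answer, reasoning=None, preserve_thinking=True):
    msg = {"role": "assistant", "content": answer}
    if preserve_thinking and reasoning is not None:
        msg["reasoning_content"] = reasoning  # hypothetical field name
    history.append(msg)

history.append({"role": "user", "content": "Run the tests and fix the first failure."})
add_assistant_turn(
    history,
    "Running pytest now.",
    reasoning="Plan: run the suite, parse the first failure, then patch it.",
)
```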

How GLM-4.7-Flash fits developer workflows

GLM-4.7-Flash combines several properties that are relevant for agentic, coding-focused applications:

  • A 30B-A3B MoE architecture with 31B total parameters and a 128k token context length.
  • Strong benchmark results on AIME 25, GPQA, SWE-bench Verified, τ²-Bench, and BrowseComp compared with the other models in the same table.
  • Documented evaluation parameters and a Preserved Thinking mode for multi-turn agent tasks.
  • First-class support for vLLM, SGLang, and Transformers based inference, with ready to use commands (see the client sketch after this list).
  • A growing set of finetunes and quantizations, including MLX conversions, in the Hugging Face ecosystem.
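
Since vLLM and SGLang both expose an OpenAI-compatible endpoint once the model is served (for example, with vllm serve zai-org/GLM-4.7-Flash), a client call can look like the following sketch; the port and served model name are assumptions.

```python
from openai import OpenAI

# Talk to a locally served model through the OpenAI-compatible API
# that vLLM and SGLang both provide.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="zai-org/GLM-4.7-Flash",  # assumed served model name
    messages=[{"role": "user", "content": "Explain what a chat template does in two sentences."}],
    temperature=0.7,
    max_tokens=512,
)
print(resp.choices[0].message.content)
```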

Check out the model weights on Hugging Face.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
