Architectures

Mixture of Experts

Scale parameters without scaling compute.

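In an MoE layer, a learned router (gating network) scores every expert for each token and sends the token only to the top-k scoring expert sub-networks. The layer's output is the sum of those experts' outputs, weighted by the router's scores.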
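The routing step can be sketched in plain Python. This is a minimal illustration, not any particular model's implementation. It assumes softmax top-k gating with the softmax taken over only the selected experts' logits, which is one common convention. It also omits things real routers add, such as load-balancing losses and per-expert capacity limits. The names `top_k_route` and `moe_layer`, the linear router, and the toy "scale by s" experts are all hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([logits[i] for i in chosen])
    return chosen, gates

def moe_layer(token, router_weights, experts, k=2):
    """Route one token vector through its top-k experts (hypothetical sketch)."""
    # Router score for expert i is the dot product of the token with row i.
    logits = [sum(t * w for t, w in zip(token, row)) for row in router_weights]
    chosen, gates = top_k_route(logits, k)
    out = [0.0] * len(token)
    for idx, g in zip(chosen, gates):
        # Only the chosen experts are evaluated; the rest cost nothing.
        y = experts[idx](token)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen

# Usage: 4 toy experts, each scaling its input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in range(4)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
out, chosen = moe_layer([2.0, 1.0], router, experts, k=2)
print(chosen)  # → [2, 0]
```

Only experts 2 and 0 run for this token. Experts 1 and 3 hold parameters but do no work, which is the property the next point builds on.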

Because each token activates only k of the N experts, total parameter count grows with N while per-token compute stays roughly fixed by k.
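A quick parameter count makes the trade-off concrete. The following sketch covers a single MoE feed-forward layer under stated assumptions. Each expert is a plain two-matrix FFN with no biases, so it has `2 * d_model * d_ff` weights. Gated variants such as SwiGLU use three matrices instead. The router is a single `d_model x N` matrix. Attention, embeddings, and any shared layers are ignored. The sizes below are hypothetical and not taken from any real model.

```python
def ffn_params(d_model, d_ff):
    # Up-projection (d_model x d_ff) plus down-projection (d_ff x d_model), no biases.
    return 2 * d_model * d_ff

def moe_param_counts(d_model, d_ff, num_experts, k):
    """Return (total, active-per-token) parameters for one MoE FFN layer."""
    per_expert = ffn_params(d_model, d_ff)
    router = d_model * num_experts      # router always runs for every token
    total = num_experts * per_expert + router
    active = k * per_expert + router    # only k experts are evaluated per token
    return total, active

# Hypothetical layer sizes; hold k fixed and grow the number of experts.
for n in (8, 16, 64):
    total, active = moe_param_counts(d_model=1024, d_ff=4096, num_experts=n, k=2)
    print(f"N={n:>2}: total={total:,}  active={active:,}")
```

Going from 8 to 64 experts multiplies this layer's total parameters by roughly 8, while the active count grows only by the small router term. That is why per-token compute tracks k rather than N.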

Mixtral and DeepSeek V3 use MoE designs. GPT-4 is widely rumored to be one as well, though OpenAI has not published its architecture.