vMLX auto-detects 50+ architectures from HuggingFace and runs them natively on Apple Silicon. Browse popular model families, check RAM requirements, and discover our in-house abliterated and REAP-pruned MLX models.
These are the most popular HuggingFace MLX models people run locally on Mac with vMLX. Every model listed below is auto-detected, auto-configured, and ready to run in one click.
Run DeepSeek locally on Mac. V3 is a 671B MoE general-purpose powerhouse; R1 is the reasoning variant with chain-of-thought. Both work with vMLX's reasoning parser and tool calling.
Run Llama locally on any Mac. Meta's open-weight family spans 1B to 405B parameters. Llama 4 Scout and Maverick bring MoE efficiency and up to 10M-token context.
Run Qwen locally on Mac. Dense models from 0.5B to 72B, plus MoE variants up to 397B. Qwen 3.5 adds vision-language (VL) capabilities. Best-in-class tool calling support.
Google's efficient open model family. Strong multilingual performance in 1B, 4B, 12B, and 27B sizes. Optimized for Apple Silicon via MLX quantization.
Mistral AI's fast and capable models. Mistral (7B, 22B dense) and Mixtral (8x7B, 8x22B MoE) deliver strong coding and instruction-following on Mac.
Microsoft's compact models punch above their weight. Phi-4 (14B) and Phi-3 mini (3.8B) are ideal for Macs with 8–16 GB RAM. Great for coding tasks.
A large MoE model with a small active-parameter count, designed for long-context generation and complex multi-turn conversations.
THUDM's fast reasoning model with native thinking support. Collapsible chain-of-thought blocks display inline in vMLX's UI.
StepFun's lightweight model built for speed. Responsive real-time generation makes it a strong choice for interactive local inference on Mac.
We publish abliterated (CRACK) and REAP-pruned MLX models on HuggingFace. Abliteration removes refusal behavior; REAP pruning removes redundant experts to cut memory and compute without sacrificing quality (a toy sketch of the pruning idea follows the model list below).
Abliterated vision-language model based on Qwen 3.5 VL 9B. CRACK removes alignment-imposed refusal while preserving instruction-following and visual reasoning. Available in both 8-bit and 4-bit MLX quantizations for flexible RAM usage.
REAP-pruned 397B Mixture-of-Experts with only 17B active parameters. REAP (Redundant Expert Ablation Pruning) removes low-impact experts to cut memory and compute requirements while maintaining benchmark performance. The largest REAP-pruned MLX model available.
Abliterated vision-language MoE model: 35B total parameters with only 3B active. Combines the efficiency of Mixture-of-Experts routing with CRACK abliteration for an uncensored VL experience that fits in modest RAM.
The largest vision-language model on MLX. A REAP-pruned 397B MoE with 17B active parameters and full multimodal support. Runs on 192 GB+ Macs with vMLX's 5-layer caching stack including VL-aware prefix caching, paged KV, q4/q8 quantized KV, batching, and disk cache.
The tiniest abliterated vision-language model in our lineup. At 4-bit quantization, it fits comfortably on 8 GB Macs while still delivering image understanding, OCR, and visual Q&A without refusal. Perfect entry point for VL on minimal hardware.
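For readers curious what expert pruning means structurally, here is a toy sketch: drop the lowest-scoring experts from an MoE layer's weight dict and keep the rest. The router-score heuristic and the shapes here are placeholder assumptions for illustration, not the actual REAP algorithm these models were pruned with.

```python
# Toy illustration of expert pruning (not vMLX's or REAP's actual method).
import numpy as np

def prune_experts(expert_weights: dict, router_scores: np.ndarray, keep: int) -> dict:
    """Keep the `keep` experts with the highest placeholder router score."""
    top = np.argsort(router_scores)[-keep:]          # indices of the strongest experts
    return {int(i): expert_weights[int(i)] for i in sorted(top)}

experts = {i: np.random.randn(16, 16) for i in range(8)}   # 8 toy expert weight matrices
scores = np.random.rand(8)                                  # placeholder: avg routing weight per expert
pruned = prune_experts(experts, scores, keep=4)
print("kept experts:", sorted(pruned))
```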
Apple Silicon's unified memory is shared between the OS, apps, and the model. The table below shows the largest MLX model size you can comfortably run at each RAM tier, with recommended examples.
| Unified RAM | Max Model Size (parameters) | Example Models |
|---|---|---|
| 8 GB | ~4B | Phi-3 mini 3.8B, Qwen3.5-VL-2B CRACK (4-bit), Llama 3.2 3B |
| 16 GB | ~20B | Qwen3.5-VL-9B CRACK (8-bit), Phi-4 14B, Mistral 7B, Gemma 3 12B |
| 32 GB | ~35B | Qwen3.5-VL-35B-A3B CRACK, Gemma 3 27B, Mistral Small 22B |
| 64 GB | ~70B | Llama 3.1 70B (4-bit), Qwen 2.5 72B (4-bit), DeepSeek R1 Distill 70B (4-bit) |
| 128 GB | ~100B+ | Qwen3.5-397B-A17B REAP (4-bit), DeepSeek V3 (4-bit), Llama 4 Scout |
| 192 GB+ | 397B MoE | Qwen3.5-VL-397B-A17B REAP, DeepSeek V3 (8-bit), Llama 4 Maverick |
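As a rough sanity check on these tiers, you can estimate a model's footprint from its parameter count and quantization bits. The snippet below is a back-of-envelope heuristic, not part of vMLX; the 25% overhead factor for KV cache, activations, and the OS is an assumption.

```python
# Back-of-envelope RAM estimate for a quantized model (not part of vMLX).
# Assumption: weights take roughly params * bits / 8 bytes, plus ~25%
# overhead for KV cache, activations, and the OS sharing unified memory.

def estimate_ram_gb(params_billions: float, bits: int = 4, overhead: float = 1.25) -> float:
    """Approximate unified-memory footprint in GB."""
    weight_gb = params_billions * 1e9 * bits / 8 / 1024**3
    return weight_gb * overhead

# Example: Phi-4 14B at 4-bit lands around 8 GB, so a 16 GB Mac is comfortable.
for name, params, bits in [("Phi-4 14B", 14, 4), ("Llama 3.1 70B", 70, 4)]:
    print(f"{name} @ {bits}-bit ~= {estimate_ram_gb(params, bits):.1f} GB")
```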
vMLX auto-detects the model architecture, tool call format, and reasoning format from HuggingFace config files. No manual configuration needed — just download and run.
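For a sense of where that detection starts, here is a minimal sketch that reads the `architectures` field from a repo's config.json via huggingface_hub. It illustrates the standard HuggingFace layout only; vMLX's real detection logic (including tool call and reasoning formats) is internal, and the repo id below is just an example.

```python
# Minimal sketch of config-based architecture detection (illustrative only).
import json
from huggingface_hub import hf_hub_download

def detect_architecture(repo_id: str) -> str:
    """Download config.json from a HuggingFace repo and return its architecture name."""
    config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(config_path) as f:
        config = json.load(f)
    # "architectures" is the standard config field, e.g. ["LlamaForCausalLM"].
    return config.get("architectures", ["unknown"])[0]

print(detect_architecture("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit"))
```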
BatchMambaCache enables batched inference on state-space models; no other MLX engine supports this. Reasoning models' <think> blocks render as a collapsible reasoning UI, with enable_thinking and reasoning_effort API support.
Browse our abliterated and REAP-pruned models on HuggingFace, or download vMLX to run any MLX model on your Mac.
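If you want a feel for those reasoning controls before downloading, here is a rough sketch of passing enable_thinking and reasoning_effort in a chat request, assuming an OpenAI-compatible local endpoint; the URL, port, model id, and response shape are assumptions rather than documented vMLX behavior.

```python
# Hypothetical request sketch: the endpoint URL, port, model id, and response
# shape are assumptions. Only enable_thinking and reasoning_effort come from
# the vMLX feature list above.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",   # placeholder local endpoint
    json={
        "model": "deepseek-r1",                     # placeholder model id
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "enable_thinking": True,
        "reasoning_effort": "high",
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```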