Run any MLX model on your Mac

vMLX auto-detects 50+ architectures from HuggingFace and runs them natively on Apple Silicon. Browse popular model families, check RAM requirements, and discover our in-house abliterated and REAP-pruned MLX models.

In-House Models by dealignai

We publish abliterated (CRACK) and REAP-pruned MLX models on HuggingFace. CRACK models remove alignment-imposed refusal behavior; REAP models prune redundant experts for dramatically lower memory and compute, without sacrificing quality in either case.

Qwen3.5-VL-9B CRACK

CRACK 8-bit 4-bit

Abliterated vision-language model based on Qwen 3.5 VL 9B. CRACK removes alignment-imposed refusal while preserving instruction-following and visual reasoning. Available in both 8-bit and 4-bit MLX quantizations for flexible RAM usage.

9B params Vision + Language 16 GB+ RAM
View on HuggingFace

Qwen3.5-397B-A17B REAP

REAP 4-bit

REAP-pruned 397B Mixture-of-Experts with only 17B active parameters. REAP (Redundant Expert Ablation Pruning) removes low-impact experts to cut memory and compute requirements while maintaining benchmark performance. The largest REAP-pruned MLX model available.

397B total / 17B active MoE 128 GB+ RAM
View on HuggingFace

Qwen3.5-VL-35B-A3B CRACK

CRACK 8-bit

Abliterated vision-language MoE model: 35B total parameters with only 3B active. Combines the efficiency of Mixture-of-Experts routing with CRACK abliteration for an uncensored VL experience that fits in modest RAM.

35B total / 3B active Vision + Language MoE 32 GB+ RAM
View on HuggingFace

Qwen3.5-VL-397B-A17B REAP

REAP 4-bit

The largest vision-language model on MLX. A REAP-pruned 397B MoE with 17B active parameters and full multimodal support. Runs on 192 GB+ Macs with vMLX's 5-layer caching stack including VL-aware prefix caching, paged KV, q4/q8 quantized KV, batching, and disk cache.

397B total / 17B active Vision + Language MoE 192 GB+ RAM
View on HuggingFace

Qwen3.5-VL-2B CRACK

CRACK 4-bit

The tiniest abliterated vision-language model in our lineup. At 4-bit quantization, it fits comfortably on 8 GB Macs while still delivering image understanding, OCR, and visual Q&A without refusal. Perfect entry point for VL on minimal hardware.

2B params Vision + Language 8 GB RAM
View on HuggingFace

RAM Guide — Which Models Fit Your Mac?

Apple Silicon's unified memory is shared between the OS, apps, and the model. The table below shows the largest MLX model size you can comfortably run at each RAM tier, with recommended examples.

| Unified RAM | Max Model Size | Example Models |
|---|---|---|
| 8 GB | ~4B | Phi-3 mini 3.8B, Qwen3.5-VL-2B CRACK (4-bit), Llama 3.2 3B |
| 16 GB | ~20B | Qwen3.5-VL-9B CRACK (8-bit), Phi-4 14B, Mistral 7B, Gemma 3 12B |
| 32 GB | ~35B | Qwen3.5-VL-35B-A3B CRACK, Gemma 3 27B, Llama 3 70B (4-bit) |
| 64 GB | ~70B | Llama 3.1 70B (8-bit), Qwen 2.5 72B, DeepSeek R1 Distill 70B |
| 128 GB | ~100B+ | Qwen3.5-397B-A17B REAP (4-bit), DeepSeek V3 (4-bit), Llama 4 Scout |
| 192 GB+ | 397B MoE | Qwen3.5-VL-397B-A17B REAP, DeepSeek V3 (8-bit), Llama 4 Maverick |
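The tiers above follow a simple rule of thumb: weight memory is roughly parameter count times bits per weight, plus headroom for the KV cache, activations, and the OS. Here is an illustrative sketch of that arithmetic (the 20% overhead factor is an assumption for illustration, not vMLX's actual sizing logic):

```python
def estimated_ram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Rough memory estimate for a quantized model.

    params_b  -- parameter count in billions
    bits      -- bits per weight (4 or 8 for common MLX quantizations)
    overhead  -- assumed padding for KV cache, activations, and runtime

    Billions of params x (bits / 8) bytes per param = GB of weights.
    """
    weight_gb = params_b * bits / 8
    return weight_gb * overhead

# A 9B model at 8-bit needs roughly 10.8 GB; at 4-bit, about 5.4 GB --
# which is why the 9B CRACK model is listed at the 16 GB tier.
print(f"{estimated_ram_gb(9, 8):.1f} GB")
print(f"{estimated_ram_gb(9, 4):.1f} GB")
```

Note that for MoE models like the REAP-pruned 397B, all experts must fit in memory even though only 17B parameters are active per token, which is why the table sizes them by total parameter count.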

Architecture Support

vMLX auto-detects the model architecture, tool call format, and reasoning format from HuggingFace config files. No manual configuration needed — just download and run.
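The detection works off the `architectures` field that HuggingFace writes into every model's `config.json`. The sketch below shows the general idea with a tiny hypothetical registry; the names and mapping are illustrative assumptions, not vMLX's actual internal table:

```python
import json

# Hypothetical subset of an architecture registry; the real lookup
# covers 50+ entries. Keys are values of the "architectures" field
# in a HuggingFace model's config.json.
ARCH_REGISTRY = {
    "LlamaForCausalLM": "llama",
    "Qwen2ForCausalLM": "qwen2",
    "MixtralForCausalLM": "mixtral_moe",
}

def detect_architecture(config_json: str) -> str:
    """Map the `architectures` field of a config.json to an
    internal model-family name, or flag it as unsupported."""
    config = json.loads(config_json)
    arch = config.get("architectures", ["unknown"])[0]
    return ARCH_REGISTRY.get(arch, "unsupported")

print(detect_architecture('{"architectures": ["LlamaForCausalLM"]}'))
```

The same config file carries the hints (chat template, tokenizer settings) used to pick a tool-call and reasoning parser, so one download is enough to configure the whole stack.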

50+ auto-detected architectures · 14 tool call parsers · 4 reasoning parsers · 5-layer caching stack
  • Mamba / SSM Hybrids — Dedicated BatchMambaCache for batched inference on state-space models. No other MLX engine supports this.
  • Vision-Language Models — Full 5-layer caching on VL models (prefix + paged KV + q4/q8 quantized KV + continuous batching + persistent disk cache). Only vMLX does this.
  • Mixture-of-Experts — Efficient routing for MoE models like DeepSeek V3, Qwen MoE, Mixtral, and REAP-pruned variants with reduced expert counts.
  • Tool Calling — 14 parsers covering Hermes, Qwen, Llama, Mistral, DeepSeek, Functionary, and more. Function calls work out of the box with vMLX's 20+ built-in agentic tools.
  • Reasoning / Thinking — 4 parsers for DeepSeek R1, Qwen 3, GLM-4.7, and generic <think> blocks. Collapsible reasoning UI with enable_thinking and reasoning_effort API support.
  • Speculative Decoding — Pair any large model with a smaller draft model for faster token generation on Apple Silicon.
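As a concrete illustration of the last parsing step, a generic `<think>`-block parser separates hidden reasoning from the visible reply so the UI can collapse it. This is a minimal sketch of the technique, not vMLX's actual parser:

```python
import re

# Matches generic <think>...</think> reasoning blocks, including
# multi-line ones (re.DOTALL lets "." span newlines).
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer): the concatenated contents of all
    <think> blocks, and the text with those blocks stripped out."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>4 GB weights + overhead fits in 8 GB.</think>Yes, it fits."
)
print(reasoning)
print(answer)
```

Streaming parsers do the same split incrementally, token by token, so the collapsible reasoning panel fills in live while the model is still thinking.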

Start Running Models Locally

Browse our abliterated and REAP-pruned models on HuggingFace, or download vMLX to run any MLX model on your Mac.