Changelog

What's new in vMLX

v1.1.9 March 2026 Latest
  • Multilingual website — Korean, Spanish, Chinese, and Japanese with auto-detect
  • Updated model compatibility
  • Bug fixes and stability improvements
v1.1.8 February 2026
  • Persistent disk cache — cache survives app restarts
  • Configurable disk cache size and directory
  • Performance improvements for large contexts
v1.1.7 February 2026
  • KV cache quantization (q4/q8) — 2–4x cache memory savings
  • Storage-boundary quantization — full precision during generation
  • Improved memory efficiency
v1.1.6 January 2026
  • Browser automation tool — Playwright-based
  • Web search tool — DuckDuckGo + Brave
  • URL fetch tool
  • Expanded agentic toolkit to 20+ tools
v1.1.5 January 2026
  • Vision model support — Qwen VL, LLaVA
  • VL models work with full 5-layer caching stack
  • Image attachment in chat UI
v1.1.0 December 2025
  • Mamba/SSM support with BatchMambaCache
  • Speculative decoding with configurable draft model
  • Separate embedding model endpoint
v1.0.0 November 2025
  • Initial release
  • Paged multi-context KV cache with prefix caching
  • Continuous batching — up to 256 sequences
  • Built-in agentic coding tools (MCP)
  • OpenAI-compatible API — 7 endpoints
  • Built-in HuggingFace model browser
  • Code-signed with Apple Developer ID