VCAL Project
Privacy-first • Self-hosted • Built for production

Infrastructure products for LLM applications

VCAL Project offers two complementary products: VCAL Server for semantic caching, and AI Cost Firewall for controlling the cost of OpenAI-compatible LLM requests.

Product

VCAL Server

Production semantic cache for LLM applications. Reduce repeated model calls, lower latency, and keep data inside your own environment.

  • On-prem / VPC deployment
  • Open-core foundation with commercial server features
  • Metrics, licensing, snapshots, and enterprise support
Product

AI Cost Firewall

Free, open-source, OpenAI-compatible gateway that deduplicates exact and semantically similar LLM requests to cut token spend and improve response times.

  • Drop-in /v1/chat/completions gateway
  • Exact cache with semantic reuse layer
  • Cost-saving visibility with Prometheus and Grafana
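
Because the gateway is a drop-in for /v1/chat/completions, clients keep the standard request shape and only change the base URL. A minimal sketch, assuming the gateway listens at http://localhost:8080 (the hostname, port, and model name are illustrative, not defaults documented by the project):

```python
import json

# Hypothetical gateway address; the real host/port depend on your deployment.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

# Standard OpenAI-compatible chat payload. The gateway can answer repeated
# or semantically similar requests from its cache instead of the upstream model.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize our refund policy."}],
}
body = json.dumps(payload)
# To send, POST `body` to GATEWAY_URL with Content-Type: application/json,
# exactly as you would against the upstream provider.
```

Pointing an existing OpenAI-compatible client library at the gateway's base URL achieves the same thing without any payload changes.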

Self-hosted by design

Both products are designed for private infrastructure, controlled deployment, and predictable operations inside your own perimeter.

Built for production

Focused on latency, observability, cost control, and day-to-day operations rather than demos.

Designed to work together

Start with AI Cost Firewall to reduce duplicate LLM requests, then scale with VCAL Server for semantic caching and deeper cost optimization.

Choose the path that fits your stage

Start with AI Cost Firewall when you want a free, easy entry point for immediate savings. Move to VCAL Server when you need a commercial semantic caching platform for production scale.