Privacy-first • Self-hosted • Built for production

Infrastructure products for LLM applications

Start with AI Cost Firewall, a free OpenAI-compatible gateway that reduces wasted LLM spend and improves visibility. Add VCAL Server when semantic caching becomes critical infrastructure for latency, reuse, and scale.

Start with AI Cost Firewall See VCAL Server

Free open-source entry point

AI Cost Firewall

An OpenAI-compatible gateway that sits in front of your application to reduce duplicate and semantically similar requests, improve cost visibility, and add production safeguards.

• Drop-in gateway for /v1/chat/completions
• Exact and semantic reuse to reduce token spend
• Prometheus and Grafana visibility out of the box
• Best starting point for teams adopting cost control quickly

Explore AI Cost Firewall View on GitHub

Production semantic cache layer

VCAL Server

A production-ready semantic cache service for teams that need caching to be a durable, observable, and deployable part of their AI infrastructure.

• Dedicated semantic cache service for LLM workloads
• Built for private deployment inside your own environment
• Snapshots, observability, licensing, and enterprise support
• Natural next step when semantic reuse becomes strategic

Explore VCAL Server Open-source core

How the product flow works

Free AI Cost Firewall → VCAL Server

The products are designed to work as a progression, not as competing choices. Teams typically start with AI Cost Firewall because it is free, fast to deploy, and immediately useful. As semantic caching proves its value, VCAL Server becomes the production cache layer for deeper optimization and stronger operational control.

Step 1

Start with the gateway

Put AI Cost Firewall in front of your LLM application to reduce duplicate requests, observe traffic, and understand where spend is being wasted.

Step 2

Validate semantic reuse

Use real traffic and metrics to confirm where semantic caching meaningfully lowers latency and model usage.

Step 3

Scale with VCAL Server

Add VCAL Server when semantic caching becomes a core infrastructure function that needs persistence, operational maturity, and enterprise deployment options.

Why VCAL

Built for private, production AI infrastructure

Self-hosted by design

Both products are built for private infrastructure, controlled deployment, and data staying inside your own perimeter.

Built for production

Focused on latency, observability, cost control, and operational predictability, not demo-only AI tooling.

Designed to work together

AI Cost Firewall helps teams start quickly. VCAL Server extends that foundation when semantic caching needs to become a durable production service.