VCAL Project
Privacy-first • Self-hosted • Built for production

Infrastructure products for LLM applications and agentic workflows

Start with AI Cost Firewall, a free OpenAI-compatible gateway for reducing repeated LLM calls, improving cost visibility, and adding safeguards to AI applications, RAG systems, and agentic workflows. Add VCAL Semantic Cache when semantic reuse becomes critical infrastructure for low-latency, private, and scalable AI systems.

Free open-source entry point

AI Cost Firewall

An OpenAI-compatible gateway that sits in front of your application to reduce duplicate and semantically similar requests, improve cost visibility, and add production safeguards.

  • • Drop-in gateway for /v1/chat/completions
  • • Exact and semantic reuse to reduce token spend
  • • Fits agentic steps such as routing, classification, summarization, validation, and repeated reasoning
  • • Prometheus and Grafana visibility out of the box
  • • Tested at 500 RPS as a full gateway layer
  • • Best starting point for teams adopting cost control quickly
Production semantic cache layer

VCAL Semantic Cache

A production-ready semantic cache service for teams that need caching to be a durable, observable, and deployable part of their AI infrastructure.

  • • Dedicated semantic cache service for LLM workloads
  • • Semantic reuse layer for LLM applications, RAG systems, and agentic workflows
  • • Built for private deployment inside your own environment
  • • Snapshots, observability, licensing, and enterprise support
  • • Tested at 30K RPS as a focused cache lookup layer
  • • Natural next step when semantic reuse becomes strategic

Benchmarks measure different workloads: AI Cost Firewall is tested as a full OpenAI-compatible gateway, while VCAL Semantic Cache is tested as a focused semantic cache lookup layer. Real-world throughput depends on hardware, configuration, request size, cache behavior, and deployment topology.

How the product flow works

Free AI Cost Firewall → VCAL Semantic Cache

The products are designed to work as a progression, not as competing choices. Teams typically start with AI Cost Firewall because it is free, fast to deploy, and immediately useful. As semantic caching proves its value, VCAL Semantic Cache becomes the production cache layer for deeper optimization and stronger operational control.

Step 1

Start with the gateway

Put AI Cost Firewall in front of your LLM application to reduce duplicate requests, observe traffic, and understand where spend is being wasted.

Step 2

Validate semantic reuse

Use real traffic and metrics to confirm where semantic caching meaningfully lowers latency and model usage.

Step 3

Scale with VCAL Semantic Cache

Add VCAL Semantic Cache when semantic caching becomes a core infrastructure function that needs persistence, operational maturity, and enterprise deployment options.

Why VCAL

Built for private, production AI infrastructure

Self-hosted by design

Both products are built for private infrastructure, controlled deployment, and data staying inside your own perimeter.

Built for production

Focused on latency, observability, cost control, and operational predictability, not demo-only AI tooling.

Designed to work together

AI Cost Firewall helps teams start quickly. VCAL Semantic Cache extends that foundation when semantic caching needs to become a durable production service.

Contact VCAL Project

Questions about VCAL Semantic Cache, AI Cost Firewall, partnerships, pilots, or product fit? Send us a message and we’ll get back to you.

We’ll only use your email to reply to your inquiry.