AI Cost Firewall
Cut spend on repeated LLM requests with a fully free, OpenAI-compatible gateway that combines exact caching and semantic reuse in front of your upstream provider.
Best for support bots, internal assistants, documentation search, and agent workloads with repeated prompts.
Why AI Cost Firewall
Many LLM applications repeatedly send the same or slightly rephrased prompts. AI Cost Firewall sits in front of the provider API and intercepts those requests before they become unnecessary token spend.
Exact cache
Instant reuse for identical prompts and payloads.
Semantic layer
Reuse answers for meaningfully similar prompts, not only exact matches.
Operational visibility
Track hits, misses, and estimated savings with Prometheus and Grafana.
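To make the two caching layers concrete, here is a minimal sketch of how an exact-plus-semantic lookup can work. This is an illustration, not the project's actual implementation: the toy bag-of-words "embedding", the 0.8 similarity threshold, and the function names are all assumptions for the example (a real gateway would use a sentence-embedding model).

```python
import hashlib
import json
import math
from collections import Counter

def exact_key(payload: dict) -> str:
    # Canonicalize the request body so key order does not change the hash.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" for illustration only; a real semantic
    # layer would use a proper embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

cache = {}           # exact_key -> cached response
semantic_index = []  # (embedding, cached response)

def lookup(payload: dict, threshold: float = 0.8):
    key = exact_key(payload)
    if key in cache:
        return cache[key]  # exact hit: byte-identical payload seen before
    vec = embed(payload["messages"][-1]["content"])
    for stored_vec, response in semantic_index:
        if cosine(vec, stored_vec) >= threshold:
            return response  # semantic hit: meaningfully similar prompt
    return None  # miss: forward the request to the upstream provider
```

A rephrased prompt ("please how do I reset my password" vs. "how do I reset my password") misses the exact cache but can still be served by the semantic layer.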
Quickstart
Run the gateway between your existing OpenAI-compatible client and the upstream provider, then point your application at the firewall endpoint.
git clone https://github.com/vcal-project/ai-firewall.git
cd ai-firewall
cp configs/ai-firewall.conf.example configs/ai-firewall.conf
nano configs/ai-firewall.conf # Replace the placeholders with your API keys
docker compose pull
docker compose up -d
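Once the containers are up, existing clients only need their base URL changed to talk to the firewall. The sketch below builds a standard OpenAI-style chat completion request with the stdlib; the `localhost:8080` address, `/v1` path, and model name are placeholder assumptions, so substitute the host and port from your own configuration.

```python
import json
import urllib.request

# Assumed gateway address; replace with where your firewall actually
# listens (port and path here are illustrative, not fixed by the project).
FIREWALL_BASE = "http://localhost:8080/v1"

def chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    # Standard OpenAI-style chat completion body, aimed at the firewall
    # instead of the provider; the request shape itself is unchanged.
    body = json.dumps({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{FIREWALL_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("How do I reset my password?", "sk-example")
# urllib.request.urlopen(req) would send it once the gateway is running.
```

SDKs that accept a custom base URL (most OpenAI-compatible clients do) need only that one setting changed.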
Open source and self-deployable
AI Cost Firewall is a fully free project that you can deploy in your own environment. It is designed for teams that want a simple OpenAI-compatible gateway with exact and semantic caching, without committing to a managed service or commercial package.
Fully free
Use, test, modify, and deploy it yourself as part of your own stack.
Self-hosted
Keep traffic, cache data, and observability inside your own infrastructure perimeter.
OpenAI-compatible
Put it in front of existing clients and workflows with minimal application changes.
Get started with AI Cost Firewall
Explore the repository, deploy it in your own environment, and test how much repeated LLM traffic you can eliminate with exact and semantic caching.