VCAL

Frequently Asked Questions

Last updated: April 2026


VCAL builds infrastructure products for LLM applications. The portfolio currently includes vcal-core, a lightweight Rust library for fast in-process semantic indexing, VCAL Server, a production-ready semantic cache service, and AI Cost Firewall, an OpenAI-compatible gateway that reduces repeated and semantically similar LLM requests before they reach upstream providers.

This FAQ explains the “why” and “how” for CIOs, DevOps/MLOps engineers, developers, and business leaders evaluating adoption.


General

What is VCAL?

VCAL is a family of privacy-first infrastructure products for LLM applications. vcal-core is the embedded Rust library, VCAL Server is the production semantic cache service built on top of it, and AI Cost Firewall is an OpenAI-compatible gateway that helps reduce unnecessary LLM traffic before it reaches providers. Together, these products help teams improve latency, control spend, and keep sensitive AI traffic inside their own environment.

Who is VCAL for?

VCAL is built for teams operating AI-powered applications in production. CIOs and CTOs get better cost control, more predictable latency, and privacy-first deployment options. DevOps and MLOps teams get deployable infrastructure with Prometheus and Grafana visibility. Developers can embed vcal-core directly in Rust services or deploy VCAL Server and AI Cost Firewall as standalone services. Finance and product leaders benefit from clearer cost attribution and better visibility into the ROI of caching and traffic optimization.

What is the difference between vcal-core, VCAL Server, and AI Cost Firewall?

vcal-core is the low-level Rust library for in-process semantic indexing and nearest-neighbor search. VCAL Server is the production semantic cache service built for deployment, operations, and observability. AI Cost Firewall sits in front of LLM APIs as a gateway and reduces unnecessary requests using exact and semantic reuse, request validation, traffic controls, and cost visibility. In simple terms: vcal-core is the building block, VCAL Server is the semantic cache service, and AI Cost Firewall is the upstream traffic-control layer.

Is VCAL a vector database?

No. VCAL is not positioned as a general-purpose vector database. vcal-core and VCAL Server are optimized for semantic caching, nearest-neighbor lookup, and operational reuse of embeddings and answers. They are designed as focused infrastructure components rather than broad analytics or retrieval platforms.

Why not just call the LLM provider directly?

Direct provider calls are simple at the beginning, but they often become expensive and harder to govern as traffic grows. Repeated prompts, semantically similar prompts, oversized requests, invalid models, and weak observability all create unnecessary cost and operational risk. VCAL products add a control layer that helps teams reduce avoidable spend, improve response predictability, and introduce clearer operational guardrails.

VCAL Server & vcal-core

What core features are included?

Core capabilities include:
- HNSW-based nearest-neighbor search
- insert, upsert, and delete by external ID
- single-query and batch search APIs
- snapshot save/load for faster restarts
- TTL and capacity-based eviction
- Prometheus metrics and operational visibility in the server
- configuration via file and environment-based deployment patterns

How does snapshotting work?

In vcal-core, snapshot persistence is byte-based and application-controlled. In VCAL Server, snapshots are exposed operationally so the service can load an index on startup and save one on demand. This helps speed up restarts and reduce warm-up time after deployments or maintenance.

Does VCAL support deletes and updates?

Yes. Deletes are soft by default and use tombstones so removed items are skipped by future searches. Updates are typically handled via upsert semantics. This keeps the interface simple while preserving index integrity and predictable runtime behavior.

What observability is included?

VCAL Server exposes Prometheus metrics for requests, errors, active entries, snapshot freshness, and latency. Grafana dashboards can be used to monitor cache efficiency, operational health, and system behavior over time. This helps teams prove value, spot regressions, and tune deployments with real traffic data rather than guesswork.

Is VCAL safe to run on-prem?

Yes. VCAL products are designed for deployment in your own infrastructure, including on-prem environments, private cloud, or VPC-based setups. By default, data stays within the systems you control unless you explicitly route or export it elsewhere. As with any infrastructure component, access control, reverse proxying, TLS, and network policies should be configured by your team.

What data does VCAL store?

vcal-core and VCAL Server store embeddings, identifiers, and associated cache data that your application writes into them. If snapshots are enabled, that state can also be persisted to disk. Metrics remain local unless you export them to your monitoring stack.

How should we handle PII or sensitive data?

Many teams store only minimized identifiers, hashes, or internal references alongside vectors. If your workflow involves personal or sensitive data, follow your own data minimization, retention, and access-control policies. VCAL products are designed to support private deployment, but data governance decisions should still be made by your organization.

What scale does VCAL target?

VCAL is designed for fast, practical semantic caching and local vector reuse rather than massive distributed analytics. Actual scale depends on embedding dimensions, hardware, memory budget, and operational goals. For larger deployments, teams can segment workloads, shard by key or tenant, or run multiple instances behind standard infrastructure controls.

What parameters matter most?

The most important parameters typically include embedding dimension, graph connectivity, search breadth, TTL, and capacity limits. Quality also depends on embedding normalization and model consistency. For configuration examples and deployment guidance, see the documentation.

AI Cost Firewall

What is AI Cost Firewall?

AI Cost Firewall is an OpenAI-compatible gateway that sits in front of LLM providers and reduces unnecessary API spend. It does this by intercepting traffic, validating requests, applying exact and semantic reuse, and forwarding only the requests that actually need to reach the upstream model provider.

How is AI Cost Firewall different from VCAL Server?

VCAL Server is a semantic cache service. AI Cost Firewall is a traffic-control gateway. VCAL Server focuses on vector-based reuse and semantic lookup. AI Cost Firewall focuses on controlling upstream LLM traffic, reducing duplicate and semantically similar requests, classifying failures, enforcing request limits, and exposing cost-focused operational metrics. Some teams may use them separately; others may combine them as layers in the same architecture.

How does AI Cost Firewall save money?

It reduces repeated and semantically similar LLM calls before they reach the provider. Instead of sending every request upstream, it can serve exact matches from cache and, when configured, reuse responses for semantically close prompts. This reduces token consumption, cuts avoidable requests, and makes overall LLM spend more predictable.

Does AI Cost Firewall change my application code?

Usually very little. AI Cost Firewall is designed to be OpenAI-compatible, so many teams can adopt it by changing the base URL and API routing in their application or gateway layer. This makes evaluation and rollout simpler than rewriting application logic.

What kinds of requests can AI Cost Firewall optimize?

It is especially effective for workloads with repeated prompts, recurring support questions, internal copilots, onboarding assistants, policy lookups, agent workflows, or other traffic patterns where prompts repeat exactly or remain semantically close across users and sessions.

Does AI Cost Firewall only work with OpenAI?

It is designed around an OpenAI-compatible interface, which makes it straightforward to place in front of providers or systems that support the same request format. Compatibility and deployment patterns depend on your stack, but the goal is to fit into existing LLM applications with minimal friction.

What operational controls does AI Cost Firewall add?

AI Cost Firewall adds request validation, model allow-listing, request-size controls, error classification, timeout visibility, readiness and health endpoints, graceful shutdown behavior, cache metrics, and cost-related observability. In practice, it functions as an operations-aware control layer for LLM API traffic, not just as a cache.

What visibility does AI Cost Firewall provide?

It provides Prometheus-friendly metrics for cache hits, misses, upstream calls, savings, and request behavior. This helps teams understand not only whether reuse is happening, but also whether it is creating measurable financial and operational value over time.

Can AI Cost Firewall run privately?

Yes. It is intended for deployment in your own environment, including private cloud, VPC, and on-prem infrastructure. That allows teams to introduce cost control and traffic supervision without routing their prompts through an external optimization SaaS layer.

When should we choose AI Cost Firewall instead of building caching ourselves?

Building ad hoc caching is usually easy at first and harder later. Teams often discover they also need validation, observability, cost reporting, semantic matching, timeout handling, deployment behavior, and cleaner failure modes. AI Cost Firewall packages these concerns into a deployable control layer so teams can move faster and operate with fewer blind spots.

Business & Pricing

How do VCAL products save money?

They reduce redundant LLM work. VCAL Server avoids repeated or similar semantic lookups, while AI Cost Firewall reduces unnecessary upstream traffic and adds visibility into what is being saved. Together, they help teams lower token spend, improve latency, and make operating costs more predictable.

Is the core open-source?

Yes. vcal-core is open-source under Apache-2.0. AI Cost Firewall is also open-source. VCAL Server adds production-oriented service capabilities and is offered commercially. This model supports transparency and adoption while funding continued product development.

Can VCAL integrate into existing stacks?

Yes. Teams can embed vcal-core in Rust services, deploy VCAL Server as an internal HTTP service, or place AI Cost Firewall in front of existing LLM clients as a gateway layer. The products are designed to fit into existing infrastructure rather than force a full platform migration.

Can we evaluate VCAL Server?

Yes. Contact us for an evaluation license, pilot discussion, or deployment guidance. We can also help with sizing, dashboards, and architecture review.

Can we try AI Cost Firewall?

Yes. Because AI Cost Firewall is open-source, teams can evaluate it directly in their own environment. If you want help with deployment patterns, dashboards, or pilot architecture, contact us.

Roadmap & Support

What is on the roadmap?

Roadmap areas include richer integrations, broader deployment options, improved operational tooling, and deeper enterprise capabilities. Specific priorities may evolve based on customer needs, pilot feedback, and product maturity.

How do we get help?

For trials, pilots, architecture discussions, or enterprise support, email sales@vcal-project.com. For security or privacy inquiries, contact privacy@vcal-project.com.