Tensormesh uses a transparent, usage-based pricing model designed to align with your success. You only pay for what you use, and our pricing reflects the value you receive.Documentation Index
Fetch the complete documentation index at: https://docs.tensormesh.ai/llms.txt
Use this file to discover all available pages before exploring further.
Serverless Token Pricing
Serverless inference is billed on a per-token basis. Each model has three token price tiers:| Token Type | What It Is | Cost |
|---|---|---|
| Input Token | Tokens in your prompt (system message, user message, context) | Per-model rate |
| Output Token | Tokens generated in the model’s response | Per-model rate |
| Cached Token | Input tokens served from Tensormesh’s KV cache | $0.00 |
Cached tokens are free. When Tensormesh serves a token from its KV cache instead of recomputing it, you are not charged. The more your requests share common prefixes — system messages, conversation history, shared context — the higher your cache hit rate and the lower your effective cost.
How $0 Cached Tokens Work
Every time you send a request, Tensormesh checks whether the input tokens have already been computed and stored in the KV cache. If they have, those tokens are served instantly — at zero cost to you.First Request
Input tokens are computed and stored in the KV cache. You pay the standard input token rate.
Subsequent Requests
Matching input tokens are served from cache. You pay $0 for those tokens.
What Gets Cached
- System messages and instructions
- Shared conversation prefixes
- Repeated context windows (e.g. long documents passed in every request)
- Common prompt templates
Maximizing Your Cache Hit Rate
Use consistent system messages
Use consistent system messages
Keep your system prompt identical across requests. Even a single character change creates a cache miss. Use a fixed template and only vary the user message.
Put stable content first
Put stable content first
Structure prompts so that the content that changes least (system prompt, static context) comes before content that varies (the user’s latest message). The cache matches from the start of the prompt.
Standardize prompt templates
Standardize prompt templates
Implement shared prompt templates across your application. Consistent prefixes across different users or sessions all benefit from the same cached tokens.
Monitor your hit rate
Monitor your hit rate
Check your cache hit rate under Operations → Serverless Usage. If it’s low, look for variability in the early parts of your prompts — that’s where cache misses happen.
External Storage
By default, the KV cache is in-memory and scoped to a single request window. External Storage extends the cache to a persistent bucket that survives across requests, sessions, and even across different users in your application.External Storage is available as a paid subscription tier. Navigate to Operations → Storage to view plans and subscribe.
What External Storage Does
Without external storage, every new session starts cold — tokens that were cached in a previous session need to be recomputed. With a storage bucket, that context is persisted.Persistent KV Cache
Cache entries survive beyond a single request window. Long system prompts and shared context are stored across sessions.
Cross-Session Savings
Users returning to an existing conversation, or requests sharing a common prefix, all benefit from cached tokens — even hours later.
Faster Cold Starts
New sessions that share context with previous ones skip recomputation entirely, reducing time-to-first-token.
Works Automatically
Once your bucket is active, caching is handled transparently. No changes to your API calls required.
Storage Tiers
Storage plans are tiered by bucket size. A free tier (0 GB) is available by default — you only pay when you subscribe to a plan.| Plan | Best For |
|---|---|
| Bronze | Getting started — low to moderate request volume |
| Silver | Agentic developers — more headroom for parallel workloads |
| Gold | Production-scale inference — high volume and large system prompts |
Billing Details
Billing Frequency
Monthly billing cycles with detailed usage reports
Cost Tracking
Real-time cost visibility under Operations → Serverless Usage
Usage Analytics
Per-model breakdown of input, output, and cached token counts
Storage Subscription
Flat monthly rate per storage tier, billed separately from token usage
Frequently Asked Questions
Are cached tokens really free?
Are cached tokens really free?
Yes. When a token is served from the KV cache, the cached token price is $0.00. You are only charged the standard input token rate for tokens that need to be freshly computed.
How do I know how many tokens are being cached?
How do I know how many tokens are being cached?
Navigate to Operations → Serverless Usage to see your input, output, and cached token counts per model and per period.
Do I need External Storage to get $0 cached tokens?
Do I need External Storage to get $0 cached tokens?
No. In-memory caching is active by default and cached tokens are always $0.00. External Storage extends caching across sessions and increases your cache hit rate, but it is not required to benefit from $0 cached tokens.
What happens if my cache hit rate is 0%?
What happens if my cache hit rate is 0%?
You pay only the standard input and output token rates. There are no additional charges. A 0% cache hit rate just means you’re not saving anything yet — structuring prompts with consistent prefixes is the fastest way to improve it.
Are there any hidden fees?
Are there any hidden fees?
Can I predict my costs?
Can I predict my costs?
Check the per-model rates on the Deploy → Serverless page, estimate your input and output token counts, and factor in your expected cache hit rate. Your Serverless Usage history is a good baseline for projecting future spend.

