Tensormesh uses a transparent, usage-based pricing model: you pay only for what you use.

Serverless Token Pricing

Serverless inference is billed on a per-token basis. Each model has three token price tiers:
| Token Type | What It Is | Cost |
| --- | --- | --- |
| Input Token | Tokens in your prompt (system message, user message, context) | Per-model rate |
| Output Token | Tokens generated in the model’s response | Per-model rate |
| Cached Token | Input tokens served from Tensormesh’s KV cache | $0.00 |

Cached tokens are free. When Tensormesh serves a token from its KV cache instead of recomputing it, you are not charged. The more your requests share common prefixes (system messages, conversation history, shared context), the higher your cache hit rate and the lower your effective cost.

Current per-model rates are displayed on the Deploy → Serverless page. Each model card shows the exact input, output, and cached token price.
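
As a rough illustration of how the three prices combine, the sketch below computes the cost of a single request. The rates are placeholders, not Tensormesh prices, and `ModelRates` and `request_cost` are hypothetical names. It also assumes that the input token count covers only freshly computed tokens, with cached tokens counted separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    """Hypothetical per-token prices in USD; real rates are on the model card."""
    input_per_token: float
    output_per_token: float
    cached_per_token: float = 0.0  # cached tokens are priced at $0.00


def request_cost(rates, input_tokens, output_tokens, cached_tokens):
    """Return the USD cost of one request.

    Assumption: input_tokens counts only freshly computed prompt tokens,
    and cached_tokens counts prompt tokens served from the KV cache.
    """
    return (
        input_tokens * rates.input_per_token
        + output_tokens * rates.output_per_token
        + cached_tokens * rates.cached_per_token
    )


# Placeholder rates: $0.50 per million input tokens, $1.50 per million output tokens.
rates = ModelRates(input_per_token=0.50 / 1_000_000, output_per_token=1.50 / 1_000_000)

# Same 2,000-token prompt, first with no cache hits, then with 1,800 tokens cached.
cold = request_cost(rates, input_tokens=2_000, output_tokens=300, cached_tokens=0)
warm = request_cost(rates, input_tokens=200, output_tokens=300, cached_tokens=1_800)
print(f"cold: ${cold:.5f}  warm: ${warm:.5f}")
```

With these placeholder numbers, the output cost is the same in both cases and only the input portion shrinks as more of the prompt is served from cache.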

How $0 Cached Tokens Work

Every time you send a request, Tensormesh checks whether the input tokens have already been computed and stored in the KV cache. If they have, those tokens are served instantly — at zero cost to you.

First Request

Input tokens are computed and stored in the KV cache. You pay the standard input token rate.

Subsequent Requests

Matching input tokens are served from cache. You pay $0 for those tokens.
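
The first-request and subsequent-request behavior can be pictured with a toy model. This is a minimal sketch, not how Tensormesh implements its KV cache: `ToyPrefixCache` is a hypothetical class, words stand in for tokens, and matching is assumed to run from the start of the prompt.

```python
def shared_prefix_len(a, b):
    """Count how many leading items two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyPrefixCache:
    """Illustrative stand-in for a KV cache: remembers previously seen prompts."""

    def __init__(self):
        self._seen = []

    def bill(self, tokens):
        """Return (billed_input_tokens, cached_tokens), then store the prompt."""
        cached = max(
            (shared_prefix_len(tokens, prev) for prev in self._seen),
            default=0,
        )
        self._seen.append(list(tokens))
        return len(tokens) - cached, cached


cache = ToyPrefixCache()
system = ["You", "are", "a", "helpful", "assistant", "."]

# First request: nothing is cached yet, so every input token is billed.
first = cache.bill(system + ["What", "is", "a", "KV", "cache", "?"])

# Subsequent request: the shared system prefix is served from the cache.
second = cache.bill(system + ["How", "is", "it", "billed", "?"])

print(first, second)
```

In this toy run the second request is billed only for the tokens after the shared system prefix.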

What Gets Cached

  • System messages and instructions
  • Shared conversation prefixes
  • Repeated context windows (e.g. long documents passed in every request)
  • Common prompt templates

Maximizing Your Cache Hit Rate

  • Keep your system prompt identical across requests. Even a single character change creates a cache miss. Use a fixed template and only vary the user message.
  • Structure prompts so that the content that changes least (system prompt, static context) comes before content that varies (the user’s latest message). The cache matches from the start of the prompt.
  • Implement shared prompt templates across your application. Consistent prefixes across different users or sessions all benefit from the same cached tokens.
  • Check your cache hit rate under Operations → Serverless Usage. If it’s low, look for variability in the early parts of your prompts, since that is where cache misses happen.
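
The sketch below shows one way to apply these tips, under stated assumptions: the prompt text, `build_prompt`, and `shared_prefix_chars` are hypothetical, and character counts are used only as a rough proxy for tokens. It compares a stable-prefix layout with an anti-pattern that puts a per-request value at the very start of the prompt.

```python
import os.path

SYSTEM_PROMPT = "You are a support assistant. Answer concisely."  # hypothetical text


def build_prompt(static_context, user_message):
    """Put the least-variable content first and the latest user message last."""
    return f"{SYSTEM_PROMPT}\n\n{static_context}\n\nUser: {user_message}"


def shared_prefix_chars(a, b):
    """Length of the common leading substring (a character-level proxy for tokens)."""
    return len(os.path.commonprefix([a, b]))


doc = "Refund policy: purchases can be refunded within 30 days."
good_a = build_prompt(doc, "Can I return an item?")
good_b = build_prompt(doc, "How long do refunds take?")

# Anti-pattern: a value that changes on every request sits at the start.
bad_a = "Request 1001\n" + good_a
bad_b = "Request 2002\n" + good_b

stable = shared_prefix_chars(good_a, good_b)
unstable = shared_prefix_chars(bad_a, bad_b)
print(f"stable layout shares {stable} chars, unstable layout shares {unstable}")
```

In the stable layout the two prompts agree on everything up to the user's message. In the unstable layout they diverge within the first few characters, so almost nothing after that point can match from the start of the prompt.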

External Storage

By default, the KV cache is in-memory and scoped to a single request window. External Storage extends the cache to a persistent bucket that survives across requests, sessions, and even across different users in your application.
External Storage is available as a paid subscription tier. Navigate to Operations → Storage to view plans and subscribe.

What External Storage Does

Without external storage, every new session starts cold — tokens that were cached in a previous session need to be recomputed. With a storage bucket, that context is persisted.

Persistent KV Cache

Cache entries survive beyond a single request window. Long system prompts and shared context are stored across sessions.

Cross-Session Savings

Users returning to an existing conversation, or requests sharing a common prefix, all benefit from cached tokens — even hours later.

Faster Cold Starts

New sessions that share context with previous ones skip recomputation entirely, reducing time-to-first-token.

Works Automatically

Once your bucket is active, caching is handled transparently. No changes to your API calls required.

Storage Tiers

Storage plans are tiered by bucket size. A free tier (0 GB) is available by default — you only pay when you subscribe to a plan.
| Plan | Best For |
| --- | --- |
| Bronze | Getting started: low to moderate request volume |
| Silver | Agentic developers: more headroom for parallel workloads |
| Gold | Production-scale inference: high volume and large system prompts |

Subscribe or change plans anytime under Operations → Storage. Billing adjusts immediately, and no data is lost when upgrading.

Billing Details

Billing Frequency

Monthly billing cycles with detailed usage reports

Cost Tracking

Real-time cost visibility under Operations → Serverless Usage

Usage Analytics

Per-model breakdown of input, output, and cached token counts

Storage Subscription

Flat monthly rate per storage tier, billed separately from token usage

Frequently Asked Questions

Are cached tokens really free?

Yes. When a token is served from the KV cache, the cached token price is $0.00. You are only charged the standard input token rate for tokens that need to be freshly computed.

Where can I see my token usage?

Navigate to Operations → Serverless Usage to see your input, output, and cached token counts per model and per period.

Do I need External Storage to get $0 cached tokens?

No. In-memory caching is active by default and cached tokens are always $0.00. External Storage extends caching across sessions and increases your cache hit rate, but it is not required to benefit from $0 cached tokens.

What do I pay if my cache hit rate is 0%?

You pay only the standard input and output token rates. There are no additional charges. A 0% cache hit rate just means you’re not saving anything yet. Structuring prompts with consistent prefixes is the fastest way to improve it.

Are there any setup fees or minimums?

No. You pay per token for serverless inference plus an optional flat monthly rate for External Storage. No setup fees, no minimums, no per-request overhead.

How do I estimate my monthly cost?

Check the per-model rates on the Deploy → Serverless page, estimate your input and output token counts, and factor in your expected cache hit rate. Your Serverless Usage history is a good baseline for projecting future spend.
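
One way to turn that estimate into a number is sketched below. All figures are placeholders rather than Tensormesh prices, `estimate_monthly_cost` is a hypothetical helper, and the cache hit rate is assumed to mean the fraction of prompt tokens served from cache.

```python
def estimate_monthly_cost(
    requests_per_month,
    prompt_tokens_per_request,
    output_tokens_per_request,
    cache_hit_rate,
    input_rate_per_million,
    output_rate_per_million,
    storage_flat_rate=0.0,
):
    """Project monthly spend in USD from average per-request token counts.

    Assumption: cache_hit_rate is the fraction of prompt tokens served
    from cache, and those tokens cost $0.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")

    billed_input = requests_per_month * prompt_tokens_per_request * (1.0 - cache_hit_rate)
    output = requests_per_month * output_tokens_per_request
    token_cost = (
        billed_input * input_rate_per_million + output * output_rate_per_million
    ) / 1_000_000
    # The optional External Storage subscription is a flat monthly rate.
    return token_cost + storage_flat_rate


# Placeholder inputs: 100k requests, 3,000 prompt tokens and 400 output tokens each.
estimate = estimate_monthly_cost(
    requests_per_month=100_000,
    prompt_tokens_per_request=3_000,
    output_tokens_per_request=400,
    cache_hit_rate=0.7,
    input_rate_per_million=0.50,
    output_rate_per_million=1.50,
)
print(f"estimated monthly cost: ${estimate:.2f}")
```

Replacing the placeholder averages with figures from your Serverless Usage history, and rerunning with a few different cache hit rates, gives a range rather than a single point estimate.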