Pricing Overview - Tensormesh User Documentation

Tensormesh uses a transparent, usage-based pricing model designed to align with your success. You only pay for what you use, and our pricing reflects the value you receive.

Serverless Token Pricing

Serverless inference is billed on a per-token basis. Each model has three token price tiers:

Token Type	What It Is	Cost
Input Token	Tokens in your prompt (system message, user message, context)	Per-model rate
Output Token	Tokens generated in the model’s response	Per-model rate
Cached Token	Input tokens served from Tensormesh’s KV cache	$0.00

Cached tokens are free. When Tensormesh serves a token from its KV cache instead of recomputing it, you are not charged. The more your requests share common prefixes — system messages, conversation history, shared context — the higher your cache hit rate and the lower your effective cost.

Current per-model rates are displayed on the Deploy → Serverless page. Each model card shows the exact input, output, and cached token price.

How $0 Cached Tokens Work

Every time you send a request, Tensormesh checks whether the input tokens have already been computed and stored in the KV cache. If they have, those tokens are served instantly — at zero cost to you.

First Request

Input tokens are computed and stored in the KV cache. You pay the standard input token rate.

Subsequent Requests

Matching input tokens are served from cache. You pay $0 for those tokens.

What Gets Cached

System messages and instructions
Shared conversation prefixes
Repeated context windows (e.g. long documents passed in every request)
Common prompt templates

Maximizing Your Cache Hit Rate

Use consistent system messages

Keep your system prompt identical across requests. Even a single character change creates a cache miss. Use a fixed template and only vary the user message.

Put stable content first

Structure prompts so that the content that changes least (system prompt, static context) comes before content that varies (the user’s latest message). The cache matches from the start of the prompt.

Standardize prompt templates

Implement shared prompt templates across your application. Consistent prefixes across different users or sessions all benefit from the same cached tokens.

Monitor your hit rate

Check your cache hit rate under Operations → Serverless Usage. If it’s low, look for variability in the early parts of your prompts — that’s where cache misses happen.

External Storage

By default, the KV cache is in-memory and scoped to a single request window. External Storage extends the cache to a persistent bucket that survives across requests, sessions, and even across different users in your application.

External Storage is available as a paid subscription tier. Navigate to Operations → Storage to view plans and subscribe.

What External Storage Does

Without external storage, every new session starts cold — tokens that were cached in a previous session need to be recomputed. With a storage bucket, that context is persisted.

Persistent KV Cache

Cache entries survive beyond a single request window. Long system prompts and shared context are stored across sessions.

Cross-Session Savings

Users returning to an existing conversation, or requests sharing a common prefix, all benefit from cached tokens — even hours later.

Faster Cold Starts

New sessions that share context with previous ones skip recomputation entirely, reducing time-to-first-token.

Works Automatically

Once your bucket is active, caching is handled transparently. No changes to your API calls required.

Storage Tiers

Storage plans are tiered by bucket size. A free tier (0 GB) is available by default — you only pay when you subscribe to a plan.

Plan	Best For
Bronze	Getting started — low to moderate request volume
Silver	Agentic developers — more headroom for parallel workloads
Gold	Production-scale inference — high volume and large system prompts

Subscribe or change plans anytime under Operations → Storage. Billing adjusts immediately — no data is lost when upgrading.

Billing Details

Billing Frequency

Monthly billing cycles with detailed usage reports

Cost Tracking

Real-time cost visibility under Operations → Serverless Usage

Usage Analytics

Per-model breakdown of input, output, and cached token counts

Storage Subscription

Flat monthly rate per storage tier, billed separately from token usage

Frequently Asked Questions

Are cached tokens really free?

Yes. When a token is served from the KV cache, the cached token price is $0.00. You are only charged the standard input token rate for tokens that need to be freshly computed.

How do I know how many tokens are being cached?

Navigate to Operations → Serverless Usage to see your input, output, and cached token counts per model and per period.

Do I need External Storage to get $0 cached tokens?

No. In-memory caching is active by default and cached tokens are always $0.00. External Storage extends caching across sessions and increases your cache hit rate, but it is not required to benefit from $0 cached tokens.

What happens if my cache hit rate is 0%?

You pay only the standard input and output token rates. There are no additional charges. A 0% cache hit rate just means you’re not saving anything yet — structuring prompts with consistent prefixes is the fastest way to improve it.

Are there any hidden fees?

No. You pay per token for serverless inference plus an optional flat monthly rate for External Storage. No setup fees, no minimums, no per-request overhead.

Can I predict my costs?

Check the per-model rates on the Deploy → Serverless page, estimate your input and output token counts, and factor in your expected cache hit rate. Your Serverless Usage history is a good baseline for projecting future spend.

Documentation Index

​Serverless Token Pricing

​How $0 Cached Tokens Work

First Request

Subsequent Requests

​What Gets Cached

​Maximizing Your Cache Hit Rate

​External Storage

​What External Storage Does

Persistent KV Cache

Cross-Session Savings

Faster Cold Starts

Works Automatically

​Storage Tiers

​Billing Details

Billing Frequency

Cost Tracking

Usage Analytics

Storage Subscription

​Frequently Asked Questions

Serverless Token Pricing

How $0 Cached Tokens Work

What Gets Cached

Maximizing Your Cache Hit Rate

External Storage

What External Storage Does

Storage Tiers

Billing Details

Frequently Asked Questions