Cosavu API Documentation
Try the Cosavu API in your AI pipelines and workflows and see the difference immediately.
API Documentation
The Cosavu API provides programmatic access to our prompt optimization engine. Build cost-efficient LLM agents by stripping redundant tokens and enforcing a consistent prompt structure at the source.
Authentication
The Cosavu API uses Bearer authentication. Every request must include your API key in the Authorization header, in the form Authorization: Bearer YOUR_API_KEY. You can manage your API keys in the developer dashboard.
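A minimal sketch of attaching the key in Python using only the standard library. It assumes the key is stored in an environment variable named COSAVU_API_KEY (a hypothetical name, not one the API defines), and it reuses the endpoint URL from the curl example for POST /optimize. The helper build_optimize_request is also hypothetical. The request is built but not sent.

```python
import json
import os
import urllib.request
from typing import Optional

# Endpoint taken from the curl example for POST /optimize in this document.
OPTIMIZE_URL = "https://api.cosavu.com/v1/optimize"


def build_optimize_request(
    prompt: str, target_model: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST /optimize request.

    build_optimize_request is a hypothetical helper, and COSAVU_API_KEY is a
    hypothetical environment variable name chosen for this sketch.
    """
    key = api_key or os.environ.get("COSAVU_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    body = json.dumps({"prompt": prompt, "target_model": target_model}).encode("utf-8")
    return urllib.request.Request(
        OPTIMIZE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",  # Bearer scheme, as documented above
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the request to urllib.request.urlopen and decode the body with json.load. Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.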
Base URL
All API requests should be made to the following base endpoint:
https://api.cosavu.com/v1
Rate Limits
Rate limits vary by plan tier. Limits are applied per API key on a per-minute basis.
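The per-tier limits are not listed on this page, so this sketch takes the limit as a parameter. It is a client-side sliding-window limiter that allows at most max_requests in any 60-second span. Because of that guarantee, it also stays under a fixed per-minute window, and the server's exact windowing is not documented. The class PerMinuteLimiter is hypothetical, and the clock is injectable so it can be tested.

```python
import time
from collections import deque
from typing import Callable, Deque


class PerMinuteLimiter:
    """Hypothetical client-side limiter: at most `max_requests` per 60 seconds.

    Pass the per-minute limit for your own plan tier; it is not assumed here.
    """

    def __init__(self, max_requests: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of requests in the last minute

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window, else False."""
        now = self.clock()
        # Forget requests that are at least 60 seconds old.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            self.sent.append(now)
            return True
        return False
```

Check try_acquire before each call. When it returns False, wait and retry rather than sending a request that would come back as Too Many Requests.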
POST /optimize
Submits raw prompt text to the optimization engine. The engine decomposes the input into structural blocks, refines instructions, and strips redundant tokens.
curl -X POST https://api.cosavu.com/v1/optimize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a summary of the quarterly report focus on finance.",
    "target_model": "gpt-4"
  }'
Response Format
The engine returns a Prompt Intermediate Representation (IR) consisting of identified blocks and token metadata.
{
  "original_text": "...",
  "blocks": [
    {
      "block_type": "INSTRUCTION",
      "content": "Summarize Q3 financial report...",
      "original_tokens": 42,
      "optimized_tokens": 12,
      "is_compressed": true
    }
  ],
  "total_original_tokens": 42,
  "total_optimized_tokens": 12,
  "latency_ms": 284
}
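The response reports token counts both per block and in total, so you can compute the savings directly from it. With the sample values, 42 original tokens reduced to 12 is a reduction of (42 - 12) / 42, or about 71.4%. This is a minimal sketch that assumes the field names shown in the example response. The helpers token_savings and per_block_savings are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def token_savings(ir: Dict[str, Any]) -> float:
    """Hypothetical helper: fraction of tokens removed, from the IR's totals."""
    original = ir["total_original_tokens"]
    optimized = ir["total_optimized_tokens"]
    if original == 0:
        return 0.0  # empty input: nothing was removed
    return (original - optimized) / original


def per_block_savings(ir: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Hypothetical helper: tokens saved in each block, paired with its block_type."""
    return [
        (b["block_type"], b["original_tokens"] - b["optimized_tokens"])
        for b in ir["blocks"]
    ]


# Trimmed copy of the example response above.
sample = {
    "blocks": [
        {"block_type": "INSTRUCTION", "original_tokens": 42,
         "optimized_tokens": 12, "is_compressed": True},
    ],
    "total_original_tokens": 42,
    "total_optimized_tokens": 12,
}

print(f"{token_savings(sample):.1%}")  # prints 71.4%
print(per_block_savings(sample))       # prints [('INSTRUCTION', 30)]
```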
The block_type field takes one of the following values:
IDENTITY: System persona definitions
DATA: Background data/knowledge
INSTRUCTION: Specific action requests
CONSTRAINT: Formatting or safety rules
EXAMPLE: Few-shot demonstrations
OUTPUT: Format specifications
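On the client side, the six block types map naturally to an enum. This sketch groups the contents of an IR's blocks by type. The names BlockType and group_blocks are hypothetical. The sketch assumes the block_type strings are exactly the six listed on this page.

```python
from collections import defaultdict
from enum import Enum
from typing import Any, Dict, List


class BlockType(str, Enum):
    """Hypothetical client-side enum of the block types listed in this documentation."""
    IDENTITY = "IDENTITY"        # system persona definitions
    DATA = "DATA"                # background data/knowledge
    INSTRUCTION = "INSTRUCTION"  # specific action requests
    CONSTRAINT = "CONSTRAINT"    # formatting or safety rules
    EXAMPLE = "EXAMPLE"          # few-shot demonstrations
    OUTPUT = "OUTPUT"            # format specifications


def group_blocks(blocks: List[Dict[str, Any]]) -> Dict[BlockType, List[str]]:
    """Group block contents by type, keeping their original order.

    BlockType(...) raises ValueError for a block_type not listed above.
    """
    grouped: Dict[BlockType, List[str]] = defaultdict(list)
    for block in blocks:
        grouped[BlockType(block["block_type"])].append(block["content"])
    return dict(grouped)
```

The sketch raises an error on unknown types so that it fails loudly. If the engine later adds new types, a more tolerant client could skip or log unrecognized values instead.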
Error Handling
Bad Request: Usually caused by an empty prompt or invalid JSON.
Unauthorized: The API key is missing or invalid.
Too Many Requests: The rate limit for your tier has been exceeded.
Internal Error: The optimization cluster timed out or malfunctioned.
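A reasonable client policy is to retry only the transient errors, Too Many Requests and Internal Error, with exponential backoff, and to fail immediately on Bad Request and Unauthorized, since resending the same request cannot fix those. This page does not give numeric status codes. The sketch assumes the standard HTTP codes for these reason phrases: 400, 401, 429 and 500. The send callable, call_with_retry and CosavuError are all hypothetical names. send stands for whatever function performs the HTTP call and returns (status, body).

```python
import time
from typing import Callable, Tuple

# Assumption: the errors above use the standard HTTP status codes.
# 400 Bad Request and 401 Unauthorized are not retried.
RETRYABLE = {429, 500}  # Too Many Requests, Internal Error


class CosavuError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status: int, body: str) -> None:
        super().__init__(f"HTTP {status}: {body}")
        self.status = status


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call the hypothetical `send` and retry 429/500 with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if 200 <= status < 300:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise CosavuError(status, body)
        sleep(base_delay * (2 ** attempt))  # waits 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")
```

The sleep function is injectable so the backoff schedule can be tested without real waiting.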
GET /health
Returns the current operational status of the optimization clusters.
{
  "status": "ok",
  "engine": "Cosavu-Stan-1",
  "version": "1.2.0"
}
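Before routing traffic to the optimizer, a client can check this endpoint. The only status value shown on this page is "ok", so this sketch treats anything else as unhealthy. That includes a non-JSON body, such as an HTML error page from a proxy. The helper is_healthy is hypothetical.

```python
import json


def is_healthy(body: str) -> bool:
    """Hypothetical helper: True only if a GET /health body reports status "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # not JSON, e.g. an HTML error page
    return isinstance(payload, dict) and payload.get("status") == "ok"
```

If the check fails, a caller might fall back to sending the unoptimized prompt rather than blocking the request.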
Best Practices
Explicit Structure Separation
The optimizer works best when background data is clearly distinct from instructions. Use clear headers like DATA: or INFO: in your prompts.
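One way to keep data and instructions apart is to assemble the prompt from labelled sections. DATA: comes from the advice above. Using INSTRUCTION: as a second header is an assumption borrowed from the block type names, and this page does not say whether the engine treats it specially. The helper build_structured_prompt is hypothetical.

```python
from typing import Dict


def build_structured_prompt(sections: Dict[str, str]) -> str:
    """Hypothetical helper: join sections as 'LABEL:' + newline + content, blank-line separated.

    Sections appear in insertion order (dicts preserve it in Python 3.7+).
    """
    parts = [f"{label.upper()}:\n{content.strip()}" for label, content in sections.items()]
    return "\n\n".join(parts)


prompt = build_structured_prompt({
    "DATA": "<quarterly report text>",
    "INSTRUCTION": "Summarize the quarterly report, focusing on finance.",
})
print(prompt)
```

This prints the DATA section first, then a blank line, then the INSTRUCTION section, each under its own header line.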
Target Specific Models
Setting the target_model parameter allows the engine to strip tokens known to be redundant for that specific architecture (e.g., removing excessive formatting for GPT-4).
Latency Management
Optimization typically takes 200 to 500 ms. For real-time chat applications, we recommend optimizing system prompts asynchronously or once during agent initialization, rather than on every user turn.
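Following this advice, the optimization latency can be paid once per distinct system prompt. This sketch caches results keyed by prompt and target model. It assumes a caller-supplied optimize function that calls POST /optimize and returns the optimized prompt as a string. The page does not specify how to turn the returned blocks into a single prompt, so that step is left to the hypothetical callable. SystemPromptCache is also hypothetical.

```python
from typing import Callable, Dict, Tuple


class SystemPromptCache:
    """Hypothetical cache: optimize each (system prompt, model) pair only once.

    `optimize` is a caller-supplied function that sends text to POST /optimize
    and returns the optimized prompt string.
    """

    def __init__(self, optimize: Callable[[str, str], str]) -> None:
        self._optimize = optimize
        self._cache: Dict[Tuple[str, str], str] = {}

    def get(self, system_prompt: str, target_model: str) -> str:
        key = (system_prompt, target_model)
        if key not in self._cache:
            # Only the first lookup for this pair incurs the optimization latency.
            self._cache[key] = self._optimize(system_prompt, target_model)
        return self._cache[key]
```

Call get once at agent start-up so that later user turns hit the cache. To keep start-up non-blocking, you can submit that first call to a concurrent.futures.ThreadPoolExecutor. The sketch has no lock, so two concurrent first lookups for the same prompt may both call optimize.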