Cosavu API Documentation

Try Cosavu API for your AI Pipelines and workflows and see the change instantly.

Documentation

API Documentation

The Cosavu API provides programmatic access to our state-of-the-art prompt optimization engine. Build cost-efficient LLM agents by reducing noise and enforcing structural integrity at the source.

Authentication

Authentication

The Cosavu API uses Bearer Authentication. All requests must include an API key in the request header. You can manage your API keys in the developer dashboard.

Base URL

Endpoints Overview

All API requests should be made to the following base internal endpoint.


Rate Limits

Request & Response Examples

Rate limits vary by plan tier. Limits are applied per API key on a per-minute basis.

Plan

Requests/min

Developer

150K/Min

Enterprise

Custom

Plan

Requests/min

Developer

150K/Min

Enterprise

Custom

Plan

Requests/min

Developer

150K/Min

Enterprise

Custom

POST /optimize

Request & Response Examples

Submits raw prompt text to the optimization engine. The engine decomposes the input into structural blocks, refines instructions, and strips redundant tokens.

prompt

string (required)

target_model

string (optional)

curl -X POST https://api.cosavu.com/v1/optimize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a summary of the quarterly report focus on finance.",
    "target_model": "gpt-4"
  }'

Response Format

Request & Response Examples

The engine returns a Prompt Intermediate Representation (IR) consisting of identified blocks and token metadata.

{
  "original_text": "...",
  "blocks": [
    {
      "block_type": "INSTRUCTION",
      "content": "Summarize Q3 financial report...",
      "original_tokens": 42,
      "optimized_tokens": 12,
      "is_compressed": true
    }
  ],
  "total_original_tokens": 42,
  "total_optimized_tokens": 12,
  "latency_ms": 284

IDENTITY

System persona definitions

DATA

Background data/knowledge

INSTRUCTION

Specific action requests

CONSTRAINT

Formatting or safety rules

EXAMPLE

Few-shot demonstrations

OUTPUT

Format specifications

IDENTITY

System persona definitions

DATA

Background data/knowledge

INSTRUCTION

Specific action requests

CONSTRAINT

Formatting or safety rules

EXAMPLE

Few-shot demonstrations

OUTPUT

Format specifications

Error Handling

Request & Response Examples

400

Bad Request

Often due to empty prompt or invalid JSON.

401

Unauthorized

Invalid or missing API key.

429

Too Many Requests

Rate limit exceeded for your tier.

500

Internal Error

Optimization cluster timeout or malfunction.

400

Bad Request

Often due to empty prompt or invalid JSON.

401

Unauthorized

Invalid or missing API key.

429

Too Many Requests

Rate limit exceeded for your tier.

500

Internal Error

Optimization cluster timeout or malfunction.

400

Bad Request

Often due to empty prompt or invalid JSON.

401

Unauthorized

Invalid or missing API key.

429

Too Many Requests

Rate limit exceeded for your tier.

500

Internal Error

Optimization cluster timeout or malfunction.

GET /health

Request & Response Examples

Returns the current operational status of the optimization clusters.

{
  "status": "ok",
  "engine": "Cosavu-Stan-1",
  "version": "1.2.0"

Error Handling

Best Practices



Explicit Structure Separation

The optimizer works best when background data is clearly distinct from instructions. Use clear headers like DATA: or INFO: in your prompts.


Target Specific Models

Setting the target_model parameter allows the engine to strip tokens known to be redundant for that specific architecture (e.g., removing excessive formatting for GPT-4).


Latency Management

Optimization takes 200-500ms on average. For real-time chat applications, we recommend optimizing system prompts asynchronously or during agent initialization, rather than on every user turn.