
ASG Inference

Access state-of-the-art AI models with per-token pricing and instant access.

Overview

ASG Inference provides:
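  • 100+ models: GPT-5.2, Claude Sonnet 4, Gemini 2.5 Pro, DeepSeek R1, and more
  • OpenAI-compatible: OpenAI-style chat arguments (model, messages, max_tokens, temperature, stream), usable as a drop-in replacement
  • Per-token billing: pay only for the tokens you use
  • Automatic fallback: requests fall back across providers for reliability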

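Quick Example

Inference is called through the MCP endpoint as a JSON-RPC tools/call request to the inference_chat tool: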

curl -X POST https://agent.asgcompute.com/mcp \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "inference_chat",
      "arguments": {
        "model": "openai/gpt-4o-mini",
        "messages": [
          {"role": "user", "content": "Explain quantum computing in one sentence."}
        ]
      }
    }
  }'
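The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the endpoint and JSON-RPC envelope exactly as in the curl command above; the helper names (make_payload, build_request, send) are hypothetical and not part of any ASG SDK.

```python
import json
import urllib.request

# Endpoint taken from the curl example; replace YOUR_API_KEY with a real key.
MCP_URL = "https://agent.asgcompute.com/mcp"


def make_payload(model, messages, request_id=1, **options):
    """Build the JSON-RPC 2.0 tools/call envelope for the inference_chat tool."""
    arguments = {"model": model, "messages": messages}
    arguments.update(options)  # e.g. max_tokens=256, temperature=0.2
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "inference_chat", "arguments": arguments},
    }


def build_request(api_key, payload, url=MCP_URL):
    """Wrap the payload in a POST request carrying the same headers as the curl call."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON-RPC response (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    payload = make_payload(
        "openai/gpt-4o-mini",
        [{"role": "user", "content": "Explain quantum computing in one sentence."}],
    )
    req = build_request("YOUR_API_KEY", payload)
    print(json.dumps(payload, indent=2))
    # Uncomment to actually call the API:
    # print(send(req)["result"]["content"])
```

Building the request separately from sending it keeps the payload easy to inspect or log before any tokens are billed.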

Available Models

Pass the full model identifier in the model parameter:
Model                        Best For                    Relative cost
openai/gpt-4o-mini           Quick responses, chat       $
openai/gpt-4.1               General purpose             $$
openai/gpt-5.2               Complex reasoning           $$$
anthropic/claude-sonnet-4    Coding, analysis            $$$
google/gemini-2.5-pro        Multimodal, long context    $$
deepseek/deepseek-r1         Math, reasoning             $$
Use the Quote response to see exact per-token pricing for any model before execution.
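With per-token pricing, the cost of a call is prompt tokens times the input price plus completion tokens times the output price. Before execution the completion length is unknown, but max_tokens caps it, which gives an upper bound. A minimal sketch, assuming separate input and output per-token prices in USD that you have already read from a Quote (the Quote format is not described here); the function names are hypothetical and the prices in the example are placeholders, not ASG rates.

```python
from decimal import Decimal


def call_cost_usd(prompt_tokens, completion_tokens, input_price, output_price):
    """Cost of one call from token counts and per-token prices in USD.

    Assumes separate input (prompt) and output (completion) prices.
    Pass prices as strings or Decimals so they stay exact.
    """
    return prompt_tokens * Decimal(input_price) + completion_tokens * Decimal(output_price)


def worst_case_cost_usd(prompt_tokens, max_tokens, input_price, output_price):
    """Upper bound before execution: the completion uses at most max_tokens tokens."""
    return call_cost_usd(prompt_tokens, max_tokens, input_price, output_price)


if __name__ == "__main__":
    # Placeholder prices (USD per token), NOT actual ASG rates.
    in_price, out_price = "0.00000015", "0.0000006"
    print(call_cost_usd(12, 45, in_price, out_price))          # token counts from the Response example
    print(worst_case_cost_usd(12, 1024, in_price, out_price))  # documented default max_tokens
```

Lowering max_tokens tightens the worst-case bound, which is useful when budgeting many calls in advance.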

Parameters

Parameter     Type      Required   Description
model         string    Yes        Model identifier (see table above)
messages      array     Yes        Conversation messages, each with a role and content
max_tokens    number    No         Maximum number of output tokens (default: 1024)
temperature   number    No         Sampling randomness, from 0 to 2 (default: 1)
stream        boolean   No         Enable streaming (default: false)
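Because the optional parameters have defaults, a client only needs to send the ones it sets explicitly. The sketch below builds the arguments object and checks the documented constraints locally (temperature from 0 to 2, non-empty messages with role and content) so malformed requests fail before they reach the API. It is a hypothetical helper, not part of an ASG SDK; the provider/model format check and the positive-integer check on max_tokens are assumptions, since no upper limit is documented here.

```python
def chat_arguments(model, messages, *, max_tokens=None, temperature=None, stream=None):
    """Build the inference_chat arguments object, omitting unset optional fields.

    Omitted fields fall back to the documented defaults:
    max_tokens=1024, temperature=1, stream=false.
    """
    # Assumption: full identifiers always look like "provider/model".
    if not isinstance(model, str) or "/" not in model:
        raise ValueError("model must be a full identifier such as 'openai/gpt-4o-mini'")
    if not messages:
        raise ValueError("messages must be a non-empty list")
    for m in messages:
        if not isinstance(m, dict) or "role" not in m or "content" not in m:
            raise ValueError("each message needs 'role' and 'content'")

    args = {"model": model, "messages": list(messages)}
    if max_tokens is not None:
        if not isinstance(max_tokens, int) or max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        args["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        args["temperature"] = temperature
    if stream is not None:
        args["stream"] = bool(stream)
    return args
```

Leaving unset fields out, rather than filling in the defaults client-side, means any future change to the server defaults is picked up automatically.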

Response

{
  "result": {
    "content": "Quantum computing uses quantum bits...",
    "usage": {
      "prompt_tokens": 12,
      "completion_tokens": 45,
      "total_tokens": 57
    },
    "_meta": {
      "receipt_id": "rcpt_abc123",
      "debited_usdc_microusd": 2400
    }
  }
}
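The _meta block ties each call to a receipt and reports what was charged. Assuming debited_usdc_microusd is denominated in millionths of a US dollar (USDC), as its name suggests, the 2400 in the example corresponds to $0.0024. A minimal sketch for unpacking a response, with hypothetical helper names; it also checks for the JSON-RPC 2.0 error member, which the protocol returns in place of result when a call fails.

```python
from decimal import Decimal

MICRO = Decimal(1_000_000)  # micro-units per dollar (assumed from the field name)


def microusd_to_usd(amount):
    """Convert an integer amount in micro-USD (millionths of a dollar) to dollars."""
    return Decimal(amount) / MICRO


def summarize(response):
    """Extract the text, token usage, and charge from an inference_chat response.

    Raises RuntimeError if the server returned a JSON-RPC 2.0 error object
    instead of a result.
    """
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    result = response["result"]
    usage = result.get("usage", {})
    meta = result.get("_meta", {})
    return {
        "content": result["content"],
        "total_tokens": usage.get("total_tokens"),
        "receipt_id": meta.get("receipt_id"),
        "cost_usd": microusd_to_usd(meta.get("debited_usdc_microusd", 0)),
    }


if __name__ == "__main__":
    sample = {
        "result": {
            "content": "Quantum computing uses quantum bits...",
            "usage": {"prompt_tokens": 12, "completion_tokens": 45, "total_tokens": 57},
            "_meta": {"receipt_id": "rcpt_abc123", "debited_usdc_microusd": 2400},
        }
    }
    print(summarize(sample))
```

Decimal avoids the rounding noise that floats would introduce when summing many small charges.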

Streaming

To receive output as it is generated, set "stream": true in the arguments object. Streaming responses are delivered as Server-Sent Events (SSE).
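Server-Sent Events arrive as lines of "field: value" text, with a blank line ending each event. The sketch below is a simplified parser for the SSE wire format (data, event, and comment lines; id and retry are ignored) that works on any iterable of text lines, such as a decoded HTTP response. What each event's data contains for inference_chat is not documented here, so the parser yields raw strings; iter_sse_events is a hypothetical helper name.

```python
def iter_sse_events(lines):
    """Yield (event_type, data) pairs from an iterable of SSE text lines.

    Simplified from the WHATWG Server-Sent Events parsing rules: multiple
    data lines are joined with newlines, lines starting with ':' are
    comments, and a blank line dispatches the pending event.
    """
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:  # only dispatch events that carried data
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, sep, value = line.partition(":")
        if sep and value.startswith(" "):
            value = value[1:]  # drop one optional leading space
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        # id, retry, and unknown fields are ignored in this sketch
    # Per the spec, an event left unterminated at end of stream is discarded.


if __name__ == "__main__":
    sample = [": keep-alive\n", "data: first chunk\n", "\n", "data: second\n", "data: chunk\n", "\n"]
    for event, data in iter_sse_events(sample):
        print(event, repr(data))
```

With a live response from urllib.request.urlopen, iterating the response yields byte lines, so a generator such as (b.decode("utf-8") for b in resp) can be passed straight in. If the data payloads turn out to be JSON, decode each one with json.loads.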

Pricing

See Pricing for current rates.
Cost Optimization: Use lightweight models like openai/gpt-4o-mini for simple tasks and reserve frontier models for complex reasoning.
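One way to apply this advice is a small routing table that defaults to the lightweight model and escalates only for task types that need more capability. The mapping below is a hypothetical example assembled from the Available Models table, not an ASG feature; the task category names are illustrative.

```python
# Hypothetical task-to-model routing, based on the "Best For" column of the models table.
ROUTES = {
    "chat": "openai/gpt-4o-mini",
    "general": "openai/gpt-4.1",
    "coding": "anthropic/claude-sonnet-4",
    "long_context": "google/gemini-2.5-pro",
    "math": "deepseek/deepseek-r1",
    "complex_reasoning": "openai/gpt-5.2",
}
DEFAULT_MODEL = ROUTES["chat"]  # lightweight model for simple or unclassified tasks


def pick_model(task):
    """Return a model identifier for a task category, defaulting to the cheapest model."""
    return ROUTES.get(task, DEFAULT_MODEL)
```

Defaulting unknown categories to the cheapest model keeps spending low unless a caller deliberately asks for a frontier model.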