ASG Inference

State-of-the-art AI models with per-token pricing and instant access.

Overview

ASG Inference provides:
  • 100+ models — From fast to frontier
  • OpenAI-compatible — Drop-in replacement (see the sketch after this list)
  • Per-token billing — Pay exactly for usage
  • Automatic fallback — Reliability across providers
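
Because the API is OpenAI-compatible, existing OpenAI SDK code can generally be repointed by swapping the base URL. A minimal TypeScript sketch: the base URL and API-key variable here are assumptions for illustration, since only the MCP endpoint in the Quick Example below is documented on this page:

import OpenAI from 'openai';

// Assumed values: this page documents only the MCP endpoint, not an
// OpenAI-style base URL or auth scheme.
const openai = new OpenAI({
  baseURL: 'https://api.asgcompute.com/v1',
  apiKey: process.env.ASG_API_KEY ?? ''
});

const completion = await openai.chat.completions.create({
  model: 'asg-fast',
  messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(completion.choices[0].message.content);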

Quick Example

curl -X POST https://agent.asgcompute.com/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "inference_chat",
      "arguments": {
        "model": "asg-fast",
        "messages": [
          {"role": "user", "content": "Explain quantum computing in one sentence."}
        ]
      },
      "_meta": {
        "payment": {"tx_signature": "<your-signature>"}
      }
    }
  }'
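
The same request from TypeScript, using the built-in fetch (Node 18+ or a browser). The endpoint, payload, and payment _meta are taken verbatim from the curl example above; reading result.content assumes the shape shown in the Response section below.

const res = await fetch('https://agent.asgcompute.com/mcp', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    jsonrpc: '2.0',
    id: 1,
    method: 'tools/call',
    params: {
      name: 'inference_chat',
      arguments: {
        model: 'asg-fast',
        messages: [{ role: 'user', content: 'Explain quantum computing in one sentence.' }]
      },
      _meta: { payment: { tx_signature: '<your-signature>' } }
    }
  })
});

const { result } = await res.json();
console.log(result.content);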

Model Tiers

Tier           Best For                 Latency   Cost
asg-fast       Quick responses, chat    ~500ms    $
asg-balanced   General purpose          ~1s       $$
asg-powerful   Complex reasoning        ~2s       $$$
asg-vision     Image understanding      ~2s       $$$

Parameters

Parameter      Type      Required   Description
model          string    Yes        Model tier to use
messages       array     Yes        Conversation messages
max_tokens     number    No         Max output tokens (default: 1024)
temperature    number    No         Randomness (0-2, default: 1)
stream         boolean   No         Enable streaming (default: false)
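
Putting the optional parameters together, a sketch in the same client.callTool style used in the Streaming section below (it assumes client is an MCP client already connected to the endpoint):

// max_tokens and temperature are the optional knobs from the table above.
const result = await client.callTool('inference_chat', {
  model: 'asg-balanced',
  messages: [{ role: 'user', content: 'Summarize the benefits of per-token billing.' }],
  max_tokens: 256,   // cap the reply length (default: 1024)
  temperature: 0.2   // lower values are more deterministic (range 0-2, default: 1)
});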

Response

{
  "result": {
    "content": "Quantum computing uses quantum bits...",
    "usage": {
      "prompt_tokens": 12,
      "completion_tokens": 45,
      "total_tokens": 57
    },
    "_meta": {
      "receipt_id": "rcpt_abc123",
      "debited_usdc_microusd": 8500
    }
  }
}
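
The _meta block reports the amount debited in micro units of USDC (millionths, judging by the field name), so the 8500 above corresponds to $0.0085. A small helper for logging per-call cost, assuming a response parsed as in the examples above:

// Interpreting debited_usdc_microusd as millionths of a USDC (field-name assumption):
// 8500 micro-USDC -> 0.0085 USDC.
function debitInUsd(debitedMicroUsd: number): number {
  return debitedMicroUsd / 1_000_000;
}

const { usage, _meta } = response.result; // `response` is the parsed JSON body
console.log(`${usage.total_tokens} tokens, debited $${debitInUsd(_meta.debited_usdc_microusd)}`);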

Streaming

For real-time responses, enable streaming:

// `client` is an MCP client already connected to the endpoint,
// as in the Parameters example above.
const stream = await client.callTool('inference_chat', {
  model: 'asg-fast',
  messages: [...],  // conversation messages, as in the Quick Example
  stream: true
});

// Each chunk carries an incremental slice of the generated text.
for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}

Pricing

See Pricing for current rates.
Cost Optimization: Use asg-fast for simple tasks and reserve asg-powerful for complex reasoning.
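
One way to act on this is a small router that picks a tier per request. The length-based heuristic below is purely illustrative (an assumption, not part of the API); any signal you trust for task complexity would slot in the same way:

// Illustrative heuristic only: prompt length stands in for task complexity.
type Tier = 'asg-fast' | 'asg-balanced' | 'asg-powerful';

function pickTier(prompt: string): Tier {
  if (prompt.length < 200) return 'asg-fast';       // quick chat-style requests
  if (prompt.length < 2000) return 'asg-balanced';  // general-purpose work
  return 'asg-powerful';                            // long, reasoning-heavy prompts
}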