
This article explains how to enable and control the reasoning features available in three leading Large Language Model (LLM) providers—OpenAI, Anthropic, and Google Vertex AI—when calling the chat/completions endpoint. For each provider, you will find an overview of the relevant parameters and a cURL example that follows the standard request format used in GEAI projects.

Providers

1. OpenAI

OpenAI’s o-series reasoning models expose an optional reasoning_effort parameter that lets you choose how many tokens the model may devote to internal reasoning. This gives you explicit control over latency and cost.

Allowed values   Effect
low              Minimal extra reasoning (fastest, cheapest)
medium           Balanced reasoning vs. speed/cost
high             Maximum reasoning effort (slowest, highest cost)

Sample cURL

curl -X POST "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $SAIA_PROJECT_APITOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o4-mini",
    "messages": [
      {
        "role": "user",
        "content": "Could you explain and compare in detail the relationships between category theory and homotopy theory in modern algebraic topology, describe concrete applications to the classification of fibrations, then outline how these ideas extend to theoretical physics—especially quantum field theory and quantum gravity—considering AdS/CFT correspondence and renormalisation methods in non‑commutative geometry, and finally discuss the philosophical consequences of these advances on the unity of physical laws?"
      }
    ],
    "stream": false,
    "temperature": 1,
    "max_completion_tokens": 100000,
    "reasoning_effort": "high"
}'
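For scripted use, the same request body can be assembled in Python before it is posted to the endpoint. This is a minimal sketch: the helper name is ours, and only the field names and allowed values come from the example and table above.

```python
import json

# Values accepted by the reasoning_effort parameter, per the table above.
ALLOWED_EFFORTS = {"low", "medium", "high"}

def build_openai_reasoning_payload(prompt, effort="medium",
                                   model="openai/o4-mini",
                                   max_completion_tokens=100000):
    """Build a chat/completions body with a reasoning_effort setting."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORTS)}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "temperature": 1,
        "max_completion_tokens": max_completion_tokens,
        "reasoning_effort": effort,
    }

payload = build_openai_reasoning_payload("Summarise category theory.", effort="high")
print(json.dumps(payload, indent=2))
# POST this body to "$BASE_URL/chat/completions" with the same headers
# as the cURL call above.
```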

2. Anthropic

Claude 3.7 Sonnet offers an Extended Thinking mode. Enable it by adding a top‑level thinking object with a budget_tokens field that sets the token budget for reasoning.

  • Range: 0 to ≈ 63,000 (Claude’s maximum output is 64,000 tokens).
  • Note: above 32,000 tokens, the model may not consume the entire budget.

Sample cURL

curl -X POST "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $SAIA_PROJECT_APITOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-7-sonnet-latest",
    "max_tokens": 20000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 16000
    },
    "messages": [
      {
        "role": "user",
        "content": "Are there infinitely many prime numbers that leave a remainder of 2 when divided by 3?"
      }
    ]
}'
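Because the thinking budget is spent inside the response budget, it is worth checking that budget_tokens stays below max_tokens before sending the request. A small Python sketch (the helper name is ours; field names match the example above, and the budget-below-max constraint is an assumption based on how the two limits interact):

```python
def build_thinking_payload(prompt, max_tokens=20000, budget_tokens=16000,
                           model="anthropic/claude-3-7-sonnet-latest"):
    """Build a chat/completions body with Extended Thinking enabled."""
    # The thinking budget counts against max_tokens, so it must be smaller.
    if not 0 <= budget_tokens < max_tokens:
        raise ValueError("budget_tokens must be >= 0 and below max_tokens")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_thinking_payload("Are there infinitely many primes of the form 3k + 2?")
```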

3. Google Vertex AI

Google’s latest model, Gemini 2.5 Flash Preview, also supports a reasoning budget via a thinking object.

  • Range: 1 to 24,000 tokens.

Sample cURL

curl -X POST "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $SAIA_PROJECT_APITOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "vertex_ai/gemini-2.5-flash-preview-04-17",
  "messages": [
    {
      "role": "user",
      "content": "A farmer buys 30 animals consisting only of chickens and cows, and when he counts their legs he gets 74—how many of the animals are cows?"
    }
  ],
  "thinking": {
      "type": "enabled",
      "budget_tokens": 10000
  },
  "stream": false,
  "temperature": 0.1
}'
Note: Although Gemini 2.5 Pro Preview is also a reasoning‑capable model, its reasoning feature is currently fixed and cannot be enabled, disabled, or tuned.
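Since the Gemini budget is capped, it can be convenient to clamp a requested value into the documented 1–24,000 range before building the thinking object. A minimal sketch (the helper name is ours):

```python
def gemini_budget(requested, low=1, high=24000):
    """Clamp a requested reasoning budget to the 1-24,000 token range
    documented above for Gemini 2.5 Flash Preview."""
    return max(low, min(requested, high))

# Reuse the clamped value in the thinking object from the example above.
thinking = {"type": "enabled", "budget_tokens": gemini_budget(10000)}
```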

 

Availability

Available since the April 2025 release.

Last update: March 2025 | © GeneXus. All rights reserved. GeneXus Powered by Globant