AI value generation
A transformer reads text as a sequence of tokens and works out which earlier tokens matter most for predicting the next one. The key mechanism is attention: the model can “look at” other tokens in the input and weigh how relevant each one is to the token it is processing. The core ideas are:
- Tokens: text split into small pieces (words or subword fragments)
- Embeddings: each token mapped to a vector of numbers
- Attention / self-attention: each token scores how relevant the other visible tokens are and takes a weighted mix of their vectors
- Layers: this process repeats across many stacked layers, building richer representations
- Next-token prediction: the model is trained to predict the next token
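A minimal sketch of these ideas in plain Python, assuming a toy whitespace tokenizer and hand-picked 2-dimensional embeddings (real models use subword tokenizers, learned embeddings and projections, many attention heads, and many layers). It implements single-head scaled dot-product self-attention with a causal mask, so each token attends only to itself and earlier tokens. The vocabulary, embedding values, and identity projection matrices are illustrative assumptions.

```python
import math


def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with a causal mask.

    X is a list of token vectors; Wq, Wk, Wv project each vector to a
    query, key, and value. Returns one output vector and one row of
    attention weights per token.
    """
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d_k = len(Q[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        # Causal mask: token i only scores tokens 0..i
        scores = [
            sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        # Output is the attention-weighted mix of the visible value vectors
        out = [sum(w[j] * V[j][c] for j in range(i + 1)) for c in range(len(V[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy setup: whitespace "tokenizer" and hand-picked embeddings
vocab = {"the": 0, "cat": 1, "sat": 2}
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = "the cat sat".split()
X = [embeddings[vocab[t]] for t in tokens]
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity projections keep the numbers readable

outputs, weights = causal_self_attention(X, identity, identity, identity)
for tok, w in zip(tokens, weights):
    print(tok, [round(x, 3) for x in w])
```

Each row of weights sums to 1. A full transformer layer adds a feed-forward network after attention; stacking such layers gives the "Layers" item above, and training adjusts the embeddings and projections so the final vectors predict the next token.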
Reasoning models are LLMs tuned to do more deliberate, multi-step thinking before answering. They tend to perform better on math, logic, and coding tasks.
In practice, “reasoning model” can involve a mix of:
- training methods that reward stepwise problem solving
- inference-time techniques that allocate more compute
- tool use, verification, or self-checking
- architectures or prompting patterns that improve multi-step accuracy
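One concrete inference-time technique is self-consistency: sample several independent answers and keep the most common one, trading extra compute for accuracy. The sketch below assumes a hypothetical `noisy_solver` that stands in for one sampled model answer and is right about 60% of the time on a toy addition task; a real system would instead call an LLM with nonzero sampling temperature.

```python
import random
from collections import Counter


def noisy_solver(question, rng):
    """Hypothetical stand-in for one sampled LLM answer (right ~60% of the time)."""
    correct = sum(question)  # toy task: add the numbers
    if rng.random() < 0.6:
        return correct
    return correct + rng.choice([-2, -1, 1, 2])  # a plausible-looking wrong answer


def self_consistency(question, n_samples, seed=0):
    """Sample n_samples answers and return the majority (most common) one."""
    rng = random.Random(seed)
    answers = [noisy_solver(question, rng) for _ in range(n_samples)]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer


def accuracy(n_samples, trials=200):
    """Fraction of toy questions answered correctly with n_samples votes each."""
    hits = 0
    for t in range(trials):
        question = [t, t + 1, 2]
        if self_consistency(question, n_samples, seed=t) == sum(question):
            hits += 1
    return hits / trials


print("1 sample:  ", accuracy(1))
print("15 samples:", accuracy(15))
```

More samples cost proportionally more compute and latency, which is the same trade-off a reasoning mode makes.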
How does an LLM decide whether a problem needs extra internal computation?
- The system or product chooses the mode, e.g. a router sends each request to either a thinking or a non-thinking model.
- The model learns, during training, patterns that correlate with hard problems and spends more reasoning on them.
- It may act on internal “deliberation signals”: if its early reasoning shows uncertainty, conflicting results, or many constraints, it may keep spending compute.
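A toy sketch of how the first and third mechanisms might combine, assuming a hypothetical router: a product-level keyword rule sends some requests straight to thinking mode, and otherwise the entropy of a cheap first pass's answer distribution acts as the uncertainty signal. The keywords, the 0.8-nat threshold, and the idea of a first-pass distribution over candidate answers are all illustrative assumptions; real routers may instead be learned classifiers.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; higher means the distribution is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def choose_mode(prompt, first_pass_probs, threshold=0.8,
                hard_keywords=("prove", "optimize", "debug")):
    """Hypothetical router returning "thinking" or "fast".

    prompt: the user request.
    first_pass_probs: assumed probabilities over candidate answers from a
    cheap, non-thinking first pass.
    """
    # Product-level rule: some task types always get extra compute
    if any(k in prompt.lower() for k in hard_keywords):
        return "thinking"
    # Uncertainty signal: a flat first-pass distribution means "keep thinking"
    if entropy(first_pass_probs) > threshold:
        return "thinking"
    return "fast"


print(choose_mode("What is the capital of France?", [0.97, 0.01, 0.01, 0.01]))  # fast
print(choose_mode("Which schedule satisfies all constraints?", [0.25] * 4))     # thinking
print(choose_mode("Prove that sqrt(2) is irrational.", [0.9, 0.1]))              # thinking
```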
Gemini 3 Flash latency is 7 seconds with thinking enabled versus 1 second without.
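These two figures make routing a latency trade-off. A small worked sketch, assuming each mode has a fixed latency (7 s thinking, 1 s non-thinking, as noted) and ignoring variance: if a fraction p of requests use thinking mode, mean latency is p·7 + (1 − p)·1 seconds, so a mean-latency budget B allows at most p = (B − 1)/(7 − 1) of requests to think. The function names are hypothetical.

```python
def expected_latency(p_thinking, thinking_s=7.0, fast_s=1.0):
    """Mean latency in seconds when a fraction p_thinking of requests use thinking mode."""
    if not 0.0 <= p_thinking <= 1.0:
        raise ValueError("p_thinking must be in [0, 1]")
    return p_thinking * thinking_s + (1.0 - p_thinking) * fast_s


def max_thinking_share(budget_s, thinking_s=7.0, fast_s=1.0):
    """Largest share of requests that can think while mean latency stays <= budget_s."""
    if budget_s <= fast_s:
        return 0.0
    return min(1.0, (budget_s - fast_s) / (thinking_s - fast_s))


print(expected_latency(0.2))    # ≈ 2.2 s mean latency
print(max_thinking_share(2.5))  # 0.25, i.e. a quarter of requests
```

For example, keeping mean latency at 2.5 s leaves room for thinking on a quarter of requests.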
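gpt-oss-120b (OpenAI's open-weight model with about 120 billion parameters) performs well, scoring 33 on the intelligence index in its high-reasoning setting in the table below, and is available on Vertex AI and Amazon Bedrock.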
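```sql
-- PostgreSQL: a sample of 10 runs from the wandb_llm table.
-- Without an ORDER BY, which 10 rows are returned is not guaranteed.
SELECT
    runname,
    ROUND(total_score::numeric, 2) AS total_score,  -- cast to numeric so ROUND(x, 2) is allowed
    model_size,
    -- model_release_date appears to be epoch milliseconds; to_timestamp() expects seconds
    to_timestamp(model_release_date / 1000)::date AS model_release_date
FROM wandb_llm
LIMIT 10;
```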
| name | thinking | intelligence_index | price_usd | speed | latency_s |
|---|---|---|---|---|---|
| MiMo-V2-Flash (Feb 2026) | true | 41 | 0.15 | 144 | 1.99 |
| gpt-oss-120B (high) | true | 33 | 0.26 | 282 | 0.78 |
| Qwen3.5 9B | true | 32 | 0.11 | 63 | 0.61 |
| Mistral Small 4 | true | 27 | 0.26 | 135 | 0.62 |
| gpt-oss-120B (low) | true | 24 | 0.26 | 287 | 0.75 |
| gpt-oss-20B (high) | true | 24 | 0.09 | 297 | 0.67 |
| NVIDIA Nemotron 3 Nano | true | 24 | 0.10 | 165 | 1.49 |
| gpt-oss-20B (low) | true | 21 | 0.09 | 302 | 0.66 |
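A short sketch for reading the table, assuming only two columns matter: a model is on the price/intelligence Pareto frontier if no other row has an equal or higher `intelligence_index` at an equal or lower `price_usd` while being strictly better on at least one of them. Speed and latency are ignored here, and the rows are copied from the table as-is; `MODELS`, `dominates`, and `pareto_front` are hypothetical names.

```python
# Rows copied from the table: (name, intelligence_index, price_usd)
MODELS = [
    ("MiMo-V2-Flash (Feb 2026)", 41, 0.15),
    ("gpt-oss-120B (high)", 33, 0.26),
    ("Qwen3.5 9B", 32, 0.11),
    ("Mistral Small 4", 27, 0.26),
    ("gpt-oss-120B (low)", 24, 0.26),
    ("gpt-oss-20B (high)", 24, 0.09),
    ("NVIDIA Nemotron 3 Nano", 24, 0.10),
    ("gpt-oss-20B (low)", 21, 0.09),
]


def dominates(a, b):
    """True if model a is at least as good as b on both columns and strictly better on one."""
    _, idx_a, price_a = a
    _, idx_b, price_b = b
    return idx_a >= idx_b and price_a <= price_b and (idx_a > idx_b or price_a < price_b)


def pareto_front(models):
    """Models not dominated by any other row, sorted cheapest first."""
    front = [m for m in models if not any(dominates(o, m) for o in models if o is not m)]
    return sorted(front, key=lambda m: m[2])


for name, idx, price in pareto_front(MODELS):
    print(f"{name}: intelligence_index {idx}, price_usd {price}")
```

On these rows the frontier is gpt-oss-20B (high), Qwen3.5 9B, and MiMo-V2-Flash; each of the other five is matched or beaten on both columns by some other row.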