What Is Tree-of-Thought?
π In One Sentence
Tree-of-Thought prompting instructs a model to explore multiple reasoning branches, evaluate them, and select the best one before finalizing a response.
π¬ In Plain Terms
Instead of thinking step-by-step in a single direction, you ask the model to generate 3 different approaches, compare them, pick the best, and then execute it.
Tree-of-Thought (ToT) prompting instructs a language model to explore multiple possible reasoning paths β like branches of a decision tree β evaluate each one, and then select the best path before giving a final answer. Unlike chain-of-thought prompting, which follows a single linear reasoning path, ToT explicitly generates and compares alternatives. This makes it useful for strategy, planning, and complex decision-making where exploring multiple options leads to better outcomes.
The term comes from the 2023 paper by Yao et al. from Princeton and Google DeepMind: "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" (NeurIPS 2023).
In simple terms: Chain-of-thought is like walking a single road and explaining your steps. Tree-of-Thought is like exploring a fork in the road, comparing both paths, and then committing to the one that makes more sense.
π Pro Tip
When using Tree-of-Thought, always specify the number of branches ("Generate exactly 3 approaches") and the evaluation criteria ("Compare on feasibility, cost, and time-to-market"). Without explicit criteria, the model tends to pick the first branch it generates.
What Is ReAct?
π In One Sentence
ReAct is the pattern of reasoning, taking an action, observing the result, and then adjusting your reasoning based on what you learned.
π¬ In Plain Terms
You ask the model to think about what it needs, take a specific action (like searching for information), see what it found, then decide what to do next based on the results.
ReAct (Reason + Act) is a prompting framework where the model alternates between reasoning steps ("thoughts") and actions (tool calls, searches, lookups). After each action, the model observes the result and updates its reasoning. This pattern is the foundation of modern AI agents β every time an AI tool searches the web, reads a file, or runs code, it's executing a ReAct loop. ReAct is also the pattern behind retrieval-augmented workflows β see RAG explained for how retrieval integrates with reasoning.
The pattern comes from the 2023 paper by Yao et al.: "ReAct: Synergizing Reasoning and Acting in Language Models" (ICLR 2023).
Manual ReAct format (for educational or explicit tracing):
```
Thought: What do I need to do first?
Action: search web, lookup database, run code, etc.
Observation: result of that action
Thought: Based on this result, what's my next step?
Action: next action
... (repeat until final answer)
Final Answer: synthesized conclusion
```
π Did You Know
Every time Claude Code edits a file, runs a test, and fixes an error based on the output, it's executing a ReAct loop. The Thought-Action-Observation pattern from the 2023 paper is now the backbone of autonomous AI coding tools used by millions of developers.
How They Differ
Chain-of-Thought (CoT) is a single linear reasoning path. You say "think step by step," and the model explains its logic from start to finish without branching or pausing to take actions.
Tree-of-Thought (ToT) branches reasoning. The model generates multiple paths, evaluates each one, and selects the best before finalizing.
ReAct interleaves reasoning with external actions. The model reasons, takes a concrete step (search, lookup, code execution), observes the result, and adjusts its reasoning accordingly.
Use Case Summary:
- CoT when: You need clear reasoning for a well-defined problem (math, logic, straightforward explanations)
- ToT when: You're exploring strategy, planning, or making a high-stakes decision where comparing alternatives matters
- ReAct when: You need to retrieve information, debug, or interact with tools or external systems
Comparison Table: CoT vs ToT vs ReAct
| Dimension | Chain-of-Thought (CoT) | Tree-of-Thought (ToT) | ReAct |
|---|---|---|---|
| Reasoning shape | Linear (single path) | Branching (multiple paths β select best) | Linear with tool loops |
| Core action | "Think step by step" | "Explore 3 approaches, evaluate, choose" | "Reason β Act β Observe β Repeat" |
| External tools? | No | No (internal reasoning only) | Yes β search, APIs, code execution |
| Token cost vs baseline | ~1.5-2Γ | ~2-5Γ | Variable (depends on tool calls) |
| Best for | Math, logic, explanations | Strategy, planning, creative exploration | Research, debugging, fact-checking |
| 2026 model support | All models | Best with reasoning models (Opus 4.7, o3) | Built into all frontier models via tool use |
| Manual prompting needed? | Yes (on non-reasoning models) | Yes (explicit branching structure helps) | No (native tool use), unless on open-weights |
How to Write a Tree-of-Thought Prompt
- 1State the problem and the number of branches explicitly. Example: "Generate exactly 3 approaches to problem." Being specific about the branch count helps the model explore systematically.
- 2Specify evaluation criteria before asking the model to select. Example: "Compare them on feasibility, cost, and implementation time." Define what makes one branch better than another.
- 3Have the model evaluate each branch. Ask it to score or rank the approaches: "For each approach, list the pros, cons, and risk factors."
- 4Add a selection instruction. Example: "Select the approach that best balances your criteria. Explain your choice in 2 sentences."
- 5Complete the task with the selected branch. Once the model commits to a path, have it execute with full reasoning: "Now, provide step-by-step instructions for implementing selected approach."
How to Write a ReAct Prompt
For explicit ReAct tracing (useful for education, debugging, or when you want to see every step), use this manual format:
```
Thought: What information do I need to answer this question?
Action: search for X topic, lookup Y in the database, run Z command
Observation: result of the action β paste actual data or output here
Thought: Based on this result, what's my next step?
Action: next action
Observation: result
... (repeat as needed)
Final Answer: synthesized conclusion based on all observations
```
For frontier models with native tool use (GPT-4o, Claude Opus 4.7/Sonnet 4.6, Gemini 3.1 Pro), you don't need to format this manually. Just state what you want to do: "Research the 2026 AI model landscape and compare GPT-4o, Claude Opus 4.7, and Gemini 3.1 Pro." The model will call tools automatically, observe results, and continue reasoning.
ReAct in 2026: From Prompting Pattern to Built-In Behavior
The original ReAct paper (2023) proposed the Thought-Action-Observation loop as a prompting format β a technique to structure how you instruct a model to reason and act. In 2023β2024, users had to format this manually in their prompts.
In 2026, all frontier models implement the ReAct loop automatically via native tool use / function calling. When you ask GPT-4o, Claude Opus 4.7, Gemini 3.1 Pro, or Claude Sonnet 4.6 to research a topic, run code, or look something up, the model decides when to call a tool, receives the result, and continues reasoning β no manual `Thought: / Action: / Observation:` formatting is needed. The loop happens under the hood.
When manual ReAct formatting still matters:
- Open-weights models without native tool use (e.g., LLaMA 4, Mistral, older Qwen variants). These models don't have built-in function calling, so explicit ReAct formatting can improve structured reasoning.
- Educational/debugging contexts where you want to see the full reasoning trace and every step the model takes.
- Simulated scenarios where you're setting up a mock environment with no real APIs connected.
Agentic coding as productionized ReAct: Claude Code, OpenAI Codex, and Cursor are large-scale ReAct loops. The agent reasons about what code needs to be written, edits a file, runs tests, observes the results, and fixes errors β all automatically. This is the ReAct loop at production scale.
π Warning
Tree-of-Thought prompts can generate 3-5Γ the output tokens of a standard prompt because the model writes out multiple branches before selecting one. At $25/1M output tokens (Claude Opus 4.7), a complex ToT prompt that generates 5,000 tokens costs ~$0.125 per run. Budget accordingly for high-volume use.
Tree-of-Thought and ReAct in Agentic Systems
Claude Code / OpenAI Codex / Cursor are productionized ReAct: the agent reasons about what needs to be coded β writes code β runs tests β observes errors β fixes and iterates. Multi-hour autonomous coding sessions are extended ReAct loops operating at scale.
Research agents (Perplexity, Deep Research features in Claude/ChatGPT) use ReAct: formulate question β search web β read results β synthesize answer β search again if needed.
Claude Managed Agents (launched 2026) are a fully managed ReAct harness with secure sandboxing, tool management, and built-in loop handling.
ToT in agentic planning: Some advanced agent frameworks use ToT at the planning stage β propose multiple high-level strategies, evaluate feasibility, then execute the best one via ReAct loops at each step.
MCP (Model Context Protocol) standardizes tool connections, making ReAct-style agent loops plug-and-play. You connect a tool once, and any ReAct-capable model can use it.
Prompt Examples
β Generic (no structure)
Come up with three ways to improve our customer retention. Which is best?
β ToT with explicit criteria
Generate exactly 3 strategies to improve customer retention. For each strategy, evaluate it on: (1) implementation difficulty (1-5 scale), (2) expected impact on retention (%, 6 months), and (3) cost to implement. Then, select the strategy that best balances impact and feasibility. Explain your choice.
β No explicit actions
What is the latest research on transformer scaling laws?
β ReAct structure (search, observe, synthesize)
I need to understand transformer scaling laws as of 2026. Please: (1) Search for recent papers or benchmarks on scaling laws, (2) Look for data on model size vs performance trade-offs, (3) Find information on training cost vs inference cost relationships. After gathering information, summarize the key findings.
Token Cost
Tree-of-Thought uses significantly more tokens than linear chain-of-thought because the model generates multiple branches before selecting one. Expect 2-5Γ the output tokens of a standard CoT prompt.
Example: A simple CoT prompt might generate 500 output tokens. A ToT prompt that explores 3 branches might generate 3 Γ 500 = 1,500 tokens, then maybe 200 more for the final synthesis. Total: ~1,700 output tokens.
Cost at 2026 pricing (Claude Opus 4.7: $25/1M output tokens):
- Simple CoT: 500 tokens Γ $25 / 1M = $0.0125
- Complex ToT: 5,000 tokens Γ $25 / 1M = $0.125
For high-volume use, reserve ToT for strategic, high-stakes decisions where exploring alternatives is worth the cost. For routine tasks, linear CoT or single-pass prompting is more efficient.
ReAct cost is variable based on the number of tool calls. Each action/observation round adds tokens, but the work may be worth it if the external data significantly improves the answer.
How to Get Started
- 1For strategy and planning β use Tree-of-Thought. You're making a high-stakes decision (product roadmap, investment, system architecture). Explicitly ask the model to generate 3 approaches, evaluate them on your criteria, and select the best. The extra tokens are worth the structured thinking. For a related branching technique that votes across paths instead of evaluating, see self-consistency prompting.
- 2For research, debugging, or fact-finding β use ReAct or native tool use. Ask the model to look things up, observe the results, and synthesize. On frontier models (GPT-4o, Claude Opus 4.7, Gemini 3.1 Pro), native tool use handles ReAct automatically. On open-weights, you can structure it manually if needed.
- 3Combine both techniques. Use ToT at the planning stage: "Generate 3 strategies for X. For each, list the steps needed." Then use ReAct within the chosen strategy: "For strategy selected, research the following: question 1, question 2. Observe results, then execute." For simpler multi-step tasks that don't need branching or tool use, prompt chaining is lighter and cheaper.
- 4Test both on your use case in PromptQuorum. Compare how GPT-4o, Claude Opus 4.7, Gemini 3.1 Pro, and Mistral Large handle your specific ToT or ReAct prompt. You'll see which model's reasoning style fits your task best.
Common Mistakes
β Using ToT for simple tasks
Why it hurts: ToT adds 2-5Γ token cost. For a "summarize this email" task, linear chain-of-thought is faster, cheaper, and equally accurate. ToT only pays off when the problem genuinely has multiple viable paths.
Fix: Test with chain-of-thought first. If accuracy is >90%, don't upgrade to ToT.
β Asking for too many branches
Why it hurts: "Generate 10 approaches" overwhelms the model's ability to evaluate meaningfully. Beyond 5 branches, quality of evaluation drops and the model may start generating filler options.
Fix: 3-5 branches is the sweet spot. For complex problems, use 3. For creative brainstorms, use 5.
β ReAct without real tools
Why it hurts: Simulated ReAct (where the model imagines action results) is weaker than real ReAct (where the model calls actual APIs/tools). Simulated actions still hallucinate data.
Fix: For production ReAct, use an agent framework (LangChain, CrewAI) with real tool bindings. Simulated ReAct is fine for exploration and prototyping.
β No evaluation criteria in ToT
Why it hurts: "Pick the best approach" without criteria means the model picks randomly or by default preference. Without explicit criteria, the branching evaluation step is meaningless.
Fix: Specify 3-5 evaluation criteria: "Evaluate each branch on feasibility (1-5), cost (1-5), time-to-implement (1-5). Choose the highest total score."
β Combining ToT + ReAct on every problem
Why it hurts: The combination is powerful but expensive and slow. Most problems need one technique, not both.
Fix: Use ToT for "which strategy" problems. Use ReAct for "find information and reason" problems. Combine only when you need both: "which strategy, and each strategy needs data to evaluate."
β Not specifying branch selection criteria in ToT
Why it hurts: Models often stop after generating branches but don't clearly state why they're choosing one over the others. Implicit selection is weak and hard to audit.
Fix: Require explicit reasoning: "After evaluating each branch, state: Branch A scores X on criterion Y because reason. Final choice: Branch Z because total score and rationale."
β Using ReAct without observation loops
Why it hurts: Model reasons, takes an action, then immediately continues without pausing to observe the result. This loses the benefit of real-world feedback.
Fix: Enforce the loop: "After each action, STOP and state: Observation: what you learned. Updated reasoning: how this changes your approach. Next action: what you'll do differently."
β Allowing ToT branches to drift into off-topic exploration
Why it hurts: Without clear constraints, the model may generate imaginative but irrelevant branches that don't help solve the original problem.
Fix: Set branch boundaries: "Generate 3 approaches to specific problem. Each approach must directly address constraint. Do not explore tangential ideas or side effects."
β Using the same number of branches for every problem
Why it hurts: Simple problems with 3 branches may show one dominant option and waste token budget. Complex problems with only 2 branches may miss important alternatives.
Fix: Match branch count to problem complexity: 2 branches for binary decisions, 3 for typical problems, 4-5 for open-ended creative work, 1 (just CoT) for simple tasks.
Using ToT and ReAct in PromptQuorum
PromptQuorum lets you test Tree-of-Thought and ReAct patterns side by side across GPT-4o, Claude Opus 4.7, Claude Sonnet 4.6, Gemini 3.1 Pro, and open-weights models like Mistral Large and LLaMA 4.
Write a ToT or ReAct prompt once, and PromptQuorum will send it to all models simultaneously. Watch how each one interprets the branching structure or the action-observation loop. Some models (like Claude Opus 4.7) naturally ask clarifying questions when reasoning through branches. Others (like GPT-4o) tend to be more direct. Seeing the differences helps you refine your prompting for specific use cases.
Example workflow:
1. Write a ToT prompt: "Generate 3 ways to optimize a database query. Evaluate on speed, complexity, and maintainability."
2. Send to GPT-4o, Claude Opus 4.7, and Gemini 3.1 Pro via PromptQuorum.
3. Compare results. Which model explored the most branches? Which explanation was clearest? Which trade-off analysis was most useful?
4. For your next iteration, you now know which model and which tone works best for your team.
Frequently Asked Questions
What is Tree-of-Thought prompting?
Tree-of-Thought (ToT) prompting instructs a model to explore multiple reasoning paths β like branches of a decision tree β evaluate each one, and then select the best path before giving a final answer. Unlike linear chain-of-thought, ToT explicitly generates and compares alternatives.
What is ReAct prompting?
ReAct (Reason + Act) is a prompting framework where the model alternates between reasoning steps ("thoughts") and actions (tool calls, searches, lookups). After each action, the model observes the result and updates its reasoning. This pattern is the foundation of modern AI agents.
How is Tree-of-Thought different from chain-of-thought?
Chain-of-thought follows a single linear reasoning path. Tree-of-Thought branches into multiple paths, evaluates them, and selects the best one. Think of CoT as walking a single road vs. ToT as exploring a fork in the road before choosing which way to go.
Do I still need to format ReAct manually in 2026?
For frontier models with native tool use (GPT-4o, Claude Opus 4.7, Gemini 3.1 Pro), no. These models implement the Reason-Act-Observe loop automatically via function calling APIs. Manual Thought: / Action: / Observation: formatting is still useful for open-weights models without tool use, for educational purposes, or for simulated scenarios.
Can I combine Tree-of-Thought and ReAct?
Yes. Use ToT at the strategic level to explore and compare multiple high-level approaches, then use ReAct within the chosen branch to execute steps that require tool interactions or data lookups. This is common in complex planning tasks.
Which models handle Tree-of-Thought best?
Models with extended thinking / reasoning modes handle ToT most naturally: Claude Opus 4.7 (extended thinking), GPT-4o (reasoning mode), and Gemini 3.1 Pro (Deep Think). These models can internally explore multiple branches without explicit prompt-level ToT formatting β though explicit ToT prompting can still improve structure.
What are real-world applications of ReAct?
Every modern AI agent is a ReAct loop: Claude Code (reason about code β edit β run tests β observe β iterate), research assistants (reason about question β search web β read results β synthesize), customer support bots (reason about query β look up knowledge base β draft response β verify). The pattern scales from simple lookups to multi-hour autonomous coding sessions.
How does Tree-of-Thought affect token cost?
ToT uses significantly more tokens than linear CoT because the model generates multiple branches before selecting one. Expect 2-5Γ the output tokens of a standard CoT prompt. At $25/1M output tokens (Claude Opus 4.7), a complex ToT prompt that generates 5,000 tokens costs ~$0.125 per run. Budget accordingly for high-volume use.
Sources & Further Reading
- Yao, S., Yu, D., Zhao, J., et al. (2023). "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." NeurIPS 2023. arXiv:2305.10601
- Yao, S., Zhao, J., Yu, D., et al. (2023). "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR 2023. arXiv:2210.03629
- Wei, J., Wang, X., Schuurmans, D., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." NeurIPS 2022. arXiv:2201.11903
- Shinn, N., Cassirer, A., Goyal, A., et al. (2023). "Reflexion: Language Agents with Verbal Reinforcement Learning." arXiv:2303.11366
- Anthropic. (2026). "Tool Use β Claude API Documentation." Retrieved from https://docs.anthropic.com
- OpenAI. (2026). "Function Calling β Responses API." Retrieved from https://platform.openai.com/docs