Key Takeaways
- Local 7B models need more explicit guidance than GPT-4o. Longer prompts, clearer instructions.
- Chain-of-thought ("Let me think step by step") improves reasoning accuracy by 10–20%.
- Always specify output format (JSON, Markdown, plain text). Unstructured outputs are unpredictable.
- Few-shot examples (1–3) work better than zero-shot for local models. More examples = better consistency.
- Role definition ("You are a Python expert") improves domain-specific responses.
How Are Local Models Different?
| Aspect | GPT-4o | Local 7B Model |
|---|---|---|
| Context window | 128K tokens | 4K–32K tokens (model-dependent) |
| Instruction following | Excellent | Good, but needs explicit instructions |
| Few-shot learning | 1–2 examples sufficient | 3–5 examples recommended |
| Reasoning complexity | Multi-step, implicit | Needs explicit step-by-step (CoT) |
| Personality consistency | Highly consistent | Drifts; restate the role in the prompt |
Chain-of-Thought: Make Models Reason
Chain-of-thought (CoT) prompting asks the LLM to show its reasoning step-by-step before answering.
Without CoT: "What is 17 × 24?" → Model answers directly, often wrong.
With CoT: "Solve this step-by-step: 17 × 24" → Model shows: 17 × 20 = 340, 17 × 4 = 68, total = 408. More accurate.
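In practice this pattern pairs a CoT prompt wrapper with a small parser that pulls the final result out of the model's reasoning. A minimal sketch; the function names and the "Answer:" delimiter are illustrative conventions, not part of any library:

```python
def cot_prompt(question: str) -> str:
    """Wrap a question so the model reasons step-by-step before answering."""
    return (
        "Solve this step-by-step. Show your reasoning, then give the "
        "final result on a line starting with 'Answer:'.\n"
        f"Question: {question}\n"
        "Thinking:"
    )

def extract_answer(response: str) -> str:
    """Pull the final answer out of a CoT-style response."""
    for line in response.splitlines():
        if line.strip().startswith("Answer:"):
            return line.split("Answer:", 1)[1].strip()
    return response.strip()  # fall back to the raw text

# Example with a mocked model response:
mock_response = "17 x 20 = 340\n17 x 4 = 68\n340 + 68 = 408\nAnswer: 408"
print(extract_answer(mock_response))  # prints 408
```

Keeping the answer on a fixed delimiter line makes the reasoning easy to log and the result easy to parse.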
# Prompt with CoT
prompt = """
You will answer a question by thinking step-by-step.
Let me think about this:
Question: Why do local LLMs require more explicit prompting than cloud APIs?
Thinking:
1. First, consider the differences in model size...
2. Then, think about training data and fine-tuning...
3. Finally, consider the architecture and inference optimization...
Answer:
"""
# This guides the model to reason through the problem
Specifying Structured Output Formats
Local models produce unpredictable outputs unless you specify format explicitly.
Example: "Extract entities from the text" might return narrative text instead of a list.
Better: "Extract entities as JSON with keys: person, location, organization".
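Even with an explicit format instruction, local models sometimes wrap the JSON in prose or a Markdown code fence, so a defensive parse is a common workaround. This helper is a sketch, not part of any specific library:

```python
import json
import re

def parse_json_output(raw: str):
    """Try to parse model output as JSON, stripping fences and prose."""
    # Strip a ```json ... ``` fence if the model added one
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)
    # Fall back to the first {...} span in the text
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        raw = match.group(0)
    return json.loads(raw)  # raises ValueError if still not valid JSON

output = 'Sure! Here you go:\n```json\n{"person": "Ada", "location": "London"}\n```'
print(parse_json_output(output))  # prints {'person': 'Ada', 'location': 'London'}
```

Letting `json.loads` raise on truly malformed output is deliberate: it turns a silent formatting failure into an error you can catch and retry.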
# Bad: ambiguous output
prompt = "Summarize this text"
# Good: explicit format
prompt = """
Summarize the text in EXACTLY 3 bullet points.
Format as a JSON list:
{
"summary": [
"- Point 1",
"- Point 2",
"- Point 3"
]
}
"""Role Definition and Persona Prompting
Telling the model to adopt a role improves domain-specific responses.
Examples:
- "You are a Python expert" β better code explanations
- "You are a medical researcher" β more detailed biomedical responses
- "You are a skeptical analyst" β more critical thinking
Few-Shot Learning for Consistency
Provide examples (few-shot) to guide the model's output style and format.
Local models benefit from 3–5 examples. Cloud models work with 1–2.
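Few-shot prompts are easy to assemble programmatically from labeled examples, which keeps the format consistent as you add or swap examples. A sketch, with an illustrative function name and arrow delimiter:

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot classification prompt from (text, label) pairs."""
    lines = ["Classify sentiment. Examples:"]
    for text, label in examples:
        lines.append(f'"{text}" -> {label}')
    lines.append(f'Now classify: "{query}"')
    lines.append("Answer:")
    return "\n".join(lines)

examples = [
    ("I love this product!", "positive"),
    ("Worst experience ever", "negative"),
    ("It's okay, nothing special", "neutral"),
]
print(few_shot_prompt(examples, "This is amazing!"))
```

Because the examples live in a plain list, you can test different example counts (3 vs. 5) without rewriting the prompt template.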
# Few-shot prompt
prompt = """
Classify sentiment. Examples:
"I love this product!" β positive
"Worst experience ever" β negative
"It's okay, nothing special" β neutral
Now classify: "This is amazing!"
Answer: """
# Model learns format and style from examples
Common Prompt Engineering Mistakes
- Verbose prompts without structure. Rambling instructions confuse local models. Be concise and explicit.
- Not using chain-of-thought. CoT improves accuracy 10–20%. Always include for reasoning tasks.
- Assuming one prompt works for all. Iterate and test. Small wording changes cause large output changes.
- Ignoring output format. Without explicit format specification, outputs are unpredictable.
- Using vague role definitions. "You are an expert" is vague. "You are a Python expert with 10 years experience" is better.
Sources
- Chain-of-Thought Paper (Wei et al.) — arxiv.org/abs/2201.11903
- Prompt Engineering Guide — github.com/dair-ai/Prompt-Engineering-Guide