Fundamentals

From GPT-2 to Today: How Prompt Engineering Evolved

10 min read · By Hans Kuepper, Founder of PromptQuorum, a multi-model AI dispatch tool

The history of prompt engineering from GPT-3 and few-shot prompting in 2020 to context design in 2026. Key milestones, papers, and turning points.

How Prompt Engineering Evolved: A Short Overview

Prompt engineering evolved from informal trial-and-error text manipulation around GPT-3 in 2020 to a structured discipline with named techniques, frameworks, and tools by 2026. The arc spans five phases: early few-shot experiments, the ChatGPT moment that brought the skill into mainstream awareness, the development of structured reasoning techniques, the rise of automated prompt optimisation, and the current shift toward context design.

The discipline did not emerge from a single paper or company. It grew from the overlap between research (few-shot learning, chain-of-thought reasoning, RAG), practitioner communities sharing prompt collections online, and the sudden public availability of powerful models that made good prompting immediately rewarding. By 2026, prompt engineering is no longer a niche trick — it is a baseline skill for anyone working with AI systems.

Key Takeaways

  • 2019–2020: GPT-2 and early transformers — prompts were inputs, not a discipline
  • 2020: GPT-3 and Brown et al. introduced few-shot prompting as a paradigm shift (see also Liu et al., "Pre-train, Prompt, and Predict", arXiv:2107.13586)
  • 2022: Chain-of-Thought reasoning prompts turned prompting into a structured skill
  • Late 2022: ChatGPT brought prompt engineering into mainstream awareness and job postings
  • 2023: GPT-4, multimodal prompting, and frameworks formalised best practices
  • 2024–2026: Context design, automated prompting, and open-source LLMs redefined the field

Before Prompt Engineering Had a Name (Pre-2020)

Before the term "prompt engineering" existed, researchers were already manipulating model inputs to elicit better outputs — they just did not call it that. Early transformer models like GPT-2 (2019, OpenAI) and BERT (2018, Google), built on the foundational Vaswani et al. paper "Attention Is All You Need" (arXiv:1706.03762, 2017), were used through carefully chosen input text, but the practice was treated as part of data preprocessing, not a skill in its own right.

GPT-2, which OpenAI released in stages through 2019 (the full 1.5-billion-parameter model arrived in November), could complete text in surprisingly coherent ways. Researchers and early practitioners noticed that the phrasing of an input dramatically changed the quality of the completion — but there was no framework, no terminology, and no community built around this observation yet. Prompts were inputs, not engineering artifacts.

2020: GPT-3 and the Few-Shot Breakthrough

The modern history of prompt engineering effectively begins with GPT-3. In May 2020, OpenAI released GPT-3, a 175-billion-parameter model, alongside the landmark paper by Brown et al., "Language Models are Few-Shot Learners" (2020). The paper demonstrated that by including a few examples of the desired task directly in the prompt — without any weight updates to the model — performance on downstream tasks improved dramatically.

This was the seed of prompt engineering as a discipline. Researchers and developers realised that the same model could be turned into a translator, a summariser, a code generator, or a question-answering system simply by changing how the prompt was written. The model did not need retraining — it needed a better prompt. That insight reframed what a prompt was: not just an input, but a design artifact.
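The reframing is easy to see in code. Below is a minimal sketch of few-shot prompt assembly, with invented sentiment-classification examples (none of this is taken from the Brown et al. paper itself):

```python
# Few-shot prompting: task examples go directly into the input text;
# the model's weights are never touched.
def build_few_shot_prompt(examples, query):
    """Assemble a prompt from (input, output) example pairs plus a new query."""
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")  # the model completes this line
    return "\n".join(lines)

examples = [
    ("The film was a delight from start to finish.", "positive"),
    ("Two hours of my life I will never get back.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Surprisingly moving and well acted.")
```

Swapping the example pairs retargets the same model to a different task, which is the practical core of the few-shot insight.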

Brown et al. reported that few-shot performance scaled consistently with model size: the 175B GPT-3 model substantially outperformed smaller variants across the benchmarks tested, establishing that scale and prompt-based learning were directly linked. This made the quality of the prompt a variable that practitioners — not just researchers — could control.

See Zero-Shot vs. Few-Shot: Which Approach Gets Better Results? for a practical guide to the technique GPT-3 made famous.

2021–Early 2022: From Prompt Tricks to a Recognised Skill

Between 2021 and early 2022, prompt crafting moved from research papers into practitioner communities. GitHub repositories with curated prompt collections appeared — "awesome-prompts" style lists that shared what worked for coding assistance, summarisation, and creative writing. Prompt collections shared on Twitter and Reddit became community assets. The Prompt Engineering Guide (promptingguide.ai) became one of the first dedicated references cataloguing techniques systematically.

The term "prompt engineering" began appearing more frequently in research papers, blog posts, and job descriptions through this period. OpenAI's InstructGPT paper (Ouyang et al., 2022) introduced RLHF-tuned models that responded far more reliably to natural-language instructions — making prompt quality even more consequential. By mid-2022, it was clear that this was a transferable skill, not just a researcher's curiosity.

2022: Chain-of-Thought and Reasoning Prompts

The introduction of Chain-of-Thought (CoT) prompting in 2022 was the most significant technical development in the discipline's short history. Wei et al. (Google Brain) published "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models", demonstrating that asking a model to reason step by step before answering dramatically improved performance on arithmetic, commonsense reasoning, and symbolic reasoning tasks. In one headline result, chain-of-thought prompting improved PaLM's accuracy on the GSM8K grade-school maths benchmark from 17.9% to 58% — a gain achieved purely by changing the prompt structure, with no additional model training. The implication was profound: the structure of the prompt could activate different reasoning behaviour — not just different facts.

Related techniques followed quickly. Zhou et al. introduced least-to-most prompting, which decomposed complex problems into a sequence of simpler sub-problems solved in order. These approaches turned prompt engineering from a formatting exercise into a tool for eliciting structured reasoning from models that had not been explicitly trained to reason that way. Prompting had become scaffolding for cognition.
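The structural shift is small enough to sketch in a few lines. Here is a minimal chain-of-thought prompt builder with an invented worked example (illustrative, not taken from the Wei et al. paper):

```python
# Chain-of-Thought prompting: the in-context example shows its reasoning
# steps before the final answer, cueing the model to reason the same way.
COT_EXAMPLE = (
    "Q: A shop has 23 apples. It sells 9, then receives a delivery of 14. "
    "How many apples does it have now?\n"
    "A: The shop starts with 23 apples. After selling 9 it has 23 - 9 = 14. "
    "The delivery adds 14, so 14 + 14 = 28. The answer is 28.\n"
)

def build_cot_prompt(question):
    """Prepend a worked, step-by-step example to a new question."""
    return COT_EXAMPLE + f"\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "A train has 5 carriages with 40 seats each. How many seats in total?"
)
```

A direct few-shot prompt would contain only question and answer pairs; the added reasoning text is the entire difference between the two techniques.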

For the full technique guide, see Chain-of-Thought Prompting: Make AI Show Its Reasoning and Prompt Chaining: How to Break Big Tasks Into Winning Steps.

Late 2022–2023: The ChatGPT Moment and the Prompt Engineer Job Title

The release of ChatGPT on November 30, 2022, changed the public profile of prompt engineering overnight. ChatGPT reached one million users within its first five days — confirmed by OpenAI CEO Sam Altman on Twitter in December 2022 — and 100 million monthly active users by January 2023, according to a UBS analysis cited by Reuters. Within days, millions of people were experimenting with prompts and discovering that their results varied enormously based on how they phrased requests. Tech media covered "prompt engineering" as a skill worth learning. The Oxford English Dictionary added "prompt" as a verb related to AI in 2023, and the word itself became a runner-up for word of the year in multiple rankings.

By early 2023, "prompt engineer" appeared as a job title with reported salaries of $175,000–$335,000 at companies including Anthropic, according to widely cited job postings. The role attracted significant media attention — Bloomberg, The Guardian, and The Atlantic all covered whether prompt engineering was a real career. The consensus at the time: it was a transitional role, part human-computer interface design, part subject-matter expertise, part quality assurance.

The popularisation of the phrase "prompt engineering" is sometimes attributed to various practitioners and commentators. Richard Socher, former Chief Scientist at Salesforce, is mentioned in some commentary as having helped frame the idea early. The Wikipedia article on prompt engineering provides a balanced overview of competing claims about the term's origins.

2023: GPT-4, Multimodal Prompting and Frameworks

The release of GPT-4 in March 2023 expanded prompt engineering in two directions simultaneously: larger context windows (up to 128K tokens in later versions) and multimodal inputs. Practitioners could now include images in prompts alongside text, opening prompt engineering to visual tasks — describing images, comparing diagrams, annotating charts. Early Gemini models from Google and multimodal Claude versions from Anthropic followed within months.

The same year saw the formalisation of prompt engineering best practices. OpenAI published its official prompt engineering guide ("Best Practices for Prompt Engineering"), and Google Cloud released its own prompt engineering documentation. Independent authors codified frameworks — CRAFT, CO-STAR, SPECS, RISEN, TRACE — that gave practitioners repeatable templates for structuring prompts, reducing the reliance on trial and error.

These frameworks represented the maturation of prompt engineering from a personal skill into a teachable, shareable practice. See Which Prompt Framework Should You Use? for a guide to choosing between them, and Beyond Text: How to Prompt with Images for the multimodal dimension.

2023–2024: Automated Prompt Engineering and RAG

A striking development in 2023 was research showing that LLMs could optimise prompts as well as humans could. Zhou et al. published "Large Language Models Are Human-Level Prompt Engineers" (APE), demonstrating that an LLM tasked with generating and evaluating prompt candidates could match or exceed human-written prompts on benchmark tasks. Stanford's DSPy framework (2023) took this further — allowing developers to describe what a prompt should accomplish and letting the system optimise the wording automatically.
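The APE idea reduces to a generate-and-score loop. A toy sketch follows, in which the candidate list and the scoring function are hypothetical stand-ins for what would really be LLM calls and a task-specific metric:

```python
# Automated prompt optimisation, APE-style: generate candidate prompts,
# score each against an evaluation set, keep the best.
def optimise_prompt(candidates, eval_set, score):
    """Return the candidate with the highest mean score on eval_set."""
    best, best_score = None, float("-inf")
    for candidate in candidates:
        mean = sum(score(candidate, ex) for ex in eval_set) / len(eval_set)
        if mean > best_score:
            best, best_score = candidate, mean
    return best, best_score

# Toy stand-ins: a real system would have an LLM propose candidates and
# an evaluation harness score the model's outputs under each one.
candidates = ["Summarise:", "Summarise the key decision in one sentence:"]
toy_score = lambda prompt, example: len(prompt)  # hypothetical metric
best, _ = optimise_prompt(candidates, ["doc-a", "doc-b"], toy_score)
```

The human still chooses the evaluation set and the metric; only the search over wordings is automated.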

Simultaneously, Retrieval-Augmented Generation (RAG) — originally introduced by Lewis et al. at Meta in 2020 — became a central pattern in production AI systems. RAG injected retrieved documents directly into the prompt context, grounding model outputs in real, up-to-date sources rather than requiring prompts to contain all the necessary facts. This shifted the emphasis in prompt engineering from "how do I make the model know this?" to "how do I structure the context so the model uses this correctly?"
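At the prompt level, the RAG pattern is context assembly. A minimal sketch with invented passages (the retrieval step itself, typically a vector-store lookup, is omitted):

```python
# RAG prompt assembly: retrieved passages are injected into the context
# so the model answers from the sources rather than from its weights.
def build_rag_prompt(question, passages):
    """Number the retrieved passages and place them ahead of the question."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below, citing them as [n].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "How long is the refund window?",
    ["Refunds are accepted within 30 days of purchase.",
     "Refunds require proof of purchase."],
)
```

The instruction to answer "using only the sources below" is what does the grounding; the prompt no longer needs to carry the facts itself.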

See RAG Explained: How to Ground AI Answers in Real Data and Self-Consistency Prompting: Let the AI Check Its Own Work for coverage of the key techniques from this period.

2024–2025: From Prompt Engineering to Context Design

By 2024, a new framing began to displace the simple idea of "write a better prompt." Practitioners and researchers started referring to context engineering — the practice of orchestrating what goes into the full context window: the system prompt, retrieved documents, tool outputs, conversation history, and user input, all composed deliberately to guide model behaviour. The prompt was no longer a standalone artifact; it was one layer in a designed context.

Several developments accelerated this shift. Meta's Llama 3-class models (2024) made capable open-source LLMs available for private deployment, shifting some prompt engineering from cloud APIs to local infrastructure. Context windows grew to 1 million tokens or more (Gemini 1.5 Pro), making it practical to inject entire codebases, books, or document collections into a single prompt. Agent frameworks like LangChain and AutoGen turned prompting into orchestration — one prompt triggers another model, which triggers a tool, which returns context to the next prompt.
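The layered view can be sketched as a composition function. The layer names and ordering below are illustrative conventions, not a standard, and all the content is invented:

```python
# Context engineering: the final model input is composed from several
# deliberately ordered layers; the user's prompt is only one of them.
def compose_context(system, documents, tool_outputs, history, user_input):
    parts = [f"[system]\n{system}"]
    if documents:
        parts.append("[retrieved documents]\n" + "\n".join(documents))
    if tool_outputs:
        parts.append("[tool outputs]\n" + "\n".join(tool_outputs))
    if history:
        parts.append("[conversation so far]\n" + "\n".join(history))
    parts.append(f"[user]\n{user_input}")
    return "\n\n".join(parts)

context = compose_context(
    system="You are a support assistant for an online shop.",
    documents=["Refund window: 30 days."],
    tool_outputs=["order_lookup: shipped 2 days ago"],
    history=["User: Hi", "Assistant: Hello, how can I help?"],
    user_input="Can I still return my order?",
)
```

The design decision is which layers to include and in what order, not how any single sentence is phrased.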

See Prompt Engineering from 2020 to 2025 – AI Supremacy and The Evolution of Prompt Engineering to Context Design – 2026 for external perspectives on this transition.

2026 and Beyond: Prompt Engineering as a Core Literacy

As of 2026, research and commentary increasingly describe prompt engineering not as a niche job title, but as a fundamental literacy skill for knowledge workers who use AI tools. Academic papers such as "Prompt engineering as a new 21st century skill" (Frontiers) frame structured prompting alongside reading, writing, and computation as a baseline competency for working with generative AI systems.

The role has split into two distinct tracks. The first is system and context design — the engineering of production AI systems where prompts form part of a larger architecture involving retrieval, agents, and evaluation pipelines. The second is everyday use — the ability to write clear, structured prompts that produce useful outputs without knowing the underlying architecture. Both tracks benefit from the same core principles: clear task specification, appropriate context, constraints, and output format.
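Those shared principles can be captured in a simple template. A sketch, with the field contents invented for illustration:

```python
# The four recurring elements of a structured prompt: task, context,
# constraints, and output format.
PROMPT_TEMPLATE = (
    "Task: {task}\n"
    "Context: {context}\n"
    "Constraints: {constraints}\n"
    "Output format: {output_format}"
)

prompt = PROMPT_TEMPLATE.format(
    task="Summarise the attached incident report",
    context="Audience: on-call engineers, not managers",
    constraints="Maximum five bullet points; no speculation",
    output_format="Plain-text bullet list",
)
```

Both tracks fill in the same four slots; only the surrounding machinery differs.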

What has not changed, despite more capable models and automated tools, is the fundamental principle: the clearer and more structured the input, the more reliable and useful the output. The techniques, terminology, and tooling have matured, but the core insight from the GPT-3 era remains true in 2026.

Timeline: Key Milestones in Prompt Engineering

The table below summarises the key milestones from 2018 to 2026 — the events, papers, and model releases that shaped how prompt engineering evolved into its current form.

| Year | Milestone | Why It Matters |
| --- | --- | --- |
| 2018–2019 | BERT (Google) and GPT-2 (OpenAI) released | Demonstrated transformer models could be guided by input phrasing — but no formal discipline yet |
| 2020 | GPT-3 and Brown et al., "Language Models are Few-Shot Learners" | Established few-shot prompting as a paradigm: rewriting the prompt changes the model's behaviour without retraining |
| 2022 (Jan) | InstructGPT / RLHF (Ouyang et al., OpenAI) | Models trained to follow instructions — made prompt quality far more consequential |
| 2022 (May) | Chain-of-Thought prompting (Wei et al., Google Brain) | Proved that prompt structure could elicit step-by-step reasoning — turned prompting into a cognitive scaffold |
| 2022 (Nov) | ChatGPT launch | Brought prompt engineering into mainstream awareness; millions began experimenting overnight |
| 2023 (Q1) | "Prompt engineer" job title reaches $300K+ salary postings; OED adds "prompt" as a verb | Defined prompt engineering as a recognised profession and named skill |
| 2023 (Mar) | GPT-4 release; multimodal prompting with images | Extended prompt engineering beyond text to visual inputs and large context windows |
| 2023 | Frameworks formalised: CRAFT, CO-STAR, SPECS, RISEN; official guides from OpenAI and Google | Turned prompt engineering from personal craft into teachable, shareable practice |
| 2023–2024 | APE paper (Zhou et al.) and DSPy framework — AI-optimised prompts | LLMs shown to write prompts as well as humans; automated prompt optimisation became practical |
| 2024 | Llama 3-class models; context windows exceed 1M tokens (Gemini 1.5 Pro) | Open-source LLMs for private deployment; massive context shifted focus to context engineering |
| 2025–2026 | Context design and multi-agent orchestration replace simple prompt tweaking | Prompting becomes one layer in a composed context — system-level thinking required |

How the History Shapes Today's Best Practices

Each phase of prompt engineering's evolution left a lasting deposit in current practice. The GPT-3 era gave us the core insight that model behaviour is shaped by input structure — not just content. The Chain-of-Thought era gave us explicit reasoning scaffolds: step-by-step prompting, prompt chaining, and tree-of-thought approaches. The framework era gave us reusable templates that encode best practices without requiring each practitioner to discover them from scratch.

The RAG and context-design era gave us the understanding that prompts do not exist in isolation — they are composed with retrieved data, system instructions, and tool outputs to form a full context. And the automated-prompting era reminded us that the principles of good prompting are measurable: better-structured prompts produce better outputs in ways that can be evaluated and optimised systematically.

FAQ: The Evolution of Prompt Engineering

Who first coined the term "prompt engineering"?

The exact origin is debated. The term appeared in research contexts as early as 2021 and gained wider use through 2022. Richard Socher is mentioned in some commentary as having helped frame the concept publicly, though no single person is credited with inventing it. The Wikipedia article on prompt engineering provides a balanced overview of the competing claims.

Why did prompt engineering explode in popularity after ChatGPT?

ChatGPT was the first general-purpose AI model that millions of non-researchers could use immediately, for free, without writing code. The gap between a well-crafted prompt and a vague one was visible and immediately consequential — better prompts produced usably better outputs. That feedback loop, experienced simultaneously by millions of people, turned prompt engineering from a research concept into a mass skill.

How did research papers influence real-world prompting techniques?

The transfer was unusually fast for AI research. Chain-of-Thought prompting (Wei et al., 2022) went from academic paper to widely used practitioner technique within months, partly because it required no tooling — just a change in how you wrote the prompt. Few-shot prompting from the GPT-3 paper (Brown et al., 2020) was immediately adoptable by anyone with API access. The accessibility of the techniques accelerated their spread.

Is prompt engineering becoming less important as models improve?

No — more capable models respond better to well-structured prompts, not worse. The gains from good prompting increase as the model becomes more capable of following precise instructions. What has changed is the level of prompt engineering required for simple tasks: conversational questions now require less crafting than they did in 2021. But for complex, production-grade outputs, structured prompting remains the most reliable lever available.

What is the difference between prompt engineering and context engineering?

Prompt engineering typically refers to designing the text input to a model to improve its output. Context engineering is a broader, more recent concept that refers to orchestrating everything in the model's context window: the system prompt, retrieved documents, conversation history, tool outputs, and user input — all composed deliberately. Context engineering treats the prompt as one component in a designed system, not a standalone artifact.

Will automated tools replace the need to understand prompt engineering?

Automated tools like DSPy can optimise prompt wording within defined objectives, but they require a human to specify what the objective is, what constraints apply, and how to evaluate success. Understanding prompt engineering principles remains necessary to use these tools effectively — and to diagnose when they produce the wrong outcome. Automation removes some of the manual iteration; it does not remove the need for structured thinking.

Is prompt engineering dead in 2026?

No. The discipline has shifted, not disappeared. As models grow more capable, the work moves from syntax tricks and formatting hacks to context design — structuring inputs, managing retrieval, and composing tool outputs. The job title "Prompt Engineer" is narrowing, but the underlying skill is embedded in every role that uses AI: developer, analyst, marketer, researcher. McKinsey's 2024 State of AI survey found that effective AI adoption still correlates strongly with how well users frame tasks for the model.

Do I need to learn prompt engineering if AI models keep improving?

Yes — but the focus shifts with each generation. Better models reduce the need for elaborate workarounds (special tokens, repetitive reinforcement, rigid formatting constraints) and increase the payoff for clear intent, structured context, and well-chosen examples. The fundamentals — role, context, format, constraints — remain stable across every model generation. Learning them now means the skill compounds rather than expires.

What is the difference between prompt engineering and fine-tuning?

Prompt engineering changes how you talk to a model without modifying its weights. Fine-tuning retrains a model on new data to change its behaviour permanently. Prompt engineering is faster, cheaper, and reversible — you can iterate in minutes. Fine-tuning is better when the target behaviour is consistent, high-volume, or impossible to describe reliably in a prompt. Most teams start with prompting and fine-tune only when prompting approaches a ceiling on their specific task.

Apply these techniques across 25+ AI models simultaneously with PromptQuorum.

Try PromptQuorum free →

