Ollama OpenAI API: Python and Node.js Integration in 3 Steps (Code Examples + Streaming + Function Calling)

10 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool

LM Studio (localhost:1234), Ollama (localhost:11434), and vLLM (localhost:8000) all expose a REST API in the OpenAI format. By changing just two lines, you can use local models with the official OpenAI Python or Node.js SDK: set base_url to the local endpoint and api_key to any string. As of May 2026, this is the standard way to run local LLMs in Python and Node.js production applications without cloud costs or vendor lock-in.

Key Takeaways

  • Ollama serves a REST API in the same format as the OpenAI API at `http://localhost:11434/v1`.
  • With the OpenAI Python library: replace your OpenAI key with `api_key="ollama"` and set `base_url="http://localhost:11434/v1"`.
  • Node.js works the same way. Use the OpenAI SDK and connect to localhost:11434.
  • The OpenAI-compatible API is the same across Ollama, vLLM, and LM Studio. You can switch providers by changing only the base URL and model name.
  • As of May 2026, both streaming (token-by-token responses) and function calling work with local models through this API.

⚡ Quick Facts

Ollama API: `http://localhost:11434/v1`, same request and response format as OpenAI's `/chat/completions`

LM Studio API: `http://localhost:1234/v1`, same format, different port

vLLM API: `http://localhost:8000/v1`, production-grade serving

Code changes: 2 lines, `base_url` and `api_key`. The rest of the code stays the same.

Supported features: chat completions, text completions, embeddings, streaming, function calling

Authentication: none by default, localhost access only. Add a reverse proxy for network access.

Models used in the code examples: Llama 4 Scout (best quality on 12 GB, MoE) or Llama 3.2 3B (lightweight)

What does OpenAI-compatible mean?

OpenAI-compatible means the API endpoint accepts requests and returns responses in the same format as the OpenAI API. Any library or tool built for OpenAI can therefore work with a local model simply by pointing at a different URL. See Ollama vs LM Studio for how the two differ in implementing this standard.

Example: the OpenAI Python library sends a request like this:

```
POST /chat/completions
{
  "model": "gpt-4o",
  "messages": [...],
  "temperature": 0.7
}
```

Ollama's API accepts exactly the same request at `localhost:11434/v1/chat/completions` and returns its response in the OpenAI format:

```
{
  "choices": [{"message": {"content": "..."}}],
  "usage": {"prompt_tokens": 10, "completion_tokens": 20}
}
```

Because the format is identical, there is no new API to learn and no code to rewrite.
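
To make the wire format concrete, here is a minimal sketch that builds the request shown above with only the Python standard library and reads the reply field from an OpenAI-format response. The helper names (`build_chat_request`, `extract_reply`, `send`) are illustrative and not part of any SDK; `send` assumes a server is actually listening at the base URL.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request in the OpenAI wire format."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the assistant text from an OpenAI-format response body."""
    return response_body["choices"][0]["message"]["content"]


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With Ollama running, `extract_reply(send(build_chat_request("http://localhost:11434/v1", "llama3.2:3b", "What is 2+2?")))` would return the model's answer; the same call against another OpenAI-compatible server only changes the first argument.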

---

🔍 Did you know? The OpenAI API format has become the unofficial standard for LLM APIs. Anthropic (Claude), Google (Gemini), and every major local inference tool (Ollama, vLLM, LM Studio, llama.cpp) support it. Code written against this format is genuinely provider-independent. It is the closest thing the AI industry has to a universal API.

To switch from OpenAI to Ollama, change just two lines: base_url and api_key. The rest of the code stays the same.

What are Ollama's API endpoints?

**When you run `ollama serve`, Ollama starts a REST API at `http://localhost:11434`.** The OpenAI-compatible endpoints are:

| Endpoint | URL | Description |
|---|---|---|
| Chat completions | POST http://localhost:11434/v1/chat/completions | Matches OpenAI's `/chat/completions` |
| Text completions | POST http://localhost:11434/v1/completions | Matches OpenAI's `/completions` |
| Embeddings | POST http://localhost:11434/v1/embeddings | Converts text to vectors |
| Model list | GET http://localhost:11434/v1/models | Lists the available models |

Ollama accepts OpenAI-format requests and runs inference locally. Responses come back in the same OpenAI format, and no internet connection is required.
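
As a quick way to exercise the model list endpoint, the sketch below parses an OpenAI-format `/models` response, which wraps the entries in a `data` array with an `id` per model. The helper names are hypothetical, and `fetch_model_ids` assumes Ollama is running on the default port.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434/v1"  # default Ollama endpoint from this article


def model_ids(models_response: dict) -> list[str]:
    """Extract model names from an OpenAI-format GET /models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def is_available(model: str, models_response: dict) -> bool:
    """Check whether a model name appears in the server's model list."""
    return model in model_ids(models_response)


def fetch_model_ids(base_url: str = OLLAMA_BASE_URL) -> list[str]:
    """Call GET /models on a running server and return the model names."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/models") as resp:
        return model_ids(json.load(resp))
```

Checking `is_available` before sending a chat request is one way to fail early with a clear message when the requested model has not been pulled.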

How do you use the Ollama API from Python (OpenAI library)?

Install the OpenAI library and configure it to point at localhost.

🔍 Pro tip: Set `OPENAI_BASE_URL=http://localhost:11434/v1` as an environment variable. Many tools (LangChain, LlamaIndex, aider) read this variable automatically. You can then switch between OpenAI and Ollama by changing one environment variable, with no code changes.

```python
# 1. Install the OpenAI library (run in a shell, not in Python):
#    pip install openai

# 2. Connect to Ollama
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # dummy key; Ollama ignores it
)

# 3. Make a request
response = client.chat.completions.create(
    model="llama4:scout",  # Best quality on 12 GB VRAM (MoE)
    # model="llama3.2:3b",  # Lightweight alternative for 8 GB RAM
    messages=[
        {"role": "user", "content": "What is 2+2?"}
    ],
)

print(response.choices[0].message.content)
```

How do you use the Ollama API from Node.js?

Install the OpenAI SDK and connect to your local Ollama instance.

```javascript
// 1. Install (run in a shell, not in Node.js):
//    npm install openai

// 2. Connect to Ollama
const OpenAI = require("openai").default;

const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "ollama" // dummy key; Ollama ignores it
});

// 3. Make a request (inside an async function: CommonJS has no top-level await)
async function main() {
  const response = await client.chat.completions.create({
    model: "llama4:scout",       // Best quality on 12 GB VRAM
    // model: "llama3.2:3b",     // Lightweight for 8 GB RAM
    messages: [{
      role: "user",
      content: "What is 2+2?"
    }]
  });

  console.log(response.choices[0].message.content);
}

main().catch(console.error);
```

How to use the LM Studio OpenAI-compatible server (localhost:1234)

**LM Studio serves an OpenAI-compatible API at `http://localhost:1234/v1`.** Enable it in the Local Server tab, load a model, then click Start Server. The same Python and Node.js code works with LM Studio. Only the port changes, from 11434 to 1234, along with the model name.

LM Studio suits users who want to browse and switch models through a GUI. Ollama is the better fit for scripting, automation, and CI pipelines.

| Platform | Port | Best For | GPU Required |
|---|---|---|---|
| LM Studio | localhost:1234 | GUI users, visual model management | No (CPU works) |
| Ollama | localhost:11434 | Scripting, automation, production | No (CPU works) |
| vLLM | localhost:8000 | Multi-GPU, high-throughput servers | Recommended |

```python
# Python: Connect to LM Studio (localhost:1234)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # any string; LM Studio ignores it
)

response = client.chat.completions.create(
    model="llama-3.2-3b-instruct",  # exact model name shown in LM Studio
    messages=[
        {"role": "user", "content": "What is 2+2?"}
    ],
)

print(response.choices[0].message.content)
```
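
Since the three servers differ only in their base URL, one way to keep provider switching in a single place is a small lookup table. This is a sketch, not an official pattern: the `LocalProvider` class, the `PROVIDERS` dictionary, and the `"vllm"` placeholder key are assumptions made for illustration, while the URLs come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalProvider:
    base_url: str
    api_key: str  # placeholder string; not checked unless you configure auth


PROVIDERS = {
    "ollama": LocalProvider("http://localhost:11434/v1", "ollama"),
    "lmstudio": LocalProvider("http://localhost:1234/v1", "lm-studio"),
    "vllm": LocalProvider("http://localhost:8000/v1", "vllm"),  # key is an assumed placeholder
}


def client_kwargs(provider: str) -> dict:
    """Return the keyword arguments to pass to OpenAI(...) for a named provider."""
    try:
        chosen = PROVIDERS[provider]
    except KeyError:
        raise ValueError(
            f"unknown provider {provider!r}; expected one of {sorted(PROVIDERS)}"
        ) from None
    return {"base_url": chosen.base_url, "api_key": chosen.api_key}
```

With the OpenAI library installed, `OpenAI(**client_kwargs("lmstudio"))` would then give a client for LM Studio. The model name still has to match what each server has loaded.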

How do you use the Ollama API from browser JavaScript?

To call Ollama from browser-side JavaScript, either the page and the server must be on the same machine or CORS must be allowed. For security reasons, browser requests to localhost only work when the JavaScript itself is served from localhost. For browser UIs that handle CORS smoothly, see the best local LLM frontends.

If you need to call Ollama from a browser at a different IP, set up a CORS proxy or use server-side middleware.

```javascript
// Browser-side JavaScript (if server is localhost:3000, Ollama is localhost:11434)
fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama4:scout",      // Best quality on 12 GB VRAM
    // model: "llama3.2:3b",    // Lightweight for 8 GB RAM
    messages: [{ role: "user", content: "What is 2+2?" }]
  })
})
  .then(res => res.json())
  .then(data => console.log(data.choices[0].message.content))
```
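
The server-side middleware option mentioned above could look roughly like the following standard-library sketch: a tiny proxy that forwards POST bodies to Ollama and adds CORS headers for an allow-listed origin. The origin `http://localhost:3000`, the port 8080, and all names here are assumptions for illustration. It handles non-streaming JSON requests only and is not hardened for production.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

OLLAMA_URL = "http://localhost:11434"        # upstream Ollama server
ALLOWED_ORIGINS = {"http://localhost:3000"}  # hypothetical front-end origin


def cors_headers(origin, allowed=ALLOWED_ORIGINS):
    """Return CORS response headers for an allowed origin, else an empty dict."""
    if origin not in allowed:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "POST, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
    }


class CorsProxy(BaseHTTPRequestHandler):
    def _send(self, status, body=b""):
        self.send_response(status)
        for name, value in cors_headers(self.headers.get("Origin")).items():
            self.send_header(name, value)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_OPTIONS(self):
        # Answer the browser's CORS preflight request.
        self._send(204)

    def do_POST(self):
        # Forward the JSON body to Ollama unchanged and relay the reply.
        length = int(self.headers.get("Content-Length", 0))
        upstream = urllib.request.Request(
            OLLAMA_URL + self.path,
            data=self.rfile.read(length),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(upstream) as resp:
                self._send(resp.status, resp.read())
        except urllib.error.HTTPError as err:
            self._send(err.code, err.read())


def serve(port=8080):
    """Start the proxy on localhost; blocks until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), CorsProxy).serve_forever()
```

The browser would then call `http://localhost:8080/v1/chat/completions` instead of port 11434. Keeping the allow-list explicit, rather than answering every origin with `*`, limits which pages can drive the local model.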

How do you stream responses token by token?

With streaming, you can display the response token by token as it is generated instead of waiting for the whole reply. As of May 2026, streaming works with all local models through the OpenAI-compatible API.

```python
# Python: streaming example
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

stream = client.chat.completions.create(
    model="llama4:scout",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True,
)

for chunk in stream:
    # delta.content is None on chunks that carry no text
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

With stream=True, Ollama can deliver the first token in roughly 0.1 seconds once the model is loaded, depending on hardware and prompt length. You see output immediately instead of waiting for the full response.
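
If the application also needs the complete text once streaming finishes, the chunks can be accumulated while they are displayed. The sketch below assumes chunk objects shaped like the SDK's (`chunk.choices[0].delta.content`); `collect_stream` and `fake_chunk` are hypothetical helpers, the latter only there so the logic can be tried without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_token=None) -> str:
    """Join streamed deltas into the full reply, optionally echoing each token."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:      # skip chunks that carry no choices at all
            continue
        token = chunk.choices[0].delta.content
        if token:                  # content is None or empty on text-free chunks
            parts.append(token)
            if on_token:
                on_token(token)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in with the same attribute shape as an SDK chunk, for offline use."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )
```

Against a live server, the `stream` object from the example above can be passed straight in, for instance `collect_stream(stream, on_token=lambda t: print(t, end="", flush=True))`.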

Can local models call functions?

Yes. As of May 2026, function calling works with local models through the OpenAI API. You define a function schema, and the model can respond with the arguments to pass to that function. This lets the best local LLMs for coding plug into tool ecosystems.

Function calling support varies by model. Llama 4 Scout, Qwen3 8B, Gemma 4 9B, and Mistral Small 3.1 all support tool calls reliably. Llama 3.3 8B and Qwen3 7B are also supported (legacy). Smaller models (3B) may not produce structured tool-call JSON reliably.

In 2026, the Model Context Protocol (MCP) extends function calling into a standardized tool-connection layer. MCP lets any client (Claude Code, Cursor, custom apps) connect to any tool server through a single protocol, which goes beyond the per-request tool definitions used in this section. Ollama supports MCP-style tool calls through the standard OpenAI-compatible function calling API. For production tool integrations MCP is becoming the standard, and the function calling example here is its foundation.

When you use the OpenAI-compatible API locally, structured outputs and JSON mode work the same way as with cloud APIs. See Structured Outputs and JSON Mode for schema compliance and format control across local and cloud models.

The OpenAI-compatible API accepts the same prompt format as the cloud version, including system messages, user messages, and structured outputs. The full library of prompt engineering techniques applies directly to local API calls.

```python
# Example: local model calls a weather function
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama4:scout",
    messages=[{"role": "user", "content": "What is the weather in SF?"}],
    tools=tools,
)

# Check if model returned a function call
if response.choices[0].message.tool_calls:
    call = response.choices[0].message.tool_calls[0]
    print(f"Call function: {call.function.name} with {call.function.arguments}")
```

Function calling flow with Ollama: the local model returns tool_call JSON, and the app executes the function. Supported on Llama 4 Scout, Qwen3 8B, Gemma 4 9B, and Mistral.
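
The flow described above stops at the point where the app has to run the function. A minimal sketch of that step follows, assuming tool-call objects shaped like the SDK's (`call.id`, `call.function.name`, `call.function.arguments` as a JSON string). The `get_weather` stub, `TOOL_REGISTRY`, and `run_tool_call` are hypothetical names, and the stub returns placeholder data rather than real weather.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> dict:
    """Hypothetical local implementation; a real app would query a weather service."""
    return {"location": location, "forecast": "unknown (stub)"}


# Maps the tool names advertised to the model onto local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(call) -> dict:
    """Execute one tool call and build the 'tool' message to send back to the model."""
    func = TOOL_REGISTRY[call.function.name]
    args = json.loads(call.function.arguments or "{}")  # arguments arrive as a JSON string
    result = func(**args)
    return {
        "role": "tool",
        "tool_call_id": call.id,
        "content": json.dumps(result),
    }


# Offline stand-in with the same attribute shape as an SDK tool call.
example_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "SF"}'),
)
```

To finish the round trip, the app would append the assistant message containing the tool calls and then the message returned by `run_tool_call` to `messages`, and call `client.chat.completions.create` again so the model can phrase the final answer.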

Local LLM OpenAI API by region

EU / GDPR and AI Act: For EU developers, running Ollama locally helps meet GDPR Article 5 (data minimization), because all inference happens on the device with no data leaving for a cloud API. Ollama is available on GitHub under the MIT license, which fits EU compliance requirements. EU AI Act obligations for high-risk systems apply from August 2, 2026 (pending the Digital Omnibus). Local API inference satisfies GDPR data residency requirements by default. For enterprises, it removes vendor lock-in and keeps data resident.

Japan / APPI: Under Japan's Act on the Protection of Personal Information (APPI), on-premises inference avoids the requirements attached to cloud data transfers. Ollama + Qwen3 8B runs on a standard corporate laptop (8 GB RAM), with improved Japanese support and a throughput of 30-50 tokens/sec, which meets real-time response expectations for Japanese text processing.

China / CAC: For deployments under China's Cybersecurity Law (CAC Article 37), local inference satisfies data localization requirements. Ollama + Qwen3 runs on any Linux machine with no external API calls. Qwen3's native Chinese tokenizer is 30-40% more efficient than Llama's, which reduces local inference overhead.

What are common mistakes with the local LLM OpenAI API?

  • Forgetting that the API key is ignored. Ollama does not authenticate, but the OpenAI client still requires a key, so pass `api_key="ollama"` (any string works). The only real access control is that requests come from localhost or the local network rather than the internet.
  • Not realizing that the model name matters. If you call `/chat/completions` with `model="gpt-4"` but Ollama has only `llama3.2:3b` pulled, the request fails. Use the exact model name from `ollama list`.
  • Assuming Ollama needs internet access. It does not; the API is entirely local. But if your Python code falls back to OpenAI's servers by default, the request fails. Always set `base_url` explicitly.
  • CORS errors in the browser. If calling Ollama from a browser-side script produces a CORS error, the browser has blocked the request for security reasons. For editor-based solutions that sidestep CORS, see Using Local LLMs with VS Code and Cursor.
  • Not setting stream=True when you want streaming. For token-by-token responses you must set `stream=True` explicitly in the request. The default waits for the full response.
  • Using `llama3.2:3b` in examples when a better model is available. Many tutorials still use Llama 3.2 3B because it runs on 8 GB RAM. If you have 12+ GB VRAM, switch to `llama4:scout`: the same API code yields much higher quality. Use the 3B model only for API integration tests, not for production workloads.
  • Not setting `OLLAMA_NUM_PARALLEL` for concurrent requests. By default, Ollama handles one request at a time. For multi-user apps or parallel test suites, set `OLLAMA_NUM_PARALLEL=4` (or higher) to process concurrent API calls. Without it, requests pile up in a queue and latency spikes.

---

⚠️ Warning: Ollama's API has no authentication by default. If you expose it to the network (`OLLAMA_HOST=0.0.0.0`), anyone on that network can send requests, load models, and consume GPU resources. For multi-user or production setups, put an authenticating reverse proxy (nginx, Caddy) in front of Ollama. Do not expose port 11434 directly to the internet.

Ollama (port 11434), vLLM (port 8000), and LM Studio (port 1234) all provide OpenAI-compatible endpoints: the same client code, with different ports and use cases.
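
On the client side, the concurrency point can be sketched with a thread pool that sends several prompts at once. This assumes the server was started with `OLLAMA_NUM_PARALLEL=4` as described in the list above; matching `max_workers` to that value is a design assumption, and `ask_all` and `make_ask` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def ask_all(prompts, ask, max_workers=4):
    """Run ask(prompt) for every prompt concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ask, prompts))


def make_ask(client, model):
    """Wrap an OpenAI-compatible client into a one-prompt-in, text-out function."""
    def ask(prompt):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    return ask
```

With a client configured as in the earlier examples, `ask_all(["What is 2+2?", "Count to 3"], make_ask(client, "llama3.2:3b"))` would issue both requests in parallel. If the server still handles one request at a time, the calls simply queue and the total time matches sequential execution.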

Frequently asked questions about local LLM APIs

Do I need to modify my OpenAI code to use Ollama?

No, apart from two settings. Set `base_url="http://localhost:11434/v1"` and `api_key="ollama"`, and pass the name of a local model. Everything else stays the same. If you have code that uses the OpenAI library, replacing those two lines makes it work with local models.

Can I use the API from another computer on the network?

Yes. By default, Ollama listens on localhost only. To allow network access, set the environment variable `OLLAMA_HOST=0.0.0.0:11434` before starting Ollama, then connect from your code to `http://<machine-ip>:11434/v1`. Mind the security implications, and use a firewall in production environments.

Does LM Studio have an OpenAI-compatible API?

Yes. LM Studio serves an OpenAI-compatible API at `http://localhost:1234/v1`. Enable it in the Local Server tab, load a model, and click Start Server. Use the same Python or Node.js code as for Ollama and change only the port, from 11434 to 1234.

Can I call several models at the same time?

Yes, if they are loaded in Ollama. Running two models at once needs VRAM for both, roughly double for models of similar size, so you need enough GPU memory.

Does the API have authentication?

No. By default the Ollama API has no authentication, and anyone who can reach localhost:11434 can use it. In production environments with network access, add authentication through a reverse proxy (nginx Basic Auth, for example).
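
When a reverse proxy with Basic Auth sits in front of Ollama, the client has to send an `Authorization` header with each request. The sketch below builds such a request with the standard library only. The proxy URL and credentials are placeholders, the helper names are hypothetical, and the `/v1` path assumes the proxy forwards paths unchanged.

```python
import base64
import json
import urllib.request


def basic_auth_header(username: str, password: str) -> dict:
    """Build the HTTP Basic Authorization header that a proxy such as nginx checks."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def authed_chat_request(proxy_url, username, password, model, prompt):
    """Build (but do not send) a chat request routed through the authenticating proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {
        "Content-Type": "application/json",
        **basic_auth_header(username, password),
    }
    return urllib.request.Request(
        proxy_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Basic Auth only encodes the credentials, it does not encrypt them, so the proxy should terminate HTTPS as well.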

How do I use streaming with the Ollama OpenAI API?

Set stream=True in the OpenAI library call. Ollama returns server-sent events (SSE), one per token. In Python: `for chunk in client.chat.completions.create(..., stream=True): print(chunk.choices[0].delta.content)`. Skip chunks whose `delta.content` is None.

Does Ollama support function calling / tool use through the API?

Yes, on models that support it (Llama 4 Scout, Qwen3 8B, Gemma 4 9B, Mistral Small 3.1). Legacy models (Llama 3.3 8B, Qwen3 7B) are supported too. Pass tools=[] in the API call exactly as with OpenAI. Ollama parses the tool call and returns structured JSON. Not every model supports this, so check the model's documentation.

What is MCP and how does it relate to the OpenAI-compatible API?

MCP (Model Context Protocol) is a standardized protocol for connecting AI models to external tools and data sources. It builds on function calling: it uses the same `tools=[]` parameter as the example above, but adds a standard server-client architecture that makes tools discoverable and reusable across applications. Ollama supports MCP-style tool interactions through its OpenAI-compatible function calling endpoint. For simple integrations, the function calling example in this article is enough. For complex multi-tool workflows, MCP offers a more structured approach.

What is the difference between Ollama's /api/generate and /v1/chat/completions?

/api/generate is Ollama's native single-turn endpoint. /v1/chat/completions is the OpenAI-compatible multi-turn endpoint. Use /v1/chat/completions for all new projects: it supports conversation history and works with the OpenAI library.
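
The difference shows up most clearly in the request bodies. As a rough sketch, the native endpoint takes a single `prompt` string, while the OpenAI-compatible one takes a `messages` list that can carry earlier turns. The function names are illustrative, and `"stream": False` is set so the native endpoint returns one JSON object instead of a stream.

```python
def generate_payload(model: str, prompt: str) -> dict:
    """Body for Ollama's native POST /api/generate: one prompt string, no history."""
    return {"model": model, "prompt": prompt, "stream": False}


def chat_payload(model: str, history: list[dict], user_message: str) -> dict:
    """Body for POST /v1/chat/completions: earlier turns plus the new user message."""
    return {
        "model": model,
        "messages": [*history, {"role": "user", "content": user_message}],
    }


# A two-turn history lets the model answer a question about an earlier message.
history = [
    {"role": "user", "content": "My name is Ada."},
    {"role": "assistant", "content": "Hello Ada."},
]
follow_up = chat_payload("llama3.2:3b", history, "What is my name?")
```

Because `chat_payload` copies the history into a new list, the caller's list is left unchanged and can be extended with the model's reply after each turn.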

Can I use vLLM as an OpenAI-compatible API?

Yes. By default, vLLM runs an OpenAI-compatible server at http://localhost:8000/v1. Start it with: `python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-v0.1`. Then use the same client code as for Ollama.

How do I use the Ollama API with the Node.js openai package?

Import OpenAI from openai. In the constructor, set `baseURL: "http://localhost:11434/v1"` and `apiKey: "ollama"`. Then call `client.chat.completions.create()` exactly as you would with the real OpenAI API. No other changes are needed.

How do I switch between Ollama and OpenAI in the same codebase?

Use an environment variable. For Ollama, set USE_LOCAL=true (base_url http://localhost:11434/v1, api_key "ollama"); for OpenAI, set USE_LOCAL=false. The OpenAI Python library accepts base_url as a constructor argument. Setting USE_LOCAL=false in production switches to OpenAI without touching any other code.
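
One way to wire up the USE_LOCAL switch is a small function that returns the client arguments and the model name together, since the model name has to change along with the backend. The function name, the returned dictionary shape, and the two model names are assumptions for illustration; `USE_LOCAL` itself comes from the answer above.

```python
import os


def backend_settings(env=os.environ) -> dict:
    """Choose OpenAI(...) keyword arguments and a model name from USE_LOCAL."""
    use_local = env.get("USE_LOCAL", "false").strip().lower() == "true"
    if use_local:
        return {
            "client_kwargs": {
                "base_url": "http://localhost:11434/v1",
                "api_key": "ollama",  # placeholder; Ollama ignores it
            },
            "model": "llama3.2:3b",   # assumed local model; use a name from `ollama list`
        }
    # Hosted OpenAI: with no arguments the SDK uses its default URL
    # and reads OPENAI_API_KEY from the environment.
    return {"client_kwargs": {}, "model": "gpt-4o"}
```

Application code would then create the client with `OpenAI(**settings["client_kwargs"])` and pass `settings["model"]` to each request, so the toggle stays in one place.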

Can I use the OpenAI-compatible API with LangChain?

Yes. Use ChatOpenAI with `base_url="http://localhost:11434/v1"` and `api_key="ollama"`. This makes Ollama a drop-in replacement for OpenAI in any LangChain pipeline: RAG chains, agents, and tools all work without modification. LangChain also ships a dedicated ChatOllama class for Ollama-specific features.

Sources

  • Ollama. (2026). "Ollama OpenAI Compatibility." https://github.com/ollama/ollama/blob/main/docs/openai.md -- Official documentation for Ollama's OpenAI-compatible REST API endpoints.
  • LM Studio. (2026). "LM Studio Local Server." https://lmstudio.ai/docs/local-server -- Documentation for LM Studio's OpenAI-compatible local server on localhost:1234.
  • OpenAI. (2024). "OpenAI Python Library." https://github.com/openai/openai-python -- The official Python SDK, used to connect to both OpenAI and local LLMs by overriding base_url.
  • vLLM Team. (2024). "vLLM OpenAI-Compatible Server." https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html -- Documentation for vLLM's OpenAI-compatible API server (port 8000, production use).

A Note on Third-Party Facts

This article references third-party AI models, benchmarks, prices, and licenses. The AI landscape changes rapidly. Benchmark scores, license terms, model names, and API prices can shift between the time of writing and the time you read this. Before making deployment or compliance decisions based on this article, verify current figures on each provider's official source: Hugging Face model cards for licenses and benchmarks, provider websites for API pricing, and EUR-Lex for current GDPR and EU AI Act text. This article reflects publicly available information as of May 2026.
