
Using Local LLMs in VS Code and Cursor: Setup and Best Practices

10 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool · PromptQuorum


VS Code and Cursor (an AI-focused code editor) can both use local LLMs for code autocompletion and suggestions: VS Code through the Continue.dev extension, Cursor through its direct integration. As of April 2026, local code autocompletion is practical with 7B–13B models and needs 8–16 GB of RAM. This guide covers setup, the best models, and performance tuning.

Key Takeaways

  • VS Code uses the Continue.dev extension to connect to local models served by Ollama, LM Studio, or vLLM.
  • Cursor is a VS Code fork with built-in local model support, so no separate extension is needed.
  • Best local models for code: Qwen3-Coder 7B, Code Llama 13B, or Mistral Small.
  • Expect 2–5 seconds of autocomplete latency with a 7B model on a consumer GPU.
  • As of April 2026, local code autocompletion is practical for individual use but not yet ready for team production environments.

How do you set up Continue.dev in VS Code?

Continue.dev is a VS Code extension for local and cloud code autocompletion.

bash
# 1. Install the Continue extension from the VS Code Marketplace
#    (search for "Continue" and click Install)

# 2. Pull a coding model and make sure Ollama is running
#    (skip "ollama serve" if the Ollama desktop app is already running)
ollama pull qwen2.5-coder:7b
ollama serve

# 3. Open the Continue settings (Ctrl+Shift+P → Continue: Open Settings);
#    config.json opens in the editor

# 4. Replace the default settings with the configuration below

json
{
  "models": [{
    "title": "Ollama",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  }],
  "tabAutocompleteModel": {
    "title": "Ollama",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b"
  }
}

After saving config.json, start typing code and press Tab to accept a completion, or press Ctrl+Shift+\ to trigger completions manually.
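Before pointing Continue at Ollama, it can help to confirm that the server answers and that the model named in config.json has actually been pulled. The sketch below assumes Ollama's default address and its /api/tags endpoint, which lists locally available models; `has_model` and `check_ollama` are hypothetical helper names, and curl must be installed.

```shell
# Assumes Ollama's default address; override with OLLAMA_URL if needed.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# has_model JSON MODEL: succeed if the /api/tags reply lists MODEL by exact name.
# Whitespace is stripped first so compact and pretty-printed JSON both match.
has_model() {
  printf '%s' "$1" | tr -d '[:space:]' | grep -qF "\"name\":\"$2\""
}

# check_ollama MODEL: query the running server and report whether MODEL is pulled.
check_ollama() {
  local tags
  if ! tags="$(curl -fsS "${OLLAMA_URL}/api/tags")"; then
    echo "Ollama is not reachable at ${OLLAMA_URL}; start it with: ollama serve" >&2
    return 1
  fi
  if has_model "$tags" "$1"; then
    echo "OK: $1 is available"
  else
    echo "Missing: run 'ollama pull $1'" >&2
    return 1
  fi
}
```

Usage: `check_ollama qwen2.5-coder:7b`. Matching on the full quoted name avoids false positives from longer tags such as a `-instruct` variant.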

How do you use local models in Cursor?

Cursor is a VS Code fork optimized for AI-assisted coding, with built-in support for local models through Ollama.

bash
# 1. Download Cursor from cursor.sh
# 2. Pull a coding model and make sure Ollama is running
ollama pull qwen2.5-coder:7b
ollama serve

# 3. Open Cursor Settings (Cmd/Ctrl + ,)
# 4. Search for "Model" and set:
#    - Model Provider: "Ollama"
#    - Model: "qwen2.5-coder:7b" (or your choice)
#    - API Base: "http://localhost:11434"

# 5. Type code and press Tab to accept inline completions
# 6. Press Ctrl+K (Cmd+K on macOS) to generate or edit multiple lines from a prompt
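If completions never appear, sending one request straight to the API Base configured above separates server problems from editor problems. A minimal sketch, assuming Ollama's native /api/generate endpoint with streaming turned off; the helper names are hypothetical, and the prompt must be a single line because only backslashes and double quotes are escaped.

```shell
# Assumes Ollama's default address; override with OLLAMA_URL if needed.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# json_escape TEXT: escape backslashes (first) and double quotes so a single-line
# string can be embedded in JSON. Newlines and control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload MODEL PROMPT: JSON body for POST /api/generate, with streaming off
# so the whole reply arrives as a single JSON object.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# try_completion MODEL PROMPT: send the request and print the raw JSON reply;
# the generated text is in its "response" field.
try_completion() {
  curl -fsS "${OLLAMA_URL}/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1" "$2")"
}
```

Usage: `try_completion qwen2.5-coder:7b 'def fibonacci(n):'`. A JSON reply means the server side works and any remaining issue lies in the editor settings.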

Which models are best for code?

| Model | HumanEval | VRAM | Speed | Best for |
|---|---|---|---|---|
| Qwen3-Coder 7B | 72% | 4.7 GB | Fast | Best balance of speed and quality |
| Code Llama 7B | 69% | 4.7 GB | Fast | General coding |
| Mistral Small | 61% | 4.5 GB | Very fast | Lightweight, EU servers |
| Code Llama 13B | 74% | 8.5 GB | Medium | Higher quality on 16 GB machines |
| DeepSeek-Coder 6.7B | 68% | 4 GB | Fast | Lightweight alternative |
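The VRAM column tracks a common rule of thumb: memory for the weights is roughly the parameter count times the bits stored per weight, divided by 8. The sketch below assumes about 5 bits per weight as a ballpark for 4-bit quantized builds such as those Ollama pulls by default; the exact figure depends on the quantization format and the model's precise parameter count, and the KV cache for the context window comes on top. `estimate_gb` is a hypothetical helper.

```shell
# estimate_gb PARAMS_BILLION BITS_PER_WEIGHT: rough gigabytes for the weights only
# (KV cache and runtime overhead excluded), printed with one decimal place.
# Bits per weight is an assumption: ~5 for 4-bit quantization, 8 for 8-bit, 16 for FP16.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 7 5    # 7B at ~4-bit quantization   -> 4.4
estimate_gb 13 5   # 13B at ~4-bit quantization  -> 8.1
estimate_gb 7 16   # 7B unquantized (FP16)       -> 14.0
```

These estimates land in the same range as the table's 4.7 GB and 8.5 GB entries, and the FP16 line shows why unquantized 7B models do not fit on 8 GB cards.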

What latency and VRAM should you expect?

Autocomplete latency (time to first token) is critical to the IDE experience. Typical figures as of April 2026:

| Hardware | Model | Latency | Throughput |
|---|---|---|---|
| RTX 4090 GPU | Qwen3-Coder 7B | 0.3–0.5 s | 150 tokens/s |
| RTX 4070 GPU | Qwen3-Coder 7B | 0.8–1.5 s | 80 tokens/s |
| M3 MacBook Pro | Qwen3-Coder 7B | 2–3 s | 20 tokens/s |
| 8-core CPU only | Qwen3-Coder 7B | 5–10 s | 3 tokens/s |
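Latency and throughput combine into the time you actually wait for a full suggestion. As a rough approximation that ignores prompt length and assumes a steady decoding speed, total time is about the time to first token plus maxTokens divided by throughput. The sketch plugs in the lower latency bound from several rows and the maxTokens value of 50 from the advanced settings; `completion_seconds` is a hypothetical helper.

```shell
# completion_seconds TTFT_S TOKENS_PER_S MAX_TOKENS
# Rough wall-clock time for one completion: first-token latency plus decode time.
completion_seconds() {
  awk -v ttft="$1" -v tps="$2" -v n="$3" 'BEGIN { printf "%.2f\n", ttft + n / tps }'
}

completion_seconds 0.3 150 50   # RTX 4090 row        -> 0.63
completion_seconds 2 20 50      # M3 MacBook Pro row  -> 4.50
completion_seconds 5 3 50       # CPU-only row        -> 21.67
```

On the M3 row, halving maxTokens to 25 brings the estimate down to 3.25 s (2 + 25/20), which is why shorter completions feel snappier on slower hardware.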

์ฝ”๋“œ ์ž๋™ ์™„์„ฑ์„ ์œ„ํ•œ ๊ณ ๊ธ‰ ์„ค์ •

๋‹ค์Œ ์„ค์ •์œผ๋กœ ๊ฒฝํ—˜์„ ์„ธ๋ฐ€ํ•˜๊ฒŒ ์กฐ์ •ํ•˜์‹ญ์‹œ์˜ค:

json
{
  "models": [{
    "title": "Ollama",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "contextLength": 1024
  }],
  "tabAutocompleteModel": {
    "title": "Ollama",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "contextLength": 2048,
    "completionOptions": {
      "maxTokens": 50
    }
  },
  "tabAutocompleteOptions": {
    "maxPromptTokens": 1024,
    "debounceDelay": 200
  }
}

contextLength sets how much code context is sent to the model; a smaller context means faster inference. maxTokens caps the length of each completion, maxPromptTokens limits the prompt built for autocomplete, and debounceDelay is how long (in milliseconds) Continue waits after you stop typing before requesting a completion.

For the best speed on 8 GB machines:
  • Use a 7B model, not 13B.
  • Set maxTokens to 30.
  • Set debounceDelay to 500 (less flickering).
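The debounce delay trades responsiveness for load: a request fires only once you pause typing for at least that long. The sketch below is a simplified model of that behaviour, not Continue's actual implementation; it counts how many requests a burst of keystrokes would trigger, and the keystroke timestamps are hypothetical millisecond values.

```shell
# count_requests DEBOUNCE_MS T1 T2 ...: count completion requests for keystrokes at
# the given millisecond timestamps under a simple debounce model.
count_requests() {
  local debounce="$1" count=0 prev="" t
  shift
  for t in "$@"; do
    # A pause of at least DEBOUNCE_MS after the previous keystroke lets its request fire.
    if [ -n "$prev" ] && [ $((t - prev)) -ge "$debounce" ]; then
      count=$((count + 1))
    fi
    prev="$t"
  done
  # The last keystroke is always followed by a long enough pause.
  if [ -n "$prev" ]; then
    count=$((count + 1))
  fi
  echo "$count"
}

# Four keystrokes 150 ms apart, a pause, then two more.
count_requests 100 0 150 300 450 1200 1350   # -> 6
count_requests 200 0 150 300 450 1200 1350   # -> 2
```

On hardware that needs a second or more per completion, the 100 ms setting would queue six requests for this burst, most of them stale before they finish; 200 ms sends only two.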

Common mistakes with local code autocompletion

  • Not tuning the debounce delay. If autocomplete feels sluggish or flickers, increase the debounce delay (for example to 400 ms). Requests then fire only after you pause typing, so half-finished suggestions stop appearing.
  • Using a model too large for your VRAM. A 13B model plus editor overhead can need 12 GB or more. On 8 GB machines, use a 7B model.
  • Expecting cloud-level code quality. GPT-5.5 produces noticeably better code than a 7B model; local autocompletion reaches roughly 70–80% of cloud quality.
  • Running inference on the CPU. CPU autocompletion is impractical (5–10 s latency); you need a GPU for usable autocompletion.
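The advice about model size can be turned into a quick check. This hypothetical helper picks the highest-HumanEval model from the comparison table whose VRAM figure, plus an assumed 2 GB of headroom for the KV cache and runtime, fits in the memory you have. The headroom value is an assumption; raise it if you use long contexts or run other GPU workloads.

```shell
# pick_model VRAM_GB [HEADROOM_GB]: print the best-scoring model from the comparison
# table that fits, or exit non-zero if none does.
# Rows are name|VRAM (GB)|HumanEval (%), copied from the table, sorted by HumanEval.
pick_model() {
  local vram="$1" headroom="${2:-2}"   # default headroom of 2 GB is an assumption
  printf '%s\n' \
    'Code Llama 13B|8.5|74' \
    'Qwen3-Coder 7B|4.7|72' \
    'Code Llama 7B|4.7|69' \
    'DeepSeek-Coder 6.7B|4|68' \
    'Mistral Small|4.5|61' |
    awk -F'|' -v vram="$vram" -v hr="$headroom" '
      # The first row (highest HumanEval) whose VRAM plus headroom fits wins.
      $2 + hr <= vram + 0 { print $1; found = 1; exit }
      END { exit !found }'
}
```

For example, `pick_model 8` prints `Qwen3-Coder 7B`, in line with the advice to run a 7B model on 8 GB machines, while `pick_model 16` selects `Code Llama 13B`.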

Frequently asked questions about local code autocompletion

Is local code autocompletion faster than cloud autocompletion?

No. Cloud autocompletion such as GitHub Copilot is faster because it runs on optimized servers. Local autocompletion has higher latency, but it has no usage fees and your code never leaves your machine.

Can I use local autocompletion in other IDEs (PyCharm, Neovim)?

Yes, but the setup differs. PyCharm has an Ollama plugin. For Neovim, use cmp-ollama (an autocompletion plugin). Check each IDE's community for integration details.

Can I use cloud models in Continue or Cursor?

Yes. You can configure Continue to use OpenAI, Claude, or Gemini. You can also mix the two: local models for quick tasks, cloud models for complex code.

Does local code autocompletion work offline?

Yes. Once you have pulled the model with Ollama, autocompletion runs fully offline.

Sources

  • Continue.dev -- continue.dev
  • Cursor Editor -- cursor.sh
  • Continue GitHub -- github.com/continuedev/continue
  • Qwen3-Coder -- github.com/QwenLM/Qwen3-Coder
IDE integration is only half the picture: writing effective prompts for code generation takes a different mindset than general chat. To learn prompt engineering for developers, see best prompt engineering IDEs, which compares tools and techniques.

A Note on Third-Party Facts

This article references third-party AI models, benchmarks, prices, and licenses. The AI landscape changes rapidly. Benchmark scores, license terms, model names, and API prices can shift between the time of writing and the time you read this. Before making deployment or compliance decisions based on this article, verify current figures on each provider's official source: Hugging Face model cards for licenses and benchmarks, provider websites for API pricing, and EUR-Lex for current GDPR and EU AI Act text. This article reflects publicly available information as of May 2026.

