How do I build a WeChat bot with a local LLM?

Use WeChatFerry (Windows) to hook into WeChat PC client, connect to Ollama via HTTP API, and route incoming messages to the local LLM. Total setup time: 30–60 minutes with Python.

What is the best model for Chinese WeChat messages?

Qwen3 8B is the best balance of quality and speed — excellent Chinese comprehension, 4.7GB fits in 8GB VRAM, responds in 1–3 seconds with a GPU.

WeChat Bot with Local LLM: Personal Assistant Guide 2026

This page contains links to third-party products for reference. PromptQuorum is not enrolled in any affiliate program — these are plain links that earn no commission. Clicking links and your next steps are entirely your own responsibility. These links do not represent any endorsement or verification by PromptQuorum.

Key Takeaways

WeChatFerry + Ollama: the recommended local WeChat bot stack for 2026
Qwen3 8B: best local model for Chinese-language WeChat responses
Windows required: WeChatFerry hooks into the WeChat PC client (Windows only)
Setup time: 30–60 minutes for someone comfortable with Python
No cloud API: all inference runs locally, no message data sent externally
Risk: WeChat ToS prohibits automated bots — use for personal assistants only

1
Install Ollama and pull Qwen3 8B
Why it matters: Download Ollama from ollama.com and run: `ollama pull qwen3:8b`
2
Log in to WeChat PC
Why it matters: Open WeChat on Windows and scan the QR code to log in. Keep it logged in and running in the background.
3
Install WeChatFerry
Why it matters: Install via pip: `pip install wcferry`. WeChatFerry injects into the WeChat process to expose a message API.
4
Create the Python message handler
Why it matters: Create `wechat_bot.py` with WeChatFerry client, Ollama HTTP API calls, and message routing logic.
5
Test with a self-message
Why it matters: Send a WeChat message to yourself starting with "@ai" and verify the bot responds within 10 seconds.
6
Add conversation history
Why it matters: Store the last 10 messages per contact in a dict to enable multi-turn conversation context.
7
Run as a background service
Why it matters: Use NSSM (Non-Sucking Service Manager) to run the Python script as a Windows service that starts automatically.

Model	Size	Chinese Quality	Speed (CPU)	Speed (8GB VRAM)
Qwen3:8b	4.7 GB	Excellent	3–5 tok/s	30–45 tok/s
Qwen3:14b	9 GB	Best	1–2 tok/s	15–20 tok/s
Qwen3:3b	2 GB	Good	8–12 tok/s	60+ tok/s
Llama3.1:8b	4.7 GB	Moderate	3–5 tok/s	30–45 tok/s

Does this WeChat bot work on Mac?

No. WeChatFerry requires Windows and hooks into the WeChat Windows PC client via DLL injection. macOS users can run Windows in a virtual machine (Parallels or VMware Fusion) to use this setup.

Will my WeChat account get banned for using a bot?

WeChat prohibits automated bots in its Terms of Service. Accounts detected using automation tools risk temporary suspension or permanent ban. Use only for personal productivity at low message volumes.

What is the best Ollama model for Chinese WeChat messages?

Qwen3 8B is the best balance of quality and speed for Chinese-language WeChat responses — excellent Chinese comprehension, fast enough on most hardware, and the 4.7GB model fits in 8GB VRAM.

Can the bot handle group chats?

Yes. WeChatFerry exposes group messages with the room ID. Modify the on_message handler to check msg.roomid and filter which groups the bot should respond in. Add a trigger keyword to avoid responding to every group message.

WeChat Bot with Local LLM: Personal Assistant 2026

Does this WeChat bot work on Mac?

Will my WeChat account get banned for using a bot?

What is the best Ollama model for Chinese WeChat messages?

Can the bot handle group chats?