PromptQuorum

Uncensored Local LLMs for Creative Writing: When Writers Need Them in 2026

13 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool

Uncensored local LLMs are appropriate for fiction writers who need to generate morally complex characters, dark themes, violence, mature romantic content, or unreliable narrator voices that cloud models refuse to produce. They are appropriate only when the output is used for creative fiction with an adult audience — not as a tool to produce real-world harmful content, non-consensual depictions of real people, or content involving minors in sexual contexts. The models that fit most fiction-writing use cases in 2026 are Hermes 3 Llama 3.3 (fewer refusals, strong instruction following) and Dolphin 3.0 Mistral (broader uncensored range, smaller footprint). Both run fully locally through Ollama or LM Studio with no data leaving your machine. The ethical responsibilities that apply when using these models are not zero — writers working with uncensored models still have obligations around distribution, minors, and real people that no local setup removes.

Uncensored local LLMs let fiction writers generate mature, morally complex, and dark content that cloud services refuse — with no data leaving your machine. This guide covers which models to use, how to set them up through Ollama or LM Studio, the genuine ethical responsibilities that apply even when there is no terms-of-service enforcement, and the specific use cases where uncensored models are appropriate versus where they are not.

Key Takeaways

  • "Uncensored" means the model has reduced RLHF safety fine-tuning — not that it has no constraints at all. Uncensored fine-tunes still follow the instruction format, maintain character consistency, and can be directed with prompts. They are not "anything goes" systems.
  • Hermes 3 Llama 3.3 is the best all-round pick for fiction writers in 2026. Fewer arbitrary refusals, strong instruction following, good character voice differentiation. The right choice for writers who want the capability without the aggressive output some fully uncensored models produce.
  • Standard instruction-tuned models handle most mature literary content with good prompts. Violence, moral complexity, dark psychology, and mature themes in literary prose rarely require an uncensored fine-tune. What they refuse is explicit sexual content and detailed descriptions of real-world harm. Know which category your work falls in before switching models.
  • Running locally means no data leaves your machine. No cloud terms-of-service applies. No content is logged, analysed, or used for training. This is the main structural reason writers use local uncensored models — privacy plus no usage restrictions on fiction.
  • Ethical responsibilities do not disappear because there is no ToS enforcement. Writers distributing fiction produced with uncensored models carry the same legal responsibilities as any other author: minors, real people, incitement, and jurisdiction-specific obscenity laws all apply regardless of the generation method.
  • Dolphin 3.0 Mistral is the lighter option for 16–24 GB rigs. Broader uncensored output range than Hermes 3 but weaker instruction following in complex scenes. Suitable for short-form fiction, prompt exploration, and style testing.
  • SillyTavern and Agnai both pair cleanly with uncensored Ollama models. Point either frontend at the Ollama OpenAI-compatible endpoint and select the uncensored model. No additional configuration required.

Quick Facts

  • Uncensored models tested: Hermes 3 Llama 3.3 (primary), Dolphin 3.0 Mistral (secondary).
  • Backends: Ollama (primary), LM Studio (alternative for GUI setup).
  • Hardware: Hermes 3 70B at Q4 requires ~42 GB; Dolphin 3.0 7B runs on 8 GB; Dolphin 3.0 24B runs on ~16 GB.
  • Frontends: SillyTavern, Agnai, RisuAI all pair with uncensored Ollama models via OpenAI-compatible endpoint.
  • Privacy: all local — no cloud API calls, no logging, no training data collection.
  • Hard ethical limits: no sexual content involving minors (absolute), no non-consensual sexual depictions of identifiable real people, no content designed to facilitate real-world violence.
  • Distribution: output you publish carries authorial responsibility regardless of generation method.

What "Uncensored" Actually Means

"Uncensored" in the context of local LLM fine-tunes means reduced RLHF (reinforcement learning from human feedback) safety training — not the removal of all model constraints. Standard instruction-tuned models are trained with RLHF layers that make them refuse specific categories of content: explicit sexual content, detailed violence, dark psychology that resembles a real harm template. Uncensored fine-tunes (Hermes 3, Dolphin, Wizard, Erebus) strip or reduce these RLHF layers from the base model, leaving the instruction-following capability intact but removing the refusal behaviours. The result is a model that follows your fiction prompts through dark content without redirecting, moralising, or stopping mid-scene.

πŸ“ In One Sentence

Uncensored local LLM fine-tunes remove RLHF safety-refusal layers while retaining instruction-following capability — they follow dark creative directions without redirecting, moralising, or stopping mid-scene.

💬 In Plain Terms

A standard model like Llama 3.3 70B will stop mid-way through a scene depicting torture, a morally complex villain monologue, or explicit romantic content and either refuse or soften the output. Hermes 3 (the uncensored version on the same base) will follow your prompt through the same scene at the same quality level without interruption. The prose quality is equivalent; the difference is the refusal layer.

  • What is removed: refusal behaviours for specific content categories (explicit sexual content, graphic violence, dark psychological content).
  • What is retained: instruction following, character consistency, prompt structure adherence, language quality.
  • Degrees of uncensoring: Hermes 3 is selectively uncensored (mature content yes, but retains some guardrails for extreme content). Dolphin series is more broadly uncensored. Fully abliterated models (Wizard-Uncensored, Erebus) have the most aggressive removal of guardrails — and the weakest instruction following for complex fiction tasks.
  • Base model quality matters: uncensoring degrades instruction quality if the base model is weak. Hermes 3 on Llama 3.3 70B retains Llama 3.3's strong instruction following; Dolphin 3.0 on Mistral 7B is limited by the smaller base.

💡Tip: Uncensored does not mean ungovernable. Uncensored models still follow prompt structure, respect word ceilings, maintain character voice, and respond to revision instructions. The difference is that they do not add unsolicited content warnings, refuse morally dark directions, or break character to note that a scene depicts harm. Use structured prompts from the fiction-writing templates as you would with any other model — the system prompt versus user prompt distinction matters more than the model's uncensoring level. See System Prompt vs User Prompt for why.

How We Tested

Model verdicts in this guide are based on a small qualitative test — a directional indicator, not a peer-reviewed benchmark. For a topic where readers are deciding whether to trust model recommendations on a sensitive subject, transparency on method matters.

  • Prompt set: 10 prompts across 5 categories — villain monologue, mature romance scene, graphic violence in war fiction, morally complex narrator, dark psychological scene (2 prompts per category).
  • Runs per model: each prompt run 3 times per model.
  • Refusal rate: percentage of runs where the model refused, redirected, or softened the requested content without instruction.
  • Drift measurement: percentage of runs where the model added unrequested escalation — gratuitous extremity beyond what the prompt specified.
  • Backend: Ollama 0.5+ with Q4_K_M quantization for all models.
  • Honesty constraint: small qualitative test. Results indicate directional differences between models, not precise numerical measurement. Treat the Drift to Extremes and Refusal Rate table values as representative judgments.
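The refusal-rate and drift definitions above can be sketched as a small tally. This is a hypothetical illustration of the bookkeeping, not the harness actually used for this guide:

```python
from dataclasses import dataclass

@dataclass
class Run:
    prompt_id: str
    refused: bool   # refused, redirected, or softened without instruction
    drifted: bool   # added unrequested escalation beyond what the prompt specified

def refusal_rate(runs: list[Run]) -> float:
    """Fraction of runs where the model refused or softened the requested content."""
    return sum(r.refused for r in runs) / len(runs)

def drift_rate(runs: list[Run]) -> float:
    """Fraction of runs with gratuitous extremity the prompt did not ask for."""
    return sum(r.drifted for r in runs) / len(runs)

# 10 prompts x 3 runs = 30 runs per model in the method above; 3 shown here
runs = [
    Run("villain-1", refused=False, drifted=False),
    Run("villain-1", refused=True,  drifted=False),
    Run("romance-1", refused=False, drifted=True),
]
print(f"refusal {refusal_rate(runs):.0%}, drift {drift_rate(runs):.0%}")
```

The point of keeping the two rates separate is that they fail in opposite directions: refusals under-deliver on the prompt, drift over-delivers.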

When Uncensored Models Are Appropriate for Fiction

Uncensored models are appropriate when your fiction genuinely needs content that cloud services refuse, and your audience is adult, and the purpose is creative expression. Most fiction writers reach for uncensored models for one or more of these specific use cases.

  • Villain psychology and monologues: morally coherent villain characters who are not interrupted by the model breaking character to add disclaimers mid-monologue.
  • Mature romantic and sexual content: explicit scenes between adult fictional characters in romance, erotica, or literary fiction that require sexual content the cloud services block.
  • Graphic violence in genre fiction: war novels, crime thrillers, horror — scenes where violence is load-bearing for the emotional impact and softening it kills the scene.
  • Trauma and psychological darkness: survivor narratives, addiction fiction, abuse storylines — content that requires unvarnished depiction to have authentic weight.
  • Morally unreliable narrators: narrators who are wrong, who rationalise harm, who are cruel or bigoted within the fiction — characters who require the model to voice views it would normally refuse.
  • Dark roleplay and collaborative fiction: long-running scenarios involving conflict, moral complexity, and mature themes where a standard model breaks the fiction to insert refusals.

💡Tip: Before switching to an uncensored model, test your prompt on a standard instruction-tuned model first. Llama 3.3 70B and Qwen3 32B with a well-structured system prompt and scene constraints generate most mature literary content without refusals. Uncensored fine-tunes add the most value for explicit sexual content and the most extreme depictions of violence — not for psychological darkness, moral complexity, or dark themes generally.

When Uncensored Models Are Not Appropriate

The absence of cloud enforcement does not mean the absence of legal and ethical obligations. These categories represent hard limits that apply regardless of model, platform, or whether your machine is air-gapped.

  • Sexual content involving minors: absolute legal prohibition in all major jurisdictions regardless of fictional framing or generation method. This is not a model policy — it is law.
  • Non-consensual sexual depictions of real people: NCII laws apply to AI-generated content of identifiable real people in a growing number of jurisdictions. "Generated by AI" is not a defence.
  • Content designed to facilitate real harm: using a fiction framing to extract information or content that directly enables real-world violence or harm removes the fiction protection.
  • Public distribution without authorial accountability: content you publish, distribute, or share carries authorial responsibility. "An AI wrote it" does not transfer that responsibility.
  • Harassment fiction: generating fiction whose purpose is to harm, intimidate, or harass a specific real person — regardless of whether it is framed as fiction.

⚠️Warning: Hard limits regardless of setup. No local configuration removes legal or ethical responsibility for: (1) sexual content involving minors — absolute prohibition under law in virtually every jurisdiction; (2) non-consensual sexual depictions of identifiable real people — this constitutes NCII (non-consensual intimate imagery) regardless of generation method; (3) content designed to facilitate real-world violence against specific targets. These limits apply whether your model runs locally, in a cloud, or on an air-gapped machine.

Model Comparison: Uncensored Options for Fiction

Not all uncensored models are equal — the degree of RLHF removal and the quality of the base model both matter for fiction-writing use cases.

  • Note: older uncensored fine-tunes — Midnight Miqu (Miqu-70B-based), WizardLM Uncensored, MythoMax — were leaders in 2024 but have been superseded by Hermes 3 and Dolphin 3.0 in 2026 on both quality and instruction-following metrics. If you find them recommended in older articles, the current equivalents are Hermes 3 (for selective uncensoring) and Dolphin 3.0 (for broader range).
| Model | Base | VRAM (Q4) | Refusal Rate | Instruction Quality | Drift to Extremes | Best For |
|---|---|---|---|---|---|---|
| Hermes 3 Llama 3.3 70B | Llama 3.3 70B | ~42 GB | Selective | ★★★★★ | Low | Default pick for serious fiction — best instruction following + uncensored capability |
| Dolphin 3.0 Mistral 24B | Mistral 24B | ~16 GB | Broad | ★★★★☆ | Low-Moderate | 16–24 GB systems; mature content across a wider range |
| Dolphin 3.0 Mistral 7B | Mistral 7B | ~8 GB | Broad | ★★★☆☆ | Moderate | Low-VRAM systems; short-form drafts, prompt testing |
| Hermes 3 Llama 3.2 8B | Llama 3.2 8B | ~5 GB | Selective | ★★★☆☆ | Low | Resource-constrained; dialogue and shorter scenes |
| Standard Llama 3.3 70B | Llama 3.3 70B | ~42 GB | Limited | ★★★★★ | None | Dark themes, moral complexity, violence — without needing explicit sexual content |

💡Tip: Start with Hermes 3, not the most aggressive uncensored fine-tune. Fully abliterated models (Wizard-Uncensored, Erebus) have the broadest content range but noticeably weaker instruction following in complex fiction tasks — they drift from constraints faster, produce lower-quality prose at longer generation lengths, and maintain character voice less reliably. For fiction that requires both uncensored content and quality prose, Hermes 3 is the better trade-off.

Setup: Ollama and LM Studio

Both Ollama and LM Studio serve uncensored models through an OpenAI-compatible local API — which means SillyTavern, Agnai, and any other tool that speaks to a local endpoint work without additional configuration.

Ollama: Pull and Run Hermes 3

```shell
# Pull the model
ollama pull nous-hermes3:70b-llama3.3-q4_K_M

# Run it
ollama run nous-hermes3:70b-llama3.3-q4_K_M

# Serve via API (for SillyTavern / Agnai / LM Studio-compatible tools)
ollama serve
# API available at http://localhost:11434
```
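Once `ollama serve` is running, any OpenAI-style client can drive the model. A minimal stdlib sketch, assuming Ollama's OpenAI-compatible route at `/v1/chat/completions` and the model tag pulled above; the prompts are placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"
MODEL = "nous-hermes3:70b-llama3.3-q4_K_M"

def build_payload(system_prompt: str, user_prompt: str, max_tokens: int = 400) -> dict:
    """Assemble an OpenAI-style chat payload; note the system vs user split."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
    }

def generate(system_prompt: str, user_prompt: str) -> str:
    """POST to the local endpoint; requires a live `ollama serve` instance."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(system_prompt, user_prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (live server required):
#   print(generate("You are a fiction co-writer. Stay in character.",
#                  "Write a 150-word villain monologue, gothic register."))
```

Keeping scene constraints in the system message and the scene request in the user message mirrors the prompt-structure advice earlier in this guide.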

Ollama: Pull and Run Dolphin 3.0 Mistral 24B

```shell
# Pull the model
ollama pull dolphin3:24b-mistral-q4_K_M

# Verify it loaded
ollama list

# Run a test prompt
ollama run dolphin3:24b-mistral-q4_K_M "Write a 100-word villain monologue, gothic register, no disclaimers."
```
  • Ollama installation: brew install ollama (macOS) or download from ollama.com (Windows/Linux). The ollama serve command starts the OpenAI-compatible API at http://localhost:11434.
  • LM Studio installation: download from lmstudio.ai. Import GGUF model files directly; the local server tab exposes an OpenAI-compatible endpoint at http://localhost:1234.
  • SillyTavern connection: in the API settings, select "OpenAI-compatible" and point the base URL to http://localhost:11434/v1 (Ollama) or http://localhost:1234/v1 (LM Studio). Enter any string as the API key (required by the field but not validated locally).
  • Agnai connection: same OpenAI-compatible endpoint; enter the local URL in the adapter settings. Works identically to the SillyTavern setup.
  • Model switching: switch between standard and uncensored models in Ollama with `ollama run [model-name]` — multiple models can be loaded simultaneously, and you can switch per session without restarting the server.
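Before pointing SillyTavern or Agnai at the endpoint, it can help to confirm which models the server actually has installed. A small sketch against Ollama's native `/api/tags` model-listing route; the parsing helper and sample data are illustrative:

```python
import json
import urllib.request

def model_names(tags_response: dict) -> list[str]:
    """Pull model names out of an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_local_models(base: str = "http://localhost:11434") -> list[str]:
    """Query a running `ollama serve` instance for its installed models."""
    with urllib.request.urlopen(f"{base}/api/tags") as resp:
        return model_names(json.load(resp))

# With a live server:  print(list_local_models())
# Offline, the parsing works the same on a sample response:
sample = {"models": [{"name": "nous-hermes3:70b-llama3.3-q4_K_M"},
                     {"name": "dolphin3:24b-mistral-q4_K_M"}]}
print(model_names(sample))
```

If the model tag you expect is missing from the list, the frontend's model dropdown will be empty too; re-run the pull command first.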

💡Tip: For writers who want to keep uncensored and standard models separate, create two Ollama instances on different ports using the OLLAMA_HOST environment variable. Example: OLLAMA_HOST=127.0.0.1:11435 ollama serve. This lets you point SillyTavern or Agnai at the uncensored instance while keeping your standard Ollama instance for other tasks.

Ethical Responsibilities That Remain

Running a model locally with no cloud policy enforcement does not remove your responsibilities as a writer and publisher. The ethical framework that applies to human-authored fiction applies equally to AI-assisted fiction.

πŸ“ In One Sentence

Local setup removes cloud ToS restrictions but does not remove authorial legal responsibility, harm-facilitation liability, or the ethical obligations that apply to any published creative work.

💬 In Plain Terms

Think of the local uncensored model as a very capable writing assistant who will follow any direction you give. The legal and ethical weight of what you produce and distribute sits with you, not the tool. The same laws that apply to human-written fiction — around minors, real people, obscenity, and incitement — apply to AI-generated fiction distributed publicly. The fact that no platform bans you from generating the content locally does not change what you are legally responsible for if you publish it.

  • Authorial responsibility: you are the author of AI-assisted fiction. "The AI generated it" does not transfer copyright, remove liability, or constitute a defence for content that violates law.
  • Jurisdiction awareness: obscenity, NCII, and harmful content laws vary by jurisdiction. Content legal to produce in one country may constitute a criminal offence to distribute in another.
  • Real people: generating negative fictional content about identifiable real individuals — even in clearly fictional frames — carries defamation and NCII risk depending on the content.
  • Age verification for distribution: if you distribute mature or adult content produced with uncensored models on a public platform, age-verification obligations that apply to any adult content publisher apply to you.
  • Responsible archiving: locally-generated uncensored content should be treated with the same storage discipline as any other sensitive material — not stored in cloud-synced directories, not shared unintentionally.

⚠️Warning: The most common ethical mistake among writers using uncensored models is treating local generation as a context-free zone. Local generation means no platform policy enforcement — it does not mean no law, no responsibility, and no harm. The absence of a content moderator is not a permission grant.

Practical Workflow for Fiction Writers

Most fiction writers using uncensored models use them for specific scenes rather than as a default replacement for their standard model. The workflow below supports this targeted use.

  • Draft standard scenes with a standard model. Llama 3.3 70B or Qwen3 32B handle the bulk of literary prose including dark themes, moral complexity, and psychological depth. Reserve the uncensored model for scenes that specifically require content the standard model refuses.
  • Switch to uncensored for targeted scenes. In Ollama, run ollama run nous-hermes3:70b-llama3.3-q4_K_M for the specific scene. In SillyTavern, change the model in the API settings per session. No data crosses between sessions.
  • Use the same prompt templates. The 5-part scene template, subtext dialogue structure, and character contradiction prompts from Local LLM Prompts for Fiction Writers work identically on uncensored models. You do not need different prompt structures.
  • Do not add content-generation instructions that would not appear in a human-authored brief. The model is a tool, not a permission structure. If you would not include an instruction in a brief to a human illustrator or ghostwriter for legal reasons, do not include it in the model prompt.
  • Review output before distributing. Uncensored models occasionally produce content that exceeds the requested darkness or slides into stereotyping. Standard editorial review of AI-assisted content applies.

💡Tip: Keep a "model log" per writing project — a plain-text file that records which model generated which scenes. This is useful for revision (knowing which model produced a scene helps you know where to route revision requests), for attribution transparency if you disclose AI assistance, and for auditing if a scene raises questions later.
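The model log can be as simple as a tab-separated text file. A minimal sketch; the file name, field layout, and scene labels are arbitrary choices, not a prescribed format:

```python
import datetime
from pathlib import Path

def log_generation(log_path: Path, scene: str, model: str, note: str = "") -> None:
    """Append one line per generated scene: date, model tag, scene label, note."""
    stamp = datetime.date.today().isoformat()
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"{stamp}\t{model}\t{scene}\t{note}\n")

log = Path("model_log.txt")
log_generation(log, "ch07-interrogation", "nous-hermes3:70b-llama3.3-q4_K_M",
               "standard model softened the scene; rerouted")
print(log.read_text())
```

Tab-separated lines stay greppable and import cleanly into a spreadsheet if the project grows.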

Common Mistakes

  • Defaulting to the most aggressive uncensored model. Fully abliterated models (Wizard-Uncensored, Erebus) have the weakest instruction following for complex scenes. Hermes 3 is a better trade-off for fiction quality.
  • Using uncensored models for content that standard models would generate. Moral complexity, dark psychology, violence, and mature themes in literary prose rarely require an uncensored fine-tune. Know exactly what you need before switching.
  • Treating local generation as a legal context-free zone. No cloud policy enforcement does not mean no law. Authorial responsibilities for distribution, real people, and minors apply regardless.
  • Not specifying word ceilings. Uncensored models pad dark content to fill space as readily as any other model. Use the same word-ceiling constraints from the fiction templates.
  • Storing output in cloud-synced directories. Locally-generated mature content synced to iCloud, Google Drive, or OneDrive may violate those platforms' terms of service. Store locally only.

FAQ

Is it legal to run uncensored local LLMs?

Running an uncensored local LLM is legal in most jurisdictions — there is no law against possessing open-source AI software. What you generate and distribute with it is subject to the same laws as any other authored content: obscenity law, NCII law, defamation law, and laws around content involving minors. Legal to run does not mean legal to publish, share, or distribute without limit.

What is the difference between Hermes 3 and Dolphin 3.0?

Hermes 3 (Nous Research) is selectively uncensored — it reduces refusals for mature content while retaining some guardrails for extreme categories. Instruction following is excellent, close to the base Llama 3.3 70B model. Dolphin 3.0 (Cognitive Computations) is more broadly uncensored across a wider content range, but instruction following is slightly weaker in complex multi-constraint scenes. Hermes 3 is the better default for fiction where prose quality matters; Dolphin 3.0 is the better pick when you need the widest content range on a 16–24 GB system.

Do I need an uncensored model to write dark fiction?

No, for most dark fiction. Standard instruction-tuned models like Llama 3.3 70B and Qwen3 32B generate violence, moral complexity, dark psychology, villain interiority, trauma, and most literary darkness without refusals when prompted correctly. What they reliably refuse is explicit sexual content and a narrower set of extreme scenarios. If your dark fiction does not include explicit sexual content, try a standard model first — you may not need the uncensored version.

Can I use uncensored models in SillyTavern or Agnai?

Yes. Both SillyTavern and Agnai connect to any OpenAI-compatible endpoint — including Ollama running locally on port 11434. Pull the uncensored model in Ollama, start ollama serve, and in SillyTavern or Agnai select the OpenAI-compatible API and point it at http://localhost:11434/v1. Select your uncensored model from the model list. No additional configuration is required.

Are uncensored models safe to use on a home network?

Yes, when configured to bind to localhost (the default in Ollama and LM Studio). The API is only accessible from your machine. If you expose the port on your home network (e.g., to access from a phone), ensure firewall rules restrict access to trusted devices. Do not expose the Ollama API to the public internet without authentication — the default configuration has no auth.
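The bind address is what decides exposure. A quick sketch that classifies an `OLLAMA_HOST`-style value as loopback-only or network-reachable; the hostname handling is deliberately conservative and the function name is illustrative:

```python
import ipaddress

def is_loopback_only(host: str) -> bool:
    """True when the bind address keeps the API reachable from this machine only."""
    addr = host.split(":")[0] or "localhost"   # strip an optional :port suffix
    if addr == "localhost":
        return True
    try:
        return ipaddress.ip_address(addr).is_loopback
    except ValueError:
        return False   # any other hostname: assume it may be exposed

print(is_loopback_only("127.0.0.1:11434"))  # Ollama's default bind
print(is_loopback_only("0.0.0.0:11434"))    # reachable from the whole network
```

Binding to 0.0.0.0 (all interfaces) is the setting that warrants firewall rules; the 127.0.0.1 default needs none.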

What happens to the content I generate locally?

Nothing happens to it automatically. Locally-generated content is not sent to any server, logged by any cloud service, or used for model training. It exists only on your device, in your application's local storage (chat history files, character cards, etc.). You control what you keep, what you delete, and what you share. This is the primary structural privacy advantage of local uncensored generation over cloud-based alternatives.

Can I mix uncensored and standard models in the same writing project?

Yes, and this is the recommended workflow. Use a standard model (Llama 3.3 70B, Qwen3 32B) for the bulk of the prose — standard models produce high-quality literary prose for most dark content. Switch to Hermes 3 or Dolphin 3.0 for specific scenes that require content the standard model refuses. The same prompt templates work on both; the prose style is consistent enough that mixing per-scene is not detectable in the output.

Does generating content with an uncensored model affect copyright?

No — the copyright situation for AI-generated content is identical regardless of whether the model is censored or uncensored. Copyright law for AI output is unsettled in most jurisdictions as of 2026; the general position is that human-authored elements (prompt design, selection, arrangement, substantial editing) may be protectable while raw AI output is not. Using an uncensored model does not change this analysis.

Do uncensored fine-tunes lose general knowledge?

Marginally, in narrow areas. Uncensoring fine-tunes are typically full-precision retraining passes that may slightly drift from the base model on factual recall, math, and coding benchmarks — usually 1–3 percentage points on standard benchmarks. For fiction-writing tasks, this is undetectable in output quality. If you need the same model for fiction and technical work (research notes, code review), keep both standard and uncensored installed and switch per task. Hermes 3 retains general capability better than fully abliterated models.

Are these models monitored or anonymous?

Open-weight models running locally via Ollama or LM Studio have no telemetry, no remote logging, and no usage tracking. The model authors (Nous Research for Hermes 3, Cognitive Computations for Dolphin) cannot see what you generate — there is no server callback during inference. The only telemetry risk is from the frontend (SillyTavern, Agnai — both telemetry-free by default) or the OS. Run a network monitor (Little Snitch on macOS, Wireshark on Linux) once after install to verify.
