Key Takeaways
- RTX 4090: 575W. Needs 1200W PSU, excellent case airflow.
- RTX 4080: 320W. Needs 850W PSU, good airflow.
- RTX 4070 Ti: 290W. Needs 750W PSU, adequate airflow.
- M5 Max Mac: 25-35W for inference (extremely efficient).
- Running 24/7 costs: RTX 4090 = $50-70/month, RTX 4070 Ti = $20-25/month.
- As of April 2026, cooling is critical. Poor airflow reduces lifespan and throttles performance.
How Much Power Does Each GPU Draw for LLM Inference?
The RTX 4090 and RTX 5090 both draw 575W at full load β the highest tier available for local LLMs. GPU power draw is the dominant factor in your PSU choice and electricity bill.
Note: NVIDIA RTX 4090 has 450W base TDP but real-world inference can hit 575W under sustained load. RTX 5090 ships with 575W native TDP. AMD RX 7900 XTX is the strongest non-NVIDIA discrete GPU for local LLMs at 355W with 24 GB VRAM. Apple M5 Max draws 10Γ less power per token than RTX 4090 β the most efficient choice for sustained 24/7 inference.
| GPU | Power | Idle | PSU |
|---|---|---|---|
| RTX 5090 | 575W | 20W | 1200W+ |
| RTX 4090 | 450W (575W max) | 10W | 1200W+ |
| RTX 5080 | 360W | 15W | 1000W |
| RTX 4080 | 320W | 8W | 850W+ |
| RTX 5070 | 250W | 12W | 800W |
| RTX 4070 Ti | 285W | 7W | 750W+ |
| RTX 4070 | 200W | 6W | 650W |
| AMD RX 7900 XTX | 355W | 25W | 850W |
| Apple M5 Max (GPU) | 25β35W | 1W | Built-in |
| Apple M5 Pro (GPU) | 20β28W | 1W | Built-in |
β οΈWarning: RTX 5090 TDP: NVIDIA rates it at 575W but real-world peaks can hit 600W+ depending on power limit settings.
How Much Total Power Does a Local LLM PC Use?
The GPU is not the only power consumer. Factor in CPU, RAM, storage, and motherboard:
| Component | Power | Notes |
|---|---|---|
| GPU (RTX 4090) | 575W | Peaks at 100% utilization |
| CPU (Ryzen 9 7950X) | 170W | Under load |
| Motherboard + RAM + SSD | 100W | Typical |
| Cooling fans, PSU overhead | 50-100W | Safety margin |
| Total system load | ~895β945W | Needs 1200W PSU minimum |
β’Keypoint: GPU is 60β65% of total system power. CPU, cooling, and overhead are the remaining 35β40%.
What Does It Cost to Run a Local LLM 24/7?
Assuming $0.12/kWh (US average):
π¬ In Plain Terms
kWh (kilowatt-hour): One thousand watts of power used for one hour. At $0.12/kWh, running a 600W RTX 4090 for 24 hours uses 14.4 kWh, costing $1.73/day.
| GPU | Daily Cost | Monthly | Annual |
|---|---|---|---|
| RTX 4090 (600W avg) | $1.73 | $52 | $625 |
| RTX 4080 (350W avg) | $1.01 | $30 | $360 |
| RTX 4070 Ti (300W avg) | $0.86 | $26 | $315 |
| M5 Max Mac (30W avg) | $0.09 | $2.60 | $32 |
π‘Tip: Power limiting RTX 4090 to 350W saves 40% electricity with only ~10% speed loss β the sweet spot for efficient inference at scale.
What Cooling Do You Need for Local LLM Inference?
Proper cooling is critical for GPU lifespan (5+ years) and preventing thermal throttling.
Adequate case airflow: Front fans pull cool air in, rear/top fans exhaust hot air. RTX 4090 needs large case with 3+ fans.
Ambient temperature: Ideally 18-24Β°C. In hot climates (30Β°C+), cooling becomes critical.
Thermal paste: Replace every 2-3 years for optimal heat transfer (if applicable).
Monitoring: Use GPU-Z or nvidia-smi to monitor temperatures. Keep under 80Β°C sustained.
π In One Sentence
Thermal throttling: Automatic clock speed reduction when GPU detects unsafe temperatures, protecting the chip from heat damage at the cost of inference speed.
β οΈWarning: GPU throttles above 83Β°C β performance drops 10β20%. Poor airflow causes sustained throttling even at 75Β°C in hot rooms.
π οΈPractice: Use `nvidia-smi -q -d TEMPERATURE` to monitor GPU temperature continuously. Set up alerts at 75Β°C to prevent throttling.
Quick Facts
- RTX 4090 peak draw: 575W (GPU alone)
- Required PSU: 1200W for RTX 4090 system
- 24/7 cost at $0.12/kWh: ~$52/month (RTX 4090)
- Apple M5 Max total draw: 25β35W
- Efficiency ratio: M5 Max uses ~10Γ less power per token than RTX 4090
- Safe GPU temp: Keep below 83Β°C for sustained inference
π‘Tip: Apple Silicon vs NVIDIA: efficiency winner. M5 Max achieves 65β85 tok/sec β 4Γ faster than M4 generation while using the same power on just 25β35W, while RTX 4090 requires 600W for 150 tok/sec on the same model.
Common Power and Cooling Mistakes
- Undersizing the PSU. RTX 4090 with 750W PSU will trigger shutdowns under load. Always budget 2Γ the GPU power draw.
- Ignoring case airflow. Poor airflow causes thermal throttling (~10% performance loss) and shortens GPU lifespan.
- Running 24/7 without considering costs. RTX 4090 costs $50/month electricity. Not practical for personal use unless you run inference constantly.
- Not monitoring GPU temperature. Cards can silently throttle due to thermal stress. Monitor with nvidia-smi.
- Forgetting cooling overhead in TCO calculations. Cooling is the second-largest cost after the GPU itself. Running a dual-GPU rig in a hot climate (30Β°C+ ambient) requires ~$200β400/year in additional A/C costs to maintain 22Β°C room temperature. Apple Silicon eliminates this: M5 Max draws 30W and produces minimal heat, no extra cooling needed.
β οΈWarning: 750W PSU + RTX 4090 = random shutdowns under sustained inference. Real-world power spikes exceed PSU capacity, triggering automatic shutdown to protect components.
Power Costs by Region
EU (Germany/France): β¬0.30β0.40/kWh β 3Γ the US average. Running an RTX 4090 24/7 costs β¬120β160/month in Germany. GDPR encourages on-premise deployment but energy costs make Apple Silicon or power-limited GPU inference essential for EU users.
Japan: Β₯27β30/kWh (~$0.18β0.20/kWh). Energy costs are 50β70% higher than the US average. METI's 2024 AI efficiency guidelines favor energy-efficient hardware for corporate deployments.
China: Β₯0.5β0.8/kWh ($0.07β0.11/kWh) in eastern cities. Lower electricity costs favor NVIDIA GPU deployments. China Data Security Law requirements make on-premise inference common for enterprises.
Power & Cooling FAQ
πInsight: Power-limited inference at 60% TDP is a common data center practice. RTX 4090 at 350W (60% of 575W) delivers 90% of peak performance with 40% lower electricity costs and less cooling load.
How much power does running a local LLM use?
Power draw depends on GPU tier. RTX 4090: 575W peak (600W average with system). RTX 4080: 320W GPU (450W system). RTX 4070 Ti: 290W GPU (400W system). Apple M5 Max Mac: 25β35W total β the most energy-efficient option by far. Inference loads the GPU to 90β100% utilization continuously.
How much does it cost to run a local LLM 24/7?
At $0.12/kWh (US average): RTX 4090 system costs ~$52/month. RTX 4080 system: ~$30/month. RTX 4070 Ti system: ~$26/month. Apple M5 Max Mac: ~$2.60/month. Electricity rates vary β in Germany (~$0.40/kWh), multiply by 3Γ. Running inference only during work hours (8h/day) reduces costs by ~67%.
What PSU wattage do I need for an RTX 4090?
Minimum 1000W PSU; 1200W recommended. The RTX 4090 draws 575W at peak. Add CPU (150β170W), motherboard/RAM/storage (100W), and a 20% safety margin β total system load reaches ~900W. A 750W PSU will trigger shutdowns under sustained LLM inference load. Always buy from reputable PSU brands (Seasonic, Corsair, EVGA).
Is Apple Silicon more efficient than NVIDIA for local LLMs?
Yes β by a large margin. M5 Max (128 GB unified, Mar 2026) runs 7B models at 65β85 tok/sec on 25β35W total system power. An RTX 4090 runs the same model at 150 tok/sec on 600W. M5 Max uses ~10Γ less power per token than RTX 4090, plus offers 4Γ larger memory pool (128 GB vs 32 GB) for 70B models.
What GPU temperature is safe for sustained LLM inference?
Keep GPU temperature below 83Β°C for sustained inference. RTX 4090 thermal throttle triggers at 83Β°C, reducing clock speeds and inference speed by 10β20%. Ideal operating range: 65β75Β°C. Use `nvidia-smi -q -d TEMPERATURE` to monitor. If temperatures exceed 80Β°C, improve case airflow or add/replace thermal paste.
How do I reduce power consumption without losing inference speed?
Power limit the GPU (NVIDIA) without reducing clock speeds. RTX 4090: setting power limit to 350W (from 575W) reduces power by 40% with only ~10% speed loss β the sweet spot for efficient inference. Use `nvidia-smi -pl 350` to set power limit. Apple Silicon users: no tuning needed, the hardware is already optimized.
What is TDP and why does it matter for local LLMs?
TDP (Thermal Design Power) is the maximum heat a GPU generates at peak load, measured in watts. NVIDIA rates RTX 4090 TDP at 575W, but real-world inference can peak at 600W+ depending on power limits and clock speeds. TDP matters because it determines your minimum PSU size and cooling requirements. Higher TDP = larger PSU, more electricity cost, more cooling needed.
Does running a local LLM damage my GPU?
No β sustained inference will not damage a healthy GPU if cooling is adequate. GPUs are designed to run at 100% utilization 24/7 (data centers do this). The real risks are: (1) poor cooling causes throttling and shortens lifespan, (2) power spikes from undersized PSU can trigger shutdowns, (3) dust/bad airflow degrades performance over years. Monitor temperatures and maintain good airflow, and your GPU will last 5+ years.
Sources
- NVIDIA GPU Power Specifications
- US Electricity Rates β U.S. Energy Information Administration
- GPU Temperature Monitoring with nvidia-smi
- Power efficiency gains speed, but speed doesn't guarantee quality output. Temperature and sampling settings can offset energy consumption with better results: temperature and top-p explains how these parameters trade off speed and consistency.