Unveiling the Hidden Costs of Large Language Models at RacketCon 2024
Introduction
As part of the fourteenth RacketCon, one of the talks walked through the expected costs, in terms of energy, water, and time, of training and using large language models (LLMs).
This document serves as a comprehensive resource on the multifaceted costs of LLMs, covering training and inference, environmental impact, and optimization strategies.
Training Costs
Energy Consumption
Training large language models requires massive computational resources. Recent studies have quantified these costs:
- GPT-3 (175B parameters): Estimated 1,287 MWh during training, equivalent to the annual energy consumption of ~130 US homes
- BLOOM (176B parameters): Consumed approximately 433 MWh for training
- PaLM (540B parameters): Training consumed an estimated 2,500 MWh of energy
- Meta's LLaMA-2 70B: Required approximately 1,720 MWh for training
Energy costs scale super-linearly with model size: because training-data volume is typically scaled up alongside parameter count, each doubling of parameters requires significantly more than double the energy.
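To make these figures concrete, here is a minimal Racket sketch of the household-equivalence arithmetic behind the comparison above, assuming a round ~10 MWh of electricity per US home per year (an assumed figure close to, but not exactly, published averages):

```racket
#lang racket
;; Back-of-envelope: convert a training run's energy use into
;; equivalent years of average US household electricity consumption.
;; Assumption: ~10 MWh per US home per year (rounded figure).
(define mwh-per-home-year 10)

(define (training-energy->home-years mwh)
  (/ mwh mwh-per-home-year))

;; Reported training-energy estimates (MWh) from the list above.
(define models
  '(("GPT-3 175B"  1287)
    ("BLOOM 176B"   433)
    ("PaLM 540B"   2500)
    ("LLaMA-2 70B" 1720)))

(for ([m (in-list models)])
  (printf "~a: ~a MWh ≈ ~a home-years\n"
          (first m) (second m)
          (exact->inexact (training-energy->home-years (second m)))))
```

Under this assumption GPT-3's 1,287 MWh works out to roughly 129 home-years, which is where the "~130 US homes" comparison comes from.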
Water Usage
Data centers require substantial water for cooling systems:
- GPT-3 training is estimated to have consumed ~700,000 liters of water for cooling
- A single data center can consume 1-5 million liters of water daily
- Microsoft reported a 34% increase in water consumption from 2021 to 2022, largely attributed to AI infrastructure
- Google's water usage increased by 20% in 2022, with AI workloads being a significant factor
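The cooling-water figure can be roughly reproduced from the energy figure with a site-level water intensity. A minimal Racket sketch, assuming a hypothetical ~0.55 liters of water per kWh; real water usage effectiveness varies widely by facility, climate, and season:

```racket
#lang racket
;; Rough estimate of cooling water for a training run:
;;   water (liters) ≈ energy (kWh) * water intensity (liters per kWh).
;; The 0.55 L/kWh intensity is an assumed round figure, not a measurement.
(define liters-per-kwh 0.55)

(define (training-water-liters mwh)
  (* mwh 1000 liters-per-kwh))   ; MWh -> kWh -> liters

;; GPT-3's reported ~1,287 MWh gives on the order of 700,000 liters,
;; consistent with the estimate quoted above.
(printf "GPT-3-scale run: ~a liters\n" (training-water-liters 1287))
```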
Compute Resources
- GPT-3: ~3,640 petaflop-days of compute
- GPT-4: Estimated 50,000-100,000 petaflop-days (rumored, not officially disclosed)
- Training duration: From several weeks to several months depending on model size and infrastructure
- Hardware costs: NVIDIA A100 GPU clusters can cost $10M-$100M+ for training facilities
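The GPT-3 compute figure is consistent with the widely used rule of thumb that training FLOPs ≈ 6 × parameters × training tokens. A minimal Racket sketch, using GPT-3's reported 175B parameters and ~300B training tokens:

```racket
#lang racket
;; Approximate training compute via the common rule of thumb
;; FLOPs ≈ 6 * N * D, where N = parameters and D = training tokens.
(define (training-flops params tokens)
  (* 6 params tokens))

;; One petaflop-day = 10^15 FLOP/s sustained for 86,400 seconds.
(define flops-per-petaflop-day (* 1e15 86400))

(define (flops->petaflop-days flops)
  (/ flops flops-per-petaflop-day))

;; GPT-3: 175B parameters, ~300B training tokens (reported figures).
(printf "GPT-3: ~a petaflop-days\n"
        (flops->petaflop-days (training-flops 175e9 300e9)))
;; => about 3,646, in line with the ~3,640 petaflop-days quoted above.
```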
Inference Costs
Per-Token Economics
Running inference on LLMs incurs costs for every token generated:
- GPT-4: ~$0.03-0.06 per 1K tokens (input), $0.06-0.12 per 1K tokens (output)
- GPT-3.5: ~$0.0015-0.002 per 1K tokens
- Claude 3 Opus: ~$0.015 per 1K input tokens, $0.075 per 1K output tokens
- Open-source models (self-hosted): $0.001-0.01 per 1K tokens depending on infrastructure
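A small Racket sketch of how per-1K-token prices translate into per-request cost; the prices are the mid-range figures from the list above and should be treated as placeholders, since providers change pricing frequently:

```racket
#lang racket
;; Dollar cost of one request, given token counts and per-1K-token prices.
(define (request-cost in-tokens out-tokens in-price-per-1k out-price-per-1k)
  (+ (* (/ in-tokens 1000.0) in-price-per-1k)
     (* (/ out-tokens 1000.0) out-price-per-1k)))

;; Example: 1,500 input tokens and 500 output tokens.
(printf "GPT-4-class prices:   $~a\n" (request-cost 1500 500 0.03 0.06))
(printf "GPT-3.5-class prices: $~a\n" (request-cost 1500 500 0.0015 0.002))
```

At these assumed prices the same request costs about $0.075 on the larger model versus about $0.003 on the smaller one, which is why the model-selection practices discussed below matter so much at scale.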
Energy Per Query
- Single ChatGPT query: ~0.002-0.003 kWh (approximately 2-3 Wh)
- Daily operational cost for ChatGPT estimated at $100,000-700,000 (energy + infrastructure)
- Scaling estimate: 10 billion queries/day at 2-3 Wh each would consume roughly 20-30 GWh of electricity per day, on the order of 1 GW of continuous power
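The scaling arithmetic is easy to check. A minimal Racket sketch, where the query volume and per-query energy are illustrative assumptions rather than measurements:

```racket
#lang racket
;; Scale per-query energy up to fleet-level totals.
(define (daily-energy-gwh queries-per-day wh-per-query)
  (/ (* queries-per-day wh-per-query) 1e9))       ; Wh/day -> GWh/day

(define (continuous-power-mw queries-per-day wh-per-query)
  (/ (* queries-per-day wh-per-query) 24.0 1e6))  ; Wh/day -> average MW

;; Illustrative scenario: 10 billion queries/day at 2.5 Wh each.
(printf "~a GWh/day, ~a MW of continuous power\n"
        (daily-energy-gwh 1e10 2.5)
        (continuous-power-mw 1e10 2.5))
;; => 25.0 GWh/day, about 1,042 MW continuous.
```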
Environmental Impact
Carbon Footprint
- Training GPT-3: Estimated 552 metric tons CO2e (equivalent to 120 cars driven for a year)
- BLOOM training: ~25 metric tons CO2e (significantly lower due to French nuclear power grid)
- Full lifecycle emissions (including hardware manufacturing): 2-5x higher than training alone
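Operational training emissions are, to a first approximation, energy multiplied by the carbon intensity of the supplying grid. A Racket sketch using assumed round intensities of ~430 g CO2e/kWh (a US-average-like grid) and ~57 g CO2e/kWh (a nuclear-heavy grid such as France's); it roughly reproduces the GPT-3 and BLOOM figures above:

```racket
#lang racket
;; Operational emissions (metric tons CO2e) =
;;   energy (kWh) * grid intensity (kg CO2e per kWh) / 1000.
;; Grid intensities below are assumed round figures.
(define (training-co2e-tons mwh kg-co2e-per-kwh)
  (/ (* mwh 1000 kg-co2e-per-kwh) 1000))

(printf "GPT-3 on a ~~430 g/kWh grid: ~a t CO2e\n"
        (training-co2e-tons 1287 0.430))
(printf "BLOOM on a ~~57 g/kWh grid:  ~a t CO2e\n"
        (training-co2e-tons 433 0.057))
;; => roughly 553 t and 25 t, matching the estimates above.
```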
Comparative Environmental Costs
- Training one large model: Equivalent to 5x the lifetime emissions of an average car
- Daily ChatGPT operations: Comparable to a small town's electricity consumption
- Projected AI sector emissions by 2030: 0.5-1.5% of global greenhouse gas emissions
Academic Studies
Key research findings:
- Strubell et al. (2019): "Energy and Policy Considerations for Deep Learning in NLP"
  - Demonstrated that training a single large transformer model can emit as much carbon as five cars in their lifetimes
  - Highlighted the environmental cost of neural architecture search (NAS)
- Patterson et al. (2021): "Carbon Emissions and Large Neural Network Training"
  - Showed that carbon footprint varies dramatically based on energy grid composition
  - Proposed using carbon-aware computing to reduce emissions by 100-1000x
- Luccioni et al. (2023): "Power Hungry Processing: Watts Driving the Cost of AI Deployment?"
  - First comprehensive study of inference costs across multiple model families
  - Found that image generation models have significantly higher per-query costs than text models
Cost Optimization Strategies
Model Efficiency
- Model distillation: Reduce parameters by 10-100x while maintaining 95%+ performance
- Quantization: 4-bit and 8-bit models reduce memory and compute by 2-4x (see the sketch after this list)
- Sparse models: Mixture-of-Experts (MoE) architectures activate only 10-20% of parameters per query
- Pruning: Remove redundant weights to reduce model size by 30-70%
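The memory side of the quantization claim is simple arithmetic over bytes per parameter. A Racket sketch for a hypothetical 70B-parameter model, counting weights only and ignoring activations and the KV cache:

```racket
#lang racket
;; Approximate weight-memory footprint at different numeric precisions.
(define (weight-memory-gb params bytes-per-param)
  (/ (* params bytes-per-param) 1e9))

(define params 70e9)   ; hypothetical 70B-parameter model

(for ([fmt (in-list '(("fp16" 2) ("int8" 1) ("int4" 0.5)))])
  (printf "~a: ~a GB\n"
          (first fmt)
          (weight-memory-gb params (second fmt))))
;; => 140 GB, 70 GB, and 35 GB: the 2-4x reduction cited above.
```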
Infrastructure Optimization
- Carbon-aware scheduling: Run training jobs when renewable energy is available (see the sketch after this list)
- Geographic optimization: Locate data centers in regions with clean energy grids
- Liquid cooling: Reduce water consumption by 20-30% compared to traditional cooling
- Custom accelerators: Google TPUs, AWS Trainium offer 2-5x better price/performance
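Carbon-aware scheduling reduces to picking the lowest-carbon window in a grid-intensity forecast. A toy Racket sketch with made-up hourly intensities; a real scheduler would pull forecasts from a grid-data provider and handle preemption, deadlines, and regional choice:

```racket
#lang racket
;; Toy carbon-aware scheduler: given forecast grid intensities
;; (g CO2e/kWh) for consecutive hours, choose the start hour that
;; minimizes total emissions for a job of the given duration.
;; The forecast values are invented for illustration.
(define forecast '#(520 480 410 350 300 280 290 340 420 500 540 560))

(define (window-sum vec start len)
  (for/sum ([i (in-range start (+ start len))])
    (vector-ref vec i)))

(define (best-start-hour vec job-hours)
  (argmin (lambda (start) (window-sum vec start job-hours))
          (range 0 (add1 (- (vector-length vec) job-hours)))))

(printf "Best start hour for a 3-hour job: ~a\n"
        (best-start-hour forecast 3))   ; => 4 (hours 4-6 are the cleanest)
```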
Operational Best Practices
- Batch processing: Amortize fixed costs across multiple requests
- Caching: Store and reuse common completions (see the sketch after this list)
- Prompt optimization: Reduce token counts through efficient prompt engineering
- Model selection: Use smallest model sufficient for task (GPT-3.5 vs GPT-4)
- Local deployment: Self-host smaller models for privacy and cost reduction
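Caching is the simplest of these to sketch: identical prompts should hit the model only once. A minimal Racket example in which call-model is a hypothetical stand-in for the real (expensive) inference call:

```racket
#lang racket
;; Minimal completion cache: reuse previous responses for identical prompts.
(define cache (make-hash))

(define (call-model prompt)
  ;; Placeholder for the real API or local-inference call.
  (string-append "response to: " prompt))

(define (cached-completion prompt)
  ;; hash-ref! calls the thunk (and stores its result) only on a cache miss.
  (hash-ref! cache prompt (lambda () (call-model prompt))))

(cached-completion "What is Racket?")   ; pays for inference
(cached-completion "What is Racket?")   ; served from the cache
```

In practice the cache key would normalize the prompt and include the model name and sampling parameters, and entries would carry an expiry.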
Future Directions
Emerging Approaches
- Retrieval-Augmented Generation (RAG): Reduce model size requirements
- Parameter-efficient fine-tuning (PEFT): LoRA and QLoRA can reduce fine-tuning costs by roughly 100x (see the sketch after this list)
- Edge deployment: Move inference to user devices
- Specialized models: Task-specific smaller models replacing general-purpose large ones
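The parameter savings behind LoRA-style PEFT follow from replacing the full update of a d_in x d_out weight matrix with two rank-r factors, so only r * (d_in + d_out) parameters are trained per adapted matrix. A Racket sketch with an illustrative 4096-dimensional projection and rank 8:

```racket
#lang racket
;; Trainable parameters: full fine-tuning of one weight matrix
;; versus a rank-r LoRA adapter for the same matrix.
(define (full-params d-in d-out) (* d-in d-out))
(define (lora-params d-in d-out r) (* r (+ d-in d-out)))

;; Example: a 4096x4096 projection with a rank-8 adapter.
(define d 4096)
(printf "full: ~a  lora (r = 8): ~a  reduction: ~ax\n"
        (full-params d d)
        (lora-params d d 8)
        (exact->inexact (/ (full-params d d) (lora-params d d 8))))
;; => 16,777,216 vs 65,536 trainable parameters, a 256x reduction
;;    for this single matrix.
```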
Sustainability Initiatives
- Green AI movement: Developing energy-efficient architectures
- Renewable energy commitments: Major providers targeting 100% renewable energy
- Carbon offset programs: Companies purchasing carbon credits for AI operations
- Efficiency reporting: Standardized metrics for comparing model environmental costs
References
- https://docs.racket-lang.org/llm/LLM_Cost_Model.html
- "LLM Cost Models and Optimization": https://arxiv.org/abs/2309.14393
- "Environmental Impact of Large Language Models": https://arxiv.org/abs/2304.03271
- "Sustainable AI Development": https://arxiv.org/abs/2304.08485
- Strubell et al. (2019): "Energy and Policy Considerations for Deep Learning in NLP": https://arxiv.org/abs/1906.02243
- Patterson et al. (2021): "Carbon Emissions and Large Neural Network Training": https://arxiv.org/abs/2104.10350
- Luccioni et al. (2023): "Power Hungry Processing: Watts Driving the Cost of AI Deployment?": https://arxiv.org/abs/2311.16863
- Sharir et al. (2020): "The Cost of Training NLP Models": https://arxiv.org/abs/2004.08900
- Wu et al. (2022): "Sustainable AI: Environmental Implications": https://arxiv.org/abs/2111.00364
- de Vries (2023): "The growing energy footprint of AI": https://www.cell.com/joule/fulltext/S2542-4351(23)00365-3
Acknowledgments
This research was presented at RacketCon 2024, the fourteenth annual conference for the Racket programming language community, fostering discussion about responsible AI development and computational sustainability.