- Published on
📝 Nested Learning: Why AI Can't Remember Yesterday (And How to Fix It)
- Author: Chengchang Yu (@chengchangyu)
🧠 The Problem: AI Has Amnesia
Here's the uncomfortable truth about ChatGPT and other AI models:
They can't actually learn from your conversations.
Think about it:
- You teach ChatGPT something new in a conversation
- It seems to understand and use that information
- You start a new chat the next day
- It has completely forgotten everything
This isn't a bug. It's how these models are designed.
🎭 The Anterograde Amnesia Analogy
Remember the movie Memento? The main character can't form new long-term memories. He remembers everything before his accident, but nothing new sticks.
Current AI models have the exact same condition:
✓ Long-term memory: Everything from training (frozen forever)
✓ Short-term memory: Current conversation (lasts minutes)
✗ Medium-term memory: DOESN'T EXIST
Result: AI is stuck experiencing an eternal present, unable to grow beyond its training data.
💡 The Big Idea: Nested Learning
Researchers from Google asked a simple question:
"What if we stopped thinking about AI as a stack of layers and started thinking about it as a brain with different parts that learn at different speeds?"
This is Nested Learning – and it changes everything.
🏗️ The Three Levels of Learning
Traditional AI (One Speed)
Training: Learn everything at once
Deployment: Learn nothing, ever
Nested Learning (Multiple Speeds)
Ultra-Fast Memory: Updates every word (like attention span)
Fast Memory: Updates every sentence (like working memory)
Medium Memory: Updates every conversation (like short-term memory)
Slow Memory: Updates every week (like long-term memory)
Ultra-Slow Memory: Core knowledge (like instincts)
The key insight: Human brains work this way! Different neurons fire at different speeds.
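To make the multi-speed idea concrete, here's a toy sketch (my own illustration, not code from the paper; the level names and update periods are invented) of components that all see the same input stream but update on different schedules:

```python
# Toy sketch of multi-speed updates: every memory level sees every token,
# but only updates its state on its own schedule.

levels = {
    "ultra_fast": 1,   # update every token
    "fast": 8,         # update every ~sentence
    "medium": 64,      # update every ~conversation turn
    "slow": 512,       # update rarely
}

update_counts = {name: 0 for name in levels}

def step(t):
    """Process one token: each level updates only when its period divides t."""
    for name, period in levels.items():
        if t % period == 0:
            update_counts[name] += 1

for t in range(512):
    step(t)

print(update_counts)  # faster levels accumulate far more updates than slower ones
```

The point of the sketch: "depth" here comes from the hierarchy of update frequencies, not from stacking more layers that all update in lockstep.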
🔍 Three Breakthrough Discoveries
Discovery #1: Your Optimizer Is Actually a Brain
The Shocking Truth: The algorithm that trains AI (called an "optimizer") isn't just a tool - it's a memory system.
What researchers found:
The "momentum" in gradient descent (a common training trick) is actually:
- A mini-brain inside the main brain
- Learning to compress and remember past training steps
- Making decisions about what to remember and forget
In plain English:
We thought: Optimizer = Tool (like a hammer)
Reality: Optimizer = Memory (like a notepad that learns)
Why this matters: We can make optimizers "deeper" by giving them better memory systems.
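You can see the "notepad" view directly in the math: classical momentum is an exponential moving average that compresses the entire gradient history into a single state vector. A minimal sketch (standard momentum with toy scalar numbers, not the paper's exact formulation):

```python
# Momentum as memory: m_t = beta * m_{t-1} + g_t
# Old gradients fade geometrically; recent ones dominate. The optimizer is
# "remembering" a compressed summary of every training step so far.

def momentum_step(m, grad, beta=0.9):
    return beta * m + grad

m = 0.0
grads = [1.0, 1.0, 1.0, 1.0]   # four identical gradient steps
for g in grads:
    m = momentum_step(m, g)

print(round(m, 4))  # 3.439 = 1 + 0.9 + 0.81 + 0.729, a weighted history
```

In Nested Learning's framing, this little recurrence is itself a learning module, just one running at a different timescale than the network it trains.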
Discovery #2: Models Can Learn to Modify Themselves
The Innovation: What if AI could update its own parameters while running?
Traditional AI:
1. Train the model
2. Freeze all parameters
3. Use it (no changes allowed)
Self-Modifying AI:
1. Train the model
2. Model learns HOW to update itself
3. Model adapts to new information in real-time
Real-world example:
Traditional ChatGPT:
- You: "I prefer being called 'Chief' instead of my name"
- ChatGPT: "Got it, Chief!" (stores in context)
- New conversation
- ChatGPT: "Hello! How can I help?" (forgot everything)
Self-Modifying AI:
- You: "I prefer being called 'Chief'"
- AI: Updates its own parameters to remember this
- New conversation next week
- AI: "Hello Chief! Good to see you again."
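A heavily simplified sketch of the idea, using a Hebbian outer-product "fast weight" write as a stand-in for the paper's actual mechanism (the embeddings and update rule here are invented for illustration): the model writes a fact into its own parameters at inference time, and the same cue retrieves it later.

```python
import numpy as np

d = 4
W = np.zeros((d, d))                    # writable "fast weight" memory

# hand-made embeddings for illustration
key = np.array([1.0, 0.0, 0.0, 0.0])    # cue: "what should I call the user?"
value = np.array([0.0, 1.0, 0.0, 0.0])  # answer: "Chief"

def write(W, key, value, lr=1.0):
    # the model modifies its own parameters: a Hebbian outer-product update
    return W + lr * np.outer(value, key)

def read(W, key):
    # later, even in a "new session", the same cue retrieves the stored answer
    return W @ key

W = write(W, key, value)
print(read(W, key))   # ≈ value: the preference now lives in the weights
```

The crucial difference from context storage: the information survives in the parameters themselves, so it doesn't vanish when the conversation window is cleared.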
Discovery #3: Memory Isn't Binary
Old thinking: memory is binary
- Short-term memory (the context window)
- Long-term memory (frozen parameters)
New thinking: The Memory Spectrum
Seconds: What you just said
Minutes: This conversation's topic
Hours: Today's discussions
Days: This week's patterns
Weeks: Your preferences
Months: Domain expertise
Forever: Core knowledge
The breakthrough: AI can have memory at ALL these timescales simultaneously.
🚀 Introducing HOPE: AI That Actually Learns
HOPE = Hierarchical Optimization with Persistent Encoding
Think of it as AI with a proper memory system:
How HOPE Works (Simple Version)
When you teach HOPE something new:
Step 1: Fast Memory captures it immediately
↓ (over minutes)
Step 2: Medium Memory consolidates patterns
↓ (over hours)
Step 3: Slow Memory stores core concepts
↓ (over days)
Step 4: Knowledge becomes permanent
This is exactly how human memory works!
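The consolidation pipeline above can be sketched as a toy cascade (the batch sizes and data structures here are invented purely to show the flow, not HOPE's real mechanism): items land in fast memory, and each level periodically distills its contents into the next, slower level.

```python
# Toy consolidation cascade: fast -> medium -> slow.
# Every 4 items, fast memory is compressed into a chunk in medium memory;
# every 4 chunks, medium memory is compressed into slow memory.

fast, medium, slow = [], [], []

def observe(item):
    fast.append(item)
    if len(fast) == 4:             # consolidate after a few items
        medium.append(tuple(fast))
        fast.clear()
    if len(medium) == 4:           # consolidate again after a few chunks
        slow.append(tuple(medium))
        medium.clear()

for i in range(16):
    observe(i)

print(len(fast), len(medium), len(slow))  # 0 0 1: all 16 items ended up in slow memory
```

Each hand-off trades detail for durability, which is the same trade human memory consolidation makes.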
📊 Does It Actually Work?
Yes. Here are the results:
Language Understanding Test
| Model | Score | Memory Type |
|---|---|---|
| Standard Transformer | 52.25% | Context only |
| Modern RNN (RetNet) | 52.02% | Limited memory |
| Titans (Previous best) | 56.82% | Better memory |
| HOPE | 57.23% | Multi-speed memory |
Translation: HOPE understands language better because it can remember and learn from context more effectively.
🎯 Real-World Applications
1. Personalized AI Assistants
Current: Forgets you every conversation
Future: Learns your preferences over time
2. Lifelong Learning Robots
Current: Needs retraining for new tasks
Future: Accumulates skills over years
3. Adaptive Customer Service
Current: Same responses for everyone
Future: Learns from each interaction
4. Medical AI
Current: Static knowledge from training
Future: Updates with new research continuously
🧩 The Key Insights (In Plain English)
Insight #1: Deep Learning Isn't Actually Deep
What we thought:
- 50 layers = 50 levels of thinking
Reality:
- 50 layers = More capacity, but still ONE level of learning
Nested Learning:
- 5 levels of nested optimization = TRUE depth
Insight #2: Everything Is Memory
The researchers discovered that ALL parts of AI are memory systems:
- Attention: Remembers recent tokens
- Optimizer: Remembers past gradients
- Training: Remembers the dataset
- Parameters: Remember compressed knowledge
They're all doing the same thing at different speeds!
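Take attention as an example: standard scaled dot-product attention is literally an associative-memory lookup, where a query retrieves the values whose keys it matches. A toy demonstration (standard attention math, toy numbers):

```python
import numpy as np

def attention(query, keys, values):
    # scaled dot-product attention: a soft lookup of values by key similarity
    scores = keys @ query / np.sqrt(len(query))
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

keys = np.array([[1.0, 0.0], [0.0, 1.0]])   # "addresses" of two stored tokens
values = np.array([[10.0], [20.0]])         # their stored contents
query = np.array([5.0, 0.0])                # strongly matches the first key

out = attention(query, keys, values)
print(out)  # close to [10.]: the matching token is "remembered"
```

Swap the labels and the same read-by-similarity pattern describes the optimizer's gradient memory and the parameters' compressed knowledge, just at slower timescales.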
Insight #3: Speed Matters More Than Size
Old approach to better AI:
Make it BIGGER (more parameters)
New approach:
Make it DEEPER (more levels of learning at different speeds)
Analogy:
- Old way: Hiring more workers
- New way: Creating a management hierarchy
🌟 Why This Changes Everything
For You (End Users)
Soon, AI will:
- ✅ Remember your preferences across sessions
- ✅ Learn your communication style
- ✅ Improve from every interaction
- ✅ Never forget what you taught it
For Developers
New possibilities:
- ✅ Models that don't need constant retraining
- ✅ AI that adapts to changing environments
- ✅ Systems that learn without forgetting old skills
- ✅ More efficient fine-tuning
For Researchers
New frontiers:
- ✅ True continual learning (no catastrophic forgetting)
- ✅ Biologically plausible AI architectures
- ✅ Explainable learning processes
- ✅ Meta-learning at multiple levels
🤔 The Simple Mental Model
Think of AI like a company:
Old AI (Current ChatGPT)
CEO: Makes all decisions
Notepad: Remembers last 5 minutes
Filing Cabinet: LOCKED after training
Problem: Can't learn anything new!
New AI (Nested Learning)
CEO: Strategic decisions (updates monthly)
↓
Managers: Tactical decisions (updates weekly)
↓
Team Leads: Daily operations (updates daily)
↓
Workers: Immediate tasks (updates constantly)
Information flows UP: Workers → CEO
Decisions flow DOWN: CEO → Workers
Result: Company learns continuously!
⚡ The Bottom Line
Three sentences to remember:
1. Current AI can't form new long-term memories – it's stuck with what it learned during training.
2. Nested Learning gives AI a proper memory system – fast, medium, and slow memories that work together.
3. This enables true continual learning – AI that gets smarter with every interaction, just like humans.
🎓 Key Takeaway
Nested Learning reveals that:
- Deep learning isn't as "deep" as we thought
- True depth comes from nested optimization, not stacked layers
- AI needs a memory system like the brain: fast, medium, and slow
- This could finally give AI the ability to truly learn and remember
The future of AI isn't just bigger models - it's smarter learning architectures.
This analysis is based on the research paper "Nested Learning: The Illusion of Deep Learning Architectures"