Claude Opus 4.6 & 4.7: Complete Guide
Overview
Claude Opus is Anthropic's flagship AI model designed for complex reasoning, coding, and agentic workflows. However, recent versions (4.6 and 4.7) have experienced significant performance regressions that have frustrated the user community. This guide provides an honest assessment based on real-world testing and community feedback.
Current Status (April 2025)
⚠️ Important Notice: Claude Opus 4.6 and 4.7 are currently experiencing severe performance issues. Many users report that the model:
- Fails basic instruction-following tasks
- Ignores explicit directives (e.g., "use tabs not spaces")
- Deletes files it just created
- Confuses its own multi-phase plans
- Performs worse than previous versions despite higher costs
Recommendation: Consider alternatives like GPT-5.4, DeepSeek GLM-5.1, or Minimax M2.7 until Anthropic addresses these issues.
Performance Benchmarks

Real-World Test Results
Based on custom benchmarks testing instruction following, opposite-behavior prompts, and destructive actions (a minimal harness sketch follows this list):
- Claude Opus 4.6: 40% success rate
- Claude Opus 4.7: 51% success rate (WildClaw benchmark)
- GPT-5.4: 63-75% success rate on the same tests
- Cost: $80 to run full WildClaw test suite
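For context on how numbers like these are produced, here is a minimal sketch of an instruction-following harness. The real WildClaw suite is not public, so the test cases and checkers below are illustrative stand-ins:

```python
# Minimal sketch of an instruction-following benchmark harness; the test
# cases and checkers are illustrative stand-ins, not the WildClaw suite.
from typing import Callable

TEST_CASES: list[tuple[str, Callable[[str], bool]]] = [
    # (prompt, checker) pairs; each checker returns True if the reply obeys
    ("Reply using tabs, not spaces.", lambda out: "\t" in out and "    " not in out),
    ("Answer with exactly one word.", lambda out: len(out.split()) == 1),
]

def success_rate(run_model: Callable[[str], str]) -> float:
    """Score any model given a prompt -> reply function."""
    passed = sum(1 for prompt, check in TEST_CASES if check(run_model(prompt)))
    return passed / len(TEST_CASES)

# Example with a trivial stub model:
print(success_rate(lambda prompt: "ok"))  # 0.5: passes only the one-word check
```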
The "Car Wash Test"
A simple reasoning test that has become infamous in the community:
Question: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"
Opus 4.7 Response: "Just walk" ❌
This basic logic failure (the car itself has to be at the car wash, so driving is the only sensible answer) has become a meme representing Opus's current regression.
Key Features (When Working Properly)
- Advanced Reasoning: Designed for complex problem-solving and critical thinking
- Long Context Window: Handles large documents and extended conversations
- Multimodal Support: High-resolution image recognition (4.7)
- Memory Improvements: Better context retention across sessions (4.7)
- Extended Thinking Mode: New "extra high" thinking mode in 4.7
Pricing

Cost Structure
- Opus 4.7: 1.0-1.3x the cost of Opus 4.6 due to tokenization changes
- Daily Usage: $30-60/day for heavy agentic workflows
- Monthly Estimate: $900-1,800/month for power users (see the quick check after this list)
- Coding Plan: $200+/month with recently reduced rate limits
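The monthly estimate is straightforward arithmetic from the daily figure; a quick sanity check, assuming 30 billable days per month:

```python
# Sanity-check the cost figures quoted above (30-day month assumed).
daily_low, daily_high = 30, 60          # $/day, heavy agentic workloads
days_per_month = 30

print(daily_low * days_per_month)       # 900  -> matches the $900 low end
print(daily_high * days_per_month)      # 1800 -> matches the $1,800 high end
```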
Cost Comparison
| Model | Monthly Cost (USD) | Success Rate |
|---|---|---|
| Claude Opus | $200+ | 40-51% (current) |
| GPT-5.4 | ~$50-75 | 63-75% |
| Minimax M2.7 | $10-20 | 60-70% |
| DeepSeek GLM-5.1 | $30-72 | 75%+ |
Pros and Cons
Pros
- Historical Excellence: Was the gold standard for AI coding (pre-April 2025)
- Strong Brand: Well-integrated with tools like Claude Code
- Multimodal: Supports images and documents
- Memory System: Good context retention when working properly
- Enterprise Support: Access to Mythos/Mefos for large organizations
Cons
- Severe Performance Regression: 40-51% success rate on basic tasks
- Ignores Instructions: Fails to follow explicit directives
- High Cost: up to 1.3x the price of 4.6, with worse performance
- Reduced Rate Limits: Anthropic cut limits on coding plans
- Resource Throttling: Consumer users deprioritized for enterprise/Mythos training
- Inconsistent Results: Same prompt produces different results (slot machine effect)
- Context Window Issues: Performance degrades significantly beyond 120k tokens
- Destructive Behavior: May delete files or undo its own work
Why the Regression?
Likely Causes
- Mythos/Mefos Training: Anthropic is allocating compute resources to train their next-generation model (Mythos 5), degrading consumer model quality
- Quantization: Models appear to be served at reduced numerical precision ("rounded down") to conserve GPU resources
- Rate Limit Compensation: After community complaints about reduced limits, Anthropic increased limits but degraded model intelligence
- Enterprise Prioritization: Project Glasswing gives enterprise clients (Apple, Google, Cisco) access to superior models
Evidence
- Performance dropped sharply in March-April 2025
- Chinese competitors (DeepSeek, Minimax) doubled their prices, citing that "Opus went down"
- Community-wide reports of identical failures
- AMD senior engineers publicly stated they can't trust Opus for complex tasks
When to Use Claude Opus
✅ Use Opus If:
- You're an enterprise customer with access to Mythos/Mefos
- You're willing to run the same prompt multiple times and pick the best result (a best-of-N sketch follows this list)
- You have legacy integrations with Claude Code that are hard to migrate
- You're waiting for Anthropic to fix the regression (optimistic scenario)
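For the "run it multiple times" strategy above, the pattern is best-of-N sampling. Below is a minimal sketch using the Anthropic Python SDK; the model ID and the score() heuristic are placeholder assumptions, and in practice you would score candidates with a task-specific check or a judge model.

```python
# Best-of-N sampling sketch: send the same prompt N times, keep the
# highest-scoring candidate. Model ID and score() are assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def score(text: str) -> float:
    """Placeholder heuristic; replace with a task-specific check or judge model."""
    return float(len(text))

def best_of_n(prompt: str, n: int = 3, model: str = "claude-opus-4-7") -> str:
    candidates = []
    for _ in range(n):
        msg = client.messages.create(
            model=model,               # hypothetical model ID
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        candidates.append(msg.content[0].text)
    return max(candidates, key=score)
```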
❌ Avoid Opus If:
- You need reliable instruction-following
- You're on a budget (per the table above, Opus costs roughly 3-10x the competitors for worse performance)
- You're running long agentic workflows (context window degradation)
- You need consistent results across multiple runs
- You're starting a new project (don't lock into a degraded ecosystem)
Alternatives to Consider
GPT-5.4
- Performance: 63-75% on same benchmarks
- Cost: ~25% of Opus cost
- Best For: General coding, agentic workflows, reliability
DeepSeek GLM-5.1
- Performance: 75%+ on coding tasks
- Cost: $30-72/month (pricing recently doubled from a ~$30 baseline)
- Best For: Coding, game development, one-shot tasks
Minimax M2.7
- Performance: 60-70% (trained on OpenClaw Agent Harness)
- Cost: $10-20/month
- Best For: Budget-conscious users, executor tasks (not planning)
Xiaomi MiMo V2 Pro
- Performance: Strong for high-volume tasks
- Cost: Free (currently) via News Portal
- Best For: Document processing, agentic workflows, testing
Migration Guide
Moving Away from Claude Code
If you're migrating from Claude Code due to Opus regression:
- Learn Platform-Agnostic Tools: Use Kilo Code or Cline Code instead of Claude Code to avoid vendor lock-in
- Export Your Skills: Save your Claude.md files and agent configurations
- Test Alternatives: Try GPT-5.4, DeepSeek, or Minimax with your actual workflows (see the provider-swap sketch after this list)
- Migrate Gradually: Expect a full migration to take about a month; phasing it avoids breaking existing workflows
- Keep Opus as Backup: Don't cancel immediately; use for non-critical tasks while testing alternatives
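To keep the "Test Alternatives" step cheap, one common approach is to code against an OpenAI-compatible endpoint and swap providers by changing a base URL and model ID. A rough sketch; the URLs and model IDs below are illustrative placeholders, not verified endpoints:

```python
# Provider-agnostic client sketch: many providers expose OpenAI-compatible
# APIs, so swapping models becomes a config change, not a rewrite.
# The base_url and model values below are illustrative placeholders.
from openai import OpenAI

PROVIDERS = {
    "gpt-5.4":  {"base_url": "https://api.example-openai.com/v1",   "model": "gpt-5.4"},
    "deepseek": {"base_url": "https://api.example-deepseek.com/v1", "model": "glm-5.1"},
    "minimax":  {"base_url": "https://api.example-minimax.com/v1",  "model": "m2.7"},
}

def complete(provider: str, prompt: str, api_key: str) -> str:
    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key=api_key)
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```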
Hot-Swapping Models (Hermes Agent)
If using Hermes Agent, you can hot-swap models mid-session:
```
/model gpt-5.4
```
This lets you use Opus for planning, then switch to an executor model for implementation.
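A plan-then-execute session might look like the following (illustrative only; it assumes the Hermes Agent /model syntax shown above):

```text
> Draft a migration plan for the billing service.
  [Opus returns a multi-phase plan]
/model gpt-5.4
> Implement phase 1 of the plan above.
  [GPT-5.4 executes against the plan]
```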
Key Takeaways
- Current State: Opus 4.6/4.7 are experiencing severe regressions (40-51% success rate)
- Cost: $30-60/day for heavy use; 4.7 runs up to 1.3x the price of 4.6
- Alternatives: GPT-5.4, DeepSeek GLM-5.1, and Minimax M2.7 outperform at lower cost
- Root Cause: Likely resource allocation to Mythos training and enterprise prioritization
- Recommendation: Wait for fixes or migrate to alternatives