How to improve productivity and reduce costs through thoughtful AI agent usage
We are entering usage-based pricing. The way we work with AI agents needs to evolve.
Open GitHub's Copilot usage-based pricing preview page live — real personal subscription data.
single developer · 1 month
GitHub pricing preview tool · real personal subscription · 2026
| Scale | Monthly | Annual |
|---|---|---|
| 1 developer | $3,200 | $38,400 |
| Team of 30 | ~$96,000 | ~$1,152,000 |
Based on GitHub's own pricing preview estimate. API-equivalent pricing; GitHub Copilot billing structure may differ. Order of magnitude, not a precision forecast.
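The team and annual figures in the table are a straight linear scaling of the single-developer estimate. A minimal sketch of that arithmetic, assuming every developer matches the sampled usage (the function name `projected_cost` is illustrative only):

```python
# Monthly figure for one developer, taken from the table above.
MONTHLY_PER_DEVELOPER_USD = 3_200


def projected_cost(developers: int, months: int = 1) -> int:
    """Linear projection: assumes each developer matches the sampled usage."""
    return MONTHLY_PER_DEVELOPER_USD * developers * months


for label, devs in [("1 developer", 1), ("Team of 30", 30)]:
    monthly = projected_cost(devs)
    annual = projected_cost(devs, months=12)
    print(f"{label}: ${monthly:,} per month, ${annual:,} per year")
```

Real teams rarely scale linearly, which is why the table marks the team row as approximate.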
You learned Git: commits, branches, merges.
You use it deliberately.
You query intentionally.
You scope dashboards.
Same requirement.
Understand the resource model. Use deliberately.
Transition: to use it well, you need to understand how pricing works.
| Token Type | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|
| Input | $5 / MTok | $1.50 / MTok | $1 / MTok |
| Cache Read | $0.50 / MTok | $0.15 / MTok | $0.10 / MTok |
| Cache Write | $6.25 / MTok | $1.875 / MTok | $1.25 / MTok |
| Output | $25 / MTok | $7.50 / MTok | $5 / MTok |
Choosing the wrong model for a task is a direct cost multiplier: in every row, Opus 4.6 costs 5x what Haiku 4.5 does. Source: Anthropic API pricing, May 2026.
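To make the multiplier concrete, the sketch below prices one workload against each column of the table. The rates are copied from the table; the dictionary keys are illustrative labels rather than official API model identifiers, and the workload volumes are hypothetical.

```python
# USD per million tokens (MTok), copied from the pricing table above.
# Keys are illustrative labels, not official API model identifiers.
PRICES = {
    "opus-4.6": {"input": 5.00, "cache_read": 0.50, "cache_write": 6.25, "output": 25.00},
    "sonnet-4.6": {"input": 1.50, "cache_read": 0.15, "cache_write": 1.875, "output": 7.50},
    "haiku-4.5": {"input": 1.00, "cache_read": 0.10, "cache_write": 1.25, "output": 5.00},
}


def cost_usd(model: str, tokens: dict) -> float:
    """Price a set of token counts (keyed by token type) for one model."""
    rates = PRICES[model]
    return sum(rates[kind] * count for kind, count in tokens.items()) / 1_000_000


# Hypothetical workload, in tokens. Not measured data.
workload = {
    "input": 2_000_000,
    "cache_read": 8_000_000,
    "cache_write": 1_000_000,
    "output": 400_000,
}

for model in PRICES:
    print(f"{model}: ${cost_usd(model, workload):.2f}")
```

Because every Opus 4.6 rate in the table is five times the Haiku 4.5 rate, the ratio between those two totals is 5x regardless of the mix of token types.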
"You are billed for the sum of all context windows across all turns — not the final context size."
A session showing 1M input tokens doesn't mean 1M in context — it means a growing context was re-sent many times.
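A small sketch of why the billed total outruns the context size. It assumes a fixed starting context, a constant number of tokens appended per turn, and no prompt caching; all three are simplifications, and the numbers are hypothetical.

```python
def billed_input_tokens(base_context: int, growth_per_turn: int, turns: int):
    """Return (total input tokens billed, context size sent on the last turn).

    Assumes the full context is re-sent every turn and grows by a constant
    amount after each one. Prompt caching is ignored here.
    """
    total = 0
    context = base_context
    last_sent = 0
    for _ in range(turns):
        total += context            # the whole context is sent again this turn
        last_sent = context
        context += growth_per_turn  # replies and tool results are appended
    return total, last_sent


# Hypothetical session: 10K starting context, 2K added per turn, 28 turns.
total, last_sent = billed_input_tokens(10_000, 2_000, 28)
print(f"billed input tokens: {total:,}")
print(f"context on last turn: {last_sent:,}")
```

With these numbers the session bills 1,036,000 input tokens while the context never exceeds 64,000. Caching changes the rate applied to the re-sent portion, but by the table above cache reads still carry a price.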
When context approaches the limit, the model compacts the history into a summary and loses precision.
The negative feedback loop
Demo: show live how enabling/disabling a skill or MCP affects initial context token count.
What default actually looks like
| Uncontrolled Behavior | Your Mitigation |
|---|---|
| Number of tool calls | Limit tool visibility via MCP / primitive tool whitelisting |
| Agent loop iterations | Write tighter scoped prompts; cancel early |
| Internal memory writes | Disable; use explicit memory.md instead |
| Compaction timing | Reduce context size so compaction is rare |
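The first mitigation in the table, limiting tool visibility, can be sketched as a per-task allowlist. Everything here is hypothetical: the tool names, the token sizes, and the task labels are invented for illustration and do not correspond to any specific agent framework or MCP server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description_tokens: int  # hypothetical size of the tool schema in the prompt


# Hypothetical tool registry.
ALL_TOOLS = [
    Tool("read_file", 300),
    Tool("write_file", 350),
    Tool("run_shell", 500),
    Tool("web_search", 800),
    Tool("jira_query", 1200),
]

# Hypothetical allowlists: each task sees only the tools it needs.
TASK_ALLOWLISTS = {
    "code_review": {"read_file"},
    "bugfix": {"read_file", "write_file", "run_shell"},
}


def visible_tools(task: str) -> list:
    """Return only the allowlisted tools; unknown tasks see no tools."""
    allowed = TASK_ALLOWLISTS.get(task, set())
    return [tool for tool in ALL_TOOLS if tool.name in allowed]


def initial_tool_tokens(tools: list) -> int:
    """Tokens the tool descriptions add to the context before any work starts."""
    return sum(tool.description_tokens for tool in tools)


print("all tools:", initial_tool_tokens(ALL_TOOLS))
for task in TASK_ALLOWLISTS:
    print(f"{task}:", initial_tool_tokens(visible_tools(task)))
```

In this made-up registry the initial overhead drops from 3,150 tokens to 1,150 for a bugfix and 300 for a code review, and that saving recurs on every turn because the context is re-sent each time.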
Transition: what can we control right now, without changing the architecture?
Custom session analysis scripts built on raw events.jsonl telemetry
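A minimal sketch of such a script. The event schema is an assumption: each line is taken to be a JSON object with an optional `type` field and an optional `usage` object holding `input_tokens` and `output_tokens`. The real events.jsonl layout may differ, so the field names would need adjusting.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_events(lines: Iterable[str]) -> Counter:
    """Aggregate event counts and token usage from JSONL lines.

    Assumed (hypothetical) schema per line:
      {"type": "...", "usage": {"input_tokens": int, "output_tokens": int}}
    """
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        totals["events"] += 1
        if event.get("type") == "tool_call":
            totals["tool_calls"] += 1
        usage = event.get("usage") or {}
        for key in ("input_tokens", "output_tokens"):
            totals[key] += int(usage.get(key, 0))
    return totals


# Hypothetical sample lines standing in for a real events.jsonl file.
sample = [
    '{"type": "model_turn", "usage": {"input_tokens": 12000, "output_tokens": 400}}',
    '{"type": "tool_call"}',
    '{"type": "model_turn", "usage": {"input_tokens": 14500, "output_tokens": 650}}',
    "",
]
print(dict(summarize_events(sample)))
```

For a real session, pass an open file handle instead of the sample list, for example `with open("events.jsonl") as f: summarize_events(f)`.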
Every item on the right addresses a specific cost or reasoning-quality failure covered in this presentation.
| Convenience Architecture | Production Architecture |
|---|---|
| Single agent | Tiered orchestration |
| All skills visible | Task-scoped skills |
| All MCPs active | Task-scoped MCPs |
| All tools visible | Scoped visible tools |
| Implicit context growth | Explicit working memory (memory.md) |
| Optimized for: onboarding | Optimized for: scale, cost, predictability |
| | A: Tiered workflow | B: All Opus, tiered | C: Default single agent |
|---|---|---|---|
| Architecture | Orchestrator + 4 subagents | Same topology, same token volumes | Single agent, all tools/skills/MCPs |
| Models | Opus 4.6 + Haiku 4.5 | Opus 4.6 only | Opus 4.6 |
| Evidence | Measured telemetry, exact API repricing | Exact repricing of measured token volumes | Architectural behavior estimate* |
| Est. Cost | ~$11 | ~$27.50 | ~$100–230* |
*Scenario C range is wide by design. Default single-agent architecture does not produce stable or predictable token growth. Anthropic API pricing, May 2026.
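Scenario B is a repricing of scenario A: the per-agent token volumes stay fixed and only the model assignment changes. The sketch below shows that calculation. The volumes are hypothetical and are not the measured telemetry behind the table, so the totals will not match its figures. Assigning Opus 4.6 to the orchestrator and Haiku 4.5 to the subagents is also an assumption, since the table states only that scenario A mixes the two models. Cache rates are omitted for brevity.

```python
# USD per MTok as (input, output), from the pricing table in this deck.
RATES = {"opus-4.6": (5.00, 25.00), "haiku-4.5": (1.00, 5.00)}

# Hypothetical per-agent volumes in tokens as (input, output).
VOLUMES = {
    "orchestrator": (400_000, 40_000),
    "subagent_1": (600_000, 30_000),
    "subagent_2": (600_000, 30_000),
    "subagent_3": (600_000, 30_000),
    "subagent_4": (600_000, 30_000),
}


def reprice(assignment: dict) -> float:
    """Price the fixed token volumes under a given agent-to-model assignment."""
    total = 0.0
    for agent, (input_tokens, output_tokens) in VOLUMES.items():
        input_rate, output_rate = RATES[assignment[agent]]
        total += (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return total


# Assumed tiering: strong model orchestrates, cheap model does the subtasks.
tiered = {
    agent: "opus-4.6" if agent == "orchestrator" else "haiku-4.5"
    for agent in VOLUMES
}
all_opus = {agent: "opus-4.6" for agent in VOLUMES}

print(f"tiered:   ${reprice(tiered):.2f}")
print(f"all Opus: ${reprice(all_opus):.2f}")
```

Keeping the volumes fixed isolates the effect of model choice from the effect of architecture, which is what separates scenario B from scenario C.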
This is a tested, production-validated starting point — not the final answer.
Key takeaways