On June 15th 2026, Anthropic will change how Claude subscription billing works. They are splitting it into two pools: your subscription covers interactive usage (Claude.ai and the claude CLI), but anything going through their ACP protocol gets billed from a separate monthly credit. That is $20/month on Pro, up to $200 on Max. When that credit runs out, you pay API rates. Problem is: I am a big fan of Zed.
Zed's AI integration goes through ACP, so it gets hit even in interactive mode. Zed shipped Terminal Threads as a fast workaround: run claude in a Zed terminal and it stays on the interactive pool. It works, but the community reaction was sharp. Same subscription price, roughly 25x less usage for heavy automation users. Their blog put it plainly: "Terminal Threads are now the only way to keep using Claude Code in Zed with your existing Claude subscription."
That got me thinking. I was already relying on these subsidized tokens more than I realized. Not for chat, but for the kind of background automation that just quietly makes development faster. Losing that at API rates would hurt. So I started looking for alternatives.
Chapter 1: Pi + autoresearch is cool
While looking around, I came across a post about autoresearch. The idea comes from Andrej Karpathy: give an AI agent a benchmark script that outputs a single number, let it propose changes, measure the metric, keep improvements and revert regressions, repeat. Tobi Lütke ran this on Shopify's 20-year-old Liquid templating engine across roughly 120 automated experiments and got 53% faster parse times and 61% fewer memory allocations, with all 974 tests passing. The PR is still unmerged and there is valid criticism about benchmark overfitting, but the concept is solid.
Then I found pi-autoresearch, a plugin that brings this loop to Pi, an open-source coding agent CLI. Any optimization with a numeric output is a good candidate. CI build time is a perfect one.
The setup is straightforward. Write autoresearch.sh to output your metric. Optionally write autoresearch.checks.sh to run your tests as a correctness gate after each iteration. Then run /autoresearch with your goal and leave it. It proposes a change, commits it, runs the benchmark, keeps it if the number improved, reverts if not. It runs until you stop it. In practice I skipped the script entirely : I just typed /autoresearch my CI is slow, my gh is configured, go figure it out, CI runner is on this machine and it generated what to measure itself.
My setup: remote server with a heavy codebase and CI on the same machine, Pi with the autoresearch plugin, and a direct Anthropic API key (not a subscription account, pay-per-token is what you need here, pi doesn't work on Anthropic's subscription). The CI build time went from 9 minutes to 3 by finding smarter Docker and NPM caching opportunities that we would never have prioritized.
This worked well and 11$ to improve the CI by 6 minutes for each run is a fantastic deal but I wanted to have cheaper tokens
Chapter 2: Midscene.js + Openrouter
I have a very old and complex desktop application to automate. Not browser-based, so I needed a truly agnostic AI QA tool. I found Midscene.js, an open-source library from ByteDance. What sold me was its car community showcase: natural language automation on a real-world application with robots(!). Impressive enough to try. It reminded me of my time at AirConsole in the car industry.
Midscene worked best with vision-optimized models, which led me to Qwen3.6 Plus via OpenRouter. I eventually settled on Gemini 3.1 Flash, which performed better for my use case. OpenRouter was not strictly necessary, but being able to swap providers in two seconds without touching the code is genuinely useful. No affiliate link, no affiliate program. Just good software with more to come.
Mixed feelings overall. Great for writing expressive tests, especially for nightly runs. Not great for building solid, reusable automation scripts. The core issue is the loop: screenshot, AI inference (slow), write action, execute, repeat.
On stable software that does not change much, that cost and latency stacks up fast. On macOS, screen capture permissions made it unreliable on top of that. Getting Midscene to consistently see the screen required juggling accessibility and screen recording grants, and it never felt fully solid. Might be a me problem. I ended up moving to a lower-level solution. Either way, it pushed me to explore what else OpenRouter had to offer.
Chapter 3: Playing with Deepseek 4 Pro and OpenRouter
In the search for more intelligence per dollar, I spotted the DeepSeek V4 technical report. The core idea: in a normal transformer, every token has to look at every other token to understand context. That gets very expensive very fast as the context grows. DeepSeek V4 replaces most of that with two smarter mechanisms. Compressed Sparse Attention (CSA) groups tokens together and only pays attention to the most relevant groups instead of everything. Heavily Compressed Attention (HCA) goes further and compresses chunks of 128 tokens into a single entry before attending. The two alternate across layers. The end result is 27% of the computation and 10% of the memory compared to V3, while still supporting up to one million tokens of context. Less math, less memory, same (or better) quality. That is what makes the pricing below possible.
See by yourself:
The output of Deepseek V4 Pro is 28x less expensive than Opus 4.8 and 17x less expensive than Sonnet 4.6. This is astonishing.
Yes but...
As you see on the disclaimer, the retention policy and prompt training are not the same so you can definitely not use for professional developement because you cannot guarantee that your data nor your IP will stay safe. OpenRouter itself does not store your prompts by default, but DeepSeek explicitly stores all data on servers in China, uses it to train their models, and is legally required to share it with Chinese authorities on demand. It's a tradeoff that is completely fine in my playground environnement but impossible in my day to day job... My job is quite litterally creating IP...
How to use it?
When I used Open router for the first time with Deepseek, I had changes in providers and the 2 providers that I used were american and the caching was absolutely aweful (again maybe a me problem or a pi problem).
So, it ended up being more expensive than Claude because the cache was not working properly.
Guardrails
Openrouter introduced Guardrails in May 2026 as part of their Workspaces feature, to give directive to your model routing. This is extremely practical because I can say: "no matter the privacy policy of Deepseek nor its availability, I want Deepseek all the way"
Et voilà, you can get dirt cheap inference!!