The Multi-Agent Stack After 48 Hours
Two YouTube videos hit my feed today. Both told me my AI stack is heading in the right direction.
The first was Brad Bonanno’s walkthrough of his Claude-video skill. The second was Jack Roberts on combining Claude, Codex, and Gemini in one terminal. Both describe pieces of a puzzle I’ve been quietly building toward. The hardware has been in place for weeks. The multi-agent orchestration only came together in my stack in the last 48 hours.
Brad’s video, “My Claude Code Can INSTANTLY Watch Any Video,” lays out something Anthropic still hasn’t shipped natively. Claude reads frames, but it cannot “watch” video the way a human can. Brad’s skill closes the gap with a clean local pipeline: yt-dlp downloads the video, ffmpeg splits it into frames and audio, YouTube captions come in free when they exist, and Groq Whisper fills the gap when they don’t. Frames and timestamped transcript get handed to Claude together.
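For orientation, here’s a minimal sketch of that pipeline in Python. The flags, the frame rate, and the Whisper model name are my read of what the video describes, not Brad’s actual skill code; treat them as assumptions.

```python
# Minimal sketch of the local video pipeline: download, split, transcribe.
# Paths, the 1-frame-per-5-seconds rate, and the Groq model name are assumptions.
import subprocess
from pathlib import Path

def fetch_video(url: str, workdir: Path) -> Path:
    """Download the video plus any free English captions YouTube already has."""
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run([
        "yt-dlp", url,
        "--write-subs", "--write-auto-subs", "--sub-langs", "en",
        "-o", str(workdir / "video.%(ext)s"),
    ], check=True)
    # Captions land as video.en.vtt; skip those when picking the media file.
    return next(p for p in workdir.glob("video.*") if p.suffix not in (".vtt", ".srt"))

def split_media(video: Path, workdir: Path) -> None:
    """Pull one frame every 5 seconds, plus 16 kHz mono audio for Whisper."""
    frames = workdir / "frames"
    frames.mkdir(exist_ok=True)
    subprocess.run(["ffmpeg", "-i", str(video), "-vf", "fps=1/5",
                    str(frames / "frame_%04d.png")], check=True)
    subprocess.run(["ffmpeg", "-i", str(video), "-vn", "-ar", "16000",
                    "-ac", "1", str(workdir / "audio.wav")], check=True)

def get_transcript(workdir: Path) -> str:
    """Prefer the free YouTube captions; fall back to Groq Whisper."""
    subs = sorted(workdir.glob("*.vtt"))
    if subs:
        return subs[0].read_text()
    from groq import Groq  # assumes GROQ_API_KEY is set in the environment
    client = Groq()
    with open(workdir / "audio.wav", "rb") as f:
        return client.audio.transcriptions.create(
            file=f, model="whisper-large-v3").text
```

The frames directory and the timestamped transcript are what get handed to Claude together.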
Jack’s “Claude Code Just got 10X Better” lands the bigger architectural point, though: don’t try to pick the best model. Wire several of them up and let each do the part it’s best at.
Jack installs the gemini-cli and the codex CLI alongside Claude Code, then routes work between them by what each model is genuinely good at. Gemini’s huge context window handles long videos and big document piles. Codex acts as a critical devil’s advocate for code review. Claude keeps the orchestration role. Existing twenty-dollar-a-month consumer subscriptions cover all three, with no per-call API charges, which makes the setup very cost effective.
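In practice the routing can be as simple as shelling out to each CLI’s non-interactive mode. A toy sketch, assuming `gemini -p` and `codex exec` are the one-shot entry points; both exist today, but flags change fast, so check your installed versions.

```python
# Hypothetical dispatcher: send a task to whichever CLI owns that lane.
import subprocess

LANES = {
    "long-context": ["gemini", "-p"],   # huge context window: videos, doc piles
    "code-review":  ["codex", "exec"],  # critical devil's-advocate reviewer
}

def delegate(lane: str, prompt: str) -> str:
    """Run the lane's CLI once with the prompt and return its stdout."""
    cmd = LANES[lane] + [prompt]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Example: print(delegate("code-review", "Review the diff in review.patch"))
```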
This, as it turns out, is pretty much how I have my own stack lined up:
Anthropic’s Claude (Claude 1 is the orchestrator, Claude 2 is a peer)
Google’s Gemini AI Pro (his name is Remy, I’ll get to that)
OpenAI’s Codex (his name is Cody, same)
Each has a shared memory with the right tools attached. They route work to one another via mailbox scripts when one agent has a piece another should handle.
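The mailbox idea itself can be as simple as file-per-message inboxes on disk. This sketch is a simplification; the directory layout and message fields here are illustrative, not my actual scripts.

```python
# File-based mailboxes: each agent owns an inbox directory of JSON messages.
import json, time, uuid
from pathlib import Path

MAIL_ROOT = Path("mission_control/mail")  # illustrative location

def send(to_agent: str, from_agent: str, task: str, payload: dict) -> Path:
    """Drop a message file into the recipient's inbox."""
    inbox = MAIL_ROOT / to_agent / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    msg = {"id": uuid.uuid4().hex, "from": from_agent, "task": task,
           "payload": payload, "sent_at": time.time()}
    path = inbox / f"{msg['id']}.json"
    path.write_text(json.dumps(msg, indent=2))
    return path

def drain_inbox(agent: str) -> list[dict]:
    """Read and consume every pending message, oldest first."""
    inbox = MAIL_ROOT / agent / "inbox"
    paths = sorted(inbox.glob("*.json"))
    msgs = [json.loads(p.read_text()) for p in paths]
    for p in paths:
        p.unlink()
    return msgs
```

Plain files beat anything fancier here: every agent can read and write them regardless of vendor, and you can inspect a stuck handoff with cat.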
There’s also Lisa Clawd, my Digital Assistant, operating via the local LLM Gemini 4 26b on a custom OpenClaw stack with ChatGPT-5.5 as a fallback. She’s in the same chat thread, but she’s a peer to the rest, not part of the orchestration. She predates this multi-agent setup and has finally learned how to stay in her lane.
The Naming
When I was designing this multi-agent architecture, I asked Gemini what he wanted to be called. He looked up the name’s etymology. Gemini is Latin for “the twins,” and the most famous twins in Roman mythology are Romulus and Remus. Gemini said, “Call me Remy, short for Remus.”
I asked Codex the same question. He thought for barely a beat and said, “Codex. Cody, yeah, that sounds about right.”
Strong Orchestration Is Key
I’m only 48 hours into this experiment and still working on getting the four agents from three different companies to cooperate in my mission control chat, but with Claude 1 in charge the rest are pretty good at falling in line. If you’re building a team like this, bringing in different agents to leverage each of their strengths, my advice is this: pick one agent as orchestrator (Jack actually picked Gemini), give every other agent a clear lane, write down honestly what each is best at, expect to be wrong on about half the lane assignments on the first pass, then keep updating the rules; a sketch of what those rules can look like follows below. The model that won this week may not be the one that wins next week as the model war rages on.
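Concretely, the lane rules can live in one small file the orchestrator reads on startup. The agent names below are mine; the strength notes are illustrative, not a settled assessment.

```python
# Lane assignments as data, so revising them is a one-line edit.
LANE_RULES = {
    "claude-1": {"role": "orchestrator", "strengths": ["planning", "tool use"]},
    "claude-2": {"role": "peer", "strengths": ["implementation"]},
    "remy":     {"role": "peer", "strengths": ["long context", "video", "big docs"]},
    "cody":     {"role": "peer", "strengths": ["adversarial code review"]},
}
# Keep this file under version control: expect to reassign about half the
# lanes after the first real workload, and cheap revisions make that painless.
```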
Done this way, though, the sum is much more powerful than the parts. You can map out what each model is good at, assign tasks based on strengths, then build skill decks for each individual agent as well as for the overall tasks. Time will tell if the orchestration is strong enough, or if this is just another of many steps on a journey that never seems to settle in one spot for any length of time.