This looks like a security nightmare if someone decides to expose this interface publicly, with prompt injection to exfiltrate sensitive information at the top of the list.
You're right. For now, it's only local. For a public deployment, the idea is to have sandboxes and verification steps. That won't completely eliminate the risk of prompt injection, but so far no solution has managed to completely resolve this problem.
This is clever and provides a clean alternative to using custom plugins and MCP servers for doing code reviews.
For example, with the degradation of Claude in the past 1-2 months, I am always asking Codex to review Claude's plans and vice versa and I get excellent results that way.
Also, making a skill an API call allows for easy deployment, provided the security around tool calling can be isolated in an ephemeral sandbox.
Thanks! Sandbox deployment is planned in the roadmap. I already have a RuntimeAdapter interface in my architecture that I'll use to isolate the VMs. I'm doing exactly the same thing: I'm cross-referencing the models to challenge their plan, and my code reviewer agent's API is a big help.
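To make the adapter idea concrete, here is a minimal sketch of what such an interface could look like. Everything in it is an assumption for illustration: the `run` method, the `RunResult` type, and the `SubprocessAdapter` class are hypothetical and not taken from the repo. A child process in a temporary directory is also not a real sandbox; a VM or container backend would implement the same interface.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Protocol


@dataclass
class RunResult:
    exit_code: int
    stdout: str
    stderr: str


class RuntimeAdapter(Protocol):
    """Hypothetical interface; the project's real RuntimeAdapter may differ."""

    def run(self, code: str, timeout_s: float) -> RunResult: ...


class SubprocessAdapter:
    """Runs code in a child Python process inside a throwaway directory.

    This only illustrates the adapter shape. It gives no real isolation:
    the child still has the caller's filesystem and network access.
    """

    def run(self, code: str, timeout_s: float = 5.0) -> RunResult:
        with tempfile.TemporaryDirectory() as workdir:
            try:
                proc = subprocess.run(
                    [sys.executable, "-I", "-c", code],  # -I: isolated mode
                    cwd=workdir,
                    capture_output=True,
                    text=True,
                    timeout=timeout_s,
                )
            except subprocess.TimeoutExpired:
                # Report a timeout as a failed run instead of raising.
                return RunResult(exit_code=-1, stdout="", stderr="timeout")
            return RunResult(proc.returncode, proc.stdout, proc.stderr)
```

The point of the interface is that the caller only depends on `run`, so swapping the subprocess backend for an ephemeral VM should not touch the skill-execution code.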
Auto-switching across model providers basically concedes the model layer is commodity, which I think is right (1).
TBD whether the skill registry develops network effects or just stays a flat directory. Portable skills as APIs tracks with the broader pattern of agent stacks decomposing into specialized, swappable layers, where the defensible asset is whatever process knowledge orgs encode, not the deployment infra.
I agree on the commodity point; that's why I went multi-model from the start.
The registry question is the one I'm thinking about the most. Right now it's flat. I plan to integrate usage data (success rates, cost, trust scores). So the registry tells you which skills actually work well, and that's valuable.
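A rough sketch of how usage data could feed a ranking, under stated assumptions: the `SkillStats` fields, the smoothing prior, and the scoring formula are all hypothetical and not the project's actual schema or algorithm.

```python
from dataclasses import dataclass


@dataclass
class SkillStats:
    name: str
    runs: int            # total recorded invocations
    successes: int       # invocations judged successful
    avg_cost_usd: float  # mean cost per invocation
    trust: float         # hypothetical trust score in [0, 1]


def score(s: SkillStats, prior_runs: int = 5, prior_rate: float = 0.5,
          cost_weight: float = 1.0) -> float:
    """Combine the three signals into one number (illustrative formula).

    The success rate is smoothed toward prior_rate so that a skill with
    one lucky run does not outrank a skill with a long track record.
    """
    smoothed = (s.successes + prior_rate * prior_runs) / (s.runs + prior_runs)
    return smoothed * s.trust / (1.0 + cost_weight * s.avg_cost_usd)


def rank(skills: list[SkillStats]) -> list[SkillStats]:
    """Return skills ordered from highest to lowest score."""
    return sorted(skills, key=score, reverse=True)
```

With this formula, a skill with 90 successes in 100 runs ranks above one with a single successful run, even though the latter has a nominal 100% success rate.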
My colleague built this because he wanted to use his skills outside of Claude Code.
With this project you can expose your skills as an API endpoint in under 2 minutes.
If you could have a look at the repo and give your feedback, it would be much appreciated.
Thanks!
(1) I wrote about it here from an enterprise perspective: https://philippdubach.com/posts/dont-go-monolithic-the-agent...
Your article looks interesting, I'll read it.