Two days ago, Greptile launched something new. The AI code review company didn’t add another model or tweak its prompts. It built an agent that actually executes the code under review.
TREX stands for Test, Run, Execute. And it changes the math on what automated code review can achieve.
Greptile co-founder Daksh Gupta announced the feature in a blog post on June 15, 2026. Until then, Greptile reviewed pull requests by reading diffs and repository context. The post explains the shift plainly: “Static review has a ceiling. It can reason about what the code says, not about what it does.”
Some bugs stay hidden in plain sight. An endpoint returns 500 on real traffic. A UI component breaks on specific screen sizes. The diff looks clean. The code compiles. Only execution reveals the truth.
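A small illustration of that kind of bug, as a sketch rather than anything taken from Greptile's post: the handler below is hypothetical, reads cleanly in a diff, and passes on the happy path. It only fails when it is run against an input nobody thought to write down.

```python
# Hypothetical handler: the names and data shapes are illustrative only.
def average_latency_ms(samples: list[float]) -> float:
    """Return the mean latency of the recorded samples."""
    return sum(samples) / len(samples)


def handle_stats_request(samples: list[float]) -> tuple[int, dict]:
    """Mimic an HTTP handler by returning (status_code, body)."""
    try:
        return 200, {"avg_ms": average_latency_ms(samples)}
    except Exception as exc:
        # A real web framework would turn an unhandled error into a 500.
        return 500, {"error": type(exc).__name__}


# The happy path works, so the change looks fine in review.
print(handle_stats_request([120.0, 80.0]))  # (200, {'avg_ms': 100.0})

# A service with no traffic yet sends an empty list and the endpoint breaks.
print(handle_stats_request([]))  # (500, {'error': 'ZeroDivisionError'})
```

Nothing in the diff is syntactically wrong, which is why running the code with an edge-case input is what surfaces the failure.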
TREX closes that gap. In Greptile’s internal evaluations, adding the execution layer caught 20% more bugs than the base reviewer alone. Most of those additional catches required running the code. More inference wouldn’t have found them.
The agent doesn’t run everything. When Greptile’s core reviewer picks up a PR, it first identifies behavior worth testing. Then it spins up a TREX agent. That agent creates a sandbox. It generates targeted tests for the changes plus edge cases. It runs them against the actual services, dependencies, and frameworks in the repository. No mocks. No generic environments. No extra setup from the developer.
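The flow described above can be sketched in a few lines. This is a toy model under stated assumptions, not Greptile's implementation: the function names, the file-extension heuristic, and the in-process "sandbox" are all invented for illustration, and a real system would isolate execution far more carefully.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TestResult:
    name: str
    passed: bool
    log: str


@dataclass
class ExecutionReport:
    results: list[TestResult] = field(default_factory=list)

    @property
    def failures(self) -> list[TestResult]:
        return [r for r in self.results if not r.passed]


def needs_execution(changed_files: list[str]) -> bool:
    """Assumed heuristic: only run when behavior-bearing files change."""
    return any(f.endswith((".py", ".ts", ".go")) for f in changed_files)


def run_targeted_tests(tests: dict[str, Callable[[], None]]) -> ExecutionReport:
    """Run each generated test and keep a log line as evidence."""
    report = ExecutionReport()
    for name, test in tests.items():
        try:
            test()
            report.results.append(TestResult(name, True, "ok"))
        except Exception as exc:
            log = f"{type(exc).__name__}: {exc}"
            report.results.append(TestResult(name, False, log))
    return report


def review(changed_files: list[str],
           tests: dict[str, Callable[[], None]]) -> Optional[ExecutionReport]:
    """Reviewer step: decide whether to execute, then hand off to the runner."""
    if not needs_execution(changed_files):
        return None  # nothing worth running, skip the execution step
    return run_targeted_tests(tests)


# Stand-ins for tests an agent might generate for a change.
def check_happy_path() -> None:
    assert 2 + 2 == 4


def check_empty_payload() -> None:
    raise ValueError("empty payload")


report = review(
    ["api/handlers.py"],
    {"happy_path": check_happy_path, "empty_payload": check_empty_payload},
)
for failure in report.failures:
    print(failure.name, "->", failure.log)
```

The point of the sketch is the gating step: a docs-only change such as `review(["README.md"], {})` returns `None` and never reaches the runner, which is one plausible way to keep execution selective.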
Results come back with proof. Logs. Screenshots. API traces. Execution scripts. Sometimes a short video of the UI interaction. These artifacts attach directly to the PR comment. A simple pass or fail won’t do. Engineers and downstream agents need to see what happened.
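One way to picture "results with proof" is as a finding that carries its evidence along with it. The structure below is a hypothetical sketch: the field names, artifact kinds, and comment layout are assumptions for illustration, not Greptile's format.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    kind: str      # e.g. "log", "screenshot", "api_trace", "script", "video"
    label: str     # short human-readable description
    location: str  # hypothetical path or URL where the artifact is stored


@dataclass
class Finding:
    title: str
    passed: bool
    evidence: list[Evidence] = field(default_factory=list)


def render_comment(finding: Finding) -> str:
    """Render a finding and its evidence as plain text for a PR comment."""
    status = "PASS" if finding.passed else "FAIL"
    lines = [f"[{status}] {finding.title}"]
    if not finding.evidence:
        lines.append("  (no evidence attached)")
    for item in finding.evidence:
        lines.append(f"  - {item.kind}: {item.label} ({item.location})")
    return "\n".join(lines)


finding = Finding(
    title="POST /orders returns 500 for an empty cart",
    passed=False,
    evidence=[
        Evidence("api_trace", "request and response pair", "artifacts/trace.json"),
        Evidence("log", "server stderr during the run", "artifacts/server.log"),
    ],
)
print(render_comment(finding))
```

Keeping the evidence next to the verdict means a reader, human or agent, can check the claim without rerunning anything.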
But execution introduces new problems. Sandbox security. Cost control. Deciding what to run without exploding CI budgets. Greptile charges $2 per TREX run once the free beta period ends at the end of June 2026. The company positions it as an add-on to existing review pricing.
This move arrives at a moment when AI coding agents are proliferating. Recent discussions of the Model Context Protocol, and of code execution as an alternative to heavy tool calling, highlight the same pressure. Agents that only plan and call tools hit token limits and context walls fast. Running code directly can reduce overhead. Yet it demands safe, reliable environments.
Security questions loom large. A Pillar Security analysis from December 2025 warned about the “Coding Agent Backdoor Factory.” Malicious or poorly validated tools in agent workflows create new attack surfaces. TREX’s sandbox must withstand determined adversaries if it gains wide adoption.
Greptile built its reputation on independent review. The company markets itself as vendor-agnostic. It works alongside any IDE or coding assistant. TREX extends that independence into runtime validation. Other tools generate tests. Few execute them automatically against real stack conditions and attach verifiable evidence.
Early reaction on X mixed excitement with caution. One post noted the 20% lift in bug detection and emphasized that many new finds required execution, not just smarter reading. Developers have chased better test coverage for decades. Human-written tests miss edge cases. AI-generated tests improve the numbers but still need validation. TREX tries to close the loop.
The architecture relies on shared context. The reviewer already understands the PR intent and codebase structure. TREX inherits that knowledge. It avoids the cold-start problem that plagues standalone test generators. No boilerplate scaffolding. No custom configuration files before the first run.
Artifacts matter more than many admit. A failing test without logs or visuals leaves engineers debugging the debugger. TREX’s approach forces the agent to produce explainable output. That transparency builds trust. It also creates audit trails for teams that route PRs through multiple AI layers.
Challenges remain. Not every codebase lends itself to easy sandboxing. Complex microservice architectures, heavy database dependencies, or proprietary services complicate faithful reproduction. Greptile claims it works with the repo’s existing stack. Real-world results will vary by organization.
Cost adds up. At $2 per run, high-velocity teams could face meaningful bills. The company offers the feature free until end of June 2026 to gather feedback and usage data. That period will test whether developers find the lift worth the price.
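A back-of-the-envelope estimate makes the scale concrete. Only the $2 price comes from the article; the PR volume, the share of PRs that trigger a run, and the 21 working days are made-up inputs, and the sketch assumes at most one run per PR.

```python
PRICE_PER_RUN_USD = 2.00  # per-run price reported in the article


def monthly_trex_cost(prs_per_day: int,
                      runs_per_pr: float,
                      working_days: int = 21) -> float:
    """Estimate a monthly bill; every input except the price is an assumption."""
    return prs_per_day * runs_per_pr * working_days * PRICE_PER_RUN_USD


# Hypothetical team merging 40 PRs a day, with a run on every PR.
print(monthly_trex_cost(40, 1.0))   # 1680.0

# Same team if only one PR in four triggers a run.
print(monthly_trex_cost(40, 0.25))  # 420.0
```

The gap between the two figures shows why the decision about what to run matters as much as the per-run price.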
Broader industry trends support the direction. Reports from early 2026 show agentic coding systems gaining ground. Models like Claude Opus 4.8 and Gemini variants emphasize parallel subagents and complex workflows. Execution becomes table stakes for serious validation.
Greptile’s step feels measured. It doesn’t promise bug-free software tomorrow. The blog post calls TREX “a step towards our goal of software with no bugs.” Ambitious. Yet grounded in a specific, solvable problem: runtime bugs that static review cannot see.
Teams already using Greptile for review can enable TREX today in public beta. Others will watch the initial adopters. If the 20% gain holds across diverse codebases and the artifacts prove useful, expect competitors to follow with their own execution layers.
The future of code review won’t stop at reading. It will test. It will run. It will show its work. TREX takes one concrete step in that direction.
This article, “Greptile’s TREX Runs Your Code in Review: The Execution Layer That Catches What Reading Misses,” first appeared on Web and IT News.
