Human‑Inspired Agents: Translating Workflows into Robust AI Systems


When ChatGPT and its peers burst onto the scene at the end of 2022, the analyst community immediately began probing one question: could large language models write SQL for us? The appeal is obvious. More than 400 million Office 365 users—and upwards of 90 percent of firms—still rely on spreadsheets for core analysis, so any effective AI tool for analysts taps a vast, lucrative market. I have argued before that such tools are shifting analysts from “dashboard jockeys” to strategic AI orchestrators who pair domain insight with machine assistance.

The first approach most of us tried was fine-tuning. However, simply fine-tuning pre-trained LLMs for text-to-SQL quickly reveals critical limitations: natural language is inherently ambiguous, database schema context is often fragmented, and models frequently lack the factual knowledge needed to generate correct queries. For production applications—especially customer-facing ones—this unreliability is unacceptable. Analysts will only trust systems that consistently deliver accurate results. Making text-to-SQL viable for real-world use demands more robust approaches than basic fine-tuning.

Learning From Human SQL Craft

At the recent Agent Conference in New York, Timescale’s CTO Mike Freedman laid out a blueprint for a more reliable text‑to‑SQL agent—without further fine-tuning or post-training. His starting point is disarmingly simple: observe how experienced analysts write SQL, then mirror that workflow.

Timescale distills those observations into two companion modules:

  1. Semantic Catalog. Think of this as an always‑up‑to‑date knowledge base that maps user vocabulary to database reality. It stores table semantics, column aliases, units, and business definitions. When the LLM receives a prompt, the agent first queries the catalog to ground ambiguous terms (“revenue” versus “gross_sales”) and to inject table‑specific hints. Because the catalog is version‑controlled alongside the schema, new columns or renamed fields propagate automatically—no retraining required. As I noted in an earlier piece on GraphRAG and related approaches, Timescale is part of a broader shift toward grounding RAG systems in structured knowledge rather than vectors alone.

  2. Semantic Validation. After the model drafts a query, the agent runs EXPLAIN in Postgres to catch undefined columns, type mismatches, and egregious cost estimates. Invalid plans trigger a structured error that the agent feeds back into the LLM for another revision cycle. The loop resembles a compiler pass more than a chat exchange, and it neatly aligns with how modern coding copilots lean on build tools to sanity‑check generated code. 
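The catalog lookup in step 1 can be sketched in a few lines. This is an illustrative toy, not Timescale's implementation: the in-memory `SEMANTIC_CATALOG` dict and the naive substring match are assumptions standing in for a version-controlled catalog and a real retrieval step.

```python
# Hypothetical in-memory catalog mapping user vocabulary to schema reality.
# A production catalog would be version-controlled alongside the schema.
SEMANTIC_CATALOG = {
    "revenue": {"table": "orders", "column": "gross_sales", "unit": "USD"},
    "customers": {"table": "accounts", "column": "customer_id", "unit": None},
}

def ground_terms(prompt: str) -> list[str]:
    """Collect schema hints for any catalog terms found in the user prompt."""
    hints = []
    for term, meta in SEMANTIC_CATALOG.items():
        if term in prompt.lower():
            hint = f'"{term}" maps to {meta["table"]}.{meta["column"]}'
            if meta["unit"]:
                hint += f' (unit: {meta["unit"]})'
            hints.append(hint)
    return hints

# The hints are injected into the LLM prompt before query generation.
print(ground_terms("What was total revenue last quarter?"))
```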

The practical effect is a system that converges on syntactically and semantically correct SQL in a handful of turns—often faster than a fine‑tuned model that “hallucinates” table names it was never shown.

From Text-to-SQL to Broader Lessons in Agent Design

The Timescale approach yields tangible results, sharply reducing query errors, particularly for complex joins, once its Semantic Catalog and Validation components are active. More importantly, it offers a methodological blueprint. Instead of merely layering a large language model onto existing interfaces, Timescale started by dissecting how expert analysts actually write SQL—understanding intent, mapping terms to schema, testing, and correcting. They then encoded this structured workflow into an agent that intelligently combines probabilistic generation with deterministic checks.

This specific example highlights broader lessons for building effective AI agents. Firstly, it underscores the value of deeply understanding the human workflow you aim to automate or assist; modeling the human process provides critical insights into the necessary information and feedback mechanisms. Secondly, it reinforces the idea that realizing AI’s full potential often requires transforming workflows, not just augmenting them. As others, including Microsoft, have argued regarding AI agents, the most significant gains come when we redesign how work gets done, integrating AI tightly with deterministic tools and structured data sources rather than treating it as a simple add-on.

For practitioners building AI applications, particularly those involving complex generation tasks, several practical takeaways emerge. Invest in building and maintaining structured context layers (like semantic catalogs or knowledge graphs) to ground the model accurately. Leverage existing deterministic tools—databases, compilers, APIs, linters—as cheap, reliable oracles for validating AI output. Finally, design agents with tight feedback loops, enabling them to interpret structured validation results and iteratively self-correct. The journey towards trustworthy AI systems relies significantly on such thoughtful system design, combining generative power with structured knowledge and verification.

The post Human‑Inspired Agents: Translating Workflows into Robust AI Systems appeared first on Gradient Flow.
