Three questions every AI agent must answer before you trust it with real work
Wall Street wiped $285 billion off SaaS stocks in early 2026 after watching AI agents that could actually do work — not just chat about it. The bet was simple: if an AI can draft emails, manage calendars, and handle operational tasks, who needs a $200/month SaaS license?
The irony is that the agent that caused the panic — Anthropic's Cowork — is still in research preview. Close your laptop, and it stops working. That's not how serious software behaves.
So the real question isn't whether AI agents are coming. They are. The question is how to tell which ones actually work.
The three questions that separate real agents from demos
After testing dozens of AI agents and watching the space evolve from coding assistants to full-fledged outcome agents, three questions consistently separate agents that deliver from agents that demo well.
1. Does the agent have persistent memory?
Not session memory. Not "it remembers the last 5 messages." Actual persistent memory that survives between sessions, learns your preferences, and builds context over time.
Most agents start from zero every time you open them. You re-explain who your contacts are, how you like your emails written, what your scheduling preferences are. This is the equivalent of hiring a new assistant every morning and spending the first hour bringing them up to speed.
A real agent remembers that you write to your Dutch colleagues in Dutch, that you prefer 45-minute meetings over 60, that your board member prefers formal language while your co-founder gets "Hey, quick update" style emails.
This isn't a nice-to-have feature. It's the foundation everything else builds on.
2. Does the agent produce artifacts you can inspect and edit?
This is where many agents fall apart. They promise to "handle things" but produce opaque results you can't verify before they're sent.
An email drafted by your AI should appear as a draft you can read, edit, and approve — not something that fires off into the void. A calendar event should show up with the right details for you to confirm. A task should have context attached, not just a title.
The distinction matters because trust is built through transparency. If you can see what the agent did, edit it, and then approve it, you develop confidence in the system. If the agent operates behind a curtain, you're gambling.
This is one of the most common complaints about tools like Lindy — they try to simplify the interface by hiding the work, but that makes it impossible to catch errors or learn what the agent is doing on your behalf.
3. Does context compound over time?
The hardest question, and the one that separates a productivity tool from a real AI agent.
Is the 10th email the agent drafts for you better than the 1st? Does it know that the last time you emailed this person, you used Dutch and signed off with "Groet"? Does it understand that meetings with this client always run long, so it should add a 30-minute buffer?
Compounding context means the agent gets smarter with every interaction. Not because the underlying AI model improves, but because your agent accumulates knowledge about you, your contacts, your workflows, and your preferences.
Without compounding context, you have an expensive auto-complete. With it, you have something that genuinely improves how you work.
How current AI agents score on these questions
Here's an honest assessment of the major players:
Anthropic Cowork
The agent that started the SaaS panic. Strong on artifacts — it produces documents, spreadsheets, and presentations you can edit. Memory exists but isn't reliable enough to depend on. Context compounding is essentially absent. And it stops working when you close your laptop.
Score: roughly 1.5 out of 3.
Lindy
The most well-known outcome agent for executives. Has persistent memory in theory, but produces opaque results that are hard to inspect or edit. Users report credits burning on failed tasks with no clear explanation. Trustpilot sits at 2.4 out of 5.
Score: roughly 1 out of 3.
Google Opal
Free, which matters. The memory feature exists but looks like a spreadsheet — not durable enough for real context compounding. Artifacts are limited. The biggest risk is Google's history of abandoning experimental products.
Score: roughly 0.5 out of 3, but the price is right.
Sauna (ex-Wordware)
The most interesting conceptual play. Explicitly builds on memory as a substrate, not a feature. Pivoted from building an AI IDE to building an AI workspace after realizing people don't want to build automations — they want their work done. Very early, mostly demos so far.
Score: promising direction, too early to judge.
What a real AI agent workflow looks like
Instead of abstract promises, here's what passing all three questions looks like in daily use.
Monday morning, 8:30am. You open your inbox. Your AI agent has already triaged 47 emails overnight. Newsletters and notifications are archived. Six emails need your attention — each with a suggested reply drafted in the right language, matching your communication style with that specific person.
Your board member gets a formal Dutch email. Your co-founder gets a casual English message. The agent knows the difference because it analyzed your email history and learned the patterns.
You review, edit two drafts slightly, approve all six. Total time: 4 minutes.
9:00am. Your morning briefing arrives — today's calendar, pending tasks sorted by priority, and a note that a contact you haven't spoken to in 3 months has a meeting with you next week. The agent has already pulled their recent LinkedIn activity and company news so you're prepared.
During the day, someone proposes a meeting for Thursday at 2pm. The agent checks your scheduling rules, finds "no meetings after 1pm on Thursdays," suggests Friday morning instead, and drafts the reply. You see the draft, approve it, and it sends.
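The rule check in that step is simple to sketch. Here's a minimal version in Python, with a hypothetical `SchedulingRule` type standing in for whatever rule format a real agent would use:

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

@dataclass
class SchedulingRule:
    weekday: int              # 0 = Monday ... 6 = Sunday
    no_meetings_after: time   # latest allowed start time on that weekday

    def allows(self, start: datetime) -> bool:
        # A rule only constrains slots on its own weekday.
        if start.weekday() != self.weekday:
            return True
        return start.time() <= self.no_meetings_after

def suggest_slot(proposed: datetime, rules: list[SchedulingRule]) -> datetime:
    """Return the proposed slot if every rule allows it; otherwise fall
    back to 9:00 on successive later days until a slot passes."""
    candidate = proposed
    while not all(rule.allows(candidate) for rule in rules):
        candidate = datetime.combine(
            candidate.date() + timedelta(days=1), time(9, 0)
        )
    return candidate

rules = [SchedulingRule(weekday=3, no_meetings_after=time(13, 0))]  # Thursday
proposed = datetime(2026, 3, 5, 14, 0)   # Thursday at 2pm
print(suggest_slot(proposed, rules))     # 2026-03-06 09:00:00 (Friday morning)
```

A production agent would also consult the calendar for free/busy data, but the point stands: deterministic rules like this should be code the agent consults, not something the model re-derives each time.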
Every step is visible. Every action is editable. Every interaction makes the next one better.
Why this matters for choosing your AI tools
The AI agent market is flooded with tools that look impressive in a 2-minute demo video. The demo shows the happy path. It doesn't show what happens when the agent misinterprets your request, burns through your credits on a failed task, or sends an email in the wrong tone to an important client.
Before you commit to any AI agent — free or paid — run it through the three questions:
- Memory: Will it remember my preferences next week? Next month?
- Artifacts: Can I see, edit, and approve everything before it acts?
- Compounding: Is it measurably better at my tasks after a month of use?
If the answer to any of these is no, you're paying for a chatbot with extra steps.
The architecture behind agents that actually work
For the technically curious, agents that pass all three questions share a common architecture:
A knowledge store — not just chat history, but structured memory. Your contacts, their communication styles, your scheduling preferences, your email patterns. This needs to be persistent, queryable, and updatable without touching the AI model.
Pre-wired workflows — common tasks (email triage, meeting scheduling, follow-up reminders) should be reliable recipes, not improvised every time. The agent should know the steps for "schedule a meeting" without figuring it out from scratch.
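One way to picture a pre-wired workflow: a fixed list of named steps run in order, where the step functions below are stand-ins for real calendar and email calls. The names and context shape here are illustrative, not any particular product's API.

```python
from typing import Callable

Step = Callable[[dict], dict]

def check_availability(ctx: dict) -> dict:
    ctx["slot_ok"] = ctx["proposed_slot"] not in ctx.get("busy_slots", [])
    return ctx

def draft_invite(ctx: dict) -> dict:
    ctx["invite"] = f"Meeting at {ctx['proposed_slot']} with {ctx['attendee']}"
    return ctx

def queue_for_approval(ctx: dict) -> dict:
    ctx["status"] = "awaiting_approval"  # nothing sends without your sign-off
    return ctx

# "Schedule a meeting" as a reliable recipe, not improvisation.
SCHEDULE_MEETING: list[Step] = [check_availability, draft_invite, queue_for_approval]

def run_workflow(steps: list[Step], ctx: dict) -> dict:
    for step in steps:
        ctx = step(ctx)
    return ctx

result = run_workflow(SCHEDULE_MEETING,
                      {"proposed_slot": "Fri 09:00", "attendee": "Jan"})
print(result["status"])  # awaiting_approval
```

Note that the final step produces an artifact awaiting approval rather than firing the invite off directly — the recipe bakes question 2 into the workflow itself.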
A feedback loop — every approved action, every edit you make, every correction feeds back into the knowledge store. This is how context compounds. The agent doesn't just execute — it learns from your corrections.
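A toy version of that loop: when your edit differs from the agent's draft, the edited version becomes the stored default for that contact. The extraction rule here is deliberately trivial — real systems would generalize patterns, not store verbatim strings.

```python
preferences: dict[str, dict[str, str]] = {}

def record_edit(contact: str, field: str,
                agent_version: str, user_version: str) -> None:
    # Only a change signals a correction; the user's version wins.
    if agent_version != user_version:
        preferences.setdefault(contact, {})[field] = user_version

def draft_greeting(contact: str) -> str:
    # Fall back to a generic default until a preference is learned.
    return preferences.get(contact, {}).get("greeting", "Dear colleague,")

record_edit("cofounder@example.com", "greeting",
            agent_version="Dear colleague,",
            user_version="Hey, quick update")
print(draft_greeting("cofounder@example.com"))  # Hey, quick update
```

The key property is that learning happens as a side effect of normal use — you never fill out a settings page, you just correct drafts.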
And critically, durable execution. Your agent should keep working whether your laptop is open or closed. It should survive server restarts. If it's processing a multi-step workflow and something fails, it should retry — not silently give up.
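A rough sketch of that durability, assuming a checkpoint dictionary that outlives the process (in practice it would live in a database): completed steps are skipped on resume, and transient failures are retried with backoff instead of silently dropped.

```python
import time

def run_durably(steps, checkpoint: dict, max_retries: int = 3) -> dict:
    """steps is a list of (name, callable) pairs; checkpoint persists results."""
    for name, step in steps:
        if name in checkpoint:            # already done before a restart
            continue
        for attempt in range(max_retries):
            try:
                checkpoint[name] = step()
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise                 # surface the failure, never swallow it
                time.sleep(0.1 * 2 ** attempt)  # backoff (kept short for the sketch)
    return checkpoint

calls = {"n": 0}
def flaky_send():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient outage")
    return "sent"

checkpoint = {"triage": "done"}  # state recovered after a restart
run_durably([("triage", lambda: "done"), ("send", flaky_send)], checkpoint)
print(checkpoint["send"])  # sent
```

This is the piece research-preview agents most often lack: the loop above runs on a server, not in a browser tab, so closing your laptop changes nothing.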
The bottom line
The AI agent revolution is real, but most of the tools riding the wave don't meet the bar for actual daily use. The three questions — persistent memory, editable artifacts, compounding context — aren't arbitrary criteria. They're the minimum requirements for software you'd trust with your professional communication and schedule.
The companies that get all three right will win. The ones that nail the demo but skip the fundamentals will join the graveyard of productivity tools that promised to change your life but ended up as another unused subscription.
Ask the three questions. Trust the answers.