Notes on Human-Facing AI Harnesses
I do not think the most interesting use of AI is replacing a person at a keyboard. The better use is expanding the surface area of what a person can inspect, compare, and reason about.
That matters especially for non-technical users. A technical user can already wrap AI with scripts, prompts, logs, git, terminals, and review habits. A non-technical user often only sees the chat box. If the model gets stuck, repeats itself, hides uncertainty, or loses track of the task, the user has very little leverage.
An AI harness is a way to change that relationship. It is not another agent. It is the frame around the agent: the task board, the loop, the logs, the stop conditions, the progress surface, and the place where a human can intervene.
1. AI Should Expand the Cognitive Surface
When I say AI expands cognition, I do not mean it makes decisions for me. I mean it lets me look at more of the problem at once.
A good AI workflow should help a person see:
- what the current goal is
- what has already been tried
- what changed
- what failed
- where the system is uncertain
- what needs a human decision
- what evidence exists for the current state
This is different from making the model sound confident. Confidence is cheap. A visible thinking surface is more valuable.
For non-technical users, that surface cannot be “open a terminal and inspect JSON logs”. It has to be a small operational contract: these are the tasks, this is the current round, this is what happened, this is why it stopped.
2. The Harness Is Not the Agent
The first design boundary is important:
agent = the system that reasons and acts
The harness should not try to become a smarter model. It should not own the product direction. It should not pretend to understand every task deeply.
Its job is narrower:
- give the agent a task
- call the agent in a repeatable way
- capture what happened
- detect whether progress was made
- stop before the loop becomes wasteful
- show the human a useful summary
That separation keeps the tool honest. If the model is the brain, the harness is the table, notebook, timer, checklist, and recorder.
3. Non-Technical Users Need a Task Board, Not a Prompt Box
A prompt box is flexible, but it is also slippery. A task board is less expressive, but it creates traction.
For a non-technical user, the core input should look closer to this:
- [ ] Draft the landing page copy
The user does not need to know how the model is called. They need to know how to describe work, see progress, and correct direction.
The task board gives the harness a simple source of truth:
function nextTask(board): |
This is intentionally plain. The more complicated the task format becomes, the more the user has to learn before they can use the tool.
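A minimal sketch of reading such a board, in Python. It assumes each task is one line of the form `- [ ] text` (open) or `- [x] text` (done); the names `Task`, `parse_board`, and `next_task` are illustrative, not part of any existing tool.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed task line format: "- [ ] text" is open, "- [x] text" is done.
TASK_LINE = re.compile(r"^- \[( |x|X)\] (.+)$")


@dataclass
class Task:
    text: str
    done: bool


def parse_board(board: str) -> List[Task]:
    """Return every task line on the board, ignoring anything else."""
    tasks = []
    for line in board.splitlines():
        match = TASK_LINE.match(line.strip())
        if match:
            tasks.append(Task(text=match.group(2).strip(), done=match.group(1) != " "))
    return tasks


def next_task(board: str) -> Optional[Task]:
    """Return the first task that is not checked off, or None if all are done."""
    for task in parse_board(board):
        if not task.done:
            return task
    return None
```

Lines that are not tasks are skipped rather than rejected, so the user can keep free-form notes in the same file.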
4. Use Fresh Rounds Instead of One Endless Session
Long AI sessions are convenient until they become muddy. Old assumptions remain in context. Failed attempts stay nearby. The model may continue from a confused state because that state is still part of the conversation.
A harness can use rounds instead:
function runLoop(board): |
Each round gives the agent a fresh view of the current task and the rules. The harness carries durable facts forward, not the entire conversational fog.
For non-technical users, this is useful because the system can explain work in rounds:
Round 1: drafted options
That is much easier to understand than a long chat transcript.
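One way the round structure could look in Python, as a sketch under stated assumptions: the agent is any callable that takes a prompt and returns a `(summary, task_done)` pair, and the only state carried between rounds is a short list of notes. `build_prompt`, `run_loop`, and `max_rounds` are hypothetical names chosen for this example.

```python
from typing import Callable, List, Tuple

# Assumed agent contract: prompt in, (short summary, task_done flag) out.
AgentCall = Callable[[str], Tuple[str, bool]]


def build_prompt(rules: str, task: str, notes: List[str]) -> str:
    """Assemble a fresh prompt: rules, current task, and durable notes only."""
    lines = [rules, f"Current task: {task}"]
    if notes:
        lines.append("Facts carried over from earlier rounds:")
        lines.extend(f"- {note}" for note in notes)
    return "\n".join(lines)


def run_loop(tasks: List[str], call_agent: AgentCall, rules: str,
             max_rounds: int = 10) -> List[str]:
    """Work through tasks one round at a time and return the round notes."""
    notes: List[str] = []
    pending = list(tasks)
    rounds_used = 0
    while pending and rounds_used < max_rounds:
        rounds_used += 1
        # No transcript is passed along; each round starts from a new prompt.
        summary, done = call_agent(build_prompt(rules, pending[0], notes))
        notes.append(f"Round {rounds_used}: {summary}")
        if done:
            pending.pop(0)
    return notes
```

The returned notes double as the per-round explanation shown to the user, which keeps the human view and the agent's carried-over context in sync.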
5. Progress Must Be Observable
If a human cannot tell whether the AI is moving, they cannot supervise it.
A harness should expose a small set of states:
- running
- waiting
- done
- failed
- blocked by human
- stopped because no progress was detected
It should also expose just enough live detail:
- current task
- current round
- elapsed time
- last visible action
- number of completed tasks
- reason for stopping
Pseudocode:
status = { |
This is not only for developers. Non-technical users need this more, because they have fewer other tools to infer what happened.
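The states and live details above could be held in one small record. This Python sketch is one possible shape; the field names and the `summary` format are assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(str, Enum):
    RUNNING = "running"
    WAITING = "waiting"
    DONE = "done"
    FAILED = "failed"
    BLOCKED_BY_HUMAN = "blocked by human"
    NO_PROGRESS = "stopped because no progress was detected"


@dataclass
class Status:
    state: State
    current_task: Optional[str]
    current_round: int
    started_at: float          # seconds since the epoch, as from time.time()
    last_action: str
    completed_tasks: int
    stop_reason: Optional[str] = None

    def summary(self, now: Optional[float] = None) -> str:
        """Render one human-readable status line."""
        now = time.time() if now is None else now
        parts = [
            f"[{self.state.value}]",
            f"round {self.current_round}",
            f"task: {self.current_task or 'none'}",
            f"{self.completed_tasks} done",
            f"{int(now - self.started_at)}s elapsed",
            f"last: {self.last_action}",
        ]
        if self.stop_reason:
            parts.append(f"stopped: {self.stop_reason}")
        return " | ".join(parts)
```

A single line like this is what a status or watch command could print, so the user never has to open a log to learn whether the system is moving.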
6. Stop Conditions Are a Product Feature
AI systems need brakes.
The harness should stop when:
- all tasks are done
- the model fails repeatedly
- a round takes too long
- the same task does not make progress
- the task explicitly requires human input
- another run is already using the workspace
The most important stop condition is no progress. A model can produce text forever. That does not mean it is helping.
function detectProgress(before, after): |
This is one of the places where the harness becomes humane. It does not let the user discover waste after the fact. It stops and asks for a decision.
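A hedged sketch of how the no-progress check could work in Python. It assumes that "progress" means the task board or the workspace files changed between rounds, which is only one possible definition. The thresholds in `stop_reason` are placeholder values, and all names here are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(board_text: str, workspace_files: Dict[str, str]) -> str:
    """Hash the board plus file contents so two snapshots can be compared."""
    digest = hashlib.sha256()
    digest.update(board_text.encode("utf-8"))
    for path in sorted(workspace_files):
        digest.update(b"\0" + path.encode("utf-8"))
        digest.update(b"\0" + workspace_files[path].encode("utf-8"))
    return digest.hexdigest()


def detect_progress(before: str, after: str) -> bool:
    """Assumption: a round made progress only if the fingerprint changed."""
    return before != after


def stop_reason(stalled_rounds: int, failures: int, round_seconds: float,
                max_stalled: int = 2, max_failures: int = 3,
                max_round_seconds: float = 600.0) -> Optional[str]:
    """Return why the loop should stop, or None if it may continue."""
    if failures >= max_failures:
        return "the model failed repeatedly"
    if round_seconds > max_round_seconds:
        return "a round took too long"
    if stalled_rounds >= max_stalled:
        return "no progress was detected"
    return None
```

Returning the reason as plain text means the same string can be shown to the user when the loop halts.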
7. Human Handoff Should Be Built In
For non-technical users, “the AI got stuck” should not mean “the system is broken”.
It should mean:
The current task needs your decision.
The harness can represent that as a human task:
- [ ] HUMAN: Choose whether the homepage should emphasize speed or trust.
This makes the human part explicit. It also prevents the agent from pretending it can solve a product decision that belongs to the user.
I like this because it respects both sides. The AI can widen the option space. The human still chooses direction.
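A small Python sketch of detecting a pending human task, assuming the `- [ ] HUMAN:` prefix shown above is the only marker and that any open HUMAN task pauses the loop. The function names are illustrative.

```python
from typing import Optional

# Assumed marker for an open task that only a person can resolve.
HUMAN_PREFIX = "- [ ] HUMAN:"


def pending_human_decision(board: str) -> Optional[str]:
    """Return the first open HUMAN question on the board, if any."""
    for line in board.splitlines():
        stripped = line.strip()
        if stripped.startswith(HUMAN_PREFIX):
            return stripped[len(HUMAN_PREFIX):].strip()
    return None


def handoff_message(board: str) -> Optional[str]:
    """Phrase the pause as a request for a decision, not as an error."""
    question = pending_human_decision(board)
    if question is None:
        return None
    return f"The current task needs your decision. {question}"
```

Once the user checks the HUMAN task off, the prefix no longer matches and the loop can resume.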
8. Evidence Matters More Than Transcript Volume
A full transcript is often too much. A useful evidence trail is smaller and more structured.
For each round, I want to know:
- which task was active
- what input was given to the agent
- what the agent produced
- what changed
- whether the task was marked done
- whether there was an error
- what the next handoff should say
Pseudocode:
roundRecord = { |
The user should not have to read every raw event. But the harness should keep enough evidence that a future user, maintainer, or agent can reconstruct the work.
This is another way AI expands cognition: it leaves behind a map, not just an answer.
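The per-round evidence listed above could be stored as one JSON object per line. This Python sketch assumes that storage format and invents the field names; it is not a fixed schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class RoundRecord:
    round_no: int
    task: str                # which task was active
    prompt: str              # what input was given to the agent
    output: str              # what the agent produced
    changes: List[str]       # what changed, e.g. a list of touched files
    task_done: bool          # whether the task was marked done
    error: Optional[str]     # error text, or None if the round succeeded
    handoff: str             # what the next round or the human should know


def to_jsonl_line(record: RoundRecord) -> str:
    """Serialize one round as a single line of JSON."""
    return json.dumps(asdict(record), ensure_ascii=False)


def from_jsonl_line(line: str) -> RoundRecord:
    """Rebuild a round record from a line written by to_jsonl_line."""
    return RoundRecord(**json.loads(line))
```

An append-only file of such lines stays readable by a maintainer and is simple enough for a later agent round to load selectively.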
9. Provider Choice Should Be Replaceable
Different AI tools are good at different things. A harness should avoid hard-wiring itself to one provider’s personality.
The interface can be simple:
interface Provider: |
The loop does not need to know whether the provider is Claude, Codex, Gemini, or something else. It only needs a consistent contract:
prompt in |
For non-technical users, provider choice should feel like selecting an engine, not rewriting the workflow.
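A sketch of what a replaceable provider contract might look like in Python, using `typing.Protocol`. The method name `run`, the `ProviderResult` fields, and the `EchoProvider` stand-in are all assumptions for illustration; a real provider would wrap an actual model call.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class ProviderResult:
    text: str
    ok: bool
    error: Optional[str] = None


class Provider(Protocol):
    """Anything with a matching run() method counts as a provider."""
    name: str

    def run(self, prompt: str, timeout_seconds: float) -> ProviderResult: ...


class EchoProvider:
    """Stand-in provider for testing the loop without calling a model."""
    name = "echo"

    def run(self, prompt: str, timeout_seconds: float) -> ProviderResult:
        return ProviderResult(text=f"echo: {prompt}", ok=True)


def run_round(provider: Provider, prompt: str,
              timeout_seconds: float = 60.0) -> ProviderResult:
    """Call any provider the same way and turn exceptions into results."""
    try:
        return provider.run(prompt, timeout_seconds)
    except Exception as exc:
        return ProviderResult(text="", ok=False, error=str(exc))
```

Because the loop only sees `ProviderResult`, swapping engines changes one object rather than the workflow.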
10. Keep the Deployment Shape Small
If the harness is for non-technical users, installation matters.
The smaller the runtime shape, the easier it is to trust:
- one local folder
- one task file
- one prompt file
- one private config file
- one run command
- one status command
- one watch command
The mental model should fit in a sentence:
Put the harness in a workspace, write tasks, choose a provider, run the loop, watch progress, answer HUMAN tasks when needed.
That is still not “no learning”. But it is learnable.
Current Rule
For a human-facing AI harness, my rule is:
make AI work inspectable enough for non-technical supervision
The goal is not to hide the machine. The goal is to give people a bigger and calmer surface for thought: tasks, rounds, evidence, stop reasons, and human decisions in the same frame.