/2026/05/01/human-facing-ai-harness-notes

Notes on Human-Facing AI Harnesses


I do not think the most interesting use of AI is replacing a person at a keyboard. The better use is expanding the surface area of what a person can inspect, compare, and reason about.

That matters especially for non-technical users. A technical user can already wrap AI with scripts, prompts, logs, git, terminals, and review habits. A non-technical user often only sees the chat box. If the model gets stuck, repeats itself, hides uncertainty, or loses track of the task, the user has very little leverage.

An AI harness is a way to change that relationship. It is not another agent. It is the frame around the agent: the task board, the loop, the logs, the stop conditions, the progress surface, and the place where a human can intervene.

1. AI Should Expand the Cognitive Surface

When I say AI expands cognition, I do not mean it makes decisions for me. I mean it lets me look at more of the problem at once.

A good AI workflow should help a person see:

  • what the current goal is
  • what has already been tried
  • what changed
  • what failed
  • where the system is uncertain
  • what needs a human decision
  • what evidence exists for the current state

This is different from making the model sound confident. Confidence is cheap. A visible thinking surface is more valuable.

For non-technical users, that surface cannot be “open a terminal and inspect JSON logs”. It has to be a small operational contract: these are the tasks, this is the current round, this is what happened, this is why it stopped.

2. The Harness Is Not the Agent

The first design boundary is important:

agent = the system that reasons and acts
harness = the system that frames, runs, observes, and stops the agent

The harness should not try to become a smarter model. It should not own the product direction. It should not pretend to understand every task deeply.

Its job is narrower:

  • give the agent a task
  • call the agent in a repeatable way
  • capture what happened
  • detect whether progress was made
  • stop before the loop becomes wasteful
  • show the human a useful summary

That separation keeps the tool honest. If the model is the brain, the harness is the table, notebook, timer, checklist, and recorder.

3. Non-Technical Users Need a Task Board, Not a Prompt Box

A prompt box is flexible, but it is also slippery. A task board is less expressive, but it creates traction.

For a non-technical user, the core input should look closer to this:

- [ ] Draft the landing page copy
- [ ] Compare the three strongest positioning options
- [ ] Rewrite the pricing FAQ in a calmer tone
- [ ] Mark anything that needs my decision as HUMAN

The user does not need to know how the model is called. They need to know how to describe work, see progress, and correct direction.

The task board gives the harness a simple source of truth:

function nextTask(board):
    for task in board:
        if task is unchecked:
            return task
    return null

This is intentionally plain. The more complicated the task format becomes, the more the user has to learn before they can use the tool.
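As a minimal sketch of how little parsing this needs, here is the same idea in Python. It assumes the board is plain text with one `- [ ]` or `- [x]` line per task, as in the example above. The names `Task`, `parse_board`, and `next_task` are illustrative, not part of any existing tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed board format: "- [ ] text" for open tasks, "- [x] text" for done ones.
TASK_LINE = re.compile(r"^- \[(?P<mark>[ xX])\] (?P<text>.+)$")


@dataclass
class Task:
    text: str
    done: bool


def parse_board(board_text: str) -> list[Task]:
    """Turn checkbox lines into tasks; any other line is ignored."""
    tasks = []
    for line in board_text.splitlines():
        match = TASK_LINE.match(line.strip())
        if match:
            tasks.append(Task(text=match["text"].strip(), done=match["mark"] != " "))
    return tasks


def next_task(board: list[Task]) -> Optional[Task]:
    """Return the first unchecked task, or None when everything is done."""
    return next((task for task in board if not task.done), None)
```

Lines that are not checkboxes are skipped rather than rejected, so a user can leave notes between tasks without breaking the board.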

4. Use Fresh Rounds Instead of One Endless Session

Long AI sessions are convenient until they become muddy. Old assumptions remain in context. Failed attempts stay nearby. The model may continue from a confused state because that state is still part of the conversation.

A harness can use rounds instead:

function runLoop(board):
    previousEvidence = none

    while true:
        task = nextTask(board)

        if task is null:
            return stop("done")

        context = buildContext(board, task, previousEvidence)
        result = runAgentOnce(context)
        evidence = capture(result)

        updateBoardIfTaskCompleted(board, result)
        recordRound(task, evidence)
        previousEvidence = evidence

        if needsHuman(board, evidence):
            return stop("blocked_by_human")

Each round gives the agent a fresh view of the current task and the rules. The harness carries durable facts forward, not the entire conversational fog.

For non-technical users, this is useful because the system can explain work in rounds:

Round 1: drafted options
Round 2: compared tradeoffs
Round 3: stopped because a human decision is needed

That is much easier to understand than a long chat transcript.
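A runnable sketch of the round loop, under some assumptions: the agent is any function that takes a prompt string and returns a small result object, the durable facts are one short note per round, and there is a `max_rounds` cap that the pseudocode above does not have. `RoundResult`, `build_context`, and `run_loop` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    text: str
    done: bool = False


@dataclass
class RoundResult:
    completed: bool            # did the agent finish the task this round?
    summary: str               # one line a human can read
    needs_human: bool = False  # did the agent ask for a decision?


# The agent is just a callable here; a real one would wrap a model call.
Agent = Callable[[str], RoundResult]


def build_context(task: Task, notes: list[str]) -> str:
    """Build a fresh prompt: the task plus short durable notes, not the transcript."""
    carried = "\n".join(f"- {note}" for note in notes) or "- (nothing yet)"
    return f"Task: {task.text}\nKnown so far:\n{carried}"


def run_loop(board: list[Task], agent: Agent, max_rounds: int = 20) -> tuple[str, list[str]]:
    """Run one task per round and return (exit reason, per-round notes)."""
    notes: list[str] = []
    for round_number in range(1, max_rounds + 1):
        task = next((t for t in board if not t.done), None)
        if task is None:
            return "done", notes
        result = agent(build_context(task, notes))
        task.done = result.completed
        notes.append(f"Round {round_number}: {result.summary}")
        if result.needs_human:
            return "blocked_by_human", notes
    return "max_rounds_reached", notes
```

The returned notes are the same kind of round list shown above, so the log for the user and the memory for the agent come from one source.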

5. Progress Must Be Observable

If a human cannot tell whether the AI is moving, they cannot supervise it.

A harness should expose a small set of states:

  • running
  • waiting
  • done
  • failed
  • blocked by human
  • stopped because no progress was detected

It should also expose just enough live detail:

  • current task
  • current round
  • elapsed time
  • last visible action
  • number of completed tasks
  • reason for stopping

Pseudocode:

status = {
    runId,
    currentTask,
    round,
    completedTasks,
    totalTasks,
    state,
    lastEvent,
    exitReason
}

function renderStatus(status):
    show progress
    show current task
    show last event
    show stop reason if finished

This is not only for developers. Non-technical users need this more, because they have fewer other tools to infer what happened.
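One way the status surface might look in Python, as a sketch only. The field names mirror the pseudocode; the wording of the rendered lines is my own assumption about what a non-technical user would want to read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Status:
    run_id: str
    current_task: Optional[str]
    round: int
    completed_tasks: int
    total_tasks: int
    state: str                      # e.g. "running", "done", "blocked by human"
    last_event: str
    exit_reason: Optional[str] = None


def render_status(status: Status) -> str:
    """Render the status as a few plain sentences instead of raw JSON."""
    lines = [
        f"Progress: {status.completed_tasks} of {status.total_tasks} tasks done (round {status.round})",
        f"Now: {status.current_task or 'nothing in progress'}",
        f"Last event: {status.last_event}",
    ]
    # The stop reason only appears once the run has actually stopped.
    if status.exit_reason is not None:
        lines.append(f"Stopped: {status.exit_reason}")
    return "\n".join(lines)
```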

6. Stop Conditions Are a Product Feature

AI systems need brakes.

The harness should stop when:

  • all tasks are done
  • the model fails repeatedly
  • a round takes too long
  • the same task does not make progress
  • the task explicitly requires human input
  • another run is already using the workspace

The most important stop condition is no progress. A model can produce text forever. That does not mean it is helping.

function detectProgress(before, after):
    if checkedTaskCount(after.board) > checkedTaskCount(before.board):
        return true

    if visibleArtifactsChanged(before, after):
        return true

    return false

# after each round, with stallCount starting at 0
if detectProgress(before, after):
    stallCount = 0
else:
    stallCount += 1

if stallCount >= limit:
    insertHumanTask("The loop stopped because no progress was detected.")
    stop("blocked_by_human")

This is one of the places where the harness becomes humane. It does not let the user discover waste after the fact. It stops and asks for a decision.
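Here is a small sketch of the stall counter in Python. It assumes progress can be judged from two things captured before and after each round: the number of checked tasks, and a set of content fingerprints for the artifacts the user can see. `Snapshot`, `fingerprint`, and `StallDetector` are illustrative names, and the default limit of 3 is arbitrary.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: str) -> str:
    """Hash an artifact's content so changes can be detected cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Snapshot:
    checked_tasks: int
    artifacts: frozenset[str]  # fingerprints of user-visible artifacts


def detect_progress(before: Snapshot, after: Snapshot) -> bool:
    """Progress means a newly checked task or a changed visible artifact."""
    if after.checked_tasks > before.checked_tasks:
        return True
    return after.artifacts != before.artifacts


class StallDetector:
    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.stall_count = 0

    def observe(self, before: Snapshot, after: Snapshot) -> bool:
        """Record one round; return True when the loop should stop."""
        if detect_progress(before, after):
            self.stall_count = 0
        else:
            self.stall_count += 1
        return self.stall_count >= self.limit
```

Fingerprinting content rather than comparing timestamps means a round that rewrites a file with identical text does not count as progress.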

7. Human Handoff Should Be Built In

For non-technical users, “the AI got stuck” should not mean “the system is broken”.

It should mean:

The current task needs your decision.
Here is what happened.
Here are the options.
Here is the smallest thing you need to answer.

The harness can represent that as a human task:

- [ ] HUMAN: Choose whether the homepage should emphasize speed or trust.
- [ ] Rewrite the homepage based on that decision.

This makes the human part explicit. It also prevents the agent from pretending it can solve a product decision that belongs to the user.

I like this because it respects both sides. The AI can widen the option space. The human still chooses direction.
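A sketch of how the handoff could be written back to the board, assuming the plain-text checkbox format and the `HUMAN:` prefix used above. `insert_human_task` and `pending_human_questions` are hypothetical helpers.

```python
HUMAN_PREFIX = "- [ ] HUMAN:"


def insert_human_task(board_text: str, blocked_task: str, question: str) -> str:
    """Place a HUMAN task directly above the task it blocks."""
    lines = board_text.splitlines()
    human_line = f"{HUMAN_PREFIX} {question}"
    target = f"- [ ] {blocked_task}"
    for index, line in enumerate(lines):
        if line.strip() == target:
            lines.insert(index, human_line)
            break
    else:
        # Blocked task not found: append, so the question is never lost.
        lines.append(human_line)
    return "\n".join(lines)


def pending_human_questions(board_text: str) -> list[str]:
    """List the open questions a person still has to answer."""
    return [
        line.strip()[len(HUMAN_PREFIX):].strip()
        for line in board_text.splitlines()
        if line.strip().startswith(HUMAN_PREFIX)
    ]
```

Putting the question above the blocked task keeps the board readable top to bottom: answer this, then that work can continue.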

8. Evidence Matters More Than Transcript Volume

A full transcript is often too much. A useful evidence trail is smaller and more structured.

For each round, I want to know:

  • which task was active
  • what input was given to the agent
  • what the agent produced
  • what changed
  • whether the task was marked done
  • whether there was an error
  • what the next handoff should say

Pseudocode:

roundRecord = {
    taskId,
    roundNumber,
    startedAt,
    finishedAt,
    resultState,
    changedArtifacts,
    summary,
    error,
    nextAction
}

The user should not have to read every raw event. But the harness should keep enough evidence that a future user, maintainer, or agent can reconstruct the work.

This is another way AI expands cognition: it leaves behind a map, not just an answer.
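To make the round record concrete, here is a sketch that stores each round as one JSON line. The fields follow the pseudocode; the choice of JSON Lines and of ISO 8601 strings for timestamps is my assumption, not a requirement.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RoundRecord:
    task_id: str
    round_number: int
    started_at: str            # assumed ISO 8601, e.g. "2026-05-01T10:00:00Z"
    finished_at: str
    result_state: str          # e.g. "done", "failed", "blocked_by_human"
    changed_artifacts: list[str] = field(default_factory=list)
    summary: str = ""
    error: Optional[str] = None
    next_action: str = ""


def to_json_line(record: RoundRecord) -> str:
    """Serialize one round as a single line, ready to append to a log file."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json_line(line: str) -> RoundRecord:
    """Rebuild a record so a later reader or agent can reconstruct the work."""
    return RoundRecord(**json.loads(line))
```

One line per round keeps the log append-only and easy to scan, and a summary view only has to read the `summary` and `next_action` fields.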

9. Provider Choice Should Be Replaceable

Different AI tools are good at different things. A harness should avoid hard-wiring itself to one provider’s personality.

The interface can be simple:

interface Provider:
    checkAvailable()
    runOnce(prompt, outputTarget)
    collectEvidence()
    diagnoseFailure()

The loop does not need to know whether the provider is Claude, Codex, Gemini, or something else. It only needs a consistent contract:

prompt in
events out
evidence copied
failure diagnosed

For non-technical users, provider choice should feel like selecting an engine, not rewriting the workflow.
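A sketch of that contract using Python's `typing.Protocol`. The four methods mirror the interface above; their parameter and return types are assumptions, and `EchoProvider` is a stand-in that calls no real model.

```python
from typing import Protocol


class Provider(Protocol):
    def check_available(self) -> bool: ...
    def run_once(self, prompt: str, output_target: list[str]) -> None: ...
    def collect_evidence(self) -> dict[str, str]: ...
    def diagnose_failure(self, error: Exception) -> str: ...


class EchoProvider:
    """Stand-in provider for trying the loop without any model installed."""

    def __init__(self) -> None:
        self._events: list[str] = []

    def check_available(self) -> bool:
        return True

    def run_once(self, prompt: str, output_target: list[str]) -> None:
        self._events = [f"received: {prompt}"]
        output_target.extend(self._events)

    def collect_evidence(self) -> dict[str, str]:
        return {"event_count": str(len(self._events))}

    def diagnose_failure(self, error: Exception) -> str:
        return f"echo provider failed: {error}"


def run_round(provider: Provider, prompt: str) -> dict[str, object]:
    """Prompt in, events out, evidence copied, failure diagnosed."""
    events: list[str] = []
    if not provider.check_available():
        return {"ok": False, "events": events, "diagnosis": "provider is not available"}
    try:
        provider.run_once(prompt, events)
    except Exception as error:
        return {"ok": False, "events": events, "diagnosis": provider.diagnose_failure(error)}
    return {"ok": True, "events": events, "evidence": provider.collect_evidence()}
```

Because `run_round` only touches the four methods, swapping engines means writing one new class, and the loop code stays the same.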

10. Keep the Deployment Shape Small

If the harness is for non-technical users, installation matters.

The smaller the runtime shape, the easier it is to trust:

  • one local folder
  • one task file
  • one prompt file
  • one private config file
  • one run command
  • one status command
  • one watch command

The mental model should fit in a sentence:

Put the harness in a workspace, write tasks, choose a provider, run the loop, watch progress, answer HUMAN tasks when needed.

That is still not “no learning”. But it is learnable.
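The command surface can stay equally small. Below is a sketch using Python's standard `argparse`, assuming a hypothetical program name `harness` and hypothetical `--workspace` and `--interval` options. Only the three commands come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """One entry point with three commands: run, status, watch."""
    parser = argparse.ArgumentParser(
        prog="harness",  # hypothetical name
        description="Run an agent loop against a task file in one folder.",
    )
    parser.add_argument(
        "--workspace",
        default=".",
        help="folder holding the task file, prompt file, and config",
    )
    commands = parser.add_subparsers(dest="command", required=True)
    commands.add_parser("run", help="start the loop")
    commands.add_parser("status", help="print the current status once")
    watch = commands.add_parser("watch", help="keep printing status until the run stops")
    watch.add_argument(
        "--interval",
        type=float,
        default=2.0,
        help="seconds between status refreshes",
    )
    return parser
```

For example, `build_parser().parse_args(["watch", "--interval", "5"])` yields a namespace with `command` set to `"watch"` and `interval` set to `5.0`.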

Current Rule

For a human-facing AI harness, my rule is:

make AI work inspectable enough for non-technical supervision

The goal is not to hide the machine. The goal is to give people a bigger and calmer surface for thought: tasks, rounds, evidence, stop reasons, and human decisions in the same frame.