Notes on Human-Facing AI Harnesses

2026-05-01

I do not think the most interesting use of AI is replacing a person at a keyboard. The better use is expanding the surface area of what a person can inspect, compare, and reason about.

That matters especially for non-technical users. A technical user can already wrap AI with scripts, prompts, logs, git, terminals, and review habits. A non-technical user often sees only the chat box. If the model gets stuck, repeats itself, hides uncertainty, or loses track of the task, the user has very little leverage.

An AI harness is a way to change that relationship. It is not another agent. It is the frame around the agent: the task board, the loop, the logs, the stop conditions, the progress surface, and the place where a human can intervene.

1. AI Should Expand the Cognitive Surface

When I say AI expands cognition, I do not mean it makes decisions for me. I mean it lets me look at more of the problem at once.

A good AI workflow should help a person see:

what the current goal is
what has already been tried
what changed
what failed
where the system is uncertain
what needs a human decision
what evidence exists for the current state

This is different from making the model sound confident. Confidence is cheap. A visible thinking surface is more valuable.

For non-technical users, that surface cannot be “open a terminal and inspect JSON logs”. It has to be a small operational contract: these are the tasks, this is the current round, this is what happened, this is why it stopped.

2. The Harness Is Not the Agent

The first design boundary is important:

agent = the system that reasons and acts
harness = the system that frames, runs, observes, and stops the agent

The harness should not try to become a smarter model. It should not own the product direction. It should not pretend to understand every task deeply.

Its job is narrower:

give the agent a task
call the agent in a repeatable way
capture what happened
detect whether progress was made
stop before the loop becomes wasteful
show the human a useful summary

That separation keeps the tool honest. If the model is the brain, the harness is the table, notebook, timer, checklist, and recorder.

3. Non-Technical Users Need a Task Board, Not a Prompt Box

A prompt box is flexible, but it is also slippery. A task board is less expressive, but it creates traction.

For a non-technical user, the core input should look closer to this:

- [ ] Draft the landing page copy
- [ ] Compare the three strongest positioning options
- [ ] Rewrite the pricing FAQ in a calmer tone
- [ ] Mark anything that needs my decision as HUMAN

The user does not need to know how the model is called. They need to know how to describe work, see progress, and correct direction.

The task board gives the harness a simple source of truth:

function nextTask(board):
  for task in board:
    if task is unchecked:
      return task
  return null

This is intentionally plain. The more complicated the task format becomes, the more the user has to learn before they can use the tool.
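
A minimal Python sketch of the same idea, assuming the board is a Markdown checklist like the one above. The names `Task`, `parse_board`, and `next_task` are mine for illustration, and treating a leading `HUMAN` as the handoff marker is an assumption about the format, not a fixed rule.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches "- [ ] text" or "- [x] text". The exact format is an assumption
# based on the example board above.
TASK_LINE = re.compile(r"^\s*-\s\[( |x|X)\]\s+(.*)$")


@dataclass
class Task:
    line_no: int  # zero-based line index in the board text
    text: str
    done: bool

    @property
    def needs_human(self) -> bool:
        # Hypothetical convention: HUMAN tasks start with the word HUMAN.
        return self.text.startswith("HUMAN")


def parse_board(markdown: str) -> list[Task]:
    """Collect every checklist line; ignore everything else."""
    tasks = []
    for i, line in enumerate(markdown.splitlines()):
        match = TASK_LINE.match(line)
        if match:
            tasks.append(Task(i, match.group(2).strip(), match.group(1) != " "))
    return tasks


def next_task(tasks: list[Task]) -> Optional[Task]:
    """Return the first unchecked task, or None when the board is finished."""
    return next((t for t in tasks if not t.done), None)
```

Lines that are not checklist items are skipped, so the user can keep notes in the same file without confusing the harness.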

4. Use Fresh Rounds Instead of One Endless Session

Long AI sessions are convenient until they become muddy. Old assumptions remain in context. Failed attempts stay nearby. The model may continue from a confused state because that state is still part of the conversation.

A harness can use rounds instead:

function runLoop(board):
  previousEvidence = []  # durable facts carried from round to round

  while true:
    task = nextTask(board)

    if task is null:
      return stop("done")

    context = buildContext(board, task, previousEvidence)
    result = runAgentOnce(context)
    evidence = capture(result)

    updateBoardIfTaskCompleted(board, result)
    recordRound(task, evidence)
    previousEvidence.append(evidence)

    if needsHuman(board, evidence):
      return stop("blocked_by_human")

Each round gives the agent a fresh view of the current task and the rules. The harness carries durable facts forward, not the entire conversational fog.
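
In Python, the round structure could look roughly like this. It is a simplified sketch under loud assumptions: tasks are plain strings, `run_agent_once` is a hypothetical callable standing in for a real provider call, every call is assumed to finish its task, and `max_rounds` is a safety cap I added for the example.

```python
from typing import Callable, Optional


def run_loop(
    tasks: list[str],
    run_agent_once: Callable[[str, list[str]], str],
    max_rounds: int = 10,
) -> tuple[str, list[str]]:
    """Run one fresh agent call per round; return (exit_reason, round notes)."""
    done: set[str] = set()
    notes: list[str] = []  # durable facts carried forward, not the transcript

    for round_no in range(1, max_rounds + 1):
        task: Optional[str] = next((t for t in tasks if t not in done), None)
        if task is None:
            return "done", notes

        # Assumed convention: a task starting with HUMAN belongs to the user.
        if task.startswith("HUMAN"):
            notes.append(f"Round {round_no}: stopped because a human decision is needed")
            return "blocked_by_human", notes

        # The agent sees only the task and a copy of the notes so far.
        summary = run_agent_once(task, list(notes))
        done.add(task)  # simplification: each call completes its task
        notes.append(f"Round {round_no}: {summary}")

    return "max_rounds", notes
```

The agent never receives the previous conversation, only the short notes list, which is what keeps each round fresh.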

For non-technical users, this is useful because the system can explain work in rounds:

Round 1: drafted options
Round 2: compared tradeoffs
Round 3: stopped because a human decision is needed

That is much easier to understand than a long chat transcript.

5. Progress Must Be Observable

If a human cannot tell whether the AI is moving, they cannot supervise it.

A harness should expose a small set of states:

running
waiting
done
failed
blocked by human
stopped because no progress was detected

It should also expose just enough live detail:

current task
current round
elapsed time
last visible action
number of completed tasks
reason for stopping

Pseudocode:

status = {
  runId,
  currentTask,
  round,
  completedTasks,
  totalTasks,
  state,
  lastEvent,
  exitReason
}

function renderStatus(status):
  show progress
  show current task
  show last event
  show stop reason if finished

This is not only for developers. Non-technical users need it more, because they have fewer other ways to work out what happened.
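
Here is one way the status record and its rendering might look in Python. The field names follow the pseudocode above; the output wording and the `render_status` name are illustrative choices of mine, not a fixed format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Status:
    run_id: str
    current_task: Optional[str]
    round: int
    completed_tasks: int
    total_tasks: int
    state: str  # e.g. "running", "waiting", "done", "failed"
    last_event: str
    exit_reason: Optional[str] = None


def render_status(s: Status) -> str:
    """Turn a status record into a few plain lines a non-technical user can read."""
    lines = [
        f"Progress: {s.completed_tasks}/{s.total_tasks} tasks (round {s.round})",
        f"State: {s.state}",
        f"Current task: {s.current_task or 'none'}",
        f"Last event: {s.last_event}",
    ]
    # Only show a stop reason once the run has actually stopped.
    if s.exit_reason:
        lines.append(f"Stopped because: {s.exit_reason}")
    return "\n".join(lines)
```

Returning a string instead of printing keeps the same function usable from a terminal, a web page, or a log file.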

6. Stop Conditions Are a Product Feature

AI systems need brakes.

The harness should stop when:

all tasks are done
the model fails repeatedly
a round takes too long
the same task does not make progress
the task explicitly requires human input
another run is already using the workspace

The most important stop condition is no progress. A model can produce text forever. That does not mean it is helping.

function detectProgress(before, after):
  if checkedTaskCount(after.board) > checkedTaskCount(before.board):
    return true

  if visibleArtifactsChanged(before, after):
    return true

  return false

# stallCount starts at 0 before the first round; limit is a configured maximum
# after each round:
if detectProgress(before, after):
  stallCount = 0
else:
  stallCount += 1

if stallCount >= limit:
  insertHumanTask("The loop stopped because no progress was detected.")
  stop("blocked_by_human")

This is one of the places where the harness becomes humane. It does not let the user discover waste after the fact. It stops and asks for a decision.
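
A small Python sketch of the stall check, assuming progress means either more checked tasks or a change in visible artifacts. Hashing artifact contents is my stand-in for "visible artifacts changed", and `Snapshot`, `StallCounter`, and the default limit of 3 are hypothetical choices for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    checked_tasks: int
    artifact_digest: str


def snapshot(checked_tasks: int, artifacts: dict[str, str]) -> Snapshot:
    """Summarise the workspace: task count plus one hash over all artifacts."""
    h = hashlib.sha256()
    for name in sorted(artifacts):  # sorted so ordering never affects the digest
        h.update(name.encode("utf-8") + b"\0")
        h.update(artifacts[name].encode("utf-8") + b"\0")
    return Snapshot(checked_tasks, h.hexdigest())


def detect_progress(before: Snapshot, after: Snapshot) -> bool:
    return (
        after.checked_tasks > before.checked_tasks
        or after.artifact_digest != before.artifact_digest
    )


class StallCounter:
    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.count = 0

    def observe(self, progressed: bool) -> bool:
        """Record one round; return True when the loop should stop."""
        self.count = 0 if progressed else self.count + 1
        return self.count >= self.limit
```

A content hash will count any edit as progress, including a pointless one, so a real harness may want a stricter signal.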

7. Human Handoff Should Be Built In

For non-technical users, “the AI got stuck” should not mean “the system is broken”.

It should mean:

The current task needs your decision.
Here is what happened.
Here are the options.
Here is the smallest thing you need to answer.

The harness can represent that as a human task:

- [ ] HUMAN: Choose whether the homepage should emphasize speed or trust.
- [ ] Rewrite the homepage based on that decision.

This makes the human part explicit. It also prevents the agent from pretending it can solve a product decision that belongs to the user.
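
As a sketch, inserting a handoff into the board can be a plain text operation. This assumes the Markdown checklist format shown above; `insert_human_task` is a hypothetical helper name, and placing the question directly before the first unchecked task is one possible policy.

```python
def insert_human_task(board: str, question: str) -> str:
    """Put a HUMAN task in front of the first unchecked task.

    If every task is already checked, the question is appended at the end.
    """
    lines = board.splitlines()
    entry = f"- [ ] HUMAN: {question}"
    for i, line in enumerate(lines):
        if line.lstrip().startswith("- [ ]"):
            lines.insert(i, entry)
            break
    else:
        # No unchecked task found: the for loop finished without a break.
        lines.append(entry)
    return "\n".join(lines)
```

Because the question sits above the task it blocks, a loop that always picks the first unchecked item will reach the human decision before it tries the dependent work again.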

I like this because it respects both sides. The AI can widen the option space. The human still chooses direction.

8. Evidence Matters More Than Transcript Volume

A full transcript is often too much. A useful evidence trail is smaller and more structured.

For each round, I want to know:

which task was active
what input was given to the agent
what the agent produced
what changed
whether the task was marked done
whether there was an error
what the next handoff should say

Pseudocode:

roundRecord = {
  taskId,
  roundNumber,
  startedAt,
  finishedAt,
  resultState,
  changedArtifacts,
  summary,
  error,
  nextAction
}

The user should not have to read every raw event. But the harness should keep enough evidence that a future user, maintainer, or agent can reconstruct the work.
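
One possible shape for that trail in Python is a dataclass written as one JSON object per line. The fields mirror the pseudocode above; the JSON Lines storage and the helper names are assumptions of mine, chosen because an append-only file is easy to inspect and hard to corrupt.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RoundRecord:
    task_id: str
    round_number: int
    started_at: str   # ISO 8601 timestamps kept as strings for readability
    finished_at: str
    result_state: str
    changed_artifacts: list[str] = field(default_factory=list)
    summary: str = ""
    error: Optional[str] = None
    next_action: str = ""


def to_jsonl_line(record: RoundRecord) -> str:
    """Serialise one round as a single line of JSON."""
    return json.dumps(asdict(record), sort_keys=True)


def from_jsonl_line(line: str) -> RoundRecord:
    """Rebuild a record from a line written by to_jsonl_line."""
    return RoundRecord(**json.loads(line))
```

Appending one line per round means a crash in round five still leaves rounds one to four readable.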

This is another way AI expands cognition: it leaves behind a map, not just an answer.

9. Provider Choice Should Be Replaceable

Different AI tools are good at different things. A harness should avoid hard-wiring itself to one provider’s personality.

The interface can be simple:

interface Provider:
  checkAvailable()
  runOnce(prompt, outputTarget)
  collectEvidence()
  diagnoseFailure()

The loop does not need to know whether the provider is Claude, Codex, Gemini, or something else. It only needs a consistent contract:

prompt in
events out
evidence copied
failure diagnosed

For non-technical users, provider choice should feel like selecting an engine, not rewriting the workflow.
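
In Python, that contract could be expressed with `typing.Protocol`, so any object with the right methods counts as a provider. The method signatures below are my guess at what the interface above implies, and `EchoProvider` is a made-up stand-in that calls no real AI service.

```python
from typing import Optional, Protocol


class Provider(Protocol):
    def check_available(self) -> bool: ...
    def run_once(self, prompt: str, output_target: list[str]) -> None: ...
    def collect_evidence(self) -> dict: ...
    def diagnose_failure(self) -> Optional[str]: ...


class EchoProvider:
    """Fake provider for testing the loop without calling a model."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def check_available(self) -> bool:
        return True

    def run_once(self, prompt: str, output_target: list[str]) -> None:
        self.events = [f"received: {prompt}", "finished"]
        output_target.extend(self.events)

    def collect_evidence(self) -> dict:
        return {"event_count": len(self.events)}

    def diagnose_failure(self) -> Optional[str]:
        return None if "finished" in self.events else "no finish event seen"


def run_round(provider: Provider, prompt: str) -> dict:
    """Drive one round through the contract: prompt in, events and evidence out."""
    if not provider.check_available():
        return {"state": "failed", "events": [], "evidence": {},
                "diagnosis": "provider unavailable"}
    events: list[str] = []
    provider.run_once(prompt, events)
    diagnosis = provider.diagnose_failure()
    return {"state": "failed" if diagnosis else "ok", "events": events,
            "evidence": provider.collect_evidence(), "diagnosis": diagnosis}
```

`run_round` never mentions a vendor name, which is the point: swapping engines means writing a new class, not touching the loop.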

10. Keep the Deployment Shape Small

If the harness is for non-technical users, installation matters.

The smaller the runtime shape, the easier it is to trust:

one local folder
one task file
one prompt file
one private config file
one run command
one status command
one watch command

The mental model should fit in a sentence:

Put the harness in a workspace, write tasks, choose a provider, run the loop, watch progress, answer HUMAN tasks when needed.

That is still not “no learning”. But it is learnable.
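
To make the "one run command, one status command, one watch command" shape concrete, here is a hedged sketch of a command-line entry point using Python's standard `argparse`. The program name `harness`, the flag names, and the defaults are all hypothetical; only the three-command shape comes from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="harness")
    parser.add_argument(
        "--workspace", default=".",
        help="folder holding the task file, prompt file, and config",
    )
    # Exactly three subcommands, matching the small deployment shape.
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="start the loop")
    run.add_argument("--provider", default="echo", help="which engine to use")

    sub.add_parser("status", help="print the latest status once")

    watch = sub.add_parser("watch", help="refresh the status until the run stops")
    watch.add_argument("--interval", type=float, default=2.0,
                       help="seconds between refreshes")
    return parser
```

With this shape, a session might be `harness run --provider claude` in one window and `harness watch` in another.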

Current Rule

For a human-facing AI harness, my rule is:

make AI work inspectable enough for non-technical supervision

The goal is not to hide the machine. The goal is to give people a bigger and calmer surface for thought: tasks, rounds, evidence, stop reasons, and human decisions in the same frame.