As AI becomes increasingly autonomous, humans require affordances that allow them to monitor it without disrupting its flow, and to intervene if necessary. The concept of keeping “humans in the loop” dates back to the roots of AI in Norbert Wiener’s pivotal Cybernetics.

The principles of this approach allow autonomy under supervision: while acting, the system must remain observable, interruptible, and accountable.

In the physical world, assistants like Alexa and autonomous vehicles have already established the pattern of AI informing users when it is active or when it requires human intervention, generally through color, sound, or lights.

GM's Super Cruise and Tesla's Autopilot demonstrate how subtle affordances can communicate the state of AI autonomy to the user without diverting their attention.

In the digital space, small affordances that keep the user informed have a long history as well, from spinners to more recent “AI is thinking…”-type elements. These are effective when the risk from errors is low, such as a form submission timing out or an AI returning a poor response in conversation. They are not sufficient for riskier work, particularly as AI programs take more actions on behalf of the user and agents begin interacting with each other.

The intensity of the human-in-the-loop pattern should scale with the risk of the AI's actions. The thresholds might be set by the user or by the underlying program. We can bucket this scale into three categories:

1. Ambient cues

These indicators inform the user that the AI is actively working. They are meant to attract attention, show what the AI is seeing, and make it clear that a user can intervene.

Perplexity’s Comet browser offers a recent example. When the AI is working in a tab, the page where it’s active has a slight inset glow. So far that’s the extent of the affordance, but one might imagine other cues, particularly color, being used to inform the user when they need to intervene, much like the colors on the GM steering wheel.

When the assistant in Perplexity’s Comet browser is active in a tab, a slight glow alerts the user to its presence. Note the prominence of the Controls pattern in the main screen of the browser, in addition to the sidebar, and the chat that transparently explains the AI’s reasoning steps.

OpenAI's Operator mode takes a similar approach and shows the user the browser within the context of the conversation. The AI's activity is updated in real time, and a ••• menu in the top right includes controls that allow the user to take over.

OpenAI's Operator mode emulates a shared browser by creating a window that shows the AI’s actions in real time as it navigates across multiple sites or applications in pursuit of its goal.
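As a rough illustration of how an ambient cue might be wired up, the sketch below maps a simple agent status to an inset glow and an accessible label. The AgentStatus values, STATUS_STYLES, and renderAmbientCue function are illustrative assumptions, not any product’s actual API.

```typescript
// A minimal sketch of an ambient activity cue; all names are hypothetical.
type AgentStatus = "idle" | "working" | "needs-attention";

const STATUS_STYLES: Record<AgentStatus, { glow: string; label: string }> = {
  idle: { glow: "none", label: "" },
  working: { glow: "inset 0 0 0 3px rgba(59, 130, 246, 0.6)", label: "AI is working in this tab" },
  "needs-attention": { glow: "inset 0 0 0 3px rgba(234, 88, 12, 0.8)", label: "AI needs your input" },
};

function renderAmbientCue(container: HTMLElement, status: AgentStatus): void {
  const { glow, label } = STATUS_STYLES[status];
  container.style.boxShadow = glow;            // inset glow marks where the AI is active
  container.setAttribute("aria-label", label); // expose the same state to assistive tech
  container.dataset.agentStatus = status;      // hook for CSS color changes or sound cues
}
```

A color shift from blue to orange, as in the sketch, mirrors the steering-wheel convention: calm while working, attention-grabbing when intervention is needed.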

2. Stream of consciousness

Understanding how the AI is working through a task and anticipating its next moves allows the user to preempt issues before they occur. Visible reasoning and step-by-step task lists are good examples of this pattern in action. In most cases, these are ignored by the user and are mainly useful for retroactively debugging odd behavior. For more complicated or risky actions, however, they allow the user to actively monitor the AI in the moment, thereby maintaining their agency.

Figma Make and other generators present their step-by-step reasoning in the sidebar and show the code being written to fulfill the request. This approach works well for chat-based assistants. For assistants operating independently, KaibanJS takes a novel approach, using a Kanban board to show thinking and progress in real time.
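To make the pattern concrete, here is a minimal sketch of streaming an agent’s step events into a sidebar log so the user can follow along in real time. The StepEvent shape and renderReasoningLog function are hypothetical and stand in for whatever event stream your agent exposes.

```typescript
// A minimal sketch of a live reasoning log; names are illustrative only.
interface StepEvent {
  id: string;
  summary: string;                             // short, human-readable description of the step
  state: "planned" | "running" | "done" | "failed";
}

async function renderReasoningLog(
  steps: AsyncIterable<StepEvent>,
  sidebar: HTMLElement,
): Promise<void> {
  const rows = new Map<string, HTMLElement>();
  for await (const step of steps) {
    // Create a row the first time a step appears, then update it in place.
    let row = rows.get(step.id);
    if (!row) {
      row = document.createElement("li");
      rows.set(step.id, row);
      sidebar.appendChild(row);
    }
    row.textContent = `${step.state === "done" ? "✓" : "•"} ${step.summary}`;
    row.dataset.state = step.state;            // lets CSS highlight running or failed steps
  }
}
```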

3. Review and approve

There are limits to the actions that AI can take independently. Whether set by the user or by the policies of the application, when the AI hits a step that requires human intervention, it needs to alert the user that it is pausing its work. The risk here is that a user who expects the AI to fully take over comes back later to find a task they expected to be done halted midway by an unnecessary blocker.

ChatGPT’s Operator mode has explicit limitations that require user intervention. When the AI reaches one of these steps, it summarizes its progress thus far and prompts the user to take over. Once the user is done with the task, they can hand control back to the AI to continue.

Over time, we will likely see user-led rules in settings panels or agent.md files that provide instructions about when to stop during unforeseen loops, similar to how workflow builders like Zapier and Relay allow these steps to be entered manually. This becomes even more complicated as agents work with each other, where a subagent may require approval from a more established or senior agent. Look to parallels in how teams of humans operate interdependently to explore what this could look like in your domain.

Workflow builders like Relay and Zapier include the option for human-in-the-loop steps that prompt a decisive action from the user. The example from Relay on the left demonstrates the different types of actions supported. The actual confirmation can be sent via email or added as a task in another tool, as demonstrated via Zapier on the right.
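As a sketch of what the user-led stop rules described above might look like in practice, the example below checks each planned action against a small set of user-defined thresholds and pauses when approval is required. The Action, StopRules, and checkAction names are assumptions for illustration, not an existing library’s API.

```typescript
// A minimal sketch of user-defined stop rules; all names are hypothetical.
interface Action {
  kind: "read" | "write" | "purchase" | "send-email";
  description: string;
  estimatedCostUsd?: number;
}

interface StopRules {
  alwaysPause: Action["kind"][];          // actions that always require approval
  maxSpendUsd: number;                    // spending threshold before pausing
}

type Verdict = { proceed: true } | { proceed: false; reason: string };

function checkAction(action: Action, rules: StopRules): Verdict {
  if (rules.alwaysPause.includes(action.kind)) {
    return { proceed: false, reason: `"${action.kind}" actions require your approval.` };
  }
  if ((action.estimatedCostUsd ?? 0) > rules.maxSpendUsd) {
    return { proceed: false, reason: `Estimated cost exceeds your $${rules.maxSpendUsd} limit.` };
  }
  return { proceed: true };
}

// Example: the agent pauses, summarizes its progress, and waits for the user.
const verdict = checkAction(
  { kind: "purchase", description: "Buy the selected flight", estimatedCostUsd: 420 },
  { alwaysPause: ["purchase", "send-email"], maxSpendUsd: 100 },
);
if (!verdict.proceed) {
  console.log(`Pausing: ${verdict.reason}`); // hand control to the user, resume on approval
}
```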

Details and variations

  • Affordances that help the user understand where in the interface and task the AI is operating give the human maximum control.
  • Not all tasks require interruption. Consider the severity of the risk an error could cause and the policies of the connected applications and services.
  • Combine with patterns like reasoning and controls to ensure the user can step in easily if necessary.
  • Ambient cues like color, sound, and other affordances are helpful, especially if the user is likely to be multi-tasking while the AI runs.

Considerations

Positives

Opportunity for delight

Consider that users are likely to build their relationship with the AI through smaller, less risky tasks. Clear affordances can make users feel safe and in control, which helps to quickly build trust with the system.

Avoid liability

Any errors that occur while the AI works on the user's behalf will be attributed to the AI. Caveats are not sufficient to avoid potentially serious issues. Building policies and safeguards into the AI can ensure a smooth and secure user experience.

Concerns

Potential for friction

If users are conditioned to believe the AI can fully complete a task on their behalf, they may become frustrated if the AI repeatedly stops its work to have them intervene. Consider an up-front step where the AI reviews the permissions it will likely need to complete the task and proactively requests the user’s permission to continue. The more proactively the AI behaves, the less chance for surprises in either direction.
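One way to picture that up-front review: collect the scopes the plan will need, deduplicate them, and ask the user once before the agent starts. The PlannedStep shape and confirmPlan function below are hypothetical, sketched only to show the idea.

```typescript
// A minimal sketch of an up-front permission review; names are illustrative only.
interface PlannedStep {
  summary: string;
  requiredScopes: string[];              // e.g. "calendar:write", "email:send"
}

function collectRequiredScopes(plan: PlannedStep[]): string[] {
  // Deduplicate scopes across the whole plan so the user is asked once, up front.
  return [...new Set(plan.flatMap((step) => step.requiredScopes))];
}

async function confirmPlan(
  plan: PlannedStep[],
  askUser: (scopes: string[]) => Promise<boolean>,
): Promise<boolean> {
  const scopes = collectRequiredScopes(plan);
  if (scopes.length === 0) return true;  // nothing sensitive, no interruption needed
  return askUser(scopes);                // one consolidated prompt instead of repeated pauses
}
```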

Security limitations still exist

AI is vulnerable to prompt injection and other malicious activity. Alerting the user that AI is active in a browser might not be enough to counteract this type of security flaw. For example, Brave documented that a prompt injection in a Reddit thread could be used to extract personal information from the user via Comet’s assistant. Training and additional safeguards are needed on the agent side, but users also need to be aware of these risks themselves. If a user is exposed to a security vulnerability due to computer use, they will blame the AI and its provider.

Use when:
The AI is acting autonomously in shared surfaces or on the user’s behalf, so the user can observe its actions, intervene if necessary, and retain agency over the AI’s outcomes.