Shared vision

As AI becomes increasingly autonomous, humans require affordances that allow them to monitor it without disrupting its flow, and to intervene if necessary. This shared vision lets them observe the AI passively and step in to direct it when needed.

In the physical world, assistants like Alexa and autonomous vehicles have already established the pattern of AI informing users when it is active or when it requires human intervention, generally in the form of color, sound, or lights.

GM's Super Cruise and Tesla's Autopilot features demonstrate how subtle affordances can communicate the state of AI autonomy to the user without pulling their focus from the road.

In the digital space, small affordances that keep the user informed of what the AI is seeing and doing have a long history as well, from spinners to more recent “AI is thinking…”-type elements. When the AI is working in sensitive spaces or in a shared context, taking action while the user observes, users need clear visibility into what it's doing so they can intervene if necessary.

Ambient shared vision cues alert users to what the AI is seeing and doing. They are meant to attract attention, show what the AI is seeing, and make it clear that a user can intervene.

Perplexity’s Comet browser offers a recent example. When the AI is working in a tab, the page where it’s active has a slight inset glow. So far that’s the extent of the affordance, but one might imagine other cues, particularly color, being used to inform the user when they need to intervene, much like the colors on the GM steering wheel.

When the assistant in Perplexity’s Comet browser is active in a tab, a slight glow alerts the user to its presence. Note the prominence of the Controls pattern in the main screen of the browser as well as in the sidebar, and the chat's transparent account of the AI’s reasoning steps.
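The idea of extending the glow with color could be sketched as a simple mapping from agent state to ambient cue. This is only an illustration: the state names, colors, and the `cueFor` and `glowStyle` helpers are assumptions made for this sketch, not anything Comet or GM actually exposes.

```typescript
// Hypothetical agent states; none of these names come from Comet or Operator.
type AgentState = "idle" | "working" | "needs_input" | "blocked";

interface AmbientCue {
  color: string;        // glow color around the shared surface
  pulse: boolean;       // whether the glow animates to attract attention
  label: string;        // text for tooltips and screen readers
  canTakeOver: boolean; // whether a "take over" control should be shown
}

// Exhaustive mapping: adding a state without a cue is a compile-time error.
const CUES: Record<AgentState, AmbientCue> = {
  idle: { color: "transparent", pulse: false, label: "Assistant is idle", canTakeOver: false },
  working: { color: "#3b82f6", pulse: false, label: "Assistant is working in this tab", canTakeOver: true },
  needs_input: { color: "#f59e0b", pulse: true, label: "Assistant needs your input", canTakeOver: true },
  blocked: { color: "#ef4444", pulse: true, label: "Assistant has stopped; take over to continue", canTakeOver: true },
};

function cueFor(state: AgentState): AmbientCue {
  return CUES[state];
}

// Translate a cue into a CSS declaration for an inset glow.
function glowStyle(cue: AmbientCue): string {
  if (cue.color === "transparent") {
    return "box-shadow: none";
  }
  return `box-shadow: inset 0 0 12px 2px ${cue.color}`;
}
```

Pairing each color with a text label is a deliberate choice in this sketch, so the cue does not depend on color perception alone.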

OpenAI's operator mode takes a similar approach and shows the user the browser within the context of the conversation. The AI's activity is updated in real time, and a ••• menu in the top right includes controls to allow the user to take over.

OpenAI's operator mode emulates a shared browser by creating a window that shows the AI’s actions in real time as it navigates across multiple sites or applications in pursuit of its goal.

Design considerations

  • Ensure friction is warranted. If users are conditioned to believe the AI can fully complete a task on their behalf, they may become frustrated if the AI repeatedly stops its work to have them intervene. Consider an up-front step where the AI reviews the permissions it will likely need to complete the task and proactively requests the user’s permission to continue. The more proactively the AI behaves, the less chance for surprise in any direction.
  • Don't let users confuse oversight with full security. AI is vulnerable to prompt injection and other malicious activities. Alerting the user that AI is active in a browser might not be enough to counteract this type of security flaw. For example, Brave documented that a prompt injection in a Reddit thread could be used to extract personal information from the user via Comet’s assistant. Training and additional safeguards are needed on the agent side, but users also need to be aware of these risks themselves. If a user is exposed to a security vulnerability due to computer use, they will blame the AI and its provider.
  • Let users constrain the scope of control. Limit shared view to a specific tab, app, or frame rather than full system access. Scope boundaries reassure users that the AI cannot observe unrelated activity or sensitive data elsewhere on the screen.
  • Signal boundaries visually and persistently. Use distinct overlays, colored outlines, or tooltips to show what the AI can see or manipulate. The interface should always communicate which elements are in the AI’s field of view.
  • Design for oversight in reverse as well. Include screenshots and specific details in the AI's stream of thought so users can audit what the AI was seeing in addition to its actions.
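Several of the considerations above (the up-front permission review, scope limits, and an auditable trail) can be sketched together. This is a minimal sketch under stated assumptions: every type and function name here (`Scope`, `PlannedStep`, `missingPermissions`, `inScope`, `recordStep`) and the example origins are hypothetical, not the API of Comet, Operator, or any real agent framework.

```typescript
// Hypothetical names throughout; not the API of any real agent framework.
type Permission = "read_page" | "fill_form" | "submit_payment";

interface PlannedStep {
  description: string;
  origin: string;       // site the step runs on
  needs: Permission[];  // permissions the step requires
}

interface Scope {
  allowedOrigins: string[]; // tabs or sites the user agreed to share
  granted: Permission[];    // permissions the user approved up front
}

interface AuditEntry {
  at: string;                   // ISO 8601 timestamp
  step: string;
  origin: string;
  allowed: boolean;
  screenshotRef: string | null; // pointer to what the AI was seeing
}

// Up-front review: list every permission the plan needs that is not yet
// granted, so the user is asked once rather than interrupted at each step.
function missingPermissions(plan: PlannedStep[], scope: Scope): Permission[] {
  const missing: Permission[] = [];
  for (const step of plan) {
    for (const p of step.needs) {
      if (scope.granted.indexOf(p) === -1 && missing.indexOf(p) === -1) {
        missing.push(p);
      }
    }
  }
  return missing;
}

// A step is in scope only if its origin was shared and every permission
// it needs was granted.
function inScope(step: PlannedStep, scope: Scope): boolean {
  const originShared = scope.allowedOrigins.indexOf(step.origin) !== -1;
  const allGranted = step.needs.every((p) => scope.granted.indexOf(p) !== -1);
  return originShared && allGranted;
}

// Append an audit entry whether or not the step was allowed, so users can
// later review both what the AI did and what it was prevented from doing.
function recordStep(
  log: AuditEntry[],
  step: PlannedStep,
  scope: Scope,
  screenshotRef: string | null,
  now: Date = new Date()
): AuditEntry {
  const entry: AuditEntry = {
    at: now.toISOString(),
    step: step.description,
    origin: step.origin,
    allowed: inScope(step, scope),
    screenshotRef: screenshotRef,
  };
  log.push(entry);
  return entry;
}

// Usage with made-up sites and steps.
const scope: Scope = {
  allowedOrigins: ["https://shop.example"],
  granted: ["read_page", "fill_form"],
};
const plan: PlannedStep[] = [
  { description: "Read cart contents", origin: "https://shop.example", needs: ["read_page"] },
  { description: "Pay for order", origin: "https://shop.example", needs: ["fill_form", "submit_payment"] },
  { description: "Check inbox", origin: "https://mail.example", needs: ["read_page"] },
];
const auditLog: AuditEntry[] = [];
const toRequest = missingPermissions(plan, scope);
for (const step of plan) {
  recordStep(auditLog, step, scope, null);
}
```

In this usage, `toRequest` contains only `"submit_payment"`, which the interface could ask for once before the run starts. The audit log marks the first step as allowed and the other two as out of scope (one for a missing permission, one for an unshared origin).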

Examples

ChatGPT Operator Mode lives seamlessly within the conversational interface. Users can observe as the AI navigates multiple browser windows, taking action until it reaches the end of the task or a step that requires user intervention.