Transform

Converting content from one modality to another while preserving intent, constraints, and fidelity is a keystone feature of generative AI. Examples include audio to text, text to audio, text to image, image to text (OCR or captioning), text to video, document to slides, code to diagram, and screenshot to HTML.

Different modalities suit different purposes. Text is the best tool for working through outlines and narrative. Seeing tokens translated into an image can reveal tone and mood, while transforming text or data into a diagram makes its structure easier to review. In this way, the transform pattern serves as a creative pipeline between the user and the AI.

Where to find the action

Right-click menus or inline actions (Midjourney)
Chained together on an open canvas (FloraFauna)
Direct generation via open input (Krea)
Background helpers, such as transcription (Descript)
Built into the core interface (Chronicle)

Design considerations

Maintain a connection to the original. Show outlines, prompts, and blended sources and make them accessible for additional iteration, even if the form of the generation has changed.
Add stop points between modalities. Because transformations tend to be more expensive to fulfill, put action plans, sample responses, and verifications in front of the user before committing to a broader generative run.
Start small and work up. Show drafts of the transformed generation before committing to a full run, like generating an outline or snippet first.
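
The three considerations above can be sketched in code. The following is a minimal illustration, not a reference implementation: `Artifact`, `lineage`, `transform`, and the stub functions `to_outline` and `to_slides` are all hypothetical names invented for this example, and the stubs stand in for whatever model calls a real product would make. Each artifact keeps a link to its source, a cheap draft is produced first, and the expensive run happens only after an approval callback returns true.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Artifact:
    """One form of the content (hypothetical type for this sketch)."""
    modality: str                          # e.g. "text", "slides-draft", "slides"
    content: str
    prompt: Optional[str] = None           # instruction that produced this artifact
    source: Optional["Artifact"] = None    # link back to the form it came from


def lineage(artifact: Artifact) -> list[str]:
    """Walk back to the original so every earlier form stays reachable."""
    chain = []
    node: Optional[Artifact] = artifact
    while node is not None:
        chain.append(node.modality)
        node = node.source
    return chain[::-1]


def transform(
    source: Artifact,
    target: str,
    draft_fn: Callable[[str], str],
    full_fn: Callable[[str], str],
    approve: Callable[[Artifact], bool],
) -> Artifact:
    """Generate a cheap draft, stop for approval, then run the full transform."""
    draft = Artifact(f"{target}-draft", draft_fn(source.content),
                     prompt=f"draft {target}", source=source)
    if not approve(draft):
        # Stop point: the user keeps the draft and can iterate on it.
        return draft
    return Artifact(target, full_fn(draft.content),
                    prompt=f"full {target}", source=draft)


# Stub generators standing in for real model calls (assumptions for the demo).
def to_outline(text: str) -> str:
    return "\n".join(f"- {s.strip()}" for s in text.split(".") if s.strip())


def to_slides(outline: str) -> str:
    return outline.replace("- ", "# ")


notes = Artifact("text", "Q3 results. Revenue up. Churn down.")
deck = transform(notes, "slides", to_outline, to_slides, approve=lambda d: True)
print(lineage(deck))
```

In a real interface the `approve` callback would be the stop point where the user reviews the outline or snippet, and the `source` chain is what lets the product show the original prompt and inputs alongside the transformed result.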
