Aurora best practices: How to create ultra-realistic AI videos

Aurora best practices: How to create ultra-realistic AI videos

6 mars 2026

Aurora best practices
Creatify logo
Creatify logo

Équipe Creatify

6 mars 2026

PARTAGER

Icône LinkedIn
Icône X
Icône Facebook

DANS CET ARTICLE

Most AI video generators give you the uncanny valley - mouths that move, eyes that don't, bodies that stay frozen like a cardboard cutout. Aurora is built to fix that.

Ultra realistic AI video capture

Aurora is Creatify's proprietary diffusion transformer (DiT) model for audio-driven avatar synthesis. Give it one photo and an audio clip, and it generates a studio-quality video of that person speaking, presenting, or singing - with synchronized facial expressions, natural eye movements, breathing, and full upper-body gestures. It's not just lip sync. It's a full performance.

The model has already been integrated into ElevenLabs, Runware and fal.ai as one of the first video generation models - a signal of where AI video generation is heading.

This guide covers how to get the best results out of it.

What makes Aurora different

Most talking-head tools animate the mouth and call it a day. Aurora treats the avatar as a whole person, setting a new benchmark for realistic AI video generation.

Here's what the model actually produces:

  • Lip sync that tracks the audio accurately, including subtle mouth shapes for different phonemes

  • Facial expressions that match vocal tone and emotional delivery

  • Eye movements - blinking, gaze shifts, natural focus

  • Head movement - nods, tilts, subtle position changes

  • Upper-body gestures - hand movements, shoulder shifts, the kind of natural motion that makes a talking head feel real rather than robotic

  • Breathing - chest movement between sentences

What makes Aurora different

The underlying architecture fuses an image encoder, text encoder, and audio encoder into a shared latent space, so the model understands the emotional context of what's being said and reflects it visually. If the audio sounds enthusiastic, the avatar looks enthusiastic.

Aurora Diffusion Transformer

What you can build with it

Aurora on  screens

Aurora supports a wide range of content types beyond simple talking heads, making it a powerful tool for ai video gen workflows:

  • Product demos - Show a spokesperson holding a product, pointing at it, and explaining its benefits. Works for skincare, tech, consumer goods, anything.

  • UGC-style ads - Selfie format, slight handheld camera shake, casual delivery. Hard to distinguish from real creator content.

  • Podcast clips - Avatar faces slightly to the side as if talking to a co-host, with an engaged, conversational expression.

  • Multilingual content - Generate the same video in any language without re-filming. Aurora keeps the avatar's lip movements in sync with the new audio.

  • Singing avatars - Give it album art and a song, and the avatar performs it. Useful for music marketing or entertainment content.

  • Animated characters - Works with illustrated characters and stylized art, not just realistic photos.

Choose an avatar SS

Getting the best results with AI video gen

1. Start with the right image

Aurora is flexible - it works with photos, renders, and character art. But a few things help:

  • The subject should be clearly visible and distinguishable in the frame

  • For consistent multi-scene videos, keep similar framing across all images (e.g., all portrait shots)

  • If movement looks unnatural, try an image with a cleaner, more neutral pose

There are no strict limitations on angle, lighting, or composition. Aurora adjusts dynamically.

How to prepare a better image

2. Use Voice Model V3

This is non-negotiable for quality results. Voice Model V3 delivers the most accurate lip sync and the widest expressive range. Older voice models produce noticeably worse output.

Keep speech speed moderate and clear. If sync feels slightly off, slowing the voice down slightly usually fixes it. Add natural pauses between sentences - they give the avatar room to breathe and make the performance feel more human.

3. Master your prompt

This is where most people leave results on the table. The prompt tells Aurora how the avatar should behave - not just what it looks like, but how it moves, what emotion it conveys, and how it interacts with the scene.

Use this as your base prompt for any standard talking-head video:

4K studio interview, medium close-up (shoulders-up crop). Solid light-grey seamless backdrop, uniform soft key-light - no lighting change. Presenter faces lens, steady eye-contact. Hands remain below frame, body perfectly still. Ultra-sharp.

From there, layer in behavioral cues specific to your use case.

Prompt examples by format:

Use Case

Behavioral Prompt to Add

Product demo

The person holding the product is showing the label face to the camera while explaining, pointing at it from time to time.

Natural talking head

The person is talking and facing the camera directly and naturally with breathing chest movement. Natural explaining gestures and eye movements.

Podcast

The person is looking and facing to the side as if talking to someone in that direction, with engaging expression showing interest in the topic.

UGC selfie

The person is talking in front of the camera with one hand not visible. The camera has a slight shake as if handheld.

Enthusiastic product review

The person's hands move enthusiastically trying to explain the benefit of the product.

The more specific you get with emotional tone and physical behavior, the better the output. Vague prompts produce generic results.

Pro tip: Use GPT to combine the base cinematic setup with your specific use case. Prompt it: "Generate an optimized Aurora prompt for a [X] product demo" and it'll blend the technical framing with the right behavioral cues automatically.

4. Dial in prompt_guidance

Aurora has a prompt_guidance parameter ranging from 0 to 4. It controls how strictly the model follows your prompt versus allowing natural variation.

  • Start at 1 for most scenes. It gives the model room to perform naturally while still following direction.

  • Increase it if the avatar drifts off-prompt or doesn't follow the behavioral cues you set.

  • Decrease it if the performance feels stiff or mechanical.

5. Match audio, image, and prompt emotionally

The most common mistake: using an energetic, upbeat audio track with a neutral-faced image and a calm behavioral prompt. The model fuses all three inputs. If they're pulling in different directions, the output feels inconsistent.

If your audio is enthusiastic, your prompt should call for energetic, expressive behavior. If it's calm and informational, your prompt should reflect that. The more aligned these three inputs are, the more convincing the result.

Quick troubleshooting

Problem

Fix

Lip sync feels off

Slow down voice speed slightly

Movement looks unnatural

Try a different image with a cleaner pose

Avatar drifts off-prompt

Increase prompt_guidance

Performance feels too robotic

Lower prompt_guidance; add softer behavioral cues

Inconsistency across scenes

Use images with similar framing and style

Quick ai avatar troubleshooting

The bigger picture

Aurora represents a meaningful step forward in AI video generation — not because it's a novelty, but because it solves a real production problem. Creating high-quality avatar video used to require a camera, a studio, a performer, and a post-production workflow. Now it requires a photo and a script.

For performance marketers running paid campaigns, that changes the math on creative testing. For agencies managing multiple clients, it changes the economics of video production. For anyone who's ever passed on video ads because of cost or complexity, it removes the barrier entirely.

The model is live on Creatify, and integrations with ElevenLabs, Runware and fal.ai mean it's increasingly accessible as a standalone capability for developers and creators building on top of AI infrastructure.

One photo. One audio clip. A video that looks like you shot it in a studio.

Try Aurora on Creatify →

Most AI video generators give you the uncanny valley - mouths that move, eyes that don't, bodies that stay frozen like a cardboard cutout. Aurora is built to fix that.

Ultra realistic AI video capture

Aurora is Creatify's proprietary diffusion transformer (DiT) model for audio-driven avatar synthesis. Give it one photo and an audio clip, and it generates a studio-quality video of that person speaking, presenting, or singing - with synchronized facial expressions, natural eye movements, breathing, and full upper-body gestures. It's not just lip sync. It's a full performance.

The model has already been integrated into ElevenLabs, Runware and fal.ai as one of the first video generation models - a signal of where AI video generation is heading.

This guide covers how to get the best results out of it.

What makes Aurora different

Most talking-head tools animate the mouth and call it a day. Aurora treats the avatar as a whole person, setting a new benchmark for realistic AI video generation.

Here's what the model actually produces:

  • Lip sync that tracks the audio accurately, including subtle mouth shapes for different phonemes

  • Facial expressions that match vocal tone and emotional delivery

  • Eye movements - blinking, gaze shifts, natural focus

  • Head movement - nods, tilts, subtle position changes

  • Upper-body gestures - hand movements, shoulder shifts, the kind of natural motion that makes a talking head feel real rather than robotic

  • Breathing - chest movement between sentences

What makes Aurora different

The underlying architecture fuses an image encoder, text encoder, and audio encoder into a shared latent space, so the model understands the emotional context of what's being said and reflects it visually. If the audio sounds enthusiastic, the avatar looks enthusiastic.

Aurora Diffusion Transformer

What you can build with it

Aurora on  screens

Aurora supports a wide range of content types beyond simple talking heads, making it a powerful tool for ai video gen workflows:

  • Product demos - Show a spokesperson holding a product, pointing at it, and explaining its benefits. Works for skincare, tech, consumer goods, anything.

  • UGC-style ads - Selfie format, slight handheld camera shake, casual delivery. Hard to distinguish from real creator content.

  • Podcast clips - Avatar faces slightly to the side as if talking to a co-host, with an engaged, conversational expression.

  • Multilingual content - Generate the same video in any language without re-filming. Aurora keeps the avatar's lip movements in sync with the new audio.

  • Singing avatars - Give it album art and a song, and the avatar performs it. Useful for music marketing or entertainment content.

  • Animated characters - Works with illustrated characters and stylized art, not just realistic photos.

Choose an avatar SS

Getting the best results with AI video gen

1. Start with the right image

Aurora is flexible - it works with photos, renders, and character art. But a few things help:

  • The subject should be clearly visible and distinguishable in the frame

  • For consistent multi-scene videos, keep similar framing across all images (e.g., all portrait shots)

  • If movement looks unnatural, try an image with a cleaner, more neutral pose

There are no strict limitations on angle, lighting, or composition. Aurora adjusts dynamically.

How to prepare a better image

2. Use Voice Model V3

This is non-negotiable for quality results. Voice Model V3 delivers the most accurate lip sync and the widest expressive range. Older voice models produce noticeably worse output.

Keep speech speed moderate and clear. If sync feels slightly off, slowing the voice down slightly usually fixes it. Add natural pauses between sentences - they give the avatar room to breathe and make the performance feel more human.

3. Master your prompt

This is where most people leave results on the table. The prompt tells Aurora how the avatar should behave - not just what it looks like, but how it moves, what emotion it conveys, and how it interacts with the scene.

Use this as your base prompt for any standard talking-head video:

4K studio interview, medium close-up (shoulders-up crop). Solid light-grey seamless backdrop, uniform soft key-light - no lighting change. Presenter faces lens, steady eye-contact. Hands remain below frame, body perfectly still. Ultra-sharp.

From there, layer in behavioral cues specific to your use case.

Prompt examples by format:

Use Case

Behavioral Prompt to Add

Product demo

The person holding the product is showing the label face to the camera while explaining, pointing at it from time to time.

Natural talking head

The person is talking and facing the camera directly and naturally with breathing chest movement. Natural explaining gestures and eye movements.

Podcast

The person is looking and facing to the side as if talking to someone in that direction, with engaging expression showing interest in the topic.

UGC selfie

The person is talking in front of the camera with one hand not visible. The camera has a slight shake as if handheld.

Enthusiastic product review

The person's hands move enthusiastically trying to explain the benefit of the product.

The more specific you get with emotional tone and physical behavior, the better the output. Vague prompts produce generic results.

Pro tip: Use GPT to combine the base cinematic setup with your specific use case. Prompt it: "Generate an optimized Aurora prompt for a [X] product demo" and it'll blend the technical framing with the right behavioral cues automatically.

4. Dial in prompt_guidance

Aurora has a prompt_guidance parameter ranging from 0 to 4. It controls how strictly the model follows your prompt versus allowing natural variation.

  • Start at 1 for most scenes. It gives the model room to perform naturally while still following direction.

  • Increase it if the avatar drifts off-prompt or doesn't follow the behavioral cues you set.

  • Decrease it if the performance feels stiff or mechanical.

5. Match audio, image, and prompt emotionally

The most common mistake: using an energetic, upbeat audio track with a neutral-faced image and a calm behavioral prompt. The model fuses all three inputs. If they're pulling in different directions, the output feels inconsistent.

If your audio is enthusiastic, your prompt should call for energetic, expressive behavior. If it's calm and informational, your prompt should reflect that. The more aligned these three inputs are, the more convincing the result.

Quick troubleshooting

Problem

Fix

Lip sync feels off

Slow down voice speed slightly

Movement looks unnatural

Try a different image with a cleaner pose

Avatar drifts off-prompt

Increase prompt_guidance

Performance feels too robotic

Lower prompt_guidance; add softer behavioral cues

Inconsistency across scenes

Use images with similar framing and style

Quick ai avatar troubleshooting

The bigger picture

Aurora represents a meaningful step forward in AI video generation — not because it's a novelty, but because it solves a real production problem. Creating high-quality avatar video used to require a camera, a studio, a performer, and a post-production workflow. Now it requires a photo and a script.

For performance marketers running paid campaigns, that changes the math on creative testing. For agencies managing multiple clients, it changes the economics of video production. For anyone who's ever passed on video ads because of cost or complexity, it removes the barrier entirely.

The model is live on Creatify, and integrations with ElevenLabs, Runware and fal.ai mean it's increasingly accessible as a standalone capability for developers and creators building on top of AI infrastructure.

One photo. One audio clip. A video that looks like you shot it in a studio.

Try Aurora on Creatify →

Icon
Icon
Icon

Prêt à transformer votre produit en une vidéo captivante ?

Prêt à accélérer votre marketing ?

Testez vos nouvelles idées de produits en quelques minutes avec des publicités vidéo générées par l'IA

Icône de flèche.
Gradient

Prêt à accélérer votre marketing ?

Testez vos nouvelles idées de produits en quelques minutes avec des publicités vidéo générées par l'IA

Icône de flèche.
Gradient

Prêt à accélérer votre marketing ?

Testez vos nouvelles idées de produits en quelques minutes avec des publicités vidéo générées par l'IA

Icône de flèche.
Gradient

Prêt à accélérer votre marketing ?

Testez vos nouvelles idées de produits en quelques minutes avec des publicités vidéo générées par l'IA

Icône de flèche.
Gradient
Gradient