Master Guide to Perfect Consistent Prompting for AI Video Creation
Table of Contents
1. Introduction: Why Consistency Matters in AI Video Creation
2. Core Principles of Consistent Prompting
3. Tools & Models Overview
4. Basic Prompt Anatomy: Single-Line and Image-Driven Inputs
5. Text-to-Video Prompting Framework
6. Image-to-Video Prompting Framework
7. Multi-Shot Video Prompting Framework
8. Advanced Cinematic Language & Style Controls
9. Post-Processing: Enhancements & Upscaling
10. Templates & Example Prompts
1. Introduction: Why Consistency Matters in AI Video Creation
When crafting AI-generated video content, consistency is the foundation that transforms a
sequence of disjointed clips into a coherent narrative. Consistent character appearance,
lighting, mood, and framing ensure:
● Storytelling Integrity: Just as films rely on continuity between shots, AI videos
demand stable visual elements to maintain audience immersion.
● Brand Reliability: For commercial campaigns or influencer-style content, a
recognizable avatar (character) and consistent style become a digital spokesperson
and reinforce brand identity.
● Emotional Continuity: Subtle shifts in facial features, color tones, or camera angles
can break the illusion, reminding viewers of the synthetic origin rather than the story
itself.
● Monetization Value: High realism and seamless consistency command premium
rates—clients pay more for videos that feel authentic and polished, rather than
patchwork.
Key Takeaways:
1. Consistency + Realism = Business Asset: Stable video outputs increase brand
recall and engagement, directly impacting ROI.
2. Prompts Are Your Blueprint: A well-structured prompt is akin to a director’s shot
list; it guides the AI toward repeatable, controlled outputs.
3. Think Like a Filmmaker, Not a Prompter: Adopt roles like art director and
DP—frame your prompts with camera angles, lighting notes, and emotional cues to
achieve cinematic results.
2. Core Principles of Consistent Prompting
Building on the importance of consistency, this section outlines the foundational principles
that every prompt must embody to produce stable, high-quality AI video outputs.
2.1 Clarity of Intent
● Single Objective: Each prompt should focus on one primary action or scene. Avoid
combining unrelated actions.
● Explicit Naming: If using a trained character, always start with a clear identifier (e.g.,
CharacterName:) to lock in the specialized model.
2.2 Shot Structuring
● Shot Descriptors: Use cinematic terms in brackets to denote shot types:
○ [Establishing shot: ...]
○ [Close-up: ...]
○ [Tracking shot: ...]
● Sequence Ordering: List shots in narrative order. The AI reads prompts top-down,
simulating shot progression.
2.3 Style Anchors
● Mood & Lighting: Specify tone words (e.g., moody, high contrast, golden
hour) to maintain visual coherence.
● Color Palette: If necessary, add palette hints (e.g., muted blues, warm sepia) for
consistent coloring across scenes.
2.4 Parameter Control
● Character Weight: For models supporting it (e.g., Open Art), set a slider value (e.g.,
0.85) to balance identity preservation vs. creative variation.
● Preserve Toggles: Toggle features like preserve_key_features to lock outfits or
allow wardrobe changes as needed.
2.5 Reference Integration
● Image References: When using image-to-video, upload your source image in the
designated slot. The AI uses it as the visual anchor.
● Style References: Optionally include a secondary image that exemplifies desired
lighting or camera movement.
2.6 Prompt Economy
● Conciseness: Eliminate unnecessary adjectives. Prioritize impactful descriptors that
drive the scene.
● Consistency in Language: Reuse established terms within a series to reinforce
continuity (e.g., always soft neon lighting).
3. Tools & Models Overview
AI video creation relies on specialized engines and models. Choosing the right tool shapes
your workflow, speed, and final output quality. Below is a concise survey of the most
established and cutting‑edge options:
SeaDance (Enhancer)
● Type: Proprietary AI video
● Input Modalities: Text-to-Video, Image-to-Video (Cense 1.0)
● Strengths: Exceptional multi-shot consistency; natural-language cinematic prompting; built-in upscaling pipeline via Enhancer AI
● Weaknesses: Still in closed beta; 720–1080p output limits
● Best Use Cases: Narrative sequences with shot continuity; indie filmmakers, ads, social shorts
Google Veo
● Type: Text-to-Video
● Input Modalities: Text only
● Strengths: Early access to high-quality diffusion; rapid prototyping in Google Cloud
● Weaknesses: Limited multi-shot support; less cinematic language parsing
● Best Use Cases: Quick concept tests; educational demos
Cling (Runway)
● Type: Image-to-Video
● Input Modalities: Image + simple text
● Strengths: Superior fidelity in single-scene transforms; flexible upscaling to 4K via integrations
● Weaknesses: No native text-to-video yet; requires paid plan for extended use
● Best Use Cases: Cinemagraphs, product demos; isolated shot animations
OmniHuman (CapCut)
● Type: Image-to-Lip-Sync
● Input Modalities: Image + audio
● Strengths: Rapid lip-sync from static portraits; full-body animation and backgrounds
● Weaknesses: 15 sec clip length cap; queue times on free tier
● Best Use Cases: Talking heads, avatar presentations
MidJourney v7
● Type: Text-to-Image
● Input Modalities: Text only
● Strengths: Rich artistic styles; OmniRef for consistent character variants
● Weaknesses: Not designed for video; must export and animate separately
● Best Use Cases: Keyframe stills; preliminary concept art
Reeve.v1
● Type: Text-to-Image
● Input Modalities: Text only
● Strengths: Free access with a strong fashion/editorial aesthetic
● Weaknesses: No image conditioning; requires external upscaling and animation
● Best Use Cases: Editorial portrait series
Flux Pro (FAL)
● Type: Text-to-Image
● Input Modalities: Text only
● Strengths: High-resolution human features; specialized face/body realism
● Weaknesses: Less creative flexibility; underperforms on background fidelity
● Best Use Cases: Avatar creation for training
ComfyUI
● Type: Open-source node-based pipelines
● Input Modalities: Text, Image, API
● Strengths: Fully customizable; integrates Gemini 2.0, Flux, and upscalers; local or cloud GPU execution
● Weaknesses: Steep learning curve; requires GPU resources
● Best Use Cases: Production-grade bespoke workflows
Model Selection Guidelines
1. Early prototyping & storyboarding: MidJourney, Reeve, Flux — generate static
frames, build mood boards.
2. Shot continuity & narrative: SeaDance/Enhancer’s Cense 1.0 or Google Veo for
text‑driven video, Cling for image sequences.
3. Character consistency: Use OmniRef techniques in MidJourney or dedicated
“consistent character” features (Open Art, Cense).
4. Lip‑sync & talking heads: OmniHuman/CapCut for rapid image‑to‑video lip‑sync,
Synthetic UGC or Cling for full video lip‑sync.
5. High‑fidelity final renders: Post‑process with Enhancer AI (skin texture fix,
upscaling) and ComfyUI custom upscaler.
4. Basic Prompt Anatomy: Single-Line and Image-Driven Inputs
4.1 When You Have Only Text (Single-Line Prompts)
A single-line prompt might seem limiting, but if formatted correctly it’ll give the AI enough
structure to produce usable outputs:
[CharacterName:] A close-up shot of a confident female detective
under moody neon light, rain droplets glistening on her trench coat,
shallow depth of field.
● [CharacterName:] – Locks the generation to your pre-trained character model (if
using a “consistent character” feature).
● Primary Action/Description – “A close-up shot of a confident female detective”
● Mood & Lighting – “under moody neon light”
● Detail Anchor – “rain droplets glistening on her trench coat”
● Cinematic Note – “shallow depth of field”
Tips:
● Always begin with your character’s name or identifier if you’ve trained one.
● Use commas to separate distinct descriptive clauses.
● End with a camera/visual note to guide framing.
4.2 When You Have One Reference Image (Image-Driven Prompts)
Uploading a single, high-quality reference image lets the AI use it as a visual anchor. Your
prompt then needs only minimal extra description:
1. Upload your source image in the “Face Reference” (or equivalent) slot.
2. Prompt:
Close-up of the uploaded subject gazing into the distance at golden
hour, soft backlight, cinematic portrait.
3. Settings:
○ Strength / Character Weight: 0.8 (locks in face identity)
○ Preserve Key Features: ON (locks outfit/accessories)
○ Aspect Ratio: 9:16 for portrait, 16:9 for landscape
Why It Works:
● The AI already “knows” your subject from the image.
● Your textual prompt only needs to describe the scene variation, not the subject’s
identity.
4.3 Prompt Variations for Common Scenarios
● Simple Portrait: Close-up of [CharacterName:] with soft fill light, natural skin texture.
● Environmental Change: Wide shot of [CharacterName:] walking through a bustling marketplace at dusk.
● Emotional Close-Up: Extreme close-up on [CharacterName:]’s eyes filled with tears, reflections of city lights.
● Dynamic Action: Tracking shot of [CharacterName:] sprinting down a rain-slicked alley, neon signs overhead.
● Product Placement: Medium shot of [CharacterName:] holding a steaming coffee cup, morning sunlight filtering through window.
Section 5: Text-to-Video Prompting Framework
In this section, we’ll outline a step-by-step structure for writing effective text-to-video prompts
that yield coherent, multi-shot narratives with consistent characters and style.
5. Text-to-Video Prompting Framework
Every strong text-to-video prompt consists of four layers. Use this sequence as your
template:
[ShotType:] [SceneDescription] [ShotList / Beats] [StyleAnchors]
1. ShotType (optional but recommended)
Clarifies you want multiple shots:
Multi-shot:
or for a single scene:
Single-shot:
2. SceneDescription
Who, where, when, and what:
A lone female detective wanders through a rain-soaked neon alley at midnight,
3. ShotList / Beats (in brackets)
Describe a sequence of 2–4 camera beats. Each beat in its own bracket:
[Wide shot: her silhouette framed against flickering signage]
[Tracking shot: camera follows her steps, water splashing]
[Close-up: raindrops on her cheek as she narrows her eyes]
4. StyleAnchors
Ties together look and feel:
Realistic style, handheld camera, high contrast neon lighting.
Putting it all together:
Multi-shot: A lone female detective wanders through a rain-soaked
neon alley at midnight, [Wide shot: her silhouette framed against
flickering signage] [Tracking shot: camera follows her steps, water
splashing] [Close-up: raindrops on her cheek as she narrows her
eyes] Realistic style, handheld camera, high contrast neon lighting.
5.1 Why This Framework Works
● Clarity for the Model: Each layer guides the AI to understand story context, shot
progression, and visual consistency.
● ShotList Brackets: Explicitly signals “cut to”—the AI stitches multiple mini-scenes
into a coherent video.
● StyleAnchors: Ensures all shots share consistent lighting, camera behavior, and
aesthetic.
5.2 Common Shot Types
● Wide shot – Establish setting and scale.
● Medium shot – Show character body language.
● Close-up – Capture emotion and detail.
● Tracking shot – Imply motion through space.
● Over-the-shoulder – Suggest perspective and interaction.
5.3 Tuning Tips
● Length: Keep to 2–4 beats for a 5–10 s clip.
● Specificity: Use precise descriptors (“flickering red neon,” “soft golden backlight”).
● Consistency: Reuse the same keywords across all shots (“neon,” “rain-soaked,”
“handheld”) to anchor style.
● Character Name: If you’ve trained a custom character, start with
[CharacterName:] before the scene description.
Section 6: Image-to-Video Prompting Variations
Building upon text-to-video structure, image-to-video prompts leverage a source image to
generate coherent motion while preserving style and character. Use these guidelines to craft
prompts that transform static frames into dynamic sequences.
6. Image-to-Video Prompting Variations
Template
SourceImage: [URL or “uploaded”]
Prompt: [CameraAction] + [SubjectAction] + [ContextSwitch]
StyleAnchors: [ConsistentStyle]
1. SourceImage
○ Always reference the uploaded image explicitly so the model knows which
frame to animate.
Example:
SourceImage: uploaded
2. Prompt
Compose with three parts concatenated into one natural sentence:
○ CameraAction
Describes how the “camera” moves relative to the scene:
■ Camera pushes in
■ Slow whip pan
■ Dolly out to reveal environment
○ SubjectAction
What the character does in the source image:
■ the woman lifts her chin and blinks
■ the hooded figure turns head to the left
■ he raises a lantern into frame
○ ContextSwitch (optional)
Adds environmental or emotional shift:
■ as rain drips from her eyelashes
■ while neon reflections dance across wet pavement
■ as a gust of wind sweeps through
3. Full Prompt Example
Camera pushes in as the woman lifts her chin and blinks, rain
dripping from her eyelashes.
4. StyleAnchors
Append a short tag to lock in style consistency across frames:
○ Realistic style, handheld camera, moody neon lighting
○ Cinematic drama, shallow depth of field, soft fog
Format (append to end):
…, Realistic style, handheld camera, neon glow.
6.1 Prompt Variations
Minimalist
SourceImage: uploaded
Prompt: Camera pushes in as her eyes flutter open.
StyleAnchors: Realistic style, soft directional key light.
● Use for quick emphasis on a single gesture.
Environmental Emphasis
SourceImage: uploaded
Prompt: Slow dolly out as the hooded figure turns head to the left,
neon puddles reflecting underfoot.
StyleAnchors: Cyberpunk, high contrast color grading.
● Highlights both character motion and background.
Emotional Close-Up
SourceImage: uploaded
Prompt: Tight close-up as a single tear rolls down her cheek.
StyleAnchors: Cinematic drama, cool blue undertones, soft focus
background.
● Perfect for conveying subtle emotional beats.
6.2 Best Practices
● Match Source Composition: Keep camera angles that align with the framing of the
image (e.g., if the face is full-frame, use close-up prompts).
● Limit ContextSwitch: Overloading environmental changes can confuse the model;
use sparingly.
● Reusing StyleAnchors: Copy the exact style phrase for each variation to maintain
uniform look.
● Test & Iterate: Generate 2–3 versions per prompt, compare, and pick the best; small
wording tweaks (e.g., “pushes in” → “slow push-in”) can improve fluidity.
Section 7: Combining Multi-Shot and Image-to-Video for Advanced Sequences
By layering multi-shot prompt structures (from Section 5) onto image-to-video variations
(Section 6), you can craft complex, cinematic sequences from a single source image. This
hybrid approach gives you both narrative flow and dynamic motion.
7. Advanced Sequence Workflow
Overall Structure
1. Storyboard Definition (shot list)
2. Generate Image Variations (context edits)
3. Animate Each Shot (image-to-video)
4. Stitch & Edit (assemble clips)
7.1 Storyboard Definition (Shot List)
Write a concise shot list using multi-shot syntax:
[Shot1] Establishing wide shot of Model in Rainy City at Night
[Shot2] Tracking medium shot from behind
[Shot3] Close-up on tear rolling down cheek
[Shot4] Over-the-shoulder reveal of neon sign
● Use bracketed descriptions exactly as in Section 5.
7.2 Generate Image Variations
For each shot in your list, produce a static variation via context editing:
Prompt Template
SourceImage: uploaded
Prompt: [your bracketed shot description]
StyleAnchors: [identical style tag]
Example for Shot 2:
SourceImage: uploaded
Prompt: Tracking shot of her walking under flickering street lights.
StyleAnchors: Realistic style, handheld camera, neon glow.
● Generate and download each shot’s edited image.
7.3 Animate Each Shot
Turn each static shot into motion with an image-to-video prompt:
Prompt Template
SourceImage: [static_shot_image]
Prompt: [CameraAction] + [SubjectAction] + [ContextSwitch]
StyleAnchors: [same style tag]
● Map shot list to camera actions:
○ Shot 1 → Camera dollies out to reveal surrounding
cityscape.
○ Shot 2 → Camera tracks from behind as she advances under
neon lights.
○ Shot 3 → Camera pushes in for a tight close-up on her
cheek.
○ Shot 4 → Camera whips over the shoulder to reveal the glowing
sign ahead.
● Generate 5–10 sec clips for each shot.
7.4 Stitch & Edit
● Assembly: Import all generated clips into your NLE (e.g., Premiere, DaVinci).
● Transitions: Use standard cuts, match-on-action, and brief cross-dissolves to
smooth between differing camera movements.
● Sound Design: Layer ambient city sounds, rain FX, and a simple score to unify the
sequence.
● Color Grade: Apply a global LUT or adjustment layer matching your StyleAnchors
(e.g., teal & orange, neon contrast).
7.5 Tips for Cohesion
● Consistent Frame Rate: Ensure all clips render at the same FPS (e.g., 24 fps).
● Lighting Continuity: Verify directional light cues across shots align (key-light from
left, fill from right, etc.).
● Motion Matches: Match camera move directions; if Shot 2 pans left to right,
Shot 4’s over-the-shoulder move should travel in the same direction.
● Audio Bridge: Use a continuous ambient track to mask any visual seams.
Section 8: Scaling & Automating Your AI Film Pipeline
Once you’ve validated the hybrid workflow in Sections 1–7, you can scale production and
automate repetitive steps. This lets you churn out dozens of mini-films with minimal manual
effort.
8.1 Template-Based Prompt Generation
● Shot List Templates
Maintain a library of shot-list “blueprints” (e.g., “urban night walk,” “forest suspense,”
“interior drama”) with prewritten bracketed descriptions.
● Character & Style Profiles
Store each character’s source-image reference ID and its associated StyleAnchors
tags (e.g., neon_noir, cinematic_moody) in a JSON or CSV.
[
{
"character_name": "Neo",
"source_image_id": "img_12345",
"style_anchor": "neon_noir",
"shot_list": [
"Establishing wide shot of Neo on rooftop",
"Tracking shot from behind across wet tiles",
"Close-up on determined expression"
]
},
{ … }
]
● Automated Prompt Assembly
Use a simple script or a custom GPT assistant to merge shot-list entries with the
character/style metadata into the Section 7 hybrid prompts.
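For illustration, here is a minimal Python sketch of that assembly step, assuming the profile JSON above is saved as profiles.json; the STYLE_TAGS mapping, field names, and file name are placeholders rather than a fixed schema.
import json

# Illustrative expansion of short style-anchor tags into full style phrases.
STYLE_TAGS = {
    "neon_noir": "Realistic style, handheld camera, moody neon lighting.",
    "cinematic_moody": "Cinematic drama, shallow depth of field, soft fog.",
}

def build_prompts(profile):
    """Merge each shot-list entry with the character's style anchor into a hybrid prompt."""
    style = STYLE_TAGS.get(profile["style_anchor"], profile["style_anchor"])
    prompts = []
    for index, shot in enumerate(profile["shot_list"], start=1):
        prompts.append({
            "shot": index,
            "source_image_id": profile["source_image_id"],
            "prompt": f"[{shot}] {style}",
        })
    return prompts

if __name__ == "__main__":
    with open("profiles.json", encoding="utf-8") as f:  # hypothetical file name
        profiles = json.load(f)
    for profile in profiles:
        for item in build_prompts(profile):
            print(item["source_image_id"], item["prompt"])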
8.2 Batch Context Editing
● Bulk Upload
Feed all source images into your chosen context-editing tool (Enhancer.ai, Flux
Context) via API — one shot list entry per image.
● Concurrent Processing
Trigger simultaneous jobs for each bracketed prompt; collect outputs into a
designated folder structure (/project_name/shot01/, etc.).
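A minimal sketch of that concurrent submission loop is shown below; submit_edit() is a placeholder for whichever context-editing API you use (Enhancer.ai, Flux Context, etc.), and the folder layout follows the convention above.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def submit_edit(source_image, prompt, out_dir):
    """Placeholder: call your context-editing API here and save the edited frame."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "edited.png"
    # ... API call and download would go here ...
    return out_path

def run_batch(project_name, shots, max_workers=4):
    """Submit every bracketed prompt concurrently and collect output paths per shot folder."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(
                submit_edit,
                shot["source_image"],
                shot["prompt"],
                Path(project_name) / f"shot{index:02d}",
            )
            for index, shot in enumerate(shots, start=1)
        ]
        return [future.result() for future in futures]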
8.3 Batch Image-to-Video Rendering
● API-Driven Rendering
Submit your edited static shots to your image-to-video model (Cense, Cling) using
their developer API.
● Job Monitoring & Retry
Implement a polling routine that checks job status, retries failures up to N times, and
logs final clip URLs.
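A minimal polling-and-retry sketch is shown below. The endpoint paths, job-id field, and status values are assumptions for illustration; substitute the actual Cense or Cling developer API you are calling.
import time
import requests

API_BASE = "https://api.example.com/v1"  # hypothetical base URL
MAX_RETRIES = 3

def wait_for_clip(job_id, poll_seconds=15):
    """Poll a render job until it finishes and return the final clip URL."""
    while True:
        response = requests.get(f"{API_BASE}/jobs/{job_id}", timeout=30)
        response.raise_for_status()
        job = response.json()
        if job["status"] == "succeeded":
            return job["clip_url"]
        if job["status"] == "failed":
            raise RuntimeError(f"Job {job_id} failed")
        time.sleep(poll_seconds)

def render_with_retry(payload):
    """Submit an image-to-video job, retrying failures up to MAX_RETRIES times."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = requests.post(f"{API_BASE}/render", json=payload, timeout=30)
            response.raise_for_status()
            return wait_for_clip(response.json()["job_id"])
        except Exception as exc:
            print(f"Attempt {attempt} failed: {exc}")
    raise RuntimeError("All retries exhausted")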
8.4 Automated Assembly & Post-Production
● Timeline Generation
With a script or NLE API (e.g., Premiere Pro ExtendScript, DaVinci Resolve Python),
auto-import clips and place them in order (see the sketch after this list).
● Standard Transitions & Titles
Insert a prebuilt transition and optional title cards (e.g., “Scene 1,” “Scene 2”).
● Auto-Audio Mix
Batch-apply the same ambient/music track; normalize levels to a target LUFS.
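For DaVinci Resolve, the timeline-generation step might look like the sketch below. It relies on Resolve's Python scripting module (available when scripting is enabled in Resolve); the clip paths are examples, and you should verify the function names against your Resolve version.
# Run from Resolve's scripting console, or with DaVinciResolveScript on the Python path.
import DaVinciResolveScript as dvr

resolve = dvr.scriptapp("Resolve")
project = resolve.GetProjectManager().GetCurrentProject()
media_pool = project.GetMediaPool()

# Import the rendered shots in narrative order (example paths).
clip_paths = [
    "/renders/project_name/shot01.mp4",
    "/renders/project_name/shot02.mp4",
    "/renders/project_name/shot03.mp4",
    "/renders/project_name/shot04.mp4",
]
clips = media_pool.ImportMedia(clip_paths)

# Create a timeline and append the clips in order.
media_pool.CreateEmptyTimeline("AI Sequence v1")
media_pool.AppendToTimeline(clips)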
8.5 Cloud Deployment & Storage
● Cloud Workers
Run your batch jobs on cloud instances (AWS Lambda, Google Cloud Functions) for
unlimited parallelism.
● Asset Management
Store generated assets in object storage (e.g., S3, GCS), with lifecycle rules to
archive older versions.
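As one example of such a lifecycle rule, the boto3 sketch below transitions older renders to Glacier and expires them after a year; the bucket name, prefix, and day counts are placeholders.
import boto3

s3 = boto3.client("s3")

# Transition renders older than 90 days to Glacier and expire them after a year.
# Bucket name, prefix, and day counts are examples only.
s3.put_bucket_lifecycle_configuration(
    Bucket="ai-film-assets",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-renders",
                "Filter": {"Prefix": "renders/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)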
8.6 Monitoring & Quality Control
● Automated QA Checks (see the sketch after this list)
○ Frame Rate/Resolution Validation: Ensure each clip matches your spec
(720p/24 fps or 4K/30 fps).
○ Duration Check: Confirm clip lengths fall within ±0.5 s of target.
○ Style Consistency: Run a lightweight image classifier on key frames to verify
style-anchor adherence.
● Human Spot-Check
Randomly sample 5% of sequences for manual review, focusing on lip sync,
continuity, and lighting.
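Here is a minimal sketch of the automated frame-rate, resolution, and duration checks referenced above, assuming ffprobe (part of FFmpeg) is on the PATH; the target spec and tolerance are examples you would replace with your delivery spec.
import json
import subprocess

TARGET = {"width": 1280, "height": 720, "fps": 24.0, "duration": 8.0}  # example spec
DURATION_TOLERANCE = 0.5  # seconds

def probe(path):
    """Return width, height, fps, and duration of the first video stream via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height,r_frame_rate,duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    stream = json.loads(out)["streams"][0]
    num, den = stream["r_frame_rate"].split("/")
    return {
        "width": stream["width"],
        "height": stream["height"],
        "fps": float(num) / float(den),
        "duration": float(stream["duration"]),
    }

def passes_spec(path):
    """Check a clip against the target resolution, frame rate, and duration window."""
    info = probe(path)
    return (
        info["width"] == TARGET["width"]
        and info["height"] == TARGET["height"]
        and abs(info["fps"] - TARGET["fps"]) < 0.01
        and abs(info["duration"] - TARGET["duration"]) <= DURATION_TOLERANCE
    )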
8.7 Cost & Throughput Optimization
● Model Selection per Stage
Use faster/cheaper models for initial drafts; reserve high-quality modes for final
deliverables.
● Adaptive Quality
For internal review tiers, render at lower resolution — upscale only client-approved
cuts.
● Resource Scheduling
Schedule heavy upscaler jobs overnight when compute rates are lower.
Section 9: Analytics, Feedback Loops & Iterative Improvement
To continuously raise the bar on quality and efficiency, integrate data-driven feedback at
each stage of your AI film pipeline. Here’s how:
9.1 Viewer Engagement Metrics
● Key Metrics to Track
○ Play Rate: % of viewers who hit “play” on social previews.
○ Watch Time / Completion Rate: Average seconds watched per clip; % of
viewers who watch to the end.
○ Drop-Off Points: Timestamp heatmaps showing where attention drops.
○ Replays & Shares: How often clips are rewatched or shared.
● Instrumenting Player
Embed your AI videos in a player (e.g., Wistia, Vimeo Pro, custom JS) that exposes
these metrics via analytics APIs.
9.2 A/B Testing Variants
● Shot Order Variants
Swap shot sequencing (e.g., start with close-up vs. wide-establishing) to see which
hook retains viewers.
● Soundtrack & SFX
Test different music tracks, ambient sound intensity, and mix levels.
● Color Grade & Style Filters
Apply distinct LUTs or AI filters (warm vs. cool tones) to gauge emotional impact.
● Automated Delivery
Use a feature-flag system or a content-delivery A/B tool (e.g., LaunchDarkly) to
randomly expose variants to segments of your audience.
9.3 Prompt Performance Analytics
● Logging Prompt & Result Pairings
Store the exact prompt text, model/version, and resulting clip URL in a database
along with engagement metrics (see the sketch after this list).
● Natural Language Analysis
Periodically analyze high-performing vs. low-performing prompts:
○ Keyword Frequency: Which descriptive terms (e.g., “tracking shot,” “golden
hour”) correlate with higher watch time?
○ Structure Patterns: Does including bracketed shot lists outperform free-form
descriptions?
● Recommendation Engine
Build a simple recommender that suggests prompt tweaks based on historical
performance (e.g., “Videos with ‘close-up tear’ had +20% retention; consider adding
emotional close-up”).
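Below is a minimal sketch of the logging and keyword-frequency analysis described above, using SQLite; the table name, column names, and completion-rate threshold are illustrative.
import sqlite3
from collections import Counter

conn = sqlite3.connect("prompt_performance.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS prompt_runs (
        id INTEGER PRIMARY KEY,
        prompt TEXT,
        model_version TEXT,
        clip_url TEXT,
        completion_rate REAL
    )
""")

def log_run(prompt, model_version, clip_url, completion_rate):
    """Store one prompt/result pairing together with its engagement metric."""
    conn.execute(
        "INSERT INTO prompt_runs (prompt, model_version, clip_url, completion_rate) "
        "VALUES (?, ?, ?, ?)",
        (prompt, model_version, clip_url, completion_rate),
    )
    conn.commit()

def keyword_frequency(min_completion=0.6):
    """Count descriptive terms in prompts whose completion rate beats a threshold."""
    rows = conn.execute(
        "SELECT prompt FROM prompt_runs WHERE completion_rate >= ?",
        (min_completion,),
    ).fetchall()
    counts = Counter()
    for (prompt,) in rows:
        counts.update(word.strip(".,[]:").lower() for word in prompt.split())
    return counts.most_common(20)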
9.4 Automated Iteration Cycle
1. Collect Data: Ingest viewer and prompt analytics nightly.
2. Analyze & Hypothesize: Automatically flag underperforming sequences or shots.
3. Generate Variants: Spin up new edits with modified prompts (e.g., swap lighting
cue, re-order shots).
4. Deploy & Test: Release new variants to a test cohort.
5. Evaluate & Promote: Promote winning versions to full audience; retire losers.
Leverage orchestration frameworks (Airflow, Prefect) to schedule these cycles, ensuring
your content continually self-optimizes.
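As a sketch of how such a cycle can be expressed, the example below models the nightly loop as a Prefect (2.x) flow; the stage functions are placeholders, and in production you would attach the flow to a scheduled deployment (or build the equivalent Airflow DAG).
from prefect import flow, task

@task
def collect_metrics():
    # Ingest viewer and prompt analytics (e.g., from the SQLite log above).
    return []

@task
def flag_underperformers(metrics):
    # Flag sequences whose completion rate falls below an example threshold.
    return [m for m in metrics if m.get("completion_rate", 1.0) < 0.5]

@task
def generate_variants(flagged):
    # Re-assemble prompts with modified lighting cues or a new shot order.
    return [{"parent": m, "variant": "reordered shots"} for m in flagged]

@task
def deploy_to_test_cohort(variants):
    print(f"Deploying {len(variants)} variants to the test cohort")

@flow
def nightly_iteration_cycle():
    metrics = collect_metrics()
    flagged = flag_underperformers(metrics)
    variants = generate_variants(flagged)
    deploy_to_test_cohort(variants)

if __name__ == "__main__":
    nightly_iteration_cycle()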
9.5 Quality Improvement via Human-in-the-Loop
● Annotator Review: Have editors tag specific failure modes (e.g., “lips don’t sync,”
“background flicker,” “style drift”).
● Retraining Custom GPT: Feed annotated examples back into your custom GPT to
refine its shot-list & prompt generation logic.
● Model Version Upgrades: When new AI model versions release, re-benchmark and
adopt if superior; phase out deprecated versions.
9.6 Business Metrics & ROI
● Cost per Finished Film: GPU hours + human spot-check time / number of final
deliverables.
● Revenue per Film: Direct client billings or ad revenue attributed.
● Profitability Curve: Track margin improvements as automation reduces manual
touches.
● Time to Delivery: Average turnaround from brief to final deliverable; target
incremental reduction each iteration.
Section X: HeyGen “Custom Motion” Prompts
HeyGen’s Avatar IV interface includes an optional Custom Motion field where you can
specify precise gestures, facial expressions, and body language to complement your script.
Using this field effectively keeps your avatar’s performance consistent and on-brand. Below
is a checklist and several example prompts to guide you.
1. When to Use “Custom Motion”
● Enhance Engagement: Add subtle nods or hand gestures when emphasizing key
points.
● Express Emotion: Direct the avatar’s facial cues—smiles, surprise, furrowed
brows—to match the tone of your line.
● Match Scene Context: If your avatar is “holding” or “pointing to” an on-screen
graphic, cue the appropriate arm movement.
2. Anatomy of a Good Motion Prompt
1. Start with the Region: e.g. Head, Right hand, Torso, Eyes
2. Define the Action: e.g. tilts, raises, leans, smiles, blinks
3. Add Timing/Cadence (optional): e.g. slightly, slowly, quickly, on-beat, linger for
1 sec
4. Tie to Script: reference the corresponding text segment or emotion
Template:
[Region] [action] [modifier] to express [emotion/context].
3. Example Custom-Motion Prompts
● Emphasizing a Key Point
Right hand raises slowly with palm up to emphasize “new AI feature.”
● Friendly Greeting
Head tilts slightly to the right and smiles warmly at “Hello, everyone!”
● Expressing Surprise
Eyebrows raise quickly and eyes widen on “You won’t believe what happened next.”
● Pointing at On-Screen Graphic
Left hand extends and points gently toward the bottom-right corner when referencing the chart.
● Pensive Pause
Eyes glance downward for 1 sec then back up before “let me show you how.”
● Natural Blinks
Blink naturally every 4–6 seconds throughout the entire clip.
4. Tips for Smooth Motion
● Keep It Lightweight: Avoid overloading with simultaneous gestures—1–2 cues per
10 sec clip is usually sufficient.
● Align with Speech: Place the motion cue immediately before or during the matching
script segment.
● Test & Iterate: Preview at 1× and 2× speed to ensure gestures feel natural and
aren’t cut off.
● Fallback to “Surprise me”: HeyGen’s built-in shuffle can inspire new gesture ideas
if you’re stuck.
Section Y: Model-Specific Prompting & Special Instructions
Different AI video engines have varying strengths, input requirements, and prompt syntaxes.
Below are tailored guidelines for the four models you’ll be using: HeyGen Avatar IV, Cense
Multi-Shot, Google Veo (Flow), and Cling 2.x.
1. HeyGen Avatar IV (Photo → Talking Video)
● Input Image:
○ Minimum 720p, frontal portrait with clear face visibility.
○ Neutral or minimal background helps the avatar extraction.
● Script Field (≤840 characters / 60 sec):
○ Write in conversational tone, include filler words (“um,” “so,” “right?”) for
natural pacing.
○ If using ElevenLabs audio, upload the .wav/.mp3 directly to “upload or record
audio.”
● Custom Motion Field:
○ See Section X above for gesture cues. Keep to 1–2 motions per 15 sec.
● Settings:
○ “More expressive” toggle ON for natural facial nuances.
○ Resolution: 720p (for faster processing) or 1080p if available.
● Tip: Break long scripts into multiple 20–30 sec segments and concatenate outputs
for smoother delivery.
2. Cense 1.0 Multi-Shot Editing (Text → Video & Image → Video)
● Text → Video:
○ Use bracketed shot lists (see Section 5) to define camera angles and cuts.
○ Keep each scene segment concise—2–4 sentences per 10 sec block.
● Image → Video:
○ Source image should match the story’s style; avoid mixing styles mid-prompt.
○ Simple Prompt (“camera pushes in”) yields smooth, continuous motion.
○ Detailed Prompt: Add emotion and action (“wind brushes hair…”) for richer
results.
● Settings:
○ Clip length: 5–10 sec; Resolution: 720p.
● Tip: Cense natively understands natural language—no need for special keywords
like “photorealistic.”
3. Google Veo (Flow API, Text → Video)
● Prompting:
○ More literal: describe exactly what you want the camera to do or the subject
to express.
○ Include mentions of blinking or minor head movements to avoid static-looking
avatars.
● Limitations:
○ Max clip length: 8 sec; Resolution: up to 1080p (depending on API tier).
● Settings:
○ Select “VO3” or “Best Quality” model.
● Tip: Because Veo often nails blinks and facial nuances better, use a short prefix like
“Add natural blinks every 4 sec” in your prompt.
4. Cling 2.x (Image → Video)
● Availability:
○ Currently supports only image → video (no text → video).
● Prompting:
○ Combine a short action cue (“camera tilts up”) with a scene descriptor
(“misty forest background”).
● Settings:
○ Clip length: up to 15 sec; Resolution: 720p or 1080p.
● Tip: Cling excels at faithful depiction of the source image. For minor deviations, keep
prompts minimal—let the source drive most of the style.
Quick-Reference Prompt Template per Engine
● HeyGen: Full script + Custom Motion cues. Key notes: ≤840 chars, “More expressive” ON, 720p–1080p.
● Cense: Bracketed shot list (Text) or simple action (Image). Key notes: natural language, 5–10 sec, 720p.
● Google Veo: Literal scene description + blink reminders. Key notes: 8 sec max, add “natural blinks.”
● Cling 2.x: Short action + environment descriptor. Key notes: 15 sec max, minimal prompt for consistency.