
Master Guide to Perfect Consistent Prompting for AI Video Creation

Table of Contents

1. Introduction: Why Consistency Matters in AI Video Creation
2. Core Principles of Consistent Prompting
3. Tools & Models Overview
4. Basic Prompt Anatomy: Single-Line and Image-Driven Inputs
5. Text-to-Video Prompting Framework
6. Image-to-Video Prompting Variations
7. Combining Multi-Shot and Image-to-Video for Advanced Sequences
8. Scaling & Automating Your AI Film Pipeline
9. Analytics, Feedback Loops & Iterative Improvement
X. HeyGen "Custom Motion" Prompts
Y. Model-Specific Prompting & Special Instructions

1. Introduction: Why Consistency Matters in AI Video Creation
When crafting AI-generated video content, consistency is the foundation that transforms a
sequence of disjointed clips into a coherent narrative. Consistent character appearance,
lighting, mood, and framing ensure:

● Storytelling Integrity: Just as films rely on continuity between shots, AI videos demand stable visual elements to maintain audience immersion.
● Brand Reliability: For commercial campaigns or influencer-style content, a recognizable avatar (character) and consistent style become a digital spokesperson and reinforce brand identity.
● Emotional Continuity: Subtle shifts in facial features, color tones, or camera angles can break the illusion, reminding viewers of the synthetic origin rather than the story itself.
● Monetization Value: High realism and seamless consistency command premium rates; clients pay more for videos that feel authentic and polished rather than patchwork.

Key Takeaways:

1. Consistency + Realism = Business Asset: Stable video outputs increase brand recall and engagement, directly impacting ROI.
2. Prompts Are Your Blueprint: A well-structured prompt is akin to a director's shot list; it guides the AI toward repeatable, controlled outputs.
3. Think Like a Filmmaker, Not a Prompter: Adopt roles like art director and DP; frame your prompts with camera angles, lighting notes, and emotional cues to achieve cinematic results.

2. Core Principles of Consistent Prompting
Building on the importance of consistency, this section outlines the foundational principles
that every prompt must embody to produce stable, high-quality AI video outputs.

2.1 Clarity of Intent

●​ Single Objective: Each prompt should focus on one primary action or scene. Avoid
combining unrelated actions.
●​ Explicit Naming: If using a trained character, always start with a clear identifier (e.g.,
CharacterName:) to lock in the specialized model.

2.2 Shot Structuring

● Shot Descriptors: Use cinematic terms in brackets to denote shot types:
  ○ [Establishing shot: ...]
  ○ [Close-up: ...]
  ○ [Tracking shot: ...]
● Sequence Ordering: List shots in narrative order. The AI reads prompts top-down, simulating shot progression.

2.3 Style Anchors

●​ Mood & Lighting: Specify tone words (e.g., moody, high contrast, golden
hour) to maintain visual coherence.
●​ Color Palette: If necessary, add palette hints (e.g., muted blues, warm sepia) for
consistent coloring across scenes.

2.4 Parameter Control

●​ Character Weight: For models supporting it (e.g., Open Art), set a slider value (e.g.,
0.85) to balance identity preservation vs. creative variation.
●​ Preserve Toggles: Toggle features like preserve_key_features to lock outfits or
allow wardrobe changes as needed.

2.5 Reference Integration

●​ Image References: When using image-to-video, upload your source image in the
designated slot. The AI uses it as the visual anchor.
●​ Style References: Optionally include a secondary image that exemplifies desired
lighting or camera movement.

2.6 Prompt Economy

● Conciseness: Eliminate unnecessary adjectives. Prioritize impactful descriptors that drive the scene.
● Consistency in Language: Reuse established terms within a series to reinforce continuity (e.g., always soft neon lighting).

3. Tools & Models Overview


AI video creation relies on specialized engines and models. Choosing the right tool shapes
your workflow, speed, and final output quality. Below is a concise survey of the most
established and cutting‑edge options:

SeaDance (Enhancer) – Proprietary AI video
● Input modalities: Text-to-Video, Image-to-Video (Cense 1.0)
● Strengths: exceptional multi-shot consistency; natural-language cinematic prompting; built-in upscaling pipeline via Enhancer AI
● Weaknesses: still in closed beta; 720–1080p output limits
● Best use cases: narrative sequences with shot continuity; indie filmmakers, ads, social shorts

Google VE(O) – Text-to-Video
● Input modalities: text only
● Strengths: early access to high-quality diffusion; rapid prototyping in Google Cloud
● Weaknesses: limited multi-shot support; less cinematic language parsing
● Best use cases: quick concept tests; educational demos

Cling (Runway) – Image-to-Video
● Input modalities: image + simple text
● Strengths: superior fidelity in single-scene transforms; flexible upscaling to 4K via integrations
● Weaknesses: no native text-to-video yet; requires paid plan for extended use
● Best use cases: cinemagraphs, product demos; isolated shot animations

OmniHuman (CapCut) – Image-to-Lip-Sync
● Input modalities: image + audio
● Strengths: rapid lip-sync from static portraits; full-body animation and backgrounds
● Weaknesses: 15 sec clip length cap; queue times on free tier
● Best use cases: talking heads, avatar presentations

MidJourney v7 – Text-to-Image
● Input modalities: text only
● Strengths: rich artistic styles; OmniRef for consistent character variants
● Weaknesses: not designed for video; must export and animate separately
● Best use cases: keyframe stills; preliminary concept art

Reeve.v1 – Text-to-Image
● Input modalities: text only
● Strengths: free access with a strong fashion/editorial aesthetic
● Weaknesses: no image-conditioning; requires external upscaling & animation
● Best use cases: editorial portrait series

Flux Pro (FAL) – Text-to-Image
● Input modalities: text only
● Strengths: high-resolution human features; specialized face/body realism
● Weaknesses: less creative flexibility; underperforms on background fidelity
● Best use cases: avatar creation for training

ComfyUI – Open-source node-based pipelines
● Input modalities: text, image, API
● Strengths: fully customizable pipelines; integrates Gemini 2.0, Flux, upscalers; local or cloud GPU execution
● Weaknesses: steep learning curve; requires GPU resources
● Best use cases: production-grade bespoke workflows

Model Selection Guidelines

1.​ Early prototyping & storyboarding: MidJourney, Reeve, Flux — generate static
frames, build mood boards.
2.​ Shot continuity & narrative: SeaDance/Enhancer’s Cense 1.0 or Google VE(O) for
text‑driven video, Cling for image sequences.
3.​ Character consistency: Use OmniRef techniques in MidJourney or dedicated
“consistent character” features (Open Art, Cense).
4.​ Lip‑sync & talking heads: OmniHuman/CapCut for rapid image‑to‑video lip‑sync,
Synthetic UGC or Cling for full video lip‑sync.
5.​ High‑fidelity final renders: Post‑process with Enhancer AI (skin texture fix,
upscaling) and ComfyUI custom upscaler.

4. Basic Prompt Anatomy: Single-Line and Image-Driven Inputs

4.1 When You Have Only Text (Single-Line Prompts)

A single-line prompt might seem limiting, but if formatted correctly it’ll give the AI enough
structure to produce usable outputs:

[CharacterName:] A close-up shot of a confident female detective
under moody neon light, rain droplets glistening on her trench coat,
shallow depth of field.

● [CharacterName:] – Locks the generation to your pre-trained character model (if using a "consistent character" feature).
● Primary Action/Description – "A close-up shot of a confident female detective"
● Mood & Lighting – "under moody neon light"
● Detail Anchor – "rain droplets glistening on her trench coat"
● Cinematic Note – "shallow depth of field"

Tips:

●​ Always begin with your character’s name or identifier if you’ve trained one.​

●​ Use commas to separate distinct descriptive clauses.​

●​ End with a camera/visual note to guide framing.​


4.2 When You Have One Reference Image (Image-Driven Prompts)

Uploading a single, high-quality reference image lets the AI use it as a visual anchor. Your
prompt then needs only minimal extra description:

1. Upload your source image in the "Face Reference" (or equivalent) slot.

2. Prompt:

Close-up of the uploaded subject gazing into the distance at golden hour, soft backlight, cinematic portrait.

3. Settings:
   ○ Strength / Character Weight: 0.8 (locks in face identity)
   ○ Preserve Key Features: ON (locks outfit/accessories)
   ○ Aspect Ratio: 9:16 for portrait, 16:9 for landscape

Why It Works:

●​ The AI already “knows” your subject from the image.​

●​ Your textual prompt only needs to describe the scene variation, not the subject’s
identity.​
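
If your tool exposes these controls through an API rather than a UI, the same settings map directly onto a request payload. The snippet below is a minimal sketch, assuming a hypothetical HTTP endpoint and field names (face_reference, character_weight, preserve_key_features); substitute your provider's actual API and authentication.

import requests  # assumption: the image-to-video tool exposes an HTTP API

payload = {
    "face_reference": "portrait_v1.png",   # uploaded source image acting as the visual anchor
    "prompt": ("Close-up of the uploaded subject gazing into the distance "
               "at golden hour, soft backlight, cinematic portrait."),
    "character_weight": 0.8,               # identity preservation vs. creative variation
    "preserve_key_features": True,         # keep outfit/accessories from the reference
    "aspect_ratio": "9:16",                # portrait framing
}

# Illustrative call only; the endpoint, field names, and auth scheme are placeholders.
response = requests.post(
    "https://api.example-video-tool.com/v1/image-to-video",
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=60,
)
print(response.json())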

4.3 Prompt Variations for Common Scenarios


● Simple Portrait: Close-up of [CharacterName:] with soft fill light, natural skin texture.
● Environmental Change: Wide shot of [CharacterName:] walking through a bustling marketplace at dusk.
● Emotional Close-Up: Extreme close-up on [CharacterName:]'s eyes filled with tears, reflections of city lights.
● Dynamic Action: Tracking shot of [CharacterName:] sprinting down a rain-slicked alley, neon signs overhead.
● Product Placement: Medium shot of [CharacterName:] holding a steaming coffee cup, morning sunlight filtering through window.

Section 5: Text-to-Video Prompting Framework

In this section, we’ll outline a step-by-step structure for writing effective text-to-video prompts
that yield coherent, multi-shot narratives with consistent characters and style.

5. Text-to-Video Prompting Framework

Every strong text-to-video prompt consists of four layers. Use this sequence as your
template:

[ShotType:] [SceneDescription] [ShotList / Beats] [StyleAnchors]

1. ShotType (optional but recommended)
   Clarifies you want multiple shots:
   Multi-shot:
   or, for a single scene:
   Single-shot:

2. SceneDescription
   Who, where, when, and what:
   A lone female detective wanders through a rain-soaked neon alley at midnight,

3. ShotList / Beats (in brackets)
   Describe a sequence of 2–4 camera beats, each in its own bracket:
   [Wide shot: her silhouette framed against flickering signage]
   [Tracking shot: camera follows her steps, water splashing]
   [Close-up: raindrops on her cheek as she narrows her eyes]

4. StyleAnchors
   Ties together look and feel:
   Realistic style, handheld camera, high contrast neon lighting.

Putting it all together:

Multi-shot: A lone female detective wanders through a rain-soaked neon alley at midnight, [Wide shot: her silhouette framed against flickering signage] [Tracking shot: camera follows her steps, water splashing] [Close-up: raindrops on her cheek as she narrows her eyes] Realistic style, handheld camera, high contrast neon lighting.
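
If you write many prompts in this shape, a small helper keeps the four layers in order. The following is a plain-Python sketch of the template above, not any engine's SDK; the function name and arguments are illustrative.

def build_text_to_video_prompt(scene: str, beats: list[str],
                               style_anchors: str, multi_shot: bool = True) -> str:
    """Concatenate ShotType, SceneDescription, ShotList beats, and StyleAnchors."""
    shot_type = "Multi-shot:" if multi_shot else "Single-shot:"
    shot_list = " ".join(f"[{beat}]" for beat in beats)  # each beat in its own bracket
    return f"{shot_type} {scene} {shot_list} {style_anchors}"

prompt = build_text_to_video_prompt(
    scene="A lone female detective wanders through a rain-soaked neon alley at midnight,",
    beats=[
        "Wide shot: her silhouette framed against flickering signage",
        "Tracking shot: camera follows her steps, water splashing",
        "Close-up: raindrops on her cheek as she narrows her eyes",
    ],
    style_anchors="Realistic style, handheld camera, high contrast neon lighting.",
)
print(prompt)  # reproduces the assembled example above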

5.1 Why This Framework Works

● Clarity for the Model: Each layer guides the AI to understand story context, shot progression, and visual consistency.
● ShotList Brackets: Explicitly signal "cut to"; the AI stitches multiple mini-scenes into a coherent video.
● StyleAnchors: Ensure all shots share consistent lighting, camera behavior, and aesthetic.

5.2 Common Shot Types

● Wide shot – Establish setting and scale.
● Medium shot – Show character body language.
● Close-up – Capture emotion and detail.
● Tracking shot – Imply motion through space.
● Over-the-shoulder – Suggest perspective and interaction.

5.3 Tuning Tips

●​ Length: Keep to 2–4 beats for a 5–10 s clip.​

●​ Specificity: Use precise descriptors (“flickering red neon,” “soft golden backlight”).​

●​ Consistency: Reuse the same keywords across all shots (“neon,” “rain-soaked,”
“handheld”) to anchor style.​

● Character Name: If you've trained a custom character, start with [CharacterName:] before the scene description.

Section 6: Image-to-Video Prompting Variations

Building upon the text-to-video structure, image-to-video prompts leverage a source image to generate coherent motion while preserving style and character. Use these guidelines to craft prompts that transform static frames into dynamic sequences.

6. Image-to-Video Prompting Variations

Template

SourceImage: [URL or "uploaded"]
Prompt: [CameraAction] + [SubjectAction] + [ContextSwitch]
StyleAnchors: [ConsistentStyle]

1. SourceImage
   Always reference the uploaded image explicitly so the model knows which frame to animate.
   Example:
   SourceImage: uploaded

2. Prompt
   Compose with three parts concatenated into one natural sentence:
   ○ CameraAction – describes how the "camera" moves relative to the scene:
     ■ Camera pushes in
     ■ Slow whip pan
     ■ Dolly out to reveal environment
   ○ SubjectAction – what the character does in the source image:
     ■ on screen the woman lifts her chin and blinks
     ■ the hooded figure turns head to the left
     ■ he raises a lantern into frame
   ○ ContextSwitch (optional) – adds environmental or emotional shift:
     ■ as rain drips from her eyelashes
     ■ while neon reflections dance across wet pavement
     ■ as a gust of wind sweeps through
   Full Prompt Example:
   Camera pushes in as the woman lifts her chin and blinks, rain dripping from her eyelashes.

3. StyleAnchors
   Append a short tag to lock in style consistency across frames:
   ○ Realistic style, handheld camera, moody neon lighting
   ○ Cinematic drama, shallow depth of field, soft fog
   Format (append to end):
   …, Realistic style, handheld camera, neon glow.
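
The same template can be assembled programmatically when you generate variations in bulk. This is a plain string-building sketch; the block it produces follows the SourceImage / Prompt / StyleAnchors format above and is pasted into (or sent to) whichever image-to-video tool you use.

def build_image_to_video_prompt(camera_action: str, subject_action: str,
                                style_anchors: str, context_switch: str = "",
                                source_image: str = "uploaded") -> str:
    """Compose the three prompt parts into one sentence and wrap them in the template."""
    sentence = f"{camera_action} as {subject_action}"
    if context_switch:
        sentence += f", {context_switch}"
    return (f"SourceImage: {source_image}\n"
            f"Prompt: {sentence}.\n"
            f"StyleAnchors: {style_anchors}")

print(build_image_to_video_prompt(
    camera_action="Camera pushes in",
    subject_action="the woman lifts her chin and blinks",
    context_switch="rain dripping from her eyelashes",
    style_anchors="Realistic style, handheld camera, neon glow.",
))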

6.1 Prompt Variations

Minimalist

SourceImage: uploaded
Prompt: Camera pushes in as her eyes flutter open.
StyleAnchors: Realistic style, soft directional key light.

● Use for quick emphasis on a single gesture.

Environmental Emphasis

SourceImage: uploaded
Prompt: Slow dolly out as the hooded figure turns head to the left, neon puddles reflecting underfoot.
StyleAnchors: Cyberpunk, high contrast color grading.

● Highlights both character motion and background.

Emotional Close-Up

SourceImage: uploaded
Prompt: Tight close-up as a single tear rolls down her cheek.
StyleAnchors: Cinematic drama, cool blue undertones, soft focus background.

● Perfect for conveying subtle emotional beats.

6.2 Best Practices

● Match Source Composition: Keep camera angles that align with the framing of the image (e.g., if the face is full-frame, use close-up prompts).
● Limit ContextSwitch: Overloading environmental changes can confuse the model; use sparingly.
● Reuse StyleAnchors: Copy the exact style phrase for each variation to maintain a uniform look.
● Test & Iterate: Generate 2–3 versions per prompt, compare, and pick the best; small wording tweaks (e.g., "pushes in" → "slow push-in") can improve fluidity.

Section 7: Combining Multi-Shot and Image-to-Video for Advanced Sequences

By layering multi-shot prompt structures (from Section 5) onto image-to-video variations (Section 6), you can craft complex, cinematic sequences from a single source image. This hybrid approach gives you both narrative flow and dynamic motion.

7. Advanced Sequence Workflow

Overall Structure

1.​ Storyboard Definition (shot list)​

2.​ Generate Image Variations (context edits)​

3.​ Animate Each Shot (image-to-video)​

4.​ Stitch & Edit (assemble clips)​

7.1 Storyboard Definition (Shot List)

Write a concise shot list using multi-shot syntax:

[Shot1] Establishing wide shot of Model in Rainy City at Night
[Shot2] Tracking medium shot from behind
[Shot3] Close-up on tear rolling down cheek
[Shot4] Over-the-shoulder reveal of neon sign

● Use bracketed descriptions exactly as in Section 5.

7.2 Generate Image Variations

For each shot in your list, produce a static variation via context editing:

Prompt Template

SourceImage: uploaded
Prompt: [your bracketed shot description]
StyleAnchors: [identical style tag]

Example for Shot 2:

SourceImage: uploaded
Prompt: Tracking shot of her walking under flickering street lights.
StyleAnchors: Realistic style, handheld camera, neon glow.

● Generate and download each shot's edited image.

7.3 Animate Each Shot

Turn each static shot into motion with an image-to-video prompt:

Prompt Template

SourceImage: [static_shot_image]
Prompt: [CameraAction] + [SubjectAction] + [ContextSwitch]
StyleAnchors: [same style tag]

● Map the shot list to camera actions:
  ○ Shot 1 → Camera dollies out to reveal surrounding cityscape.
  ○ Shot 2 → Camera tracks from behind as she advances under neon lights.
  ○ Shot 3 → Camera pushes in for a tight close-up on her cheek.
  ○ Shot 4 → Camera swings over the shoulder to reveal the glowing sign ahead.
● Generate 5–10 sec clips for each shot; the sketch below shows how this step can be batched.
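
If your image-to-video engine offers an API, the per-shot animation step becomes a short loop. The sketch below is illustrative only: the endpoint, payload fields (source_image, style_anchors, duration_sec), and response shape are hypothetical placeholders rather than a specific vendor's API.

import time
import requests  # assumption: the engine exposes an HTTP API; all names below are placeholders

STYLE = "Realistic style, handheld camera, neon glow."

shots = [
    ("shot01.png", "Camera dollies out to reveal surrounding cityscape."),
    ("shot02.png", "Camera tracks from behind as she advances under neon lights."),
    ("shot03.png", "Camera pushes in for a tight close-up on her cheek."),
    ("shot04.png", "Camera swings over the shoulder to reveal the glowing sign ahead."),
]

for image_path, camera_action in shots:
    job = requests.post(
        "https://api.example-video-tool.com/v1/image-to-video",  # placeholder URL
        json={"source_image": image_path, "prompt": camera_action,
              "style_anchors": STYLE, "duration_sec": 8},
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        timeout=60,
    ).json()
    print(f"{image_path}: queued job {job.get('id')}")
    time.sleep(1)  # simple pacing; real code would poll job status (see Section 8.3)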

7.4 Stitch & Edit

● Assembly: Import all generated clips into your NLE (e.g., Premiere, DaVinci).
● Transitions: Use standard cuts, match-on-action, and brief cross-dissolves to smooth between differing camera movements.
● Sound Design: Layer ambient city sounds, rain FX, and a simple score to unify the sequence.
● Color Grade: Apply a global LUT or adjustment layer matching your StyleAnchors (e.g., teal & orange, neon contrast).

7.5 Tips for Cohesion

● Consistent Frame Rate: Ensure all clips render at the same FPS (e.g., 24 fps).
● Lighting Continuity: Verify directional light cues across shots align (key light from left, fill from right, etc.).
● Motion Matches: Match camera move directions; if Shot 2 is a left-to-right pan, Shot 4's over-the-shoulder should also move leftward.
● Audio Bridge: Use a continuous ambient track to mask any visual seams.

Section 8: Scaling & Automating Your AI Film Pipeline

Once you’ve validated the hybrid workflow in Sections 1–7, you can scale production and
automate repetitive steps. This lets you churn out dozens of mini-films with minimal manual
effort.

8.1 Template-Based Prompt Generation

● Shot List Templates
  Maintain a library of shot-list "blueprints" (e.g., "urban night walk," "forest suspense," "interior drama") with prewritten bracketed descriptions.
● Character & Style Profiles
  Store each character's source-image reference ID and its associated StyleAnchors tags (e.g., neon_noir, cinematic_moody) in a JSON or CSV.

[
  {
    "character_name": "Neo",
    "source_image_id": "img_12345",
    "style_anchor": "neon_noir",
    "shot_list": [
      "Establishing wide shot of Neo on rooftop",
      "Tracking shot from behind across wet tiles",
      "Close-up on determined expression"
    ]
  },
  { … }
]

● Automated Prompt Assembly
  Use a simple script or a custom GPT assistant to merge shot-list entries with the character/style metadata into the Section 7 hybrid prompts, as in the sketch below.
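
A minimal sketch of that merge step, assuming the character/style profiles live in a profiles.json file shaped like the example above (with the { … } placeholder removed). It emits one SourceImage / Prompt / StyleAnchors block per shot.

import json

with open("profiles.json", encoding="utf-8") as f:   # assumed file name; must be valid JSON
    profiles = json.load(f)

def assemble_prompts(profile: dict) -> list[str]:
    """Build one hybrid prompt block per entry in the profile's shot_list."""
    blocks = []
    for shot in profile["shot_list"]:
        blocks.append(
            f"SourceImage: {profile['source_image_id']}\n"
            f"Prompt: {shot}\n"
            f"StyleAnchors: {profile['style_anchor']}"
        )
    return blocks

for profile in profiles:
    for i, block in enumerate(assemble_prompts(profile), start=1):
        print(f"--- {profile['character_name']} / shot {i:02d} ---")
        print(block)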

8.2 Batch Context Editing

●​ Bulk Upload​
Feed all source images into your chosen context-editing tool (Enhancer.ai, Flux
Context) via API — one shot list entry per image.​

●​ Concurrent Processing​
Trigger simultaneous jobs for each bracketed prompt; collect outputs into a
designated folder structure (/project_name/shot01/, etc.).​

8.3 Batch Image-to-Video Rendering

●​ API-Driven Rendering​
Submit your edited static shots to your image-to-video model (Cense, Cling) using
their developer API.​

● Job Monitoring & Retry
  Implement a polling routine that checks job status, retries failures up to N times, and logs final clip URLs (see the sketch below).
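
A sketch of such a polling routine. It assumes a hypothetical jobs endpoint that returns JSON with a status field and a clip_url; the retry wraps a submit_job callable so a failed render is resubmitted rather than re-polled.

import time
import requests  # assumption: render jobs are queried over HTTP; URL and fields are placeholders

def render_with_retry(submit_job, max_retries: int = 3, poll_interval: int = 30) -> str | None:
    """submit_job() returns a job id; poll until completed, resubmitting failures up to max_retries."""
    for attempt in range(1, max_retries + 1):
        job_id = submit_job()
        while True:
            status = requests.get(
                f"https://api.example-video-tool.com/v1/jobs/{job_id}",  # placeholder URL
                headers={"Authorization": "Bearer YOUR_API_KEY"},
                timeout=30,
            ).json()
            if status.get("status") == "completed":
                return status.get("clip_url")            # final clip URL to log
            if status.get("status") == "failed":
                print(f"attempt {attempt}/{max_retries} failed, resubmitting")
                break                                    # resubmit via the outer loop
            time.sleep(poll_interval)
    return None                                          # exhausted retries; flag for manual review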

8.4 Automated Assembly & Post-Production

● Timeline Generation
  With a script or NLE API (e.g., Premiere Pro ExtendScript, DaVinci Resolve Python), auto-import clips and place them in order (a lightweight ffmpeg alternative is sketched after this list).
● Standard Transitions & Titles
  Insert a prebuilt transition and optional title cards (e.g., "Scene 1," "Scene 2").
● Auto-Audio Mix
  Batch-apply the same ambient/music track; normalize levels to a target LUFS.
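
When you only need straight cuts (no transitions or titles), ffmpeg's concat demuxer is a lightweight alternative to driving an NLE API. The sketch assumes all clips already share codec, resolution, and frame rate, which is what the QA checks in 8.6 enforce; the folder layout is illustrative.

import subprocess
from pathlib import Path

# Collect the per-shot renders (hypothetical folder layout from Section 8.2).
clips = sorted(Path("project_name").glob("shot*/final.mp4"))

# Write the concat list file that ffmpeg's concat demuxer expects.
list_file = Path("concat_list.txt")
list_file.write_text("".join(f"file '{c.resolve()}'\n" for c in clips), encoding="utf-8")

# Stream-copy assembly: no re-encode, so inputs must share codec/resolution/fps.
subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
     "-i", str(list_file), "-c", "copy", "assembled_sequence.mp4"],
    check=True,
)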

8.5 Cloud Deployment & Storage


●​ Cloud Workers​
Run your batch jobs on cloud instances (AWS Lambda, Google Cloud Functions) for
unlimited parallelism.​

●​ Asset Management​
Store generated assets in object storage (e.g., S3, GCS), with lifecycle rules to
archive older versions.​

8.6 Monitoring & Quality Control

● Automated QA Checks
  ○ Frame Rate/Resolution Validation: Ensure each clip matches your spec (720p/24 fps or 4K/30 fps).
  ○ Duration Check: Confirm clip lengths fall within ±0.5 s of target.
  ○ Style Consistency: Run a lightweight image classifier on key frames to verify style-anchor adherence.
  (A small ffprobe-based validation sketch follows this list.)
● Human Spot-Check
  Randomly sample 5% of sequences for manual review, focusing on lip sync, continuity, and lighting.
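
The frame-rate, resolution, and duration checks can be automated with ffprobe (part of the ffmpeg suite). In the sketch below, the target values and file path are examples to adapt to your own spec.

import json
import subprocess

TARGET = {"width": 1280, "height": 720, "fps": 24.0, "duration": 8.0, "tolerance_s": 0.5}

def probe(path: str) -> dict:
    """Read width, height, frame rate, and duration from the first video stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height,r_frame_rate",
         "-show_entries", "format=duration", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

def validate(path: str) -> list[str]:
    info = probe(path)
    stream, fmt = info["streams"][0], info["format"]
    num, den = stream["r_frame_rate"].split("/")
    fps = float(num) / float(den)
    problems = []
    if (stream["width"], stream["height"]) != (TARGET["width"], TARGET["height"]):
        problems.append(f"resolution {stream['width']}x{stream['height']}")
    if abs(fps - TARGET["fps"]) > 0.01:
        problems.append(f"fps {fps:.2f}")
    if abs(float(fmt["duration"]) - TARGET["duration"]) > TARGET["tolerance_s"]:
        problems.append(f"duration {float(fmt['duration']):.2f}s")
    return problems

print(validate("project_name/shot01/final.mp4"))   # illustrative path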

8.7 Cost & Throughput Optimization

● Model Selection per Stage
  Use faster/cheaper models for initial drafts; reserve high-quality modes for final deliverables.
● Adaptive Quality
  For internal review tiers, render at lower resolution; upscale only client-approved cuts.
● Resource Scheduling
  Schedule heavy upscaler jobs overnight when compute rates are lower.

Section 9: Analytics, Feedback Loops & Iterative Improvement

To continuously raise the bar on quality and efficiency, integrate data-driven feedback at
each stage of your AI film pipeline. Here’s how:
9.1 Viewer Engagement Metrics

● Key Metrics to Track
  ○ Play Rate: % of viewers who hit "play" on social previews.
  ○ Watch Time / Completion Rate: Average seconds watched per clip; % of viewers who watch to the end.
  ○ Drop-Off Points: Timestamp heatmaps showing where attention drops.
  ○ Replays & Shares: How often clips are rewatched or shared.
● Instrumenting the Player
  Embed your AI videos in a player (e.g., Wistia, Vimeo Pro, custom JS) that exposes these metrics via analytics APIs.

9.2 A/B Testing Variants

● Shot Order Variants
  Swap shot sequencing (e.g., start with close-up vs. wide-establishing) to see which hook retains viewers.
● Soundtrack & SFX
  Test different music tracks, ambient sound intensity, and mix levels.
● Color Grade & Style Filters
  Apply distinct LUTs or AI filters (warm vs. cool tones) to gauge emotional impact.
● Automated Delivery
  Use a feature-flag system or a content-delivery A/B tool (e.g., LaunchDarkly) to randomly expose variants to segments of your audience.

9.3 Prompt Performance Analytics

● Logging Prompt & Result Pairings
  Store the exact prompt text, model/version, and resulting clip URL in a database along with engagement metrics (see the sketch below).
● Natural Language Analysis
  Periodically analyze high-performing vs. low-performing prompts:
  ○ Keyword Frequency: Which descriptive terms (e.g., "tracking shot," "golden hour") correlate with higher watch time?
  ○ Structure Patterns: Does including bracketed shot lists outperform free-form descriptions?
● Recommendation Engine
  Build a simple recommender that suggests prompt tweaks based on historical performance (e.g., "Videos with 'close-up tear' had +20% retention; consider adding emotional close-up").
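
A minimal logging-and-analysis sketch using SQLite. The table layout, column names, and sample data are assumptions for illustration; the keyword roll-up is the naive version of the keyword-frequency analysis described above.

import sqlite3
from collections import defaultdict

conn = sqlite3.connect("prompt_analytics.db")
conn.execute("""CREATE TABLE IF NOT EXISTS prompt_runs (
    prompt TEXT, model_version TEXT, clip_url TEXT, avg_watch_time REAL)""")

def log_run(prompt: str, model_version: str, clip_url: str, avg_watch_time: float) -> None:
    """Store one prompt/result pairing together with its engagement metric."""
    conn.execute("INSERT INTO prompt_runs VALUES (?, ?, ?, ?)",
                 (prompt, model_version, clip_url, avg_watch_time))
    conn.commit()

def keyword_watch_time(keywords: list[str]) -> dict[str, float]:
    """Average watch time of clips whose prompt contains each keyword."""
    totals, counts = defaultdict(float), defaultdict(int)
    for prompt, watch in conn.execute("SELECT prompt, avg_watch_time FROM prompt_runs"):
        for kw in keywords:
            if kw.lower() in prompt.lower():
                totals[kw] += watch
                counts[kw] += 1
    return {kw: totals[kw] / counts[kw] for kw in totals}

log_run("Multi-shot: ... [Close-up: raindrops on her cheek] ...", "cense-1.0",
        "https://cdn.example.com/clip_001.mp4", 7.2)   # placeholder data
print(keyword_watch_time(["close-up", "tracking shot", "golden hour"]))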

9.4 Automated Iteration Cycle

1.​ Collect Data: Ingest viewer and prompt analytics nightly.​

2.​ Analyze & Hypothesize: Automatically flag underperforming sequences or shots.​

3.​ Generate Variants: Spin up new edits with modified prompts (e.g., swap lighting
cue, re-order shots).​

4.​ Deploy & Test: Release new variants to a test cohort.​

5.​ Evaluate & Promote: Promote winning versions to full audience; retire losers.​

Leverage orchestration frameworks (Airflow, Prefect) to schedule these cycles, ensuring your content continually self-optimizes.

9.5 Quality Improvement via Human-in-the-Loop

●​ Annotator Review: Have editors tag specific failure modes (e.g., “lips don’t sync,”
“background flicker,” “style drift”).​

●​ Retraining Custom GPT: Feed annotated examples back into your custom GPT to
refine its shot-list & prompt generation logic.​

●​ Model Version Upgrades: When new AI model versions release, re-benchmark and
adopt if superior; phase out deprecated versions.​

9.6 Business Metrics & ROI


● Cost per Finished Film: (GPU hours + human spot-check time) / number of final deliverables.
● Revenue per Film: Direct client billings or attributed ad revenue.
● Profitability Curve: Track margin improvements as automation reduces manual touches.
● Time to Delivery: Average turnaround from brief to final deliverable; target incremental reduction each iteration.

Section X: HeyGen “Custom Motion” Prompts

HeyGen’s Avatar IV interface includes an optional Custom Motion field where you can
specify precise gestures, facial expressions, and body language to complement your script.
Using this field effectively keeps your avatar’s performance consistent and on-brand. Below
is a checklist and several example prompts to guide you.

1. When to Use “Custom Motion”

● Enhance Engagement: Add subtle nods or hand gestures when emphasizing key points.
● Express Emotion: Direct the avatar's facial cues (smiles, surprise, furrowed brows) to match the tone of your line.
● Match Scene Context: If your avatar is "holding" or "pointing to" an on-screen graphic, cue the appropriate arm movement.

2. Anatomy of a Good Motion Prompt

1. Start with the Region: e.g., Head, Right hand, Torso, Eyes
2. Define the Action: e.g., tilts, raises, leans, smiles, blinks
3. Add Timing/Cadence (optional): e.g., slightly, slowly, quickly, on-beat, linger for 1 sec
4. Tie to Script: reference the corresponding text segment or emotion

Template:

[Region] [action] [modifier] to express [emotion/context].
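
If you write many cues, the template can live in a tiny helper so wording stays uniform across a series. This is plain string formatting, not part of HeyGen's interface; the example arguments are illustrative.

def motion_prompt(region: str, action: str, modifier: str, context: str) -> str:
    """Fill the [Region] [action] [modifier] to express [emotion/context] template."""
    return f"{region} {action} {modifier} to express {context}."

print(motion_prompt("Right hand", "raises", "slowly with palm up", "emphasis on the new AI feature"))
print(motion_prompt("Head", "tilts", "slightly to the right", "a warm greeting"))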

3. Example Custom-Motion Prompts

● Emphasizing a Key Point
  Right hand raises slowly with palm up to emphasize "new AI feature."

● Friendly Greeting
  Head tilts slightly to the right and smiles warmly at "Hello, everyone!"

● Expressing Surprise
  Eyebrows raise quickly and eyes widen on "You won't believe what happened next."

● Pointing at On-Screen Graphic
  Left hand extends and points gently toward the bottom-right corner when referencing the chart.

● Pensive Pause
  Eyes glance downward for 1 sec then back up before "let me show you how."

● Natural Blinks
  Blink naturally every 4–6 seconds throughout the entire clip.

4. Tips for Smooth Motion

● Keep It Lightweight: Avoid overloading with simultaneous gestures; 1–2 cues per 10 sec clip is usually sufficient.
● Align with Speech: Place the motion cue immediately before or during the matching script segment.
● Test & Iterate: Preview at 1× and 2× speed to ensure gestures feel natural and aren't cut off.
● Fallback to "Surprise me": HeyGen's built-in shuffle can inspire new gesture ideas if you're stuck.

Section Y: Model-Specific Prompting & Special Instructions

Different AI video engines have varying strengths, input requirements, and prompt syntaxes.
Below are tailored guidelines for the four models you’ll be using: HeyGen Avatar IV, Cense
Multi-Shot, Google Veo (Flow), and Cling 2.x.

1. HeyGen Avatar IV (Photo → Talking Video)

●​ Input Image:​

○​ Minimum 720p, frontal portrait with clear face visibility.​

○​ Neutral or minimal background helps the avatar extraction.​

●​ Script Field (≤840 characters / 60 sec):​

○​ Write in conversational tone, include filler words (“um,” “so,” “right?”) for
natural pacing.​
○​ If using ElevenLabs audio, upload the .wav/.mp3 directly to “upload or record
audio.”​

●​ Custom Motion Field:​

○​ See Section X above for gesture cues. Keep to 1–2 motions per 15 sec.​

●​ Settings:​

○​ “More expressive” toggle ON for natural facial nuances.​

○​ Resolution: 720p (for faster processing) or 1080p if available.​

●​ Tip: Break long scripts into multiple 20–30 sec segments and concatenate outputs
for smoother delivery.​

2. Cense 1.0 Multi-Shot Editing (Text → Video & Image → Video)

●​ Text → Video:​

○ Use bracketed shot lists (see Section 5) to define camera angles and cuts.

○​ Keep each scene segment concise—2–4 sentences per 10 sec block.​

●​ Image → Video:​

○​ Source image should match the story’s style; avoid mixing styles mid-prompt.​

○​ Simple Prompt (“camera pushes in”) yields smooth, continuous motion.​

○​ Detailed Prompt: Add emotion and action (“wind brushes hair…”) for richer
results.​

●​ Settings:​

○​ Clip length: 5–10 sec; Resolution: 720p.​

●​ Tip: Cense natively understands natural language—no need for special keywords
like “photorealistic.”​

3. Google Veo (Flow API, Text → Video)


●​ Prompting:​

○​ More literal: describe exactly what you want the camera to do or the subject
to express.​

○ Include mentions of blinking or minor head movements to avoid static-looking avatars.

●​ Limitations:​

○​ Max clip length: 8 sec; Resolution: up to 1080p (depending on API tier).​

●​ Settings:​

○​ Select “VO3” or “Best Quality” model.​

●​ Tip: Because Veo often nails blinks and facial nuances better, use a short prefix like
“Add natural blinks every 4 sec” in your prompt.​

4. Cling 2.x (Image → Video)

●​ Availability:​

○​ Currently supports only image → video (no text → video).​

●​ Prompting:​

○​ Combine a short action cue (“camera tilts up”) with a scene descriptor
(“misty forest background”).​

●​ Settings:​

○​ Clip length: up to 15 sec; Resolution: 720p or 1080p.​

●​ Tip: Cling excels at faithful depiction of the source image. For minor deviations, keep
prompts minimal—let the source drive most of the style.​

Quick-Reference Prompt Template per Engine


● HeyGen – Prompt style: full script + Custom Motion cues. Key notes: ≤840 chars, "More expressive" ON, 720p–1080p.
● Cense – Prompt style: bracketed shot list (Text) or simple action (Image). Key notes: use natural language, 5–10 sec, 720p.
● Google Veo – Prompt style: literal scene description + blink reminders. Key notes: 8 sec max, add "natural blinks."
● Cling 2.x – Prompt style: short action + environment descriptor. Key notes: 15 sec max, minimal prompt for consistency.
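
If you batch-submit across engines, it can help to keep this quick reference in code so scripts check limits before submitting. The dictionary below mirrors the table above; the layout and function are an organizational sketch, not any vendor's schema.

ENGINE_PROFILES = {
    "heygen":     {"style": "full script + custom motion cues",
                   "max_script_chars": 840, "resolution": "720p-1080p"},
    "cense":      {"style": "bracketed shot list (text) or simple action (image)",
                   "clip_seconds": (5, 10), "resolution": "720p"},
    "google_veo": {"style": "literal scene description + blink reminders",
                   "max_clip_seconds": 8},
    "cling_2x":   {"style": "short action + environment descriptor",
                   "max_clip_seconds": 15},
}

def check_clip_length(engine: str, seconds: float) -> bool:
    """Warn before submitting a clip that exceeds the engine's length cap."""
    profile = ENGINE_PROFILES[engine]
    cap = profile.get("max_clip_seconds") or profile.get("clip_seconds", (0, float("inf")))[1]
    return seconds <= cap

print(check_clip_length("google_veo", 8))   # True: within the 8 sec cap
print(check_clip_length("cling_2x", 20))    # False: exceeds the 15 sec cap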
