How to write Midjourney video prompts that actually work
Stop leaving your animations to chance
Prompting for Midjourney video
Midjourney's video feature opens up exciting possibilities for creators who want more control over their animations. While you can simply animate an image without any prompting, writing effective prompts gives you the power to guide exactly how your video unfolds.
This guide focuses on the art and science of video prompting, turning your static images into compelling animated stories.
If you're new to Midjourney video, check out this article to cover the basics. Now let's look into more details that'll make your videos stand out.
How Midjourney video actually works
Understanding the technical foundation helps you craft better prompts. The video feature uses a diffusion video model that works quite differently from what you might expect.
Instead of creating frames one by one, the model generates all frames simultaneously, treating them as a unified volume of data that represents both space (the contents of the frame) and time (the sequence of the frames).
This parallel processing approach has important implications:
The entire sequence gets processed at once, not frame-by-frame
Providing comprehensive context in your prompt helps since the model considers everything simultaneously
In practice, your prompting strategy should account for the entire video sequence, not just individual moments.
Other details to know:
• Videos output at 24 fps for smooth playback
• Videos formatted for social media use the H.264 (MPEG-4 AVC) codec with the planar 4:2:0 YUV color format, which is widely compatible with web and streaming platforms
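To make the "unified volume" idea concrete, here is a rough Python sketch of the dimensions involved. The 24 fps rate, the 5.2 second clip length, and the 480p height come from this article. The 16:9 aspect ratio and the (frames, height, width) layout are illustrative assumptions, since the model's internal representation isn't described here.

```python
# Rough sketch: the size of the space-time "volume" for one clip.
FPS = 24             # frames per second (from the article)
CLIP_SECONDS = 5.2   # length of an initial generation (from the article)
HEIGHT = 480         # 480p output (from the article)
ASPECT = (16, 9)     # ASSUMPTION: aspect ratio chosen for illustration only


def clip_volume(fps=FPS, seconds=CLIP_SECONDS, height=HEIGHT, aspect=ASPECT):
    """Return (frames, height, width) for a clip treated as one block of data."""
    frames = round(fps * seconds)
    width = round(height * aspect[0] / aspect[1])
    return frames, height, width


frames, height, width = clip_volume()
print(f"about {frames} frames of {width}x{height} pixels, generated together")
```

The point is that your prompt steers the whole block of frames at once, which is why context about the full sequence matters more than a description of any single frame.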
Starting frame essentials
Your starting frame sets the foundation for everything that follows. Get this right, and your video generation becomes much more predictable. Your starting frame should contain every element you want to see throughout the video.
Here's what matters most:
Skip the upscaling since --video 1 only supports 480p (the bot will downscale high-resolution images, creating unwanted artifacts)
Style carries through consistently, meaning illustrations stay illustrated, photographs stay photographic
When subjects move out of view and return, describe their visual characteristics again and reinforce with phrases like "the same cup appears again"
Character states matter for animation flow (want your character to open their eyes? Start with eyes closed, not already open)
Timing strategy
Think backwards from your desired action. Prompt a starting image that shows the moment just before the peak action, not during or after it.
The "about to" technique works particularly well:
Use this keyword to signal the model to prepare for imminent action
For example, create an image of a dog looking at a hamburger, about to take a bite. When animated with a matching action prompt, the dog will complete the eating motion.
cinematic photo of an excited dog is about to bite a hamburger given by his owner --ar 16:9 --profile oaefodl jqfuczz --v 7
the dog swallows the burger in one bite --ar 91:51 --motion high --video 1
Interestingly, if the video prompt doesn't tell the dog to eat the burger, the dog may refuse to eat it! Apparently, the image generation prompt alone isn't enough to guide the action.
cinematic photo of an excited dog is about to bite a hamburger --ar 91:51 --motion high --video 1
Sorry…Doggo not gonna eat that, human!!
Work within the 5.2-second limit by keeping actions simple and achievable
For complex actions, try extending the video to see if the bot continues the last action
A kind of time travel is also possible: you can prompt the video to show what happened before your starting frame, so choose that first moment carefully.
Describing actions like a director
Specificity transforms basic movements into compelling animations. Think like a film director giving precise instructions. Instead of "he picks up an apple," write "he picks up the apple with his left hand."
Pro Tip: Layer your actions for more dynamic results:
Primary action (what's happening)
Secondary action (what happens as a result)
Background activity (what's happening in the environment)
Without clear action prompts, the bot tends to simply rotate or spin geometric patterns and static images.
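One way to keep the three layers straight is to write them down separately and join them at the end. The Python sketch below is only an organizational aid. The `LayeredAction` class is a hypothetical helper, and the example wording is made up for illustration rather than taken from a tested prompt.

```python
from dataclasses import dataclass


@dataclass
class LayeredAction:
    """Hypothetical helper: one field per layer described above."""
    primary: str          # what's happening
    secondary: str = ""   # what happens as a result
    background: str = ""  # what's happening in the environment

    def to_prompt(self) -> str:
        """Join the non-empty layers into one prompt, primary action first."""
        layers = [self.primary, self.secondary, self.background]
        return ". ".join(p.strip().rstrip(".") for p in layers if p.strip())


shot = LayeredAction(
    primary="he picks up the apple with his left hand",
    secondary="the dog beside him looks up at the apple",
    background="people walk past the window behind them",
)
print(shot.to_prompt())
```

Leaving a layer empty simply drops it, so you can start with the primary action alone and add layers once the basic motion works.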
Prompting techniques that deliver results
Video prompting differs significantly from image prompting. The focus shifts to motion, sequence, and temporal elements rather than static visual details.
Your video prompts should prioritize:
Motion and movement patterns
Sequence of actions
Object permanence (keeping things visible)
Camera movement instructions
Start with a clean slate. Remove the image generation prompt text, since static visual descriptions don't help with video. Focus purely on motion and sequence instead.
For action prompts, use this simple formula: Subject + active action verb + (adverb). For example: "A rabbit hopping rapidly along the pathway."
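The formula can be written as a tiny Python helper. This is a sketch, not a Midjourney tool: `action_prompt` is a hypothetical function, and the optional `where` argument is an addition that isn't part of the formula, included only so the rabbit example can be reproduced.

```python
def action_prompt(subject: str, verb: str, adverb: str = "", where: str = "") -> str:
    """Compose 'Subject + active action verb + (adverb)' as one sentence.

    `where` is an optional extra (not part of the article's formula) for a
    location phrase such as 'along the pathway'.
    """
    parts = [subject, verb, adverb, where]
    return " ".join(p.strip() for p in parts if p.strip()) + "."


print(action_prompt("A rabbit", "hopping", "rapidly", "along the pathway"))
```

Keeping the pieces separate makes it easy to swap the verb or adverb between test generations while leaving the subject untouched.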
Building sequences becomes crucial for longer videos. Link actions using connective keywords like: first, then, next, after that, suddenly, eventually, as, while, at the same time, finally.
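As a minimal sketch of how those connectives can stitch actions together, the hypothetical `sequence_prompt` helper below prefixes the first action with "first", the last with "finally", and cycles through a few of the other keywords in between. The choice of which connective goes where is an assumption made for illustration.

```python
from itertools import cycle

# A subset of the connective keywords listed above, used for the middle steps.
MIDDLE_CONNECTIVES = ("then", "next", "after that")


def sequence_prompt(actions):
    """Link actions in order: 'first ..., then ..., finally ...'."""
    actions = [a.strip() for a in actions if a.strip()]
    if not actions:
        return ""
    if len(actions) == 1:
        return actions[0]
    middle = cycle(MIDDLE_CONNECTIVES)
    steps = [f"first {actions[0]}"]
    steps += [f"{next(middle)} {a}" for a in actions[1:-1]]
    steps.append(f"finally {actions[-1]}")
    return ", ".join(steps)


print(sequence_prompt(["she lifts the cup", "she takes a sip", "she sets it down"]))
```

The helper only fixes the wording. Whether the model follows every step still depends on how much action fits in a single clip.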
If an object moves in and out of view, provide extra detail so the model can keep track of it. A good example: "She turns in a circle, holding a cup with one hand" (this maintains object presence throughout the turn).
Character consistency gets easier once you understand the rules. You don't need to re-describe characters since the bot picks up details from the starting frame. However, if you edit the image and change the prompt significantly, provide more context. For complex characters, simplify your prompts: "Long-haired, blue-eyed elf in green dress" becomes just "she" + action.
Pro Tip: Simplify your subject/object reference by using “he/she/it…”
Environmental reinforcement helps when using camera movement. Describe surroundings like "a messy and dirty place" to maintain that atmosphere throughout the video.
A few technical tips that make a difference:
Use --raw for closer prompt adherence
The order in which actions appear in the prompt text doesn't matter, but the sequence you describe for them (first, then, finally) does
Describe what you can see, not internal character feelings
Physics interactions (waves, wind, impacts) aren't reliable yet
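If you generate many prompts, a small helper keeps the parameters consistent. The Python sketch below assembles a prompt string in the same shape as the examples in this article. The flag spellings (`--ar`, `--motion`, `--raw`, `--video 1`) are copied from this article, while the `video_prompt` function itself and the position of `--raw` in the string are assumptions.

```python
def video_prompt(text, motion="low", aspect=None, raw=False):
    """Build a prompt string shaped like the article's examples.

    Hypothetical helper: it only formats text and does not call Midjourney.
    """
    if motion not in ("low", "high"):
        raise ValueError("motion must be 'low' or 'high'")
    parts = [text.strip()]
    if aspect:
        parts.append(f"--ar {aspect}")
    parts.append(f"--motion {motion}")
    if raw:
        parts.append("--raw")  # ASSUMPTION: placement of --raw is arbitrary
    parts.append("--video 1")
    return " ".join(parts)


print(video_prompt("the dog swallows the burger in one bite",
                   motion="high", aspect="91:51"))
```

Passing `raw=True` adds `--raw` for the closer prompt adherence mentioned in the tips above.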
Introducing new elements mid-video
Adding elements that weren't in your starting frame presents unique challenges. It's significantly harder to introduce new elements if they're not in the first frame, but it's possible with the right approach.
When introducing new subjects, describe them thoroughly including their type (illustration/photography). Based on testing, new elements often work better when introduced during video extensions rather than in the initial generation.
For example, a cat (new element) is introduced into the video via video extension:
cinematic photograph of a black cat with blue eyes taking shelter from the rain. The cat at far is looking at the droids --ar 52:29 --motion low --video 1
Here is another example. It is easier to bring in a new subject that the model is familiar with (a cat is a common subject). Note that the first frame of this video does not have a cat.
A black cat with blue eyes walking pass the robots leisurely --ar 91:51 --motion high --video 1
The video model has no idea how big the new subject (the cat) should be, so it may end up out of proportion to the existing subjects and objects. The first frame of this video also does not contain the cat.
a black cat with blue eyes come to greet the robots. The robot pet the black cat --ar 91:51 --motion high --video 1
Camera movement experimentation
Midjourney recognizes various camera movements, but results vary significantly. The available movement types include pan, tilt, zoom, dolly, tracking, crane, pedestal, steadicam, handheld, POV, rack focus, and different static camera approaches (subject moves while camera doesn't, or camera moves while subject stays still).
Not all camera keywords work as expected, so experimentation is essential.
For example, try "tracking view from a distance showing [subject] moving through [environment]" for a tracking shot. Results may vary, but when it works, the effect can be quite striking.
Here is a tracking view from a distance:
sudden wind blows up dust and rubbish while the cat walk away from the camera, rubbish everywhere. Camera tracking view from behind the cat following the cat from a distance --ar 52:29 --motion high --video 1
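Since camera keywords need trial and error, it can help to produce one prompt variant per movement and compare the results side by side. The Python sketch below does only the text generation. The "Camera ... shot" phrasing is a placeholder assumption rather than wording known to work, and `camera_variants` is a hypothetical helper.

```python
# A few of the movement types listed above; extend the list as needed.
CAMERA_MOVES = ["pan", "tilt", "zoom", "dolly", "tracking", "crane"]


def camera_variants(action, moves=CAMERA_MOVES):
    """Return one prompt per camera movement so results can be compared."""
    base = action.strip().rstrip(".")
    # ASSUMPTION: the "Camera <move> shot" wording is illustrative only.
    return [f"{base}. Camera {move} shot" for move in moves]


for prompt in camera_variants("the cat walks away from the camera"):
    print(prompt)
```

Run each variant against the same starting frame and keep notes on which keywords actually change the camera.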
Quality management
Understanding quality limitations helps you plan better video projects. Video quality decreases with each extension, especially after the third extension.
The first 5 seconds are your best footage, so plan your most important content for the initial generation.
Maintaining consistency in longer videos
Creating cohesive, longer videos requires strategic planning. For character management, set up character libraries using Omni Reference and document key actions, movements, and poses. Choose characters with signature visual attributes (like distinctive blue eyes) for better consistency.
Style control matters just as much. Establish Style Reference early to maintain color, worldbuilding, and style consistency throughout your project. Generate separate images for each scene rather than relying solely on extensions.
Also, limit video extensions to a maximum of 2 whenever possible. Both quality and consistency benefit from this restraint.
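To see what the two-extension cap means for a longer project, here is a rough planning sketch in Python. The 5.2 second base clip and the cap of 2 extensions come from this article. The length of each extension is not stated here, so `EXTENSION_SECONDS` is an assumption you should replace with what you observe, and `plan_scenes` is a hypothetical helper.

```python
import math

BASE_SECONDS = 5.2       # initial clip length (from the article)
EXTENSION_SECONDS = 4.0  # ASSUMPTION: not stated in the article; adjust as needed
MAX_EXTENSIONS = 2       # the cap recommended above
EPS = 1e-9               # tolerance for floating-point rounding


def plan_scenes(total_seconds, base=BASE_SECONDS, ext=EXTENSION_SECONDS,
                max_ext=MAX_EXTENSIONS):
    """Return the number of extensions per scene, one entry per starting image."""
    plan = []
    remaining = total_seconds
    while remaining > EPS:
        needed = math.ceil((remaining - base) / ext - EPS)
        extensions = min(max_ext, max(0, needed))
        plan.append(extensions)
        remaining -= base + extensions * ext
    return plan


print(plan_scenes(30))  # one entry per scene, each needing its own starting image
```

Each entry in the returned list is a separate scene with its own starting image, which matches the advice to generate separate images per scene rather than chaining extensions.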
Key takeaways
Midjourney video prompts require a holistic approach. The model generates all frames simultaneously, so prompts should provide comprehensive context for the entire video sequence, not just individual moments.
Effective video prompting focuses on motion, sequence, and clear action instructions; starting frames must include all desired elements, and prompts should describe specific actions and camera movements to guide the animation.
Maintaining video quality and consistency involves planning important content for the first 5 seconds, limiting video extensions, and using character and style references to ensure cohesive results in a longer project.
Cover prompt: Indie film director on minimalist indoor set, passionately directing actors, clapboard ready, camera on dolly, raw emotion captured --ar 16:9 --profile oaefodl --v 7
I hope you liked this article!
Please subscribe, like, share, and comment so that more people can discover the Geeky Curiosity newsletter.