Learning FFmpeg


I've decided to take a bit of a break from AWS training and pursue some personal projects (which will probably be hosted on AWS in the end, so it's still kind of part of the training, I guess).
It's a video-based project where one of the key elements is finding a way to script or automate very basic video edits. The workflow needs to look something like this:

input video + input audio
🠇
combined output video
(conformed to spec and trimmed to the same duration as the audio)

This is easily possible with a non-linear editor like Adobe Premiere; it's a straightforward transform-and-render job. However, I'm looking for a way to script and automate this so that I can feed the input files into a script and have it output everything for me.
The first trick to automating something easily is to see if you can do the task within the command line. Enter FFmpeg.
FFmpeg is probably the most powerful video processing library out there, and it's open source, meaning it comes at the fantastic price of free! If you've used a video converter program before, chances are it was FFmpeg doing the heavy lifting under the hood.
I've put off learning FFmpeg for nearly a decade, partly because it seemed really intimidating, being pretty much command-line only. But this project seemed like a good way to wrap my head around it.
I decided to document the steps to give me an excuse to post something up on the blog and (most importantly) so I'll be able to remember what I've done.

Our Test Files

To start with, I have a test video that is going to be my input video, simply called "inputVIDEO.mp4". I also have a test audio track that will be the input audio, just called "inputAUDIO.mp3". It's a pretty basic naming convention, but it should make it a lot easier to document what's going on.

Creating the Command

Step 1: Converting a Video to a Different Format

We're going to ignore the audio for the time being and focus on getting the video to where we want it to be. FFmpeg has a really basic syntax. To convert a video from one format to another, we simply enter the following:

ffmpeg -i inputVIDEO.mp4 output.webm

That's it! Let's have a look at this:
"-i" indicates that "inputVIDEO.mp4" is the input file. FFmpeg figures out that I want a WebM, transcodes with default values and spits out "output.webm".
The conversion was pretty quick, it took just over a minute to convert a five minute 720p mp4 into a WebM of close specs. We'll revisit resolution and video specs later on.
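As an aside, the exact default codecs depend on your FFmpeg build, so if you want to be explicit you can pin them yourself. Something like this would do it (assuming your build includes the libvpx-vp9 and libopus encoders, which are common choices for WebM):

ffmpeg -i inputVIDEO.mp4 -c:v libvpx-vp9 -c:a libopus output.webm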

Step 2: Trimming the Clip

What I'm trying to do is have FFmpeg trim the duration of the video down to match the length of the audio track. To do this, we'll need to figure out how to work with timecodes.
As part of the test, I wanted to have the start of the clip be at 1 minute and I wanted it to cut the clip off after 30 seconds. To do this, I used the following:

ffmpeg -ss 00:01:00 -i inputVIDEO.mp4 -t 00:00:30 output.webm

"-ss" specifies where to start the clip. This can be put after the the "-i" which will make FFmpeg transcode first, then trim (which takes a bit longer as it's transcoding the whole clip). Putting this before means that it will start the transcoding from the point specified.
"-t" is the duration of the clip, i.e. how long after "-ss" should the clip run for. This is actually a much better way of doing the duration in this instance from an automation perspective because it may be possible to extract the duration information from the audio track in a programmatic way. EDIT: Don't need this anymore, see below.
The trimmed conversion was even quicker, because we were only transcoding 30 seconds' worth of material. So far, so good.

Step 3: Replacing the Video Audio

The next piece of the puzzle is to "mux" the audio from "inputAUDIO.mp3" so that it replaces the audio in the trimmed output of "inputVIDEO.mp4".

ffmpeg -ss 00:01:00 -i inputVIDEO.mp4 -i inputAUDIO.mp3 -map 0:0 -map 1:0 -shortest output.webm

What we've added here is "-i inputAUDIO.mp3 -map 0:0 -map 1:0 -shortest". Let's break this down a bit.
"-i inputAUDIO.mp3" is our second input file. In this case, the audio file that's getting muxed together with our video file.
"-map 0:0 -map 1:0" is a bit trickier to explain but I'll try my best! "-map" is a flag that will allow you to specify what streams from the input(s) should be in the output. You'll notice that the number format is "x:y" where "x" refers to which input you are talking about (so "0" will be the first input file and "1" will be the second input file... because, computers) and the "y" value refers to the stream you are specifying to keep in the output.
In our case, the video file has two streams...
  • stream #0 = the video track for inputVIDEO.mp4
  • stream #1 = the audio track for inputVIDEO.mp4
The audio file has one stream...
  • stream #0 = the audio track for inputAUDIO.mp3
By specifying "-map 0:0 -map 1:0" we are telling FFmpeg to make our output file have the following streams:
  • stream #0 = the video track for inputVIDEO.mp4
  • stream #1 = the audio track for inputAUDIO.mp3
This is exactly what we're after. Just so you know, if we didn't specify any mappings, FFmpeg wouldn't include everything; by default it automatically selects just one stream per type, picking what it judges to be the "best" of each (for audio, the stream with the most channels, with ties going to the earliest input). In our case that would most likely mean both the video and the audio coming from "inputVIDEO.mp4", which is exactly what we don't want. Being explicit with "-map" takes the guesswork out of it.
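By the way, you don't have to take the stream layout on trust; ffprobe can list the streams in a file for you. Something like this prints each stream's index and type (for our test video it should come back as "0,video" and "1,audio"):

ffprobe -v error -show_entries stream=index,codec_type -of csv=p=0 inputVIDEO.mp4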
"-shortest" is the last bit will tell FFmpeg to make the final output file match the length of the shortest input file. This is great because we no longer need to specify the duration as we were doing above (using the "-t" flag). As long as the video file is longer than the audio file (which it will be) this should work fine for what we're doing here.

Step 4: Cropping, Scaling and Standardising the Video

For this project, I want all the videos to conform to the same specification. The input video may come from a variety of different sources with different specifications (different resolutions, aspect ratios, frame rates, etc). Therefore our FFmpeg command needs to include some rules to handle this transformation.
The specs that I want all videos to conform to are:
  • Resolution = 720 x 720
  • Pixel Aspect Ratio = 1:1
  • Frame Rate = 25fps
The command to get this to work will now be as follows:

ffmpeg -ss 00:01:00 -i inputVIDEO.mp4 -i inputAUDIO.mp3 -map 0:0 -map 1:0 -vf "scale=iw*sar:ih,yadif,fps=fps=25,crop=in_h:in_h,scale=720:720" -shortest output.webm

The addition is "-vf "scale=iw*sar:ih,yadif,fps=fps=25,crop=in_h:in_h,scale=720:720"". There's a fair bit packed into this command, but it's straightforward once broken down.
"-vf" tells FFmpeg that video filters are going to be applied to the source. Notice we've chained all of the filters together within the quotation marks, with commas in between. FFmpeg will apply them in the order they are written, so let's go through each step.
"scale=iw*sar:ih" sorts out our Pixel Aspect Ratio. If the video is using non-square, anamorphic pixels, we need a way to expand them out so that they are being processed as square pixels by the rest of the filter chain. This command basically tells FFmpeg to expand out the anamorphic pixels (as it would have been if the file were being played back) and apply the transformations to the fully expanded frame. The good thing about this implementation is that if we provide an input file which already has a 1:1 Pixel Aspect Ratio, the resolution will remain intact (because multiplying the width "iw" with a 1:1 aspect ratio "sar" will keep the width unchanged).
"yadif" de-interlaces the video. Using the defaults of FFmpeg means that it will try to automatically figure out the field order and so on. The downside is that if a progressive video gets input into this, we are wasting CPU cycles de-interlacing it. But the videos are so small anyway that I don't really care.
"fps=fps=25" sets the frame rate to 25. Again, using the FFmpeg defaults here which basically takes a guess at how to deal with the file. That works fine for me.
"crop=in_h:in_h" simply crops the video to be in a square aspect ratio of 1:1. The defaults here are to crop into the centre.
"scale=720:720" is the last bit of the transform that will just re-size everything to 720 x 720. Because we've already cropped to a 1:1 aspect ratio, we don't need to specify anything else and the defaults will do the job just fine.

Done!

We now have a single command that just needs three inputs: a video file, an audio file and a time value. The command will happily convert anything you throw at it.
Thanks to the various resources out there that really helped me figure this out.

Where next?

I feel like I would like to develop this further. I think the next thing I want to do is convert this command into a shell script (because I've always wanted to try my hand at scripting and this would be the perfect excuse). Also, I think it'll be interesting to explore what could be done with the metadata in the MP3 files.
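As a taste of where that could go, here's a minimal sketch of what such a wrapper script might look like (the script name, argument order and output name are all just placeholders of my own making):

#!/bin/bash
# makeclip.sh: a hypothetical wrapper around the final command above.
# Usage: ./makeclip.sh <input video> <input audio> <start time> <output file>
ffmpeg -ss "$3" -i "$1" -i "$2" -map 0:0 -map 1:0 \
  -vf "scale=iw*sar:ih,yadif,fps=fps=25,crop=in_h:in_h,scale=720:720" \
  -shortest "$4"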