There are many nuances that must come together to make the exchange of multi-vendor video and data work smoothly. Here is a quick breakdown of the issues we faced and how we approached a solution. Warning: math ahead!
Issues with orphaned B frames between marks, which are introduced when the video is “spliced” together
For workflow reasons, some vendors splice raw camera files together to create a clean master. However, since the video is “spliced” rather than cleanly encoded, there is essentially a break in the video. Since mp4 is compressed video, this creates a quality problem near the break, because frames before the break reference video frames that are simply no longer there. One solution is to allow some hanging frames after the break so the preceding frames have something to reference. This keeps the video inside the play clean, but those hanging frames are now in the master file, and they are pixelated since they too point to video that is missing.
In testing during the previous years, these frames were not included in exchange files. With some recent changes to workflows they are now included in the video. We had to make adjustments to remove those orphaned frames without the video getting out of sync. Think of it as needing to remaster the master.
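To make the sync problem concrete, here is a minimal sketch of the bookkeeping involved when frames are cut out of a master. It is an illustration only, not our actual remastering code: the function name `remap_marks` and the idea of describing removed frames as (start, end) ranges are assumptions made for the example.

```python
def remap_marks(marks, removed_ranges):
    """Shift mark frame numbers to account for frames removed before them.

    marks: frame numbers in the original master.
    removed_ranges: sorted, non-overlapping (start, end) frame ranges that
        were cut out, with end exclusive. (Hypothetical representation.)
    Returns the matching frame numbers in the cleaned master. A mark that
    falls inside a removed range snaps to the first surviving frame after it.
    """
    remapped = []
    for mark in marks:
        shift = 0
        for start, end in removed_ranges:
            if mark >= end:
                # The whole removed range sits before this mark.
                shift += end - start
            elif mark >= start:
                # The mark itself was removed; snap forward to the next kept frame.
                shift += mark - start
            else:
                # Ranges are sorted, so nothing further can affect this mark.
                break
        remapped.append(mark - shift)
    return remapped


marks = [100, 250, 400]
orphaned = [(200, 203), (300, 302)]  # 3 frames removed, then 2 more
print(remap_marks(marks, orphaned))
```

Every mark after a cut moves earlier by the total number of frames removed ahead of it, which is why missing even one orphaned frame leaves all later marks off by a frame.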
Variations in xml marks formats between vendors, ambiguity of field definitions, fields simply missing completely
The xml marks file varies slightly from vendor to vendor, and in some cases between different builds of the same editor. In some cases, required field definitions may not be there at all. All of the variations we have seen in real-world usage are properly handled.
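As a rough illustration of tolerant parsing, the sketch below reads a marks file while accepting several spellings for the same field and recording a missing field as None instead of failing. The element names (`mark`, `start`, `in`, `out` and so on) are invented for this example and are not taken from any vendor's real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical aliases: each canonical field maps to tag names a vendor might use.
FIELD_ALIASES = {
    "start": ("start", "in", "markin"),
    "end": ("end", "out", "markout"),
}


def read_marks(xml_text):
    """Return a list of dicts with 'start' and 'end' frame numbers (or None)."""
    root = ET.fromstring(xml_text)
    marks = []
    for node in root.iter("mark"):
        # Index child elements by lowercase tag so 'In' and 'in' are treated alike.
        children = {child.tag.lower(): (child.text or "").strip() for child in node}
        mark = {}
        for field, aliases in FIELD_ALIASES.items():
            # Take the first alias that is present and non-empty.
            value = next((children[a] for a in aliases if children.get(a)), None)
            mark[field] = int(value) if value is not None else None
        marks.append(mark)
    return marks


sample = "<marks><mark><In>10</In><Out>50</Out></mark><mark><start>60</start></mark></marks>"
print(read_marks(sample))
```

Keeping a missing field as None lets later code decide how to handle it, for example by inferring the end of one play from the start of the next.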
30 frames vs. 60 fields vs. Progressive vs. Interlaced
You all know that interlaced video has been around since the beginning of time. Interlaced video is where the picture is divided into 2 fields and each field is handled separately due to the technology limits of the first TV signals. Unfortunately, that standard continues to this day and comes directly onto your plate in the HD world of 1080i vs. 720p.
In 1080i video, the interlaced video is represented by 2 fields, which together make a full picture. In typical 30 fps video (or 29.97 to be exact) you have 60 fields (59.94) per second coming together to constitute the full picture.
Now, jump to 720p video. This is progressive video rather than interlaced, which means that the video is truly equivalent to the old reels of film: 30 full pictures flashing across the screen every second. (Side note: most were 24 fps, some were 16 or 18.) But wait, if 30 (29.97) is good, why not double it to 60 (59.94) and get an even better picture? That makes sense for high-speed sports video, so 720p is just that: 60 fps. But those are 60 full pictures every second, whereas 1080i was 60 fields per second. A field is half of a picture and, you guessed it, this is where the half-speed or double-speed video playback problem crept in with conversions between these two formats.
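Here is the arithmetic as a small sketch, using exact fractions so that 29.97 and 59.94 do not pick up rounding error (29.97 is really 30000/1001). The function and constant names are made up for this illustration.

```python
from fractions import Fraction

RATE_1080I_FRAMES = Fraction(30000, 1001)  # 29.97 full frames per second
RATE_1080I_FIELDS = 2 * RATE_1080I_FRAMES  # 59.94 fields per second
RATE_720P_FRAMES = Fraction(60000, 1001)   # 59.94 full frames per second


def seconds_at(frame, rate):
    """Time in seconds at which a given frame index is shown."""
    return Fraction(frame) / rate


def convert_frame(frame, src_rate, dst_rate):
    """Convert a frame index so it points at the same instant at another rate.

    Rounds down to the frame being shown at that instant (non-negative input).
    """
    return int(seconds_at(frame, src_rate) * dst_rate)


mark_1080i = 300  # a mark expressed in 1080i frames
print(float(seconds_at(mark_1080i, RATE_1080I_FRAMES)))              # correct time
print(float(seconds_at(mark_1080i, RATE_720P_FRAMES)))               # same number misread as 720p frames
print(convert_frame(mark_1080i, RATE_1080I_FRAMES, RATE_720P_FRAMES))  # proper conversion
```

Reading a 1080i frame number as a 720p frame number lands at exactly half the intended time, and the reverse mistake lands at double. That factor of two is the same one behind the half-speed and double-speed playback.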
Remember that if you are running Windows XP and see the video come out at the wrong speed, try it on Windows 7.
Gop structures — new strict requirements
To support the most efficient encoding, mp4 video commonly supports a variable-size gop structure. This allows the encoder to place i frames at points in the video where a scene changes significantly and to not include them when the video is relatively static (think Mad Max vs. Driving Miss Daisy). Take for instance the transition between the scoreboard and the sideline. Going from an all-red scoreboard to a completely green field seems like a pretty good place to say, “hey, something significant changed” in the video. The snap of the ball is another common location, since the video changes from mostly static to sudden movement.
I frames take a comparatively large number of bits to encode since they store full pictures of video. On the other hand, p and b frames only encode differences between video frames. This is one reason why long gop interframe compression gives you higher quality at the same bitrate than intraframe compression (like dv), or conversely gives you the same quality at lower bitrates. You’re only storing what is needed. Variable gop improves bit rates further by only storing full i frames when needed.
However, there is always a trade-off. Some editors require fixed gop lengths in order to quickly seek to a given frame of video, and cannot easily work with variable gop lengths. If an i frame always falls at a fixed interval (every 5 frames, say), locating it in the file is simple math and eliminates more complex trick-play implementations for ff and rw. This is the source of the “paused” or “sticky” video many have seen.
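The "simple math" can be sketched like this. It is a simplified illustration that ignores b-frame reordering and container details, and the function names are made up for the example. With a fixed gop the nearest preceding i frame is one integer division away, while a variable gop needs an index of i frame positions and a search.

```python
from bisect import bisect_right


def keyframe_fixed(frame, gop_length):
    """Nearest i frame at or before `frame` when every gop has the same length."""
    return (frame // gop_length) * gop_length


def keyframe_variable(frame, keyframes):
    """Nearest i frame at or before `frame` with variable gop lengths.

    keyframes: sorted list of i frame positions, starting with 0. Building
    this list means scanning or indexing the file first.
    """
    return keyframes[bisect_right(keyframes, frame) - 1]


print(keyframe_fixed(13, 5))                     # i frames at 0, 5, 10, 15, ...
print(keyframe_variable(13, [0, 4, 12, 30]))     # i frames wherever the encoder put them
```

An editor that assumes the first form will seek to the wrong place in a variable gop file, because the frame it computes is not necessarily an i frame at all.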
In addition to the issues above, we found and fixed a few bugs this week that didn’t come to light in last year’s more limited hd exchanges or in partner testing this summer. The bugs would have caused marks to be off by a few frames in one direction or the other. Those couple of frames, combined with the missing, mangled, stapled and folded marks above made for many difficult exchanges in the first weekend.