We just had two new Skylake-based workstations in the office. True to form, they were both "entry-level" workstations, and these days "entry-level" means "pretty darn powerful": 3.6 GHz Skylake CPUs, Quadro K1200 and M4000 GPUs, up to 64 GB of RAM and up to 27 TB of storage.
As it turned out, the test systems gave me the opportunity to test two new Maxwell-architecture NVIDIA Quadro GPUs, the K1200 and the M4000, side by side.
One set of tests we run to benchmark hardware performance is the rendering tests for Adobe Premiere Pro CC and Adobe After Effects CC. For this article we ran the latest 2015 versions.
When I test a system, it is certainly to measure performance, but it is also to investigate questions about performance. In this case, we wanted to investigate how the Mercury Playback Engine performs under a range of conditions. We set up our test video file with four tests. All video segments are HD and all video segments use colour correction effects. The four tests render 1, 2, 3, and 4 concurrent video streams, respectively.
Any number of scenarios justify testing with multiple video streams: titling, overlaying effects, merging multiple cameras, and inserting secondary images are a few examples. That said, many Premiere Pro users probably work in the first test case most of the time: a single video stream with colour correction or a similar effect. I shoot video with 2 or 3 cameras and routinely have 3 or 4 overlapping video streams to render out, in addition to titles and overlaid extra footage.
To make the measurements comparable, the results in the table have been normalised to show the number of seconds required to render one second of video.
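The normalisation described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used for these tests: the function name and the example clip (750 frames rendered in 24 seconds) are hypothetical, and the 25 fps default is taken from the frame rate noted for the Premiere Pro table.

```python
def seconds_per_video_second(render_seconds, frame_count, fps=25.0):
    """Return render time normalised to one second of footage.

    A value below 1.0 means the render ran faster than realtime.
    """
    if frame_count <= 0 or fps <= 0:
        raise ValueError("frame_count and fps must be positive")
    video_seconds = frame_count / fps  # duration of the rendered clip
    return render_seconds / video_seconds


# Hypothetical example: a 750-frame clip (30 s at 25 fps) rendered in 24 s.
ratio = seconds_per_video_second(24.0, 750)
print(f"{ratio:.2f} s of rendering per second of video")
print("faster than realtime" if ratio < 1.0 else "slower than realtime")
```

Normalising this way lets clips of different lengths be compared directly, which is why the tables report a ratio rather than raw render times.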
Several results stand out.
• GPU-accelerated rendering is excellent with one or two video streams. It renders faster than realtime, and it runs at twice the speed of the Mercury Playback Engine (MPE) using software (CPU) rendering only.
• MPE performance drops dramatically when a third video stream is rendered simultaneously.
• CPU rendering is faster than GPU rendering in the MPE when rendering 3 or 4 simultaneous video streams.
• The GPU-accelerated tests ran at the same speed on the Quadro K1200 and the M4000.
Premiere Pro CC 2015
* Rendering time (sec) / sec of video (25 fps) HD video
These are interesting results. First, they suggest that the Quadro K1200 is the right GPU for your Adobe video-editing workstation. I would have expected the much more capable Quadro M4000 to outpace the more modest Quadro K1200. The conclusion appears to be that the Mercury Playback Engine itself is the performance bottleneck. Whether the cause is video management, storage, or something else is not clear.
What is clear is this. The performance delta between 1 video stream and 2 concurrent video streams with CPU rendering is what I would expect: around twice the time for about twice the rendering work. It is impressive that the same step with GPU rendering imposes a penalty of no more than 28%. This is, frankly, a surprise. But it means that for the first and second test cases, which cover most editing situations, the Quadro K1200 is an excellent upgrade for your system.
The extreme drop in performance between 2 and 3 concurrent video streams is inexplicable. Going from 2 HD video streams to 3 increases the data by 50%, yet the rendering times jumped by more than 200% for CPU rendering and by a remarkable 800% to 900% for GPU rendering.
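To make the arithmetic behind these comparisons explicit, here is a minimal sketch of the percentage-increase calculation. The function names are my own, and the timings in the example are hypothetical placeholders rather than measured results; only the 2-to-3 stream counts come from the tests.

```python
def percent_increase(before, after):
    """Percentage growth from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


def scaling_excess(streams_before, streams_after, time_before, time_after):
    """Percentage points by which render time grew beyond the workload."""
    workload_growth = percent_increase(streams_before, streams_after)
    time_growth = percent_increase(time_before, time_after)
    return time_growth - workload_growth


# Going from 2 to 3 streams is 50% more data.
print(percent_increase(2, 3))

# Hypothetical timings (s per second of video), for illustration only.
print(scaling_excess(2, 3, time_before=1.0, time_after=3.0))
```

A result near zero from `scaling_excess` would mean render time grew in step with the workload; a large positive value points to a bottleneck somewhere other than the raw amount of video data.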
I cannot explain this performance degradation, but I can give you an idea of what it looks like. While testing, we monitored both GPU and CPU utilization. The image below shows the CPU utilization (from left to right) for 2 video streams with GPU, 2 video streams with CPU, 4 video streams with GPU, and 4 video streams with CPU.
In the first column, the CPU is almost, but not fully, utilized, and its usage follows a pattern of small peaks. The same pattern is mirrored on the GPU, where utilisation swings from 0% to 30%. The second column is the CPU renderer: each thread on each core goes to 100% and stays there. Compare this to the first column, where the addition of the GPU renders the same video 2.5 times faster. The third column shows CPU utilization while rendering 4 video streams concurrently with the GPU renderer; both the CPU and the GPU had lower utilization, and this rendering combination was the slowest. The fourth column shows the same 4-stream sequence with the CPU renderer, still with exceptionally slow results. Unlike in the second column, the CPU is not fully utilized.
I also benchmarked Adobe After Effects. With two new NVIDIA Quadro GPUs, we planned to measure the After Effects 3D ray-tracing renderer. This feature is apparently considered obsolete and is no longer supported by Adobe. New GPUs like the Quadro M4000 are not recognized, and the ray-tracing renderer returns an error.
Render times in After Effects are notoriously inconsistent. Because of this, I ran a 3D logo rendering test 4 times and report the average. As with the Premiere Pro CC testing, the normalized results represent the number of seconds required to render one second of video.
The rendering time for a second of After Effects video can be more than 100 times longer than for Premiere Pro, and the GPU basically doesn't help at all in After Effects. An After Effects workstation needs extreme computing power. In fact, a good After Effects CC workstation needs high-performance subsystems like memory and storage, too. One hint of this can be seen in the CPU utilization levels, which rise and fall in peaks that can easily range from 50% to 100%. This indicates that the processor spends a fair amount of time waiting on the rest of the system.
After Effects CC 2015 Render Test
Results in seconds: number of seconds to render one second of video

CPU                                Test 1   Test 2   Test 3   Test 4   AVERAGE
Intel Xeon E3-1275 v5 @ 3.6 GHz    113.33   100.00   123.00   123.33    114.92
Peaks and valleys in CPU usage indicate possible bottlenecks in other parts of the workstation
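The averaging in the After Effects table can be reproduced with a short Python sketch. The four run values are the ones reported in the table; the spread figures are an addition of mine, meant only to show how much the individual runs differ.

```python
from statistics import mean

# The four After Effects runs, in seconds per second of video.
runs = [113.33, 100.00, 123.00, 123.33]

average = mean(runs)                 # about 114.915, reported as 114.92
spread = max(runs) - min(runs)       # gap between slowest and fastest run
relative_spread = spread / average * 100.0

print(f"average: {average:.3f} s per second of video")
print(f"spread:  {spread:.2f} s ({relative_spread:.1f}% of the average)")
```

With a gap this wide between the fastest and slowest run, a single After Effects measurement would be a poor basis for comparison, which is the reason for averaging.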
A Final Perspective
Our original expectation was that the Quadro M4000 would provide the highest performance. If After Effects CC still supported 3D ray-tracing, and if the Mercury Playback Engine could have used the GPU more efficiently, this might have been true.
On the other hand, the Quadro K1200 not only offers excellent Premiere Pro CC performance, it also supports 4K resolutions and can even drive four 4K displays. And while it is not important to the results, all our testing was done on a 4K monitor. (For more on 4K displays, see: ThinkVision Pro2840M: professional 4K display.) With these displays at affordable prices, a video-editing workstation should have a minimum of two 4K displays.
This makes the NVIDIA Quadro K1200 the first choice for an Adobe video-editing workstation.