One of the hardest things for computer animation to get right is the human face, simply because we're so used to seeing faces. Even some of the most carefully done work to date — memorably the young Jeff Bridges produced for Tron: Legacy or the recreation of a youthful Sean Young for Blade Runner 2049 — has been frowned upon because the faces didn't quite look right. In both cases, facial capture was used, so that the animation was driven by a recording of a real human, and in both cases that was done by painting tracking dots onto the performer.
Tracking without dots has been demonstrated before, perhaps most memorably using the Kinect accessory built for the Xbox 360 and Xbox One. Kinect, discontinued this year but still much in use, projects a random pattern of infrared dots onto a scene and observes those dots with a camera mounted at a horizontal offset. The apparent horizontal shift of each dot is inversely proportional to the depth of the scene at that point: the nearer the surface, the larger the shift. This is much the same approach taken by Apple's recent iPhone X, which is no surprise given that Apple now owns PrimeSense, who developed the technique.
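The geometry here is ordinary triangulation, and it can be sketched in a few lines. The focal length and baseline figures below are illustrative stand-ins, not the specifications of any real Kinect or iPhone hardware:

```python
# Sketch of the disparity-to-depth relationship used by structured-light
# depth sensors. Illustrative numbers only, not real device parameters.

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Triangulate depth via Z = f * B / d: bigger shifts mean nearer points."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# A dot shifted by 20 px, with a 580 px focal length and a 7.5 cm
# projector-to-camera baseline, triangulates to roughly 2.2 m away.
print(depth_from_disparity(20.0, 580.0, 0.075))
```

Halving the disparity doubles the computed depth, which is also why depth precision degrades with distance: a one-pixel measurement error matters far more for distant, small-disparity points.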
Alternative depth-sensing technologies include time-of-flight, where a very (very) fast sensor detects differences in the timing of a pulse of light bouncing back from a scene, and multiple-camera approaches (which is how Microsoft's competing Windows Hello feature works). Consumer-level implementations of all of these techniques are generally subject to considerable noise, although a multi-camera array such as the sparse light-field rig developed by Fraunhofer performs much better, at the cost of requiring a large array of perhaps 16 cameras plus the associated control and recording provisions.
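The time-of-flight arithmetic also makes clear why the sensor has to be so very fast. A rough sketch, with an illustrative pulse timing rather than figures from any real sensor:

```python
# Sketch of time-of-flight ranging: the sensor times a light pulse's
# round trip, so one-way distance = c * t / 2.

C = 299_792_458.0  # speed of light in m/s

def tof_distance_m(round_trip_s):
    """Convert a round-trip pulse time to a one-way distance in metres."""
    return C * round_trip_s / 2.0

# A pulse returning after about 13.3 nanoseconds indicates a target
# roughly 2 m away; resolving millimetres of depth therefore demands
# timing resolution in the low picoseconds.
print(tof_distance_m(13.34e-9))
```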
The new Apple phone is perhaps best known for using its 3D capture facility as part of a facial recognition security system (the actual security of which has been seriously questioned, but that's a subject for another day). As we might expect, someone's already done facial capture with it:
At first glance, it looks pretty good, perhaps somewhat less trembly (read: noisy) than similar attempts made using the Kinect. It looks better than it probably is because the image map seems to be of a much higher resolution than the 3D capture. Notice the jagged edges of the shadow beneath the object to get a real idea of what the 3D resolution is like. This would limit the ability of a 3D renderer to relight the face to suit a different environment, but it might make for perfectly reasonable face tracking.
Of course, it's still a little twitchy, because we simply don't have 3D cameras of the resolution and noise level that we'd like to have, but it hints at a positive future. Whether it solves the common problems of facial capture is another matter. Humans are super-sensitive to the appearance of other humans, and even with very good facial capture, it still isn't trivial to make things look right. We'll need more work on this before convincingly capturing humans becomes a push-button experience, but the appearance of this sort of feature on a cell phone, of all things, is an interesting step in the right direction.