For most of the history of film, if you wanted to insert something into the picture that didn't exist, the camera had to be stationary. Motion tracking allows artificial objects to be inserted convincingly into real footage. Phil Rhodes explains
Convincingly aligning the apparent motion in the frame with the motion of an inserted, unreal visual effects element is simply beyond the sort of precision a human being can achieve, even given an animation stand and a stack of paper with punched registration holes, and even if it's OK for the job to take a week.
Manual registration between real and unreal elements has been done (Who Framed Roger Rabbit is a shining example of the technique), though usually with visibly unreal objects. People wanting to drop in more than a cartoon rabbit* need motion tracking, which is really just a term for getting the computer to evaluate how things are moving. This is something a machine can do with a degree of consistency that humans, practically speaking, can't. It's worth briefly examining how this is actually done, because knowing how it works helps us shoot material that allows it to work better. Or, to put it another way, it's worth knowing how to keep the visual effects department sweet without having to organise a weekly beer delivery to their offices. Most people are aware that poorly shot material can cause problems with chromakey shots, but motion tracking is just as sensitive to noise and heavy compression.
The first users of point tracking techniques were the military, with their keen professional interest in designating a thing to blow up, and allowing an automated system to make sure the bombs, bullets, rockets and missiles all went down the appropriate chimney. The earliest implementations performed a simple search by exhaustion: they took the chunk of image containing the target and compared its pixel values with those of every potentially matching area, subtracting one set of values from the other and looking for the position where the total difference came closest to zero.
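As a rough illustration of that search by exhaustion, here is a minimal Python sketch. It assumes greyscale frames stored as plain lists of lists of pixel values, and it scores each candidate position with a sum of absolute differences (SAD), which is one common way to turn "subtract and look for zero" into a single number; the article doesn't say which measure early systems actually used. The names `sad` and `exhaustive_match`, and the tiny test frame, are hypothetical and exist only for this example.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and the same-sized patch of image at (top, left)."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            total += abs(image[top + r][left + c] - value)
    return total


def exhaustive_match(image, template):
    """Compare the template against every position where it fits; keep the lowest SAD."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = None
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = sad(image, template, top, left)
            if best is None or score < best[2]:
                best = (top, left, score)
    return best  # (top, left, score); a score of 0 means a perfect match


# Hypothetical 8x8 greyscale frame: black except for a distinctive 2x2 target.
frame = [[0] * 8 for _ in range(8)]
frame[5][3], frame[5][4] = 200, 50
frame[6][3], frame[6][4] = 90, 255
target = [[200, 50], [90, 255]]

print(exhaustive_match(frame, target))  # (5, 3, 0)
```

Even on this toy 8x8 frame the function tries all 49 positions, and the cost grows with both image size and target size, which is why searching the whole frame quickly becomes expensive.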
This is a long-winded and simplistic approach, and more modern mathematics allows for various refinements, but the fundamental idea is still one of searching for matching image areas frame by frame. That's why most point trackers allow the user to define both a target area and a search box, which limits the amount of image that must be searched to find a match. This makes things a lot faster, but of course the search box needs to be large enough to enclose the largest possible motion between frames, or the target area will move outside it. At this point, the track will fail spectacularly, because this technique doesn't, in its simplest form, detect a good match; it simply detects the least worst match, which may be a very bad match indeed.
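The target area and search box can be sketched in the same way. In this hypothetical `track_step` function, only positions within `radius` pixels of where the target sat in the previous frame are tried. It rests on the same assumptions as before (lists-of-lists greyscale frames, SAD as the difference measure) and is a simplified illustration, not how any particular tracking package works internally; `make_frame` is a made-up helper that builds test frames.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and the same-sized patch of image at (top, left)."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            total += abs(image[top + r][left + c] - value)
    return total


def track_step(frame, template, prev_top, prev_left, radius):
    """Search only within +/- radius pixels of the previous position (the search box).

    Returns (top, left, score) for the lowest SAD found. It always returns
    something, even when the target has left the search box entirely.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(frame), len(frame[0])
    best = None
    # Clamp the search box so the template never runs off the edge of the frame.
    for top in range(max(0, prev_top - radius), min(ih - th, prev_top + radius) + 1):
        for left in range(max(0, prev_left - radius), min(iw - tw, prev_left + radius) + 1):
            score = sad(frame, template, top, left)
            if best is None or score < best[2]:
                best = (top, left, score)
    return best


def make_frame(top, left, size=10):
    """Hypothetical test frame: black except for a distinctive 2x2 target at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    frame[top][left], frame[top][left + 1] = 200, 50
    frame[top + 1][left], frame[top + 1][left + 1] = 90, 255
    return frame


target = [[200, 50], [90, 255]]
# In the previous frame the target sat at (2, 2); here it has moved 2 pixels right.
next_frame = make_frame(2, 4)

print(track_step(next_frame, target, 2, 2, radius=3))  # (2, 4, 0): found
print(track_step(next_frame, target, 2, 2, radius=1))  # (1, 3, 395): wrong place, still returned
```

With a radius of 1 the real position lies outside the search box, yet the function still hands back its lowest-scoring candidate: the least worst match. The nonzero score is the only hint that something went wrong, and in this simplest form nothing in the code checks it.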