Your monitor might be accurate, but you can't calibrate your eyes. We open a Pandora's box and ask: how much calibration do you need?
What everyone wants is for the grading display to look exactly like what the audience will see. That’s next to impossible, given the notorious inconsistency of home TVs, but there’s another problem. No matter how good TVs get, you still can't calibrate your eyes.
Normally, there's a storm of controversy when anyone suggests working with a less-than-perfect display, and to be fair, a well set up monitor never hurt anyone. No matter how unreliable our eyes are, we should control the things we can control, so as to minimise the errors once we include the things we can't control. Even so, there are a few problems with the human visual system, mainly in the area of moment-to-moment consistency, that no display calibration can fix – and there’s science to show it.
A lot of what we're about to discuss is based on the issues explored in, among other things, this 2016 study in the Journal of Experimental Psychology by Kyle O. Hardman, Evie Vergauwe and Timothy J. Ricker. To save us all ploughing through the paper, one of its major conclusions is that human beings cannot accurately remember what colour things are from, literally, one moment to the next. We should be clear: humans can tell two different colours apart quite well if they're viewed side by side, but they very quickly become unable to match or differentiate colours viewed independently.
Really good monitor calibration ensures that two monitors put side by side will look very nearly indistinguishable, but it's massively difficult to achieve a completely perfect match. When Sony first showed the BVM-HX310 monitor, it was compared side by side with the gold-standard BVM-X300, and the match was not completely perfect.
It was incredibly close, shockingly close, more than close enough, but the two pictures were distinguishable. If an organisation the size and expertise of Sony can't quite do it for a crucial international trade show, it's a fair bet that it's difficult to do in general, precisely because humans are great at side by side colour comparison.
Separate those two monitors into two different rooms, though, and most people will become unable to tell one from the other in the time it takes to walk between the two (actually, they'll forget in seconds). Golden-eyed experts might claim an above-average ability to do this, but even if they're ten times better, they'll lose any accurate idea of what the monitor looks like, twice, in the time it takes to go to the bathroom.
It's not just about colour
Of course, calibration works on things other than colour, particularly black and white levels. The threshold of visibility in shadow detail is something that's really easy to get wrong in – let's call it – less well-funded monitoring setups. Egregious bits of filmmaking equipment love to hide in shadows that someone's home TV, set up to look bright and punchy in the showroom, might make visible. This is not subject to the same sort of forgetfulness; once our eyes are adapted to the light, something’s visible or it isn’t.
It's for this reason that brightness, particularly black level, is sometimes cheated a little. There are many sources of inaccuracy to begin with: the viewing environment in the average lounge is generally brighter and warmer in tone than the average grading suite, and people sit in grading suites all day, as opposed to watching TV for an hour or two. Home TVs, in standard dynamic range, tend to have brighter whites and foggier shadows than standard, while in HDR, home TVs tend to be dimmer than professional displays, though they still tend to have brighter shadows.
The reaction to this is sometimes to tweak the black level up on grading displays. Apparently, most people are more worried about inadvertently revealing something inappropriate than they are about inadvertently crushing details; story points are rarely played out in the deepest shades of black. As a result, it's not uncommon to set up displays with very slightly brighter shadows than the specifications would recommend (sometimes by switching a complex gamma curve for a simpler true power law). We may not be able to remember what colour something is, but we can tell if a detail is visible or not.
None of this makes calibration irrelevant, but it does mean that anyone forced to economise shouldn’t be too depressed. A picture need not be ideal to be acceptable. It's often said that a delta-E of 2 represents the threshold of visible change in colour, but that’s a change we can’t identify in delayed estimation, that is, when matching a colour from memory rather than against a reference. It's not clear how large an error has to be before humans can identify it without a reference, but the safe assumption is “a lot bigger than 2.”
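For a sense of what that number measures: delta-E is a distance between two colours in CIELAB space. The piece doesn't say which delta-E formula is meant, so the sketch below assumes the simplest one, CIE76, which is plain Euclidean distance; later formulas such as CIEDE2000 weight the terms differently and will give different figures. The two Lab readings are made up for illustration, not measurements of any real display.

```python
import math


def delta_e_cie76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance between two (L*, a*, b*) triples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


# Hypothetical readings of the same mid-grey patch on two displays.
reference = (50.0, 0.0, 0.0)   # L*, a*, b* on the reference monitor
measured = (51.0, 1.0, -1.0)   # the same patch on a second monitor

print(round(delta_e_cie76(reference, measured), 2))  # prints 1.73
```

A one-unit error on each of the three axes already lands just under the often-quoted threshold of 2, which gives some idea of how tight that figure is for a side-by-side match.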