Construct validity: how do we know that we're not barking up the wrong tree?

Since the super-high-impact Science article, a lot of researchers in psychology have been discussing replicability, and what can be done to increase it.
What kinds of experiments will this lead to? How might we, for example, address the question ‘do infants follow an adult’s gaze?’ in as replicable a way as possible? Well, we make some videos (see the screenshot below) of an actress against a blank background. We want everything to be tightly standardised, so we show exactly the same set of videos to lots of different infants. To ensure test-retest reliability, we show the same (or a similar) set of events to each particular infant across tons of different trials. In each trial she might start by looking down for 8000 ms, then look up directly at the camera for 8000 ms, then down, directly at one of the two objects, for 15000 ms. We present a series of these identical clips, with everything nicely counterbalanced between objects and sides. We present them in a room with tightly controlled lighting, using an eyetracker, and we analyse the results in a tightly standardised way. Infants are scored as following gaze if they look at the object the actress is looking at, within a particular time window after she looks down at it, with certain criteria for excluding trials based on poor tracking, and so on.
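To make the trial structure concrete, here is a minimal sketch of what generating such a sequence might look like in code. The phase durations are the ones described above; the object labels, the number of repeats, and the counterbalancing scheme are illustrative assumptions, not any actual stimulus set:

```python
import itertools
import random

# Phase durations within a single trial, in milliseconds (as described above)
LOOK_DOWN_MS = 8000     # actress looks down
LOOK_UP_MS = 8000       # actress looks up, directly at the camera
LOOK_OBJECT_MS = 15000  # actress looks down at one of the two objects

# Hypothetical counterbalancing: every combination of side and object
# appears equally often across the trial sequence.
SIDES = ["left", "right"]
OBJECTS = ["object_A", "object_B"]

def build_trial_sequence(n_repeats=4, seed=0):
    """Return a shuffled, fully counterbalanced list of trials."""
    rng = random.Random(seed)
    trials = [
        {"target_side": side,
         "target_object": obj,
         "phases": [("look_down", LOOK_DOWN_MS),
                    ("look_up", LOOK_UP_MS),
                    ("look_at_object", LOOK_OBJECT_MS)]}
        for side, obj in itertools.product(SIDES, OBJECTS)
    ] * n_repeats
    rng.shuffle(trials)
    return trials

if __name__ == "__main__":
    for i, trial in enumerate(build_trial_sequence(n_repeats=2), start=1):
        print(i, trial["target_side"], trial["target_object"])
```

Writing it this way makes the appeal obvious: every source of variation is pinned down, counted, and counterbalanced, so the experiment is about as replicable as it gets.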


So this is a great, highly replicable experiment, right? Well, sure, but…
The problem that I, and other researchers who work with a lot of naturalistic experimental paradigms, keep running into is this – what is this experiment really measuring? It claims to measure gaze following – but there are a number of ways in which the situations in which the infant is being asked to follow gaze are nothing like the situations in which infants actually have to follow gaze in the real world. First, there is the discrete, repeated trial structure. The infants learn that an identical series of events (actress looks down, looks up, looks to one object) happens again and again. Second, there is the setting – a situation that a child rarely if ever actually encounters in the real world. Third, there are the timings – which again are completely different from the time-frame on which we actually follow gaze in real life.
Researchers who measure infants’ naturalistic gaze behaviour suggest that infants don’t actually follow gaze in ‘real-world’ settings – this despite decades of research using reductionist paradigms such as the one I’ve shown, suggesting that infants do follow gaze. I’ve got some (unpublished) data suggesting that if you measure infants’ gaze following using these videos, and measure gaze following in the same infants in a table-top task from the Early Social Communication Scales, you get no cross-validity at all between the screen and the tabletop tasks – even though they claim to be measuring the same thing.
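To spell out what ‘no cross-validity’ means in practice, here is a minimal sketch of the kind of check involved. The scores below are synthetic placeholders (the real data is unpublished and not reproduced here), and the variable names are mine:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Synthetic placeholder scores, for illustration only. One value per
# infant, with the same infants in the same order for both measures.
n_infants = 30
screen_scores = rng.uniform(0, 1, n_infants)    # e.g. proportion of screen trials with a correct gaze shift
tabletop_scores = rng.uniform(0, 1, n_infants)  # e.g. ESCS tabletop gaze-following score

# Convergent validity: if both tasks tap the same underlying construct,
# the two scores should correlate across infants.
r, p = pearsonr(screen_scores, tabletop_scores)
print(f"r = {r:.2f}, p = {p:.3f}")  # a near-zero r means no cross-validity
```

If the two tasks really did measure the same construct, infants who follow gaze well on the tabletop should also do so on screen, and r should be reliably positive.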
So if this screen-based task isn’t measuring gaze following, well, what is it measuring? I think this illustrates a real and important problem in experimental psychology. In attempting to standardise and ensure consistency by paring down and reducing untracked variables, we run a very real risk of throwing the baby out with the bathwater.
I make the same point in this paper, in which I compare screen-based and naturalistic methods for measuring infants’ peak look duration. I show that traditional screen-based assessments of peak look duration don’t actually map onto individual differences in peak look duration measured in naturalistic settings. So again, same question – if they’re not measuring the ‘real-world’ construct of how long children look at novel objects that are presented to them, then what are they actually measuring?
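For readers unfamiliar with the measure: peak look duration is the length of the longest single unbroken look at a stimulus. Here is a minimal sketch of how it might be computed from an eyetracker’s sample stream – the 50 Hz sample rate and the simple segmentation rule are illustrative assumptions, not the actual analysis pipeline from the paper:

```python
def looks_from_samples(on_target, sample_ms=20):
    """Split a boolean gaze-sample stream into individual look durations (ms).

    `on_target` is True wherever the eyetracker says the infant is looking
    at the stimulus; runs of consecutive True samples form one look. The
    20 ms sample interval (50 Hz) is an illustrative assumption.
    """
    looks, current = [], 0
    for sample in on_target:
        if sample:
            current += sample_ms
        elif current:
            looks.append(current)
            current = 0
    if current:
        looks.append(current)
    return looks

def peak_look_duration(on_target, sample_ms=20):
    """Peak look = the single longest unbroken look at the stimulus."""
    looks = looks_from_samples(on_target, sample_ms)
    return max(looks) if looks else 0

# Example: three looks of 60, 100, and 40 ms -> peak look is 100 ms
stream = [True] * 3 + [False] * 2 + [True] * 5 + [False] + [True] * 2
print(peak_look_duration(stream))  # 100
```

Both the screen-based and the naturalistic versions of the measure reduce to something like this; the point above is that the resulting individual differences don’t line up across the two contexts.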
