Wednesday, October 11, 2017
- Location: Wilson Hall
- Room: Room 115
- Contact: Angel Gaither
- Email: email@example.com
- Phone: (615) 322-0080
- Audience: Free and Open to the Public
Marcus Watson (Womelsdorf Lab)
Department of Biology
Information hierarchies, reward, and uncertainty in a feature-based learning task
Learning the task relevance of specific object features enables the deployment of selective attention, which allows many tasks to be performed more efficiently. We track this feature learning and characterize shifts in behavior (object selection) and information sampling (fixations to objects) that accompany it. Our working hypotheses are that (1) representations of relevant feature dimensions and value are hierarchically organized, (2) information sampling of features follows this hierarchy, and (3) learning is achieved by reducing the uncertainty of predictions about reward.
We test these hypotheses using a novel context-dependent object selection task. Human participants have their gaze tracked as they navigate through a naturalistic 3D virtual environment. On each trial, they choose between two multidimensional objects, only one of which is rewarded. Reward rules are based on object feature values and contexts: a value from one feature dimension is rewarded in one context, while a value from a different dimension is rewarded in another context.
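The context-dependent reward rule described above can be illustrated with a minimal sketch. This is not the task's actual implementation: the context names, feature dimensions, and feature values below are hypothetical placeholders chosen only to show how one dimension's value can be rewarded in one context and a different dimension's value in another.

```python
# Hypothetical rule table (illustrative names, not from the study):
# each context rewards one value from one feature dimension.
RULES = {
    "context_A": ("color", "red"),   # in context A, a color value is rewarded
    "context_B": ("shape", "cube"),  # in context B, a shape value is rewarded
}


def is_rewarded(obj: dict, context: str) -> bool:
    """Return True if the object carries the rewarded value in this context."""
    dimension, value = RULES[context]
    return obj.get(dimension) == value


def rewarded_choice(left: dict, right: dict, context: str) -> int:
    """Return 0 if the left object is rewarded, 1 if the right one is.

    Assumes, as in the task description, that exactly one of the two
    objects on a trial is rewarded.
    """
    matches = [i for i, obj in enumerate((left, right)) if is_rewarded(obj, context)]
    if len(matches) != 1:
        raise ValueError("expected exactly one rewarded object per trial")
    return matches[0]


# Two multidimensional objects presented on a single (hypothetical) trial.
red_sphere = {"color": "red", "shape": "sphere"}
blue_cube = {"color": "blue", "shape": "cube"}
```

With these placeholder objects, the same pair yields a different rewarded choice depending on context, which is why a learner has to identify the relevant dimension before the rewarded value.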
After exploratory periods of variable duration, learning of relevant features occurs abruptly. In a single trial, participants' choices change from showing no evidence of rule learning to near-perfect accuracy for the remainder of the block. On the same trial, choice times drop and rewarded stimuli begin to be preferentially fixated, consistent with a switch to a reward-driven exploitative strategy in both behavior and information sampling.
The transition from exploratory behavior to the learned state is preceded by trials in which learners are more likely to have selected objects with non-rewarded values along the relevant feature dimension, as opposed to random selections across dimensions. Consistent with this finding, new rules are learned faster when they require intra-dimensional as opposed to extra-dimensional shifts.
These findings suggest that efficient learning of object relevance employs a hierarchically organized information sampling strategy. Attention is first allocated to feature values within a single dimension, before switching to different dimensions of the object. This information sampling strategy is decoupled from immediate reward-driven strategies during learning.
Our task also enables us to distinguish the relative roles of an object's estimated reward probability and the uncertainty of that estimate in determining attention and learning. Thus, we can begin to characterize how these two quantities drive behavior and information sampling within both exploratory and exploitative periods.