It’s an elegant idea I haven’t run into before: gather data on site preferences by using the epsilon-greedy solution to the multi-armed bandit problem to select which version of the site to present, and just letting it run. You’re looking at a setting where effectiveness can be easily measured, such as by clickthrough; the contrast is with A/B testing, where the effect of a single change is measured for a time and then a switch is made, if the results warrant it. Comments suggest tweaks and details, like ensuring that a single visitor sees a consistent view of the site, at least within small windows of time.
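To make the mechanics concrete, here’s a minimal sketch of epsilon-greedy selection in Python; the class name, variant names, and epsilon value are mine for illustration, not from the post. Most visits get the best-performing variant so far, a small fraction explore a random one, and each result feeds back into a running mean:

```python
import random

# A minimal epsilon-greedy sketch; names and parameters are illustrative.
class EpsilonGreedy:
    def __init__(self, variants, epsilon=0.1):
        self.epsilon = epsilon                     # fraction of visits that explore
        self.counts = {v: 0 for v in variants}    # times each variant was shown
        self.values = {v: 0.0 for v in variants}  # running mean clickthrough rate

    def choose(self):
        # With probability epsilon, show a random variant (explore);
        # otherwise show the best performer so far (exploit).
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, variant, reward):
        # Incrementally update the running mean for the variant shown;
        # reward is 1 for a clickthrough, 0 otherwise.
        self.counts[variant] += 1
        self.values[variant] += (reward - self.values[variant]) / self.counts[variant]
```

In use, each page view would call `choose()`, record whether the visitor clicked, and feed that back via `update(variant, 1 or 0)`. Caching the chosen variant per visitor (keyed on a session cookie, say) would address the commenters’ point about keeping the view consistent.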
The technique builds in the idea that, if preferences change over time, the site could automatically detect that – which the blog author and the commenters note isn’t really how things work – but it gets me wondering if there *are* choices that work that way. Perhaps not in key navigation, but how desirable a piece of content is might evolve over time – perhaps code like this could be installed under a rotating banner of featured items (we have a rotating slideshow of news items at the top of the College’s website) to figure out which ones get clickthrough, letting those persist while less effective ones fade out more quickly. For a place like a College, which may not get many repeat visitors nor have profiles of its visitors and their interests the way Amazon and other big eCommerce sites do, this might be a lightweight method for getting some preference learning built in.
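Along those lines, one standard tweak for shifting preferences is to replace the running mean with a constant step size, so old clicks are forgotten geometrically rather than weighted equally forever. A sketch, again with illustrative names and parameter values:

```python
import random

# A sketch of the nonstationary tweak: a constant step size alpha weights
# recent clicks more heavily, so an item whose appeal fades loses its
# estimated value instead of being locked in by old data.
class DecayingEpsilonGreedy:
    def __init__(self, items, epsilon=0.1, alpha=0.05):
        self.epsilon = epsilon
        self.alpha = alpha  # fixed step size: higher means faster forgetting
        self.values = {item: 0.0 for item in items}

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, item, reward):
        # Exponential recency weighting: each past observation's influence
        # decays geometrically, so a news item that stops drawing clicks
        # gets shown less often over time.
        self.values[item] += self.alpha * (reward - self.values[item])
```

With a fixed alpha, an observation from k updates ago carries weight proportional to (1 - alpha)^k, which is exactly the fading-out behavior a rotating banner of news items would want.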