We recently completed the first of what will be a running series of online rationality experiments. We’ll be publishing a short report on each experiment, regardless of whether our results were significant or exciting. In part we’re hoping to give you a look inside our process at CFAR, to see the kinds of rationality training techniques we’re considering and how we go about testing our hypotheses.

Our other main goal is to avoid the problem of publication bias. If only significant results get published, while other studies languish unpublished in file-drawers, the public never knows to what degree the significance of the published results represents the discovery of real phenomena in the world, and to what degree it simply represents the fact that if you test enough hypotheses, some will look significant just by chance.
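To make the "just by chance" point concrete, here is a minimal sketch of the arithmetic. It assumes every tested hypothesis is actually false, the tests are independent, and the conventional 0.05 significance threshold is used; the function name is ours, purely for illustration.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent tests of
    true-null hypotheses comes out 'significant' at level alpha."""
    # Each test avoids a false positive with probability (1 - alpha),
    # so all of them avoid one with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 5, 20, 100):
    rate = chance_of_false_positive(k)
    print(f"{k:>3} tests: {rate:.1%} chance of at least one false positive")
```

Under those assumptions, 20 tests of effects that don't exist already give roughly a 64% chance that at least one looks significant, which is why the unpublished file-drawer studies matter.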

Personally, I did find the results of our first experiment somewhat interesting, despite the fact that the main hypothesis we were testing was not supported. The full 4-page report, which details our background reasoning, method, results, and discussion of those results, is here:

**CFAR Rationality Experiment #1: Surprise as a Cue to Probability**

Here’s a summary:

We hypothesized that prompting people to consider their feelings of surprise about a hypothetical outcome might improve their accuracy in predicting the probability of that outcome. This is a technique we sometimes use at CFAR to combat overconfidence in planning: we might feel confident that we’ll have finished some particular project by Thursday, but when we ask ourselves: “Imagine Thursday night rolls around and the project’s not done yet. How surprised would you be?” we often realize the answer is, “Not very.” That means we should assign a lower probability to finishing on time than our initial, overconfident guess did.

We asked subjects (n=101) to make predictions about the demographics, values, and lifestyles of Americans today. (Data for the questions came from Pew.) Subjects were randomly assigned to one of three groups:

1. A **control group** was simply asked to estimate the probability of each of the outcomes; for example, “What do you think is the probability an 18-29 year old American who self-identifies as conservative has a tattoo?”

2. The first intervention group was given a **one-way surprise question** (“Imagine meeting an 18-29 year old American who self-identifies as conservative. How surprised would you be to learn (s)he has a tattoo?”) and then asked the same probability question as the control group.

3. The second intervention group was given a **two-way surprise question** (“Imagine meeting an 18-29 year old American who self-identifies as conservative. How surprised would you be to learn (s)he has a tattoo? How surprised would you be to learn (s)he does NOT have a tattoo?”), and then asked the same probability question as the control group.

We were interested in whether either intervention group would make more accurate probability estimates than the control group. However, there was no significant difference between the accuracy of the groups (although the one-way surprise group was slightly less accurate). This suggests the surprise technique is not universally useful, in its current form, at least – although it’s still plausible that it could prove effective with (1) more instruction on how to perform it, and/or (2) a different type of prediction question, like one more similar to planning. We’ll be investigating those possibilities in the future.
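As a rough illustration of what comparing "accuracy" between groups can look like, here is a minimal sketch that scores estimates by mean absolute error against the true rates. This is an assumed scoring rule chosen for simplicity, not necessarily the one used in the full report, and all of the numbers below are hypothetical rather than data from the experiment.

```python
from statistics import mean


def mean_absolute_error(estimates, true_rates):
    """Average absolute gap between estimated and actual probabilities.
    0 means every estimate was exactly right."""
    if len(estimates) != len(true_rates):
        raise ValueError("need exactly one estimate per question")
    return mean(abs(e - t) for e, t in zip(estimates, true_rates))


# Hypothetical numbers for illustration only; not data from the experiment.
true_rates = [0.30, 0.55, 0.10]          # stand-ins for the survey figures
control_estimates = [0.40, 0.50, 0.20]   # made-up control-group averages
one_way_estimates = [0.50, 0.65, 0.30]   # made-up one-way-group averages

control_error = mean_absolute_error(control_estimates, true_rates)
one_way_error = mean_absolute_error(one_way_estimates, true_rates)
print(f"control: {control_error:.3f}, one-way: {one_way_error:.3f}")
```

A lower score means more accurate estimates; whether a gap between two groups' scores is significant is a separate statistical question.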

But one statistically significant pattern did emerge. The one-way surprise intervention group gave significantly higher probability estimates than the other two groups. (In fact, their average estimate was higher on all 13 questions.) Our interpretation: there is some evidence that the act of imagining an outcome makes it seem more probable, which means that the one-way surprise technique could be introducing a new bias, and should be used with caution.
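For a sense of how unlikely a clean sweep of 13 questions would be by chance, here is a minimal sign-test sketch. It assumes each question is an independent fair coin flip for which group comes out higher. That is a simplification, since the same subjects answered every question, and it is not the analysis used in the full report.

```python
from math import comb


def sign_test_p_value(num_higher: int, num_questions: int) -> float:
    """One-sided probability of seeing at least num_higher 'higher' outcomes
    out of num_questions if each outcome were a fair coin flip."""
    favorable = sum(
        comb(num_questions, k) for k in range(num_higher, num_questions + 1)
    )
    return favorable / 2 ** num_questions


# Higher on 13 of 13 questions, under the coin-flip assumption above.
print(f"{sign_test_p_value(13, 13):.6f}")
```

Under that assumption the probability is 1 in 8,192, so the pattern is unlikely to be a fluke, though the independence caveat means the true figure is less extreme.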

Interesting! I have a hunch that the reason the one-way surprise group gave higher estimates for what they were asked to think about might be related to “anchoring.” Just the fact that the experimenter asked them to consider the possibility of “X” might make the subjects rate it as more likely, in the same way that asking whether a house is worth more or less than $1,200,000 tends to skew people’s guesses about its actual value toward that figure.

Are there any plans to publish these studies in the standard scientific literature, a.k.a. peer-reviewed journals?

Wow. This IS interesting. Especially the part where the one-way questioning differs noticeably more from the control group than the two-way questioning.

I am not sure why this is a surprise at all. Is it not common knowledge in psychology that a question can carry an intentional yield from the interviewer, and that this intentional yield is incorporated into the subject’s response? More or less, subjects think some kind of answer is expected from them, so they try to guess that particular answer and give it. Asking them “how surprised they would be” suggests to them that there is a surprising answer, and thus biases the answer.

Just a thought: is the two-way surprise increasing the variance? I would hypothesize that there will be a bigger spread, and that you might find potential two-group clustering! Please tell me if that is so. Maybe we can publish that together 😉

Greetings from a Neurobiologist

GalacticAC, it’s actually the other way round. Asking the test subjects “how surprised would they be” leads to *higher* estimates. (If asking “how surprised” suggested that the answer is surprising, it would lead to lower estimates.)

Is the data available?

If the data isn’t available because it needs to be cleaned up and anonymized or something and you don’t have the resources, I volunteer. Email me with your desires and the raw data. If it’s not available to me because I missed a link somewhere… point me to it? 😀 If it can’t be made available for some reason, let’s talk about future experiment design that doesn’t lock up the actual data.

I’m really glad there are professional people looking into this. I wish I lived in the Bay Area, because I’m really interested in this.

Following up GalacticAC’s comment, I wonder if there’s a significant mis-communication.

Like Vilaim, I would interpret ‘surprise’ as suggesting a low probability event. If the one-way group are trying to comply by raising their probability estimate, are they interpreting it as a synonym for ‘good’ or ‘newsworthy’?

If there were unlimited resources, then a dictionary search of one-way tests would be interesting.

As for imagining an outcome increasing its subjective probability: there are already studies on that. It does indeed introduce bias. See: http://www.goodreads.com/book/show/89158.Expert_Political_Judgment

The sample size of this study is way too small for it to be plausible. With the publication bias in this kind of psychology, no one should be convinced by such a small sample size.

Maybe it works better as a means of modifying a first estimate. After they give their answer, ask them to consider their hypothetical surprise, THEN ask if they’d like to change their answer.

Another approach is “How confident are you in your answer? Then how much money would you bet on your answer being correct?”.

Ah, this is a good question.

Personally I think the difference stems from this:

Described person self-identifies as conservative. Here I have no reference, so I use my own, and determine whether I would identify someone as a conservative. I assume:

P(I find a person conservative | person self-identifies as conservative) = close to 1

This person has a tattoo. That is surprising, since it doesn’t match my values: I think a tattoo is non-conservative.

P(person has a tattoo | person is conservative) = close to zero

However I quickly realise that I have ignored another possibility, people may use different values to determine conservatism. I have ignored:

P(person self-identifies as conservative | person has a tattoo) = ???

Now that I have realised that people have very different views on such matters, I cannot really be surprised anymore when a person who self-identifies as a conservative turns out to have a tattoo.
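The two conditional probabilities in this comment are linked by Bayes’ rule, which can be sketched as below. The function and variable names are ours, and every number is a hypothetical placeholder, not a figure from Pew or from the experiment.

```python
def p_tattoo_given_conservative(p_cons_given_tattoo, p_tattoo, p_cons):
    """Bayes' rule: P(tattoo | conservative)
    = P(conservative | tattoo) * P(tattoo) / P(conservative)."""
    if not 0 < p_cons <= 1:
        raise ValueError("P(conservative) must be in (0, 1]")
    posterior = p_cons_given_tattoo * p_tattoo / p_cons
    if posterior > 1:
        raise ValueError("inputs are mutually inconsistent")
    return posterior


# Hypothetical placeholder values, chosen only to show the calculation.
result = p_tattoo_given_conservative(
    p_cons_given_tattoo=0.2,  # assumed share of tattooed people identifying as conservative
    p_tattoo=0.3,             # assumed share of the age group with a tattoo
    p_cons=0.3,               # assumed share of the age group identifying as conservative
)
print(f"P(tattoo | conservative) = {result:.2f}")
```

The point of the sketch is that an estimate of P(tattoo | conservative) near zero quietly commits you to a very low P(conservative | tattoo) as well, unless the base rates are extreme.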

Another suggested possibility: the concept of “surprise” carries negative associations (i.e. being “wrong” in judgment), which result in positive feedback for increasing probability estimates:

1) Probability estimate reconsidered against “surprise estimate”

2) Increasing certainty in the validity of preexisting cognitive model

3) Increase in perceived probability

4) Repeat 2) & 3)

It’s easy enough to subconsciously discount or devalue a real-world statistic as representative of a different (read: flawed) cognitive model. If feedback for “correctness” was delivered mid-survey, values may have been biased through discrete repetitions of this cycle.

Two-sided surprise questions result in mutual interference between the two lines of questioning, +P(surprise) and -P(surprise). At this point, I’d predict a reversion to simplified control-group reasoning.

Commentary welcome.