Date: Tue, 13 Oct 2009 19:08:45 +0100
Subject: Request for data "Genetic Determinants of Financial Risk Taking"
From: Dan Bolser
To: c-kuhnen@kellogg.northwestern.edu, jchiao@northwestern.edu
Dear Drs. Kuhnen and Chiao,
I read with great interest your article "Genetic Determinants of
Financial Risk Taking".
I noticed that the p-values for the significance of the difference in
the means (figure 1 B and C) are quite marginal, less than 0.02 and
less than 0.04 for 5-HTTLPR and DRD4 respectively.
Was there a significant difference under a two-tailed significance test?
I would be interested to see the underlying data so that I can try my
own statistical analysis. I'm reviewing your paper for a journal club,
and I think one question for discussion will be the difference between
a one-tailed and a two-tailed significance test. I note that your
hypothesis for the direction of the difference seems valid, and that a
one-tailed test seems appropriate in this case.
One thing that I would love to know with regard to the work, which
was missing from the paper... Which group of people made the most
money during the test? I guess this result is somewhat 'sensitive'?
;-)
Thanks very much for your assistance with the above request and the questions.
I've shown this paper to about 10 people so far, and I'm really
looking forward to reviewing it. I think it touches on one of the
"defining issues of our age" (to steal a piece of hyperbole from the
global warming campaigners... actually I think you could go so far as
to call your results "an inconvenient truth"!).
Sincerely,
Dan Bolser.
Date: Tue, 13 Oct 2009 13:37:59 -0500
From: "Camelia M. Kuhnen"
To: Dan Bolser
Cc: jchiao@northwestern.edu
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
Dear Dan,
Thank you for your interest in our paper. Unfortunately, we are unable
to share the individual level data from this project at this point.
Regards,
Camelia
Camelia M. Kuhnen
Assistant Professor of Finance
Kellogg School of Management
Northwestern University
2001 Sheridan Road
Evanston, IL 60208-2001
E-mail: c-kuhnen@kellogg.northwestern.edu
Phone: 847-467-1841
Fax: 847-491-5719
Web: http://www.kellogg.northwestern.edu/faculty/kuhnen/htm/index.html
Date: Wed, 14 Oct 2009 02:19:51 +0100
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
From: Dan Bolser
To: "Camelia M. Kuhnen"
Cc: jchiao@northwestern.edu
2009/10/13 Camelia M. Kuhnen
> Dear Dan,
> Thank you for your interest in our paper. Unfortunately, we are unable to
> share the individual level data from this project at this point.
Why is that if you don't mind me asking?
At what point do you think you will be able to share it?
I was wondering, for example, if there were any significant
associations with gender that could be used (on average) to improve
the calculation of the residual 'excess risky investment'? If the data
is unavailable, would it be possible for you to make this test for me?
If I can't repeat the analysis directly, can I ask exactly which 'mean
comparison test' was used? i.e. was it the t-test? Can you please
describe in a bit more detail how the regression was done?
Also (sorry!), Figure 1 B/C shows 'standard errors'. Are these
standard errors of the mean? (it looks that way, but I'm just trying
to make sure I have all the details for my 'journal club').
Thanks for getting back to me so promptly, and thanks again for the
interesting work, and for considering the above questions. I'm sorry
that I can't simply work on these questions myself using the
underlying data.
Sincerely,
Dan.
From: "Camelia M. Kuhnen"
To: Dan Bolser
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
Date: Tue, 13 Oct 2009 20:39:45 -0500
Cc: "jchiao@northwestern.edu"
I am sorry, Dan, but the data set is proprietary and cost us a lot of
money to obtain. We plan to use it in other studies in the future and
at this point we will not make it publicly available. While I
understand there are interesting questions to answer that we have not
addressed in the PLoS paper, such as whether there are gender effects,
I really do not have time right now to do additional analyses to
answer these questions. Please don't take it personally, but I have
other priorities at work at the moment.
Regards,
Camelia
Date: Wed, 14 Oct 2009 03:11:47 +0100
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
From: Dan Bolser
To: "Camelia M. Kuhnen"
Cc: "jchiao@northwestern.edu"
2009/10/14 Camelia M. Kuhnen
> I am sorry, Dan, but the data set is proprietary and cost us a lot of money
> to obtain. We plan to use it in other studies in the future and at this
> point we will not make it publicly available. While I understand there are
> interesting questions to answer that we have not addressed in the PLoS
> paper, such as whether there are gender effects, I really do not have time
> right now to do additional analyses to answer these questions. Please don't
> take it personally, but I have other priorities at work at the moment.
No, of course I don't take it personally, I'm just surprised that
there is no way for you to give me the data.
The normal practice in biology is to provide the data under an
institutional 'non disclosure' agreement that basically says that if I
want to publish any findings based on the data that I should get your
express permission to do so first, and abide by any conditions that
you require (usually co-authorship). Failure to abide by the NDA is
then a serious institutional legal issue.
Anyway, could you not provide the ~5,900 'residuals' with only a s/s
yes/no flag and a 7-repeat yes/no flag? I'm struggling to see what I
could do with that data other than simply try to repeat the analysis
that you have presented. I'm assuming this would just be 65 rows of
data with three columns?
Could you or one of your colleagues please let me know how the
difference between the mean residuals in the different groups was
tested for significance? I'm guessing t-test, but I'd like to know for
sure.
Thanks again for your help (and patience).
Sincerely,
Dan.
Date: Wed, 14 Oct 2009 07:09:53 -0500
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
From: joan.chiao@gmail.com
To: Dan Bolser
Cc: "Camelia M. Kuhnen"
Dear Dr. Bolser,
Thanks very much for your interest in our work and this project more
broadly.
If you would like to collaborate on this project, would you send us a CV
or some kind of resume so we can better understand your scientific
background and expertise? I was unable to find your professional website
online.
Regarding distribution of primary data, I understand the issue of 'open
access' to data, but usually, at least in neuroimaging, this is done by the
authors depositing the data into a repository which then mitigates requests
for data. As I understand from Cami, in the field of Finance, distribution
of raw data is much rarer, and typically only occurs to collaborators or
known colleagues who work in the same research field. Finally, as Cami
mentioned earlier, we are still working with the data and have planned a
number of additional analyses and at this time would prefer not to make the
data 'open access'.
In any case, hope that you understand the starting assumptions underlying
our preference not to distribute raw data for the time being.
Sincere regards,
Joan Chiao
Date: Wed, 14 Oct 2009 16:10:04 +0100
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
From: Dan Bolser
To: jchiao@northwestern.edu
Cc: "Camelia M. Kuhnen"
Hi guys,
Thanks for all your kind, patient and careful replies so far.
First off, to pick up your request, I have put some information about
myself at the end of the email. I must apologise, because I appreciate
that this is a courtesy I often forgo. I hope the information won't
affect your decision about whether or not to answer my email! ;-)
OK, to clarify there are two outstanding issues:
1) request for the data, and
2) request for clarification of the methods.
Request for data:
To be clear, I'm not asking for you to make the data open access.
In the first instance, I thought you could 'safely' send me the table
of the 65 "average excess risky investment" scores per subject, along
with a flag for the two genotypes of interest. i.e. 65 rows, three
columns. I want this data so I can repeat the statistical analysis
that you presented in the paper. As a bonus, I thought you could send
me an extra column with the gender of the subjects so I could look for
correlation within those groups. This request still stands, because I
just can't see how releasing that data to me could be in any way
problematic for you.
In the second instance, as Professor Kuhnen raised the issue of
proprietary data, I suggested that I would be willing to sign an NDA,
that would legally guarantee your right to fully control the data and
any results, be they commercial or academic in nature. I am still
willing to sign an NDA for the above data if you see fit to draw one
up.
In case you are worrying, my intention is simply to better understand
the analysis that you published. i.e. I want to look at the standard
deviation of the "average excess risky investment" and also try
various different statistical tests to compare the difference in the
mean, including a comparison of a one-tailed and a two-tailed test.
This is simply for my own curiosity. I do not intend to publish any of
the results, nor do I seek to obtain financial gain from the results.
Request for clarification of the methods:
I asked several questions about the analysis presented in the
publication that I honestly feel are (more or less) reasonable. I feel
that this kind of question is exactly why there is a communicating
author. I'll reiterate my questions here because of the garbled nature
of my previous emails (sorry about that):
1) Figure 1 B and C show 'standard errors'. Are these standard errors
of the mean?
2) Which 'mean comparison test' was used in the comparison? i.e. was
it the t-test?
3) Was there a significant difference between the groups under a
two-tailed significance test?
4) Can you describe in a bit more detail how the regression was done?
5) Is there any significant association with subject gender?
6) Which group of people made the most money during the test? I
suppose this has more to do with the experimental design than the
exact strategy employed.
Please accept my apology if any of these questions are very basic
within your respective fields. As I hinted before, I'm a biologist
(bioinformatician), and I don't think there are such 'standard' tests
in the field.
About me
I'm a bioinformatician working as a post doc research assistant at the
University of Dundee. I have a PhD in the bioinformatics of
protein-protein interaction and protein structure analysis. I'm
interested in NGS, genotyping, GWAS, and the future of a
'post-genomic' society. Since you were looking, here are some links:
* http://network.nature.com/people/dan
* http://www.linkedin.com/in/bolser
* http://openwetware.org/wiki/User:Dan_Bolser
* http://www.slideshare.net/danbolser
Please let me know if anything is still unclear.
Sincerely,
Dan Bolser
Date: Wed, 14 Oct 2009 10:40:40 -0500
From: "Camelia M. Kuhnen"
To: Dan Bolser
Cc: jchiao@northwestern.edu
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
Dan,
I normally do not share data unless requested by the editor of a
journal. However, I will make an exception now just because of the
cultural difference between your field and mine.
I have attached here the data file containing subject-level residual
investment and genotype. This is absolutely the maximum amount of data
that Joan and I will share at this point.
Regarding your questions:
(1), (2) and (3) you can answer on your own based on the attached data
(4) is answerable based on the details in the paper and supplement
(5) and (6) are not the focus of the paper and we will not discuss these
additional results for now. They may be part of other publications.
Good luck with your research,
Camelia
Date: Wed, 14 Oct 2009 17:39:19 +0100
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
From: Dan Bolser
To: "Camelia M. Kuhnen"
Cc: jchiao@northwestern.edu
Awesome!
Thanks very much for providing the data!
> Regarding your questions:
>
> (1), (2) and (3) you can answer on your own based on the attached data
> (4) is answerable based on the details in the paper and supplement
> (5) and (6) are not the focus of the paper and we will not discuss these
> additional results for now. They may be part of other publications.
With regard to question 2, "Which 'mean comparison test' was used in
the comparison?", I have tried a one-tailed t-test (Welch Two Sample
t-test), a one-tailed ks-test (Two-sample Kolmogorov-Smirnov test) and
a one-tailed wilcox-test (Wilcoxon rank sum test with continuity
correction).
The wilcox-test seems to fit the given p-value for 5-HTTLPR, but not
DRD4 (nor any of the other tests).
Can you enlighten me as to which 'mean comparison test' was used to
obtain the p-value of 0.04?
The lowest p-value I can obtain using the above tests is about 0.07.
Thanks again for providing the data.
Sincerely,
Dan.
Date: Wed, 14 Oct 2009 13:00:38 -0500
From: "Camelia M. Kuhnen"
To: Dan Bolser
Cc: Joan Chiao
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
See attached pdf -- the one-tailed p-value of the mean comparison test
is 0.0364.
Camelia
Date: Thu, 15 Oct 2009 10:38:41 +0100
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
From: Dan Bolser
To: "Camelia M. Kuhnen"
Cc: Joan Chiao
2009/10/14 Camelia M. Kuhnen
> See attached pdf -- the one-tailed p-value of the mean comparison test is
> 0.0364.
Hi Camelia,
Thanks for the information.
I have now resolved the difference that I was seeing between your
results and mine (details included below for your reference).
Would you mind if I wrote this up as a comment for the PLoS One
website? I think this is the kind of thing they would like to see
there. If you agree, I'll send the text for your approval before
posting it there.
Thanks very much for all the help.
All the best,
Dan.
Details:
I found that the problem was that the stats package I use (called 'R')
does not automatically pool the variance of the two groups when
performing the t-test. (Stata clearly does this by default.) The
result is that R uses the Welch (or Satterthwaite) approximation to
the degrees of freedom, which is intended for cases where the variances
of the two samples are not equal. Usually the difference in the magnitude of p is not large,
but in this case it was 'important' (0.07 vs. 0.04).
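
To illustrate the distinction described above, here is a minimal sketch in Python (standard library only) of the two statistics. The function names and the sample values are made up for illustration and are not the study data. The sketch returns only the t statistic and the degrees of freedom; turning those into a p-value needs a t-distribution CDF, which the standard library does not provide.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


def welch_t(a, b):
    """Welch's t statistic with Welch-Satterthwaite df; returns (t, df)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Made-up illustrative values, NOT the data from the paper.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0]

print("pooled:", pooled_t(group_a, group_b))
print("welch: ", welch_t(group_a, group_b))
```

With equal group sizes and equal sample variances the two versions coincide; with unbalanced groups the Welch degrees of freedom fall below the pooled value, which is one way the two p-values can drift apart.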
I used Levene's test to check the assumption of equal variance, and
found no evidence for unequal variances (F(1,63) = 0.8758, p =
0.3529).
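
For reference, a minimal sketch of how a Levene-type F statistic can be computed, again in Python with only the standard library. It assumes the median-centred (Brown-Forsythe) variant by default; which centre was used for the figure quoted above is not stated here, so the `center` parameter is an assumption, and the function name and sample values are hypothetical.

```python
from statistics import mean, median


def levene_f(groups, center=median):
    """Return (F, df1, df2) for a Levene-type test of equal variances.

    Each value is replaced by its absolute deviation from its group's
    centre, and a one-way ANOVA F statistic is computed on the deviations.
    """
    z = [[abs(x - center(g)) for x in g] for g in groups]
    k = len(z)                       # number of groups
    n = sum(len(g) for g in z)       # total number of observations
    grand = mean([x for g in z for x in g])
    group_means = [mean(g) for g in z]
    between = sum(len(g) * (m - grand) ** 2
                  for g, m in zip(z, group_means)) / (k - 1)
    within = sum((x - m) ** 2
                 for g, m in zip(z, group_means) for x in g) / (n - k)
    return between / within, k - 1, n - k


# Made-up illustrative values, NOT the data from the paper.
f_stat, df1, df2 = levene_f([[1.0, 2.0, 3.0, 4.0, 5.0],
                             [2.0, 4.0, 6.0, 8.0, 10.0]])
print(f_stat, df1, df2)
```

Two groups totalling 65 observations give degrees of freedom (1, 63), the same shape as the F(1,63) reported above. Passing `center=mean` gives the original mean-centred form of the test.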
Changing the t-test to use equal variances now gives me the exact same
result in R as you observe in Stata.
Date: Thu, 15 Oct 2009 13:31:05 -0500
From: "Camelia M. Kuhnen"
To: Dan Bolser
Cc: Joan Chiao
Subject: Re: Request for data "Genetic Determinants of Financial Risk Taking"
Sure, Dan, you have my approval to write up the comment on the PLoS One
website.
Regards,
Camelia
