The Mystery of Born to Rebel:
Sulloway's Re-Analysis of Old Birth Order Data
by Judith Rich Harris
In my previous three essays on this website, I explained that
birth order effects can be detected when people are in the presence
of their parents and siblings but not in other social contexts, and
that standard personality tests generally show no birth order
effects. Yet in the book Born to Rebel (1996), Frank J.
Sulloway claimed that birth order effects are ubiquitous and
important -- capable of changing the very course of history! -- and
presented a plethora of data to back up his claim. Before I can
dismiss birth order effects as unimportant, I have to look carefully
at those data and consider what Sulloway said about them in Born
to Rebel (BTR for short) and elsewhere.
According to Sulloway, children develop different personalities
depending on their birth order. Their childhood experiences with
parents and siblings affect the things they do, the attitudes they
hold, and the way they behave toward others all through life. Sulloway
characterizes firstborns as ambitious, domineering, jealous,
aggressive, conventional, and close-minded; he depicts laterborns as
rebellious, adventurous, agreeable, sympathetic, and receptive to
new ideas. I might as well confess right off the bat: I am a
firstborn. Sulloway, as you may have guessed, is a laterborn.
Though much of Sulloway's data is historical, the part of his
book that made the biggest impression on readers in the academic
world was his re-analysis of data from a 1983 book on birth order by
the Swiss researchers Cécile Ernst and Jules Angst (here
called E&A). E&A's book includes a comprehensive review of
hundreds of previous birth order studies. The studies reviewed by
E&A, now 22 to 56 years old, are the only "modern" data Sulloway
presents in BTR -- the only data that come from systematic studies
of the personality or behavior of ordinary people. Sulloway's
re-analysis of the studies in E&A's review is the keystone of
his evidence. The results of this re-analysis were presented in
Table 4 (p. 73) of BTR and, a year earlier, in a commentary
(Sulloway, 1995) on an article by David Buss in the journal
Psychological Inquiry.
E&A did a thorough and conscientious job of reviewing the
birth order literature. They examined research reports from all over
the world, placing the most weight on carefully done studies that
controlled for sibship size (the number of offspring in a family)
and socioeconomic status. Here is their conclusion:
Birth order and sibship size do not have a strong impact on
personality. . . . An environmental variable that is
considered highly relevant is thus disaffirmed as a predictor for
personality and behavior. (E&A, 1983, p. 284, emphasis
theirs)
But Sulloway re-analyzed the data from the same set of studies
and came to a very different conclusion. He maintains that if the
data from the well-done studies are combined in what he refers to as
a "meta-analysis," the following result pops out clearly:
In spite of occasional negative findings, the literature on
birth order exhibits consistent trends that overwhelmingly exceed
chance expectations. (Sulloway, 1996, pp. 72-74).
How did Sulloway look at the same data -- old data that had
already been winnowed many times by others -- and find gold where
the others had found only chaff? Why have other investigators been
unable to figure out where he got the numbers in Table 4 of BTR? In
an effort to answer these questions I will look at what Sulloway has
said, in BTR and elsewhere, about his re-analysis of E&A's data,
and will consider the various explanations he has given for the lack
of agreement. I'll also comment on his research methods and,
briefly, on the historical data in BTR.
Ernst & Angst Didn't Do a Meta-Analysis
Sulloway has pointed out several times (1995, p. 77; 1996, p.
472; 1998c) that E&A conducted their 1983 review of the birth
order literature without the benefit of meta-analysis. "In a
postscript to the preface of their book," he wrote, "Ernst and Angst
regretted that meta-analytic methods had become available only as
they were completing their study and hence were not employed by
them" (1995, p. 77). One is left with the impression that it was the
unavailability of the technique Sulloway used that caused E&A to
conclude -- incorrectly, in his opinion -- that birth order has
little or no effect on adult personality.
It is true that E&A expressed regret about not having used
"the excellent meta-analytic method" (1983, p. xi) that was just
coming into use in the early 1980s. But they were referring to the
sophisticated statistical procedure, new at the time, for combining
multiple results by taking into account the magnitude of the effect
found in each study and the number of subjects who participated in
that study. E&A were not expressing regret that they hadn't used
the method called "vote-counting" -- the method Sulloway actually
used. Vote-counting involves a simple tally of positive, negative,
and no-difference results (or just positive and negative), plus a
simple calculation, using binomial statistics, to determine whether
the outcome deviates significantly from chance expectations. Though
Sulloway called his tally of E&A's data a "meta-analysis," that
term is ordinarily restricted to the method that takes into account
effect size and number of subjects.
If E&A had wanted to use the vote-counting method they could
certainly have done so, because it's as old as the hills. Their
decision not to use it was a wise one. As medical researcher Mark
Petticrew (2001) has pointed out, it is often inappropriate, for a
variety of reasons, to use statistical procedures to pool data in a
systematic review of the literature. "Systematic reviews should not
therefore be seen as automatically involving statistical pooling,"
Petticrew explained, "as narrative synthesis of the included studies
is often more appropriate and sometimes all that is possible" (p.
100). Furthermore, systematic reviews of the literature, with or
without a meta-analysis, "are not intended to be a substitute for
primary research" (p. 101). Because the studies combined in a
meta-analysis tend to be of variable quality, it is not uncommon for
meta-analyses to produce results that are later contradicted by
larger, more carefully done studies (LeLorier et al., 1997). It is
the large, carefully done study (in medical research, the randomized
control trial), not the meta-analysis, that biomedical scientists
regard as the "gold standard."
E&A were well aware of the shortcomings in the research
studies they reviewed. To check on the conclusions they drew from
their review, they carried out a large, carefully done study of
their own. Their study supported their conclusions and was reported
in the same book. I will return to E&A's study later in this
essay.
In any case, the vote-counting method Sulloway used was
available to E&A, though they chose not to use it. Since neither
E&A nor Sulloway carried out a real meta-analysis, the
unavailability of this method before 1980 is not a viable
explanation for the difference in their conclusions.
The Numbers in Table 4
The outcome of Sulloway's vote-count is displayed in Table 4 of
BTR (1996, p. 73). This table is headed "Summary of 196 Controlled
Birth-Order Studies, Classified According to the Big Five
Personality Dimensions." It shows a total of 72 "confirming" results
(that is, favorable to Sulloway's theory), 14 "negating" results
(opposite to those predicted by his theory), and 110 "no difference"
results (also unfavorable to his theory, since his theory predicts a
difference). In the text, Sulloway reports that "72 of the 196
studies display significant birth-order results that are consistent
with my psychodynamic hypotheses." The likelihood of obtaining this
result by chance, he says, is "less than 1 in a billion billion" (p.
72).
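The kind of binomial calculation that underlies a vote-count can be sketched in a few lines. This is an illustration under stated assumptions, not a reconstruction of Sulloway's own computation: the chance rate `P_CHANCE = 0.05` for a "confirming" result is an assumption (the passage quoted above does not say what rate he used), and the function name `binomial_upper_tail` is hypothetical. Note that the calculation treats each of the 196 entries in Table 4 as an independent trial.

```python
from math import comb

# Counts reported in Table 4 of BTR (1996, p. 73).
CONFIRMING = 72
NEGATING = 14
NO_DIFFERENCE = 110
TOTAL = CONFIRMING + NEGATING + NO_DIFFERENCE  # 196 entries in the table

# ASSUMPTION: probability that a single entry is "confirming" by chance alone.
P_CHANCE = 0.05


def binomial_upper_tail(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p), with independent trials."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


tail = binomial_upper_tail(CONFIRMING, TOTAL, P_CHANCE)
print(f"{TOTAL} entries, P(X >= {CONFIRMING}) = {tail:.3g}")
```

Under the assumed rate the tail probability is far smaller than one in a billion billion, but that figure is only as good as the premise that every entry is a separate, independent trial.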
In January 1997, BTR was reviewed in the journal Science
by the sociologist and historian John Modell. It was this review
that first drew my attention to Table 4. Referring to Sulloway's
re-analysis of E&A's data, Modell said, "I was persuaded by
Sulloway's reworking of these materials -- until I tried to
replicate it with [E&A's] literature review in hand. I could not
do so, try as I might, or even come near" (1997, p. 624).
It is unusual for a person writing a book review to attempt to
replicate an analysis reported in the book. Perhaps Modell wouldn't
have felt it necessary if Sulloway had provided the information that
normally accompanies the published report of a meta-analysis. But
the voluminous appendixes and endnotes of BTR do not contain what
psychologist Toni Falbo, in another critical review of BTR,
described as "an essential element in any presentation of
meta-analytic results" (1997, p. 939): a list of the studies that
were included.
To the best of my knowledge (as of May 22, 2002), Sulloway has
still not released his list of the "196 controlled birth-order
studies" he found in E&A. In the absence of this list, attempts
to figure out how and where he got the numbers in Table 4 must be
based on what he has said, in BTR and elsewhere, about the method he
used to re-analyze the data from E&A's survey of the birth order
literature.
At least three attempts have been made to replicate Sulloway's
re-analysis. Modell's was the first. The second was mine, described
in Appendix 1 of The Nurture Assumption (Harris, 1998b). The
third, by Frederic Townsend (in press), has not yet been published.
All three attempts failed. That is, all three produced tallies that
do not match the numbers in Table 4 of BTR.
In considering Sulloway's criticisms of these attempts, it is
important to understand what Modell, Townsend, and I were trying to
do. We weren't trying to re-analyze the data from E&A's survey:
we were simply trying to reproduce Sulloway's methods in an effort
to replicate his results.
"Studies" versus "findings."
Table 4 of BTR (1996), and
a nearly identical table in Sulloway's 1995 commentary, are headed
"Summary of 196 Controlled Birth-Order Studies." But a footnote
under these tables (1995, p. 78n; BTR, p. 73n) conveys this
information: "Each reported finding constitutes a `study.'"
Thus, Table 4 actually contains 196 findings, not 196
studies. The distinction is important, because a single study
(employing a single sample of subjects) can generate multiple
findings. If a researcher found, for example, that a particular
sample of firstborns was more conservative, punitive, and nervous
than the laterborns in the study, that would be three positive
findings. But what if the researcher found that the firstborns in
that sample were more conservative, punitive, nervous,
self-centered, ambitious, jealous, domineering, organized, and
parent-oriented? How many positive findings could a single study
contribute to Sulloway's tally? As far as I know, Sulloway has never
answered this question. Thus, the uncertainties about which studies
were included in his tally and which were counted as "confirming"
are compounded by an additional uncertainty: How many findings did
each study contribute? How many studies contributed the 196
findings? The number of studies must have been fewer than 196, but
how many fewer?
The hardcover edition of BTR (1996) doesn't offer many clues to
Sulloway's methods. The note under Table 4 just says: "Data are
tabulated from Ernst and Angst (1983:93-189), using only those
studies controlled for social class or sibship size."
In pages 93 through 189 of E&A's book, hundreds of studies
are reviewed. Twenty-six tables, many of which run to two pages,
give lists of studies, specify whether or not they were controlled
for sibship size and social class, and summarize their results.
Studies are also described in the text. Information provided in the
text isn't necessarily included in the tables and vice versa.
I spent days combing through those 97 pages of E&A, searching
for findings from studies controlled for sibship size or social
class or both. Sulloway reported 110 no-difference findings; I
counted 109. Sulloway reported 14 negative findings; I counted 13.
So far so good. But Sulloway reported 72 positive findings and I
counted only 52. Five other studies yielded no clear-cut outcome in
regard to Sulloway's theory. Altogether I tallied 179 findings,
versus Sulloway's 196.
In an online essay, Sulloway (1998c) offered an explanation for
the unsuccessful attempts to replicate his re-analysis of E&A's
data. Modell (1997), he said, had erred by counting studies, not
findings. "Modell had overlooked a crucial footnote at the bottom of
the table in which I presented my meta-analytic findings," Sulloway
alleged. "Because Modell counted studies -- ignoring multiple
findings in the same study -- he naturally obtained different
totals."
The trouble with this explanation is that overlooking the
footnote under Sulloway's table doesn't result in counting studies
instead of findings: it results in counting findings and
thinking they are studies. When E&A reviewed studies that
produced multiple findings, they almost always reported each finding
in a different section of their chapter. For example, Blustein's
(1967) result for conformity is reported in E&A's Table 25 (p.
126), her result for academic motivation is reported in Table 28 (p.
138), and her result for academic self-esteem is reported in Table
33 (p. 156). It is possible to tabulate all the findings reported by
E&A without noticing that some researchers' names are repeated
several times. Koch's sample of 384 five- and six-year-olds
contributed eleven findings to E&A's chapter (and to my tally);
Macbeth's 981 college students contributed ten.
Counting findings and thinking they are studies wouldn't affect
the outcome of Modell's tallies and leaves his failure to replicate
Sulloway's vote-count unexplained. The accusation that Modell
overlooked the footnote is a red herring.
I got slapped with the same red herring. "Unfortunately,"
Sulloway (1998c) wrote, "Judith Harris followed in Modell's
methodological footsteps, basing her own meta-analytic counts on
`studies.' She did so in spite of being fully aware that my own
counts were by `findings.' Additionally, she made no effort to
ascertain how these two alternative methods of counting might differ
in their outcomes." That accusation is demonstrably false; it is
directly contradicted by what I said in Appendix 1 of The Nurture
Assumption (pp. 368-370). As I explained there, I counted
findings in E&A and then figured out how many studies these
findings came from. I even reported both results: 179 findings, 116
studies.
The accusation that Modell and I ignored the distinction between
studies and findings is all the more maladroit in view of Sulloway's
own failure to distinguish clearly between them. Consider these
statements:
If we decide to ignore all birth-order studies [in E&A's
survey] that were uncontrolled either for social class or sibship
size, we are left with 196 well-designed studies. . . . involving
120,800 subjects. (Sulloway, 1995, p. 77)
If we ignore all birth-order findings that lack controls for
social class or sibship size, 196 controlled studies remain in Ernst
and Angst's survey, involving 120,800 subjects. (Sulloway, BTR, p.
72)
If we take the remaining 196 studies [in E&A's survey] that
are controlled for class and sibship size, we may ask how many
significant findings are there in this set of 196 studies.
(Sulloway, 1998b)
But there weren't 196 well-designed studies, only 196
findings. And there weren't 120,800 subjects, because there
weren't 196 studies. Sulloway's repeated use of the word "studies"
is misleading but not outright wrong, because in a footnote under a
table he redefined "studies" to mean "findings." But there is no way
to redefine the word "subjects" that would make "120,800 subjects"
true. The number of subjects depends on the number of studies, not
on the number of findings. If counting a subject ten times (in the
case of a study that produced ten findings) turned him or her into
ten subjects, researchers could save themselves a lot of trouble.
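The double-counting arithmetic can be made concrete with the two samples mentioned earlier in this essay (Koch's 384 children, eleven findings; Macbeth's 981 college students, ten findings). The sketch below only illustrates the effect of counting subjects once per finding; it is not a reconstruction of how the figure of 120,800 was actually obtained, and the variable names are hypothetical.

```python
# (researcher, subjects in the sample, findings contributed to E&A's chapter),
# taken from the two examples given earlier in this essay.
samples = [
    ("Koch", 384, 11),
    ("Macbeth", 981, 10),
]

# Counting each sample once: the actual number of people tested.
unique_subjects = sum(n for _, n, _ in samples)

# Counting the sample again for every finding it produced: an inflated total.
per_finding_subjects = sum(n * findings for _, n, findings in samples)

print(unique_subjects)       # 1365
print(per_finding_subjects)  # 14034
```

Two samples totaling 1,365 people swell to 14,034 "subjects" when each finding brings its sample along with it.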
Now we can see why E&A refrained from using the
vote-counting procedure. Testing the significance of results
obtained by a vote-counting procedure (using binomial statistics)
requires the assumption of independence: each study -- each sample
of subjects -- can contribute only one vote to the total. Multiple
measures of the same sample of subjects are not independent.
Sulloway's use of binomial statistics (the formulas he used to
produce statements such as "less than 1 in a billion billion") was
an error in elementary statistical reasoning.
Publication bias and the file-drawer test.
The most disturbing example of the blurring of the distinction between
"studies" and "findings" involves Sulloway's use of the "file-drawer
test." In his online essay (1998c) he took me to task for my failure
to bring his passing mark on the file-drawer test to the attention
of my readers. I will make up for that oversight now.
The file-drawer test is a method for dealing with a scientific
hazard, well known to statisticians, called publication bias
-- the tendency for statistically significant results to be
published and nonsignificant results, or results that are contrary
to expectations, to be stuffed in a file drawer and forgotten. As I
explained in my previous essay (see "Why Do People Believe that
Birth Order Has Important Effects on Personality?" on this website),
publication bias is a serious problem in medical research, where a
no-difference result can be important. For example, if there is no
difference in survival rates between patients who do and do not
undergo a certain surgical procedure, this is something that
physicians and patients would like to know. And yet, even in
medicine, studies that fail to find a significant difference are
less likely to be published and, if published, are slower to appear
in print (Ioannidis, 1998). Publication bias is one of the reasons
why meta-analyses of medical data sometimes lead to incorrect
conclusions (Sutton et al., 2000). When enough no-difference and
negative findings are left out, the overall outcome of a
meta-analysis can be shifted from no-difference to positive.
Medical researchers use a statistical technique called a
funnel plot to test for publication bias in a meta-analysis
(Egger et al., 1997). The size of the effect obtained in each study
is plotted against the number of subjects who participated in that
study. Publication bias results in a dearth of small studies showing
small or no-difference effects, because a study that yields
nonsignificant results (or results that are contrary to
expectations) is less likely to be published if it involved a
relatively small number of subjects. Thus, if effect sizes tend to
be larger for studies with small samples (causing the funnel-shaped
plot to lean to one side), it's an indication of publication bias.
In the table on page 372 of The Nurture Assumption, I
demonstrated a publication bias in the birth order studies reviewed
by E&A: 40 percent of findings from small studies, but only 19
percent from large ones, were significant and positive.
According to Sulloway (1995, p. 79; BTR, p. 472; 1998c),
publication bias cannot account for the results of his re-analysis
of the studies reviewed by E&A, because his results passed the
file-drawer test.
The file-drawer test was devised by psychologist Robert
Rosenthal (1987) as a quick and easy way of estimating the
probability that there are enough nonsignificant studies sitting
around in file drawers to invalidate the results of a meta-analysis.
All one needs to do is to compare two numbers: the number of
unpublished studies it would take to invalidate the results of a
meta-analysis of published studies, and the estimated number of
unpublished studies sitting around in file drawers. To calculate the
first number, Rosenthal gave this simple formula: 19s - n,
where s is the number of significant published studies and
n is the number of nonsignificant published studies. The
formula for the second number is equally simple: 5K + 10,
where K is the total number of published studies. What Sulloway
apparently did was to make s = 72, n = 124, and
K = 196; the results he obtained were 1,244 and 990. He
concluded, "This number (1,244) exceeds 990, indicating that the
published findings pass the file-drawer test" (Sulloway, 1995, p.
79).
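The arithmetic is easy to check. The sketch below simply evaluates the two formulas as quoted above with the values Sulloway apparently used; the function names `studies_needed_to_overturn` and `estimated_file_drawer_size` are hypothetical labels, not Rosenthal's terminology. The formulas are stated in terms of numbers of studies.

```python
def studies_needed_to_overturn(s: int, n: int) -> int:
    """Unpublished studies it would take to invalidate the result: 19s - n."""
    return 19 * s - n


def estimated_file_drawer_size(k: int) -> int:
    """Estimated number of unpublished studies in file drawers: 5K + 10."""
    return 5 * k + 10


# Values apparently plugged in: 72 significant and 124 nonsignificant entries.
s, n = 72, 124
k = s + n  # 196

needed = studies_needed_to_overturn(s, n)
estimated = estimated_file_drawer_size(k)
print(needed, estimated, needed > estimated)  # 1244 990 True
```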
But 196, 72, and 124 aren't numbers of studies -- they're
numbers of findings! Sulloway nonetheless plugged those
numbers into Rosenthal's formulas, which he found on pages 224 and
225 of Rosenthal's book, disregarding Rosenthal's words of warning
on page 224. Rosenthal clearly stipulated that his formulas are
based on the assumption "that each of the K studies is
independent of all other K - 1 studies, at least in the sense
of employing different sampling units."
Sulloway's 196 findings did not employ 196 different sampling
units -- there were not 196 different samples of subjects. It was
wrong to use Rosenthal's formulas on those data.
How many votes?
Another claim that Sulloway made is that
I undercounted findings in E&A's review of birth order studies:
If, for example, a given study reported that firstborns are more
conscientious than laterborns, but also more agreeable, I counted
one confirmation and one refutation (in accordance with my formal
hypotheses for these two dimensions). By contrast, Harris classified
such "mixed" results as a single null [no-difference] outcome.
(Sulloway, 1998c)
This accusation is another red herring. E&A almost always
reported results for different aspects or measures of personality in
separate sections of their chapter and in different tables, even if
the results came from the same study. When they did this, I recorded
each finding as a separate vote. For example, Macbeth (1975) gave
her subjects a number of different tests, assessing their
affiliative need, achievement motivation, political views,
vocational interests, originality, and so on. The results of each of
these tests -- mostly no-difference -- were reported separately in
E&A, and each contributed a vote to my tally.
Occasionally E&A did report multiple findings from a single
study in the same table. For these cases Sulloway is correct: I did
give these studies a single vote. However, my undercounting mainly
involved no-difference findings, not positive findings, and thus
fails to explain why my vote-count didn't agree with his. My tally
produced about the same number of no-difference findings as
Sulloway's; the discrepancy involved only the number of positive
findings.
By way of illustration, consider the aforementioned Macbeth
(1975). In addition to the other kinds of tests Macbeth gave her
subjects, she also gave them several standard personality tests.
According to E&A (Table 37, p. 170), these tests turned up no
reliable differences between firstborns and laterborns on any
dimension of personality. I gave this outcome one no-difference
vote. In an unpublished document I received from him in 1998,
Sulloway (1998a) said that he gave Macbeth's results for the MMPI
(one of the personality tests she used) five no-difference
votes -- one for each of the five dimensions of personality. That
would have increased Sulloway's no-difference vote by four relative
to mine, but doesn't help to explain why he counted more positive
findings than I did.
My most questionable decision involved a study by Price (1969).
Price asked parents to judge their children's personalities and
found, according to E&A (Table 37, p. 170), that parents
considered their firstborns to be "more introverted, nervous,
precocious, and conforming" than their laterborns. Because I
regarded Price's result as favorable to Sulloway overall (despite
the result for introversion, which is contrary to Sulloway's
predictions), I gave Price one positive vote. In hindsight, I
probably should have given this study one negative vote for
introversion and two positives for nervousness and conformity
("precocious" is not a personality characteristic). Fortunately, in
the unpublished document he sent me, Sulloway (1998a, pp. 4-5)
explained how he decided on the number of votes to give Price. The
reasoning was intricate but the outcome was that he awarded Price
one positive vote, one negative vote, and three no-difference votes.
Thus, the difference between the way I counted Price (one positive
vote) and the way Sulloway said he counted Price involved only the
number of negative and no-difference votes. Again, the difference in
our procedures doesn't account for the difference in our tallies,
which involved only the number of positive votes.
The unpublished document in which Sulloway revealed how he
tallied Macbeth's and Price's findings was titled "Referee's
Report." I'll now explain the origins of this document.
The Referee's Report.
In October 1997, I submitted a
manuscript on birth order, including a highly critical review of
BTR, to the journal Psychological Science. It was turned down
on the basis of a single peer review. The reviewer was Frank J.
Sulloway. Sulloway's 20-page review, with 14 footnotes, bore the
heading "Referee's Report on Judith Harris's `Personality and Birth
Order: The Remarkable Resilience of Deeply Held Beliefs.'" He mailed
a copy of this document to me and another to the editor of the
journal, who sent me a second copy at the time he rejected my
manuscript. The review was signed. (Although the identity of
reviewers is usually concealed, they are free to reveal themselves
if they wish to do so.)
The copy I received directly from Sulloway was accompanied by a
letter (January 25, 1998) in which he explained that if my
manuscript were published in a journal he would want to "revise and
expand [his] referee's report for some form of publication."
However, I did not submit my manuscript to another journal; instead,
I revised it and it became Appendix 1 of The Nurture
Assumption. Consequently, as far as I know, Sulloway's report
was never published. In the endnotes of The Nurture
Assumption (p. 416) I cited the Referee's Report as an
unpublished manuscript. Sulloway referred to it in one of his online
essays (1998c), though he didn't give it a name.
Interactions.
The Referee's Report contained two
bombshells. The first had to do with how Sulloway handled
statistical interactions. An interaction occurs when different
subgroups of subjects produce different results for the same outcome
variable: for example, a significant difference between firstborns
and laterborns might be found for female subjects but not for males;
or a significant positive difference (that is, a difference in line
with predictions) might be found for middle-class subjects, while
working-class subjects yield a significant difference in the
opposite direction.
Sulloway correctly surmised that I gave such studies a single
vote: either one positive or negative vote (when one subgroup
produced significant results and the other did not), or one
no-difference vote (when the two subgroups produced results that
went in opposite directions). Bear in mind that I was not trying to
analyze the data in E&A's book: I was trying to replicate
Sulloway's method in an effort to figure out where he got his
numbers. If I had counted the first kind of interaction as one
positive and one no-difference result, I would have ended up with
too many no-difference results. If I had counted the second kind of
interaction as one positive and one negative, I would have ended up
with too many negative results. My tally already included about as
many no-difference and negative results as his.
That's why I was flabbergasted by Sulloway's statement in his
Referee's Report about how he handled interactions. "By way of
illustration," he said, "I coded all two-way interaction effects as
either one confirmation and one refutation, or as one null and one
significant finding (as determined by an inspection of the means).
Three-way interaction effects were coded for all four outcomes"
(1998a, p. 4).
Whoosh! The number of potential votes for Sulloway's tally had
suddenly multiplied like fruit flies in July. The question now
became: If a single study could contribute one vote for each
measured outcome variable, times two or four for each interaction,
why were there only 196 entries in Table 4 of BTR? In particular,
since most interactions consist of a positive or negative finding
plus a no-difference finding, why weren't there many more
no-difference findings in his tally?
The revelation about how Sulloway treated interactions meant
that the distinction between "studies" and "findings" assumed even
greater importance. The more findings a study was permitted to
contribute to Sulloway's vote-count, the fewer the studies that must
have contributed them. This is not a trivial matter. Sulloway
mentioned in BTR (p. 76) that Koch's study produced 31 significant
interactions involving birth order: "These nonadditive effects
involved birth order, subject's sex, sibling's sex, and age spacing
-- interacting in pairs, triplets, and even foursomes" -- that is,
two-way, three-way, and four-way interactions. A four-way
interaction, by the rule Sulloway gave in his Referee's Report,
would produce eight findings. How many votes did Koch's sample of
384 five- and six-year-olds contribute to Table 4? We don't know,
because Sulloway hasn't provided a list of his 196 findings and
there's no way of figuring it out from the information he did
provide.
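The way votes multiply under this rule can be sketched as follows. The doubling pattern is an extrapolation: the Referee's Report is quoted only for two-way interactions (two findings) and three-way interactions (four outcomes), and the figure of eight for a four-way interaction follows from continuing that pattern. The function name `votes_for_interaction` is hypothetical.

```python
def votes_for_interaction(order: int) -> int:
    """Votes contributed by one interaction of the given order.

    ASSUMPTION: each additional factor doubles the number of coded outcomes,
    which matches the quoted two-way (2) and three-way (4) cases.
    """
    if order < 2:
        raise ValueError("an interaction involves at least two factors")
    return 2 ** (order - 1)


for order in (2, 3, 4):
    print(f"{order}-way interaction: {votes_for_interaction(order)} votes")
```

Because the breakdown of Koch's 31 interactions by order is not given, this sketch cannot say how many votes that one study could have supplied, only that the number grows quickly.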
Even if Sulloway imposed some limit on the total number of votes
a single study could contribute, it is wrong to count interactions
in the way he described, because it allows a study that resulted in
an interaction to contribute more votes to the tally than a study
that produced a simple no-difference result. If Sulloway counted a
study that produced a positive result for females and a
no-difference result for males as one positive vote and one
no-difference vote, then he should have counted a study that
produced no-difference results for both sexes as two
no-difference votes.
In any event, the way I treated interactions cannot account for
the discrepancy between my tally and his, because the discrepancy
involved the number of favorable votes, not the number of negative
or no-difference votes. But Sulloway's claim that he counted many
more findings per study than I did implies that his tally must have
included considerably fewer studies than mine (I had calculated that
the 179 findings in my tally came from 116 studies), which raises
the question of how he decided which studies to include and which to
leave out.
The "Errors" in E&A.
The second bombshell in the
Referee's Report (Sulloway, 1998a) had to do with the "errors" in
E&A. According to Sulloway, another reason why my tallies don't
agree with his is that I accepted E&A's summaries of the studies
they reviewed, whereas he consulted the original documents -- which
included unpublished doctoral and master's dissertations, abstracts
of talks given at professional meetings, and articles published in
obscure foreign journals such as Acta Psychologica Taiwanica.
Often, he said, these documents produced evidence that, in his
judgment, conflicted with E&A's reports. When that occurred, he
"rectified" E&A's "errors."
Sulloway's methodology, which I was trying to reproduce in order
to confirm or disconfirm his tallies, was looking more and more like
a moving target. In his 1995 commentary and in the hardcover edition
of BTR (1996), the footnote under the table simply said "Data are
tabulated from E&A (1983:93-189)." The news about rectifying
errors was added to the endnotes of the paperback edition (1997,
p. 472), published in September, 1997 -- about eight months after
Modell (1997) announced, in the pages of Science, that he had
been unable to replicate Sulloway's results. However, I was unaware
of this addition to the endnotes until I received Sulloway's
Referee's Report in January, 1998.
In the Referee's Report, Sulloway said that he did not accept
Ernst and Angst's survey of the birth-order literature "at face
value" but instead went back to the original publications and found
"more than forty errors" in their book. "The net result of
correcting these errors," he continued, "is to increase [Harris's]
number of confirming results by about 22, and to reduce her number
of nulls by about 15" (1998a, pp. 2-3). In other words, of more than
40 errors Sulloway claims to have corrected, 37 resulted in changes
in his favor. If E&A's errors were random, only about half of
the corrections should have resulted in a favorable change, so this
statistically unlikely outcome requires an explanation. Here's how
Sulloway explained it: "Ernst and Angst's errors in reporting have a
tendency to favor their own viewpoint, namely, that there are
negligible birth-order differences in controlled studies" (1998a, p.
3n).
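How unlikely is "statistically unlikely"? A rough sketch follows, assuming (for illustration only) exactly 40 independent corrections, each equally likely to help or hurt Sulloway's case, of which 37 were favorable. The Referee's Report says "more than forty" and "about" 22 and 15, so the true counts are not known and the result is an order-of-magnitude estimate at best.

```python
from math import comb


def upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed for illustration: exactly 40 corrections in total.
N_CORRECTIONS = 40
# From the text: about 22 added confirmations plus about 15 removed nulls.
N_FAVORABLE = 22 + 15

p_value = upper_tail(N_CORRECTIONS, N_FAVORABLE)
print(f"P(at least {N_FAVORABLE} of {N_CORRECTIONS} favorable by chance) = {p_value:.2e}")
```

Under these assumptions the probability is roughly one in a hundred million, which is why random error alone cannot plausibly account for the pattern.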
This is an accusation of bias: Sulloway is claiming that E&A
were biased against reporting significant birth order effects. Yet
in BTR (p. 472, hardcover and paperback), he had praised them:
"Researchers owe a considerable intellectual debt to Ernst and Angst
(1983) for their systematic analysis of the birth order literature."
Why did he say that, if he had already discovered evidence of
systematic bias in their survey? For that matter, why did he use
their review at all, if he was going to ignore what they said and go
back to the original reports of the studies? And if he was going to
consult the original reports, why did he restrict his
"meta-analysis" to studies done before 1981, as E&A had? Thanks
to E&A, later studies tended to be of better quality; why hadn't
Sulloway included them?
Evidently the Referee's Report looks convincing to someone who
hasn't pored over the relevant pages of BTR and E&A -- the
editor of Psychological Science turned down my critique of
BTR on the basis of this document. But for me it raised more
questions than it answered. Take, for instance, the issue of whether
Sulloway's "196 controlled studies" were controlled for both social
class and sibship size, or controlled for either social class
or sibship size -- a distinction that turns out to have
important ramifications, as Townsend (in press) has demonstrated. In
his Referee's Report (1998a, p. 3), Sulloway quoted the footnote
under Table 4 in BTR as follows: "Data are tabulated from Ernst and
Angst (1983:93-189), using only those studies controlled for social
class and sibship size." But the footnote in the book (both
hardcover and paperback) actually reads "controlled for social class
or sibship size" (p. 73, emphasis mine). In converting a
lenient rule into a stricter one, Sulloway had misquoted from his
own book!
What was I to make of the claim of errors in E&A and the
accusation that they were biased? As Sulloway knew, I do not have
access to a university library (see the author profile at
tna/bio.htm). Thus, though the
Referee's Report included a list (titled "Errors in Ernst and Angst's
Literature Review") of the studies on which he and E&A had come
to different conclusions, I was unable to resolve the disagreements
by consulting the original sources myself. However, I thought I
would at least be able to determine from Sulloway's list which
studies had contributed findings to Table 4 and which had been
rejected. But my optimism proved to be unfounded.
I ran into problems immediately. The first item in Sulloway's
list of errors, under the heading "Studies erroneously reported as
being controlled for sibship size or social class," was Corsello,
1973. The heading implies that I should not have included these
studies in my tally -- but I had not included Corsello, 1973, in my
tally! What Corsello studied, according to E&A (p. 93), was not
personality as a function of birth order but "perception of
differential treatment by parents." No one doubts that siblings
think their parents treat them differently; the question is whether
this perceived differential treatment affects their personalities.
There were several studies in E&A's review in which the outcome
variable was perception of differential treatment rather than
personality; I included none of them in my tally because they
weren't relevant. Sulloway eliminated one. Were the other studies of
this type included in his tally? My guess would be that they were,
but I have no way of knowing for sure.
Another problem was the vagueness of the headings in Sulloway's
list. Twelve studies were listed under the heading "Studies
erroneously reported as not being statistically significant that
are significant (or that involve doubt as nulls)" (1998a, p.
19). Was Sulloway saying that studies that "involve doubt as nulls"
contributed favorable votes to Table 4 of BTR? And what about the
eleven studies listed under the heading "Miscellaneous studies whose
findings are reported in an incomplete, inaccurate, or otherwise
problematic manner" -- did they contribute data to Table 4? Even if
I consulted the original publications in the "miscellaneous" list
(e.g., Yang & Liang's 1973 paper in Acta Psychologica
Taiwanica), I still wouldn't know whether or not Sulloway had
included their findings in Table 4.
Show me the data.
Such questions ordinarily don't come
up, because (as Falbo, 1997, pointed out) when researchers publish
the results of a meta-analysis it is customary to include a list of
the studies that contributed data to it. But Sulloway did not do
that. In the Referee's Report he said that Price's study contributed
five votes to his tally and Macbeth's result for the MMPI another
five, but as far as I know he has not provided that sort of specific
information for any other study. His list of "errors" in E&A
cleared nothing up but only added to the confusion. If he wanted me
to know which of the studies in E&A's survey he had included and
which he had rejected, and how many findings each had contributed,
why didn't he just send me a list of the 196 findings in Table 4 of
BTR?
On July 25, 1998, I gave up trying to replicate Sulloway's
methods and wrote to him, asking for a list of his 196 findings.
Here is his reply, dated August 11, 1998:
I have always made my data available to other researchers, and I
am happy to make my meta-analytic data available to you. I have been
planning, in any event, to put these data on the internet in order
to make them available to anyone else who wants them.
A year later (July 28, 1999) Sulloway informed me in a letter
that he was almost ready to send me the information I had requested.
However, there was a string attached: I would have to agree to "ask
[his] permission before giving any copies of these data to anyone
who lacks formal accreditation as a scientist -- for example, by not
having a Ph.D. degree in the social or behavioral sciences." (As
Sulloway knows, I don't have a Ph.D.) Since I believe that these
data should have been included in the original publication of
Sulloway's "meta-analytic" results, I refused to agree to his
restriction unless he could "provide me with a convincing reason why
this information must be kept away from the hoi polloi." He has not
replied; nor has he sent me the list of his 196 findings. To my
knowledge he has not carried out his 1998 plan "to put these data on
the internet in order to make them available to anyone else who
wants them."
(My own list of 179 findings tabulated from E&A -- who
occasionally made mistakes but who I don't believe were biased -- is
available on this website.)
Sulloway (1998c) has accused me of withholding from my readers
the information that he corrected "errors" in E&A. The truth is
that I did tell my readers (see p. 369 of The Nurture
Assumption) that Sulloway went back to many of the original
studies in E&A's review, that he formed his own opinions about
how they turned out and whether they used the proper controls, and
that his opinions often differed from E&A's. However, I remained
neutral on the question of whether these differences of opinion
were, in fact, errors on E&A's part. I pointed out, on one hand,
that correcting errors is a legitimate procedure in a meta-analysis;
but, on the other hand, that Sulloway's corrections almost always
resulted in changes that were favorable to his theory. "Sulloway
believes that E&A were biased against finding birth order
effects," I explained, leaving it up to my readers to decide what to
make of this information.
For those who are still undecided, there is additional
information now. Townsend (in press) has done what I was unable to
do: go back to the original reports of the studies on Sulloway's
list of "errors" in E&A. In his online essay, Sulloway (1998c)
had offered to send that list (though not the list of his 196
findings) to anyone who requested it. Townsend requested and
received a copy. The results of his investigation will be published
soon; here is a preview of his conclusions: "In other words, the
`reporting errors' were mostly Sulloway's, not Ernst and Angst's."
Townsend doesn't ask his readers to accept his conclusions on faith:
he provides detailed information about the studies he reviewed and
lists their positive, negative, and no-difference findings.
The bottom line.
In my previous essay on this website I
defined confirmation bias as "the tendency to seek, notice,
and remember evidence that confirms one's belief, and to ignore,
forget, or explain away contrary evidence." It's a universal human
failing. Sulloway accused E&A of bias and implied that they were
motivated to find evidence to support their belief that birth order
does not have important effects on personality. But bias could work
the other way as well. Confirmation bias could cause a person who
was convinced that birth order does have important effects to
question E&A's opinion whenever they reported that a study
yielded no-difference or negative evidence, and to let their opinion
stand whenever they reported that a study yielded positive evidence.
Thus, the process of rectifying errors could itself be biased in a
way that could affect the outcome.
A more serious problem is the fact that we know neither the
number nor the identity of the studies that contributed to the 196
findings in Table 4 of BTR. The tallying method Sulloway described
in his Referee's Report -- recording multiple findings from the same
study and counting interactions as two, four, or eight findings --
greatly increases the number of findings that a single study could
contribute to the table, and consequently decreases the number of
studies that contributed these findings. In the pages of E&A
(pp. 93-189) that Sulloway cites as the source of his data, there
were a number of studies that I would predict would produce
significant, though misleading, birth order effects -- studies in
which siblings were asked about differential treatment by parents,
for example, or where the data consisted of judgments by parents or
siblings. Many of these studies produced multiple findings. Koch's
study of five- and six-year-olds (31 interactions multiplied by two,
four, or eight) could alone have produced half the findings in Table
4! Koch's findings included judgments of differential treatment by
parents (firstborns felt less favored by the mother) and judgments
of sibling interactions by the siblings themselves (e.g.,
secondborns wanted to play with their older sibling more than
firstborns wanted to play with their younger one). Neither is a
measure of adult personality or even of child personality: both are
putative causes of personality differences between firstborns
and laterborns, rather than the personality differences
themselves.
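The claim about Koch's study is simple multiplication, sketched below. The 31 interactions, the multipliers of two, four, or eight, and the 196 findings are taken from the text; applying a single multiplier to all 31 interactions is a simplifying assumption, since the actual mix is not known.

```python
TABLE_4_FINDINGS = 196   # total findings in Table 4 of BTR
KOCH_INTERACTIONS = 31   # interactions reported in Koch's study


def findings_from_interactions(n_interactions: int, votes_each: int) -> int:
    """Findings produced if every interaction is counted as `votes_each` findings."""
    return n_interactions * votes_each


# Simplifying assumption: the same multiplier applies to every interaction.
for votes in (2, 4, 8):
    total = findings_from_interactions(KOCH_INTERACTIONS, votes)
    share = total / TABLE_4_FINDINGS
    print(f"{votes} votes per interaction -> {total} findings ({share:.0%} of Table 4)")
```

At two votes per interaction the study would yield 62 findings, at four it would yield 124, and at eight it would yield 248. At four per interaction this one study would account for more than half of the 196 findings in the table.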
Statements made by Sulloway about Table 4, such as "The
likelihood . . . is less than 1 in a billion billion" (BTR, p. 72)
and "These birth-order findings also pass the File Drawer Test" (p.
472) are based on a misuse of statistics. The statement that the
data in the table come from studies "involving 120,800 subjects" is
also incorrect. And Sulloway's explanation for my failure to
replicate his tally doesn't hold water.
Ernst and Angst's Own Study of Birth Order and Personality
As E&A were aware, many of the studies they reviewed were so
poorly designed that they deserved to be consigned forever to the
file drawer of history. That's why E&A did a study of their own
-- to confirm or disconfirm the results of their survey. It was one
of the largest studies of birth order and personality ever done,
larger than any in their survey. E&A measured twelve different
aspects of personality (including Sulloway's favorite, openness) in
7,582 young adults in Zurich, controlling for family size and
socioeconomic status. They found no significant birth order effects
in families with two children -- the firstborn did not differ from
the secondborn in any dimension of personality. In families of three
or more, one small but significant effect turned up: the lastborn
was slightly lower in masculinity. These results were reported by
E&A in the same 1983 book, right after their survey of the
literature.
In an endnote in BTR (p. 475), Sulloway referred to E&A's
finding on masculinity ("On masculine versus feminine attitudes and
birth order, see Ernst and Angst 1983:259-60") without mentioning
that this finding came from a study done by E&A themselves. The
other results of E&A's study -- the no-difference findings they
obtained for all other dimensions of personality -- are not
mentioned at all in BTR, even though they were reported on the same
pages of E&A (pages 259-60) that Sulloway cited for the finding
on masculinity. Nor were any of these no-difference findings
included in the tally in Table 4 of BTR. (The data in Table 4,
according to the note under the table, came from pages 93-189 of
E&A's book.)
Why did Sulloway fail to mention that E&A carried out a
major study of their own or include their findings in his tally?
Surely it can't be because he thought they were biased: if he were
going to leave out all researchers who held an opinion about birth
order before they did their research, his list of findings would be
very short indeed! And surely it can't be because E&A used a
self-report personality test, because the MMPI is also a self-report
personality test, and Sulloway said in his Referee's Report that he
gave Macbeth's (1975) results for the MMPI five votes in his tally.
The Validity of Self-Report Personality Tests
It is true, however, that Sulloway (1998c, 1999) has expressed
skepticism about self-report personality tests. These are
paper-and-pencil questionnaires that require subjects to make
judgments about themselves, usually by agreeing or disagreeing with
statements describing characteristic behaviors, feelings, and
attitudes. Because they are easy to give to large numbers of
subjects, most major studies of birth order effects on personality
use self-report tests. E&A's results were typical of the outcome
produced by these studies: no significant birth order effects, or
one or two effects of negligible size that fail to be replicated in
other studies.
Sulloway himself has been forthright in admitting this outcome:
"When assessed by self-report questionnaires, birth-order effects
are typically modest and nonsignificant." But, as he pointed out
correctly, another type of test generally does yield significant
effects: "Yet systematic differences by birth order are generally
found when parents rate their own offspring or when siblings compare
themselves to one another" (1999, p. 192). If the two kinds of tests
produce conflicting results, which should we believe? Sulloway
favors the kind in which the judgments are made by parents or
siblings; he alleges that self-report tests have "serious problems"
(1998c).
As I've explained in print (Harris, 2000), in online writings
(Harris, 1998a, 1999), and in my previous essays on this website,
the results of both kinds of tests make sense if you take context
into account. The tests involving judgments by parents and siblings
give a valid picture of how the subjects behaved (and probably still
do behave) in the context of the family they grew up in. The
self-report tests, usually administered in a classroom or
laboratory, give a valid picture of how subjects behave when they're
not in the presence of their parents or siblings -- how they behave
in the world in which they live as adults. The fact that birth order
effects are found in "all-in-the-family" tests but not in other
kinds of tests is a confirmation of the theory I presented in The
Nurture Assumption (see "Why Are Birth Order Effects Dependent
on Context?" on this website).
But Sulloway, like most contemporary Americans and Europeans,
believes that the experiences children have with their parents and
siblings leave permanent marks on their personalities and affect the
way they behave in all areas of their adult lives. If this is the
case, why do self-report personality tests fail to show reliable
birth order effects?
Sulloway (BTR, 1998c, 1999) has offered three reasons why we
shouldn't take the results of self-report tests too seriously.
First, he believes that subjects -- at any rate, firstborns -- are
unlikely to respond to them truthfully: "How many firstborns are
willing to describe themselves as `callous' or `unadventurous'?"
(BTR, p. 474). This question betrays, among other things, a lack of
understanding of the way personality tests are constructed and
scored. Subjects' responses are not taken at face value -- a subject
is not scored high in agreeableness because she describes herself as
easy to get along with. Instead, the outcome depends on the
pattern of the subject's responses. Certain patterns have
been found to be associated with certain personality
characteristics. These tests are sophisticated instruments -- honed
over the years to make them more accurate, validated by
cross-checking their results with other methods of assessing
personality. As Jefferson, Herbst, and McCrae pointed out, "There is
vastly more evidence supporting the validity of self-reports than
there is supporting effects of birth order" (1998, p. 507).
Sulloway's second criticism of self-report tests is that studies
that do not directly compare siblings within the same family might
produce spurious results due to "confounding effects associated with
differences between families" (1999, p. 192). This is true. However,
some of the researchers who used self-report questionnaires
(Freese, Powell, & Steelman, 1999; Hauser, Kuo, & Cartmill,
1997) have performed within-family analyses, directly comparing the
responses of siblings in the same family, and nonetheless failed to
find significant birth order effects.
The final reason Sulloway has given (1998c, 1999) for not
trusting the results obtained with self-report personality tests is
that these tests use an "unanchored" method for measuring
personality: the subjects are required to judge themselves against a
hypothetical norm or average, rather than compare themselves to a
specific person such as a sibling. The implication is that
unanchored judgments are less accurate and therefore less valid than
anchored judgments.
There are at least three things wrong with that allegation.
First, a comparison with a hypothetical norm or average (even one
that is estimated by the subject) is a more valid measure than a
comparison with a specific individual, for the simple reason that
individuals vary over a wider range than averages do. Second, the
self-report questionnaires I have seen don't require comparisons
with a hypothetical norm -- they require subjects to express their
degree of agreement or disagreement with statements such as "I feel
nervous when I'm talking to someone I don't know very well," or "I
believe that convicted murderers should be put to death."
Third, as I already mentioned, responses on personality tests
are not taken at face value. They differ in this respect from
questionnaires designed, for example, to find out how much pain the
respondents are feeling or how depressed they are. As psychologist
Linda Bartoshuk has pointed out (in Goode, 2001), we can't conclude
that two people who both give their pain a score of 9 are feeling
the same amount of pain. But that's not how personality tests work.
A tendency to always check off high numbers, for example, is itself
an indication of one aspect of personality. Because these tests look
for patterns of responses and are validated by comparisons with
other ways of assessing personality (ratings by other people, for
example), the absence of an objective standard is not a problem.
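The first of the three points above, that individuals vary over a wider range than averages do, can be illustrated with a small simulation. Everything in it is hypothetical: the trait scores are drawn from a standard normal distribution, and the group size of 25 is arbitrary. It is a sketch of the statistical principle, not a model of any actual personality test.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical trait scores for 10,000 simulated individuals.
scores = [random.gauss(0, 1) for _ in range(10_000)]

# Averages over arbitrary groups of 25 individuals each.
GROUP_SIZE = 25
group_means = [
    statistics.fmean(scores[i:i + GROUP_SIZE])
    for i in range(0, len(scores), GROUP_SIZE)
]

sd_individuals = statistics.pstdev(scores)
sd_means = statistics.pstdev(group_means)

print(f"Spread of individual scores: {sd_individuals:.2f}")
print(f"Spread of group averages:    {sd_means:.2f}")
```

A single comparison person (a sibling, say) is one draw from the wider distribution, so a judgment anchored to that person shifts with whoever the person happens to be; an average is a far more stable reference point.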
In casting doubt on the results of self-report personality
tests, Sulloway has painted himself into a corner. In BTR (p. 68) he
described the "Big Five" theory of personality (McCrae & Costa,
1987) and said he used it as his guide: "Using the Big Five as my
guide, I offer here a psychodynamic account of birth order
differences." The Big Five theory of personality is based primarily
on the results of self-report tests -- the same personality tests
that fail to substantiate Sulloway's theory.
These same tests provide the scientific underpinnings of
Sulloway's theory: "The crux of my argument stems from a remarkable
discovery. Siblings raised together are almost as different in their
personalities as people from different families" (BTR, p. xiii).
Sulloway was exaggerating -- biologically related siblings are a
good deal more alike than people from different families -- but it
is true that there are hard-to-explain personality differences
between siblings. "This finding," Sulloway continued, was "firmly
established by studies in personality psychology." Yes, and most of
these studies used the same self-report personality tests that fail
to substantiate Sulloway's theory. (See "Why Can't Birth Order
Account for the Differences Between Siblings?" on this website.)
Sulloway (1998c) contrasts self-report data with "real-life
data"; he claims his theory is supported by the latter, even if it
is not supported by the former. Real-life data are collected, not in
artificial situations concocted by researchers, but in the world
outside the laboratory. According to Sulloway, "Parents value a
child's doing well in school, so firstborns are conscientious, do
their homework, generally do better at school," whereas laterborns
"are more likely to challenge the status quo, and they are more
likely to cause their parents aggravation by doing all sorts of
outrageous things" (1998b). Real-life data on tens of thousands of
people contradict these statements. On the average, if family size
and social class are controlled, firstborns do no better in school
than laterborns. Laterborns are no more likely than firstborns to
aggravate their parents by underachieving in grade school, dropping
out of high school, or failing to go to college (Blake, 1989;
E&A, 1983; McCall, 1992).
The Historical Evidence in Born to Rebel
I am at a disadvantage in assessing the historical data in BTR;
my field is developmental psychology. But others who have looked
closely at the historical data -- most notably Townsend -- have
found as many anomalies and unanswered questions as I found in
Sulloway's treatment of E&A's data. Townsend's lively and
revealing article is scheduled to appear, possibly with a reply by
Sulloway, in the journal Politics and the Life Sciences.
One problem that several of Sulloway's critics have noted is
that the research methods used to produce his historical data are
particularly vulnerable to the distorting effects of confirmation
bias. For example, Jeremy Freese and his colleagues have pointed out
that Sulloway's distinction between "functional" birth order and
biological birth order is a possible source of confirmation bias,
due to the difficulty of ferreting out information about the family
background of historical subjects. "This raises the possibility,"
they said, "that the `functional' birth status of some sample
members was investigated more thoroughly than others, precisely
because they otherwise would have been exceptions to the study's
general findings" (Freese et al., 1999, p. 213n; italics in the
original).
Critics have also questioned the way Sulloway categorized the
scientific controversies he studied and the scientists who
contributed to them. "It is uncertain whether Sulloway was blind to
birth order of individuals and to their scientific contributions
when making various decisions," David Rowe noted (1997, p. 365).
Michael Ruse commented on the "flexible" nature of Sulloway's
decisions: "When faced with the first-born counter example of James
Watson, surely the author of one of the most significant scientific
breakthroughs of our period, Sulloway denies that this counts as a
genuine revolution!" (Ruse, 1997, p. 373). Similar complaints have
been made about the way Sulloway has explained away other apparent
counter-examples: for example, the revolutionaries Che
Guevara and Mao Tse-tung, who -- inconveniently for Sulloway's
theory -- were firstborns (Townsend, in press).
Although Sulloway consulted 94 historians of science and had
them rate the participants in each scientific controversy (BTR, p.
395), these ratings themselves may be another source of bias. As
Freese and his colleagues pointed out, the method used for obtaining
the historians' ratings makes them susceptible to "interviewer
effects":
One problem is that the ratings from historians were all
obtained through in-person interviews conducted by Sulloway well
after he began constructing his arguments about birth order. . . .
As a result, the possibility of substantial interviewer effects
cannot be ruled out (Freese et al., 1999, pp. 224-225).
Sulloway went to great lengths to collect these ratings in
person:
I flew a quarter of a million miles around the world as I
gathered these expert ratings from scholars in England, France,
Germany, Italy, and America. (Sulloway, 1998b)
Oddly enough, most social scientists -- who ought to know about
such things -- don't worry much about the dangers of confirmation
bias or interviewer effects. Medical researchers tend to be more
cautious. Over the years, they have developed elaborate procedures
designed to prevent researchers' biases from influencing the outcome
of medical studies. The use of these procedures makes modern medical
research a pain in the neck, but they are used nonetheless because
experience has shown that they are necessary. Without them, bias
inevitably creeps in and researchers tend to find the results they
are looking for. As physicist Robert Park has warned,
Alas, many "revolutionary" discoveries turn out to be wrong.
Error is a normal part of science, and uncovering flaws in
scientific observations or reasoning is the everyday work of
scientists. Scientists try to guard against attributing significance
to spurious results by repeating measurements and designing control
experiments, but even eminent scientists have had their careers
tarnished by misinterpreting unremarkable events in a way that is so
compelling that they are thereafter unable to free themselves of the
conviction that they have made a great discovery. Moreover,
scientists, no less than others, are inclined to see what they
expect to see, and an erroneous conclusion by a respected colleague
often carries other scientists along on the road to ignominy. (Park,
2000, p. 9)
The Difficulties of Confirming or Disconfirming Sulloway's Data
When someone thinks up a new theory and collects data to support
it, the usual way of introducing the theory and presenting the data
is to publish an article in an academic journal. In order to get the
article through the process of peer review, certain standards have
to be met. The data, if newly collected by the theorist, have to be
presented in a clear and detailed way -- clear and detailed enough
so that if readers have any doubts they can repeat the study
themselves and see if they get the same results. Either Sulloway
decided not to go that route or he tried it and was turned down. The
article (Sulloway, 1995) in which he first presented the data shown
in Table 4 of BTR was a commentary in Psychological Inquiry
on someone else's article. Commentaries are not ordinarily subjected
to peer review. (I've written three commentaries for academic
journals, including one for Psychological Inquiry; all were
accepted without peer review. My other journal articles [see the
publication list at
tna/bio.htm#pubs] did go
through the peer-review process.)
Skeptical readers of BTR are stymied by the imprecise
descriptions of methods and results. Compare the following two
statements, the first from Sulloway's 1995 commentary, the second
from BTR:
During the century preceding publication of Darwin's Origin
of Species (1859), individual laterborns were four times more
likely than firstborns to support evolution. In some decades, these
group differences were as great as 10 to 1. (Sulloway, 1995, p. 80)
During the long period of debate preceding publication of
Darwin's Origin of Species (1859), individual laterborns were
9.7 times more likely than individual firstborns to endorse
evolution. (BTR, 1996, p. 33)
Was the dramatic increase in laterborn support, from "four times
more likely" to "9.7 times more likely," due to a change in the
timespan being considered -- that is, does the second of these
statements refer to a particular decade, rather than the entire
century? I was unable to answer that question, or to find any other
plausible explanation for the discrepancy, by examining the
information provided in the book.
Nor am I the only one who has been frustrated by the confusing
way data are presented in BTR. Here's historian and sociologist John
Modell:
At an extreme, one has to go to six places -- text, table, table
note, endnote, appendix, and bibliography -- to make sense of a
given operation, a task made all the harder by the author's careless
diction. This heedless intricacy, further exacerbated by the
omission of conventional information about sample sizes, the
distribution of values of variables employed, and the proportion of
values missing and imputed, surely will lead most readers to throw
up their hands, either accepting the author's procedures on faith or
dismissing the book out of hand. (Modell, 1997, p. 625)
Psychologist Toni Falbo:
In fact, there are no tables of descriptive statistics in the
text or the appendixes. Within the text are tables of correlations
and odds ratios and figures generated from sophisticated analyses,
but no simple presentation of the frequency of people falling into
key categories, such as firstborns, later borns, supporters of
radical ideas and their critics. This information is essential for
evaluating the evidence. . . . Considering the cavalier
way that Sulloway uses statistics, it is not surprising that after
presenting all his evidence, he concludes that the "effects of birth
order transcend gender, social class, race, nationality, and -- for
the last five centuries -- time" (p. 356). Anyone who is qualified
to review for an American Psychological Association journal would
find this statement unsupported by the evidence presented. (Falbo,
1997, p. 939)
Sociologists Jeremy Freese and Brian Powell:
Overall, while many appendices are extraordinarily detailed,
Sulloway is frustratingly unclear about some of his most important
measures and the details of his models. (Freese & Powell, 1998,
p. 58)
In an attempt to replicate some of the findings reported in BTR,
Freese and his colleagues did a study of their own (Freese, Powell,
& Steelman, 1999). They used questionnaire data on social
attitudes, collected from 1,945 subjects plus 1,115 siblings of
these subjects, to test Sulloway's claim that firstborns are more
conservative, supportive of authority, and punitive than laterborns.
"We find no support for these claims," these researchers concluded
(p. 207). Even the nonsignificant effects didn't go in the right
direction.
It is never surprising when the originator of a theory produces
evidence that supports the theory. The real test of a theory is
whether other people, working independently of the originator of the
theory, produce evidence that supports it. As psychologist John
McDonagh summed it up, "The dialogue that is the very process of
science insists that studies be replicated by researchers
independent of the original researchers before the scientific
community at large accepts the findings as valid" (2000, p. 678).
Acknowledgments
I thank Jeremy Freese, Charles S. Harris, John Modell, Richard
G. Rich, Carol Tavris, and Frederic Townsend for their helpful
comments on earlier versions of this essay.
References
Blake, J. (1989, July 7). Number of siblings and educational
attainment. Science, 245, 32-36.
Blustein, E. S. (1967). The relationship of sib position in the
family constellation to school behavior variables in elementary
school children from 2-child families. Ph.D. thesis, University of
Maryland. Dissertation Abstracts International, 28-B,
3046-3047 (1968).
Buss, D. M. (1995). Evolutionary psychology: A new paradigm for
psychological science. Psychological Inquiry, 6, 1-30.
Corsello, P. (1973). Birth order and children's perceptions of
love, authority, and personal adjustment. Dissertation Abstracts
International, 34-A, 3132.
Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997,
September 13). Bias in meta-analysis detected by a simple, graphical
test. British Medical Journal, 315, 629-634.
Ernst, C., & Angst, J. (1983). Birth order: Its influence
on personality. Berlin, Germany: Springer-Verlag.
Falbo, T. (1997). To rebel or not to rebel? Is this the birth
order question? Contemporary Psychology, 42, 938-939.
Freese, J., & Powell, B. (1998). Review of Born to
Rebel. Contemporary Sociology, 27, 57-58.
Freese, J., Powell, B., & Steelman, L. C. (1999). Rebel
without a cause or effect: Birth order and social attitudes.
American Sociological Review, 64, 207-231.
Goode, E. (2001, January 2). Researcher challenges a host of
psychological studies. New York Times, pp. C1, C7.
Harris, J. R. (1998a, June). How is personality formed?
(commentary on an online talk by Frank J. Sulloway). Edge
(http://www.edge.org/3rd_culture/harris/harris_index.html).
Harris, J. R. (1998b).
The nurture assumption: Why children turn out the way they do.
New York: Free Press.
Harris, J. R. (1999, June). Children don't do things half way: A
talk with Judith Rich Harris. Edge
(http://www.edge.org/3rd_culture/harris_children/harris_p1.html).
Harris, J. R. (2000). Context-specific learning, personality,
and birth order. Current Directions in Psychological Science,
9, 174-177.
Hauser, R. M., Kuo, H.-H. D., & Cartmill, R. S. (1997,
March). Birth order and personality among adult siblings: Are
there any effects? Paper presented at the annual meeting of the
Population Association of America, Washington, DC.
Ioannidis, J. P. A. (1998, January 28). Effect of the
statistical significance of results on the time to completion and
publication of randomized efficacy trials. Journal of the
American Medical Association, 279, 281-286.
Jefferson, T., Jr., Herbst, J. H., & McCrae, R. R. (1998).
Associations between birth order and personality traits: Evidence
from self-report and observer ratings. Journal of Research in
Personality, 32, 498-509.
Koch, H. L.: E&A list 13 different publications in which
Koch reported the results of her study of 384 five- and
six-year-olds. For example: Koch, H. L. (1955). Some personality
correlates of sex, sibling position, and sex of sibling among five-
and six-year-old children. Genetic Psychology Monographs, 52,
3-50.
LeLorier, J., Grégoire, G., Benhaddad, A., Lapierre, J.,
& Derderian, F. (1997, August 21). Discrepancies between
meta-analyses and subsequent large randomized, controlled trials.
New England Journal of Medicine, 337, 536-542.
Macbeth, B. L. (1975). Birth order, personality, and scholastic
aptitude. Thesis, Department of Psychology, University of Oregon.
Dissertation Abstracts International, 36-B, 4757 (1976).
McCall, R. B. (1992). Academic underachievers. Current
Directions in Psychological Science, 3, 15-19.
McCrae, R. R., & Costa, P. T., Jr. (1987). Validation of the
five-factor model of personality across instruments and observers.
Journal of Personality and Social Psychology, 52, 81-90.
McDonagh, J. (2000). Science without a degree of objectivity is
dead. American Psychologist, 55, 678.
Modell, J. (1997, January 31). Family niche and intellectual
bent (review of Born to Rebel). Science, 275,
624-625.
Park, R. (2000). Voodoo science: The road from foolishness to
fraud. New York: Oxford University Press.
Petticrew, M. (2001, January 13). Systematic reviews from
astronomy to zoology: Myths and misconceptions. British Medical
Journal, 322, 98-101.
Price, J. (1969). Personality differences within families:
Comparison of adult brothers and sisters. Journal of Biosocial
Science, 1, 177-205.
Rosenthal, R. (1987). Judgment studies: Design, analysis, and
meta-analysis. Cambridge, UK: Cambridge University Press.
Rowe, D. C. (1997). Review of Born to Rebel. Evolution
and Human Behavior, 18, 361-367.
Ruse, M. (1997). Review of Born to Rebel. Evolution
and Human Behavior, 18, 369-373.
Sulloway, F. J. (1995). Birth order and evolutionary psychology:
A meta-analytic overview (commentary on target article by Buss).
Psychological Inquiry, 6, 75-80.
Sulloway, F. J. (1996). Born to rebel: Birth order, family
dynamics, and creative lives (hardcover edition). New York:
Pantheon.
Sulloway, F. J. (1997). Born to rebel: Birth order, family
dynamics, and creative lives (paperback edition). New York:
Vintage.
Sulloway, F. J. (1998a, January). Referee's Report on Judith
Harris's "Personality and Birth Order: The Remarkable Resilience of
Deeply Held Beliefs." Unpublished document: peer review of a
manuscript submitted to Psychological Science.
Sulloway, F. J. (1998b, May). How is personality formed? A talk
with Frank Sulloway. Edge
(http://www.edge.org/3rd_culture/sulloway/index.html).
Sulloway, F. J. (1998c, November). Birth order and the nurture
misassumption: A reply to Judith Harris. Edge
(http://www.edge.org/3rd_culture/sulloway_harris/).
Sulloway, F. J. (1999). Birth order. In M. A. Runco & S.
Pritzker (Eds.), Encyclopedia of creativity (vol. 1, pp.
189-202). San Diego, CA: Academic Press.
Sutton, A. J., Duval, S. J., Tweedie, R. L., Abrams, K. R.,
& Jones, D. R. (2000, June 10). Empirical assessment of effect
of publication bias on meta-analyses. British Medical
Journal, 320, 1574-1577.
Townsend, F. (in press). Birth order and rebelliousness:
Reconstructing the research in Born to Rebel. Politics and
the Life Sciences. (Townsend's email address:
FredT17@aol.com)
Yang, K. S., & Liang, W. H. (1973). Some correlates of
achievement motivation among Chinese high school boys. Acta
Psychologica Taiwanica, 15, 59-67.