The Mystery of Born to Rebel:
Sulloway's Re-Analysis of Old Birth Order Data

by Judith Rich Harris

In my previous three essays on this website, I explained that birth order effects can be detected when people are in the presence of their parents and siblings but not in other social contexts, and that standard personality tests generally show no birth order effects. Yet in the book Born to Rebel (1996), Frank J. Sulloway claimed that birth order effects are ubiquitous and important -- capable of changing the very course of history! -- and presented a plethora of data to back up his claim. Before I can dismiss birth order effects as unimportant, I have to look carefully at those data and consider what Sulloway said about them in Born to Rebel (BTR for short) and elsewhere.

According to Sulloway, children develop different personalities depending on their birth order. Their childhood experiences with parents and siblings affect the things they do, the attitudes they hold, and the way they behave to others all through life. Sulloway characterizes firstborns as ambitious, domineering, jealous, aggressive, conventional, and close-minded; he depicts laterborns as rebellious, adventurous, agreeable, sympathetic, and receptive to new ideas. I might as well confess right off the bat: I am a firstborn. Sulloway, as you may have guessed, is a laterborn.

Though much of Sulloway's data is historical, the part of his book that made the biggest impression on readers in the academic world was his re-analysis of data from a 1983 book on birth order by the Swiss researchers Cécile Ernst and Jules Angst (here called E&A). E&A's book includes a comprehensive review of hundreds of previous birth order studies. The studies reviewed by E&A, now 22 to 56 years old, are the only "modern" data Sulloway presents in BTR -- the only data that come from systematic studies of the personality or behavior of ordinary people. Sulloway's re-analysis of the studies in E&A's review is the keystone of his evidence. The results of this re-analysis were presented in Table 4 (p. 73) of BTR and, a year earlier, in a commentary (Sulloway, 1995) on an article by David Buss in the journal Psychological Inquiry.

E&A did a thorough and conscientious job of reviewing the birth order literature. They examined research reports from all over the world, placing the most weight on carefully done studies that controlled for sibship size (the number of offspring in a family) and socioeconomic status. Here is their conclusion:

Birth order and sibship size do not have a strong impact on personality. . . . An environmental variable that is considered highly relevant is thus disaffirmed as a predictor for personality and behavior. (E&A, 1983, p. 284, emphasis theirs)

But Sulloway re-analyzed the data from the same set of studies and came to a very different conclusion. He maintains that if the data from the well-done studies are combined in what he refers to as a "meta-analysis," the following result pops out clearly:

In spite of occasional negative findings, the literature on birth order exhibits consistent trends that overwhelmingly exceed chance expectations. (Sulloway, 1996, pp. 72-74).

How did Sulloway look at the same data -- old data that had already been winnowed many times by others -- and find gold where the others had found only chaff? Why have other investigators been unable to figure out where he got the numbers in Table 4 of BTR? In an effort to answer these questions I will look at what Sulloway has said, in BTR and elsewhere, about his re-analysis of E&A's data, and will consider the various explanations he has given for the lack of agreement. I'll also comment on his research methods and, briefly, on the historical data in BTR.

Ernst & Angst Didn't Do a Meta-Analysis

Sulloway has pointed out several times (1995, p. 77; 1996, p. 472; 1998c) that E&A conducted their 1983 review of the birth order literature without the benefit of meta-analysis. "In a postscript to the preface of their book," he wrote, "Ernst and Angst regretted that meta-analytic methods had become available only as they were completing their study and hence were not employed by them" (1995, p. 77). One is left with the impression that it was the unavailability of the technique Sulloway used that caused E&A to conclude -- incorrectly, in his opinion -- that birth order has little or no effect on adult personality.

It is true that E&A expressed regret about not having used "the excellent meta-analytic method" (1983, p. xi) that was just coming into use in the early 1980s. But they were referring to the sophisticated statistical procedure, new at the time, for combining multiple results by taking into account the magnitude of the effect found in each study and the number of subjects who participated in that study. E&A were not expressing regret that they hadn't used the method called "vote-counting" -- the method Sulloway actually used. Vote-counting involves a simple tally of positive, negative, and no-difference results (or just positive and negative); plus a simple calculation, using binomial statistics, to determine whether the outcome deviates significantly from chance expectations. Though Sulloway called his tally of E&A's data a "meta-analysis," that term is ordinarily restricted to the method that takes into account effect size and number of subjects.

If E&A had wanted to use the vote-counting method they could certainly have done so, because it's as old as the hills. Their decision not to use it was a wise one. As medical researcher Mark Petticrew (2001) has pointed out, it is often inappropriate, for a variety of reasons, to use statistical procedures to pool data in a systematic review of the literature. "Systematic reviews should not therefore be seen as automatically involving statistical pooling," Petticrew explained, "as narrative synthesis of the included studies is often more appropriate and sometimes all that is possible" (p. 100). Furthermore, systematic reviews of the literature, with or without a meta-analysis, "are not intended to be a substitute for primary research" (p. 101). Because the studies combined in a meta-analysis tend to be of variable quality, it is not uncommon for meta-analyses to produce results that are later contradicted by larger, more carefully done studies (LeLorier et al., 1997). It is the large, carefully done study (in medical research, the randomized controlled trial), not the meta-analysis, that biomedical scientists regard as the "gold standard."

E&A were well aware of the shortcomings in the research studies they reviewed. To check on the conclusions they drew from their review, they carried out a large, carefully done study of their own. Their study supported their conclusions and was reported in the same book. I will return to E&A's study later in this essay.

In any case, the vote-counting method Sulloway used was available to E&A, though they chose not to use it. Since neither E&A nor Sulloway carried out a real meta-analysis, the unavailability of this method before 1980 is not a viable explanation for the difference in their conclusions.

The Numbers in Table 4

The outcome of Sulloway's vote-count is displayed in Table 4 of BTR (1996, p. 73). This table is headed "Summary of 196 Controlled Birth-Order Studies, Classified According to the Big Five Personality Dimensions." It shows a total of 72 "confirming" results (that is, favorable to Sulloway's theory), 14 "negating" results (opposite to those predicted by his theory), and 110 "no difference" results (also unfavorable to his theory, since his theory predicts a difference). In the text, Sulloway reports that "72 of the 196 studies display significant birth-order results that are consistent with my psychodynamic hypotheses." The likelihood of obtaining this result by chance, he says, is "less than 1 in a billion billion" (p. 72).
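The significance claim attached to this vote-count can be checked with a few lines of arithmetic. The sketch below is an illustration, not Sulloway's actual calculation. It assumes that each of the 196 findings is an independent trial and that, if birth order had no effect, a finding would have a 5 percent chance of coming out significant in the predicted direction. Neither assumption is stated in BTR; the function name `binomial_upper_tail` and the value of `chance_rate` are mine.

```python
from math import comb

def binomial_upper_tail(successes: int, trials: int, chance_rate: float) -> float:
    """Probability of at least `successes` hits in `trials` independent tries."""
    return sum(
        comb(trials, k) * chance_rate**k * (1 - chance_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Counts from Table 4 of BTR; chance_rate is an assumed value, not from the source.
confirming, total_findings = 72, 196
chance_rate = 0.05

tail = binomial_upper_tail(confirming, total_findings, chance_rate)
print(f"P(at least {confirming} of {total_findings}) = {tail:.3g}")
print("Below 1 in a billion billion (1e-18)?", tail < 1e-18)
```

Under these assumptions the probability is indeed minuscule, but the figure is only as trustworthy as the assumption that every one of the 196 votes is independent of the others.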

In January 1997, BTR was reviewed in the journal Science by the sociologist and historian John Modell. It was this review that first drew my attention to Table 4. Referring to Sulloway's re-analysis of E&A's data, Modell said, "I was persuaded by Sulloway's reworking of these materials -- until I tried to replicate it with [E&A's] literature review in hand. I could not do so, try as I might, or even come near" (1997, p. 624).

It is unusual for a person writing a book review to attempt to replicate an analysis reported in the book. Perhaps Modell wouldn't have felt it necessary if Sulloway had provided the information that normally accompanies the published report of a meta-analysis. But the voluminous appendixes and endnotes of BTR do not contain what psychologist Toni Falbo, in another critical review of BTR, described as "an essential element in any presentation of meta-analytic results" (1997, p. 939): a list of the studies that were included.

To the best of my knowledge (as of May 22, 2002), Sulloway has still not released his list of the "196 controlled birth-order studies" he found in E&A. In the absence of this list, attempts to figure out how and where he got the numbers in Table 4 must be based on what he has said, in BTR and elsewhere, about the method he used to re-analyze the data from E&A's survey of the birth order literature.

At least three attempts have been made to replicate Sulloway's re-analysis. Modell's was the first. The second was mine, described in Appendix 1 of The Nurture Assumption (Harris, 1998b). The third, by Frederic Townsend (in press), has not yet been published. All three attempts failed. That is, all three produced tallies that do not match the numbers in Table 4 of BTR.

In considering Sulloway's criticisms of these attempts, it is important to understand what Modell, Townsend, and I were trying to do. We weren't trying to re-analyze the data from E&A's survey: we were simply trying to reproduce Sulloway's methods in an effort to replicate his results.

"Studies" versus "findings."   Table 4 of BTR (1996), and a nearly identical table in Sulloway's 1995 commentary, are headed "Summary of 196 Controlled Birth-Order Studies." But a footnote under these tables (1995, p. 78n; BTR, p. 73n) conveys this information: "Each reported finding constitutes a 'study.'"

Thus, Table 4 actually contains 196 findings, not 196 studies. The distinction is important, because a single study (employing a single sample of subjects) can generate multiple findings. If a researcher found, for example, that a particular sample of firstborns was more conservative, punitive, and nervous than the laterborns in the study, that would be three positive findings. But what if the researcher found that the firstborns in that sample were more conservative, punitive, nervous, self-centered, ambitious, jealous, domineering, organized, and parent-oriented? How many positive findings could a single study contribute to Sulloway's tally? As far as I know, Sulloway has never answered this question. Thus, the uncertainties about which studies were included in his tally and which were counted as "confirming" are compounded by an additional uncertainty: How many findings did each study contribute? How many studies contributed the 196 findings? The number of studies must have been less than 196, but how much less?

The hardcover edition of BTR (1996) doesn't offer many clues to Sulloway's methods. The note under Table 4 just says: "Data are tabulated from Ernst and Angst (1983:93-189), using only those studies controlled for social class or sibship size."

In pages 93 through 189 of E&A's book, hundreds of studies are reviewed. Twenty-six tables, many of which run to two pages, give lists of studies, specify whether or not they were controlled for sibship size and social class, and summarize their results. Studies are also described in the text. Information provided in the text isn't necessarily included in the tables and vice versa.

I spent days combing through those 97 pages of E&A, searching for findings from studies controlled for sibship size or social class or both. Sulloway reported 110 no-difference findings; I counted 109. Sulloway reported 14 negative findings; I counted 13. So far so good. But Sulloway reported 72 positive findings and I counted only 52. Five other studies yielded no clear-cut outcome in regard to Sulloway's theory. Altogether I tallied 179 findings, versus Sulloway's 196.
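The two tallies can be laid side by side to show where the gap lies. This is a small bookkeeping sketch using only the counts given above; the dictionary keys are my own shorthand, not labels from BTR or E&A.

```python
# Tallies as reported in the text; the category labels are my own shorthand.
sulloway = {"positive": 72, "negative": 14, "no_difference": 110}
harris = {"positive": 52, "negative": 13, "no_difference": 109, "unclear": 5}

print("Sulloway total:", sum(sulloway.values()))
print("Harris total:  ", sum(harris.values()))

# Gap per category (Sulloway minus Harris); Sulloway has no "unclear" category.
gaps = {key: sulloway.get(key, 0) - harris[key] for key in harris}
for key, gap in gaps.items():
    print(f"{key:>13}: {gap:+d}")
```

Nearly all of the disagreement sits in the positive column; the negative and no-difference counts differ by one apiece.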

In an online essay, Sulloway (1998c) offered an explanation for the unsuccessful attempts to replicate his re-analysis of E&A's data. Modell (1997), he said, had erred by counting studies, not findings. "Modell had overlooked a crucial footnote at the bottom of the table in which I presented my meta-analytic findings," Sulloway alleged. "Because Modell counted studies -- ignoring multiple findings in the same study -- he naturally obtained different totals."

The trouble with this explanation is that overlooking the footnote under Sulloway's table doesn't result in counting studies instead of findings: it results in counting findings and thinking they are studies. When E&A reviewed studies that produced multiple findings, they almost always reported each finding in a different section of their chapter. For example, Blustein's (1967) result for conformity is reported in E&A's Table 25 (p. 126), her result for academic motivation is reported in Table 28 (p. 138), and her result for academic self-esteem is reported in Table 33 (p. 156). It is possible to tabulate all the findings reported by E&A without noticing that some researchers' names are repeated several times. Koch's sample of 384 five- and six-year-olds contributed eleven findings to E&A's chapter (and to my tally); Macbeth's 981 college students contributed ten.

Counting findings and thinking they are studies wouldn't affect the outcome of Modell's tallies and leaves his failure to replicate Sulloway's vote-count unexplained. The accusation that Modell overlooked the footnote is a red herring.

I got slapped with the same red herring. "Unfortunately," Sulloway (1998c) wrote, "Judith Harris followed in Modell's methodological footsteps, basing her own meta-analytic counts on 'studies.' She did so in spite of being fully aware that my own counts were by 'findings.' Additionally, she made no effort to ascertain how these two alternative methods of counting might differ in their outcomes." That accusation is demonstrably false; it is directly contradicted by what I said in Appendix 1 of The Nurture Assumption (pp. 368-370). As I explained there, I counted findings in E&A and then figured out how many studies these findings came from. I even reported both results: 179 findings, 116 studies.

The accusation that Modell and I ignored the distinction between studies and findings is all the more maladroit in view of Sulloway's own failure to distinguish clearly between them. Consider these statements:

If we decide to ignore all birth-order studies [in E&A's survey] that were uncontrolled either for social class or sibship size, we are left with 196 well-designed studies. . . . involving 120,800 subjects. (Sulloway, 1995, p. 77)

If we ignore all birth-order findings that lack controls for social class or sibship size, 196 controlled studies remain in Ernst and Angst's survey, involving 120,800 subjects. (Sulloway, BTR, p. 72)

If we take the remaining 196 studies [in E&A's survey] that are controlled for class and sibship size, we may ask how many significant findings are there in this set of 196 studies. (Sulloway, 1998b)

But there weren't 196 well-designed studies, only 196 findings. And there weren't 120,800 subjects, because there weren't 196 studies. Sulloway's repeated use of the word "studies" is misleading but not outright wrong, because in a footnote under a table he redefined "studies" to mean "findings." But there is no way to redefine the word "subjects" that would make "120,800 subjects" true. The number of subjects depends on the number of studies, not on the number of findings. If counting a subject ten times (in the case of a study that produced ten findings) turned him or her into ten subjects, researchers could save themselves a lot of trouble.

Now we can see why E&A refrained from using the vote-counting procedure. Testing the significance of results obtained by a vote-counting procedure (using binomial statistics) requires the assumption of independence: each study -- each sample of subjects -- can contribute only one vote to the total. Multiple measures of the same sample of subjects are not independent. Sulloway's use of binomial statistics (the formulas he used to produce statements such as "less than 1 in a billion billion") was an error in elementary statistical reasoning.
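A toy calculation shows why the independence assumption matters. The numbers below are hypothetical and the clustering is deliberately extreme (every finding from a given sample comes out the same way), so this is a sketch of the principle rather than a model of the studies in E&A. The names `upper_tail`, `n_studies`, `findings_per_study`, and `chance_rate` are mine.

```python
from math import ceil, comb

def upper_tail(at_least: int, trials: int, rate: float) -> float:
    """P(X >= at_least) for X ~ Binomial(trials, rate)."""
    return sum(
        comb(trials, k) * rate**k * (1 - rate) ** (trials - k)
        for k in range(max(at_least, 0), trials + 1)
    )

# Hypothetical numbers, chosen only for illustration.
n_studies = 20            # independent samples of subjects
findings_per_study = 10   # measures taken on each sample
chance_rate = 0.05        # chance of a spurious significant result
n_findings = n_studies * findings_per_study

# Smallest vote count that a naive binomial test over all findings
# would call significant at the 5 percent level.
critical = next(
    c for c in range(n_findings + 1)
    if upper_tail(c, n_findings, chance_rate) <= 0.05
)

# If findings really were independent, the false-alarm rate stays at or below 5%.
independent_rate = upper_tail(critical, n_findings, chance_rate)

# Extreme clustering: all findings from one sample succeed or fail together,
# so the vote count is findings_per_study times the number of "lucky" samples.
lucky_needed = ceil(critical / findings_per_study)
clustered_rate = upper_tail(lucky_needed, n_studies, chance_rate)

print("critical vote count:", critical)
print(f"false-alarm rate, independent findings: {independent_rate:.3f}")
print(f"false-alarm rate, clustered findings:   {clustered_rate:.3f}")
```

With these made-up numbers, a vote-count that the binomial test would call significant turns up by chance several times more often than the nominal 5 percent once the findings are clustered within samples.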

Publication bias and the file-drawer test.   The most disturbing example of the blurring of the distinction between "studies" and "findings" involves Sulloway's use of the "file-drawer test." In his online essay (1998c) he took me to task for my failure to bring his passing mark on the file-drawer test to the attention of my readers. I will make up for that oversight now.

The file-drawer test is a method for dealing with a scientific hazard, well known to statisticians, called publication bias -- the tendency for statistically significant results to be published and nonsignificant results, or results that are contrary to expectations, to be stuffed in a file drawer and forgotten. As I explained in my previous essay (see "Why Do People Believe that Birth Order Has Important Effects on Personality?" on this website), publication bias is a serious problem in medical research, where a no-difference result can be important. For example, if there is no difference in survival rates between patients who do and do not undergo a certain surgical procedure, this is something that physicians and patients would like to know. And yet, even in medicine, studies that fail to find a significant difference are less likely to be published and, if published, are slower to appear in print (Ioannidis, 1998). Publication bias is one of the reasons why meta-analyses of medical data sometimes lead to incorrect conclusions (Sutton et al., 2000). When enough no-difference and negative findings are left out, the overall outcome of a meta-analysis can be shifted from no-difference to positive.

Medical researchers use a statistical technique called a funnel plot to test for publication bias in a meta-analysis (Egger et al., 1997). The size of the effect obtained in each study is plotted against the number of subjects who participated in that study. Publication bias results in a dearth of small studies showing small or no-difference effects, because a study that yields nonsignificant results (or results that are contrary to expectations) is less likely to be published if it involved a relatively small number of subjects. Thus, if effect sizes tend to be larger for studies with small samples (causing the funnel-shaped plot to lean to one side), it's an indication of publication bias. In the table on page 372 of The Nurture Assumption, I demonstrated a publication bias in the birth order studies reviewed by E&A: 40 percent of findings from small studies, but only 19 percent from large ones, were significant and positive.
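The mechanism behind that small-versus-large comparison can be imitated with a short simulation. It is a sketch under invented assumptions, not a re-analysis of E&A: there is no true effect in any simulated study, significant results are always published, and nonsignificant results are published with a probability I made up (lower for small studies than for large ones). It does not model effect sizes, so it is not itself a funnel plot.

```python
import random

def simulate_published_hit_rate(n_studies, publish_null_prob, rng):
    """Share of published studies that are 'significant and positive'
    when there is no true effect at all."""
    published = significant = 0
    for _ in range(n_studies):
        z = rng.gauss(0.0, 1.0)          # test statistic under the null
        is_hit = z > 1.645               # one-sided 5 percent criterion
        if is_hit or rng.random() < publish_null_prob:
            published += 1
            significant += is_hit
    return significant / published

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed publication odds for nonsignificant results (not from the source):
small = simulate_published_hit_rate(20_000, publish_null_prob=0.2, rng=rng)
large = simulate_published_hit_rate(20_000, publish_null_prob=0.9, rng=rng)

print(f"published small studies, significant and positive: {small:.1%}")
print(f"published large studies, significant and positive: {large:.1%}")
```

Even though nothing real is going on in the simulated data, the published small studies show a much higher rate of positive findings than the published large ones, which is the pattern a lopsided funnel plot reflects.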

According to Sulloway (1995, p. 79; BTR, p. 472; 1998c), publication bias cannot account for the results of his re-analysis of the studies reviewed by E&A, because his results passed the file-drawer test.

The file-drawer test was devised by psychologist Robert Rosenthal (1987) as a quick and easy way of estimating the probability that there are enough nonsignificant studies sitting around in file drawers to invalidate the results of a meta-analysis. All one needs to do is to compare two numbers: the number of unpublished studies it would take to invalidate the results of a meta-analysis of published studies, and the estimated number of unpublished studies sitting around in file drawers. To calculate the first number, Rosenthal gave this simple formula: 19s - n, where s is the number of significant published studies and n is the number of nonsignificant published studies. The formula for the second number is equally simple: 5K + 10, where K is the total number of published studies. What Sulloway apparently did was to make s = 72, n = 124, and K = 196; the results he obtained were 1,244 and 990. He concluded, "This number (1,244) exceeds 990, indicating that the published findings pass the file-drawer test" (Sulloway, 1995, p. 79).

But 196, 72, and 124 aren't numbers of studies -- they're numbers of findings! Sulloway nonetheless plugged those numbers into Rosenthal's formulas, which he found on pages 224 and 225 of Rosenthal's book, disregarding Rosenthal's words of warning on page 224. Rosenthal clearly stipulated that his formulas are based on the assumption "that each of the K studies is independent of all other K - 1 studies, at least in the sense of employing different sampling units."

Sulloway's 196 findings did not employ 196 different sampling units -- there were not 196 different samples of subjects. It was wrong to use Rosenthal's formulas on those data.
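The arithmetic of the file-drawer calculation is easy to reproduce. The sketch below uses the two formulas as quoted above; the function names are mine. It shows only that plugging in numbers of findings yields 1,244 and 990. It cannot show what the test would give with numbers of studies, because the counts of significant and nonsignificant studies behind Table 4 have not been made available.

```python
def tolerance_for_null_results(significant: int, nonsignificant: int) -> int:
    """Unpublished null results needed to overturn the outcome: 19s - n."""
    return 19 * significant - nonsignificant

def estimated_file_drawer(total_published: int) -> int:
    """Estimated number of unpublished results, per the 5K + 10 formula."""
    return 5 * total_published + 10

# Values Sulloway apparently used; these are counts of findings, not studies.
s, n = 72, 124
K = s + n

needed = tolerance_for_null_results(s, n)
benchmark = estimated_file_drawer(K)

print("19s - n =", needed)
print("5K + 10 =", benchmark)
print("passes the test as computed:", needed > benchmark)
```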

How many votes?   Another claim that Sulloway made is that I undercounted findings in E&A's review of birth order studies:

If, for example, a given study reported that firstborns are more conscientious than laterborns, but also more agreeable, I counted one confirmation and one refutation (in accordance with my formal hypotheses for these two dimensions). By contrast, Harris classified such "mixed" results as a single null [no-difference] outcome. (Sulloway, 1998c)

This accusation is another red herring. E&A almost always reported results for different aspects or measures of personality in separate sections of their chapter and in different tables, even if the results came from the same study. When they did this, I recorded each finding as a separate vote. For example, Macbeth (1975) gave her subjects a number of different tests, assessing their affiliative need, achievement motivation, political views, vocational interests, originality, and so on. The results of each of these tests -- mostly no-difference -- were reported separately in E&A, and each contributed a vote to my tally.

Occasionally E&A did report multiple findings from a single study in the same table. For these cases Sulloway is correct: I did give these studies a single vote. However, my undercounting mainly involved no-difference findings, not positive findings, and thus fails to explain why my vote-count didn't agree with his. My tally produced about the same number of no-difference findings as Sulloway's; the discrepancy involved only the number of positive findings.

By way of illustration, consider the aforementioned Macbeth (1975). In addition to the other kinds of tests Macbeth gave her subjects, she also gave them several standard personality tests. According to E&A (Table 37, p. 170), these tests turned up no reliable differences between firstborns and laterborns on any dimension of personality. I gave this outcome one no-difference vote. In an unpublished document I received from him in 1998, Sulloway (1998a) said that he gave Macbeth's results for the MMPI (one of the personality tests she used) five no-difference votes -- one for each of the five dimensions of personality. That would have increased Sulloway's no-difference vote by four relative to mine, but doesn't help to explain why he counted more positive findings than I did.

My most questionable decision involved a study by Price (1969). Price asked parents to judge their children's personalities and found, according to E&A (Table 37, p. 170), that parents considered their firstborns to be "more introverted, nervous, precocious, and conforming" than their laterborns. Because I regarded Price's result as favorable to Sulloway overall (despite the result for introversion, which is contrary to Sulloway's predictions), I gave Price one positive vote. In hindsight, I probably should have given this study one negative vote for introversion and two positives for nervousness and conformity ("precocious" is not a personality characteristic). Fortunately, in the unpublished document he sent me, Sulloway (1998a, pp. 4-5) explained how he decided on the number of votes to give Price. The reasoning was intricate but the outcome was that he awarded Price one positive vote, one negative vote, and three no-difference votes. Thus, the difference between the way I counted Price (one positive vote) and the way Sulloway said he counted Price involved only the number of negative and no-difference votes. Again, the difference in our procedures doesn't account for the difference in our tallies, which involved only the number of positive votes.

The unpublished document in which Sulloway revealed how he tallied Macbeth's and Price's findings was titled "Referee's Report." I'll now explain the origins of this document.

The Referee's Report.   In October 1997, I submitted a manuscript on birth order, including a highly critical review of BTR, to the journal Psychological Science. It was turned down on the basis of a single peer review. The reviewer was Frank J. Sulloway. Sulloway's 20-page review, with 14 footnotes, bore the heading "Referee's Report on Judith Harris's 'Personality and Birth Order: The Remarkable Resilience of Deeply Held Beliefs.'" He mailed a copy of this document to me and another to the editor of the journal, who sent me a second copy at the time he rejected my manuscript. The review was signed. (Although the identity of reviewers is usually concealed, they are free to reveal themselves if they wish to do so.)

The copy I received directly from Sulloway was accompanied by a letter (January 25, 1998) in which he explained that if my manuscript were published in a journal he would want to "revise and expand [his] referee's report for some form of publication." However, I did not submit my manuscript to another journal; instead, I revised it and it became Appendix 1 of The Nurture Assumption. Consequently, as far as I know, Sulloway's report was never published. In the endnotes of The Nurture Assumption (p. 416) I cited the Referee's Report as an unpublished manuscript. Sulloway referred to it in one of his online essays (1998c), though he didn't give it a name.

Interactions.   The Referee's Report contained two bombshells. The first had to do with how Sulloway handled statistical interactions. An interaction occurs when different subgroups of subjects produce different results for the same outcome variable: for example, a significant difference between firstborns and laterborns might be found for female subjects but not for males; or a significant positive difference (that is, a difference in line with predictions) might be found for middle-class subjects, while working-class subjects yield a significant difference in the opposite direction.

Sulloway correctly surmised that I gave such studies a single vote: either one positive or negative vote (when one subgroup produced significant results and the other did not), or one no-difference vote (when the two subgroups produced results that went in opposite directions). Bear in mind that I was not trying to analyze the data in E&A's book: I was trying to replicate Sulloway's method in an effort to figure out where he got his numbers. If I had counted the first kind of interaction as one positive and one no-difference result, I would have ended up with too many no-difference results. If I had counted the second kind of interaction as one positive and one negative, I would have ended up with too many negative results. My tally already included about as many no-difference and negative results as his.

That's why I was flabbergasted by Sulloway's statement in his Referee's Report about how he handled interactions. "By way of illustration," he said, "I coded all two-way interaction effects as either one confirmation and one refutation, or as one null and one significant finding (as determined by an inspection of the means). Three-way interaction effects were coded for all four outcomes" (1998a, p. 4).

Whoosh! The number of potential votes for Sulloway's tally had suddenly multiplied like fruit flies in July. The question now became: If a single study could contribute one vote for each measured outcome variable, times two or four for each interaction, why were there only 196 entries in Table 4 of BTR? In particular, since most interactions consist of a positive or negative finding plus a no-difference finding, why weren't there many more no-difference findings in his tally?

The revelation about how Sulloway treated interactions meant that the distinction between "studies" and "findings" assumed even greater importance. The more findings a study was permitted to contribute to Sulloway's vote-count, the fewer the studies that must have contributed them. This is not a trivial matter. Sulloway mentioned in BTR (p. 76) that Koch's study produced 31 significant interactions involving birth order: "These nonadditive effects involved birth order, subject's sex, sibling's sex, and age spacing -- interacting in pairs, triplets, and even foursomes" -- that is, two-way, three-way, and four-way interactions. A four-way interaction, by the rule Sulloway gave in his Referee's Report, would produce eight findings. How many votes did Koch's sample of 384 five- and six-year-olds contribute to Table 4? We don't know, because Sulloway hasn't provided a list of his 196 findings and there's no way of figuring it out from the information he did provide.
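The coding rule implies a doubling with each added variable: two findings for a two-way interaction, four for a three-way, and, by extension, eight for a four-way. A minimal sketch of that rule follows. The function name is mine, and the four-way figure is an extrapolation from the rule quoted in the Referee's Report, not something Sulloway stated directly.

```python
def findings_from_interaction(n_way: int) -> int:
    """Votes generated by one n-way interaction under the doubling rule."""
    if n_way < 2:
        raise ValueError("an interaction involves at least two variables")
    return 2 ** (n_way - 1)

for n_way in (2, 3, 4):
    print(f"{n_way}-way interaction -> {findings_from_interaction(n_way)} findings")
```

Because the mix of two-way, three-way, and four-way interactions among Koch's 31 is not given, the total number of votes her study could have generated under this rule cannot be computed from the published information.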

Even if Sulloway imposed some limit on the total number of votes a single study could contribute, it is wrong to count interactions in the way he described, because it allows a study that resulted in an interaction to contribute more votes to the tally than a study that produced a simple no-difference result. If Sulloway counted a study that produced a positive result for females and a no-difference result for males as one positive vote and one no-difference vote, then he should have counted a study that produced no-difference results for both sexes as two no-difference votes.

In any event, the way I treated interactions cannot account for the discrepancy between my tally and his, because the discrepancy involved the number of favorable votes, not the number of negative or no-difference votes. But Sulloway's claim that he counted many more findings per study than I did implies that his tally must have included considerably fewer studies than mine (I had calculated that the 179 findings in my tally came from 116 studies), which raises the question of how he decided which studies to include and which to leave out.

The "Errors" in E&A.   The second bombshell in the Referee's Report (Sulloway, 1998a) had to do with the "errors" in E&A. According to Sulloway, another reason why my tallies don't agree with his is that I accepted E&A's summaries of the studies they reviewed, whereas he consulted the original documents -- which included unpublished doctoral and master's dissertations, abstracts of talks given at professional meetings, and articles published in obscure foreign journals such as Acta Psychologica Taiwanica. Often, he said, these documents produced evidence that, in his judgment, conflicted with E&A's reports. When that occurred, he "rectified" E&A's "errors."

Sulloway's methodology, which I was trying to reproduce in order to confirm or disconfirm his tallies, was looking more and more like a moving target. In his 1995 commentary and in the hardcover edition of BTR (1996), the footnote under the table simply said "Data are tabulated from E&A (1983:93-189)." The news about rectifying errors was added to the endnotes of the paperback edition (1997, p. 472), published in September, 1997 -- about eight months after Modell (1997) announced, in the pages of Science, that he had been unable to replicate Sulloway's results. However, I was unaware of this addition to the endnotes until I received Sulloway's Referee's Report in January, 1998.

In the Referee's Report, Sulloway said that he did not accept Ernst and Angst's survey of the birth-order literature "at face value" but instead went back to the original publications and found "more than forty errors" in their book. "The net result of correcting these errors," he continued, "is to increase [Harris's] number of confirming results by about 22, and to reduce her number of nulls by about 15" (1998a, pp. 2-3). In other words, of the more than 40 errors Sulloway claims to have corrected, about 37 (22 plus 15) resulted in changes in his favor. If E&A's errors were random, only about half of the corrections should have resulted in a favorable change, so this statistically unlikely outcome requires an explanation. Here's how Sulloway explained it: "Ernst and Angst's errors in reporting have a tendency to favor their own viewpoint, namely, that there are negligible birth-order differences in controlled studies" (1998a, p. 3n).
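How unlikely is "statistically unlikely"? A rough check is a one-sided binomial tail. The sketch below makes several assumptions that do not come from the Referee's Report: it takes "more than forty" to mean exactly 40 corrections, treats each correction as an independent coin flip with a 50 percent chance of favoring Sulloway, and treats his "about 22" and "about 15" as exact counts.

```python
from math import comb

def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumption: "more than forty errors" is taken at its lower bound of 40.
n_corrections = 40
# 22 added confirmations plus 15 removed nulls, treated as exact counts.
n_favorable = 22 + 15

tail = upper_tail(n_corrections, n_favorable)
print(f"P(at least {n_favorable} of {n_corrections} favorable) = {tail:.2e}")
```

Under these assumptions the tail probability is on the order of 1 in 100 million. A somewhat larger total number of corrections would raise the figure, but it would stay very small, which is why the lopsided pattern calls for an explanation other than chance.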

This is an accusation of bias: Sulloway is claiming that E&A were biased against reporting significant birth order effects. Yet in BTR (p. 472, hardcover and paperback), he had praised them: "Researchers owe a considerable intellectual debt to Ernst and Angst (1983) for their systematic analysis of the birth order literature." Why did he say that, if he had already discovered evidence of systematic bias in their survey? For that matter, why did he use their review at all, if he was going to ignore what they said and go back to the original reports of the studies? And if he was going to consult the original reports, why did he restrict his "meta-analysis" to studies done before 1981, as E&A had? Thanks to E&A, later studies tended to be of better quality; why hadn't Sulloway included them?

Evidently the Referee's Report looks convincing to someone who hasn't pored over the relevant pages of BTR and E&A -- the editor of Psychological Science turned down my critique of BTR on the basis of this document. But for me it raised more questions than it answered. Take, for instance, the issue of whether Sulloway's "196 controlled studies" were controlled for both social class and sibship size, or controlled for either social class or sibship size -- a distinction that turns out to have important ramifications, as Townsend (in press) has demonstrated. In his Referee's Report (1998a, p. 3), Sulloway quoted the footnote under Table 4 in BTR as follows: "Data are tabulated from Ernst and Angst (1983:93-189), using only those studies controlled for social class and sibship size." But the footnote in the book (both hardcover and paperback) actually reads "controlled for social class or sibship size" (p. 73, emphasis mine). In converting a lenient rule into a stricter one, Sulloway had misquoted from his own book!

What was I to make of the claim of errors in E&A and the accusation that they were biased? As Sulloway knew, I do not have access to a university library (see the author profile at tna/bio.htm). Thus, though the Referee's Report included a list (titled "Errors in Ernst and Angst's Literature Review") of the studies on which he and E&A had come to different conclusions, I was unable to resolve the disagreements by consulting the original sources myself. However, I thought I would at least be able to determine from Sulloway's list which studies had contributed findings to Table 4 and which had been rejected. But my optimism proved to be unfounded.

I ran into problems immediately. The first item in Sulloway's list of errors, under the heading "Studies erroneously reported as being controlled for sibship size or social class," was Corsello, 1973. The heading implies that I should not have included these studies in my tally -- but I had not included Corsello, 1973, in my tally! What Corsello studied, according to E&A (p. 93), was not personality as a function of birth order but "perception of differential treatment by parents." No one doubts that siblings think their parents treat them differently; the question is whether this perceived differential treatment affects their personalities. There were several studies in E&A's review in which the outcome variable was perception of differential treatment rather than personality; I included none of them in my tally because they weren't relevant. Sulloway eliminated one. Were the other studies of this type included in his tally? My guess would be that they were, but I have no way of knowing for sure.

Another problem was the vagueness of the headings in Sulloway's list. Twelve studies were listed under the heading "Studies erroneously reported as not being statistically significant that are significant (or that involve doubt as nulls)" (1998a, p. 19). Was Sulloway saying that studies that "involve doubt as nulls" contributed favorable votes to Table 4 of BTR? And what about the eleven studies listed under the heading "Miscellaneous studies whose findings are reported in an incomplete, inaccurate, or otherwise problematic manner" -- did they contribute data to Table 4? Even if I consulted the original publications in the "miscellaneous" list (e.g., Yang & Liang's 1973 paper in Acta Psychologica Taiwanica), I still wouldn't know whether or not Sulloway had included their findings in Table 4.

Show me the data.   Such questions ordinarily don't come up, because (as Falbo, 1997, pointed out) when researchers publish the results of a meta-analysis it is customary to include a list of the studies that contributed data to it. But Sulloway did not do that. In the Referee's Report he said that Price's study contributed five votes to his tally and Macbeth's result for the MMPI another five, but as far as I know he has not provided that sort of specific information for any other study. His list of "errors" in E&A cleared nothing up but only added to the confusion. If he wanted me to know which of the studies in E&A's survey he had included and which he had rejected, and how many findings each had contributed, why didn't he just send me a list of the 196 findings in Table 4 of BTR?

On July 25, 1998, I gave up trying to replicate Sulloway's methods and wrote to him, asking for a list of his 196 findings. Here is his reply, dated August 11, 1998:

I have always made my data available to other researchers, and I am happy to make my meta-analytic data available to you. I have been planning, in any event, to put these data on the internet in order to make them available to anyone else who wants them.

A year later (July 28, 1999) Sulloway informed me in a letter that he was almost ready to send me the information I had requested. However, there was a string attached: I would have to agree to "ask [his] permission before giving any copies of these data to anyone who lacks formal accreditation as a scientist -- for example, by not having a Ph.D. degree in the social or behavioral sciences." (As Sulloway knows, I don't have a Ph.D.) Since I believe that these data should have been included in the original publication of Sulloway's "meta-analytic" results, I refused to agree to his restriction unless he could "provide me with a convincing reason why this information must be kept away from the hoi polloi." He has not replied; nor has he sent me the list of his 196 findings. To my knowledge he has not carried out his 1998 plan "to put these data on the internet in order to make them available to anyone else who wants them."

(My own list of 179 findings tabulated from E&A -- who occasionally made mistakes but who I don't believe were biased -- is available on this website.)

Sulloway (1998c) has accused me of withholding from my readers the information that he corrected "errors" in E&A. The truth is that I did tell my readers (see p. 369 of The Nurture Assumption) that Sulloway went back to many of the original studies in E&A's review, that he formed his own opinions about how they turned out and whether they used the proper controls, and that his opinions often differed from E&A's. However, I remained neutral on the question of whether these differences of opinion were, in fact, errors on E&A's part. I pointed out, on one hand, that correcting errors is a legitimate procedure in a meta-analysis; but, on the other hand, that Sulloway's corrections almost always resulted in changes that were favorable to his theory. "Sulloway believes that E&A were biased against finding birth order effects," I explained, leaving it up to my readers to decide what to make of this information.

For those who are still undecided, there is additional information now. Townsend (in press) has done what I was unable to do: go back to the original reports of the studies on Sulloway's list of "errors" in E&A. In his online essay, Sulloway (1998c) had offered to send that list (though not the list of his 196 findings) to anyone who requested it. Townsend requested and received a copy. The results of his investigation will be published soon; here is a preview of his conclusions: "In other words, the `reporting errors' were mostly Sulloway's, not Ernst and Angst's." Townsend doesn't ask his readers to accept his conclusions on faith: he provides detailed information about the studies he reviewed and lists their positive, negative, and no-difference findings.

The bottom line.   In my previous essay on this website I defined confirmation bias as "the tendency to seek, notice, and remember evidence that confirms one's belief, and to ignore, forget, or explain away contrary evidence." It's a universal human failing. Sulloway accused E&A of bias and implied that they were motivated to find evidence to support their belief that birth order does not have important effects on personality. But bias could work the other way as well. Confirmation bias could cause a person who was convinced that birth order does have important effects to question E&A's opinion whenever they reported that a study yielded no-difference or negative evidence, and to let their opinion stand whenever they reported that a study yielded positive evidence. Thus, the process of rectifying errors could itself be biased in a way that could affect the outcome.

A more serious problem is the fact that we know neither the number nor the identity of the studies that contributed to the 196 findings in Table 4 of BTR. The tallying method Sulloway described in his Referee's Report -- recording multiple findings from the same study and counting interactions as two, four, or eight findings -- greatly increases the number of findings that a single study could contribute to the table, and consequently decreases the number of studies that contributed these findings. In the pages of E&A (pp. 93-189) that Sulloway cites as the source of his data, there were a number of studies that I would predict would produce significant, though misleading, birth order effects -- studies in which siblings were asked about differential treatment by parents, for example, or where the data consisted of judgments by parents or siblings. Many of these studies produced multiple findings. Koch's study of five- and six-year-olds (31 interactions multiplied by two, four, or eight) could alone have produced half the findings in Table 4! Koch's findings included judgments of differential treatment by parents (firstborns felt less favored by the mother) and judgments of sibling interactions by the siblings themselves (e.g., secondborns wanted to play with their older sibling more than firstborns wanted to play with their younger one). Neither is a measure of adult personality or even of child personality: both are putative causes of personality differences between firstborns and laterborns, rather than the personality differences themselves.
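The arithmetic behind the claim about Koch's study is easy to check. The figures below (31 interactions, multipliers of two, four, or eight, and 196 findings in Table 4) come from the text above. The products are upper bounds on what this one study could have contributed, not a reconstruction of what Sulloway actually counted, which remains unknown.

```python
TABLE_4_FINDINGS = 196    # total findings reported in Table 4 of BTR
KOCH_INTERACTIONS = 31    # interactions attributed to Koch's study above

# Possible vote totals if every interaction were counted as 2, 4, or 8 findings.
koch_votes = {m: KOCH_INTERACTIONS * m for m in (2, 4, 8)}
koch_share = {m: votes / TABLE_4_FINDINGS for m, votes in koch_votes.items()}

for m in (2, 4, 8):
    print(f"x{m}: {koch_votes[m]} findings, {koch_share[m]:.0%} of Table 4")
```

Counting each interaction as four findings would already exceed half of the table, and counting each as eight would exceed the table's total of 196, so not every interaction can have been counted at the highest rate.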

Statements made by Sulloway about Table 4, such as "The likelihood . . . is less than 1 in a billion billion" (BTR, p. 72) and "These birth-order findings also pass the File Drawer Test" (p. 472) are based on a misuse of statistics. The statement that the data in the table come from studies "involving 120,800 subjects" is also incorrect. And Sulloway's explanation for my failure to replicate his tally doesn't hold water.

Ernst and Angst's Own Study of Birth Order and Personality

As E&A were aware, many of the studies they reviewed were so poorly designed that they deserved to be consigned forever to the file drawer of history. That's why E&A did a study of their own -- to confirm or disconfirm the results of their survey. It was one of the largest studies of birth order and personality ever done, larger than any in their survey. E&A measured twelve different aspects of personality (including Sulloway's favorite, openness) in 7,582 young adults in Zurich, controlling for family size and socioeconomic status. They found no significant birth order effects in families with two children -- the firstborn did not differ from the secondborn in any dimension of personality. In families of three or more, one small but significant effect turned up: the lastborn was slightly lower in masculinity. These results were reported by E&A in the same 1983 book, right after their survey of the literature.

In an endnote in BTR (p. 475), Sulloway referred to E&A's finding on masculinity ("On masculine versus feminine attitudes and birth order, see Ernst and Angst 1983:259-60") without mentioning that this finding came from a study done by E&A themselves. The other results of E&A's study -- the no-difference findings they obtained for all other dimensions of personality -- are not mentioned at all in BTR, even though they were reported on the same pages of E&A (pages 259-60) that Sulloway cited for the finding on masculinity. Nor were any of these no-difference findings included in the tally in Table 4 of BTR. (The data in Table 4, according to the note under the table, came from pages 93-189 of E&A's book.)

Why did Sulloway fail to mention that E&A carried out a major study of their own or include their findings in his tally? Surely it can't be because he thought they were biased: if he were going to leave out all researchers who held an opinion about birth order before they did their research, his list of findings would be very short indeed! And surely it can't be because E&A used a self-report personality test, because the MMPI is also a self-report personality test, and Sulloway said in his Referee's Report that he gave Macbeth's (1975) results for the MMPI five votes in his tally.

The Validity of Self-Report Personality Tests

It is true, however, that Sulloway (1998c, 1999) has expressed skepticism about self-report personality tests. These are paper-and-pencil questionnaires that require subjects to make judgments about themselves, usually by agreeing or disagreeing with statements describing characteristic behaviors, feelings, and attitudes. Because they are easy to give to large numbers of subjects, most major studies of birth order effects on personality use self-report tests. E&A's results were typical of the outcome produced by these studies: no significant birth order effects, or one or two effects of negligible size that fail to be replicated in other studies.

Sulloway himself has been forthright in admitting this outcome: "When assessed by self-report questionnaires, birth-order effects are typically modest and nonsignificant." But, as he pointed out correctly, another type of test generally does yield significant effects: "Yet systematic differences by birth order are generally found when parents rate their own offspring or when siblings compare themselves to one another" (1999, p. 192). If the two kinds of tests produce conflicting results, which should we believe? Sulloway favors the kind in which the judgments are made by parents or siblings; he alleges that self-report tests have "serious problems" (1998c).

As I've explained in print (Harris, 2000), in online writings (Harris, 1998a, 1999), and in my previous essays on this website, the results of both kinds of tests make sense if you take context into account. The tests involving judgments by parents and siblings give a valid picture of how the subjects behaved (and probably still do behave) in the context of the family they grew up in. The self-report tests, usually administered in a classroom or laboratory, give a valid picture of how subjects behave when they're not in the presence of their parents or siblings -- how they behave in the world in which they live as adults. The fact that birth order effects are found in "all-in-the-family" tests but not in other kinds of tests is a confirmation of the theory I presented in The Nurture Assumption (see "Why Are Birth Order Effects Dependent on Context?" on this website).

But Sulloway, like most contemporary Americans and Europeans, believes that the experiences children have with their parents and siblings leave permanent marks on their personalities and affect the way they behave in all areas of their adult lives. If this is the case, why do self-report personality tests fail to show reliable birth order effects?

Sulloway (BTR, 1998c, 1999) has offered three reasons why we shouldn't take the results of self-report tests too seriously. First, he believes that subjects -- at any rate, firstborns -- are unlikely to respond to them truthfully: "How many firstborns are willing to describe themselves as `callous' or `unadventurous'?" (BTR, p. 474). This question betrays, among other things, a lack of understanding of the way personality tests are constructed and scored. Subjects' responses are not taken at face value -- a subject is not scored high in agreeableness because she describes herself as easy to get along with. Instead, the outcome depends on the pattern of the subject's responses. Certain patterns have been found to be associated with certain personality characteristics. These tests are sophisticated instruments -- honed over the years to make them more accurate, validated by cross-checking their results with other methods of assessing personality. As Jefferson, Herbst, and McCrae pointed out, "There is vastly more evidence supporting the validity of self-reports than there is supporting effects of birth order" (1998, p. 507).

Sulloway's second criticism of self-report tests is that studies that do not directly compare siblings within the same family might produce spurious results due to "confounding effects associated with differences between families" (1999, p. 192). This is true. However, some of the researchers who used self-report questionnaires (Freese, Powell, & Steelman, 1999; Hauser, Kuo, & Cartmill, 1997) have performed within-family analyses, directly comparing the responses of siblings in the same family, and nonetheless failed to find significant birth order effects.

The final reason Sulloway has given (1998c, 1999) for not trusting the results obtained with self-report personality tests is that these tests use an "unanchored" method for measuring personality: the subjects are required to judge themselves against a hypothetical norm or average, rather than compare themselves to a specific person such as a sibling. The implication is that unanchored judgments are less accurate and therefore less valid than anchored judgments.

There are at least three things wrong with that allegation. First, a comparison with a hypothetical norm or average (even one that is estimated by the subject) is a more valid measure than a comparison with a specific individual, for the simple reason that individuals vary over a wider range than averages do. Second, the self-report questionnaires I have seen don't require comparisons with a hypothetical norm -- they require subjects to express their degree of agreement or disagreement with statements such as "I feel nervous when I'm talking to someone I don't know very well," or "I believe that convicted murderers should be put to death."

Third, as I already mentioned, responses on personality tests are not taken at face value. They differ in this respect from questionnaires designed, for example, to find out how much pain the respondents are feeling or how depressed they are. As psychologist Linda Bartoshuk has pointed out (in Goode, 2001), we can't conclude that two people who both give their pain a score of 9 are feeling the same amount of pain. But that's not how personality tests work. A tendency to always check off high numbers, for example, is itself an indication of one aspect of personality. Because these tests look for patterns of responses and are validated by comparisons with other ways of assessing personality (ratings by other people, for example), the absence of an objective standard is not a problem.

In casting doubt on the results of self-report personality tests, Sulloway has painted himself into a corner. In BTR (p. 68) he described the "Big Five" theory of personality (McCrae & Costa, 1987) and said he used it as his guide: "Using the Big Five as my guide, I offer here a psychodynamic account of birth order differences." The Big Five theory of personality is based primarily on the results of self-report tests -- the same personality tests that fail to substantiate Sulloway's theory.

These same tests provide the scientific underpinnings of Sulloway's theory: "The crux of my argument stems from a remarkable discovery. Siblings raised together are almost as different in their personalities as people from different families" (BTR, p. xiii). Sulloway was exaggerating -- biologically related siblings are a good deal more alike than people from different families -- but it is true that there are hard-to-explain personality differences between siblings. "This finding," Sulloway continued, was "firmly established by studies in personality psychology." Yes, and most of these studies used the same self-report personality tests that fail to substantiate Sulloway's theory. (See "Why Can't Birth Order Account for the Differences Between Siblings?" on this website.)

Sulloway (1998c) contrasts self-report data with "real-life data"; he claims his theory is supported by the latter, even if it is not supported by the former. Real-life data are collected, not in artificial situations concocted by researchers, but in the world outside the laboratory. According to Sulloway, "Parents value a child's doing well in school, so firstborns are conscientious, do their homework, generally do better at school," whereas laterborns "are more likely to challenge the status quo, and they are more likely to cause their parents aggravation by doing all sorts of outrageous things" (1998b). Real-life data on tens of thousands of people contradict these statements. On the average, if family size and social class are controlled, firstborns do no better in school than laterborns. Laterborns are no more likely than firstborns to aggravate their parents by underachieving in grade school, dropping out of high school, or failing to go to college (Blake, 1989; E&A, 1983; McCall, 1992).

The Historical Evidence in Born to Rebel

I am at a disadvantage in assessing the historical data in BTR; my field is developmental psychology. But others who have looked closely at the historical data -- most notably Townsend -- have found as many anomalies and unanswered questions as I found in Sulloway's treatment of E&A's data. Townsend's lively and revealing article is scheduled to appear, possibly with a reply by Sulloway, in the journal Politics and the Life Sciences.

One problem that several of Sulloway's critics have noted is that the research methods used to produce his historical data are particularly vulnerable to the distorting effects of confirmation bias. For example, Jeremy Freese and his colleagues have pointed out that Sulloway's distinction between "functional" birth order and biological birth order is a possible source of confirmation bias, due to the difficulty of ferreting out information about the family background of historical subjects. "This raises the possibility," they said, "that the `functional' birth status of some sample members was investigated more thoroughly than others, precisely because they otherwise would have been exceptions to the study's general findings" (Freese et al., 1999, p. 213n; italics in the original).

Critics have also questioned the way Sulloway categorized the scientific controversies he studied and the scientists who contributed to them. "It is uncertain whether Sulloway was blind to birth order of individuals and to their scientific contributions when making various decisions," David Rowe noted (1997, p. 365). Michael Ruse commented on the "flexible" nature of Sulloway's decisions: "When faced with the first-born counter example of James Watson, surely the author of one of the most significant scientific breakthroughs of our period, Sulloway denies that this counts as a genuine revolution!" (Ruse, 1997, p. 373). Similar complaints have been made about the way Sulloway has explained away other apparent counter-examples: for example, the revolutionaries Che Guevara and Mao Tse-tung, who -- inconveniently for Sulloway's theory -- were firstborns (Townsend, in press).

Although Sulloway consulted 94 historians of science and had them rate the participants in each scientific controversy (BTR, p. 395), these ratings themselves may be another source of bias. As Freese and his colleagues pointed out, the method used for obtaining the historians' ratings makes them susceptible to "interviewer effects":

One problem is that the ratings from historians were all obtained through in-person interviews conducted by Sulloway well after he began constructing his arguments about birth order. . . . As a result, the possibility of substantial interviewer effects cannot be ruled out (Freese et al., 1999, pp. 224-225).

Sulloway went to great lengths to collect these ratings in person:

I flew a quarter of a million miles around the world as I gathered these expert ratings from scholars in England, France, Germany, Italy, and America. (Sulloway, 1998b)

Oddly enough, most social scientists -- who ought to know about such things -- don't worry much about the dangers of confirmation bias or interviewer effects. Medical researchers tend to be more cautious. Over the years, they have developed elaborate procedures designed to prevent researchers' biases from influencing the outcome of medical studies. The use of these procedures makes modern medical research a pain in the neck, but they are used nonetheless because experience has shown that they are necessary. Without them, bias inevitably creeps in and researchers tend to find the results they are looking for. As physicist Robert Park has warned,

Alas, many "revolutionary" discoveries turn out to be wrong. Error is a normal part of science, and uncovering flaws in scientific observations or reasoning is the everyday work of scientists. Scientists try to guard against attributing significance to spurious results by repeating measurements and designing control experiments, but even eminent scientists have had their careers tarnished by misinterpreting unremarkable events in a way that is so compelling that they are thereafter unable to free themselves of the conviction that they have made a great discovery. Moreover, scientists, no less than others, are inclined to see what they expect to see, and an erroneous conclusion by a respected colleague often carries other scientists along on the road to ignominy. (Park, 2000, p. 9)

The Difficulties of Confirming or Disconfirming Sulloway's Data

When someone thinks up a new theory and collects data to support it, the usual way of introducing the theory and presenting the data is to publish an article in an academic journal. In order to get the article through the process of peer review, certain standards have to be met. The data, if newly collected by the theorist, have to be presented in a clear and detailed way -- clear and detailed enough so that if readers have any doubts they can repeat the study themselves and see if they get the same results. Either Sulloway decided not to go that route or he tried it and was turned down. The article (Sulloway, 1995) in which he first presented the data shown in Table 4 of BTR was a commentary in Psychological Inquiry on someone else's article. Commentaries are not ordinarily subjected to peer review. (I've written three commentaries for academic journals, including one for Psychological Inquiry; all were accepted without peer review. My other journal articles [see the publication list at tna/bio.htm#pubs] did go through the peer-review process.)

Skeptical readers of BTR are stymied by the imprecise descriptions of methods and results. Compare the following two statements, the first from Sulloway's 1995 commentary, the second from BTR:

During the century preceding publication of Darwin's Origin of Species (1859), individual laterborns were four times more likely than firstborns to support evolution. In some decades, these group differences were as great as 10 to 1. (Sulloway, 1995, p. 80)

During the long period of debate preceding publication of Darwin's Origin of Species (1859), individual laterborns were 9.7 times more likely than individual firstborns to endorse evolution. (BTR, 1996, p. 33)

Was the dramatic increase in laterborn support, from "four times more likely" to "9.7 times more likely," due to a change in the timespan being considered -- that is, does the second of these statements refer to a particular decade, rather than the entire century? I was unable to answer that question, or to find any other plausible explanation for the discrepancy, by examining the information provided in the book.

Nor am I the only one who has been frustrated by the confusing way data are presented in BTR. Here's historian and sociologist John Modell:

At an extreme, one has to go to six places -- text, table, table note, endnote, appendix, and bibliography -- to make sense of a given operation, a task made all the harder by the author's careless diction. This heedless intricacy, further exacerbated by the omission of conventional information about sample sizes, the distribution of values of variables employed, and the proportion of values missing and imputed, surely will lead most readers to throw up their hands, either accepting the author's procedures on faith or dismissing the book out of hand. (Modell, 1997, p. 625)

Psychologist Toni Falbo:

In fact, there are no tables of descriptive statistics in the text or the appendixes. Within the text are tables of correlations and odds ratios and figures generated from sophisticated analyses, but no simple presentation of the frequency of people falling into key categories, such as firstborns, later borns, supporters of radical ideas and their critics. This information is essential for evaluating the evidence. . . . Considering the cavalier way that Sulloway uses statistics, it is not surprising that after presenting all his evidence, he concludes that the "effects of birth order transcend gender, social class, race, nationality, and -- for the last five centuries -- time" (p. 356). Anyone who is qualified to review for an American Psychological Association journal would find this statement unsupported by the evidence presented. (Falbo, 1997, p. 939)

Sociologists Jeremy Freese and Brian Powell:

Overall, while many appendices are extraordinarily detailed, Sulloway is frustratingly unclear about some of his most important measures and the details of his models. (Freese & Powell, 1998, p. 58)

In an attempt to replicate some of the findings reported in BTR, Freese and his colleagues did a study of their own (Freese, Powell, & Steelman, 1999). They used questionnaire data on social attitudes, collected from 1,945 subjects plus 1,115 siblings of these subjects, to test Sulloway's claim that firstborns are more conservative, supportive of authority, and punitive than laterborns. "We find no support for these claims," these researchers concluded (p. 207). Even the nonsignificant effects didn't go in the right direction.

It is never surprising when the originator of a theory produces evidence that supports the theory. The real test of a theory is whether other people, working independently of the originator of the theory, produce evidence that supports it. As psychologist John McDonagh summed it up, "The dialogue that is the very process of science insists that studies be replicated by researchers independent of the original researchers before the scientific community at large accepts the findings as valid" (2000, p. 678).


I thank Jeremy Freese, Charles S. Harris, John Modell, Richard G. Rich, Carol Tavris, and Frederic Townsend for their helpful comments on earlier versions of this essay.


Blake, J. (1989, July 7). Number of siblings and educational attainment. Science, 245, 32-36.

Blustein, E. S. (1967). The relationship of sib position in the family constellation to school behavior variables in elementary school children from 2-child families. Ph.D. thesis, University of Maryland. Dissertation Abstracts International, 28-B, 3046-3047 (1968).

Buss, D. M. (1995). Evolutionary psychology: A new paradigm for psychological science. Psychological Inquiry, 6, 1-30.

Corsello, P. (1973). Birth order and children's perceptions of love, authority, and personal adjustment. Dissertation Abstracts International, 34-A, 3132.

Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997, September 13). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629-634.

Ernst, C., & Angst, J. (1983). Birth order: Its influence on personality. Berlin, Germany: Springer-Verlag.

Falbo, T. (1997). To rebel or not to rebel? Is this the birth order question? Contemporary Psychology, 42, 938-939.

Freese, J., & Powell, B. (1998). Review of Born to Rebel. Contemporary Sociology, 27, 57-58.

Freese, J., Powell, B., & Steelman, L. C. (1999). Rebel without a cause or effect: Birth order and social attitudes. American Sociological Review, 64, 207-231.

Goode, E. (2001, Jan. 2). Researcher challenges a host of psychological studies. New York Times, pp. C1, C7.

Harris, J. R. (1998a, June). How is personality formed? (commentary on an online talk by Frank J. Sulloway). Edge (

Harris, J. R. (1998b). The nurture assumption: Why children turn out the way they do. New York: Free Press.

Harris, J. R. (1999, June). Children don't do things half way: A talk with Judith Rich Harris. Edge (

Harris, J. R. (2000). Context-specific learning, personality, and birth order. Current Directions in Psychological Science, 9, 174-177.

Hauser, R. M., Kuo, H.-H. D., & Cartmill, R. S. (1997, March). Birth order and personality among adult siblings: Are there any effects? Paper presented at the annual meeting of the Population Association of America, Washington, DC.

Ioannidis, J. P. A. (1998, January 28). Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. Journal of the American Medical Association, 279, 281-286.

Jefferson, T., Jr., Herbst, J. H., & McCrae, R. R. (1998). Associations between birth order and personality traits: Evidence from self-report and observer ratings. Journal of Research in Personality, 32, 498-509.

Koch, H. L.: E&A list 13 different publications in which Koch reported the results of her study of 384 five- and six-year-olds. For example: Koch, H. L. (1955). Some personality correlates of sex, sibling position, and sex of sibling among five- and six-year-old children. Genetic Psychology Monographs, 52, 3-50.

LeLorier, J., Grégoire, G., Benhaddad, A., Lapierre, J., & Derderian, F. (1997, August 21). Discrepancies between meta-analyses and subsequent large randomized, controlled trials. New England Journal of Medicine, 337, 536-542.

Macbeth, B. L. (1975). Birth order, personality, and scholastic aptitude. Thesis, Department of Psychology, University of Oregon. Dissertation Abstracts International, 36-B, 4757 (1976).

McCall, R. B. (1992). Academic underachievers. Current Directions in Psychological Science, 3, 15-19.

McCrae, R. R., & Costa, P. T., Jr. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 52, 81-90.

McDonagh, J. (2000). Science without a degree of objectivity is dead. American Psychologist, 55, 678.

Modell, J. (1997, January 31). Family niche and intellectual bent (review of Born to Rebel). Science, 275, 624-625.

Park, R. (2000). Voodoo science: The road from foolishness to fraud. New York: Oxford University Press.

Petticrew, M. (2001, January 13). Systematic reviews from astronomy to zoology: Myths and misconceptions. British Medical Journal, 322, 98-101.

Price, J. (1969). Personality differences within families: Comparison of adult brothers and sisters. Journal of Biosocial Science, 1, 177-205.

Rosenthal, R. (1987). Judgment studies: Design, analysis, and meta-analysis. Cambridge, UK: Cambridge University Press.

Rowe, D. C. (1997). Review of Born to Rebel. Evolution and Human Behavior, 18, 361-367.

Ruse, M. (1997). Review of Born to Rebel. Evolution and Human Behavior, 18, 369-373.

Sulloway, F. J. (1995). Birth order and evolutionary psychology: A meta-analytic overview (commentary on target article by Buss). Psychological Inquiry, 6, 75-80.

Sulloway, F. J. (1996). Born to rebel: Birth order, family dynamics, and creative lives (hardcover edition). New York: Pantheon.

Sulloway, F. J. (1997). Born to rebel: Birth order, family dynamics, and creative lives (paperback edition). New York: Vintage.

Sulloway, F. J. (1998a, January). Referee's Report on Judith Harris's "Personality and Birth Order: The Remarkable Resilience of Deeply Held Beliefs." Unpublished document: peer review of a manuscript submitted to Psychological Science.

Sulloway, F. J. (1998b, May). How is personality formed? A talk with Frank Sulloway. Edge (

Sulloway, F. J. (1998c, November). Birth order and the nurture misassumption: A reply to Judith Harris. Edge (

Sulloway, F. J. (1999). Birth order. In M. A. Runco & S. Pritzker (Eds.), Encyclopedia of creativity (vol. 1, pp. 189-202). San Diego, CA: Academic Press.

Sutton, A. J., Duval, S. J., Tweedie, R. L., Abrams, K. R., & Jones, D. R. (2000, June 10). Empirical assessment of effect of publication bias on meta-analyses. British Medical Journal, 320, 1574-1577.

Townsend, F. (in press). Birth order and rebelliousness: Reconstructing the research in Born to Rebel. Politics and the Life Sciences. (Townsend's email address:

Yang, K. S., & Liang, W. H. (1973). Some correlates of achievement motivation among Chinese high school boys. Acta Psychologica Taiwanica, 15, 59-67.

Version 1.0
May 22, 2002

Citation (American Psychological Association format):
Harris, J. R. (2002, May 22). The mystery of Born to Rebel: Sulloway's re-analysis of old birth order data. Retrieved [insert date] from the World Wide Web: http://judithrichharris/tna/birth-order/index.htm

Copyright Notice
Copyright 2002 by Judith Rich Harris.
Permission is granted to link to this essay and to quote from it briefly. All other rights reserved.
For permission to reprint, contact Charles S. Harris.
