The “Tuskegee Experiment” occurred over the 40-year period from 1932 to 1972. The experiment enrolled 600 Black men in Macon County, Alabama. Of this group, 399 had latent syphilis and 201 were free of syphilis. Those with syphilis were not told they had the disease and effective treatment was withheld. The experiment was perpetrated chiefly by the United States Public Health Service (PHS) and the Centers for Disease Control and Prevention (CDC). The Tuskegee Institute (later, University) collaborated, but in a subordinate role to those who initiated, directed, funded, and controlled the experiment.
The unethical and duplicitous nature of the Tuskegee Experiment became public in 1972. Since then, it has been thought that the experiment was a cause of mistrust of medical providers among Black people. The lack of trust of healthcare providers by Black people is also thought to diminish use of medical care, adversely affecting health, and be a partial explanation of the large racial disparities in health.
It is certainly reasonable to suppose that the Tuskegee atrocity would reduce Black people’s trust in medical services. And showing this to be the case would be an important finding. An article in the Quarterly Journal of Economics by Marcella Alsan and Marianne Wanamaker purports to do that. Their article is titled “Tuskegee and the Health of Black Men” (2018).
In the present article, I criticize the analysis by Alsan and Wanamaker (hereafter AW) and question the validity and interpretation of their evidence of the impact of the Tuskegee Experiment on the mortality of Black men. My reassessment and critique of AW (2018) is important because of the large racial disparities in health, the absence of credible evidence on other explanations of racial disparities in health, and the plausibility that mistrust of providers, and, more generally, the quality of patient-provider communication, is an important explanation of racial disparities in health.
Racial disparities in mortality
Racial disparities in health have been large for decades and remain large. As shown in the top half of Table 1, the death rate from circulatory disease among Black people ages 45 to 64 is approximately twice the rate for White people of the same age, and that difference has been steady for 50 years. While not quite as large, the racial disparity in the death rate from cancer among people ages 45 to 64 is approximately 40 percent, and that difference too has remained largely unchanged for 50 years.
The figures in the top half of Table 1 also illustrate the dramatic declines in death rates in the last 50 years among persons ages 45 to 64, particularly with respect to deaths due to circulatory disease. Between 1970 and 2019 deaths due to circulatory diseases declined by 60 to 70 percent. Notably, this decline in death rates over the last 50 years has been about the same, or greater, for Black people as compared to White people. This suggests that, to the extent that the declines in death rates are due to medical advances, Black and White people have benefited equally from these advances, which is inconsistent with some claims about the importance of race as a fundamental cause of health disparities (Phelan and Link 2015). Alternatively, if the declines in death rates are due to changes in the social determinants of health, for example income and education, then the roughly equal declines in mortality suggest similar improvements in these causes for both Black and White people.
It is also worth highlighting that the racial disparity in the cancer death rate is significantly smaller than it is for the circulatory death rate. This almost surely reflects the etiology of these two leading causes of death that account for approximately 50 percent of all deaths. The causes of cancer are more likely to be due to genetics and less likely to reflect variation in treatment and social determinants of health (e.g., education) than the causes of circulatory disease. The relatively smaller racial disparity with respect to the cancer death rate is consistent with the genetic similarities of Black and White people. In contrast, the relatively large racial disparity with respect to circulatory deaths suggests that racial disparities in the quantity and quality of healthcare and racial differences in the social determinants of health may be an important explanation of this disparity. A corollary of this is that evidence supporting explanations of racial disparities in health should be more evident for circulatory disease than cancer because the latter has a large genetic component except in a few cases such as lung cancer (Mucci et al. 2016; Teerlink et al. 2012).
 | Circulatory disease deaths per 100,000 | | | Cancer deaths per 100,000 | | |
Ages 45–64 | White | Black | Ratio | White | Black | Ratio |
1970 | 496 | 889 | 1.79 | 286 | 390 | 1.36 |
1980 | 373 | 647 | 1.73 | 298 | 436 | 1.46 |
% Change 1970–1980 | −25% | −27% | | 4% | 12% | |
Change 1970–1980 | −123 | −242 | | 12 | 46 | |
2000 | 182 | 383 | 2.10 | 226 | 314 | 1.39 |
2019 | 167 | 305 | 1.83 | 191 | 229 | 1.20 |
% Change 2000–2019 | −8% | −20% | | −15% | −27% | |
Change 2000–2019 | −15 | −78 | | −35 | −85 | |
% Change 1970–2019 | −66% | −66% | | −33% | −41% | |
Change 1970–2019 | −329 | −584 | | −95 | −161 | |
Ages 65 and older | White | Black | Ratio | White | Black | Ratio |
1970 | 3882 | 3779 | 0.97 | 936 | 953 | 1.02 |
1980 | 3126 | 3160 | 1.01 | 1021 | 1148 | 1.12 |
% Change 1970–1980 | −17% | −16% | | 9% | 20% | |
Change 1970–1980 | −756 | −619 | | 85 | 195 | |
2000 | 2319 | 2568 | 1.11 | 1172 | 1335 | 1.14 |
2019 | 1376 | 1483 | 1.08 | 868 | 904 | 1.04 |
% Change 2000–2019 | −41% | −42% | | −26% | −32% | |
Change 2000–2019 | −943 | −1085 | | −304 | −431 | |
% Change 1970–2019 | −65% | −61% | | −7% | −5% | |
Change 1970–2019 | −2506 | −2296 | | −68 | −49 | |
The bottom half of Table 1 presents death rates of persons aged 65 and older. Death rates at these ages are three to four times larger than at ages 45 to 64. Racial disparities in death rates are much smaller among those aged 65 and older than among those aged 45 to 64. The ratios of Black to White death rates indicate a Black disadvantage of 5 to 10 percent, still high but not the 40 to 100 percent differences that characterize the younger cohort. The likely explanation for this relatively small disparity is selective mortality. The relatively high death rates among Black people aged 45 to 64 (and younger) leave a relatively healthy cohort of persons aged 65 and older: those who made it to 65 are healthier than the average person in their cohort. Given the significantly higher death rates of Black people aged 45 to 64 than same-aged White people, Black people aged 65 and over are likely to be unusually healthy, and this results in death rates much closer to those of White people at these ages.
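The ratios and percent changes discussed above can be recomputed directly from the death rates in Table 1. The following is a minimal sketch using only the 1970 and 2019 rows. The dictionary layout and the function names are illustrative choices, not part of the original analysis.

```python
# Death rates per 100,000 copied from the 1970 and 2019 rows of Table 1.
# Keys are (age group, cause, year); the layout is an illustrative assumption.
RATES = {
    ("45-64", "circulatory", 1970): {"White": 496, "Black": 889},
    ("45-64", "circulatory", 2019): {"White": 167, "Black": 305},
    ("45-64", "cancer", 1970): {"White": 286, "Black": 390},
    ("45-64", "cancer", 2019): {"White": 191, "Black": 229},
    ("65+", "circulatory", 1970): {"White": 3882, "Black": 3779},
    ("65+", "circulatory", 2019): {"White": 1376, "Black": 1483},
    ("65+", "cancer", 1970): {"White": 936, "Black": 953},
    ("65+", "cancer", 2019): {"White": 868, "Black": 904},
}


def black_white_ratio(age, cause, year):
    """Ratio of the Black death rate to the White death rate."""
    rates = RATES[(age, cause, year)]
    return rates["Black"] / rates["White"]


def percent_change(age, cause, race, start, end):
    """Percent change in one group's death rate between two years."""
    before = RATES[(age, cause, start)][race]
    after = RATES[(age, cause, end)][race]
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    for age in ("45-64", "65+"):
        for cause in ("circulatory", "cancer"):
            ratio = black_white_ratio(age, cause, 2019)
            print(f"{age} {cause}: Black/White ratio in 2019 = {ratio:.2f}")
```

Rounding the outputs to two decimals (ratios) or whole percents (changes) should reproduce the corresponding cells of Table 1.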
The critical question that the figures in Table 1 raise is why the death rates of Black people are so persistently high relative to White people. It is this question that AW (2018) address. AW assess whether the revelation in 1972 about the government’s unethical experiment that withheld treatment for syphilis from Black men in Tuskegee (Macon County, Alabama) affected Black men’s use of health care and mortality. The hypothesis motivating this study is that the Tuskegee Experiment heightened mistrust of the medical profession among Black men, but not Black women, and that this heightened mistrust led to less care and more death among Black men.
The importance of AW (2018) is that there is, at least in my opinion, a lack of credible evidence supporting various explanations of racial disparities in health. Obviously, discrimination is a potential explanation for racial disparities in health, but to what extent it explains disparities remains elusive. For example, studies of whether a person was exposed to interpersonal racism have not produced consistent evidence of racism’s impact on Black people’s health. These studies are also mostly cross-sectional, measure discrimination at a point in time, and lack credibility because of the absence of an experimental or quasi-experimental approach (e.g., Williams and Mohammed 2013; Liu and Kawachi 2017; Chae et al. 2020; 2012; Dunlay et al. 2017). Similarly, studies of physician bias do not provide consistent evidence that personal bias, which is documented among physicians, results in systematic racial differences in treatment or in health outcomes (e.g., van Ryn et al. 2011; Cooper et al. 2012; Haider et al. 2015; Hirsh et al. 2015; Green et al. 2007). Social determinants of health, for example, education, income, health insurance, and neighborhood, explain only part of the racial gap in health (see, e.g., Williams et al. 2016). Racial health disparities persist and remain non-trivial within education and/or income groups. Moreover, it is debatable whether it is appropriate to use factors such as education and income that may be a result of racial discrimination to explain another outcome that also may be the result of racial discrimination. Similarly, racial disparities in health persist among people with the same health insurance coverage (Ochieng et al. 2021; California Department of Health Care Services 2023).
Other obvious explanations for racial disparities in health also come up short in terms of convincing evidence. For example, there are few racial differences in the use of effective, preventive health care services. Consider hypertension, which is one of the primary markers of circulatory illness and antecedent of circulatory death (e.g., heart attack), the largest cause of death among adults. Awareness and treatment of hypertension are basically the same for Black and White people (Aggarwal et al. 2021). Similarly, rates of undiagnosed hypertension, hypercholesterolemia, and diabetes, which are indicators of inadequate diagnosis, are equal between Black and White people (Fryar et al. 2010). Finally, screenings for breast, colorectal, cervical, and prostate cancers are relatively equal by race (Islami et al. 2022).
AW (2018) and its underlying hypothesis of mistrust relate to one explanation of racial disparities that has some evidence to support it. That explanation is that there are racial disparities in the effectiveness of treatment. Again, consider hypertension. While awareness and receipt of treatment are equal, the effectiveness of treatment, as measured by uncontrolled hypertension, does differ significantly by race (Fryar et al. 2010; Aggarwal et al. 2021). There are also racial differences in statin use among those with similar cardiovascular risk (Jacobs et al. 2023). In the case of cancer, there is evidence of racial differences in the timeliness of treatment and variation in guideline-concordant care (Islami et al. 2022). While the effectiveness of treatment and the choice of treatment protocol differ because of both patient and provider factors, low patient trust and poor patient-provider communication may explain the lower effectiveness of treatment. Some evidence supporting this explanation is provided by Alsan et al. (2019) and Michael Frakes and Jonathan Gruber (2022). Both studies show that racial concordance between patient and physician significantly affects healthcare use and health: Black patients seen by Black providers have better outcomes than Black patients seen by non-Black providers. In contrast, there is no advantage of having a same-race provider for White patients.
The large, persistent racial disparities in health and the absence of convincing evidence of what causes these disparities underscore the potential importance of AW (2018) and its implication that mistrust and poor communication may be an important cause of racial disparities in health. AW conclude that the Tuskegee Experiment resulted in a decrease in use of health care and an increase in mortality among Black men who lived near Tuskegee. The lesson according to AW is:
As long as biased beliefs, policies, and practices are still prevalent in the U.S. healthcare system, mistrust is a rational response that may continue to contribute to health disparities. (AW 2018, 451)
AW (2018) has 573 citations according to Google and its conclusions have been accepted as valid. However, and putting aside the fact that AW (2018) is a historical analysis with questionable relevance for current racial disparities in health, I show that the AW (2018) evidence of the effect of the Tuskegee Experiment on Black men’s mortality does not stand up to greater scrutiny. Therefore, the study and its conclusions should not be cited as evidence of the potential for Black mistrust of the medical system and poor communication between Black patients and White providers to explain racial disparities in health. Instead, research should continue to search for the causes of the high and persistent racial disparities in health because they remain largely unknown.
The unfounded AW hypothesis
The AW (2018) analysis is based on the hypothesis that the revelation of the Tuskegee Experiment in 1972 decreased trust in the medical profession among Black men, which resulted in a decrease in use of healthcare and higher death rates, and that this decrease in trust was greater the closer a person lived to Tuskegee (Macon County), Alabama. The study’s focus on Black men is based on two assumptions: first, because the Experiment was conducted using men it had greater saliency for men than women; and second, women are likely to have more experience with the medical system and, therefore, their health-seeking behavior was less adversely affected by the Experiment. The assertion that the Tuskegee Experiment had greater saliency for men is speculative and supported by one citation, a study by William Maddux and Marilynn Brewer (2005). In that study, the authors conducted an online game among 143 students at Ohio State University who had a friend at another Big Ten university. Students were asked whether they would accept a “sure thing” payment of $3 or accept whatever share of $11 a person from Ohio State University or from another university allocated to them. Men were more likely to take the $3 when it was from someone outside Ohio State University. This result was interpreted by the authors as evidence that men had greater willingness to trust in-group versus out-group people (AW 2018, 423). This was the only evidence cited to support the assumption that men, and not women, would be affected by the Tuskegee Experiment! The problems with this reference are obvious. It is conducted on a sample of college students; it is likely that only a small fraction of the students in the study was Black given the makeup of Ohio State University; and it took place around 2005. The study’s relevance as evidence of gender differences in adult Black male personality traits (in- versus out-group trust) in 1972 is questionable. 
Finally, how it relates to the Tuskegee context escapes this writer.
The assertion that the Tuskegee Experiment would affect women less than men because women have more experience with the medical system than men is unsupported by any evidence besides the authors’ assertion of it. Women use more healthcare services than men and one implication, which differs from the assertion of AW, is that therefore the Tuskegee Experiment and mistrust of healthcare providers could affect women more than men. The greater use of healthcare by women suggests that medical care plays a more important role in their health and that decreases in the use of care would have relatively larger impacts on women than men.
The central role in the AW hypothesis of proximity to Tuskegee is based on the assumption that physical distance to the Tuskegee Experiment matters and those geographically closer would be more affected. AW (2018) also use the share of migrants from Alabama in a State Economic Area as an alternative measure of “proximity”. However, this measure also reflects geographic distance. It assumes that changes in beliefs and actions of those from Alabama in response to Tuskegee are larger than for other Black people. To justify the use of proximity to Tuskegee as a central feature of their hypothesis and empirical analysis, AW (2018, 423) rely on an article by Guido Tabellini (2008). This reference seems, at best, misplaced. Tabellini (2008) develops a game-theoretic model of endogenous social norms (values). According to Tabellini:
Summarizing, the general insight of the model is that the evolution of values reflects the patterns of economic interactions relative to the pattern of moral ties between individuals. Whatever increases the likelihood of interactions in the region between Y0 and Y1, where the distinction between limited and generalized morality matters, also increases the diffusion of trustworthiness within the community. Very local interactions (below Y0) or very distant interactions (above Y1) have the opposite effect, because the distinction between limited and generalized morality has no behavioral implication in those regions, and this dampens the incentive to invest in good values. (Tabellini 2008, 932)
The model literally assumes that values are transmitted within the family, and that only parents make purposeful educational choices. In practice, other channels of cultural transmission, from peers, own experience, educational institutions, or the media, are also likely to be important. (Tabellini 2008, 940)
From my reading of the article, Tabellini (2008) does not address in any way the assumption that the revelation of the Tuskegee Experiment would have effects that vary by distance to Tuskegee.
In fact, media exposure, which Tabellini (2008) does reference, to the Tuskegee Experiment was geographically widespread and is likely to have raised awareness and saliency of the Tuskegee Experiment broadly throughout the country. Public awareness of the Tuskegee Experiment occurred in July 1972 after the publication of its existence by the Associated Press. The extent and timing of newspaper coverage can be illustrated by reviewing the archives of several major newspapers and prominent Black newspapers—newspapers with mostly Black readerships.
Newspaper | 1972 | 1973 | 1974 | 1975 | 1976 | 1977–1979 |
New York Times | 9 | 4 | 4 | 2 | 1 | 1 |
Washington Post | 5 | 8 | 3 | 1 | 1 | |
Chicago Defender | 4 | 4 | 1 | 1 | 1 | |
Amsterdam News | 1 | 4 | 1 | |||
Atlanta Daily World | 7 | 2 | 1 | |||
Philadelphia Tribune | 7 | 3 | 2 | |||
Pittsburgh Courier | 3 | 2 | 1 | |||
Michigan Chronicle | 4 | 2 | 1 | |||
Los Angeles Sentinel | 4 | 1 | 2 |
Note: The Atlanta Daily World was the closest major Black newspaper to Macon County, Alabama, and to the home of the Centers for Disease Control that was responsible for the Tuskegee Experiment.
As the counting of newspaper articles in Table 2 makes clear, there was an initial burst of coverage in 1972 followed by sporadic coverage mostly about the terms of the legal settlement with the government in 1974. In terms of network television news, there was relatively little coverage of the Tuskegee Experiment according to the Vanderbilt Television News Archive, which was also noted by AW. There was one story on ABC Evening News in 1972 and three stories on CBS Evening News in 1973. It is clear that media coverage was widespread throughout the country including in major Black newspapers. Therefore, it is likely that Black people, men and women, throughout the country were aware of the unethical and deceptive study. While there is no evidence of the awareness of Tuskegee among Black people in the subsequent period immediately after Tuskegee, there are many studies that show widespread knowledge of the Tuskegee Experiment in later years (Gamble 1997; Corbie-Smith et al. 1999; Shavers et al. 2000; Brandon et al. 2005; McCallum et al. 2006; Katz et al. 2008). All this evidence suggests that knowledge of, and possible response to, the Tuskegee Experiment was widespread and not particular to those nearer Tuskegee. The evidence also suggests that Black women and Black men had about equal knowledge of the Experiment.
Overall, the foundational assumptions of the AW (2018) analysis, that effects of the Experiment, if any, were specific to men and differed systematically by geographical proximity to Tuskegee, are not well-founded. The hypothesis lacks scientific plausibility. Given that, an empirical analysis based on that hypothesis is best viewed as speculative and exploratory, not causal.
However, I do not rest my argument on this basis alone. Next, I show empirically in greater detail that the evidence supposedly supporting the AW hypothesis does not stand up to greater scrutiny.
Data
The data for the analysis come from the Centers for Disease Control (CDC), which makes available counts of deaths by year, county, age, sex, and cause (Compressed Mortality file; link). These are the same data used by AW. The CDC also makes available population counts by the same demographic characteristics. The death counts and population counts can be combined to form death rates by year, county, age, and cause. I focus on those ages 45 to 74, which is the same age group examined by AW. There are four race-by-sex groups: Black males, Black females, White males, and White females. There are 20 years (1968 to 1987) divided into two “periods”: 1968–1972 is the pre-Tuskegee period and 1973–1987 is the post-Tuskegee period. These are the same years used by AW. It is not obvious why there is such a long post-Tuskegee period. The other analysis contained in AW (2018), related to health care utilization, does not use data from the 1980s.
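To make the construction of death rates concrete, here is a minimal sketch of how death counts and population counts might be combined into rates per 100,000 for the pre- and post-Tuskegee periods. The record layout, the function names, and the choice to pool deaths and population within each period are assumptions for illustration; they are not taken from the CDC file format or from AW's code.

```python
from collections import defaultdict

PRE_YEARS = range(1968, 1973)   # 1968-1972, the pre-Tuskegee period
POST_YEARS = range(1973, 1988)  # 1973-1987, the post-Tuskegee period


def period_of(year):
    """Classify a calendar year as 'pre' or 'post' disclosure."""
    if year in PRE_YEARS:
        return "pre"
    if year in POST_YEARS:
        return "post"
    raise ValueError(f"year {year} is outside the 1968-1987 study window")


def death_rates_by_period(records):
    """Return deaths per 100,000 keyed by (group, period).

    Each record is a dict with hypothetical keys 'year', 'group',
    'deaths', and 'population'. Deaths and population are pooled within
    a period, which weights each year by its population (an assumption).
    """
    deaths = defaultdict(int)
    population = defaultdict(int)
    for rec in records:
        key = (rec["group"], period_of(rec["year"]))
        deaths[key] += rec["deaths"]
        population[key] += rec["population"]
    return {key: 100_000 * deaths[key] / population[key] for key in deaths}
```

The pre- to post-1972 difference for a group is then the 'post' rate minus the 'pre' rate from the returned dictionary.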
I examine five causes of death: cardiovascular disease (CVD), cancer, respiratory and gastrointestinal disease, external causes, and chronic diseases. The last category, chronic diseases, was used by AW and is an amalgam of several causes of death including CVD, cancer, gastrointestinal, some respiratory diseases, diabetes, and ill-defined symptoms. CVD, cancer and respiratory and gastrointestinal causes of death make up approximately 88 percent of the AW (2018) chronic diseases category. Conducting analyses by cause of death is important for several reasons. First, the AW hypothesis, if correct, applies to all diseases, but it should be more pronounced for those that are most amenable to detection and treatment. As noted earlier, cancer mortality reflects genetic factors more so than death due to circulatory disease and other treatment-amenable illnesses. Thus, the hypothesis that the Tuskegee Experiment decreased trust and the use of medical care among Black men would be expected to have much more muted effects on cancer mortality because medical care in the 1970s and 1980s had relatively little to offer cancer patients. Routine screening for the most prevalent cancers was not highly prevalent. The American Cancer Society did not recommend routine mammography until 1976 and colorectal cancer screening until 1979 (link). Screening for prostate cancer through a Prostate Specific Antigen (PSA) test was not available prior to 1986, when the FDA first approved the PSA test, and was not routine until 1994 (link). While medical advances in cancer treatment such as chemotherapy were occurring in the 1970s and 1980s, the use of these services was still relatively limited (DeVita and Chu 2008). 
It is not surprising then that, as shown in Table 1, death rates due to cancer increased between 1970 and 1980 for both Black and White people, and that, nationally, the cancer death rate did not start to decline until 1990 when screening and treatment became more effective and widespread. It is likely that the increase in cancer deaths reflects the decline in deaths due to circulatory disease because these are competing risks of death. Nevertheless, there is little evidence of medical advancement in cancer care between 1970 and 1980.
Similarly, it is widely accepted that external causes of death are not easily preventable or treatable. Thus, exposure to the Tuskegee Experiment should have little effect on deaths from external causes. In contrast, and as demonstrated in Table 1, there has been tremendous progress in treating CVD and there are effective treatments for CVD. It is this cause of death that should be most sensitive to the Tuskegee Experiment. Deaths from respiratory and gastrointestinal disease are also treatable, but less common than CVD. Together CVD and cancer account for approximately 50 percent of all deaths among those ages 45 to 74, the target sample used by AW and in this analysis.
A preliminary assessment of the evidence presented by AW
Given the lack of scientific plausibility of the hypothesis underlying AW (2018), it is worth assessing preliminarily the predictions of the AW hypothesis. The hypothesis predicts that Black men would use less healthcare after the revelation about Tuskegee and that this decrease in use of care would increase mortality. These effects would be larger the closer a person was to Tuskegee. The hypothesis also suggests smaller effects for Black women than Black men and no effects for White people of any sex.
Some of these predictions are not borne out by the data presented in AW (2018). For example, AW measure the change in mortality pre- and post-1972 for Black versus White females (AW 2018, 428 Figure III Panel C) and found that Black female mortality decreased on average by 9 percent relative to White females pre- to post-1972, and that this decrease was larger the closer that a woman lived to Tuskegee. What explains this apparent beneficial effect for Black females and why was the benefit greater with proximity to Tuskegee? It is clearly inconsistent with the AW hypothesis.
Other evidence presented by AW also highlights the fragility and inconsistency of the evidence for the AW hypothesis. For example, AW (2018, Appendix p. 31 Table A.6) reported that Black male deaths due to external causes were 5.5 percent higher for every 1000 kilometers closer a person lived to Tuskegee. This effect is two-thirds as large as the effect for deaths caused by chronic diseases even though there are few to no healthcare services that prevent external causes of death and, therefore, little scope for mistrust to affect care that would prevent external causes of death.
Finally, consider the AW analyses that replace Tuskegee with other locations (2018, 444–445). Those “permutation tests” show that replacing Tuskegee with other locations results in many significant effects: the pre- to post-1972 change in Black male mortality (relative to other race-sex groups) differs by distance to those locations. While the effect is relatively large for Tuskegee, it is nearly as large or larger, although oppositely signed, for many other locations. The AW hypothesis does not predict that other locations would have results like Tuskegee, except perhaps those within a small radius of Tuskegee. While AW consider the relatively large effect of Tuskegee to be supportive of their hypothesis, such an interpretation ignores that a similar result is found for many other places, for which there is no explanation. This misinterpretation by AW of the results for Tuskegee stems from the lack of a well-founded, ex ante hypothesis to guide the empirical analysis, which encourages AW to explain away results in an ad hoc, ex post manner.
A simple, transparent assessment of the AW hypothesis
Notwithstanding the lack of scientific foundation of the AW hypothesis, the simple empirical prediction of it is that, relative to other demographic groups, for example, White males, the change in Black male mortality pre- to post-1972 should be less negative (or more positive) the closer a person lives to Tuskegee. This prediction should hold for all causes of death but should be more pronounced for causes that are more amenable to treatment, for example, cardiovascular disease. All that is needed to test this hypothesis are death rates by year-by-race-by-sex-by-distance (to Tuskegee). Accordingly, I assess the predictions of AW (2018) by calculating the pre- to post-1972 difference in mortality of Black men and women, and White men and women, by distance to Tuskegee. Distance to Tuskegee is measured in 150-kilometer increments that result in 24 distance bins, which is the same as in AW (2018, 426). There is a 25th distance category for White persons, but so few Black persons live in that category that I drop it from the analysis. Thus, the sample size for a typical analysis is a maximum of 1,920 (4 demographic groups × 24 distance bins × 20 years).
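The binning and the maximum sample size described above can be sketched as follows. This is an illustration only: it assumes bins are half-open intervals starting at zero kilometers (so bin 0 covers 0 up to, but not including, 150 km) and that any distance beyond the 24th bin is dropped, as is done here with the sparsely populated 25th category. The names are hypothetical.

```python
BIN_WIDTH_KM = 150  # width of each distance increment
N_BINS = 24         # bins retained in the analysis


def distance_bin(distance_km):
    """Map a distance to Tuskegee (km) to a bin index 0..23.

    Returns None for distances beyond the 24th bin, which are dropped.
    Half-open bins starting at zero are an assumption of this sketch.
    """
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    index = int(distance_km // BIN_WIDTH_KM)
    return index if index < N_BINS else None


def max_sample_size(n_groups=4, n_bins=N_BINS, n_years=20):
    """Largest possible number of group-by-bin-by-year cells."""
    return n_groups * n_bins * n_years
```

With the defaults, `max_sample_size()` gives 4 × 24 × 20 = 1,920 cells, matching the figure in the text; the realized sample can be smaller if some cells are empty.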
AW chose to use a more complicated data setup than I do, although the differences are not material in terms of results. First, instead of using the 24 distance bins to define geography, AW used either county or State Economic Area (SEA) as the geographical unit of analysis. SEAs are an old geographical designation created by Donald Bogue (1951); each is a single large county or a group of counties within the same state with similar economic characteristics. There are approximately 500 SEAs compared to approximately 3,000 counties. Second, AW aggregated deaths across two-year periods instead of single years. The explanation for these aggregations was that the Black population was relatively small within a county or SEA and that combining years would reduce measurement error in death rates due to imprecise estimates of population counts (deaths are not estimated). In the end, AW (2018) had sample sizes of approximately 20,000 when SEA was used and 120,000 when county was used. Third, when calculating distance to Tuskegee using either county or SEA, AW may have combined areas “in the middle of the country with less than 2,500 black men in 1970” (2018, 428). It is not clear whether they did so for all analyses or just the analysis related to the quote. As noted, these differences in data construction between my analysis and AW (2018) do not drive the differences in conclusions that I report later.
Patterns by cause of death
Figure 1 shows the pre- to post-1972 difference in mortality of Black men ages 45 to 74 by cause of death and distance to Tuskegee. To highlight how death rates change by distance to Tuskegee, all differences are relative to the pre- to post-1972 difference in Black male death rates in the geographic area encompassing Tuskegee (i.e., within 150 kilometers from Macon County, Alabama). I also show a line from an OLS regression with linear trend and intercept set to zero to show the gradient in the pre- to post-1972 change in mortality with respect to distance from Tuskegee. What is most notable about Figure 1 is that the gradient in the pre- to post-1972 difference in Black male death rates with respect to distance to Tuskegee differs markedly by cause of death. For CVD, the gradient is relatively steep and positive, although clearly not linear; the farther away from Tuskegee the higher is the pre- to post-1972 difference in CVD death rate, which is opposite of the simple prediction of AW (2018). For cancer and respiratory and gastrointestinal causes, the gradient is slightly negative, but again clearly not linear; the farther away from Tuskegee the lower the pre- to post-1972 difference in cancer death rate. The gradient with respect to distance from Tuskegee for external causes of death is nearly flat. The oppositely signed relationships of the pre- to post-1972 difference in death rates and distance from Tuskegee between CVD and other causes of death is inconsistent with the AW hypothesis, as is the highly non-linear association between the change in mortality pre- to post-1972 and distance from Tuskegee. This contradictory evidence is obscured by AW who only show results for all-cause, chronic causes and external causes. Combining causes of death hides the fact that for individual causes of death, the hypothesized relationship between the pre- to post-1972 change in mortality and distance from Tuskegee does not hold across causes. 
Most notably, for CVD mortality, which is the most likely to show the hypothesized pattern, Figure 1 shows the opposite.
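The zero-intercept trend line described for Figure 1 can be sketched as follows. This is a minimal illustration, not the code behind the figure: the function name `slope_through_origin` and all numbers are hypothetical. The only feature carried over from the text is that each bin's change is measured relative to the Tuskegee bin, so the fitted line is forced through the origin.

```python
def slope_through_origin(distance_km, rel_change):
    """OLS slope b for the model y = b * x, with the intercept fixed at zero."""
    sxx = sum(x * x for x in distance_km)
    if sxx == 0:
        raise ValueError("at least one distance must be non-zero")
    sxy = sum(x * y for x, y in zip(distance_km, rel_change))
    return sxy / sxx


# Hypothetical distance bins (km); the first bin stands in for the Tuskegee area.
distance_km = [0, 500, 1000, 1500, 2000]

# Made-up changes relative to the Tuskegee bin, so the first value is zero.
steady_rise = [0.0, 1.0, 2.0, 3.0, 4.0]  # rises evenly with distance
late_jump = [0.0, 0.0, 0.0, 0.0, 4.0]    # flat until the farthest bin

slope_steady = slope_through_origin(distance_km, steady_rise)
slope_jump = slope_through_origin(distance_km, late_jump)

print(f"steady rise: {slope_steady:.5f} per km")
print(f"late jump:   {slope_jump:.5f} per km")
```

Both made-up series yield a positive fitted gradient even though only the first rises steadily with distance, which is why a fitted line of this kind is best read alongside the bin-level points.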
Figure 1. Pre- to post-1972 difference in Black male mortality by cause and distance to Tuskegee (differences relative to difference in Tuskegee)

More differences are better than less?
The main set of results reported by AW are derived from difference-in-differences (DD) models. For example, the pre- to post-1972 change in mortality of Black males is compared to the pre- to post-1972 change in mortality of White males (AW 2018, Appendix p. 27 Table IV). But AW do not stop at DD, and in the spirit of more differences are better than less, AW estimate a difference-in-differences-in-differences (DDD) model. In this model, the difference-in-differences among males just described is compared to the analogous difference-in-differences among females to yield the DDD estimate. Notably, none of these comparisons are justified ex ante by theory or evidence. It is just assumed that the pre- to post-1972 changes in mortality of other demographic groups are a valid and useful comparison (in the DD approach), or counterfactual, for the pre- to post-1972 change in mortality of Black males. And whether a DD is adequate or if a DDD is warranted is something never discussed. AW just report results, and because the signs of DD and DDD estimates are the same, take that as evidence that it doesn’t matter which approach is used. However, AW report DD and DDD estimates that differ by a factor of two, which is evidence that the approach matters, and I will show that some DD estimates have signs that are opposite of the DDD estimates relied on by AW.
The basic hypothesis of AW is that the pre- to post-1972 change in death rates of Black males would be less negative the closer a Black male lived to Tuskegee. Implicit in this assumption is that Black males, and other demographic groups, near and far from Tuskegee are similar. Therefore, a good starting point in thinking about whether DD and DDD analyses are valid is to assess whether people near and far from Tuskegee are similar. Figure 2 shows the proportion of people with less than a high school degree and the proportion employed by race, sex, and region. The data underlying Figure 2 are from the 1980 Census (5% sample). There are four panels in Figure 2, one for each combination of outcome (less than high school, and employed) and sex (male and female), with each panel showing results by race. The first notable result of Figure 2 is that the people residing in the region containing Tuskegee are the least educated and least employed. Second, the variation across regions is greater for Black people than White people and greater for males than females. Figure 2 clearly shows that those in Tuskegee are very different from those in other regions, with the largest differences being between Tuskegee and the farthest region (i.e., Pacific). Moreover, these regional differences vary by race and sex.
Given the differences across regions in education and employment, which are documented correlates of mortality, there is no reason to expect that, in the absence of the revelation about the Tuskegee Experiment, the pre- to post-1972 change in mortality would be the same across regions, or by distance to Tuskegee. But this is the assumption of the AW research design. Similarly, given the differences by region and by race and sex, it is unlikely that a DD or DDD analysis would be valid. The pre- to post-1972 difference in mortality is likely to differ by education and employment (i.e., by distance to Tuskegee), and this difference differs by race and sex. Thus, the assumptions underlying DD and DDD analyses are questionable. AW provide no evidence to justify their DD or DDD analyses. But as just demonstrated, there are reasons for the pre- to post-1972 difference in mortality to differ by distance to Tuskegee and reasons why these differences would differ by race and sex. Figure 2 underscores a point I made earlier, that estimates of the effect of the Tuskegee Experiment reported in AW (2018) differed markedly by the choice of comparison group, for example, Black females or White males. Such sensitivity would be expected given the results in Figure 2 and the implications of those results. More importantly, AW provide no ex ante justification for why any of the DD and DDD comparisons would be valid or preferred. Combine this empirical problem with the lack of scientific plausibility of the AW hypothesis, and it is reasonable to dismiss all the results and conclusions of the AW analysis as, at best, speculative, and more likely, spurious.
Figure 2A. Differences by region in the proportion of males ages 45 to 74 with less than high school degree relative to East South Central Region (Tuskegee)
Figure 2B. Differences by region in the proportion of females ages 45 to 74 with less than high school degree relative to East South Central Region (Tuskegee)

Figure 2C. Differences by region in the proportion of males ages 45 to 74 employed relative to East South Central Region (Tuskegee)
Figure 2D. Differences by region in the proportion of females ages 45 to 74 employed relative to East South Central Region (Tuskegee)

Next, I show the pre- to post-1972 difference in mortality by cause of death and distance to Tuskegee for all four of the race-sex demographic groups. Doing so shows the building blocks of the AW DD and DDD analysis and highlights the empirical problems just described.
Figure 3 shows the pre- to post-1972 difference in mortality due to CVD for all four of the race-sex demographic groups. Again, the pre- to post-1972 difference in mortality is relative to the difference in Tuskegee to highlight how the difference changes with distance from Tuskegee. The first point to note about Figure 3 is that the death rates due to CVD of Black males and Black females tend to increase with distance from Tuskegee—exactly the opposite of the AW prediction. The increase in CVD mortality with distance from Tuskegee is more pronounced for females. Much of this apparent rise in CVD death rates with distance to Tuskegee occurs after about 2000 kilometers from Tuskegee—it is clearly not a linear relationship. For White males and females, there is little change in the pre- to post-1972 difference in CVD mortality with distance from Tuskegee. The second point to note is that it will clearly matter what comparison group is used in the DD and DDD analyses to account for unmeasured influences of the pre- to post-1972 difference in CVD death rates that vary by distance to Tuskegee. The pre- to post-1972 change in CVD mortality of the three potential comparison groups, White males and females and Black females, have somewhat different associations with distance from Tuskegee and DD and DDD estimates depend on those associations.
Figure 3. Pre- to post-1972 difference in CVD mortality by race, sex, and distance to Tuskegee (differences relative to difference in Tuskegee)

In Figure 4, I show the DD and DDD estimates derived from the pre- to post-1972 differences in CVD death rates by distance to Tuskegee and the three potential comparison groups. There is a DD and DDD estimate for each of the 24 distance categories to Tuskegee. The estimates in Figure 4 are obtained by subtracting the pre-to-post change in Black male CVD mortality by distance from Tuskegee from the analogous change for the other demographic groups. In short, I am taking the differences between the lines in Figure 3 using Black males as the reference. As can be seen in Figure 4, using White males or White females as comparisons yields DD estimates of the effect of the Tuskegee Experiment on Black male CVD mortality that are increasing with distance from Tuskegee—the opposite of the AW hypothesis. In contrast, using Black females as the comparison group yields DD estimates that tend to decrease, although not linearly, with distance from Tuskegee. I also show the DDD estimates obtained from the difference in two DD estimates: the DD estimates that compare Black males to White males, and DD estimates that compare Black females to White females. The DDD estimates show that the effect of the Tuskegee Experiment on Black male CVD mortality is decreasing with distance from Tuskegee. It also merits remarking on the distinct non-linearity of the DD and DDD estimates with respect to distance from Tuskegee. For example, the DDD estimates have little relationship to distance from Tuskegee for the first 2000 kilometers.
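The bin-by-bin differencing behind estimates of this kind can be sketched as follows. The numbers are made up and are not taken from Figure 3, and the helper name `diff_by_bin` is hypothetical. The sign convention here (Black male change minus the comparison group's change) is an assumption chosen to match the description of the DDD calculation; reversing it only flips the sign of every estimate.

```python
def diff_by_bin(treated, comparison):
    """Bin-by-bin difference between two series of pre-to-post changes."""
    if len(treated) != len(comparison):
        raise ValueError("series must cover the same distance bins")
    return [t - c for t, c in zip(treated, comparison)]


# Hypothetical pre-to-post changes per distance bin (nearest to farthest),
# each already measured relative to the Tuskegee bin. Units are arbitrary.
black_male = [0, 1, 1, 6]
white_male = [0, 0, 1, 1]
black_female = [0, 2, 3, 10]
white_female = [0, 0, 0, 1]

# DD estimates for Black males under two different comparison groups.
dd_vs_white_male = diff_by_bin(black_male, white_male)
dd_vs_black_female = diff_by_bin(black_male, black_female)

# DDD: the male Black-White DD minus the female Black-White DD.
dd_females = diff_by_bin(black_female, white_female)
ddd = diff_by_bin(dd_vs_white_male, dd_females)

print("DD vs White males:  ", dd_vs_white_male)
print("DD vs Black females:", dd_vs_black_female)
print("DDD:                ", ddd)
```

In this toy example the DD against White males rises in the farthest bin while the DD against Black females and the DDD fall, so the sign of the apparent gradient depends entirely on which comparison group is used.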
Figure 4. Difference-in-differences in Black male CVD mortality by distance to Tuskegee

Finally, it is important to note that virtually none of the DD or DDD estimates in Figure 4 are statistically significant. This is an important result, and it undoubtedly motivates why AW do not emphasize such estimates. Figure III in AW (2018, 428) presents similar estimates to those in Figure 4 and, although not discussed by AW, almost all estimates in their Figure III are not statistically significant. The lack of statistical significance of estimates in Figure 4, and in similar analyses presented below, motivated AW to present an alternative set of estimates based on a highly restrictive empirical specification that is inconsistent with the evidence in Figure 4. I will return to this point below.
Figure 5 is the same as Figure 3 except the cause of death is cancer. In this case, the pre- to post-1972 difference in cancer death rates of Black men decreases with distance from Tuskegee, although not linearly, which is the opposite of the pattern observed for CVD deaths (as already shown in Figure 1). For the other three race-sex categories, the pre- to post-1972 difference in cancer death rates shows relatively little change by distance from Tuskegee. For Black females there is some evidence of an increase in cancer deaths in areas a couple thousand kilometers from Tuskegee, but these estimates are noisy due to relatively small populations. However, it is worth noting that, unlike for CVD, the pre-to-post change in Black female cancer mortality does not increase much with distance from Tuskegee. Why would this be the case? Why does the pattern in the pre- to post-1972 change in mortality of Black females differ between the two causes of death? The same puzzle characterizes the pre- to post-1972 change in mortality for Black males. The AW hypothesis predicts that Black men use less care, and that mortality would be relatively higher closer to Tuskegee. It cannot explain why Black male cancer mortality is relatively higher closer to Tuskegee, but Black male CVD mortality is relatively lower closer to Tuskegee. For Black females, the AW hypothesis predicts that they would use more or less the same amount of care after as before Tuskegee and that the pre- to post-1972 change in mortality would have the same general relationship with distance to Tuskegee. Again, neither of these predictions is reflected in the data.
Figure 5. Pre- to post-1972 difference in cancer mortality by race, sex, and distance to Tuskegee (differences relative to difference in Tuskegee)

Figure 6 shows the DD and DDD estimates of the effect of the Tuskegee Experiment on Black male cancer mortality. For this cause of death, all DD and DDD estimates suggest that Black male cancer death rates pre- to post-1972 increased closer to Tuskegee, at least within the first 2,500 kilometers. Again, however, the relationship between the DD and DDD estimates and distance to Tuskegee is highly non-linear. Note that the declining DD and DDD estimates with distance from Tuskegee in Figure 6 are being driven by the relative decline in Black male mortality. For CVD, the declining DD and DDD estimates with distance from Tuskegee were caused by a relative increase in Black female mortality. There is nothing in the AW hypothesis that can reconcile these disparate results. While the DDD estimates seem similar for CVD and cancer, the underlying causes of those estimates are profoundly different and underscore the arbitrariness and lack of foundation of the AW empirical analysis.
Figure 6. Difference-in-differences in Black male cancer mortality by distance to Tuskegee

It is also worth reiterating that any effect of the Tuskegee Experiment on cancer mortality is surprising. There were few effective screenings and treatments for cancer during this period. Thus, the hypothesized increase in “mistrust,” and the purported decline in use of health care associated with it, are unlikely to have affected cancer mortality or, at a minimum, would have had a smaller effect on cancer than on CVD. Therefore, it is surprising, or more strongly, implausible, that the AW hypothesis seems to be applicable with respect to cancer (at least up to 2,500 kilometers). It is more likely that the DD and DDD estimates for cancer mortality are spurious, suggesting that all DD and DDD estimates are suspect.
Taking stock of the evidence to this point
Before moving on to results for the remaining causes of death, it is worth taking stock of what has been revealed. First, there is substantial variation in the pattern of the pre- to post-1972 change in mortality by distance from Tuskegee by sex and by cause of death, as shown in Figures 3 and 5. AW provide no theory for why this would be the case. Second, and ignoring that virtually all DD and DDD estimates are statistically insignificant, DD estimates provide conflicting support for the AW hypothesis. Third, the relationship between DD and DDD estimates and distance to Tuskegee is highly non-linear, which again has no basis in the AW hypothesis. These three conclusions highlight the problem with the AW analysis. The empirical investigation was largely unmoored from any ex ante hypothesis as to how mortality rates would change pre- to post-1972 by race, sex and cause of death, and why those changes would differ by distance to Tuskegee. Only for Black males did AW offer a hypothesis, and this was not well-founded. The AW empirical exercise was largely atheoretical and best described as ‘any difference is better than none, and more differences are better than less.’ As a result, there is simply no basis to choose one DD or DDD estimate over another, and the lack of statistical significance of virtually all estimates makes all of the estimates suspect and suggests that none are credible. The highly non-linear association between DD and DDD estimates and distance to Tuskegee undermines the analyses of AW that restrict that association to be linear. I return to this last point below.
Assessing the evidence for other causes of death
Figure 7 presents the pre- to post-1972 difference in mortality due to respiratory and gastrointestinal disease (RGI) by race and sex. The figure shows that the pre- to post-1972 change in deaths due to RGI displays little relationship to distance from Tuskegee except at around 2,000 kilometers from Tuskegee for Black males, and to a lesser extent Black females. In this area (distance) there is a large decline in the pre- to post-1972 change in RGI mortality. These declines are not statistically significant and are the result of small populations of Black people. Figure 8 shows DD and DDD estimates of the pre- to post-1972 difference in RGI mortality. DD and DDD estimates decline slightly on average, but in a very non-linear way by distance from Tuskegee for Black males. The large deviations in the pre- to post-1972 difference in death rates around 2,000 kilometers from Tuskegee are a result of small populations and noisy estimates of that population. The figure omits data points below −4. None of these DD and DDD estimates are significant.
Figure 7. Pre- to post-1972 difference in respiratory and gastrointestinal mortality by race, sex, and distance to Tuskegee (differences relative to difference in Tuskegee)

Figure 8. Difference-in-differences in Black male respiratory and gastrointestinal mortality by distance to Tuskegee

Figures 9 and 10 present similar estimates for external causes of death. As with cancer mortality, it is highly unlikely that the revelation of the Tuskegee Experiment would affect deaths from this cause because of a lack of known, effective treatments. Figure 9 shows that the pre- to post-1972 change in deaths from external causes was largely unrelated to distance from Tuskegee for all sex and race groups. As was the case for RGI mortality, the small population of Black people at around 2,000 kilometers from Tuskegee results in large changes in external deaths at that distance among Black males and females.
Figure 9. Pre- to post-1972 difference in external causes mortality by race, sex, and distance to Tuskegee (differences relative to difference in Tuskegee)

As shown in Figure 10, the large declines in deaths due to external causes for Black males at the 2,000-kilometer range cause DD and DDD estimates to decline on average, for example, if forced to have a linear relationship to distance from Tuskegee, as in AW (2018). As already noted, such a result for external causes of death is arguably implausible, and evidence of an invalid research design, because external causes of death are not easily preventable or treatable.
Figure 10. Difference-in-differences in Black male external causes mortality by distance to Tuskegee

Finally, Figures 11 and 12 present the same information as previous figures but for deaths due to chronic diseases, as defined by AW. Because CVD, cancer, and RGI account for approximately 85 percent of deaths due to chronic diseases as defined by AW, Figures 11 and 12 are an average of previous figures for these three causes of death.
Figure 11. Pre- to post-1972 difference in chronic disease mortality by race, sex, and distance to Tuskegee (differences relative to difference in Tuskegee)

Figure 12. Difference-in-differences in Black male chronic diseases mortality by distance to Tuskegee

Figure 11 shows that the pre- to post-1972 change in chronic diseases mortality for Black males declines modestly between Tuskegee and 2,500 kilometers and then rises markedly for the next 1,000 kilometers. This pattern is very much a combination (average) of the pattern for Black male mortality observed for CVD and cancer. An analogous ‘average’ applies to Black females. DD estimates obtained using White males or White females show little relationship to distance from Tuskegee (opposite of what AW hypothesize).
It is also interesting to note that AW reported DD estimates from models that use White males as a comparison that indicate that the pre- to post-1972 change in Black male mortality due to chronic diseases was higher the closer a Black male lived to Tuskegee (AW 2018, Appendix Table A.4 column 7). The evidence in Figure 12 is inconsistent with such an estimate, as the DD estimates obtained using White males as a comparison are constant over the first 1,500 to 2,000 kilometers from Tuskegee and then decline and rise over the last 1,500 kilometers—clearly non-linear and hardly evidence that Black male mortality was adversely affected by the Tuskegee Experiment the closer a Black male lived to Tuskegee. The AW estimate is likely due to the highly restrictive approach they use that forces a linear relationship between the DD and DDD estimates by distance to Tuskegee shown in Figure 12. I return to this point below.
Takeaways
My separate analysis and unpacking of results for individual causes within the chronic diseases category demonstrated how misleading the analysis of the aggregate category of deaths used by AW is. The underlying patterns in the association of pre- to post-1972 changes in mortality with distance from Tuskegee differ by race, sex, and cause of death, and failing to show these separate changes masks the inconsistency of the evidence for the AW hypothesis and the fragility of DD and DDD estimates. First, virtually none of the DD or DDD estimates in any of the figures above are statistically significant, which is further evidence that the AW conclusions are questionable. Second, while it appears that some of the DDD estimates are similar across causes of death in terms of showing a tendency to decline with distance from Tuskegee, although not linearly, those patterns are being driven by different underlying trends, for example, by a decline in the pre- to post-1972 change in cancer mortality among Black males in the case of cancer, but an increase in the pre- to post-1972 change in CVD mortality among Black females in the case of CVD. Third, the similarity of DDD estimates for CVD and cancer is surprising because it is unlikely that the Tuskegee Experiment would affect cancer mortality. Similarly, the absence of an effect of the Tuskegee Experiment on RGI causes of death, which are also relatively more amenable to treatment, is additional evidence that the AW hypothesis is not valid.
The AW restricted specification
The lack of statistical significance of virtually all the DD and DDD estimates presented above, and in AW (2018, e.g., Figure III), likely motivated AW to move to an alternative approach. If not, then AW could have concluded little because of the lack of statistical reliability of the DD and DDD estimates. The alternative approach used by AW replaces the DD and DDD estimates calculated for each geographical distance from Tuskegee with what they refer to as an interacted DD and DDD analysis. The interacted DD and DDD approaches restrict the DD and DDD estimates in the above graphs to follow a linear relationship with distance from Tuskegee. As I have already noted several times, and as clearly shown in the previous figures, DD and DDD estimates of the effect of the Tuskegee Experiment on Black mortality do not have a linear relationship to distance from Tuskegee.
To construct the interacted DD and DDD estimates, AW obtained estimates of the effect of distance from Tuskegee on the pre- to post-1972 change in mortality for each of the four race and sex groups. The effect of distance is constrained to be linear—every additional kilometer away from Tuskegee had the same effect regardless of whether that was from 200 to 201 kilometers or 2,000 to 2,001 kilometers. No justification was provided for this assumption, and as the prior figures show, this assumption is clearly not valid. Consider Figure 11 related to chronic diseases mortality. The pre- to post-1972 change in mortality among Black males and females clearly does not have a linear relationship to distance from Tuskegee.
AW calculate DDD estimates by subtracting the estimate of the effect of distance for White males from that of Black males (DD estimate of the effect of distance for Black males); subtracting the estimate of the effect of distance for White females from that of Black females (DD estimate of the effect of distance for Black females); and then subtracting the DD estimate of the effect of distance for Black females from the DD estimate of the effect of distance from Black males. They do this in a regression framework, but the logic is the same as just described.
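The subtraction of distance effects just described can be sketched in a stylized way. This is not AW's regression: it leaves out any fixed effects, controls, or weights that their models include, uses made-up numbers, and relies on a hypothetical helper, `ols_slope`, that fits a simple line with an intercept. It illustrates only the arithmetic, and it also checks that the difference of the four fitted slopes equals the slope fitted to the bin-by-bin DDD series, since a least-squares slope is linear in the outcome.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares fit of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical distance bins (thousands of km) and made-up pre-to-post changes.
distance = [0.0, 1.0, 2.0, 3.0]
change = {
    "black_male": [0.0, 1.0, 1.0, 6.0],
    "white_male": [0.0, 0.0, 1.0, 1.0],
    "black_female": [0.0, 2.0, 3.0, 10.0],
    "white_female": [0.0, 0.0, 0.0, 1.0],
}

# Linear effect of distance for each race-sex group.
slope = {group: ols_slope(distance, y) for group, y in change.items()}

# DD of slopes within each sex, then the DDD across sexes.
dd_males = slope["black_male"] - slope["white_male"]
dd_females = slope["black_female"] - slope["white_female"]
ddd_slope = dd_males - dd_females

# Same quantity obtained by differencing bin by bin first, then fitting a line.
ddd_by_bin = [
    (bm - wm) - (bf - wf)
    for bm, wm, bf, wf in zip(
        change["black_male"],
        change["white_male"],
        change["black_female"],
        change["white_female"],
    )
]
ddd_slope_from_bins = ols_slope(distance, ddd_by_bin)

print(f"DDD of slopes:         {ddd_slope:.3f}")
print(f"slope of bin-wise DDD: {ddd_slope_from_bins:.3f}")
print("bin-wise DDD:", ddd_by_bin)
```

The two slopes coincide, so the restricted estimate is simply a straight line drawn through the bin-level DDD points, whatever their actual shape.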
Figure 13. Pre- to post-1972 difference in CVD mortality by race, sex, and distance to Tuskegee (differences relative to difference in Tuskegee)

To show that this restricted approach is problematic, I obtain estimates of the linear effect of distance from Tuskegee on the pre- to post-1972 change in mortality by race, sex and cause of death. I then superimpose these linear relationships on the actual pre- to post-1972 changes in mortality for each of the 24 distances from Tuskegee for each race and sex. Figure 13 shows the pre- to post-1972 change in CVD mortality by race, sex, and distance to Tuskegee, and the predicted change in the pre- to post-1972 change in CVD mortality based on the linear estimate of the effect of distance from Tuskegee on the pre- to post-1972 change in CVD mortality. It is clear from Figure 13 that, for Black males and females, the pre- to post-1972 change in CVD mortality does not have a linear relationship with distance from Tuskegee. For Black males, the pre- to post-1972 change in CVD mortality first declines with distance from Tuskegee; then remains flat until about 700 kilometers; then rises to zero and remains flat until about 2,500 kilometers; and then increases for the last 1,000 kilometers. In short, the pre- to post-1972 change in Black male CVD mortality is non-linear and forcing a linear relationship obscures this fact. Indeed, the linear relationship makes it seem as if the pre-to-post change in Black male CVD mortality increases with distance from Tuskegee.
Figure 14 shows the DD and DDD estimates based on the restrictive linear approach, which are constructed from the estimates in Figure 13. When White males and White females are used as comparisons, DD estimates indicate that Black male CVD mortality increases with distance from Tuskegee, which is exactly the opposite of the AW hypothesis. The misleading nature of these estimates is evident from observing how the DD estimates using White males or White females as comparisons are being driven by the restrictive and misleading linear association between the pre- to post-1972 change in Black male CVD mortality and distance from Tuskegee shown in Figure 13. AW provide no rationale for why these wrong-signed estimates are not valid. In contrast, the DDD estimate in Figure 14 indicates a negative association between the pre- to post-1972 change in Black male CVD mortality and distance from Tuskegee. This estimate is being driven by the increase in Black female CVD mortality with distance from Tuskegee, another finding lacking any rationale.
Figure 14. Difference-in-differences in Black male CVD mortality by distance to Tuskegee

Because AW focus on chronic diseases mortality, in Figure 15 I present the DD and DDD estimates using their restrictive approach and superimpose those estimates on the unrestricted DD and DDD estimates obtained for each of the 24 geographic areas from Tuskegee. This figure, again, reveals the inappropriateness of the AW approach. Restricting the DD and DDD estimates to have a linear relationship with distance from Tuskegee masks considerable variation of DD and DDD estimates with distance from Tuskegee and makes it seem as if the AW hypothesis is applicable despite this not being the case. Here too, various DD and DDD estimates provide conflicting evidence for the AW hypothesis and whether one or another is closer to the truth is unknown. There was no ex ante evidence or theory to favor one estimate over another and no ex ante evidence or theory to believe that any are correct.
Figure 15. Difference-in-differences in Black male chronic diseases mortality by distance to Tuskegee

Replication
I begin with DDD estimates of the effect of the Tuskegee Experiment on Black male mortality due to chronic diseases. When the dependent variable is the log death rate, AW reported estimates of 0.087 (Table 1, Column 7) and 0.048 (Appendix Table A.5, Column 5) relating to SEA-level and county-level data, respectively. The fact that the two DDD estimates reported by AW differ by a factor of two between the two types of data reveals the inherent statistical noise of the analysis and how the restrictive, linear specification can be quite misleading because of that noise (imprecision). For example, the smaller estimate suggests that the pre- to post-1972 change in chronic diseases mortality of Black men in Tuskegee is 14 percent higher than that of a Black male in Los Angeles. But the larger estimate suggests an analogous figure of 26 percent. Both are large changes, for example, compared to the 10-year change in CVD mortality, which was 27 percent, as shown in Table 1. Is it plausible that the Tuskegee Experiment had the same effect (26 percent) on Black men in Tuskegee as did 10 years’ worth of progress treating CVD (27 percent)? My DDD estimates are 0.052 and 0.038 relating to 24-category and county-level data, respectively. These two estimates are very similar to the AW county estimate. When the dependent variable is the level of mortality rate, my estimates are similar to the SEA-level estimate reported by AW. The rest of Table 3 shows estimates for other causes of death. AW reported estimates for only one other cause of death—those from external causes. My estimates are like those reported by AW.
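The 14 and 26 percent figures quoted above can be reproduced with simple arithmetic under two assumptions that the text does not state: that the DDD coefficient is expressed per 1,000 kilometers of distance, and that Los Angeles is roughly 3,000 kilometers from Tuskegee. The constant and function names below are hypothetical, and the calculation treats log points as approximate percentages.

```python
# Assumption: roughly 3,000 km from Tuskegee to Los Angeles, in units of 1,000 km.
ASSUMED_DISTANCE_TO_LA = 3.0


def implied_gap_percent(ddd_coefficient, distance=ASSUMED_DISTANCE_TO_LA):
    """Approximate percent gap implied by a log-rate coefficient per unit of distance."""
    return 100 * ddd_coefficient * distance


# DDD coefficients for chronic disease mortality (log death rate) quoted in the text.
county_gap = implied_gap_percent(0.048)  # county-level estimate
sea_gap = implied_gap_percent(0.087)     # SEA-level estimate

print(f"county-level estimate: about {county_gap:.0f} percent")
print(f"SEA-level estimate:    about {sea_gap:.0f} percent")
```

Under these assumptions the two coefficients round to the 14 and 26 percent gaps discussed in the text, which makes plain that the implied effect scales one-for-one with whichever coefficient is used.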
Dependent variable and estimate | Cause of death | | | | |
Log mortality rate | Chronic diseases | External causes | CVD | Cancer | Respiratory and gastrointestinal |
AW restrictive, linear DDD estimate, SEA geography | 0.087** (0.022) | 0.055 (0.042) | NA | NA | NA |
AW restrictive, linear DDD estimate, county geography | 0.048** (0.019) | NA | NA | NA | NA |
My restrictive, linear DDD estimate, 24 geographies | 0.052** (0.022) | 0.102** (0.033) | 0.064** (0.020) | 0.016 (0.032) | 0.010 (0.027) |
My restrictive, linear DDD estimate, county geography | 0.038** (0.011) | 0.051** (0.017) | 0.030** (0.012) | 0.029 (0.017) | 0.001 (0.018) |
Level mortality rate | Chronic diseases | External causes | CVD | Cancer | Respiratory and gastrointestinal |
AW restrictive, linear DDD estimate, SEA geography | 1.566** (0.770) | 0.259 (0.253) | NA | NA | NA |
My restrictive, linear DDD estimate, 24 geographies | 0.871** (0.422) | 0.153 (0.099) | 0.458** (0.179) | 0.220 (0.197) | 0.147 (0.163) |
My restrictive, linear DDD estimate, county geography | 1.368 (0.921) | 0.118 (0.214) | 0.782 (0.638) | 0.056 (0.513) | 0.395 (0.336) |
To summarize, the lack of statistical significance of DD and DDD estimates obtained for each geography separately, such as those in Figure III (p. 427) of AW (2018), and the highly non-linear relationship between those estimates and distance to Tuskegee that is inconsistent with their hypothesis, likely motivated AW to move to a restrictive specification that imposed a linear pattern on DD and DDD estimates of the effect of the Tuskegee Experiment on Black male mortality with distance from Tuskegee. As I have shown throughout the article to this point, this was a choice clearly inconsistent with the data. This inappropriate specification resulted in misleading estimates that seemed to support the hypothesis that the Tuskegee Experiment adversely affected the mortality of Black males living near Tuskegee (relative to Black males living farther away, and many other differences not supported by evidence or theory). When the data are presented transparently, as in Figures 4, 6, 8, 10, 12, 14, and 15, there is little reason to believe that the AW hypothesis is supported by the data. Moreover, even if one takes the AW restrictive, linear approach as valid, the estimates in Table 3 do not show uniform support for the AW hypothesis.
More problems: age-specific estimates
Another aspect of the AW analysis that is problematic and beclouding is the combining of age groups into one analysis. Death rates differ markedly by age and grow substantially from ages 45 to 74. As shown in Table 1 above, death rates of those ages 65 to 74 are three to four times larger than the death rates of those ages 45 to 64. The scope for medical care to have an impact grows with age as the prevalence of disease rises and treatment becomes more critical to maintaining good health. Thus, it is reasonable to expect that the Tuskegee Experiment, if it did increase mistrust and decrease use of medical care, would have a larger effect on older Black males than younger Black males.
Dependent variable and estimate | Chronic diseases | | CVD | | Cancer | |
Log mortality rate | Ages 45–64 | Ages 65–74 | Ages 45–64 | Ages 65–74 | Ages 45–64 | Ages 65–74 |
My restrictive, linear DDD estimate, 24 geographies | 0.067** (0.024) | 0.004 (0.019) | 0.087** (0.026) | 0.040 (0.027) | 0.037 (0.036) | 0.009 (0.034) |
My restrictive, linear DDD estimate, county geography | 0.043** (0.013) | 0.014 (0.018) | 0.042** (0.016) | 0.011 (0.020) | 0.034 (0.018) | −0.027 (0.024) |
Level mortality rate | Ages 45–64 | Ages 65–74 | Ages 45–64 | Ages 65–74 | Ages 45–64 | Ages 65–74 |
My restrictive, linear DDD estimate, 24 geographies | 0.881** (0.390) | 0.768 (0.753) | 0.381** (0.148) | 0.699 (0.650) | 0.270 (0.188) | 0.040 (0.407) |
My restrictive, linear DDD estimate, county geography | 1.043 (1.036) | 2.477 (1.912) | 0.759 (0.602) | 1.173 (1.514) | −0.045 (0.569) | 0.233 (1.118) |
To assess this corollary of the AW hypothesis, I re-estimated the restrictive, linear DDD models using my samples but stratified by age: 45 to 64 and 65 to 74. AW do not report such estimates. I present these results not because I believe the AW analysis is valid, as I have provided ample evidence that it is not, but to show that even within the logic of their analysis, the hypothesis that the Tuskegee Experiment adversely affected Black men’s health does not hold across ages.
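The mechanics of such an age-stratified re-estimation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are simulated, the variable names (`prox`, `post`, `male`) and the "true" effects are hypothetical, and the model omits the fixed effects, controls, weights, and clustered standard errors that an actual specification would include. It shows only the general idea of a triple-difference term that is linear in proximity, fitted separately by age group.

```python
import numpy as np

def ddd_design(prox, post, male):
    """Design matrix for a triple-difference model that is linear in proximity."""
    return np.column_stack([
        np.ones_like(prox),
        prox, post, male,
        prox * post, prox * male, post * male,
        prox * post * male,  # the DDD term of interest (last column)
    ])

def fit_ddd(y, prox, post, male):
    """OLS fit; returns the DDD coefficient and a homoskedastic standard error."""
    X = ddd_design(prox, post, male)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[-1], np.sqrt(cov[-1, -1])

def simulate(rng, n_geo, true_ddd, noise_sd):
    """Simulated log mortality for every geography x period x sex cell (hypothetical)."""
    prox = rng.uniform(0.0, 1.0, n_geo)  # hypothetical proximity index per geography
    P, T, M = np.meshgrid(prox, [0.0, 1.0], [0.0, 1.0], indexing="ij")
    prox_c, post_c, male_c = P.ravel(), T.ravel(), M.ravel()
    y = (1.0 + 0.2 * male_c - 0.1 * post_c + 0.05 * prox_c
         + true_ddd * prox_c * post_c * male_c
         + rng.normal(0.0, noise_sd, prox_c.size))
    return y, prox_c, post_c, male_c

rng = np.random.default_rng(0)
# Hypothetical "true" effects by age group, chosen only to exercise the code.
true_effects = {"45-64": 0.05, "65-74": 0.0}

results = {}
for age_group, true_ddd in true_effects.items():
    y, prox, post, male = simulate(rng, n_geo=200, true_ddd=true_ddd, noise_sd=0.01)
    results[age_group] = fit_ddd(y, prox, post, male)
    est, se = results[age_group]
    print(f"ages {age_group}: DDD estimate = {est:.4f}, SE = {se:.4f}")
```

Fitting the model separately within each age stratum, rather than pooling, lets every coefficient differ by age, which is what a stratified comparison requires.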
Table 4 presents the age-stratified DDD estimates for deaths due to chronic diseases, CVD, and cancer. Estimates in Table 4 differ markedly by age and there is little evidence of a larger effect of the Tuskegee Experiment on older Black men. In fact, for older males, not one estimate is statistically significant. Note, too, that the lack of statistical significance is not because standard errors are substantially larger in the stratified analyses. While it is possible to tell stories that could reconcile the disparate estimates by age, they would all be ex post and speculative, much like the entire AW analysis. It is likely, as noted in the introduction, that older Black males are unusually healthy because of the relatively high rates of mortality at younger ages among Black males. But death rates are very high at these ages, and the role of medical care at these ages is still likely to be much more important than at younger ages. At a minimum, the inconsistent estimates across age groups are not easily explained by the AW hypothesis. Why would mistrust among Black males differ by age? AW do not have an answer. And, as already noted, decreases in the use of healthcare, for example, because of mistrust, would have more scope to affect mortality of older Black men. But we do not observe this.
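As a rough arithmetic check on the age comparison, the sketch below takes the log mortality rate, 24-geographies row of Table 4 and computes, for each cause of death, the ratio of each estimate to its standard error and the ratio of the older group's standard error to the younger group's. It assumes the values in parentheses are standard errors, and the dictionary and function names are illustrative. It does not reproduce any comparison with the pooled estimates in Table 3, whose values are not restated here.

```python
# Log mortality rate, 24 geographies (Table 4): (estimate, assumed standard error)
table4_log_24 = {
    "Chronic diseases": {"45-64": (0.067, 0.024), "65-74": (0.004, 0.019)},
    "CVD": {"45-64": (0.087, 0.026), "65-74": (0.040, 0.027)},
    "Cancer": {"45-64": (0.037, 0.036), "65-74": (0.009, 0.034)},
}

def z_ratio(estimate, se):
    """Ratio of a point estimate to its standard error."""
    return estimate / se

summary = {}
for cause, by_age in table4_log_24.items():
    (b_young, se_young), (b_old, se_old) = by_age["45-64"], by_age["65-74"]
    summary[cause] = {
        "z_45_64": z_ratio(b_young, se_young),
        "z_65_74": z_ratio(b_old, se_old),
        "se_ratio_old_to_young": se_old / se_young,
    }
    s = summary[cause]
    print(f"{cause}: z(45-64) = {s['z_45_64']:.2f}, "
          f"z(65-74) = {s['z_65_74']:.2f}, "
          f"SE ratio old/young = {s['se_ratio_old_to_young']:.2f}")
```

In this row, the standard errors for ages 65 to 74 are close to or smaller than those for ages 45 to 64, so the smaller ratios for the older group come from smaller point estimates rather than from noisier ones.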
Conclusion
The Tuskegee Experiment was racially motivated, unethical, and sanctioned by the federal government. It is plausible that such a despicable activity sponsored and conducted under the auspices of the government could have created mistrust of the medical profession among Black people. This hypothesis is widely held and continues to be viewed as a possible explanation of differences in health seeking behavior by race, for example, in response to the introduction of COVID-19 vaccines (Manning 2020; Ojikutu et al. 2022).
The role of mistrust of medical providers by Black people is important, not only from a historical perspective, but also because it may help explain large and persistent disparities in health by race. There are disparities in the effectiveness of medical treatments by race that are possibly explained by mistrust and the broader notion of poor patient-provider communication. Thus, establishing empirically a causal link between the Tuskegee Experiment and the health of Black people is an important endeavor with significant social and political ramifications. It is for this reason that AW (2018) was worth reviewing. If, as it claims, it provides evidence of such a causal link between Tuskegee and Black health, then it is plausible that the legacy of that event may continue to influence Black people’s health seeking behavior, and by extension, the persistent and cumulative effect of bias among medical providers and the larger society may be causing continued mistrust and the adverse health consequences that result from it.
Unfortunately, AW (2018) based their empirical analysis on an unfounded hypothesis that the Tuskegee Experiment would affect only Black men and not Black women, and that the effect would be larger the closer to Tuskegee a Black man lived. The first assumption, that Black women were unaffected, is unsupported by anything but the authors’ assertion, and it played a huge role in the empirical results. As shown above, the use of Black females as a counterfactual was the primary reason that restrictive, linear DDD estimates reported by AW showed the hypothesized relationship. I also showed that the relationship between the pre- to post-1972 change in mortality differed by sex and cause of death, which has no basis in the AW hypothesis. The second assumption, that the effect of the Tuskegee Experiment would differ by distance to Tuskegee, was also unsupported. Empirically, it was clearly unsupported, as DD and DDD estimates had a highly non-linear relationship to distance from Tuskegee. It is only by imposing a linear relationship and using Black women as a counterfactual that AW produced limited evidence consistent with their hypothesis. As I have amply demonstrated, a simple, transparent presentation of the data shows no evidence that would support the AW hypothesis.
My conclusion is that AW (2018) was based on unsupported theoretical assumptions and faulty empirical methods. It strained in both its development of the conceptual hypothesis and in its conduct of the empirical analysis to conclude that the Tuskegee Experiment adversely affected Black men’s use of medical care and mortality. Here, I examined only the mortality component of the study, which is arguably the most important outcome because healthcare is used to improve health and mortality is the ultimate measure of health. My interrogation clearly showed that the AW hypothesis was not supported and why AW mistakenly concluded otherwise.
Finding solutions to the large and persistent racial disparities in health is one of the most important objectives in our society. The value of good health and longevity is huge, and the relatively poor health of Black people represents a massive loss of welfare to them and society. It is plausible that poor communication between Black patients and their non-Black providers, and a general mistrust of healthcare providers among Black people, contribute to these disparities. Identifying this cause of racial disparities in health would, hopefully, lead to a solution. But AW (2018) does not provide the necessary evidence, and research into whether mistrust is a cause of racial disparities should continue.
Data and code
Data for this research is covered by a Data Use Agreement with the National Center for Health Statistics that precludes sharing. Code is available from the author upon request.