COVID-19 Antibody Seroprevalence in Santa Clara County, California
medRxiv preprint doi: https://doi.org/10.1101/2020.04.14.20062463 - not peer-reviewed
Eran Bendavid1 ebd at stanford.edu., Bianca Mulaney2, Neeraj Sood3, Soleil Shah2, Emilia Ling2, Rebecca Bromley-Dulfano2,
Cara Lai2, Zoe Weissberg2, Rodrigo Saavedra-Walker4, Jim Tedrow5, Dona Tversky6, Andrew Bogan7,
Thomas Kupiec8, Daniel Eichner9, Ribhav Gupta10, John P.A. Ioannidis1,10, Jay Bhattacharya1
Appears that most people who are infected develop antibodies
After their antibodies have fought off the virus, they do not test positive
Do most have different genes, better immune systems, or what?
Experts question results of startling Santa Clara coronavirus antibody study April 18
Because of possible testing errors and bias in who was tested, "experts" reduce the 50X estimate to 30X or 10X
Addressing COVID-19 is a pressing health and social concern. To date, many epidemic projections and policies addressing COVID-19 have been designed without seroprevalence data to inform epidemic parameters. We measured the seroprevalence of antibodies to SARS-CoV-2 in Santa Clara County.
On 4/3-4/4, 2020, we tested county residents for antibodies to SARS-CoV-2 using a lateral flow immunoassay. Participants were recruited using Facebook ads targeting a representative sample of the county by demographic and geographic characteristics. We report the prevalence of antibodies to SARS-CoV-2 in a sample of 3,330 people, adjusting for zip code, sex, and race/ethnicity. We also adjust for test performance characteristics using 3 different estimates:
- (i) the test manufacturer’s data,
- (ii) a sample of 37 positive and 30 negative controls tested at Stanford, and
- (iii) a combination of both.
The unadjusted prevalence of antibodies to SARS-CoV-2 in Santa Clara County was 1.5% (exact binomial 95CI 1.11-1.97%), and the population-weighted prevalence was 2.81% (95CI 2.24-3.37%). Under the three scenarios for test performance characteristics, the population prevalence of COVID-19 in Santa Clara ranged from 2.49% (95CI 1.80-3.17%) to 4.16% (2.58-5.70%). These prevalence estimates represent a range between 48,000 and 81,000 people infected in Santa Clara County by early April, 50-85-fold more than the number of confirmed cases.
The population prevalence of SARS-CoV-2 antibodies in Santa Clara County implies that the infection is much more widespread than indicated by the number of confirmed cases. Population prevalence estimates can now be used to calibrate epidemic and mortality projections.
The first two cases of COVID-19 in Santa Clara County, California were identified in returning travelers on January 31 and on February 1, 2020, and the third case was identified four weeks later on February 27, 2020.1 In the following month, nearly 1,000 additional cases were identified in Santa Clara County, showing a pattern of rapid case increase reflective of community transmission as well as the scaling up of SARS-CoV-2 viral testing that was common across many communities globally. In some countries, the rapid increase in COVID-19 case counts and hospitalizations has overwhelmed health systems and led to large reductions in social and economic activities. The measures adopted to slow the spread of COVID-19 were justified by projected estimates of health care system capacity and case fatality rate. These projections suggested that, in the absence of strict measures to reduce transmission, the COVID-19 pandemic would overwhelm existing hospital bed and ICU capacity throughout the United States and lead to over 2 million deaths.2
Measuring fatality rates and projecting the number of deaths depend on estimates of the total number of infections. To date, in the absence of seroprevalence surveys, estimates of the fatality rate have relied on the number of confirmed cases multiplied by an estimated factor representing unknown or asymptomatic cases to arrive at the number of infections.3-6 However, the magnitude of that factor is highly uncertain. Because the implications of infection fatality rate and projected deaths are large, the extent of COVID-19 infection under-ascertainment (the multiplier used to arrive from cases to infections) has been a topic of great interest and provided estimates of the number of infections about 1-6-fold higher than the number of cases.7-10 The extent of infection under-ascertainment has been difficult to assess because of three biasing processes: (i) cases have been diagnosed with PCR-based tests, which do not provide information about resolved infections; (ii) the majority of cases tested early in the course of the epidemic have been acutely ill and highly symptomatic, while most asymptomatic or mildly symptomatic individuals have not been tested; and (iii) PCR-based testing rates have been highly variable across contexts and over time, leading to noisy relationships between the number of cases and infections. If, in the absence of interventions, the epidemic’s early doubling time is estimated to be four days 6,11,12, then by February 27th, 2020, when the third case was identified in Santa Clara County, the county may have already had 256 infections.
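The 256 figure is not derived step by step in the text. One reading that reproduces it, offered here as an assumption rather than the authors' stated calculation, starts from the two cases identified around January 31 and treats the roughly four weeks to February 27 as seven 4-day doublings. The variable names are illustrative.

```python
# Assumed reading: 2 initial infections, a 4-day doubling time, ~28 days elapsed.
initial_infections = 2
doubling_time_days = 4
elapsed_days = 28  # January 31 to February 27, rounded to four weeks (assumption)

doublings = elapsed_days / doubling_time_days
projected_infections = initial_infections * 2 ** doublings

print(f"{doublings:.0f} doublings -> {projected_infections:.0f} infections")
```

Under these assumptions the projection is 2 × 2^7 = 256; a different start date or initial count would change the result.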
At the time of this study, Santa Clara County had the largest number of confirmed cases of any county in Northern California (1,094). The county also had several of the earliest known cases of COVID-19 in the state - including one of the first presumed cases of community-acquired disease - making it an especially appropriate location to test a population-level sample for the presence of active and past infections.
On April 3rd and 4th, 2020 we conducted a survey of residents of Santa Clara County to measure the seroprevalence of antibodies to SARS-CoV-2 and better approximate the number of infections. Our goal is to provide new and well-measured data for informing epidemic models, projections, and public policy decisions.
We conducted serologic testing for SARS-CoV-2 antibodies in 3,330 adults and children in Santa Clara County using capillary blood draws and a lateral flow immunoassay. In this section we describe our sampling and recruitment approaches, specimen collection methods, antibody testing procedure, test kit validation, and statistical methods. Our protocol was informed by a World Health Organization protocol for population-level COVID-19 antibody testing.13 We conducted our study with the cooperation of the Santa Clara County Department of Public Health. The IRB at Stanford University approved the study prior to recruitment.
Study Participants and Sample Recruitment
We recruited participants by placing targeted advertisements on Facebook aimed at residents of Santa Clara County. We used Facebook to quickly reach a large number of county residents and because it allows for granular targeting by zip code and sociodemographic characteristics.14 We used a combination of two targeting strategies: ads aimed at a representative population of the county by zip code, and specially targeted ads to balance our sample for under-represented zip codes. In addition, we capped registrations from overrepresented areas. Individuals who clicked on the advertisement were directed to a survey hosted by the Stanford REDcap platform, which provided information about the study.15 The survey asked for six data elements: zip code of residence, age, sex, race/ethnicity, underlying comorbidities, and prior clinical symptoms. Over 24 hours, we registered 3,285 adults, and each adult was allowed to bring one child from the same household with them (889 children registered).
Specimen Collection and Testing Methods
We established drive-through test sites in three locations spaced across Santa Clara County: two county parks in Los Gatos and San Jose, and a church in Mountain View. Only individuals with a participant ID were allowed into the testing area. Verbal informed consent was obtained to minimize participant and staff exposure. With participants in their vehicles, sample collectors in personal protective equipment drew 50-200 µL of capillary blood into an EDTA-coated microtainer. Tubes were barcoded and linked with the participant ID. Samples were couriered from the collection sites to a test reading facility with steady lighting and climate conditions. Technicians drew whole blood up to a fill line on the manufacturer’s pipette and placed it in the test kit well, followed by a buffer. Test kits were read 12-20 minutes after the buffer was placed. Technicians barcoded tests to match sample barcodes and documented all test results.
Test Kit Performance
The manufacturer’s performance characteristics were available prior to the study (using 85 confirmed positive and 371 confirmed negative samples). We conducted additional testing to assess the kit performance using local specimens. We tested the kits using sera from 37 RT-PCR-positive patients at Stanford Hospital that were also IgG and/or IgM-positive on a locally developed ELISA assay. We also tested the kits on 30 pre-COVID samples from Stanford Hospital to derive an independent measure of specificity. Our procedure for using these data is detailed below.
Our estimation of the population prevalence of COVID-19 proceeded in three steps. First, we reported the raw frequencies of positive tests as a proportion of the final sample size. Second, we re-weighted our sample by zip code, sex, and race/ethnicity (non-Hispanic White, Asian, Hispanic, and other). We chose these three adjustors because they contributed to the largest imbalance in our sample, and because including additional adjustors would result in small-N bins. Our weights were the zip-sex-race proportion in Santa Clara County divided by the zip-sex-race proportion in our sample, for each zip-sex-race combination in the county and in the sample:
$$ W_{zsr} = \frac{N^{c}_{zsr} / N^{c}}{N_{zsr} / N} $$
Where Nc represents county counts, N represents sample counts, and the subscript zsr identifies the unique zip-sex-race groups (counts without a subscript are totals). These weights were then applied to the entire sample. To provide a concrete example, suppose two zip codes (A and B) each include 10,000 men and 10,000 women. Our sample included 250 men and 500 women from zip A, and 750 men and 1500 women from zip B. This is exemplary of the imbalance in our sample. Applying the formula above, we get a weight of 3 for men in zip A, 1.5 for women in zip A, 1 for men in zip B, and 0.5 for women in zip B.
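The worked example can be checked with a short sketch. It assumes each zip code has 10,000 men and 10,000 women, and it uses only zip and sex as strata, as the example does; the function and variable names are illustrative, not taken from the study's code.

```python
# County and sample counts per (zip, sex) group, taken from the worked example.
county_counts = {("A", "M"): 10_000, ("A", "F"): 10_000,
                 ("B", "M"): 10_000, ("B", "F"): 10_000}
sample_counts = {("A", "M"): 250, ("A", "F"): 500,
                 ("B", "M"): 750, ("B", "F"): 1500}


def poststratification_weights(county, sample):
    """Weight = (group share of county) / (group share of sample)."""
    county_total = sum(county.values())
    sample_total = sum(sample.values())
    return {
        group: (county[group] / county_total) / (sample[group] / sample_total)
        for group in sample
    }


weights = poststratification_weights(county_counts, sample_counts)
for group, weight in weights.items():
    print(group, round(weight, 2))
```

Groups that are under-sampled relative to the county (men in zip A) receive weights above 1, and over-sampled groups (women in zip B) receive weights below 1.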
Third, we adjusted the prevalence for test sensitivity and specificity. Because SARS-CoV-2 lateral flow assays are new, we applied three scenarios of test kit sensitivity and specificity. The first scenario uses the manufacturer’s validation data (S1). The second scenario uses sensitivity and specificity from a sample of 37 known positive (RT-PCR-positive and IgG or IgM positive on a locally-developed ELISA) and 30 known pre-COVID negatives tested on the kit at Stanford (S2). The third scenario combines the two collections of samples (manufacturer and local sample) as a single pooled sample (S3). We use the delta method to estimate standard errors for the population prevalence, which accounts for sampling error and propagates the uncertainty in the sensitivity and specificity in each scenario. A more detailed version of the formulas we use in our calculations is available in the Appendix to this paper.
The test kit used in this study (Premier Biotech, Minneapolis, MN) was tested in a Stanford laboratory prior to field deployment. Among 37 samples of known PCR-positive COVID-19 patients with positive IgG or IgM detected on a locally-developed ELISA test, 25 were kit-positive. A sample of 30 pre-COVID samples from hip surgery patients were also tested, and all 30 were negative. The manufacturer’s test characteristics relied on samples from clinically confirmed COVID-19 patients as positive gold standard and pre-COVID sera for negative gold standard. Among 75 samples of clinically confirmed COVID-19 patients with positive IgG, 75 were kit-positive, and among 85 samples with positive IgM, 78 were kit-positive. Among 371 pre-COVID samples, 369 were negative. Our estimates of sensitivity based on the manufacturer’s and locally tested data were 91.8% (using the lower estimate based on IgM, 95 CI 83.8-96.6%) and 67.6% (95 CI 50.2-82.0%), respectively. Similarly, our estimates of specificity are 99.5% (95 CI 98.1-99.9%) and 100% (95 CI 90.5-100%). A combination of both data sources provides us with a combined sensitivity of 80.3% (95 CI 72.1-87.0%) and a specificity of 99.5% (95 CI 98.3-99.9%).
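As a rough check on the two sensitivity estimates, the sketch below computes point estimates and two-sided 95% intervals from the reported counts. It assumes the intervals are exact (Clopper-Pearson) binomial intervals, which the text states only for the crude prevalence; the helper names are illustrative and the bounds are found by simple bisection rather than a statistics library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(f, iters=60):
    """Bisection on [0, 1] for a function that rises from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials."""
    lower = 0.0 if k == 0 else _root(lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    upper = 1.0 if k == n else _root(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


# (kit-positive, known positive) counts reported in the text.
kit_sensitivity = {"manufacturer IgM": (78, 85), "Stanford": (25, 37)}
intervals = {name: clopper_pearson(k, n) for name, (k, n) in kit_sensitivity.items()}

for name, (k, n) in kit_sensitivity.items():
    low, high = intervals[name]
    print(f"{name}: {k / n:.1%} (95% CI {low:.1%}-{high:.1%})")
```

The same function can be applied to any of the other count pairs in the paragraph; the results depend on the assumed interval method.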
Our study included 3,439 individuals who registered for the study and arrived at testing sites. We excluded observations of individuals who could not be tested (e.g. unable to obtain blood or blood clotted, N=49), whose test results could not be matched to their personal data (e.g. if an incorrect participant ID was recorded onsite, N=30), who did not reside in Santa Clara County (N=29), and who had invalid test results (no Control band, N=1). This yielded an analytic sample of 3,330 individuals with complete records including survey registration, attendance at a test site for specimen collection, and lab results (Figure 1). The sample distribution meaningfully deviated from that of the Santa Clara County population along several dimensions: sex (63% of the sample was female, 50% in the county); race (8% of the sample was Hispanic, 26% in the county; 19% of the sample was Asian, 28% in the county); and zip distribution (median participant density per 1,000 population 1.6, IQR 0.9-3.6). Table 1 includes demographic characteristics of our unadjusted sample, population-adjusted sample, and Santa Clara County.16 Figure 2 shows the geographical zip code distribution of study participants in the county (counts and density per 1,000 population).
The total number of positive cases by either IgG or IgM in our unadjusted sample was 50, a crude prevalence rate of 1.50% (exact binomial 95% CI 1.11-1.97%). After weighting our sample to match Santa Clara County by zip, race, and sex, the prevalence was 2.81% (95% CI 2.24-3.37 without clustering the standard errors for members of the same household, and 1.45-4.16 with clustering). We further improved our estimation using the available data on test kit sensitivity and specificity, using the three scenarios noted above. The estimated prevalence was 2.49% (95CI 1.80%-3.17%) under the S1 scenario, 4.16% (95CI 2.58%-5.70%) under the S2 scenario, and 2.75% (95CI 2.01%-3.49%) under the S3 scenario. Notably, the uncertainty bounds around each of these population prevalence estimates propagate the uncertainty in each of the three component parameters: sample prevalence, test sensitivity, and test specificity.
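The S1 and S2 point estimates can be approximated with the standard correction of an apparent prevalence for test sensitivity and specificity (often called the Rogan-Gladen estimator). This is an assumption: the text defers its formulas to the Appendix and does not name the estimator. The sketch uses the weighted prevalence of 2.81% and the validation counts reported above; confidence intervals (delta method) are not reproduced, and the names are illustrative.

```python
def adjust_for_test_performance(apparent, sensitivity, specificity):
    """True prevalence implied by an apparent prevalence, given test performance.

    Solves apparent = p * sensitivity + (1 - p) * (1 - specificity) for p.
    """
    return (apparent + specificity - 1) / (sensitivity + specificity - 1)


weighted_prevalence = 0.0281  # population-weighted prevalence from the text

# (sensitivity, specificity) from the reported validation counts.
scenarios = {
    "S1 (manufacturer)": (78 / 85, 369 / 371),
    "S2 (Stanford)": (25 / 37, 30 / 30),
}

adjusted = {
    name: adjust_for_test_performance(weighted_prevalence, sens, spec)
    for name, (sens, spec) in scenarios.items()
}

for name, value in adjusted.items():
    print(f"{name}: {value:.2%}")
```

Under this assumption the sketch gives roughly 2.49% for S1 and 4.16% for S2, in line with the reported point estimates.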
After adjusting for population and test performance characteristics, we estimate that the seroprevalence of antibodies to SARS-CoV-2 in Santa Clara County is between 2.49% and 4.16%, with uncertainty bounds ranging from 1.80% (lower uncertainty bound of the lowest estimate), up to 5.70% (upper uncertainty bound of the highest estimate). Test performance characteristics are the most critical driver of this range, with lower estimates associated with data suggesting the test has a high sensitivity for identifying SARS-CoV-2, and higher estimates resulting from data suggesting over 30% of positive cases are missed by the test.
These results represent the first large-scale community-based prevalence study in a major US county completed during a rapidly changing pandemic, and with newly available test kits. We consider our estimate to represent the best available current evidence, but recognize that new information, especially about the test kit performance, could result in updated estimates. For example, if new estimates indicate test specificity to be less than 97.9%, our SARS-CoV-2 prevalence estimate would change from 2.8% to less than 1%, and the lower uncertainty bound of our estimate would include zero. On the other hand, lower sensitivity, which has been raised as a concern with point-of-care test kits, would imply that the population prevalence would be even higher. New information on test kit performance and population should be incorporated as more testing is done and we plan to revise our estimates accordingly.
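The dependence on specificity can be illustrated with a small sweep. This sketch assumes the standard sensitivity/specificity correction (not named in the text), the weighted prevalence of 2.81%, and the combined sensitivity of 80.3% quoted earlier; the specificity grid and all names are illustrative choices.

```python
def adjusted_prevalence(apparent, sensitivity, specificity):
    """Corrected prevalence, floored at zero when false positives explain all positives."""
    raw = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    return max(raw, 0.0)


apparent = 0.0281     # population-weighted prevalence from the text
sensitivity = 0.803   # combined sensitivity as reported in the text

by_specificity = {
    spec: adjusted_prevalence(apparent, sensitivity, spec)
    for spec in (0.995, 0.990, 0.985, 0.979, 0.972)
}

for spec, value in by_specificity.items():
    print(f"specificity {spec:.1%}: adjusted prevalence {value:.2%}")

# Lower sensitivity at fixed specificity pushes the estimate upward.
lower_sensitivity_case = adjusted_prevalence(apparent, 0.60, 0.995)
print(f"sensitivity 60.0%, specificity 99.5%: {lower_sensitivity_case:.2%}")
```

In this sketch the estimate falls below 1% once specificity reaches 97.9%, and approaches zero as the false-positive rate nears the weighted prevalence itself.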
The most important implication of these findings is that the number of infections is much greater than the reported number of cases. Our data imply that, by April 1 (three days prior to the end of our survey), between 48,000 and 81,000 people had been infected in Santa Clara County. The reported number of confirmed positive cases in the county on April 1 was 956, 50-85-fold lower than the number of infections predicted by this study.17 The infection to case ratio, also referred to as an under-ascertainment rate, of at least 50, is meaningfully higher than current estimates.10,18 This ascertainment rate is a fundamental parameter of many projection and epidemiologic models, and is used as a calibration target for understanding epidemic stage and calculating fatality rates.19,20 The under-ascertainment for COVID-19 is likely a function of reliance on PCR for case identification which misses convalescent cases, early spread in the absence of systematic testing, and asymptomatic or lightly symptomatic infections that go undetected.
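The 50-85-fold range follows directly from the figures in this paragraph, as this short sketch shows (variable names are illustrative):

```python
confirmed_cases = 956                        # confirmed cases in the county on April 1
estimated_infections = (48_000, 81_000)      # range implied by the prevalence estimates

ratios = [infections / confirmed_cases for infections in estimated_infections]
print(f"infection-to-case ratio: {ratios[0]:.0f} to {ratios[1]:.0f}")
```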
The under-ascertainment of infections is central for better estimation of the fatality rate from COVID-19. Many estimates of fatality rate use a ratio of deaths to lagged cases (because of duration from case confirmation to death), with an infections-to-cases ratio in the 1-5-fold range as an estimate of under-ascertainment.3,4,21 Our study suggests that adjustments for under-ascertainment may need to be much higher.
We can use our prevalence estimates to approximate the infection fatality rate from COVID-19 in Santa Clara County. As of April 10, 2020, 50 people have died of COVID-19 in the County, with an average increase of 6% daily in the number of deaths. If our estimates of 48,000-81,000 infections represent the cumulative total on April 1, and we project deaths to April 22 (a 3 week lag from time of infection to death22), we estimate about 100 deaths in the county. A hundred deaths out of 48,000-81,000 infections corresponds to an infection fatality rate of 0.12-0.2%. If antibodies take longer than 3 days to appear, if the average duration from case identification to death is less than 3 weeks, or if the epidemic wave has peaked and growth in deaths is less than 6% daily, then the infection fatality rate would be lower. These straightforward estimations of infection fatality rate fail to account for age structure and changing treatment approaches to COVID-19. Nevertheless, our prevalence estimates can be used to update existing fatality rates given the large upwards revision of under-ascertainment.
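The arithmetic behind this estimate can be sketched as follows. It assumes deaths compound at 6% per day from April 10 to April 22, which is one straightforward reading of the projection described above; the variable names are illustrative.

```python
from datetime import date

deaths_april_10 = 50
daily_growth = 0.06
days_ahead = (date(2020, 4, 22) - date(2020, 4, 10)).days  # 12 days

# Compound the April 10 death count forward to April 22.
projected_deaths = deaths_april_10 * (1 + daily_growth) ** days_ahead

# Infection fatality rate for the high and low ends of the infection range.
infections_low, infections_high = 48_000, 81_000
ifr_low = projected_deaths / infections_high
ifr_high = projected_deaths / infections_low

print(f"projected deaths by April 22: {projected_deaths:.0f}")
print(f"infection fatality rate: {ifr_low:.2%} to {ifr_high:.2%}")
```

The projection comes to about 100 deaths, and dividing by the two ends of the infection range gives an infection fatality rate of roughly 0.12% to 0.2%, matching the figures in the text.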
While our prevalence estimates of 2.49% to 4.16% are representative of the situation in Santa Clara County as of April 4, other areas are likely to have different seroprevalence estimates based on effective contact rates in the community, social distancing policies to date, and relative disease progression. Our prevalence estimate also suggests that, at this time, a large fraction of the population remains unexposed in Santa Clara County. Repeated serologic testing in different geographies, spaced a few weeks apart, could establish extent of infection over time.
This study had several limitations. First, our sampling strategy selected for residents of Santa Clara County with access to Facebook and a car to attend drive-through testing sites. This resulted in an over-representation of white women between the ages of 19 and 64, and an under-representation of Hispanic and Asian populations, relative to our community. Those imbalances were partly addressed by weighting our sample population by zip code, race, and sex to match the county. We did not account for age imbalance in our sample, and could not ascertain representativeness of SARS-CoV-2 antibodies in homeless populations. Other biases, such as bias favoring individuals in good health capable of attending our testing sites, or bias favoring those with prior COVID-like illnesses seeking antibody confirmation, are also possible. The overall effect of such biases is hard to ascertain.
The Premier Biotech serology test used in this study had not been approved by the FDA at the time of the study, and validation studies for this assay are ongoing. We used existing test performance data to establish a range of sensitivity and specificity, including reliable but small-size data sourced at Stanford. Test sensitivity varied between the manufacturer’s data and the local data. It is possible that asymptomatic or mildly symptomatic individuals may generate only low-titer antibodies, and that sensitivity may be even lower if there are many such cases.23 Additional validation of the assays used could improve our estimates and those of ongoing serosurveys.
Several teams worldwide have started testing population samples for SARS-CoV-2 antibodies, with preliminary findings consistent with a large under-ascertainment of SARS-CoV-2 infections. Reports from the town of Robbio, Italy, where the entire population was tested, suggest at least 10% seropositivity;24 and data from Gangelt, a highly affected area in Germany,25 point to 14% seropositivity.
An effort to test the town of Telluride, Colorado is underway, and interim results suggest a prevalence just under 2%.26 Our data from Santa Clara County suggest higher spread of the infection than in Telluride but lower than in some areas in Europe.
We conclude that based on seroprevalence sampling of a large regional population, the prevalence of SARS-CoV-2 antibodies in Santa Clara County was between 2.49% and 4.16% by early April. While this prevalence may be far smaller than the theoretical final size of the epidemic,27 it suggests that the number of infections is 50-85-fold larger than the number of cases currently detected in Santa Clara County. These new data should allow for better modeling of this pandemic and its progression under various scenarios of non-pharmaceutical interventions. While our study was limited to Santa Clara County, it demonstrates the feasibility of seroprevalence surveys of population samples now, and in the future, to inform our understanding of this pandemic’s progression, project estimates of community vulnerability, and monitor infection fatality rates in different populations over time. It is also an important tool for reducing uncertainty about the state of the epidemic, which may have important public benefits.