Peer-reviewed Articles

Working Papers

Multiple Imputation for Large Multi-Scale Data With Linear Constraints. (with Paul Beaumont)

Abstract: The use of multiple imputation of missing data in empirical studies has become increasingly popular in recent years. However, currently available multiple imputation methods face significant challenges when applied to large hierarchical, multidimensional data sets that are subject to linear aggregation constraints. In this paper we introduce a novel multiple imputation method designed to address these challenges. Our method leverages singular multivariate normal distributions within an Expectation Maximization algorithm combined with a Parallel-Sequential Imputation scheme to handle large and complex data sets that include linear aggregation constraints. Testing on real data sets demonstrates that the new method obtains up to twice the accuracy and is as much as an order of magnitude faster than leading alternative methods. We apply our method to estimate a panel data model of average weekly wages and show that it produces estimates that are unbiased and as efficient as estimates based on the data set with no missing values.

Dynamic Synthetic Controls: Accounting for Varying Speeds in Comparative Case Studies. (with Thomas Chadefaux)

Abstract: Synthetic controls are widely used to estimate the causal effect of a treatment. However, they do not account for the different speeds at which units respond to changes. Reactions may be inelastic or “sticky” and thus slower due to varying regulatory, institutional, or political environments. We show that these different reaction speeds can lead to biased estimates of causal effects. We therefore introduce a dynamic synthetic control approach that accommodates varying speeds in time series, resulting in improved synthetic control estimates. We apply our method to re-estimate the effects of terrorism on income (Abadie and Gardeazabal 2003), tobacco laws on consumption (Abadie, Diamond, and Hainmueller 2010), and German reunification on GDP (Abadie, Diamond, and Hainmueller 2015). We also assess the method’s performance using Monte-Carlo simulations. We find that it reduces errors in the estimates of true treatment effects by up to 70% compared to traditional synthetic controls, improving our ability to make robust inferences. An open-source R package, dsc, is made available for easy implementation.

Ballot Rejections and Ballot Curing in Washington State. (with Canyon Foot, Jay Lee, R. Michael Alvarez, Paul Manson, and Paul Gronke)

Abstract: November 2020 was the first time in US history that a plurality of voters cast absentee or mail ballots. The dramatic rise of mail voting in response to the COVID-19 pandemic has led to increased attention on the potential benefits and limitations of conducting elections by mail. One of the main drawbacks to vote-by-mail policies is that states usually reject a much larger percentage of mail ballots than they do ballots cast in person. This paper uses 27 ballot “matchback” files from the state of Washington to examine, for the first time, the patterns in a state’s challenged and cured ballots. We find that younger voters, voters of color, inexperienced voters, and male voters all have substantially elevated rates of ballot rejections. These patterns are driven by disparities in signature-based ballot challenges, rather than differences in rates of ballot curing or any other part of the process. Additionally, we examine the amount of time between ballot challenges and ballot cures, geographic variation in rejection rates, and discuss potential policy interventions to reduce disparities and lower rejection rates overall.

Work in Progress

  • “Dynamic Interaction Panel Estimation: Accounting for Complex Interdependence in Panel Data.”
    (with Thomas Chadefaux)
  • “Enhancing Regression Analysis through Self-Aligned DTW-Derived Speed Profiles in Time Series Data.”
    (with Thomas Chadefaux)
  • “The Parallel Quasi-Monte Carlo Bayesian Multi-Scale Multiple Imputation Method.”
    (with Paul Beaumont)
  • “Mailing It In: Voter Confidence in Vote-By-Mail In the 2020 Presidential Election.”
    (with R. Michael Alvarez and Seo-young Silvia Kim)

Research Experience

Trinity College Dublin
Research Fellow (January 2022 – Present)
Project: Patterns of Conflict Emergence

  • Identify patterns in pre-conflict actions, using data on conflict events, and in perceptions of those actions, using data from financial markets, news articles, and diplomatic documents.
  • Evaluate the utility of these patterns to improve forecasts of conflict with both historical and live out-of-sample predictions.
  • Summarize the core features of dangerous patterns into motifs that can help build new theories of conflict emergence and escalation.

California Institute of Technology
Visitor (January 2022 – Present)
Postdoctoral Scholar in Data Science and Election Integrity (July 2019 – December 2021)
Project: Election Auditing

  • Developed probabilistic matching and Bayesian multivariate models using GCP for large election database auditing in California and Florida.
  • Implemented entity resolution and anomaly detection on daily snapshots of voter registration databases that contain more than 20 million records and detected 10x more true anomalies than the existing methods did.

Project: Twitter Monitoring

  • Developed serverless architectures using GCP, AWS, and Oracle for long-term Twitter monitoring. These architectures ingest, process, and store more than 4.5 billion tweets (30 TB in size) related to COVID-19, primary/general elections, and protests.
  • Worked closely with the Computer Science team to implement topic, spatial, network, and sentiment analyses on the collected tweets, identifying COVID-19 misinformation and voting issues in the 2020 election cycle.

Florida State University
Senior Researcher (August 2018 – June 2019)
Project: Large Missing Data Multiple Imputation

  • Developed the fastest and most accurate Bayesian inference method for missing data multiple imputation.
  • Developed a parallel-sequential imputation method that can impute large multi-scale data sets with 1.5 billion observations (500 GB in size).

Project: Economic Impact Modeling

  • Analyzed the economic impact of Florida’s housing and small business policies.
  • Developed a NETS-based impact analysis tool with 1,000 times finer resolution than existing methods.