Estimating Total Effects in Bipartite Experiments: Addressing Spillovers and Partial Eligibility
This article provides a practical, replication-ready, step-by-step implementation for estimating total causal effects in bipartite experiments, specifically addressing the complexities of spillovers and partial eligibility. We cover the necessary data schema, exposure-mapping techniques, and estimator blueprints, along with code skeletons and diagnostic procedures.
I. Core Concepts and Data Structure
Data Schema
We assume the following data structure:
- A: An (n×m) bipartite adjacency matrix representing connections between A-units and B-units.
- D_B: An m-length vector indicating treatment status for B-units.
- Y: An n-length vector of outcomes for A-units.
- X_A: An n×p matrix of covariates for A-units.
Exposure Mapping and Categorization
Exposure for an A-unit (i) is defined as the sum of treatments from its B-side neighbors:
E_i = Σ_j A[i,j] * D_B[j]
To ensure stability, especially in sparse networks, we categorize exposure into discrete bins:
E_cat_i ∈ {0, 1, 2, 3, 4} using bin edges [-0.5, 0.5, 1.5, 2.5, 3.5, Inf].
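A minimal sketch of this mapping, assuming a small dense adjacency matrix and treatment vector (both synthetic placeholders):

```python
import numpy as np

# Toy bipartite graph: n=3 A-units (rows), m=3 B-units (columns)
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1]])
D_B = np.array([1, 0, 1])          # B-side treatment assignment

E = A @ D_B                        # E_i = sum_j A[i,j] * D_B[j]

# Discretize exposure into the five bins {0, 1, 2, 3, 4+}
bins = [-0.5, 0.5, 1.5, 2.5, 3.5, np.inf]
E_cat = np.digitize(E, bins) - 1   # 0-based exposure category per A-unit
```

Here unit 0 has two treated neighbors (E = 2, category 2) and unit 1 has none (E = 0, category 0).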
Estimator Blueprint
We propose exposure-aware propensity scoring and nearest-neighbor matching on (X_A, E_cat) to estimate the total causal effect of B-side treatment on A-side outcomes, accounting for interference.
Replication-Ready Code Skeleton (Python)
The provided code includes steps for:
- Exposure mapping.
- Multinomial logistic regression for exposure propensity scores.
- Matching on (X_A, E_cat).
- Total effect (TE) estimation.
- Bootstrap inference for standard errors and confidence intervals.
- Basic diagnostics.
Diagnostics and Reporting
Key diagnostics include:
- Assessing covariate balance before and after matching.
- Providing bootstrap standard errors (SEs) and confidence intervals (CIs).
- Documenting data-size and sparsity considerations.
- Conducting sensitivity checks.
II. Handling Partial Eligibility and Spillovers
Interference occurs when a unit’s outcome is influenced by its neighbors’ treatment. This section details a procedure to map exposure and handle partial eligibility using stabilized weights.
Defining Eligibility
An A-unit i is considered eligible if it has at least one treated neighbor:
Eligible_i = 1 if E_i > 0; else 0.
Partial-Eligibility Modeling with Stabilized Weights
Instead of a binary exposure indicator, we model the probability of eligibility given covariates X_i:
p_Elig_i = P(Eligible_i = 1 | X_i), using a logistic model.
Inverse-probability weights are defined as w_i = Eligible_i / p_Elig_i; stabilized versions multiply by the marginal eligibility rate, w_i = P(Eligible = 1) * Eligible_i / p_Elig_i, to reduce weight variance. These weights are used in outcome regressions or as sampling weights to account for varying exposure probabilities.
Concrete Steps for Implementation
- Compute Exposure (E): E = A %*% D, yielding E_i for each A-unit.
- Derive Eligibility: Set Eligible_i = 1 if E_i > 0, else 0.
- Estimate p_Elig: Fit a logistic regression of Eligible on covariates (e.g., X1, X2, X3) to get p_Elig_i = P(Eligible_i = 1 | X_i).
- Compute Weights: Define weights w_i = 1/p_Elig_i for Eligible_i = 1, and w_i = 0 for Eligible_i = 0.
- Model Outcomes: Fit an outcome regression using the (potentially categorized) exposure, covariates, and weights wi.
R Code Snippet for Partial Eligibility
# Assumes A (n x m sparse adjacency from the Matrix package),
# D (length-m treatment vector), and length-n vectors X1, X2, X3, Y
library(Matrix)
E <- as.numeric(A %*% D)             # exposure: number of treated B-neighbors
Eligible <- as.numeric(E > 0)
# p_Elig: probability of eligibility given covariates X1, X2, X3
pf <- glm(Eligible ~ X1 + X2 + X3, family = binomial)
p_Elig <- predict(pf, type = "response")
# Inverse-probability weights: 1/p_Elig for eligible units, 0 otherwise
w <- ifelse(Eligible == 1, 1 / p_Elig, 0)
# Build analysis data; cut() returns a factor of exposure categories
dat <- data.frame(Y = Y, X1 = X1, X2 = X2, X3 = X3, E = E,
                  E_cat = cut(E, breaks = c(-0.5, 0.5, 1.5, 2.5, 3.5, Inf)),
                  w = w)
# Outcome regression using exposure categories and covariates
fit <- lm(Y ~ E_cat + X1 + X2 + X3, data = dat, weights = w)
III. Quantifying Spillovers: Direct, Indirect, and Total Effects
Interference is common. We can quantify the total effect, separating direct impacts from neighbors’ spillovers.
Total Effect Definition
The total effect for an A-unit i as a function of exposure level k is:
TE_i(k) = E[ Y_i | E_i ≥ k ] − E[ Y_i | E_i = 0 ]
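This contrast can be estimated as a plug-in difference of conditional means. A hedged sketch on synthetic data (exposure counts and a known spillover effect are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
E = rng.poisson(1.0, size=n)                 # exposure counts per A-unit
Y = 2.0 * (E >= 1) + rng.normal(size=n)      # outcome with true spillover of 2.0

def te_hat(Y, E, k):
    """Plug-in estimate of E[Y | E >= k] - E[Y | E = 0]."""
    return float(Y[E >= k].mean() - Y[E == 0].mean())

print(te_hat(Y, E, k=1))                     # close to the simulated 2.0
```

In practice this naive contrast should be combined with the matching or weighting adjustments above, since exposure is rarely as good as randomly assigned.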
Direct vs. Spillover Decomposition
- Direct Effect: The impact from the unit's own B-neighbor treatment.
- Spillover Effect: The incremental change in Y_i due to treated neighbors, as counted in the exposure measure E_i.
Python Snippet for Marginal Spillover
A simple linear regression can estimate the response of Y to exposure, adjusting for covariates:
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.c_[X_A, E]          # covariates plus exposure as the last column
reg = LinearRegression().fit(X, Y)
beta = reg.coef_[-1]       # marginal spillover per additional treated neighbor
beta estimates the change in Y per additional treated neighbor, holding covariates constant. This can be extended with interactions or nonlinear terms in E.
Summary Table: Effect Quantities
| Quantity | Definition | Formula / Note |
|---|---|---|
| TE_i(k) | Total effect for an A-unit as a function of exposure | TE_i(k) = E[ Y_i \| E_i ≥ k ] − E[ Y_i \| E_i = 0 ] |
| Direct effect | Effect from i’s own B-neighbor treatment | Part of TE attributable to own treatment; distinct from neighbor-induced changes |
| Spillover | Effect due to treated neighbors counted in E_i | Incremental change in Y_i due to neighbors being treated |
IV. Diagnostics, Sensitivity Analyses, and Robustness Checks
A. Balance Diagnostics and Unconfoundedness Tests
Ensuring similarity between treated and comparison groups is crucial before trusting matched comparisons.
- Compute Standardized Mean Differences (SMD): Calculate SMDs for all covariates before and after matching. Aim for |SMD| < 0.1 for credible balance. Consider robust SMDs for different scales/distributions.
- Use Balance Summary Tools: Leverage libraries like R’s cobalt::bal.tab or MatchIt outputs. Review summary tables and plots to assess balance across covariates and strata. Persistent imbalances require attention.
- Bootstrap the Estimated Total Effect (TE_hat): Resample A-units with replacement and re-run matching and TE estimation for each sample. Report the standard error (SE) and 95% confidence interval (CI) from the bootstrap distribution (e.g., using percentiles). Use a sufficient number of replicates (e.g., 1,000) for stability and set a seed for reproducibility.
Combined, balance checks ensure a fair comparison, while bootstrapping quantifies uncertainty, providing a transparent view of unconfoundedness.
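Both diagnostics can be sketched with numpy alone. The example below computes per-covariate SMDs and a percentile bootstrap for a simple difference-in-means TE_hat; the data and the binary exposure indicator are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=(n, 3))                 # covariates (balanced by design)
exposed = rng.random(n) < 0.5               # binary exposure indicator
Y = 1.0 * exposed + rng.normal(size=n)

def smd(X, g):
    """Standardized mean difference per column between groups g and ~g."""
    m1, m0 = X[g].mean(axis=0), X[~g].mean(axis=0)
    s = np.sqrt((X[g].var(axis=0) + X[~g].var(axis=0)) / 2)
    return (m1 - m0) / s

def bootstrap_te(Y, g, reps=1000, seed=0):
    """Percentile bootstrap SE and 95% CI for a difference in means."""
    r = np.random.default_rng(seed)
    stats = []
    for _ in range(reps):
        idx = r.integers(0, len(Y), len(Y))     # resample A-units
        yb, gb = Y[idx], g[idx]
        stats.append(yb[gb].mean() - yb[~gb].mean())
    stats = np.array(stats)
    return stats.std(), np.percentile(stats, [2.5, 97.5])

balance = smd(X, exposed)                   # aim for |SMD| < 0.1 per covariate
se, ci = bootstrap_te(Y, exposed)
```

In a real analysis the bootstrap loop would re-run the full matching pipeline inside each replicate, not just the difference of means.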
B. Sensitivity to Spillover Structure and Exposure Mapping
Exposure mapping choices can influence conclusions. Test sensitivity to these choices:
- Vary Exposure Bin Granularity: Re-estimate TE under different binning schemes (e.g., coarse, mid, fine). Consistent TE estimates across schemes strengthen confidence. Significant swings or sign flips warrant caution and exploration of mapping impacts.
- Conduct Placebo Exposure Tests: Permute D_B within treatment status groups. Re-estimate TE on permuted data. If the observed TE is unlikely under this null distribution, the spillover structure matters.
- Apply Rosenbaum Bounds or E-values: Quantify the strength of hidden bias required to overturn conclusions. Report sensitivity metrics (e.g., Rosenbaum gamma, E-value) to indicate robustness. Higher values mean greater robustness to hidden confounding.
These checks—varying bins, placebo tests, and bias assessment—map the influence of spillover structure and mapping choices on conclusions. Stability and robustness across these tests increase confidence; deviations necessitate careful caveats.
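The placebo exposure test can be sketched as a permutation test: shuffle D_B, recompute exposure through the same graph, and compare the observed contrast with the permutation distribution. All inputs below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 300, 100
A = (rng.random((n, m)) < 0.05).astype(float)   # sparse bipartite adjacency
D_B = (rng.random(m) < 0.5).astype(float)       # B-side treatment
E = A @ D_B
Y = 1.0 * (E > 0) + rng.normal(size=n)          # outcome with true spillover

def naive_te(Y, E):
    """Simple exposed-vs-unexposed contrast in mean outcomes."""
    return Y[E > 0].mean() - Y[E == 0].mean()

obs = naive_te(Y, E)
null = []
for _ in range(500):
    D_perm = rng.permutation(D_B)               # placebo assignment
    null.append(naive_te(Y, A @ D_perm))        # TE under permuted exposure
p_value = float(np.mean(np.abs(null) >= abs(obs)))
```

A small p_value indicates the observed TE is unlikely under the permutation null, i.e., the actual spillover structure matters.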
C. Placebo Tests and Robustness Checks (General)
These are critical for verifying that the estimated effect stems from the treatment, not data quirks.
- Falsification Check: Estimate TE using placebo units (random subsets of B-units or units outside treatment support). TE_hat should be non-significant. Repeated significant findings across draws indicate issues like unobserved differences or model misspecification.
- Pre-treatment Checks: Compare outcome (Y) values and key covariates before B-side assignment across prospective groups. Look for imbalances using tests (t-tests, chi-square) and SMDs. No meaningful pre-treatment differences support a cleaner estimate.
- Visual Diagnostics: Inspect residual plots (vs. fitted values, covariates) for patterns, Q-Q plots for residuals to check normality, and TE_hat by Ecat level to detect effect heterogeneity. Random residuals and near-diagonal Q-Q plots support model adequacy. Consistent TE_hat across Ecat suggests a robust effect.
Summary of Diagnostics
| Diagnostic | What to Plot or Test | What to Look For | Interpretation |
|---|---|---|---|
| Falsification Check | TE_hat on placebo/D_B subsets | TE_hat non-significant across random draws | Supports that the estimated effect is not driven by spurious differences |
| Pre-treatment Checks | Y_pre and key covariates by assignment group | Balanced means, small standardized differences, no visible gaps | Reduces concerns about pre-existing differences driving results |
| Visual Diagnostics | Residuals vs fitted; Q-Q plot of residuals; TE_hat by E_cat | Residuals look random; Q-Q near diagonal; TE_hat stable across E_cat | Model fit is reasonable; potential heterogeneity is either absent or worth investigating |
V. Algorithmic Details, Complexity, and Scalability
A. Computational Costs by Algorithm
- Exposure Mapping:
- Dense A: O(n · m) operations.
- Sparse Adjacency: O(nnz(A)) operations. Preferred for large, sparse datasets.
- Nearest-Neighbor Matching:
- With indexing/efficient structures: O(n log n).
- Naive: degrades to O(n^2).
- Graph-Based Minimum-Cost-Flow Matching:
- Fast solvers: ~O(E · sqrt(V)), where E is edges, V is nodes.
- Expensive without sparse representations and batching.
B. Memory Considerations
- Store A as a sparse matrix (CSR/CSC).
- Store D_B, Y as vectors.
- Store X as a dense or sparse matrix.
- Use Batching for bootstrap procedures to manage peak memory and improve cache efficiency.
Choosing sparse representations and indexing strategies dramatically reduces time and memory for large networks.
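The dense-versus-sparse cost difference is easy to demonstrate: a CSR matrix-vector product touches only the nonzero entries, so exposure mapping costs O(nnz(A)) instead of O(n · m). A small sketch with synthetic data:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(4)
n, m = 1000, 500
A_dense = (rng.random((n, m)) < 0.01).astype(float)  # ~1% density
A_csr = sparse.csr_matrix(A_dense)                   # stores only nnz entries
D_B = (rng.random(m) < 0.5).astype(float)

E_dense = A_dense @ D_B     # O(n * m) work
E_sparse = A_csr @ D_B      # O(nnz(A)) work, identical result
```

Both products return the same exposure vector; only the time and memory profiles differ.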
VI. Data Structures and Implementation Tips (Python/R)
Optimize speed with the right storage, vectorized math, and batching.
Python (SciPy) / R (Matrix Package)
- Storage Format: Use sparse matrices (e.g., CSR in SciPy, sparse matrix class in R) for A. This speeds up matrix-vector products (A.dot(D_B)) and reduces memory.
- Vectorize Exposure Computation: Replace Python loops with a single A.dot(D_B) call. Matrix multiplication is highly optimized. Ensure D_B has compatible dimensions.
- Batch and Parallelize Bootstrap: Process bootstrap replications in batches to control memory. Distribute replications across cores/nodes using libraries like joblib (Python) or parallel (R).
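The batching idea can be sketched without any parallel library: generate resample indices one fixed-size batch at a time so peak memory stays bounded by the batch, not by the total replicate count. The statistic here is a simple mean for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
Y = rng.normal(loc=2.0, size=10_000)            # synthetic outcomes

def batched_bootstrap_mean(Y, reps=1000, batch=100, seed=0):
    """Bootstrap the mean of Y, materializing only `batch` replicates at once."""
    r = np.random.default_rng(seed)
    out = np.empty(reps)
    for start in range(0, reps, batch):
        stop = min(start + batch, reps)
        idx = r.integers(0, len(Y), size=(stop - start, len(Y)))
        out[start:stop] = Y[idx].mean(axis=1)   # one batch of replicates
    return out

boot = batched_bootstrap_mean(Y)
se = boot.std()                                 # bootstrap standard error
```

Each batch is an independent unit of work, so the loop body is also the natural granularity for distributing replicates across cores.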
VII. Choosing Among Matching Algorithms
| Algorithm | Description / Approach | Pros | Cons | Complexity |
|---|---|---|---|---|
| Exposure-Weighted Nearest Neighbor Matching (E-WNNM) | Match on covariates plus discretized exposure | Intuitive exposure control; good for moderate graphs | Sensitive to exposure bin choices | Moderate |
| Propensity-Score Matching with Exposure Strata (PSM-ES) | Model P(E>0|X) and match within exposure strata | Leverages well-established PS machinery | Strata definitions can bias if too coarse | Moderate-to-high |
| Graph-Based Minimum-Cost-Flow Matching (GM-CFM) | Solve optimized flow to minimize total distance under spillover constraints | Theoretically optimal under model | Computationally intensive for large networks | High |
VIII. Implementation Checklist and Replication Plan
- Define Bipartite Graph: Identify A-nodes (outcome units) and B-nodes (treatment units); ensure D_B is randomized or aligned with the design.
- Preprocess Covariates (X_A): Standardize continuous variables; use one-hot encoding for categorical ones.
- Choose Exposure Mapping Resolution: Select based on network degree; test multiple mappings for robustness.
- Implement and Compare Matching Algorithms: Use at least two algorithms, compare TE_hat results and diagnostics. Document all parameter choices.
- Run Diagnostics: Perform balance checks (SMD), placebo/falsification tests, and sensitivity analyses (Rosenbaum bounds, E-values).
- Publish Replication Artifacts: Include data-generating process, code, random seeds, and software versions for full reproducibility.
- Address Scalability: Prefer sparse representations, batch processing, and parallel bootstrapping for large graphs.
- Interpret TE_hat Carefully: Contextualize results with spillovers, explaining decomposition into direct and spillover components where possible.
