Estimating Total Effects in Bipartite Experiments: Addressing Spillovers and Partial Eligibility
This article provides a practical, replication-ready, step-by-step implementation for estimating total causal effects in bipartite experiments, specifically addressing the complexities of spillovers and partial eligibility. We cover the necessary data schema, exposure-mapping techniques, and estimator blueprints, along with code skeletons and diagnostic procedures.
I. Core Concepts and Data Structure
Data Schema
We assume the following data structure:
- A: An (n×m) bipartite adjacency matrix representing connections between A-units and B-units.
- D_B: An m-length vector indicating treatment status for B-units.
- Y: An n-length vector of outcomes for A-units.
- X_A: An n×p matrix of covariates for A-units.
Exposure Mapping and Categorization
Exposure for an A-unit (i) is defined as the sum of treatments from its B-side neighbors:
E_i = Σ_j A[i,j] * D_B[j]
To ensure stability, especially in sparse networks, we categorize exposure into discrete bins:
E_cat_i ∈ {0, 1, 2, 3, 4} using bin edges [-0.5, 0.5, 1.5, 2.5, 3.5, Inf].
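A minimal sketch of this mapping, assuming a small dense adjacency matrix and treatment vector (both synthetic placeholders):

```python
import numpy as np

# Toy bipartite graph: n=3 A-units (rows), m=3 B-units (columns)
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1]])
D_B = np.array([1, 0, 1])          # B-side treatment assignment

E = A @ D_B                        # E_i = sum_j A[i,j] * D_B[j]

# Discretize exposure into the five bins {0, 1, 2, 3, 4+}
bins = [-0.5, 0.5, 1.5, 2.5, 3.5, np.inf]
E_cat = np.digitize(E, bins) - 1   # 0-based exposure category per A-unit
```

Here unit 0 has two treated neighbors (E = 2, category 2) and unit 1 has none (E = 0, category 0).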
Estimator Blueprint
We propose exposure-aware propensity scoring and nearest-neighbor matching on (X_A, E_cat) to estimate the total causal effect of B-side treatment on A-side outcomes, accounting for interference.
Replication-Ready Code Skeleton (Python)
The provided code includes steps for:
- Exposure mapping.
- Multinomial logistic regression for exposure propensity scores.
- Matching on (X_A, E_cat).
- Total effect (TE) estimation.
- Bootstrap inference for standard errors and confidence intervals.
- Basic diagnostics.
Diagnostics and Reporting
Key diagnostics include:
- Assessing covariate balance before and after matching.
- Providing bootstrap standard errors (SEs) and confidence intervals (CIs).
- Documenting data-size and sparsity considerations.
- Conducting sensitivity checks.
II. Handling Partial Eligibility and Spillovers
Interference occurs when a unit’s outcome is influenced by its neighbors’ treatment. This section details a procedure to map exposure and handle partial eligibility using stabilized weights.
Defining Eligibility
An A-unit i is considered eligible if it has at least one treated neighbor:
Eligible_i = 1 if E_i > 0; else 0.
Partial-Eligibility Modeling with Stabilized Weights
Instead of a binary exposure indicator, we model the probability of eligibility given covariates X_i:
p_Elig_i = P(Eligible_i = 1 | X_i), using a logistic model.
Inverse-probability weights are defined as w_i = Eligible_i / p_Elig_i; stabilized versions multiply by the marginal eligibility rate, w_i = P(Eligible = 1) * Eligible_i / p_Elig_i, to reduce weight variance. These weights are used in outcome regressions or as sampling weights to account for varying exposure probabilities.
Concrete Steps for Implementation
- Compute Exposure (E): E = A %*% D, yielding E_i for each A-unit.
- Derive Eligibility: Set Eligible_i = 1 if E_i > 0, else 0.
- Estimate p_Elig: Fit a logistic regression of Eligible on covariates (e.g., X1, X2, X3) to get p_Elig_i = P(Eligible_i = 1 | X_i).
- Compute Weights: Define weights w_i = 1/p_Elig_i for Eligible_i = 1, and w_i = 0 for Eligible_i = 0.
- Model Outcomes: Fit an outcome regression using the (potentially categorized) exposure, covariates, and weights wi.
R Code Snippet for Partial Eligibility
# Assumes A (n x m sparse adjacency from the Matrix package),
# D (length-m treatment vector), and length-n vectors X1, X2, X3, Y
library(Matrix)
E <- as.numeric(A %*% D)             # exposure: number of treated B-neighbors
Eligible <- as.numeric(E > 0)
# p_Elig: probability of eligibility given covariates X1, X2, X3
pf <- glm(Eligible ~ X1 + X2 + X3, family = binomial)
p_Elig <- predict(pf, type = "response")
# Inverse-probability weights: 1/p_Elig for eligible units, 0 otherwise
w <- ifelse(Eligible == 1, 1 / p_Elig, 0)
# Build analysis data; cut() returns a factor of exposure categories
dat <- data.frame(Y = Y, X1 = X1, X2 = X2, X3 = X3, E = E,
                  E_cat = cut(E, breaks = c(-0.5, 0.5, 1.5, 2.5, 3.5, Inf)),
                  w = w)
# Outcome regression using exposure categories and covariates
fit <- lm(Y ~ E_cat + X1 + X2 + X3, data = dat, weights = w)
III. Quantifying Spillovers: Direct, Indirect, and Total Effects
Interference is common. We can quantify the total effect, separating direct impacts from neighbors’ spillovers.
Total Effect Definition
The total effect for an A-unit i as a function of exposure level k is:
TE_i(k) = E[ Y_i | E_i ≥ k ] − E[ Y_i | E_i = 0 ]
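This contrast can be estimated as a plug-in difference of conditional means. A hedged sketch on synthetic data (exposure counts and a known spillover effect are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
E = rng.poisson(1.0, size=n)                 # exposure counts per A-unit
Y = 2.0 * (E >= 1) + rng.normal(size=n)      # outcome with true spillover of 2.0

def te_hat(Y, E, k):
    """Plug-in estimate of E[Y | E >= k] - E[Y | E = 0]."""
    return float(Y[E >= k].mean() - Y[E == 0].mean())

print(te_hat(Y, E, k=1))                     # close to the simulated 2.0
```

In practice this naive contrast should be combined with the matching or weighting adjustments above, since exposure is rarely as good as randomly assigned.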
Direct vs. Spillover Decomposition
- Direct Effect: The impact from the unit's own B-neighbor treatment.
- Spillover Effect: The incremental change in Y_i due to treated neighbors, as counted in the exposure measure E_i.
Python Snippet for Marginal Spillover
A simple linear regression can estimate the response of Y to exposure, adjusting for covariates:
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.c_[X_A, E]          # covariates plus exposure as the last column
reg = LinearRegression().fit(X, Y)
beta = reg.coef_[-1]       # marginal spillover per additional treated neighbor
beta estimates the change in Y per additional treated neighbor, holding covariates constant. This can be extended with interactions or nonlinear terms in E.
Summary Table: Effect Quantities
| Quantity | Definition | Formula / Note |
|---|---|---|
| TE_i(k) | Total effect for an A-unit as a function of exposure | TE_i(k) = E[ Y_i \| E_i ≥ k ] − E[ Y_i \| E_i = 0 ] |
| Direct effect | Effect from i’s own B-neighbor treatment | Part of TE attributable to own treatment; distinct from neighbor-induced changes |
| Spillover | Effect due to treated neighbors counted in E_i | Incremental change in Y_i due to neighbors being treated |
IV. Diagnostics, Sensitivity Analyses, and Robustness Checks
A. Balance Diagnostics and Unconfoundedness Tests
Ensuring similarity between treated and comparison groups is crucial before trusting matched comparisons.
- Compute Standardized Mean Differences (SMD): Calculate SMDs for all covariates before and after matching. Aim for |SMD| < 0.1 for credible balance. Consider robust SMDs for different scales/distributions.
- Use Balance Summary Tools: Leverage libraries like R’s cobalt::bal.tab or MatchIt outputs. Review summary tables and plots to assess balance across covariates and strata. Persistent imbalances require attention.
- Bootstrap the Estimated Total Effect (TE_hat): Resample A-units with replacement and re-run matching and TE estimation for each sample. Report the standard error (SE) and 95% confidence interval (CI) from the bootstrap distribution (e.g., using percentiles). Use a sufficient number of replicates (e.g., 1,000) for stability and set a seed for reproducibility.
Combined, balance checks ensure a fair comparison, while bootstrapping quantifies uncertainty, providing a transparent view of unconfoundedness.
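Both diagnostics can be sketched with numpy alone. The example below computes per-covariate SMDs and a percentile bootstrap for a simple difference-in-means TE_hat; the data and the binary exposure indicator are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=(n, 3))                 # covariates (balanced by design)
exposed = rng.random(n) < 0.5               # binary exposure indicator
Y = 1.0 * exposed + rng.normal(size=n)

def smd(X, g):
    """Standardized mean difference per column between groups g and ~g."""
    m1, m0 = X[g].mean(axis=0), X[~g].mean(axis=0)
    s = np.sqrt((X[g].var(axis=0) + X[~g].var(axis=0)) / 2)
    return (m1 - m0) / s

def bootstrap_te(Y, g, reps=1000, seed=0):
    """Percentile bootstrap SE and 95% CI for a difference in means."""
    r = np.random.default_rng(seed)
    stats = []
    for _ in range(reps):
        idx = r.integers(0, len(Y), len(Y))     # resample A-units
        yb, gb = Y[idx], g[idx]
        stats.append(yb[gb].mean() - yb[~gb].mean())
    stats = np.array(stats)
    return stats.std(), np.percentile(stats, [2.5, 97.5])

balance = smd(X, exposed)                   # aim for |SMD| < 0.1 per covariate
se, ci = bootstrap_te(Y, exposed)
```

In a real analysis the bootstrap loop would re-run the full matching pipeline inside each replicate, not just the difference of means.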
B. Sensitivity to Spillover Structure and Exposure Mapping
Exposure mapping choices can influence conclusions. Test sensitivity to these choices:
- Vary Exposure Bin Granularity: Re-estimate TE under different binning schemes (e.g., coarse, mid, fine). Consistent TE estimates across schemes strengthen confidence. Significant swings or sign flips warrant caution and exploration of mapping impacts.
- Conduct Placebo Exposure Tests: Permute D_B within treatment status groups. Re-estimate TE on permuted data. If the observed TE is unlikely under this null distribution, the spillover structure matters.
- Apply Rosenbaum Bounds or E-values: Quantify the strength of hidden bias required to overturn conclusions. Report sensitivity metrics (e.g., Rosenbaum gamma, E-value) to indicate robustness. Higher values mean greater robustness to hidden confounding.
These checks—varying bins, placebo tests, and bias assessment—map the influence of spillover structure and mapping choices on conclusions. Stability and robustness across these tests increase confidence; deviations necessitate careful caveats.
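The placebo exposure test can be sketched as a permutation test: shuffle D_B, recompute exposure through the same graph, and compare the observed contrast with the permutation distribution. All inputs below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 300, 100
A = (rng.random((n, m)) < 0.05).astype(float)   # sparse bipartite adjacency
D_B = (rng.random(m) < 0.5).astype(float)       # B-side treatment
E = A @ D_B
Y = 1.0 * (E > 0) + rng.normal(size=n)          # outcome with true spillover

def naive_te(Y, E):
    """Simple exposed-vs-unexposed contrast in mean outcomes."""
    return Y[E > 0].mean() - Y[E == 0].mean()

obs = naive_te(Y, E)
null = []
for _ in range(500):
    D_perm = rng.permutation(D_B)               # placebo assignment
    null.append(naive_te(Y, A @ D_perm))        # TE under permuted exposure
p_value = float(np.mean(np.abs(null) >= abs(obs)))
```

A small p_value indicates the observed TE is unlikely under the permutation null, i.e., the actual spillover structure matters.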
C. Placebo Tests and Robustness Checks (General)
These are critical for verifying that the estimated effect stems from the treatment, not data quirks.
- Falsification Check: Estimate TE using placebo units (random subsets of B-units or units outside treatment support). TE_hat should be non-significant. Repeated significant findings across draws indicate issues like unobserved differences or model misspecification.
- Pre-treatment Checks: Compare outcome (Y) values and key covariates before B-side assignment across prospective groups. Look for imbalances using tests (t-tests, chi-square) and SMDs. No meaningful pre-treatment differences support a cleaner estimate.
- Visual Diagnostics: Inspect residual plots (vs. fitted values, covariates) for patterns, Q-Q plots for residuals to check normality, and TE_hat by Ecat level to detect effect heterogeneity. Random residuals and near-diagonal Q-Q plots support model adequacy. Consistent TE_hat across Ecat suggests a robust effect.
Summary of Diagnostics
| Diagnostic | What to Plot or Test | What to Look For | Interpretation |
|---|---|---|---|
| Falsification Check | TE_hat on placebo/D_B subsets | TE_hat non-significant across random draws | Supports that the estimated effect is not driven by spurious differences |
| Pre-treatment Checks | Y_pre and key covariates by assignment group | Balanced means, small standardized differences, no visible gaps | Reduces concerns about pre-existing differences driving results |
| Visual Diagnostics | Residuals vs fitted; Q-Q plot of residuals; TE_hat by E_cat | Residuals look random; Q-Q near diagonal; TE_hat stable across E_cat | Model fit is reasonable; potential heterogeneity is either absent or worth investigating |
V. Algorithmic Details, Complexity, and Scalability
A. Computational Costs by Algorithm
- Exposure Mapping:
- Dense A: O(n · m) operations.
- Sparse Adjacency: O(nnz(A)) operations. Preferred for large, sparse datasets.
- Nearest-Neighbor Matching:
- With indexing/efficient structures: O(n log n).
- Naive: degrades to O(n^2).
- Graph-Based Minimum-Cost-Flow Matching:
- Fast solvers: ~O(E · sqrt(V)), where E is edges, V is nodes.
- Expensive without sparse representations and batching.
B. Memory Considerations
- Store A as a sparse matrix (CSR/CSC).
- Store D_B, Y as vectors.
- Store X as a dense or sparse matrix.
- Use Batching for bootstrap procedures to manage peak memory and improve cache efficiency.
Choosing sparse representations and indexing strategies dramatically reduces time and memory for large networks.
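The dense-versus-sparse cost difference is easy to demonstrate: a CSR matrix-vector product touches only the nonzero entries, so exposure mapping costs O(nnz(A)) instead of O(n · m). A small sketch with synthetic data:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(4)
n, m = 1000, 500
A_dense = (rng.random((n, m)) < 0.01).astype(float)  # ~1% density
A_csr = sparse.csr_matrix(A_dense)                   # stores only nnz entries
D_B = (rng.random(m) < 0.5).astype(float)

E_dense = A_dense @ D_B     # O(n * m) work
E_sparse = A_csr @ D_B      # O(nnz(A)) work, identical result
```

Both products return the same exposure vector; only the time and memory profiles differ.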
VI. Data Structures and Implementation Tips (Python/R)
Optimize speed with the right storage, vectorized math, and batching.
Python (SciPy) / R (Matrix Package)
- Storage Format: Use sparse matrices (e.g., CSR in SciPy, sparse matrix class in R) for A. This speeds up matrix-vector products (A.dot(D_B)) and reduces memory.
- Vectorize Exposure Computation: Replace Python loops with a single A.dot(D_B) call. Matrix multiplication is highly optimized. Ensure D_B has compatible dimensions.
- Batch and Parallelize Bootstrap: Process bootstrap replications in batches to control memory. Distribute replications across cores/nodes using libraries like joblib (Python) or parallel (R).
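The batching idea can be sketched without any parallel library: generate resample indices one fixed-size batch at a time so peak memory stays bounded by the batch, not by the total replicate count. The statistic here is a simple mean for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
Y = rng.normal(loc=2.0, size=10_000)            # synthetic outcomes

def batched_bootstrap_mean(Y, reps=1000, batch=100, seed=0):
    """Bootstrap the mean of Y, materializing only `batch` replicates at once."""
    r = np.random.default_rng(seed)
    out = np.empty(reps)
    for start in range(0, reps, batch):
        stop = min(start + batch, reps)
        idx = r.integers(0, len(Y), size=(stop - start, len(Y)))
        out[start:stop] = Y[idx].mean(axis=1)   # one batch of replicates
    return out

boot = batched_bootstrap_mean(Y)
se = boot.std()                                 # bootstrap standard error
```

Each batch is an independent unit of work, so the loop body is also the natural granularity for distributing replicates across cores.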
VII. Choosing Among Matching Algorithms
| Algorithm | Description / Approach | Pros | Cons | Complexity |
|---|---|---|---|---|
| Exposure-Weighted Nearest Neighbor Matching (E-WNNM) | Match on covariates plus discretized exposure | Intuitive exposure control; good for moderate graphs | Sensitive to exposure bin choices | Moderate |
| Propensity-Score Matching with Exposure Strata (PSM-ES) | Model P(E>0|X) and match within exposure strata | Leverages well-established PS machinery | Strata definitions can bias if too coarse | Moderate-to-high |
| Graph-Based Minimum-Cost-Flow Matching (GM-CFM) | Solve optimized flow to minimize total distance under spillover constraints | Theoretically optimal under model | Computationally intensive for large networks | High |
VIII. Implementation Checklist and Replication Plan
- Define Bipartite Graph: Identify A-nodes (outcome units) and B-nodes (treatment units); ensure D_B is randomized or aligned with the design.
- Preprocess Covariates (X_A): Standardize continuous variables; use one-hot encoding for categorical ones.
- Choose Exposure Mapping Resolution: Select based on network degree; test multiple mappings for robustness.
- Implement and Compare Matching Algorithms: Use at least two algorithms, compare TE_hat results and diagnostics. Document all parameter choices.
- Run Diagnostics: Perform balance checks (SMD), placebo/falsification tests, and sensitivity analyses (Rosenbaum bounds, E-values).
- Publish Replication Artifacts: Include data-generating process, code, random seeds, and software versions for full reproducibility.
- Address Scalability: Prefer sparse representations, batch processing, and parallel bootstrapping for large graphs.
- Interpret TE_hat Carefully: Contextualize results with spillovers, explaining decomposition into direct and spillover components where possible.
