The impact of sampling patients on measuring physician patient-sharing networks using Medicare data

Health Serv Res. 2020 Oct 14;56(2):323-333. doi: 10.1111/1475-6773.13568. Online ahead of print.

Abstract

Objective: To investigate the impact of sampling patients on descriptive characteristics of physician patient-sharing networks.

Data sources: Medicare claims data from 10 hospital referral regions (HRRs) in the United States in 2010.

Study design: We form a sampling frame consisting of the full cohort of patients (Medicare enrollees) with claims in the 2010 calendar year from the selected HRRs. For each sampling fraction, we draw samples of patients and construct a physician ("patient-sharing") network from each sample, in which an edge between two physicians indicates that at least one sampled patient encountered both of them. Each network is summarized using 18 network measures. For each network measure and sampling fraction, we compare the value computed from the sample with the value computed from the full cohort. Finally, we determine the sampling fraction needed to estimate each network measure to specified levels of accuracy.
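The study design can be illustrated with a minimal sketch. It projects a patient-physician mapping onto a physician network, draws a patient sample, and compares one measure (density) between the sample and the full cohort. Several choices here are assumptions, not the authors' procedure: simple random sampling without replacement, a physician node set held fixed to the full cohort's physicians, and relative error as the accuracy metric. The function names and the toy cohort are hypothetical.

```python
import random
from itertools import combinations


def project_physician_network(patient_physicians):
    """One-mode projection: link two physicians if at least one patient saw both."""
    edges = set()
    for physicians in patient_physicians.values():
        # Each unordered physician pair sharing this patient becomes an edge.
        for a, b in combinations(sorted(set(physicians)), 2):
            edges.add((a, b))
    return edges


def density(edges, n_nodes):
    """Proportion of the n*(n-1)/2 possible undirected edges that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0


def sample_patients(patient_physicians, fraction, seed=0):
    """Simple random sample of patients without replacement (assumed design)."""
    rng = random.Random(seed)
    ids = sorted(patient_physicians)
    chosen = rng.sample(ids, round(fraction * len(ids)))
    return {pid: patient_physicians[pid] for pid in chosen}


def relative_error(sample_value, full_value):
    """Hypothetical accuracy metric: |sample - full| / |full|."""
    return abs(sample_value - full_value) / abs(full_value)


# Toy cohort (hypothetical data): patient ID -> physicians encountered.
cohort = {
    "p1": ["A", "B"],
    "p2": ["B", "C"],
    "p3": ["A", "C", "D"],
    "p4": ["D"],
}
# Assumption: the physician node set is fixed to the full cohort's physicians.
n_physicians = len({d for docs in cohort.values() for d in docs})
full_density = density(project_physician_network(cohort), n_physicians)

for fraction in (0.25, 0.5, 1.0):
    sample = sample_patients(cohort, fraction, seed=1)
    est = density(project_physician_network(sample), n_physicians)
    print(fraction, round(est, 3), round(relative_error(est, full_density), 3))
```

Because every edge in a sampled network must also appear in the full-cohort network, a sample's edge set is always a subset of the full one. Under the fixed-node-set assumption, sample density can therefore only underestimate full-cohort density. Counting only physicians observed in the sample would change this, so the node-set convention matters when comparing measures.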

Data collection/extraction methods: We used administrative claims from traditional (fee-for-service) Medicare.

Principal findings: We found that measures of physician degree (the number of ties to other physicians) and physician centrality (importance or prominence in the network) are learned quickly, in the sense that a small sampling fraction suffices to compute them accurately. At the network level, network density (the proportion of possible edges that are present) was also learned quickly, whereas measures based on more complex configurations (subnetworks involving multiple actors) were learned relatively slowly, with relative rates of learning depending on network size (the number of nodes).
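One way to see why configuration-based measures can be more sensitive to sampling than degree is to compare them on a toy network. The sketch below computes physician degree and transitivity (3 × triangles / connected triples), then drops a single edge, as a patient sample might. The abstract does not list its 18 measures, so transitivity is only an illustrative stand-in for a "complex configuration" measure, not necessarily one the authors used. All names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from (u, v) edge tuples."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degrees(edges):
    """Physician degree: number of distinct physicians each one is tied to."""
    return {node: len(nbrs) for node, nbrs in adjacency(edges).items()}


def transitivity(edges):
    """Global clustering: 3 * triangles / connected triples."""
    adj = adjacency(edges)
    closed = 0   # closed triples; each triangle is counted once per vertex
    triples = 0  # all paths of length two, grouped by their middle node
    for node, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        for u, v in combinations(nbrs, 2):
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


# Toy network (hypothetical): the full-cohort edges, and a sample that
# happens to miss the single A-C tie.
full = {("A", "B"), ("B", "C"), ("A", "C"), ("A", "D"), ("C", "D")}
sampled = full - {("A", "C")}

print(sorted(degrees(full).items()))     # [('A', 3), ('B', 2), ('C', 3), ('D', 2)]
print(sorted(degrees(sampled).items()))  # [('A', 2), ('B', 2), ('C', 2), ('D', 2)]
print(transitivity(full))                # 0.75
print(transitivity(sampled))             # 0.0
```

Losing one of five edges shifts two physicians' degrees by one but eliminates both triangles, so transitivity falls from 0.75 to 0. A triangle is observed only if all three of its edges are sampled, which is consistent with the abstract's finding that such measures need larger sampling fractions.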

Conclusions: The effect of the sampling fraction applied to Medicare patients on how closely sample-based network measures resemble those computed from the full cohort is highly heterogeneous across network measures. Even random sampling of patients may yield physician networks that distort descriptive features of the full-cohort network, potentially leading to biased results.

Keywords: bias; bipartite network; learning; one‐mode projection; sampling; summary network measures.