Effect of median income on PSI-90
Introduction
The goal of this post is to see whether median income in the area immediately around a hospital is correlated with the quality of outcomes at that hospital.
A quick disclaimer: do not use any information from this post when making decisions about your medical care. I am not a healthcare professional.
Data Sources
Income data was taken from the 2016 5-year American Community Survey (ACS), specifically the detail table. Median income was used because it is simple to access and is not skewed by a small group of high-income earners.
I was not sure what to use as a measure of quality outcomes. Some quick research brought up PSI-90, formally known as the Patient Safety and Adverse Events Composite. This value combines the incidence of several preventable adverse events into one metric for hospital quality. Data for PSI-90 values was taken from this website. The data covers October 2015 through June 2017, so it overlaps with the period covered by the American Community Survey data.
Data Import
The following libraries are all used later in the post. The APIkey.txt file holds the API key for the census data. The CSV file was downloaded from the website linked above.
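Keeping the key in a separate file means it never has to appear in the published code. As a minimal sketch in Python (not the code used in this post), reading the key could look like the following. The function name `read_api_key` is hypothetical, and the file is assumed to contain nothing but the key.

```python
from pathlib import Path


def read_api_key(path="APIkey.txt"):
    """Return the API key stored in a text file, without surrounding whitespace.

    Assumption: the file holds only the key, possibly followed by a newline.
    """
    return Path(path).read_text(encoding="utf-8").strip()
```

If this approach is used, the key file should also be left out of version control.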
Census Median Income Data
This function will be used to import data from the U.S. Census API. More info on this may be found in this blog post.
The census data is imported here. “ZCTA5” is the census variable for zip code (strictly, the 5-digit ZIP Code Tabulation Area) and “DP03_0062E” is the census variable for median income. The name fields from the Census API are dropped to keep only the needed info. A quick summary and graph of the data show nothing unusual.
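For readers who want to see the shape of such a request, here is a minimal sketch in Python (not the function used in this post). It assumes the 2016 ACS 5-year data profile endpoint and the geography name "zip code tabulation area". It also assumes that the response is a JSON array whose first row is the header, and that negative estimates are missing-value sentinels. The function names are hypothetical, and the sketch does not send the request itself.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the 2016 ACS 5-year data profile tables.
BASE_URL = "https://api.census.gov/data/2016/acs/acs5/profile"


def build_census_url(api_key, variable="DP03_0062E"):
    """Build a query URL asking for one variable for every ZCTA."""
    params = {
        "get": f"NAME,{variable}",
        "for": "zip code tabulation area:*",
        "key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_income_rows(payload, variable="DP03_0062E"):
    """Turn the JSON text of a response into a {zip code: median income} dict.

    The name column is ignored, so only the zip code and income are kept.
    """
    rows = json.loads(payload)
    header, records = rows[0], rows[1:]
    zip_idx = header.index("zip code tabulation area")
    income_idx = header.index(variable)
    incomes = {}
    for record in records:
        raw = record[income_idx]
        if raw is None:
            continue
        value = float(raw)
        if value < 0:
            # Assumption: negative estimates mark missing data.
            continue
        # Zip codes stay as strings so leading zeros are preserved.
        incomes[record[zip_idx]] = value
    return incomes
```

Keeping the zip code as a string matters later, because a numeric zip code loses its leading zero and will no longer match during the join.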
PSI-90 Data
Now I will look at the PSI-90 data loaded above. A summary of the data shows that Measure_Value was loaded as a character variable, so it is converted to numeric. Many of the columns are not needed for now, so the select statement keeps only the ones that are. Because the individual components of the composite are also included, the filter statement keeps only PSI-90.
The summary after this filter shows that roughly 30% of the values are N/A. To forge ahead, these values are filtered out. For any of the findings ahead, it is important to keep in mind that a sizable portion of the data has been excluded. To get a rough idea of how the data looks, PSI-90 values are graphed by zip code. This is probably a good time to mention that a lower PSI-90 value indicates better care.
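The cleaning steps above can be sketched in Python as follows (again, not the code used in this post). Only Measure_Value is a column name taken from the data as described here. The column names Measure_ID and ZIP_Code, the measure identifier "PSI_90_SAFETY", and the function names are assumptions for illustration.

```python
import math


def to_number(text):
    """Convert text to a float, returning None when it is not a usable number."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value


def clean_psi90(rows, measure_id="PSI_90_SAFETY"):
    """Keep only composite rows with a numeric value.

    rows is a list of dicts, one per CSV row. Returns the cleaned rows and
    the number of composite rows dropped because the value was missing.
    """
    cleaned = []
    dropped = 0
    for row in rows:
        if row.get("Measure_ID") != measure_id:
            continue  # skip the individual components of the composite
        value = to_number(row.get("Measure_Value"))
        if value is None:
            dropped += 1
            continue
        cleaned.append({"zip": row["ZIP_Code"], "psi90": value})
    return cleaned, dropped
```

Returning the dropped count alongside the cleaned rows keeps the share of excluded hospitals visible instead of silently discarding it.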
Model
Now that both datasets include zip codes, they are joined on that column using the left_join function. Plotting the data shows no obvious relationship between income and outcomes.
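To make the join behavior concrete, here is a minimal Python sketch of a left join (this post uses left_join in R; the function and field names below are hypothetical). Every hospital row is kept, and hospitals whose zip code has no income record get None.

```python
def left_join_income(hospitals, income_by_zip):
    """Attach median income to each hospital row, keeping every hospital.

    hospitals is a list of dicts with a "zip" key, and income_by_zip maps a
    zip code string to a median income. Unmatched hospitals get None.
    """
    joined = []
    for hospital in hospitals:
        merged = dict(hospital)  # copy so the input rows are not modified
        merged["median_income"] = income_by_zip.get(hospital["zip"])
        joined.append(merged)
    return joined


def drop_unmatched(joined):
    """Remove rows without an income value before fitting a model."""
    return [row for row in joined if row["median_income"] is not None]
```

Because a left join keeps unmatched rows, it is worth counting how many hospitals end up without an income value before modeling.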
We will now look at the relationship between these two values with a simple linear model. The summary of the model shows a low p-value and also a low R². This indicates that the relationship is statistically significant, but median income explains very little of the variability in the data. So what is the effect overall? The linear model shows that for every $1000 increase in median income, the PSI-90 value decreases by about 0.00028. Overall this is a very minor effect, and analyzing other sources of variability will be more useful.
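For reference, the slope, intercept, and R² of a simple linear model can be computed directly from their least-squares definitions. The sketch below is in Python and is not the model code used in this post. It assumes at least two distinct x values and a y that is not constant, and it does not compute the p-value.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared is the share of the variance in y that the line explains.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared
```

To put the fitted slope in scale: at about 0.00028 per $1000, two areas whose median incomes differ by $50,000 would differ in predicted PSI-90 by about 50 × 0.00028 = 0.014.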
A quick look at the residuals of the model confirms what we saw above. The frequency graph of the residuals does not look normally distributed, even after manually limiting the x-axis to ignore the long tail on the positive side. Plotting the residuals against income further illustrates that the data is heavily clustered, which means the model is unlikely to be accurate at the high end of income.
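The checks in this post are graphical. As a numeric complement, and not something done in the original analysis, sample skewness gives a rough measure of the long positive tail. The Python sketch below assumes a slope and intercept from an already fitted line, and the function names are hypothetical.

```python
def residuals(x, y, slope, intercept):
    """Observed minus predicted values for a fitted line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def skewness(values):
    """Moment-based sample skewness.

    Roughly 0 for symmetric data and positive when the long tail is on the
    positive side. Assumes the values are not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

A clearly positive skewness would be consistent with the long tail on the positive side described above.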
Findings
In summary, this data does not show a strong relationship between income around a hospital and the quality of care (PSI-90) at that hospital. Most of the variability in the PSI-90 values is likely explained by other factors not covered in this quick overview.
Shortcomings
This analysis is extremely cursory and has several ways in which it may be misleading. Some of the major points are, in no particular order:
- Medicare PSI-90 values may not be a good indicator for overall hospital quality of care.
- The populations served by these hospitals are not well characterized by the median income of the zip code each hospital is located in. A number of factors determine which hospitals people use, and the overall area served may be larger than the individual zip code.
- Some effects are obscured by the composite calculations for the PSI-90 value.
- The most obvious point from this quick overview is that most of the variability in the data is not explained by median income.
Areas for further analysis
This is an area that is very interesting to me, so I will likely dive deeper into this dataset. Here are some of the areas I am hoping to evaluate further:
- Increase dimensionality by looking at a larger range of data from the American Community Survey. This may help explain the rest of the variability in the data.
- Expand characterization of the populations served by these hospitals. This could include analyzing the region the hospital is located in, not just the zip code it is located in.
- Utilize clustering to group zip codes together by a number of factors.
If you have any thoughts or questions feel free to email me at ([email protected]) or message me on LinkedIn.