FDA Adverse Event API - a First Pass
To make weekly blogging more attainable, I am going to start writing these posts less about finished, working results and more about the process of getting there. Hopefully this makes it a little easier to keep momentum going.
Anyway, today I was looking at accessing some of the FDA medical device report data using the FDA’s API. The FDA requests a citation, so here it is:
Data provided by the U.S. Food and Drug Administration (https://open.fda.gov)
Before writing this, I played around with writing a manual query and stumbled across the rOpenHealth openfda R package. I found it pretty useful, particularly when iterating on a query to get what I wanted. Because each query is built up with pipes, it was simple to change one step at a time.
First we will get the library declarations out of the way.
Now to the actual use of the openfda package. Since I have some familiarity with hip orthopedics, I started with something simple that I would likely be able to understand. I filtered reports for devices containing the word “hip” in the “device.generic_name” field, and then counted how often each value of that field occurs.
Outputting the first 10 count values makes it clear that I am not counting what I want. “Acetabular” and “cup” show up as separate entries, even though they typically refer to the same component in a total hip arthroplasty. Looking back at the documentation, I had missed that without the “.exact” suffix, the count is over individual tokenized words, not whole field values.
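To make the difference concrete, here is a minimal sketch of the two count queries written as raw openFDA URLs. It is in Python (standard library only) rather than the openfda R package, and the helper name `build_count_url` is made up for illustration. The only things taken from the API itself are the `search` and `count` query parameters and the “.exact” suffix.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.fda.gov/device/event.json"


def build_count_url(search_field, term, count_field):
    """Build a URL that filters on one field and counts values of another.

    Hypothetical helper: it only assembles the query string and does not
    send a request.
    """
    params = {"search": f"{search_field}:{term}", "count": count_field}
    # safe=":" keeps the field:term separator readable in the URL
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


# Counts individual tokenized words ("acetabular", "cup", ...)
tokenized_url = build_count_url(
    "device.generic_name", "hip", "device.generic_name"
)

# Counts whole field values, thanks to the ".exact" suffix
exact_url = build_count_url(
    "device.generic_name", "hip", "device.generic_name.exact"
)

print(tokenized_url)
print(exact_url)
```

The two URLs differ only in the `count` field, which is why the mistake is easy to miss.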
That looks a lot better, but not perfect. There is still the issue that multiple terms refer to effectively the entire hip joint construct. I will have to do some reading on how to handle data that is coded similarly, but not identically, as I expect most fields will be filled in differently by different manufacturers. As a next step, I looked at manufacturers to see how those counts may impact things. I noticed when entering some of these requests into my web browser that there was a field for “medical_specialty_description.” I used this as another filter to ensure I was only getting orthopedic products.
This shows similar issues to what the “generic_name” field showed, where there are repeats. If I had to guess, this probably has to do with how these companies submit their information to the FDA, as well as acquisitions changing company names (e.g. Zimmer Biomet vs. just Zimmer).
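As a rough sketch of that combined query in raw-URL form (again in Python, not the R package): filters are joined with `AND` inside the `search` parameter. The full field paths `device.openfda.medical_specialty_description` and `device.manufacturer_d_name.exact`, the search term `orthopedic`, and the helper `build_url` are my assumptions here, so check them against the openFDA field reference before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.fda.gov/device/event.json"


def build_url(filters, count_field):
    """Join (field, term) filters with AND and count the given field.

    Hypothetical helper: it only assembles the query string.
    """
    search = " AND ".join(f"{field}:{term}" for field, term in filters)
    params = {"search": search, "count": count_field}
    # Spaces become "+", so the joined filters read as a+AND+b
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


url = build_url(
    [
        ("device.generic_name", "hip"),
        # Assumed full path of the specialty field
        ("device.openfda.medical_specialty_description", "orthopedic"),
    ],
    # Assumed manufacturer name field, counted as whole values
    "device.manufacturer_d_name.exact",
)
print(url)
```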
After this first experiment with the FDA API, I have a few thoughts on where to go with this.
- How can I expand the R package to fetch all data from a request?
Currently, the information needed to request more results than a single API request allows is housed in the “meta” section of the returned JSON. I should be able to use the “skip”, “limit”, and “total” values present in the meta section to iterate through a request until all results are gathered. As the current openfda package does not access the “meta” section, I will have to fork it and work from there.
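The loop I have in mind could look roughly like the sketch below. It is Python rather than R, and `fetch_page` is a hypothetical callable standing in for the real HTTP request. The sketch assumes the response has “skip”, “limit”, and “total” under `meta.results`. A fake fetcher is used so the logic can be run without touching the API.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every result by advancing `skip` until `total` is reached.

    `fetch_page(skip, limit)` is a hypothetical callable returning one
    parsed response as a dict with "meta" and "results" keys.
    """
    results = []
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        results.extend(page["results"])
        total = page["meta"]["results"]["total"]
        skip += page_size
        # Stop once everything is requested, or if a page comes back empty
        if skip >= total or not page["results"]:
            break
    return results


def fake_fetch(skip, limit):
    """Stand-in for a real request: serves slices of 250 dummy records."""
    records = list(range(250))
    return {
        "meta": {"results": {"skip": skip, "limit": limit, "total": len(records)}},
        "results": records[skip:skip + limit],
    }


all_records = fetch_all(fake_fetch, page_size=100)
print(len(all_records))
```

The API also caps how large “limit” and “skip” can be, so a real version would need to respect whatever the current documentation allows.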
- How should I handle the text processing needed to combine fields appropriately?
As this post has shown, the FDA adverse event data is filled with different terms that mean the same thing. I am going to do some reading on how these issues have been handled by others. Hopefully they are not all manually combining the fields.
As always, if you have any feedback or questions please email me or connect on LinkedIn.
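As a naive starting point, and not how anyone else necessarily handles it, one option is to normalize each name before summing counts. The sketch below is in Python. The suffix list and the sample counts are made up purely to show the mechanics.

```python
import re
from collections import Counter

# Hypothetical suffix list; a real one would have to come from the data
SUFFIXES = {"INC", "CORP", "LLC", "LTD", "CO"}


def normalize(name):
    """Uppercase, strip punctuation, and drop trailing corporate suffixes."""
    words = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def merge_counts(counts):
    """Sum counts for names that normalize to the same key."""
    merged = Counter()
    for name, n in counts:
        merged[normalize(name)] += n
    return merged


# Made-up counts, not real FDA data
sample = [("ZIMMER, INC.", 3), ("Zimmer Inc", 2), ("ZIMMER BIOMET", 4)]
merged = merge_counts(sample)
print(merged)
```

This only catches differences in case, punctuation, and suffixes. Something like Zimmer vs. Zimmer Biomet would still need a lookup table or fuzzy matching on top.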