FDA Adverse Event API - a First Pass

To make blogging weekly more attainable, I am going to make these posts less about having something finished and more about the process of getting it working. Hopefully this makes it a little easier to keep the momentum going.

Anyway, today I looked at accessing some of the FDA medical device report data through the FDA's API. The FDA requests a citation, so here it is:

Data provided by the U.S. Food and Drug Administration (https://open.fda.gov)

Before writing this, I played around with writing queries by hand and stumbled across the rOpenHealth openfda R package. I found it pretty useful, particularly while iterating on a query to get what I wanted: the pipe-based interface made it simple to add or change one step at a time.
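Under the hood, each of these queries is just a URL with a “search” parameter made of field:value terms and an optional “count” parameter. As a rough sketch of a hand-written query, here is a hypothetical base-R helper, build_fda_url(), which is not part of the openfda package and simply pastes those pieces together. It assumes several filters are joined with +AND+, as in the URLs openfda prints later in this post, and it does no URL encoding, so values containing spaces or special characters would need utils::URLencode() first.

```r
# Hypothetical helper, not part of the openfda package: assemble an openFDA
# request URL from field:value filters and an optional count field.
# No URL encoding is done, so pass values that are already URL-safe.
build_fda_url <- function(endpoint, filters = character(), count = NULL,
                          base = "https://api.fda.gov") {
  params <- character()
  if (length(filters) > 0) {
    # Each filter becomes field:value; several filters are joined with +AND+
    terms <- paste0(names(filters), ":", filters)
    params <- c(params, paste0("search=", paste(terms, collapse = "+AND+")))
  }
  if (!is.null(count)) {
    params <- c(params, paste0("count=", count))
  }
  url <- paste0(base, endpoint)
  if (length(params) > 0) {
    url <- paste0(url, "?", paste(params, collapse = "&"))
  }
  url
}

# Same filter and count field as the first openfda query in this post
hip_url <- build_fda_url(
  "/device/event.json",
  filters = c(device.generic_name = "hip"),
  count = "device.generic_name"
)
print(hip_url)
```

This should reproduce the URL that openfda reports fetching for the first query. Adding a second named filter, such as the medical specialty filter used for the manufacturer query, just appends another field:value term joined with +AND+.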

First we will get the library declarations out of the way.

library("openfda")
library("magrittr")  # provides the %>% pipe used below, in case openfda does not re-export it

Now to the actual use of the openfda package. Since I have some familiarity with hip orthopedics, I started with something simple that I would likely be able to understand: I filtered reports to devices containing the word “hip” in the “device.generic_name” field, and then counted the occurrences of each value in that field.

hipcount <- fda_query("/device/event.json") %>%
  fda_filter("device.generic_name","hip")  %>%
  fda_count("device.generic_name") %>%
  fda_exec()
## Fetching: https://api.fda.gov/device/event.json?search=device.generic_name:hip&count=device.generic_name
head(hipcount, 10)
##           term  count
## 1          hip 207498
## 2   prosthesis  54678
## 3      femoral  49040
## 4   acetabular  35582
## 5         head  27868
## 6        total  26830
## 7  replacement  22398
## 8      implant  21698
## 9         stem  20204
## 10         cup  16587

Printing the first 10 counts makes it clear that I am not counting what I want. “Acetabular” and “cup” typically refer to the same component in a total hip arthroplasty, yet they show up as separate terms. Looking back at the documentation, I had missed that without the “.exact” suffix the count is performed on the individual tokenized words, not on whole field values.
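The difference is easy to see on a small scale. This sketch uses three made-up generic names (not openFDA data) and base R's table(): counting whole values gives two distinct terms, while counting words lets the same device show up under “hip”, “acetabular”, and “cup” at once. Splitting on non-letters is only an assumed approximation of how openFDA tokenizes a field.

```r
# Toy generic-name values, made up for illustration (not openFDA data)
generic_names <- c("HIP ACETABULAR CUP", "HIP FEMORAL HEAD", "HIP ACETABULAR CUP")

# Whole-value counts, analogous to counting on device.generic_name.exact
exact_counts <- table(generic_names)

# Word-level counts, roughly what counting on the tokenized field returns:
# lowercase each value and split it on anything that is not a letter
tokens <- unlist(strsplit(tolower(generic_names), "[^a-z]+"))
token_counts <- table(tokens)

print(exact_counts)
print(token_counts)
```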

hipcountexact <- fda_query("/device/event.json") %>%
  fda_filter("device.generic_name","hip")  %>%
  fda_count("device.generic_name.exact") %>%
  fda_exec()
## Fetching: https://api.fda.gov/device/event.json?search=device.generic_name:hip&count=device.generic_name.exact
head(hipcountexact, 10)
##                           term count
## 1              PROSTHESIS, HIP 31347
## 2             HIP FEMORAL HEAD 21298
## 3        TOTAL HIP REPLACEMENT 15043
## 4      HIP FEMORAL STEM/SLEEVE 14033
## 5           HIP ACETABULAR CUP 12921
## 6  HIP ACETABULAR INSERT/LINER 12848
## 7                HIP COMPONENT  8649
## 8         HIP INSTRUMENT/TRIAL  8208
## 9               HIP PROSTHESIS  8190
## 10   ASR TOTAL HIP REPLACEMENT  5740

That looks a lot better, but it is not perfect. Several terms (for example “PROSTHESIS, HIP”, “TOTAL HIP REPLACEMENT”, and “HIP PROSTHESIS”) effectively refer to the entire hip joint construct. I will have to do some reading on how to handle data that is coded similarly but not identically, as I expect most fields are filled in differently by different manufacturers.

Next, I looked at manufacturers to see how the reports break down by company. While entering some of these requests in my web browser, I noticed a field called “medical_specialty_description,” so I added it as a second filter to make sure I was only getting orthopedic products.

hipmfg <- fda_query("/device/event.json") %>%
  fda_filter("device.generic_name","hip")  %>%
  fda_filter("device.openfda.medical_specialty_description","Orthopedic")  %>%
  fda_count("device.manufacturer_d_name.exact") %>%
  fda_exec()
## Fetching: https://api.fda.gov/device/event.json?search=device.generic_name:hip+AND+device.openfda.medical_specialty_description:Orthopedic&count=device.manufacturer_d_name.exact
head(hipmfg, 10)
##                                term count
## 1          DEPUY ORTHOPAEDICS, INC. 24917
## 2               DEPUY INTERNATIONAL 23899
## 3                BIOMET ORTHOPEDICS 21977
## 4  DEPUY INTERNATIONAL LTD. 8010379 14034
## 5         DEPUY ORTHOPAEDICS INC US 11381
## 6  DEPUY ORTHOPAEDICS, INC. 1818910 10780
## 7               ZIMMER BIOMET, INC.  8863
## 8       STRYKER ORTHOPAEDICS-MAHWAH  8538
## 9        MICROPORT ORTHOPEDICS INC.  6971
## 10                     ZIMMER, INC.  5636

This shows the same issue as the “generic_name” field: the same company appears under several names. My guess is that this comes from how these companies submit their information to the FDA, as well as from acquisitions changing company names (e.g., Zimmer Biomet vs. just Zimmer).
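One low-tech way to collapse some of these variants is to normalize the names before summing. The sketch below is a hypothetical helper, normalize_mfg(), applied to the ten terms and counts from the output above. It strips punctuation, standalone numbers such as “8010379”, and a short, assumed list of legal suffixes (INC, LTD, LLC, CORP, US). Treat the summed totals as rough, since a single report can list more than one device.

```r
# Manufacturer terms and counts copied from the hipmfg output above
mfg <- data.frame(
  term = c("DEPUY ORTHOPAEDICS, INC.", "DEPUY INTERNATIONAL",
           "BIOMET ORTHOPEDICS", "DEPUY INTERNATIONAL LTD. 8010379",
           "DEPUY ORTHOPAEDICS INC US", "DEPUY ORTHOPAEDICS, INC. 1818910",
           "ZIMMER BIOMET, INC.", "STRYKER ORTHOPAEDICS-MAHWAH",
           "MICROPORT ORTHOPEDICS INC.", "ZIMMER, INC."),
  count = c(24917, 23899, 21977, 14034, 11381, 10780, 8863, 8538, 6971, 5636),
  stringsAsFactors = FALSE
)

# Hypothetical normalizer: replace punctuation with spaces, drop standalone
# numbers and an assumed list of legal suffixes, then collapse whitespace
normalize_mfg <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", " ", x, perl = TRUE)
  x <- gsub("\\b[0-9]+\\b", " ", x, perl = TRUE)
  x <- gsub("\\b(INC|LTD|LLC|CORP|US)\\b", " ", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

mfg$name <- normalize_mfg(mfg$term)

# Sum the counts of all terms that normalize to the same name
totals <- tapply(mfg$count, mfg$name, sum)
print(sort(totals, decreasing = TRUE))
```

This merges the three DePuy Orthopaedics variants and the two DePuy International variants, but it has no way of knowing that Zimmer and Zimmer Biomet are related. Changes from acquisitions like that would still need a hand-maintained mapping.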

After this first experiment with the FDA API, I have a few thoughts on where to go with this.

  1. How can I extend the R package to fetch all results for a request?

Currently, the information needed to retrieve more results than a single API request returns is housed in the “meta” section of the JSON response. I should be able to use the “skip”, “limit”, and “total” values in that section to iterate through a request until all results are gathered. Since the current openfda package does not give access to the “meta” section, I will have to fork it and work from there.
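A minimal sketch of that iteration, assuming the total comes from meta.results.total of a search (not count) request: the hypothetical helpers page_offsets() and page_urls() compute the skip values and build one URL per page, and fetch_total() shows where jsonlite::fromJSON() would read the total. None of these are part of the openfda package. At the time of writing, openFDA's documentation caps limit at 1000 and skip at 25000 for search requests; treat those caps as assumptions to verify, since they mean skip alone may not reach every record of a large result set.

```r
# Hypothetical pagination helpers, not part of the openfda package.
# Assumed caps (from openFDA's docs at the time of writing; verify them):
# limit <= 1000 and skip <= 25000 for search requests.
page_offsets <- function(total, limit = 100L, max_skip = 25000L) {
  if (total <= 0) return(integer(0))
  # One skip value per page: 0, limit, 2 * limit, ... up to total - 1
  offsets <- as.integer(seq.int(0L, total - 1L, by = limit))
  if (any(offsets > max_skip)) {
    warning("some pages lie beyond max_skip and are dropped")
  }
  offsets[offsets <= max_skip]
}

# Append limit and skip parameters to a search URL, one URL per page
page_urls <- function(search_url, total, limit = 100L) {
  offsets <- page_offsets(total, limit)
  if (length(offsets) == 0) return(character(0))
  paste0(search_url, "&limit=", as.integer(limit), "&skip=", offsets)
}

# Read meta.results.total from a search (non-count) response. This needs
# network access and the jsonlite package, so it is defined but not called.
fetch_total <- function(url) {
  jsonlite::fromJSON(url)$meta$results$total
}

# Illustrative total of 250 records, not a real openFDA count
urls <- page_urls(
  "https://api.fda.gov/device/event.json?search=device.generic_name:hip",
  total = 250, limit = 100
)
print(urls)
```

In practice, each URL would then be fetched in turn and the pages of results combined into one data frame.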

  2. How should I handle the text processing needed to combine similar values?

As this post has shown, the FDA adverse event data is full of different terms that mean the same thing. I am going to do some reading on how others have handled these issues. Hopefully not everyone is combining the values by hand.
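One simple, fully automatic step is to catch word-order variants such as “PROSTHESIS, HIP” and “HIP PROSTHESIS” with a token-sort key: split each value into words, sort them, and rejoin. The sketch below applies a hypothetical token_sort_key() helper to the ten terms and counts from the hipcountexact output earlier in this post.

```r
# Generic-name terms and counts copied from the hipcountexact output above
generic <- data.frame(
  term = c("PROSTHESIS, HIP", "HIP FEMORAL HEAD", "TOTAL HIP REPLACEMENT",
           "HIP FEMORAL STEM/SLEEVE", "HIP ACETABULAR CUP",
           "HIP ACETABULAR INSERT/LINER", "HIP COMPONENT",
           "HIP INSTRUMENT/TRIAL", "HIP PROSTHESIS",
           "ASR TOTAL HIP REPLACEMENT"),
  count = c(31347, 21298, 15043, 14033, 12921, 12848, 8649, 8208, 8190, 5740),
  stringsAsFactors = FALSE
)

# Token-sort key: uppercase, split on non-alphanumerics, drop empty pieces,
# sort the words, and rejoin, so word-order variants share one key
token_sort_key <- function(x) {
  words <- strsplit(toupper(x), "[^A-Z0-9]+")
  vapply(words, function(w) paste(sort(w[w != ""]), collapse = " "),
         character(1))
}

generic$key <- token_sort_key(generic$term)

# Sum the counts of all terms that share a key
combined <- tapply(generic$count, generic$key, sum)
print(sort(combined, decreasing = TRUE))
```

Only the word-order variants merge here. Different terms for the same construct, such as “TOTAL HIP REPLACEMENT” versus “HIP PROSTHESIS”, or values carrying a brand name like “ASR”, would still need fuzzy matching or a domain-specific mapping.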

As always, if you have any feedback or questions, please email me or connect on LinkedIn.
