Current biostatistics graduate student at the University of Minnesota. Previously a med device engineer. All opinions are my own. Available for engineering or statistical consulting.
I’ve been wanting to work on some data mining techniques for a while. As the semester comes to a close, I figured procrastinating on this take-home exam was a good time to get started. To begin, I thought I would use the FDA Adverse Event Reporting System (FAERS), which “contains adverse event reports, medication error reports and product quality complaints” per the FDA website.
I have previously used the FDA API and found it a bit cumbersome, so I was looking for a different way to manage the data. I’ve spent some time learning SQL this semester, so this seemed like a good opportunity to put it to use. I downloaded MySQL 8.0 and set up a local database. After some configuration issues, I had it working and just needed the data. The files are available from the FDA website in the following format: “https://fis.fda.gov/content/Exports/faers_ascii_yyyy_q#.zip”. For example, the data for the 4th quarter of 2012 can be downloaded from: https://fis.fda.gov/content/Exports/faers_ascii_2012q4.zip.
Since the URLs follow such a consistent format, it is fairly straightforward to download all of the files. I wrote a couple of Python scripts to download the files, extract all of the data files into one folder, and then generate a MySQL shell script to load the data into the database.
Code
The code is a quick-and-dirty solution for now. If I end up downloading some of the other FDA adverse event data, I may revisit it and create a more complete solution. The GitHub link for the files is below:
The current version of the code is also included below, along with some brief discussion.
Download
This section was fairly easy to put together: it simply requires Python to download each file based on the URL pattern and retain only the .txt files.
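A minimal sketch of that download step is below. The URL pattern comes from the FDA site; the destination folder, function name, and the commented year range are illustrative. Note that this pattern only covers the FAERS-era files (the earliest quarters in the system were published under the older AERS naming scheme).

```python
import io
import zipfile
import urllib.request
from pathlib import Path

# Quarterly zip URL pattern taken from the FDA site
URL = "https://fis.fda.gov/content/Exports/faers_ascii_{year}q{quarter}.zip"

def download_quarter(year, quarter, dest="faers_data"):
    """Download one quarterly zip and keep only the ASCII .txt files."""
    Path(dest).mkdir(exist_ok=True)
    with urllib.request.urlopen(URL.format(year=year, quarter=quarter)) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    for name in archive.namelist():
        if name.lower().endswith(".txt"):
            # Flatten the archive's folder structure into a single directory
            target = Path(dest) / Path(name).name
            target.write_bytes(archive.read(name))

# Example driver loop (illustrative year range):
# for year in range(2013, 2020):
#     for q in range(1, 5):
#         download_quarter(year, q)
```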
SQL script generation for import
When importing the data, I used a MySQL script in the form of the code chunk below:
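One such statement, shown here for a single demographics file, looks something like this (the table name and file name are illustrative; the FAERS ASCII files are ‘$’-delimited with a header row):

```sql
-- Illustrative import statement for one quarterly data file;
-- one statement like this is generated per file per quarter.
LOAD DATA LOCAL INFILE 'DEMO12Q4.txt'
INTO TABLE demo
FIELDS TERMINATED BY '$'
IGNORE 1 LINES;
```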
As there are a lot of import statements (8 for every quarter of every year from 2004 to 2019), it was much easier to generate them into a single file with Python and then run the script from the MySQL shell.
The script below generates the SQL file, including the statements to create all of the necessary tables.
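A simplified sketch of that generator follows. The table list, file-name pattern, and ‘$’ delimiter are assumptions based on the FAERS ASCII layout, and the single placeholder varchar column stands in for the full per-field column definitions in the real script:

```python
# Writes CREATE TABLE and LOAD DATA statements for every quarter
# into one .sql file, to be run from the MySQL shell.
TABLES = ["demo", "drug", "indi", "outc", "reac", "rpsr", "ther"]

def generate_sql(start_year=2013, end_year=2019, out="faers_import.sql"):
    with open(out, "w") as f:
        # One catch-all varchar table per file type for the first pass;
        # the real script defines one varchar column per field.
        for table in TABLES:
            f.write(f"CREATE TABLE IF NOT EXISTS {table} "
                    "(raw_line VARCHAR(10000));\n")
        for year in range(start_year, end_year + 1):
            for q in range(1, 5):
                for table in TABLES:
                    # e.g. DEMO13Q1.txt -- assumed quarterly file naming
                    fname = f"{table.upper()}{str(year)[2:]}Q{q}.txt"
                    f.write(f"LOAD DATA LOCAL INFILE '{fname}' "
                            f"INTO TABLE {table} "
                            "FIELDS TERMINATED BY '$' "
                            "IGNORE 1 LINES;\n")
```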
After creating this file, running it imported all of the data into the SQL database. Currently, all of the variables are stored as varchar columns. I plan to fine-tune the import once I examine the data types more carefully, but that is a problem for when the semester is officially over.