NASA & AidData | Data Mining
Data Mining & Automation
Data Science / Python / Process Efficiency
I majored in Applied Mathematics as an undergraduate, focusing on using process automation to cut tedium and improve workflow efficiency.
AidData: Automating Geocoding
College of William & Mary, Summer 2014
My first internship at a research lab involved manually extracting locations from hundreds of pages of foreign aid project documents. AidData, the lab, aimed to increase foreign aid transparency and efficacy by compiling a geographic database of project locations, funding, and outcomes. This database was built by interns: there were 20 others like me, skimming PDFs all day. Morale was low. These students thought they’d be contributing to meaningful research; instead, they were buried under menial work. I applied for grants to automate the process. With funding from the William & Mary EXTREEMS-QED Program and the Charles Center, I spent the summer building a Python program that combined text mining, named entity recognition, approximate string matching, and geographic outlier modeling. It predicted locations and produced abbreviated reading materials that cut the word count by over 80% while consistently retaining every correct location. Interns used the program to work more efficiently while I continued improving its location prediction accuracy as a research assistant. I presented this research at the William & Mary EXTREEMS-QED Summer 2014 Research Showcase, the Charles Center Summer 2014 Research Showcase, and the INFORMS Computing Society Conference 2015. A weekly account of my progress is published on the Charles Center Summer Research Blog.
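To give a flavor of the approximate string matching step, here is a minimal sketch, not the original program. It assumes candidate place names have already been pulled from a document and compares them against a small gazetteer using Python's standard `difflib`. The gazetteer entries, the `match_locations` function, and the 0.8 similarity cutoff are all hypothetical choices made for illustration.

```python
from difflib import get_close_matches

# Hypothetical gazetteer; a real one would hold far more place names.
GAZETTEER = ["Kampala", "Nairobi", "Lilongwe", "Kathmandu", "Dar es Salaam"]


def match_locations(candidates, gazetteer=GAZETTEER, cutoff=0.8):
    """Map each candidate string to its closest gazetteer entry, if any.

    Candidates with no entry at or above the similarity cutoff are dropped.
    The cutoff value is an assumption, not a tuned parameter.
    """
    # Compare in lowercase, but return the gazetteer's canonical spelling.
    lookup = {name.lower(): name for name in gazetteer}
    matches = {}
    for candidate in candidates:
        close = get_close_matches(candidate.lower(), list(lookup), n=1, cutoff=cutoff)
        if close:
            matches[candidate] = lookup[close[0]]
    return matches


if __name__ == "__main__":
    # Misspelled place names are matched; an unrelated word is ignored.
    print(match_locations(["Kampla", "Katmandu", "budget"]))
```

Matching against a fixed list like this tolerates the misspellings and transliteration variants that are common in scanned project documents, at the cost of missing any place that is absent from the gazetteer.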
NASA: Automating Requirements Gathering
College of William & Mary, Spring 2016
I solved a similar problem for NASA as my undergraduate senior capstone project, replacing a manual process of extracting and flagging requirements from elements and system requirements documents with a text-mining Python program. I presented this research at NASA Langley Research Center and trained a group of scientists to use the program in their workflow.
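As a rough illustration of what extracting and flagging requirements can look like, here is a minimal sketch, not the program delivered to NASA. It assumes that a requirement is any sentence containing "shall" or "must", and that a requirement should be flagged when it contains a placeholder such as "TBD" or "TBR". Both patterns and the `extract_requirements` function are hypothetical.

```python
import re

# Assumed convention: requirement sentences contain "shall" or "must".
REQUIREMENT_PATTERN = re.compile(r"\b(shall|must)\b", re.IGNORECASE)
# Assumed flag criteria: unresolved placeholders inside a requirement.
FLAG_PATTERN = re.compile(r"\b(TBD|TBR)\b")


def extract_requirements(text):
    """Return requirement sentences, each marked as flagged or not."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    results = []
    for sentence in sentences:
        if REQUIREMENT_PATTERN.search(sentence):
            results.append(
                {"text": sentence, "flagged": bool(FLAG_PATTERN.search(sentence))}
            )
    return results


if __name__ == "__main__":
    sample = (
        "The system shall log all faults. "
        "This section gives background. "
        "The valve shall close within TBD seconds."
    )
    for requirement in extract_requirements(sample):
        print(requirement)
```

A keyword rule like this is easy for reviewers to audit, though real documents would likely need a more careful sentence splitter and a longer list of flag terms.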