I completed this project as part of the AdventHealth case study at the University of Florida. The case study had two goals: to improve a survey that collects feedback from employees who use a mail-order pharmacy system, and to find a reproducible way to classify the survey's free-response answers. To classify the free responses, I wrote a Python script that performs a keyword analysis. You can find this code here
To classify the survey results, I ran a keyword analysis that sorted each response into a predefined category or into a custom category that the user was prompted to define. Every word in a review was stemmed to its root using multiple stemming methods, so that an error from any single stemmer would be less likely to cause a misclassification. The resulting word lists were then searched for keywords, and reviews were sorted into categories based on the keywords found.
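The idea of stemming with more than one method and then matching keywords can be sketched roughly as follows. This is a minimal illustration, not the project's script: the category names, keyword lists, and both naive suffix-stripping stemmers are assumptions made up for the example, and the interactive prompt for a custom category is replaced by a fallback "other" label.

```python
import re

# Hypothetical categories and keyword stems, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "timeliness": {"late", "delay", "slow", "quick"},
    "service": {"help", "friend", "staff", "rude"},
    "cost": {"cost", "cheap", "expens"},
}


def strip_suffix(word, suffixes):
    """Remove the first matching suffix, keeping at least three letters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_inflection(word):
    """Naive stemmer 1: strips common inflectional endings."""
    return strip_suffix(word, ("ing", "ed", "ly", "es", "s"))


def stem_derivation(word):
    """Naive stemmer 2: strips common derivational endings."""
    return strip_suffix(word, ("ful", "ness", "ment", "able", "ive"))


def classify(review):
    """Return the sorted categories whose keywords appear in the review."""
    words = re.findall(r"[a-z']+", review.lower())
    stems = set()
    for word in words:
        # Keep the raw word and both stems so one stemmer's miss
        # can be covered by the other.
        stems.update({word, stem_inflection(word), stem_derivation(word)})
    matches = sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if stems & keywords
    )
    return matches or ["other"]
```

For example, `classify("The staff were friendly and helpful.")` matches the "service" category: the first stemmer reduces "friendly" to "friend", while only the second reduces "helpful" to "help". In practice, established stemmers from a library such as NLTK would likely replace these toy functions.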
In addition to classifying the survey results, I wanted to score each review to capture its sentiment. To do this, I used the Python Afinn library. This library assigns each word in a passage an affinity score that measures how positive or negative it is: scores below 0 are negative, a score of 0 is neutral, and scores above 0 are positive. A review's affinity score is the sum of the scores of its words.
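The word-by-word summing described above can be illustrated with a small dependency-free sketch. The lexicon here is a toy with made-up values, not the real AFINN word list, and `review_score` and `sentiment_label` are hypothetical helper names; with the library installed, a call along the lines of `Afinn().score(text)` would take the place of `review_score`.

```python
import re

# Toy lexicon with invented values for illustration only;
# these are NOT the scores from the real AFINN word list.
TOY_LEXICON = {
    "great": 3,
    "helpful": 2,
    "fast": 1,
    "slow": -1,
    "late": -2,
    "terrible": -3,
}


def review_score(review, lexicon=TOY_LEXICON):
    """Sum the per-word scores; words missing from the lexicon count as 0."""
    words = re.findall(r"[a-z']+", review.lower())
    return sum(lexicon.get(word, 0) for word in words)


def sentiment_label(score):
    """Map a score to the negative / neutral / positive reading used above."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this toy lexicon, "Great service, but delivery was late." scores 3 + (-2) = 1 and is labeled positive, which shows how a single review can mix positive and negative words and still land on one side of 0.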
After performing the classification, we could look at the average affinity score of all reviews in a category and of all reviews collected. The bar graph below shows the average affinity score of the reviews in each category.
Because the average affinity score of every category is above 0, reviews were positive on average in each category. Service reviews were the most positive, with an average affinity score of 1.46. Timeliness-related reviews, on the other hand, had the lowest average affinity score, at 0.38.
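Averages like the ones above could be computed with a few lines of standard-library Python. This is a sketch under assumptions: the `average_scores` helper and the `(category, score)` input format are hypothetical, and the sample values in the usage note are invented rather than taken from the survey.

```python
from collections import defaultdict
from statistics import mean


def average_scores(scored_reviews):
    """Return (per-category averages, overall average).

    scored_reviews is an iterable of (category, score) pairs and must
    contain at least one pair.
    """
    scored_reviews = list(scored_reviews)

    # Group the scores by category.
    by_category = defaultdict(list)
    for category, score in scored_reviews:
        by_category[category].append(score)

    per_category = {
        category: mean(scores) for category, scores in by_category.items()
    }
    overall = mean(score for _, score in scored_reviews)
    return per_category, overall
```

Called on the invented pairs `[("service", 3), ("service", 1), ("timeliness", -2), ("timeliness", 2)]`, this gives an average of 2 for service, 0 for timeliness, and 1 overall. The per-category dictionary is the kind of data a bar graph of average scores would be drawn from.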
Only 116 free-response answers were provided for this project, so applying a machine learning algorithm was not feasible. With a larger sample size, a clustering algorithm could provide more comprehensive and accurate results.