Finding the Needle in the Haystack: Can Natural Language Processing of Students' Evaluations of Teachers Identify Teaching Concerns?

J Gen Intern Med. 2024 Aug 21. doi: 10.1007/s11606-024-08990-6. Online ahead of print.

Abstract

Background: Institutions rely on student evaluations of teaching (SET) to ascertain teaching quality. Manual review of narrative comments can identify faculty with teaching concerns but can be resource- and time-intensive.

Aim: To determine whether natural language processing (NLP) of SET comments completed by learners on clinical rotations can identify teaching quality concerns.

Setting and participants: Single-institution retrospective cohort analysis of SET (n = 11,850) from clinical rotations between July 1, 2017, and June 30, 2018.

Program description: The performance of three NLP dictionaries created by the research team was compared with that of an off-the-shelf Sentiment Dictionary.
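The abstract does not include the dictionaries themselves. As a minimal sketch of the general dictionary-based approach, assuming hypothetical word lists, a hypothetical flag_comment helper, and an arbitrary hit threshold (none of which come from the study), flagging might look like this in Python:

    # Minimal sketch of dictionary-based flagging of SET comments.
    # CONCERN_TERMS, QUALIFIER_TERMS, and the threshold are hypothetical
    # illustrations, not the dictionaries evaluated in the study.

    CONCERN_TERMS = {"dismissive", "belittling", "unprepared", "disorganized", "rude"}
    QUALIFIER_TERMS = {"but", "however", "although", "sometimes"}

    def flag_comment(comment: str, threshold: int = 1) -> bool:
        """Flag a narrative comment when it contains enough dictionary terms."""
        tokens = comment.lower().split()
        hits = sum(1 for t in tokens if t in CONCERN_TERMS | QUALIFIER_TERMS)
        return hits >= threshold

    comments = [
        "Great teacher, very supportive on rounds.",
        "Knowledgeable but often dismissive of student questions.",
    ]
    print([c for c in comments if flag_comment(c)])
    # Prints only the second comment, which contains "but" and "dismissive".

In practice, a Qualifier-style dictionary (hedging words such as "but" and "sometimes") and a concern-term dictionary would likely be weighted differently, but the lookup-and-threshold structure is the same.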

Program evaluation: The Expert Dictionary had an accuracy of 0.90, a precision of 0.62, and a recall of 0.50. The Qualifier Dictionary had lower accuracy (0.65) and precision (0.16) but higher recall (0.67). The Text Mining Dictionary had an accuracy of 0.78 and a recall of 0.24. The Sentiment plus Qualifier Dictionary had good accuracy (0.86) and recall (0.77) with a precision of 0.37.
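For reference, these are the standard confusion-matrix metrics. The sketch below recomputes them from invented labels (1 = comment reflects a teaching concern); the counts are illustrative only and are not derived from the study data:

    # accuracy  = (TP + TN) / (TP + TN + FP + FN)
    # precision = TP / (TP + FP)  -- share of flagged comments that truly raise concerns
    # recall    = TP / (TP + FN)  -- share of true concerns that get flagged
    # The labels below are invented for illustration.

    def metrics(y_true, y_pred):
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
        tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
        return {
            "accuracy": (tp + tn) / len(y_true),
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }

    y_true = [1, 0, 0, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
    print(metrics(y_true, y_pred))
    # {'accuracy': 0.75, 'precision': 0.666..., 'recall': 0.666...}

A low precision such as the Qualifier Dictionary's 0.16 means most flagged comments are false alarms, which matters when flags trigger manual review.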

Discussion: NLP methods can identify teaching quality concerns with good accuracy and reasonable recall but relatively low precision. An existing, free NLP sentiment analysis dictionary can perform nearly as well as dictionaries requiring expert coding or manual creation.
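The abstract does not name the sentiment dictionary used. As one example of a freely available option (an assumption for illustration, not the study's tool), the VADER lexicon shipped with NLTK can score comment negativity:

    # Illustrative only: VADER is one freely available sentiment lexicon;
    # the abstract does not specify which dictionary the study used, and
    # the -0.3 cutoff below is an arbitrary choice for this sketch.
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    def is_concerning(comment, cutoff=-0.3):
        # VADER's compound score ranges from -1 (most negative) to +1 (most positive).
        return sia.polarity_scores(comment)["compound"] <= cutoff

    print(is_concerning("The attending was rude and dismissive of questions."))
    print(is_concerning("An outstanding, supportive teacher."))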

Keywords: Natural language processing; Student evaluations of teaching; Teaching quality.