Subjective rating of cosmetic treatment with botulinum toxin type A: do existing measures demonstrate interobserver validity?

Ann Plast Surg. 2012 Oct;69(4):350-5. doi: 10.1097/SAP.0b013e31824a43e0.

Abstract

Background: Throughout the literature, investigators have assessed the cosmetic efficacy of botulinum toxin (BT) treatment by using various subjective, qualitative measures, including the Facial Wrinkle Scale (FWS) and Subject Global Assessment (SGA). The widely used FWS and SGA attempt to quantify both the magnitude and duration of cosmetic outcomes as assessed by physician and patient. We sought to determine the interobserver validity of these scales relative to the level of observer experience.

Methods: Botulinum toxin was injected for cosmetic effect in 6 patients recruited as part of an institutional review board-approved investigation. Subjects were photographed at rest and during animation (raising eyebrows, frowning, and blinking) before treatment, at 1, 2, and 4 weeks after treatment, and monthly thereafter through 6 months of follow-up. Standardized 8″×10″ digital prints were scored using the FWS by board-certified plastic surgeons (n=5), general surgery residents (n=3), and medical students (n=4). Photographs at each time point were then compared with baseline using the SGA. Statistical analysis of observer data was performed using SPSS v19. Cohen κ (FWS) and Spearman ρ (SGA) were calculated for each pairwise comparison of observers, with a conservative α of 0.01.
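The pairwise agreement analysis described above can be sketched in Python as follows. This is an illustrative reimplementation, not the authors' SPSS procedure: it assumes unweighted Cohen κ on categorical FWS grades and Spearman ρ computed with average ranks for ties, and all observer names and scores in the usage section are hypothetical. It computes coefficients only; testing each one against α = 0.01 would additionally need p-values (for example from `scipy.stats.spearmanr`), and `sklearn.metrics.cohen_kappa_score` computes the same unweighted κ.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def cohen_kappa(a: Sequence, b: Sequence) -> float:
    """Unweighted Cohen's kappa between two observers' categorical grades."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    categories = set(a) | set(b)
    # Observed agreement: share of photographs given the same grade.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each observer's marginal grade frequencies.
    p_expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_expected == 1.0:
        return float("nan")  # both observers used one identical grade: kappa undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rho as the Pearson correlation of average ranks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences of at least 2 scores")
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return float("nan")  # constant scores: correlation undefined
    return cov / (var_a * var_b) ** 0.5


def pairwise(
    scores: Dict[str, List[float]],
    stat: Callable[[Sequence, Sequence], float],
) -> Dict[Tuple[str, str], float]:
    """Apply stat to every unordered pair of observers."""
    return {(p, q): stat(scores[p], scores[q]) for p, q in combinations(sorted(scores), 2)}


if __name__ == "__main__":
    # Hypothetical FWS grades from three observers for the same six photographs.
    fws = {
        "surgeon_A": [0, 1, 2, 3, 2, 1],
        "surgeon_B": [0, 1, 2, 2, 2, 1],
        "surgeon_C": [1, 1, 2, 3, 3, 0],
    }
    for pair, k in pairwise(fws, cohen_kappa).items():
        print(pair, f"kappa={k:.3f}")  # ('surgeon_A', 'surgeon_B') -> kappa=0.760
    # Hypothetical SGA ratings (change from baseline) for the same photographs.
    sga = {
        "surgeon_A": [2, 3, 3, 2, 1, 0],
        "surgeon_B": [2, 3, 2, 2, 1, 1],
    }
    for pair, r in pairwise(sga, spearman_rho).items():
        print(pair, f"rho={r:.3f}")
```

Running every observer pair through the same function mirrors the "each pairwise comparison" design; the spread of the resulting coefficients within a group is what the results below summarize as a range.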

Results: FWS observer scores for the upper face overall were generally in agreement, with no negative κ values, but agreement was highly variable, even among members of a single group. Agreement among plastic surgeons was the greatest (κ, 0.194-0.609). Resident concordance was moderate, and medical students displayed the most variable agreement. Spearman ρ for SGA scores was much higher, with surgeons approaching excellent agreement (ρ, 0.443-0.992). In comparisons between members of different groups, agreement was unpredictable for both the FWS and SGA. Comparisons using scores from individual areas of the face were the least concordant.
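The results are reported as ranges of pairwise coefficients within each observer group. A small helper can reduce a set of pairwise values to such a range and attach a verbal label. The abstract does not state which benchmark underlies terms such as "moderate" or "excellent"; the Landis and Koch (1977) cut-offs used here are an assumption, and because they were proposed for κ, applying them to Spearman ρ would be a loose analogy at best.

```python
from typing import Iterable, Tuple

# Assumed benchmark: Landis & Koch (1977) upper bound of each agreement band.
BANDS = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial"))


def landis_koch(kappa: float) -> str:
    """Map a kappa value to a Landis & Koch agreement label."""
    if kappa < 0:
        return "poor"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    return "almost perfect"


def summarize(pairwise_values: Iterable[float]) -> Tuple[float, float]:
    """Reduce a group's pairwise coefficients to a (min, max) range."""
    values = list(pairwise_values)
    if not values:
        raise ValueError("need at least one pairwise value")
    return min(values), max(values)


if __name__ == "__main__":
    # Only the two bounds reported for plastic surgeons (FWS) are known;
    # the individual pairwise values are not given in the abstract.
    low, high = summarize([0.194, 0.609])
    print(f"kappa {low}-{high}: {landis_koch(low)} to {landis_koch(high)}")
```

Under this assumed benchmark, the two ends of the plastic surgeons' FWS range fall in different bands ("slight" and "substantial"), which illustrates how wide the within-group spread is.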

Conclusions: The FWS and SGA represent the current standard of cosmetic outcomes measures; however, when subjected to scrutiny, they display relatively unpredictable agreement, even among plastic surgeons. Compared with the FWS, the SGA shows more acceptable interobserver concordance, especially among plastic surgeons accustomed to using such scales. The interobserver variability of FWS and SGA scoring underlines the need to explore objective, quantitative cosmetic outcomes measures.

Publication types

  • Validation Study

MeSH terms

  • Adult
  • Botulinum Toxins, Type A / administration & dosage
  • Botulinum Toxins, Type A / pharmacology*
  • Cosmetic Techniques*
  • Female
  • Humans
  • Injections, Subcutaneous
  • Middle Aged
  • Neuromuscular Agents / administration & dosage
  • Neuromuscular Agents / pharmacology*
  • Observer Variation
  • Outcome Assessment, Health Care / methods*
  • Photography
  • Physicians
  • Reproducibility of Results
  • Skin Aging / drug effects*
  • Students, Medical

Substances

  • Neuromuscular Agents
  • Botulinum Toxins, Type A