Several recent studies have applied sentiment lexicons to Twitter data to gauge local sentiment and relate it to health behaviors and outcomes in local areas. While this research has demonstrated the considerable potential of the approach, questions remain about the validity of Twitter mining and surveillance in local health research. First, how well does this approach predict health outcomes at very local scales, such as neighborhoods? Second, how robust are findings derived from sentiment signals once spatial effects are taken into account? To address these questions, we link 2,076,025 tweets from 66,219 distinct users in the city of San Diego, posted between 2014-12-06 and 2017-05-24, to data from the 500 Cities Project and the 2010-2014 American Community Survey. We assess how well sentiment predicts self-rated mental health, sleep quality, and heart disease at the census tract level, controlling for neighborhood characteristics and spatial autocorrelation. We find that sentiment is associated with some outcomes on its own, but these associations disappear once other neighborhood factors are controlled for. Examining our encoding strategy more closely, we discuss the limitations of existing measures of neighborhood sentiment and call for greater attention to how race/ethnicity and socioeconomic status shape the inferences drawn from such measures.