Computer vision techniques have made considerable progress in recognizing object categories by learning models that typically rely on a set of discriminative features. However, in contrast to human perception, which makes extensive use of logic-based rules, these models fail to benefit from knowledge that is explicitly provided. In this paper, we propose a framework that performs knowledge-assisted analysis of visual content. We use ontologies to model the domain knowledge and a set of conditional probabilities to model the application context. A Bayesian network (BN) then integrates the statistical and explicit knowledge and performs hypothesis testing using evidence-driven probabilistic inference. In addition, we propose a focus-of-attention (FoA) mechanism based on the mutual information between concepts. This mechanism selects the most prominent hypotheses to be verified by the BN, removing the need to exhaustively test all possible combinations in the hypothesis set. We experimentally evaluate our framework using content from three domains on three tasks: 1) image categorization; 2) localized region labeling; and 3) weak annotation of video shot keyframes. The results demonstrate improved performance compared to a set of baseline concept classifiers that are unaware of any context or domain knowledge. Finally, we demonstrate that the proposed FoA mechanism significantly reduces the computational cost of visual inference while obtaining results comparable to the exhaustive case.
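
To make the idea of a mutual-information-based focus of attention more concrete, the following is a minimal sketch, not the method evaluated in this work. It assumes that each concept is described by a binary presence vector over a set of annotated training images, estimates the mutual information between an observed (evidence) concept and every other concept, and keeps only the top-ranked concepts as hypotheses to verify. The function names `mutual_information` and `rank_hypotheses`, the toy `annotations` data, and the `top_k` cutoff are all hypothetical and chosen for illustration only.

```python
import math


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length binary vectors."""
    n = len(a)
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            # Empirical joint and marginal probabilities from co-occurrence counts.
            p_xy = sum(1 for i, j in zip(a, b) if i == x and j == y) / n
            p_x = sum(1 for i in a if i == x) / n
            p_y = sum(1 for j in b if j == y) / n
            if p_xy > 0:  # zero-probability cells contribute nothing
                mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def rank_hypotheses(annotations, evidence_concept, top_k):
    """Return the top_k concepts most informative about evidence_concept."""
    evidence = annotations[evidence_concept]
    scores = {
        concept: mutual_information(evidence, presence)
        for concept, presence in annotations.items()
        if concept != evidence_concept
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy annotations: one entry per training image, 1 = concept present.
annotations = {
    "sea":  [1, 1, 1, 0, 0, 0, 1, 0],
    "sand": [1, 1, 1, 0, 0, 0, 0, 0],
    "road": [0, 0, 0, 1, 1, 0, 1, 1],
    "sky":  [1, 0, 1, 0, 1, 0, 0, 1],
}

# Only the selected hypotheses would be passed on for probabilistic inference.
selected = rank_hypotheses(annotations, "sea", top_k=2)
print(selected)
```

In this sketch, a concept that tends to be absent when the evidence concept is present can still rank highly, because mutual information measures statistical dependence in either direction. Restricting verification to the `top_k` concepts is what avoids testing every hypothesis exhaustively.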