Previous attempts to classify task from eye movement data have relied on model architectures designed to emulate theoretically defined cognitive processes and/or data that have been processed into aggregate (e.g., fixations, saccades) or statistical (e.g., fixation density) features. Black box convolutional neural networks (CNNs) are capable of identifying relevant features in raw and minimally processed data and images, but difficulty interpreting these model architectures has contributed to challenges in generalizing lab-trained CNNs to applied contexts. In the current study, a CNN was used to classify task from two eye movement datasets (Exploratory and Confirmatory) in which participants searched, memorized, or rated indoor and outdoor scene images. The Exploratory dataset was used to tune the hyperparameters of the model, and the resulting model architecture was retrained, validated, and tested on the Confirmatory dataset. The data were formatted into timelines (i.e., x-coordinate, y-coordinate, pupil size) and minimally processed images. To further understand the informational value of each component of the eye movement data, the timeline and image datasets were broken down into subsets with one or more components systematically removed. Classification of the timeline data consistently outperformed classification of the image data. The Memorize condition was the condition most often confused with the other two (Search and Rate). Pupil size was the least uniquely informative component when compared with the x- and y-coordinates. The general pattern of results for the Exploratory dataset was replicated in the Confirmatory dataset. Overall, the present study provides a practical and reliable black box solution to classifying task from eye movement data.
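
The component-removal design described above can be illustrated with a minimal sketch. It assumes the timeline data are stored as an array of shape (trials, samples, channels) with channels ordered x-coordinate, y-coordinate, pupil size, and that a component is removed by dropping its channel. The array layout, the function names, and the choice to drop rather than zero out a channel are illustrative assumptions, not details taken from the study.

```python
from itertools import combinations

import numpy as np

# Assumed channel order for the timeline data (hypothetical layout).
COMPONENTS = ("x", "y", "pupil")


def component_subsets(components=COMPONENTS):
    """Return every non-empty subset of components, largest subset first."""
    subsets = []
    for size in range(len(components), 0, -1):
        subsets.extend(combinations(components, size))
    return subsets


def select_components(timelines, keep, components=COMPONENTS):
    """Keep only the named channels of a (trials, samples, channels) array."""
    idx = [components.index(name) for name in keep]
    return timelines[:, :, idx]


# Synthetic stand-in data: 4 trials, 100 samples, 3 channels.
rng = np.random.default_rng(0)
timelines = rng.normal(size=(4, 100, 3))

subsets = component_subsets()
for keep in subsets:
    subset_data = select_components(timelines, keep)
    print(keep, subset_data.shape)
```

Each resulting subset could then be passed to the same classifier so that differences in accuracy reflect the information carried by the removed components.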