The low-frequency impulsive gunshot vocalizations of baleen whales exhibit dispersive propagation in shallow-water channels which is well-modeled by normal mode theory. Typically, underwater acoustic source range estimation requires multiple time-synchronized hydrophone arrays which can be difficult and expensive to achieve. However, single-hydrophone modal dispersion has been used to range baleen whale vocalizations and estimate shallow-water geoacoustic properties. Although convenient when compared to sensor arrays, these algorithms require preliminary signal detection and human labor to estimate the modal dispersion. In this paper, we apply a temporal convolutional network (TCN) to spectrograms from single-hydrophone acoustic data for simultaneous gunshot detection and ranging. The TCN learns ranging and detection jointly using gunshots simulated across multiple environments and ranges along with experimental noise. The synthetic data are informed by only the water column depth, sound speed, and density of the experimental environment, while other parameters span empirically observed bounds. The method is experimentally verified on North Pacific right whale gunshot data collected in the Bering Sea. To do so, 50 dispersive gunshots were manually ranged using the state-of-the-art time-warping inversion method. The TCN detected these gunshots among 50 noise-only examples with high precision and estimated ranges which closely matched those of the physics-based approach.