Two new fluoroscopic fiducial tracking methods that exploit the spatial relationship among multiple implanted fiducials to achieve fast, accurate and robust tracking are proposed in this paper. The spatial relationship between the implanted markers is modeled as Gaussian distributions of their pairwise distances over time. The means and standard deviations of these distances are learned from training sequences, and pairwise distances that deviate from the learned distributions are assigned a low spatial matching score. The spatial constraints are incorporated in two different algorithms: a stochastic tracking method and a detection-based method. In the stochastic method, hypotheses of the 'true' fiducial positions are sampled from a pre-trained respiratory motion model, and each hypothesis is assigned an importance value based on its image matching score and spatial matching score. This approach therefore requires learning the parameters of the motion model in addition to the distribution parameters of the pairwise distances. In the detection-based method, a set of candidate marker locations is identified by a template-matching-based fiducial detector, and the best set of locations is obtained by optimizing the image matching score and spatial matching score through non-serial dynamic programming. This approach does not require a respiratory motion model. The two proposed algorithms are compared with a recent multiple hypothesis tracking algorithm, denoted MHT (Tang et al 2007 Phys. Med. Biol. 52 4081-98). Phantom experiments were performed using fluoroscopic videos captured with known motion relative to an anthropomorphic phantom. The patient experiments were a retrospective study of 16 fluoroscopic videos of liver cancer patients with implanted fiducials.
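To make the spatial constraint concrete, here is a minimal sketch, not the authors' implementation. It assumes 2-D marker positions in image coordinates, log-domain scores combined additively with a hypothetical `weight`, and precomputed detector outputs. Joint selection uses exhaustive search over candidate combinations in place of non-serial dynamic programming. That search reaches the same optimum, but its cost grows exponentially with the number of markers. All function names (`pairwise_stats`, `spatial_log_score`, `best_assignment`) are hypothetical.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
PairStats = Dict[Tuple[int, int], Tuple[float, float]]


def pairwise_stats(training: Sequence[Sequence[Point]]) -> PairStats:
    """Learn mean and std of every marker-pair distance.

    training[t][k] is the (x, y) position of marker k in training frame t.
    """
    n_markers = len(training[0])
    stats: PairStats = {}
    for i, j in itertools.combinations(range(n_markers), 2):
        d = [math.dist(frame[i], frame[j]) for frame in training]
        mu = sum(d) / len(d)
        var = sum((x - mu) ** 2 for x in d) / len(d)
        # Floor sigma so a perfectly rigid pair does not divide by zero.
        stats[(i, j)] = (mu, max(math.sqrt(var), 1e-6))
    return stats


def spatial_log_score(positions: Sequence[Point], stats: PairStats) -> float:
    """Gaussian log-likelihood of all pairwise distances, up to a constant."""
    total = 0.0
    for (i, j), (mu, sigma) in stats.items():
        z = (math.dist(positions[i], positions[j]) - mu) / sigma
        total += -0.5 * z * z - math.log(sigma)
    return total


def best_assignment(
    candidates: Sequence[Sequence[Point]],
    image_log_scores: Sequence[Sequence[float]],
    stats: PairStats,
    weight: float = 1.0,
) -> Tuple[Tuple[int, ...], float]:
    """Pick one detector candidate per marker, maximizing image + spatial score.

    Exhaustive search over all combinations: a brute-force stand-in for
    non-serial dynamic programming that finds the same optimum on small inputs.
    """
    best: Tuple[int, ...] = ()
    best_score = -math.inf
    for combo in itertools.product(*(range(len(c)) for c in candidates)):
        pos = [candidates[k][c] for k, c in enumerate(combo)]
        img = sum(image_log_scores[k][c] for k, c in enumerate(combo))
        score = img + weight * spatial_log_score(pos, stats)
        if score > best_score:
            best, best_score = combo, score
    return best, best_score


# Three markers drifting with breathing in the training frames (made-up data).
training = [
    [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)],
    [(0.0, 1.0), (10.2, 1.0), (0.0, 11.1)],
    [(0.0, 2.0), (9.9, 2.0), (0.0, 11.9)],
]
stats = pairwise_stats(training)

# New frame: marker 1 has a spurious detection at (14, 3) with a better image score.
candidates = [[(0.0, 3.0)], [(10.0, 3.0), (14.0, 3.0)], [(0.0, 13.0)]]
image_scores = [[0.0], [-1.0, -0.5], [0.0]]

combo, _ = best_assignment(candidates, image_scores, stats)
image_only, _ = best_assignment(candidates, image_scores, stats, weight=0.0)
print(combo, image_only)  # (0, 0, 0) (0, 1, 0)
```

With the spatial term, the spurious candidate is rejected because it stretches two pairwise distances far outside their learned spread. With `weight=0.0` the image score alone selects it. In a stochastic tracker, the same `spatial_log_score` could plausibly be added to each sampled hypothesis's image score to form its importance weight.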
For the motion phantom data sets, the detection-based approach had the smallest tracking error (μerr: 0.78-1.74 mm, σerr: 0.39-1.16 mm) for images taken at low exposure (50 mAs). At higher exposure (500 mAs), the stochastic method gave the best performance (μerr: ∼0.39 mm, σerr: ∼0.27 mm). In contrast, the MHT tracker, which does not model the spatial constraints, performs well only when no fiducial is occluded. With the RANDO phantom data, both proposed methods performed well, with mean tracking errors of ∼1.8 mm (standard deviation ∼0.93 mm) at 100 mAs and ∼0.91 mm (standard deviation ∼0.88 mm) at 500 mAs. The MHT tracker had the largest tracking errors, with a mean of ∼4.8 mm and a standard deviation of ∼2.4 mm, in both sessions with the RANDO phantom data. On the patient data sets, the detection-based method gave the smallest error (μerr: 0.39 mm, σerr: ∼0.19 mm). The stochastic method performed well (μerr: ∼0.58 mm, σerr: ∼0.39 mm) when the patient breathed consistently, but its accuracy dropped (μerr: ∼1.55 mm) when the patient breathed differently across sessions.