[36] Colin Lea, Michael D Flynn, René Vidal, Austin Reiter, and
Gregory D Hager. Temporal convolutional networks for
action segmentation and detection. In CVPR, 2017. 2
[37] Colin Lea, Austin Reiter, René Vidal, and Gregory D Hager.
Segmental spatiotemporal CNNs for fine-grained action
segmentation. In ECCV, 2016. 2
[38] Peng Lei and Sinisa Todorovic. Temporal deformable
residual networks for action segmentation in videos. In CVPR,
2018. 2
[39] Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu,
Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li,
Tao Qin, et al. Binauralgrad: A two-stage conditional
diffusion probabilistic model for binaural audio synthesis. arXiv
preprint arXiv:2205.14807, 2022. 3
[40] Muheng Li, Lei Chen, Yueqi Duan, Zhilan Hu, Jianjiang
Feng, Jie Zhou, and Jiwen Lu. Bridge-prompt: Towards
ordinal action understanding in instructional videos. In CVPR,
2022. 2, 5, 6
[41] Shi-Jie Li, Yazan Abu Farha, Yun Liu, Ming-Ming Cheng,
and Juergen Gall. MS-TCN++: Multi-stage temporal
convolutional network for action segmentation. IEEE TPAMI,
2020. 1, 2, 4, 5, 6, 7
[42] Yunheng Li, Zhuben Dong, Kaiyuan Liu, Lin Feng, Lianyu
Hu, Jie Zhu, Li Xu, Shenglan Liu, et al. Efficient two-step
networks for temporal action segmentation.
Neurocomputing, 2021. 2
[43] Daochang Liu, Qiyue Li, Tingting Jiang, Yizhou Wang,
Rulin Miao, Fei Shan, and Ziyu Li. Towards unified surgical
skill assessment. In CVPR, 2021. 1
[44] Zhichao Liu, Leshan Wang, Desen Zhou, Jian Wang,
Songyang Zhang, Yang Bai, Errui Ding, and Rui Fan.
Temporal segment transformer for action segmentation. arXiv
preprint arXiv:2302.13074, 2023. 2
[45] Calvin Luo. Understanding diffusion models: A unified
perspective. arXiv preprint arXiv:2208.11970, 2022. 3
[46] Khoi-Nguyen C Mac, Dhiraj Joshi, Raymond A Yeh, Jinjun
Xiong, Rogerio S Feris, and Minh N Do. Learning motion
in feature space: Locally-consistent deformable convolution
networks for fine-grained action detection. In ICCV, 2019. 2
[47] Junyong Park, Daekyum Kim, Sejoon Huh, and Sungho Jo.
Maximization and restoration: Action segmentation through
dilation passing and temporal reconstruction. Pattern
Recognition, 2022. 2, 6
[48] Konpat Preechakul, Nattanat Chatthee, Suttisak
Wizadwongsa, and Supasorn Suwajanakorn. Diffusion
autoencoders: Toward a meaningful and decodable representation.
In CVPR, pages 10619–10629, 2022. 3
[49] Robin Rombach, Andreas Blattmann, Dominik Lorenz,
Patrick Esser, and Björn Ommer. High-resolution image
synthesis with latent diffusion models. In CVPR, 2022. 3
[50] Fadime Sener, Dipika Singhania, and Angela Yao. Temporal
aggregate representations for long-range video
understanding. In ECCV, 2020. 2
[51] Bharat Singh, Tim K Marks, Michael Jones, Oncel Tuzel,
and Ming Shao. A multi-stream bi-directional recurrent
neural network for fine-grained action detection. In CVPR, 2016.