# What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations
This is the official resource for the paper "What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations", including the dataset (ARGO1M) and the code.
If you use the ARGO1M dataset and/or our CIR method code, please cite:
```bibtex
@inproceedings{Plizzari2023,
  title={What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations},
  author={Plizzari, Chiara and Perrett, Toby and Caputo, Barbara and Damen, Dima},
  booktitle={ICCV},
  year={2023}
}
```
We provide modified training scripts for CIR to replicate paper results. To install dependencies:
```shell
conda env create -f environment.yml
```
The annotated clips making up ARGO1M are curated from videos in the large-scale Ego4D dataset. Before using ARGO1M, you therefore need to sign the Ego4D License Agreement. Follow these three steps to download the dataset:
1. Go to ego4ddataset.com to review and execute the Ego4D License Agreement. You will be emailed a set of AWS access credentials once your license agreement is approved, which takes around 48 hours.

2. The dataset is hosted on Amazon S3 and requires credentials to access. The AWS CLI uses the credentials stored in `~/.aws/credentials`. If you already have credentials configured, you can skip this step. If not:
   - Install the AWS CLI from the official AWS CLI page.
   - Open a command line and type `aws configure`.
   - Leave the default region blank, and enter your AWS access ID and secret key when prompted.

   The CLI requires Python >= 3.8. Please install the prerequisites via `python setup.py install` (easyinstall) at the repo root, or via `pip install -r requirements.txt`.

3. Download the dataset using the following command:

   ```shell
   python code/scripts/download_all.py --flag DEST_DIR
   ```

   where `flag` is either `ffcv` or `csv`.
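For reference, the `aws configure` step above writes an INI-style file to `~/.aws/credentials` of roughly this shape (placeholder values, not real keys):

```ini
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```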
You can directly download our FFCV encodings for all ARGO1M splits, as well as the CSV files described below.
We provide the `.csv` files for all the proposed splits. Each file contains the following entries:

- `uid`: uid of the video clip;
- `scenario_idx`: scenario label (index-scenario association in `index_scenario.txt`);
- `location_idx`: location label (index-location association in `index_location.txt`);
- `label`: action label (index-action association in `index_verb.txt`);
- `timestamp`: starting timestamp;
- `timeframe`: starting timeframe;
- `narration`: narration;
- `action_start_feature_idx`: starting feature index for SlowFast pre-extracted features;
- `action_end_feature_idx`: ending feature index for SlowFast pre-extracted features.
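As a quick sanity check on this schema, the columns can be inspected with pandas. The snippet below is a sketch using a made-up two-row sample (the values are illustrative, not real annotations); in practice you would call `pd.read_csv` on a downloaded split file:

```python
import pandas as pd
from io import StringIO

# Toy two-row sample in the ARGO1M CSV schema described above.
# All values are illustrative placeholders, not real annotations.
sample = StringIO(
    "uid,scenario_idx,location_idx,label,timestamp,timeframe,narration,"
    "action_start_feature_idx,action_end_feature_idx\n"
    "clip_0001,2,5,17,12.4,372,chops a carrot,100,108\n"
    "clip_0002,0,1,3,40.0,1200,tightens a bolt,560,575\n"
)
df = pd.read_csv(sample)

# Number of SlowFast feature vectors each clip spans.
df["n_features"] = df["action_end_feature_idx"] - df["action_start_feature_idx"]
print(df[["uid", "label", "n_features"]])
```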
To speed up training, we used FFCV encodings of both the training and test sets for each of the proposed splits. We also provide the scripts for extracting them from the given CSV files. After downloading the Ego4D SlowFast features, you can extract the FFCV encodings by running:

```shell
python /scripts/dataset_ffcv_encode.py --config /configs/{config_file}.yaml --split {split_name}
```
The code is designed to make it easy to try your own methods and losses on top of it. Suppose you want to introduce a new module called `MyModule` into the pipeline:

- Define `MyModule` in `models.py`.
- In the corresponding `config.yaml`, add `MyModule` to `model_types`, with corresponding attributes in `model_names`, `model_lrs`, `model_use_train`, `model_use_eval` and `step`.
- In `model_inputs`, specify the input to `MyModule` by prepending the name of the model that provides the output, in the form `{"arg": "other_model_name.output_name"}`, e.g. `{"input_logits": "mlp.logits"}`.
- You can do the same for a new loss `MyLoss`: add it to `loss_types`, along with the corresponding `loss_names`, and specify the corresponding `loss_inputs` in the form `{"arg": "other_model_name.output_name"}`, e.g. `{"logits": "mlp.logits"}`.
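Putting these steps together, a config fragment might look like the sketch below. The key names come from the list above, but the exact value layout (flat lists vs. per-model mappings) is an assumption here, so check it against the config files shipped with the repo:

```yaml
# Hypothetical fragment of config.yaml: registering MyModule and MyLoss
# alongside an existing mlp model (layout is a sketch, not verified).
model_types: [MLP, MyModule]        # classes defined in models.py
model_names: [mlp, my_module]       # instance names used for input wiring
model_lrs: [0.001, 0.001]
model_use_train: [true, true]
model_use_eval: [true, true]
step: [1, 1]
model_inputs:
  my_module: {"input_logits": "mlp.logits"}   # mlp's output feeds MyModule

loss_types: [CrossEntropy, MyLoss]
loss_names: [ce, my_loss]
loss_inputs:
  my_loss: {"logits": "mlp.logits"}
```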
The folder `scripts` contains code and bash scripts to reproduce the paper results. To re-create the CIR results:

1. Modify the internal paths in `config` to match the location of the FFCV data.
2. Run:

   ```shell
   python run.py --config configs/config_run/run_CIR.yaml
   ```
All files in this repository are copyright by us and published under the Creative Commons Attribution-NonCommercial 4.0 International License, found here. This means that you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not use the material for commercial purposes.