Liting Zhou, Jianquan Liu, Shoji Nishimura, Joseph Antony, Cathal Gurrin
Refereed Conference Meeting Proceeding
Abstract—Notwithstanding recent advances in machine vision, video activity recognition from multiple cameras remains a challenging task, as many real-world interactions cannot be automatically recognised for reasons such as partial occlusion or coverage black-spots. In this paper we propose a new technique that infers the unseen relationship between two individuals captured by different cameras and uses it to retrieve relevant video clips if there is a likely interaction between the two individuals. We introduce a human-object interaction (HOI) model that integrates the causal relationship between the humans and the objects. For this, we first extract the key frames and generate labels or annotations using state-of-the-art image captioning models. Next, using the Stanford CoreNLP parser, we extract SVO (subject, verb, object) triples and encode the descriptions into a vector form for HOI inference. To calculate the HOI co-existence and the possible causality score, we use transfer entropy. In our experiments, we found that integrating causal relations into the content indexing process and using transfer entropy to calculate the causality score leads to an improvement in retrieval performance.
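As a rough illustration of the transfer entropy score mentioned in the abstract, the sketch below computes a plug-in estimate of TE(source -> target) with a history length of one. It is not the authors' implementation. The function name `transfer_entropy`, the history length, and the choice to represent each camera as a sequence of discrete per-key-frame labels (for example 1 if an interaction label is present, 0 otherwise) are all assumptions made here for illustration; the abstract does not specify how the sequences are built.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    Hypothetical helper: both inputs are equal-length sequences of
    discrete symbols, e.g. per-key-frame interaction indicators.
    """
    if len(source) != len(target):
        raise ValueError("sequences must have equal length")
    n = len(target) - 1  # number of (t, t+1) transitions
    if n < 1:
        return 0.0

    triples = Counter()   # counts of (y_next, y_prev, x_prev)
    pairs_yx = Counter()  # counts of (y_prev, x_prev)
    pairs_yy = Counter()  # counts of (y_next, y_prev)
    singles = Counter()   # counts of y_prev

    for t in range(n):
        y_next, y_prev, x_prev = target[t + 1], target[t], source[t]
        triples[(y_next, y_prev, x_prev)] += 1
        pairs_yx[(y_prev, x_prev)] += 1
        pairs_yy[(y_next, y_prev)] += 1
        singles[y_prev] += 1

    te = 0.0
    for (y_next, y_prev, x_prev), count in triples.items():
        p_joint = count / n
        # p(y_next | y_prev, x_prev): prediction using both histories
        p_cond_full = count / pairs_yx[(y_prev, x_prev)]
        # p(y_next | y_prev): prediction using the target's own history
        p_cond_self = pairs_yy[(y_next, y_prev)] / singles[y_prev]
        te += p_joint * log2(p_cond_full / p_cond_self)
    return te


# Toy data (made up): the target copies the source with a one-step delay.
camera_a = [0, 1, 1, 0, 1]
camera_b = [0, 0, 1, 1, 0]
score = transfer_entropy(camera_a, camera_b)
```

In this toy case the target's next value is fully determined by the source's current value but not by the target's own past, so the score is high. A constant target yields a score of zero because there is nothing left to predict.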
Digital Object Identifier (DOI):
Dublin City University (DCU)
Open access repository: