However, the expressiveness of activity graphs may be plagued by noisy qualitative spatio-temporal relationships, which arise from the jitter of unstable bounding boxes and may not represent intuitive transitions between spatial states. To address this problem, in [4] we proposed an approach which incorporates a temporal model that reflects more natural state transitions learned from a manually annotated ground truth, and therefore discourages unintuitive state transitions that may arise from video noise. The temporal model arises from a state transition matrix defined on the Conceptual Neighbourhood Graph (CNG) [4] for topological relationships (RCC5). CNGs naturally capture transitions between a mutually exclusive and exhaustive set of qualitative relationships at a certain level of granularity, and thus enable the definition of a well-defined temporal model. This property motivates our choice of CNGs arising from well-researched calculi such as RCC5, QTC6 and DIR4. Thus each sequence of episodes (e.g. in Fig. 1(a)) corresponds to one of the three different types of qualitative relations; e.g. Fig. 1(a) corresponds to topology (RCC5) and relative trajectories (QTC6).

3.3 Learning Verb Models of Interaction Sub-graphs

An activity graph exhaustively represents all interactions, i.e. all spatial relations between all pairs of co-temporally observed objects in a given video.
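As a concrete illustration, the CNG-based temporal model can be reduced to a transition-validity check over per-frame relations. The edge set below is one plausible conceptual neighbourhood structure for RCC5, not necessarily the one used by REDVINE, and the function names are our own:

```python
# Hypothetical sketch: a conceptual neighbourhood graph (CNG) for RCC5,
# used to flag unintuitive state changes. The exact edge set used by
# REDVINE is not given here; this is one plausible RCC5 neighbourhood.

RCC5_CNG = {
    "DR":  {"PO"},
    "PO":  {"DR", "PP", "PPi", "EQ"},
    "PP":  {"PO", "EQ"},
    "PPi": {"PO", "EQ"},
    "EQ":  {"PO", "PP", "PPi"},
}

def is_intuitive(prev_rel: str, next_rel: str) -> bool:
    """A transition is intuitive if the relation is unchanged or the
    two relations are conceptual neighbours."""
    return prev_rel == next_rel or next_rel in RCC5_CNG[prev_rel]

def count_noisy_transitions(sequence):
    """Count transitions in a per-frame relation sequence that the CNG
    disallows -- a crude proxy for bounding-box jitter."""
    return sum(
        1 for a, b in zip(sequence, sequence[1:]) if not is_intuitive(a, b)
    )
```

A learned state transition matrix would replace the hard yes/no check with transition probabilities, penalising rather than forbidding unlikely jumps.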
Figure 5: An illustration of the relational approach.

3.2 Video Activity Graphs

We propose activity graphs as a natural way of representing interactions between objects participating in video activities, using qualitative spatial and temporal relations. We describe how an interaction between a person and a ball that occurs in a particular video is captured by the corresponding activity graph. At the top of Fig. 1(a) is a sequence of images representing the interaction between a person and a ball, where the ball approaches the person, the person chases the ball, and then they are static with respect to each other. Below that are shown two parallel sequences of episodes (Footnote 2), where each episode corresponds to an interval during which a qualitative spatial relationship holds maximally. For example, the relationship Disconnected (Dr) holds maximally for the interval J between the objects O1 and O2 respectively, as shown in Fig. 1(a). As already mentioned above, activity graphs are three-layered graphs in which the layer 1 nodes are mapped to the interacting object tracks. Layer 2 nodes represent all the episodes between all those pairs of tracks which are co-temporally observed, and are labelled with their respective maximal spatial relation, as shown in Fig. 1(a).
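The episode construction described above amounts to run-length encoding a per-frame relation sequence into maximal intervals. A minimal sketch, where the function name and tuple layout are our assumptions:

```python
def episodes(frame_relations):
    """Collapse a per-frame qualitative relation sequence into episodes:
    maximal intervals (relation, start_frame, end_frame) during which a
    single relation holds. Illustrative only, not REDVINE's own code."""
    eps = []
    start = 0
    for i in range(1, len(frame_relations) + 1):
        # Close the current episode at a relation change or at the end.
        if i == len(frame_relations) or frame_relations[i] != frame_relations[start]:
            eps.append((frame_relations[start], start, i - 1))
            start = i
    return eps
```

For instance, a Dr/Po/Dr sequence of frames yields three episodes, each labelled with the relation that holds maximally over its interval.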
REDVINE Version 0.5 Beta Documentation

Muralikrishna Sridhar, Feng Gu, Anthony G. Cohn and David C. Hogg
School of Computing, University of Leeds

1 Introduction

This documentation details the user manual and technical background of the REDVINE system (Footnote 1), a novel approach where interactions in a video are represented using an activity graph. The activity graph embodies all the interactions, i.e. all the qualitative spatio-temporal relations between all pairs of interacting co-temporal object tracks during the entire video, as well as other properties such as pre-computed primitive events and object types. Interactions between subsets of objects, or those involving only a subset of spatial relations, are captured by interaction sub-graphs of an activity graph. Learning involves using interaction sub-graphs that are labelled in the ground truth with corresponding event label(s) in order to learn event models for each event under consideration. Given an unseen video that we represent as a corresponding activity graph, the task of event detection corresponds to finding the most probable covering of the activity graph with interaction sub-graphs, given a learned event model. Each detected interaction sub-graph is mapped back to the video in the form of event detections, where an event detection corresponds to an interaction by a subset of objects during certain time intervals, together with their respective event label(s). In this documentation we illustrate the REDVINE system on a dataset consisting of common outdoor verbs taken from www.visint.org.
Figure 4: A screen snapshot of the REDVINE GUI.

- The button Display Sub-Graph allows users to display all the mined activity graphs obtained through the mining process.
- The button Mine Graphemes is for mining a grapheme dictionary that will be used for feature extraction.
- The button Display Graphemes can be used to display the obtained grapheme dictionary.

4. Feature Extraction

- The button Histogram of Graphemes is used to extract features from the mined interaction sub-graphs, which are then represented as feature vectors of histograms.
- The button Display Embedded Features allows for the visualisation of extracted features in a lower-dimensional space.

5. Learning and Detection

- The button Video N-Fold CV allows users to start the recognition and detection process. Currently only the option of video-based N-fold (N = 5) cross-validation is provided; in the future more options will be added, e.g. the option of choosing different classifiers and the option of specifying the value of N.

3 Technical Details

3.1 System Overview

Recognizing verbs in videos has many interesting applications and is a reasonably well-researched area.
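Video-based N-fold cross-validation, as offered by the Video N-Fold CV button, splits at the level of whole videos so that material from one video never appears in both training and test folds. A minimal sketch of such a split; the names are illustrative and REDVINE's actual fold assignment may differ:

```python
def video_folds(video_ids, n_folds=5):
    """Partition whole videos into N folds (video-based cross-validation),
    so that sub-graphs from one video never straddle train and test.
    Round-robin assignment over sorted ids; a simplifying assumption."""
    folds = [[] for _ in range(n_folds)]
    for i, vid in enumerate(sorted(set(video_ids))):
        folds[i % n_folds].append(vid)
    return folds

def train_test_splits(video_ids, n_folds=5):
    """Yield (train, test) video-id sets, one pair per fold."""
    folds = video_folds(video_ids, n_folds)
    for k in range(n_folds):
        test = set(folds[k])
        yield set(video_ids) - test, test
```

Splitting by video rather than by sub-graph avoids the optimistic bias that arises when near-duplicate interactions from one video land on both sides of the split.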
For example, the sub-graph with the two layer 1 nodes and the first of the layer 2 nodes labelled with Dr represents the fact that the relation Dr holds between O1 and O2 during the interval I4, and is equivalent to the logical formula Holds(O1, O2, Dr, I4). Layer 3 nodes of the activity graph are labelled with Allen's temporal relations between certain pairs of intervals corresponding to episodes represented by respective pairs of layer 2 nodes; these pairs are characterized by the property that both intervals of a pair do not lie outside, and on the same side of, the interval characterizing the interaction. For example, the sub-graph formed by the two layer 2 episode nodes labelled Dr and Po respectively, and the first of the layer 3 nodes labelled with the symbol m, represents the fact that the relationship meets (signified by the symbol m) holds between the intervals corresponding to these two episodes, namely I4 and I5, and is equivalent to the logical formula Meets(I4, I5). Note that we do not represent the relations between pairs of intervals that can both potentially lie outside and on the same side of the interval shown. The choice of a graph-based representation, as opposed to a logic-based one, is motivated by the rationale that graphs offer a computationally efficient alternative to logical predicates by avoiding repetition of object and episode variables, and also provide a well-defined and computationally efficient comparison of interactions by means of a suitable similarity measure, as described below.
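The layer 3 labels are Allen's interval relations. The following sketch classifies the relation between two intervals (s, e) with s < e, using the convention that meets means the first interval ends exactly where the second begins; the symbol set follows Allen's usual shorthand, and the function itself is our illustration rather than REDVINE code:

```python
def allen(i, j):
    """Classify the Allen relation between intervals i=(s1,e1) and j=(s2,e2),
    with s < e. Returns one of the 13 relation symbols; 'm' (meets) is
    taken to mean e1 == s2."""
    (s1, e1), (s2, e2) = i, j
    if e1 < s2:  return "<"                       # i before j
    if e2 < s1:  return ">"                       # i after j
    if e1 == s2: return "m"                       # i meets j
    if e2 == s1: return "mi"                      # i met-by j
    if s1 == s2 and e1 == e2: return "="          # equal
    if s1 == s2: return "s" if e1 < e2 else "si"  # starts / started-by
    if e1 == e2: return "f" if s1 > s2 else "fi"  # finishes / finished-by
    if s2 < s1 and e1 < e2: return "d"            # i during j
    if s1 < s2 and e2 < e1: return "di"           # j during i
    return "o" if s1 < s2 else "oi"               # overlaps / overlapped-by
```

For example, two consecutive episodes of a pair of tracks stand in the meets relation, exactly as in the Meets(I4, I5) formula above.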
- The Select Video for Display button allows the visualisation of a selected video.

[Footnote 3] For an illustration of these relationships, please refer to [3], which can be found at http://www.comp.leeds.ac.uk/qsr/publications.html.
[Footnote 4] Details of the Viper format are provided at http://viper-toolkit.sourceforge.net; the current system only accepts tracks presented in the Viper format.

Figure 2: A flowchart of the relational representation stage.

Figure 3: A flowchart of the learning and classification stage.

- In the Select a Verb list, the verbs selected in the Set Up session will be shown for users to select a particular verb for visualisation.
- The Play Video(s) button can be used to visualise all the videos of the selected verb.
- Alternatively, the Show QRel Graphs button allows the user to visualise all the qualitative relation graphs of the selected verb.
- The two tick boxes Spatial Relationship and Interaction Graph allow users to specify the types of information displayed for a video. For example, if neither is chosen, only the tracks are shown; if Spatial Relationship is chosen, spatial relationships in a video will also be shown in addition to the tracks, and so forth.
- The two tick boxes Save Images and Images to Videos give users the options to save all the images that have been displayed, as well as to convert those images into a video through ffmpeg.
- The slider is used to adjust the speed of the displaying functions, from slow (left) to fast (right).
- The button Stop is for aborting any displaying functions.

3. Graph Mining

- The button Mine Activity Graph is used to call the process of mining activity graphs from the interaction graph of each video.
However, interestingly, much of this research has focussed on the representation of activities in videos using optical-flow-based representations. The flow-based representation has been found to be particularly well suited to a reasonable vocabulary of verbs that are predominantly action verbs, such as kick and run. Some recent approaches focus on extending the flow-based approaches to model certain simple interaction verbs such as exchange and follow. In this work, the interactions between all the object tracks for an entire video are compactly represented using an activity graph. The activity graph is a three-layered graph which represents the spatio-temporal relations between all pairs of co-temporal object tracks. In short, layer one nodes of the activity graph correspond to object tracks; layer two nodes represent qualitative spatial relations that hold for certain maximal intervals between certain pairs of object tracks that are observed co-temporally; and layer three nodes represent qualitative temporal relations between certain pairs of these maximal intervals. Section 3.2 describes how videos are represented using activity graphs in our framework. While an activity graph exhaustively represents all interactions, i.e. all spatial relations between all pairs of objects in a given video, verbs would generally correspond to a subset of smaller interactions. As we shall illustrate in Section 3.3, interaction sub-graphs of the activity graph naturally capture these smaller interactions.
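The three-layer structure just described can be captured by a small data structure. The field names below are illustrative assumptions, not REDVINE's internal representation:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:                 # layer 2 node
    track_a: int               # layer 1 node: first object track
    track_b: int               # layer 1 node: second object track
    relation: str              # maximal spatial relation, e.g. "DR", "PO"
    interval: tuple            # (start_frame, end_frame)

@dataclass
class ActivityGraph:
    tracks: list = field(default_factory=list)    # layer 1 node ids
    episodes: list = field(default_factory=list)  # layer 2 nodes
    temporal: list = field(default_factory=list)  # layer 3: (ep_i, ep_j, allen_symbol)

    def add_episode(self, ep):
        """Append a layer 2 node and return its index."""
        self.episodes.append(ep)
        return len(self.episodes) - 1
```

An interaction sub-graph is then simply a subset of tracks, a subset of their episodes, and the layer 3 edges among those episodes.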
3.5 Summary and Future Work

One direction for future research is to investigate the role of other qualitative relations in representing activities. Another interesting direction is to model human actions by considering relationships between body parts; these body parts could be obtained using part-based models. A principal contribution of this paper is a general way of addressing problems in video activity understanding using graph-based relational learning. In the future it would be interesting to extend this formalism to other tasks in activity understanding, such as anomaly detection, scene description and gap filling.

References

[1] Hanchuan Peng, Fuhui Long and Chris Ding. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27:1226-1238, 2005.
[2] Muralikrishna Sridhar, Anthony G. Cohn and David C. Hogg. Unsupervised learning of event classes from video. In Proc. AAAI, pages 1631-1638. AAAI Press, 2010.
[3] Muralikrishna Sridhar, Anthony G. Cohn and David C. Hogg. Benchmarking qualitative spatial calculi for video activity analysis. In Proc. IJCAI Workshop on Benchmarks and Applications of Spatial Reasoning, pages 15-20, 2011.
[4] Muralikrishna Sridhar, Anthony G. Cohn and David C. Hogg. From video to RCC8: Exploiting a distance based semantics to stabilise the interpretation of mereotopological relations. In Max J. Egenhofer, Nicholas A. Giudice, Reinhard Moratz and Michael F. Worboys, editors, COSIT, volume 6899 of Lecture Notes in Computer Science, pages 110-125. Springer, 2011.
These sub-graphs are thus readily usable as training samples for supervised training of verb models for each verb class when they are marked with the respective verb labels as part of a manually specified ground truth. Section 3.3 describes the task of learning verb models. Given an unseen video, the task of verb detection can be formulated as the task of finding the most probable covering of the object tracks with interactions that are detected as belonging to certain verb classes. In our framework, this task naturally translates to finding the most probable covering of the corresponding activity graph with interaction sub-graphs, using a learned verb model. We characterize the most probable covering as composed of a few highly probable interaction sub-graphs that are also large in size and are laid out on the activity graph in such a manner that the labels of overlapping graphs tend to reflect co-occurrence statistics from the training set. Each detected interaction sub-graph is mapped back to the video in the form of verb detections, where a verb detection corresponds to an interaction by a subset of objects during certain time intervals, together with their respective verb label(s). Section 3.4 describes how video event detection is formulated as finding an optimal covering of an activity graph.

[Footnote 5] Other properties which hold for intervals of time, such as primitive events or type information, can also be represented as layer 2 nodes.
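The covering preference described above — a few highly probable, large sub-graphs — can be approximated greedily. This sketch scores candidates by likelihood times size and rejects heavy overlaps; REDVINE's actual objective, which also uses label co-occurrence statistics, is richer, so treat this purely as an illustration:

```python
def greedy_cover(candidates, max_overlap=0.5):
    """Greedy approximation of a 'most probable covering'. Each candidate
    is (likelihood, nodes) with nodes a frozenset of covered graph nodes.
    Candidates are visited in decreasing likelihood-times-size order; a
    candidate is kept only if it does not overlap any already chosen
    sub-graph by more than max_overlap of the smaller of the two."""
    chosen = []
    for lik, nodes in sorted(candidates,
                             key=lambda c: c[0] * len(c[1]), reverse=True):
        if all(len(nodes & n) <= max_overlap * min(len(nodes), len(n))
               for _, n in chosen):
            chosen.append((lik, nodes))
    return chosen
```

The score likelihood x size encodes the stated preference for sub-graphs that are both probable and large.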
However, verbs would generally correspond to a subset of smaller interactions, i.e. a subset of spatial relations between a subset of objects. For example, the intervals for the two verbs follow and chase correspond to a smaller interaction involving only a subset of spatial relations, namely Ap, Pu and Po. In our framework, the smaller interactions naturally map to interaction sub-graphs of the activity graph. For example, the smaller interaction given above corresponds to an interaction sub-graph of the activity graph in Fig. 1(b), which contains the corresponding objects, the subset of spatial relations Ap, Pu and Po, and the appropriate temporal relationships, as shown in Fig. 1(c). As our ground truth consists of such smaller interactions, in terms of objects and time intervals with corresponding verb labels, the corresponding interaction sub-graphs with their verb labels readily serve as training examples for supervised training of verb models for each verb class v in V, where V is the set of all verb classes under consideration. In general, we represent an event not just by one interaction graph that represents the interaction in the interval exactly specified by the ground truth, but by a collection of interaction graphs that closely overlap with this graph.

[Footnote] This is because these initial/final relations are somewhat arbitrary in different instances of the same class, depending on when these intervals start/finish.
3.4 Video Event Detection as Finding an Optimal Covering of an Activity Graph

Given an unseen video, the task of verb detection can be formulated as the task of finding the most probable covering of the object tracks with interactions that are detected as belonging to certain verb classes. In our framework, this task naturally translates to finding the most probable covering of the corresponding activity graph with interaction sub-graphs, using a learned verb model. In order to identify the most likely interaction sub-graphs given the event class models, we first exhaustively mine all the interaction sub-graphs of the activity graph. For each of these sub-graphs we measure the likelihood with respect to the learned model for each event class. This results in an initial interpretation of the activity graph in terms of all possible and highly overlapping interaction sub-graphs, each of which has a certain likelihood with respect to each of the event classes. In order to obtain an interpretation with the above properties, we first filter out less likely graphs by applying a threshold, choosing only those graphs whose likelihood with respect to the most likely class for that graph exceeds the threshold. We further filter out those graphs whose probabilities do not decline rapidly beyond the top k probabilities. We perform this filtering operation to remove graphs that do not clearly belong to a small subset of classes.
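The two filtering steps above — a likelihood threshold on the most likely class, and a requirement that probability mass declines rapidly beyond the top k classes — can be sketched as follows. The parameter values and the exact form of the decline test are our assumptions, not REDVINE's documented criteria:

```python
def filter_candidates(sub_graphs, class_probs, threshold=0.6, k=2, decline=0.5):
    """Keep a sub-graph only if (a) its most likely class exceeds a
    likelihood threshold, and (b) the probabilities drop off sharply
    beyond the top-k classes, i.e. the graph clearly belongs to a small
    subset of classes. class_probs maps each sub-graph id to its list of
    per-class likelihoods."""
    kept = []
    for g in sub_graphs:
        probs = sorted(class_probs[g], reverse=True)
        if probs[0] < threshold:
            continue                  # not likely enough for any class
        if len(probs) > k and probs[k] > decline * probs[0]:
            continue                  # mass does not decline after top-k
        kept.append(g)
    return kept
```

Graphs with a flat probability profile across many classes are discarded, since they would only add ambiguity to the covering.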
We also include background interaction sub-graphs, i.e. those interaction sub-graphs that do not overlap any of the positive examples, for all the verb classes.

3.3.1 Spatio-temporal Bag of Graphemes

Given a set of interaction sub-graphs, we build a spatio-temporal bag of graphemes (BoG) in order to learn verb class models for each verb. As we shall see later, this representation is also useful during verb detection. The BoG representation requires the construction of a grapheme vocabulary. We obtain this vocabulary by mining graphemes, each of which tends to have a relatively high dependency with respect to a particular target class. To this end, we first exhaustively mine all interaction sub-graphs that represent interactions up to a maximum number of objects and a maximum number of episodes. Then we apply minimum-redundancy maximum-relevance (MRMR) [1], which searches for a subset of graphemes G that has maximum mutual dependency with respect to the set of classes V and minimum redundancy within the set, as expressed below:

G* = arg max_G [ (1/|G|) sum_{g_i in G} I(g_i; V) - (1/|G|^2) sum_{g_i, g_j in G} I(g_i; g_j) ]

The BoG representation represents an interaction sub-graph in terms of a histogram which records the normalized frequency with which each grapheme from the grapheme vocabulary occurs in the interaction sub-graph. We learn verb models for each verb class by using a one-vs-all learning scheme with an SVM that uses a chi-squared histogram kernel to compute the similarity between a pair of graphs.
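For the SVM similarity, one common form of the chi-squared histogram kernel is the exponential variant below; whether REDVINE uses this exact variant or the additive form is not stated in the text, so treat the choice as an assumption:

```python
import math

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-squared kernel between two (normalized) grapheme
    histograms: K(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)).
    Bins where both histograms are zero contribute nothing."""
    d = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return math.exp(-gamma * d)
```

The kernel is symmetric, equals 1 for identical histograms, and decays towards 0 as the histograms diverge, which makes it a natural drop-in similarity for one-vs-all SVM training over BoG feature vectors.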
2.2 System Architecture

The workflow of REDVINE consists of three major stages, namely: system setup, where the relevant system parameters are initialised, e.g. the directories of input and output; relational representation, where each input video is represented as an activity graph and numerous displaying tools are provided; and learning and classification, where the activity graph of a video is mined for interaction sub-graphs corresponding to particular spatio-temporal regions of interest, which are then represented as histogram feature vectors for classification. Each of these stages is displayed in the following flowcharts: Figure 1, Figure 2 and Figure 3.

2.3 Graphical User Interface

A comprehensive graphical user interface (GUI) is provided by the system, where each component of the pipeline can be called individually. Users need to first start Matlab and point it to the directory of the REDVINE scripts through addpath(genpath('REDVINECode')). The REDVINE GUI can then be launched by typing REDVINE in the Matlab command window, as shown in Figure 4. There are five components of the GUI that correspond to the three major stages of the system: Set Up in the system setup stage; Spatio-Temporal Representation in the relational representation stage; and Graph Mining, Feature Extraction, and Learning and Detection in the learning and classification stage.
2 User Manual

2.1 Prerequisites of the System

REDVINE is implemented in Matlab, and to run the system properly users are required to install the Graphviz and ffmpeg packages. These packages are required for the displaying functions. Once the REDVINE package is unzipped, it should contain the following directories.

[Footnote] If you have any questions about the documentation or software, please contact Dr. Muralikrishna Sridhar (scms@leeds.ac.uk) or Dr. Feng Gu (f.gu@leeds.ac.uk).
[Footnote 1] In this documentation and the current system we assume that the task is to learn verb models. The system could equally be used to learn more general kinds of event models, providing each instance takes place over some one-piece interval and involves interactions of objects.

- Images: this directory stores all the images converted from each video in the dataset through ffmpeg; they are required for the displaying functions of the system.
- Output: this directory stores the output of the system, e.g. images, videos and results; if it is not originally included in the package, users will be able to create it after REDVINE is launched.
- RedVine: this directory consists of all the Matlab scripts of the system, under the directory REDVINECode.
- Tracks: this directory consists of all the Viper files, where both tracks and spatio-temporal ground truth are stored.
The description of each part is given as follows.

1. Set Up

- The buttons Browse for Image Directory, Browse for Tracks Directory and Browse for Output Directory can be used to specify the directories of Images, Tracks and Output. (Details of ViPER, the Video Performance Evaluation Resource, can be found at http://viper-toolkit.sourceforge.net.) Please note that users are allowed to create a new directory for output files through the pop-up window if necessary.

Figure 1: A flowchart of the system setup stage.

- In the list Select Spatial Relation(s), users can select the types of qualitative spatial relationships, e.g. RCC3, QTC6 and QTC6V, included for the computation of activity graphs.
- In the list Select Event Classes, users can select a list of verbs that will be used by the system, e.g. approach, follow, etc.
- The Save button is for initializing the system's default parameters, which should be performed whenever users change any settings.

2. Spatio-Temporal Representation

- The Parse Viper Files button is for extracting tracks and ground truth from all the Viper files supplied in the Tracks directory, and saving them in .mat format for use by the rest of the pipeline.
- The Compute Interaction Graphs button can be used to start the computation process of activity graphs.