(MASTER'S THESIS) A Video Event Ontology for an Automatic Video Interpretation System

INTRODUCTION

Context of the work

For several years, part of the Orion team at Inria Sophia-Antipolis has been developing a platform for the automatic interpretation of video sequences, aimed at recognizing human behaviors such as violence or bank attacks. A simplified version of this platform is shown in Figure 1. The primary goal of the system is to recognize predefined video events in a video stream.

Figure 1: Simplified structure of a video interpretation platform

One of the problems encountered when building a video interpretation platform is how to represent and define the video events of interest.

In the team's platform (VSIP), video event definitions currently rely on a description language that is not easily accessible to non-programmers. To simplify knowledge acquisition, the team is developing ontologies that represent the key concepts needed to define video events. The video event ontology aims to provide a semantic foundation and a consensual conceptual vocabulary for defining complex video events. With this ontology, users can understand the terminology used to describe video events without having to deal with image processing issues.

Additionally, the video event ontology improves communication between domain experts, such as banking security professionals, and developers of automatic video sequence interpretation systems. The ontology is also valuable for sharing and reusing event models.

Finally, this video event ontology is useful for evaluating automatic video sequence interpretation systems and for understanding which types of events these systems are able to recognize.

VSIP: An automatic video interpretation system

We have outlined our interest in developing an automatic video interpretation platform for recognizing human behaviors. This section presents our automatic video interpretation system.

VSIP, the ORION team's automatic video interpretation system, is described in Figure 2. The system takes as input a priori knowledge comprising the video events of interest predefined by experts, together with 3D geometric information and semantic information about the environment. At run time, the video stream acquired by a camera is a further input. The output of the system is the set of recognized video events (e.g. a bank agency attack). The scenario recognition module is described in [Vu et al., 2003].

Our objective is to propose a video event ontology together with a knowledge acquisition tool, in order to make it easier to define video events of interest. Chapters 3 and 4 of this report present, respectively, a video event ontology and a knowledge acquisition tool.

Figure 2: Structure of the VSIP automatic video interpretation system

Scenario Description Language of the ORION team

Video events are represented in the Scenario Description Language developed by the ORION team.

There are two main types of concepts: the physical objects observed in the scene and the video events occurring in the scene.

• A physical object can be a static object (e.g. a desk) or a mobile object (e.g. a person, a car).

• A video event can be a primitive state, a composite state, a primitive event, or a composite event. Primitive states are the basic building blocks from which the complex events used by the ORION team's automatic video sequence interpretation platform are built.

An event can be viewed at various spatial and temporal granularities. For instance, a man running in a marathon can be perceived either as a state (he is running) or as a composite event. The chosen granularity reflects the properties of interest to the user.

The syntax for defining a state or an event consists of three parts:

• PhysicalObjects: declares the physical objects involved in the concept. Only the objects that can change between occurrences of the concept need to be declared.

• Components: declares all the components that make up the video event.

• Constraints: declares all the relations between the sub-concepts (physical objects and components) of the concept.

The example below illustrates the representation of the Inside_zone state in the ORION Scenario Description Language:

As another example, the Changes_zone event is defined using Inside_zone states as its components:

((inside_z1 : PrimitiveState inside_zone(p, z1)) (inside_z2 : PrimitiveState inside_zone(p, z2))
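The three-part structure described above (PhysicalObjects, Components, Constraints) can be mirrored in a small Python data model. This is only a hypothetical sketch and not the ORION syntax: the class names `Component` and `Concept`, the `Person` and `Zone` type labels, and the `before` constraint between the two components are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A named sub-state or sub-event that reuses an existing concept."""
    name: str       # e.g. "inside_z1"
    kind: str       # e.g. "PrimitiveState"
    concept: str    # name of the reused concept, e.g. "inside_zone"
    args: tuple     # variables passed to the reused concept


@dataclass
class Concept:
    """A state or event with the three parts of the description language."""
    name: str
    kind: str
    physical_objects: dict                            # variable -> type label (assumed labels)
    components: list = field(default_factory=list)
    constraints: list = field(default_factory=list)   # (left, relation, right) triples

    def undeclared_variables(self):
        """Return component arguments that were not declared as physical objects."""
        used = {arg for comp in self.components for arg in comp.args}
        return sorted(used - set(self.physical_objects))


changes_zone = Concept(
    name="changes_zone",
    kind="PrimitiveEvent",
    physical_objects={"p": "Person", "z1": "Zone", "z2": "Zone"},
    components=[
        Component("inside_z1", "PrimitiveState", "inside_zone", ("p", "z1")),
        Component("inside_z2", "PrimitiveState", "inside_zone", ("p", "z2")),
    ],
    # Assumed constraint: the person is inside z1 before being inside z2.
    constraints=[("inside_z1", "before", "inside_z2")],
)

# An empty list means every variable used by a component has been declared.
print(changes_zone.undeclared_variables())
```

Keeping declarations, components, and constraints in separate fields makes simple well-formedness checks, such as the one above, straightforward to write.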

Objectives

One limitation of the ontology currently proposed by the Orion team [Bremond et al., 2004] is that certain concepts, such as spatial relationships between physical objects, lack formal definitions. In addition, the scenario description language used by the ORION team is not a standardized knowledge representation language. Our primary goal is to facilitate the sharing of the video event ontology. The main objective of this internship is therefore to represent this ontology in a standardized language.

The aim of this DEPA internship is to improve the Orion team's work on the video event ontology:

• First, the goal is to extend the existing video event ontology and to implement it in a standard language.

• Second, the goal is to develop a graphical knowledge acquisition tool based on the video event ontology, enabling experts to understand, control, and manipulate the events of interest in their own application domains.

Contents of the report

This report is structured as follows. Chapter 2 provides an overview of the ontology field and addresses the problem of representing and defining video events. Chapter 3 proposes a video event ontology. Chapter 4 introduces a graphical knowledge acquisition tool designed to help domain experts define events of interest. Finally, Chapter 5 concludes the report and outlines future work.

STATE OF THE ART

Ontology

In the past decade, ontological engineering has emerged as a highly popular research topic, significantly impacting various fields such as intelligent web search, knowledge sharing, management, and reuse.

In computer science, an ontology aims to represent existing knowledge formally. This formal representation relies on a process called conceptualization, which consists of identifying a set of concepts and the relationships that connect them. An explicit specification of this conceptualization is called an ontology [Gruber, 1993]. Formally, an ontology consists of terms, their definitions, relations, and axioms [Gruber, 1993].

According to [Roche, 2003], ontological engineering relies on a multidisciplinary approach:

• linguistics, since we use words to communicate;

• theory of knowledge, since words refer to the knowledge that defines their meaning;

• logic, in order to guarantee consistency;

• knowledge representation, in order to address the problem of understanding a domain.

Developing ontologies is essential for facilitating communication and fostering a shared understanding of a specific domain. This matters both for people and for software agents, since it supports collaboration and knowledge sharing.

Another significant motivation for research on ontologies is knowledge reuse. For instance, in video event representation there are several ways to model the concept of time, including time intervals, time points, and relative measures of time. Once a group of researchers has developed such an ontology, others can reuse it in different applications.

Finally, ontological analysis clarifies the structure of knowledge [Chandrasekaran et al.]. Ontologies offer more than a specialized vocabulary for a field: they provide conceptualizations of the domain's terms. In addition, the formal analysis of these terms is crucial for reusing and extending existing ontologies.

2.1.3 The ontology creation process

The development of an ontology should be approached as a project, so project management methods apply [Gandon, 2002]. An ontology life cycle is proposed in [Lopez et al., 2000]; it identifies three main types of activities (Figure 3).

Figure 3: The life cycle of an ontology

Construction is the central part of the cycle, with management and support activities contributing throughout. Construction consists of four steps: specification, conceptualization, formalization, and implementation.

Building an ontology begins with the definition of its domain and scope. That is, one must answer questions such as [Noy and McGuinness, 2001]:

• What domain will the ontology cover?

• For what purposes will the ontology be used?

• What types of questions should the information in the ontology answer?

• Who will use and maintain the ontology?

The conceptualization phase aims to structure the knowledge of a domain. It consists of the following steps:

• Identify concepts, attributes, and values in a Glossary

• Classify groups of concepts in Concept Trees

• Build a Table of Relations

• Build a Table of Instance Attributes

• Define constants in a Table of Constants

• Build a Table of Formulas

Once the conceptual model has been structured, it must be expressed in a formalism, which may be more or less formal. Formalization makes the definitions of concepts more explicit and precise, which makes the ontology easier to interpret. [Uschold and Jasper, 1999] proposed four levels of ontology formality, ranging from highly informal to rigorously formal. The choice of level depends on the needs at hand and on the implementation language of the ontology. For instance, if the ontology serves as a framework for communication among people, an informal representation may suffice. If the ontology is intended for use by software tools, a more formal representation is required.

Among the many ontology formalization languages, there are three main families: frame-based languages, conceptual graph models, and description logics.

In frame-based languages, frames represent categories of objects and carry attributes, known as slots, which can take various values. Each instance has a unique identifier and its own attribute values, and is connected to its class by an "is-a" relation. Classes are organized in a hierarchy defined by the "a-kind-of" link. F-logic [Kifer et al., 1995] is a well-known formalism that combines frame-based languages, object-oriented formalisms, and first-order logic.
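The frame model described above can be illustrated with a short Python sketch. It is not F-logic or any particular frame language; the class names `Frame` and `Instance`, the slot names, and the example frames are assumptions chosen for illustration. Slots missing on an instance are looked up on its class through "is-a", then up the "a-kind-of" chain.

```python
class Frame:
    """A class frame: slots with default values and an optional 'a-kind-of' parent."""

    def __init__(self, name, a_kind_of=None, **slots):
        self.name = name
        self.a_kind_of = a_kind_of
        self.slots = slots

    def lookup(self, slot):
        """Search this frame, then its ancestors, for a slot value."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.a_kind_of
        raise KeyError(slot)


class Instance:
    """An individual with a unique identifier, linked to its class by 'is-a'."""

    def __init__(self, identifier, is_a, **slots):
        self.identifier = identifier
        self.is_a = is_a
        self.slots = slots

    def get(self, slot):
        """Prefer the instance's own value, else inherit from the class hierarchy."""
        if slot in self.slots:
            return self.slots[slot]
        return self.is_a.lookup(slot)


# Hypothetical hierarchy: Person a-kind-of MobileObject a-kind-of PhysicalObject.
physical_object = Frame("PhysicalObject", movable=False)
mobile_object = Frame("MobileObject", a_kind_of=physical_object, movable=True)
person = Frame("Person", a_kind_of=mobile_object, legs=2)

p1 = Instance("person_1", is_a=person, name="Alice")
print(p1.get("name"), p1.get("legs"), p1.get("movable"))
```

The nearest definition wins during lookup, so `MobileObject` overrides the `movable` default inherited from `PhysicalObject`.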

In the framework of conceptual graphs [Chein and Mugnier, 1992], two levels are distinguished. At the conceptual level, the model can serve as the foundation of a specialized communication language among experts from various disciplines engaged in collaborative cognitive work. At the execution level, it can serve as a common representation tool shared by several modules of a complex system.

Description logics build on predicate logic, semantic networks, and frame-based languages. In description logics, knowledge is represented through concepts, roles, and individuals. A concept is a general entity of a given application domain, while roles denote binary relationships between concepts. Individuals are instances of concepts, and the properties of concepts, roles, and individuals are expressed in predicate logic.

Finally, the ontology must be implemented in a specific language that matches the formalization model. A brief introduction to ontology languages is given in the following section.

2.1.4 Overview of ontology languages

Various languages have been used to implement ontologies, most of them based on XML and description logics. This section gives a brief overview of the most popular ones: XML, XML Schema, RDF, RDF Schema, and OWL.

XML (eXtensible Markup Language), recommended by the World Wide Web Consortium (W3C), is a specification intended to make documents machine-readable.

Figure 4: Example of an XML representation

XML provides only a syntactic structure for documents [Klein, 2004]. It does not allow a semantic interpretation of the data.

XML Schema makes it possible to define the tags, and the permitted arrangement of those tags, that determine whether an XML document is valid. XML Schema is recommended by the W3C.

While XML offers a syntax for encoding data, RDF (Resource Description Framework) serves as a foundation for processing it. As its name suggests, RDF is not a language but a model designed to represent information about resources on the Internet.

Such data about data is called metadata. In RDF vocabulary, the things being described are resources [Klein, 2004].

RDF is based on the notion of URI (Uniform Resource Identifier) and describes resources in terms of properties and property values.

Figure 5: Example of an RDF representation

The basic RDF data model comprises four types of objects:

Resources are all the things described by RDF expressions. A resource can range from an entire web page to a single element of an XML document.

A property is a specific aspect, characteristic, attribute, or relation used to describe a resource.

A literal identifies a value such as a number or a date by means of a lexical representation.

A statement consists of a specific resource, a property, and the value of that property for that resource.

Each statement is represented by a triple consisting of a subject (a resource), a predicate (a property), and an object (either a resource or a literal). A set of such triples forms an RDF graph. Figure 6 shows the previous example as a graph of triples.
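The triple model can be sketched in a few lines of plain Python. This is an illustration only, not an RDF library: the `Triple` tuple, the `objects` helper, and the author URI are assumptions, and the property names and literal values are illustrative.

```python
from collections import namedtuple

# A statement: subject (resource), predicate (property), object (resource or literal).
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

PAGE = "http://www.test.org/"
AUTHOR = "http://www.test.org/people/son"   # hypothetical URI for the author resource

# An RDF graph is simply a set of triples.
graph = {
    Triple(PAGE, "created_by", AUTHOR),     # object is a resource
    Triple(AUTHOR, "name", "Son"),          # object is a literal
    Triple(AUTHOR, "phone_number", "7634"), # object is a literal
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return sorted(t.obj for t in graph if t.subject == subject and t.predicate == predicate)


# Follow two edges of the graph: page -> author -> name.
for author in objects(graph, PAGE, "created_by"):
    print(objects(graph, author, "name"))
```

Because the graph is a set, adding the same statement twice has no effect, which matches the idea that an RDF graph is a collection of distinct statements.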

Like XML, an RDF model does not by itself define the semantics of a specific application domain. It merely offers an intermediary mechanism for describing metadata. To establish the specific properties of a domain and their semantics, additional elements must be used.

RDF Schema provides a type system for RDF. It offers a mechanism for defining the properties associated with classes, together with their ranges and domains.

Figure 6: Representation of a graph of triples

The primitives for building a schema for a specific domain are:

• Class definitions (rdfs:Class) and rdfs:subClassOf statements, which allow class hierarchies to be defined

• Property definitions and rdfs:subPropertyOf statements, for creating property hierarchies

• Domain and range statements, to restrict the possible combinations of properties and classes

• Type statements, to declare a resource as an instance of a specific class
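The four kinds of schema statements listed above can be mimicked with plain dictionaries. The sketch below is hypothetical: the class and property names are invented for illustration, and it treats domain and range as validity checks, as the text describes them, whereas RDFS reasoners use domain and range statements to infer types rather than to reject statements.

```python
# child -> parent, standing in for rdfs:subClassOf statements (hypothetical classes)
SUBCLASS_OF = {
    "Person": "MobileObject",
    "MobileObject": "PhysicalObject",
    "Zone": "ContextualObject",
    "ContextualObject": "PhysicalObject",
}

# resource -> class, standing in for rdf:type statements
TYPE_OF = {"p1": "Person", "z1": "Zone"}

# property -> (domain, range), standing in for rdfs:domain and rdfs:range statements
PROPERTIES = {"inside": ("MobileObject", "Zone")}


def is_subclass(cls, ancestor):
    """True if cls equals ancestor or reaches it by following subClassOf links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def statement_is_valid(subject, prop, obj):
    """Check a (subject, prop, obj) statement against the property's domain and range."""
    domain, range_ = PROPERTIES[prop]
    return is_subclass(TYPE_OF[subject], domain) and is_subclass(TYPE_OF[obj], range_)


print(statement_is_valid("p1", "inside", "z1"))
print(statement_is_valid("z1", "inside", "p1"))
```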


A language based on description logics and XML: OWL

Representation of video events

2.2.1 Review of the literature on video event representation

In the field of computer vision, extensive research has focused on defining and representing video events. For instance, Nagel (1998) introduces a hierarchy of motion verbs with up to nine levels of abstraction, including "change", "event", "verb phrase", "phase", and "history". He also recommends using transition diagrams at each abstraction level to describe the sequencing of verbs. Additionally, Nevatia et al. (2003) propose a hierarchical decomposition of events, using the terms "single thread" and "multiple thread".

Research on video event ontologies has been limited. A significant effort began in 2003 with three workshops held under the ARDA program, aimed at developing a video event ontology for security applications. Researchers from the computer vision community and U.S. government agencies worked together on a structured ontology intended to support the development of security-specific ontologies. The computer vision experts proposed six ontologies tailored to particular security applications, including railroad crossing surveillance. The ORION team contributed to these workshops with two ontologies, for bank and subway surveillance. These ontologies were defined incrementally through interactions with end users, such as subway and bank surveillance operators.

Following the ARDA program workshops, the authors of [Nevatia et al., 2004] introduced a formal language for representing video events, along with specific ontologies for the security domain. This formal language, known as VERL, is described in detail in the next section.

(*) http://www.w3.org/TR/owl-ref/

(**) http://www.ic-arda.org/

2.2.2 VERL: A video event representation language

VERL (Video Event Representation Language), introduced in [Nevatia et al., 2004], is a formal language designed to describe an ontology of events. VERL expressions can be translated into first-order logic, which gives them clear semantics.

Objects, states, events, and types

The key concepts in VERL are objects, states, and events. Objects have properties or attributes, which are predicates of one argument, and they are related to other objects by relations, which are predicates of two or more arguments. Properties, attributes, and relations can all be regarded as states, while an event is a change of state of an object.

There are three basic types in the language. Everything is a thing: a thing can be a state, an event, or an entity such as a physical object. There are two types of things. The type ent comprises entities, which can generally be regarded as physical objects.

The type ev comprises states and events. Normally, a person would be of type ent, and his or her action of running would be of type ev.

Constants can be of any predefined type. A constant or a variable is a VERL expression (vexpr):

vexpr -> constant | variable

A function symbol applied to the appropriate number of vexprs as arguments is a vexpr:

vexpr -> fct "(" [ vexpr { "," vexpr }* ] ")"

A predicate symbol applied to the appropriate number of arguments is a vexpr:

vexpr -> pred "(" [ vexpr { "," vexpr }* ] ")"

Note that the arguments must be of the correct type and that the result is always of type ev.

A logical operator applied to the appropriate number of vexprs of type ev is a vexpr:

vexpr -> "AND" "(" vexpr { "," vexpr }* ")" | "OR" "(" vexpr { "," vexpr }* ")" | "NOT" "(" vexpr ")" | "EQUIV" "(" vexpr "," vexpr ")"

The core operator for defining composite events is PROCESS, which takes a predicate and a vexpr as its two arguments. The predicate is applied to the appropriate number of arguments, each of which may carry an optional type specification. The syntax is: "PROCESS" "(" pred "(" [ argspec { "," argspec }* ] ")" [ "," vexpr ] ")", where argspec is an optional type followed by a variable.

For instance, given the predicate "located-at", which relates a thing to an entity, and the predicate "change", which relates two arguments of type ev, the predicate "move" can be defined as follows:

PROCESS(move(thing x, ent y, ent z), change(located-at(x,y),located-at(x,z)))
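To make the grammar concrete, the expression above can be built as a small tree and printed back as text. This is a minimal sketch under stated assumptions: the classes `Var` and `Pred` and the function `process` are hypothetical names, not part of any VERL tool, and only the fragment of the grammar needed for this example is covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A variable with an optional type specification (an argspec in the grammar)."""
    name: str
    vtype: str = ""

    def render(self, with_type=False):
        if with_type and self.vtype:
            return f"{self.vtype} {self.name}"
        return self.name


@dataclass(frozen=True)
class Pred:
    """A predicate symbol applied to arguments; arguments are Var or nested Pred."""
    name: str
    args: tuple

    def render(self, with_type=False):
        sep = ", " if with_type else ","
        parts = [a.render(with_type) if isinstance(a, Var) else a.render() for a in self.args]
        return f"{self.name}({sep.join(parts)})"


def process(head, body):
    """Render PROCESS(head, body); only the head carries type specifications."""
    return f"PROCESS({head.render(with_type=True)}, {body.render()})"


x, y, z = Var("x", "thing"), Var("y", "ent"), Var("z", "ent")
move = Pred("move", (x, y, z))
change = Pred("change", (Pred("located-at", (x, y)), Pred("located-at", (x, z))))

print(process(move, change))
```

Representing expressions as a tree rather than as strings keeps the typed head and the untyped body separate, which is what a later translation step into another formalism would need.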

In addition to annotations of specific events and definitions of properties, relations, and composite events, inference rules can be specified with the RULE operator, which takes two vexprs of type ev as its arguments:

A rule is an implication: the first vexpr implies the second.

For example, suppose we define carry(x, y, a, b, t), meaning that x carries y from a to b during the time interval t, as x holding y during t and x moving from a to b during t:

Then, to state that when x carries y from a to b during t, y also moves from a to b during t, we can write:

Conclusion

The ontology creation process should be treated as a project, with project management methods applied throughout its four key stages: specification, conceptualization, formalization, and implementation. There are three main families of ontology formalization languages: frame-based languages, conceptual graph models, and description logics. Many languages have been used to implement ontologies, most of them based on XML and description logics, and OWL is the main representative of this family. A W3C recommendation since February 2004, OWL builds on XML and description logics, which keeps it compatible with existing web standards while providing the semantics of description logics.

Research on the structure of video event ontologies focuses mainly on breaking complex spatio-temporal events down into simpler components. This decomposition leads to hierarchical structures for representing events. However, the resulting structures can be highly complex. Implementing video event ontologies in a standard knowledge representation framework, such as OWL or SWRL, remains an open research issue.

VIDEO EVENT ONTOLOGY

3.1 Overview of the methodology

This chapter presents a video event ontology, focusing on the representation of the physical objects and video events in the observed scene. Our goal is to model this ontology in the standardized language OWL. The construction of our ontology follows these steps:

• Conceptualization: this step structures the knowledge of the domain by identifying the physical objects, the events, their properties, and the relations between them.

• Formalization in VERL: formalization makes the definitions of concepts more explicit and precise. We chose VERL for the formalization step.

• Implementation in OWL: unlike the authors of VERL, who used XML to represent their ontologies, in this step we translate VERL expressions into OWL.

The steps described above are detailed in the following sections.

In a bank robbery scenario, the assailant enters the bank and approaches the counter where an employee is present. He then forces the employee to move towards the vault. This example involves physical objects such as the assailant, the employee, and the vault, and events such as the assailant entering the bank and approaching the counter. It also involves spatial relationships, such as "in front of" and "behind", and temporal relationships, such as "before" and "after".

The most abstract spatial concept is the physical object, which encompasses all real-world items within a monitored scene. Based on the ability to predict their movement, physical objects are classified into two categories: mobile objects and contextual objects.

A mobile object is one whose movement cannot be predicted. Typically, a mobile object initiates its own movement. Common examples of mobile objects include individuals, groups of people, animals, and robots.

A contextual object is one whose movement can be anticipated using prior information. Typically, a contextual object cannot autonomously change its position within a scene; however, it can be moved by another object. Common examples of contextual objects include walls, entry areas, doors, and chairs.

Physical objects can be considered at various levels of granularity. At a coarse level, a group of people can be viewed as one mobile object. At a finer level, an individual person is a mobile object. At an even finer level, a person can be regarded as a complex entity capable of performing simultaneous actions with different parts of the body, each body part acting as a mobile object. Consequently, a physical object can give rise to several new objects or merge with others to form a single object. The hierarchy of physical objects is illustrated in Figure 7.

Spatial relationships define the connections between physical objects involved in a given event. There are two key types of spatial relationships: topological relationships and distance relationships.

Topological and distance relations describe the connections between physical objects, such as the proximity of a person to an ATM. Distance relations can be quantified; for instance, the distance between two individuals can be stated as 5 meters. Likewise, object A is considered close to object B if the distance between them is less than 1 meter. The hierarchy of spatial relationships is illustrated in Figure 8.
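A quantified distance relation such as "close to" reduces to a threshold test. The sketch below assumes that objects are represented by (x, y) positions on the ground plane, in meters; that representation and the function names are assumptions, while the 1 meter threshold is the one given in the text.

```python
import math

CLOSE_THRESHOLD_M = 1.0   # from the text: A is close to B if they are less than 1 m apart


def distance(a, b):
    """Euclidean distance between two (x, y) ground-plane positions, in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def close_to(a, b, threshold=CLOSE_THRESHOLD_M):
    """Qualitative 'close to' relation derived from the quantitative distance."""
    return distance(a, b) < threshold


person = (2.0, 3.0)   # hypothetical positions
atm = (2.5, 3.4)
print(distance(person, atm), close_to(person, atm))
```

Keeping the threshold as a parameter lets the same relation be tuned per application, for example a larger value for vehicles than for people.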

Figure 8: Example of spatial relations

The movements and interactions of mobile objects in a scene are characterized by states and events, both primitive and composite. These concepts are defined below.

A state is a spatio-temporal property that is valid at a given instant or stable over a time interval. Physical objects have properties or attributes, and they are related to other objects. These properties, attributes, and relations can all be understood as states; for example, a person can be inside or outside a room.

A primitive state is a spatio-temporal property that is valid and stable over a time interval.

A composite state is a combination of states. We call components all the sub-states of a state, and constraints all the relations involving these components.

An event is one or more changes of state at two successive instants or over a time interval (e.g. enters, exits).

A primitive event is a change of state.

A composite event is a combination of states and/or events.

An event can be viewed at various spatial and temporal granularities. For instance, a man running in a marathon can be perceived as a state (he is running) or as a composite event. The chosen granularity reflects the properties of interest to the user.

Temporal relations are used to define events, incorporating Allen's interval algebra operators and quantitative relationships concerning the duration, start, and end of events.

The DAML-Time ontology is a comprehensive specification for expressing temporal semantics. Hobbs (2002) established axioms for the topological relationships between moments and intervals, as well as for various temporal relations.

Moments are intuitively defined as points in time without duration, while intervals represent periods with measurable extent This distinction means that instantaneous events, such as a car accident, occur at a specific moment, whereas interval events, like a meeting from 2 PM to 3 PM, have a defined duration.

The temporal relations of interest for representing video events are: before, after, during, and meets.
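These four relations come from Allen's interval algebra and can be written as simple comparisons on interval endpoints. The sketch assumes intervals are (start, end) pairs with start < end, for example frame numbers; the `Interval` type is a hypothetical helper.

```python
from collections import namedtuple

Interval = namedtuple("Interval", ["start", "end"])   # assumes start < end


def before(a, b):
    """a ends strictly before b starts."""
    return a.end < b.start


def after(a, b):
    """a starts strictly after b ends (the inverse of before)."""
    return before(b, a)


def during(a, b):
    """a lies strictly inside b."""
    return b.start < a.start and a.end < b.end


def meets(a, b):
    """a ends exactly when b starts."""
    return a.end == b.start


enters = Interval(10, 20)      # hypothetical event intervals, in frames
at_counter = Interval(20, 60)
at_safe = Interval(80, 150)

print(meets(enters, at_counter), before(enters, at_safe), during(at_safe, Interval(60, 200)))
```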

3.3 Representation of events in VERL

As explained in section 2.1.3, formalization makes concept definitions clearer and more precise. This section presents the representation of events in VERL.

There are three basic types in VERL. Everything is a thing. There are two types of things.

The type ent comprises entities, which can generally be regarded as physical objects. The type ev comprises states and events.

Sub-types of ent and ev are defined with VERL's SUBTYPE operator.

It takes two arguments, the name of the sub-type and the name of the super-type:

A KNOWLEDGE ACQUISITION TOOL

In Chapter 3, we presented a video event ontology. This chapter introduces a graphical knowledge acquisition tool designed to help experts define events of interest. The tool relies on the ontology of Chapter 3 to support the knowledge acquisition process.

The event ontology is used by the tool and provides a vocabulary that guides the expert in describing the video events of his or her domain. The other components of the tool are:

We use the Jena API to manage OWL ontologies, as it provides a basic interface for handling RDF, RDFS, and OWL files. Developed by HP's research lab in Bristol, Jena is open-source software that offers a comprehensive set of parsers and query engines implemented as Java classes.

• The events of interest, represented in the ORION team's Scenario Description Language, which are the outputs of the knowledge acquisition process

The physical objects and the relations are predefined and represented as trees (Figure 9). Through the graphical interface, the user can define specific events intuitively, without having to worry about the syntax of the description language.

Figure 9: The tree of physical objects (a) and the tree of spatial and temporal relations (b)

Figure 10: The main window of the tool

The tool simplifies the description of events by letting users build complex events from previously defined simpler ones. Defining a video event involves the following steps:

• Definition of the physical objects in the scene

• Definition of the components, that is, the sub-events

• Representation of the constraints between the physical objects and the events

The next section gives a concrete example of the process of defining the complex bank agency attack event (Figure 10).

4.3 Case study: representing the bank agency attack event

Representing the bank agency attack event begins with the definition of the physical objects of interest in the scene (Figure 11).

Figure 11: Definition of a physical object

Next, components (sub-states and sub-events) are added, built from the physical objects and from previously defined events or states (Figure 12).

To simplify the definition of constraints in events, we identify various types of constraints For instance, some constraints are represented as binary relationships between two concepts, such as event A occurring before event B Additionally, other constraints can be articulated as functional constraints, such as specifying that the number of occurrences of an event is two.

We used the tool to define 20 primitive states/events and 18 composite events for bank agency security. These events were validated in the VSIP system (section 1.2) as inputs to the event recognition module. Figure 14 illustrates the detection of the bank agency attack event.

Figure 14: Detection of the bank agency attack event using scenarios defined with the graphical tool

The complex event generated by the tool is given in Figure 15. The knowledge acquisition tool spares the user from having to master a complex computer language.

(com_at_pos : PrimitiveState inside_zone(com, "Back_Branch"))

(agr_enters : PrimitiveEvent changes_zone(agr, "Infront_Branch", "Entrance_zone"))

(com_at_safe : PrimitiveState inside_zone(com, "Safe"))

(agr_at_safe : PrimitiveState inside_zone(agr, "Safe"))

(com_at_pos before com_at_safe)

(agr_enters before agr_at_safe)

(agr_at_safe during com_at_safe)

Figure 15: The bank agency attack video event obtained with the knowledge acquisition tool
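To make the temporal constraints of this event concrete, the following Python sketch checks them against a set of time intervals. It is a minimal illustration, not the VSIP recognition algorithm: it assumes that each component has already been recognized over one interval (start, end) in seconds, and that `before` and `during` follow the strict interval relations of [Allen, 1984]. The names `Interval`, `RELATIONS`, `BANK_ATTACK_CONSTRAINTS` and `satisfied` are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds, with start <= end


def before(a: Interval, b: Interval) -> bool:
    """Allen's 'before': a ends strictly before b starts."""
    return a[1] < b[0]


def during(a: Interval, b: Interval) -> bool:
    """Allen's 'during': a lies strictly inside b."""
    return b[0] < a[0] and a[1] < b[1]


RELATIONS = {"before": before, "during": during}

# The three constraints of the event listed above.
BANK_ATTACK_CONSTRAINTS: List[Tuple[str, str, str]] = [
    ("com_at_pos", "before", "com_at_safe"),
    ("agr_enters", "before", "agr_at_safe"),
    ("agr_at_safe", "during", "com_at_safe"),
]


def satisfied(intervals: Dict[str, Interval],
              constraints: List[Tuple[str, str, str]]) -> bool:
    """True if every (left, relation, right) constraint holds on the intervals."""
    return all(
        RELATIONS[rel](intervals[left], intervals[right])
        for left, rel, right in constraints
    )


# Made-up recognition intervals, for illustration only.
observed = {
    "com_at_pos": (0.0, 10.0),
    "agr_enters": (5.0, 6.0),
    "com_at_safe": (20.0, 60.0),
    "agr_at_safe": (25.0, 50.0),
}
attack_detected = satisfied(observed, BANK_ATTACK_CONSTRAINTS)
```

With these made-up intervals all three constraints hold. If the aggressor reached the safe before the employee did, the `during` constraint would fail and the event would not be reported.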

CONCLUSION & PERSPECTIVES

We have introduced a method for defining video event ontologies that can be used by an automatic video sequence interpretation platform. A significant challenge in constructing such an ontology is the absence of a standard platform. To address this issue, we formalized the ontologies using the Video Event Representation Language (VERL). Instead of using XML to represent the ontology, as originally proposed by the authors of VERL, we converted VERL expressions into OWL Full. This transformation gives the ontology clear semantics, which makes it easier to share and reuse across communities.

We have also developed a graphical knowledge acquisition tool that uses this ontology to help experts define events of interest in a specific application domain. With the video event ontology, an expert can create, manage, and manipulate complex events relevant to their field. The tool has been used to define video events for bank agency surveillance.

First, we want to use this tool to define video events in other applications (security in metro stations or in an airport).

Second, we are interested in using reasoning techniques based on the event ontology:

• Reasoning makes it possible to check the consistency of the acquired knowledge

• In addition, ontology-based reasoning can be useful for discovering implicit relations

Using OWL Full complicates the integration of the inference rules available in the ontology community. However, we can still use the part of the ontology that is expressible in OWL DL to perform certain reasoning operations.

One way to improve user interaction with the tool is to integrate it with an event recognition module. For instance, when a user intends to create a new event, they can activate the recognition module, which returns the events it detects. This helps users define new scenarios more efficiently.

We are also considering integrating our knowledge acquisition tool with another tool from our team that generates 3D animations from scenario models [Bannour et al., 2004]. This feature would help experts validate the definitions of new events.

[Allen, 1984] James F. Allen. Towards a general theory of action and time. Artificial Intelligence, 23:123-

[Bannour et al., 2004] Jihene Bannour, Benoit Georis, Francois Bremond, Monique Thonnat. Generation of 3D animations from scenario models. 4th IASTED International Conference on Visualization, Imaging, and Image, Marbella, Spain, 2004.

[Bremond et al., 2004] Francois Bremond, Nicolas Maillot, Monique Thonnat, Van-Thinh Vu. Ontologies for Video Events. Technical Report No. 5189, INRIA Sophia-Antipolis, France, 2004.

[Chandrasekaran, et al., 1999] B. Chandrasekaran, et al. What Are Ontologies, and Why Do We Need Them? IEEE Intelligent Systems and Their Applications, vol. 14, no. 1, pp. 20-26, 1999.

[Chein et Mugnier, 1992] M. Chein, M. L. Mugnier. Conceptual Graphs: Fundamental Notions. Revue d’Intelligence Artificielle, vol. 6, no. 4, pp. 365-406, 1992.

[Decker et al., 2000] Stefan Decker, Prasenjit Mitra, Sergey Melnik. Framework for the Semantic Web: An RDF Tutorial. IEEE Internet Computing, Volume 4, Issue 6, pages 68-73 (November

[Gandon, 2002] Fabien Gandon. Ontology Engineering: a survey and a return on experience. Research Report of INRIA, RR4396, France, 2002.

[Grube, 1993] T. R. Gruber. A translation approach to portable ontologies. Knowledge Acquisition, 5,

[Heflin, 2004] Jeff Heflin. OWL Web Ontology Language Use Cases and Requirements. http://www.w3.org/TR/webont-req/

[Hobbs, 2002] J. R. Hobbs. A DAML Ontology of Time. http://www.cs.rochester.edu/~ferguson/daml/daml-time-nov2002.txt

[Horrocks et al., 2000] I. Horrocks, D. Fensel, J. Broekstra, S. Decker, M. Erdmann, C. Goble, F. van Harmelen, M. Klein, S. Staab, R. Studer, and E. Motta. OIL: The Ontology Inference Layer. Technical Report IR-479, Vrije Universiteit Amsterdam, Faculty of Sciences. http://www.ontoknowledge.org/oil/

[Kifer et al., 1995] M. Kifer, G. Lausen, and J. Wu. Logical Foundations of Object-Oriented and Frame-Based Languages. Journal of the ACM, 1995.

[Klein, 2004] Michel Klein. Change Management for Distributed Ontologies. PhD thesis, Vrije Universiteit Amsterdam. http://www.cs.vu.nl/~mcaklein/thesis/

[Lopez et al., 2000] M. Lopez, A. Gomez-Perez, M. Rojas-Amaya. Ontology's crossed life cycle. The 12th International Conference on Knowledge Engineering and Knowledge Management, EKAW-

[Mezaris et al., 2004] V. Mezaris, I. Kompatsiaris, and M. G. Strintzis. A knowledge-based approach to domain-specific compressed video analysis. IEEE International Conference on Image Processing (ICIP

[McGuinness et Harmelen, 2004] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language Overview. http://www.w3.org/TR/owl-features/, 2004.

[Moller et al., 1999] R. Moller, B. Neumann, et al. Towards computer vision with description logics: some recent progress. Integration of Speech and Image Understanding, 1999.

[Nagel, 1988] H. H. Nagel. From image sequences towards conceptual descriptions. Image and Vision

[Nevatia et al., 2003] Ram Nevatia, Tao Zhao, Somboon Hongeng. Hierarchical Language-based Representation of Events in Video Streams. Conference on Computer Vision and Pattern Recognition

[Nevatia et al., 2004] R. Nevatia, J. Hobbs, B. Bolles. An Ontology for Video Event Representation. IEEE Workshop on Event Detection and Recognition, June 2004.

[Noy et Hafner, 1997] Natalya Fridman Noy and Carole D. Hafner. The State of the Art in Ontology Design: A Survey and Comparative Review. AI Magazine, 18(3):53-74, Fall 1997.

[Noy et McGuinness, 2001] N. Fridman Noy and D. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology. 2001. http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html

[Nardi et Brachman, 2002] D. Nardi, R. J. Brachman. An Introduction to Description Logics. Description Logic Handbook, Cambridge University Press, 2002, pages 5-44.

[Pan et Hobbs, 2004] Feng Pan and Jerry R. Hobbs. Time in OWL-S. 2004 AAAI Spring Symposium

[Roche, 2003] C. Roche. Ontology: a Survey. The 8th Symposium on Automated Systems Based on Human Skill and Knowledge, IFAC, September 22-24, 2003, Goteborg, Sweden.

[Uschold et Jasper, 1999] Mike Uschold and Robert Jasper. A Framework for Understanding and Classifying Ontology Applications. KRR5-99, Stockholm, Sweden, 1999.

[Vu et al., 2003] Van-Thinh Vu, Francois Bremond, Monique Thonnat. Automatic video interpretation: a novel algorithm for temporal scenario recognition. 18th International Joint Conference on Artificial Intelligence (IJCAI'03), Acapulco, Mexico, 2003.

ANNEX: REPRESENTATION OF THE ONTOLOGY ON

PROCESS (close_to (po: x, po: y), near(x, y))

PROCESS (close_to_wall(Person : p, Wall : w), near(p, w))

PROCESS (close_to_eq(Person : p, Equipment : eq), near(p, eq))

PROCESS (close_to_person(Person : p1, Person : p2), near(p1, p2))

PROCESS (close_to_v(Vehicle: v, Equipment: eq), near(v, eq))

PROCESS (far_from(po: x, po: y), NOT(near(x,y)))

PROCESS (far_from_eq(Person: p, Equipment: eq), NOT(near(p, eq)))

PROCESS (far_from_person(Person: p1, Person: p2), NOT(near(p1, p2)))

PROCESS (inside_zone(Person: p, Zone: z), inside(p,z))

PROCESS (outside_zone(Person : p, Zone : z), NOT(inside(p,z)))

PROCESS (moves_close_to(Person : p, Equipment : eq),

AND(at_interval(far_from(p, eq), T1), at_interval(close_to(p, eq), T2), before(T1, T2)))

PROCESS (moves_close_to_person(Person : p1, Person : p2),

AND(at_interval(far_from(p1, p2), T1), at_interval(close_to(p1, p2), T2), sequence(T1, T2)))

PROCESS(stay_at_wall( Person: p, Wall: w),

AND(at_interval(close_to_wall(p, w), T), (T >= 2 seconds )))

PROCESS(meets_person(Person : p1, Person : p2),

AND(at_interval(close_to_person(p1, p2), T), (T >= 2 seconds )))

PROCESS(stay_far_from( Person: p, Equipment: eq),

AND(at_interval(far_from_eq(p, eq), T), (T >= 2 seconds )))

PROCESS(moves_away_from( Person: p, Equipment: eq),

AND(at_interval(close_to_eq(p, eq), T1), at_interval(far_from_eq(p, eq), T2), before(T1, T2)))

PROCESS(moves_away_from_person( Person: p1, Person: p2),

AND(at_interval(close_to_person(p1, p2), T1), at_interval(far_from_person(p1, p2), T2), before(T1, T2)))

PROCESS(stay_inside_zone (Person :p, Zone : z),

AND(at_interval(inside_zone(p,z), T), (T >= 2 seconds )))

PROCESS(stay_outside_zone (Person :p, Zone : z),

AND(at_interval(outside_zone(p,z), T), (T >= 2 seconds )))

PROCESS(enters_zone(Person : p, Zone : z),

AND(at_interval(outside_zone(p,z), T1), at_interval(inside_zone(p, z), T2), before(T1, T2)))

PROCESS(leaves_zones(Person : p, Zone : z),

AND(at_interval(outside_zone(p,z), T1), at_interval(inside_zone(p, z), T2), before(T2, T1)))

PROCESS(changes_zone(Person : p, Zone : z1, Zone : z2),

AND(at_interval(inside_zone(p, z1), T1), at_interval(inside_zone(p, z2), T2), before(T1, T2)))
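The definitions above combine a state that holds over an interval (at_interval) with either a minimum duration or an ordering of intervals. The following Python sketch illustrates one possible reading of stay_inside_zone and enters_zone. It is not a VERL interpreter. It assumes that the primitive inside(p, z) has already been evaluated at each time-stamped sample, and the helper names `holding_intervals`, `stays` and `enters_zone` are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (time of first sample, time of last sample)


def holding_intervals(times: Sequence[float],
                      holds: Sequence[bool]) -> List[Interval]:
    """Maximal runs of consecutive samples on which a state holds."""
    intervals: List[Interval] = []
    start = None
    prev = None
    for t, h in zip(times, holds):
        if h and start is None:
            start = t                        # a new run begins
        elif not h and start is not None:
            intervals.append((start, prev))  # the run ended at the previous sample
            start = None
        prev = t
    if start is not None:
        intervals.append((start, prev))      # run still open at the last sample
    return intervals


def stays(intervals: List[Interval], min_duration: float = 2.0) -> bool:
    """stay_* pattern: the state holds over an interval T with T >= 2 seconds."""
    return any(end - start >= min_duration for start, end in intervals)


def enters_zone(times: Sequence[float], inside: Sequence[bool]) -> bool:
    """enters_zone pattern: an outside interval T1 ends before an inside interval T2 starts."""
    outside_iv = holding_intervals(times, [not x for x in inside])
    inside_iv = holding_intervals(times, inside)
    return any(o[1] < i[0] for o in outside_iv for i in inside_iv)


# Made-up track: one sample every 0.5 s, the person is inside from t = 1.0 s on.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
inside = [False, False, True, True, True, True, True]

entered = enters_zone(times, inside)
stayed = stays(holding_intervals(times, inside))
```

On this made-up track the person is outside during (0.0, 0.5) and inside during (1.0, 3.0), so both the enters_zone pattern and the two-second stay_inside_zone pattern hold.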

Title	Ontologie d’événements vidéos pour un système automatique d’interprétation vidéo
Author	Pham Le Son
Supervisors	Monique Thonnat, Nicolas Maillot
Institution	INRIA
Field	video event ontology
Type	thesis
Year of publication	2004
City	Sophia-Antipolis
