Home

User Guide MOMIS Version 1.2

image

Contents

1. cose Fig 2 19 Annotation of Class and Attribute Names nn mom Select one base form from the list or add a new one hotel_name Attention the base form does not exist You can add a new synset or add a new synset and map on it the new lemma Add base form lemma and map it on an existing Remove base form Open WordNet Extender Select one or more senses e O Select one base form from the list or add a new one name Co asdbase form Lemma name exists into WordNet DataBase name Open WordNet Extender Remove base form Select one or more senses gt a language unit by which a person or thing is known his name really is George Wash N 3 person s reputation he wanted to protect his good name N family based on male descent he had no sons and there was no one to carry on his r N a well known or notable person they studied all the great names in the history of Fra N by the sanction or authority of halt in the name of the law N a defamatory or abusive word or phrase Y assign a specified usually proper proper name to They named their son David Th V aive the name ar identifvina chararteristire nf refer tn hv name nr enme nther ident E Fig 2 20 hotel name annotation 31 1 do not annotate the term 2 insert a different base form that is present in WordNet for exa
2. As shown in Fig 2 36 the Mapped Local Attributes area will show all the attributes of the Local Class Venere Hotels mapped on the GA rating The Function area will show all the functions applicable to the local at tributes to transform their values In order to compose the TF you have to double click on a function so it will be written in the Function editor then choose the attributes you want to involve and put them into the function as parameters by clicking twice You may manually edit the function in the Function editor In the example shown in Fig 2 36 the TF applied calculates the average value between the local attributes venere hotels rating and user_rating and is explained in fig 2 37 the local attributes venere hotels rating venere hotels user_rating and saperviaggiare hotel rank refer to an evaluation of the hotel quality and are mapped together on the same global attribute rating Now we need to define the type and then the possible values for this global attribute after having a look at the local sources through the Data Preview tool we discover 52 FIRATING USER RATING dre n Position 41 character Lenath 1 character Lenath 1 character Substrina ratina 47 7 i Substrina user_ratina 43 1 ugn gn StringTolnt g g StringTolnt 2 f RATING USER_RATING 3 Fig 2 37 Transformation Function applied for the Local Attributes rating and user_rating of the local class Venere Hotels 53 that
3. Fig 2 10 you have to enter the absolute path of the Excel File and complete the configuration parameters reported below e Column Name Line Number The line number from which to get the 5The Microsoft Excel Wrapper and CSV Wrapper are based on the Metamodel library http metamodel eobjects org 21 names of the columns e Skip Empty Lines Boolean that defines if empty lines in the Excel spreadsheet should be skipped while reading the spreadsheet e Skip Empty Columns Boolean that defines if empty columns in the Excel spreadsheet should be skipped while reading the spreadsheet e Transform Column Types Boolean that defines if the system should try to transform column data types Click the Next button to see the sheets of the file and the columns of each sheet Depending on the integration needs you can choose to acquire by selecting the tree nodes a subset of the sheets and columns You can use the Data Preview tool on a sheet to explore the first one hundred rows of the sheet or on a column to explore the first one hundred values of a column Then click on Finish to complete the source schema extraction Once the local source schema extraction process is finished the sheets of the file called in the ODLyslanguage local classes will appear in the Source Explorer view From there you can access at any time the local Data Preview by right clicking on a local class and choosing Data Preview 2 4 3 Connection to a MySQL d
4. Logic called ODL s derived from the standard ODMG is introduced for information extraction A software component named wrapper Fig 1 1 extracts the schema of a source translating it in the ODL slanguage The discovery of relationships between the schemas of the information sources ex ploits the semantics of the data sources clustering techniques and description logics inferences The integration process gives rise to a mediated schema also called Global Schema GS Fig 1 1 that is a reconciled integrated and virtual view of the underlying sources MOMIS performs data integra tion following a virtual approach no centralized copy of data is made and follows a global as view GAV approach the obtained global schema is ex pressed in terms of the local source schemas A full and detailed description of the MOMIS system from a theoretical point of view is out of our scope and can be found in 5 2 4 The goal of the MOMIS system 3 is the minimization of the integration process costs In traditional Data Integration Systems designers have to manually build the integrated schema defining all the mappings between 3 Database IT E e un Spreadsheet User Mediated Schema Other data sources Fig 1 1 The Data Integration Process each global class attribute and the corresponding local classes attributes on the local data sources thus the integration process requires several days or weeks depending on
5. be polysemous two words that share at least one sense in common are said to be synonyms So in WordNet each lemma is associated to one or more synsets as the same word can be used with different meanings in different contexts Each synset is associated to one or more lemmas several words can be used to express the same concept for example zip and postal_code are associated to the synset identified by the gloss a code of letters and digits added to a postal address to aid in the sorting of mail The synsets are linked through semantic relations 32 e A hypernym relationship connects two synsets where the first general 111 izes the second e g animal is a hypernym of dog e A hyponym relationship connects two synsets where the first specializes the second e g student is a hyponym of person footnoteThis means that the synset associated to student is a hyponym of the synset associated to person The hyponym is the opposite relationship of hypernym NEW ASSOCIATION NEW ASSOCIATION NEW ASSOCIATION NEW ASSOCIATION BETWEEN AN BETWEEN A BETWEEN A BETWEEN A EXISTING LEMMA AND AN NEW LEMMA AND AN NEW LEMMA ANDA NEW NEW SYNSET ANDAN EXISTING SYNSET EXISTING SYNSET SYNSET EXISTING LEMMA E LEMMA SYNSET LEMMA SYNSET a a j em j SYNSET the number is used the number is used a in calling a particular in calling a particular the number is used to serid a building
6. by right clicking on a tree node and choosing Data Preview see Fig 3 8 you will get the first one hundred records and the total number of records of the table Let s explore the query plan of the previous executed query You can notice that three local queries have been generated a local query is generated for each local class mapped in the global class and involved in the query and this query translation is 61 Query Plan Global Query Query Plan Queries Global Query saperviaggiare hotel SELECT name address price rating fax SELECT hotel address hotel name hotel city hotel rank hotel fax_number FROM structure FROM hotel WHERE city Bologna where price gt 100 and price lt 200 and rating 3 and city Bologna order by price Query Plan Tree a ofS Queries 4 ofS Local Queries venere hotels SELECT hotels address hotels hotel_name hotels city hotels price hotels user_rating hotels rating FROM hi 4 oz Mapping Query globalSource structure select je2_structure_saperviaggiare_hotel address AS address_1 je2_structure_venere_hotels addre a ofS Final Query SELECT name address price rating fax FROM je2_structure WHERE price gt 100 and price lt 200 and rating 3 and 4 mm Fig 3 7 Query Plan Viewer performed by considering the mappings among the global class and the local schema
7. generation of the final query 81 Step 1 Generation of the local queries LQs Each local query LQ is expressed as following LQ SELECT lt SELECT list gt FROM L WHERE lt condition gt where L is a local class related to GC The lt SELECT _list gt is computed by considering the union of e the global attributes in lt Q_SELECT list gt with a not null mapping in L e the global attributes used to express the join condition for L e the global attributes in lt Q_condition gt with a not null mapping in L The set of global attributes is transformed in the corresponding set of local attributes on the basis of the Mapping Table MT The lt condition gt is computed by performing an atomic constraint mapping each atomic con straint of lt condition gt is rewritten into one that is supported by the local source The atomic constraint mapping is performed on the basic of the Data Transformation Functions and Resolution Functions defined in the Mapping Table For example if a numerical global attribute GA is mapped onto the local classes Li and Ls and an average resolution function AVG is defined for GA the constraint GA value cannot be pushed at the local source because the AVG function has to be computed at a global level In this case the constraint will be mapped as true in both the local sources and the resolution function will be computed only at a global level On the other hand if GA is an homo
8. lt less than or equals operator gt lt lt greater than operator gt gt lt greater than or equals operator gt gt lt left paren gt lt right paren gt lt query gt SELECT lt quantifier gt lt select clause gt lt from clause gt lt where clause gt lt order by clause gt lt group by clause gt lt having clause gt lt quantifier gt DISTINCT 89 lt select clause gt lt asterisk gt lt select list gt lt comma gt lt select list gt lt select list gt lt qualifier gt lt period gt lt asterisk gt lt select sublist gt lt correlation specification gt lt select sublist gt lt attribute reference gt lt set function specification gt lt attribute reference gt lt qualifier gt lt period gt lt global attribute name gt lt qualifier gt lt global interface name gt lt correlation name gt lt global interface name gt lt identifier gt lt global attribute name gt lt identifier gt lt correlation name gt lt identifier gt lt set function specification gt COUNT lt left paren gt lt asterisk gt lt right paren gt lt general set function gt lt general set function gt lt set function type gt lt left paren gt lt quantifier gt lt attribute reference gt lt right paren gt lt set function type gt AVG MAX MIN SUM COUNT lt from clause g
9. mapped into the GlobalAttribute The context menu of the Global Sources tree right click on global source global class or global attribute allows you to add remove or rename global classes global attributes set or modify the type of a GA have a look at the local data using the Data Preview feature set an attribute as join attribute involved in a join condition edit join functions of global classes have a look at the global data using the Global Data Preview feature available just after having selected the join attributes for the global class edit the Transformation Function of a LA edit the Resolution Function ot a GA The Mapping phase permits the user to visualize and manage the GA of each Global Class created By selecting a node or a leaf of the Global Classes Tree the contents of the selected GC is shown In particular the user may visualize and manage the 46 ES Project Global Schema Help BEAR 3 Source Explorer O Touri ES Fm L guidacampeggi E campings B facility L saperviaggiare 15 hotel 3 venere facility E hotels Mapped Elements Unmapped Elements La maps I guidacampeggi 3 saperviaggiare 0 venere b locality string Bl logo string Global Schema Explorer El TourismGlobalschema campings guidacampeggi hotel saperviaggiare hotels venere Overview Local Sources Anno
10. of menu items that permits you to create a new global schema upload an existing one or save the current global schema 3 Help is composed of a set of menu items that permits you to open the Welcome Page and the About Dialog In the tables below are listed the menu items and the toolbar items avail able to you 14 Table 2 1 Menu Icon Menu Item Description New Project Create a new empty project Load Project Load a previously created project Load Demo Project Load the project used in this user guide Ba 60 O HD AS Save Project Save the current project Save Project as Create a copy of the current project New Local Source Upload the schema of a new source Exit Exit the application Table 2 2 Global Schema Menu Icon Menu Item Description E New Global Schema Create a new global schema in the current project Save Global Schema Save the current global schema ss Save Global Schema as Create a copy of the current global schema Table 2 3 Toolbar Items Toolbar Icon Description Create a new empty project Load a previously created project Load the demo project used in this user guide Save the current global schema BRL Save the current project Upload the schema of a new source Upload the schema of a new source through web service Launch Query Manager on the current schema 15 Project Generation Process A Project is a set of Local Sources and a set of Global Schemas the main step
11. phase is per formed by the Global Schema Designer tool Sources Tools Users amp Integrator Designer Global Schema Designer Global Schema Creation t Manager User gt Custom End User Application ae Fig 1 4 The MOMIS architecture GS generation starting from the discovered semantic relationships and the local sources schemas MOMIS generates a GS consisting of a set of global classes plus Mapping Tables which contain the mappings to connect the global attributes of each global class with the local source attributes The GS generation is a process where classes describing the same or semantically related concepts in different sources are identified and clusterized into the same global class Fig 1 2 4 The designer may interactively refine and complete the proposed integration result through the GUI provided by the Global Schema Designer tool In par ticular she he can modify the proposed global classes and mappings Fig 1 3 select the appropriate Join Function for each global class define Transformation Functions in order to transform the local at tribute values into the corresponding global attribute values and solve possible data conflicts through the definition of Resolution Functions applied to each global attribute to obtain starting from the values computed by the Transformation Functions the corresponding value of the global attribute Finally once obtained the desired i
12. semantic relationships Usually integration projects involve large data sources with hundreds of tables and attributes coming from a particular domain of interest e g tourism that s why it is possible to im port domain glossaries in order to annotate terms belonging to a specific domain For example for our project we included the glossary of the Tourism domain Fig 2 15 Why do we need to annotate Because starting from the local source annotations MOMIS can derive lexical relationships among el ements of different local sources this process is called Semantic Relationship 8WordNet is freely and publicly available for download see http wordnet princeton edu In the current version we used WordNet version 3 0 WordNet is distributed as it is and we extrapolated the WordNet internal organization into a relational DBMS DataBase Management System and used it to store also the domain glossaries 28 Extraction phase and it is explained in details in 2 8 section The manual annotation is a two steps process e Base form choice the system automatically proposes the base form if the term exists in WordNet or in the domain glossary if no base form is found or the base form found is not satisfactory we can manually insert it by choosing among all the base forms available in WordNet or in the domain glossary e Meanings choice we have to map the term on one or more mean ings among all the available meanings of the insert
13. synsets through a wide network of semantic relationships The designer can perform automatic annotation and or can manu ally select a base form and the appropriate WordNet meaning s i e synset s for each term Moreover the designer can extend Word Net with Domain Glossaries The Local Source Annotation phase is performed by the Global Schema Designer tool Fig 1 4 5 Mapping Refinement Refined Mapping Tables Mapping Tables Data Data Data Transformation Fusion Reconciliation Fig 1 3 Mapping Refinement 3 Semantic Relationships Extraction Fig 1 2 3 starting from the annotated local schemas MOMIS derives a set of intra and inter schema semantic relationships in the form of synonyms SYN broader terms narrower terms BT NT and related terms RT relationships The set of semantic relationships is incrementally built by adding structural relationships deriving from the structure of each schema lexical relationships deriving from the element annotations by ex ploiting the WordNet semantic network designer supplied relation ships representing specific domain knowledge and inferred relation The type of sources supported at present are MySQL Microsoft SQL Server Oracle DB2 PostgreSQL JDBC Sources JDBC ODBC Sources Microsoft Excel File CSV File Web Service ships deriving from Description Logics equivalence and subsumption computation The Semantic Relationship Extraction
14. 06068495 068841104 Download D Downton F QueryStructure oql TE Raccolte TA Documenti Immagini e Musica E Video 3 Gruppo home Ali Comentar TR Nomefile queryName ogl Salva come oql Nascondi cartelle se Annulla Fig 3 4 Save the Executed Query Also by clicking on Load query button see 2 in Fig 3 3 you can load an already saved query Just select the file that contains the query and the query will appear in the text field see 2 in Fig 3 2 3The file must have the oql extension 99 3 3 Query Plan Viewer Let us see how the result of a query is obtained MOMIS follows a Global As View GAV approach for the definition of map pings between the GS and local schemas the GS is expressed in terms of the local schemas The mapping is expressed by defining for each global class GC a mapping query qa over the schemas of a set of local classes L belonging to GC The query translation is performed by means of query unfolding i e by expanding a global query on a global class GCof the GS according to the definition of the mapping query qa for more details see Appendix C The query unfolding process generates for each global query a Query Plan composed by a set of queries e a set of local queries that have to be executed on the local sources simultaneously by means of wrappers see Fig 3 5 e the mapping query that will fuse the partial results
15. D Beneventano F Guerra and M Orsini Data integra tion In D W Embley and B Thalheim editors Handbook of Conceptual Modeling Springer Berlin Germany pages 443 478 2011 S Bergamaschi S Castano M Vincini and D Beneventano Seman tic integration of heterogeneous information sources Data Knowl Eng 36 3 215 249 2001 M Lenzerini Data Integration A Theoretical Perspective In L Popa editor PODS pages 233 246 ACM 2002 G A Miller R Beckwith C Fellbaum D Gross and K Miller Word Net An on line lexical database International Journal of Lexicography 3 235 244 1990 F Naumann J C Freytag and U Leser Completeness of integrated information sources Inf Syst 29 7 583 615 2004 M Orsini Query Management in Data Integration Systems the MOMIS approach PhD thesis Doctorate School in ICT Computer Engineering and Science University of Modena and Reggio Emilia 2009 89
16. DOM Project dataset 1 4 Important concepts in MOMIS As reported in the Section 1 1 data integration is the problem of combining data residing at different Local Sources and providing the user with a unified view of these data These unified view is called Global Schema Different Global Schemas may be created starting from the same set of Local Sources Project A Project is a set of Local Sources and a set of Global Schemas Projects will be described in Section 2 3 Global Schema A Global Schema is a set of global classes and relationships among them on the other in a Project we use the term Global Schema to denote not only classes and relationships among them but also a set of information local sources and for each local source the local classes and local attributes involved in the integration local sources annotation semantic relationships mappings between global classes and local classes etc All these informations related to a Global Schema are stored in an xml file A workspace is the directory where you store all your project folders You need to have one workspace directory You can create as many projects as you need and they are saved in the workspace 1 5 Glossary In this tutorial some abbreviations will be used e GS stands for Global Schema e GC stands for Global Class e GA stands for Global Attribute e LC stands for Local Class e LA stands for Local Attribute e TF stands for Transformation Function e RF stands
17. ECT hotels address hotels hotel_name hotels city hotels price hotels user_rating hotels rating FROM hotels WHERE price gt 100 and price lt 200 and city Bologna LQ3 saperviaggiare hotel SELECT hotel address hotel name hotel city hotel rank hotel fax_number FROM hotel WHERE city Bologna The local queries are executed by means of wrappers and partial results are materialized in temporary tables into the QMDB relational database Through the Data Preview tool you can visualize the content of this tables The partial results are fused together by executing the mapping query At the end the final query that applies the resolution functions and the residual clauses is executed SELECT name address price rating fax FROM jei_structure WHERE price gt 100 and price lt 200 and rating 3 and city Bologna ORDER BY price ASC 64 Chapter 4 Querying the Global Schema with the MOMIS QM Web Interface 4 1 Query Composition and Execution To compose and execute your query you have to follow some main steps Fig 4 1 DataRiVe MoMIS mam a Fig 4 1 Main steps for Query Composition and Execution 1 Upload the Global Schema you previously created with MOMIS the left panel of the interface will load the Global Source tree 65 Mozo 2 Click on a global class of the tree to see its attributes as checkbox down in the Class Attribute
18. HERE city roma AND name LIKE jolly ORDER BY name Click on Run Query the query will be executed and the result will be shown in the tabular panel see 3 in Fig 3 2 1You can find the full OQL s syntax in Appendix D 2In order to obtain a result in the query phase all the local sources must be reachable 58 3 2 Query Saving and Loading As shown in Fig 3 4 you can save the executed query To save the query just click on Save query button see 1 in Fig 3 3 and insert the name you want to assign to it amp Save the executed query 1 ho Loada saved query 2 Runthe inserted query 3 View the query plan forthe executed query 4 Fig 3 3 Query Manager icon buttons Y MOMIS TourismProject prj 2 8 X Project Global Schema Help OSIO Source Explorer D O TourismGlobalSchema _ 5 QM_TourismGlobalSchema 3 co guidecampeggi Query Manager E campings Ea facility Global Source s 2 eves goto E a a E feciity from structure venere E where city roma E facility E id string and name like jolly mick name string order by name B i E un string E value string E maps E city map string detailed_map string E id string ns T Save Query i rr So dy workspace momis g Demo Query 9 Cerca Demo Query PHONE NUMBER s Organizza Nuova cartella E 0632499 063610138 x 0666396 0666418457 El Risorse recenti Nome ima ipo
19. IS TourismProject prj lo as Project Global Schema Help Welcome to MOMIS What is Data Integration My mois in a nutshell MOMIS User Guide 44 MOMIS Video Tutorials MOMIS Demo Project ff DataRiver Website Fig 2 1 MOMIS Welcome Page MOMIS main window is divided into 3 views Fig 2 2 1 Source Explorer where are visualized the sources of the current project 2 Global Schema Explorer where are visualized the global schemas of the current project 3 Editor Panel where you will be able to edit the global schemas of the current project 12 Project GlobalSchema Help Bo EEE G r GC gt Global Schema Explorer a0 Fig 2 2 MOMIS Main Window 13 Menu bar and Toolbar Project Help Project Global Schema Help Lg New Project Eg IE New Global Schema Ex Load Project C3 Load External Global Schema CN Load Demo Project EQ Save Global Schema Save Project l d Save Global Schema as Save Project as New Local Source El Ext Fig 2 3 Menu bar The menu bar is composed of three menus Fig 2 3 1 Project is composed of a set of menu items that enables the acquisi tion of a local source schema permits you to create a new integration project upload an existing one upload the demo project or save the current project 2 Global Schema is composed of a set
20. N is a symmetric relationship e BT NT A BT B means that A is a broader than B The opposite relationship of BT is NT A NT B means that A is more specific than B e RT A RT B means that the terms are related Structural Schema derived relationships Lexical Lexicon derived relationships User Defined Designer provided relationships Inferred Inferred relationships Fig 2 29 Semantic Relationships Type The Common Thesaurus is incrementally built by adding four types of relationships Fig 2 33 Schema derived relationships also called Structural relationships automatic extraction of intra schema relationships from each schema sepa rately For example we extract intra schema RTrelationships from foreign keys in relational source schemas In the relational model a foreign key is a set of attributes in a relation used to express a reference to another relation In guidacempeggi source we have a foreign key so a Structural relationship is automatically discovered FK facility url REFERENCES campings url gt guidacampeggi facility RT guidacampeggi campings When a foreign key is also a primary key in both the original and reference relations MOMIS extracts BT and NT relationships which are derived from inheritance relationships in object oriented schemas In venere source we have a foreign key that is also a primary key in both the original and reference relations so a structural relationship is automatically discov
21. RICE 102 RATING 3 FAX 051402079 Map data 2012 Google Tele Atlas Termini e condizioni d uso Fig 4 6 Google Map 68 4 3 Query saving At any time you can save your query by clicking on the Save Query button and inserting the name you want to assign to it Fig 4 7 Open Query ES Save Quen Save Query x Query name rimini_Hoteld Fig 4 7 Query Saving TOMCAT saved queries x name schema rimini_Hotels Peer2 tpe TPCHperformance TURI Peer2 0 Fig 4 8 Opening a saved Query If you click on the Open Query button a new window will appear Fig 4 8 containing all the queries you saved before their names and the Global Schema on which they had been executed From there you can run or delete any of this queries by clicking on the corresponding button 4 4 Mapping table If you right click on a globalClass of the globalSource Tree a new window will appear containing the Mapping Table of the Global Class Fig 4 9 that shows how local attributes are mapped into a global attribute The leftmost column of the Mapping Table represents the list of all the global attributes the first row represents all the local classes belonging to the global class the table elements are the local attributes that are part of a local class mapped in a specific global attribute row More attributes may be mapped into the same global attribute 69 Fig 4 9 Hotels Mapp
22. S Bergamaschi D Beneventano S Quattrini E Kazazi Mediator envirOnment for Multiple Information Sources User Guide MOMIS Version 1 2 Data RVGr Contents 1 The MOMIS Data Integration System 3 1 1 MOMIS Overview 3 1 2 Data Integration Process and MOMIS Architecture 5 1 3 The WISDOM case study 242228 iid we A La 8 1 4 Important concepts in MOMIS 9 Lor Glossary siete i Weta ram dia 10 2 The MOMIS System 11 2 1 Honor MOMIS 22 3 sa he mn er be Ds 11 2 2 Getting Started with MOMIS 4 4 7 4 22 46 22404 dual 12 2 3 How to Create a new Project 16 2 4 Local Source Schema Extraction 16 2 4 1 Microsoft SQL Server database Schema Extraction 18 2 4 2 Microsoft Excel File Schema Extraction 21 2 4 3 Connection to a MySQL database through Web Service 22 2 5 How to create a new Global Schema 25 2 6 Local Sources Selection 27 2 7 Local Sources Annotation su biso d OS ee A 28 2 7 1 Manual Annotation 5 28 vii ey sede aye a 30 2 7 2 The WordNet Extender Tool 32 2 7 3 Automatic Annotation ic a 4 37 2 7 4 Annotation Importer 38 2 8 Semantic Relationships Extraction 42 2 9 Mapping Table Creation and Refinement 46 2 9 1 Join Functions 50 2 9 2 Transformation Functions Editor 53 2 9 3 Resol
23. a new or existing lemma with a Y WordNet Extender Y WordNet Extender LL O ES Lemma and Synsets Click Next to inserta new synset and map it on the inserted lemma Lemma and Syntactic Category Syntactic Category Select lemma s syntactic category noun Insert a new lemma or an existing one fax_number adjective adverb x_number in the selected syntactic category does not into the WordNet DataBase Insert one or more similar lemm as more gloss keywords The search will be performed in OR Search E Select one or more synsets from below View Hypernym Graph by inserting similar lemmas or search synsets by gloss keywords or insert a new synset The search will be performed in AND New Synset Please press Search and select a hypernym forthe new synset Hypernym Synset to make a search among existing synsets 2 Lemma s Gloss Al Insert a new synset New Synset Lemma Name fax_number Synset Insert the gloss for the new synset Press Search Hypemym Synset to find a hypernym for the new synset Relationships the number is used to send a document over the telephone line Syntactic Category noun Search Hypernym Synset RelationshipType Lemma s Gloss Exte Relationships that you are going to add into the WordNet Database nd pia V WordNet Gender I aa V WordNet Extender Search Hypern
24. antic Relationships Editor 2 2 44 2 ue Roue User provided Relationships Interface Global Classes Generation Global Schema Editor Icons Legend Mapping Refinement Panel Join PUNCEHON y Es e RES GR PUS CRIS AN TER Transformation Function Panel for the Local Attributes rating and user_rating of the local class Venere Hotels Transformation Function applied for the Local Attributes rat ing and user_rating of the local class Venere Hotels Resolution Function Interface Launch the Query Manager ssa lesene a els The Query Manager Interface Query Manager icon buttons Save the Executed Query Local Queries Execution os al le ig a ia CHE TOC ITR ss sue La Na See da ore da Seti Query Plan Viewer sad pale de PSE pated Data Preview sto ada pere AE D er ale ee aed Mapping Table of structure global class Main steps for Query Composition and Execution Add Condition panel Le pate a lee gee ph ON Be Gi tee gra Add Sorting Options panel vota e aa Row Data shown in a new window Select Attribute ape ae nn Google Map oaa Cla sl said Fid sl Query Saving ERE RINO a Opening a saved Query Hotels Mapping Table Introduction This User Guide helps you learning how to integrate hete
25. atabase through Web Service If it s not possible to establish a direct JDBC connection with a particular source that is distributed over the Internet then the Web Service Wrapper can be used This wrapper has two main components a server and a client 1 Wrapper Web Service Server component On the server side a daemon WSDLWrapper_impl runs it offers a Web Service server object and give you access to a data source for which a wrapper in MOMIS exists Let s see how to start the Web Service Server Obtain e datariver communication 1 2 jar e datariver core 1 2 jar e velocity 1 4 jar e commons collections 20040616 jar Note that this line number is affected by the Skip Empty Lines property If Skip Empty Lines is set to true the line numbers will begin from the first non empty line If set to false all columns types will be string 22 e log4j 1 2 14 jar e mysql connector java 5 1 6 jar packages from the source distribution Then you can directly start the Web Service as follows java cp list of the jars reported above it unimo datariver communication tools wsdl WSDLWrapper WSDLWrapper_impl End_Point_URL schemaName xml source_name e End Point URL an enpoind url describes the port and the path where the service will response to eg http localhost 8080 test e schemaName xml schemaName xml it is an xml file that contains the source connection configuration parameters e sourceName the name of the source B
26. ation Function transforms the type of a Local mapped Attribute into the one of the Global Attribute When you define a new TF you have to bear in mind the following con siderations 1 if only a LA of LC is mapped into a GA the default Transformation Function is Identity i e no transformation is needed and the type of the GA is assumed as the type of the LA 51 r V Transformation Function about venere hotels rating s Function editor strTolnt substring venere hotels rating 40 1 strTolnt substring venere hotels user_rating 42 1 2 Function Mapped Local Attributes Help Category all 7 a venere hotels rating S LA1 S LA2 Returns the concatenation of the values of two venere hotels user_rating a string local attributes Substrin Integer Sum S LA1 string const Integer Subtraction Returns the concatenation of a string local Integer Multiplication E attribute and a constant string Integer Division Float Sum Float Subtraction Float Multiplication Float Division Negation of an Integer Negation of a Float Casting Int To String Casting Int To Double Cactina String Ta Int Fig 2 36 Transformation Function Panel for the Local Attributes rating and user_rating of the local class Venere Hotels 2 if more than one LA of LC are mapped into a GA you have to select the type of the GA and apply the proper TF to the Local Attributes
27. be added into the WordNet database 11 click on Finish to complete the process at this point you can annotate the term by using the new synset just added into the WordNet database 2 7 3 Automatic Annotation In order to optimize the annotation phase and increase the annotation ac curacy we implemented an automatic annotation algorithm which includes stemming and stop words removal functionalities The main advantage of automatic annotation is simply speed wholly or partially automated meth ods facilitate the annotation of large sets of classes If the lexical reference has been extended the automatic annotation algorithm associates to each data source element the more recent meaning of the domain glossary else it associates to the data source element the first meaning most common used meaning present in WordNet By clicking on the Automatic Annotation button the system will try to annotate all the terms Fig 2 25 The algo rithm gives good results in presence of English words but is not able to deal with several compound words That s why we added the Data Preview tool right click on a tree node and explore the values of the class attribute For example winter_contact through the data preview tool we can notice that winter_contact is the number that should be used in winter so we choose the base form phone_number and select the firts meaning After the automatic annotation you can manually refine the annotations propo
28. c 2 ager wes ete ta ata 17 Microsoft SQL Server Connection Parameters 18 Tables and Attributes Selection 19 Data Preview sis sa a Lei ee Le we La e 20 Microsoft Excel File Configuration Parameters 21 Web Service Connection Settings 24 New Global Schema dialog 25 Global Schema Designer Overview 26 Local Sources Details ci es es we ibra Sg ee 27 Annotation with WordNet 28 Source Annotation Section 4 Di de Dil MU a 29 Annotation ICONE i Gig Piede 30 WordNet Icons Ss Lera he rt ene I d 30 Annotation of Class and Attribute Names 31 Hotel name annotation Gis lau io pl MA ie SOx 31 Using WordNet Extender for exending WordNet database 33 Hypernym Graph of the lemma telephone 35 Steps to follow for associating a new or existing lemma with an existing Synset 2 2 ee 36 Steps to follow for associating a new or existing lemma with a NEW SIS O ace tas Bee ba POE id 37 2 25 2 26 DDT 2 28 2 29 2 30 2 31 2 32 2 33 2 34 2 35 2 36 2 37 2 38 3 1 3 2 3 3 3 4 3 9 3 6 3 7 3 8 3 9 4 1 4 2 4 3 4 4 4 5 4 6 4 7 4 8 4 9 Automatic Annotation Process euri puri Periti ped Annotation Importing lar Diadora Les Hypernym Graph Viewer hotel Hypernym Graph Viewer New Synset for hotel Semantic Relationships Type Sem
29. catenation function accepta only string values as it s input For example STRINGCONCATENATION LC2 name LC1 name LC3 hotel name Note Resolution Functions have to be defined only after the definition of Transformation Functions 56 Chapter 3 Querying the Global Schema 3 1 Query Manager Interface Finally once completed the integration process you can pose query on the obtained Global Schema by using the Query Manager Interface To launch the Query Manager right click on the GS in the Global Schema Explorer view or click on Launch Query Manager hyperlink in the overview page see Fig 3 1 Y MOMIS TourismProject prj 5 Project Global Schema 4 Help Sua laa ls Source Explorer O E TourismGlobalSchema 3 os Global Schema Designer Overview Local Sources o In this section it is possible to select the desired sources for the integration project Sources Annotation o In this section it is possible to semantically annotate the selected sources edit section Semantic Relationships Q In this section it is possible to visualize and define inter schema and intra schema relations which are necessary for the clustering phase piola TourismGlobafSchema a Launch Query Manager possible to manually refine mappings automatically generated at the end of the integration project edit section Test Schema sn the global schema Overview Local Sources Annota
30. coming from the local sources on the basis of the join function see Fig 3 6 e a final query to apply the resolution functions and residual clauses see Fig 3 6 Global Query Global Schema facility maps i aS pes FA Lot y Maz So 193 bat ig F Local campings Local Local Schema guidacampeggi venere saperviaggiare Fig 3 5 Local Queries Execution A relational database QMDB gives support to the Query Manager for the fusion of partial results that are stored in temporary tables see Fig 3 6 60 Final Result Mapping Query Lai Result Set guidacampeggi venere saperviaggiare Fig 3 6 Query Processing For each executed query you can view the Query Plan that means you can view the set of queries that compose the query plan and also through the data preview tool you can explore the content of the temporary tables so you can understand how the data fusion process is performed In order to clarify let us execute the query SELECT name address price rating fax FROM structure WHERE price gt 100 AND price lt 200 AND rating 3 AND city Bologna ORDER BY price To open the Query Plan Viewer see Fig 3 7 just click on the Information button see 4 in Fig 3 3 By clicking on the tree nodes see 2 in Fig 3 7 you can visualize in the text field see 1 in Fig 3 7 the selected query Also each query can be executed on the QMDB
31. ddresses from local sources in this case Address local attributes are mapped onto a unique global attribute Address defined by the string concatenation resolution function 2 The Join Attribute Code is not a key in some local classes a A unique address chosen among addresses from local sources on the basis of some criteria b A unique address containing all addresses from local sources 79 80 Appendix C Query Unfolding Theoretical Background The query unfolding process is performed for each global query Q over a global class GC of the global schema Given the global query Q and the mapping defined on the Mapping Table MT the query unfolding process generates the set of local queries LQs to be executed on the sources the mapping query gg for merging the partial results and the final query to apply the resolution functions and residual clauses We give an intuitive explanation of the main steps in the query unfolding process you can find more information in 9 C 1 Query Unfolding steps Given a global query Q as following Q SELECT lt Q_SELECT list gt FROMG WHERE lt Q_condition gt where lt Q_condition gt is a boolean expression of positive atomic constraints GA op value or GA op GA2 with GA and GA attributes of GC The query unfolding process is made up of the following three steps e Step 1 generation of the local queries LQs e Step 2 generation of the mapping query qa e Step 3
32. e Via Au Bologna we Best Western Hote we Via Za Bologna me Savhotel Vita Bologna Royal Carlton Via Mo Bologna art Hotel Novecen Piazza Bologna internazionale Via del Bologna al Cappello Rosso we Viade Bologna art Hotel Commer Via de Bologna w Tower A Boscolo w Viale L Bologna we Paradise w Vicolo Bologna art Hotel Orologio e Via N o Bologna w Golden Tulip Aem we Via Za Fig 2 9 Data Preview 20 Then click on Finish to complete the source schema extraction Once the local source schema extraction process is finished the tables of the source called in the ODLyslanguage local classes will appear in the Source Ex plorer view From there you can access at any time the local Data Preview functionality by right clicking on a local class and choosing Data Preview 2 4 2 Microsoft Excel File Schema Extraction In the first wizard page select Microsoft Office Excel file as source type and enter a name r V Local Source Schema Extraction EI Local Source Schema Extraction Please click Next to get the source schema Es Microsoft Office Excel Absolute file path Advanced Parameters Column Name Line Number Skip Empty Lines 0 false X Skip Empty Columns Transform Column Types false false Reset Default Values Fig 2 10 Microsoft Excel File Configuration Parameters As you can see in figure
33. ed base form Y MOMIS TourismProject prj Leeks Project Global Schema Help an aml a ls Source Explorer D TourismGlobalSehema p ur Global Schema Designer Annotation E campings 3 taciity Sources Overview Semantic Relationships saperviaggiare hotel 1 E d Sources Annotation about winter_contact venere guidacampeggi A B facility E campings Type string E hotels B cy List of the selected senses B maps email fax Base Form Pos Sense Gloss id E tocality come J province B surface telephone Bur E web site leda SE Biz Global Schema Explorer E faciity NewGlobalSchema saperviaggiare _ TourismGlobalSchema Hi hotel venere B facility Ej hotels E maps Automatic Annotation _ f Edit Delete All Annotations Filter Tree Nodes T View Not Annotated Elements L View Annotated Elements Overview Local Sources Annotation Semantic Relationships Mapping Refinement n Fig 2 16 Source Annotation Section As you can see from Fig 2 16 on the left there is a tree representing the local source classes and attributes The colored icons help you find the elements that are not annotated and the elements already annotated see Fig 2 17 You can perform an automatic annotation and then manually refine the annotations proposed by the system You are not forced to annotate all The base form is the engl
34. ed by your interaction and you may manually modify it in the Query Editor but it s recommended to do it only after point 5 just before running the query 7 Run the query and you will see the corresponding output in the Re sults grid 66 The results are paginated 50 at a time and you can look through them using the bottom bar of the grid You may expand the grid by collapsing the top panel of the view port Lastly if you click on one of the rows of the results Grid you will see a new window Fig 4 4 containing all the attributes of that row in order to look through them in a more readable way Fig 4 4 Row Data shown in a new window 4 2 Visualize Results in a Map Once the query has been executed you can visualize the query result on a Google Map First of all click on Visualize in Google Map button see 8 in Fig 4 1 and then select the attribute you want to geocode so it s values will be used to place the markers on the Google Map see Fig 4 5 and click on View Results in Map button Now your records are shown in the map see Fig 4 6 S price rating fax Google Maps x ADDRESS gt View Results in Map Enable Query Editing Fig 4 5 Select Attribute https developers google com maps 2Geocoding is the process of converting an address into geographic coordinates which are used to place markers on the map 67 ADDRESS Via Marco Emilio Lepido 215 40132 Bologna P
35. egration designer Data Fusion is the pro cess of fusing multiple records representing the same real world object into a single and consistent representation In the MOMIS system Data Fusion is performed at the global classes level by the mapping query qa associated to a Global Class The first step in the Data Fusion process is Object Identi fication i e to identify instantiation of the same object in different sources also known as record linkage duplicate detection reference reconciliation and many others In the MOMIS system Object Identification is performed by Join Function also called Join Condition Join Conditions is a conve nient way to perform Object Identification when it is possible to assume that error free and shared object identifiers exist among different sources Join Conditions are defined among pairs of local classes belonging to the same global class More precisely we specify a set of global attributes JA of GC called Join Attributes such that for each join attribute JA that belongs to JA i 1 k and for each local class L L G belonging to G the element MT JA L is not null Given JA JA J Ag J Az for each pair of local classes L L belonging to L G the Join Condition between L and Lo denoted by JC L L is defined as follows MTF JAq Li MTF JA Lo and and MTF JAx L1 MTF J Ax Lo Join conditions are specified at design time and they are used at query time to identify
36. elow is reported the xml schema of the schemaName xml lt xml version 1 0 gt lt Schema name name momisCodeVersion momis 0 0 1 SNAPSHOT compiled today date gt lt Source name source_name description type relational gt lt SourceCommunicationConfiguration driverName it unimo datariver communication core jdbc WrapperJdbcCore_MySql gt WrapperJdbcCore DriverClassName com mysql jdbc Driver WrapperJdbcCore Password password WrapperJdbcCore Url jdbc mysql hostname 3306 database_name WrapperJdbcCore User user WrapperJdbcCore schemaName source_name lt SourceCommunicationConfiguration gt lt Source gt lt Schema gt Wrapper Web Service client component The Client is the wrapper component used to access a published web service So first of all you have to publish the service as shown above 23 Click on the toolbar item amp to upload the schema of the source through web service Enter the connection parameters as shown in figure Fig 2 11 and then click Finish FE Y casona a Local Source Schema Extraction Insert the required parameters in order to connect to the local source Ey Web Service Host Source Name CI Connect Fig 2 11 Web Service Connection Settings The new source displays in the Source Explorer view 24 2 5 How to create a new Global Schema Click on Global Schema gt New Global Schema enter a name for the new global schema and clic
37. emma Fig 2 21 4 for example we could create the new synset an establishment that provides lodging and usually meals and other services for travelers and other paying guests and associate it with the existing lemma hotel 5 in case you are adding a verb or a noun synset as in the previous case number 3 and 4 it is necessary to link the new synset to the hierarchy You should specify at least one new hypernym relationship with another synset in the graph N entity NN physical_entity N a N unit ss N 2 artefact NN instrumentality instrumentation N equipment N medium N electronic equipment telecom Fig 2 22 Hypernym Graph of the lemma telephone The Hypernym Graph Viewer is very helpful to create sound relationships between the added synsets and the pre existing ones during the extension of the lexical reference From the annotation tree right click on a class or attribute name then select the View Hypernym Graph menu item It is possible to navigate the graph by focusing on a specific synset to view only the branch of its hypernyms or by using the keyword search For example the direct hypernyms of the two synsets associated to telephone 34 E 1 Y were tende LLL a Lemma and Synsets Click Finish to map the lemma on the selected synset s Lemma and Syntactic Category Lemma 1 Syntactic Category Insert a new lemma or an existing one Select lemma s syntactic cate
38. en all Join Condition related to this element are considered true and then the fusion is not properly performed for the instances coming from this LocalClass As an example let us consider that in the above GC Hotel the element Name LC2 is null then if the global attribute Name is chosen as Join Attribute we have that JC LC1 LC2 is true and thus no fusion is performed If Name and City are chosen as Join Attributes we have that JC LC1 LC2 is true AND LC2 city LC1 city i e JC LC1 LC2 is LC2 city LC1 city and thus fusion is performed only on the City attribute 74 Appendix B Data Fusion Theoretical Background Data fusion is the process of fusing multiple records representing the same real world object into a single consistent and clean representation In MOMIS Data fusion is performed with the so called emphFULL OUTER JOIN MERGE OPERATOR 8 To give an intuition about this FULL OUTER JOIN MERGE OPERA TOR we introduce a simple example of GS The relational schema of the local classes relation tables LL1 K1 A K1 primary key LL2 K1 A K1 primary key LL3 K1 A K1 primary key are integrated in the global class C with the following Mapping Table LL1 LL2 LL3 K1 i K1 Kl Kl A A A A For the sake of simplicity we consider that all local attributes are not transformed with respect to the global class i e for each local attributes the identity transformation function is conside
39. er paying guests Then we have to build a relation between the added concept and pre existing ones We are asked to select some keywords the system automatically suggest a set of rele vant keywords deriving them from the inserted gloss or insert new keywords and make a search among existing synsets In our case we want to relate hotel with the already existing synset structures collectively in which peo ple are housed identified also by the set of lemmas living_accommodations housing lodging and create a new hypernym relationship Fig 2 28 That means into the WordNet database will be added two symmetric relationships e hotel is a hyponym of living_accommodations housing lodging e living_accommodations housing lodging is a hypernym of hotel 39 Hypernym Graph for hotel Searchi Fig 2 27 Hypernym Graph Viewer hotel TC O Hypernym Graph for hotel Search hotel X You can view for each synset the gloss and the hypenym relationships and you can focus on a particular Fig 2 28 Hypernym Graph Viewer New Synset for hotel 40 2 8 Semantic Relationships Extraction Starting from the annotated local schemas MOMIS builds a Common The saurus that describes intra and inter schema knowledge in the form of the following semantic relationships synonyms SYN broader terms narrower terms BT NT and related terms RT e SYN ASYNB means that A and B are synonyms SY
40. ered Al FK maps url REFERENCES hotels url gt venere maps NT venere hotels Lexicon derived relationships also called Lexical relationships inter schema lexical relationships derived by the annotated terms and Word Net interaction As described in the previous section WordNet connects the synsets through a wide network of semantic relationships Actually it is possible to exploit the following semantic relationships e Synonymy terms that share the same meaning i e belong to the same synset e Hypernymy Hyponymy Generalization relation Specialization relation Hypernym and Hyponym are opposite relations e Meronymy Holonymy is the partwhole or HASA relation Meronymy and Holonymy are opposite relations e Correlation is a relation between two terms in two synsets that shares the same father in the hypernymy graph The relations coming from WordNet are added to the Common Thesaurus according to the following mapping e Synonymy corresponds to a SYN relationship e g guidacampeggi campings zip SYNvenere hotels postal code e Hypernymy Hyponymy correspond respectively to a BT NT relation ship e g venere hotels hotel name NTquidacampeggi campings name e Meronymy Holonomy correspond to a RT relationship e Correlation corresponds to a RT relationship e g venere hotels hotel name RTvenere facility facility_name Designer supplied relationships specific domain knowledge capture The designer can suppl
41. et Database fax_number is a Hyponym of telephone_number telephone_number phone phone_number number number number is a Hypernym of fax_number lt Back Not 11 Ce new synset 5 click on finish following the next steps For associating a new or existing lemma with a new synset you are asked to follow the steps shown in Fig 2 24 1 2 insert the lemma choose the lemma s syntactic category check Insert a new synset click on Next insert the gloss for the new synset 36 Then you will be able to annotate the term by using the new association lemma synset just inserted into WordNet If at step 4 you do not find the proper synset for your new lemma you can add a new synset simply by 6 click on search hypernym synset a new window will appear for help ing you in finding the more appropriate hypernym synset in order to accurately place the new synset in the WordNet hierarchy 7 choose one or more keywords among the ones proposed by the system or insert other keywords and search among glosses or lemmas then click on search a list of synsets will appear 8 select the hypernym synset and click on View Hypernym Chart for having a look at its collocation in the WordNet hierarchy in order to check if it is the more appropriate hypernym for the new synset 9 click on OK 10 the system will notice the new relationships that will
42. f the same object in different sources also known as record linkage duplicate detection reference reconciliation and many oth ers In the MOMIS system Object Identification is performed by Join Conditions among local classes belonging to the GC this join condition is 49 specified on the transformed local attributes More precisely Object Identi fication requires to specify a set of global attributes called Join Attributes such that for each JoinAttribute the element JoinAttribute LocalClass is not null for each LocalClass belonging to the GC then Join Conditions are automatically defined as follows Let JA1 JAk be the join attributes specified by the integration de signer Given the element JAi LC let TF JA LC be the corresponding Transformation Function For each pair of local classes LC1 LC2 belonging to the GC the Join Condition between LC1 and LC2 denoted by JC LC1 LC2 is defined as follows TF JA1 LC1 TF JA1 LC2 AND AND TF JAk LC1 TF JAk LC2 As an example for the GC structure and the local classes LC1 quidacampeggi campings LC2 saperviaggiare hotel LC3 venere hotels in the mapping table we have respectively e name LC1 name with Identity as TF e name LC2 name with Identity as TF e name LC3 hotel name Identity as TF e city LC1 city with Identity as TF e city LC2 city with Identity as TF e city LC3 city with Identity as TF If the GA name is chosen as Join Attribute
43. for Resolution Function e JF stands for Join Function e Label is the name of an attribute or a class e Sense is a meaning of a label 10 Chapter 2 The MOMIS System 2 1 Launching MOMIS MOMIS is a Java based application and as such a Java Runtime Environment JRE is required in order to run it It is developed as an Eclipse Rich Client Platform RCP application so no installation is required To tun MOMIS complete the following steps e Download the right zip folder corresponding to your operating system e Unzip the MOMIS zip file and in the folder double click the executable file e A License window displays Read and accept the terms of the license agreement to continue That is all now you can start integrating sources and querying them http wiki eclipse org Rich_Client_Platform 2You can dowload the zip folder from http www datariver it gt Download menu 3The License displays only the first time you run MOMIS you will be asked to accept the MOMIS License GPL Version 2 if you don t accept the license the application will not start and will exit 11 2 2 Getting Started with MOMIS The MOMIS Main Window MOMIS starts presenting a Welcome Page Fig 2 1 in which you can find some information about Data Integration and the MOMIS Application If you click on MOMIS Demo Project the Welcome Page will close and the project used in this user guide will load in the workspace a Y MOM
44. geneous attribute no resolution function defined the constraint will be pushed at the local sources Thus an atomic constraint GA op value will be rewritten on the local class L as follows MT F GA L op value if MT GA L is not null and the op operator is supported by L and the data trans formation function MTF is supported by L the resolution function f is supported by L true else Atomic constraints of the kind GA op GA2 will be rewritten in a similar way 82 Step 2 Generation of the mapping query qa The LQs partial results will be merged together by means of the full outerjoin merge operator Step 3 Generation of the final query The final query performs the application of resolution functions and residual clauses e for Homogeneous Attributes no conflict on data values the system can consider one of the values without preference e for non Homogeneous Attributes the system have to apply the associ ated Resolution Function C 2 Multiple Class Queries Given the global classes GC GC GC we consider a Global Query Q Q select lt Q_select list gt from G1 G2 Gn where lt Q_condition gt order by lt order_by_list gt where lt Q_condition gt is a Boolean expression of positive atomic constraints Gi GA1 op value or Gi GA1 op Gj GA2 where GA1 and GA2 are global attributes The query unfolding is performed in two steps In the first step with standard rewriting rules the lt Q_co
45. gory telephone noun O verb adjective adverb Lemma telephone in the selected syntactic category already i exists into the WordNet DataBase View Hypernym Graph Synsets Search synsets by inserting similar lemmas or search synsets by gloss keywords or insert a new synset Insert one or more similar lemmas Insert one or more gloss keywords The search will be performed in OR The search will be performed in AND phone_number Search E Select one or more synsets from below Lemma s Gloss Extender la M J Found 1 synset s Insert a new synset A Fig 2 23 Steps to follow for associating a new or existing lemma with an existing synset are telecommunication telecom and electronic_equipment Fig 2 22 By double clicking on a synset you will see its whole hierarchy The wordnet Extender usage is quite simple for associating a new or existing lemma with an existing synset you are asked to follow the steps shown in fig 2 23 1 insert the new lemma 2 choose the lemma s syntactic category 3 search the synset by inserting similar lemmas or gloss keywords to make a search among synset glosses 4 select the synset s you want the lemma to be associated with 12The synset associated to entity is the top parent of the hierarchy for synsets of the noun syntactic category 35 Fig 2 24 Steps to follow for associating
46. i e JA name then e JC LC1 LC2 LC1 name LC2 name e JC LC1 LC3 LC1 name LC3 hotel name e JC LC2 LC3 LC2 name LC3 hotel name 50 If name and city are chosen as Join Attributes i e JAl name and JA2 city then e JC LC1 LC2 LCl name LC2 name AND LCl city LC2 city e JC LC1 LC3 LC1 name LC3 hotel name AND LC1 city LC3 city e JC LC2 LC3 LC2 name LC3 hotel name AND LC2 city LC3 city JC LC1 LC2 LC3 LC1 name LC2 name AND LC1 city LC2 city On the basis of these Join Conditions multiple records coming from dif ferent local classes and representing the same real world object are combined in a single record by means of a full outer join operator For example assuming the last case of join conditions the view for the GC structure is automatically obtained as LC1 full outer join LC2 on LCl name LC2 name AND LC1 city LC2 city full outer join LC3 on LC1 name LC3 hotel name AND LC1 city LC3 city OR LC2 name LC3 hotel name AND LC2 city LC3 city Note if you want to read more about Join Functions see Appendix B 2 9 2 Transformation Functions Editor When a GC is selected the corresponding Mapping Table is shown in the lower panel of the window By right clicking on a non empty element Glob alAttribute LocalClass or on a local attribute in the Global Classes Tree and choosing Edit Transformation Function a new window appears Fig 2 36 The Transform
47. i string true facility url REFERENCES hotels url 2 facility_name ss ic ni nt Pid Pur E hotels El address city El hotel name id El logo El postal code E price Bag E reviews number Overview Local Sources Annotation Semantic Relationships Mapping Refinement Fig 2 14 Local Sources Details 27 2 7 Local Sources Annotation LOCAL CLASSES NAMES LOCAL ATTRIBUTES NAMES WordN Lexical database http wordnet princeton ed Fig 2 15 Annotation with WordNet In the first step we selected the local sources that we want to integrate Fig 2 14 In this step we want to express the meaning of classes and at tributes therefore the annotation procces must be performed What is the annotation It is a mapping of a given term class or attribute name to a well defined set of concepts of a lexical ontology The annotation process consists in associating to each term one or more meanings w r t a common lexical reference For example the attribute name will be annotated with the meaning a language unit by which a person or thing is known In the cur rent version of MOMIS we adopt as lexical reference the WordNet database WordNet is a thesaurus for the English language that groups words called lemmas in the WordNet terminology into sets of synonyms called synsets provides short definitions called gloss and connects the synsets through a wide network of
48. ic Relationships Mapping Refinement Fig 2 30 Semantic Relationships Editor To compute the Structural and the Lexical relationships press the Com pute Structural and Lexical Rels button see 1 in Fig 2 30 To supply new relationships press the Add button see 2 in Fig 2 30 A new dia log will be open and it allows you to add one ore more relationships among attributes classes of different sources Fig 2 31 If a relationship is meaning less you can delete it from the Common Thesaurus select the relationship and then press the Delete button see 3 in Fig 2 30 Then to infer new relationshpis starting from the ones present in the Common Thesaurus press the Compute Inferred Rels button see 4 in Fig 2 30 SS 16The ODB Tools is a fully modular software for schema validation and query optimiza tion in OODB For more information see http www dbgroup unimo it ODB Tools html 43 r V Add Thesaurus Relationship ON sca s Please add a new relationship using the following fields Source Destination ib TourismGlobalSchema 4 il TourismGlobalSchema 3 guidacampeggi gt Li guidacampeggi saperviaggiare saperviaggiare 3 venere 4 venere La facility 3 facility E hotels Rel Type E hotels La maps RT 4 3 maps E city_map p city_map detailed_map E detailed_map E id El id E location El location P ur un E
49. ing Table 70 Appendix A Data Transformation Functions and Join Function Theoretical Background Data Transformation Functions For each not null element MT GA L an element MT GA L represents the set of local attributes of L which are mapped onto the global attribute GA we define a Data Transformation Function denoted by MTF GA L which represents how the local attributes of L are mapped into the global attribute GA MTF GA L is a function that must be executable supported by thse local source of the local class L For relational sources MTF GA L can be also a SQL value expression As an example let s take the global class structure the global attribute phone_number and the local class guidacampeggi campings In the Mapping Table we can notice that MT phone_number guidacampeggi campings telephone winter_contact One transformation function that can be applied is the String Concate nation telephone Winter Phone Number winter_contact Join Functions MOMIS follows a Global As View GAV approach for the definition of map pings between the GSand local schemas the G Sis expressed in terms of the 71 local schemas This means that for each Global Class GC a view VC over the Local Classes of GC must be defined One of the main innovation of the MOMIS system is that the view associ ated to GC is automatically defined by the system i e this view don t need to be explicitly defined by the int
50. ish root form of a word only the root form is stored in the database unless it has irregular inflected forms 10The meaning is a definition that represents the sense or the significance of a word i e a dictionary definition of a particular word 29 fi Class Attribute not annotated and no base form corresponding to the element name has been found in WordNet Class Attribute not annotated but the corresponding base form has been found in WordNet Class Attribute annotated w r t WordNet Database Es Class Attribute annotated using a new lemma synset association provided by the user through the WordNet Extender Tool Fig 2 17 Annotation Icons terms but in this case there is a loss of semantics This means that the system will be able to discover less Lexical Relationships and this impacts on the quality of the proposed global source 2 7 1 Manual Annotation As we described in the previous section the manual annotation is a two steps process 1 Base form choice 2 Meaning s choice This process must be repeated for each term you want to annotate In WordNet lemmas and synsets are classified into four syntactic cate gories nouns verbs adjectives and adverbs Each syntactic category is iden tified by the symbols shown in Fig 2 18 nouns Y verbs I adjectives if adverbs L1 Fig 2 18 WordNet Icons Let us see how to annotate class and attribute names Click on the tree node Fig 2 16 that repre
51. ject gt New Local Source and a wizard that will guide you in the local source schema acquisition process will start Choose from the list the local source type Fig 2 6 and enter a name in the text field the name entered should be unique within a project Depending on the source type selected the next wizard page can be different Let us see how to extract the schema of a Microsoft SQL Server database a Microsoft Excel file and a MySQL Database through Web Service 2 4 1 Microsoft SQL Server database Schema Extrac tion In the first wizard page select Microsoft SQL Server database as source type and enter a name E Y Local Source Schema Extraction Local Source Schema Extraction Insert the required parameters in order to connect to the new local source lt MS SQLServer Database Host Port localhost 1433 Username Password username 00000008 Connect Database Connection String Fig 2 7 Microsoft SQL Server Connection Parameters The type of sources supported in this version are MySQL Microsoft SQL Server Oracle DB2 PostgreSQL JDBC Sources JDBC ODBC Sources Microsoft Excel File CSV File Web Service For each data source a wrapper component is implemented 18 Local Source Schema Acquisition Click Finish to add the schema of the new source with the selected table s and attribute s in the project D Tables Views and Attributes 4 facility i id facili
52. k OK Fig 2 12 The name of the global schema must be unique within a project The new GS displays in the Global Schema Explorer view The system automatically loads the just created GS in the Global Schema Designer editor Fig 2 13 r Y New Global Schema Enter the Global Schema name NewGlobalSchemal Fig 2 12 New Global Schema dialog 25 Y MOMIS TourismProject prj Project Global Schema Help Ba ERRE 3 Source Explorer E NewGlobalSchema 23 D picape Global Schema Designer Overview campings 2 B facility saperviaggiare Local Sources In this section it is possible to select the desired sources for the integration project sit y Sources Annotation In this section it is possible to semantically annotate the selected sources edit section Semantic Relationships In this section it is possible to visualize and define inter schema and intra schema relations which are necessary for the clustering phase E NexGlobalSchema edit section TourismGlobalSchema Tags REGEN In this section it is possible to manually refine mappings automatically generated at the end of the integration project edit section Test Schema Execute queries on the global schema Launch Query Manager Overview Local Sources Annotation Semantic Relationships Mapping Refinement Fig 2 13 Gl
53. mple in fig 2 20 you may see that the base form hotel name is not present in WordNet so we decided to choose the base form name since we used name for annotating another element before Fig 2 19 a green arrow notices us which meanings have been previously choosen for that base form 3 insert the lemma hotel name into the WordNet database using the WordNet Extender Tool which will be explained in depth in the sub section 2 7 2 by clicking on the Open WordNet Extender button Since the manual annotation is time consuming MOMIS provides two particular tools for accelerating this process 1 The Automatic Annotation 2 The import of Source Annotations These tools will be described in details in the subsections 2 7 3 and 2 7 4 2 7 2 The WordNet Extender Tool The WordNet Extender 1 tool enables the extension of the lexical reference with domain glossaries In order to understand how to use the WordNet Extender it is important to get the sense of some concepts related to Word Net In WordNet english nouns verbs adjectives and adverbs are organized into sets of synonyms called synset each synset has a short definition called gloss One ore more base forms called lemmas are associated to each synset for example the gloss the number is used in calling a particular telephone is the definition of a synset associated to the lemma phone_number A word that has more than one sense is said to
54. ndition gt is unfolded w r t the global classes Gi of the query Q In this way we obtain the following rewriting Q select lt Q_select list gt from Q1 Q2 Qn where lt join_condition gt and lt residual_predicate gt order by lt order_by_list gt where e Qi is a Single Class Query Qi select lt Qi_select list gt from Gi where lt Qi_condition gt 83 where lt Qi_select list gt isthe union of the attributes in lt Q_select list gt in lt join_condition gt and in lt residual_predicate gt lt Qi_condition gt is a condition which can be solved w r t the global class Gi i e is a condition which uses global attributes of Gi e lt join_condition gt is a conjunction of constraints Qi GA1 op Qj GA2 e lt residual_predicate gt are the residual predicates In the second step each single class query is unfolded w r t local classes by taking into account the mappings M this step has been discussed in the previous section 84 Appendix D The OQL y query language syntax In the following we included the OQL 3 syntax accepted by the MOMIS Query Manager From the 1 2 version the aggregate functions has been added to the OQLyssyntax accepted by the MOMIS Query Manager lt asterisk gt lt comma gt lt period gt lt quote gt lt underscore gt _ lt equals operator gt lt not equals operator gt lt less than operator gt lt
55. ntegration result a user can pose queries on the GS by using the Query Manager tool see Fig 1 4 As MOMIS follows a GAV approach the query processing is performed by means of query unfolding The query unfolding process generates for each global query i e a query on the GS a Query Plan composed by a set of queries e a set of local queries that have to be executed on the local sources simultaneously by means of wrappers e a mapping query for merging the partial results defined by means of the join function e a final query to apply the resolution functions and residual clauses Moreover MOMIS provides the Query Manager Web Service which al lows to integrate MOMIS with other applications e g Business Intelligence solutions and a user friendly Web Application Fig 1 4 to guide an end user without experience on data integration solutions to easily compose and execute query on the integrated schema see Chapter 4 1 3 The WISDOM case study In this user guide we used an integration project that involves a set of local sources representing Hotels and Campings the data sources has been ex tracted from three popular Websites www venere com it www saperviaggiare it and www guidacampeggi com These sources have been used in the WISDOM project Web Intelligent Search based on DOMain ontologies www dbgroup unimo it wisdom for the development of a Tourism Vertical Web Portal Fig 1 5 Fig 1 5 The WIS
56. obal Schema Designer Overview 26 From here you can edit any of the following sections e Local Sources e Sources Annotation e Semantic Relationships e Mapping Refinement You have to complete each section in order to obtain a correct GS 2 6 Local Sources Selection In this section you have to select the sources you want to include in your GS by choosing them among the sources you acquired before Right click on a source from the Source Explorer view and click on Add selected source to the Global Schema for each selected source you can see more information in the Source Details ODLys Representation section as shown in Fig 2 14 Y MOMIS TourismProject prj So Project Global Schema Help G8 a ole Source Explorer 7 9 Tour da guidacampeggi Global Schema Designer Local Sources E campings ig facility Overview Overview Annotation saperviaggiare Local Sources Selection El hotel venere guidacampeggi ig facility ees El hotels rea B maps For each selected source you can have more information in the section below Remove Source Source Details ODLI3 Representation Source Name venere Type orgh2 Driver Server PortNumber Global Schema Explorer o Username sa _ NewGlobalSchema Connection String jdbch2testsourcesDb mapping_tourismivenere_d db MODE MySQL IGNORECASE TRUEuser sa L TourismGlobalSchema venere Attribute Name Type Primary Key Foreign Key la facility ur
57. or each GlobalAttribute such that there are more than one non empty elements GlobalAttribute LocalClass we must define a Resolution Function to obtain starting from the values computed by the Transformation Functions specified for GlobalAttribute LocalClass the corresponding value for GlobalAttribute To resume in the MOMIS system Data Fusion is performed by combining the SQL operator of full outer join with resolution functions From a theo retical point of view this operation is called full outer join merge operator Felix Naumann Johann Christoph Freytag and Ulf Leser Completeness of integrated information sources Inf Syst 29 7 583U615 2004 This data fusion operation is automatically defined by the MOMIS system the inte gration designer must only define Transformation and Resolution Functions and the Join Attributes On the other hand the integration designer may change the view auto matically associated to a GC by explicitly defining a specific Join Condition by right clicking on a GC choosing edit Join Function writing the function in the Join Function Panel and save it The default Join Function provided by the system is based on Join Attributes as defined before The choice of Join Attributes is fundamental to perform a correct Data Fusion process For more technical details on the Data Fusion process please see Section B In the following we list the conditions for a correct definition of Join attributes plea
58. phase we found as the most adequate annotation for the term guidacam peggi campings Base form camp and Meaning temporary lodgings in the country for travelers or vacationers while a problem arise from the source venere We didn t find satisfactory the meaning associated to hotel a building where travelers can pay for lodging and meals because it refers to 38 NewGlobalSchema 3 Y Import Annotation for quidacampeggi EESW l NewGlobalSchema 3 Global Schema Designer Annotation Select a Global Schema from the list below Global Schema Designer Annotation Bs ES Global Schema MW sources AZ Annotation about rotation from another Global Schema Ga a Fig 2 26 Annotation Importing hotel as a building and not as aservice From the Hypernym Chart we can no tice that the synset associated to hotel is a hyponym of the synset a structure that has a roof and walls and stands more or less permanently in one place identified also by the set of lemmas building edifice Fig 2 27 and camp is a hyponym of the synset structures collectively in which people are housed identified also by the set of lemmas living_accommodations housing lodging Through the WordNet Extender tool we created a new synset for the lemma hotel by introducing the gloss a lodging that provides accommodation meals and other services for travelers and oth
59. queries on it As reported in the previous section it follows a Global As View GAV approach for the definition of mappings between the GS and local schemas the GS is expressed in terms of the local schemas The GS generation process is composed by four main phases 4 Global Schema 3 Semantic Relationship Generation 1 Local Source Acquisition Extraction Global Classes wal 1 i 1 ia i 1 7 vi r r 1 1 i 4 Lexical Relationships Local Sources Local Schemas Structural Relationships Cluster Generation Semantic Relationships Designer S upplied ou Relationships Le free EHE Mapping E Tables 2 Local Source Annotation Automatic Manual Fig 1 2 The MOMIS Data Integration Process 1 Local Schema Acquisition Fig 1 2 1 The extraction of the Local Source Schema is performed by wrappers that logically extract the schema of each local source and convert it into the common language ODL3 2 Local Source Annotation Fig 1 2 2 the designer is asked to an notate the local sources i e to associate to each class and attribute name one or more meanings w r t a common lexical reference that in this case is the lexical database WordNet 7 WordNet is a the saurus for the English language that groups terms called lemmas in the WordNet terminology into sets of synonyms called synsets pro vides short definitions called gloss and connects the
60. rank is a string containing a number from 1 to 5 rating and user_rating instead are urls linking to an image containing the stars if we want rating and user_rating to look similar to rank we need to apply a substring function that extracts just the number of the stars from the whole url the stringtoint function for casting it to an integer value the sum and the division operator for calculating the average value Finally click on Save to save the inserted funcition The icon f for a non empty element of a Mapping Table means that a TF is defined for that element Note if you want to read more about Transformation Functions see Appendix A 94 2 9 3 Resolution Functions Another issue is how to obtain the GA value when it is mapped onto more Local Attributes in this case we need to define a Resolution Function Its application will permit to obtain starting from the values of the Local At tributes eventually transformed by a TF the value for the GA By right clicking on a GA and choosing edit Resolution Function a new window appears Fig 2 38 The interface is similar to the Transformation Function s one but the available functions are different e If function functionif condition true false e Coalesce function coalesce function1 function2 e String Concatenation Function functioni Cfunction2 Y Resolution Function about rating Function editor coalesce saperviaggia
61. re hotel rank strToInt substring venere hotels rating 40 1 A strTolnt substring S venere hotels user_rating 42 1 2 Function Transformation functions Local Attributes Help all strToInt substring S venere hotels ratil coalesce S f GA L1 S f GA L2 3 S saperviaggiare hotel rank The coalesce function returns the value of the o first of its input parameters that is not NULL IF function String Concatenation y saperviaggiare hotel rank E venere hotels rating venere hotels user_rating Fig 2 38 Resolution Function Interface 99 The If function allows you to impose a condition among attributes if the condition is true the function returns the first value else returns the second one For example FUNCTIONIF state lt gt Italy LC2 name LC1 name LC3 hotel_ name returns LC2 name if state value is different from Italy otherwise returns LC name LC3 hotel name The Coalesce function returns the first not null value of a given list of Local Attributes transformed by the Transformation Function For example COALESCE LC2 name LC1 name LC3 hotel_ name returns LC2 name if it s not null if null it returns LC1 name if it s not null otherwise returns LC3 hotel_name The String concatenation function returns concatenation of the values computed by the Transformation Functions applied to the local attributes the String con
62. red As said before MOMIS follows a Global As View GAV approach for the definition of mappings between the GVV and local schemata this means 75 that for the global class C a view VC over the local classes LL1 LL2 and LL3 must be defined this view is defined with the FULL OUTER JOIN MERG OPERATOR in the following we will show an SQL implementation of this operator First of all if K1 is defined as Join Attribute the join condition among the local classes will be performed on the basis of the corresponding K1 attributes of the local classes as follows select from LL1 full outer join LL2 on LL1 K1 LL2 K1 full outer join LL3 on LL3 K1 LL2 K1 OR LL3 K1 LL1 K1 It can be demonstrated that the order of local classes in the full outer join operation is not relevant i e the same result will be obtained with the following expression select from LES full outer join L 2 on LL3 K1 LL2 K1 full outer join LL1 on LL1 K1 LL2 K1 OR LL1 K1 LL3 K1 Then in the following we will use the first expression given above To complete the definition of the view associated to the global class we need to define its K1 and A values The K1 value is one of the not null values among LL1 K1 LL2 K1 and LL3 K1 then it can be computed as isnull isnull LL1 K1 LL2 K1 LL3 K1 the order of LL1 K1 LL2 K1 and LL3 K1 is not relevant The A value depends on the chosen resolution function as an example let us consider the coalesce
63. rence gt lt order by clause gt ORDER BY lt sort specification list gt lt sort specification list gt lt sort specification gt lt comma gt lt sort specification gt lt sort specification gt lt sort key gt lt ordering specification gt lt sort key gt lt attribute reference gt lt ordering specification gt ASC DESC lt group by clause gt GROUP BY lt grouping attribute reference list gt 87 lt grouping attribute reference list gt lt grouping attribute reference gt lt comma gt lt grouping attribute reference gt lt grouping attribute reference gt lt attribute reference gt lt having clause gt HAVING lt search condition gt 88 Bibliography 1 R Benassi S Bergamaschi A Fergnani and D Miselli Extending a lex icon ontology for intelligent information integration In R L de Mantaras and L Saitta editors ECAI pages 278 282 IOS Press 2004 D Beneventano S Bergamaschi F Guerra and M Vincini Synthesizing an Integrated Ontology IEEE Internet Computing Journal pages 42 51 Sep Oct 2003 S Bergamaschi D Beneventano A Corni E Kazazi M Orsini L Po and S Sorrentino Open Source release of the MOMIS Data Integration System In G Mecca and S Greco editors Proc of the Nineteenth Italian Symposium on Advanced Database Systems SEBD 26 29 June 2011 Maratea Italy pages 175 186 2011 S Bergamaschi
64. resolution function A coalesce LL1 A LL2 A LL3 A Then the view associated to the global class is select isnull isnull LL1 K1 LL2 K1 LL3 K1 AS K1 coalesce LL1 A LL2 A LL3 A AS A from LL1 full outer join LED on LL1 K1 LL2 K1 full outer join LL3 on LL3 K1 LL2 K1 OR LL3 K1 LL1 K1 In other words at the query select from C on the global class C will correspond the above query on its local classes This is true when the join attribute K1 is a key in each local class 76 To give an example of the result of this view let us consider the following instances of local classes LL1 LL2 and LL3 respectively 123 A1 123 12 A1 12 1 ALi A1 13 12 2 12 A2 12 123 A2_123 23 A2 23 A3_3 123 A3_123 13 A3_13 23 143 23 Please notice that the value 123 for K1 means that a record with this value for K1 is present in local classes LL1 LL2 and LL3 value 12 means that the record is present in local classes LL1 and LL2 and so on The result is as follows select isnull isnull LL1 K1 LL2 K1 LL3 K1 as Ki coalesce LL1 A LL2 A LL3 AS A FROM LL1 full join LL2 on LL1 K1 LL2 K1 full join LL3 on LL1 K1 LL3 K1 or LL2 Ki LL3 K1 We can observe that for each K1 value in the local classes there is only TT a record with this K1 value in the result and the A value is computed on the basis of the resolution function To highlight the importance of choosing a key as Join A
65. ributes Mapped Local Classes Operators structure a guidacampeggi camp AND E dica saperviaggiare hotel OR Y venere hotels gt E saperviaggiare hotel address 5 lt E venere hotels address Il city E saperviaggiare hotel city E guidacampeggi campings city 4 mile gt E venere hotels city B email sn Us E saperviaggiare hotel email fall outer join i i left outer join E guidacampeggi campings emz right outer join A fax inner join E saperviaggiare hotel fax_numb MR A RE TESS ER m 4 Global Data Preview __ Save j Close L na Fig 2 35 Join Function MOMIS follows a Global As View GAV approach for the definition of mappings between the GS and local schemata the GS is expressed in terms of the local schemata This means that for each Global Class C a view VC over the Local Classes of C must be defined One of the main innovation of the MOMIS system is that the view asso ciated to a GC may be automatically composed by the system i e this view need not be explicitly defined by the integration designer The automatic composition of the view is based on the following assump tion a GC performs Data Fusion among its local class instances where Data Fusion is the process of fusing multiple records representing the same real world object into a single and consistent representation The first step in a Data Fusion process is Object Identification i e to identify instantiation o
66. rogeneous and dis tributed sources with the MOMIS Data Integration System The User Guide is organized as follows Section 1 presents the MOMIS system by describing its architecture and and by identifying the main phases of the data integration process in Section 2 the MOMIS GUI and the Global Schema generation process are explained with images and examples Section 3 explains how to query the obtained Global Schema and finally Section 4 explains in details how to use the MOMIS Query Manager Web Interface If you are looking for a quicker MOMIS guide have a look at the MOMIS Video Tutorials available on the DataRiver YouTube channel http www youtube com user DataRiverSrl or on the Datariver Web Site http www datariver it You can find all MOMIS publications on the DBGroup publications page http www dbgroup unimo it site2012 index php publications Chapter 1 The MOMIS Data Integration System 1 1 MOMIS Overview Data integration is the problem of combining data residing at different sources and providing the user with a unified view of these data 6 The MOMIS Data Integration System Mediator envirOnment for Multi ple Information Sources is a framework able to integrate data coming from heterogeneous and distributed data sources structured and semi structured in a semi automatic way to bring out new information from apparently unre lated existing data An object oriented language with an underlying Descrip tion
67. s panel where you can select them 3 The attributes you choose will be written in the query editor together with the class Note If you want to perform a two way JOIN between classes you have to click on the first class then click on a class referenced an alert message will ask you if you want to join the classes after clicking the yes button you will see the attributes of both classes in the Class attributes panel in two different tabs and the join condition will be automatically written in the query editor In the GlobalSources Tree the node referred to the second class will be expanded and the reference to the first class will be enlighten 4 After choosing the attributes you can add and or conditions by clicking the add condition button Fig 4 2 Conditions may concern any attribute of the classes involved even the ones you didn t select 5 If you want the query result to be ordered by a specific attribute set you may also add sorting options Fig 4 3 by clicking the corresponding button Sorting options may concern only the attributes included in the query Condition x and Y structure ciy vlike__ v SUUCLUTEN A structure address Structure name add structure locality structure city Fig 4 2 Add Condition panel Sorting Options x structure name Y Ascendent Y add Fig 4 3 Add Sorting Options panel 6 At any step of this sequence the query is automatically obtain
68. s see Fig 3 9 62 8 HOTEL HOTEL ADDRESS HOTEL NA HOTEL CITY HOTEL FAX_ NUMBER HC Via De Fusari 9 BOLOGNA 051227179 Via Oberdan 12 BOLOGNA 051262679 Via Ferrarese 161 BOLOGNA 051372960 Via Aurelio Saffi 36 BOLOGNA 0516492426 Via S Donato 161 1 BOLOGNA 0516332366 Via Indipendenza 47 TREVECCHI BOLOGNA 051224143 Via Magenta 8 10 CITY HOT BOLOGNA 051372032 Via D Pignattari 11 COMMER BOLOGNA 051224733 Via dell Indipende DONATEL BOLOGNA 051248174 Via IV Novembre 10 OROLOGIO BOLOGNA 051260552 Via Aurelio Saffi 36 ELITE BOLOGNA 0516492426 Via Mazzini 45 BLUMEN BOLOGNA 051345439 Via Luigi Serra 7 IL GUERCI BOLOGNA 051369893 Via Marco Emilio L AMADEUS BOLOGNA 051405933 Via Indipendenza 8 GRAND H BOLOGNA 051234840 Via Montebello 8 ROYAL H BOLOGNA 051249724 Via P Pietramellara SOFITEL BOLOGNA 051249421 Via Alessandro Sto CORTICEL BOLOGNA 051324702 Via C Boldrini 4 EUROPA BOLOGNA 051247988 Min Mew at 97 caicnn DAIL ACMA nrininnge m 2 0 W UYU WU WU in a Yu a YU Y W WU Y WU UU a fe Fig 3 8 Data Preview Mapping Table structure structure globalSource campings guidacampeggi hotel saperviaggiare hotels venere a Fig 3 9 Mapping Table of structure global class 63 LQ1 guidacampeggi campings SELECT campings name campings city campings fax FROM campings WHERE city Bologna LQ2 venere hotels SEL
69. s of the Project Generation Process are 1 Local Source Schema Extraction Section 2 Global Schema Generation Section 2 5 Once completed the integration process it is possible to pose query on the obtained Global Schema by using e the Query Manager Interface e the MOMIS QM Web Interface Global Schema Generation Sources Annotation N Project Local Source New Global ew Frojec Schema Extraction Schema Semantic Relationships Mapping Refinement Fig 2 4 Project Generation Process 2 3 How to Create a new Project Click Project gt New Project and in the displayed dialog box Fig 2 5 enter a name for your project click OK The system will open the Local Source Schema Acquisition wizard see Section 2 4 2 4 Local Source Schema Extraction The extraction of the local source schema is performed by wrappers A wrap per is a software component that logically converts the source data structure 16 Enter the Project name Fig 2 5 New Project dialog Local Source Schema Extraction This wizard will guide you through the local source schema extraction process Please select the type of local source you want to connect to Oracle database PostgreSQL database JDBC source JDBC ODBC source Microsoft Office Excel file CSV file Fig 2 6 Local Source Type Selection 17 into the ODLysdata language Let s see how to extract the local source schema Click Pro
70. se note that in this version of the software if a condition 73 is not satisfied no errors and or no warnings will be displayed Condition 1 For a GC you need to select Join Attributes or to defined explicitly join conditions If no JoinAttribute is defined for a GC and no Join Condition is explic itly defined see next section then no join condition is defined for the full outer join operation that then corresponds to a Cartesian Product As a consequence no fusion is performed Condition 2 For each selected JoinAttribute if for the corresponding element JoinAttribute LocalClass is null then all Join Condition related to this LC are considered true Condition 3 For each selected JoinAttribute no Resolution Function needs to be defined Condition 4 A more subtle condition which need to be satisfied by the Join Attribute is that the corresponding local attributes must be a key in all local classes see Section B Warning 1 For a GC you need to select Join Attributes or to define explicitly join conditions If no JoinAttribute is defined for a GC and no Join Condition is explicitly defined then no join condition is defined for the full outer join operation and thus no fusion is performed Warning 2 For each selected JoinAttribute the element JoinAttribute LocalClass of the mapping table must be not null for each LC belonging to the GC If for the selected JoinAttribute the element JoinAttribute LocalClass is null th
71. sed by the system 13Synsets in WordNet are ranked in the order of their utilization frequency 37 used meaning STOP WORDS temporary lodgings REMOVAL AND x STEMMING in the country for travelers or campings vacationers camping camp temporary living quarters specially BASE FORM ee for soldiers Fig 2 25 Automatic Annotation Process Notice that the automatic annotation will not try to annotate the ele ments you have preaviously manually annotated so you will not lose your work 2 7 4 Annotation Importer The reuse of previous annotations is an important feature For this rea son a tool for easily importing source annotations from a GS to another GS has been developed If one of the sources has been annotated already in another global schema you can import the source annotation by right clicking on the source name select Import source annotation from another global schema and selecting the schema from which you want to import the annotations Fig 2 26 For example we create a new global schema named NewGlobalSchema and we want to import the annotation of the local source guidacampeggi so we right click on the source name and choose Import Source Annotation from another Global Schema choose TourismGlobalSchema and you will see that all the annotations for that source have been imported An example Let s see how all these tools are useful with an example In the annotation
72. sents the attribute class that you want to annotate and click on Add annotation button a new window will be opened If the term exists in WordNet or in the domain glossary then the base form is automatically proposed by the system If you don t find it satisfactory you can enter another one Then click on Add base form button and choose among the possible meanings of the base form Fig 2 19 You can select one or more meanings If the term does not exists in WordNet or in the domain glossary then the system shows a message ATTENTION lemma doesn t exist into WordNet database in this case you have three possible choices 30 Vimeo mese Select one base form from the list or add a new one Add base form _ Remove base form Open WordNet Extender Select one or more senses BI a language unit by which a person or thing is known his name really is George Wash N a person s reputation he wanted to protect his good name N family based on male descent he had no sons and there was no one to carry on his r N a well known or notable person they studied all the great names in the history of Fra N by the sanction or authority of halt in the name of the law N a defamatory or abusive word or phrase Y assign a specified usually proper proper name to They named their son David Th y nive the name or identifvina chararteristire nf refer ta hv name nr came nther identifvi wi
73. sters the Global Classes will be created and loaded in the Global Sources tree of the Mapping Refinement section Fig 2 34 Depending on which configuration you chose the system will build different clusters It s your job to decide which configuration is better for your sources integration process Clustering Settings Presets L Default Preset 1 Preset 2 o 100 80 50 50 50 50 50 Restore Generate Clusters Fig 2 32 Global Classes Generation In Fig 2 33 we report the icons used by the GUI and their intended meanings In this section you can also see the Mapping Table generated by the System for each Global Class whose columns represent the local classes be longing to the GC and whose rows represent the global attributes of the GC The element GlobalAttribute LocalClass of the Mapping Table represents 45 Local Source O Global Source Local Class Global Class l Local Attribute El Global Attribute J Local Attribute on which a Transformation Function has to be defined F Local Attribute on which a Transformation Function has been defined Join Attribute Global Class on which the join attributes have been defined Fig 2 33 Global Schema Editor Icons Legend the set of local attributes of LC which are mapped into the GlobalAttribute in this way a GlobalAttribute may correspond to one o more local attributes of a LC the element GlobalAttribute LocalClass is empty if no LA of LC is
74. t FROM lt global interface reference gt lt comma gt lt global interface reference gt lt global interface reference gt lt global interface name gt lt correlation specification gt lt correlation specification gt AS lt correlation name gt lt where clause gt WHERE lt search condition gt lt search condition gt lt boolean term gt lt search condition gt OR lt boolean term gt lt boolean term gt lt boolean factor gt lt boolean term gt AND lt boolean factor gt 86 lt boolean factor gt lt boolean primary gt lt boolean primary gt lt predicate gt lt left paren gt lt search condition gt lt right paren gt lt predicate gt lt comparison predicate gt lt like predicate gt lt null predicate gt lt join predicate gt lt comparison predicate gt lt attribute reference gt lt comp op gt lt value expression gt lt comp op gt lt equals operator gt lt not equals operator gt lt less than operator gt lt greater than operator gt lt less than or equals operator gt lt greater than or equals operator gt lt null predicate gt NOT IS NULL lt attribute reference gt lt like predicate gt lt attribute reference gt LIKE lt pattern gt lt pattern gt lt quote gt lt value expression gt lt quote gt lt join predicate gt lt attribute reference gt lt comp op gt lt attribute refe
75. tation Semantic Relationships Mapping Refinement po Fig 2 34 Mapping Refinement Panel 47 Global Attributes blue icon and by opening each node the local attributes that are mapped on it yellow icon If you right click on a GA and choose Remove Global Attribute the at tribute will be removed and the previously local mapped attributes are moved in the Unmapped Elements panel on the right Attributes from the Un mapped elements panel can be mapped into a GA moving them on it using Drag amp Drop 48 2 9 1 Join Functions From the Mapping Refinement interface you can impose join conditions be tween local classes mapped on a GC by right clicking on a GC choosing edit Join Function writing the function in the Join Function Panel and save it The system can provide a default Join Function as shown in Fig 2 35 n al Y Join Function about structure Join Function saperviaggiare hotel full outer join venere hotels on venere hotels hotel_name a saperviaggiare hotel name AND venere hotels city saperviaggiare hotel city full outer join guidacampeggi campings on guidacampeggi campings name saperviaggiare hotel name AND guidacampeggi campings city saperviaggiare hotel city OR guidacampeggi campings name venere hotels hotel_ name AND guidacampeggi campings city venere hotels city Get Default Join Function Global Attributes and Mapped Local Att
76. the size of the integration project Another drawback is due to the fact that designers can see the global result of the integration only at the end of the overall integration process and it is only at that time that they can refine mappings in order to improve the integrated schema To overcome these problems a first result of integration is semi automatically derived by MOMIS and proposed to the designer in few minutes she he can then improve this integration result through an iterative refinement process and a set of features Some of these features are listed below e a GUI that facilitates the integration process e a set of explore and preview tools that allow the designer to preview the integration result during each phase e the possibility to create different unified views to explore the global result of the data integration process e a suite of tools to semantically annotate data sources w r t a common lexical reference these tools allow the designer to import export the local source annotations and permit to extend the lexical reference itself with domain glossaries e a preview of the query plan that allows the designer to visualize for each executed global query the set of queries that compose the query plan 1 2 Data Integration Process and MOMIS Ar chitecture MOMIS builds a unified schema called Global Schema GS of several het erogeneous data sources also called local sources and allows users to for mulate
77. tion Semantic Relationships Mapping Refinement Fig 3 1 Launch the Query Manager The Query Manager interface lets you compose run and save queries It is composed by 57 e The global source panel see 1 in Fig 3 2 e The query text field see 2 in Fig 3 2 e The result panel see 3 in Fig 3 2 The global source tree helps you write the query by clicking the global source tree nodes To execute the inserted query click on Run Query button and you can see the result appear in the tabular panel see 3 in Fig 3 2 Y MOMIS TourismProject prj Project Global Schema 4 Help a Li et led Bal Ex Source Bla D I TourismGlobalschema QM TourismGlobalSchema 3 Query Manager E uri strin 1 je string mi B map string detailed_map string id string Query Result 3 records NAME ADDRESS JOLLY HOTEL LEONARDO DA VINCI Via dei Gracchi 324 JOLLY HOTEL MIDAS Via Aurelia 800 JOLLY HOTEL VITTORIO VENETO Corso Italia 1 Global Schema Explorer O E TourismGlobalSchema 063610138 0666418457 068841104 Fig 3 2 The Query Manager Interface In the MOMIS system a global query i e a query over the GS is expressed by using the OQL s language an extension of the ODMG OQL language Let us execute the global query reported below see 2 in Fig 3 2 SELECT name address phone_number fax FROM structure W
78. ttribute let us suppose that K1 is not a key in a local class say LL2 i e LL2 K1 A This means that inLL2 we can more than one record with the same value for K1 M2 a2 2 MM 12 12 12 la 123 a2 123 la 23 a2_23 ls 23 a2 23 BIS 6 123 a2 123 BIS Now the result of the view is as follows select isnull isnull LL1 K1 LL2 K1j LL3 K1 as K1 coalesce LL1 A LL2 A LL3 A AS a FROM LL1 full join LL2 on LL1 Ki LL2 K1 full join LL3 on LL1 K1 LL3 K1 or LL2 K1 LL3 K1 A3_3 123 A1 123 123 A1_123 LE 13 A1_ 13 23 A2 23 23 A2 23 BIS 2 az 2 12 a1_ 12 1 Ali We can observe that for each repeated value in LL2 K1 we have a re peated value in K1 of the global class and due to the resolution function 78 some of these repeated values see second and third records in the above result can have the same A value To illustrate several scenario and how to define Mapping Table Resolution Functions and Join Condition for these scenario Customer1 Code Address1 Code primary key Customer2 Code Address2 Code primary key Customer3 Code Address3 Code primary key 1 The Join Attribute Code is key in each local class a A unique address chosen among addresses from local sources on the basis of some criteria in this case Address local attributes are mapped onto a unique global attribute Address defined by an appropriate resolution function b A unique address containing all a
79. tuples referring to the same real world entity If two tuples satisfy a join condition imposed over the corresponding relations then the two tuples are assumed to be semantically equivalent If they differ on cor responding attributes attributes that are mapped to the same attribute in the global schema then a correct value is obtained by applying appropriate conflict resolution functions As an example let s take the global class structure and suppose that two local classes are mapped on it L venere hotels Lo saperviaggiare hotel If the global attribute name is chosen as Join Attribute i e JA name then 12 JC L1 Lo Li hotel name Lo name If the global attributes name and city are chosen as Join Attributes i e JA name and JAs city then JC L1 L2 L1 hotel name Lo name AND L city Lo city On the basis of these Join Conditions multiple records coming from dif ferent local classes and representing the same real world object are combined in a single record by means of a full outer join operator Resolution Functions The second step in a Data Fusion process is Data Reconciliation i e to solve conflicts among instantiations of the same object in different sources In the Data Fusion process considered in the MOMIS system conflicts may arise for global attributes mapped onto more than one LC Data Reconciliation is then performed by Resolution Functions in fact as explained in the previous section f
80. ty_name hotels url id city rating logo hotel_name user_rating am 983808808199 Fig 2 8 Tables and Attributes Selection 19 As you can see in Fig 2 7 you have to enter the correct parameters to establish the connection with the database server e Server Domain the server hostname e Port the port number e Username and Password Then by clicking the Connect button the Database list will be populated Choose the database name you want to connect to from the list Click the Next button to see the database tables and attributes Depend ing on the integration needs you can choose to acquire by selecting the tree nodes a subset of the source tables and attributes Fig 2 8 A Data Pre view functionality gives you access to the data stored in the different tables By selecting a table and clicking on Data Preview you are allowed to explore the first one hundred records of the table or of a single attribute and see the table total records number Fig 2 9 Yam a Origin ven hotels first 100 records SA Table records number 493 city rating hotel_name address price postal_ Bologna Corticella Via Ale Bologna Motel Marco Polo ww Via Ma Bologna w ideale w Via Elis Bologna we Millennhotel w Via Bol Bologna w Del Borgo we Via Ma Bologna we Kennedy Via Fos Bologna w arcoveggio we Via Lio Bologna we Grand Hotel Elite w
81. ution Functions durare de non d a pere 56 3 Querying the Global Schema 59 3 1 Query Manager Interface DA ee Ee biere Les 59 3 2 Query Saving and Loading 0033 5 2a PG iti ire 61 sro Query Plan Viewer do orta Lenta ei Be 62 Querying the Global Schema with the MOMIS QM Web In terface 67 4 1 Query Composition and Execution 67 4 2 Visualize Results in a Mapa See ey ne See Ed ee ds 69 a sig aloe II a en Da 71 LA lt Mapping table wes ee a Bae a E SES ld 71 Data Transformation Functions and Join Function Theoret ical Background 73 B Data Fusion Theoretical Background 79 C Query Unfolding Theoretical Background 85 C 1 Query Unfolding steps 85 G2 Multiple Class Queries Lujan gd Bee gd 87 The OQL query language syntax 89 List of Figures 1 1 1 2 1 3 1 4 1 5 2 1 2 2 2 3 2 4 2 5 2 6 2 7 2 8 2 9 2 10 2 11 2 12 2 13 2 14 2 15 2 16 2 17 2 18 2 19 2 20 2 21 2 22 2 23 2 24 The Data Integration Process 4 The MOMIS Data Integration Process 5 Mapping Refinement else ape aaa ea 6 The MOMIS architecture ss su oa Adee 4 Se ua T7 The WISDOM Project dataset so dr Lauda ra 9 MOMIS Welcome Papers pirlo pe aa a 12 MOMIS Main Window e Peet Eee a 13 M nu bar dir Oe oh Se aa ha aac aoi BO ia 14 Project Generation Process 16 New Project dialog co La ah A esate et GM eels ba 17 Local Source Type Selection t
82. where telephone telephone a document over the travelers can pay telephone line for lodging and meals an establishment tht provides lodging and usually meals and r services for travelers and other paying quests Fig 2 21 Using WordNet Extender for exending WordNet database Both nouns and verbs are organized into hierarchies defined by hypernym relationships By using the WordNet Extender you will be able to 1 extend WordNet by inserting a new association between an existing lemma and an existing synset Fig 2 21 1 for example we could as sociate the lemma telephone which is already present in WordNet with one of the synsets of the lemma phone_numbe 2 extend WordNet by inserting a new lemma and associate it with an existing synset Fig 2 21 2 for example we could create the new lemma winter_contact which is not present in WordNet and associate it with the synsets the number is used in calling a particular telephone This means that the synset associated to animal is a hypernym of the synset asso ciated to dog 33 3 extend WordNet by inserting a new lemma and a new synset for it Fig 2 21 3 for example we could create the lemma fax_number and associate it with a new synset with gloss the number is used to send a document over the telephone line 4 extend WordNet by inserting a new synset and associate it with an existing l
83. y new relationships This operation is important because if meaningless or incorrect relationshipis are inserted the subsequent inte gration process can produce a wrong global schema Inferred relationships MOMIS exploits a Description Logic reasoner 14A concept represented by the synset X is a meronym of a concept represented by the synset Y if in English we can assert X is a part of Y From the Hypernym Graph Viewer you can notice that hotel name and facility name share the same father name 42 ODB Tools f to infer new relationships by applying equivalence and sub sumption computation For example in the Common Thesaurus we have Structural relationship guidacampeggi facility RT guidacampeggi campings Lexical relationship guidacampeggi facility syn venere facility Thanks to ODB tools the system automatically infers a new relationship venere facility rt guidacampeggi campings In Fig 2 30 you can see the Semantic Relationships section Fes TT EE Project Global Schema Help CISL all 3 Source Explorer 1D TourismGlobalSchema 23 Global Schema Designer Semantic Relationships amp Annotation 4 Overview Mapping Compute Structural and Lexical Rels Compute Inferred Rels Ein _ TourismGlobalSchema 2 ada paete 3 Filter results by Producer Source Type Destination Annotation Semant
84. ym Synset Insert one or more similar lemmas The search will be performed in OR number Select a synset from below Found 25 synset s Search synsets by inserting similar lemmas or search synsets that contain in the gloss the inserted keywords 3 7 Lemma s figure number number Gloss the property possessed by a sum or total a concept of quantity involving zer a short theatrical performance b one the number is used in calling a particular tel a symbol used to represent a number he one of a series published periodically she a select company of people I hope to beco wnt a numeral or string of numerals that is used f wn a clothing measurement a number 13 shoe wn the grammatical category for the forms of no wn dui wo numeral number issue number number identification_number number number number a New Synset Click Finish to insert the new synset the new relationships and map the inserted synset on theinzerted lemma New Synset Lemma Name fax_number Synset Insert the gloss for the new synset Relationships the number is used to send a document over the telephone line Syntactic Category noun Search Hypernym Synset RelationshipType Lemma s is a Hyponym of telephone_number Gloss the number is used in calling a wn Exte Relationships that you are going to add into the WordN
85. zone_map Fl zone_map Add Rel Close Fig 2 31 User provided Relationships Interface The GUI also allows you to view filtered relationships by Producer Source Type SYN RT BT NT or Destination see 5 in Fig 2 30 Note each time you modify any annotation all the Semantic Relation ships have to be computed again It is not necessary to erase them all before computing 44 2 9 Mapping Table Creation and Refinement The first step of the generation of a GS is the automatic creation of clus ters classes describing the same or semantically related concepts in different sources are grouped together in clusters using hierarchical clustering tech niques for each cluster a global class with a set of global attributes and a mapping table expressing mappings between local and global attributes is defined After generating the clusters we can interactively refine and complete the proposed integration result By means of this interface we can set the parameters used by MOMIS to compute the clusters Fig 2 32 Starting from the left the first 3 parameters set the weight of the different relationships and the other 4 parameters set the threshold of affinity among local classes that have to be mapped together in the same global class depending on the local class names and structures You can set those parameters in order to obtain the more appropriate global schema If you click on the button Generate Clu

Download Pdf Manuals

image

Related Search

Related Contents

  Votre Avocat vous informe - Conseil national des barreaux  Flite 312 and 315 Flite 332 and 335 Flite 382  EZN Serie  Bosch PEX 300 AE  panasonic lumix dmc-sz1 User guide manual operating instructions  DataSite Accelerator Toolkit - Literature Library  InFocus LP930 User's Manual  E - Pecomark  Type 8605 - Bürkert Fluid Control Systems  

Copyright © All rights reserved.
Failed to retrieve file