PoolParty Semantic Integrator (https://www.poolparty.biz/) is a world-class semantic technology suite for organizing, enriching, and searching knowledge, which has been on the market for more than 10 years. UnifiedViews (http://unifiedviews.eu), available both as an open source tool and as part of the PoolParty Semantic Integrator product, has in recent years become a widely used and accepted solution for managing RDF data processing and integration tasks.
In my talk I will focus on data integration tasks in PoolParty Semantic Integrator. In the first part of my talk, I will briefly introduce UnifiedViews and its role in PoolParty Semantic Integrator, and I will present the different types of data integration tasks users need to carry out when integrating data. In the second part, I will describe selected data integration use cases that PoolParty Semantic Integrator currently supports. For each use case, I will describe (1) the goal of the use case, (2) what was achieved, and (3) lessons learned. In the last part of my talk, I will outline the next steps we plan regarding support for data integration tasks in the PoolParty Semantic Integrator suite.
PoolParty Semantic Integrator is a world-class semantic technology suite for organizing knowledge. It provides several components: PoolParty Thesaurus Manager for managing knowledge graphs, PoolParty Extractor for extracting terms and concepts from input documents, PoolParty GraphSearch for searching annotated documents, and UnifiedViews, an Extract-Transform-Load (ETL) tool for executing RDF data processing and integration tasks.
UnifiedViews allows users to define, execute, monitor, debug, schedule, and share RDF data processing and integration tasks. The data processing tasks may employ custom plugins (data processing units) created by users. UnifiedViews differs from other ETL frameworks by natively supporting management of RDF data/Linked Data.
In my talk I will focus on data integration tasks in PoolParty Semantic Integrator. Such tasks integrate external RDF and non-RDF data with existing (RDF) data available in the knowledge graph maintained by PoolParty Thesaurus Manager (the PoolParty knowledge graph). Data integration tasks may involve the following steps: (1) schema mapping, (2) entity linking, and (3) data fusion. Schema mapping ensures that RDF and non-RDF data sources are mapped to the target (RDF) schema expected by the user and maintained by PoolParty Thesaurus Manager. Entity linking enriches existing terms in the PoolParty knowledge graph with links to external RDF sources (whose schema may, but need not, already be aligned). Data fusion is needed when consolidating representations of terms in the PoolParty knowledge graph with those obtained from external RDF sources; such consolidation usually requires resolving data conflicts. For example, different sources of information about Austrian cities may report different populations for the city of Linz. All steps of the data integration process should be accompanied by quality assessment procedures, which ensure that the results are sufficiently precise.
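To make the data fusion step more concrete, the following is a minimal Python sketch of one possible conflict resolution strategy, trust-weighted voting. It is not PoolParty or UnifiedViews code and does not reflect how either product resolves conflicts. The `Claim` class, the `fuse_by_trust` function, the source names, the trust scores, and the population figures are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One source's statement about a property value (hypothetical model)."""
    source: str   # illustrative source name, not a real dataset
    value: int    # claimed value, e.g. a city's population
    trust: float  # assumed trust score for the source, in [0, 1]


def fuse_by_trust(claims):
    """Return (value, confidence) for the value with the highest summed trust.

    Confidence is the winning value's share of the total trust of all claims.
    """
    if not claims:
        raise ValueError("at least one claim is required")

    # Sum the trust of every source that supports each distinct value.
    scores = {}
    for claim in claims:
        scores[claim.value] = scores.get(claim.value, 0.0) + claim.trust

    total = sum(scores.values())
    if total <= 0:
        raise ValueError("total trust must be positive")

    best_value = max(scores, key=scores.get)
    return best_value, scores[best_value] / total


# Made-up figures for illustration only; these are not real statistics.
claims = [
    Claim("source_a", 200_000, 0.9),
    Claim("source_b", 205_000, 0.6),
    Claim("source_c", 200_000, 0.5),
]
fused_value, confidence = fuse_by_trust(claims)
print(fused_value, round(confidence, 2))
```

In this sketch, two sources agree on one value and together outweigh the third, so their value is selected. Other strategies, such as preferring the most recent value or keeping all conflicting values with provenance, may fit better depending on the use case.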
In the first part of my talk, I will introduce the different types of data integration tasks users need to carry out in PoolParty Semantic Integrator, and I will present the basic concepts of UnifiedViews and its role in PoolParty Semantic Integrator. In the second part, I will describe selected data integration use cases that PoolParty Semantic Integrator and UnifiedViews currently support. For each use case, I will describe (1) the goal of the use case, (2) what was achieved, and (3) lessons learned. In the last part of my talk, I will outline the next steps we plan regarding support for data integration tasks in the PoolParty Semantic Integrator suite.