The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows

Video Forum

Since its inception in 2007, DBpedia has been constantly releasing open data in RDF, extracted from various Wikimedia projects using a complex software system called the DBpedia Information Extraction Framework (DIEF).
Over the past 12 years, the software has received numerous extensions from the community, and in 2016 it produced releases with over 14.4 billion triples.
Due to the increase in size and complexity, the release process faced many delays, which reduced the agility of development.
In this paper, we describe the new DBpedia release cycle, including our redesigned release workflow, which allows development teams (in particular those who publish large open data) to implement agile, cost-efficient processes and scale up productivity.
To address these challenges, the DBpedia release workflow has been redesigned so that its primary focus is now on productivity and agility, while quality is assured by a comprehensive testing methodology.
We conduct an experimental evaluation and argue that the implemented measures increase agility and allow for cost-effective quality control and debugging, and thus achieve a higher level of maintainability.
As a result of our innovation, DBpedia now produces on average over 13.5 billion triples per month with a minimal publishing effort.

Interested in this talk?

Register for SEMANTiCS conference