Real-Time Data Linkage via Linked Data Event Streams

Last Updated on March 21, 2023 by Editorial Team

Author(s): Samuel Van Ackere

Originally published on Towards AI.

Seamless and efficient interlinking of a data stream across various domains and applications

Real-time interchanging data across domains and applications is challenging; data format incompatibility, latency and outdated data sets, quality issues, and lack of metadata and context.

Using Linked Data Event Stream (LDES), data can be fluently shared between different systems and organizations. In this way, companies and organizations can ensure that their data is well-structured, interoperable, and easily consumable by other systems and services. LDES has emerged as a standard for representing and sharing up-to-date data streams.

This proof of concept explores how LDES can interchange and combine three different data sets: address registers, parcels, and building units assuming different organizations publish them.

In this way, companies and organizations can ensure that their data is well-structured, interoperable, and easily consumable by other systems and services.

If you are not familiar with Linked Data Event Stream, you can find more information about it in this article:

Linked Data Event Streams explained in 8 minutes

What is a Linked data event stream, and how can it help you make your dynamic data accessible?

medium.com

Onboarding of a Linked Data Event Stream

Every data stream can finally be transformed into a Linked Data Event Stream with some conversions. In this example, we start from data streams outputted via Apache Kafka.

A data flow is configured in Apache Nifi that converts these Kafka topics to an LDES server. The LDES server is a configurable component used to ingest, store, transform, and (re-)publish a Linked Data Event Stream.

Publishing data of addresses, building units and parcels as LDES (Image by author)

In this case, data streams from parcels, building units, and data registry from Flanders are converted into an LDES to interchange and combine afterward. These LDES all have the same structure, are always in sync with the source data, and are presented in a machine-readable format, including metadata and relationships, enabling automated discovery and integration of information across various domains and applications. The figure underneath shows one part of the address registry LDES fragment. It embeds the context of the data and information about the city or municipality, street name, full address, and location.

Interlink multiple Linked Data Event Streams

The value of publishing data sets through an LDES specification lies in the easy interlinking of data between companies and organizations over the Web. With this capability, data consumers can consume various data sets using an LDES Client, enabling a streamlined approach to accessing diverse data sources.

For this purpose, the three Linked Data Event streams are stored in a GraphDB to facilitate efficient and effective data consumption.

GraphDB supports complex semantic queries and inference, making discovering meaningful relationships between different data sources possible.

In this Github repo, you will find a docker file with the configuration of GraphDB and Apache NiFi. A data flow is configured in Apache NiFi to convert these data streams into GraphDB. You will find an Apache NiFi configuration file containing the necessary data flow.

Once the Apache NiFi data flow is executed, the LDES members will flow into GraphDB. Subsequently, the data can be queried via SPARQL.

GraphDB is a powerful tool for storing and querying graph data. Still, it also offers a range of visualization possibilities that allow users to explore and understand their data in new and intuitive ways. One of the most popular visualization features in GraphDB is Graph Explorer, which provides an interactive view of a graph and allows users to navigate and search through it in real-time. This can be particularly useful for understanding complex relationships between entities, such as in social networks or biological systems.

SPARQL query

SPARQL is the standard query language and protocol for Linked Open Data on the web or for RDF triplestores. SPARQL enables the execution of semantic queries through data, offering a flexible and powerful method for searching, filtering, and retrieving structured information.

Below is an example of a SPARQL query is shown, which returns triples with information on parcels.

PREFIX gebouwregister: <https://basisregisters.vlaanderen.be/implementatiemodel/gebouwenregister#>
PREFIX adres: <https://data.vlaanderen.be/id/adres/>
PREFIX perceel: <https://data.vlaanderen.be/id/perceel/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX ns: <https://data.vlaanderen.be/ns/generiek#>

CONSTRUCT{

?perceel rdf:type ?type .
?perceel prov:generatedAtTime ?generatedAtTime .
?perceel ns:lokaleIdentificator ?lokaleIdentificator .
?perceel ns:naamruimte ?naamruimte .
?perceel ns:versieIdentificator ?versieIdentificator .

} where {
?perceel gebouwregister:Adresseerbaar%20Object adres:2327687 .
OPTIONAL { ?perceel rdf:type ?type .
?perceel prov:generatedAtTime ?generatedAtTime .
?perceel ns:lokaleIdentificator ?lokaleIdentificator .
?perceel ns:naamruimte ?naamruimte .
?perceel ns:versieIdentificator ?versieIdentificator .}
# ?perceel gebouwenregister:Perceel%3Astatus ?status .}
}

Combining multiple data streams via their semantics

The three data streams contain unique address IDs linked respectively to the parcel, building unit, and address registry data streams. Through the use of SPARQL queries, we are able to link these data streams semantically.

In this way, developing a REST API that retrieves an RDF containing details on addresses, parcels, and building units becomes possible. Using LDES as an intermediate data format, this API ensures that the output always remains synchronized with the source database.

Image by author

Full code

To replicate the whole proof of concept (LDES 2 GraphDB), please go to this Github repository. It describes how to set up the GraphDB and Apache NiFi via docker, after which the data flow can be started using the supplied Apache NiFi setup file.

GitHub – samuvack/ldes-grar

In this Proof of Concept, an Rest API is developed that makes it possible to harvest information of building units and…

github.com

If you like what you read, be sure to ❤️ it — as a writer, it means the world. Stay in touch by following me as an author.

Contributors to this article are ddvlanck (Dwight Van Lancker) (github.com), sandervd (Sander Van Dooren) (github.com) at Flanders Smart Data Space (Digital Flanders, Belgium). In a rapidly changing society, governments need to be more agile and resilient than ever. Digital Flanders realizes and supervises digital transformation projects for Flemish and local governments.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Frequently Used, Contextual References

Resources

Publication

Real-Time Data Linkage via Linked Data Event Streams

Author(s): Samuel Van Ackere

What is a Linked data event stream, and how can it help you make your dynamic data accessible?

Interlink multiple Linked Data Event Streams

SPARQL query

Combining multiple data streams via their semantics

Full code

GitHub – samuvack/ldes-grar

In this Proof of Concept, an Rest API is developed that makes it possible to harvest information of building units and…

Feedback ↓ Cancel reply

Popular posts

Best Laptops for Deep Learning, Machine Learning (ML), and Data Science for 2023

Best Workstations for Deep Learning, Data Science, and Machine Learning (ML) for 2022

Descriptive Statistics for Data-driven Decision Making with Python

Best Machine Learning (ML) Books - Free and Paid - Editorial Recommendations for 2022

Best Data Science Books - Free and Paid - Editorial Recommendations for 2022

Updates

Recent Posts

Boost your Local Business with Data Analysis and Web Scraping

The 2025 AI Revolution: 10 Breakthroughs That Will Change Your Life

AI in Medical Imaging: A Life-Saving Revolution or Ethical Minefield?

AI in Medical Imaging: A Life-Saving Revolution or Ethical Minefield?

AI in Medical Imaging: A Life-Saving Revolution or Ethical Minefield?

The World’s Leading AI and Technology Publication.

Company

CONTACT US

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Publication

Real-Time Data Linkage via Linked Data Event Streams

Author(s): Samuel Van Ackere

What is a Linked data event stream, and how can it help you make your dynamic data accessible?

Interlink multiple Linked Data Event Streams

SPARQL query

Combining multiple data streams via their semantics

Full code

GitHub – samuvack/ldes-grar

In this Proof of Concept, an Rest API is developed that makes it possible to harvest information of building units and…

Related posts

Feedback ↓ Cancel reply

Popular posts

Updates

Recent Posts

The World’s Leading AI and Technology Publication.

Company

CONTACT US

GDPR CCPA Statement