Data Quality Constraints for the Semantic Web

Today, I published a SPARQL query library that contains generic queries for identifying certain types of data quality problems, namely

  • syntax violations,
  • functional dependency violations,
  • illegal values,
  • uniqueness violations,
  • value range violations, and
  • missing values.
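
To make three of these problem types concrete, here is a minimal plain-Python sketch. It is not the library's code and does not use SPARQL or SPIN; the triples, the `ex:` names, and the helper functions are all hypothetical and only illustrate what a missing value, a uniqueness violation, and a value range violation look like in triple-shaped data.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical toy data set: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:p1", RDF_TYPE, "ex:Product"),
    ("ex:p1", "ex:ean", "4006381333931"),
    ("ex:p1", "ex:price", 19.99),
    ("ex:p2", RDF_TYPE, "ex:Product"),
    ("ex:p2", "ex:ean", "4006381333931"),  # same EAN as ex:p1
    ("ex:p2", "ex:price", -5.0),           # negative price
    ("ex:p3", RDF_TYPE, "ex:Product"),     # no EAN at all
    ("ex:p3", "ex:price", 7.5),
]


def instances_of(triples, cls):
    """Return all subjects typed as the given class."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


def missing_values(triples, cls, prop):
    """Instances of cls that have no value for prop."""
    with_value = {s for s, p, o in triples if p == prop}
    return sorted(instances_of(triples, cls) - with_value)


def uniqueness_violations(triples, prop):
    """Values of prop that are shared by more than one subject."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects_by_value[o].add(s)
    return {
        value: sorted(subjects)
        for value, subjects in subjects_by_value.items()
        if len(subjects) > 1
    }


def range_violations(triples, prop, low, high):
    """(subject, value) pairs whose value for prop lies outside [low, high]."""
    return sorted(
        (s, o) for s, p, o in triples if p == prop and not (low <= o <= high)
    )


if __name__ == "__main__":
    print(missing_values(TRIPLES, "ex:Product", "ex:ean"))
    print(uniqueness_violations(TRIPLES, "ex:ean"))
    print(range_violations(TRIPLES, "ex:price", 0.0, 10000.0))
```

In the library itself, checks of this kind are expressed as generic SPARQL queries rather than Python functions.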

The constraints library is based on the SPIN framework. Used together with SPIN, it makes it easy to define data quality rules that can be tested immediately against your own data set. Some of the constraints, such as those for identifying functional dependency violations, are designed to draw on other Semantic Web sources, which limits the manual effort needed to identify poor data.
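
The idea of checking a functional dependency against an external source can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the records, the city-to-country dependency, and the `REFERENCE` mapping (standing in for a trusted Semantic Web source) are all hypothetical.

```python
# Hypothetical reference knowledge, standing in for a trusted external source:
# each city determines exactly one country.
REFERENCE = {"Munich": "Germany", "Lyon": "France"}

# Hypothetical records from the data set under test: (subject, city, country).
RECORDS = [
    ("ex:a", "Munich", "Germany"),
    ("ex:b", "Munich", "France"),   # contradicts the reference
    ("ex:c", "Lyon", "France"),
    ("ex:d", "Atlantis", "Nowhere"),  # city unknown to the reference, not judged
]


def functional_dependency_violations(records, reference):
    """Return records whose country disagrees with the reference for their city.

    Each result is (subject, city, found_country, expected_country).
    Cities missing from the reference are skipped rather than flagged.
    """
    violations = []
    for subject, city, country in records:
        expected = reference.get(city)
        if expected is not None and country != expected:
            violations.append((subject, city, country, expected))
    return violations


if __name__ == "__main__":
    for violation in functional_dependency_violations(RECORDS, REFERENCE):
        print(violation)
```

Because the expected values come from the reference source, nobody has to enumerate the legal city and country combinations by hand.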

The data quality constraints library can be found at http://semwebquality.org/ontologies/dq-constraints# . This work is based on our research paper “Using SPARQL and SPIN for Data Quality Management on the Semantic Web”. An overview presentation is available on SlideShare. As soon as I find the time, I will publish short documentation and a how-to guide on www.semwebquality.org. To get started quickly without coding, you can import the library into TopBraid Composer, a Semantic Web tool that is also available in a free version. Feedback is highly appreciated!