
Data are more than character collections

During my research I searched for a definition of data. Most of the time, I found definitions along the lines of "data is a collection of characters according to predefined syntax rules" (cf. Bodendorf 2005). But does this technical definition really define what data is? Doesn't it rather describe what data looks like? And how, then, does data relate to information? Is information really data + context / semantics? Of course, when I interpret data, I use my context knowledge and add semantics to it. But isn't there a much more obvious relationship between data and information?

Data is materialized information.

After thinking about it many times, I finally arrived at a simple yet useful definition of data. It came to me while I was considering the basic purpose of an information system: people use information systems to store and retrieve information. For this purpose, information is materialized into physical data, e.g. when people fill out forms. Seen this way, data is materialized information. And yes, it is of course also a collection of characters…

Do you agree? What is data for you? Please send me your comments!

3 Comments

  1. I quite like the definition of "data" from the Suggested Upper Merged Ontology (SUMO), which states that a "data point" or "datum" is "an item of factual information derived from measurement or research" (http://sigma.ontologyportal.org:4010/sigma/WordNet.jsp?synset=105816622). I think "data" covers observations, measurements, and records describing physical or social reality. It may also provide models and conceptualizations of reality for describing other data (i.e., metadata). A key facet of data is that it is digital, which makes it amenable to automated computer processing.

    Saturday, July 21, 2012 at 11:30 pm
  2. The SUMO definition is not bad, but it unnecessarily narrows the scope of data by saying "information derived from measurement or research". In my opinion, this does not apply to master data, e.g. the color of a car or the procurement channel of a certain good, since I cannot see how such information is derived from measurement or research.

    Monday, July 23, 2012 at 3:44 am
  3. You bring up an interesting point. There is a whole other type of data that is "prescriptive" rather than "descriptive": e.g., gr:Offerings don't describe existing reality; they create, or prescribe, new data describing the commitments of their creator. I guess the content facet of "data" should be treated more broadly than it is in SUMO, with more focus on the facet of form.

    Tuesday, July 24, 2012 at 12:22 pm