Data vs. Information

Back when Y2K was still “a thing” and I was working full time as a technical trainer and web consultant, I had a disagreement with a student in one of my classes about data vs. information. The student insisted that they were one and the same, and I insisted they were different. My problem, however, was that even though I knew I was right, I wasn’t sure how to explain the difference in a way that captured why data and information really are different.

I mentioned this dilemma in an email conversation with a long-time friend who knows more about databases than I ever will. His response is below, and it remains one of the best discussions about data, information, data management, and database design that I have ever read.

B


Data vs. Information

By Michael D. Ballard, Programmer Analyst | May 31, 2000

Data is simply bunches of numbers, letters, non-printing characters, etc. stored somewhere.  We could even make the case that, in a computer, data is simply ones and zeroes stored somewhere (I’ll get to “why” in a minute). Hopefully, that data is arranged in a manner that permits easy access and update.  Note that this is NOT a requirement, only a suggestion.

Information is data together with a context that gives it meaning.  For example:

  • 201131725 is data
  • 201-13-1725 is a Social Security number
  • 20.113.17.25 is an IP Address
  • $2,011,317.25 is the price of a very nice home
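
To make the list above concrete, here is a minimal Python sketch in which the same nine digits are read three different ways.  The function names (as_ssn, as_ip, as_price) are made up for this illustration, and the places where the digits are split are assumptions chosen to match the examples above; the digits themselves carry no hint about which reading is correct.

```python
RAW = "201131725"  # the bare data: nine digits and nothing else


def as_ssn(digits: str) -> str:
    """Read the digits as a Social Security number (3-2-4 grouping)."""
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


def as_ip(digits: str) -> str:
    """Read the digits as an IP address.

    The 2-3-2-2 split is an assumption picked to match the example;
    the raw digits do not say where the dots belong.
    """
    return ".".join([digits[:2], digits[2:5], digits[5:7], digits[7:]])


def as_price(digits: str) -> str:
    """Read the digits as an amount in cents and format it as dollars."""
    dollars, cents = divmod(int(digits), 100)
    return f"${dollars:,}.{cents:02d}"


if __name__ == "__main__":
    for interpret in (as_ssn, as_ip, as_price):
        print(interpret.__name__, interpret(RAW))
```

The input never changes; only the interpreting function does, and that function stands in for the context.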

Often, context exists in layers.  For example, if you have a business that sells multiple products to multiple consumers, you probably have separate spreadsheets or files or tables or databases for recording data that relates to your customers, products, vendors, shippers, accounts payable, accounts receivable, etc.  If you had some sort of tool that could wander through the raw data on your disk and it found the string of numerals ‘201131725’, that string would have no inherent context and therefore no meaning.  If your tool could somehow figure out that this string is contained in a particular field of a particular record of a particular file in a particular database (e.g., the Tax ID field of record 101 in the Customer file of the Sales Order database for Itty Bitty Machine Company), you could then see all of its ‘layers’ of context, which together give it meaning and make it information.
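
The layering described above can be sketched in Python as a nested dictionary, with a small search that reports the full chain of layers around a value.  This is only an illustration under stated assumptions: the STORE layout, the find_context helper, and the "Example Customer" name are hypothetical, and a real database would not be stored this way.

```python
def find_context(store, target, path=()):
    """Yield the chain of keys (the layers of context) leading to each
    place where `target` appears as a value in the nested dictionary."""
    for key, value in store.items():
        here = path + (key,)
        if isinstance(value, dict):
            # Descend one layer and keep track of how we got there.
            yield from find_context(value, target, here)
        elif value == target:
            yield here


# Hypothetical layout mirroring the example in the text.
STORE = {
    "Itty Bitty Machine Company": {
        "Sales Order database": {
            "Customer file": {
                "record 101": {
                    "Name": "Example Customer",  # placeholder value
                    "Tax ID": "201131725",
                },
            },
        },
    },
}

paths = list(find_context(STORE, "201131725"))

if __name__ == "__main__":
    for layers in paths:
        print(" > ".join(layers))
```

Each key in the returned chain is one layer of context.  Strip the chain away and all that remains is the bare string of numerals.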

One of the issues with getting information out of data is what radio or sound engineers call the Signal-to-Noise ratio.  In general, it is highly desirable to arrange data in a manner that allows useful meaning to be understood without extraneous ‘noise’ being included.  For example, when you want to know information about customers, you probably want to know things like Name, Address, Phone number, and Credit Status.  It is probably not useful to know the color of the walls of the customers’ offices (unless you sell interior wall paint).
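
One way to picture this is as a filter that keeps only the fields that matter for the question being asked.  The Python sketch below is a rough illustration, not a recommended design: the field names, the sample record, and the customer_view helper are all assumptions invented for the example.

```python
# Fields assumed to be "signal" for a typical customer lookup.
RELEVANT_FIELDS = ("Name", "Address", "Phone", "Credit Status")


def customer_view(record, fields=RELEVANT_FIELDS):
    """Return only the requested fields that are present in the record."""
    return {field: record[field] for field in fields if field in record}


# Hypothetical record that mixes signal with noise.
raw_record = {
    "Name": "Example Customer",
    "Address": "1 Example Street",
    "Phone": "555-0100",
    "Credit Status": "Good",
    "Office Wall Color": "Beige",  # noise, unless you sell wall paint
}

view = customer_view(raw_record)

if __name__ == "__main__":
    print(view)
```

A paint supplier could pass a different fields tuple that includes "Office Wall Color".  What counts as noise depends on who is asking.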

Achieving a high Signal-to-Noise ratio requires planning by the Database Designer before the first bit of data comes in.  It requires maintenance by the Database Designer and the Database Administrator.  It requires a suitable User Interface (usually the job of an Analyst or Programmer).  It requires training of the end-users so that they know what is required of them and what they can get back from the system.  It requires feedback from the end-users so that the Database Designer, the Database Administrator, the Analyst, the Programmer, and the other end-users can all contribute to the evolution of an Information Store (a repository of useful and accessible information).

It is possible for all of these roles to be played by a single person.  It is also possible that any of these roles may be played by more than one person.  The most important thing is to recognize that each of these roles needs to be played by someone in order to achieve an optimum result.