XML Overview

By: Stephen Patrick | 27 May 2016 | Category: XML Introduction

What is XML? XML stands for eXtensible Markup Language. So what is a markup language? A markup language is a language whose features we use to augment text. It enables us to give semantics to text.

What Is A Markup Language

Let’s start by considering natural language, in particular English. English is often ambiguous: we can write sentences that conform to a well-defined grammar, or we can write sentences that don’t conform to a strict grammar, and they can still have meaning.

Consider the very simple English sentence:

The boy ran.

Here we have the definite article "the", the noun "boy", and the verb "ran". Now let’s consider this from a computing perspective. Say we are tasked with writing a computer program to identify the verbs and nouns in the given sentence. Is it simple? You might say that with a little AI it can be done. But let’s not go down that road. Looking at the sentence in its literal form, could a computer identify what each word is? It can’t; it needs additional knowledge.

Now consider the alternative.

   <sentence>
       <words>
           <word type="article">The</word>
           <word type="noun">Boy</word>
           <word type="verb">Ran</word>
       </words>
   </sentence>

Looking at the above, what do you think? Is it easier to identify the different categories of words in the sentence? Moreover, could a person who has no grasp of the English language identify the categories? You bet!
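To make this concrete, here is a minimal sketch of how a program might pick out the word categories once the sentence is marked up. It uses Python's standard `xml.etree.ElementTree` module; the helper name `words_of_type` and the constant `SENTENCE_XML` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The marked-up sentence from above, held in a string.
SENTENCE_XML = """
<sentence>
    <words>
        <word type="article">The</word>
        <word type="noun">Boy</word>
        <word type="verb">Ran</word>
    </words>
</sentence>
"""


def words_of_type(xml_text, word_type):
    """Return the text of every <word> whose type attribute equals word_type."""
    root = ET.fromstring(xml_text)
    # The program needs no knowledge of English: it only reads the markup.
    return [w.text for w in root.iter("word") if w.get("type") == word_type]


if __name__ == "__main__":
    print(words_of_type(SENTENCE_XML, "noun"))
    print(words_of_type(SENTENCE_XML, "verb"))
```

The program never inspects the words themselves. All the knowledge it needs comes from the `type` attribute, which is exactly the extra information the plain sentence was missing.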

So a markup language allows us to attach semantics, or metadata, to some text. In computing we have many different markup languages. For example, as you are reading this article, you are viewing it thanks to the HTML markup language. At a high level, the text of this article is surrounded by HTML tags that tell the web browser how to display the text. There are many other markup languages, but we won’t discuss those here.

XML is eXtensible

Well, it’s in the name: eXtensible Markup Language. Why is XML extensible? Let’s consider a scenario. Say you have two computer programs sitting on opposite sides of the world, on different continents. They need to communicate to share and process data. In general they share predefined types of messages; just like computers on a network, they need a common protocol to communicate. But wouldn’t it be great if we could define our own language, with its own constraints, to say exactly what a message can contain? If a message sent from computer A to computer B does not conform to the rules of our language, we know the message is invalid. We can report it as an error, ignore it, and so on.

This is exactly what XML allows us to do. It is extensible in that we can define new elements and constraints on those elements. Consider our marked-up sentence from earlier.

   <sentence>
       <words>
           <word type="article">The</word>
           <word type="noun">Boy</word>
           <word type="verb">Ran</word>
       </words>
   </sentence>

Yes, the above is XML, and we have defined a language by declaring that a sentence has a number of words and that each word has a type (category), e.g. article, noun, or verb. But how do we enforce that language? In XML we have two main options: we can use a document type definition (DTD), or we can use an XML schema. A DTD is written in its own syntax, whereas an XML schema is itself an XML document; both must conform to the strict rules of the language they are written in. In the case of XML schema, the schema document must itself conform to the schema that defines XML schema. So, if we are sending the above message to a computer, the computer could validate it against its DTD or XML schema. We will talk more about DTDs and schemas later, but for now it’s enough to know that they define the rules of a language and so allow us to create one. It’s also worth mentioning that a schema is more expressive than a DTD; for example, it supports data types and namespaces.
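As a rough illustration of what enforcing the language means for the receiving program, the sketch below hand-codes the rules of the sentence language in Python. This is not how a DTD or XML schema works internally: Python's standard library does not include a validating parser, so the checks are written out by hand as a stand-in. The function name `validate_sentence` and the set `ALLOWED_TYPES` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set for the sentence language. A DTD or XML schema
# would state these rules declaratively instead of in program code.
ALLOWED_TYPES = {"article", "noun", "verb"}


def validate_sentence(xml_text):
    """Return a list of rule violations; an empty list means the message is valid."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The message is not even well-formed XML, e.g. an unclosed tag.
        return [f"not well-formed: {exc}"]

    if root.tag != "sentence":
        return [f"root element must be <sentence>, found <{root.tag}>"]

    children = list(root)
    if len(children) != 1 or children[0].tag != "words":
        return ["<sentence> must contain exactly one <words> element"]

    errors = []
    words = list(children[0])
    if not words:
        errors.append("<words> must contain at least one <word>")
    for child in words:
        if child.tag != "word":
            errors.append(f"unexpected element <{child.tag}> inside <words>")
        elif child.get("type") not in ALLOWED_TYPES:
            errors.append(f"invalid word type: {child.get('type')!r}")
    return errors
```

Computer B could call `validate_sentence` on every incoming message and report or ignore any message for which the returned list is not empty. With a DTD or schema, the same rules would live in a separate definition that any validating XML parser can apply, rather than in each program's code.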

In this section we gave a brief overview of XML and looked at what XML fundamentally is. We said that XML is a markup language, and that a markup language is a way of providing semantics to text. We also saw that XML is extensible. Markup languages are ubiquitous in computing and are commonly used when exchanging messages. For example, consider your favorite app, maybe Facebook or WhatsApp. How do you think it exchanges messages with your phone, tablet, or browser?