YAML, which stands for "YAML Ain't Markup Language", is a data serialization format that can be used to represent data or messages.
This article provides some background on YAML and compares YAML with XML, using the Maven POM as an example.
What is YAML?
While there are many data serialization formats, the strength of YAML is its focus on human readability while still providing rich constructs for representing different data structures.
In YAML, a stream can consist of multiple YAML documents, each separated by the '---' marker, and '...' can be used to signal the end of a document. The hash sign '#' starts a comment, which runs to the end of the line. Data in YAML can be represented in three different ways: scalar, associative array and sequence.
An associative array (also known as a map) can be represented in this way:
    name: Xyz Conference
    maxParticipants: 20
    startTime: 2010-10-14 08:30:00.00
    breakfastProvided: yes
Alternatively, an associative array can be written in JSON style (as a short form), enclosed in braces:
    { name: Xyz Conference, maxParticipants: 20, startTime: 2010-10-14 08:30:00.00, breakfastProvided: yes }
A sequence can be represented in this way:
    - Java
    - Python
    - Ruby
Alternatively, a sequence can be written in JSON style (as a short form):
[ Java, Python, Ruby ]
The short form syntax also implies that a JSON document is, for the most part, a valid YAML document.
To get a taste of YAML, below is a personal organizer modeled using YAML:
    --- # Document representing my personal organizer
    # Tasks
    tasks:
      - summary: Prepare project proposal
        tags: [School, Urgent]
        update time: 2010-10-14 18:59:23.17
        completed: true
      - summary: Arrange a gathering
        tags: [Friend, Someday]
        update time: 2010-10-14 18:59:23.17
        completed: false
    # Memos
    memos:
      - content: >
          Precedence of UrlMappings for status code is defined by lexical order.
          While regex based URL mapping is defined by the precedence rules.
        update time: 2010-10-14 18:59:23.17
        archived: false
      - content: >
          Pattern-Oriented Software Architecture Volume 2
          http://www.cs.wustl.edu/~schmidt/POSA/POSA2/
        update time: 2010-10-10 18:59:23.17
        archived: true
    ...
The content of the example above is easy to comprehend. It consists of two tasks and two memos, and the structure of each task and memo is readily understandable. The example demonstrates several strengths of YAML.
Richer information model used in YAML
In XML, each document consists of element nodes. A YAML document is built from scalars, sequences and mappings, plus aliases that refer back to a node defined earlier. While you can certainly model a sequence or a key-value pair in XML, the result is clumsier than the same data modeled in YAML.
In YAML, indentation is used to present the hierarchical structure of the data. Child elements are indented further than their parent node. Only the space character (ASCII 0x20) may be used for indentation; the tab character (ASCII 0x09) is not allowed.
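The no-tabs rule is easy to check before a file ever reaches a parser. The following is a minimal sketch in Python using only the standard library. The function name `find_tab_indented_lines` is made up for this illustration, and the check looks only at leading whitespace, so it is not a YAML validator.

```python
def find_tab_indented_lines(text):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


sample = "tasks:\n  - summary: ok\n\t- summary: indented with a tab\n"
print(find_tab_indented_lines(sample))  # prints [3]
```

Tabs inside a value are left alone on purpose, since the restriction described above applies to indentation only.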
Readily defined type
In YAML, some commonly used data types are already defined, including string, time, boolean, integer and floating point number.
In XML, you can use a DTD or a Schema to define the data type of each element. Because this typing is built into YAML, every YAML document benefits from it without extra work.
A full reference of the data types supported in YAML can be found here.
Flexibility on encoding String
In YAML, a string can be encoded without any quotes. In addition, the use of tags in YAML is kept to a minimum. This makes YAML well suited to embedding blocks of text in a document. For example, we can easily put an HTML or XML block, or a code fragment, in a YAML document without having to escape characters in the text.
In XML, if you want to encode an HTML fragment, a math formula or a code fragment, you need to replace the reserved characters with escape sequences.
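To make the difference concrete, here is a small Python sketch using the standard library function `xml.sax.saxutils.escape`. The HTML fragment is invented for this illustration. The YAML side is built as a plain string rather than produced by a YAML library, so it only shows what the text would look like inside a literal block scalar.

```python
from xml.sax.saxutils import escape

fragment = '<a href="/search?q=yaml&lang=en">if x < y && y > 0</a>'

# escape() replaces &, < and > with XML entities, which is what the text
# needs before it can be carried as XML element content.
escaped = escape(fragment)
print(escaped)

# In YAML the same text can sit in a literal block scalar ("|") unchanged.
yaml_text = "snippet: |\n  " + fragment + "\n"
print(yaml_text)
```

The escaped version is longer and harder to read, while the YAML version carries the fragment exactly as it was written.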
Space efficiency
Because YAML does not rely on balanced markup (as XML does) to represent the data structure, most of the document payload represents the data itself. In XML, every opening tag needs a matching closing tag, so a piece of data usually takes less space in YAML than in XML. The direct XML and YAML comparison below demonstrates this point.
YAML has many strengths, but that does not mean it is the perfect choice for representing every kind of data or message. No single tool serves every purpose, and YAML is no exception. YAML does have some weaknesses; the key issues are listed below.
Lack of schema definition
YAML doesn't provide any facility to define the schema and the semantics of a document.
If different parties are going to exchange a YAML document, the exact specification of the document must be defined somewhere else.
In XML, the schema can be defined clearly with XML Schema or RELAX NG. YAML simply doesn't provide this kind of feature.
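In practice this means the contract between the parties tends to end up in application code. The following Python sketch shows what such a hand-written check might look like for a task entry from the organizer example. It assumes the document has already been loaded into plain dicts and lists by some YAML parser. The `TASK_FIELDS` table and the `check_task` helper are hypothetical names for this illustration.

```python
# Expected fields of one task entry and their Python types. This table is an
# assumption made for the sketch; the organizer example defines no formal schema.
TASK_FIELDS = {"summary": str, "tags": list, "completed": bool}


def check_task(task):
    """Return a list of problems found in one task mapping."""
    problems = []
    for field, expected in TASK_FIELDS.items():
        if field not in task:
            problems.append("missing field: " + field)
        elif not isinstance(task[field], expected):
            problems.append("wrong type for field: " + field)
    return problems


good = {"summary": "Prepare project proposal", "tags": ["School", "Urgent"], "completed": True}
bad = {"summary": "Arrange a gathering", "completed": "no"}

print(check_task(good))  # prints []
print(check_task(bad))   # prints ['missing field: tags', 'wrong type for field: completed']
```

Every party that consumes the document has to repeat or share a check like this, which is exactly the work a schema language would otherwise take over.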
Lack of supporting technology
YAML doesn't provide any "query language" (like XQuery or XPath for XML) to support querying a YAML document.
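Without a query language, lookups are usually written by hand against the parsed data. The following is a minimal Python sketch of such a helper, assuming the organizer example has already been loaded into nested dicts and lists. The `lookup` function and the hand-written `organizer` data are hypothetical and stand in for whatever a real parser would return.

```python
def lookup(data, path):
    """Follow a list of keys and indexes through nested dicts and lists."""
    current = data
    for step in path:
        current = current[step]  # raises KeyError or IndexError if absent
    return current


organizer = {
    "tasks": [
        {"summary": "Prepare project proposal", "tags": ["School", "Urgent"]},
        {"summary": "Arrange a gathering", "tags": ["Friend", "Someday"]},
    ],
}

print(lookup(organizer, ["tasks", 0, "tags", 1]))  # prints Urgent
```

This covers fixed paths only. Anything closer to XPath, such as wildcards or predicates, would need more code.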
Lack of namespace support
YAML does not provide any namespace mechanism. Without a namespace concept, it is not possible to combine different YAML documents unambiguously.
Comparing with XML directly
While XML is the most commonly used data format, it has some problems. One frequently heard complaint about XML is its space efficiency.
The following is a sample Maven POM file, which describes the build configuration of a project or module:
    <project>
      <modelVersion>4.0.0</modelVersion>
      <name>Maven Default Project</name>
      <repositories>
        <repository>
          <id>central</id>
          <name>Maven Repository Switchboard</name>
          <layout>default</layout>
          <url>http://repo1.maven.org/maven2</url>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>central</id>
          <name>Maven Plugin Repository</name>
          <url>http://repo1.maven.org/maven2</url>
          <layout>default</layout>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
          <releases>
            <updatePolicy>never</updatePolicy>
          </releases>
        </pluginRepository>
      </pluginRepositories>
      <build>
        <directory>target</directory>
        <outputDirectory>target/classes</outputDirectory>
        <finalName>${artifactId}-${version}</finalName>
        <testOutputDirectory>target/test-classes</testOutputDirectory>
        <sourceDirectory>src/main/java</sourceDirectory>
        <scriptSourceDirectory>src/main/scripts</scriptSourceDirectory>
        <testSourceDirectory>src/test/java</testSourceDirectory>
        <resources>
          <resource>
            <directory>src/main/resources</directory>
          </resource>
        </resources>
        <testResources>
          <testResource>
            <directory>src/test/resources</directory>
          </testResource>
        </testResources>
      </build>
      <reporting>
        <outputDirectory>target/site</outputDirectory>
      </reporting>
      <profiles>
        <profile>
          <id>release-profile</id>
          <activation>
            <property>
              <name>performRelease</name>
            </property>
          </activation>
          <build>
            <plugins>
              <plugin>
                <inherited>true</inherited>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-source-plugin</artifactId>
                <executions>
                  <execution>
                    <id>attach-sources</id>
                    <goals>
                      <goal>jar</goal>
                    </goals>
                  </execution>
                </executions>
              </plugin>
              <plugin>
                <inherited>true</inherited>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-javadoc-plugin</artifactId>
                <executions>
                  <execution>
                    <id>attach-javadocs</id>
                    <goals>
                      <goal>jar</goal>
                    </goals>
                  </execution>
                </executions>
              </plugin>
              <plugin>
                <inherited>true</inherited>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-deploy-plugin</artifactId>
                <configuration>
                  <updateReleaseInfo>true</updateReleaseInfo>
                </configuration>
              </plugin>
            </plugins>
          </build>
        </profile>
      </profiles>
    </project>
If it is modeled with YAML, it will look something like this:
    project:
      modelVersion: 4.0.0
      name: Maven Default Project
      repositories:
        - id: central
          name: Maven Repository Switchboard
          layout: default
          url: http://repo1.maven.org/maven2
          snapshots:
            enabled: false
      pluginRepositories:
        - id: central
          name: Maven Plugin Repository
          url: http://repo1.maven.org/maven2
          layout: default
          snapshots:
            enabled: false
          releases:
            updatePolicy: never
      build:
        directory: target
        outputDirectory: target/classes
        finalName: ${artifactId}-${version}
        testOutputDirectory: target/test-classes
        sourceDirectory: src/main/java
        scriptSourceDirectory: src/main/scripts
        testSourceDirectory: src/test/java
        resources:
          - directory: src/main/resources
        testResources:
          - directory: src/test/resources
      reporting:
        outputDirectory: target/site
      profiles:
        - id: release-profile
          activation:
            property:
              name: performRelease
          build:
            plugins:
              - inherited: true
                groupId: org.apache.maven.plugins
                artifactId: maven-source-plugin
                executions:
                  - id: attach-sources
                    goals:
                      goal: jar
              - inherited: true
                groupId: org.apache.maven.plugins
                artifactId: maven-javadoc-plugin
                executions:
                  - id: attach-javadocs
                    goals:
                      goal: jar
              - inherited: true
                groupId: org.apache.maven.plugins
                artifactId: maven-deploy-plugin
                configuration:
                  updateReleaseInfo: true
One notable difference between these two documents is that the YAML one is more readable. Looking into the details, the XML version is 2941 bytes while the YAML version is only 1714 bytes, a reduction of about 42%. In addition, the XML version has 110 lines while the YAML version has only 68 lines, a reduction of about 38%.
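The percentages follow directly from the byte and line counts quoted above. A short Python check, where the helper name `reduction_percent` is made up for this sketch:

```python
def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


print(round(reduction_percent(2941, 1714), 1))  # prints 41.7
print(round(reduction_percent(110, 68), 1))     # prints 38.2
```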
The difference is mainly due to the fact that YAML doesn't require a balancing end tag. In addition, YAML natively provides constructs for sequences and maps. For example, to model a list of repositories in XML, we need two tags (repositories and repository), plus two extra lines for the end tags. In YAML the end tag is never needed.
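The cost of the end tags can be measured on a small piece of the POM. The Python sketch below counts the characters spent on closing tags in a shortened repository entry. The snippet is trimmed from the example above for brevity, and the regular expression is a rough match for closing tags rather than a full XML parser.

```python
import re

xml_snippet = (
    "<repositories><repository>"
    "<id>central</id><layout>default</layout>"
    "</repository></repositories>"
)

# A closing tag is "</", a name, then ">".
closing_tags = re.findall(r"</[^>]+>", xml_snippet)
overhead = sum(len(tag) for tag in closing_tags)

print(len(closing_tags))  # prints 4
print(overhead, "of", len(xml_snippet), "characters are closing tags")
```

For this snippet the closing tags alone account for a little under half of the text, none of which has a counterpart in the YAML version.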
Conclusion
Considering both the pros and the cons, YAML has a great advantage in human friendliness and clarity, while its lack of schema and namespace support will keep it from being adopted in large-scale or complex environments.