Why another binary format?

About UJO

Efficiency matters in data serialization. A binary format is generally simpler to parse and more compact than a text format such as JSON or XML, however carefully the text format is optimized. UJO combines a wide variety of data types and a flexible hierarchical structure with a simple, clear interface.

Data as Text on the Internet

With the invention of HTML, a text format with a simple hierarchical data structure was born, and it became one of the most popular data containers in the history of computer science. The advantage of hierarchical objects and a syntax that is easy for machines to read was quickly recognized and led to the invention of XML. XML is syntactically very similar to HTML, but it does not define tags with a fixed meaning. Any application that uses XML can store data and define tags for any purpose. This extensibility made XML the format of choice for web applications in the past.

JSON is more efficient than XML

Today JSON is replacing XML in web applications. One of the arguments is efficiency, because JSON uses fewer characters to store the same data. Other factors in favor of JSON are availability and simplicity. A browser application can read JSON data without additional libraries, because JSON is based on JavaScript syntax. The clearly defined data types simplify the specification of interfaces. For a JSON interface it is sufficient to describe a specific value as numeric. In XML both communication peers have to agree on the text representation of the value, because the content of any XML node is text only.
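The difference in typing can be seen in a minimal Python sketch that uses only the standard library. The document contents and the field name temperature are made up for illustration. The JSON parser returns a number directly, while the XML parser returns text that the application has to convert itself.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical documents carrying the same reading.
json_doc = '{"temperature": 21.5}'
xml_doc = "<reading><temperature>21.5</temperature></reading>"

# JSON: the parser already knows the value is numeric.
json_value = json.loads(json_doc)["temperature"]

# XML: node content is always text; both peers must agree how to read it.
xml_value = ET.fromstring(xml_doc).find("temperature").text
xml_number = float(xml_value)  # explicit conversion chosen by the application

print(type(json_value).__name__, type(xml_value).__name__)
```

The call to float() is the part an XML interface description has to spell out: decimal separator, exponent notation and so on are conventions, not part of the format.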

The Internet of Things

In the last few years a new kind of Internet has been discussed. The Internet of Things is the idea of attaching any device to the network and building a huge system of sensors and actuators. In this context small embedded devices with limited resources, such as CPU power and memory, are used. These so-called constrained devices brought back a problem that was almost forgotten while computers gained more and more resources: parsing data needs CPU power, and any overhead added to data on the network consumes energy.

Binary is logical

In the early days of computing, before the Internet, data was stored in binary formats. What is a binary format? Well, it’s not a text format! It can contain text as well, but the data is organized in a different way. We use many binary formats every day, such as image files, audio files, or proprietary special-purpose formats. The efficiency of these formats is hard to match, because the data can be stored without the restrictions of a text format. Values do not have to be encoded as text and decoded again, and the type conversion effort is minimal. A binary format is therefore the logical choice for applications in a constrained environment.
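To make the size argument concrete, here is a small Python sketch that stores the same four readings once as JSON text and once as fixed-width binary values. The readings and the choice of unsigned 16-bit little-endian integers are assumptions for this illustration; this is not the UJO wire format.

```python
import json
import struct

# Hypothetical sensor readings that each fit into 16 bits.
readings = [1013, 1009, 998, 1021]

# Text: every digit, comma and bracket becomes a byte on the wire.
text_payload = json.dumps(readings).encode("utf-8")

# Binary: "<4H" means little-endian, four unsigned 16-bit integers.
binary_payload = struct.pack("<4H", *readings)

# Reading the values back is a fixed-size copy, no digit parsing involved.
decoded = list(struct.unpack("<4H", binary_payload))

print(len(text_payload), len(binary_payload))
```

The binary payload always takes two bytes per value, whereas the text payload grows with the number of digits and separators.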

Why another binary format?

What we need to connect the classic Internet with the Internet of Things is a hierarchical binary data format with many different data types. At a minimum, the data types used in databases should be available. Several such formats already exist, each with a slightly different emphasis on the aspects of storing binary data. The primary goal of UJO is to provide many different data types to support efficient and reliable automatic conversion without loss of information. For example, many different numeric types and special types to express date and time are available. Different types for Unicode strings support embedding different text encodings in a single file. A table container is available for efficient storage of a database query result.
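Why a dedicated table container helps with query results can be sketched in Python. The layout below, column names stored once followed by plain rows, is an assumption made for illustration and is not taken from the UJO specification. JSON is used only to measure size, and the sample rows are invented.

```python
import json

# Hypothetical query result in the usual "one dictionary per row" shape.
rows = [
    {"id": 1, "name": "pump", "active": True},
    {"id": 2, "name": "valve", "active": False},
    {"id": 3, "name": "fan", "active": True},
]

# Table shape: the column names appear once, rows hold values only.
columns = list(rows[0].keys())
table = {
    "columns": columns,
    "rows": [[row[name] for name in columns] for row in rows],
}

per_row_size = len(json.dumps(rows))
table_size = len(json.dumps(table))
print(per_row_size, table_size)
```

With one dictionary per row the column names are repeated in every row, so the saving of the table shape grows with the number of rows.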

Why the name UJO?

The name UJO means container in Esperanto. The reasons for choosing this name are simple: it is short, it can be used as a file extension without creating an abbreviation, and an unused domain name was still available. One goal of this project is to create implementations and language bindings for as many programming languages as possible, so that data can be exchanged between many platforms and applications. We think this is a good match for Esperanto, which was driven by the idea of exchanging information between people all over the world. We are aware that Esperanto never really made it as a universal language compared to English, but at this starting point it is the idea that counts.

Container

UJO means container, and a container in the context of UJO is a list, a map (dictionary), or a table. Simply speaking, it is any data structure that is assembled from a combination of atomic types such as numeric values, strings, and Booleans. Containers in UJO can be nested. This means, for example, that a container can contain a dictionary, a table, or any other container to create a hierarchical data structure. A valid UJO document includes at least one container. This top-level container can be any valid UJO container type.
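The nesting rules described above can be modeled with ordinary Python types. This is a rough sketch, not the UJO API: the Table class, the is_valid_document helper and the sample data are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for a table container: column names plus rows."""
    columns: list
    rows: list = field(default_factory=list)


# The three container kinds named in the text: list, map and table.
CONTAINER_TYPES = (list, dict, Table)


def is_valid_document(root):
    """A document needs a container, of any kind, at the top level."""
    return isinstance(root, CONTAINER_TYPES)


# A map at the top level that nests a list and a table.
document = {
    "device": "sensor-7",
    "tags": ["indoor", "calibrated"],
    "readings": Table(
        columns=["time", "celsius"],
        rows=[["12:00", 21.5], ["12:05", 21.7]],
    ),
}

print(is_valid_document(document), is_valid_document(42))
```

An atomic value such as 42 on its own is rejected, because it is not wrapped in a container.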

Try it!

Try UJO to find out whether it is helpful for your project. You can download the sources from GitHub. To learn more about the internal structure of UJO, refer to the UJO specification. You can find additional information on the UJO documentation page. If you have questions or want to get involved, subscribe to the UJO mailing list. For professional support, send an email to support@libujo.org.