Lean and Mean – Under the Spiderwiz hood

Spiderwiz's promise is to “free the massive programming resources that are required just to move the data around rather than dealing with the data itself”. In this post we will take a close look at how this is done within the Spiderwiz runtime engine.

Moving data from producers to consumers in a distributed system includes four aspects:

  • Serialization: converting the data to a series of bits that can be received, deserialized and interpreted by the receiver.
  • Delivery: transferring the data from its producer to the appropriate receiver.
  • Integrity: ensuring that the consumer gets the entire data it expects in the right order.
  • Efficiency: ensuring that the consumer gets only the data that it needs, in the quickest way, without bloating the network with overhead, repetitions and noise.

All of these challenges are tackled by the Spiderwiz runtime transparently, and from the viewpoint of a programmer that uses the Spiderwiz API – effortlessly. Let’s see how it happens.

Serialization. As explained in the previous post, the data in a Spiderwiz system consists of Data Objects that are described in a Data Object Class Library. A data object class contains property definitions and annotations that dictate the serialization format of an object of the specified type. The class definition, used by both producers and consumers, forms the interface that binds the endpoints to the same serialization format. Programmers need only commit changes at the producer side and handle events at the consumer side and the runtime does everything in between.
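To make the idea concrete, here is a minimal sketch of annotation-driven serialization in plain Java. The `@WireField` annotation and the `FlightEta` class are hypothetical illustrations of the technique, not the actual Spiderwiz API:

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.*;

// Hypothetical annotation: marks which properties of a data object
// participate in serialization, and in what order.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface WireField { int order(); }

// A data object class shared by producer and consumer. The annotated
// fields form the contract that binds both endpoints to one wire format.
class FlightEta {
    @WireField(order = 0) String flightId;
    @WireField(order = 1) long etaEpochSeconds;
    FlightEta(String flightId, long etaEpochSeconds) {
        this.flightId = flightId;
        this.etaEpochSeconds = etaEpochSeconds;
    }
}

public class Serializer {
    // Serialize any annotated object into a comma-delimited record,
    // ordered by the annotation's 'order' attribute.
    static String serialize(Object obj) {
        List<Field> fields = new ArrayList<>();
        for (Field f : obj.getClass().getDeclaredFields())
            if (f.isAnnotationPresent(WireField.class)) fields.add(f);
        fields.sort(Comparator.comparingInt(f -> f.getAnnotation(WireField.class).order()));
        StringJoiner record = new StringJoiner(",");
        for (Field f : fields) {
            try { record.add(String.valueOf(f.get(obj))); }
            catch (IllegalAccessException e) { throw new RuntimeException(e); }
        }
        return record.toString();
    }

    public static void main(String[] args) {
        System.out.println(serialize(new FlightEta("BA117", 1700000000L)));
    }
}
```

Because both endpoints compile against the same class, the consumer can reverse the process field by field without any hand-written parsing code.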

Delivery. A Spiderwiz-based network is a network of Spiderwiz-based applications, connected via some sort of communication channel supported by the runtime. Currently the supported channels are TCP/IP sockets, WebSockets and sequential disk files (the latter mainly for debugging). Other types of channels can be added as Communication Plugins. Any network topology is allowed, provided that consumers of specific data types are connected to the producers of that type either directly or through one or more applications that are configured as a “hub”. Network connections are defined in the configuration files of the applications and do not require programming.
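As a sketch of what “no programming” means here, a connection might be defined with a couple of lines in the application’s configuration file. The property names below are illustrative only – consult the Spiderwiz documentation for the exact configuration keys:

```properties
# Illustrative sketch, not the exact Spiderwiz key syntax.
# Connect to a producer application over TCP/IP:
producer-1=ip=192.168.1.10;port=7777
# Accept consumer connections over WebSocket (acting as a hub):
consumer-1=websocket=8777
```

Changing the topology – say, inserting a hub between two sites – is then a configuration change, not a code change.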

Data routing is also based on the Data Object paradigm. Every application that starts up manifests the data object types that it produces and the types that it consumes. The runtime engines (in plural, because there is an engine at the core of each application and the engines cooperate) match producers and consumers and route the data between them. There is no need for more than one communication path between two endpoints. Data objects of different types are multiplexed into a single data stream, while the “hub” applications demultiplex, reroute and remultiplex them as necessary.
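A toy model of this multiplexing, assuming a made-up record format in which each record on the shared stream is prefixed with its object-type code (the real wire format is internal to Spiderwiz):

```java
import java.util.*;

// Illustrative sketch: one connection carries many object types, and a
// hub reroutes each record by inspecting its type-code prefix.
public class Multiplexer {
    // Route table built from the types each application manifests at startup:
    // object-type code -> consumers that declared interest in that type.
    static Map<String, List<String>> routes = Map.of(
        "ETA", List.of("arrivals-board"),
        "POS", List.of("map-view", "arrivals-board"));

    // Multiplex: tag the record with its object-type code.
    static String mux(String typeCode, String payload) {
        return typeCode + "|" + payload;
    }

    // Demultiplex at the hub: decide where the record goes next.
    static List<String> route(String record) {
        String typeCode = record.substring(0, record.indexOf('|'));
        return routes.getOrDefault(typeCode, List.of());
    }

    public static void main(String[] args) {
        String record = mux("POS", "BA117,51.47,-0.45");
        System.out.println(route(record)); // consumers of POS objects
    }
}
```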

As explained in the previous post, data delivery is not limited to one-to-one communication paths, therefore the routing mechanism described here can use unicast or multicast as necessary.

Integrity. The complex data delivery mechanism described above needs to cope with a range of challenges, of which data integrity is a major one. Routing is done on a per-object basis, and since network topology is unrestricted it can easily turn into a mammoth spaghetti. The runtime must therefore ensure that a stream of data objects is reconstructed at the consumer side in the same order as it was produced, with no duplications and definitely without endless circulation throughout the network, and that if a deficiency is detected the producer of the missing objects is notified and retransmits them. This is exactly what the Spiderwiz runtime engine does.
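The kind of bookkeeping this requires can be sketched as follows. This is an illustration of the technique, not Spiderwiz internals: each producer numbers its objects sequentially, and the consumer side drops duplicates and records gaps so that missing ranges can be re-requested from the producer (a full implementation would also match retransmissions against the recorded gaps):

```java
import java.util.*;

// Per-producer integrity tracking on the consumer side.
public class IntegrityTracker {
    private long nextExpected = 1;
    private final List<long[]> gaps = new ArrayList<>(); // [from, to] missing ranges

    // Returns true if the object is new and should be delivered,
    // false if it is a duplicate (e.g. arrived via two hub paths).
    boolean accept(long seq) {
        if (seq < nextExpected) return false;       // already seen: drop
        if (seq > nextExpected)                     // gap: remember the missing range
            gaps.add(new long[]{nextExpected, seq - 1});
        nextExpected = seq + 1;
        return true;
    }

    // Ranges to request back from the producer.
    List<long[]> missingRanges() { return gaps; }

    public static void main(String[] args) {
        IntegrityTracker t = new IntegrityTracker();
        t.accept(1); t.accept(2); t.accept(5);      // objects 3 and 4 went missing
        for (long[] g : t.missingRanges())
            System.out.println("missing " + g[0] + ".." + g[1]);
    }
}
```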

Efficiency. Although network capacity nowadays seems endless, there are many situations in which bandwidth is still a concern. This is the case when remote sites are connected through communication lines that are relatively slow, as in the World Wide Web in general. We all know that, despite the spectacular surge in communication speeds, digital media – DTV, YouTube, Netflix and all the fabulous services that fill up cyberspace – would not be what they are without video compression, which reduces frames of thousands of bits each to a few bytes.

This is a domain in which Spiderwiz has a striking impact. The splitting of data streams into small units of Data Objects allows smart distribution techniques, such as the multiplexing and multicast described above, that substantially reduce overall network bandwidth.

Object-Based Data Compression

But this is just the beginning. The outstanding efficiency of Spiderwiz as a bandwidth shrinker comes from the way it exploits the concept of object-based data compression.

Properties of objects in the real world – location of a car, estimated arrival time of a flight, stock value in a stock exchange market, to name a few – tend to change gradually over time, in small increments, if at all. As Spiderwiz is about distribution of data objects that represent real-world objects, it applies a mechanism that compresses data object streams by tracking changes in individual objects and encoding only those changes on the wire, much like video streams are compressed by tracking and encoding the differences between successive frames.
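A minimal sketch of the idea, assuming objects are represented as property maps (illustrative only, not the actual Spiderwiz wire protocol):

```java
import java.util.*;

// Delta encoder: the producer keeps the last transmitted snapshot per
// object and puts on the wire only the properties that changed.
public class DeltaEncoder {
    private final Map<String, Map<String, String>> lastSent = new HashMap<>();

    // Returns only the changed properties; an empty map means nothing to send.
    Map<String, String> encode(String objectId, Map<String, String> current) {
        Map<String, String> previous = lastSent.getOrDefault(objectId, Map.of());
        Map<String, String> delta = new LinkedHashMap<>();
        for (var e : current.entrySet())
            if (!e.getValue().equals(previous.get(e.getKey())))
                delta.put(e.getKey(), e.getValue());
        lastSent.put(objectId, new HashMap<>(current));
        return delta;
    }

    public static void main(String[] args) {
        DeltaEncoder enc = new DeltaEncoder();
        enc.encode("BA117", Map.of("eta", "18:40", "gate", "B7")); // first time: full object
        // Only the ETA changed, so only that property goes on the wire:
        System.out.println(enc.encode("BA117", Map.of("eta", "18:45", "gate", "B7")));
    }
}
```

The consumer side applies the same logic in reverse, patching its local copy of the object with each delta.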

Consider for instance a service that is used to retrieve the estimated arrival time (ETA) of every commercial flight around the globe. To do its work, the service needs to communicate with the system of every airline in order to obtain the information relevant to that airline. These, in turn, need to communicate with each of their airplanes to obtain the information for the specific flight. Using standard protocols, the way to do that is to poll every airline at a frequency that depends on how up to date we want the data to be, and get back a dump of the data for the entire airline fleet. Assuming that there are about 50,000 flights worldwide every day, and that the data size for every flight, including overhead, is at least 50 bytes, then we have at least 2.5M bytes on the wire on each poll. A polling frequency of 10 seconds would make it 15M bytes running on the wire every minute, which is equal to 2M bits per second.

In practice most services do not work like that. Services like the Air Traffic Organization (ATO) implement proprietary interfaces and protocols that allow data streaming in a much more efficient way. But proprietary protocols need cooperation on both sides, which is not always feasible, and they require a lot of development work on both sides on a per project basis.

The problem is solved entirely if all the involved parties use Spiderwiz. Taking the ETA service example above, how many ETAs around the world change on average within one minute? 100? 200? Definitely not more than that. And since we are conveying only differences, each update takes no more than a few bytes. Altogether, that is no more than a thousand bytes per minute instead of the 15 million calculated above – a compression ratio of 0.0067%, with the data always up to date to within a fraction of a second. All this without any need for cooperation, and without writing a single line of code!
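The arithmetic above can be checked in a few lines:

```java
// Back-of-the-envelope check of the figures above: polling vs. delta updates.
public class BandwidthMath {
    public static void main(String[] args) {
        long flights = 50_000, bytesPerFlight = 50;
        long bytesPerPoll = flights * bytesPerFlight;         // 2,500,000 bytes per poll
        long pollsPerMinute = 60 / 10;                        // one poll every 10 seconds
        long bytesPerMinute = bytesPerPoll * pollsPerMinute;  // 15,000,000 bytes per minute
        long bitsPerSecond = bytesPerMinute * 8 / 60;         // 2,000,000 bits per second
        double ratio = 1_000.0 / bytesPerMinute * 100;        // delta updates vs. polling
        System.out.printf("%d B/poll, %d B/min, %d bit/s, %.4f%%%n",
                bytesPerPoll, bytesPerMinute, bitsPerSecond, ratio);
    }
}
```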

Thread Management

Another thing that happens under the hood of the Spiderwiz runtime engine is thread management. As an event-driven engine, it allocates separate execution threads to handle different kinds of events. One of the most resource-consuming operations in Java is the spawning of new execution threads, so the engine never spawns threads on the fly. All of them are created during application initialization, and a sophisticated queuing mechanism is applied in order to execute all tasks quickly, efficiently and with as little resource consumption as possible. This too is transparent to programmers, who will rarely need to spawn their own execution threads.
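In plain Java the pattern looks roughly like this (an illustration of the technique, not Spiderwiz code): a fixed pool of worker threads is created once at initialization, and incoming events are queued to it rather than spawning a thread per event.

```java
import java.util.concurrent.*;

// All worker threads are created up front; events are queued to the pool.
public class EventEngine {
    // Fixed pool created at initialization; nothing is spawned on the fly later.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Queue an event handler; a pre-created thread picks it up.
    void submitEvent(Runnable handler) {
        workers.submit(handler);
    }

    void shutdown() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        EventEngine engine = new EventEngine();
        for (int i = 0; i < 8; i++) {
            int n = i;
            engine.submitEvent(() -> System.out.println("handled event " + n));
        }
        engine.shutdown();
    }
}
```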

You can read more about this in Lesson 10 of the tutorial.