Entropy and the Future of the Web

January 19, 2010

Inspired by this post by Chris Dixon, I summarized my thoughts on the future of the web in a single tweet:

The fundamental question that will shape the future of the web is how we deal with entropy.


The disorder of the web thrives on both content and its connections. What the web of tomorrow looks like depends on how we address this issue today. The figure below shows what we can expect as a result of different combinations of low and high entropy in the two layers.

Entropy in the content layer reflects the degree of internal disorder. If we choose to lower content entropy by adding relevant metadata or structure, we’ll realize the semantic web. If we don’t, content will remain unorganized and we’ll end up in the noisy web.

Entropy in the connection layer expresses disorder in the network of content. If we define meaningful relations between content elements, connection entropy will decrease, leading to the synaptic web. Should we leave connections in their ad-hoc state, we’ll arrive at the unorganized web.
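To make the difference between an ad-hoc connection and a meaningful relation concrete, here is a minimal sketch. The `Link` type, the `typed_share` helper, the post identifiers and the relation labels are all hypothetical, invented for illustration; the share of typed links is only a rough stand-in for how ordered the connection layer is, not a real entropy measure.

```python
from typing import NamedTuple, Optional, Sequence


class Link(NamedTuple):
    """A connection between two pieces of content (hypothetical model)."""
    source: str
    target: str
    relation: Optional[str] = None  # None models a plain, untyped hyperlink


def typed_share(links: Sequence[Link]) -> float:
    """Fraction of links that state how source and target are related."""
    if not links:
        return 0.0
    typed = sum(1 for link in links if link.relation is not None)
    return typed / len(links)


# Illustrative data only: two ad-hoc links and two links with a stated relation.
links = [
    Link("post-a", "post-b"),
    Link("post-a", "post-c", relation="extends"),
    Link("post-d", "post-a", relation="debates"),
    Link("post-d", "post-c"),
]

print(typed_share(links))
```

The closer this share gets to 1, the more of the network is described by relations a machine can follow, which is the direction the synaptic web points in.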

The study of web entropy becomes interesting when we take a look at the intersections of these domains.

  • Semantic – synaptic: The most organized, ideal form of the web. Content and connections are thoroughly described, transparent and machine-readable. Example: linked data.
  • Semantic – unorganized: Semantic content loosely connected throughout the web. Most blog posts have the valid semantic structure of documents, but they’re connected by hyperlinks that say nothing about their relation (say, whether a blog entry extends, reflects on or debates the linked one).
  • Noisy – synaptic: Organizes high entropy content by connecting relevant elements via meaningful relations. Among others, tagging, filtering, recommendation engines and content mapping fall into this domain.
  • Noisy – unorganized: Sparse network of unstructured content. This is the domain we’ve known for one and a half decades, where keyword-based indexing and search still dominate the web. If the web continues to develop in this direction, technologies such as linguistic parsing and topic identification will come into play.
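The four domains above are simply the combinations of low or high entropy in each of the two layers. As a small sketch (the names `Entropy`, `CONTENT_WEB`, `CONNECTION_WEB` and `domain` are my own shorthand, not established terminology), the mapping can be written down like this:

```python
from enum import Enum
from typing import Tuple


class Entropy(Enum):
    LOW = "low"
    HIGH = "high"


# Each layer maps its entropy level to the kind of web described in the post.
CONTENT_WEB = {Entropy.LOW: "semantic", Entropy.HIGH: "noisy"}
CONNECTION_WEB = {Entropy.LOW: "synaptic", Entropy.HIGH: "unorganized"}


def domain(content: Entropy, connections: Entropy) -> Tuple[str, str]:
    """Return the (content web, connection web) pair for the given entropy levels."""
    return (CONTENT_WEB[content], CONNECTION_WEB[connections])


# Enumerate all four intersections.
for content in Entropy:
    for connections in Entropy:
        print(content.value, connections.value, domain(content, connections))
```

Treating the two layers as independent axes is the point of the sketch: lowering entropy in one layer moves us along one axis without saying anything about the other.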

Which one?

The question is obvious: which domain represents the optimal course to take? Based on the descriptions of the domains, semantic – synaptic seems to be the clear choice. But we’re discussing entropy here, and from thermodynamics we know that entropy grows in systems that are prone to spontaneous change, and that order is restored only at the cost of energy and effort.

Ultimately, the question comes down to this: are we going to fight entropy or not?

Bringing the semantic web into existence is an enormous task. To me, fighting people’s reluctance to adopt metadata and semantic formats is unimaginable. The synaptic web seems more feasible, as the spread of social media already indicates. But in the end, what matters is which domain or combination of domains becomes popular among early adopters. The rest will follow.