Creating a network map in the studio
HolonIQ creates a Network, or visual map, of your data. The Network will allow you to quickly understand semantic similarities and themes within your documents.
Each node, or circle, in a Network represents one unique document. Similar nodes will be grouped together in the same Cluster. A Cluster is a group of nodes with similar language.
Nodes that have similar language will be connected with a line, or link, and will be closer to each other in the network. The HolonIQ Network is organized to minimize the distance between connected nodes. Documents at opposite ends of the network (for example, the bottom and the top) will be highly different from one another. Tightly grouped Clusters indicate that the documents are highly similar in their language.
Network Theory is a powerful way to analyze and visualize complex relationships. At HolonIQ, we’ve focused on leveraging this field to help you understand qualitative information better and more quickly. HolonIQ's unsupervised learning approach categorizes documents based on their semantic similarity. Exploring these relationships through a network allows you to understand similarities, make comparisons, and reveal multi-dimensional relationships.
Nodes & Connections
Each node, or circle, represents a unique document from the data source. For a company network, each node will represent a company; for a news network, each node will represent a unique article… Connections represent a strong semantic similarity between the documents, indicating that they are covering the same topic. Nodes, by default, are sized by degree, which is the number of connections a node has. More intuitively, a higher degree node will share a lot of language with nearby nodes and is therefore more representative of that area of the network.
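As a rough illustration of degree-based sizing, the sketch below counts the connections of each node from a list of links and turns the count into a circle radius. The node names, the links list, and the node_radius scaling rule are hypothetical and chosen only for the example; they are not HolonIQ's actual data or sizing formula.

```python
from collections import Counter

# Hypothetical links: each pair is one connection between two documents.
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def degrees(links):
    """Count how many connections each node has."""
    counts = Counter()
    for source, target in links:
        counts[source] += 1
        counts[target] += 1
    return counts


def node_radius(degree, base=4.0, step=2.0):
    """Illustrative sizing rule (an assumption): radius grows linearly with degree."""
    return base + step * degree


degree_by_node = degrees(links)
radius_by_node = {node: node_radius(d) for node, d in degree_by_node.items()}
```

In this toy network, node A has three connections and is drawn largest, while node E has a single connection and is drawn smallest.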
Distance & Orientation
Distance in a HolonIQ network is relative: the cardinal direction and the absolute spatial distance do not have a direct interpretation. You can think of each node as a charged particle that wants to repel all of the other nodes, and the links as springs that keep all of the particles from spreading too far away from each other. The more similarities between two nodes (e.g., the greater the similarity between two companies’ technologies), the stronger the connection, and therefore the tighter the spring, between them. When you view an entire network, you see a map where similar nodes (companies, articles, or patents) group together, and dissimilar nodes end up far away from one another.
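The charged-particle and spring analogy can be sketched as a small simulation. This is a generic toy force-directed layout, not HolonIQ's layout algorithm: the force_layout function, the force constants, the rest length of 1, the step size, and the example links are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def force_layout(nodes, links, iterations=500, step=0.05, seed=0):
    """Toy spring-and-charge layout; returns {node: [x, y]}."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    linked = {frozenset(link) for link in links}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for a, b in combinations(nodes, 2):
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            # Every pair repels like charged particles (negative = push apart).
            pull = -1.0 / dist ** 2
            # Linked pairs also feel a spring with rest length 1.
            if frozenset((a, b)) in linked:
                pull += dist - 1.0
            fx, fy = pull * dx / dist, pull * dy / dist
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            size = math.hypot(mx, my)
            if size > 0.1:  # cap each move to keep the simulation stable
                mx, my = mx * 0.1 / size, my * 0.1 / size
            pos[n][0] += mx
            pos[n][1] += my
    return pos


def mean_distance(pos, pairs):
    """Average straight-line distance over a list of node pairs."""
    total = sum(
        math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]) for a, b in pairs
    )
    return total / len(pairs)


# Hypothetical network: two triangles joined by a single link (C-D).
nodes = ["A", "B", "C", "D", "E", "F"]
links = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]

pos = force_layout(nodes, links)
link_set = {frozenset(link) for link in links}
unlinked_pairs = [p for p in combinations(nodes, 2) if frozenset(p) not in link_set]
linked_mean = mean_distance(pos, links)
unlinked_mean = mean_distance(pos, unlinked_pairs)
```

Only the relative distances in the result carry meaning. Rotating or mirroring the layout would describe the same network, which is why direction has no direct interpretation.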
A cluster is a set of nodes that group, or “cluster,” together because many of them are connected due to sharing a high degree of similar language. When you first look at a network, you can see its clusters divided according to color, as determined by HolonIQ algorithms.
When HolonIQ builds a network, it compares each node to every other node. If two nodes have enough semantic similarity, there is a connection drawn between them. Highly similar nodes often all connect to one another, and they therefore land near one another in the network visualization. This relatively dense, defined group of nodes is a cluster (network theorists also use the term “module”). On the scale of a whole network, which has hundreds or thousands of nodes with varying levels of congruence, numerous clusters emerge, and the clusters provide a fundamental framework for analyzing the network.
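The pairwise comparison described above can be sketched with a simple stand-in for semantic similarity. The example below uses cosine similarity over word counts and a cutoff of 0.5; both the measure and the threshold are assumptions for illustration, since the similarity model HolonIQ uses is not described here. The document names and texts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical documents, reduced to a few words each.
docs = {
    "doc1": "robot arm machine learning control",
    "doc2": "machine learning robot vision control",
    "doc3": "quarterly earnings retail sales report",
}


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_links(docs, threshold=0.5):
    """Compare every pair of documents; keep pairs at or above the threshold."""
    return [
        (a, b)
        for a, b in combinations(docs, 2)
        if cosine_similarity(docs[a], docs[b]) >= threshold
    ]


links = build_links(docs)  # [("doc1", "doc2")]
```

Here doc1 and doc2 share four of their five words (similarity 0.8) and are connected, while doc3 shares no words with either and receives no connections.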
The density of the nodes in a cluster correlates to the average similarity between the nodes. The more dense a cluster appears, the more similar its nodes. Likewise, the more spread out a cluster appears, the broader its mix of nodes. Some clusters have a dense core and a tail that extends away from the core. The core of these clusters contains relatively uniform nodes (e.g., very similar language across companies, patents, or news articles), and the nodes along the tail decrease in likeness as the distance from the core increases. A dense cluster is quick to interpret. You can assume a central node in a dense cluster will be very representative of its surrounding nodes. For companies, this implies that all companies in the cluster are defining their business and market in very similar ways, and are likely direct competitors.
Nodes & Clusters Without Connections
Nodes without any connections have highly unique language that did not significantly match any other documents in your result set. While these “orphan” nodes will still contain your search query, they do not share a theme with any of the other documents in the result set.
You may also see clusters which are not connected to the rest of the network. Similar to orphan nodes, these “orphan” clusters will share a niche topic amongst the underlying documents, but have differentiated language from the rest of the network. For example, you may see “orphan” clusters around specific product announcements if you are querying the news database. These clusters do not pertain to the broader conversation around your search query (e.g., a technology search like “machine learning”).
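One way to picture orphan nodes and orphan clusters is as disconnected pieces of the network graph. The sketch below splits a network into connected components and treats the largest as the main network; that convention, the connected_components helper, and the example nodes and links are assumptions for illustration rather than HolonIQ's own logic.

```python
from collections import defaultdict


def connected_components(nodes, links):
    """Group nodes into sets that are reachable from one another via links."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk from the starting node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical network: a main group, a detached group, and a lone node.
nodes = ["A", "B", "C", "D", "E", "F", "G", "H"]
links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F"), ("F", "G")]

components = connected_components(nodes, links)
main_network = max(components, key=len)
orphan_nodes = [c for c in components if len(c) == 1]
orphan_clusters = [c for c in components if len(c) > 1 and c is not main_network]
```

In this example, H has no connections and is an orphan node, while E, F, and G connect only to each other and form an orphan cluster.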
Bridging Nodes & Bridging Clusters
Bridging nodes are nodes which span across portions of the network. These are often insightful nodes to explore given that they are sharing language across two different themes of documents. In a company network, this may be a company applying a technology in a novel way, or merging two mature technologies to tackle a new problem. Similarly, bridging clusters will help identify how different sets of documents relate to one another. Central clusters will likely be core to an underlying search. For example, in a robotics network, we may see “machine learning” or AI as the central cluster with more vertical-specific clusters on the periphery. Bridging clusters in between will highlight the gaps between specific applications and the core technologies.
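A simple heuristic for spotting candidate bridging nodes is to look for nodes whose connections reach into more than one cluster. The sketch below assumes a cluster label is already available for each node; the cluster_of mapping, the links, and the bridging_nodes helper are hypothetical, and other measures (such as betweenness centrality) could be used instead.

```python
from collections import defaultdict

# Hypothetical cluster labels and links for a small robotics network.
cluster_of = {
    "A": "robotics", "B": "robotics", "C": "robotics",
    "D": "machine learning", "E": "machine learning", "F": "machine learning",
}
links = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]


def bridging_nodes(links, cluster_of):
    """Return nodes whose neighbors belong to two or more different clusters."""
    neighbor_clusters = defaultdict(set)
    for a, b in links:
        neighbor_clusters[a].add(cluster_of[b])
        neighbor_clusters[b].add(cluster_of[a])
    return sorted(
        node for node, clusters in neighbor_clusters.items() if len(clusters) >= 2
    )


bridges = bridging_nodes(links, cluster_of)  # ["C", "D"]
```

Nodes C and D are flagged because each connects to its own cluster and to the other one, making them natural starting points when exploring how the two themes relate.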