RiskTech Forum

Ayasdi: Topology & Topological Data Analysis

Posted: 3 October 2016  |  Source: Ayasdi

Topology is the branch of pure mathematics that studies the notion of shape. It takes on two main tasks: the measurement of shape and the representation of shape. Both tasks are meaningful for large, complex, high-dimensional data sets. Together, they let one measure shape-related properties of the data, such as the presence of loops, and they provide methods for creating compressed representations of data sets that retain their key features and reflect the relationships among the points. The representation takes the form of a topological network, or combinatorial graph, which is a simple and intuitive object to work with using graph layout algorithms.
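To make the idea of a topological network concrete, here is a minimal sketch in the spirit of the published Mapper algorithm of Singh, Mémoli and Carlsson. It is a generic illustration, not the Ayasdi platform's implementation. The function names (`mapper_graph`, `single_linkage_clusters`) and parameters (`lens`, `n_intervals`, `overlap`, `eps`) are hypothetical. The lens maps each point to a number, the lens range is covered by overlapping intervals, the points falling in each interval are clustered, each cluster becomes a node, and clusters that share points are joined by an edge.

```python
import math
from itertools import combinations

# Hypothetical, generic Mapper-style sketch; not Ayasdi's method.


def single_linkage_clusters(points, idx, eps):
    """Split the points with indices `idx` into clusters: two points share a
    cluster if a chain of steps of length <= eps connects them."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper_graph(points, lens, n_intervals=5, overlap=0.3, eps=0.5):
    """Return (nodes, edges): each node is a set of point indices, and an
    edge (u, v) joins two nodes that share at least one point."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Widen each interval on both sides so neighbouring intervals overlap.
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        idx = [i for i, v in enumerate(values) if a <= v <= b]
        if idx:
            nodes.extend(single_linkage_clusters(points, idx, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges
```

Applied to 40 evenly spaced points on a unit circle with the x-coordinate as the lens, this sketch yields a ring of 8 nodes and 8 edges, so the loop in the data survives as a loop in the graph. The interval count, overlap and clustering threshold are arbitrary illustrative choices.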

This whitepaper explores the main properties of topological data analysis (TDA) and explains how shapes are measured and represented. It also shows how TDA provides a framework for machine learning and why it offers a direct way to understand the overall organization of the data.

Why Topological Data Analysis?

Topology within mathematics can be characterized as the part of the subject that studies notions of shape. It consists of at least two separate threads: one in which one attempts to “measure” shape, and another in which one attempts to find compressed combinatorial representations of shape and to analyze how faithfully these representations capture it. The first proceeds primarily via algebraic invariants, such as homology and homotopy groups, which measure and count instances of particular patterns within the shape in a suitably systematic way. The second is the subject of a great deal of manifold topology, and is exemplified by the work on the “Hauptvermutung”, which concerns whether any two triangulations of a manifold admit a common subdivision.
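As a small illustration of measuring shape with algebraic invariants, consider a graph viewed as a one-dimensional simplicial complex. Its Betti numbers are the ranks of its homology groups: b0 counts connected components and b1 counts independent loops, and for a graph b1 = |E| - |V| + b0. The sketch below assumes a simple graph with vertices numbered 0 to n-1 and no repeated edges or self-loops; `graph_betti_numbers` is a hypothetical name.

```python
def graph_betti_numbers(n_vertices, edges):
    """Return (b0, b1) for a simple graph on vertices 0..n_vertices-1.

    b0 = number of connected components (found with union-find),
    b1 = number of independent loops = |E| - |V| + b0.
    """
    parent = list(range(n_vertices))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = n_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # This edge merges two components; otherwise it closes a loop.
            parent[ru] = rv
            components -= 1
    b0 = components
    b1 = len(edges) - n_vertices + b0
    return b0, b1
```

For example, a square (four vertices joined in a cycle) gives (1, 1), two disjoint edges give (2, 0), and a figure-eight made of two triangles sharing a vertex gives (1, 2).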

Both threads have been extended to point clouds of data. The measurement aspect is extended via the theory of persistent homology and its variants. The representation aspect is extended by various simplicial complex constructions, such as Vietoris-Rips complexes, witness complexes, and the complexes constructed by the Ayasdi platform. In ordinary topology, combinatorial representations lend additional concreteness to the study of a shape and provide a succinct representation of it. They serve the same purpose for high-dimensional, complex data sets: they provide a compressed representation of the data that retains information about the geometric relationships between data points. The representations are also easy to work with, so they offer simple and useful ways to interrogate the data and to understand the driving variables that characterize various subgroups. At a high level, they allow for easy identification of coherent groups within the data. Performed naively, the search for coherent groups is intractable, since it requires searching through all subsets of the data set, and a data set of n points has 2^n subsets.
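Persistent homology is easiest to see in dimension 0. In the Vietoris-Rips complex at scale ε (using the convention that a simplex is included when all of its pairwise distances are at most ε), the connected components depend only on the edges. Tracking them as ε grows therefore amounts to adding edges in order of length and merging components with a union-find structure; each merge ends one bar of the barcode. The sketch below assumes Euclidean points, `h0_barcode` is a hypothetical name, and it covers only H0, not higher-dimensional features such as loops.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dimensional persistence of the Vietoris-Rips filtration.

    Every point is born at scale 0. Edges enter in order of length; an edge
    joining two different components kills one of them at that length.
    Returns (birth, death) pairs, with math.inf for the surviving component.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))
    if n:
        bars.append((0.0, math.inf))
    return bars
```

For points at 0, 1 and 5 on a line, it returns bars ending at 1 and 4 plus one infinite bar; the long bar ending at 4 reflects the gap that separates the point at 5 from the other two. This 0-dimensional computation coincides with single-linkage clustering, which is one way persistence exposes coherent groups without enumerating subsets.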

Ultimately, both sets of ideas will help investigators study their data. The representations are at the forefront, because they are what a user deals with directly. As we move further into automation, measuring the shape of a data set and of the complex outputs of the Ayasdi platform will be critical. For example, we will want to test Ayasdi constructions for geometric features such as flares and loops, so as to give the user the best possible “quick analysis”: complexes built automatically, without requiring the user to select parameter values, metrics, and lenses by hand.
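One generic way to test a point cloud for a loop without committing to a single hand-picked scale is to compute the first Betti number b1 of its Vietoris-Rips complex over a range of scales and look for a loop that persists. The sketch below is a swapped-in illustration, not the Ayasdi platform's procedure: it truncates the complex at triangles, computes ranks of the boundary matrices over GF(2) with rows stored as integer bitmasks, and uses b1 = (|E| - rank ∂1) - rank ∂2. The names `gf2_rank` and `rips_betti1` are hypothetical, and the brute-force enumeration of triangles is practical only for small inputs; dedicated libraries such as GUDHI or Ripser compute full persistence far more efficiently.

```python
import math
from itertools import combinations


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are Python int bitmasks."""
    rank = 0
    rows = [r for r in rows if r]
    while rows:
        pivot = rows.pop()
        rank += 1
        low = pivot & -pivot  # pivot column: lowest set bit of the pivot row
        # Eliminate the pivot column from every remaining row.
        rows = [r ^ pivot if r & low else r for r in rows]
        rows = [r for r in rows if r]
    return rank


def rips_betti1(points, eps):
    """First Betti number (over GF(2)) of the Vietoris-Rips complex at scale
    eps, truncated at dimension 2: vertices, edges of length <= eps, and
    triangles whose three sides all have length <= eps."""
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_index and (i, k) in edge_index
                 and (j, k) in edge_index]
    # Boundary of edge {i, j} is i + j; boundary of a triangle is its 3 edges.
    d1_rows = [(1 << i) | (1 << j) for i, j in edges]
    d2_rows = [(1 << edge_index[(i, j)]) | (1 << edge_index[(i, k)])
               | (1 << edge_index[(j, k)]) for i, j, k in triangles]
    dim_cycles = len(edges) - gf2_rank(d1_rows)
    return dim_cycles - gf2_rank(d2_rows)
```

For 12 evenly spaced points on a unit circle, b1 is 1 at scales 0.6 and 1.1 and drops to 0 at 2.1, once every triangle has been filled in. The loop persists over a range of scales rather than appearing only at one tuned value, which is the kind of signal an automated check could look for.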
