Simplicial Data Analysis: theory, practice, and algorithms

Patania, Alice

doi:10.6092/polito/porto/2670783

Simplicial complexes store key information about a topological space in discrete form, and have been used in mathematics to introduce combinatorial and discrete tools in geometry and topology. They represent a topological space as a collection of ‘simple elements’ (such as vertices, edges, triangles, tetrahedra, and more general simplices) that are glued to each other in a structured manner. In the last 40 years, they have been a basic tool in computer visualization for storing and classifying different shapes of 3D images. In the early 2000s these techniques were successfully applied to more general data, not necessarily sampled from a metric space. The use of techniques borrowed from algebraic topology has been very successful in analysing data from various fields, including genomics, sensor analysis, brain connectomics, fMRI data, and trade networks, and new fields of application are being tested every day. Regrettably, topological data analysis has been used mainly as a qualitative method, the problem being the lack of proper tools to perform effective statistical analysis. Building on well-established techniques in random graph theory, the first models for random simplicial complexes have been introduced in recent years, but none of them can be used effectively in a quantitative analysis of data. We introduce a model that can be successfully used as a null model for simplicial complexes, as it fixes the size distribution of facets. Another challenge is to identify a simplicial complex that correctly encodes the topological space from which the initial data set is sampled. The most common solution is to build nested simplicial complexes and study the evolution of their features. A recent study showed that problems can arise from making wrong assumptions about the space of data. We propose a categorical argument that sheds light on the causes of these misconceptions.
We introduce a new category for weighted graphs and study its relation to other common categories when the weights are chosen in a poset. The construction of the appropriate simplicial complex is not the only obstacle one faces when applying topological methods to real data. Available algorithms for homological feature extraction have memory and time complexity that scale exponentially with the number of simplices, making these techniques unsuitable for the analysis of ‘big data’. We propose a quantum algorithm that is able to track, in logarithmic time, the evolution of a quantum version of well-known homological features along a filtration of simplicial complexes.
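
As a small illustration of the objects discussed above, the following Python sketch stores a simplicial complex by its facets (maximal simplices), generates all of its faces, and computes the facet size distribution, which is the quantity the proposed null model is said to fix. It is not taken from the thesis: the function names `closure`, `facet_size_distribution`, and `euler_characteristic` and the toy complex are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def closure(facets):
    """Return every non-empty face of the given facets as a set of frozensets."""
    faces = set()
    for facet in facets:
        vertices = sorted(facet)
        # A simplex on n vertices contains every non-empty subset of them.
        for k in range(1, len(vertices) + 1):
            faces.update(frozenset(c) for c in combinations(vertices, k))
    return faces


def facet_size_distribution(facets):
    """Count how many facets have each size (number of vertices)."""
    return Counter(len(facet) for facet in facets)


def euler_characteristic(faces):
    """Alternating sum over faces: +1 for even dimension, -1 for odd."""
    return sum((-1) ** (len(face) - 1) for face in faces)


# Toy complex (an assumption for illustration): a filled triangle {0, 1, 2}
# with an extra edge {2, 3} attached at vertex 2.
facets = [{0, 1, 2}, {2, 3}]
faces = closure(facets)

print(len(faces))                       # 4 vertices + 4 edges + 1 triangle
print(facet_size_distribution(facets))  # one facet of size 3, one of size 2
print(euler_characteristic(faces))      # 4 - 4 + 1
```

Storing only the facets keeps the representation compact, since the remaining faces can be regenerated on demand; the number of faces of a single facet grows exponentially with its size, which is one reason the cost of homological computations is a concern for large data sets.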

PORTO @ Archivio Istituzionale della Ricerca