Statistical vs. Structural Pattern Recognition

A Literature Review

Abstract

This paper surveys the different approaches in pattern recognition (PR). After the fundamental idea of PR is stated, a taxonomy is presented that divides the field into three families: statistical, structural, and hybrid approaches. The first is a well-researched area of PR that has produced popular and efficient supervised and unsupervised pattern discovery algorithms. The second family addresses techniques for finding patterns in structurally represented data, using graphs that capture the relationships among objects. Finally, the hybridization of the two prior families is discussed. This includes transformation methods that embed a graph into a vector space by means of graph kernels or graph embedding.

Keywords: Statistical Pattern Recognition · Structural Pattern Recognition · Graph Kernels · Graph Embedding

Introduction

PR methods aim to find patterns in data that are hard or even infeasible for humans to discover. Although there are many different approaches to accomplish this, each requires deriving a measure of similarity or dissimilarity among the data objects. Based on this measure, arrangements can be built that group similar data objects together and separate dissimilar data objects from one another. In classification, these groups are commonly referred to as classes; in clustering, as clusters.
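
As a minimal sketch of this idea, the following Python snippet uses the Euclidean distance as one possible dissimilarity measure and groups each object with the prototype it is closest to. The function names, the sample points, and the prototypes "A" and "B" are illustrative assumptions, not taken from any of the surveyed methods.

```python
import math


def euclidean(a, b):
    """Dissimilarity between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_nearest(objects, prototypes):
    """Group each object with the prototype it is least dissimilar to."""
    groups = {label: [] for label in prototypes}
    for obj in objects:
        nearest = min(prototypes, key=lambda label: euclidean(obj, prototypes[label]))
        groups[nearest].append(obj)
    return groups


# Hypothetical two-dimensional data objects and one prototype per group.
objects = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (4.8, 5.3)]
prototypes = {"A": (1.0, 1.0), "B": (5.0, 5.0)}
groups = assign_to_nearest(objects, prototypes)
```

With fixed, labeled prototypes this resembles a simple nearest-prototype classifier; a clustering method would instead have to derive the groups from the dissimilarities alone.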

The science of recognizing patterns in data started to emerge in the late 1960s. A popular paper by Fu (1980) summarized the early developments of PR. Today, PR is an established and easily accessible tool found in many popular applications, for instance the recognition of faces (Hazim Barnouti, Sameer Mahmood Al-Dabbagh, & Esam Matti, 2016), the creation of customer behaviour patterns (Koudehi, Rajeh, Farazmand, & Mohamad, 2014), the detection of cancer cells (Ozdemir & Gunduz-Demir, 2013), and the forecasting of time series data (Aghabozorgi, Seyed Shirkhorshidi, & Ying Wah, 2015).

A crucial part of PR is representing the underlying information so that it can be processed by a computational algorithm. In general, one distinguishes between statistical and structural data. Each representation has advantages and disadvantages for the task of discovering patterns in the data. Furthermore, the set of applicable algorithms differs as well. This survey presents popular PR methods for both data representations and reviews ideas for linking them.
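
To make the distinction concrete, the sketch below contrasts a fixed-length feature vector with a small labeled graph and shows a deliberately naive way of mapping the graph to a vector. The feature values, the node labels, and the function `naive_embedding` are assumptions made for illustration; the count-based mapping is not one of the graph kernel or graph embedding methods covered by this survey.

```python
# Statistical representation: a fixed-length feature vector (hypothetical values).
feature_vector = (5.1, 3.5, 1.4)

# Structural representation: labeled nodes plus undirected edges between them.
graph = {
    "nodes": {"a": "C", "b": "O", "c": "H"},
    "edges": {("a", "b"), ("a", "c")},
}


def naive_embedding(g):
    """Map a graph to (node count, edge count, maximum degree); illustrative only."""
    degree = {node: 0 for node in g["nodes"]}
    for u, v in g["edges"]:
        degree[u] += 1
        degree[v] += 1
    return (len(g["nodes"]), len(g["edges"]), max(degree.values(), default=0))


embedded = naive_embedding(graph)
```

The resulting vector can be handled by any statistical method, but it discards most of the relational information, such as which node is connected to which.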