Image source: Daniel Villatoro
In this blog/article, I expand on the idea of ‘Small data’.
I present a generic model for Small data combining Deterministic and Predictive components
Although I have presented the ideas in context of IoT(which I understand best) – the same algorithms and approach could apply to domains such as Retail, Telecoms, Banking etc
We could have a number of data sets which may be individually small but it is possible to find value at their intersection. This approach is similar to the mobile industry/ foursquare scenario of knowing the context to provide the best service/offer etc to a customer segment of one. That’s a powerful idea in itself and a reason to consider Small Data. However, I wanted to extend the deterministic aspects of Small data (intersection of many small data sets) by also considering the predictive aspects. The article describes a general approach for adding a predictive component to Small data which comprises of three steps: a) A limited set of features are extracted, b) Their dimensionality is reduced(ex using clustering) and c) finally we use a classification and a recognition method like Hidden Markov Models to recognize a higher order metric (ex walking or footfall)
Last week, I gave an invited talk on IoT and Machine Learning at the Bigdap conference organized by the Ontic project . The Ontic project is a EU FP7 project doing some interesting work on Big Data and Analytics mainly from a Telco perspective.
The audience was technical and was reflected in the themes of the event which (for example : Techniques, models and algorithms for Big data, Scalable Data Mining and Machine learning techniques and mechanisms, Big Data Security and Privacy challenges, Cleaning Big Data (noise reduction), acquisition & integration, Multidimensional Big Data, Algorithms for enhancing data quality.)
This blog post is inspired by some conversations following my talk with Daniel Villatoro (BBVA) and Dr Alberto Mozo (UPM/Ontic). It extends many of the ideas and papers I referenced in my talk.
In his talk, Daniel referred to ‘small data’ (image from Slides used with permission). In this context, as per slide, Small data refers to the intersection of various elements like customers, offers, social context etc in a small retailer context. Small data is an interesting concept and I wanted to explore it more. So, I spent the weekend thinking more about it.
When you have data elements, the concept of small data is a deterministic. It is similar to the mobile industry/ foursquare scenario of knowing the context to provide the best service/offer etc. Thus, given the right datasets, you can find value at the intersection. This works even if the individual Data sets are small as long as you find enough intersecting datasets to create a customer segment of one at their intersection.
That’s a powerful idea in itself and a reason to consider Small Data.
However, I wanted to extend the deterministic aspects of Small data (intersection of many small data sets) by also considering the predictive aspects. In the case of Predictive aspects, we want to infer insights from relatively limited data sets
In addition, I was also looking for a good use case to teach my students @citysciences. Hence, this blog will explore the predictive aspects of Small data in an IoT context
I believe the ideas I discuss could apply to any scenario (ex retail/banking) and indeed also to Big Data sets
The examples I have considered below strictly apply to Wireless Sensor Networks(WSNs). WSNs differ from IoT because there is potentially communication between the nodes. The topology of the WSNs can vary from a simple star network to an advanced multi-hop wireless mesh network. The propagation technique between the hops of the network can be routing or flooding. In contrast, IoT nodes do not necessarily communicate between each other in this way. But for the purposes of our example, the examples are valid because we are interested in the insights inferred from the Data.
Predictive characteristics of Small data
From a predictive standpoint, I propose that Small data will have the following characteristics:
1) The Data is missing or incomplete
2) The data is limited
3) Alternatively, we have Large data sets which need to be converted to a smaller data set to make it more relevant(ex a small retailer) to the problem at hand
4) The need for inferred metrics i.e. higher order metrics derived from raw data
This complements the deterministic aspects of Small data i.e. finding a number of data sets to identify the value at their intersection even if each data set itself may be small(Small data)
So, based on papers I reference below, I propose three methodologies that can be used for understanding Small data from a predictive standpoint
1) Feature extraction
2) Dimensionality reduction
3) Feature Classification and recognition
To discuss these in detail, I use the problem of monitoring physical activity for assisted living patients. These patients live in an apartment under a privacy-aware manner. Here, we use sensors and infer behaviour based on the sensor readings but yet want to protect the privacy of the patient
The papers I have referred to are (also in my talk):
- Activity Recognition Using Inertial Sensing for Healthcare, Wellbeing and Sports Applications: A Survey – Akin Avci, Stephan Bosch, Mihai Marin-Perianu, Raluca Marin-Perianu, Paul Havinga University of Twente, The Netherlands
- Robust location-aware activity recognition: Lu and Fu
This problem is a ‘small data’ problem because we have limited data, some of it is missing (not all sensors can be monitoring at all times) and we have to infer behaviour based on raw sensor readings. We will complement this with the deterministic interpretation of Small Data (where we accurately know a reading).
Small data: Assisted Living Scenario
source Robust Location-Aware Activity Recognition Using Wireless Sensor Network in an Attentive Home Ching-Hu Lu, Student Member, IEEE, and Li-Chen Fu, Fellow, IEEE
In an assisted living scenario, the goal is to recognize activity based on the observations of specific sensors. Traditionally, researchers used vision sensors for activity recognition. However, that is very privacy invasive. The challenge is thus to recognize human behaviour based on raw readings / activity from multiple sensors. In addition, in an assisted living system, the subject being monitored may have a disorder (for example Cognitive disorders or Chronic conditions).
The techniques presented below could also apply to other scenarios – ex to detect Quality of Experience in Telecoms or in general for any situation where we have to infer insights from relatively limited data sets(ex footfall)
The steps/methods for retrieving activity information from raw sensor data are: preprocessing, segmentation, feature extraction, dimensionality reduction and classification
In this post, we will consider the last three i.e. feature extraction, dimensionality reduction and classification. We could use these three techniques for situations where we want to create a predictive component for ‘small data’
Small data: Extracting predictive insights
In the above scenario, we could extract new insights using the following predictive techniques (even when we have less data)
1) Feature extraction
Feature extraction takes inputs from raw data readings and finds find the main characteristics of a data segment that accurately represent the original data. The smaller set of features can be described as abstractions of raw data. The purpose of feature extraction is to transform large quantities of input data into a reduced set of features. This smaller set of Data is represented as an n-dimensional feature vector. This feature vector is then used as an input to a classification algorithm.
2) Dimensionality Reduction
Dimensionality reduction methods aim to increase accuracy and reduce computational effort. By reducing the features involved in the classification process, less computational effort and memory are needed to perform the classification. In other words, if the dimensionality of a feature set is too high, some features might be irrelevant and do not even provide useful information for classification.The two general forms of dimensionality reduction are: feature selection and feature transform.
Feature selection methods select the features, which are most discriminative and contribute most to the performance of the classifier, in order to create a subset of the existing features. For example: SVM-Based Feature Selection select several most important features and conclude that 5 attributes would be enough to classify daily activities accurately. K-Means Clustering is a method to uncover structure in a set of samples by grouping them according to a distance metric. K-means clustering algorithms rank individual features according to their discriminative properties and their co-relationships.
Feature Transform Methods : Feature transform techniques try to map the high dimensional feature space into a much lower dimension, yielding fewer features that are a combination of the original features. They are useful in situations where multiple features collectively provide good discrimination but individually, those features would provide poor discrimination. Principal Component Analysis (PCA) PCA is a well known and widely used statistical analysis method and can be used to transform the original features into a lower dimensional space.
3) Classification and Recognition: The selected or reduced features from the dimensionality reduction process are used as inputs for the classification and recognition methods.
For example: Nearest Neighbor (NN) algorithms are used for classification of activities based on the closest training examples in the feature space. (ex k-NN algorithm)
Naïve Bayes is a simple probabilistic classifier based on Bayes’ theorem which can be used for Classification.
Support Vector Machines (SVMs) are supervised learning methods used for classification. In the assisted living scenario, SVM based activity recognition system using objects attached with sensors can be used to recognize drinking, phoning, and writing activities
Hidden Markov Models (HMMs) are statistical models that can also be used for activity recognition. I used a simple analogy to explain hidden markov analysis from a paper which explained HMM for inferring temperature in the distant past based on tree ring sizes
Gaussian Mixture Models (GMMs) can be used to recognize transitions between activities
Artificial Neural Networks can also be used to detect occurrences – ex falls.
Thus, we get a scenario as below
sensors(adapted from Activity Recognition Using Inertial Sensing for Healthcare,Wellbeing and Sports Applications: A Survey)
Small Data: Complementing the Deterministic by the predictive
Small Data could be a deterministic problem when we know a number of datasets and value lies at the intersection of these data sets. This strategy is possible with Mobile context based services and Location based services. The results so achieved could also be complemented by a predictive component of Small data.
In this case, a limited set of features are extracted, their dimensionality is reduced(ex using clustering) and finally we use a classification and a recognition method like Hidden Markov Models to actually recognize a higher order metric (ex walking, retail footfall etc)
I believe that these ideas could be adapted to many domains. Data science is engineering problem. It’s like building a Bridge where there is no fixed solution in advance. Every Bridge is different and will present a unique set of challenges. I like the blog post – Machine Learning is not a Kaggle competition . The author(Julia Evans) correctly emphasizes that we need to understand the business problem first. So, I think the above approach could apply to many business scenarios – ex in Retail (footfall), Healthcare, Airport lounges etc by inferring predictive insights from data streams