Book review: Fundamentals of Deep Learning: Designing Next-Generation Artificial Intelligence Algorithms by Nikhil Buduma


Strictly speaking, this post is not a book review, since Fundamentals of Deep Learning: Designing Next-Generation Artificial Intelligence Algorithms by Nikhil Buduma is being published as an O'Reilly early release (raw and unedited) book.

However, I have been a fan of Nikhil Buduma's blog and writing. Hence, I bought the early release and have enjoyed reading it. I also want to include it as a recommended book on the course I teach at Oxford University (Data Science for the Internet of Things).

There are very few accessible books on Deep Learning, and it is a complex and evolving topic, as I discussed in a recent post – The evolution of Deep Learning Models. If you follow the detailed but readable posts on Nikhil's blog, such as A Deep dive into Recurrent neural networks, you will enjoy the book.

The first three chapters have been released. The table of contents is:

Chapter 1: The Neural Network

Chapter 2: Training Feed-Forward Neural Networks

Chapter 3: Implementing Neural Networks in Theano

Chapter 4: Beyond Gradient Descent

Chapter 5: Convolutional Neural Networks

Chapter 6: Hopfield Networks and Restricted Boltzmann Machines

Chapter 7: Deep Belief Networks

Chapter 8: Recurrent Neural Networks

Chapter 9: Autoencoders

Chapter 10: Supplementary: Universality Theorems

(The table of contents is evolving.)

I spoke to Nikhil about the creation and evolution of the book. Here are some comments from our discussion.

How did the book idea come about?

I first started writing about deep learning on my blog around January. I'd been hacking on it for a while and figured I might share the lessons I had learned applying these models to problems I'm passionate about (healthcare and language processing) with my peers within the MIT community. My blog got a pretty good reception, and ended up piquing the interest of Ben Lorica and Mike Loukides from O'Reilly. We talked about the possibility of writing a book, and I figured it would be a great way to make the field more accessible to a larger audience.

Writing an accessible book on Deep Learning

There is definitely material online for people interested in deep learning – a hodgepodge of papers, tutorials, and some books. Most of these materials are geared towards a highly academic audience, and it's not particularly simple to navigate these resources. My goal was to synthesize the progress in the field so that anybody with some mathematical sophistication (basic calculus and familiarity with matrix manipulation) and Python programming under their belt would be able to tackle deep learning head on.

Explanation of Deep Learning models

As with classical machine learning, deep learning models can also be classified into three major areas – supervised, unsupervised, and reinforcement learning. My approach to the book is to develop an intuition for the major types of models. But in addition to being able to build their own, I’d like readers to come away with an understanding of why each model is designed the way it is. I think it’s this understanding that will enable readers to successfully leverage deep learning to tackle their own data challenges. I’m also interested in exploring some of the more exotic networks (augmented networks, long-term recurrent convolutional networks, spatial transformer networks, etc.) towards the end of my book to provide insights into the cutting edge of the field. Again, the focus here will not only be on how the models are structured, but also on why they’re structured the way they are.

Any final comments?

I’ve had the opportunity to work with luminaries in the machine learning space while writing this book, including Mike and Ben from O’Reilly, Jeff Dean of Google, and Jeff Hammerbacher of Mt. Sinai and Cloudera. I’m excited to see what readers think of the early release as it comes out, so I can tailor the content to what they’re looking for.

The book link is – Fundamentals of Deep Learning: Designing Next-Generation Artificial Intelligence Algorithms by Nikhil Buduma. I very much look forward to reading it as it develops and to using it for my course.

Retail Data Scientists: Data Science for Retail, marketing and advertising

You want to be a Data Scientist – but where do you start?

Why not consider Data Science for Retail, marketing and advertising?

And start NOW for just £100 for the first 4 sessions!

 

Retail is a great focus for aspiring Data Scientists.

The top 5 reasons to join

a) Customized Data Science course tailored to your needs

b) Focus on Retail – with a good chance to transition your career, and support from Retail experts

c) Low upfront investment – sign up for as little as £100

d) Retail toolkit: modules (code) in R, with lifetime access to a support network

e) Possible to join even if you have no knowledge of programming, using the optional modules

 

Following our course on Data Science for the Internet of Things, we are now pleased to announce a new course, Retail Data Science, with two options:

a) Retail Data Science for Programmers

b) Retail Data Science for Strategists

Here's why Retail is a great focus for aspiring Data Scientists. Note: I use the word 'retail' in a broad sense to include Retail, marketing and advertising.

 

1) The market: There is a huge demand for retail data science and the market is very broad. In some online advertising agencies, 50% of the staff are data scientists, and data scientists are increasingly crucial for ad agencies.

2) New innovation in the market: New applications are emerging (e.g. mobile, IoT, Beacons), along with new metrics like in-mall heat maps.

3) Value of analytics: Analytics are the key to converting customer interactions into relationships. A quantitative understanding of customers is thus key to increasing revenue.

4) Value of real time: Retail has always used data in innovative ways, but traditional statisticians were not required to analyze massive data sets on the near real-time scale that is often required today.

5) New tools: Big Data brings distributed processing capabilities in a cheap and affordable way through Hadoop. But Hadoop itself is evolving, for example through Apache Spark. The tools innovation is only just beginning!

6) Operational intelligence: Operational intelligence technologies give a new 360-degree view of the customer. For instance, companies like Splunk create new insights from machine data.

7) Emphasis: All companies use retail/marketing/advertising, but each has a specific emphasis, giving more opportunities for new data science roles. For example, telecoms, web, mobile and social media have different approaches and different metrics to optimize.

 

How can we help you become a Retail Data Scientist?

Data scientists in the retail domain need to be a rare mix of marketing expert and technology professional. Here's how we can help:

 

  • A Retail code toolkit: The course provides code (modules) in the R programming language to help solve specific problems, i.e. a retail toolkit. Thus, you are empowered to work in a new role from day one
  • The course provides lifetime community support for the code, even beyond the course itself
  • There are two options: Retail Data Science for Programmers and Retail Data Science for Strategists. If you choose the strategist option, you may not be familiar with programming but would like to learn; hence, we have additional modules to get you started with coding (see below)
  • This is a use-case based data science course focused on Retail, Marketing and advertising
  • There are 20 modules in total, with a certificate of completion
  • Coding is in the R programming language
  • £100 for the first 4 sessions on signup, to be completed in the first two months. The total for the course is £799 + VAT if applicable, with the balance payable after the first 4 sessions (two months). We hope you will complete the course and gain the certification, but if you do not want to proceed, it is still a low-risk exploration for £100 + VAT
  • The strategist option: If you are completely new to programming, we have an optional £200 module introducing programming in R. Note that if you have done some programming in any language, you should be able to pick up R from the main course. Learning to code is fun – why not learn in the context of specific business problems?
  • We provide support from retail experts
  • The course is customized, i.e. we tailor the modules for you. Hence, we always have a small number of participants

We have very limited places. The sessions will be online.

The end goal is to transition you to a role in Data Science for retail.

 

See some testimonials from our IoT course

  


For more information, please contact info at futuretext.com 

White paper – coming soon – Learn Scala through Spark

 

Coming soon by Aug 16

This white paper will explore learning Scala for beginners in the context of Apache Spark.

To get the paper when launched, please sign up below or email me at ajit.jaokar at futuretext.com

 

Please sign up HERE.

Evolution of Deep learning models

Evolution of Deep learning models

By Ajit Jaokar

@ajitjaokar

Data Science for Internet of Things

Linkedin Ajit Jaokar

 

 

PS – This paper is best downloaded as a free pdf HERE.

Scope and approach

This paper is part of a series covering Deep Learning applications for Smart cities/IoT, with an emphasis on Security (human activity detection, surveillance etc). It also relates to my teaching at Oxford and UPM (Madrid) on Data Science and the Internet of Things. The content is also part of a personalized Data Science course I teach (online and offline): the Personalized Data Science for Internet of Things course. I am also looking for academic collaborators to jointly publish similar work. If you want to be a part of the personalized Data Science course or to collaborate academically, please contact me at ajit.jaokar at futuretext.com or connect with me on Linkedin: Ajit Jaokar

No taxonomy of Deep learning models exists, and I do not attempt to create one here. Instead, I explore the evolution of Deep learning models by loosely classifying them into Classical Deep learning models and Emerging Deep learning models. This is not an exact classification. We also embark on this exercise keeping our goal in mind, i.e. the application of Deep learning models to Smart cities from the perspective of Security (safety, surveillance). From the standpoint of Deep learning models, we are interested in 'Human activity recognition' and its evolution. This will be explored in subsequent papers.

In this paper, we trace the evolution of Deep Learning models and recent innovations. Deep Learning is a fast-moving topic and we see innovation in many areas, such as Time series, hardware and RNNs. Where possible, I have included links to excellent materials/papers which can be used to explore further. Any comments and feedback are welcome, and I am happy to cross-reference you if you can add to specific areas. Finally, I would like to thank Lee Omar, Xi Sizhe and Ben Blackmore, all of Red Ninja Labs, for their feedback.

Deep Learning – learning through layers

Deep learning is often thought of as a set of algorithms that 'mimics the brain'. A more accurate description would be an algorithm that 'learns in layers'. Deep learning involves learning through layers, which allows a computer to build a hierarchy of complex concepts out of simpler concepts. Deep learning algorithms apply to many areas including Computer Vision, image recognition, pattern recognition, speech recognition and behaviour recognition.

To understand the significance of Deep Learning algorithms, it’s important to understand how Computers think and learn. Since the early days, researchers have attempted to create computers that think. Until recently, this effort has been rules based adopting a ‘top down’ approach. The Top-down approach involved writing enough rules for all possible circumstances.  But this approach is obviously limited by the number of rules and by its finite rules base.

To overcome these limitations, a bottom-up approach was proposed. The idea here is to learn from experience. The experience was provided by ‘labelled data’. Labelled data is fed to a system and the system is trained based on the responses – leading to the field of Machine Learning. This approach works for applications like Spam filtering. However, most data (pictures, video feeds, sounds, etc.) is not labelled and if it is, it’s not labelled well.

The other issue is in handling problem domains which are not finite. For example, the problem domain in chess is complex but finite, because there are a finite number of primitives (32 chess pieces) and a finite set of allowable actions (on 64 squares). But in real life, at any instant, we face a potentially large or infinite number of alternatives. The problem domain is thus very large.

A problem like playing chess can be 'described' to a computer by a set of formal rules. In contrast, many real-world problems are easily understood by people (intuitive) but not easy to describe (represent) to a computer (unlike chess). Examples of such intuitive problems include recognizing words or faces in an image. Such problems are hard to describe to a computer because the problem domain is not finite. Thus, the problem description suffers from the curse of dimensionality: when the number of dimensions increases, the volume of the space increases so fast that the available data becomes sparse. Computers cannot be trained on sparse data; such scenarios are not easy to describe because there is not enough data to adequately represent the combinations of dimensions. Nevertheless, such 'infinite choice' problems are common in daily life.

Deep learning thus deals with 'hard/intuitive' problems which have few or no rules and high dimensionality. Here, the system must learn to cope with unforeseen circumstances without knowing the rules in advance.

 

Feed forward back propagation network

The feed forward back propagation network is a model which mimics the neurons in the brain in a limited way. In this model:

a) Each neuron receives a signal from the neurons in the previous layer.

b) Each of those signals is multiplied by a weight value.

c) The weighted inputs are summed and passed through a limiting function which scales the output to a fixed range of values.

d) The output of the limiter is then broadcast to all of the neurons in the next layer.

The learning algorithm for this model is called Back Propagation (BP), which stands for "backward propagation of errors". We apply the input values to the first layer, allow the signals to propagate through the network and read the output. A BP network learns by example, i.e. we must provide a learning set that consists of some input examples and the known correct output for each case. We use these input-output examples to show the network what type of behaviour is expected. The BP algorithm allows the network to adapt by propagating the error value backwards through the network and adjusting the weights. Each link between neurons has a unique weighting value, and the 'intelligence' of the network lies in the values of these weights. With each iteration of the errors flowing backwards, the weights are adjusted. The whole process is repeated for each of the example cases. Thus, to detect an object, programmers would train a neural network by rapidly sending across many digitized versions of data (for example, images) containing those objects. If the network did not accurately recognize a particular pattern, the weights would be adjusted. The eventual goal of this training is to get the network to consistently recognize the patterns that we recognize (e.g. cats).
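To make the mechanics concrete, here is a minimal sketch (my own illustration, not code from any cited source): a tiny feed-forward network with one hidden layer, trained by back propagation to learn the XOR function from four input/output examples, following exactly the forward-pass/backward-pass cycle described above.

```python
# A minimal sketch: a feed-forward network trained with backpropagation.
import numpy as np

rng = np.random.default_rng(0)

# Learning set: input examples and the known correct output for each case
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights
lr = 1.0

for _ in range(5000):
    # Forward pass: signals flow layer by layer
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: the error flows backwards and the weights are adjusted
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # approaches [0, 1, 1, 0]
```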

Building a hierarchy of complex concepts out of simpler concepts

Deep learning involves learning through layers which allows a computer to build a hierarchy of complex concepts out of simpler concepts. This approach works for subjective and intuitive problems which are difficult to articulate. Consider image data. Computers cannot understand the meaning of a collection of pixels. Mappings from a collection of pixels to a complex Object are complicated. With deep learning, the problem is broken down into a series of hierarchical mappings – with each mapping described by a specific layer.

The input (representing the variables we actually observe) is presented at the visible layer. Then a series of hidden layers extracts increasingly abstract features from the input, with each layer concerned with a specific mapping. Note, however, that this process is not predefined, i.e. we do not specify what the layers select.

For example: From the pixels, the first hidden layer identifies the edges

From the edges, the second hidden layer identifies the corners and contours

From the corners and contours, the third hidden layer identifies the parts of objects

Finally, from the parts of objects, the fourth hidden layer identifies whole objects

Image and example source: Yoshua Bengio book – Deep Learning

Classical Deep Learning Models

Based on the above intuitive understanding of Deep learning, we now explore Deep learning models in more detail. No taxonomy of Deep learning models exists. Hence, we loosely classify Deep learning models into Classical and Emerging. In this section, we discuss the Classical Deep learning models.

Autoencoders: Feed forward neural networks and Back propagation

Feed forward neural networks (with back propagation as the training mechanism) are the best known and simplest Deep learning models. Back propagation is based on the classical optimisation method of steepest descent. Back propagation is also closely related to autoencoders: simple learning circuits which aim to transform inputs into outputs with the least possible amount of distortion. While conceptually simple, autoencoders play an important role in machine learning. They were first used in the 1980s by Hinton and others to address the problem of "backpropagation without a teacher"; in this case, the input data itself is used as the teacher. Early attempts to simulate the brain also mimicked Hebbian learning rules (cells that fire together, wire together). Feedforward neural networks with many layers are also referred to as Deep Neural Networks (DNNs). There are many difficulties in training deep feedforward neural networks.
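As an illustration of "the input data as the teacher", here is a minimal autoencoder sketch (an illustrative toy under my own assumptions, not from any of the cited papers): an 8-dimensional input is squeezed through a 3-unit hidden layer, and the weights are trained purely to reconstruct the original input.

```python
# A minimal autoencoder sketch: the input itself is the training target.
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((200, 8))                         # toy unlabeled data

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W_enc = rng.normal(scale=0.5, size=(8, 3))       # encoder weights
W_dec = rng.normal(scale=0.5, size=(3, 8))       # decoder weights
lr = 0.5

for _ in range(2000):
    code = sigmoid(X @ W_enc)                    # compressed representation
    recon = sigmoid(code @ W_dec)                # attempted reconstruction

    # Backpropagate the reconstruction error (output vs the input itself)
    d_out = (recon - X) * recon * (1 - recon)
    d_code = (d_out @ W_dec.T) * code * (1 - code)
    W_dec -= lr * code.T @ d_out / len(X)
    W_enc -= lr * X.T @ d_code / len(X)

print(np.mean((recon - X) ** 2))                 # reconstruction error falls
```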

 

Deep belief networks

To overcome these issues, in 2006 Hinton et al. at the University of Toronto introduced Deep Belief Networks (DBNs) – considered a breakthrough for Deep learning algorithms.

Here, the learning algorithm greedily trains one layer at a time, with the layers formed by stacked Restricted Boltzmann Machines (RBMs) instead of stacked autoencoders. The RBMs are stacked and trained bottom-up in an unsupervised fashion, followed by a supervised learning phase to train the top layer and fine-tune the entire architecture. The bottom-up phase is agnostic with respect to the final task. A simple introduction to Restricted Boltzmann Machines is HERE, where the intuition behind RBMs is explained by considering some visible random variables (film reviews from different users) and some hidden variables (like film genres or other internal features). The task of the RBM is to find out, through training, how these two sets of variables are actually connected to each other.
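The following is a minimal sketch of how a single RBM layer can be trained with one step of contrastive divergence (CD-1). It is a simplified toy (bias terms are omitted) intended only to show the unsupervised, label-free character of the bottom-up training phase: the weight update is driven by the difference between data-driven and reconstruction-driven statistics.

```python
# A minimal, simplified RBM trained with one step of contrastive divergence.
import numpy as np

rng = np.random.default_rng(2)
V = (rng.random((200, 6)) > 0.5).astype(float)   # toy binary visible data

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = rng.normal(scale=0.1, size=(6, 4))           # 6 visible x 4 hidden units
lr = 0.1

for _ in range(1000):
    # Positive phase: hidden units driven by the data
    h_prob = sigmoid(V @ W)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

    # Negative phase: reconstruct the visible units, then re-drive the hidden
    v_recon = sigmoid(h_sample @ W.T)
    h_recon = sigmoid(v_recon @ W)

    # CD-1 update: data-driven minus reconstruction-driven statistics
    W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / len(V)
```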

Convolutional Neural Networks (CNN)

Convolutional Neural Networks are similar to autoencoders and RBMs, but instead of learning a single global weight matrix between two layers, they aim to find a set of locally connected neurons through filters (kernels) (adapted from stackoverflow). CNNs are mostly used in image recognition; their name comes from the "convolution" operator, and a tutorial on feature extraction using convolution explains more. CNNs use data-specific kernels to find locally connected neurons. Like autoencoders or RBMs, they translate many low-level features (e.g. user reviews or image pixels) into a compressed high-level representation (e.g. film genres or edges) – but now weights are learned only from neurons that are spatially close to each other. Thus, a Convolutional Neural Network (CNN) is composed of one or more convolutional layers followed by one or more fully connected layers, as in a standard multilayer neural network. The architecture of a CNN is designed to take advantage of the 2D structure of an input image (or other 2D input such as a speech signal). An advantage of CNNs is that they are easier to train and have many fewer parameters than fully connected networks with the same number of hidden units. A CNN tutorial is HERE.
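Here is a minimal sketch of the convolution operation itself (a toy NumPy implementation of my own, not an excerpt from the linked tutorial): a small kernel is slid across a 2D input, so each output value depends only on a spatially local patch, and the same weights are shared at every position.

```python
# A minimal 2D convolution: local receptive fields with shared weights.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]    # local receptive field
            out[i, j] = np.sum(patch * kernel)   # same kernel everywhere
    return out

image = np.zeros((8, 8))
image[:, 4:] = 1.0                   # an image containing a vertical edge
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)
print(conv2d(image, edge_kernel))    # responds strongly along the edge
```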

Recurrent neural networks (RNNs)

A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behaviour. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition, where they have achieved the best known results.

The fundamental feature of a Recurrent Neural Network (RNN) is that the network contains at least one feed-back connection, so the activations can flow around in a loop. That enables the network to do temporal processing and learn sequences, e.g. perform sequence recognition/reproduction or temporal association/prediction. Thus, feedforward networks are directed acyclic graphs, whereas recurrent neural networks are directed graphs which may contain cycles. See also this excellent tutorial – Deep Dive into Recurrent Neural Networks by Nikhil Buduma.
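A minimal sketch of the recurrence (an illustrative toy, not taken from the linked tutorial): the hidden state computed at one time step is fed back in at the next, which is exactly the "internal memory" referred to above.

```python
# A minimal recurrent forward pass: the hidden state loops back each step.
import numpy as np

rng = np.random.default_rng(3)
W_xh = rng.normal(scale=0.5, size=(2, 4))   # input  -> hidden
W_hh = rng.normal(scale=0.5, size=(4, 4))   # hidden -> hidden (feedback loop)
W_hy = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output

def rnn_forward(sequence):
    h = np.zeros(4)                          # initial internal state
    outputs = []
    for x in sequence:                       # process the inputs in order
        h = np.tanh(x @ W_xh + h @ W_hh)     # new state depends on old state
        outputs.append((h @ W_hy).item())
    return outputs

sequence = rng.random((5, 2))                # 5 time steps, 2 features each
print(rnn_forward(sequence))
```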

Emerging Deep learning models

In the section above, we saw the main Deep learning models. Deep learning techniques are rapidly evolving, and much of the innovation comes from combining different forms of learning with existing Deep learning techniques. Learning algorithms fall into three groups with respect to the sort of feedback the learner has access to: supervised learning, unsupervised learning and reinforcement learning. We also see emerging areas like the application of Deep Learning to Time series data. In the sections below, we discuss Emerging Deep learning models. The list is not exhaustive: the papers and techniques selected are those most relevant to our problem domain (the application of Deep learning techniques to Smart cities, with an emphasis on Human activity monitoring for Security/Surveillance).

Application of Reinforcement learning to Neural networks

Playing Atari with reinforcement learning presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. The method is applied to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm; it outperforms all previous approaches on six of the games and surpasses a human expert on three of them. The paper Deep learning for reinforcement learning in Pacman (Q-learning) addresses similar issues for the game Pacman. DeepMind (now a part of Google) has a number of papers on reinforcement learning. Sascha Lange and Martin Riedmiller apply Deep Auto-Encoder Neural Networks in Reinforcement Learning. The paper Recurrent Models of Visual Attention, by Volodymyr Mnih, Nicolas Heess, Alex Graves and Koray Kavukcuoglu of Google DeepMind, presents a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. It can be trained using reinforcement learning methods to learn task-specific policies.

 

Combining modalities for Deep learning

Multimodality is also an area of innovation for Deep learning networks. Multimodal networks learn from different types of data sources, for example training on video, audio and text together (usually video, audio and text are distinct training modes). The paper Multimodal deep learning proposes a deep autoencoder that considers the cross-modality learning setting, where both modalities (video and audio) are present during feature learning but only a single modality is used for supervised training and testing.

In the paper Joint Deep Learning for Pedestrian Detection, Wanli Ouyang and Xiaogang Wang use CNNs but add a deformation layer to classify the parts. Feature extraction, deformation handling, occlusion handling and classification are four important components in pedestrian detection; the paper proposes that they should be jointly learned in order to maximize their strengths through cooperation.

Deep Learning of Invariant Spatio-Temporal Features from Video uses the convolutional Restricted Boltzmann Machine (CRBM) as a basic processing unit. Their model, the Space-Time Deep Belief Network (ST-DBN), alternates the aggregation of spatial and temporal information so that higher layers capture longer-range statistical dependencies in both space and time.

Parallelization

Another area of innovation and evolution in Deep learning is parallelization – for example, Deep learning on Hadoop at Paypal and Massively Parallel Methods for Deep Reinforcement Learning.

 

Time Series

Because IoT/Smart city data is mostly Time Series data, the use of Time Series with Deep Learning is also relevant to our work. In most cases, RNNs or DBNs are used, not only to make a prediction but also (like NEST) to adapt. The paper Deep Learning for Time Series modelling forecasts demand, i.e. predicts energy loads across different network grid areas, using only time and temperature data. The paper uses hourly demand for four and a half years from 20 different geographic regions, and similar hourly temperature readings from 11 zones. Time Series Classification Using Multi-Channels Deep Convolutional Neural Networks uses a deep learning framework for multivariate time series classification, and the paper by Gilberto Batres-Estrada uses Deep Learning for Multivariate Financial Time Series.
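As a minimal sketch of how sensor time series are typically prepared for such models (my own illustration, not the method of the cited papers): a fixed-length window is slid over the series so that each window becomes one training input and the next reading becomes its prediction target.

```python
# A minimal windowing sketch for time-series learning.
import numpy as np

def make_windows(series, window=24):
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # e.g. the last 24 hourly readings
        y.append(series[i + window])     # the next reading, to be predicted
    return np.array(X), np.array(y)

hourly_load = np.sin(np.linspace(0, 40, 500))   # stand-in for demand data
X, y = make_windows(hourly_load)
print(X.shape, y.shape)                          # (476, 24) (476,)
```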

Cognitive computing

Ultimately, we can expect many services to be Cognitive.

An algorithmic framework is called cognitive if it has the following properties:

1. It integrates knowledge from (a) various structured or unstructured sources, (b) past experience, and (c) current state, in order to reason with this knowledge as well as to adapt over time.

2. It interacts with the user (e.g. by natural language or visualization) and reasons based on such interactions.

3. It can generate novel hypotheses and capabilities, and test their effectiveness.

Source: Cognitive Automation of Data Science. Deep learning is increasingly becoming a part of Cognitive computing.

Notes

Some additional notes:

Deep Learning in contrast to other machine learning techniques

To recap, a more formal definition of Deep Learning: a class of machine learning techniques, where many layers of information processing stages in hierarchical architectures are exploited for unsupervised feature learning and for pattern analysis/classification. The essence of deep learning is to compute hierarchical features or representations of the observational data, where the higher-level features or factors are defined from lower-level ones.

Historically, Deep Learning is a form of the fundamental credit assignment problem (Minsky, 1963). Learning, or credit assignment, is about finding weights that make the neural network exhibit desired behaviour, such as driving a car. Deep Learning is about accurately assigning credit across many such stages. For a historical reference, see Marvin Minsky's papers.

Deep learning techniques can also be contrasted with more traditional machine learning techniques. When we represent some object as a vector of n elements, we say that this is a vector in n-dimensional space. Dimensionality reduction refers to a process of refining data in such a way that each data vector x is translated into another vector x′ in an m-dimensional space (a vector with m elements), where m < n. The most common way of doing this is PCA (Principal Component Analysis). PCA finds the "internal axes" of a dataset (called "components") and sorts them by their importance. The first m most important components are then used as the new basis. Each of these components may be thought of as a high-level feature, describing the data vectors better than the original axes.
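For concreteness, here is what PCA-style dimensionality reduction looks like with scikit-learn (a standard usage example on toy random data):

```python
# PCA: project n = 10 dimensional vectors onto the m = 2 top components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))           # data vectors in 10 dimensions

pca = PCA(n_components=2)                # keep the 2 most important components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                   # (200, 2)
print(pca.explained_variance_ratio_)     # importance of each component
```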

Both autoencoders and RBMs do the same thing: taking a vector in n-dimensional space, they translate it into an m-dimensional one, trying to keep as much important information as possible while removing noise. If the training of the autoencoder/RBM was successful, each element of the resulting vector (i.e. each hidden unit) represents something important about the object – the shape of an eyebrow in an image, the genre of a film, the field of study of a scientific article, etc. You take lots of noisy data as input and produce much less data in a much more efficient representation. In the image above, we see an example of such a deep network: we start with ordinary pixels, proceed with simple filters, then with face elements and finally end up with entire faces. This is the essence of deep learning. (Adapted from stackexchange.)

So, one could ask: if we already have techniques like PCA, why do we need autoencoders and RBMs? The reason is that PCA only allows linear transformations of data vectors. Autoencoders and RBMs, on the other hand, are non-linear by nature and can thus learn more complicated relations between visible and hidden units. Moreover, they can be stacked, which makes them even more powerful. Most problems addressed by Deep learning neural networks are not linear; if we were able to model the relationship between the independent and dependent variables linearly, classic regression techniques would apply. The paper Deep neural networks as recursive generalised linear models (RGLMs) explains the applicability of Deep Learning techniques to non-linear problems from a statistical standpoint.

Deep Learning and Feature learning

Deep Learning can hence be seen as a more complete, hierarchical and 'bottom up' way of feature extraction, without human intervention. Deep Learning is a form of pattern recognition system, and the performance of a pattern recognition system depends heavily on feature representation. In the past, manually designed features were used for image and video processing. These rely on human domain knowledge and are hard to tune manually, so developing effective features for new applications is a slow process. Deep learning overcomes this problem of feature extraction. Deep learning also distinguishes multiple factors and a hierarchy in video and audio data: for example, objects (sky, cars, roads, buildings, pedestrians) and parts (wheels, doors, heads) can be decomposed from images. For this task, more layers provide greater granularity – for example, GoogLeNet has more than 20 layers.

Source: ELEG 5040 Advanced Topics on Signal Processing (Introduction to Deep Learning) by Xiaogang Wang

Deep learning and Classification techniques

None of the deep learning models discussed here work as classification algorithms. Instead, they can be seen as pretraining, automated feature selection and learning, creating a hierarchy of features, etc. Once trained (i.e. once features are selected), the input vectors are transformed into a better representation, which is in turn passed on to a real classifier such as an SVM or logistic regression. This can be represented as below.
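A minimal sketch of this two-stage pattern using scikit-learn (an illustration, not the exact pipeline from the cited lecture notes): an unsupervised RBM learns a representation of the digits data, and a plain logistic-regression classifier is trained on top of that learned representation.

```python
# Unsupervised feature learning (RBM) feeding a simple classifier.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

digits = load_digits()
X = digits.data / 16.0                   # scale pixel values into [0, 1]
y = digits.target

model = Pipeline([
    ("features", BernoulliRBM(n_components=64, learning_rate=0.05,
                              n_iter=20, random_state=0)),
    ("classifier", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)                          # RBM learns features without labels;
print(model.score(X, y))                 # the classifier works on its output
```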

Source: ELEG 5040 Advanced Topics on Signal Processing (Introduction to Deep Learning) by Xiaogang Wang

Advances in Hardware

Another major source of innovation in Deep learning networks is hardware. The impact of hardware on Deep Learning is a complex topic, but two examples are the Qualcomm Zeroth platform, which brings cognitive and Deep learning capabilities to devices including mobiles, and NVIDIA cuDNN – GPU-accelerated Deep Learning.

DBNs to pre-train DNNs

Finally, Deep learning techniques have synergies amongst themselves. We explained DBNs and DNNs above. DBNs and DNNs can be used in conjunction, i.e. a Deep Belief Net (which uses RBMs for layer-wise training) can be used as the pre-training method for a Deep Neural Network.

Conclusions

This paper is part of a series covering Deep Learning applications for Smart cities/IoT, with an emphasis on Security (human activity detection, surveillance etc). Subsequent parts of this series will cover human activity detection and Smart cities. The content is part of a personalized Data Science course I teach (online and offline): the Personalized Data Science for Internet of Things course. I am also looking for academic collaborators to jointly publish similar work. If you want to be a part of the personalized Data Science course or to collaborate academically, please contact me at ajit.jaokar at futuretext.com or connect with me on Linkedin: Ajit Jaokar

IoT analytics, Edge Computing and Smart Objects


The term 'Smart Objects' has been around since the days of Ubiquitous Computing.

However, as we have started building Smart objects, I believe that the meaning and definition have evolved.

Here is my view on how the definition of Smart Objects has changed in a world of Edge Computing and increasing processing capacity.

At a minimum, a Smart Object should have three things:

a) An identity (e.g. IPv6)
b) Sensors/actuators
c) A radio (Bluetooth, cellular etc)

In addition, a Smart Object could incorporate:

a) Physical context, e.g. location
b) Social context, e.g. proximity in social media

To extend even more, smartness could incorporate analytics.

Some of these analytics could be performed on the device itself, e.g. the computing-at-the-edge concept from Intel, Cisco and others.

However, Edge Computing, as discussed today, still has some limitations.

For example:

a) The need to incorporate multiple feeds from different sensors to reach a decision 'at the edge'

b) The need for a workflow process, i.e. actions based on readings – again often at the edge, with its accompanying security and safety measures

To manage multiple sensor feeds, we need to understand concepts like sensor fusion (pdf) (source: Freescale).

We already have some rudimentary workflow through mechanisms like IFTTT (If This Then That).

In addition, the rise in CPU capacity leads to greater intelligence on the device – for example, the Qualcomm Zeroth platform, which enables Deep learning algorithms on the device.

So, in a nutshell, it is an evolving concept, especially if we include IoT analytics in the definition of Smart Objects (with some of these analytics performed at the Edge).

We cover these ideas in the #DataScience for #IoT course and also in the courses I teach at Oxford University.

Comments welcome

 

 

Become a Data Scientist for the Internet of Things – download free paper

 

Free paper: 


An Introduction to Deep Learning and its role for IoT/future cities

 


An Introduction to Deep Learning and its role for IoT/future cities

Note: The paper below is best read as a pdf, which you can download for free below.

 

An Introduction to Deep Learning and its role for IoT/future cities

By Ajit Jaokar

@ajitjaokar

Please connect with me on LinkedIn if you want to stay in touch and receive future updates.

Background and Abstract

This article is part of an evolving theme. Here, I explain the basics of Deep Learning and how Deep learning algorithms could apply to IoT and Smart city domains. Specifically, as I discuss below, I am interested in complementing Deep learning algorithms using IoT datasets. I elaborate on these ideas in the Data Science for Internet of Things program, which enables you to work towards being a Data Scientist for the Internet of Things (modelled on the course I teach at Oxford University and UPM – Madrid). I will also present these ideas at the International Conference on City Sciences at Tongji University in Shanghai and at the Data Science for IoT workshop at the Iotworld event in San Francisco.

Please connect with me on LinkedIn if you want to stay in touch and receive future updates.

Deep Learning

Deep learning is often thought of as a set of algorithms that ‘mimics the brain’. A more accurate description would be an algorithm that ‘learns in layers’. Deep learning involves learning through layers which allows a computer to build a hierarchy of complex concepts out of simpler concepts.

The obscure world of deep learning algorithms came into the public limelight when Google researchers fed 10 million random, unlabeled images from YouTube into their experimental Deep Learning system. They then instructed the system to recognize the basic elements of a picture and how these elements fit together. The system, comprising 16,000 CPUs, was able to identify images that shared similar characteristics (such as images of cats). This canonical experiment showed the potential of Deep learning algorithms. Deep learning algorithms apply to many areas including Computer Vision, image recognition, pattern recognition, speech recognition and behaviour recognition.

 

How does a Computer Learn?

To understand the significance of Deep Learning algorithms, it’s important to understand how Computers think and learn. Since the early days, researchers have attempted to create computers that think. Until recently, this effort has been rules based adopting a ‘top down’ approach. The Top-down approach involved writing enough rules for all possible circumstances.  But this approach is obviously limited by the number of rules and by its finite rules base.

To overcome these limitations, a bottom-up approach was proposed. The idea here is to learn from experience. The experience was provided by ‘labelled data’. Labelled data is fed to a system and the system is trained based on the responses. This approach works for applications like Spam filtering. However, most data (pictures, video feeds, sounds, etc.) is not labelled and if it is, it’s not labelled well.

The other issue is in handling problem domains which are not finite. For example, the problem domain in chess is complex but finite, because there are a finite number of primitives (32 chess pieces) and a finite set of allowable actions (on 64 squares). But in real life, at any instant, we face a potentially large or infinite number of alternatives. The problem domain is thus very large.

A problem like playing chess can be 'described' to a computer by a set of formal rules. In contrast, many real-world problems are easily understood by people (intuitive) but not easy to describe (represent) to a computer (unlike chess). Examples of such intuitive problems include recognizing words or faces in an image. Such problems are hard to describe to a computer because the problem domain is not finite. Thus, the problem description suffers from the curse of dimensionality: when the number of dimensions increases, the volume of the space increases so fast that the available data becomes sparse. Computers cannot be trained on sparse data; such scenarios are not easy to describe because there is not enough data to adequately represent the combinations of dimensions. Nevertheless, such 'infinite choice' problems are common in daily life.

How do Deep learning algorithms learn?

Deep learning deals with 'hard/intuitive' problems which have few or no rules and high dimensionality. Here, the system must learn to cope with unforeseen circumstances without knowing the rules in advance. Many existing systems, like Siri's speech recognition and Facebook's face recognition, work on these principles. Deep learning systems are practical to implement now for three reasons: high CPU power, better algorithms and the availability of more data. Over the next few years, these factors will lead to more applications of Deep learning systems.

Deep Learning algorithms are modelled on the workings of the brain. The brain may be thought of as a massively parallel analog computer which contains about 10^10 simple processors (neurons), each of which requires a few milliseconds to respond to input. To model the workings of the brain, in theory each neuron could be designed as a small electronic device with a transfer function similar to a biological neuron. We could then connect each neuron to many other neurons to imitate the workings of the brain. In practice, it turns out that this model is not easy to implement and is difficult to train.

So, we make some simplifications in the model mimicking the brain. The resultant neural network is called a "feed-forward back-propagation network". The simplifications/constraints are: we change the connectivity between the neurons so that they are in distinct layers; each neuron in one layer is connected to every neuron in the next layer; signals flow in only one direction; and finally, we simplify the neuron design to 'fire' based on simple, weight-driven inputs from other neurons. Such a simplified network (the feed-forward neural network model) is more practical to build and use.

Thus:

a)      Each neuron receives a signal from the neurons in the previous layer

b)      Each of those signals is multiplied by a weight value.

c)      The weighted inputs are summed, and passed through a limiting function which scales the output to a fixed range of values.

d)      The output of the limiter is then broadcast to all of the neurons in the next layer.

Image and parts of the description in this section adapted from the Seattle Robotics site.

The most common learning algorithm for artificial neural networks is called Back Propagation (BP), which stands for "backward propagation of errors". To use the neural network, we apply the input values to the first layer, allow the signals to propagate through the network and read the output. A BP network learns by example, i.e. we must provide a learning set that consists of some input examples and the known correct output for each case. We use these input-output examples to show the network what type of behaviour is expected.

The BP algorithm allows the network to adapt by propagating the error value backwards through the network and adjusting the weights. Each link between neurons has a unique weighting value, and the 'intelligence' of the network lies in the values of these weights. With each iteration of the errors flowing backwards, the weights are adjusted. The whole process is repeated for each of the example cases. Thus, to detect an object, programmers would train a neural network by rapidly sending across many digitized versions of data (for example, images) containing those objects. If the network did not accurately recognize a particular pattern, the weights would be adjusted. The eventual goal of this training is to get the network to consistently recognize the patterns that we recognize (e.g. cats).

How does Deep Learning help to solve intuitive problems?

The whole objective of Deep Learning is to solve 'intuitive' problems, i.e. problems characterized by high dimensionality and no rules. The mechanism above demonstrates a supervised learning algorithm based on a limited modelling of neurons – but we need to understand more.

Deep learning allows computers to solve intuitive problems because:

  • With Deep learning, Computers can learn from experience but also can understand the world in terms of a hierarchy of concepts – where each concept is defined in terms of simpler concepts.
  • The hierarchy of concepts is built ‘bottom up’ without predefined rules by addressing the ‘representation problem’.

This is similar to the way a child learns 'what a dog is', i.e. by understanding the sub-components of a concept, e.g. the behaviour (barking), the shape of the head, the tail, the fur etc, and then putting these concepts together into one bigger idea, i.e. the dog itself.

The (knowledge) representation problem is a recurring theme in Computer Science.

Knowledge representation incorporates theories from psychology which seek to understand how humans solve problems and represent knowledge. The idea is that if, like humans, computers were to gather knowledge from experience, we would avoid the need for human operators to formally specify all of the knowledge that the computer needs to solve a problem.

For a computer, the choice of representation has an enormous effect on the performance of machine learning algorithms. For example, based on sound pitch, it is possible to know whether the speaker is a man, woman or child. However, for many applications, it is not easy to know what set of features represents the information accurately. For example, to detect pictures of cars in images, a wheel may be circular in shape – but actual pictures of wheels have variants (spokes, metal parts etc). So, the idea of representation learning is to find both the mapping and the representation.

If we can find representations and their mappings automatically (i.e. without human intervention), we have a flexible design to solve intuitive problems. We can adapt to new tasks, and we can even infer new insights without observation: for example, based on the pitch of a voice, we can infer an accent and hence a nationality. The mechanism is self-learning. Deep learning applications are best suited to situations which involve large amounts of data and complex relationships between different parameters. Training a neural network involves repeatedly showing it that "given an input, this is the correct output". If this is done enough times, a sufficiently trained network will mimic the function you are simulating. It will also ignore inputs that are irrelevant to the solution. Conversely, it will fail to converge on a solution if you leave out critical inputs. This model can be applied to many scenarios, as we see below in a simplified example.

An example of learning through layers

Deep learning involves learning through layers which allows a computer to build a hierarchy of complex concepts out of simpler concepts. This approach works for subjective and intuitive problems which are difficult to articulate.

Consider image data. Computers cannot understand the meaning of a collection of pixels. Mappings from a collection of pixels to a complex Object are complicated.

With deep learning, the problem is broken down into a series of hierarchical mappings – with each mapping described by a specific layer.

The input (representing the variables we actually observe) is presented at the visible layer. Then a series of hidden layers extracts increasingly abstract features from the input, with each layer concerned with a specific mapping. Note, however, that this process is not predefined, i.e. we do not specify what the layers select.

For example: From the pixels, the first hidden layer identifies the edges

From the edges, the second hidden layer identifies the corners and contours

From the corners and contours, the third hidden layer identifies the parts of objects

Finally, from the parts of objects, the fourth hidden layer identifies whole objects

Image and example source: Yoshua Bengio book – Deep Learning

Implications for IoT

To recap:

  • Deep learning algorithms apply to many areas including Computer Vision, Image recognition, pattern recognition, speech recognition, behaviour recognition etc
  • Deep learning systems are possible to implement now because of three reasons: High CPU power, Better Algorithms and the availability of more data. Over the next few years, these factors will lead to more applications of Deep learning systems.
  • Deep learning applications are best suited for situations which involve large amounts of data and complex relationships between different parameters.
  • Solving intuitive problems: Training a Neural network involves repeatedly showing it that: “Given an input, this is the correct output”. If this is done enough times, a sufficiently trained network will mimic the function you are simulating. It will also ignore inputs that are irrelevant to the solution. Conversely, it will fail to converge on a solution if you leave out critical inputs. This model can be applied to many scenarios

In addition, we have limitations in the technology. For instance, we have a long way to go before a Deep learning system can figure out that you are sad because your cat died (although it seems CogniToys, based on IBM Watson, is heading in that direction). The current focus is more on identifying photos or guessing a person's age from photos (as in Microsoft's Project Oxford API).

And we do indeed have a way to go, as Andrew Ng reminds us when he compares Artificial Intelligence to building a rocket ship:

“I think AI is akin to building a rocket ship. You need a huge engine and a lot of fuel. If you have a large engine and a tiny amount of fuel, you won’t make it to orbit. If you have a tiny engine and a ton of fuel, you can’t even lift off. To build a rocket you need a huge engine and a lot of fuel. The analogy to deep learning [one of the key processes in creating artificial intelligence] is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms.”

Today, we are still limited by technology from achieving scale. Google’s neural network that identified cats had 16,000 nodes. In contrast, a human brain has an estimated 100 billion neurons!

There are some scenarios where back propagation neural networks are well suited:

  • A large amount of input/output data is available, but you're not sure how to relate the inputs to the outputs. Thus, we have a large number of "given an input, this is the correct output" scenarios which can be used to train the network, because it is easy to create many examples of correct behaviour.
  • The problem appears to have overwhelming complexity. The complexity arises from a low rules base, high dimensionality and data which is not easy to represent. However, there is clearly a solution.
  • The solution to the problem may change over time, within the bounds of the given input and output parameters (i.e. today 2+2=4, but in the future we may find that 2+2=3.8), and outputs can be "fuzzy" or non-numeric.
  • Domain expertise is not strictly needed, because the output can be derived purely from the inputs. This is controversial, because it is not always possible to model an output based on the input alone. However, consider the example of stock market prediction: in theory, given enough cases of inputs and outputs for a stock value, you could create a model which would predict unknown scenarios if it was trained adequately using deep learning techniques.
  • Inference: we need to infer new insights without observation. For example, based on the pitch of a voice, we can infer an accent and hence a nationality.

Given an IoT domain, we could consider the top-level questions:

  • What existing applications can be complemented by Deep learning techniques by adding an intuitive component (e.g. in smart cities)?
  • What metrics are being measured and predicted, and how could we add an intuitive component to those metrics?
  • What applications exist in Computer Vision, image recognition, pattern recognition, speech recognition, behaviour recognition etc which also apply to IoT?

Now, extending more deeply into the research domain, here are some areas of interest that I am following.

Complementing Deep Learning algorithms with IoT datasets

In essence, these techniques/strategies complement Deep learning algorithms with IoT datasets.

1) Deep learning algorithms and Time series data: Time series data (coming from sensors) can be thought of as a 1D grid taking samples at regular time intervals, much as image data can be thought of as a 2D grid of pixels. This allows us to model Time series data with Deep learning algorithms (most sensor/IoT data is time series data); see the sketch after this list. It is still relatively uncommon to combine Deep learning and Time series, but there are already some instances of this approach (e.g. Deep Learning for Time Series Modelling, which predicts energy loads using only time and temperature data).

2) Multiple modalities: Multimodality in deep learning algorithms is being explored – in particular, cross-modality feature learning, where better features for one modality (e.g. video) can be learned when multiple modalities (e.g. audio and video) are present at feature-learning time.

3) Temporal patterns in Deep learning: In a recent paper, Ph.D. student Huan-Kai Peng and Professor Radu Marculescu, from Carnegie Mellon University's Department of Electrical and Computer Engineering, propose a new way to identify the intrinsic dynamics of interaction patterns at multiple time scales. Their method involves building a deep-learning model that consists of multiple levels, each level capturing the relevant patterns of a specific temporal scale. The newly proposed model can also be used to explain the possible ways in which short-term patterns relate to long-term patterns. For example, it becomes possible to describe how a long-term pattern on Twitter can be sustained and enhanced by a sequence of short-term patterns, including characteristics like popularity, stickiness, contagiousness, and interactivity. The paper can be downloaded HERE.
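Following on from point 1 above, here is a minimal sketch of the "1D grid" idea in plain NumPy (my own toy, not from the cited papers): a small one-dimensional kernel slides along the time axis of a sensor series exactly as a 2D kernel slides across the pixels of an image.

```python
# A minimal 1D convolution over sensor readings.
import numpy as np

def conv1d(series, kernel):
    k = len(kernel)
    return np.array([np.dot(series[i:i + k], kernel)
                     for i in range(len(series) - k + 1)])

readings = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)  # a sensor burst
smooth = conv1d(readings, np.ones(3) / 3)        # moving-average filter
edges = conv1d(readings, np.array([1.0, -1.0]))  # detects level shifts
print(smooth)
print(edges)
```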

Implications for Smart cities

I see Smart cities as an application domain for the Internet of Things. Many definitions exist for Smart cities/future cities. From our perspective, Smart cities refers to the use of digital technologies to enhance performance and wellbeing, to reduce costs and resource consumption, and to engage more effectively and actively with citizens (adapted from Wikipedia). Key 'smart' sectors include transport, energy, health care, water and waste. A more comprehensive list of Smart City/IoT application areas is: intelligent transport systems (including autonomous vehicles), medical and healthcare, environment, waste management, air quality, water quality, accident and emergency services, and energy (including renewables). In all these areas we could find applications to which we could add an intuitive component, based on the ideas above.

Typical domains will include Computer Vision, image recognition, pattern recognition, speech recognition and behaviour recognition. Of special interest are new areas such as self-driving cars – e.g. the Lutz pod – and even larger vehicles such as self-driving trucks.

Conclusions

Deep learning involves learning through layers, which allows a computer to build a hierarchy of complex concepts out of simpler concepts. Deep learning is used to address intuitive applications with high dimensionality. It is an emerging field, and over the next few years, due to advances in technology, we are likely to see many more applications in the Deep learning space. I am specifically interested in how IoT datasets can be used to complement deep learning algorithms. This is an emerging area with some examples shown above, and I believe it will have widespread applications, many of which we have not fully explored (as in the Smart city examples).

I see this article as part of an evolving theme. Future updates will explore how Deep learning algorithms could apply to IoT and Smart city domains, and in particular how Deep learning algorithms can be complemented using IoT datasets.

I elaborate on these ideas in the Data Science for Internet of Things program (modelled on the course I teach at Oxford University and UPM – Madrid). I will also present these ideas at the International Conference on City Sciences at Tongji University in Shanghai and at the Data Science for IoT workshop at the Iotworld event in San Francisco.

Please connect with me on LinkedIn if you want to stay in touch and receive future updates.

Does the ‘app economy’ still exist?


Something extraordinary happened last week.

An app (Meerkat), which was a 'massive hit' at SXSW and was launched only two months ago, raised $14m in funding.

Three days after that, its popularity plunged rapidly after the launch of Twitter's Periscope.

Probably never to return to its height.

A few days after that, Meerkat and Periscope are neck and neck.

In two months, an app went from launch to funding ($14m) to plunge.

Some blame the tech journalists – and there is some truth in that.

A whole ecosystem has grown up to support the ‘app economy’ – including the VCs, tech journalists, conference creators, hackathons and industry analysts who rank apps.

Sentiment changes rapidly.

Now, some articles call it Schrödinger's meerkat (is it dead or is it alive?).

Others have taken to defending the tech journalists themselves, e.g. from the Guardian: Tech journalists may have been wrong about Meerkat but they're right to get excited about new apps.

But there is a wider question here ..

App uptake metrics (e.g. downloads) have become a bit like a dot-com era obsession.

There is a lot of activity, but it is transient (as we see in the case of Meerkat) because the value no longer lies in the app itself.

For long term success, the value (if it exists) lies beyond the app.

Here are some reasons why the app economy dynamic is changing and value is shifting away from the app:

a) Even when the app has been poor, the company has done well when the value lay beyond the app. The best example of this is LinkedIn, whose app and website are always frustrating to me: I sometimes need to use wikiHow to understand even basics such as deleting a contact. The app could be a lot better – but we still use it despite the app.

b) APIs are becoming increasingly important and are managing much of the complexity, for example health care APIs. The app then becomes a simple interface – the APIs do the work.

c) 'App only' brands are hard to sustain and expand: unlike LinkedIn, where the value lies beyond the app, for Rovio (Angry Birds) the product (and the value) was in the app itself. And 2014 was a bad year for Rovio; it is unclear whether the popularity of the brand will ever return.

d) Content has a fleeting timescale, and it is getting even shorter: The diminishing popularity timescales apply to all online content. Gangnam Style broke the YouTube popularity counter – but look again. Gangnam Style was launched in July 2012; Google Trends shows that it peaked in December 2012, with a precipitous drop soon after, and it has been dropping in popularity ever since (even as cumulative views increase). Content apps may have the same problem: beyond the first year (or two), they appear to be from an older era, especially if the user base is younger. The Draw Something app had the same problem of a drop in popularity.

e) Asking "which apps do IoT developers use?" is like focusing on the dashboard and ignoring the engine: it is the wrong question because it places too much emphasis on the app rather than on the vertical (IoT). It is like asking which web development technique a company uses for its website – does it matter? IoT is a hugely complex domain. The same applies to automotive apps, healthcare apps etc.

f) Apps are not open: Coming back to Meerkat, Twitter's move reminds us that apps and social media are not open. If Twitter does a deal with operators for 'sponsored data', that is even worse for innovation like Meerkat (and I expect that type of deal will be increasingly common, further suppressing Long Tail innovation).

Analysis:

Apps continue to drive Long tail innovation

But for the reasons mentioned above, there is a fundamental shift in the ecosystem

Value is now closely tied to the vertical

In some ways, it is a natural maturing of the ecosystem

But when tied to a specific vertical, the value apportioned to the app is relatively less.

Knowledge of, and integration with, the vertical now becomes more important than the app in this maturing phase (leaving aside the openness issue).

For example, in IoT, IBM bet $3 billion on IoT – but the focus is on analyzing data coming from many different devices.

The skillsets to do this are not the same as for the app – although there will undoubtedly be an app interface.

So, does the app economy still exist?

Increasingly, not in the form we know it (across verticals).

In a more maturing phase, we will see deeper integration with specific verticals.

For other forms of apps, there is no way to predict economic value even over short periods.

PS – If you are interested in IoT, have a look at this (upskill to Big Data, Data Science and IoT).

We will also have an online version. Please contact me at ajit.jaokar at futuretext.com

Great to be on this list: The Internet of Things Landscape 2015: Top 100 Individuals and Brands


Great to be on this list: http://www.onalytica.com/…/the-internet-of-things-landscap…/ (the full list needs a free download) – I am No. 90 (for individuals).

Good list of people and brands to follow

Data Science for Internet of Things course – London


Hello all

Over the past few years, I have been teaching a specialized course at Oxford University on Telecoms and Big Data.

This year, I have also started teaching a new course on Data Science and IoT.

Here, we apply Predictive algorithms to IoT datasets.

It's a complex course, and currently we have launched it with a few corporates through Oxford.

Independent of the academic course, I have also launched a version with fablab London.

The outline below gives you the approach, content and modules.

If you can commute to London and want to master Data Science for the Internet of Things, have a look at London Data Science for IoT.

Alternatively, we will have an online version for $600.

This course is ideally suited to developers who want to transition their career towards Data Science (with an emphasis on the Internet of Things).

By working with very small groups, I believe the program can truly make a difference.

If you are interested in knowing more, please have a look at Data Science for IoT – London, or contact me for the online version.

I will also continue to share papers/research in this space as we develop more.