Miami Young Data Scientists – Pleased to be the winning team in the 2015 Association of Space Explorers/AstroSat challenge










For the last two years, I have been teaching Computer Science to young people.

This venture has had its ups and downs.

But we have had the support of many who believed in the vision.

So, it was very nice to see this:

We (Countdown Institute – i.e. now me and Richard Schuchts, based in Miami) submitted an entry in the ASE AstroSat Challenge, made possible with the generous support of the Northrop Grumman Corporation. The Association of Space Explorers is the unique professional organization composed of astronauts who have orbited Earth. It has 375 members from 35 countries, who are passionate about encouraging students to pursue science, technology, engineering, and math education, as well as careers in astronautics. The ASE AstroSat Challenge is designed to give students a taste of the exciting world of satellite operations.

Only 15 teams were selected.

And our team (Miami Young Data Scientists/Countdown Institute) was one of them.

It's amazing to get here.

It means the team of 'young data scientists' from Miami will be able to run an experiment live in Space and also learn Data Science.

The winning entry was based on teaching Data Science to young people.

Specifically, using Regression algorithms to make predictions on Space data from Ardusat (more on this soon)
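As a toy illustration of what such a regression exercise might look like (the altitude/temperature numbers below are invented for illustration, not real Ardusat readings), here is ordinary least squares in plain Python:

```python
def fit_line(xs, ys):
    """Return slope and intercept of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical readings: altitude (km) vs. sensor temperature (C)
altitude = [400, 410, 420, 430, 440]
temp = [21.0, 20.5, 20.1, 19.4, 19.0]

a, b = fit_line(altitude, temp)
predicted = a * 450 + b   # predict the temperature at 450 km
```

The same fit-then-predict pattern carries over directly to an iPython notebook with real satellite data.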

This is different from our original idea and is more complex .. but I think it would make a difference in getting more young people into Data Science (which Harvard has called the hottest profession of the future).

Thus, I think the biggest winners are the young people of Miami who are a part of the winning team.

The main variation/evolution from the original idea is to focus on Data Science and inspiring students to take up Data Science through visualization of data and predictions using scientific methodology.

It's a way to get more students (both boys and girls) interested in Data Science using Space exploration by coding on a live satellite.

Hence, the regression algorithms/iPython notebooks etc.

Also a bit more math, and hence suited to slightly older students (around 15 to 17). All this also aligns with my 'day job', so to speak!

Here is the full list of winners of the AstroSat competition.

I am happy to share more. If you want to know more about this – please email me at ajit.jaokar at

Free Download – Data Science for IoT course papers

Here is a set of papers from the Data Science for Internet of Things – practitioners course

These have been published by course participants in top Data Science blogs like KDnuggets and Data Science Central

The Zip file includes the following papers:

1) Recurrent neural networks, Time series data and IoT – Part One

2) Spark SQL for Real Time Analytics – Part One

3) Spark SQL for Real Time Analytics – Part Two

4) Time Series IoT applications in Railroads

5) Nov 13 update: Kalman filters 1 and 2

Free Download – Data Science for IoT course papers


Data Science for Internet of Things: Practitioners course – modules list










Course URL : Data Science for Internet of Things: Practitioners course

Note that the modules are customizable, i.e. as per your personal learning plan you may choose to do more or less of a specific topic – for example, more Deep Learning vs Sensor fusion. But overall, we will follow this plan.

Overall themes covered in the course

  • IoT
  • Data Science
  • Big Data
  • Machine Learning
  • Deep Learning
  • Sensor fusion
  • Use Cases (application domains) and IoT Datasets
  • Math foundation
  • Time Series
  • IoT stream processing
  • Apache Spark ecosystem
  • Programming (R, Scala, SQL)

Weekly schedule


Week 0 Orientation, introductions, Personal learning plans, Platform signup
Week 1 Nov 16 Foundations: An analytics-driven organization – IoT and Machine Learning – Data Science for IoT: unique characteristics – Data Science for IoT: why now?
Nov 23 Machine Learning concepts, Deep Learning concepts
Nov 30 An introduction to IoT (Internet of Things)
Dec 7 IoT platforms – From sensor to Cloud
Dec 14 Concepts of Big Data Part One
Dec 21 Concepts of Big Data Part Two
Jan 11 Market drivers for IoT
Jan 18 Choosing a model – what technique to Use?
Jan 25 Use Cases  and IoT datasets (these will continue throughout the course)
Feb 1 Time series and NoSQL databases
Feb 8 Streaming analytics part One
Feb 15 Streaming analytics part two
Feb 22 Deep learning part one
Feb 29 Deep learning part two
Mar 7 Machine learning algorithms – part one
Mar 14 Machine learning algorithms – part two
Mar 21 Mathematical foundations – part one
Mar 28 Mathematical foundations – part two





Week 0 Orientation, introductions, Personal learning plans, Platform signup
 Nov 16
Nov 23
Nov 30 Intro to R, Installations, Basics of R
Dec 7
Dec 14 Data Frames in R & Tabular Data
Dec 21
Jan 11 Data Processing & Data Visualization in R
Jan 18
Jan 25 Scala basics
Feb 1
Feb 8 Spark batch processing I
Feb 15
Feb 22 Spark Batch Processing II
Feb 29
Mar 7 Spark SQL
Mar 14
Mar 21 Spark Streaming
Mar 28


My forthcoming book – Data Science for IoT


Hello all

As many of you know, I have been working on the Data Science for IoT course for the last year or so (both at Oxford University and for my consulting work).

As part of this work, I have been covering many complex areas like Sensor fusion/Kalman filters (published on KDnuggets) and Deep Learning (Recurrent neural networks) for IoT (published on Data Science Central).

Last week, I was approached by a program at Stanford University about this work

In a nutshell, the content of the Data Science for IoT course will be included as a recommended book for a forthcoming program taught at Stanford University

It's been a while since I wrote a book ..

Excluding books for young people(teaching coding and computer science), the last major effort was with Tony Fish (Mobile Web 2.0) which launched my career into Mobile.

A book is a major undertaking

However, the existing course (Data Science for IoT) and the collaboration with Stanford University program for the book gives me the opportunity to create the book iteratively (in sections as we teach)

The book will have co-authors (more on that soon) and also many contributors from the Data Science for IoT course

This enables me to keep the content very fresh – which is critical in such a rapidly evolving field

We have had a great response to the Data Science for IoT course from all over the world.

Most of the participants are from USA and UK – but we also have participants from as far as Australia and Nicaragua.

If you are interested in being part of the course – please sign up now (we start next week). See the course page: Data Science for IoT course

Happy to discuss part payments if you want.

We have had some excellent participants already and I look forward to learning and sharing more insights as part of the book and the course

Kind regards


Time Series Forecasting and Internet of Things (IoT) in Grain Storage


Authors: Vinay Mehendiratta, PhD, Director of Research and Analytics, Eka Software

Sishir Kumar Pagada, Senior Software Engineer, Eka Software

Created as part of the Data Science for IoT practitioners course – starting Nov 10 2015

The pdf version of this paper may be downloaded HERE  


Grain storage operators are always trying to minimize the cost of their supply chain. Understanding the relationships between receival, outturn, within-site movements and between-site movements can provide insights useful for planning the next harvest season, estimating the throughput capacity of the system, and understanding the relationship between throughput and inventory. This article explores the potential of scanner data in advanced analytics. The combination of these two fields has the potential to be useful for the grain storage business. The study describes grain storage scenarios in the Australian context.
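As a hedged sketch (not taken from the paper itself), simple exponential smoothing is one of the basic forecasting techniques that could be applied to a series like weekly grain receivals; the figures below are invented for illustration:

```python
def exp_smooth(series, alpha=0.3):
    """Return the smoothed series; the last value is the one-step forecast."""
    smoothed = [series[0]]            # initialise with the first observation
    for value in series[1:]:
        # new estimate = alpha * latest observation + (1 - alpha) * old estimate
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

weekly_receivals = [120, 135, 128, 150, 145, 160]   # tonnes, hypothetical
forecast = exp_smooth(weekly_receivals)[-1]
```

The smoothing factor alpha trades responsiveness to recent weeks against stability; in practice it would be chosen by minimising forecast error on historical data.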

Download paper HERE 

The IoT analytics driven organization ..















I am working on this idea for the Data Science for IoT course and also my teaching at Oxford University and UPM (University of Madrid)

In the 90s there was a book Re-engineering the corporation.

So, what would an IoT-based, analytics-driven organization look like?

I see IoT re-engineering corporations – both on the vendor side (supply chain) and the customer side (CRM).

IoT analytics will be the glue that transforms business process across Organization boundaries.

more on this coming soon ..

Data Science for Internet of Things – practitioner course










Welcome to the world's first course that helps you become a Data Scientist for the Internet of Things.

The course starts on Nov 17

Please contact 

This niche, personalized course is suited for:

  • Developers who want to transition to a new role as Data Scientists
  • Entrepreneurs who want to launch new products covering IoT and analytics
  • Anyone interested in developing their career in IoT Analytics

The course starts from November 2015 and extends to March 2016. We work with you for the next year and a half to transition your career to Data Science.

Created by Data Science and IoT professionals, the course covers infrastructure (Hadoop, Spark), programming/modelling (R, Time series) and IoT. We cover unique aspects of Data Science for IoT including Deep Learning, Complex event processing/sensor fusion and Streaming/Real-time analytics.

Our vision is to create an intellectually elite group of world-class Data Scientists for IoT.

Contact us at to sign up



  • You can transition your career to Data Science for IoT
  • You are not alone: Toolkits and community support to start working on real Data science problems for IoT
  • You master specific skills: Spark, R, Scala, IoT platforms, Data analysis, SQL among others
  • The content can be personalized (see examples of personalization below)
  • The Data Science principles can apply to other domains i.e. beyond IoT


(Note the modules and the sequence are subject to change)

An overview of Data Science

An overview of Data Science: What is Data Science? What problems can be solved using Data Science? – Extracting meaning from Data – Statistical processes behind Data – Techniques to acquire data (ex APIs) – Handling large-scale data – Big Data fundamentals

Data Science and IoT

The IoT ecosystem, Unique considerations for the IoT ecosystem – Addressing IoT problems in Data science (time series data, enterprise IoT edge computing, real-time processing, cognitive computing, image processing, introduction to deep learning algorithms, geospatial analysis for IoT/managing massive geographic scale, strategies for integration with hardware, sensor fusion)


The Apache Spark ecosystem

Apache Spark in detail including Scala, SQL, SparkR, MLlib and GraphX


The Data Science for IoT toolkit

A set of models and techniques in the R programming language to work with IoT-based scenarios based on Time series modelling. These include models from Retail, Healthcare, Energy, Wearables, Transport etc, covering specific examples in these domains. The module provides you a toolkit which you can adapt and use from day one in your work.


Mathematical foundations of Machine learning

Here we formally cover the mathematics for Data Science including Linear Algebra, Matrix algebra, Bayesian Statistics, Optimization techniques (Gradient descent) etc. We also cover supervised and unsupervised algorithms (classification, regression, clustering, dimensionality reduction etc) as applicable to IoT datasets.


Unique Elements for IoT

This module emphasises the following unique elements for IoT

  • Complex event processing (sensor fusion)
  • Deep Learning and
  • Real Time (Spark, Kafka etc)

Summary of Benefits and Features 

Impact on your work Designed for developers/ICT contractors/Entrepreneurs who want to transition their career towards Data science roles with an emphasis on IoT
Typical profile A developer who has skills in programming environments like Java, Ruby, Python, Oracle etc and wants to learn Data Science within the context of Internet of Things.
Community support? Yes. Also includes the Alumni network i.e. beyond the duration of the course at no extra cost. The course is based on a toolkit which we use to analyze IoT datasets in context of specific problems in industry verticals (ex Retail, Transport etc). You are thus empowered to work with IoT from the outset
Approach to Big Data For Big Data, the course is focussed on Apache Spark – specifically Scala, SQL, MLlib, GraphX and others on HDFS
Approach to Programming see scope below
Approach to Algorithms see scope below
Is this a full data science course? Yes, we cover machine learning / Data science techniques which are applicable to any domain. Our focus is Internet of Things. The course is practitioner oriented i.e. not academic and is not affiliated to a university.
Investment Offline(London):  £1,200 GBP + VAT(if applicable)
Online:  Yes. Please contact us at
Help with jobs/employment Yes, we aim to transition your career. Hence, we are selective in the recruitment for the course. There are no guarantees – but a career transition is a key goal for us. We work with you up to a year and a half from the start of the course to help you get a new role in Data Science/IoT
Created by professionals See our profiles below
Personalization The course can be personalized. Examples include a focus on CEP/Sensor fusion, RNNs and Time series, Edge processing, SQL etc. There is no extra cost for this but we agree scope before we start. If you are interested in this option, please let us know at info@futuretext.com. If you want to see examples of our work and content, please see Spark SQL real time analytics by Sumit Pal (published on KDnuggets) and The evolution of Deep learning models by Ajit Jaokar
Duration The course starts from November 2015 and extends to March 2016. We work with you for the next year and a half to transition your career to Data Science.



How is this approach different to the more traditional MOOCs?

Here’s how we differ from MOOCs

a)  We are not ‘Massive’ – this approach works for small groups with more focused and personalized attention. We will never have 1000s of participants

b)  We help with career leverage: we work actively with you for career leverage – e.g. if you are a startup or want to transition to a new job

c)  We are vendor agnostic

d)  We work actively with you to build your brand(Blogs/Open source/conferences etc)

e)  The course can be personalized to streams(ex with Deep learning, Complex event processing, Streaming etc)

f)  We teach the foundations of maths where applicable

g)  We work with a small number of platforms which provide current / in-demand skills – ex Apache Spark, R, Azure BI etc

h)  We are exclusively focused on IoT (although the concepts can apply to any other vertical)

Approach to Programming

The main programming focus is on Spark (Scala, SQL and R). We will also use an IoT platform (like ThingWorx). Participants need to be able to code / come from a development background (the programming language itself does not matter).

What is your approach to working with Algorithms and Maths?

The course is based on modelling IoT-based problems in the R programming language. We follow a context-based learning approach – hence we relate the maths to specific R-based IoT models.

You will need an aptitude for maths. However, we cover the mathematical foundations necessary. These include: Linear Algebra including Matrix algebra, Bayesian Statistics, Optimization techniques (such as Gradient descent) etc.
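As a minimal sketch of one of the techniques named above, here is gradient descent minimising a simple quadratic (the learning rate, step count and starting point are arbitrary illustrative choices):

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient from starting point w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# f(w) = (w - 3)^2 has gradient f'(w) = 2 * (w - 3), with its minimum at w = 3
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

The same update rule, applied to the loss of a model rather than a toy quadratic, is what trains the regression and neural network models in the course.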

What is the implication of an emphasis on IoT?

In 2015, IoT is emerging, but its full impact will be felt over the next five years. Today, we see IoT driven by Bluetooth 4.0, including iBeacons. Over the next five years, we will see IoT connectivity driven by the wide area network (with the deployment of 5G from 2020 onwards). We will also see entirely new forms of connectivity (e.g. from companies like Sigfox). Enterprises (Renewables, Telematics, Transport, Manufacturing, Energy, Utilities etc) will be the key drivers for IoT. On the consumer side, Retail and wearables will play a part. This tsunami of data will lead to an exponential demand for analytics, since analytics is the key business model behind the data deluge. Most of this data will be time series data, but it will also include other types. For example, our emphasis on IoT also includes Deep Learning, since we treat video and images as sensors. IoT will lead to a re-imagining of everyday objects.

Why is this course unique?  

The course emphasizes aspects that are unique to IoT (in comparison to traditional data science). These include: a greater emphasis on time series data, Edge computing, Real-time processing, Cognitive computing, In-memory processing, Deep learning, Geospatial analysis for IoT, Managing massive geographic scale (e.g. for Smart cities), Telecoms datasets, Strategies for integration with hardware and Sensor fusion (Complex event processing). Note that we include video and images as sensors through cameras (hence the study of Deep learning).

Who is creating/teaching this course?  

The course is created by futuretext and conducted by Ajit Jaokar, Dr Paul Katsande and Sumit Pal

Ajit Jaokar  – Based in London, Ajit’s research and consulting is based on Data Science and the Internet of Things. His work is based on his teaching at Oxford University and UPM (Technical University of Madrid) and covers IoT, Data Science, Smart cities and Telecoms.








Dr Paul Katsande is a technical architect based in London working with Apache Spark, Scala and Data Science. Paul's PhD research, at the University of Manchester, was in image processing.







Sumit Pal is a big data, visualisation and data science consultant. He is also a software architect and big data enthusiast and builds end-to-end data-driven analytic systems. Sumit has worked for Microsoft (SQL server development team), Oracle (OLAP development team) and Verizon (Big Data analytics team) in a career spanning 22 years. Currently, he works for multiple clients advising them on their data architectures and big data solutions and does hands on coding with Spark, Scala, Java and Python. Sumit is based in Boston.




We have limited spaces. Please contact us at if you want to take the next steps!


See testimonials below




Book review: Fundamentals of Deep Learning: Designing Next-Generation Artificial Intelligence Algorithms by Nikhil Buduma























UPDATE: I now use this book in my teaching at the Data Science for Internet of Things – practitioner’s course 

This blog is strictly not a book review, since the book Fundamentals of Deep Learning: Designing Next-Generation Artificial Intelligence Algorithms by Nikhil Buduma is being published as an O'Reilly early release (raw and unedited) book.

However, I have been a fan of Nikhil Buduma's blog and writing. Hence, I bought the early release and have enjoyed reading it. I also want to include it as a recommended book for the course I teach at Oxford University (Data Science for the Internet of Things).

There are very few accessible books on Deep Learning, and it's a complex and evolving topic, as I discussed in a recent blog – The evolution of Deep Learning Models. If you follow the detailed but readable posts on Nikhil's blog, such as A Deep dive into Recurrent neural networks, you will enjoy the book.

The first three chapters have been released. The table of contents is:

Chapter 1 : The Neural Network

Chapter 2 : Training Feed Forward Neural Networks

Chapter 3  : Implementing Neural Networks in Theano

Chapter 4  : Beyond Gradient Descent

Chapter 5  : Convolutional Neural Networks:

Chapter  6 : Hopfield Networks and Restricted Boltzmann Machines

Chapter 7  : Deep Belief Networks

Chapter 8  : Recurrent Neural Networks

Chapter 9  : Autoencoders

Chapter 10 : Supplementary: Universality Theorems

(The table of contents is evolving )

I spoke to Nikhil about the creation and evolution of the book. Here are some comments from our discussion

How did the book idea come about

I first started writing about deep learning on my blog around January. I’d been hacking on it for a while and figured I might share the lessons I had learned applying these models to problems I’m passionate about (healthcare and language processing) with my peers within the MIT community. My blog got some pretty good reception, and ended up piquing the interest of Ben Lorica and Mike Loukides from O’Reilly. We talked about the possibility of writing a book, and I figured it would be a great way to make the field more accessible to a larger audience.

Writing an accessible book on Deep Learning

There’s definitely materials online for people interested in deep learning – a hodgepodge of papers, tutorials, and some books. Most of these materials are geared towards a highly academic audience, and it’s not particularly simple to navigate these resources. My goal was to synthesize the progress in the field so that anybody with some mathematical sophistication (basic calculus and familiarity with matrix manipulation) and Python programming under their belt would be able to tackle deep learning head on.

Explanation of Deep Learning models

As with classical machine learning, deep learning models can also be classified into three major areas – supervised, unsupervised, and reinforcement learning. My approach to the book is to develop an intuition for the major types of models. But in addition to being able to build their own, I’d like readers to come away with an understanding of why each model is designed the way it is. I think it’s this understanding that will enable readers to successfully leverage deep learning to tackle their own data challenges. I’m also interested in exploring some of the more exotic networks (augmented networks, long-term recurrent convolutional networks, spatial transformer networks, etc.) towards the end of my book to provide insights into the cutting edge of the field. Again, the focus here will not only be on how the models are structured, but also on why they’re structured the way they are.

Any final comments?

I’ve had the opportunity to work with luminaries in the machine learning space while writing this book, including Mike and Ben from O’Reilly, Jeff Dean of Google, and Jeff Hammerbacher of Mt. Sinai and Cloudera. I’m excited to see what readers think of the early release as it comes out, so I can tailor the content to what they’re looking for.

The book link is – Fundamentals of Deep Learning: Designing Next-Generation Artificial Intelligence Algorithms by Nikhil Buduma. I very much look forward to reading it as it develops and using it for my course

Evolution of Deep learning models



Ajit Jaokar


Data Science for Internet of Things

Linkedin Ajit Jaokar



PS – This paper is best downloaded as a free pdf HERE 

PPS I discuss these ideas in the Data Science for Internet of Things – practitioners course 

Scope and approach

This paper is a part of a series covering Deep Learning applications for Smart cities/IoT with an emphasis on Security (human activity detection, surveillance etc). It also relates to my teaching at Oxford and UPM (Madrid) on Data Science and Internet of Things. The content is also a part of a personalized Data Science course I teach (online and offline) Personalized Data Science for Internet of Things course. I am also looking for academic collaborators to jointly publish similar work. If you want to be a part of the personalized Data Science course or collaborate academically, please contact me at ajit.jaokar at or connect with me on Linkedin Ajit Jaokar

No taxonomy of Deep learning models exists. And I do not attempt to create one here either. Instead, I explore the evolution of Deep learning models by loosely classifying them into Classical Deep learning models and Emerging Deep Learning models. This is not an exact classification. Also, we embark on this exercise keeping our goal in mind i.e. the application of Deep learning models to Smart cities from the perspective of Security (Safety, Surveillance). From the standpoint of Deep learning models, we are interested in ‘Human activity recognition’ and its evolution. This will be explored in subsequent papers.

In this paper, we list the evolution of Deep Learning models and recent innovations. Deep Learning is a fast-moving topic and we see innovation in many areas such as Time series, hardware innovations, RNNs etc. Where possible, I have included links to excellent materials/papers which can be used to explore further. Any comments and feedback are welcome, and I am happy to cross-reference you if you can add to specific areas. Finally, I would like to thank Lee Omar, Xi Sizhe and Ben Blackmore, all of Red Ninja Labs, for their feedback.

Deep Learning – learning through layers

Deep learning is often thought of as a set of algorithms that ‘mimics the brain’. A more accurate description would be an algorithm that ‘learns in layers’. Deep learning involves learning through layers which allows a computer to build a hierarchy of complex concepts out of simpler concepts. Deep learning algorithms apply to many areas including Computer Vision, Image recognition, pattern recognition, speech recognition, behaviour recognition etc

To understand the significance of Deep Learning algorithms, it's important to understand how computers think and learn. Since the early days, researchers have attempted to create computers that think. Until recently, this effort has been rules-based, adopting a 'top down' approach. The top-down approach involved writing enough rules for all possible circumstances. But this approach is obviously limited by the number of rules and by its finite rule base.

To overcome these limitations, a bottom-up approach was proposed. The idea here is to learn from experience. The experience was provided by ‘labelled data’. Labelled data is fed to a system and the system is trained based on the responses – leading to the field of Machine Learning. This approach works for applications like Spam filtering. However, most data (pictures, video feeds, sounds, etc.) is not labelled and if it is, it’s not labelled well.
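The idea of learning from labelled data can be sketched with a toy word-count spam filter (the training messages are invented for illustration; a real system would use something like naive Bayes):

```python
from collections import Counter

def train(messages):
    """Count word occurrences per label from (text, label) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in messages:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Label a message by which class its words appear in more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"

# A handful of labelled examples -- the 'experience' the system learns from
training = [
    ("win a free prize now", "spam"),
    ("free offer claim prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("monday lunch with the team", "ham"),
]
model = train(training)
label = classify(model, "claim your free prize")
```

Note how the entire 'intelligence' of the classifier comes from the labelled examples, not from hand-written rules – exactly the bottom-up approach described above.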

The other issue is in handling problem domains which are not finite. For example, the problem domain in chess is complex but finite, because there are a finite number of primitives (32 chess pieces) and a finite set of allowable actions (on 64 squares). But in real life, at any instant, we have a potentially large or infinite number of alternatives. The problem domain is thus very large.

A problem like playing chess can be ‘described’ to a computer by a set of formal rules.  In contrast, many real world problems are easily understood by people (intuitive) but not easy to describe (represent) to a Computer (unlike Chess). Examples of such intuitive problems include recognizing words or faces in an image. Such problems are hard to describe to a Computer because the problem domain is not finite. Thus, the problem description suffers from the curse of dimensionality i.e. when the number of dimensions increase, the volume of the space increases so fast that the available data becomes sparse. Computers cannot be trained on sparse data. Such scenarios are not easy to describe because there is not enough data to adequately represent combinations represented by the dimensions. Nevertheless, such ‘infinite choice’ problems are common in daily life.

Deep learning is thus concerned with 'hard/intuitive' problems which have little or no rules and high dimensionality. Here, the system must learn to cope with unforeseen circumstances without knowing the rules in advance.


Feed forward back propagation network

The feed forward back propagation network is a model which mimics the neurons in the brain in a limited way. In this model:

a) Each neuron receives a signal from the neurons in the previous layer.
b) Each of those signals is multiplied by a weight value.
c) The weighted inputs are summed, and passed through a limiting function which scales the output to a fixed range of values.
d) The output of the limiter is then broadcast to all of the neurons in the next layer.

The learning algorithm for this model is called Back Propagation (BP), which stands for "backward propagation of errors". We apply the input values to the first layer, allow the signals to propagate through the network and read the output. A BP network learns by example, i.e. we must provide a learning set that consists of some input examples and the known correct output for each case. So, we use these input-output examples to show the network what type of behaviour is expected. The BP algorithm allows the network to adapt by adjusting the weights, propagating the error value backwards through the network. Each link between neurons has a unique weighting value. The 'intelligence' of the network lies in the values of the weights. With each iteration of the errors flowing backwards, the weights are adjusted. The whole process is repeated for each of the example cases. Thus, to detect an object, programmers would train a neural network by rapidly sending across many digitized versions of data (for example, images) containing those objects. If the network did not accurately recognize a particular pattern, the weights would be adjusted. The eventual goal of this training is to get the network to consistently recognize the patterns that we recognize (e.g. cats).
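The steps above – weighted sum, limiting function, and weight adjustment from the error – can be sketched with a single sigmoid neuron trained on the logical AND function (the learning rate, epoch count and example task are illustrative choices):

```python
import math
import random

def sigmoid(x):
    """The limiting function: scales any input to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
w = [random.uniform(-1, 1) for _ in range(2)]   # one weight per input link
bias = 0.0
examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

lr = 0.5
for _ in range(5000):                   # repeat over the example cases
    for inputs, target in examples:
        # forward pass: weighted sum, then the limiting (sigmoid) function
        out = sigmoid(sum(wi * xi for wi, xi in zip(w, inputs)) + bias)
        # backward pass: use the error to adjust each weight
        delta = (target - out) * out * (1 - out)
        w = [wi + lr * delta * xi for wi, xi in zip(w, inputs)]
        bias += lr * delta

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)
```

A real network stacks many such neurons in layers and propagates the error backwards through all of them, but the per-weight update has the same shape.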

Building a hierarchy of complex concepts out of simpler concepts

Deep learning involves learning through layers which allows a computer to build a hierarchy of complex concepts out of simpler concepts. This approach works for subjective and intuitive problems which are difficult to articulate. Consider image data. Computers cannot understand the meaning of a collection of pixels. Mappings from a collection of pixels to a complex Object are complicated. With deep learning, the problem is broken down into a series of hierarchical mappings – with each mapping described by a specific layer.

The input (representing the variables we actually observe) is presented at the visible layer. Then a series of hidden layers extracts increasingly abstract features from the input, with each layer concerned with a specific mapping. However, note that this process is not predefined, i.e. we do not specify what the layers select.

For example: From the pixels, the first hidden layer identifies the edges

From the edges, the second hidden layer identifies the corners and contours

From the corners and contours, the third hidden layer identifies the parts of objects

Finally, from the parts of objects, the fourth hidden layer identifies whole objects








Image and example source: Yoshua Bengio book – Deep Learning

Classical Deep Learning Models

Based on the above intuitive understanding of Deep learning, we now explore Deep learning models in more detail. No taxonomy of Deep learning models exists. Hence, we loosely classify Deep learning models into Classical and Emerging. In this section, we discuss the Classical Deep learning models.

Autoencoders: Feed forward neural networks and Back propagation

Feed forward neural networks (with back propagation as a training mechanism) are the best known and simplest Deep learning models. Back propagation is based on the classical optimisation method of steepest descent. In a more generic sense, back propagation algorithms are a form of autoencoder. Autoencoders are simple learning circuits which aim to transform inputs into outputs with the least possible amount of distortion. While conceptually simple, they play an important role in machine learning. Autoencoders were first used in the 1980s by Hinton and others to address the problem of "backpropagation without a teacher". In this case, the input data was used as the teacher, and attempts were made to simulate the brain by mimicking Hebbian learning rules (cells that fire together, wire together). Feedforward Neural Networks with many layers are also referred to as Deep Neural Networks (DNNs). There are many difficulties in training deep feedforward neural networks.
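As a toy sketch of the autoencoder idea, the following linear network compresses 2-D points (which here lie on the line y = 2x) down to a single hidden value and learns, by gradient descent, to reconstruct them with minimal distortion; the data and hyperparameters are invented for illustration:

```python
data = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0), (1.5, 3.0)]
w = [0.1, 0.1]   # encoder weights: 2 inputs -> 1 hidden unit
v = [0.1, 0.1]   # decoder weights: 1 hidden unit -> 2 outputs
lr = 0.01

def reconstruction_error():
    """Total squared distortion between inputs and their reconstructions."""
    err = 0.0
    for x, y in data:
        h = w[0] * x + w[1] * y                              # encode
        err += (x - v[0] * h) ** 2 + (y - v[1] * h) ** 2     # decode, compare
    return err

err_before = reconstruction_error()
for _ in range(2000):
    for x, y in data:
        h = w[0] * x + w[1] * y
        ex, ey = x - v[0] * h, y - v[1] * h    # per-component errors
        dh = -2 * (ex * v[0] + ey * v[1])      # error signal at the hidden unit
        v[0] += lr * 2 * ex * h
        v[1] += lr * 2 * ey * h
        w[0] -= lr * dh * x
        w[1] -= lr * dh * y
err_after = reconstruction_error()
```

Because the input itself acts as the teacher, no labels are needed – which is the "backpropagation without a teacher" idea mentioned above.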


Deep belief networks

To overcome these issues, in 2006 Hinton et al. at University of Toronto introduced Deep Belief Networks (DBNs) – which is considered a breakthrough for Deep learning algorithms.

Here, the learning algorithm greedily trains one layer at a time, with layers created by stacked Restricted Boltzmann Machines (RBMs) (instead of stacked autoencoders). The RBMs are stacked and trained bottom-up in an unsupervised fashion, followed by a supervised learning phase to train the top layer and fine-tune the entire architecture. The bottom-up phase is agnostic with respect to the final task. A simple introduction to Restricted Boltzmann Machines is HERE, where the intuition behind RBMs is explained by considering some visible random variables (film reviews from different users) and some hidden variables (film genres or other internal features). The task of the RBM is to find out, through training, how these two sets of variables are actually connected to each other.
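The film-review intuition can be made concrete with a toy RBM trained by one-step contrastive divergence (CD-1), the usual training procedure for RBMs. This is an illustrative sketch, not production code: the “reviews”, network sizes and learning rate are invented for the example.

```python
import numpy as np

# Toy Restricted Boltzmann Machine trained with one-step contrastive
# divergence (CD-1). Visible units are binary "film review" style
# vectors; the two hidden units should learn latent "genre" features.
rng = np.random.default_rng(1)

# Six visible units, two underlying "genres": users like either the
# first three films or the last three.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [1, 1, 0, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1],
                 [0, 0, 0, 0, 1, 1]], dtype=float)

n_vis, n_hid = 6, 2
W = rng.normal(0, 0.1, (n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
errors = []
for _ in range(1000):
    # Positive phase: hidden probabilities given the data
    ph = sigmoid(data @ W + b_hid)
    h = (rng.random(ph.shape) < ph).astype(float)   # sample hidden states
    # Negative phase: reconstruct visibles, then re-infer hiddens
    pv = sigmoid(h @ W.T + b_vis)
    ph2 = sigmoid(pv @ W + b_hid)
    # CD-1 update: data-driven statistics minus model-driven statistics
    W += lr * (data.T @ ph - pv.T @ ph2) / len(data)
    b_vis += lr * (data - pv).mean(axis=0)
    b_hid += lr * (ph - ph2).mean(axis=0)
    errors.append(float(np.mean((data - pv) ** 2)))
```

As training proceeds, the reconstructions of the review vectors improve, which is a common (if informal) way to monitor RBM training.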

Convolutional Neural Networks (CNN)

Convolutional Neural Networks are similar to autoencoders and RBMs, but instead of learning a single global weight matrix between two layers, they aim to find a set of locally connected neurons through filters (kernels) (adapted from stackoverflow). CNNs are mostly used in image recognition. Their name comes from the “convolution” operator; a tutorial on feature extraction using convolution explains more. CNNs use data-specific kernels to find locally connected neurons. Like autoencoders or RBMs, they translate many low-level features (e.g. user reviews or image pixels) into a compressed high-level representation (e.g. film genres or edges) – but now weights are learned only from neurons that are spatially close to each other. Thus, a Convolutional Neural Network (CNN) is comprised of one or more convolutional layers followed by one or more fully connected layers, as in a standard multilayer neural network. The architecture of a CNN is designed to take advantage of the 2D structure of an input image (or other 2D input such as a speech signal). An advantage of CNNs is that they are easier to train and have many fewer parameters than fully connected networks with the same number of hidden units. A CNN tutorial is HERE.
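The “locally connected” idea is just the convolution operation itself. As a sketch, here is a single 2D convolution in plain NumPy, applied with a hand-crafted vertical-edge kernel to an invented image; in a real CNN the kernel weights would be learned from data rather than fixed.

```python
import numpy as np

# One 2D convolution, the core operation of a CNN layer, in plain
# NumPy. Each output neuron only "sees" a small local patch of the
# input, which is why CNNs have far fewer parameters than fully
# connected networks.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Weighted sum over one local patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 6x6 toy image: dark left half, bright right half -> one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Sobel-style kernel that responds to vertical edges
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d(image, kernel)
```

The feature map is strongly activated exactly where the edge sits and zero in the flat regions, which is the "pixels to edges" step of the hierarchy described earlier.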

Recurrent neural networks (RNNs)

A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behaviour. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition, where they have achieved the best known results.

The fundamental feature of a Recurrent Neural Network (RNN) is that the network contains at least one feed-back connection, so that activations can flow around in a loop. This enables the network to do temporal processing and learn sequences, e.g. perform sequence recognition/reproduction or temporal association/prediction. Thus, feedforward networks use directed acyclic graphs, whereas recurrent neural networks use directed graphs (digraphs) that may contain cycles. See also this excellent tutorial – Deep Dive into recurrent neural networks by Nikhil Buduma
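The feedback connection can be sketched directly. Below is a minimal RNN forward pass in NumPy in which the hidden state is the loop that carries memory from one time step to the next; the weights here are random and untrained, since the point is only the recurrence.

```python
import numpy as np

# Forward pass of a minimal recurrent neural network. The hidden
# state h is the feedback connection: at each time step it carries
# information about everything the network has seen so far.
rng = np.random.default_rng(2)

n_in, n_hid = 3, 4
W_xh = rng.normal(0, 0.5, (n_in, n_hid))   # input -> hidden
W_hh = rng.normal(0, 0.5, (n_hid, n_hid))  # hidden -> hidden (the loop)
b_h = np.zeros(n_hid)

def rnn_forward(sequence):
    h = np.zeros(n_hid)            # internal memory starts empty
    states = []
    for x in sequence:
        # New state depends on the current input AND the previous state
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.array(states)

sequence = rng.random((5, n_in))   # a sequence of 5 input vectors
states = rnn_forward(sequence)
```

Because the state depends on everything seen so far, feeding the same inputs in a different order produces a different final state – exactly the order-sensitivity that feedforward networks lack.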

Emerging Deep learning models

In the above section, we saw the main Deep learning models. Deep learning techniques are rapidly evolving, and much of the innovation takes place in combining different forms of learning with existing Deep learning techniques. Learning algorithms fall into three groups with respect to the sort of feedback the learner has access to: supervised learning, unsupervised learning and reinforcement learning. We also see emerging areas, such as the application of Deep Learning to time series data. In the section below, we discuss emerging Deep learning models. The list is not exhaustive, because the papers and techniques selected are the ones most relevant to our problem domain (application of Deep learning techniques for Smart cities, with an emphasis on human activity monitoring for security/surveillance).

Application of Reinforcement learning to Neural networks

Playing Atari with reinforcement learning presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. The method is applied to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. It outperforms all previous approaches on six of the games and surpasses a human expert on three of them. The paper Deep learning for reinforcement learning in Pacman (q-learning) addresses similar issues, but for the game Pacman. DeepMind (now a part of Google) has a number of papers on reinforcement learning. Sascha Lange and Martin Riedmiller apply Deep Auto-Encoder Neural Networks in Reinforcement Learning. The paper Recurrent Models of Visual Attention by Volodymyr Mnih, Nicolas Heess, Alex Graves and Koray Kavukcuoglu of Google DeepMind presents a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. It can be trained using reinforcement learning methods to learn task-specific policies.
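The Q-learning rule that these deep models approximate with a neural network can be shown in its plain tabular form. Below is an illustrative sketch on an invented 5-state corridor task (not from any of the papers above): the update moves Q(s, a) toward the reward plus the discounted value of the best next action.

```python
import numpy as np

# Tabular Q-learning on an invented 5-state corridor. States are 0..4;
# action 0 moves left, action 1 moves right; reaching state 4 yields
# reward 1 and ends the episode. Deep Q-networks replace this table
# with a neural network over raw pixels.
rng = np.random.default_rng(3)

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration

def step(s, a):
    """One environment transition: returns (next_state, reward, done)."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s2 == n_states - 1 else 0.0
    return s2, reward, s2 == n_states - 1

for _ in range(500):                 # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit Q, sometimes explore at random
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward the bootstrapped target
        target = r + (0.0 if done else gamma * np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

policy = np.argmax(Q, axis=1)        # greedy policy: 1 = "move right"
```

After training, the greedy policy walks right toward the reward from every non-terminal state, with the learned values discounted by distance from the goal.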


Combining modalities for Deep learning

Multimodality is also an area of innovation for Deep learning networks. Multimodal networks learn from different types of data sources, for example training on video, audio and text together (usually video, audio and text are distinct training modes). The paper Multimodal deep learning proposes a deep autoencoder and considers the cross-modality learning setting where both modalities (video and audio) are present during feature learning, but only a single modality is used for supervised training and testing.

In the paper Joint Deep Learning for Pedestrian Detection, Wanli Ouyang and Xiaogang Wang use CNNs but add a deformation layer to classify the parts. Feature extraction, deformation handling, occlusion handling and classification are four important components in pedestrian detection. This paper proposes that they should be jointly learned in order to maximize their strengths through cooperation.

Deep Learning of Invariant Spatio-Temporal Features from Video uses the convolutional Restricted Boltzmann machine (CRBM) as a basic processing unit. Their model (the Space-Time Deep Belief Network, ST-DBN) alternates the aggregation of spatial and temporal information, so that higher layers capture longer-range statistical dependencies in both space and time.


Another area for innovation and evolution of Deep learning is parallelization – for example, Deep learning on Hadoop at PayPal and Massively Parallel Methods for Deep Reinforcement Learning.


Time Series

Because IoT/Smart city data is mostly time series data, the use of time series with Deep Learning is also relevant to our work. In most cases, RNNs or DBNs are used not only to make a prediction but also (like NEST) to adapt. The paper Deep Learning for Time Series modelling forecasts demand, i.e. predicts energy loads across different network grid areas using only time and temperature data. The paper uses hourly demand data for four and a half years from 20 different geographic regions, and similar hourly temperature readings from 11 zones. Time Series Classification Using Multi-Channels Deep Convolutional Neural Networks uses a deep learning framework for multivariate time series classification, and the paper by Gilberto Batres-Estrada uses Deep Learning for Multivariate Financial Time Series.
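A common first step in this kind of work is to frame the time series as a supervised learning problem: each window of past values becomes a feature vector, and the next value becomes the target. The sketch below illustrates this windowing on an invented signal, with plain least squares standing in for the deep network; the approach and numbers are illustrative, not taken from the papers above.

```python
import numpy as np

# Framing a time series for supervised learning: windows of past
# values predict the next value. A least-squares model stands in for
# the RNN/DBN that would normally consume these windows.
t = np.arange(200)
series = np.sin(0.1 * t)            # invented toy "demand" signal

window = 5
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]                 # the value just after each window

# Fit a next-step predictor by least squares
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef
mse = float(np.mean((pred - y) ** 2))
```

For this perfectly periodic toy signal, the linear predictor is already near-exact; real demand data is noisier and non-linear, which is where the deep models earn their keep.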

Cognitive computing

Ultimately, we can expect many services to be Cognitive.

An algorithmic framework will be called cognitive if it has the following properties:

1. It integrates knowledge from (a) various structured or unstructured sources, (b) past experience, and (c) current state, in order to reason with this knowledge as well as to adapt over time;
2. It interacts with the user (e.g. by natural language or visualization) and reasons based on such interactions;
3. It can generate novel hypotheses and capabilities, and test their effectiveness.

Source: Cognitive Automation of Data Science. Deep learning is increasingly becoming a part of Cognitive computing.


Some additional notes:

Deep Learning in contrast to other machine learning techniques

To recap, a more formal definition of Deep Learning: a class of machine learning techniques, where many layers of information processing stages in hierarchical architectures are exploited for unsupervised feature learning and for pattern analysis/classification. The essence of deep learning is to compute hierarchical features or representations of the observational data, where the higher-level features or factors are defined from lower-level ones.

Historically, Deep Learning is a form of the fundamental credit assignment problem (Minsky, 1963). Here, learning or credit assignment is about finding weights that make the neural network exhibit desired behaviour, such as driving a car. Deep Learning is about accurately assigning credit across many such stages. Historical reference through Marvin Minsky’s papers

Deep learning techniques can also be contrasted with more traditional machine learning techniques. When we represent some object as a vector of n elements, we say that this is a vector in n-dimensional space. Thus, dimensionality reduction refers to a process of refining data in such a way that each data vector x is translated into another vector x′ in an m-dimensional space (a vector with m elements), where m < n. The most common way of doing this is PCA (Principal Component Analysis). PCA finds the “internal axes” of a dataset (called “components”) and sorts them by their importance. The first m most important components are then used as the new basis. Each of these components may be thought of as a high-level feature, describing data vectors better than the original axes.
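As a sketch of the process just described, here is PCA via the SVD in NumPy, projecting invented 5-dimensional data (that really lives near a 2-dimensional plane) onto its first two components.

```python
import numpy as np

# PCA as a linear dimensionality-reduction baseline: project
# n-dimensional data onto its m most important components.
rng = np.random.default_rng(4)

# 100 points in 5-D that really live near a 2-D plane, plus small noise
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 5))

# Centre the data, then find the principal axes via the SVD
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

m = 2
X_reduced = Xc @ Vt[:m].T           # each row is now a 2-D vector

# Fraction of the variance captured by the first m components
explained = float(np.sum(S[:m] ** 2) / np.sum(S ** 2))
```

Because the data is genuinely near-planar, two components capture almost all of the variance – the "refined" 2-D vectors describe the objects nearly as well as the original 5-D ones.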

Both autoencoders and RBMs do the same thing. Taking a vector in n-dimensional space, they translate it into an m-dimensional one, trying to keep as much important information as possible while, at the same time, removing noise. If the training of the autoencoder/RBM was successful, each element of the resulting vector (i.e. each hidden unit) represents something important about the object – the shape of an eyebrow in an image, the genre of a film, the field of study of a scientific article, etc. You take lots of noisy data as input and produce much less data in a much more efficient representation. In the image above, we see an example of such a deep network: we start with ordinary pixels, proceed with simple filters, then with face elements and finally end up with entire faces. This is the essence of deep learning. (Adapted from stackexchange.)

So, one could ask: if we already have techniques like PCA, why do we need autoencoders and RBMs? The reason is that PCA only allows linear transformations of data vectors. Autoencoders and RBMs, on the other hand, are non-linear by nature, and thus they can learn more complicated relations between visible and hidden units. Moreover, they can be stacked, which makes them even more powerful. Most problems addressed by Deep learning neural networks are not linear, i.e. if we were able to model the relationship between the independent and dependent variables linearly, classic regression techniques would apply. The paper Deep neural networks as recursive generalised linear models (RGLMs) explains the applicability of Deep Learning techniques to non-linear problems from a statistical standpoint

Deep Learning and Feature learning

Deep Learning can hence be seen as a more complete, hierarchical and ‘bottom-up’ way of feature extraction, without human intervention. Deep Learning is a form of pattern recognition system, and the performance of a pattern recognition system depends heavily on feature representation. In the past, manually designed features were used for image and video processing. These rely on human domain knowledge, and it is hard to tune them manually. Thus, developing effective features for new applications is a slow process. Deep learning overcomes this problem of feature extraction. Deep learning also distinguishes multiple factors and a hierarchy in image and video data: for example, objects (sky, cars, roads, buildings, pedestrians) and parts (wheels, doors, heads) can be decomposed from images. For this task, more layers provide greater granularity – for example, GoogLeNet has more than 20 layers.

Source: ELEG 5040 Advanced Topics on Signal Processing (Introduction to Deep Learning) by Xiaogang Wang

Deep learning and Classification techniques

None of the deep learning models discussed here work as classification algorithms by themselves. Instead, they can be seen as performing pre-training, automated feature selection and learning, creating a hierarchy of features, etc. Once trained (i.e. once features are selected), the input vectors are transformed into a better representation, which is in turn passed on to a real classifier such as an SVM or Logistic regression. This can be represented as below.


Source: ELEG 5040 Advanced Topics on Signal Processing (Introduction to Deep Learning) by Xiaogang Wang
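The two-stage pipeline can be sketched as follows. Everything here is illustrative: a fixed random non-linear projection stands in for the pretrained deep network, and a small logistic regression (trained by gradient descent) plays the role of the final classifier.

```python
import numpy as np

# Sketch of the two-stage pipeline: a (here fixed, untrained) feature
# transform stands in for the pretrained deep network; a logistic
# regression classifier is then trained on the transformed vectors.
rng = np.random.default_rng(5)

# Two-class toy data: points around two well-separated centres
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# "Pretrained" feature extractor: a fixed random non-linear projection
W_feat = 0.1 * rng.normal(size=(4, 8))
features = np.tanh(X @ W_feat)     # the "better representation"

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train only the final classifier (logistic regression) on the features
w = np.zeros(8)
b = 0.0
for _ in range(500):
    p = sigmoid(features @ w + b)
    grad = p - y
    w -= 0.1 * features.T @ grad / len(y)
    b -= 0.1 * grad.mean()

accuracy = float(np.mean((sigmoid(features @ w + b) > 0.5) == y))
```

Only the last stage is a classifier; the earlier stage just re-represents the input – which is exactly the division of labour the diagram above describes.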

Advances in Hardware

Another major source of innovation in Deep learning networks is hardware. The impact of hardware on Deep Learning is a complex topic, but two examples are: the Qualcomm Zeroth platform, which brings cognitive and Deep learning capabilities to devices, including mobile devices; and NVIDIA cuDNN – GPU-accelerated Deep Learning.

DBNs to pre-train DNNs

Finally, Deep learning techniques have synergies amongst themselves. We explained DBNs and DNNs above. DBNs and DNNs can be used in conjunction, i.e. a Deep Belief Network (which uses RBMs for layer-wise training) can be used as the pre-training method for a Deep Neural Network.


This paper is a part of a series covering Deep Learning applications for Smart cities/IoT, with an emphasis on security (human activity detection, surveillance etc.). Subsequent parts of this paper will cover human activity detection and Smart cities. The content is a part of a personalized Data Science course I teach (online and offline) – the Personalized Data Science for Internet of Things course. I am also looking for academic collaborators to jointly publish similar work. If you want to be a part of the personalized Data Science course or collaborate academically, please contact me at ajit.jaokar at or connect with me on Linkedin Ajit Jaokar

IoT analytics, Edge Computing and Smart Objects


The term ‘Smart objects’ has been around since the days of Ubiquitous Computing.

However, as we have started building Smart objects, I believe that the meaning and definition have evolved.

Here is my view on how the definition of Smart Objects has changed in the world of Edge Computing and increasing processing capacity.

At a minimum, a Smart Object should have 3 things:

a) An identity (e.g. IPv6)
b) Sensors/actuators
c) A radio (Bluetooth, cellular, etc.)

In addition, a Smart Object could incorporate:

a) Physical context, e.g. location
b) Social context, e.g. proximity in social media

To extend even more, Smartness could incorporate analytics

Some of these analytics could be performed on the device itself, e.g. the ‘computing at the edge’ concept from Intel, Cisco and others.

However, Edge Computing as discussed today, still has some limitations

For example:

a) The need to incorporate multiple feeds from different sensors to reach a decision ‘at the edge’

b) The need for a workflow process, i.e. actions based on readings – again often at the edge, with its accompanying security and safety measures

To manage multiple sensor feeds, we need to understand concepts like sensor fusion (pdf) (source freescale).
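As a flavour of what sensor fusion involves, here is one of its simplest forms: an inverse-variance weighted average of two noisy readings of the same quantity, so that the more reliable sensor counts for more. The sensors and variances below are invented for the example, and real fusion (e.g. Kalman filtering) goes much further.

```python
# Minimal sensor-fusion sketch: combine noisy readings of one quantity
# with an inverse-variance weighted average. The fused estimate is
# both closer to the reliable sensor and more certain than either.
def fuse(readings):
    """readings: list of (value, variance) pairs for one quantity."""
    weights = [1.0 / var for _, var in readings]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    variance = 1.0 / total          # fused variance shrinks
    return value, variance

# Hypothetical accelerometer (noisy) and gyro-derived estimate (less noisy)
fused_value, fused_var = fuse([(10.0, 4.0), (12.0, 1.0)])
```

Here the fused value (11.6) sits much closer to the low-variance reading, and the fused variance (0.8) is smaller than either input's.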

We already have some rudimentary workflow through mechanisms like IFTTT (If This Then That).
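An IFTTT-style workflow reduces to a simple rule: a condition on a sensor reading triggers an action. The sketch below illustrates this; the sensor name, threshold and alert action are all hypothetical.

```python
# Toy "if this then that" style rule: a sensor reading triggers an
# action when a condition holds. Names and thresholds are invented.
def make_rule(condition, action):
    """Return a rule that fires the action when the condition holds."""
    def rule(reading):
        if condition(reading):
            return action(reading)
        return None
    return rule

# "If the temperature sensor reads above 30C, then raise an alert"
actions_log = []
overheat_rule = make_rule(
    condition=lambda r: r["sensor"] == "temperature" and r["value"] > 30,
    action=lambda r: actions_log.append(f"ALERT: {r['value']}C"),
)

for reading in [{"sensor": "temperature", "value": 22},
                {"sensor": "temperature", "value": 35}]:
    overheat_rule(reading)
```

Only the second reading fires the rule; a real edge workflow would chain many such rules, with the security and safety measures mentioned above.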

In addition, the rise of CPU capacity leads to greater intelligence on the device – for example Qualcomm Zeroth platform which enables Deep learning algorithms on the device.

So, in a nutshell, it’s an evolving concept, especially if we include IoT analytics in the definition of Smart Objects (and note that some of these analytics could be performed at the Edge).

We cover these ideas in the #DataScience for #IoT course and also in the courses I teach at Oxford University.

Comments welcome