The AI layer for the Enterprise and the role of IoT

Introduction 

According to Deloitte, by the “end of 2016 more than 80 of the world’s 100 largest enterprise software companies by revenues will have integrated cognitive technologies into their products”. Gartner also predicts that 40 percent of the new investment made by enterprises will be in predictive analytics by 2020. AI is moving fast into the Enterprise, and AI developments can create value for it. This value can be captured and visualized by considering an ‘Enterprise AI layer’. This AI layer is focussed on solving relatively mundane, domain-specific problems. While this is not as ‘sexy’ as the original vision of AI, it provides tangible benefits to companies.

 

In this brief article, we propose a logical concept called the AI layer for the Enterprise. We could see such a layer as an extension to the Data Warehouse or the ERP system. This has tangible and practical benefits for the Enterprise with a clear business model. The AI layer could also incorporate IoT datasets and unite the disparate ecosystem. The Enterprise AI layer theme is a key part of the Data Science for Internet of Things course. Only a few places remain for this course!

 

Enterprise AI – an Intelligent Data Warehouse/ERP system?

AI enables computers to do some things better than humans, especially when it comes to finding insights from large amounts of unstructured or semi-structured data. Technologies like machine learning, natural language processing (NLP), speech recognition and computer vision drive the AI layer. More specifically, AI here refers to an algorithm which learns on its own.

 

To understand this, we have to ask ourselves: How do we train a Big Data algorithm?  

There are two ways:

  • Start with the Rules and apply them to Data (Top down) OR
  • Start with the data and find the rules from the Data (Bottom up)

 

The Top-down approach involves writing enough rules for all possible circumstances. But this approach is obviously limited by its finite rule base. The Bottom-up approach applies in two cases. Firstly, when rules can be derived from positive and negative examples (spam / not spam). This is traditional machine learning, where the algorithm can be trained. But the more extreme case is where there are no examples to train the algorithm.
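The contrast between the two approaches can be illustrated with a minimal Python sketch. Everything here is hypothetical: the keyword list, the four training messages and the simple word-count scoring are assumptions chosen for illustration, not a production spam filter.

```python
from collections import Counter

# Top-down: a human writes the rules in advance (hypothetical keyword list).
SPAM_KEYWORDS = {"winner", "free", "prize"}


def rule_based_is_spam(message: str) -> bool:
    """Flag a message if it contains any hand-written keyword."""
    words = set(message.lower().split())
    return bool(words & SPAM_KEYWORDS)


# Bottom-up: derive the "rules" (word scores) from labelled examples.
def learn_word_scores(examples):
    """Count how often each word appears in spam versus non-spam messages."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in examples:
        target = spam_counts if is_spam else ham_counts
        target.update(set(text.lower().split()))
    vocabulary = set(spam_counts) | set(ham_counts)
    # Positive score: seen more often in spam; negative: more often in non-spam.
    return {word: spam_counts[word] - ham_counts[word] for word in vocabulary}


def learned_is_spam(message: str, scores) -> bool:
    """Sum the learned scores of the words in the message."""
    total = sum(scores.get(word, 0) for word in set(message.lower().split()))
    return total > 0


# Hypothetical labelled examples (True = spam).
TRAINING = [
    ("claim your free prize now", True),
    ("cheap loans approved today", True),
    ("meeting moved to monday", False),
    ("please review the attached report", False),
]
scores = learn_word_scores(TRAINING)
```

With these assumptions, a message such as "cheap loans today" slips past the hand-written rules because nobody wrote a rule for it, while the learned scores flag it because similar words appeared in the labelled spam examples.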

 

What do we mean by ‘no examples’?

 

a)      There is no schema

b)      Linearity (sequence) and hierarchy are not known

c)      The output is not known (non-deterministic)

d)     Problem domain is not finite

 

Hence, this is not an easy problem to solve. However, there is a payoff in the enterprise if AI algorithms can be created that train themselves to perform manual, repetitive tasks, especially when the tasks involve both structured and unstructured data.

 

How can we visualize the AI layer?

One simple way is to think of it as an ‘Intelligent Data Warehouse’, i.e. an extension to either the Data Warehouse or the ERP system.

 

For instance, an organization could transcribe call centre agents’ interactions with customers to create a more intelligent workflow, bot, etc. using Deep learning algorithms.

Enterprise AI layer – What it means to the Enterprise

So, if we imagine such a conceptual AI layer for the enterprise, what does it mean in terms of new services that can be offered? Here are some examples:

  • Bots: Bots are a great example of the use of AI to automate repetitive tasks like scheduling meetings. Bots are often the starting point of engagement for AI, especially in Retail and Financial services.
  • Inferring from textual/voice narrative: security applications that detect suspicious behaviour, algorithms that can draw connections between how patients describe their symptoms, etc.
  • Detecting patterns from vast amounts of data: using log files to predict future failures, predicting cybersecurity attacks, etc.
  • Creating a knowledge base from large datasets: for example, an AI program that can read all of Wikipedia or GitHub.
  • Creating content at scale: using robots to replace writers or even to compose pop songs.
  • Predicting future workflows: using existing patterns to predict future workflows.
  • Mass personalization: for example, in advertising.
  • Video and image analytics: collision avoidance for drones, autonomous vehicles, agricultural crop health analysis, etc.

 

These applications provide competitive advantage, differentiation, customer loyalty and mass personalization. They have simple business models (for example, they can be deployed as premium features or new products, or used for cost reduction).

 

The Enterprise AI layer and IoT

 

So, the final question is: What does the Enterprise AI layer mean for IoT?

 

IoT has tremendous potential but faces an inherent problem. Currently, IoT is implemented in verticals/silos and these silos do not talk to each other. To realize the full potential of IoT, an over-arching layer above individual verticals could ‘connect the dots’. Coming from the Telco industry, I do not see these ideas as new: the winners of the mobile/Telco ecosystem were iPhone and Android, which succeeded in doing exactly that.

 

Firstly, the AI layer could help in deriving actionable insights from billions of data points which come from IoT devices across verticals. This is the obvious benefit, as IoT data from various verticals can act as an input to the AI layer. Deep learning algorithms play an important role in IoT analytics because machine data is sparse and/or has a temporal element to it. Devices may behave differently under different conditions. Hence, capturing all scenarios for the data pre-processing/training stage of an algorithm is difficult. Deep learning algorithms can help to mitigate these risks by enabling algorithms to learn on their own. This concept of machines learning on their own can be extended to ‘machines teaching other machines’. This idea is not so far-fetched and is already happening: a Fanuc robot teaches itself to perform a task overnight by observation and through reinforcement learning. After eight hours or so it gets to 90 percent accuracy or above, which is almost the same as if an expert were to program it. The process can be accelerated if several robots work in parallel and then share what they have learned. This form of distributed learning is called cloud robotics.

 

We can extend the idea of ‘machines teaching other machines’ more generically within the Enterprise. Any entity in an enterprise can train other ‘peer’ entities in the Enterprise. That could be buildings learning from other buildings, or planes, or oil rigs. We see early examples of this approach in Salesforce.com and Einstein. Longer term, Reinforcement learning is the key technology that drives IoT and the AI layer for the Enterprise, but initially any technologies that implement self-learning algorithms would help with this task.

Conclusion

In this brief article, we proposed a logical concept called the AI layer for the Enterprise. We could see such a layer as an extension to the Data Warehouse or the ERP system. This has tangible and practical benefits for the Enterprise with a clear business model. The AI layer could also incorporate IoT datasets and unite the disparate ecosystem. This will not be easy. But it is worth it because the payoffs for creating such an AI layer around the Enterprise are huge! The Enterprise AI layer theme is a key part of the Data Science for Internet of Things course. Only a few places remain for this course!

Data Science for Internet of Things course – Strategic foundation for decision makers

To sign up or learn more, email info@futuretext.com. The course starts in Sep 2016.

We have had a great response to the Data Science for Internet of Things course. The course takes a technological focus, aiming to enable you to become a Data Scientist for the Internet of Things. I also had many requests for a strategic version of the Data Science for Internet of Things course for decision makers.

Today, we launch a special edition of the course only for decision makers.

The course is based on an open problem solving methodology for IoT analytics which we are developing within the course.

 Why do we need a methodology for Data Science for IoT?

 

IoT will create huge volumes of Data, making the discovery of insights more critical. Often, the analytics process will need to be automated. By establishing a formal process for extracting knowledge from IoT applications by IoT vertical, we capture best practice.

This saves implementation time and cost. The methodology is more than Data Mining (i.e. the application of algorithms); rather, it leans more towards KDDM (Knowledge Discovery and Data Mining) principles. It is thus concerned with the entire end-to-end knowledge extraction process for IoT analytics.

This includes developing scalable algorithms that can be used to analyze massive datasets, interpreting and visualizing results and modelling the engagement between humans and the machine. The main motivation for Knowledge Discovery models is to ensure that the end product will be useful to the user.

Thus, the methodology includes aspects of IoT analytics such as validity, novelty, usefulness, and understandability of the results (by IoT vertical). The methodology builds on a series of interdependent steps with milestones. The steps often include loops and iterations and cover all the processes end to end (including KPIs, business case and project management). We explore Data Science for IoT analytics at multiple levels, including Process level, Workflow level and Systems level.

The concept of a KDDM process model was discussed in the 1990s by Anand, Brachman, Fayyad, Piatetsky-Shapiro and others. In a nutshell, we build upon these ideas and apply them to IoT analytics. We also create open source code for this methodology.

As a decision maker, by joining the course, you have early and on-going access to both the methodology and the open source code.

Please contact us at info@futuretext.com to sign up or to learn more.

Testimonials for our current course

 Jean Jacques Bernand – Paris – France

“Great course with many interactions, either group or one to one that helps in the learning. In addition, tailored curriculum to the need of each student and interaction with companies involved in this field makes it even more impactful.

As for myself, it allowed me to go into topics of interests that help me in reshaping my career.”

Johnny Johnson, AT&T – USA

“This DSIOT course is a great way to get up-to-speed.  The tools and methodologies for managing devices, wrangling and fusing data, and being able to explain it are taking form fast; Ajit Jaokar is a good fit.  For me, his patience and vision keep this busy corporate family man coming back.”

Yongkang Gao, General Electric, UK.

“I especially thank Ajit for his help on my personal project of the course — recommending proper tools and introducing mentors to me, which significantly reduced my pain in the beginning stage.”

Karthik Padmanabhan, Manager – Global Data Insight and Analytics (GDIA) – Ford Motor Pvt Ltd.

“I am delighted to provide this testimonial to Ajit Jaokar who has extended outstanding support and guidance as my mentor during the entire program on Data science for IoT. Ajit is a world renowned professional in the niche area of applying the Data science principles in creating IoT apps. Talking about the program, it has a lot of breadth and depth covering some of the cutting edge topics in the industry such as Sensor Fusion, Deep Learning oriented towards the Internet of things domain. The topics such as Statistics, Machine Learning, IoT Platforms, Big Data and more speak about the complexity of the program. This is the first of its kind program in the world to provide Data Science training especially on the IoT domain and I feel fortunate to be part of the batch comprising of participants from different countries and skill sets. Overall this journey has transformed me into a mature and confident professional in this new space and I am grateful to Ajit and his team. My wish is to see this program accepted as a gold standard in the industry in the coming years”.

Peter Marriott – UK – www.catalystcomputing.co.uk

Attending the Data Science for IoT course has really helped me in demystifying the tools and practices behind machine learning and has allowed me to move from an awareness of machine learning to practical application.

Yair Meidan Israel – https://il.linkedin.com/in/yairmeidandatamining

“As a PhD student with an academic and practical experience in analytics, the DSIOT course is the perfect means by which I extend my expertise to the domain of IoT. It gradually elaborates on IoT concepts in general, and IoT analytics in particular. I recommend it to any person interested in entering that field. Thanks Ajit!”

Parinya Hiranpanthaporn, Data Architect and Advanced Analytics professional Bangkok

“Good content, Good instructor and Good networking. This course totally answers what I should know about Data Science for Internet of Things.”

 

Sibanjan Das – Bangalore

Ajit helped me to focus and set goals for my career that is extremely valuable. He stands by my side for every initiative I take and helps me to navigate me through every difficult situation I face. A true leader, a technology specialist, good friend and a great mentor. Cheers!!!

Manuel Betancurt – Mobile developer / Electronic Engineer. – Australia

I have had the opportunity to partake in the Data Science for the IoT course taught by Ajit Jaokar. He have crafted a collection of instructional videos, code samples, projects and social interaction with him and other students of this deep knowledge.

Ajit gives an awesome introduction and description of all the tools of the trade for a data scientist getting into the IoT. Even when I really come from a software engineering background, I have found the course totally accessible and useful. The support given by Ajit to make my IoT product a data science driven reality has been invaluable. Providing direction on how to achieve my data analysis goals and even helping me to publish the results of my investigation.

The knowledge demonstrated on this course in a mathematical and computer science level has been truly exciting and encouraging. This course was the key for me to connect the little data to the big data.

Barend Botha – London and South Africa – http://www.sevensymbols.co.uk

This is a great course for anyone wanting to move from a development background into Data Science with specific focus on IoT. The course is unique in that it allows you to learn the theory, skills and technologies required while working on solving a specific problem of your choice, one that plays to your past strengths and interests. From my experience care is taken to give participants one to one guidance in their projects, and there is also within the course the opportunity to network and share interesting content and ideas in this growing field. Highly recommended!

- Barend Botha

Jamie Weisbrod – San Diego - https://www.linkedin.com/in/jamie-weisbrod-3630053

Currently there is a plethora of online courses and degrees available in data science/big data. What attracted me to joining the futuretext class “Data Science for ioT” is Ajit Jaokar. My main concern in choosing a course was how to leverage skills that I already possessed as a computer engineer. Ajit took the time to discuss how I could personalize the course for my interests.

I am currently in the midst of the basic coursework but already I have been able to network with students all over the world who are working on interesting projects. Ajit inspires a lot of people at all ages as he is also teaching young people Data science using space exploration.

 Robert Westwood – UK – Catalyst computing
“Ajit brings to the course years of experience in the industry and a great breadth of knowledge of the companies, people and research in the Data Science/IoT arena.”

Overall, the syllabus covers the following themes in 6 months

Note that the schedule is personalized and flexible for the strategic course, i.e. we discuss and personalize your schedule at the start of the course.

  • Principles
  • Problem solving with Data Science: an overall process for solving Data Science problems (agnostic of language), covering aspects such as exploratory data analysis
  • IoT analytics (includes analysis for each vertical within IoT; this will be ongoing throughout the course, including in the methodology)
  • Foundations of R: the basics of one programming language (R) and how to implement Data Science algorithms in R
  • Time Series – which forms the basis of most IoT data (code in R)
  • Spark and NoSQL databases: Code in Scala and implementation in Cassandra
  • Deep Learning
  • Data Science for IoT Methodology
  • Maths and Stats – (this will also be ongoing but will be a core module)

We also have (from day one) what we call foundation projects, where you work in groups on projects for which code already exists, so you apply the concepts in the context of a real situation.

 

Data Science for Internet of Things: A coaching approach

In the Data Science for Internet of Things course I take a coaching approach. I have alluded to this in the post about foundation projects and constructivism.

Coaching has a questionable reputation – with some justification

But here, we are talking of high performance coaching strategies

For example: Consider the approach of a book like The Talent Code.

The author explores the world’s greatest talent hotbeds: tiny places that produce huge amounts of talent, e.g. a small gym in Moscow that produces a large number of gold medalists in athletics. He found that there is a pattern common to all of them: methods of training, motivation, and coaching. They also place an emphasis on hard skills.

So, what does this mean for participants in the context of foundation projects?

a)      Start with what you know (ideally)

b)      Work collaboratively

c)       Push your limits (you can choose something different)

d)      Each group for a project will have one or more people who are knowledgeable

e)      Your outcomes should be specific

f)       You can see the big picture through the methodology for problem solving with Data Science for Internet of Things

g)      Your contribution should be measurable

h)      Your contribution should be based on acquiring a specific skill

i)        Foundation projects have a quiz

From my perspective – as tutor / coach

  • I need to understand what the participants already know (baseline)
  • Provide measurable feedback
  • Extend your capabilities/push limits
  • Ensure you acquire definite skills
  • Keep you motivated
  • Keep your learning at the right pace
  • Foster a sense of community
  • Provide alternative mentors in the community
  • Use newer methods of learning, e.g. concept maps
  • Create great conversations
  • Allow room for unplanned expansion

I think these techniques applied online are new – and there is so much to learn for all.

If you are interested in the Data Science for Internet of Things course, please email us at info at futuretext.com

Data Science for IoT – role of foundation projects (constructivist learning)

 

In the Data Science for Internet of Things course, I use some elements of constructivism through the use of foundation projects.

Foundation projects allow the participant to choose a learning context which is most familiar to them based on their existing experience.
Foundation projects are different from the Capstone projects for each participant.
This form of context-based learning is not familiar to most people, hence some notes.
1) Context-based learning is based loosely on constructivism.
A concise description (adapted from source): Constructivism is a pedagogy / learning theory which advocates that people construct their own understanding and knowledge of the world through experiencing things and reflecting on those experiences. The teacher makes sure she understands the students’ preexisting conceptions, and guides the activity to address them and then build on them.
A quote by Ausubel, one of the pioneers of this form of education:
“The most important single factor influencing learning is what the learner already knows. Ascertain this and teach him accordingly.”
In Holland and Germany, this form of education in science is called by various names, e.g. concept-context learning.
What it means for learning in the Data Science for IoT course:
1) We follow two modes of learning in parallel: instructivist (via the video-based modules) and constructivist (via the foundation projects).
2) For the foundation projects, the participants choose the context most familiar to them from their prior experience (e.g. healthcare, renewables, Industrial IoT).
The downside of applying constructivist methods to learning is that they take a relatively long time, hence the longer duration of the course.
For the current batch, the foundation projects and project leaders are:
Wearables: led by Quang Nam Tran (London)

Renewables: led by Vaijayanti Vadiraj (Bangalore)

Python for Data Science - temporarily led by me

Big Data: Spark and Cassandra for IoT - temporarily led by me (looking to hand over) and Trenton Potgieter (Austin)
Deep Learning with Nvidia: led by Jean Jacques Bernard (Paris) and Yongkang Gao (UK)
Data visualization with R: Barend Botha (London)
Predix: Industrial IoT – temporarily led by me – looking to hand over
ETL/Pentaho -
Deep learning and Machine learning with H2O led by Sibanjan Das(Bangalore)
Remote monitoring of elderly/patient care / healthcare - Manuel Betancurt(Sydney)

More details about the course:  Data Science for Internet of Things course

Image: Jean Piaget – the founder of Constructivism

I am listed no 19 among top 50 authorities on twitter for #iot

Nice to be listed here amongst some great company.

top 50 authorities on twitter for #iot

Young Data Scientist: Data visualizations of our Ardusat/ASE Space experiment using Python

These are visualizations of the live data from our satellite experiment for the Ardusat ASE challenge, which we won last year.

They will be part of a book called Young Data Scientist.

Created using Python libraries json, pandas, matplotlib, statsmodels, numpy.

We use linear regression and logistic regression to detect cloud presence. This will be released as part of the Young Data Scientist book (Countdown Institute).

I even got a mapping of the route of the satellite (equatorial orbit). For some background see:

Using Space Exploration to teach Young People about Data Science

  Please email me at ajit.jaokar at futuretext.com with subject Young Data Scientist if you want to know more as we launch the book/initiative

A methodology for solving problems with Data Science for Internet of Things

 

Introduction

This (long!) blog is based on my forthcoming book:  Data Science for Internet of Things.

It is also the basis for the course I teach: the Data Science for Internet of Things Course. I welcome your comments. Please email me at ajit.jaokar at futuretext.com. Email me also for a pdf version if you are interested in joining the course.

Here, we start off with the question:  At which points could you apply analytics to the IoT ecosystem and what are the implications?  We then extend this to a broader question:  Could we formulate a methodology to solve Data Science for IoT problems?  I have illustrated my thinking through a number of companies/examples.  I personally work with an Open Source strategy (based on R, Spark and Python) but  the methodology applies to any implementation. We are currently working with a range of implementations including AWS, Azure, GE Predix, Nvidia etc.  Thus, the discussion is vendor agnostic.

I also mention some trends I am following, such as Apache NiFi.

The Internet of Things and the flow of Data

As we move towards a world of 50 billion connected devices, Data Science for IoT (IoT analytics) helps to create new services and business models. IoT analytics is the application of data science models to IoT datasets. The flow of data starts with the deployment of sensors. Sensors detect events or changes in quantities and provide a corresponding output in the form of a signal. Historically, sensors have been used in domains such as manufacturing. Now their deployment is becoming pervasive through ordinary objects like wearables. Sensors are also being deployed through new devices like robots and self-driving cars. This widespread deployment of sensors has led to the Internet of Things.

Features of a typical wireless sensor node are described in this paper (wireless embedded sensor architecture). Typically, data arising from sensors is in time series format and is often geotagged. This means there are two forms of analytics for IoT: time series and spatial analytics. Time series analytics typically leads to insights like anomaly detection. Thus, classifiers are commonly used in IoT analytics to detect anomalies. But by looking at historical trends, streaming, and combining data from multiple events (sensor fusion), we can get new insights. And more use cases for IoT keep emerging, such as Augmented reality (think Pokemon Go + IoT).

Meanwhile, sensors themselves continue to evolve. Sensors have shrunk due to technologies like MEMS. Also, their communications protocols have improved through new technologies like LoRa. These protocols lead to new forms of communication for IoT such as Device to Device, Device to Server, or Server to Server. Thus, whichever way we look at it, IoT devices create a large amount of Data. Typically, the goal of IoT analytics is to analyse the data as close to the event as possible. We see this requirement in many ‘Smart city’ type applications such as transportation, energy grids, utilities like water, street lighting, parking, etc.

IoT data transformation techniques

Once data is captured through the sensor, there are a few analytics techniques that can be applied to the Data. Some of these are unique to IoT. For instance, not all data may be sent to the Cloud/Lake. We could perform temporal or spatial analysis. Considering the volume of Data, some may be discarded at source or summarized at the Edge. Data could also be aggregated, and aggregate analytics could be applied to the IoT data aggregates at the ‘Edge’. For example, if you want to detect failure of a component, you could find spikes in values for that component over a recent span (thereby potentially predicting failure). Also, you could correlate data in multiple IoT streams. Typically, in stream processing, we are trying to find out what is happening now (as opposed to what happened in the past). Hence, the response should be near real-time. Also, sensor data could be ‘cleaned’ at the Edge: missing values in sensor data could be filled in (imputing values), sensor data could be combined to infer an event (Complex event processing), and data could be normalized across sensors, time, devices, etc. We could also handle different data formats or multiple communication protocols and manage thresholds.
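A few of these transformations can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the function names, the forward-fill imputation strategy, the window size and the three-standard-deviation threshold are all hypothetical choices, and the readings are made-up values.

```python
from statistics import mean, pstdev


def impute_missing(readings):
    """Fill None values with the last seen value (forward fill).

    Assumes the first reading is present; a leading None would stay None.
    """
    filled, last = [], None
    for value in readings:
        if value is None:
            value = last
        filled.append(value)
        last = value
    return filled


def find_spikes(readings, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    window by more than `threshold` standard deviations of that window."""
    spikes = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu, sigma = mean(recent), pstdev(recent)
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            spikes.append(i)
    return spikes


def summarize(readings):
    """Reduce a batch to a small aggregate that could be sent upstream."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


# Hypothetical temperature readings with one gap and one spike.
raw = [20.1, 20.3, None, 20.2, 20.4, 20.3, 35.0, 20.2]
clean = impute_missing(raw)
spikes = find_spikes(clean)
summary = summarize(clean)
```

Here the gap at index 2 is filled from the previous reading, the value 35.0 at index 6 is reported as a spike, and only the small summary dictionary would need to leave the Edge.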

 

 

Applying IoT Analytics to the Flow of Data

Overview

Here, we address the possible locations and types of analytics that could be applied to IoT datasets.

 

Some initial thoughts:

  • IoT data arises from sensors and ultimately resides in the Cloud.
  • We use the concept of a ‘Data Lake’ to refer to a repository of Data.
  • We consider four possible avenues for IoT analytics: ‘Analytics at the Edge’, ‘Streaming Analytics’, NoSQL databases and ‘IoT analytics at the Data Lake’.
  • For Streaming analytics, we could build an offline model and apply it to a stream.
  • If we consider cameras as sensors, Deep learning techniques (for example, CNNs) could be applied to image and video datasets.
  • Even when IoT data volumes are high, not all scenarios need Data to be distributed. It is very much possible to run analytics on a single node using a non-distributed architecture with Python or R.
  • Feedback mechanisms are a key part of IoT analytics. Feedback is part of multiple IoT analytics modalities, e.g. Edge, Streaming, etc.
  • CEP (Complex event processing) can be applied at multiple points, as we see in the diagram.

 

We now describe various analytics techniques which could apply to IoT datasets

Complex event processing

Complex Event Processing (CEP) can be used at multiple points for IoT analytics (e.g. Edge, Stream, Cloud, etc.).

In general, event processing is a method of tracking and analyzing streams of data and deriving a conclusion from them. Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of complex event processing is to identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible.

In CEP, the data is in motion. In contrast, a traditional query (e.g. in an RDBMS) acts on static data. Thus, CEP is mainly about stream processing, but the algorithms underlying CEP can also be applied to historical data.

CEP relies on a number of techniques for events, including pattern detection, abstraction, filtering, aggregation and transformation. CEP algorithms model event hierarchies and detect relationships (such as causality, membership or timing) between events. They create an abstraction of an event-driven process. Thus, typically, CEP engines act as event correlation engines: they analyze a mass of events, pinpoint the most significant ones, and trigger actions.

Most CEP solutions and concepts can be classified into two main categories: aggregation-oriented CEP and detection-oriented CEP. An aggregation-oriented CEP solution is focused on executing on-line algorithms as a response to event data entering the system, for example to continuously calculate an average based on data in the inbound events. Detection-oriented CEP is focused on detecting combinations of events called event patterns or situations, for example by looking for a specific sequence of events. For IoT, CEP techniques are concerned with deriving a higher-order value / abstraction from discrete sensor readings.

CEP uses techniques like Bayesian networks, neural networks, Dempster-Shafer methods, Kalman filters, etc. Some more background at Developing a complex event processing architecture for IoT.
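The two categories can be contrasted with a small Python sketch. It is not modelled on any particular CEP engine: the class names, the event type names and the "followed by" matching semantics (other events in between are ignored) are assumptions made for illustration.

```python
class RunningAverage:
    """Aggregation-oriented: update an average as each event arrives."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        # Incremental mean: no need to keep past events in memory.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


class SequenceDetector:
    """Detection-oriented: report when event types occur in a given order.

    Unrelated events between the expected ones are ignored.
    """

    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.position = 0

    def update(self, event_type: str) -> bool:
        if event_type == self.pattern[self.position]:
            self.position += 1
            if self.position == len(self.pattern):
                self.position = 0  # reset so the pattern can match again
                return True
        return False


# Aggregation: a continuously updated average of hypothetical temperatures.
average = RunningAverage()
means = [average.update(t) for t in (80.0, 82.0, 84.0)]

# Detection: a hypothetical failure situation defined as a sequence of events.
detector = SequenceDetector(["temp_high", "vibration_high", "pressure_drop"])
events = ["temp_high", "vibration_high", "temp_high", "pressure_drop"]
hits = [detector.update(e) for e in events]
```

The aggregation class derives a value from every inbound event, while the detector stays silent until the whole situation has been observed, which is the higher-order abstraction over discrete readings described above.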

Streaming analytics

Real-time systems differ in the way they perform analytics. Specifically, real-time systems perform analytics on short time windows for data streams. Hence, the scope of real-time analytics is a ‘window’ which typically comprises the last few time slots. Making predictions on real-time data streams involves building an offline model and applying it to a stream. Models incorporate one or more machine learning algorithms which are trained using the training data. Models are first built offline based on historical data (spam, credit card fraud, etc.). Once built, the model can be validated against a real-time system to find deviations in the real-time stream data. Deviations beyond a certain threshold are tagged as anomalies.
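The idea of building a model offline and applying it to windows of a stream can be sketched as follows. The "model" here is deliberately trivial (a mean and standard deviation of historical data), and the function names, window size, threshold and sample values are all hypothetical; a real system would substitute a trained machine learning model.

```python
from collections import deque
from statistics import mean, stdev


def fit_offline(history):
    """'Train' a very simple model: the mean and spread of historical data."""
    return {"mean": mean(history), "std": stdev(history)}


def score_stream(stream, model, window=3, threshold=3.0):
    """Yield (index, window_mean, is_anomaly) for each full sliding window."""
    recent = deque(maxlen=window)  # keeps only the last `window` values
    for i, value in enumerate(stream):
        recent.append(value)
        if len(recent) < window:
            continue
        window_mean = mean(recent)
        deviation = abs(window_mean - model["mean"]) / model["std"]
        yield i, window_mean, deviation > threshold


# Offline step: historical readings collected in the past (made-up values).
history = [50.0, 51.0, 49.0, 50.5, 49.5, 50.0]
model = fit_offline(history)

# Online step: apply the model to a live stream, one window at a time.
live = [50.2, 49.8, 50.1, 58.0, 59.0, 60.0]
anomalies = [i for i, _, flagged in score_stream(live, model) if flagged]
```

With these values, the windows ending at indices 3, 4 and 5 deviate from the historical mean by more than the threshold and are tagged as anomalies.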

IoT ecosystems can create many logs depending on the status of IoT devices. By collecting these logs for a period of time and analyzing the sequence of event patterns, a model to predict a fault can be built including the probability of failure for the sequence. This model to predict failure is then applied to the stream (online). A technique like the Hidden Markov Model can be used for detecting failure patterns based on the observed sequence. Complex Event Processing can be used to combine events over a time frame (ex in the last one minute) and co-relate patterns to detect the failure pattern.
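As a rough illustration of how a Hidden Markov Model could be applied to a sequence of log events, the sketch below runs the standard forward (filtering) recursion over a two-state model. All of the numbers are invented for the example: in practice the start, transition and emission probabilities would be estimated from collected logs, and the state and event names are hypothetical.

```python
STATES = ("healthy", "degrading")

# Hypothetical parameters; a real model would learn these from device logs.
START = {"healthy": 0.9, "degrading": 0.1}
TRANS = {
    "healthy": {"healthy": 0.9, "degrading": 0.1},
    "degrading": {"healthy": 0.2, "degrading": 0.8},
}
EMIT = {
    "healthy": {"ok": 0.8, "warn": 0.15, "error": 0.05},
    "degrading": {"ok": 0.2, "warn": 0.4, "error": 0.4},
}


def normalize(dist):
    """Scale a dict of non-negative weights so that they sum to 1."""
    total = sum(dist.values())
    return {key: value / total for key, value in dist.items()}


def forward(observations):
    """Return P(hidden state | observations so far) after the last event."""
    belief = normalize({s: START[s] * EMIT[s][observations[0]] for s in STATES})
    for obs in observations[1:]:
        # Predict: where could the device be now, given the previous belief?
        predicted = {
            s: sum(belief[p] * TRANS[p][s] for p in STATES) for s in STATES
        }
        # Update: weight each state by how well it explains the new event.
        belief = normalize({s: predicted[s] * EMIT[s][obs] for s in STATES})
    return belief


quiet = forward(["ok", "ok", "ok"])
noisy = forward(["warn", "error", "error"])
```

Under these assumed parameters, a run of "ok" events leaves the probability of the degrading state low, while a run of warnings and errors pushes it well above one half, which could then be compared against a threshold to raise a failure alert.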

Typically, streaming systems could be implemented in Kafka and Spark.

 

Some interesting links on streaming I am tracking:

Newer versions of Kafka designed for IoT use cases

Data Science Central: stream processing and streaming analytics how it works

IoT 101 everything you need to know to start your IoT project – Part One

IoT 101 everything you need to know to start your IoT project – Part Two

 

Edge Processing

Many vendors like Cisco and Intel are proponents of Edge Processing (also called Edge computing). The main idea behind Edge Computing is to push processing away from the core and towards the Edge of the network. For IoT, that means pushing processing towards the sensors or a gateway. This enables data to be initially processed at the Edge device, possibly allowing smaller datasets to be sent to the core. Devices at the Edge may not be continuously connected to the network. Hence, these devices may need an offline copy of the master data/reference data for processing. Edge devices may also include other features like:

•    Apply rules and workflow against that data

•    Take action as needed

•    Filter and cleanse the data

•    Store local data for local use

•    Enhance security

•    Provide governance admin controls
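A few of the features listed above (cleansing and filtering, applying rules, taking action, storing local data and sending a smaller dataset to the core) can be sketched as follows. This is an illustrative Python example only: the gateway function, the reading format, the sensor name pump-1 and the thresholds are all hypothetical.

```python
def process_at_edge(readings, reference, alert_above):
    """Cleanse, filter and apply a rule to raw readings on a hypothetical gateway."""
    local_store = []  # kept on the device for local use
    to_core = []      # the smaller dataset forwarded to the core
    actions = []      # actions taken locally
    for reading in readings:
        sensor, value = reading.get("sensor"), reading.get("value")
        # Cleanse: drop malformed readings and sensors missing from the
        # offline copy of the reference data.
        if value is None or sensor not in reference:
            continue
        local_store.append(reading)
        low, high = reference[sensor]
        # Filter: only out-of-range values are sent upstream.
        if not low <= value <= high:
            to_core.append(reading)
        # Rule: take action locally without waiting for the core.
        if value > alert_above:
            actions.append(f"shutdown:{sensor}")
    return local_store, to_core, actions


# Hypothetical reference data held offline on the edge device.
reference = {"pump-1": (10.0, 80.0)}
readings = [
    {"sensor": "pump-1", "value": 42.0},
    {"sensor": "pump-1", "value": None},
    {"sensor": "unknown", "value": 5.0},
    {"sensor": "pump-1", "value": 95.0},
]
local_store, to_core, actions = process_at_edge(readings, reference, alert_above=90.0)
print(len(local_store), len(to_core), actions)
```

Of the four raw readings, only one needs to travel to the core, which is the main attraction of processing at the Edge.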

IoT analytics techniques applied at the Data Lake

Data Lakes

The concept of a Data Lake is similar to that of a Data Warehouse or a Data Mart. In this context, we see a Data Lake as a repository for data from different IoT sources. A Data Lake is driven by the Hadoop platform. This means that data in a Data Lake is preserved in its raw format. Unlike a Data Warehouse, data in a Data Lake is not pre-categorised. From an analytics perspective, Data Lakes are relevant in the following ways:

  • We could monitor the stream of data arriving in the lake for specific events or could correlate different streams. Both of these tasks use Complex event processing (CEP). CEP could also apply to data when it is stored in the lake to extract broad, historical perspectives.
  • Similarly, Deep learning and other techniques could be applied to IoT datasets in the Data Lake when the data is ‘at rest’. We describe these below.
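One practical consequence of keeping data in its raw format is that structure is applied only when the data is read (often called schema on read, a term used here for illustration rather than taken from the discussion above). A minimal Python sketch, assuming the lake is simply a list of raw JSON records from hypothetical devices:

```python
import json

# Hypothetical raw records from different IoT sources, stored as they arrived.
raw_lake = [
    '{"device": "thermostat-1", "temp_c": 21.5, "ts": 1}',
    '{"device": "meter-7", "kwh": 0.42, "ts": 2}',
    '{"device": "thermostat-1", "temp_c": 22.1, "ts": 3}',
]


def read_with_schema(lake, required_fields):
    """Parse raw records at read time and keep those that have every required field."""
    for line in lake:
        record = json.loads(line)
        if all(field in record for field in required_fields):
            yield {field: record[field] for field in required_fields}


# The same raw data can be read with different schemas for different analytics.
temperatures = list(read_with_schema(raw_lake, ["device", "temp_c", "ts"]))
energy = list(read_with_schema(raw_lake, ["device", "kwh", "ts"]))
print(temperatures)
print(energy)
```

Nothing is categorised when the records are stored; each analysis decides at read time which fields it needs.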

ETL (Extract Transform and Load)

Companies like Pentaho are applying ETL techniques to IoT data

Deep learning

Some deep learning techniques could apply to IoT datasets. If you consider images and video as sensor data, then we could apply various convolutional neural network techniques to this data.
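To show what a convolutional network does with image "sensor data", here is the core operation of a convolutional layer written in plain Python. It is a sketch only: the tiny image and the hand-written edge kernel are made up, and a real network would learn its kernels and use a deep learning library rather than nested lists.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the element-wise products.
    This is the cross-correlation that deep learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A hypothetical 3x4 "camera frame": dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A hand-written kernel that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 1], [-1, 1]]

print(convolve2d(image, vertical_edge))  # → [[0, 2, 0], [0, 2, 0]]
```

The output is non-zero only where the brightness changes, which is how a convolutional layer turns raw pixels into features.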

It gets more interesting when we consider RNNs(Recurrent Neural Networks)  and Reinforcement learning. For example – Reinforcement learning and time series – Brandon Rohrer How to turn your house robot into a robot – Answering the challenge – a new reinforcement learning robot

Over time, we will see far more complex options, for example self-driving cars and the use of Recurrent Neural Networks (Mobileye)

Some more interesting links for Deep Learning and IoT:

Optimization

Systems-level optimization and process-level optimization for IoT are another complex area where we are doing work. Some links for this

 

 Visualization

Visualization is necessary for analytics in general and IoT analytics is no exception

Here are some links

NOSQL databases

NoSQL databases today offer a great way to implement IoT analytics. For instance,

Apache Cassandra for IoT

MongoDB and IoT tutorial

 

Other  IoT analytic techniques

In this section, I list some IoT  technologies where we could implement analytics

 

A Methodology to solve Data Science for IoT problems

We started off with the question: At which points could you apply analytics to the IoT ecosystem and what are the implications? But behind this work is a broader question: Could we formulate a methodology to solve Data Science for IoT problems? I am exploring this question as part of my teaching both online and at Oxford University along with Jean-Jacques Bernard.

Here is more on our thinking:

  • CRISP-DM is a Data mining process methodology used in analytics. More on CRISP-DM HERE and HERE (pdf documents).
  • From a business perspective (top down), we can extend CRISP-DM to incorporate the understanding of the IoT domain, i.e. add domain-specific features. This includes understanding the business impact, handling high volumes of IoT data, understanding the nature of data coming from various IoT devices etc
    • From an implementation perspective (bottom up), once we have an understanding of the data and the business processes, for each IoT vertical we first find the analytics (what is being measured, optimized etc). Then we find the data needed for those analytics. Then we provide examples of that implementation using code. Extending CRISP-DM to an implementation methodology, we could have Process (workflow), templates, code, use cases, data etc
    • For implementation in R, we are looking to initially use Open source R and Spark and the h2o.ai API

 

Conclusion

We started off with the question: At which points could you apply analytics to the IoT ecosystem and what are the implications? And extended this to a broader question: Could we formulate a methodology to solve Data Science for IoT problems? The above is comprehensive but not absolute. For example, you can implement deep learning algorithms on mobile devices (Qualcomm Snapdragon machine learning development kit for mobile devices). So, even as I write it, I can think of exceptions!

 

This article is part of my forthcoming book on Data Science for IoT and also the courses I teach

Welcome your comments.  Please email me at ajit.jaokar at futuretext.com  - Email me also for a pdf version if you are interested. If you want to be a part of my course please see the testimonials at Data Science for Internet of Things Course.  

#Data Science for #IoT meetups – featuring H2O and GE Predix

The Data Science for IoT meetup has two amazing events in July

July 8. Limited to about 25 participants. Intelligent world hackathon GE Predix

GE will be dialling in remotely from San Francisco and answering your questions about the Intelligent world GE Predix hackathon

It’s one of the first outreach events for Predix – GE’s cutting edge Industrial IoT platform. Places are limited.

Registration at 6:30; the event starts at 7
Venue is
La Fosse Associates
Portland House, 5th Floor
Bressenden Place
London, SW1E 5BH

Please register at

http://www.meetup.com/Data-Science-for-Internet-of-Things-Meetup-London/events/232248771/

 

July 13 – Data Science with H2O

I am pleased to welcome H2O – one of the best known Deep learning and Data Science companies, based out of Mountain View

Jo-fai (Joe) is a data scientist at H2O.ai. Joe liaises with customers to expand the use of H2O beyond the initial footprint. Before joining H2O, he was in the business intelligence team at Virgin Media where he developed data products to enable quick and smart business decisions. He also worked (part-time) for Domino Data Lab as a data science evangelist promoting products via blogging and giving talks at meetups.

 

Again 6:30 meeting for a 7pm start
Venue is
La Fosse Associates
Portland House, 5th Floor
Bressenden Place
London, SW1E 5BH

Please register http://www.meetup.com/Data-Science-for-Internet-of-Things-Meetup-London/events/232249969/

 New Sponsors for the Data Science for IoT meetup

I am pleased to announce that La Fosse Associates will be a supporter for our group and their venue in Victoria will be a great place for our events

La Fosse Associates specialises in recruiting technology, digital and change talent from top to bottom. We operate at all levels of a technology organisation on both a permanent, contract, interim and executive search basis. We offer a flexible service and a values driven approach.

And also H2O.ai have kindly agreed to sponsor Pizzas etc for our events.

Mountain View based H2O is a pioneer in AI and Data Science

At H2O.ai we see a world where all software will incorporate AI, and we’re focused on bringing AI to business through software. H2O.ai is the maker behind H2O, the leading open source machine learning platform for smarter applications and data products. H2O operationalizes data science by developing and deploying algorithms and models for R, Python and the Sparkling Water API for Spark. Some of H2O’s mission critical applications include predictive maintenance, operational intelligence, security, fraud, auditing, churn, credit scoring, user based insurance, predicting sepsis, ICU monitoring and more in over 5,000 organizations. H2O is brewing a grassroots culture of data transformation in its customer communities. Customers include Capital One, Progressive Insurance, Zurich North America, Transamerica, PricewaterhouseCoopers, Comcast, Nielsen Catalina Solutions, Neustar, Macy’s, Walgreens, Kaiser Permanente and Aetna.

Finally, I launch the next version of the Data Science for IoT course. Please see Data Science for IoT course for our approach and testimonials

If you are interested, please contact me at ajit.jaokar@futuretext.com

It’s great to see the meetup group take on such vibrancy. We attract world class companies and supporters like H2O, Nvidia, GE, La Fosse associates etc

Thanks for your support and contribution!

Kind rgds

Ajit

Boston housing Dataset without the racial profiling attribute

Like many data scientists, I use the UCI datasets extensively

Specifically, the Boston Housing Dataset is useful to teach Data Science

For example, I use it in the Data Science for IoT course because it’s a dataset which people can relate to easily (finding the median value of house prices)

The attributes are

    1. CRIM      per capita crime rate by town
    2. ZN        proportion of residential land zoned for lots over 
                 25,000 sq.ft.
    3. INDUS     proportion of non-retail business acres per town
    4. CHAS      Charles River dummy variable (= 1 if tract bounds 
                 river; 0 otherwise)
    5. NOX       nitric oxides concentration (parts per 10 million)
    6. RM        average number of rooms per dwelling
    7. AGE       proportion of owner-occupied units built prior to 1940
    8. DIS       weighted distances to five Boston employment centres
    9. RAD       index of accessibility to radial highways
    10. TAX      full-value property-tax rate per $10,000
    11. PTRATIO  pupil-teacher ratio by town
    12. B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks 
                 by town
    13. LSTAT    % lower status of the population
    14. MEDV     Median value of owner-occupied homes in $1000's
However, there is a problem with this dataset, especially with this attribute:
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

Hence, I use a modified version of the dataset which you can find as a CSV HERE

It removes the above attribute and leaves the rest of the dataset unchanged

You can then load it into a dataframe using the following code, changing the path to your own directory

# Read the data from the csv file
Boston <- read.csv("c:\\futuretext\\Boston.csv")
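If you start from a copy of the dataset that still contains the attribute, it can be dropped at load time. Below is a minimal Python sketch using only the standard library. It assumes the CSV header uses the attribute names listed above (so the column is called B), and the two data rows are made-up placeholder values rather than real records.

```python
import csv
import io

# Placeholder rows with a subset of columns, for illustration only (not real records).
sample_csv = """CRIM,RM,B,MEDV
0.1,6.5,390.0,24.0
0.2,6.1,380.0,21.0
"""


def drop_column(csv_text, column):
    """Read CSV text into a list of dicts, leaving out one named column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [{name: value for name, value in row.items() if name != column} for row in rows]


boston = drop_column(sample_csv, "B")
print(boston[0])
```

To read from a file instead, pass the contents of the file (for example from open(path).read()) in place of the sample text.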

 

 

Data Science for Internet of Things course – Sep 2016 – now in its fourth batch

 

Course Outline

In a nutshell

Now in its fourth batch, the Data Science for Internet of Things course is designed to prepare you for the role of a Data Scientist for the Internet of Things (IoT) domain.

 

Only a few places are left for Sep 2016. Please email info@futuretext.com if you want to join.

 

This niche, personalized course is suited for:

  • Developers who want to transition to a new role as Data Scientists
  • Entrepreneurs who want to launch new products covering IoT and analytics
  • Anyone interested in developing their career in IoT Analytics from a strategic perspective by choosing the Strategic/non programming option

 

Duration: The course starts from Sep 2016 and extends to Mar  2017. We work with you for the next six months after that on a specific project and to help transition your career to Data Science through our network. The extra time also allows you to catch up on specific modules in the course

 

Scope: Created by Data Science and IoT professionals, the course covers infrastructure (Hadoop – Spark), Programming / Modelling (Python/R/Time series) and Deep Learning (Theano, H2O) within the context of the Internet of Things.

Internet of Things: We cover unique aspects of Data Science for IoT including Deep Learning, Complex event processing/sensor fusion and Streaming/Real time analytics

 

Investment: The course is conducted Online and Offline (London). Please contact us for pricing (info@futuretext.com)

 

Contact us at info@futuretext.com to sign up

 

Top Ten Reasons to join this course

Here are the top 10 reasons to join the course:

1)      Work towards the goal of being a Data Scientist for the Internet of Things

2)      Personalization of content

3)      Flexibility (chances to catch up if needed)

4)      Coaching approach

5)      Affordable

6)      Covers complex elements of IoT such as Deep learning, Sensor fusion, Streaming

7)      Based on an Open methodology and Open source

8)      External publishing and personal branding

9)      Project based

10)  Being part of a global community

 

Benefits

The course aims to equip you to be a Data Scientist for the Internet of Things domain

  • You can transition your career to Data Science for IoT. This could mean a new job, new role in your existing company, a  project or a start-up idea
  • You are not alone: Toolkits and community support to start working on real Data science problems for IoT
  • You master specific skills: Spark, R, Python, Scala, IoT platforms, Data analysis, Deep Learning and SQL among others in context of IoT analytics
  • The course content can be personalized (see below)
  • While we focus on IoT only, the Data Science principles can apply to other domains

 

Testimonials

 

Jean-Jacques Bernard – Paris – France

“Great course with many interactions, either group or one to one that helps in the learning. In addition, tailored curriculum to the need of each student and interaction with companies involved in this field makes it even more impactful.

As for myself, it allowed me to go into topics of interests that help me in reshaping my career.”

Johnny Johnson, AT&T – USA

“This DSIOT course is a great way to get up-to-speed.  The tools and methodologies for managing devices, wrangling and fusing data, and being able to explain it are taking form fast; Ajit Jaokar is a good fit.  For me, his patience and vision keep this busy corporate family man coming back.”

Yongkang Gao, General Electric, UK.

“I especially thank Ajit for his help on my personal project of the course — recommending proper tools and introducing mentors to me, which significantly reduced my pain in the beginning stage.”

Karthik Padmanabhan, Manager – Global Data Insight and Analytics (GDIA) – Ford Motor Pvt Ltd.

“I am delighted to provide this testimonial to Ajit Jaokar who has extended outstanding support and guidance as my mentor during the entire program on Data science for IoT. Ajit is a world renowned professional in the niche area of applying the Data science principles in creating IoT apps. Talking about the program, it has a lot of breadth and depth covering some of the cutting edge topics in the industry such as Sensor Fusion, Deep Learning oriented towards the Internet of things domain. The topics such as Statistics, Machine Learning, IoT Platforms, Big Data and more speak about the complexity of the program. This is the first of its kind program in the world to provide Data Science training especially on the IoT domain and I feel fortunate to be part of the batch comprising of participants from different countries and skill sets. Overall this journey has transformed me into a mature and confident professional in this new space and I am grateful to Ajit and his team. My wish is to see this program accepted as a gold standard in the industry in the coming years”.

Peter Marriott – UK – www.catalystcomputing.co.uk

Attending the Data Science for IoT course has really helped me in demystifying the tools and practices behind machine learning and has allowed me to move from an awareness of machine learning to practical application.

Yair Meidan Israel – https://il.linkedin.com/in/yairmeidandatamining

“As a PhD student with an academic and practical experience in analytics, the DSIOT course is the perfect means by which I extend my expertise to the domain of IoT. It gradually elaborates on IoT concepts in general, and IoT analytics in particular. I recommend it to any person interested in entering that field. Thanks Ajit!”

Parinya Hiranpanthaporn, Data Architect and Advanced Analytics professional Bangkok

“Good content, Good instructor and Good networking. This course totally answers what I should know about Data Science for Internet of Things.”

 

Sibanjan Das – Bangalore

Ajit helped me to focus and set goals for my career that is extremely valuable. He stands by my side for every initiative I take and helps me to navigate me through every difficult situation I face. A true leader, a technology specialist, good friend and a great mentor. Cheers!!!

Manuel Betancurt – Mobile developer / Electronic Engineer. – Australia

I have had the opportunity to partake in the Data Science for the IoT course taught by Ajit Jaokar. He have crafted a collection of instructional videos, code samples, projects and social interaction with him and other students of this deep knowledge.

Ajit gives an awesome introduction and description of all the tools of the trade for a data scientist getting into the IoT. Even when I really come from a software engineering background, I have found the course totally accessible and useful. The support given by Ajit to make my IoT product a data science driven reality has been invaluable. Providing direction on how to achieve my data analysis goals and even helping me to publish the results of my investigation.

The knowledge demonstrated on this course in a mathematical and computer science level has been truly exciting and encouraging. This course was the key for me to connect the little data to the big data.

Barend Botha – London and South Africa – http://www.sevensymbols.co.uk

This is a great course for anyone wanting to move from a development background into Data Science with specific focus on IoT. The course is unique in that it allows you to learn the theory, skills and technologies required while working on solving a specific problem of your choice, one that plays to your past strengths and interests. From my experience care is taken to give participants one to one guidance in their projects, and there is also within the course the opportunity to network and share interesting content and ideas in this growing field. Highly recommended!

- Barend Botha

Jamie Weisbrod – San Diego - https://www.linkedin.com/in/jamie-weisbrod-3630053

Currently there is a plethora of online courses and degrees available in data science/big data. What attracted me to joining the futuretext class “Data Science for ioT” is Ajit Jaokar. My main concern in choosing a course was how to leverage skills that I already possessed as a computer engineer. Ajit took the time to discuss how I could personalize the course for my interests.
I am currently in the midst of the basic coursework but already I have been able to network with students all over the world who are working on interesting projects. Ajit inspires a lot of people at all ages as he is also teaching young people Data science using space exploration.

Robert Westwood – UK – Catalyst computing

“Ajit brings to the course years of experience in the industry and a great breadth of knowledge of the companies, people and research in the Data Science/IoT arena.”

Big Picture

Below is the big picture of our approach.

  • Data Science for Internet of Things is based on time series data from IoT devices – but with three additional techniques: Deep learning, Sensor fusion (Complex Event Processing) and Streaming.
  • We consider Deep learning because we treat cameras as sensors but also include reinforcement neural networks for IoT devices
  • The course is based on templates (code) for the above in R, Python and Spark (Scala, PySpark, SQL). It is hence suited for people with a programming background (even if from other languages). The exception is if you choose the Strategic option (non-programming certification – see below)
  • The ideas learnt in the core modules are implemented in Projects. Projects could last as long as six months
  • The advanced modules (ex Sensor fusion) are built on top of the core modules(ex Time series etc)
  • Much of our work has been published in leading blogs like KDnuggets and Data Science Central etc
  • The course has evolved based on active contribution from existing participants, for example Jean-Jacques Bernard (methodology), Peter Marriott (Python), Sibanjan Das (H2O/Deep learning), Shiva Soleimani (methodology), Yongkang Gao (Nvidia TK1), Raj Chandrasekaran (Spark) and Vinay Mendiratta (systems level optimization of IoT sensors). We plan to open source most of our code
  • We use Apache Spark for Streaming and Apache Flink for sensor fusion.
  • Ironically, due to the emphasis on Data, the course is not strictly an IoT course, i.e. we are concerned primarily with applying predictive learning algorithms on IoT datasets
  • Finally, the course is personalized. I see it more as coaching than a course. For example, you can choose to focus on a smaller subset of topics, which is decided in the personal learning plan at the outset

 

Interested? Email info@futuretext.com for details of the September batch (now in its fourth batch)

 

Coaching

The course takes a coaching approach i.e. the relatively small numbers allow us to customize the content and take a personalized approach through one on one calls/hangouts etc

Timeline and Modules

Notes:

  • The modules and the sequence are subject to change
  • You can start in Sep if you are away on holiday in August (because the first two weeks are personalization and onboarding). However, if you can start in August, we recommend you do so

 

 

Aug 15 Personal learning plans and onboarding
Aug 22
Aug 29
Sep 5 Data Science for IoT core concepts
Sep 12 Data Science for IoT core concepts Quiz and hangout
Sep 19 IoT Platforms
Sep 26 IoT Platforms Quiz and hangout
Oct 3 Fundamentals of R Programming
Oct 10 Fundamentals of R Programming Quiz and hangout
Oct 17 Problem solving with Data science Part One
Oct 24 Problem solving with Data science Part One – Quiz and hangout
Oct 31 Problem solving with Data science Part Two
Nov 7 Problem solving with Data science Part Two – Quiz and hangout
Nov 14 Time series and NoSQL databases
Nov 21 Time series and NoSQL databases – Quiz and hangout
Nov 28 Spark ecosystem – part one
Dec 5 Spark ecosystem – part one – Quiz and hangout
Dec 12 Spark ecosystem – part two
Dec 19 Spark ecosystem – part two – Quiz and hangout
Jan 9 2017 Deep learning
Jan 16 2017 Deep learning – Quiz and hangout
Feb 2017 to June 2017 Project and optional modules

 

Optional modules: Choose 2 modules from

1. Spark and Scala with an emphasis on distributed algorithms for IoT use cases

2. Deep learning – based on H2O

3. Sensor fusion

4. Python (covers all aspects of Python including Deep learning via Theano)

 

Module and Content notes

The content includes

 

Mathematical and statistical techniques (covered throughout the course) – Stats, how to solve a problem, where algorithms fit in, motivational examples, how to choose an algorithm, end to end steps etc. This may have multiple parts and will broadly include:

Hypothesis testing – Descriptive statistics – Data loading – Split-out validation dataset – Basic Data visualizations – Prepare Data – Data Cleaning – Feature Selection – Data Transforms – Test and Evaluate Algorithms – Improve Accuracy (Algorithm Tuning, Ensembles etc) – Time series – multivariate regression
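A few of these steps (split-out validation dataset, fitting an algorithm, then testing and evaluating it) can be sketched end to end in Python. This is an illustrative example, not a course template: the data is synthetic, the algorithm is a one-variable least squares regression, and the function names, the 20% validation fraction and the random seed are assumptions.

```python
import random


def split_dataset(rows, validation_fraction=0.2, seed=7):
    """Shuffle the rows and hold back a fraction for validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, rows):
    """Average absolute gap between predictions and actual values."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


# Synthetic data following y = 2x + 1, for illustration only.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, validation = split_dataset(data)
model = fit_line(train)
print(model, mean_absolute_error(model, validation))
```

The model never sees the validation rows during fitting, so the error on them is an honest estimate of how it would do on new data.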

 

Methodology - How to solve an IoT analytics problem

 

Algorithms

Regression

Clustering

Multivariate

Time series (air quality and temperature)

Big Data: Spark and Scala (optional)

Deep learning: H2O (optional)

 

An overview of Data Science

An overview of Data Science – What is Data Science? – What problems can be solved using Data science – Extracting meaning from Data – Statistical processes behind Data – Techniques to acquire data (ex APIs) – Handling large scale data – Big Data fundamentals

 

Data Science and IoT

The IoT ecosystem, Unique considerations for the IoT ecosystem – Addressing IoT problems in Data science (time series data, enterprise IoT edge computing, real-time processing, cognitive computing, image processing, introduction to deep learning algorithms, geospatial analysis for IoT/managing massive geographic scale, strategies for integration with hardware, sensor fusion)

 

The Apache Spark ecosystem

Apache Spark in detail including Scala, SQL, SparkR, MLlib and GraphX

 

The Data Science for IoT methodology

A specific approach to solve Data Science problems for IoT including strategy and development

 

Mathematical foundations of Machine learning

Here we formally cover the mathematics for Data science including Linear Algebra, Matrix algebra, Bayesian Statistics, Optimization techniques (Gradient descent) etc. We also cover Supervised algorithms, unsupervised algorithms (classification, regression, clustering, dimensionality reduction etc) as applicable to IoT datasets – covered throughout the course as needed
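As a small worked example of an optimization technique, the Python below uses batch gradient descent to fit a straight line by minimising the mean squared error. It is a sketch under assumptions: the four data points, the learning rate and the number of steps were chosen for illustration.

```python
def gradient_descent(points, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimising mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Illustrative points that lie exactly on y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = gradient_descent(points)
print(round(w, 4), round(b, 4))
```

If the learning rate is set too high the updates overshoot and diverge, and if it is too low many more steps are needed, which is why it is usually one of the first parameters to tune.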

 

Unique Elements for IoT

This module emphasises the following unique elements for IoT

  • Complex event processing (sensor fusion)
  • Deep Learning and
  • Real Time (Spark, Kafka etc)

FAQ: Summary of Benefits and Features

 

Impact on your work: Designed for developers/ICT contractors/Entrepreneurs who want to transition their career towards Data science roles with an emphasis on IoT
Typical profile: A developer who has skills in programming environments like Java, Ruby, Python, Oracle etc and wants to learn Data Science within the context of Internet of Things with the goal of becoming a Data Scientist for IoT. You may also choose a strategic option (i.e. excluding Programming)
Community support? Yes. Also includes the Alumni network, i.e. beyond the duration of the course, at no extra cost.
Approach to Big Data: For Big Data, the course is focussed on Apache Spark, specifically Scala, SQL, MLlib, GraphX and others on HDFS
Approach to Programming: See scope below
Approach to Algorithms: See scope below
Is this a full data science course? Yes, we cover machine learning / Data science techniques which are applicable to any domain. Our focus is Internet of Things. The course is practitioner oriented, i.e. not academic, and is not affiliated to a university.
Investment: The course is conducted Online and Offline (London). Please contact us for pricing (info@futuretext.com)
Help with jobs/employment: Yes, we aim to transition your career. Hence, we are selective in the recruitment for the course. There are no guarantees, but a career transition is a key goal for us. We work with you over the duration of the course (including the Project) to typically achieve one of the following: a) Help you to get a new role in Data Science/IoT. b) Support you in your existing Data science role. Often career transition also includes support for Startups or working with your existing company
Created by professionals: See my profile below
Personalization: The course is based on a PLP (Personal learning plan) which allows you to customize for language, projects, domains, career goals, entrepreneurial goals etc. Examples include a focus on CEP/Sensor fusion, RNNs and Time series, Edge processing, SQL etc. There is no extra cost for this but we agree scope through the PLP before we start.
Duration: The course runs from Sep 2016 to late Mar 2017. We work with you for the next six months after that on a specific project and to help transition your career to Data Science through our network. The extra time also allows you to catch up on specific modules in the course
Projects: A significant part of the course is Project based. Projects are based on predictive analytics algorithms for IoT applications. Projects use our methodology, which is based on a formalized way of solving IoT analytics problems. Projects can be based in any of the programming languages we cover, i.e. R or Python, Spark (Scala) and SQL (distributed processing, i.e. Big Data), and Theano and deeplearning4j for Deep learning. If you want to work on a specific project (or explore some ideas deeper), you should indicate this in advance
Access to knowledge: We do not restrict access to knowledge by specialization. For example, if you choose to focus on sensor fusion, you will still have access to all material for Deep learning
Batch sizes: Limited to ensure personalized attention
Time per week: About 5 hours. No additional materials need to be bought
Certificate of completion: Yes, based on the quiz and projects. See below for levels of certification
Delivery of content: Via video. You do not have to be online at specific times with the exception of hangouts
Is there a selection process? Yes. I spend a lot of time with participants and this approach may not be suitable for everyone. Hence, there is an initial discussion before you sign up

 

Who has created this course / Who is the tutor?

The course is created by Ajit Jaokar

 

Ajit’s work spans research, entrepreneurship and academia relating to IoT, predictive analytics and Mobility.

His current research focus is on applying data science algorithms to IoT applications. This includes Time series, sensor fusion and deep learning (mostly in R/Apache Spark). This research underpins his teaching at Oxford University (Data Science for Internet of Things) and the ‘City sciences’ program at the University of Madrid.

His book is included as a course book at Stanford University for Data Science for Internet of Things. In 2015, Ajit was included in the top 16 influencers (Data Science Central), Top 100 blogs (KDnuggets) and Top 50 (IoT Central)

Ajit has been involved with various Mobile / Telecoms / IoT projects since 1999, ranging from strategic analysis and development to research, consultancy and project management. Since 2011, he has further specialized in predictive analytics for IoT.

Ajit works with Predictive learning algorithms (R and Spark) with applications including Smart cities, IoT and Telecoms

In 2009, Ajit was nominated to the World Economic Forum’s ‘Future of the Internet’ council. In 2011, he was nominated to the World Smart Capital program (Amsterdam). Ajit moderates/chairs Oxford University’s Next generation mobile applications panel. In 2012, he was nominated to the board of Connected Liverpool for their Smart city vision. Ajit has been involved in IoT based roles for the webinos project (Fp7 project). In May 2005, he founded the OpenGardens blog, which is widely respected in the industry. Ajit has spoken at Mobile World Congress (4 times), CTIA, CEBIT, Web20 expo, European Parliament, Stanford University, MIT Sloan, Fraunhofer FOKUS and University of St. Gallen. He has been involved in transatlantic technology policy discussions.

Ajit is passionate about teaching Data Science to young people through Space Exploration working with Ardusat

Additional details

Why is this course unique?

The course emphasizes some aspects that are unique to IoT (in comparison to traditional data science). These include: a greater emphasis on time series data, Edge computing, Real-time processing, Cognitive computing, In memory processing, Deep learning, Geospatial analysis for IoT, Managing massive geographic scale (ex for Smart cities), Telecoms datasets, Strategies for integration with hardware and Sensor fusion (Complex event processing). Note that we include video and images as sensors through cameras (hence the study of Deep learning)

What is the implication of an emphasis on IoT?

In 2015/16, IoT is emerging, but its full impact will only be felt over the next five years. Today, we see IoT driven by Bluetooth 4.0, including iBeacons. Over the next five years, we will see IoT connectivity driven by the wide area network (with the deployment of 5G from 2020 onwards). We will also see entirely new forms of connectivity (for example, LoRa, Sigfox etc). Enterprises (Renewables, Telematics, Transport, Manufacturing, Energy, Utilities etc) will be the key drivers for IoT. On the consumer side, Retail and wearables will play a part. This tsunami of data will lead to an exponential demand for analytics, since analytics is the key business model behind the data deluge. Most of this data will be time series data, but it will also include other types of data. For example, our emphasis on IoT also includes Deep Learning, since we treat video and images as sensors. IoT will lead to a re-imagining of everyday objects. Thus, we believe we are only just beginning to see the impact of Data Science for Internet of Things.

How is this approach different to the more traditional MOOCs?

Here’s how we differ from MOOCs:

a)  We are not ‘Massive’: this approach works for small groups, with more focused and personalized attention. We will never have 1000s of participants.

b)  We help with career leverage: we work actively with you on this, e.g. if you are a startup or you want to transition to a new job.

c)  We are vendor agnostic

d)  We work actively with you to build your brand (Blogs/Open source/conferences etc)

e)  The course can be personalized to streams (e.g. Deep learning, Complex event processing, Streaming etc)

f)  We teach the foundations of maths where applicable

g)  We work with a small number of platforms which provide current / in-demand skills, e.g. Apache Spark, R etc

h)  We are exclusively focused on IoT (although the concepts can apply to any other vertical)

 

What is your approach to Programming?

The main Programming focus is on R and Python. We also use Spark (Scala, SQL and R). We use H2O and Theano (for Deep learning).

 

What is your approach to working with Algorithms and Maths?

The course is based on modelling IoT based problems in the R and Python programming languages. We follow a context-based learning approach, hence we relate the maths to specific R based IoT models. You will need an aptitude for maths. However, we cover the mathematical foundations necessary. These include: Linear Algebra including Matrix algebra, Bayesian Statistics, Optimization techniques (such as Gradient descent) etc.
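
As an illustration of relating the maths to an IoT style model, here is a minimal sketch of Gradient descent fitting a straight line to a handful of temperature readings. The readings, the function name fit_line and the learning rate are invented for this example and are not taken from the course material; the sketch is in Python rather than R.

```python
# Toy data: hourly temperature readings from a hypothetical sensor.
hours = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
temps = [20.1, 20.9, 22.2, 22.8, 24.1, 25.0]


def fit_line(xs, ys, learning_rate=0.02, steps=5000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error of the current model on every reading.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


slope, intercept = fit_line(hours, temps)
print(f"trend: {slope:.2f} degrees per hour, starting near {intercept:.2f}")
```

The learning rate is an assumption chosen to suit this small dataset; with readings on a different scale it would need to be tuned, or the inputs rescaled first.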

 

Where do you stand on the R vs. Python debate?

The primary language for the course is R, but we also cover code in Python. Because the course lasts for a year, you can easily cover both. We believe that commercially R will be more valuable because of the uptake from companies like Microsoft (Azure), HPE (Vertica), SAP (HANA), Oracle, Hitachi/Pentaho etc.

What is your view on Open source?

We actively encourage Open Source. We intend to Open Source most of your code and also our methodology. We also encourage participants to contribute to our Open Source GitHub repository. This helps to build your personal brand. The code and content are part of our forthcoming book.

What is your view on Publishing?

As with Open Source, we encourage Open publishing. Many of the course participants have been featured on blogs like KDnuggets and Data Science Central. For example: Deep Learning for Internet of Things Using H2O by Sibanjan Das and Ajit Jaokar on KDnuggets.

Can you explain more about Constructivism?

The learning philosophy of the course is based on a concept called Constructivism.

 

Constructivism (constructivist learning philosophy) is based on learning mechanisms in which knowledge is internalized by learners through processes of accommodation and assimilation. The learner has an internal representation of a concept. They then encounter new ideas and concepts. These ideas and concepts are assimilated through learning. In doing so, they may incorporate new experiences into the existing framework. Alternatively, if they find that the new experience contradicts their existing framework, they may evolve their internal representation or change their perception. Thus, learning is an on-going process of concept assimilation. We incorporate constructivist ideas into the course content, Projects and the methodology. Here are the implications and findings of this approach.

 

Specifically, for this course, that means:

 

  • Understanding existing knowledge in the onboarding stage: To acquire new knowledge in complex domains, the context is important. Specifically, the starting point of the Learner is important so that new ideas can be related to their existing knowledge base. This is different for everyone. Hence, we take some time to understand your existing knowledge in the onboarding stage.
  • Co-relating concepts: New concepts and ideas have hierarchical dependencies and also lateral connections. These need to be incorporated into the learning process. This is included in the content as far as possible. It implies that you can learn another language (Python) based on the main language (R) relatively easily.
  • Personalization and longer duration: The process is slow and personalized. It is not possible to use this approach in a mass/massive mode.
  • Flexible learning paths: There is a broad structure to acquiring knowledge but not a fixed path.
  • The use of projects in the context of constructivism: Projects provide the Physical context. But Projects need to be seen in the wider context. Projects and Methodology go together. The methodology provides the panoramic view of the problem in the context of the Big Picture. Thus, Projects are the core of our course, but we use the idea of Projects in a wider learning context.

The key idea in constructivism is that the starting point is familiar to the learner. Hence, we spend a lot of time in the first few months trying to understand the learner’s current state of understanding. After this, the next phase is spent on the core modules, which correlate new ideas to the existing understanding, thereby building on existing concepts. In turn, these are further expanded in the project.

The project itself is set against a methodology (we are creating an Open methodology for Data Science for Internet of Things as part of the course).

 

How do I ‘transition my knowledge’ and gain recognition?

Data Science for IoT is a new domain. It is also an amalgamation of two rapidly evolving domains in their own right (Internet of Things and Data Science). This gives you many opportunities to establish expertise in niche areas using the Zulu principle: become an expert on a clearly defined, narrow subsection of a field.

Many on the course have pursued this concept with success: Dr Vinay Mendiratta (IoT systems level optimization), Robert Westwood (Markovian analysis for IoT datasets), Barend Botha (IoT visualization), Yongkang Gao (Deep learning), Vaijayanti Vadiraj (renewables), Ibrahim El Badawi (Drones).

Many of these areas have been suggested by the participants themselves…

We encourage you to publish your work externally.

How does Certification work?

There are three stages of certification:

 

a)  Stage One: Strategic certification

b)  Stage Two: Project certification

c)  Stage Three: Programming certification

 

Where possible, we also validate projects externally, especially from a business perspective.

For example, if you are doing a project on renewables, we try to have someone who works in renewables give feedback on the project. The Programming certification will be in R and, depending on your choices, in the optional languages you have selected. Projects can span a wide range of interesting areas as long as they pertain to IoT and analytics. These include renewables, automotive, Drones etc.

Is one year not too long?

Not really!

I see many courses that claim to create a Data Scientist in X weeks. In my view, that is not possible.

Especially with a subject like Data Science for IoT, we have complex interdependent concepts which need a longer timeframe to absorb. The longer timeframe also gives you (and me) more flexibility to manage the course. Finally, half the course is based on Projects – which also gives you flexibility.

Is this an academic course?

No, it is not. It is a practitioner-based course. It is also not affiliated with any academic institution.

To sign up, or for any other questions, please contact info@futuretext.com