Using Space exploration to teach young people about Data Science – we go live today

As many of you know, I co-founded a social startup in the USA that teaches young people Data Science using space exploration. We have been working closely with Ardusat.
We go live today:
10am Utah (MST)
12pm Miami (EST)
5pm London (GMT)
The globe will have a countdown timer.
A lot of our work will start once we get live data from space.

AM2 from Ajit Jaokar on Vimeo.

The lipstick robot – a great way to explain Deep learning

I love motivational examples in teaching complex ideas

I use this simple little video to teach Deep learning to my students at Oxford, UPM and the Data Science for Internet of Things course

Think of ideas like teaching a computer to recognize images of cats using Deep learning, or training a computer to play Pac-Man using Deep learning

They all work in the same way

You let the deep learning system iterate over many examples and, in each case, you tell the computer (using a classifier) whether its interpretation was correct or not (i.e. is it a cat or not, the Pac-Man score, etc.)
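The guess-and-correct loop described above can be sketched in a few lines. This is only a minimal illustration that uses a single perceptron rather than a deep network; the toy "cat" data, the feature meanings and the function names are assumptions made for the example.

```python
def predict(weights, bias, features):
    """Return 1 ("cat") if the weighted score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(examples, epochs=10, learning_rate=0.1):
    """Iterate over labelled examples, correcting the weights after each wrong guess."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    mistakes_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            # Feedback step: an error of 0 means the guess was correct.
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        mistakes_per_epoch.append(mistakes)
    return weights, bias, mistakes_per_epoch


# Hypothetical toy data: (whisker score, bark score) -> 1 for cat, 0 for not a cat.
examples = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
weights, bias, mistakes_per_epoch = train(examples)
print(mistakes_per_epoch)  # number of wrong guesses in each pass over the data
```

The number of mistakes per pass falls as the weights are corrected, which is the same "improving on each iteration" behaviour, only on a much smaller scale.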

Now watch the video below

I see the click at the end of the step as a classifier

As you see, the robot has a long way to go!

I even think it’s improving on each iteration!

That’s deep learning for you!

PS I am not sure that this was the original intent of the video by @SimoneGiertz but it’s cool

Video link is lipstick robot

Creating an open methodology for Internet of Things (IoT) Analytics: Data science for Internet of Things

Introduction

Update: 

a) I am not referring to ‘standardization’ here, but rather to the need for a methodology, i.e. a structured way to solve problems (think of it like Kaggle meets #IoT analytics)

b) Added a reference to PFA (Portable Format for Analytics) – thanks Gregory Piatetsky-Shapiro @kdnuggets for the feedback

We often encounter this problem when teaching Data Science for Internet of Things:

There is no specific methodology to solve Data Science for IoT  (IoT Analytics) problems.

This leads to some initial questions:

  • Should there be a distinct methodology to solve Data Science problems for IoT?
  • Are IoT problems for Data Science unique enough to warrant a specific approach?
  • What existing methodologies should we draw upon?

On one hand, a Data Science for IoT problem is a typical Data Science problem. On the other hand, there are some considerations unique to IoT, for example the use of hardware, high data volumes, the use of CEP (Complex Event Processing), the impact of verticals (like automotive), the impact of streaming data etc.

Background and inspiration

Some initial background:

Data mining has well-known methodologies such as CRISP-DM. Hilary Mason and others have also proposed specific methodologies for Data Science. Kaggle problems have a specific approach to solving them. Techniques like PFA (Portable Format for Analytics) provide a way of formalizing and moving analytics models.

All these strategies also apply to IoT. IoT itself has methodologies like Ignite IoT – but these do not cover IoT analytics in detail.

A methodology for IoT analytics (Data Science for IoT) should cover the unique aspects of each step in Data Science. For example, it is more than the choice of the model family. The choice of the model family (ANN, SVM, Trees, etc.) is only one of the many choices to make. Others include:

a) Choice of the model structure – optimisation methodology (CV, Bootstrap, etc)

b)  Choice of the model parameter optimisation algorithm (joint gradients vs. conjugate gradients )

c)  Preprocessing of the data (centring, reduction, functional reduction, log-transform, etc.)

d)  How to deal with missing data (case deletion, imputation, etc.)

e)  How to detect and deal with suspect data (distance-based outlier detection, density-based, etc.)

f)  How to choose relevant features (filters, wrappers, embedded method ?)

g)  How to measure prediction performances (mean square error, mean absolute error, misclassification rate, lift, precision/recall, etc.)

Source: Methodology and standards for data analysis with machine learning tools, Damien François
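Several of the choices listed above can be made concrete in a small sketch: handling missing data by mean imputation (d), estimating performance by k-fold cross-validation (a) and scoring with mean squared error (g). The sensor readings, the two candidate models and all function names are hypothetical and chosen only for illustration. In real use the imputation would be fitted on the training folds only.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing readings (None) with the mean of the observed ones."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def fit_constant(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    level = mean(ys)
    return lambda x: level


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def k_fold_mse(fit, xs, ys, k=4):
    """Average mean squared error over k held-out folds."""
    scores = []
    for fold in range(k):
        held_out = set(range(fold, len(xs), k))
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in held_out]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i in held_out]
        model = fit([x for x, _ in train], [y for _, y in train])
        scores.append(mean([(model(x) - y) ** 2 for x, y in test]))
    return mean(scores)


# Hypothetical sensor readings with two missing values.
xs = [float(i) for i in range(12)]
raw_ys = [1.0, 3.1, None, 6.9, 9.0, 11.2, 12.9, None, 17.1, 19.0, 20.8, 23.1]
ys = impute_mean(raw_ys)

constant_score = k_fold_mse(fit_constant, xs, ys)
line_score = k_fold_mse(fit_line, xs, ys)
print(f"constant model MSE: {constant_score:.2f}, linear model MSE: {line_score:.2f}")
```

Comparing the two cross-validated scores is one simple way to choose between model families on the same data.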

The methodology could also cover  -

Exploratory analysis of data

Hypothesis testing (“Given a sample and an apparent effect, what is the probability of seeing such an effect by chance?” )

and other ideas ..
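The hypothesis-testing question quoted above can be answered directly with a permutation test. The sketch below is a minimal version that assumes two small groups of hypothetical sensor readings; the function name and the data are not from any particular library or dataset.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_iter=2000, seed=42):
    """Estimate how often a difference in means at least as large as the
    observed one appears when the group labels are shuffled at random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_iter


# Hypothetical temperature readings from two sensors.
sensor_a = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]
sensor_b = [22.0, 21.8, 22.3, 22.1, 21.9, 22.2]

p_value = permutation_p_value(sensor_a, sensor_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value means the apparent effect would rarely be seen by chance alone.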

An Open methodology for IoT analytics problems

Building on the above, we need an Open, end-to-end,  step by step methodology to solve IoT Analytics/Data Science for IoT problems

In addition, the methodology would need to consider the unique aspects of IoT. For example:

a)      Complex event processing especially using Apache Spark for CEP

b)      Deep learning (because we consider Cameras as sensors)

c)      Anomaly detection: Consider anomaly detection (a typical IoT analytics scenario). There are many considerations: What is the triggering event? How much has the machine deviated from the plan? What is the root cause of the bottleneck? Are there any external factors affecting the system performance? How do I know that I should trust IoT data? Is there a recommended plan of action? How is the data visualized? Does the data have missing elements? How do we detect failure in other processes? (Anomaly detection adapted from Dr Vinay Mehendiratta)
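As a small illustration of the "how much has the machine deviated" question, the sketch below flags readings that lie far from the recent rolling average. It is one simple technique among many (a rolling z-score), and the window size, threshold and readings are assumptions chosen for the example.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(readings, window=5, threshold=3.0):
    """Return (index, value) pairs whose z-score against the previous
    `window` readings exceeds `threshold` in absolute value."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # a flat history gives no scale to compare against
        z = (readings[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, readings[i]))
    return anomalies


# Hypothetical temperature stream with one spike at index 6.
readings = [20.0, 20.1, 19.9, 20.2, 20.0, 20.1, 35.0, 20.0, 19.9, 20.1, 20.2, 20.0]
anomalies = rolling_zscore_anomalies(readings)
print(anomalies)
```

Note that the spike inflates the rolling standard deviation for the next few readings, so a production detector would usually need a more robust baseline.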

In addition, IoT vertical domains have special considerations: Smart Grid, Smart cities, Smart energy, Automotive, Smart factory, Mobile, Wearables, Smart home etc.

For example:

Modelling energy prices,

Classifying step using machine learning,

Bus routing using mobile phone data,

Linear and non-linear regression models to predict global temperature and weather prediction

etc

Creating an Open methodology

Currently, this is an evolving thought process being developed as a part of the Data Science for IoT course. We intend to create it as an open methodology – starting with the question: What is common across these IoT analytics problems and how can we adapt existing Data Science techniques  to solve IoT analytics problems?

Over the next few weeks, we are conducting a survey and developing the methodology

If you are interested in participating and knowing more, please sign up to our mailing list and download our papers or contact me at ajit.jaokar at futuretext.com 

My blog featured in 4 Top 50/100 lists for IoT / Big Data / Data Science last year

Last year was an interesting year in social media, and this is a nice way to start the new one

I always find this interesting since I write about a very niche space (Data Science for IoT) and it’s more mathematical / technical than my previous work in Mobile
These are great lists too, with some very clued-up people on them who are well worth following


What is the best way for getting started in Statistics for Programmers/Data Science?

I am often asked this question: What’s the best way for getting started in Statistics for Programmers?

At the Data Science for IoT course – and also in my teaching at Oxford University – I have used the following approach.

Comments welcome:

Firstly, the interest in Statistics for Programmers is a fairly recent phenomenon.

This interest is based on the uptake of Data Science – a hot profession now.

Here’s how most people approach the problem

They pick up an old high school statistics textbook, either their own from younger days or a standard book.

These books are often decades old.

They start with page One .. and work linearly through a few pages ..

They quickly realize why they disliked stats earlier.

And that sentiment has not changed with the passage of time ..

But, here is a different approach

For Data Science, you do not need to master Statistics per se

You need to understand Statistical models.

A model is defined as a combination of  predictive algorithms (based on Statistics) and Data.

Data science is based on creating models that improve with experience / training.

In contrast, in the Data Science for IoT course – we start with problems (the Engineering approach).

I recommend three sources which I am using (if you have others, please let me know at ajit.jaokar at futuretext.com and I shall link them and refer back to you)

Start with Understanding the problem

See these two links by @Brandon Rohrer  (@Microsoft Data Science)  -

Which algorithm family can answer my question and

Which questions can Data Science answer.

See also this post by Dr Vincent Granville @DataScienceCtrl

on 24 uses of Statistical modelling Part 1 and  2

These posts give you an idea of the problems that can be solved using Data science and stats(without going into the math itself initially)

Then read Allen Downey’s books

Allen Downey writes excellent books and they are all free under Creative Commons. You can download them at Green Tea Press, which has an excellent ethos. I especially recommend Think Stats, Think Bayes and Think Complexity (in that order).

To support the author, I would also encourage you to buy these books, especially Think Stats.

You can follow him on Twitter @allendowney

Having mastered the material to this stage, start with code and small datasets.

I prefer the UCI datasets and the Python scikit-learn library.

Sumit also works with the REPL approach and Paul uses Spark notebook in our course.

In any case, these are small sections of code run in a controlled environment that show you how the stats are implemented (libraries / APIs like scikit-learn are relatively easy to understand if you come from a programming background)
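As an example of such a small section of code, here is a sketch of the Pearson correlation coefficient written out by hand, using only the Python standard library rather than scikit-learn. The data is made up for illustration.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


# Hypothetical paired measurements that rise together.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [2.0, 4.0, 6.0, 8.0, 10.0]

r = pearson(hours, scores)
print(f"correlation: {r:.3f}")
```

Reading the formula as code like this is often easier for programmers than starting from the textbook notation.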

That’s the path we are using in the Data Science for IoT course.

Any comments/feedback welcome on your approach to teach statistics (ajit.jaokar at futuretext.com)

Image source: Scatter plots – wikipedia

Data Science for Internet of Things – practitioner course – March 2016

Now running in its third batch ..

Welcome to the world’s first course that helps you to become a Data Scientist for the Internet Of Things ..

This course has already started. If you want to know more, please email us at info at futuretext.com 

 

Overview

The course starts on March 22, 2016.

Please contact info@futuretext.com 

This niche, personalized course is suited for:

  • Developers who want to transition to a new role as Data Scientists
  • Entrepreneurs who want to launch new products covering IoT and analytics
  • Anyone interested in developing their career in IoT Analytics

Duration: The course starts from March 2016 and extends to July  2016. We work with you for the next six months after that on a specific project and to help transition your career to Data Science through our network. The extra time also allows you to catch up on specific modules in the course

Scope: Created by Data Science and IoT professionals, the course covers infrastructure (Hadoop – Spark), Programming / Modelling (Python/R/Time series) and Deep Learning (Theano, Deeplearning4j) within the context of the Internet of Things.

Internet of Things: We cover unique aspects of Data Science for IoT including Deep Learning, Complex event processing/sensor fusion and Streaming/Real time analytics

Investment:

Offline (London):  £1,200 GBP + VAT
Online:  Yes. Please contact us at info@futuretext.com

 

Contact  us at info@futuretext.com to signup

 

Benefits

 

  • The course aims to equip you to be a Data Scientist for the Internet of Things domain
  • You can transition your career to Data Science for IoT. This could mean a new job, role, project or a start-up idea
  • You are not alone: Toolkits and community support to start working on real Data science problems for IoT
  • You master specific skills: Spark, R, Python, Scala, IoT platforms, Data analysis, Deep Learning and SQL among others
  • The course content can be personalized (see below)
  • The Data Science principles can apply to other domains i.e. beyond IoT

 

Modules

(Note the modules and the sequence are subject to change)

 

An overview of Data Science

An overview of Data Science,  What is Data Science? What problems can be solved using Data science – Extracting meaning from Data – Statistical processes behind Data – Techniques to acquire data (ex APIs) – Handling large scale data – Big Data fundamentals

 

Data Science and IoT

The IoT ecosystem, Unique considerations for the IoT ecosystem – Addressing IoT problems in Data science (time series data, enterprise IoT edge computing, real-time processing, cognitive computing, image processing, introduction to deep learning algorithms, geospatial analysis for IoT/managing massive geographic scale, strategies for integration with hardware, sensor fusion)

 

The Apache Spark ecosystem

Apache Spark in detail including Scala, SQL, SparkR, MLlib and GraphX

 

The Data Science for IoT methodology

A specific approach to solve Data Science problems for IoT including strategy and development

 

Mathematical foundations of Machine learning

Here we formally cover the mathematics for Data science including Linear Algebra, Matrix algebra, Bayesian Statistics, Optimization techniques (Gradient descent) etc. We also cover Supervised algorithms, unsupervised algorithms (classification, regression, clustering, dimensionality reduction etc) as applicable to IoT datasets
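To give a flavour of the optimization techniques mentioned above, here is a minimal sketch of gradient descent fitting a straight line by repeatedly stepping against the gradient of the mean squared error. The data, learning rate and step count are assumptions chosen so that this small example converges.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimising mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(r * x for r, x in zip(residuals, xs))
        grad_b = (2 / n) * sum(residuals)
        # Move a small step in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

The same update rule, applied to many more parameters, is what sits underneath the training of the neural networks covered later in the course.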

 

Unique Elements for IoT

This module emphasises the following unique elements for IoT

  • Complex event processing (sensor fusion)
  • Deep Learning and
  • Real Time (Spark, Kafka etc)

  

FAQ: Summary of Benefits and Features

 

Impact on your work: Designed for developers/ICT contractors/Entrepreneurs who want to transition their career towards Data science roles with an emphasis on IoT
Typical profile: A developer who has skills in programming environments like Java, Ruby, Python, Oracle etc and wants to learn Data Science within the context of Internet of Things with the goal of becoming a Data Scientist for IoT
Community support? Yes. Also includes the Alumni network i.e. beyond the duration of the course at no extra cost.
Approach to Big Data: For Big Data, the course is focussed on Apache Spark, specifically Scala, SQL, MLlib, GraphX and others on HDFS
Approach to Programming: See scope below
Approach to Algorithms: See scope below
Is this a full data science course? Yes, we cover machine learning / Data science techniques which are applicable to any domain. Our focus is Internet of Things. The course is practitioner oriented i.e. not academic and is not affiliated to a university.
Investment: Offline (London): £1,200 GBP + VAT (if applicable). Online: Yes. Please contact us at info@futuretext.com
Help with jobs/employment: Yes, we aim to transition your career. Hence, we are selective in the recruitment for the course. There are no guarantees, but a career transition is a key goal for us. We work with you over the duration of the course (including the Project) to get a new role in Data Science/IoT
Created by professionals: See our profiles below
Personalization: The course is based on a PLP (Personal learning plan) which allows you to customize for language, projects, domains, career goals, entrepreneurial goals etc. The course can be personalized. Examples include a focus on CEP/Sensor fusion, RNNs and Time series, Edge processing, SQL etc. There is no extra cost for this but we agree scope before we start through a Personal Learning Program (PLP). If you are interested in this option, please let us know at info@futuretext.com. If you want to see examples of our work and content, please see Spark SQL real time analytics by Sumit Pal (published on kdnuggets) and The evolution of Deep learning models by Ajit Jaokar
Duration: The course starts from March 2016 and extends to July 2016. We work with you for the next six months after that on a specific project and to help transition your career to Data Science through our network. The extra time also allows you to catch up on specific modules in the course
Projects: A significant part of the course is Project based. Projects are based on predictive analytics algorithms for IoT applications. Projects use our methodology which is based on a formalized way of solving IoT analytics problems. Projects can be based on any of the programming languages and tools we cover, i.e. R or Python, Spark (Scala) and SQL (distributed processing i.e. Big Data), and Theano and Deeplearning4j for Deep learning. If you want to work on a specific project (or want to explore some ideas deeper), you should indicate this in advance
Access to knowledge: We do not restrict access to knowledge by specialization. For example, if you choose to focus on sensor fusion, you will still have access to all material for Deep learning
Batch sizes: Limited to ensure personalized attention
Time per week: About 5 hours/week. No additional materials need to be bought
Certificate of completion: Yes, based on the quiz and projects.
Delivery of content: Via video. You do not have to be online at specific times

 

How is this approach different to the more traditional MOOCs?

Here’s how we differ from MOOCs

a)  We are not ‘Massive’ – this approach works for small groups with more focused and personalized attention. We will never have 1000s of participants

b)  We help in career leverage: We work actively with you for career leverage – ex you are a startup / you want to transition to a new job etc

c)  We are vendor agnostic

d)  We work actively with you to build your brand(Blogs/Open source/conferences etc)

e)  The course can be personalized to streams(ex with Deep learning, Complex event processing, Streaming etc)

f)  We teach the foundations of maths where applicable

g)  We work with a small number of platforms which provide current / in-demand skills – ex Apache Spark, R etc

h)  We are exclusively focused on IoT (although the concepts can apply to any other vertical)

 

Approach to Programming

The main Programming focus is on Python, R and Spark (Scala, SQL and R). We also use Deeplearning4j and Theano (for Deep learning). We will also use an IoT platform (like Thingworx) but we will emphasize IoT analytics. The participants need to be able to code/come from a development background (the Programming language itself does not matter).

 

What is your approach to working with Algorithms and Maths?

The course is based on modelling IoT based problems in the Python and R programming languages. We follow a context based learning approach, hence we correlate the maths to specific R based IoT models. You will need an aptitude for maths. However, we cover the mathematical foundations necessary. These include: Linear Algebra including Matrix algebra, Bayesian Statistics, Optimization techniques (such as Gradient descent) etc.

 

What is the implication of an emphasis on IoT?

In 2015, IoT is emerging, but its impact will be felt over the next five years. Today, we see IoT driven by Bluetooth 4.0 including iBeacons. Over the next five years, we will see IoT connectivity driven by the wide area network (with the deployment of 5G in 2020 and beyond). We will also see entirely new forms of connectivity (ex LoRa, Sigfox etc). Enterprises (Renewables, Telematics, Transport, Manufacturing, Energy, Utilities etc) will be the key drivers for IoT. On the consumer side, Retail and wearables will play a part. This tsunami of data will lead to an exponential demand for analytics since analytics is the key business model behind the data deluge. Most of this data will be Time series data but will also include other types of data. For example, our emphasis on IoT also includes Deep Learning since we treat video and images as sensors. IoT will lead to a re-imagining of everyday objects.

 

Why is this course unique?

The course emphasizes some aspects that are unique to IoT (in comparison to traditional data science). These include: A greater emphasis on time series data, Edge computing, Real-time processing, Cognitive computing, In memory processing, Deep learning, Geospatial analysis for IoT, Managing massive geographic scale (ex for Smart cities), Telecoms datasets, Strategies for integration with hardware and Sensor fusion (Complex event processing). Note that we include video and images as sensors through cameras (hence the study of Deep learning)

 

 

Who is creating/teaching this course?

The course is created by futuretext and conducted by Ajit Jaokar, Dr Paul Katsande and Sumit Pal

Ajit Jaokar  – Based in London, Ajit’s research and consulting is based on Data Science and the Internet of Things. His work is based on his teaching at Oxford University and UPM (Technical University of Madrid) and covers IoT, Data Science, Smart cities and Telecoms.

 

 

Sumit Pal is a big data, visualisation and data science consultant. He is also a software architect and big data enthusiast and builds end-to-end data-driven analytic systems. Sumit has worked for Microsoft (SQL server development team), Oracle (OLAP development team) and Verizon (Big Data analytics team) in a career spanning 22 years. Currently, he works for multiple clients advising them on their data architectures and big data solutions and does hands on coding with Spark, Scala, Java and Python. Sumit is based in Boston.

 

Dr Paul Katsande is a technical architect based in London working with Apache Spark, Scala and Data Science. Paul’s PhD research is based on image processing from the University of Manchester.

 

 

We have limited spaces. Please contact us at info@futuretext.com if you want to take the next steps!

 

Testimonials

See video below

 

 

 

Weekly schedule

Concepts

Week 0 March 15 Orientation, introductions, Personal learning plans, Platform signup
Week 1 Mar 21 Foundations: An analytics Driven Organization – IoT and Machine Learning  - Data Science for IoT – Unique characteristics – Data Science for IoT – why now?
Mar 28 Machine Learning concepts Deep Learning concepts
Apr 4 An introduction to IoT (Internet of Things)
Apr 11 IoT platforms – From sensor to Cloud
Apr  18 Concepts of Big Data Part One
Apr  25 Concepts of Big Data Part Two
May 2 Market drivers for IoT
May 9 Choosing a model – what technique to Use?
May 16 Use Cases  and IoT datasets (these will continue throughout the course)
May  23 Time series and NoSQL databases
May 30 Streaming analytics part One
June  6 Streaming analytics part two
June 13 Deep learning part one
June 20 Deep learning part two
June 27 Machine learning algorithms – part one
July 4 Machine learning algorithms – part two
July 11 Mathematical foundations – part one
July 18 Mathematical foundations – part two
July To Dec 31 Project

 

 

Programming

 

Week 0 Mar 15 Orientation, introductions, Personal learning plans, Platform signup
Week 1 Mar 21
Mar 28
Apr 4 Intro to R, Installations, Basics of R
Apr 11
Apr  18 Data Frames in R & Tabular Data
Apr  25
May 2 Data Processing & Data Visualization in R
May 9
May 16 Scala basics
May  23
May 30 Spark batch processing I
June  6
June 13 Spark Batch Processing II
June 20
June 27 Spark SQL
July 4
July 11 Spark Streaming
July 18
July To Dec 31 Projects

 

 Contact  us at info@futuretext.com to signup

 

 

IoT data analytics and visualization event – Palo alto – Feb 2016

As we do every year, we are supporting this great event. The IoT data analytics and visualization event in Palo Alto is now a must-attend event for IoT professionals.

Use the code ‘DATA15’, which provides a 15% discount to attend the event

Have a look at the conference and the speakers IoT data analytics and visualization event – Palo alto – Feb 2016

 

Miami Young Data Scientists – Pleased to be the winning team in the 2015 Association of Space Explorers/AstroSat challenge

 

 

 

 

 

 

 

 

 

For the last two years, I have worked on teaching Computer Science to young people.

This venture has had its ups and downs.

But we have had the support of many who believed in the vision.

So, it was very nice to see this

We (Countdown Institute – i.e. now me and Richard Schuchts based in Miami ) submitted an entry in the ASE AstroSat Challenge (supported by Northrop Grumman Corporation). The Association of Space Explorers is the unique professional organization composed of astronauts who have orbited Earth. They have 375 members from 35 countries and are passionate about encouraging students to pursue science, technology, engineering, and math education, as well as careers in astronautics. The ASE AstroSat Challenge is designed to give students a taste of the exciting world of satellite operations. The ASE AstroSat Challenge is made possible with the generous support of the Northrop Grumman Corporation.

Only 15 teams were selected to run a Space experiment, and our team (Miami Young Data Scientists/Countdown Institute) was one of them

It’s amazing to get here.

It means the team of ‘young data scientists’ from Miami will be able to run a Space Experiment live in Space and also learn Data Science

The winning entry was based on teaching Data Science to young people.

Specifically, using Regression algorithms to make predictions on Space data from Ardusat (more on this soon)

This is different from our original idea and is more complex, but I think it will make a difference in getting more young people into Data Science (as per Harvard, the hottest profession of the future)

Thus, I think the biggest winners are the young people of Miami who are a part of the winning team.

The main variation/evolution from the original idea is to focus on Data Science and inspiring students to take up Data Science through visualization of data and predictions using scientific methodology.

It’s a way to get more students (both boys and girls) interested in Data Science using Space exploration by coding on a live satellite.

Hence, the regression algorithms/iPython notebooks etc.

It also involves a bit more math, and is hence aimed at slightly older students (around 15 to 17). All this also aligns with my ‘day job’, so to speak!

Here is the full list of winners of the astrosat competition

I am happy to share more. If you want to know more about this – please email me at ajit.jaokar at futuretext.com

Free Download – Data Science for IoT course papers

Here is a set of papers from the Data Science for Internet of Things – practitioners course

These have been published by course participants in top Data Science blogs like KDnuggets and Data Science Central

The Zip file includes the following papers:

1) Recurrent neural networks, Time series data and IoT – Part One

2) Spark SQL for Real Time Analytics – Part One

3) Spark SQL for Real Time Analytics – Part Two

4) Time Series IoT applications in Railroads

5) Nov 13 update: Kalman filters 1 and 2

Free Download – Data Science for IoT course papers

 

Data Science for Internet of Things: Practitioners course – modules list

 

 

 

 

 

 

 

 

 

Course URL : Data Science for Internet of Things: Practitioners course

Note that the modules are customizable i.e. as per your personal learning plan – you may choose to do more or less of a specific topic. For example, more Deep Learning vs Sensor fusion. But overall, we will follow this plan.

Overall themes covered in the course

  • IoT
  • Data Science
  • Big Data
  • Machine Learning
  • Deep Learning
  • Sensor fusion
  • Use Cases (application domains) and IoT Datasets
  • Math foundation
  • Time Series
  • IoT stream processing
  • Apache Spark ecosystem
  • Programming (R, Scala, SQL)

Weekly schedule

Concepts

Week 0 Orientation, introductions, Personal learning plans, Platform signup
Week 1 Nov 16 Foundations: An analytics Driven Organization – IoT and Machine Learning  - Data Science for IoT – Unique characteristics – Data Science for IoT – why now?
Nov 23 Machine Learning concepts, Deep Learning concepts
Nov 30 An introduction to IoT (Internet of Things)
Dec 7 IoT platforms – From sensor to Cloud
Dec 14 Concepts of Big Data Part One
Dec 21 Concepts of Big Data Part Two
Jan 11 Market drivers for IoT
Jan 18 Choosing a model – what technique to Use?
Jan 25 Use Cases  and IoT datasets (these will continue throughout the course)
Feb 1 Time series and NoSQL databases
Feb 8 Streaming analytics part One
Feb 15 Streaming analytics part two
Feb 22 Deep learning part one
Feb 29 Deep learning part two
Mar 7 Machine learning algorithms – part one
Mar 14 Machine learning algorithms – part two
Mar 21 Mathematical foundations – part one
Mar 28 Mathematical foundations – part two

 

 

Programming

 

Week 0 Orientation, introductions, Personal learning plans, Platform signup
Week 1 Nov 16
Nov 23
Nov 30 Intro to R, Installations, Basics of R
Dec 7
Dec 14 Data Frames in R & Tabular Data
Dec 21
Jan 11 Data Processing & Data Visualization in R
Jan 18
Jan 25 Scala basics
Feb 1
Feb 8 Spark batch processing I
Feb 15
Feb 22 Spark Batch Processing II
Feb 29
Mar 7 Spark SQL
Mar 14
Mar 21 Spark Streaming
Mar 28