I love motivational examples in teaching complex ideas
Think of ideas like teaching a computer to recognize images of cats using Deep learning, or training a computer to play Pac-Man using Deep learning.
They all work in the same way
You let the deep learning system iterate over many examples, and in each case you tell the computer (using a classifier) whether its interpretation was correct or not (i.e. is it a cat or not? what was the Pac-Man score?).
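To make this feedback loop concrete, here is a minimal Python sketch. It uses a perceptron, a far simpler model than a deep network, so treat it as an illustration of "iterate over examples and get told whether you were right" rather than as deep learning itself. The toy data and function names are hypothetical.

```python
# A perceptron: the simplest "learn from feedback" loop. It is NOT deep learning,
# but the idea is the same: predict, get told whether you were right, adjust.

def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """examples: list of (features, label) pairs, with label 0 or 1."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            # Feedback: 0 if the prediction was correct, +1 or -1 if it was wrong.
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every example classified correctly: stop iterating
            break
    return weights, bias


# Hypothetical toy data: label is 1 only when both features are 1 (logical AND).
examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(examples)
print([predict(weights, bias, x) for x, _ in examples])  # → [0, 0, 0, 1]
```

Each pass over the data is one "iteration" in the sense of the video: the model only changes when it is told it got an example wrong.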
Now watch the video below
I see the click at the end of the step as a classifier
As you see, the robot has a long way to go!
I even think it's improving on each iteration!
That’s deep learning for you!
PS: I am not sure that this was the original intent of the video by @SimoneGiertz, but it's cool.
Video link: lipstick robot
Creating an open methodology for Internet of Things (IoT) Analytics: Data science for Internet of Things
a) I am not referring to ‘standardization’ here. Rather, I mean the need for a methodology, i.e. a structured way to solve problems (think of it as Kaggle meets #IoT analytics).
b) Added a reference to PFA (Portable Format for Analytics) – thanks to Gregory Piatetsky-Shapiro @kdnuggets for the feedback.
We often encounter this problem in my teaching of Data Science for Internet of Things:
There is no specific methodology to solve Data Science for IoT (IoT Analytics) problems.
This leads to some initial questions:
- Should there be a distinct methodology to solve Data Science problems for IoT?
- Are IoT problems for Data Science unique enough to warrant a specific approach?
- What existing methodologies should we draw upon?
On one hand, a Data Science for IoT problem is a typical Data Science problem. On the other hand, there are some considerations unique to IoT, for example the use of hardware, high data volumes, the use of CEP (Complex Event Processing), the impact of verticals (like automotive), the impact of streaming data etc.
Background and inspiration
Some initial background:
Data mining has well-known methodologies such as CRISP-DM. Hilary Mason and others have also proposed specific methodologies for Data Science. Kaggle problems have a specific approach to solving them. Techniques like PFA (Portable Format for Analytics) provide a way of formalizing and moving analytics models.
All these strategies also apply to IoT. IoT itself has methodologies like Ignite IoT – but these do not cover IoT analytics in detail.
A methodology for IoT analytics (Data Science for IoT) should cover the unique aspects of each step in Data Science. For example, it is more than the choice of the model family: the choice of model family (ANN, SVM, Trees, etc.) is only one of the many choices to make. Others include:
a) Choice of the model structure – optimisation methodology (CV, Bootstrap, etc)
b) Choice of the model parameter optimisation algorithm (joint gradients vs. conjugate gradients)
c) Preprocessing of the data (centring, scaling, functional reduction, log-transform, etc.)
d) How to deal with missing data (case deletion, imputation, etc.)
e) How to detect and deal with suspect data (distance-based outlier detection, density-based, etc.)
f) How to choose relevant features (filters, wrappers, embedded methods, etc.)
g) How to measure prediction performances (mean square error, mean absolute error, misclassification rate, lift, precision/recall, etc.)
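As a rough illustration of choices (a), (d) and (g), here is a minimal, standard-library-only Python sketch: k-fold cross-validation of a one-feature least-squares line, with mean imputation of missing readings computed on the training fold only, scored with mean squared error and mean absolute error. The data and helper names are hypothetical, and bootstrap validation, feature selection and outlier handling are not covered.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(truth, preds):
    """(g) Mean squared error."""
    return statistics.fmean([(t - p) ** 2 for t, p in zip(truth, preds)])


def mae(truth, preds):
    """(g) Mean absolute error."""
    return statistics.fmean([abs(t - p) for t, p in zip(truth, preds)])


def cross_validate(xs, ys, k=5, seed=0):
    """(a) k-fold cross-validation; None values in xs are treated as missing."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        # (d) Mean imputation, with the mean taken from the training fold only
        # so that nothing from the held-out fold leaks into the model.
        fill = statistics.fmean([xs[i] for i in train_idx if xs[i] is not None])
        x = [fill if v is None else v for v in xs]
        intercept, slope = fit_line([x[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [intercept + slope * x[i] for i in test_idx]
        truth = [ys[i] for i in test_idx]
        scores.append({"mse": mse(truth, preds), "mae": mae(truth, preds)})
    return scores


# Hypothetical sensor readings with two gaps; y is roughly 3x + 2 plus noise.
xs = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, None, 9.0, 10.0]
ys = [5.1, 7.9, 11.2, 14.1, 16.8, 20.2, 23.1, 25.9, 29.0, 32.2]
for fold, score in enumerate(cross_validate(xs, ys)):
    print(fold, round(score["mse"], 3), round(score["mae"], 3))
```

The design point worth noting is that every statistic used to transform the data (here, the imputation mean) is recomputed inside each fold; computing it once on the full dataset would quietly leak information from the test folds.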
The methodology could also cover:
Hypothesis testing (“Given a sample and an apparent effect, what is the probability of seeing such an effect by chance?” )
and other ideas.
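The hypothesis-testing question above can be answered approximately with a permutation test, one common technique among several (t-tests are another). The sketch below assumes two hypothetical groups of sensor readings; it shuffles the group labels many times and counts how often a difference in means at least as large as the observed one appears purely by chance.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_iter=10_000, seed=0):
    """Estimate P(|mean difference| >= observed) if group labels were arbitrary."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # reassign readings to the two groups at random
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_iter


# Hypothetical readings from a device before and after a firmware update.
before = [10.2, 10.4, 9.9, 10.1, 10.3]
after = [10.9, 11.2, 10.8, 11.0, 11.1]
p = permutation_test(before, after)
print(f"estimated p-value: {p:.4f}")
```

A small p-value means the apparent effect would rarely arise by chance under random labelling; it does not by itself say why the readings changed.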
An Open methodology for IoT analytics problems
Building on the above, we need an Open, end-to-end, step-by-step methodology to solve IoT Analytics/Data Science for IoT problems.
In addition, the methodology would need to consider the unique aspects of IoT. For example:
b) Deep learning (because we consider Cameras as sensors)
c) Anomaly Detection: Consider anomaly detection (a typical IoT analytics scenario). There are many considerations: What is the triggering event? How much has the machine deviated from the plan? What is the root cause of the bottleneck? Are there any external factors affecting the system performance? How do I know that I should trust IoT data? Is there a recommended plan of action? How is the data visualized? Does the data have missing elements? How do we detect failure in other processes? (Anomaly detection considerations adapted from Dr Vinay Mehendiratta)
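One simple way to quantify "how much has the machine deviated from the plan" is to score the residuals (actual minus planned) with a robust, median-based z-score. This is a minimal sketch assuming hypothetical hourly output figures; the modified z-score (0.6745 × deviation / MAD) with a 3.5 cut-off is a common rule of thumb, not something prescribed by the methodology.

```python
import statistics


def flag_deviations(actual, planned, threshold=3.5):
    """Return the indices where deviation from plan looks anomalous.

    Uses the modified z-score 0.6745 * |r - median| / MAD on the residuals
    r = actual - planned, which is less distorted by the outliers themselves
    than a mean/standard-deviation z-score.
    """
    residuals = [a - p for a, p in zip(actual, planned)]
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    if mad == 0:
        # At least half the residuals equal the median: flag anything that differs.
        return [i for i, r in enumerate(residuals) if r != med]
    return [i for i, r in enumerate(residuals) if 0.6745 * abs(r - med) / mad > threshold]


# Hypothetical hourly output of one machine against a flat production plan.
planned = [100] * 10
actual = [101, 99, 100, 102, 98, 100, 101, 130, 99, 100]
print(flag_deviations(actual, planned))  # → [7]
```

This only answers the "how much" question; root cause, trust in the data and missing values still need the other considerations listed above.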
In addition, IoT vertical domains have special considerations: Smart Grid, Smart cities, Smart energy, Automotive, Smart factory, Mobile, Wearables, Smart home etc.
Creating an Open methodology
Currently, this is an evolving thought process being developed as a part of the Data Science for IoT course. We intend to create it as an open methodology – starting with the question: What is common across these IoT analytics problems and how can we adapt existing Data Science techniques to solve IoT analytics problems?
Over the next few weeks, we are conducting a survey and developing the methodology
If you are interested in participating and knowing more, please sign up to our mailing list and download our papers or contact me at ajit.jaokar at futuretext.com
An interesting year in social media last year, and a nice way to start the year
What is the best way for getting started in Statistics for Programmers/Data Science?
I am often asked this question: What’s the best way for getting started in Statistics for Programmers?
At the Data Science for IoT course – and also in my teaching at Oxford University – I have used the following approach.
Firstly, the interest in Statistics for Programmers is a fairly recent phenomenon.
This interest is based on the uptake of Data Science – a hot profession now.
Here’s how most people approach the problem
They pick up an old high school statistics textbook, either their own from younger days or a standard book.
These books are often decades old.
They start with page one and work linearly through a few pages.
They quickly realize why they disliked stats earlier.
And that sentiment has not changed with the passage of time.
But, here is a different approach
For Data Science, you do not need to master Statistics per se
You need to understand Statistical models.
A model is defined as a combination of predictive algorithms (based on Statistics) and Data.
Data science is based on creating models that improve with experience/training.
In contrast, in the Data Science for IoT course, we start with problems (the Engineering approach).
I recommend three sources which I am using (if you have others, please let me know at ajit.jaokar at futuretext.com and I shall link them and refer back to you)
Start with Understanding the problem
See these two links by @Brandon Rohrer (@Microsoft Data Science) -
See also this post by Dr Vincent Granville @DataScienceCtrl
These posts give you an idea of the problems that can be solved using Data Science and stats (without going into the math itself initially).
Then read Allen Downey’s books
Allen Downey writes excellent books, and they are all free under a Creative Commons license. You can download them at Green Tea Press, which has an excellent ethos. I especially recommend Think Stats, Think Bayes and Think Complexity (in that order).
To support the author, I would also encourage you to buy these books, especially Think Stats.
You can follow him on Twitter @allendowney
Once you have mastered this stage, start with code and small datasets.
These are small sections of code run in a controlled environment, and they show you how the stats are implemented (libraries/APIs like scikit-learn are relatively easy to understand if you come from a programming background).
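As an example of the kind of small piece of code meant here, this minimal Python sketch computes a sample variance and a Pearson correlation (the statistic behind the "trend" in a scatter plot) by hand, and checks the variance against Python's built-in `statistics` module. The dataset is hypothetical.

```python
import math
import statistics


def sample_variance(values):
    """Average squared deviation from the mean, dividing by n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both spreads (the n - 1 factors cancel)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(spread_x * spread_y)


# Hypothetical small dataset: outside temperature vs. daily energy use of a building.
temperature = [2.0, 5.0, 8.0, 12.0, 15.0, 19.0]
energy_kwh = [41.0, 37.5, 33.0, 29.5, 24.0, 20.5]

print("variance (by hand):", sample_variance(temperature))
print("variance (statistics module):", statistics.variance(temperature))
print("Pearson r:", round(pearson_r(temperature, energy_kwh), 3))
```

Writing the formula once by hand and then comparing it with a library call is a quick way to connect the textbook definition with the API you will actually use.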
That's the path we are using in the Data Science for IoT course.
Any comments/feedback welcome on your approach to teach statistics (ajit.jaokar at futuretext.com)
Image source: Scatter plots – Wikipedia
Now running in its third batch.
Welcome to the world’s first course that helps you to become a Data Scientist for the Internet Of Things ..
This course has already started. If you want to know more, please email us at info at futuretext.com
The course starts on March 22, 2016.
Please contact email@example.com
This niche, personalized course is suited for:
- Developers who want to transition to a new role as Data Scientists
- Entrepreneurs who want to launch new products covering IoT and analytics
- Anyone interested in developing their career in IoT Analytics
Duration: The course starts from March 2016 and extends to July 2016. We work with you for the next six months after that on a specific project and to help transition your career to Data Science through our network. The extra time also allows you to catch up on specific modules in the course
Scope: Created by Data Science and IoT professionals, the course covers infrastructure (Hadoop – Spark), Programming / Modelling (Python/R/Time series) and Deep Learning (Theano, Deeplearning4j) within the context of the Internet of Things.
Internet of Things: We cover unique aspects of Data Science for IoT including Deep Learning, Complex event processing/sensor fusion and Streaming/Real time analytics
Offline (London): £1,200 GBP + VAT
Online: Yes. Please contact us at firstname.lastname@example.org
Contact us at email@example.com to sign up
- The course aims to equip you to be a Data Scientist for the Internet of Things domain
- You can transition your career to Data Science for IoT. This could mean a new job, role, project or a start-up idea
- You are not alone: Toolkits and community support to start working on real Data science problems for IoT
- You will master specific skills: Spark, R, Python, Scala, IoT platforms, Data analysis, Deep Learning and SQL, among others
- The course content can be personalized (see below)
- The Data Science principles can apply to other domains i.e. beyond IoT
(Note the modules and the sequence are subject to change)
An overview of Data Science
An overview of Data Science: What is Data Science? What problems can be solved using Data Science? – Extracting meaning from data – Statistical processes behind data – Techniques to acquire data (e.g. APIs) – Handling large-scale data – Big Data fundamentals
Data Science and IoT
The IoT ecosystem, Unique considerations for the IoT ecosystem – Addressing IoT problems in Data science (time series data, enterprise IoT edge computing, real-time processing, cognitive computing, image processing, introduction to deep learning algorithms, geospatial analysis for IoT/managing massive geographic scale, strategies for integration with hardware, sensor fusion)
The Apache Spark ecosystem
Apache Spark in detail, including Scala, Spark SQL, SparkR, MLlib and GraphX
The Data Science for IoT methodology
A specific approach to solve Data Science problems for IoT including strategy and development
Mathematical foundations of Machine learning
Here we formally cover the mathematics for Data science including Linear Algebra, Matrix algebra, Bayesian Statistics, Optimization techniques (Gradient descent) etc. We also cover Supervised algorithms, unsupervised algorithms (classification, regression, clustering, dimensionality reduction etc) as applicable to IoT datasets
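To give a flavour of the optimization part, here is a minimal sketch of batch gradient descent fitting a straight line by minimizing mean squared error. The data, learning rate and number of epochs are illustrative assumptions; too large a learning rate makes the updates diverge instead of converging.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = a + b * x by minimizing mean squared error with batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [(a + b * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(residual ** 2) with respect to a and b.
        grad_a = 2.0 / n * sum(residuals)
        grad_b = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        # Step downhill, against the gradient.
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
a, b = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(round(a, 4), round(b, 4))  # → 1.0 2.0
```

For a straight line there is also a closed-form least-squares solution; gradient descent is shown because the same loop carries over to models (such as neural networks) that have no closed form.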
Unique Elements for IoT
This module emphasises the following unique elements for IoT
- Complex event processing (sensor fusion)
- Deep Learning and
- Real Time (Spark, Kafka etc)
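To illustrate what a complex event processing rule can look like, here is a minimal Python sketch of a hypothetical sliding-window rule: it raises a "complex event" when at least N sensor readings above a threshold arrive within W seconds. Production CEP engines offer far richer pattern languages; the class and its parameters are illustrative only.

```python
from collections import deque


class ThresholdWindowRule:
    """Hypothetical CEP rule: fire when >= min_count readings above threshold
    arrive within window_s seconds. Assumes timestamps arrive in order."""

    def __init__(self, threshold, min_count, window_s):
        self.threshold = threshold
        self.min_count = min_count
        self.window_s = window_s
        self._hits = deque()  # timestamps of readings above the threshold

    def on_reading(self, timestamp, value):
        """Process one simple event; return True if the complex event fires."""
        if value > self.threshold:
            self._hits.append(timestamp)
        # Evict hits that have fallen out of the time window.
        while self._hits and timestamp - self._hits[0] > self.window_s:
            self._hits.popleft()
        return len(self._hits) >= self.min_count


# Hypothetical temperature stream: (seconds, degrees C).
rule = ThresholdWindowRule(threshold=80, min_count=3, window_s=10)
for t, v in [(0, 85), (2, 70), (4, 90), (6, 88), (20, 81)]:
    print(t, rule.on_reading(t, v))
```

The point of the example is the shift from single readings to patterns over time: no individual reading above is alarming, but three high readings in ten seconds are.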
FAQ: Summary of Benefits and Features
| Question | Answer |
|---|---|
| Impact on your work | Designed for developers/ICT contractors/entrepreneurs who want to transition their career towards Data Science roles with an emphasis on IoT |
| Typical profile | A developer who has skills in programming environments like Java, Ruby, Python, Oracle etc. and wants to learn Data Science within the context of the Internet of Things, with the goal of becoming a Data Scientist for IoT |
| Community support? | Yes. This also includes the alumni network, i.e. beyond the duration of the course, at no extra cost. |
| Approach to Big Data | For Big Data, the course is focused on Apache Spark, specifically Scala, SQL, MLlib, GraphX and others on HDFS |
| Approach to Programming | See scope below |
| Approach to Algorithms | See scope below |
| Is this a full data science course? | Yes, we cover machine learning/Data Science techniques which are applicable to any domain. Our focus is the Internet of Things. The course is practitioner oriented, i.e. not academic, and is not affiliated to a university. |
| Investment | Offline (London): £1,200 GBP + VAT (if applicable). Online: Yes. Please contact us at firstname.lastname@example.org |
| Help with jobs/employment | Yes, we aim to transition your career. Hence, we are selective in recruiting for the course. There are no guarantees, but a career transition is a key goal for us. We work with you over the duration of the course (including the Project) to get a new role in Data Science/IoT |
| Created by professionals | See our profiles below |
| Personalization | The course is based on a PLP (Personal Learning Plan), which allows you to customize it for language, projects, domains, career goals, entrepreneurial goals etc. Examples include a focus on CEP/Sensor fusion, RNNs and Time series, Edge processing, SQL etc. There is no extra cost for this, but we agree the scope through the PLP before we start. If you are interested in this option, please let us know at email@example.com. If you want to see examples of our work and content, please see Spark SQL real time analytics by Sumit Pal (published on KDnuggets) and The evolution of Deep learning models by Ajit Jaokar |
| Duration | The course starts in March 2016 and extends to July 2016. We work with you for the next six months after that on a specific project and to help transition your career to Data Science through our network. The extra time also allows you to catch up on specific modules in the course |
| Projects | A significant part of the course is project based. Projects are based on predictive analytics algorithms for IoT applications and use our methodology, which is a formalized way of solving IoT analytics problems. Projects can be based on any of the programming environments we cover, i.e. R or Python, Spark (Scala) and SQL (distributed processing, i.e. Big Data), and Theano and Deeplearning4j for Deep learning. If you want to work on a specific project (or explore some ideas in more depth), you should indicate this in advance |
| Access to knowledge | We do not restrict access to knowledge by specialization. For example, if you choose to focus on sensor fusion, you will still have access to all material for Deep learning |
| Batch sizes | Limited to ensure personalized attention |
| Time per week | About 5 hours/week. No additional materials need to be bought |
| Certificate of completion | Yes, based on the quiz and projects |
| Delivery of content | Via video. You do not have to be online at specific times |
How is this approach different to the more traditional MOOCs?
Here’s how we differ from MOOCs
a) We are not ‘Massive’ – this approach works for small groups with more focused and personalized attention. We will never have 1000s of participants
b) We help with career leverage: we work actively with you to leverage your career, e.g. whether you are a startup or want to transition to a new job
c) We are vendor agnostic
d) We work actively with you to build your brand (blogs/open source/conferences etc.)
e) The course can be personalized to streams (e.g. Deep learning, Complex event processing, Streaming etc.)
f) We teach the foundations of maths where applicable
g) We work with a small number of platforms which provide current / in-demand skills – ex Apache Spark, R etc
h) We are exclusively focused on IoT (although the concepts can apply to any other vertical)
Approach to Programming
The main programming focus is on Python, R and Spark (Scala, SQL and R). We also use Deeplearning4j and Theano (for Deep learning). We will also use an IoT platform (like ThingWorx), but we will emphasize IoT analytics. Participants need to be able to code/come from a development background (the programming language itself does not matter).
What is your approach to working with Algorithms and Maths?
The course is based on modelling IoT problems in the Python and R programming languages. We follow a context-based learning approach, hence we correlate the maths to specific R-based IoT models. You will need an aptitude for maths; however, we cover the necessary mathematical foundations. These include Linear Algebra (including Matrix algebra), Bayesian Statistics, Optimization techniques (such as Gradient descent) etc.
What is the implication of an emphasis on IoT?
In 2015, IoT is emerging, but its impact will be felt over the next five years. Today, we see IoT driven by Bluetooth 4.0, including iBeacons. Over the next five years, we will see IoT connectivity driven by wide area networks (with the deployment of 5G from 2020 onwards). We will also see entirely new forms of connectivity (e.g. LoRa, Sigfox etc.). Enterprises (Renewables, Telematics, Transport, Manufacturing, Energy, Utilities etc.) will be the key drivers for IoT. On the consumer side, Retail and wearables will play a part. This tsunami of data will lead to an exponential demand for analytics, since analytics is the key business model behind the data deluge. Most of this data will be time series data, but it will also include other types of data. For example, our emphasis on IoT also includes Deep Learning, since we treat video and images as sensors. IoT will lead to a re-imagining of everyday objects.
Why is this course unique?
The course emphasizes some aspects that are unique to IoT (in comparison to traditional data science). These include: a greater emphasis on time series data, Edge computing, Real-time processing, Cognitive computing, In-memory processing, Deep learning, Geospatial analysis for IoT, Managing massive geographic scale (e.g. for Smart cities), Telecoms datasets, Strategies for integration with hardware and Sensor fusion (Complex event processing). Note that we include video and images as sensors through cameras (hence the study of Deep learning).
Who is creating/teaching this course?
The course is created by futuretext and conducted by Ajit Jaokar, Dr Paul Katsande and Sumit Pal
Ajit Jaokar – Based in London, Ajit’s research and consulting is based on Data Science and the Internet of Things. His work is based on his teaching at Oxford University and UPM (Technical University of Madrid) and covers IoT, Data Science, Smart cities and Telecoms.
Sumit Pal is a big data, visualisation and data science consultant. He is also a software architect and big data enthusiast and builds end-to-end data-driven analytic systems. Sumit has worked for Microsoft (SQL server development team), Oracle (OLAP development team) and Verizon (Big Data analytics team) in a career spanning 22 years. Currently, he works for multiple clients advising them on their data architectures and big data solutions and does hands on coding with Spark, Scala, Java and Python. Sumit is based in Boston.
Dr Paul Katsande is a technical architect based in London, working with Apache Spark, Scala and Data Science. Paul's PhD research, at the University of Manchester, was in image processing.
We have limited spaces. Please contact us at firstname.lastname@example.org if you want to take the next steps!
See video below
| Week | Topic |
|---|---|
| Week 0 (Mar 15) | Orientation, introductions, Personal learning plans, Platform signup |
| Week 1 (Mar 21) | Foundations: An analytics driven organization – IoT and Machine Learning – Data Science for IoT: unique characteristics – Data Science for IoT: why now? |
| Mar 28 | Machine Learning concepts, Deep Learning concepts |
| Apr 4 | An introduction to IoT (Internet of Things) |
| Apr 11 | IoT platforms – From sensor to Cloud |
| Apr 18 | Concepts of Big Data part one |
| Apr 25 | Concepts of Big Data part two |
| May 2 | Market drivers for IoT |
| May 9 | Choosing a model – what technique to use? |
| May 16 | Use cases and IoT datasets (these will continue throughout the course) |
| May 23 | Time series and NoSQL databases |
| May 30 | Streaming analytics part one |
| June 6 | Streaming analytics part two |
| June 13 | Deep learning part one |
| June 20 | Deep learning part two |
| June 27 | Machine learning algorithms – part one |
| July 4 | Machine learning algorithms – part two |
| July 11 | Mathematical foundations – part one |
| July 18 | Mathematical foundations – part two |
| July to Dec 31 | Project |
| Week | Topic |
|---|---|
| Week 0 (Mar 15) | Orientation, introductions, Personal learning plans, Platform signup |
| Week 1 (Mar 21) | |
| Apr 4 | Intro to R, Installations, Basics of R |
| Apr 18 | Data Frames in R & Tabular Data |
| May 2 | Data Processing & Data Visualization in R |
| May 16 | Scala basics |
| May 30 | Spark batch processing I |
| June 13 | Spark batch processing II |
| June 27 | Spark SQL |
| July 11 | Spark Streaming |
| July to Dec 31 | Projects |
Contact us at email@example.com to sign up
As every year, we are supporting this great event. The IoT data analytics and visualization event in Palo Alto is now a must-attend event for IoT professionals.
Use the code ‘DATA15’, which provides a 15% discount to attend the event.
Have a look at the conference and the speakers: IoT data analytics and visualization event – Palo Alto – Feb 2016
Miami Young Data Scientists – Pleased to be the winning team in the 2015 Association of Space Explorers/AstroSat challenge
For the last two years, I have been teaching Computer Science to young people.
This venture has had its ups and downs.
But we have had the support of many who believed in the vision.
So, it was very nice to see this:
We (Countdown Institute, i.e. now me and Richard Schuchts, based in Miami) submitted an entry in the ASE AstroSat Challenge (supported by Northrop Grumman Corporation). The Association of Space Explorers is the unique professional organization composed of astronauts who have orbited Earth. They have 375 members from 35 countries and are passionate about encouraging students to pursue science, technology, engineering, and math education, as well as careers in astronautics. The ASE AstroSat Challenge is designed to give students a taste of the exciting world of satellite operations. The ASE AstroSat Challenge is made possible with the generous support of the Northrop Grumman Corporation.
Only 15 teams were selected to run a space experiment, and our team (Miami Young Data Scientists/Countdown Institute) was one of them.
It's amazing to get here.
It means the team of ‘young data scientists’ from Miami will be able to run a Space Experiment live in Space and also learn Data Science
The winning entry was based on teaching Data Science to young people.
Specifically, using Regression algorithms to make predictions on Space data from Ardusat (more on this soon)
This is different from our original idea and is more complex, but I think it would help get more young people into Data Science (as per Harvard, the hottest profession of the future).
Thus, I think the biggest winners are the young people of Miami who are a part of the winning team.
The main variation/evolution from the original idea is to focus on Data Science and inspiring students to take up Data Science through visualization of data and predictions using scientific methodology.
It's a way to get more students (both boys and girls) interested in Data Science through space exploration, by coding on a live satellite.
Hence the regression algorithms, IPython notebooks etc.
It also involves a bit more math, and hence is aimed at slightly older students (around 15 to 17). All this also aligns with my ‘day job’, so to speak!
Here is the full list of winners of the AstroSat competition
I am happy to share more. If you want to know more about this – please email me at ajit.jaokar at futuretext.com
Here is a set of papers from the Data Science for Internet of Things – practitioners course
These have been published by course participants in top Data Science blogs like KDnuggets and Data Science Central
The Zip file includes the following papers:
1) Recurrent neural networks, Time series data and IoT – Part One
2) Spark SQL for Real Time Analytics – Part One
3) Spark SQL for Real Time Analytics – Part Two
4) Time Series IoT applications in Railroads
5) Nov 13 update: Kalman filters 1 and 2
Note that the modules are customizable i.e. as per your personal learning plan – you may choose to do more or less of a specific topic. For example, more Deep Learning vs Sensor fusion. But overall, we will follow this plan.
Overall themes covered in the course
- Data Science
- Big Data
- Machine Learning
- Deep Learning
- Sensor fusion
- Use Cases (application domains) and IoT Datasets
- Math foundation
- Time Series
- IoT stream processing
- Apache Spark ecosystem
- Programming (R, Scala, SQL)
| Week | Topic |
|---|---|
| Week 0 | Orientation, introductions, Personal learning plans, Platform signup |
| Week 1 (Nov 16) | Foundations: An analytics driven organization – IoT and Machine Learning – Data Science for IoT: unique characteristics – Data Science for IoT: why now? |
| Nov 23 | Machine Learning concepts, Deep Learning concepts |
| Nov 30 | An introduction to IoT (Internet of Things) |
| Dec 7 | IoT platforms – From sensor to Cloud |
| Dec 14 | Concepts of Big Data part one |
| Dec 21 | Concepts of Big Data part two |
| Jan 11 | Market drivers for IoT |
| Jan 18 | Choosing a model – what technique to use? |
| Jan 25 | Use cases and IoT datasets (these will continue throughout the course) |
| Feb 1 | Time series and NoSQL databases |
| Feb 8 | Streaming analytics part one |
| Feb 15 | Streaming analytics part two |
| Feb 22 | Deep learning part one |
| Feb 29 | Deep learning part two |
| Mar 7 | Machine learning algorithms – part one |
| Mar 14 | Machine learning algorithms – part two |
| Mar 21 | Mathematical foundations – part one |
| Mar 28 | Mathematical foundations – part two |
| Week | Topic |
|---|---|
| Week 0 | Orientation, introductions, Personal learning plans, Platform signup |
| Nov 30 | Intro to R, Installations, Basics of R |
| Dec 14 | Data Frames in R & Tabular Data |
| Jan 11 | Data Processing & Data Visualization in R |
| Jan 25 | Scala basics |
| Feb 8 | Spark batch processing I |
| Feb 22 | Spark batch processing II |
| Mar 7 | Spark SQL |
| Mar 21 | Spark Streaming |