Programming for Data Science – the Polyglot approach: Python vs. R, or Python + R + SQL





In this post, I discuss a possible new approach to teaching Programming for Data Science.

Programming for Data Science is focused on the R vs. Python question. Everyone seems to have a view, including the venerable Nature journal (Programming – Pick up Python).

Here, I argue that we look beyond the Python vs. R debate and instead teach R, Python and SQL together. To do this, we need to look at the big picture first (the problem we are solving in Data Science) and then see how that problem is broken down and solved by the different approaches. In doing so, we can more easily master multiple approaches and then even combine them if needed.

On first impressions, this Polyglot approach (ability to master multiple languages) sounds complex.

Why teach three languages together? (For simplicity, I am including SQL as a language here.)

Here is some background:

Outside of Data Science, I also co-founded a social enterprise, Feynlabs, to teach Computer Science to kids. At Feynlabs, we have been working on ways to accelerate learning to code. One way to do this is to compare and contrast multiple programming languages. This approach makes sense for Data Science too, because a learner can potentially approach Data Science from many directions.

To learn programming for Data Science, it would thus help to build up from an existing foundation the learner is already familiar with, and then co-relate new ideas to this foundation through other approaches. From a pedagogical standpoint, this approach is similar to that of David Ausubel, who stressed the importance of prior knowledge in being able to learn new concepts: "The most important single factor influencing learning is what the learner already knows."

But first, we address the problem we are trying to solve and how that problem can be broken down.

I also propose to make this approach part of the Data Science for IoT course/certification, but I also expect to teach it as a separate module – probably in a workshop format in London and the USA. If you are interested to know more, please sign up on the mailing list HERE

Data Science – the problem we are trying to solve

Data science involves the extraction of knowledge from data. Ideally, we need lots of data from a variety of sources.  Data Science lies at the intersection of multiple disciplines: Programming, Statistics, Algorithms, Data analysis etc. The quickest way to solve Data Science problems is to start analyzing data as soon as possible. However, Data Science also needs a good understanding of the theory – especially the machine learning approaches.

A Data Scientist typically approaches a problem using a methodology like OSEMN (Obtain, Scrub, Explore, Model, iNterpret). Some of these steps are common to a classic data warehouse and are similar to the classic ETL (Extract Transform Load) approach. However, the modelling and interpreting stages are unique to Data Science. Modelling needs an understanding of Machine Learning algorithms and how they fit together – for example, unsupervised algorithms (dimensionality reduction, clustering) and supervised algorithms (regression, classification).

To understand Data Science, I would expect some background in Programming. Certainly, one would not expect a Data Scientist to start from “Hello World”. But on the other hand, the syntax of a language is often over-rated. Languages have quirks – and they are easy to get around with most modern tools.

So, if we look at the problem / big picture first (i.e. the Obtain, Scrub, Explore, Model and Interpret stages), it is easier to fit the programming languages to the stages. Machine Learning has two phases: the model-building phase and the prediction phase. We first build the model (often in batch mode – this takes longer). We then perform predictions with the model in a dynamic/real-time mode. Thus, to understand Programming for Data Science, we can divide the learning into four stages: the tool itself (IDE), Data Management, Modelling and Visualization.
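The two phases can be sketched in Python with scikit-learn (assumed installed); the data below is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Phase 1: model building -- typically batch mode, on historical data
X_train = rng.normal(size=(1000, 3))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# Phase 2: prediction -- fast enough to run in a dynamic/real-time mode
x_new = np.array([[1.5, 0.5, -0.2]])
print(model.predict(x_new))
```

The fit step would run in batch over historical data; the predict step is cheap enough to run per request.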

Tools, IDE and Packages

After understanding the base syntax, it's easier to understand a language in terms of its packages and libraries. Both Python and R have a vast number of packages (such as Statsmodels), often distributed as libraries (scikit-learn). Both languages are interpreted. Both have good IDEs: Spyder and IPython for Python, RStudio for R. If using Python, you would probably use a library like scikit-learn and a distribution of Python such as Anaconda. With R, you would use RStudio and install specific packages using CRAN, R's package management system.

Data management

Apart from R and Python, you would also need to use SQL. I include SQL because it plays a key role in the data-scrubbing stage. Some have called this stage the janitor work of Data Science, and it takes a lot of time. SQL also plays a part in SQL-on-Hadoop approaches like Apache Drill, which allow users to write SQL queries against data stored in Hadoop and receive results.

With SQL, you are manipulating data in Sets. However, once the data is inside the Programming environment, it is treated differently depending on the language.
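As a minimal sketch of what set-based manipulation looks like in practice, here is Python's built-in sqlite3 module used for a scrubbing step; the table and column names are invented for illustration:

```python
import sqlite3

# In-memory database; table and column names are illustrative only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 20.5), (1, None), (2, 19.0), (2, 300.0)])

# Set-based scrubbing: one statement acts on every matching row at once
conn.execute("DELETE FROM readings WHERE value IS NULL OR value > 100")
rows = conn.execute(
    "SELECT sensor_id, value FROM readings ORDER BY sensor_id").fetchall()
print(rows)  # [(1, 20.5), (2, 19.0)]
```

The DELETE statement expresses "drop all missing or out-of-range readings" as one declarative operation over the whole set, rather than a row-by-row loop.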

In R, everything is a vector, and R data structures and functions are vectorized. This means most functions in R work on whole vectors (i.e. on all the elements, not on individual elements in a loop). Thus, in R, you read your data into a data frame and use a built-in model (here are the steps/packages for linear regression). In Python, if you did not use a library like scikit-learn, you would need to make many decisions yourself, and that can be a lot harder. However, with a package like scikit-learn, you get a consistent, well-documented interface to the models. That makes your job a lot easier by letting you focus on usage.
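R's vectorization has a close analogue in Python's NumPy, where functions also operate on whole arrays rather than element by element; a minimal illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Vectorized: the operation applies to all elements at once, as in R
y = np.sqrt(x) * 2

# The equivalent explicit loop -- more code, and typically slower
y_loop = np.array([np.sqrt(v) * 2 for v in x])

print(np.allclose(y, y_loop))  # True
```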

Data Exploration and Visualization

After the data modelling stage, we come to data exploration and visualization. For Python, the pandas package is a powerful tool for data exploration. Here is a simple and quick intro to the power of pandas (YouTube video). Similarly, R uses the dplyr and ggplot2 packages for data exploration and visualization.
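A quick, hedged sketch of what pandas-based exploration looks like (the data frame here is invented):

```python
import pandas as pd

# A small invented data frame, just to show the exploration calls
df = pd.DataFrame({
    "city": ["London", "London", "Madrid", "Madrid"],
    "temp": [12.0, 14.0, 22.0, 24.0],
})

print(df["temp"].describe())              # summary statistics
print(df.groupby("city")["temp"].mean())  # grouped aggregate
```

In R, the equivalent exploration would use dplyr's group_by and summarise verbs.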

A moving goalpost and a Polyglot approach

Finally, much of this discussion is a rapidly moving goalpost. For example, in R, large calculations need the data to be loaded into a matrix (e.g. n×n matrix manipulation). But with platforms like Revolution Analytics, that limitation can be overcome. Especially with the acquisition of Revolution Analytics by Microsoft – and given Microsoft's history of creating good developer tools – we can expect development in R to be simplified.

Also, since both R and Python operate in the context of Hadoop for Data Science, we would expect to leverage the Hadoop architecture through HDFS connectors, both via Python Hadoop frameworks and R Hadoop integration. One could also argue that we are already living in a post-Hadoop/MapReduce world with Spark and Storm, especially for real-time calculations, and that at least some Hadoop functions may be replaced by Spark.

Here is a good introduction to Apache Spark and a post about Getting started with Spark in Python. Interestingly, the Spark programming guide includes integration with three languages (Scala, Java and Python) but not R. But the power of Open Source means we have SparkR, which integrates R with Spark.

The approach of covering multiple languages has some support – for instance, the Beaker notebook. You could also achieve the same effect by working on the command line, as in Data Science at the Command Line.


Even in a brief blog post, we can get a lot of insight by looking at the wider problem of Data Science and comparing how different approaches address segments of that problem. You just need the bigger picture of how these languages fit together for Data Science, and an understanding of the major differences (for example, vectorization in R).

The use of good IDEs, packages, etc. softens the impact of programming.

It then changes our role, as Data Scientists, to mixing and matching a palette of techniques as APIs – sometimes spanning languages.

I hope to teach this approach as part of Data Science for IoT course/certification

Programming for Data Science will also be a separate module I will talk about over the next few months at Fablab London, the London IT Contractors meetup group, CREATE Miami (a venture accelerator at Miami Dade College), the City Sciences conference in Shanghai (as part of a larger paper) and MCS Madrid.

For more schedules and details please sign up HERE

Call for Papers: Shanghai, 4-5 June 2015 – International Conference on City Sciences (ICCS 2015): New architectures, infrastructures and services for future cities

Call for Papers from the International Conference on City Sciences (ICCS 2015): New architectures, infrastructures and services for future cities, co-organized by City Sciences, where I teach.





The new science of cities stands at a crossroads. It encompasses rather different, or even conflicting, approaches. Future cities place citizens at the core of the innovation process when creating new urban services, through "experience labs", the development of urban apps or the provision of "open data". But future cities also describe the modernisation of urban infrastructures and services such as transport, energy, culture, etc., through digital ICT technologies: ultra-fast fixed and mobile networks, the Internet of Things, smart grids, data centres, etc. In fact, during the last two decades local authorities have invested heavily in new infrastructures and services, for instance putting online more and more public services and trying to create links between still-prevalent silo approaches, with the citizen taking an increasingly centre-stage role. However, so far the results of these investments have not lived up to expectations, and particularly the transformation of the city administration has not been as rapid nor as radical as anticipated. Therefore, it can be said that there is an increasing awareness of the need to deploy new infrastructures to support updated public services, and of the need to develop new services able to share information and knowledge within and between organizations and citizens. In addition, urban planning and urban landscape are increasingly perceived as a basic infrastructure, or rather a framework, on which the rest of the infrastructures and services rely.
Thus, as an overarching consequence, there is an urgent need to discuss among practitioners and academics successful cases and new approaches able to help build better future cities.

Taking place in Shanghai – itself a paradigm of the challenges facing future cities and a crossroads between East and West – the International Conference on City Sciences responds to these and other issues by bringing together academics, policy makers, industry analysts, providers and practitioners to present and discuss their findings. A broad range of topics related to infrastructures and services in the framework of city sciences are welcome as subjects for papers, posters and panel sessions:

  • Developments of new infrastructures and services of relevance in an urban context: broadband, wireless, sensors, data, energy, transport, housing, water, waste and environment
  • City sustainability from infrastructures and services
  • ICT-enabled urban innovations
  • Smart city developments and cases
  • Citizen-centric social and economic developments
  • Renewed government services at the local level
  • Simulation and modelling of the urban context
  • Urban landscape as new infrastructure

Additional relevant topics are also welcome.

Authors of selected papers from the conference will be invited to submit to special issues of international peer-reviewed academic journals.

Important deadlines:

  • 20 February: Deadline for abstracts and panel session suggestions
  • 30 March: Notification of acceptance
  • 30 April: Deadline for final papers and panel session outlines
  • 4-5 June: International Conference on City Sciences at Tongji University in Shanghai, PR China

Submission of abstracts:

Abstracts should be about 2 pages (800 to 1000 words) in length and contain the following:

  • Title of the contribution
  • A research question
  • Remarks on methodology
  • Outline of (expected) results
  • Bibliographical notes (up to 6 main references used in the paper)

All abstracts will be subject to blind peer review by at least two reviewers.

Conference link: International Conference on City Sciences (ICCS 2015): New architectures, infrastructures and services for future 





Data Science at the Command Line – book and workshop

I am reading a great book called Data Science at the Command Line.

The author, Jeroen Janssens, has a workshop in London on Data Science at the command line, which I am attending.

Here is a brief outline of some of the reasons why I like this approach.

I have always liked the command line, from my days of starting with Unix machines. I must be one of the few people to actually want a command-line mobile phone!

If you have worked with command-line tools, you already know that they are powerful and fast.

For data science especially, that's relevant because of the need to manipulate data and work with a range of products that can be invoked through a shell-like interface.

The book is based on the Data Science Toolbox – created by the author as an Open Source tool – and is brief and concise (187 pages). The book focuses on specific commands/strategies that can be linked together using simple but powerful command-line interfaces.

Examples include:

  • tools such as json2csv, the tapkee dimensionality reduction library, and Rio (created by the author; Rio loads CSVs into R as a data.frame, executes given commands and returns the output as CSV or PNG)
  • run_experiment – a scikit-learn command-line utility for running a series of learners on datasets specified in a configuration file
  • tools like topwords.R
  • and many others

By coincidence, I read this as I was working on this post: Command-line tools can be 235x faster than your Hadoop cluster.

I recommend both the book and the workshop.


a) I have been informed that there is a 50% discount for students, academics, startups and NGOs for the workshop.
b) Jeroen notes that the book is not really based on the Data Science Toolbox, but rather provides a modified one so that you don't have to install everything yourself in order to get started. You can download the VM HERE

IoT and the Rise of the Predictive Organization

I will be launching a newsletter starting in Jan 2015 to cover these ideas in detail.

You can sign up for the newsletter at futuretext IoT Machine Learning – Predictive Analytics – newsletter

I will also be launching a course/certification for “Data Science in IoT” at Oxford, London and San Francisco – email me at ajit.jaokar at if you want to know more


In The Godfather Part II, Hyman Roth said to Michael Corleone:

             "Michael – we are bigger than US Steel."

Over the holiday season,  I said this to my friend Jeremy Geelan when I was comparing the Mobile industry to the IoT.

The term Internet of Things was coined by the British technologist Kevin Ashton in 1999 to describe a system where the Internet is connected to the physical world via ubiquitous sensors. After languishing in the depths of academia (at least here in Europe), IoT had its Netscape moment early in 2014 when Google acquired Nest.

Mobile is huge and has dominated the Tech landscape for the last decade.

But the Internet of Things(IoT) will be bigger.

How big?

Here are some numbers. Source: adapted from David Wood's blog.

By 2020, we are expected to have 50 billion connected devices.

To put this in context:

  • The first commercial citywide cellular network was launched in Japan by NTT in 1979.
  • The milestone of 1 billion mobile phone connections was reached in 2002.
  • The 2 billion mobile phone connections milestone was reached in 2005.
  • The 3 billion mobile phone connections milestone was reached in 2007.
  • The 4 billion mobile phone connections milestone was reached in February 2009.
  • We reached 7.2 billion active mobile connections in 2014.

So, 50 billion by 2020 is a massive number, and no one doubts that number any more.

But IoT is much more than the number of connections – it’s all about the Data and the intelligence that can be gleaned from the Data.

As more objects are becoming embedded with sensors and gain the ability to communicate, new business models emerge.

IoT also creates new pathways for information to travel – especially across an organization's boundary, across its value chain and in engaging with its customers.

This Data, and the intelligence gleaned from it, will fundamentally transform organizations, creating a new kind of 'Predictive Organization' with Predictive Analytics / Machine Learning at its core, i.e. algorithms that learn from experience.

Machine Learning is the study of algorithms and systems that improve their performance with experience. There are broadly two ways for algorithms to learn: supervised learning (where the algorithm is trained in advance using labelled data sets) and unsupervised learning (with no prior training, e.g. methods like clustering).
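The two styles of learning can be sketched with scikit-learn (assumed installed) on synthetic data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)
# Two well-separated synthetic groups of points
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)  # labels, used only in the supervised case

# Supervised: train on labelled data, then predict labels for new points
clf = GaussianNB().fit(X, y)
print(clf.predict([[5.1, 4.9]]))  # [1]

# Unsupervised: no labels -- the algorithm finds the groups itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(len(set(km.labels_)))  # 2
```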

Machine Learning algorithms take the billions of data points as inputs and extract actionable insights from the data. So, the Predictive Organization starts with the prediction process and then creates a feedback loop through measuring and managing. Crucially, this takes place across the boundary of the enterprise.

I believe there are twelve unique characteristics of IoT-based Predictive Analytics / Machine Learning:

1) Time Series Data: processing sensor data.

2) Beyond sensing: using data to improve lives and businesses.

3) Managing IoT data.

4) The Predictive Organization: rethinking the edges of the enterprise – Supply Chain and CRM impact.

5) Decisions at the 'Edge'.

6) Real-time processing.

7) Cognitive computing – image processing and beyond.

8) Managing massive geographic scale.

9) Cloud and virtualization.

10) Integration with hardware.

11) Rethinking existing Machine Learning algorithms for the IoT world.

12) Correlating IoT data to social data – the Datalogix model for IoT.
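As an illustration of the first characteristic (processing time-series sensor data), a common first step is downsampling high-frequency readings; this pandas sketch uses invented values:

```python
import numpy as np
import pandas as pd

# One reading per second for ten minutes (values are synthetic)
idx = pd.date_range("2015-01-01", periods=600, freq="s")
readings = pd.Series(np.sin(np.arange(600) / 60.0), index=idx)

# Downsample to one-minute means -- a typical first step for sensor streams
per_minute = readings.resample("1min").mean()
print(len(per_minute))  # 10
```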

Indeed, one could argue that IoT leads to the creation of new types of organization – for instance, ones based on the sharing economy, converging the digital and the physical world.


Image source: wikipedia

Implementing Tim Berners-Lee’s vision of Rich Data vs. Big Data










In a previous blog post (Magna Carta for the Web), I discussed the potential of Tim Berners-Lee's vision of Rich Data.

When I met Tim Berners-Lee at the EIF event in Brussels, I asked about the vision of Rich Data. I have also thought more about how this vision could actually be implemented from a Predictive/Machine Learning standpoint.

To recap the vision from the previous post:

So what is Rich Data? It's Data (and Algorithms) that would empower the individual. According to Tim Berners-Lee: "If a computer collated data from your doctor, your credit card company, your smart home, your social networks, and so on, it could get a real overview of your life." Berners-Lee was visibly enthusiastic about the potential applications of that knowledge, from living more healthily to picking better Christmas presents for his nephews and nieces. This, he said, would be "rich data". (Motherboard)

This blog post explores a possible way this idea could be implemented. I hope I can implement it, perhaps as part of an Open Data Institute incubated start-up.

To summarize my view here:

The world of Big Data needs to maintain large amounts of data because the past is used to predict the future. This is needed because we do not voluntarily share data and intent. Here, I propose that to engender trust, both the algorithms and the 'training' should be transparent – which leads to greater trust and greater sharing. This in turn means we do not need to hold large amounts of data (Big Data) to determine predictions (intents). Instead, intents will be known (shared voluntarily) by people at the point of need. This would create a world of Rich Data – where intent is determined algorithmically using smaller data sets (and without the need to maintain a large amount of historical data).


Thus, to break it down further, here are some more thoughts:

a) Big Data vs. Rich Data: To gain insights from data, we currently collect all the data we can lay our hands on (Big Data). In contrast, for Rich Data, instead of collecting all data in one place in advance, you need access to many small data sets for a given person and situation. But crucially, this linking of datasets should happen dynamically, at the point of need – for example: personal profile, contextual information and risk profile for a person at risk of diabetes or a stroke, linked only at the point of a medical emergency (vs. gathered in advance).

b) Context already exists: Much of this information exists already. The mobile industry has done a great job of capturing contextual information accurately – for example location – and tying it to content (geo-tagged images).

c) The 'segment of one' idea has been tried in many variants: Segmentation has been tried, with some success – in retail (The Future of Retail is Segment of One), in a BCG perspective paper (Segment of One Marketing – pdf), and in Inc magazine (Audience Segmenting – Targeting Your Customers). Segmentation is already possible.

d) Intents are not linked to context: The feedback loop is not complete because, while context exists, it is not tied to intent. Most people do not trust advertisers and others with their intent.

e) Intents (predictions) are based on the past: Because we do not trust providers with intent, intent is gleaned through Big Data. Intents are related to predictions, and predictions are based on a large number of historical observations, either of the individual or of related individuals. To create accurate predictions in this way, we need large amounts of centralized data plus any other forms of data. That's the Big Data world we live in.

f) IoT: IoT will not solve the problem. It will create an order of magnitude more contextual information – but providers will not be trusted and datasets will not be shared. And we will continue to create larger datasets with bigger volumes.


To recap:

a) To gain insights from data, we currently collect all the data we can lay our hands on. This is the world of Big Data.

b) We take this approach because we do not know the intent.

c) Rather, we (as people) do not trust providers with intent.

d) Hence, in the world of Big Data, we need a lot of data. In contrast, for Rich Data, instead of collecting all data in one place in advance, you need access to many small data sets for a given person and situation. But crucially, this linking of datasets should happen dynamically, at the point of need – for example: personal profile, contextual information and risk profile for a person at risk of diabetes or a stroke, linked only at the point of a medical emergency (vs. gathered in advance).


From an algorithmic standpoint, the overall objective is to determine the maximum likelihood of sharing under a trust framework. Given a set of trust frameworks and a set of personas (for example, a person with a propensity for a stroke), we want to know the probability of sharing information, and under which trust framework.

We need a small number of observations for an individual.

We need an inbuilt trust framework for sharing.

We need the calibration of trust to be 'people driven' and not provider driven.


A possible way to implement the above could be through a Naive Bayes Classifier.

  • In machine learning, Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
  • Workings: let {f1, …, fm} be a predefined set of m features. A classifier is a function f that maps input feature vectors x ∈ X to output class labels y ∈ {1, …, C}, where X is the feature space. Our goal is to learn f from a labelled training set of N input–output pairs (xn, yn), n = 1 : N; this is an example of supervised learning, i.e. the algorithm has to be trained.
  • An advantage of Naive Bayes is that it only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification.
  • This covers the basics of Naive Bayes. Tom Mitchell, in a Carnegie Mellon paper, says: "A hundred independently drawn training examples will usually suffice to obtain a maximum likelihood estimate of P(Y) that is within a few percent of its correct value when Y is a Boolean variable. However, accurately estimating P(X|Y) typically requires many more examples."
  • In addition, we need to consider feature selection and dimensionality reduction. Feature selection is the process of selecting a subset of relevant features for use in model construction. Feature selection is different from dimensionality reduction: both methods seek to reduce the number of attributes in the dataset, but a dimensionality reduction method does so by creating new combinations of attributes, whereas feature selection methods include and exclude attributes present in the data without changing them. Examples of dimensionality reduction methods include Principal Component Analysis (PCA).


  • Thus, a combination of Naive Bayes and PCA may be a start to implementing Rich Data. Naive Bayes needs a relatively small amount of data; PCA reduces dimensionality.
  • How to incorporate Trust? Based on the above, trust becomes a feature (an input vector) to the algorithm, with an appropriate weighting. The output is then the probability of sharing under a trust framework for a given persona.
  • Who calibrates the Trust? A related and bigger question is how to calibrate trust within the algorithm. This is indeed the Holy Grail and underpins the foundation of the approach. Prediction in research has grown exponentially due to the availability of data, but predictive science is not perfect (a good paper: The Good, the Bad, and the Ugly of Predictive). Predictive algorithms gain their intelligence in two ways: supervised learning (like Naive Bayes, where the algorithm learns through training data) or unsupervised learning, where the algorithm tries to find hidden structure in unlabelled data.
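A sketch of that Naive Bayes plus PCA combination with scikit-learn, on synthetic data (the trust feature and personas are not modelled here; this only illustrates the pipeline itself):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
# 200 synthetic records, 10 features; the first (high-variance) feature
# drives the label, standing in for a persona's key attribute
X = rng.normal(size=(200, 10))
X[:, 0] *= 5
y = (X[:, 0] > 0).astype(int)

# Reduce dimensionality first, then classify with Naive Bayes
model = make_pipeline(PCA(n_components=3), GaussianNB())
model.fit(X[:150], y[:150])
accuracy = model.score(X[150:], y[150:])
print(accuracy)
```

Note that Naive Bayes gets by with the relatively small training set (150 records) precisely because it only estimates per-feature means and variances.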


So, if we have to calibrate trust for a supervised learning algorithm, the workings must be open and the trust (propensity to share) must be created from the personas themselves, e.g. people at risk of a stroke, the elderly, etc. Such an open algorithm – one that learns from the people and whose workings are transparent – will engender trust. It will in turn lead to greater sharing, and to a different type of predictive algorithm: one that needs smaller amounts of historical data but tracks a larger number of data streams to determine value at their intersection. This in turn will complete the feedback loop and tie intent to context.

Finally, I do not propose that a specific algorithm (such as Naive Bayes) is the answer – rather, that both the algorithms and the 'training' should be transparent, which leads to greater trust and greater sharing. This in turn means we do not need to hold large amounts of data (Big Data) to determine predictions (intents). Instead, intents will be known (shared voluntarily) by people at the point of need. This would create a world of Rich Data – where intent is determined algorithmically using smaller data sets (and without the need to maintain a large amount of historical data).

Comments welcome – at ajit.jaokar at 

Predictive Analytics as a service for IoT


This post is a personal viewpoint based on my teaching (IoT and Machine Learning) on the City Sciences program at the Technical University of Madrid (UPM) and at Oxford University (with a mobile perspective).

Predictive Analytics are critical for IoT, but most companies do not have the skillsets to develop their own Predictive analytics engine.  The objective of this effort is to provide a predictive analytics interface for Hypercat. We aim to provide a solution accessed through a Hypercat API and a library. Whenever possible, we will use Open Source. We will also encapsulate industry best practices into the solution. The post is also related to extending the discussions at the event Smart cities need a Trusted IoT foundation

Data and Analytics will be the key differentiator for IoT.

A single sensor collecting data at one-second intervals will generate 31.5 million datapoints a year (source: Intel/Wind River). However, the value lies not just in one sensor's datapoints, but rather in the collective intelligence gleaned from thousands (indeed millions) of sensors working together.
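The 31.5 million figure is simply the number of seconds in a (non-leap) year:

```python
# One reading per second, every second of the year
seconds_per_year = 60 * 60 * 24 * 365
print(seconds_per_year)  # 31536000 -- i.e. about 31.5 million datapoints
```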

As I discuss below, this information (and more specifically the rate of IoT-based sensor information and its real-time nature) will make a key difference to IoT and Predictive Analytics.

IoT and predictive analytics will change the nature of decision making and will change the competitive landscape of industries. Industries will have to make thousands of decisions in near real-time. With predictive analytics, each decision will improve the model for subsequent decisions (also in near real time). We will recognize patterns, make adjustments and improve performance based on data from multiple people and sensors

IoT and Predictive analytics will enable devices to identify, diagnose and report issues more precisely and quickly as they occur. This will create a ‘closed loop’ model where the Predictive model improves with experience. We will thus go from identifying patterns to making predictions – all in real time  

However, the road to this vision is not quite straightforward: the two worlds of IoT and Predictive Analytics do not meet easily.

Predictive Analytics needs the model to be trained before it makes a prediction. Creating a model and updating it continuously, in real time, with streaming IoT data is a complex challenge. It also does not fit the traditional MapReduce model, with its inherently batch-processing nature. This challenge is being addressed already (Moving Hadoop beyond batch processing and MapReduce) but will become increasingly central as IoT becomes mainstream.
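One partial answer, sketched here as an assumption rather than a prescription, is an incrementally trainable model: scikit-learn's SGDClassifier exposes partial_fit, so the model can be updated as each mini-batch of streaming data arrives:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # must be declared up front for partial_fit

# Simulate a stream: update the model one mini-batch at a time
for _ in range(50):
    X_batch = rng.normal(size=(20, 2))
    y_batch = (X_batch[:, 0] > 0).astype(int)
    model.partial_fit(X_batch, y_batch, classes=classes)

# The model is usable for prediction at any point in the stream
print(model.predict([[2.0, 0.0]]))
```

This avoids retraining from scratch, though it only suits models designed for incremental updates.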


IoT and Predictive analytics – opportunities

For IoT and predictive analytics, processing will take place both in the cloud and, increasingly, at the edge. Not all data will be sent to the cloud at all times. The newly launched Egburt from Camgian Microsystems is an example of this new trend. Some have called this trend ‘data gravity’, where computing power is brought to the data as opposed to processing data in a centralized location.

In addition, the sheer volume of IoT data leads to challenges and opportunities. For example, 100 million points per second in a time series is not uncommon. This leads to specific challenges for IoT (Internet of Things – time series data challenge).

Here are some examples of possible opportunities for IoT and Predictive analytics where groups of sensors work together:

  • We could undertake system-wide predictive maintenance of offshore equipment such as wind farms with multiple turbines (i.e. the overall system as opposed to a specific turbine). If we predict a high likelihood of failure in one turbine, we could dynamically reduce the load on that turbine by switching it to a lower performance mode.
  • Manage the overall performance of a group of devices – again in the wind farm example, individual turbines could be tuned together to achieve optimal performance, since each piece of equipment has an impact on the overall performance.
  • Manage the ‘domino effect’ of failure – as devices are connected (and interdependent), failure of one could cascade across the whole network. By using predictive analytics, we could anticipate such cascading failures and also reduce their impact.

IoT and Predictive analytics – challenges

Despite the benefits, the two worlds of IoT and predictive analytics do not meet very naturally.

In a nutshell, Predictive analytics involves extracting information from existing data sets to identify patterns which help predict future outcomes and trends for new (unseen) scenarios.  This allows us to predict what will happen in future with an acceptable level of reliability.

To do this, we must

a)      Identify patterns from existing data sets

b)      Create a model which will predict the future


Doing these two steps in real time is a challenge. Traditionally, data is fed to a system in batches, but for IoT we have a continuous stream of new observations arriving in real time. The outcome (i.e. the business decision) also has to be produced in real time. Today, some systems, such as credit card authorization, perform real-time validations – but for IoT, the scale and scope will be much larger.


So, this leads to more questions:

a)      Can the predictive model be built in real time?

b)      Can the model be updated in real time?

c)       How much historical data can be used for this model?

d)      How can the data be pre-processed and at what rate?

e)      How frequently can the model be retrained?

f)       Can the model be incrementally updated?
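On questions (b) and (f), some general-purpose libraries already support incremental learning. As a minimal sketch – not specific to Hypercat or any IoT platform, and using simulated sensor readings – scikit-learn’s SGDClassifier can be updated in place with `partial_fit` as each mini-batch streams in:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()
classes = np.array([0, 1])  # all classes must be declared up front for incremental learning

# Simulate a stream of mini-batches of sensor readings
for _ in range(100):
    X = rng.normal(size=(32, 4))              # 32 readings, 4 features each
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy 'failure' label
    model.partial_fit(X, y, classes=classes)  # update the model in place

# The model is usable for prediction at any point in the stream
X_new = rng.normal(size=(5, 4))
print(model.predict(X_new))
```

The point of the sketch is that training never stops: each mini-batch refines the model, so prediction and learning interleave rather than alternating in large batch cycles.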


There are also many architectural changes needed for real time, e.g. in-memory processing, stream processing, etc.



According to Gartner analyst Joe Skorupa: “The enormous number of devices, coupled with the sheer volume, velocity and structure of IoT data, creates challenges, particularly in the areas of security, data, storage management, servers and the data center network, as real-time business processes are at stake.”

Thus, IoT will affect many areas: security, business processes, consumer privacy, data storage management, server technologies, the data center network, etc.

The Hypercat platform provides a mechanism to manage these complex changes.

We can model every sensor, actuator and person as a digital entity. We can assign predictive behaviour to digital objects (a digital entity has processing power, an agenda and access to metadata). We can model and assign predictive behaviour to multiple levels of objects (from the whole refinery down to a single valve).

We can model time-varying data and predict behaviour based on inputs at a point in time. The behaviour is flexible (resolved at run time) and creates a risk prediction and a feedback loop to modify behaviour in real time, along with a set of rules.

We can thus cover the whole lifecycle: starting with the discovery of new IoT services in a federated manner, through managing security and privacy, to ultimately creating autonomous, emergent behaviour for each entity.

All this happens in the context of a security and interoperability framework.


Predictive analytics as a service?

Based on the above, predictive analytics cannot be just an API – it would be more a dynamic service which can provide the right data, to the right person, at the right time and place. The service would be self-improving (self-learning) in real time.

I welcome comments on the above. You can email me at ajit.jaokar at or post in the Hypercat LinkedIn forum






Small Data: A Deterministic and Predictive Approach


Image source: Daniel Villatoro 


In this blog/article, I expand on the idea of ‘Small data’.

I present a generic model for Small Data combining deterministic and predictive components.

Although I have presented the ideas in the context of IoT (which I understand best), the same algorithms and approach could apply to domains such as retail, telecoms, banking, etc.

We could have a number of data sets which may be individually small, but it is possible to find value at their intersection. This approach is similar to the mobile industry / Foursquare scenario of knowing the context to provide the best service or offer to a customer segment of one. That’s a powerful idea in itself and a reason to consider Small Data. However, I wanted to extend the deterministic aspects of Small Data (the intersection of many small data sets) by also considering the predictive aspects. The article describes a general approach for adding a predictive component to Small Data, which comprises three steps: a) a limited set of features is extracted, b) their dimensionality is reduced (e.g. using clustering) and c) finally we use a classification and recognition method such as Hidden Markov Models to recognize a higher-order metric (e.g. walking or footfall).


Last week, I gave an invited talk on IoT and Machine Learning at the Bigdap conference organized by the Ontic project. The Ontic project is an EU FP7 project doing some interesting work on Big Data and analytics, mainly from a telco perspective.

The audience was technical, and this was reflected in the themes of the event (for example: techniques, models and algorithms for Big Data; scalable data mining and machine learning techniques and mechanisms; Big Data security and privacy challenges; cleaning Big Data (noise reduction), acquisition and integration; multidimensional Big Data; and algorithms for enhancing data quality).

This blog post is inspired by some conversations following my talk with Daniel Villatoro (BBVA) and Dr Alberto Mozo (UPM/Ontic). It extends many of the ideas and papers I referenced in my talk.


In his talk, Daniel referred to ‘small data’ (image from slides, used with permission). In this context, as per the slide, Small Data refers to the intersection of various elements – such as customers, offers and social context – in a small-retailer setting. Small Data is an interesting concept and I wanted to explore it more, so I spent the weekend thinking about it.

When you have the data elements, the concept of Small Data is a deterministic one. It is similar to the mobile industry / Foursquare scenario of knowing the context to provide the best service or offer. Thus, given the right data sets, you can find value at their intersection. This works even if the individual data sets are small, as long as you find enough intersecting data sets to create a customer segment of one at their intersection.
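As a toy sketch of that deterministic intersection (all column names and values here are invented for illustration), three individually small data sets can be joined down to a ‘segment of one’:

```python
import pandas as pd

# Three individually small data sets (hypothetical columns)
customers = pd.DataFrame({"cust_id": [1, 2, 3], "segment": ["a", "b", "a"]})
visits = pd.DataFrame({"cust_id": [2, 3], "store": ["x", "y"]})
offers = pd.DataFrame({"cust_id": [3], "offer": ["10% off"]})

# Value lies at the intersection: customers present in all three sets
segment_of_one = customers.merge(visits, on="cust_id").merge(offers, on="cust_id")
print(segment_of_one)  # a single, fully contextualised customer record
```

Each data set on its own says little; the inner joins keep only the customer for whom every piece of context is known.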

That’s a powerful idea in itself and a reason to consider Small Data.

However, I wanted to extend the deterministic aspects of Small Data (the intersection of many small data sets) by also considering the predictive aspects. In the predictive case, we want to infer insights from relatively limited data sets.

In addition, I was also looking for a good use case to teach my students @citysciences. Hence, this blog will explore the predictive aspects of Small data in an IoT context

I believe the ideas I discuss could apply to any scenario (e.g. retail or banking) and indeed also to Big Data sets.

A caveat:

The examples I have considered below strictly apply to Wireless Sensor Networks (WSNs). WSNs differ from IoT because there is potentially communication between the nodes. The topology of a WSN can vary from a simple star network to an advanced multi-hop wireless mesh network, and the propagation technique between the hops of the network can be routing or flooding. In contrast, IoT nodes do not necessarily communicate with each other in this way. For our purposes, however, the examples are valid because we are interested in the insights inferred from the data.

Predictive characteristics of Small data

From a predictive standpoint, I propose that Small data will have the following characteristics:

1)      The Data is missing or incomplete

2)      The data is limited

3)      Alternatively, we have large data sets which need to be converted to a smaller data set to make them more relevant (e.g. to a small retailer) to the problem at hand

4)      The need for inferred metrics i.e. higher order metrics derived from raw data

This complements the deterministic aspects of Small Data, i.e. finding a number of data sets and identifying the value at their intersection, even if each data set itself may be small.

So, based on the papers I reference below, I propose three methodologies that can be used for understanding Small Data from a predictive standpoint:

1)      Feature extraction

2)      Dimensionality reduction

3)      Feature Classification and recognition

To discuss these in detail, I use the problem of monitoring physical activity for assisted-living patients. These patients live in an apartment that is monitored in a privacy-aware manner: we use sensors and infer behaviour based on the sensor readings, while still protecting the privacy of the patient.

The papers I have referred to are (also in my talk):

  • Activity Recognition Using Inertial Sensing for Healthcare, Wellbeing and Sports Applications: A Survey – Akin Avci, Stephan Bosch, Mihai Marin-Perianu, Raluca Marin-Perianu, Paul Havinga University of Twente, The Netherlands
  • Robust location-aware activity recognition: Lu and Fu 

This problem is a ‘small data’ problem because we have limited data, some of it is missing (not all sensors can be monitoring at all times), and we have to infer behaviour from raw sensor readings. We will complement this with the deterministic interpretation of Small Data (where we accurately know a reading).

Small data: Assisted Living Scenario

source Robust Location-Aware Activity Recognition Using Wireless Sensor Network in an Attentive Home Ching-Hu Lu, Student Member, IEEE, and Li-Chen Fu, Fellow, IEEE

In an assisted-living scenario, the goal is to recognize activity based on the observations of specific sensors. Traditionally, researchers used vision sensors for activity recognition; however, that approach is very privacy-invasive. The challenge is thus to recognize human behaviour based on raw readings/activity from multiple sensors. In addition, in an assisted-living system, the subject being monitored may have a disorder (for example, a cognitive disorder or a chronic condition).

The techniques presented below could also apply to other scenarios, e.g. to detect Quality of Experience in telecoms, or in general to any situation where we have to infer insights from relatively limited data sets (e.g. footfall).

The steps/methods for retrieving activity information from raw sensor data are: preprocessing, segmentation, feature extraction, dimensionality reduction and classification.

In this post, we will consider the last three, i.e. feature extraction, dimensionality reduction and classification. We could use these three techniques in situations where we want to create a predictive component for ‘small data’.


Small data: Extracting predictive insights

In the above scenario, we could extract new insights using the following predictive techniques (even when we have limited data):

 1)      Feature extraction

Feature extraction takes inputs from raw data readings and finds the main characteristics of a data segment that accurately represent the original data. The smaller set of features can be described as abstractions of the raw data. The purpose of feature extraction is to transform large quantities of input data into a reduced set of features, represented as an n-dimensional feature vector. This feature vector is then used as an input to a classification algorithm.
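As an illustration (the particular features chosen here are common examples, not a prescription from the papers cited), a window of raw sensor readings can be reduced to a small feature vector like this:

```python
import numpy as np

def extract_features(segment: np.ndarray) -> np.ndarray:
    """Summarise a window of raw sensor readings as a small feature vector."""
    return np.array([
        segment.mean(),                   # average level
        segment.std(),                    # variability
        np.abs(np.diff(segment)).mean(),  # mean absolute change between readings
        segment.max() - segment.min(),    # range
    ])

# A 50-sample segment of (simulated) raw sensor readings
rng = np.random.default_rng(1)
segment = rng.normal(loc=0.5, scale=0.1, size=50)
features = extract_features(segment)
print(features.shape)  # (4,) -- 50 raw readings reduced to 4 features
```

The 4-dimensional feature vector, rather than the 50 raw readings, is what would be passed on to the next stages.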

 2)      Dimensionality Reduction

Dimensionality reduction methods aim to increase accuracy and reduce computational effort. By reducing the number of features involved in the classification process, less computational effort and memory are needed to perform the classification. In other words, if the dimensionality of a feature set is too high, some features might be irrelevant and not even provide useful information for classification. The two general forms of dimensionality reduction are feature selection and feature transformation.

Feature selection methods select the features which are most discriminative and contribute most to the performance of the classifier, in order to create a subset of the existing features. For example, SVM-based feature selection can pick out the most important features; one such study concluded that 5 attributes would be enough to classify daily activities accurately. K-means clustering is a method to uncover structure in a set of samples by grouping them according to a distance metric; k-means-based approaches rank individual features according to their discriminative properties and their co-relationships.

Feature transformation methods: feature transform techniques try to map the high-dimensional feature space into a much lower dimension, yielding fewer features that are combinations of the original features. They are useful in situations where multiple features collectively provide good discrimination but individually those features would discriminate poorly. Principal Component Analysis (PCA) is a well-known and widely used statistical analysis method that can be used to transform the original features into a lower-dimensional space.
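A minimal PCA sketch on simulated feature vectors (the dimensions here are arbitrary) shows the idea: ten correlated features are mapped down to two components that retain almost all of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# 200 feature vectors of dimension 10, with most variance along 2 latent directions
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(200, 10))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # map 10-D features to 2-D
print(X_reduced.shape)                      # (200, 2)
print(pca.explained_variance_ratio_.sum())  # close to 1.0 for this data
```

The reduced 2-D representation would then feed the classification step in place of the original 10-D vectors.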

3)     Classification and recognition: the selected or reduced features from the dimensionality reduction process are used as inputs for the classification and recognition methods.

For example, Nearest Neighbour (NN) algorithms classify activities based on the closest training examples in the feature space (e.g. the k-NN algorithm).

 Naïve Bayes is a simple probabilistic classifier based on Bayes’ theorem which can be used for Classification.

 Support Vector Machines (SVMs) are supervised learning methods used for classification. In the assisted-living scenario, an SVM-based activity recognition system using objects fitted with sensors can be used to recognize drinking, phoning and writing activities.

 Hidden Markov Models (HMMs) are statistical models that can also be used for activity recognition. I used a simple analogy to explain hidden Markov analysis, drawn from a paper which used an HMM to infer temperatures in the distant past based on tree-ring sizes.

 Gaussian Mixture Models (GMMs) can be used to recognize transitions between activities

 Artificial Neural Networks can also be used to detect occurrences, e.g. falls.
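As a minimal sketch of the classification step – with simulated, well-separated feature clusters standing in for real activity data – a k-NN classifier can assign an activity label to a new feature vector:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
# Two activity classes in a reduced 2-D feature space: 'resting' vs 'walking'
resting = rng.normal(loc=[0, 0], scale=0.3, size=(50, 2))
walking = rng.normal(loc=[2, 2], scale=0.3, size=(50, 2))
X = np.vstack([resting, walking])
y = np.array(["resting"] * 50 + ["walking"] * 50)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

# Classify two new feature vectors by their nearest training examples
print(knn.predict([[0.1, -0.1], [1.9, 2.1]]))  # → ['resting' 'walking']
```

Any of the other methods listed above (Naïve Bayes, SVMs, HMMs, GMMs, neural networks) could be substituted at this step; k-NN is simply the easiest to show compactly.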

 Thus, we get a scenario as below











Sensors (adapted from Activity Recognition Using Inertial Sensing for Healthcare, Wellbeing and Sports Applications: A Survey)

Activity (adapted from Robust location-aware activity recognition: Lu and Fu)

Small Data: Complementing the Deterministic by the predictive

To conclude:

Small Data can be a deterministic problem when we know a number of data sets and the value lies at their intersection. This strategy is possible with mobile context-based services and location-based services. The results so achieved can then be complemented by a predictive component of Small Data.

In this case, a limited set of features is extracted, their dimensionality is reduced (e.g. using clustering) and finally we use a classification and recognition method such as Hidden Markov Models to recognize a higher-order metric (e.g. walking, retail footfall, etc.).

I believe that these ideas could be adapted to many domains. Data science is an engineering problem. It’s like building a bridge, where there is no fixed solution in advance: every bridge is different and will present a unique set of challenges. I like the blog post Machine Learning is not a Kaggle competition, in which the author (Julia Evans) correctly emphasizes that we need to understand the business problem first. So, I think the above approach could apply to many business scenarios – e.g. retail (footfall), healthcare, airport lounges, etc. – by inferring predictive insights from data streams.


A fantastic lineup for the forumoxford conference on Friday (which I co-chair with Tomi)




ForumOxford is pleased to have the support of Distimo in 2014.

See below for further details of the talks at ForumOxford



James Elles

Member of the European Parliament


I am a Member of the European Parliament for the South-East Region, with special responsibility for the Conservative Party in Berkshire, Buckinghamshire and Oxfordshire. Now in my sixth term, I am a member of the European Parliament’s Budgets committee, and also a substitute member of both the Foreign Affairs committee and the EU-US delegation. I am the founder of the Transatlantic Policy Network, which I currently chair, and a co-founder of the European Internet Foundation.

Karim Lesina

Vice President, AT&T


Karim Antonio Lesina is the Vice President of AT&T, covering International External Affairs for the European Union, Caribbean, Central and Latin America Regions and in charge of the Trans-Atlantic Relations. In this role he leads AT&T’s advocacy in those regions. AT&T is a premier global communications company, providing wholesale services and mobile roaming services to over 220 countries and territories, and providing business enterprise services to countries representing over 99 percent of the world’s economy.

In addition to developing and implementing market access strategies to enable AT&T’s global expansion to satisfy customer needs, Mr Lesina’s other responsibilities include ensuring compliance with international telecom regulations and advocacy on a wide range of policy matters related to stable growth, innovation and investment by the information and communications technology sector.

Mr Lesina is based in AT&T’s Brussels (Belgium) office. He is an active member of several industry and community organizations, including current service as Chair of the Presidency Group of the American Chamber of Commerce to the EU. He is a Board Member of the European Internet Foundation and of the Transatlantic Business Council. He also represents AT&T in various associations such as ETNO, GSMA, ECTA, TPN, etc.

Prior to joining AT&T, Mr Lesina held senior positions with another leading US-headquartered ICT company (Intel Corporation) and a number of leading public affairs agencies in Brussels. Born in Dakar (Senegal), Mr Lesina is an Italian-Tunisian national and has a Master’s degree in the Economics of Development from the Catholic University of Louvain-la-Neuve in Belgium.


Mobilising our world: Driving investment and innovation in the wireless revolution



Haydn Shaughnessy

  • Contributor, Forbes
  • Haydn is the author of The Elastic Enterprise, an account of how stellar companies prospered during the recent recession. His next book is The Fluid Core, How Technology Is Creating A New Hierarchy of Need and How Smart Companies Respond.

    He writes on innovation and competitiveness issues for where his audience regularly exceeds 500,000 monthly. He has also written for The Wall St Journal and other leading outlets. He used to write the Convergence Culture column in The Irish Times.

    In the late 1980s he became involved in the EU’s attempt to create an ARPA-type unit, similar to the Advanced Research Projects Agency in the USA. That was the RACE programme. He caught the tech bug from that experience, discussing and writing about 3G mobile and broadband applications more than a decade before they became commonplace.

    He was educated at The London School of Economics and Oxford University and is a fellow at the Paul Merage School of Business, University of California at Irvine, and of the Society for New Communications Research and an adviser to organizations in transition.


3 degrees of Separation – How the Relentless March of Connectivity Transforms Markets


The theme is that connectivity is highly prized by consumers – more so than we usually realise – and over time that has led to an increase in scale-free or power-law economics, making it increasingly important for companies to incorporate consumers/customers at every stage of business.


Chris Book

Developer Relations Manager, Distimo


Chris Book is a Developer Relations Manager at Distimo, where he advocates for Distimo products to the developer community, showing both current and future clients how to gain value from their own app store data.


Show Me the Money! Learning from mobile games monetisation – how to create a successful app


Dr Catherine Mulligan

Research Fellow, Innovation and Entrepreneurship group, Imperial College London


Dr Catherine Mulligan is a Research Fellow in the Innovation and Entrepreneurship group. She is Principal Investigator on two RCUK Digital Economy Program grants: Sustainable Society Network+ and Scaling the Rural Enterprise.

In addition, Catherine is a researcher on WP4 (Business Models) for the Digital City Exchange and Co-Investigator on the “Unleashing the Value of Big Data” projects.

Catherine has 15 years international experience in the Mobile Telecommunications and ICT industries, including 10 years at Ericsson in Stockholm, Sweden. Working on a variety of cutting edge technologies, Catherine experienced first-hand the complexities of successfully taking innovation to market.

Her research interests lie in the area of new economic and business models enabled by the digital economy. In particular, Catherine is interested in the role that technologies play in the creation of citizen-centric smart/sustainable cities.


From M2M to IoT – the impact on Smart Cities

Digital technologies are often suggested as a panacea for the development of “smart cities” – cities that in some form integrate a digital infrastructure with the physical city in order to reduce environmental impact while improving quality of life and economic prospects. While these sorts of concepts have been around for several decades, the recent advent of smartphones and cheaper sensor technology means that digitally enabled, or “smart,” cities are fast becoming a real-world possibility.


The role of the Internet of Things (IoT) in cities is one that will only increase as the pressure on cities to deliver services at reduced costs for an expanding population increases. Many examples exist, including water management, transport management, and waste management. Due to the complexity in cities, the best way for them to achieve the desired outcomes of smart cities is to utilize an information marketplace approach, which allows them to combine together the data from extensive M2M and IoT investments and utilize positive externalities associated with technology in order to reduce environmental impact while creating jobs and economic prosperity for citizens.  Dr Mulligan will present some insights from her upcoming book “From M2M to IoT: An Introduction to a New Age of Intelligence”, published by Elsevier.


David Rogers

Founder, Copper Horse Solutions Ltd

  • David (@drogersuk) is a mobile phone security expert who runs Copper Horse Solutions Ltd, a software and security company based in Windsor, UK. He also chairs the Device Security Steering Group at the GSM Association and teaches the Mobile Systems Security course at the University of Oxford. He has worked in the mobile industry for over 15 years in security and engineering roles. Prior to this he worked in the semiconductor industry.

    David’s articles and comments on mobile security topics have been regularly covered by the media worldwide including The Guardian, The Wall Street Journal and Sophos’ Naked Security blog. His book ‘Mobile Security: A Guide for Users’ was published in 2013.

    David holds an MSc in Software Engineering from the University of Oxford and a HND in Mechatronics from the University of Teesside.

    David blogs from


The Future of Mobile Device Security


The mobile device is changing – is it a handset, a watch or an electricity meter? Is it secure? How do you secure it? What if a software update bursts my home’s hot water pipes? The ever-increasing complexity of the mobile eco-system is a potential nightmare for users yet security and safety remains an after-thought for some companies.


This talk will look at what is really happening and what should happen to ensure consumers remain safe and secure.



Dr Patricia Timoner

Strategy Consultant, Mobile Digital and Interactive Marketing South America Operations, GrowVC


With over 15 years of work experience in mobile, media and high technology, Dr Patricia Timoner has spent the last nine years as a Strategy Consultant in Mobile Digital and Interactive Marketing. Some of her most notable high-tech media/marketing consulting projects include developing the pilot for Sony to use the PlayStation Portable (PSP) as a mobile engagement and interactive platform; the strategic marketing plan for OpenSearch, the online advertising software; and a major marketing campaign for Smirnoff that achieved a 16% gain in market share. With major high-tech related work on four continents, Dr Timoner is a co-founder of Mobile Monday (MoMo) Sao Paulo and sits on its Board, as well as assisting several MoMo chapters in Latin America.

An expert on mobile business and media, with focus areas on social networking on mobile, advertising and the emerging opportunity of ‘engagement marketing’, Patricia has contributed to the evolution of thought leadership through speaking engagements, workshops, guest lecturing and writing. She has contributed a chapter to Pagani’s Encyclopedia of Multimedia Technology and Networking, as well as a chapter to Kotler’s Marketing (6th Edition). An internationally respected authority on high tech, Dr Timoner has served as a judge in prestigious international competitions such as the World Championships of E-Business and the Asia Pacific IT and T Awards.

Dr Timoner has completed major consulting projects such as the marketing strategy for Reniar AB, the customer relationship management project for the Clinica Medica General of Los Angeles, and the business development of the ANZMAC conference. She has helped in international expansion projects with companies such as HealthRider, Spinning, and Williams Worldwide Television. Due to rare experience managing numerous intercontinental high tech projects between Australia, Brazil, China and the USA, Dr Timoner is often engaged to projects of exceptional international issues. One of her most recognized achievements was the business model for the international OPTO VLSI project between Germany, Israel, South Korea, the UK and Australia. Dr Timoner was recognized as crucial to the project’s success.

Before her consulting, writing and lecturing career Patricia gained actual hands-on employment experience in the 1980s and 1990s at the executive level as the Managing Director of T-Connection Sao Paulo; at the operational level as Marketing Manager of Physician Care Management Los Angeles California and Marketing Manager at Queens Commercio e Industria in Brazil. Dr Timoner began her marketing and advertising career learning the basics as Junior Brand Manager at Heublein do Brasil, and Media Coordinator at MPM Propaganda.

A popular speaker seen at international conferences on five continents, Patricia chaired the Mobile Commerce Track at IRMA San Diego and the Electronic Marketing Track at the We-b Conference. She has presented to such events as the Mobile Content World, Digital Marketing and Media Summit, IABE Las Vegas, TeleViva Movel, European Conference on Information Systems, Mobile 2.0 and the M Payment and M Banking Congress. During her consulting career, Dr Timoner has been a frequent guest lecturer on mobile, digital and media topics at major universities such as ESPM and FAAP Universities of Sao Paulo, Liaoning University China, Trisakti University of Jakarta, and University of Western Sydney. Patricia Timoner holds a Ph D in Mobile Commerce from Edith Cowan University Perth Australia; with an MBA Degree specializing on E-Commerce; a Master’s Degree on Marketing; and a Bachelor’s Degree in Social Communications and Advertising.


Brazil – Mobile Market Overview


Caribbean. Mobile penetration is upward of 132% and still growing by about 7% annually. Yet the market has some very unique characteristics.


This presentation aims to provide an overview of the Brazilian mobile market and user habits, as well as scenario forecasts for the years 2015 and 2020.


  • Sean Kane
  • Co-Founder,
  • is a very dynamic community with a UK origin but a global following. Sean Kane has been active in growing web and mobile companies for 15 years in the media and social space, such as (SVP), Intercasting (GM) and Bebo (VP Mobile). He is also Co-Founder of Springboard Accelerator and assists numerous Startup Programs around the globe. He has been honoured to work with amazing founders across the world, has been fortunate to be involved in several IPOs and acquisitions and is an active mentor for entrepreneurial programs. Sean has done coursework at Harvard, MIT and ITAM.


Raising Money for Mobile Startups: Online and Crowdfunding



  • Jeanette Carlsson
  • CEO & Founder, @newmedia2.0
  • Jeanette is CEO and Founder of @newmedia2.0, the leading independent digital media and insight consultancy, advising brands and TMT companies on how to deliver improved business performance and growth through better use of digital and strategic user insights. She is also an advisor to the world’s largest mobile marketing company, UKTI’s ‘Catalyst’ programme, UK ‘Tech City’ and US tech start-ups, and a guest lecturer on digital, innovation & entrepreneurship at the University of Oxford and the University of Warwick.

    Jeanette has over 20 years’ professional experience leading, growing and advising companies from start-ups to large corporates on how to deliver improved business performance and growth through better strategy and use of digital communications, working with boards and senior executives of the world’s leading brands in the UK, Nordics, Western Europe and the US. Prior to setting up @newmedia2.0, Jeanette was MD of a leading Marketing Analytics company, and before that, spent 10 years at IBM leading and growing UK, European and global businesses in the TMT space. Jeanette has published widely on key strategic communications & digital issues facing businesses and is a frequent speaker at leading industry events.

    Jeanette has a postgraduate degree in Advanced Strategy from the University of Oxford; an MA and BSc in Economics from University College London and a BA in English from University of Copenhagen. Jeanette is bilingual Danish/English and also speaks Swedish, Norwegian, German and French.


What is your data worth?


As consumers consume content and interact with businesses across multiple platforms, companies collect a vast and growing bank of user data from multiple touchpoints from offline media to website traffic, mobile/tablet apps, social media, digital content, reader offers, transaction data etc. Turning this data into strategic, actionable user insights holds significant potential business value to companies in terms of empowering decisions and monetising strategies, so is increasingly becoming a strategic priority. Today, few companies capture or extract maximum value from this hidden ‘goldmine’. In this talk, Jeanette Carlsson, CEO @newmedia2.0 – the leading digital and user insight consultancy, looks at the data opportunity and shares her perspective on how best to leverage user data to develop a true understanding of users and their value with associated business benefits.



  • Dan Appelquist
  • Open Web Advocate, Telefónica Digital and blogger at Torgo
  • Dan is an American Ex-Pat living in London. He’s a father of two and husband of one. Dan is the Open Web Advocate for Telefónica Digital, focusing on the Open Web Device. Dan founded Mobile Monday London, Over the Air and the Mobile 2.0 conference series. Dan is an advocate for the open Web and for Web Standards.


East Side, West Side, Peace: How App Developers Should Leverage the Web


The web is 25 years old. During that time it has evolved from a system for sharing academic information to a ubiquitous, distributed application platform used by businesses, organisations and individuals the world over, and it has become one of the greatest engines of innovation the world has ever seen.


The evolution of this open platform has been unlikely and convoluted, and throughout its development it has been subject to attacks by parties whose interests are better served by closed, controlled platforms. The rise of app stores and native mobile applications is another such development. Don’t worry, though.


This talk is not going to be about apps vs. web. As apps platforms continue to mature, it’s becoming clearer how apps and the web should work together – how we can and should be leveraging the web to do what it’s good at and using native apps to do what they’re good at – ultimately to provide the best user experience across all platforms.


I’ll talk about the latest and greatest in mobile web application platforms (such as Firefox OS) and how best to use the Web alongside native apps across platforms. I’ll examine some (currently) broken user experiences and point the way forward to a world where apps and the web can live together.


Lilach Bullock

Founder, Socialable


Lilach is the founder and driving force behind Socialable, and highly regarded on the world speaker circuit. Forbes and Number 10 Downing Street have even been graced by her presence! In a nutshell, she’s a hugely connected and highly influential serial entrepreneur – the embodiment of Digitelligence.

Listed in Forbes as one of the top 20 women social media power influencers, and likewise as one of the top social media power influencers overall, she is one of the most dynamic personalities in the social media market. She actively leverages ethical online marketing for her clients and for Socialable.

After launching her first business within three years of becoming a mother, her financial success was recognised when she was a finalist at the Best MumPreneur of the Year Awards, presented at 10 Downing Street. Following a resultant offer, and wishing to spend more time with her daughter, she sold her first business to focus on social media, developing a multi-site blog and online marketing portfolio that generates in excess of 600,000 page views per month.

A business owner, social media consultant, internet mentor and genuine digital guru, Lilach is consulted by journalists and regularly quoted in newspapers, business publications and marketing magazines (including Forbes, The Telegraph, Wired, Prima Magazine, The Sunday Times, Social Media Today and BBC Radio 5 Live). What’s more, her books have achieved No 1 on Amazon for Sales and Marketing and Small Business and Entrepreneurship.

When Lilach isn’t working she enjoys spending time with her family and is an avid fan of Zumba.



Lee Omar

CEO and Founder, Red Ninja

Lee is the founder of Red Ninja, a high growth design led innovation company. He is an experienced application developer and smart city practitioner. He has developed real time Internet of Things data driven apps for ARM, BBC, TUC, Network Rail, Merseytravel, National Museums and Liverpool City Council. He developed his first geo-locational mobile social network app in 2009.


Lee works directly with Sir Mark Walport, Chief Scientific Advisor to the UK Government, on his Foresight Future Cities project, which is looking at the role of cities in 2065 and advising on the policy changes that should be implemented to get there. He is currently working with the National Health Service to develop mobile ambient assisted living technologies that enable people to live in their homes longer.

An expert in generating value from big data sets and the Internet of Things, he is currently advising the Connected Digital Economy Catapult on its funding strategy for the cultural sector around data. He sits on the British Standards Institute advisory group that defines the framework, and the ontology, for smart cities for local governments. Lee is also the Entrepreneur in Residence at XJTLU University (Suzhou/Shanghai).



  • Tineka Smith
  • Media Relations Specialist, Weber Shandwick Technology UK
  • Tineka Smith is a media relations specialist within the technology landscape. Her experience includes working with Weber Shandwick Technology while providing media relations expertise and content creation across a range of clients which have included Microsoft, BAE Systems, Capgemini, Gartner, Veracode and Ricoh Europe.

    Prior to Weber Shandwick Technology, Tineka was a reporter and junior news editor for Computer Business Review covering a range of technology topics including big data, mobile, security and social media. Tineka also worked as a tech contributor for the New Statesman business blog and previously worked for local newspapers and TV in the United States. Tineka has a BA (Hons) in Communications, French Studies and a Masters in International Journalism.


How Mobile is Changing the Media Landscape


This talk will take a look at how journalism has changed over the past few years due to the increasing use of mobile devices in reporting, and how it could potentially affect the future. Mobile has also spurred the rise of citizen journalism and has placed much of the power over how, and which, news is delivered into the hands of the public. In the coming years mobile may also affect how TV news is delivered, as more and more journalists switch to using cameras on their tablets to conduct interviews for news organisations.



Patrick Bergel

CEO and Founder, Animal Systems



Karen Barber?



Afternoon panellists



  • Peggy Anne Salz
  • Chief Analyst and Founder, MobileGroove


Peggy Anne Salz is lead analyst and founder of MobileGroove, a top 50 ranked influential destination that produces and promotes custom research, strategic thought leadership and knowledge resources for the global mobile industry. Her work, which includes 300+ articles on mobile marketing, mobile search, social media and mobile industry news and developments, has appeared in The International Herald Tribune, The Wall Street Journal (Europe & Asia editions), TIME, and in the Agile Minds column in EContent magazine, among many more.

Peggy is also a Gigaom Research mobile analyst, where her focus is mobile loyalty, mobile messaging and mobile retail. Her most recent report, Managing The Complete Customer Experience: Encouraging Engagement with Mobile and Apps, helps businesses understand and harness mobile to re-imagine the customer experience and super-charge sales and service channels.

Peggy has written nine books about mobile, both as a lead author and in partnership with global companies in the industry. Her most recent book, Apponomics: The Insider’s Guide to a Billion Dollar App Business (InMobi, 2014) provides actionable insights into how companies can market and monetize their apps. It builds on the success of her first book on mobile apps, The Everything Guide to Mobile Apps: A Practical Guide to Affordable Mobile App Development for Your Business (F+W Media, Inc, 2013), a practical, crowd-sourced book providing businesses and developers with insights on how to make, monetize and market mobile apps. She has also edited and produced the Mobile Operator Guide 2013: The Evolution of Mobile Services: Challenges, Strategies, Opportunities and the Mobile Commerce Guide 2013: Engage Customers & Build Loyalty in Developed and Emerging Markets. She is currently working on a new industry resource e-book that explores the impact of ultra-broadband mobility and how it can lay the groundwork for the ultra-connected society of the future.

Graduating with honours from the University of Pittsburgh, Peggy earned a B.A. in Philosophy of Science, Political Science, and Economics. She is a Fulbright fellow and a member of the International Who’s Who of Professionals.


Mick Rigby

CEO, Yodel Mobile Ltd

Mick founded Yodel Mobile in 2007, the first specialist mobile marketing agency headquartered in the UK.

Yodel Mobile is a strategically-led full service agency that offers best in class strategy, development and delivery for organisations looking to incorporate mobile successfully into their business.

Its clients include Kobo, The Daily Mail, IPC, Dennis Publishing, Hastings Direct, The Economist, Wall Street Journal and Sage.

Prior to Yodel Mobile, Mick was the managing partner of a communications planning agency. He has over 20 years’ experience in the marketing and advertising sector and has worked at some of the biggest media and advertising agencies.

He is passionate about mobile technology, especially how it can be used to enhance brand and consumer experiences.

Tony Pearce

Founder, gamesGRABR

Coming from a 20 year background in senior management in the entertainment and gaming industry Tony has an excellent track record as a successful entrepreneur and CEO.

Over the past 10 years Tony has raised over £15m in VC funding, started 5 companies and had 2 successful exits. Most recently Tony founded GamesGRABR, an innovative platform (web, mobile and tablet) that marries the power and usability of a contemporary ‘pinboard-style’ interface, allowing users to curate their own collections and to discover, engage with, play and share them with other users with similar interests.

Previously Tony founded a games company called TeePee Games and developed a technology that allowed users to get recommendations on games across social networks such as Facebook.

Tony was the CEO and Co-Founder of Player X and spearheaded the company from a two person start-up in 2004 and developed it into Europe’s largest mobile games distributor. The company was named by Library House as the fourth fastest growing VC backed company in the UK before it was acquired by Spanish mobile content company ZED in April 2009.

Tony is also the co-founder of an executive networking event called the Centurions which is aimed at the digital entertainment industry and takes place every two months in London along with events in New York, Istanbul and Munich.


Richard Downey

Director, The Mobile House


 Martin Wrigley

General Manager Europe, App Developers Alliance



Martin heads up the European arm of the Alliance as General Manager, Europe. He has more than 25 years of experience in telecoms and IT, with a wide background in development, solutions architecture and delivery.


Martin led the app developer services area for the mobile operator Orange between 2004 and 2012. He has since been actively involved with IT integrators, developers and European institutions, and is also Executive Director of AQuA, the App Quality Alliance.


The need for a Developer Industry Association – The App Developers Alliance


The talk addresses the industry’s need for an association that can help not only with business education, best-practice information sharing and peer-to-peer global networking, but also with representation to policy makers worldwide. It introduces the App Developers Alliance, which has been fulfilling this need in the USA for the past few years and is now expanding into Europe.



Application developer alliance launches in Europe ..

 I have covered the Application Developer Alliance before (Joel Spolsky named Chairman of Application Developers Alliance board of directors, and Application Developer Alliance: The app economy has a new catalyst)

So, it’s nice to see that the Application Developer Alliance has formally launched in Europe

This is good news because the app economy is key to European jobs, prosperity and competitiveness.

The Eurapp study finds a total of 1.8M jobs in the EU app economy, with €17.5bn in revenues.

And yet, that’s just the beginning! The future will be a lot brighter with grassroots innovation and entrepreneurs who will go on to create new companies – all of which will be ‘apps’.

We now accept that we live in a mobile-first world, yet governments and policy makers have not fully grasped how big the nascent app economy could be.

Apps affect everyone

I am building a start-up, feynlabs, which will be an app, as will many others; i.e. every new idea is an app.

The app ecosystem is also the theme of our conference at Oxford University in May.

I will watch this space with interest

Mobile World Congress review – from an IOT and Disruption perspective


Here is my review of the Mobile World Congress. This year, for the first time, I was invited to attend the GSMA Ministerial programme as part of my work as co-author of the Digital world in 2030 report, to be released in the European parliament next week.

This blog is a personal perspective (i.e. related to the above report – but my own views).

It tracks disruptive trends and presents a perspective on the industry. It is biased towards IOT, which is also a personal focus, especially due to my teaching of “Big data analytics and algorithms for cities” at the City sciences program of the Technical University of Madrid.

We now live in a mobile first world. I was not carrying a laptop, and this single image (Android with Intel inside) shows us that we are indeed in a mobile first world.









The ‘Mobile first world’ also explains the sheer size of MWC (85,000 attendees, a 20% jump from the previous year): 45 restaurants, two heliports, 1,700 exhibitors, eight exhibition halls and 240,000 square meters (2.6 million square feet).

If you are not a part of it – you are missing out!

So, here are my perspectives ..


Firstly, here is my overall perspective, which I also discussed in my keynote at the Swiss mobile association (smama) in Zurich (slides).

  • The Mobile data industry as it stands today is about fifteen years old.
  • It is fast growing but mature. We now have a two-horse race for devices (considering Samsung owns about 70% of the Android market).
  • In innovation terms, innovation shifts from building networks to creating light bulbs. By analogy with electric networks: in the early stages there was a period in which networking technologies competed (AC vs. DC, Edison vs. Tesla); once that was decided, innovation shifted to creating light bulbs. We are in a similar phase.
  • So, innovation shifts along three dimensions: horizontal (apps), vertical (cross-stack, e.g. IOT) and network.
  • Why is network the third dimension of innovation? Unlike electric and power networks, which have a 50/60-year cycle, Telco networks have a 7-year cycle (from standardization to spectrum to devices). That means the rate of Telco innovation is comparatively faster, and every network innovation leads to a knock-on effect for secondary innovation (e.g. devices). Hence, 5G is important: What will 5G look like, and how will 5G shape the technological landscape of countries for the next decade?
  • Since 5G is widely expected to be deployed around 2020, the question then is: how will the market play out between now and 2020? There are some key indicators already. iBeacon could become a de-facto standard since it is an ‘open enough’ standard (i.e. both iPhone and Android devices can use it). Many of the use cases for mobile couponing etc. promoted for NFC could well be deployed with iBeacon. More interestingly, iBeacon could motivate retailers to open up their WiFi networks and allow others to use them (in return for in-store coupons).
  • Similarly, Hotspot 2.0 would allow us to seamlessly navigate between cellular and WiFi networks.
  • This means that by 2020 we could live in a world with primarily localized connectivity, and 5G would then make that connectivity pervasive (like ‘air’).
  • In this world, Operators would have much better visibility of their customers (through, say, Hotspot 2.0) and would focus on being much more customer-centric by truly leveraging their data (Big Data, IoT etc.).
  • To quote John F Kennedy, a rising tide lifts all boats; thus, we have an optimistic view of the industry.
  • Thus, Hotspot 2.0 and Bluetooth Low Energy (on which iBeacon is based) will be interesting. It will also be interesting to see how Zigbee fares in the context of Bluetooth; the jury is definitely out on Zigbee, as it is on NFC (Apple’s iBeacon mobile payment system is the death knell of NFC). NFC and Zigbee have taken too long to gain critical mass, and normally when that happens, something else takes over.
  • More interesting from a connectivity standpoint is 900 MHz WiFi, also called the 802.11ah standard. At MWC, 900 MHz WiFi made a debut; it could provide WiFi with dedicated bands for ultra-reliable, always-on machine-to-machine connections.

Why the focus on IOT ..

2014 was clearly the year of IOT. IOT is not M2M (machine to machine). M2M is a telecoms term which implies that there is a radio (cellular) at both ends of the communication. IOT, on the other hand, simply means connecting to the Internet.

The two are not the same!

From an IOT perspective, we can see connectivity in three ways:

a)      Things connected to the cellular network directly and communicating mainly via cellular (machine to machine)

b)      Things connected to Internet, speaking to each other  in an autonomous, discoverable, peer to peer mode (ubiquitous computing)

c)       Things connected to the mobile phone and then to the Internet

(a) is too expensive for the mass market. (b) is too futuristic. (c) is happening now, and is my main focus here.


MWC is all about devices!

  • LG, Samsung and Sony dominate, e.g. with Sony’s Xperia Z2.
  • In contrast, devices from Huawei and ZTE did not seem impressive. It will be interesting to see whether they succeed as device vendors or continue to remain infrastructure vendors.
  • Both Nokia and Blackberry (former leaders) remain uncertain, with Nokia adopting Android.
  • New devices like the Yotaphone 2 (screen on one side, e-reader on the other) provide new interfaces.
  • Some vendors, like Kyocera, introduced some quirky concepts.
  • Both Sailfish (Jolla) and Ubuntu had a presence, but it is too early to say how these devices will fare in the future.
  • There were devices, and then there was the Samsung Galaxy S5! Apart from the usual (5.1-inch screen, 16-megapixel camera etc.), we also have for the first time a fingerprint scanner and support for 128 GB memory cards.
  • The Galaxy S5 also allows you to track your heartbeat. In conjunction with the Samsung Gear Fit, we have the makings of a new class of IOT device targeting healthcare.


Emerging market devices

Devices targeting emerging markets and the cost conscious customer were prominent.

  • The Nokia X (with Android) strategy can be seen in this light, since the devices support up to 75% of the apps available on Google Play. The Nokia X, X+ and XL will cost €89 (£73), €99 (£81) and €109 (£89), respectively.
  • The $25 Firefox OS phone comes in the same class and challenges the Nokia X.
  • The Facebook/WhatsApp strategy could also be seen in that light.


Automotive was also a strong theme, especially with Ford Sync 2, the Mercedes QNX-based system, Connected car solutions at MWC 2013 (m2m), the Car Connectivity consortium and AT&T-GM.


  • In an age of Snowden, privacy has taken center stage. Mozilla announced the Future of Mobile Privacy to secure data easily.
  • Samsung announced the Knox security product, targeted at small and medium enterprises. According to Computerweekly: “The SME offering allows dual ‘personalities’ to be set up on a smartphone to allow one handset to function as two separate devices, keeping the data from each, whether personal or corporate, contained and secure.”
  • And then there was the Blackphone, dubbed the Snowden phone (Blackphone web site). The ‘privacy-first’ phone runs a customized version of the Android OS.
  • As mentioned before, the Galaxy S5 already has fingerprint recognition.
  • My analysis: it will be interesting to see how many people actually buy secure phones for themselves (i.e. not popular phones like the Galaxy S5 which have secure features built in). In my experience, many people (with geographical exceptions like Germany) say they want privacy but actually behave in the opposite manner (or do not pay for it).


I have already explained the significance of IOT above .. Here are some interesting observations






Comments/feedback welcome at ajit.jaokar at  Follow me @ajitjaokar

For a copy of the Digital world in 2030 report, to be released in the European parliament next week, follow the link.