Open source Big Data Smart City algorithms …

Following on from my last blog about Big Data for Smart cities (Big data for Smart cities – How do we go from Open Data to Big Data for Smart cities), here are some more thoughts.

Extending that, I propose in this blog the creation of Big Data Smart city algorithms which are open sourced.

I hope that this blog could be the start of something very interesting …

I proposed this idea to some folks from the Liverpool Smart city initiative. We plan to have a meeting about this soon, either in London or in Liverpool. If you are interested, please email me at ajit.jaokar at  but in any case, we will share more information here.

Open Source Smart City algorithms ..

 There is always the perennial question of: What makes a city smart?

 I have discussed these ideas before and the answer is broadly a mix of People, Grass roots innovation, Sensors, Open Data, Big Data (analytics) and technology.

To this mix, we could add ‘algorithms’. Extending the ideas of Big Data to Smart cities, algorithms could play a key role in ‘what makes the city smart’.

 In a nutshell, I propose a plan to create Open source Big Data algorithms for smart cities. 

The goal is to create and release Big Data algorithms for Smart cities as Open source (perhaps as an Apache project). By doing so, all cities could use these algorithms in their own way. If we use a license like the Apache License, we could also encourage various entities to create their own implementations.

The problem spans many domains .. So where to start?

The obvious starting point is to look at existing city level problems where algorithms could be applied. After this, we could look at Big Data algorithms and then apply these algorithms to City level problems.

By releasing them as open source, other cities could contribute to these algorithms.

 Smart cities and Big Data

From my previous blog, here is a list of services in a city that could be improved through data/algorithms:

  • Environmental services (e.g. reduced pollution)
  • Recycling/waste disposal
  • Optimal use and location of infrastructure
  • Traffic management
  • Transportation
  • Consumer advice based on real time data
  • Healthcare
  • City Planning (zoning, construction, transport, airports)

Each of these, and many others, could benefit from algorithms based on Big Data.


Big Data algorithms

Now let us switch hats and look at Big Data algorithms

This space is still very new. But, here are some thoughts/resources

  • Many of the algorithms used for Big Data are predictive analytics algorithms. These have been in use for a few years and benefit from existing computing techniques. Predictive analytics encompasses a variety of statistical techniques from modelling, machine learning, data mining and game theory that analyze current and historical facts to make predictions about future events (Wikipedia). In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities; credit scores are an example. The difference lies in how to apply these techniques to very large data sets (Big Data) and, specifically in our case, to city level data.
  • In many instances, we may have to re-apply ideas from other domains to city level problems. Atbrox has a very good set of resources for mapreduce-hadoop algorithms. These include – Search, Behavioural targeting, Astronomy, Social Networks, Bioinformatics/Medical Informatics, Machine Translation, Spatial Data Processing, Artificial Intelligence/Machine Learning/Data Mining, Clustering, mining large-scale rich-media data, Search Query Analysis Simulation, User-based collaborative filtering recommendation algorithms on Hadoop, Genetics, Approximation Algorithms, Game theory, Mining Algorithms of Data in non-traditional formats (unstructured, semi-structured). There is also an excellent workshop at Stanford University, with papers available in most cases – the Workshop on Algorithms for Modern Massive Data Sets (MMDS).
  • Machine learning algorithms: Finally, there could be a role for machine learning algorithms, such as NLP and Genome algorithms, and for libraries like Apache Mahout. Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that take as input empirical data, such as that from sensor databases, and yield patterns or predictions thought to be features of the underlying mechanism that generated the data.
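
To make the ideas above a little more concrete, here is a minimal sketch in Python of what a city level predictive algorithm might look like. It is only an illustration under stated assumptions: the traffic readings, the junction names and the function names are all made up, and a real system would run the map and reduce steps on something like Hadoop over far larger data. The sketch first averages vehicle counts per hour in a mapreduce style, then fits a straight line (ordinary least squares) to the hourly averages to predict the next hour.

```python
from collections import defaultdict

# Hypothetical readings: (sensor_id, hour_of_day, vehicles_counted).
READINGS = [
    ("junction_a", 6, 120), ("junction_b", 6, 80),
    ("junction_a", 7, 210), ("junction_b", 7, 190),
    ("junction_a", 8, 330), ("junction_b", 8, 270),
    ("junction_a", 9, 380), ("junction_b", 9, 420),
]


def map_reading(reading):
    """Map step: emit a (key, value) pair keyed by hour of day."""
    _sensor, hour, count = reading
    return hour, count


def reduce_by_key(pairs):
    """Reduce step: average all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) / len(values) for key, values in grouped.items()}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Aggregate the raw readings, then predict the average count at 10:00.
hourly_average = reduce_by_key(map(map_reading, READINGS))
hours = sorted(hourly_average)
slope, intercept = fit_line(hours, [hourly_average[h] for h in hours])
predicted_10am = slope * 10 + intercept
print(hourly_average, predicted_10am)
```

A straight line is of course far too simple for real traffic, which rises and falls through the day. The point is only the shape of the pipeline: aggregate city data at scale, then apply a predictive model to the result.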


Adapting algorithms from other domains to cities

I have been fascinated by how algorithms can take learning from one domain and apply it to another. So, I see many areas from which we could apply learning to Smart cities (as algorithms). Here is a fascinating example: a study which says that the English language originated in Turkey! (More precisely, it places the origin of the Indo-European language family, to which English belongs, in Anatolia.) The algorithm in this case, phylogenetic analysis, was originally used to understand the propagation of viruses. These ideas for virus propagation were then used to understand the origin of languages. So an algorithm from one area, applied to another, gives interesting results.
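
As a rough illustration of this kind of transfer, here is a small Python sketch that borrows a simple tree-building technique from phylogenetics (UPGMA, which repeatedly merges the two closest groups by average distance) and applies it to city districts instead of species. This is not the method used in the language study, which is far more sophisticated. The district names and the three indicators per district are hypothetical.

```python
from itertools import combinations

# Hypothetical district profiles:
# (traffic index, air pollution index, recycling rate), each scaled 0..1.
DISTRICTS = {
    "docklands": (0.9, 0.8, 0.2),
    "old_town": (0.8, 0.7, 0.3),
    "riverside": (0.2, 0.3, 0.8),
    "parkside": (0.1, 0.1, 0.9),
}


def distance(a, b):
    """Euclidean distance between two district profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_tree(profiles):
    """UPGMA-style clustering: keep merging the two closest clusters.

    Returns a nested tuple that describes the resulting tree.
    """
    # Each key is a subtree; each value lists the districts it contains.
    clusters = {name: [name] for name in profiles}

    def average_distance(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        total = sum(distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(*pair),
        )
        members = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = members
    return next(iter(clusters))


tree = build_tree(DISTRICTS)
print(tree)
```

With these made-up numbers, the two congested districts end up on one branch and the two greener districts on the other. A city could use such a tree to find districts that resemble each other and might benefit from the same interventions.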

What other areas could we draw upon and apply algorithms to cities?

Why open source?

I am not a big fan of ‘one system to rule them all’ for Smart cities, i.e. by definition, we will have a heterogeneous set of systems and technologies in a typical smart city. But trust is the most important element. The Pew Internet report says that some predict that algorithms will most negatively impact the lives of …

Here are some comments from the report .. (emphasis mine)

  •  Steve Sawyer, professor and associate dean of research at Syracuse University; an expert of more than 20 years of research on the Internet, computing, and work, wrote, “Our vision of the data is based on our vision of the world, and this vision is not very broad-minded when it comes to Big Data. We tend to emphasize the parietal insights of a particular form of economic thinking, and we tend to frame social analyses through a form of soft colonialism. Such bias, combined with the arrogance of technical competence, will create huge disparities between ‘what the data say’ and the lives of billions of people.”

  • Brian Harvey, a lecturer at the University of California-Berkeley, noted, “The collection of information is going to benefit the rich, at the expense of the poor. I suppose that for a few people that counts as a positive outcome, but your two choices should have been ‘will mostly benefit the rich’ or ‘will mostly benefit the poor,’ rather than ‘good for society’ and ‘bad for society.’  There’s no such thing as ‘society.’ There’s only wealth and poverty, and class struggle. And yes, I know about farmers in Africa using their cell phones to track prices for produce in the big cities. That’s great, but it’s not enough.”

  • Ebenezer Baldwin Bowles, owner and managing editor of, wrote, “With Big Data comes Great Power, and neither shall be used wisely for the common good. The objective is not to reveal opportunity for the elimination of scarcity among the many, but to identify fertile ground for exploitation and control.”

  • Paul McFate, an online communications specialist based in Provo, Utah, said, “New media channels will continue to splinter consumers and enhance the social divide. Intelligent people will use the information well, but the average person will continue to look for bright shiny objects that will entertain. Abusive people will continue to abuse. Providing access to data does not change moral behaviour.”

  • Daren C. Brabham, an assistant professor of communications at the University of North Carolina-Chapel Hill, said, “Our reliance on algorithms is already proven to be problematic, evidenced by the fickle nature of the stock markets and other things. As we keep funneling the best and brightest mathematicians into algorithm-focused professions (like finance), we’ll continue to abstract real labor and real human concerns further away from real consequences and circumstances. This is a massive ethical problem, too.”

  • David A.H. Brown, executive director of Brown Governance Inc., a consulting business based in Toronto, Canada, noted, “Democratization is the issue; this has tremendous implications for social structure and social order (increasing pressure by ‘have-nots’ on ‘elites’) as well as privacy, family, and culture. A big unanswered question is who will control Big Data?  Whomever controls the information will have greater power and influence, and they may use this for positive or negative results.”

Thus, Big Data does not exactly engender trust with the public.

More so, when applied to Smart cities, there is an obligation for trust and transparency.

So, to conclude: by making Big Data algorithms for Smart cities Open Source, I hope we can get contribution, transparency and trust. Any comments welcome. Please contact me at ajit.jaokar at if you want to contribute or stay in touch.


Image source: Apache Mahout project