Data Science at the command line – Book and workshop ..

I am reading a great book called Data Science at the Command Line.

The author, Jeroen Janssens, is running a workshop in London on data science at the command line, which I am attending.

Here is a brief outline of some of the reasons why I like this approach:

I have always liked the command line, ever since my days of starting out with Unix machines. I must be one of the few people who actually want a command-line mobile phone!

If you have worked with command-line tools, you already know that they are powerful and fast.
For data science especially, that is relevant because of the need to manipulate data and to work with a range of products that can be invoked through a shell-like interface.
The book is based on the Data Science Toolbox, created by the author as an open source tool, and is brief and concise (187 pages). It focuses on specific commands and strategies that can be linked together using simple but powerful command-line interfaces.
Examples include:
tools such as json2csv, the tapkee dimensionality reduction library, and Rio (created by the author). Rio loads CSVs into R as a data.frame, executes the given commands, and returns the output as CSV or PNG.
run_experiment, a scikit-learn command-line utility for running a series of learners on datasets specified in a configuration file.
tools like topwords.R
and many others
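
To give a flavour of what linking commands together looks like, here is a minimal sketch that uses only standard Unix tools (tr, sort, uniq, head, awk) rather than the tools from the book. The top_words function name and the sample sentence are assumptions made for illustration. The sketch is loosely inspired by the idea of counting frequent words and is not the book's topwords.R.

```shell
# top_words N: read text on stdin, print the N most frequent words as "word count".
# (Hypothetical helper for illustration; not the book's topwords.R.)
top_words() {
  tr -s ' ' '\n' |         # one word per line
    sort |                 # group identical words together
    uniq -c |              # prefix each distinct word with its count
    sort -rn |             # highest count first
    head -n "$1" |         # keep the top N
    awk '{print $2, $1}'   # reorder to "word count"
}

# Sample input (made up for this example).
printf '%s\n' 'the quick brown fox jumps over the lazy dog the fox' | top_words 2
```

Each stage does one small job and passes its output to the next through a pipe, which is the style of working that the book builds on.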
By coincidence, as I was working on this post I read this article: Command-line tools can be 235x faster than your Hadoop cluster.

I recommend both the book and the workshop.

UPDATE:

a) I have been informed that there is a 50% discount on the workshop for students, academics, startups and NGOs.
b) Jeroen says that the book is not really based on the Data Science Toolbox, but rather provides a modified one so that you don’t have to install everything yourself in order to get started. You can download the VM HERE