Insights that work, from top 50 innovators

As much of our recent content has focussed on data or analytics, it’s good to start with something for research leaders.

Each year, Greenbook in the USA runs a survey to identify the most innovative research agencies/suppliers. From this, and from their team of expert contributors, they identify the top innovators.

The first free resource I have to share is a collation of success stories & lessons learnt from these firms. Greenbook asked each to contribute an example of how they deliver insights that work for their clients.

The result is an interesting & creative list of research projects for clients as diverse as Sky Team, Facebook & Shell Oil. Well worth a download & read to help spark your own ideas, with some innovative methods:

Fixing your problem of R programs being so slow

I mentioned, in the post I shared on the Julia programming language, that one reason for its existence was performance. Many analysts have told me of their frustration with R programs taking so long to run. This is especially true with large data sets.

Now don’t get me wrong, R still has plenty of fans. Although Python may now be the most used Data Science programming language, R has strengths. After all, it is a language designed from the start for use in statistics or data science. But can anything be done about its speed (or lack of it)?

Well, as a present to Data Science leaders, I’m pleased to share this very useful blog post from Emily Robinson. On GitHub she shares a case study with plenty of worked examples. She directly addresses the issue of how to improve the performance of her R code.

I hope this saves you time:

Advice to help with SQL & Excel (yes they are still used)

When I train analysts or data scientists, I am often surprised by how many still rely on SQL & Excel. Media coverage may suggest that everyone has moved on to coding in R or visualising in Tableau, but I beg to differ.

Experience tells me that just as businesses struggle with large legacy systems, they also rely on legacy data & analytics tools. In fact, even those who are piloting the use of R or Python often still need to interface with SQL or Excel for data access.

So, are there any tips or advice for those still using such tools? What about newer data scientists, those who never ‘cut their teeth’ by stretching SQL & Excel to their limits? Well, I think I’ve found two resources to help both communities. Given this blog aims to be pragmatic & realistic, I am also pleased to give airtime to SQL & Excel. Both are still key skill-sets for many of today’s analysts.

First up, here is a short tutorial on using SQL window functions. Alex Yeskov does a great job of providing simple steps & clear examples. Worth considering for simpler, clearer SQL code.
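To give a flavour of what window functions offer, here is a minimal sketch of my own (not taken from the tutorial). It uses Python's built-in sqlite3 module so that it runs without a database server, and it assumes the bundled SQLite is version 3.25 or later, which is when window functions were added. The sales table, its columns & its figures are all invented for illustration.

```python
import sqlite3

# Hypothetical sales figures, invented purely for illustration.
ROWS = [
    ("North", "2023-01", 100),
    ("North", "2023-02", 150),
    ("North", "2023-03", 120),
    ("South", "2023-01", 200),
    ("South", "2023-02", 180),
]

# Two window functions in one query:
#   - a running total per region, ordered by month
#   - a rank of each month's amount within its region (1 = highest)
# Unlike GROUP BY, every input row is kept in the output.
QUERY = """
SELECT region,
       month,
       amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
FROM sales
ORDER BY region, month
"""


def run_window_query(rows=ROWS):
    """Load the rows into an in-memory table and return the query results."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (region TEXT, month TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_window_query():
        print(row)
```

Without window functions, the running total would typically need a self-join or a correlated subquery, which is where the "simpler, clearer" benefit tends to show.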

Second, for those already familiar with coding in Python (ideally using the recommended Pandas module), here is how to access Excel. In this blog post, Harish Garg does a great job of stepping you through coding access to Excel data sources:

Last but not least, Data Viz tool for multivariate mapping

Data Visualisation continues to be a popular topic for this blog, so let’s provide a present there too. Those who have attended my Data Visualisation training course will know that I recommend two online tools there.

When deciding on colour combinations to use for categories, ColorBrewer can really help. When seeking inspiration for how to visualise tree or network data, TreeVis is a very helpful library.

So, I’m glad to complement these by recommending a blog post to help with your choice of maps. In this post, Jim Vallandingham helpfully walks through some of the popular options and what you should consider. His focus is on presenting multivariate data on top of geospatial maps. Worth reading through & experimenting with those that are new to you:

I hope you were able to benefit from at least one of those free resources.

