Inside DataSift: A one-stop shop to build on social data?

Contributing writer Max Tatton-Brown paid a visit to DataSift, the UK-based 'human data intelligence' company that is poised to become Europe's next enterprise software 'unicorn'.

What did Twttr ever do for us, eh?

With its dying breaths, Odeo spawned its little SMS-based service with nary a second glance. But then something remarkable happened. A community quickly saw the potential and punctuated the protocol with meaningful creations like @s, RTs, #s — just as the company fleshed out its official brand with vowels.

This was the real journey that made Twttr become Twitter.

Behind the scenes, the community also scattered the seeds that would become some of Twitter’s biggest business advantages. It’s easy to forget that the little social media buttons to Retweet, Like and share also create an outpost of data signals that helps the networks understand the value of content.

It’s here that DataSift’s story begins.

Long before the Like button, the team that became DataSift made a deal with Twitter: you get the RT button and an install base of 500,000 domains, we get long-term firehose access.

Soon, the company became one of only two suppliers licensed to resell access to the realtime stream of every single tweet — a strategic advantage that money literally couldn’t buy at the time.

And here’s where the real crossroads starts. Of the two companies, Gnip launched itself guns blazing into licensing that raw, unfiltered data far and wide. But DataSift claims to have taken a different path.

DataSift founder Nick Halstead puts it this way: “Most people, when they view DataSift for the first time, look at our Twitter relationship and relationships with the other networks and think that we’re predominantly a ‘brokerage company around licensing’.

“But Social is a very large dataset. It’s multiple billions of items every day. Our goal is to make it so that if you have an idea for a business and you want to implement it in the shortest space of time, you can focus on user experience and building a good business, not have to hire 50 engineers just to know what to do with the data to start off with. We call it human data intelligence.

“We’re trying to understand what people say.”

Privacy in practice

But it’s not all just about private enterprise. Halstead cites an example with echoes of Apple’s recent ResearchKit presentation.

“One example that has always interested me is health and the market of understanding how people react to particular drugs. There are millions of people talking about ‘I took this, I was sick the next morning.’ If it could be solved, it could be a huge social benefit. If you could see that some drugs actually benefit other illnesses.”

This goal of creating value from the swarms of social data people post about themselves is not without uncomfortable wrinkles. Take this case study, from the DataSift website:

“LocalResponse worked with a major media advertiser to run a campaign around the movie ‘Sparkle’ which featured Whitney Houston as one of the main actresses.

“Despite the lack of an official Whitney Houston Twitter account, LocalResponse identified her fans by running queries to search for those who had expressed sadness on Twitter after her death. The company then targeted that engaged audience with ads for the movie.”

This model of creating value on the back of people’s careless data leakage is starting to creak as more and more people become aware of it.

But if DataSift is built around the idea it provides solutions to anyone wanting to create a business on social data, it is well placed to improve best practice and responsible behaviour across whole industries. It’s not just its technical benefits that scale to its customers.

As Halstead puts it: “If you think of the way the banking industry has always treated data, clearly they aren’t going to sell my credit card purchase data directly. They summarise the knowledge and you end up in a bucket of a demographic and that’s the way this industry is going to move.”

In timely fashion, last week saw DataSift put its money where its mouth is on the topic, announcing a new partnership with Facebook built around this more anonymised, demographic information.

It’s not about targeting careless Whitney-mourners for movies; it’s about helping marketers understand customers’ conversations at scale in a non-invasive way.

Or, in their own words: “make it much easier for the whole industry to understand what people are saying and deliver that technology to other software developers.”

Building an enterprise unicorn

Rather than cashing in its chips on its Twitter firehose access and building a sales business, DataSift is adamant that its mission is more noble. Halstead describes the desire to create “a big tech company with a weight of IP” and cites a goal of “weaponising the enterprise market”.

It’s clear the company’s sights are set far beyond social.

“The only company we’ve ever thought of as an old model of what we’re becoming is Autonomy. But that was very old school, on premise, pre-baked everything for you, wasn’t customisable and wasn’t real-time,” Halstead says.

DataSift's founder talks about Autonomy’s IDOL or “Intelligent Data Operating Layer”, which pitched itself as something of a holy grail in the enterprise space, bringing together structured and unstructured data from email to databases, audio and almost any other type you choose to name.

For better and worse, it’s not a new problem and, for Autonomy, that mission famously ended in tears. So what makes DataSift different? In many ways, having grown up wrestling firehoses has defined the company’s approach at its core.

Halstead explains: “Social has taught us to be a very different kind of business. We’re very real-time and the technologies about becoming real-time are all unique.

“At any one time, we’re dealing with several hundred thousand live streams of data that customers are consuming. For every one of those streams, there’s between 1,000-10,000 lines of code being executed on top of 10,000-100,000 items a second. The scale of the processing is just vast.”

“With our technology stack, all of it’s bespoke because the scale we’re working at is beyond any open-source project. When people say they can do some of the things we do using Hadoop, Spark or Storm, those are all toys compared with seven years of hard work and now 135 people building technologies to do things at the scale we operate at.”

Bold claims. But with a precedent building ambitious social projects before they were even a glint in Twitter’s eye, and formative years juggling a stream of data that’s peaking at 500 million items a day, DataSift found itself in the right place at the right time.

But we need to stop talking about this company in terms of its past. Crucially, Halstead chose to double down on those advantages and use them to extend the company’s lead instead of filling a chest with cash.

With talk of a Series D coming up fast, this is a European company that has earned the right to be considered in a broader light. The 140 characters that matter here are the growing team of engineers that will determine whether or not the company can make that great escape.

(Featured image + all images used in this post property of
