European data miners: “We were told to relocate our servers to the US”

Tech firms are speaking out against changes to EU copyright rules that they say could force them to leave Europe for the sake of protecting their businesses.

The rules in question concern text and data mining (TDM), the practice of uncovering and analyzing patterns from indexed data sets, ranging from academic papers to social media posts. It is a widespread practice among data analytics companies, who index public content from the Internet and mine it for meaningful patterns.

But publishers, working in concert with the European Commission, say that the act of data mining needs to be more strictly regulated.

The Commission is in the process of overhauling European copyright laws and says that any for-profit company that mines data from copyrighted material must pay for the right to extract and analyze that data.

The EU executive has styled itself as the guardian of the publishing industry, saying outright that the copyright rules will protect publishers from being “rolled over” by digital competition.

“When, as publishers, you do not come out in support of this proposal, you place a time-window on your own economic and cultural futures. Start writing about the balance offered by this proposal, which is essential to the survival of your own publishing houses against data-driven and capital-intensive online platforms,” said European Commissioner Gunther Oettinger to a conference of German publishers in 2016, seven days after the Commission published its proposal on digital copyright.

Publishers have welcomed the proposal, saying that the changes will help them recoup losses in the dog-eat-dog world of declining digital copyright revenues.

“Enabling TDM is very costly and requires significant investment which justifies a charge when commercial companies wish to profit themselves from this effort. Not-for-profit research institutions mining content for a non-commercial purpose do not profit from these efforts, which is why our publishers have undertaken to allow mining for them at no additional charge,” said Matt McKay, Director of Communications at the International Association for Scientific, Technical, and Medical Publishers.

Data miners, however, say that if they pay to access copyrighted material, such as newspaper articles or scientific magazines, they should be able to harvest the material for any data it contains.

“The additional licensing requirement won’t generate more revenue for publishers who hold the rights. As soon as you have legal access to something, you should be able to read it and to mine it,” said Lenard Koschwitz, a lobbyist at Allied for Startups, an association of European startups who are fighting against the new rules.

Miners flock to the digital gold rush

Data mining firms have blossomed in recent years as actors in the digital economy have increasingly recognized the economic potential of big data.

For example, data mining a large collection of indexed scientific papers could suggest a yet unknown association between a pharmaceutical compound and a disease, even if none of the articles had explicitly suggested a connection between the two.

Currently, data analytics companies operate in a legal grey zone when they mine copyrighted material in Europe, as data mining is not yet a legally defined practice in existing EU law.

But according to the Commission’s most recent proposal, which defines data mining as “automated computational analysis of information in digital form”, the right to mine will only be freely accessible for research institutions and scientific researchers, meaning that for-profit companies will have to pay copyright holders for mining licenses in addition to access.

Some companies have said this will simply encourage them in the long term to relocate to other jurisdictions, where copyright laws are much more lenient.

In the United States, for example, a court ruling in 2013 in a lawsuit brought against Google for its Google Books Library Project found that data mining cannot be considered a copyright violation since it is a “transformative” act and does not merely copy the material (and thus violate copyright).

“Google Books is transformative in the sense that it has transformed book text into data for purposes of substantive research, including data mining and text mining in new areas. [It] does not supersede or supplant books because it is not a tool to read books,” said the court in its ruling.

However, based on the Commission’s original proposal on copyright from 2016 and its progress this year in the European Parliament, the EU does not seem prepared to give any such leeway.

Patrick Bunk, the CEO of Berlin-based Ubermetrics, a cloud-based media and data analytics software company which engages in data mining, described how his attempts to convince EU officials fell on deaf ears.

“I spoke with many attachés of national governments who will be involved in amending the legislation. They said outright, ‘There’s no point in discussing this. Just leave Europe. The publishers are too strong,’” he said.

He also noted that the rules on data mining might impede the growth of artificial intelligence firms in Europe, which rely on algorithmic analysis of large amounts of data to build their intelligence.

“If we close the tab on data mining, we leave machine learning only to those who have their own massive banks of data to build it: that’s to say, only Google and Facebook,” said Koschwitz of Allied for Startups.

The future of digital copyright

The copyright file, which EU lawmakers are still debating and which remains subject to change, could be amended to accommodate the concerns of data analytics firms.

However, political will to support the right to data mine alongside the right to access is evaporating as other parts of the copyright legislation, including a tax on hyperlinks and fair remuneration clauses for artists, draw equal amounts of controversy.

“The likelihood of is currently not high. The issue is currently getting less attention than the debates about an extra copyright for news sites (which would affect the freedom to hyperlink) and about whether internet platforms should be forced to scan all user uploads for copyright infringement. There are also fewer campaigns to raise public awareness on the topic,” said Julia Reda, a German MEP who has been working on the copyright file since last year.

Given the seeming inevitability of the licensing requirements on TDM and their desire to stay out of the controversy, some companies are even afraid to speak out against the data mining rules.

“I have decided not to talk about this topic publicly anymore, as investors get worried when I do. It brings an unnecessary spotlight and additional risk onto the firm,” said one founder of a British data analytics firm who declined to be named for this article.

He also argued that early-stage startups should not worry too much about these abstract regulatory questions.

“The reality for a seed stage startup is there are a lot more urgent things to think about. should really be considered post hoc and done to lift investors’ concerns.”

But to some of his colleagues, that view is naïve, wishful thinking about regulatory compliance, and the problem is likely to come back with a vengeance.

“We need to deal with this problem now. Even though we believe the highest courts will side with us eventually, it could take up to twenty years to reach legal certainty. That’s too long for business in Europe,” said Bunk of Ubermetrics. 

Still, the long-term question is whether the uncertainty is substantial enough to push companies away from Europe altogether.

“Europe is establishing barriers to entry and market disadvantages for TDM and AI startups that don't exist in other parts of the world, essentially telling them to take their business elsewhere. I find that regrettable, but understandable,” said Reda.

“I want to continue to operate a search engine, and I want to keep my business here in Europe. But based on what I've been told in Brussels, which is to relocate my servers and headquarters to the US, I cannot,” said Bunk.

Featured image credit: Xiang Gao / Unsplash
