brentp.net

10 August 2014 by Brent Pickup on Internet

Metadata: An Australian Proposal

In the Australian media this week, there has been some lively debate about the Government's proposed new anti-terror laws, which include provisions for Internet Service Providers and telecommunications companies to keep metadata for two years (rather than the "operationally required" period for which most telecoms keep their metadata for billing purposes). It's important to be aware of what types of data ISPs and telecoms keep currently, and what the government proposes they keep.

Telecoms

  • Date/Time of Phone Call
  • Duration of Phone Call
  • Identifier of Tower that call was placed from (i.e. your location)
  • Recipient and Initiator of Phone Call
  • Cost of Call

Internet Service Providers

  • The time you connect to the internet
  • The phone number you connect from
  • Which endpoint you authenticate to
  • The IP Address you are assigned
  • Data usage at the start of the session
  • Data usage at the end of the session
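
Taken together, the ISP fields above amount to one small record per session. A minimal sketch in Python, assuming hypothetical field names and made-up values (this is not any real ISP's schema):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SessionRecord:
    # Field names are hypothetical; they simply mirror the list above.
    connected_at: datetime
    phone_number: str
    endpoint: str
    ip_address: str
    bytes_at_start: int
    bytes_at_end: int

    def bytes_used(self) -> int:
        # Billing only needs the difference between the two counters.
        return self.bytes_at_end - self.bytes_at_start


# Example values are invented (documentation IP range, example hostname).
record = SessionRecord(
    connected_at=datetime(2014, 8, 10, 9, 0),
    phone_number="0200000000",
    endpoint="bras1.example.net",
    ip_address="203.0.113.7",
    bytes_at_start=1_000_000,
    bytes_at_end=4_500_000,
)
print(record.bytes_used())
```

Nothing in a record like this says what you did online, only that you were connected and how much data you moved.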

As you could probably guess, most of this information is kept for billing purposes and, in rare cases, for troubleshooting (in case something breaks at the ISP or telecom provider). The data that the government insists providers collect under the new anti-terror legislation is a great deal more intrusive than what's described above.

Cabinet Ministers and the Prime Minister himself have been on morning talk shows and prime-time news shows trying to explain metadata to the layman, to convince them that the government does not want their internet history. Except that's exactly what they will end up getting under the proposed legislation.

Here's why

In a recent interview with Sky News, Attorney-General George Brandis struggled to explain exactly what Internet providers would collect under this new legislation. His explanation of what the government wants was very strange. He stated that the government wants the IP address people are assigned at the start of their internet session (telcos already collect this). Where it starts to get strange is when he tries to draw a line between "history" and the URLs that people visit. He then went on to explain that they would only want the general site, and not the pages within it (my interpretation of this is the domain, "google.com", and not the full URL, "google.com/search?q=bad+stuff").
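
To make that distinction concrete, here is a small sketch that splits the example URL into the "general site" and everything else. It uses Python's standard `urllib.parse`; the variable names are mine, and the split reflects my reading of the interview, not any official definition:

```python
from urllib.parse import urlsplit

url = "http://google.com/search?q=bad+stuff"
parts = urlsplit(url)

# The "general site": just the host name.
site_only = parts.hostname

# The pages within: path plus query string, i.e. the actual browsing history.
page_within = parts.path + "?" + parts.query

print(site_only)
print(page_within)
```

Both pieces come out of the same request, so keeping one and discarding the other is a policy choice, not a technical limit.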

There is pretty much only one way that this data can be grabbed from the network in real time at carrier-grade scale, and that's Deep Packet Inspection (DPI). There's one thing about the hardware that performs DPI at that scale: it's fucking expensive. By default, ISPs try to stay out of the data as much as possible, because that keeps their networks fast (it's like a post office: if someone read each letter before sending it on, would that increase performance or decrease it?). The ISP would then grab things like the "Host" header from each HTTP request and, if the Senator's interpretation was wrong, perhaps the entire URL.
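
As a rough illustration of what "grabbing the Host header" means, here is a minimal sketch that pulls it out of a raw, unencrypted HTTP/1.x request. The function name and sample request are hypothetical, and real DPI hardware does this on live traffic at a very different scale:

```python
def extract_host(raw_request: bytes):
    """Return the Host header value from a plain-text HTTP/1.x request, or None."""
    # Headers end at the first blank line; ignore any body after it.
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    for line in lines[1:]:  # lines[0] is the request line, e.g. "GET /path HTTP/1.1"
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace")
    return None


# Made-up sample request for illustration.
sample = (
    b"GET /search?q=bad+stuff HTTP/1.1\r\n"
    b"Host: google.com\r\n"
    b"User-Agent: example\r\n"
    b"\r\n"
)
print(extract_host(sample))
```

Note that the first line of the same request carries the path and query string, so a box that can read the Host header is already looking at the entire URL.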

By the way, the only protocol being discussed in the media so far is HTTP. The media has not asked the question, "What if someone is making a Skype or FaceTime call, do ISPs have to record that?" The more diverse the metadata the ISP has to collect, the more expensive it becomes.

Storing the Data

It's all well and good viewing the data as it flows over the wire (or fiber), but the next important step is actually storing the data so that it can be accessed after the fact. This is where the real expense comes into play.

I can imagine that for an internet provider the size of iiNet, with hundreds of thousands of subscribers, renting enough datacenter space to store two years of their customers' web habits may not be possible, and that they may have to build a new datacenter just for this data (which is exactly what Steve Dalby has said in the not-too-distant past). They've also made it abundantly clear that the costs of building such a datacenter would be passed on to their customers (not the government, which has not budgeted to fund such expenditure).

Working in education, I can tell you that logging internet histories does consume a lot of disk space. Even a single Google search can take up about 10-15 lines in a logfile, because Google submits a new request every three characters you type. So, say you wish to search for something like "cute cats on reddit": the government would be able to see every one of those partial requests, not just the finished search.
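
A rough sketch of that pattern, assuming one request per three characters typed as described above. The `/complete` path and the log layout are made up for illustration and are not Google's actual endpoints, and a real log would contain more lines than this:

```python
from urllib.parse import quote_plus


def suggestion_requests(query: str, every: int = 3):
    """Return the partial queries sent while typing, one per `every` characters."""
    prefixes = [query[:n] for n in range(every, len(query) + 1, every)]
    if not prefixes or prefixes[-1] != query:
        prefixes.append(query)  # the final, complete search
    return prefixes


requests_seen = suggestion_requests("cute cats on reddit")
for partial in requests_seen:
    # Hypothetical log line: host plus a made-up path and the encoded query.
    print(f"google.com /complete?q={quote_plus(partial)}")
```

Even with these conservative assumptions, one short search turns into several log lines, each of which has to be stored for two years.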

Imagine that on the scale of 24 million people. It's an unthinkable amount of data to process and save per day. Then comes the cost of maintaining a backup of the data (which is industry-standard practice). So it's not just the cost of the disks to save it on, it's also the hardware to back it up onto tape, or to replicate it across to another datacenter.
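
To put rough numbers on this, here is a back-of-the-envelope estimate. Only the 24 million population and the two-year retention period come from the text; the requests per person, bytes per log line, and number of copies are placeholder assumptions, so swap in your own figures:

```python
PEOPLE = 24_000_000                  # population figure used in the text
RETENTION_DAYS = 2 * 365             # two years, as proposed
REQUESTS_PER_PERSON_PER_DAY = 1_000  # assumption, not a measured figure
BYTES_PER_LOG_LINE = 200             # assumption, not a measured figure
COPIES = 2                           # assumption: primary plus one backup/replica

TB = 10**12  # decimal terabyte

bytes_per_day = PEOPLE * REQUESTS_PER_PERSON_PER_DAY * BYTES_PER_LOG_LINE
total_bytes = bytes_per_day * RETENTION_DAYS * COPIES

print(f"new log data per day: {bytes_per_day / TB:.1f} TB")
print(f"held at any one time: {total_bytes / TB:,.0f} TB")
```

The exact totals matter less than the shape of the calculation: every factor multiplies, so a richer definition of metadata or heavier internet usage scales the bill directly.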

Side Note: The faster the NBN rolls out, the more people's internet usage increases. With more internet usage comes more expense in saving this 'metadata' as well.

What is metadata exactly?

That's the thing: no one can really explain what the Australian Government's definition of metadata is at the moment, or indeed how far they wish to go with their plan. Do they want to be ambitious and get an NSA-level amount of metadata, or do they want to keep it "small-scale" and just stick with web histories? No one knows.

But, as the LNP always say, I'd love a good cost-benefit analysis right about now.

Those who would give up essential liberty, to purchase a little temporary safety, deserve neither liberty nor safety.

Benjamin Franklin