by Aaron Swartz
Yet, as my little thought experiment above has hopefully made clear, the programmable web is anything but a pipe dream—it is today’s reality and tomorrow’s banality. No software developer will remain content to limit themselves only to things on the user’s own computer. And no web site developer will be content to limit their site only to users who act with it directly.
Just as the interlinking power of the World Wide Web sucked all available documents into its maw—encouraging people to digitize them, convert them into HTML, give them a URL, and put them on the Internet (hell, as we speak, Google is even doing this to entire libraries)—the programmable Web will pull all applications within its grasp. The benefits that come from being connected are just too powerful to ultimately resist.
There will, of course, be challenges to business models, as there always are with new technologies, especially for those who make their money off of gating up and charging access to data. But such practices simply aren’t tenable in the long term, legally or practically (let alone morally). Under US law, facts aren’t copyrightable (thanks to the landmark Supreme Court decision in Feist v. Rural Telephone Service) and databases are just collections of facts. (Some European countries have special database rights, but such extensions have been fervently opposed in the US.)
But even if the law didn’t get in the way, there’s so much value in sharing data that most data providers will eventually come around. Sure, providing a website where people can look things up can be plenty valuable, but it’s nothing compared to what you can do when you combine that information with others.
To take an example from my own career, look at the website OpenSecrets.org. It collects information about who’s contributing money to US political candidates and displays nice charts and tables about the industries that have funded the campaigns of presidential candidates and members of Congress.
Similarly, the website Taxpayer.net provides a wealth of information about Congressional earmarks—the funding requests that members of Congress slip into bills, requiring a couple million dollars be given to someone for a particular pet project. (The $398 million “Bridge to Nowhere” being the most famous example.)
Both are fantastic sites and are frequently used by observers of American politics, to good effect. But imagine how much better they would be if you put them together—you could search for major campaign contributors who had received large earmarks.
Note that this isn’t the kind of “mashup” that can be achieved with today’s APIs. APIs only let you look at the data in a particular way, typically the way that the hosting site looks at it. So with OpenSecrets’ API you can get a list of the top contributors to a candidate. But this isn’t enough for the kind of question we’re interested in—you’d need to compare each earmark against each donor to see if they match. It requires real access to the data.
Note also that the end result is ultimately in everyone’s best interest. OpenSecrets.org wants people to find out about the problematic influence of money in politics. Taxpayer.net wants to draw attention to this wasteful spending. The public wants to know how money in politics causes wasteful spending, and a site that helps them do so would further each organization’s goals. But they can only get there if they’re willing to share their data.
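To make the donor-against-earmark comparison concrete, here is a minimal sketch of the cross-reference, assuming both data sets were available in full. Every record, field name, and the crude name-matching rule below is invented for illustration; none of it reflects either site's actual data or API.

```python
from collections import defaultdict

# Hypothetical records: not either site's real schema or real data.
contributions = [
    {"donor": "Acme Paving", "recipient": "Rep. Smith", "amount": 50_000},
    {"donor": "Acme Paving", "recipient": "Sen. Jones", "amount": 20_000},
    {"donor": "Widget Corp", "recipient": "Rep. Smith", "amount": 5_000},
]
earmarks = [
    {"beneficiary": "ACME PAVING", "sponsor": "Rep. Smith", "amount": 2_000_000},
    {"beneficiary": "Harbor Museum", "sponsor": "Sen. Jones", "amount": 750_000},
]


def normalize(name):
    """Crude name matching (an assumption): lowercase, collapse whitespace."""
    return " ".join(name.lower().split())


def donors_with_earmarks(contributions, earmarks):
    """Return one entry per earmark whose beneficiary also appears as a donor."""
    # Total each donor's giving once, so every earmark needs only one lookup.
    donated = defaultdict(int)
    for c in contributions:
        donated[normalize(c["donor"])] += c["amount"]

    matches = []
    for e in earmarks:
        key = normalize(e["beneficiary"])
        if key in donated:
            matches.append({
                "name": key,
                "donated": donated[key],
                "earmarked": e["amount"],
                "sponsor": e["sponsor"],
            })
    return matches


if __name__ == "__main__":
    for match in donors_with_earmarks(contributions, earmarks):
        print(match)
```

The point of the sketch is that the join needs every row from both sides. A "top contributors" endpoint alone could not answer the question.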
Fortunately for us, the Web was designed with this future in mind. The protocols that underpin it are not designed simply to provide pages for human consumption, but also to easily accommodate the menagerie of spiders, bots, and scripts that explore its fertile soil. And the original developers of the Web, the men and women who invented the tools that made it the life-consuming pastime that it is today, have long since turned their sights towards making the Web safe, even inviting, for applications.
Unfortunately, far too few are aware of this fact, leading many to reinvent—sloppily—the work that they have already done. (It hasn’t helped that the few who are aware have spent their time working on the Semantic Web nonsense that I criticized above.) So we will begin by trying to understand the architecture of the Web—what it got right and, occasionally, what it got wrong, but most importantly why it is the way it is. We will learn how it allows both users and search engines to co-exist peacefully while supporting everything from photo-sharing to financial transactions.
We will continue by considering what it means to build a program on top of the Web: how to write software that fairly serves both its immediate users and the developers who want to build on top of it. Too often, an API is bolted on top of an existing application, as an afterthought or a completely separate piece. But, as we’ll see, when a web application is designed properly, APIs naturally grow out of it and require little effort to maintain.
Then we’ll look into what it means for your application to be not just another tool for people and software to use, but part of the ecology—a section of the programmable web. This means exposing your data to be queried and copied and integrated, even without explicit permission, into the larger software ecosystem, while protecting users’ freedom.
Finally, we’ll close with a discussion of that much-maligned phrase, “the Semantic Web,” and try to understand what it would really mean.
Let’s begin.
Privacy, Accuracy, Security: Pick Two
http://www.aaronsw.com/weblog/001016
July 29, 2003
Age 16
The Problems with Compulsory Licensing
Millions of people want to download music for, essentially, free. The record companies don’t want them to do this, and claim that they’re losing money and threaten to sue you into oblivion. How do we reconcile these two? One proposal is compulsory licensing.
The basic idea is that a large portion of the population pays a relatively small tax to the government, who then gives it to the artists whose work is downloaded. Terry Fisher says that a small tax on CD burners, DVD burners, DSL, and cable modems (costing the average family $50, less than they spend on DVDs and CDs) could pay for all the music and movies plus a 20% bureaucratic overhead.
Assuming this could be made to work, people could be convinced to accept it, and Congress could pass it, there are still three problems which can’t all be solved.
Privacy
Some proposals suggest that we simply monitor everyone’s Internet connection (or, usually, get the ISPs to do it) and send the results to the government. I think this is an unacceptable invasion of privacy. It’s bad enough we have to have Carnivore watching our packets and describing our emails when law enforcement gets a warrant, but now you want the government to keep track of all the music and movies we download, all the time? I don’t think that’s going to fly.
Accuracy
OK, they say, we won’t watch everyone’s computers. We’ll just use sampling. This has worked well in other media. TV networks, for example, make money off of advertising. They charge for ads based on how many people watch the shows. They figure out how many people watch the shows using Nielsen ratings. Nielsen ratings are calculated by getting a small percentage of the population to install a set-top box which monitors what they watch and when and sends the results back to Nielsen.
(This has some interesting effects, among which is the fact that boycotts of shows only have a real effect insofar as the boycotters are Nielsen homes. This means that as long as you’re not a Nielsen home, you can boycott a show and still watch it.)
(“Sweeps week” is a similar phenomenon but on a somewhat smaller scale. Each individual TV station [like our local NBC affiliate, WMAQ] sells advertising also, so they need to know how many people locally watch the shows. But each little station can’t afford to do the Nielsen thing, so they do something similar with paper diaries that they send out one week of the year. But they all do it on the same week [sweeps week], so the networks purposely introduce big guest stars and major cliffhangers that week to get more people to watch the show.)
This sounds good, and it works reasonably well for TV, but it won’t work on the Internet. Popularity on the Internet doesn’t follow the old rules; it follows something called a power law. [. . .] There are hundreds of thousands of sites with tens of users and tens of sites with hundreds of thousands of users. And there are tens of thousands of sites with hundreds of users, and thousands of sites with thousands of users and so on.
Sampling can’t cope with this kind of disparity. It can deal when there are a small number of known groups who make up a very small amount of the population (just seek out those groups specifically). But it can’t deal when there’s a large number of unknown groups who each make up a very small amount of the population (like the tons of small websites, each with a small but loyal fan base).
Who cares about these people? you may say. But while each of these groups has a small fan base individually, collectively they make up a significant portion, if not a majority, of the overall system. In other words, if you count these guys out you’ll be doubling the amount of money folks like Britney Spears get over what they deserve.
Britney Spears seems to be doing just fine with the current system. If all we’re doing is helping her, why are we going to all this trouble? And furthermore, if you’re going to tax me to pay the artists I listen to, it’s a little unfair if none of that money goes to the ones I actually care about.
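A rough back-of-the-envelope sketch of why sampling struggles with a power law. The tiers below are round numbers loosely modeled on the description above (the middle tier is inferred from "and so on"), the one-in-a-thousand sampling rate is hypothetical, and the model assumes each listener follows exactly one artist and is sampled independently.

```python
# Illustrative tiers: (number_of_artists, fans_per_artist). Assumed round numbers.
TIERS = [
    (10, 100_000),
    (100, 10_000),
    (1_000, 1_000),
    (10_000, 100),
    (100_000, 10),
]

# Hypothetical: one listener in a thousand is monitored.
SAMPLING_RATE = 1 / 1000


def total_fans(tiers):
    """Total number of listeners across all tiers."""
    return sum(count * fans for count, fans in tiers)


def prob_unseen(fans, rate):
    """Chance that none of an artist's fans lands in the sample."""
    return (1 - rate) ** fans


def summarize(tiers, rate):
    """For each tier: its share of all listening and its chance of being missed."""
    total = total_fans(tiers)
    return [
        {
            "artists": count,
            "fans_each": fans,
            "share_of_listening": count * fans / total,
            "prob_unseen": prob_unseen(fans, rate),
        }
        for count, fans in tiers
    ]


if __name__ == "__main__":
    for row in summarize(TIERS, SAMPLING_RATE):
        print(
            f"{row['artists']:>7} artists with {row['fans_each']:>7} fans each: "
            f"{row['share_of_listening']:.0%} of listening, "
            f"missed entirely with probability {row['prob_unseen']:.3f}"
        )
```

With these assumed numbers, the smallest tier accounts for a fifth of all listening, yet any one artist in it has roughly a 99 percent chance of never showing up in the sample.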
Security
Fine, fine, they say, if they read this far. How about we just have people submit the songs they listen to anonymously? People want their favorite artists to be paid, so they’ll be happy to.
Yeah, but that’s exactly the problem. People want their favorite artists to be paid, especially when those artists are themselves. What stops me from anonymously submitting that 1M people listened to my band and waiting for the money to roll in? Small things like that will get lost in the noise.
Even if the system isn’t anonymous (so we’re forgetting about privacy) you still have this problem. An enterprising MIT student, taking advantage of the fact that MIT has 16.5M IP addresses to themselves, writes a little program to pretend to be a whole bunch of MIT students who all have decided that his band is their new favorite. Again, it’ll get lost in the noise of MIT and the money will roll in.
It doesn’t seem right to tax Americans and give their money to fraudsters, no matter how clever the fraudsters are. It’ll be really hard to eliminate fraud, and when it’s so easy and anonymous, it’ll be more widespread than anything we’ve seen before.
Conclusion
I’ve gone through all the compulsory licensing scenarios, and I always seem to get stuck on one (or more) of these issues. If anyone’s found a way to eliminate all of them, please let me know!
Fixing Compulsory Licensing
http://www.aaronsw.com/weblog/001036
September 15, 2003
Age 16
In a previous post I dashed the world’s hopes for a viable compulsory licensing system, no matter how attractive one might seem. Luckily for the world, I’m back to explain how to make a compulsory licensing system that doesn’t run into any of those problems using . . . cryptography!
(To review, the idea for our compulsory licensing system is this: we tax Internet connections and CD/DVD burners a small amount and send the money to the artists. In exchange, they let us download their songs and movies off the Internet. The problem is how to decide which artists should get the money without losing privacy, accuracy, or security.)
Here’s the key to my proposal: when you pay the tax you get a vote.
So when you buy a CD or DVD burner, it comes with a short string (a random-looking series of letters and numbers) to type into your computer. (The strings are given to the manufacturers by the government when they pay the tax.) When you pay the bill for your Internet connection, you’re emailed another such string. (The string from your email can be handled automatically, and the one in the CD burner box could be made relatively easy to type in.)
The string is a digital gift certificate, worth however much the tax you paid was, but only spendable on donations to artists. Once your computer has the string, it looks at all the songs you’ve listened to and decides what songs to spend your gift certificate money on. (It knows what you listen to because it’s built in to your MP3 player.) If you’ve listened to one Britney Spears song day and night for the past month and nothing else, it will give all your money to Britney. If you listen to a variety of independent bands, it will split your money among them. (Advanced users can of course customize how their money will be spent, but it’s simpler to have the computer choose automatically by default.)
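One possible default policy, sketched under assumptions: the text only gives the two extremes (everything to one artist, or a split among several), so the proportional-to-play-count rule here is a guess at a reasonable middle, and the function name and cent-based amounts are hypothetical.

```python
def split_certificate(amount_cents, plays):
    """Divide amount_cents among artists in proportion to their play counts.

    Uses the largest-remainder method, so every share is a whole number
    of cents and the shares always add up to exactly amount_cents.
    """
    total_plays = sum(plays.values())
    if total_plays == 0:
        return {}

    shares = {}
    remainders = []
    for artist, count in plays.items():
        numerator = amount_cents * count
        shares[artist] = numerator // total_plays
        remainders.append((numerator % total_plays, artist))

    # Hand leftover cents to the largest remainders (ties broken by name).
    leftover = amount_cents - sum(shares.values())
    ranked = sorted(remainders, key=lambda item: (-item[0], item[1]))
    for _, artist in ranked[:leftover]:
        shares[artist] += 1
    return shares


if __name__ == "__main__":
    print(split_certificate(5000, {"Britney Spears": 300}))
    print(split_certificate(100, {"Band A": 1, "Band B": 1, "Band C": 1}))
```

Working in whole cents avoids the rounding drift that floating-point dollars would introduce when many small shares are summed.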
The result is sent anonymously to the government using the string. (The strings will be unique enough that it will be nearly impossible to guess a correct one.) The government checks this against the list of strings they gave out and the list of strings that have already been used to make sure that it’s legitimate, and then credits the appropriate accounts.
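The government's bookkeeping described here can be sketched as a small ledger, assuming an in-memory store and 32-character hex strings. The class and method names are hypothetical; a real system would need persistent storage and much more.

```python
import secrets


class CertificateLedger:
    """Hypothetical government-side record of issued and spent strings."""

    def __init__(self):
        self.issued = {}    # string -> value in cents
        self.spent = set()  # strings that have already been redeemed
        self.accounts = {}  # artist -> total credited cents

    def issue(self, value_cents):
        """Create a new unguessable string worth value_cents."""
        # 16 random bytes become 32 hex characters.
        code = secrets.token_hex(16)
        self.issued[code] = value_cents
        return code

    def redeem(self, code, allocation):
        """Credit artists per allocation ({artist: cents}); True if accepted."""
        if code not in self.issued:
            return False  # never handed out
        if code in self.spent:
            return False  # already used once
        if any(cents < 0 for cents in allocation.values()):
            return False  # negative credits make no sense
        if sum(allocation.values()) > self.issued[code]:
            return False  # spending more than the tax paid
        self.spent.add(code)
        for artist, cents in allocation.items():
            self.accounts[artist] = self.accounts.get(artist, 0) + cents
        return True


if __name__ == "__main__":
    ledger = CertificateLedger()
    code = ledger.issue(5000)
    print(ledger.redeem(code, {"Band A": 3000, "Band B": 2000}))
    print(ledger.redeem(code, {"Band A": 5000}))
```

Note that the ledger never records who redeemed a string, only that the string was valid and unused.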
Does this solve all the problems?
Yes, it’s private. The strings are received and sent anonymously. (“But wait,” you say, “the Internet providers know who gets what string.” OK, if you’re really paranoid a solution to this is explained below.) The government can’t connect you with your vote.
Yes, it’s accurate. The money goes to the artists that the people like and want to support, as chosen by the people themselves. There are a few edge cases. For example, if everyone listens to but hates Jerry Falwell, they might choose not to give him any money, even though they’ve taken advantage of his work. I think this is an acceptable problem—the majority of people won’t bother to change the defaults and even if they do, hey, it’s their money.
Yes, it’s secure. The amount of money you have control over is equal to the amount of money you paid in taxes, so the worst-case scenario is that you get your tax money back. There is a chance that everyone will give all their money to themselves, but this can be prevented by only paying out to accounts that meet some higher threshold of cash.
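The payout threshold mentioned above could be as simple as the filter below. This is only a sketch; the function name and the example threshold are made up for illustration.

```python
def payable_accounts(accounts, threshold_cents):
    """Return only the accounts whose credited total reaches the threshold."""
    return {
        artist: cents
        for artist, cents in accounts.items()
        if cents >= threshold_cents
    }


if __name__ == "__main__":
    accounts = {
        "Popular Band": 250_000,  # credited by many listeners
        "Just Me": 5_000,         # one person routing their own tax to themselves
    }
    # Hypothetical threshold of $500.
    print(payable_accounts(accounts, 50_000))
```

Someone who only sends their own certificates to themselves stays under the threshold and is never paid out.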
Won’t artists offer to buy people’s gift certificates for cash? The artist can spend the gift certificate on themselves and recover their money. (Seth Schoen)
The government could make such behavior against the terms of service for having an artist account. To be successful, any such operation would have to be publicized. The government could keep an eye out for such things, send the operator a known gift certificate, see whose account it went into, and shut down the account.
Can’t operators use this to shut down the account of someone they don’t like?
The government gift certificate would be indistinguishable from a normal one, so they’d have to be giving lots of gift certificates to that person, in which case they’d be losing lots of money. To be extra sure, the government could trace the source of the payment for the gift certificate. Or they could just bankrupt whoever was running the scam by feeding them lots of bogus gift certificates that appear to go through but are never credited to the artist’s account.
Hey, where’s the crypto?
OK, here’s the fun part. The money can be securely distributed to you using digital cash techniques. Here’s how that system works, by physical analogy:
1. You send “the bank” (probably the government or your ISP) a gift certificate with a random string on it and a piece of carbon paper in a sealed envelope.
2. They sign the outside of the envelope and their signature goes through the carbon paper onto the gift certificate.
3. You open the envelope, take out the signed gift certificate, and use this as described above. (The government uses the random string to make sure you don’t use it twice and they verify the signature to make sure it’s legitimate.)
4. Each signed gift certificate is worth a set amount ($1?) so you repeat as necessary to get the amount you’re owed.
Since the government can’t open the envelope (we use crypto to ensure this), they have no way of knowing which gift certificate they signed, so they can’t associate you with it when you spend it later.
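The envelope-and-carbon-paper steps can be sketched with an RSA blind signature, one well-known digital cash technique. The text does not say which scheme it has in mind, so that choice is an assumption. The key below is a toy built from two small Mersenne primes, there is no padding, and all function names are hypothetical. It shows the arithmetic only and is not secure.

```python
import hashlib
import math
import secrets

# Toy RSA key for "the bank". Far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def certificate_number(serial):
    """Map the certificate's random string to a number modulo N."""
    digest = hashlib.sha256(serial.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


def random_blinding_factor():
    """Pick a random r that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


def blind(m, r):
    """Step 1, the sealed envelope: the bank only sees m * r^E mod N."""
    return (m * pow(r, E, N)) % N


def bank_sign(blinded):
    """Step 2: the bank signs what it is handed without being able to read it."""
    return pow(blinded, D, N)


def unblind(blind_signature, r):
    """Step 3, opening the envelope: remove r, leaving a signature on m."""
    return (blind_signature * pow(r, -1, N)) % N


def verify(m, signature):
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == m


if __name__ == "__main__":
    serial = secrets.token_hex(16)
    m = certificate_number(serial)
    r = random_blinding_factor()
    signature = unblind(bank_sign(blind(m, r)), r)
    print(verify(m, signature))
```

Because r is random and known only to you, the blinded value the bank signs tells it nothing useful about m, yet the unblinded result is an ordinary signature that the bank's own public key verifies.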
Now, to anonymously submit the gift certificates to the government, you reuse the peer-to-peer network you downloaded the songs from as a remailer network. You encrypt your gift certificate so only the government can read it, then you pass it to a friend on the peer-to-peer network, who passes it to a friend, etc., until someone gives it to the government. The government publishes the list of identifiers for gift certificates they’ve received, so you can make sure it got through and resend it if it didn’t.
Conclusion
This proposal isn’t the simplest, and probably not the most elegant, but unlike the others it will work without cheating the public. I hope the people building these compulsory licensing systems see the value in that.
Postel’s Law Has No Exceptions
http://www.aaronsw.com/weblog/001025
August 18, 2003
Age 16
As Mark Pilgrim is fond of saying, “There are no exceptions to Postel’s Law.” (Postel’s Law is generally quoted as “be liberal in what you accept and conservative in what you put out” or something to that effect.) The message of the law is that interoperability is the primary concern, and that programs should accept things, even things that are against the spec, if necessary to achieve interoperability.
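A small illustration of the principle, using dates rather than feeds: accept several spellings on the way in, but emit exactly one on the way out. The list of accepted formats is arbitrary and the function names are hypothetical.

```python
from datetime import datetime

# Input spellings we are willing to accept (an illustrative, arbitrary list).
ACCEPTED_FORMATS = [
    "%Y-%m-%d",
    "%Y/%m/%d",
    "%d.%m.%Y",
    "%Y%m%d",
]


def parse_date_liberally(text):
    """Liberal in what we accept: try each known format in turn."""
    cleaned = " ".join(text.split())
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None  # nothing matched; the caller decides what to do


def emit_date_conservatively(date):
    """Conservative in what we put out: always ISO 8601 (YYYY-MM-DD)."""
    return date.isoformat()


if __name__ == "__main__":
    for raw in ["2003-08-18", "2003/08/18", "  18.08.2003 ", "20030818"]:
        print(emit_date_conservatively(parse_date_liberally(raw)))
```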
HTML, as you may know, is a mess. It’s contorted in a hundred different ways with tons of bugs and their work-arounds encrusted into the web, and browsers are expected to make sense of all of it. The XML people saw this and said, “We have to fix this.” Their solution was to break Postel’s Law.
With XML you are supposed to die and never look back if the document you come across violates the spec. The idea was that if everything died on invalid feeds, no one would ever write them. This is wrong for three reasons: