Thursday, July 31, 2008

Comcast Faces Sanctions But Still Gains Subscribers

Comcast gained 278,000 new broadband customers last quarter, with more than two-thirds migrating from DSL service, the company said today.

The news comes on the heels of reports that Verizon lost 100,000 DSL subscribers (though many of those replaced their DSL service with Verizon's FiOS) and that AT&T added only 46,000 new DSL customers.

So, even though DSL connections aren't as appealing as they once were, the good news is that broadband use is still growing. But the bad news is that demand is outpacing capacity, especially as more and more people turn to the Web for bandwidth-intensive video.

In fact, it's questionable how many of those new Comcast subscribers will be any happier with their broadband service than they were when they used DSL connections. DSL and cable modems are both considered broadband, but cable modems -- at least theoretically -- sometimes have higher top speeds than DSL lines.

In reality, cable providers aren't able to offer as much bandwidth as people currently want. Comcast has already admitted it slowed peer-to-peer traffic to manage congestion on its network -- actions that spurred net neutrality groups to complain to the FCC that Comcast violated net neutrality principles. The FCC is expected to officially rule against the cable company on Friday.

Comcast's peer-to-peer throttling might be the most high-profile example of the consequences of network congestion, but it's not the only one. A recent study by the Max Planck Institute for Software Systems in Germany showed that Cox also was blocking users from file-sharing sites.

At the same time, demand for bandwidth is only going to grow in the future. A new Integrated Media Measurement Inc. study shows that more than 20% of viewers now watch prime-time television online, up from 6% last fall.

While Comcast should be glad that people are signing up for its Internet service, the company, like other service providers, still must figure out how to make sure those subscribers can use the bandwidth they're paying for.

Monday, July 28, 2008

Cuil Touts User Privacy

A group of Google vets is taking on its former employer with the launch of a new search engine, Cuil.

The engine, unveiled today, boasts it indexes 120 billion pages -- or three times Google's 40 billion. But these raw numbers aren't all that useful when determining whether a search engine can return pages related to users' queries. Index size also isn't an especially reliable metric, because different companies sometimes use different tallying methods.

Regardless, Cuil certainly isn't the first would-be Google rival. The search giant's success has spawned a host of entrants into search, but none have come close to putting a dent in Google's commanding market share.

Cuil also had some crashes this morning, but that's not necessarily a bad sign; sites often need to get some bugs ironed out when they launch.

In some ways, what's most notable about Cuil is that the company is touting itself as privacy-friendly. The home page contains just two links -- "About Cuil" and "Your Privacy." Users who click on the privacy link land on a page that states, "We do not collect any personally identifiable information, period. We have no idea who sends queries: not by name, not by IP address, and not by cookies." Cuil also states it doesn't store logs of users' activity on the site.

If nothing else, Cuil's move shows that privacy considerations are top of mind in Silicon Valley these days. Companies might disagree about the wisdom of storing IP addresses, but there's no real question that query logs can reveal users' identities, as the world learned two years ago when AOL released three months' worth of query data for 650,000 anonymized users. One such user, Thelma Arnold, was identified in a matter of days by The New York Times.

Google insists that it needs to store query logs to improve its search results and to guard against click fraud. But the emergence of companies like Cuil calls into question whether Google needs this information as much as it says it does.

Friday, July 25, 2008

A new deal between the U.K. record labels' group and six British Internet service providers calls for the ISPs to start taking action against subscribers who allegedly download pirated material. Under the arrangement, the ISPs will send warning letters to users suspected of sharing copyrighted material.

After a set number of warnings, it's possible that ISPs will start throttling traffic, but there's no agreement yet on this point. One ISP, Carphone Warehouse, has gone on record as saying it won't implement any sort of "three strikes" rule that would cut off subscribers' connections after several warnings, according to PC Pro.

The U.K. record labels' organization, BPI, seems happy with this deal, calling it "a groundbreaking agreement.... on measures to help significantly reduce illegal filesharing."

But the reality is that this plan is likely to do nothing other than highlight how hard it is to detect online piracy. An April study showed that filters are routinely stymied by encryption techniques.

And three University of Washington computer scientists reported last month that they received hundreds of takedown notices wrongly accusing them of infringing copyright. "Our results show that potentially any Internet user is at risk for receiving DMCA takedown notices today," they wrote in the report "Challenges and Directions for Monitoring P2P File Sharing Networks -- or -- Why My Printer Received a DMCA Takedown Notice."

When innocent users start getting notices that they're suspected of piracy -- and it's inevitable that they will -- the record labels will face an even bigger public relations problem than at present. And users who are infringing copyright might learn that they need to use encryption technology, but there's no reason to think they will stop trading files. If anything, this deal just escalates a brewing battle between Web users and the record labels, while doing nothing to encourage people to pay for music.

Markey 'Still Troubled' By NebuAd Test

Only 15 Embarq subscribers out of 26,000 asked the company to refrain from selling information about their Web surfing history to behavioral targeting company NebuAd.

That was one of the additional details Embarq revealed late Wednesday in a second letter responding to a Congressional inquiry about its test of NebuAd's platform.

If the proportion of opt-outs sounds low, consider that the vast majority of Embarq subscribers probably had no idea that the company was conducting such a test. That's because Embarq chose to inform subscribers of the test, conducted in Gardner, Kan., by revising its privacy policy about two weeks before embarking on the experiment.

The company posted the revision online, on its own corporate site -- a type of notice that seems designed to ensure as few people as possible read it. After all, subscribers who use the Web in typical ways -- to read newspapers, check e-mail, watch TV, read blogs or otherwise consume media -- could easily do so for months, if not years, without ever thinking to visit their ISP's home page to investigate whether the company had decided to start selling their data.

Rep. Ed Markey, who held a hearing last week about NebuAd, isn't satisfied. "I am still troubled by the company's failure to directly inform their consumers of the consumer data gathering test and the notion that an 'opt-out' option is a sufficient standard for such sweeping data gathering."

Privacy advocates say that ISP-based behavioral targeting violates wiretap laws unless subscribers consent. Some states additionally require that both parties to a conversation consent -- meaning that publishers seemingly also need to give permission to share the information. Advocates also are concerned because ISPs have access to users' entire clickstream histories, from every search conducted to every Web site visited. NebuAd says it doesn't collect "sensitive" information or store names, addresses or other information that could be used to identify individual users, but advocates are skeptical. After all, even without names or IP addresses, a detailed clickstream history can in itself provide clues to users' identities -- especially if people conduct searches on, say, their own names, hometowns, employers, and the like.

If NebuAd wants to convince lawmakers its program is legitimate, it needs to do a better job of making sure that subscribers know about it and can make a decision about whether to participate.

Wednesday, July 23, 2008

The Case Of The Too-Private Privacy Notice

To defend itself from charges that it sold subscribers' Web-surfing data without their consent, Internet service provider Embarq has issued the absurd defense that it notified subscribers by posting a revision to its privacy policy online.

Even more ludicrous, Embarq attempts to argue that such a procedure is consistent with the Federal Trade Commission's proposed voluntary guidelines for behavioral advertising.

Embarq earlier this year tested NebuAd's behavioral targeting platform. NebuAd, unlike network-based behavioral targeting companies, works with ISPs to collect data about Web sites users visit and then send them targeted ads.

Digital rights advocates say this program violates federal wiretap laws, and some lawmakers have said that it should require users' explicit consent. NebuAd says that users can always opt out of the program -- but, of course, that's only an option for users who are aware of it.

Embarq, like most companies that have tested NebuAd's platform, "notified" subscribers by quietly changing its privacy policy. When lawmakers learned of this test and demanded answers, Embarq justified its procedure as consistent with FTC proposed guidelines and with the way ad networks notify users about behavioral targeting. Embarq is wrong on both counts.

Yes, many Web publishers that participate in ad networks inform visitors about that via privacy policies. Whether anyone reads those policies is open to debate, but at least they're posted at the sites where the data is being collected.

Embarq simply revised the privacy policy on its own site -- a site that it's hard to imagine Embarq customers have much reason to frequently visit. Even if some subscribers go to Embarq's site to pay their Internet access bills online, it's not likely that this happens more than once a month. But Embarq only revised its policy around two weeks before conducting the test.

Additionally, the FTC's proposed voluntary guidelines call for notice at "every website where data is collected for behavioral advertising." Embarq wasn't collecting data on its own sites. It was collecting data at sites like Google, Yahoo and NYTimes.com. It's safe to assume that very few, if any, Embarq subscribers who visited those sites had any inkling that Embarq was selling that information. Clearly Embarq wanted it that way.

Congress member Ed Markey has already made it clear he's not happy with how the NebuAd tests were conducted. "We need to have remedial legal courses for some corporate general counsels," he said at a hearing last week. Embarq's response to Markey isn't likely to change his mind about that.

Monday, July 21, 2008

Find Evidence on Your Opponent's Web Site

One of the best places on the Internet to find information about a company -- such as a litigation adversary -- is the company's own Web site. But while a visitor researches a company, the company may be researching the visitor, revealing more than the researcher would like. In addition, the company may at any time change or remove information on its Web site that may be most valuable to the researcher. This article discusses the information that Web site owners can learn about visitors to their site, and shows ways to see older versions of Web pages that may have been changed or removed.

Web sites routinely collect certain information from visitors to maintain statistics and to enhance the visitor's experience on the Web site. Much of this information may be sent from the visitor's computer to the Web site without the visitor's knowledge, and may reveal more than the visitor expects. A Web site owner can learn many things about visitors through "cookies" and environment variables such as the IP address.

A "cookie" is a small piece of information written on a visitor's computer by a Web site. A cookie might contain the visitor's Web site user name and password, display preferences or even name and address. When a Web site offers to "remember" a visitor, it is offering to write cookies. Cookies stay on the visitor's computer after the visitor has left the Web site, closed the Web browser, disconnected from the Internet and even turned off the computer. If a visitor provides his name and e-mail address to a Web site, that information might be stored in a cookie, and would be available to the Web site on the next visit, which could be months later.
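As a rough illustration of the mechanics just described, the following Python sketch uses the standard library's http.cookies module to build the kind of header a site might send when it offers to "remember" a visitor, and then to read the value back as the site would on a later visit. The cookie name remembered_user, the sample user name and the one-year lifetime are invented for the example; real sites choose their own names and lifetimes.

```python
from http.cookies import SimpleCookie

ONE_YEAR_SECONDS = 365 * 24 * 3600


def build_remember_cookie(username, max_age_seconds=ONE_YEAR_SECONDS):
    """Return a Set-Cookie header value a site might send to 'remember' a visitor."""
    cookie = SimpleCookie()
    cookie["remembered_user"] = username  # hypothetical cookie name
    # A Max-Age attribute makes the cookie persist after the browser is closed.
    cookie["remembered_user"]["max-age"] = max_age_seconds
    cookie["remembered_user"]["path"] = "/"
    return cookie["remembered_user"].OutputString()


def read_remember_cookie(cookie_header):
    """Parse the Cookie header a browser sends back on a later visit."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("remembered_user")
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    print(build_remember_cookie("jdoe"))
    print(read_remember_cookie("remembered_user=jdoe; theme=dark"))
```

The point of the sketch is that the site only gets back what it wrote earlier, which is why cookies matter less to a covert researcher than the IP address does.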

Cookies have received a great deal of attention in the media because privacy advocates are concerned about the way advertisers use cookies. However, cookies are probably not a significant concern for those performing covert research on an opposing party's Web site. As a general rule, cookies contain only information that the visitor has provided to the Web site or information that the Web site could have obtained without cookies.

If a visitor is concerned about information that might be stored in cookies, cookies can be erased. In the Internet Explorer Web browser, for example, the visitor can pick Tools menu, Internet Options, General, Delete Cookies. This can be done at any time -- before, during or after the visit to a Web site -- and will immediately delete all cookies. Unfortunately, this will also delete desirable cookies, such as Westlaw or Lexis logins. For those who wish to preserve desirable cookies while deleting undesirable cookies, there is privacy software that provides enhanced cookie management.

A greater concern for those performing covert research is environment variables, particularly the Internet Protocol address. The IP address is a unique identifying set of numbers used to direct communications through a network or the Internet. A Web site always has access to every visitor's IP address: Without that information, the Web site and visitor would not be able to communicate. However, the IP address may reveal more than the visitor realizes.

Most larger businesses, including large law firms, have "static" IP addresses, permanent IP addresses that specifically identify the company. For example, the static IP address 67.200.59.2 can easily be identified as the Young Conaway law firm. Most smaller businesses and residential connections to the Internet use "dynamic" IP addresses, temporary addresses that are assigned when the person connects to the Internet and may be different every time. The dynamic IP address 141.158.235.41 can be identified as a customer of the Verizon Internet service in the Philadelphia area, but cannot be connected to a specific individual or company.

The Web site Broadband Reports has a useful tool to show what can be learned from a person's IP address. When a person visits www.dslreports.com/whois, the page displays the visitor's current IP address. That IP address can then be entered in the WhoIs box to learn what is readily known about that IP address. Another site, www.IP-adress.com, displays the IP address of the current visitor with a map showing the locality associated with the IP address.

Web sites routinely store IP addresses for statistical purposes, but Web site owners do not ordinarily analyze the IP address of every visitor to a Web site, so there is little concern in casually browsing public areas of an opponent's Web site. However, Web site owners are likely to check the IP address when there is suspicious behavior. For example, they might check the IP address of a person who tries to view a confidential, blocked or hidden page. They might check the source of an e-mail requesting information about the company or its products. Users should be aware that the e-mail sender's identity cannot be concealed by using Web e-mail services, such as Hotmail, Gmail or Yahoo Mail -- these services embed the sender's IP address in the e-mail. The only way to effectively conceal the sender's identity is to send the e-mail from some other location, such as a home computer, a public library or an Internet cafe.
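To make the e-mail point concrete, here is a minimal Python sketch of how a recipient could pull IP addresses out of a message's headers. It assumes the address appears either in an X-Originating-IP header (a header some Web mail services have used) or in the standard Received headers; the sample message, its addresses and the function name sender_ip_clues are invented for illustration, and real messages vary by provider.

```python
import re
from email import message_from_string

# Loose IPv4 matcher; good enough for a sketch, not a validator.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sender_ip_clues(raw_message):
    """Return IP addresses found in headers that commonly record the sender's origin."""
    msg = message_from_string(raw_message)
    clues = []
    origin = msg.get("X-Originating-IP")  # assumed header; not every service adds it
    if origin:
        clues.extend(IPV4_PATTERN.findall(origin))
    for received in msg.get_all("Received", []):
        clues.extend(IPV4_PATTERN.findall(received))
    # Drop duplicates while keeping the order in which addresses were found.
    return list(dict.fromkeys(clues))


# Invented sample message using reserved documentation addresses.
SAMPLE_MESSAGE = (
    "Received: from mail.example.net ([198.51.100.20]) by mx.example.com; "
    "Mon, 21 Jul 2008 10:00:00 -0400\n"
    "X-Originating-IP: [203.0.113.7]\n"
    "From: curious@example.net\n"
    "To: info@example.com\n"
    "Subject: Product question\n"
    "\n"
    "Has anyone else had this problem?\n"
)

if __name__ == "__main__":
    print(sender_ip_clues(SAMPLE_MESSAGE))
```

Any address recovered this way can then be looked up with a WhoIs tool in the same manner as a Web visitor's address.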

Web site owners may also track the IP addresses of messages posted on the Web site's message boards or chats conducted through online chat services, and are likely to check the address if the post or chat is suspicious in nature. For example, if a visitor posts a message on a customer support message board asking if any other customers have had a particular problem with the company's product, the site owner might be inclined to check the poster's IP address.

Environment variables can also reveal the last page that the visitor saw before coming to the current page, the page where the visitor clicked a link to come to the current page. Like IP addresses, this is not the sort of thing that a Web site owner normally checks in the absence of suspicious activity. However, if a page on one site links to a page on another site that is supposed to be confidential or hidden, the host of the latter site might look into the former site and into the visitors who clicked that link.

Other information found in environment variables is generally less of a concern for covert research. For example, environment variables reveal the visitor's browser (Internet Explorer, Firefox, Opera, etc.), which is not especially confidential. Hypothetically, environment variables could reveal a visitor's network login, but as a practical matter that information is rarely revealed.
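For readers who want to see what these environment variables look like from the Web site owner's side, the sketch below gathers them from a CGI-style dictionary, the form in which many Web servers hand request details to a script. REMOTE_ADDR, HTTP_REFERER, HTTP_USER_AGENT and REMOTE_USER are the conventional CGI names; the function name visitor_profile and the sample values are assumptions made for the example.

```python
def visitor_profile(environ):
    """Collect the request details a site owner could log from CGI-style variables."""
    return {
        "ip_address": environ.get("REMOTE_ADDR"),        # always present in practice
        "referring_page": environ.get("HTTP_REFERER"),   # last page seen, if the browser sent it
        "browser": environ.get("HTTP_USER_AGENT"),       # browser name and version string
        "network_login": environ.get("REMOTE_USER"),     # rarely populated on public sites
    }


# Invented request, reusing the dynamic IP address mentioned in the article.
SAMPLE_ENVIRON = {
    "REMOTE_ADDR": "141.158.235.41",
    "HTTP_REFERER": "http://www.example.com/links.html",
    "HTTP_USER_AGENT": "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0",
}

if __name__ == "__main__":
    for field, value in visitor_profile(SAMPLE_ENVIRON).items():
        print(field, "=", value)
```

In this sample the network login comes back empty, which matches the observation that such information is rarely revealed in practice.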

THE WAYBACK MACHINE

Browsing a party's Web site will only show the information that the Web site owner currently wants visitors to see. Sometimes, the most valuable information about an opposing party is the information that has been changed or removed. Fortunately, there are ways to see older versions of Web pages. Pages that were changed recently can be viewed through Google's cache feature. Pages that were changed months or years ago may be available through the Internet Archive, also known as the Wayback Machine. Viewing these older versions of Web pages avoids the privacy risks discussed above: The copied pages are not on the company's Web site, so the company has no record of the researcher's activities.

When Google indexes Web pages, it stores a copy, referred to as a "cached" page. Google provides a link labeled "Cached" that allows researchers to view this copy. This cached version may be a day, a week or a month old, depending on how recently Google indexed the page.

Google's cache is most useful when the page found in a search doesn't fit the search performed. The mismatch occurs because the page has changed since it was indexed. The cached version will show the page as it appeared when it was indexed, with the search terms highlighted. The cache can also be useful when seeking information that is known to have been recently removed. If a researcher recently saw useful information on a Web site but that information is no longer there, a Google search for the missing information could turn up a cached version of the page that would contain the desired information. Google discusses its cache feature in detail in the Google Guide at Cached Pages.

If older versions of Web pages are desired, they may be found in the Internet Archive, better known as the Wayback Machine, a reference to the "Peabody's Improbable History" segment on the classic "Rocky and Bullwinkle" cartoons. The Wayback Machine crawls the Internet and makes copies of Web pages, storing them as they existed at some time in the past. It currently stores more than 85 billion Web pages, comprising two petabytes of information, archived since 1996.

The Wayback Machine does not allow visitors to search the archive's content; it simply retrieves older versions of a page with a known Web address. The page may not look precisely the way it did at that time: Images, formatting or code may be missing from the page. However, the text of the page is as it was on the day it was archived. Links on the page will function, and will take the visitor to archived versions of the linked page, allowing visitors to browse through an older version of the site. This is very useful if the precise address of the desired old page is unknown. Users should be aware, however, that the linked page may not be from precisely the same date as the linking page. It is important to watch the URL (Web address), which indicates the date in a year-month-day format. For example, the Wayback Machine contains a version of the Young Conaway home page archived on Aug. 11, 2007, with this URL: http://web.archive.org/web/20070811170145/http://ycst.com/. The page links to an article about the firm's support for the South Asian Bar Association that was archived on June 29, 2007, with this URL: http://web.archive.org/web/20070629214521/ycst.com/newsart.htm?a=179.
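The date embedded in a Wayback Machine address can also be read programmatically. The Python sketch below assumes the address follows the pattern seen in the two examples above: web.archive.org/web/, then a 14-digit stamp, then the original address. It reads the stamp as year, month, day, hour, minute and second; the function name parse_wayback_url is made up for the example.

```python
import re
from datetime import datetime

# Assumed layout: http://web.archive.org/web/<14 digits>/<original address>
WAYBACK_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_url(url):
    """Split a Wayback Machine URL into its capture date and the original address."""
    match = WAYBACK_PATTERN.match(url)
    if match is None:
        raise ValueError("not a recognized Wayback Machine URL: " + url)
    stamp, original = match.groups()
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


if __name__ == "__main__":
    urls = [
        "http://web.archive.org/web/20070811170145/http://ycst.com/",
        "http://web.archive.org/web/20070629214521/ycst.com/newsart.htm?a=179",
    ]
    for url in urls:
        captured, original = parse_wayback_url(url)
        print(captured.date(), original)
```

Comparing the two capture dates this way makes it easy to spot when a linked page was archived weeks before or after the page that links to it.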

The Wayback Machine can be used to find older versions of guidelines, policies or procedures of an organization that have since been changed. It may contain claims that the company made about its products, services or business prospects that it may now deny. It may show when a company possessed particular information. It may hold older versions of manuals or documentation that are no longer available.

USE AS EVIDENCE IN LITIGATION

The Wayback Machine has been used several times as evidence in trade secret and copyright infringement cases. See Syncsort Inc. v. Innovative Routines International Inc., No. 04-3623, 2008 U.S. Dist. Lexis 35364 (D.N.J. April 30, 2008) (to prove that information was not a trade secret because it was publicly available on the Internet at one time); Allen v. The Ghoulish Gallery, No. 06cv371, 2007 U.S. Dist. Lexis 86224 (S.D. Calif. Nov. 20, 2007) (to prove validity of copyright claim); Telewizja Polska USA Inc. v. Echostar Satellite Corp., No. 02 C 3293, 2004 U.S. Dist. Lexis 20845 (N.D. Ill. Oct. 14, 2004) (to demonstrate inaccurate claims made in opposing party's past advertising).

However, use of the Wayback Machine as evidence has been questioned as hearsay under Fed. R. Evid. 801 and as lacking authentication under Fed. R. Evid. 901. See, e.g., Novak v. Tucows Inc., No. 06-CV-1909, 2007 U.S. Dist. Lexis 21269 (E.D.N.Y. March 26, 2007); Chamilia LLC v. Pandora Jewelry LLC, No. 04-CV-6017, 2007 U.S. Dist. Lexis 71246 (S.D.N.Y. Sept. 24, 2007); and St. Luke's Cataract & Laser Inst. P.A. v. Sanderson, No. 8:06-CV-223, 2006 U.S. Dist. Lexis 28873 (M.D. Fla. May 12, 2006), though one court has permitted its use over such objections. See Telewizja Polska USA, 2004 U.S. Dist. Lexis 20845, at *6 (finding an affidavit to be sufficient authentication, and the information not hearsay as an admission by a party-opponent). Nevertheless, the Wayback Machine remains a valuable research tool, even if its contents cannot be used for evidence.

Researching an opposing party's Web site, both past and present content, can be a valuable source of information. But researchers must remember that if they are looking at their opponent's current Web site, rather than an older copy, the Web site owner may be aware of who they are and what they are doing.

Thursday, July 17, 2008

rpath.org Privacy Policy

Preface: We intend this policy to be a common-sense policy supporting minimal use and storage of private information. Our goal for this site is to store as little private personal information as is technically and legally feasible.

The term "The Service" as used in this document means the Conary repository hosting service accessible via http://www.rpath.com/rbuilder/ and all software development projects created via http://www.rpath.com/rbuilder/, hosted in the rpath.org domain.

We will never sell, rent, or otherwise transfer your private information that you give to us unless required by law or regulation; we will not use your information for unsolicited commercial email without your express permission (opt-in); you may request to be removed from our lists and we will use our best efforts to honor your request within 30 days.

We may send you automated email messages that are not unsolicited commercial email. By way of example and not limitation:

We may send periodic emails purely to verify that the email data that you have provided to us is still valid. We may, from time to time, inform you of any changes to this privacy policy or our terms of service by email.

Private information may include the following information in your user profile:

email address
full name
passwords and other authentication information such as hints
other contact information, such as phone number and address

You may be given the option whether to mark some such information as private or public. The profile information may also include other information we collect, as required by law.

All other information on the site is to be considered public information, and storing private information on this service, except for personal account information, is prohibited by the terms of service.

Some use of this site may intrinsically disclose personal information.

When you commit to a Conary archive, the name and contact information you provide will be permanently recorded in the Conary archive. When you send mail to a mailing list, the email address you use will be provided in the email that is sent, and will be stored in the permanent archives on our site. Any information at all that you commit to a Conary archive will be publicly visible.

These examples are by way of example and not limitation.

We may collect, analyze, and store detailed and aggregate network information, including domain names, in order to monitor trends. We may share or publish this information only in aggregate form.

The full functionality of this site may, at our discretion, require the use of "cookies", which store data on your hard drive or in memory on your computer.

This privacy policy applies only to The Service, and not to any other services provided by rPath, Inc. Links on this site to URLs not part of The Service are not covered by this privacy policy.

In order to enhance security and guard the privacy of your information, we may take any actions necessary, such as security audits by persons or automated tools. Access to your private information will be limited to individuals who have a non-disclosure agreement.

If site security has been compromised in any way, we reserve the right to notify and cooperate fully with appropriate law enforcement officials, and to take other measures that we believe to be appropriate. If we are aware that your private information has been disclosed, we will attempt to notify you by email, as soon as possible and permitted by law or regulation, of the information we possess related to the disclosure of your private information.

We may change this policy from time to time. Any such changes will not make previously private personal information public, unless required by law. A change to this privacy policy will be posted at the following URL http://www.rpath.com/permanent/rbo-privacy.html, and we will send email to active users notifying them of the change 15 days prior to change, unless otherwise required by law or regulation.

This document expresses our policy for maintaining your privacy. We do not guarantee any specific performance under these terms. In particular, circumstances over which we have no control may cause your information to be disclosed. We will not be liable if your information is disclosed.

Your use of any services provided as part of The Service signifies your acceptance of these terms.

Wednesday, July 16, 2008

Google Cookies

When you visit Google, it sends one or more cookies to your computer. Each cookie is a small file containing a string of characters that uniquely identifies your browser.

Google then uses cookies to improve the quality of service by storing user preferences and tracking user trends, such as how people search.

Most browsers are initially set up to accept cookies, but you can reset your browser to refuse all cookies or to indicate when a cookie is being sent.

Some Google features and services may not function properly if your cookies are disabled.

Wednesday, July 9, 2008

Skeptics Question NebuAd's Privacy Claims

To hear NebuAd CEO Bob Dykes tell it, the controversial company is the best thing to come along for online privacy in a very long time.

"NebuAd's systems are designed so that no one, not even the government, can determine the identity of our users," Dykes told the Senate commerce committee today at a hearing in Washington.

NebuAd partners with ISPs to gather data that's used to send consumers targeted ads. Its platform riles privacy advocates because ISPs have access to users' entire Web-surfing history, ranging from every search made to every Web site visited.

Dykes insists that any information that isn't relevant to particular marketing segments is immediately discarded and that the company doesn't store users' names, identifying information or IP addresses. NebuAd converts IP addresses into other, random, identifiers via a supposedly irreversible and uncrackable formula, Dykes said.
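NebuAd's actual formula was not described at the hearing. Purely to illustrate what a one-way conversion of an IP address can look like, the Python sketch below uses a keyed hash (HMAC-SHA256) from the standard library. This is a swapped-in, generic technique, not NebuAd's method; the function name pseudonymize_ip, the secret key and the 16-character length are all assumptions made for the example.

```python
import hashlib
import hmac


def pseudonymize_ip(ip_address, secret_key):
    """Map an IP address to an opaque identifier using a keyed one-way hash.

    Generic illustration only; not any company's actual formula.
    """
    digest = hmac.new(secret_key, ip_address.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability


if __name__ == "__main__":
    key = b"example-secret-key"  # hypothetical key; would be kept private
    print(pseudonymize_ip("141.158.235.41", key))
    print(pseudonymize_ip("141.158.235.41", key))  # same input, same identifier
    print(pseudonymize_ip("67.200.59.2", key))     # different input, different identifier
```

Note that in a scheme like this the same address always yields the same identifier, so a visitor's activity can still be linked together over time even though the address itself is hidden.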

He added that the company developed its platform in 2006, shortly after AOL posted search histories of 650,000 Web users online -- a blunder still considered among the worst privacy breaches to date. Even though AOL had "anonymized" the IP addresses, it proved possible to identify users simply by examining their search histories. Dykes said the company aimed to design a platform that would make a similar breach impossible.

Privacy advocates, meanwhile, weren't convinced. Leslie Harris, president and CEO of the digital rights group Center for Democracy & Technology, argued that NebuAd's platform seems to violate federal wiretap laws.

Byron Dorgan, the Senator who chaired today's hearing, also seemed unpersuaded. He questioned NebuAd's decision to let users opt out of the service, as opposed to asking them to affirmatively consent to it. Dorgan said if his ISP approached him to ask if he would allow another company to view every site he visited, his answer would be an unequivocal no. "Of course it's not okay. Are you kidding me? N-O. No."

One topic didn't come up at today's hearing: adware.

Recent media reports have highlighted the fact that several veterans of adware company Claria (formerly Gator) are now executives at NebuAd. Additionally, NebuAd rival Phorm used to be an adware company.

Certainly, there are some superficial similarities. Adware companies target ads to Web users based on the sites they visit. But then again, so do all behavioral targeting companies. It's true that older behavioral targeting companies only collect data from a limited number of sites, while adware companies, as well as Phorm and NebuAd, have access to all sites users visit.

But adware companies -- at least in theory -- look somewhat different from a privacy point of view than Phorm and NebuAd. Consider that adware companies are theoretically opt-in, in that consumers must affirmatively download the ad-serving software. (Admittedly, that isn't always the case, given rogue installers' ingenuity in hijacking people's computers and loading them with software.) But NebuAd and Phorm are both opt-out, meaning that consumers who don't read the notifications will automatically be included in the program.

NebuAd and Phorm's business model is different from adware in at least one other key respect. Adware companies traditionally served pop-ups that competed with publishers' own ads. NebuAd and Phorm only serve ads on Web sites of publishers they have deals with.

That's not to say that publishers by and large will embrace NebuAd and Phorm. Both companies harvest information from Web sites they have no relationship with -- activities that may well lead to lawsuits. In fact, the Center for Democracy & Technology this week pointed out that at least 12 states require that both parties to a conversation consent to it being recorded. Even if Web users agree to participate in NebuAd or Phorm's programs, the Web sites they visit might not likewise agree.

Monday, July 7, 2008

Google's Privacy Policy Link: Too Little, Too Late?

Now that it's facing a genuine privacy crisis, Google has decided to quell a completely insignificant privacy dust-up. The search giant has finally placed a link to its privacy policy on the home page. Previously, users had to click on multiple links, or search on the terms "Google privacy policy," to reach the information.

This purely cosmetic change might placate some watchdogs, who argued that Google was violating California law by not including the link on its home page, but does nothing to solve the larger problem: Google stores too much information about its users.

Among other data, the company retains logs showing users' IP addresses and their search queries. Google contends that IP addresses don't usually reveal people's identities. But that assertion ignores the reality that examining all of a person's searches can in itself reveal identity. In other words, users' identities can be deduced whether the IP address is real or a made-up sequence of numbers -- as long as it's paired with all of the searches originating from a single computer.
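
To make that point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the log entries, the salt and the pseudonymize helper are invented for illustration and say nothing about how Google actually stores or processes its logs. It only shows that swapping each IP address for a stable made-up identifier still leaves every search from one computer grouped together.

```python
import hashlib
from collections import defaultdict


def pseudonymize(ip: str, salt: str = "example-salt") -> str:
    """Replace an IP address with a stable made-up identifier (hypothetical scheme)."""
    return hashlib.sha256((salt + ip).encode()).hexdigest()[:8]


# Hypothetical log entries: (IP address, search query)
log = [
    ("192.0.2.10", "jane doe springfield"),
    ("198.51.100.7", "weather tomorrow"),
    ("192.0.2.10", "dentists near 12 elm street"),
    ("192.0.2.10", "high school reunion class of 1990"),
]


def group_by_pseudonym(entries):
    """Group queries under the pseudonym that replaces each IP address."""
    profiles = defaultdict(list)
    for ip, query in entries:
        profiles[pseudonymize(ip)].append(query)
    return dict(profiles)


profiles = group_by_pseudonym(log)

# The real IP address is gone, but each pseudonym still carries a full search history.
for pseudonym, queries in profiles.items():
    print(pseudonym, queries)
```

In this toy log, the three searches from the first computer still land under a single identifier, and together they hint at a name, a street and a graduating class even though the IP address itself never appears.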

Last week, a federal judge ordered Google to disclose to Viacom complete user logs for YouTube, including all users' IP addresses, screen names and which videos they watched. Google and Viacom have since tried to quell privacy concerns, with Google saying it will ask to "anonymize" IP addresses, even though that won't necessarily preserve users' privacy as long as all of their information is still paired with the same identifier. Viacom has also said it will handle all information confidentially.

As privacy advocates point out, Google wouldn't be facing this problem now if it hadn't compiled and stored these records in the first place.

Louis Stanton, the federal judge who issued the order in the YouTube lawsuit, wrote in his opinion that Google argues in its public policy blog that IP addresses aren't necessarily personally identifiable. "We have proposed broad global privacy standards, and are strong supporters of the idea that data protection laws should apply to any data that could identify you. The reality is though that in most cases, an IP address without additional information cannot," the blog states.

But, on other sections of its site, Google equates IP addresses with personally identifiable information. "Due to user privacy concerns, Google Analytics doesn't report on personally identifiable information, including a visitor's IP address," the company states on a site about Google's analytics tool.

In other words, even Google realizes that, for all practical purposes, IP addresses should be treated as personally identifiable information. Given the events of last week, the company should rethink the wisdom of retaining such data.

Sunday, July 6, 2008

Privacy advocates are condemning a federal judge's ruling ordering Google to give Viacom information about which users watched what videos on YouTube.

Viacom had asked for the information as part of its $1 billion copyright infringement lawsuit against video-sharing site YouTube. Viacom aims to show that a large proportion of clips watched on the site are pirated.

The judge in the case, Louis Stanton, ruled in Google's favor in significant respects. He ruled that Google need not reveal its search formula or the details of its ad platform. But he also ruled that Viacom could obtain data about users' activity on YouTube, including their screen names and IP addresses.

He held that such logs didn't compromise users' privacy because screen names are pseudonymous. Additionally, he wrote, IP addresses alone can't identify users. To support the latter proposition, he quoted Google's public policy blog, which argued that IP addresses should not be considered personally identifiable information.

Pundits lost no time in criticizing that holding. "I say this with the utmost respect, but Judge Stanton is a moron. And Google simply cannot hand this data over without facing a class action lawsuit of staggering proportions," writes Michael Arrington of TechCrunch.

The Electronic Frontier Foundation also takes Stanton to task. "The Court's erroneous ruling is a set-back to privacy rights, and will allow Viacom to see what you are watching on YouTube. We urge Viacom to back off this overbroad request and Google to take all steps necessary to challenge this order and protect the rights of its users," the group said in a statement.

In fact, whether IP addresses should be considered personally identifiable is a subject of huge debate. Regulators in Europe have indicated that IP addresses are personal data. Google, which stores users' search queries by IP address, contests that notion, arguing that in most cases, IP addresses can't in themselves be used to identify specific users. Google says it needs to keep IP logs of search requests to improve its search engine and to fight click fraud.

In the U.S., privacy advocates and search engines have faced off on this issue, but without any resolution.

The EFF and other observers have pointed to AOL's "Data Valdez" -- the data breach that occurred when an employee posted three months' worth of search queries for 650,000 users -- as proof that IP addresses can reveal identity. But there, it wasn't the IP addresses that revealed people's identities; rather, it was the substance of the searches. The IP addresses had been "anonymized," but people's identities were ascertainable because they typed their names or addresses or other key information into the query box.

With YouTube, it's not clear that simply knowing what videos were watched will reveal people's identities the same way that learning their search history could. Still, it could happen. Additionally, as the EFF points out, YouTube users didn't have the opportunity to come into court and oppose Viacom's request. For that reason alone, Stanton should rethink his decision.