Sunday, April 22, 2007

Cookies Wildly Overcount Web Visitors

By Max Kalehoff

"Why don't my server-log files match up with the comScore Media Metrix or NetRatings stats?"
When I worked at Media Metrix in the late '90s, and later at comScore following its acquisition of Media Metrix, I heard that question (and plea) thousands of times. In fact, I was involved in numerous campaigns to educate clients and the broader Internet advertising industry on why panel-measurement and server-side traffic data differed so much.

The online media and advertising industry inherently wanted and needed third-party validation. But in the aggressive game of chest-beating over reach and audience, publishers couldn't easily ignore the discrepancies between panel-measurement ratings and their own server-log numbers. And with the latter consistently reporting higher, they still can't!

Of course, there are many reasons for the differences. For one, counts of unique Web site users are often confused with counts of unique browsers. ISP and browser caching also keeps log-file and panel measurements from aligning. Then there is ambiguity because panel measurements tend to size specific locations and markets -- such as home, work or country -- whereas server logs include visitors from virtually anywhere in the universe.

But one of the biggest factors in log-file and panel-measurement discrepancy is Internet users' deletion of cookies, those small text files that Web sites leave on your PC when you visit them, or which ad networks leave behind when serving you ads. They're so often trusted, especially when they reflect favorably on a site's traffic, but so misunderstood.

While I no longer work at comScore, the company released an important analysis this week testing the validity of using cookie-based data to measure the number of unique visitors to individual Web sites, as well as the number of unique users served an ad by an ad server. The study -- based on the observed behavior of 400,000 home PCs in comScore's U.S. sample during December 2006, rather than on self-reported data -- examined first-party cookies from one prominent Web site and third-party cookies from a third-party ad-server network.

Key findings:


For the site analysis, comScore observed that 31% of U.S. Internet users cleared their first-party cookies during the month. Within this user segment, the study found an average of 4.7 different cookies for the site. Among the 7% of computers with at least 4 cookie resets, comScore counted an average of 12.5 distinct first-party cookies per computer, accounting for 35% of all cookies observed in the analysis.
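As a rough consistency check, the 35% share can be recomputed from the per-computer averages. This is only a sketch: it assumes the 2.5 overall average comScore reports for the full sample applies to the same base of computers, and the variable names are illustrative.

```python
# Figures from the comScore analysis (December 2006, first-party cookies).
heavy_share_of_computers = 0.07   # computers with at least 4 cookie resets
avg_cookies_heavy = 12.5          # distinct cookies per such computer
avg_cookies_all = 2.5             # distinct cookies per computer, full sample

# Cookies contributed by heavy resetters, per computer in the whole sample.
cookies_from_heavy = heavy_share_of_computers * avg_cookies_heavy

# Their share of every cookie observed (assumes both averages share one base).
share_of_all_cookies = cookies_from_heavy / avg_cookies_all

print(f"{share_of_all_cookies:.0%}")  # 35%
```

In other words, a small group of frequent deleters generates a disproportionate share of the cookies a server log would count as separate visitors.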

Using the total comScore sample as a basis, an average of 2.5 distinct first-party cookies were observed per computer for the site being examined. This indicates that Web site server logs that count unique cookies to measure unique visitors are likely to be exaggerating the size of the site's audience by a factor as high as 2.5, or an overstatement of 150%.
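The arithmetic behind the 150% figure is simple enough to sketch. The function names and the one-million-cookie log count below are hypothetical, chosen only to illustrate the calculation; they are not part of comScore's study.

```python
def overstatement_pct(avg_cookies_per_computer: float) -> float:
    """Percent by which a unique-cookie count exceeds the true computer count."""
    return (avg_cookies_per_computer - 1.0) * 100.0


def estimated_computers(unique_cookies: int, avg_cookies_per_computer: float) -> float:
    """Deflate a unique-cookie count to an estimated number of computers."""
    return unique_cookies / avg_cookies_per_computer


# 2.5 distinct cookies per computer is the average comScore observed.
print(overstatement_pct(2.5))               # 150.0

# Hypothetical server log reporting 1,000,000 unique cookies in a month.
print(estimated_computers(1_000_000, 2.5))  # 400000.0
```

A real correction would need a deletion rate measured for the specific site, since the 2.5 average comes from a single site in a single month.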

ComScore's analysis of third-party cookies from the third-party ad server revealed an average of 2.6 distinct cookies per computer in December, indicating a rate of overstatement similar to that for first-party cookies. For those computers where at least one cookie reset occurred, the average number of third-party cookies observed was 5.5, somewhat higher than the 4.7 seen for first-party cookies.

While cookie deletion is bound to vary among different users across different sites, these results are directionally compelling. Fred Wilson, a comScore investor and board member, commented on the analysis: "It's true that panel data is generally a lot lower than your own server logs. But that doesn't mean your server logs are right."

How are you reconciling panel data with your server logs?
