Last week I voiced some concerns about the traffic numbers that Compete.com reports for websites. My primary concern was that, after seeing a variance of nearly 90% between their numbers and those from Google Analytics on my own site, I couldn’t trust their numbers for other sites to be correct. This is important, as we often use Compete.com and similar sites to report on the potential reach of coverage for our clients.
I was pleased to see that Compete responded multiple times: in a comment on my post, via a tweet, and subsequently by email.
Given the interest I saw from other people, I obtained Compete’s permission to publish our exchange.
Comment on my original post:
Hi Dave – Matt here, Client Relations Director at Compete.com. I am happy to do my best to clear up any confusion if you reach out to me at firstname.lastname@example.org. Compete is based on a sample of 2 million, US based panelists. We have a small sample warning on your website, which means we have limited data on your domain. Even when considering your GA numbers, you are well down the long tail of the internet. The code you’ve installed, it is just for audience profiles and it is only found within that tab of the product. It does not impact any traffic numbers. In any event, I’d be happy to explain more if you would like to reach out.
While several people on Twitter and I weren’t thrilled with the tone of “you are well down the long tail of the internet” (which sends a message of “just wanted to remind you you’re nothing to us”), I took Matt up on his offer and emailed their support address:
I’m reaching out following a tweet from your twitter account regarding a recent blog post of mine (http://t.co/lCXr4UN).
I’d love to understand more about why the numbers seem to diverge so much from what I’m seeing on my website logs.
Sure enough, a day later I received a reply:
Thanks for reaching out to us about your site. I know that it can often be confusing [Dave: no, it’s not confusing; it’s just irritating. Moving on…] when comparing the traffic numbers for local analytics (google, omniture, etc) to the numbers on Compete.com. At the core the methodologies are like comparing apples to oranges, they’re both fruit – just produced from different trees. Think of compete numbers as an orange – a U.S. based research numbers that help you understand your size and trends against your competition. Local analytics is more like the apple that helps you understand what’s happening on your site so you can improve your visitor’s experience. They’re great supplements to one another in terms of getting a more complete picture of the internet, but are inherently very different in the approach you take to consuming the data sets.
From a more technical perspective panel-based clickstream data (Compete.com) and web analytics data (local analytics (Google Analytics) and server logs) stems from the underlying methodologies that each approach use. At a high-level, panel-based providers like Compete measure online behavior based on consumers, whereas local analytics measure similar behaviors based on cookies. The consumer metrics that panel companies provide are based on statistically-derived estimates that are derived from a representative sample of consumers; in this instance, the behaviors of the sample are weighted and extrapolated to represent the entire internet browser population. The cookie-centric metrics that web analytics companies provide are developed on simple counts of cookies for all of the web pages that are a tagged on a site or a set of sites; when a consumer visits a specific page on a site, that visit is counted by the web analytics platform.
Both approaches have their strengths and limitations. Panel-based measurement provides excellent insight into visitor demographics, what consumers do across all of the websites they visit and analysis over long time periods. The limitation of consumer panels is that they sometimes do not provide sufficient sample to measure “low incidence” behaviors such as visiting very small sites, using rare search terms, or interacting with low-traffic pages on specific websites. Compete’s panel is one of the largest in the industry, this helps us ensure that we can measure and report on more of these infrequent behaviors compared to other panel providers.
If you take a look at information which is collected in a similar fashion below you’ll see either no data or considerably smaller numbers than local analytics.
Cookie-based web solutions are good sources of information on all of the behaviors that occur within a website, and therefore can be used to calculate and optimize site flow, conversion rate and other onsite activities. In this manner, web analytics are not subject to the same sample requirements as panel companies. However, there are some limitations that cookie-based solutions are susceptible to that panel-based measurement services are not. Cookie-based data can be affected and sometimes inflated by the deletion of a user’s cookies, incorrect page tagging, and susceptibility to bots or spiders. Also, the data found on Compete.com comes from our panel of U.S. users, local analytics and server logs collect U.S. and International data.
I hope this explanation can help clear up any questions that you might have with our numbers vs. those seen when looking at Google Analytics and server logs. Please let me know if you have any other questions that I can help you answer. If you’d like to learn more about our methodology please reference our data methodology whitepaper: http://media.compete.com/site_media/upl/img/Compete%20Data%20Methodology.pdf.
Compete.com Customer Support
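To make the “weighted and extrapolated” idea in that reply concrete, here is a minimal Python sketch of how a panel-based estimate might be produced. It is my own illustration, not Compete’s actual method: the `Panelist` type, the weights, and the domain names are all made up.

```python
from dataclasses import dataclass


@dataclass
class Panelist:
    weight: float       # hypothetical: how many real users this panelist stands in for
    visited: frozenset  # domains this panelist visited during the month


def estimate_visitors(panel, domain):
    """Extrapolate unique visitors by summing the weights of panelists who visited."""
    return sum(p.weight for p in panel if domain in p.visited)


# Toy panel: four panelists standing in for 400 real users.
panel = [
    Panelist(100.0, frozenset({"bigsite.example", "smallblog.example"})),
    Panelist(120.0, frozenset({"bigsite.example"})),
    Panelist(80.0, frozenset({"bigsite.example"})),
    Panelist(100.0, frozenset()),
]

big = estimate_visitors(panel, "bigsite.example")
small = estimate_visitors(panel, "smallblog.example")
print(big, small)
```

In this toy panel the small site’s estimate rests on a single panelist. Had that one person not visited, the estimate would drop to zero, which is the “low incidence” problem the reply describes.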
While some people I spoke to felt the answer was a bit condescending, I thought it did a good job of explaining the difference in methodologies between Compete and Google Analytics in plain language. Ok, so now I understood the difference between the two services’ methodologies (I didn’t ask, but it’s helpful nonetheless). The question remained, though – why should people trust Compete to provide data on anything but the top tier of sites on the Internet?
So I asked:
Hi, [I didn’t realize until afterward that the original email had been signed — my bad]
Thanks for the thoughtful response.
I do have two follow-up questions for you:
1. For sites that are, in the words of your Client Relations Director, “well down the long tail of the internet” [Dave: I couldn’t resist], do you therefore recommend using other data sources than Compete?
2. Would you mind if I published your response as an update to my blog post? I would love to include your side of the story for people to consider.
Compete’s reply:
Thanks again for reaching out, we’re happy to have you publish our dialog and are open to answering your questions about our tools and how it relates to the industry. Though I would request that you include your follow-up question to provide context to our reply.
In terms of data sources, our client relations team advocates for two things. First, a firm understanding of the data you’re looking at through asking targeted questions about the data source. When you’re unfamiliar with a data set this can be really difficult, because you don’t know what you don’t know, right? The second challenge is using the proper tool for the job, compete specifically isn’t meant to replace your local analytics – this is not our goal.
Compete ultimately is looking to provide our customers with a better understanding of their industry and competitors from a research perspective. If your data set on our site is listed under a small sample warning, the data will have more of a directional relevance that gives you an idea of what’s going on for the industry. For sites with more traffic activity, our data is really helpful in understanding the approach competitors take in terms of SEO, SEM, and traffic acquisition to name a few popular insights typically gleaned.
If you take a look at the image attached (Blogging services 2 year category view*) you’ll notice that between the month of March and April there is a slight monthly decrease of about -2.7%. Looking at the trend in 2010 the dip was about -1.1%, so this 2 year view allows you to see that there’s something happening within US online consumers’ behavior that makes it so that each year between March and April there is a slight dip in the amount of activity to the overall category (which includes domains like blogger and wordpress).
*Compete Categories are groups of domains we organize for our PRO and Enterprise Level Subscribers.
From my personal go-to toolkit, my data sources could include Google Trends – think search data, similar sites, and geo-demographics. If you’re looking for sentiment data vis-à-vis social networks, I like using Google Realtime to build out timelines that correspond with news and product happenings. I use these services a lot to supplement our information, they’re not a direct replacement in terms of the value add for a lot of digital marketers but they can help give you a better picture of what people were talking about during specific time periods, breaking news, and similar domains that may be of interest to investigate.
When it comes to studying the internet and ultimately the behavior of users, getting familiar with new tools and services can be a difficult process if you don’t have the resources for full data immersion. If you’re looking to attempt to prompt specific actions you’d like visitors to take, it takes time and practice. One of the things that Avinash* typically preaches, and I like to echo, is that marketers should always be aiming to synthesize what the data trends indicate, rather than simply “reporting”.
Here’s the post* http://www.kaushik.net/avinash/2011/04/difference-web-reporting-web-analysis.html
I hope this helps, please let us know if you have any other questions.
A couple of thoughts from my end:
Firstly, thank you to Matt and Lindsay from Compete for their thoughtful responses, and for allowing me to publish this exchange. I appreciate the thought and the time spent on their end.
Secondly, it appears that Compete doesn’t intend for its traffic numbers to be used for analysis. Judging by their response, the primary use for their data is insight into competitors: “For sites with more traffic activity, our data is really helpful in understanding the approach competitors take in terms of SEO, SEM, and traffic acquisition”. Fair enough: I hadn’t thought of Compete in that way before, and it’s good to know that that’s their intent.
Thirdly, I still have two outstanding questions.
Outstanding question #1: How accurate are Compete’s/Quantcast’s/Alexa’s numbers for top-tier websites?
Are we looking at a 5% error margin? 15%? 25%? I’d love to know, because we have a duty to clients to know how accurate the numbers we’re using are.
Compete’s team, for all their helpfulness, still hasn’t explained why people should trust their traffic numbers for sites (although, frankly, I could have been clearer as to why I was asking). This is still a critical issue – to quote my original post:
Should I believe that CNN.com’s traffic went up by 27% in March compared to February? Should I believe that Mashable’s traffic went down by nearly 30% in the last year?
Why is it important? Because I and many other people look at Compete’s numbers to determine sites’ traffic numbers when reporting on the results of our activities. If we can’t believe those numbers, we need to look elsewhere.
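For a sense of scale, here is a back-of-the-envelope Python sketch of sampling error alone. It assumes a simple random sample, which a weighted panel is not, and the 200 million population figure is a placeholder I picked for illustration; only the 2 million panel size comes from Compete’s comment. It says nothing about bias from who joins a panel, so treat the output as an optimistic guess rather than an answer to the question.

```python
import math

PANEL_SIZE = 2_000_000    # from Compete's comment on my original post
POPULATION = 200_000_000  # assumption: placeholder for the US online population


def relative_margin(true_visitors, z=1.96):
    """Return (expected panelists who visit, approx. 95% margin as a fraction of the estimate)."""
    p = true_visitors / POPULATION       # share of the population that visits the site
    expected_panelists = PANEL_SIZE * p  # panelists we would expect to see visiting
    std_error = math.sqrt(p * (1 - p) / PANEL_SIZE)
    return expected_panelists, z * std_error / p


for visitors in (5_000, 500_000, 50_000_000):
    seen, margin = relative_margin(visitors)
    print(f"{visitors:>11,} visitors: ~{seen:,.0f} panelists, +/- {margin:.1%}")
```

Under these assumptions, a site seen by only about 50 panelists carries a sampling margin of well over 20%, while a site seen by hundreds of thousands of panelists comes in under 1%. Whether real-world numbers behave this nicely is exactly what I’d like Compete and its peers to tell us.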
Outstanding question #2: What is the best site – free or paid – for providing reach analysis of lower-tier websites?
I readily acknowledge that Compete is a free service (it doesn’t sound like their Pro service adds much in terms of accuracy – just longer time periods and additional data for analysis), and that perhaps I shouldn’t expect too much from a free service.
I’ll be clear, though: I would be happy to consider paid services if they’re able to offer accurate reports.
Let’s face it – few companies are able to conduct outreach targeting only top-tier websites. Especially when you get into niches, there are relatively few relevant sites with traffic comparable to the top-tier of the Internet. So, where do we go for analysis of the rest?
What do you think? My questions again:
- How accurate are Compete’s/Quantcast’s/Alexa’s numbers for top-tier websites?
- What, in your opinion, is the best site – free or paid – for providing reach analysis of lower-tier websites?