More Stories
How To Support Data with Real-Life Interviews - Whiteboard Friday

Posted by rcancino

With all the data that today's marketers can access, there's often still no substitute for the quality of information you can get from interviewing real people. In today's Whiteboard Friday, we welcome Rebekah Cancino -- a partner at Phoenix-based Onward and #MozCon 2016 speaker -- to teach us the whys and hows of great interviews.


Click on the whiteboard image above to open a high resolution version in a new tab!

Video Transcription

Hi, Moz fans. I'm Rebekah Cancino. I'm a partner at Onward, and I lead content strategy and user experience design. Today I'm here to talk to you about how to support the data you have (your keyword data, data around search intent, your analytics) with real-life user interviews.

So recently, Rand has been talking a little more about the relationship between user experience design and SEO, whether it's managing the tensions between the two or the importance of understanding the path to customer purchase. He said that in order to understand that path, we have to talk to real people. We have to do interviews, whether that's talking to actual users or maybe just people inside your company that have an understanding of the psychographics and the demographics of your target audience, so people like sales folks or customer service reps.

Now, maybe you're a super data-driven marketer and you haven't felt the need to talk to real people and do interviews in the past, or maybe you have done user interviews and you found that you got a bunch of obvious insights and it was a huge waste of time and money.

I'm here to tell you that coupling your data with real interviews is always going to give you better results. But having interviews that are useful can be a little bit tricky. The interviews that you do are only as good as the questions you ask and the approach that you take. So I want to make sure that you're all set and prepared to have really good user interviews. All it takes is a little practice and preparation.


It's helpful to think of it like this. So the data is kind of telling us what happened. It can tell us about online behaviors, things like keywords, keyword volume, search intent. We can use tools like Ubersuggest or even Moz's Keyword Explorer to start to understand that.

We can look at our analytics: entry and exit pages, bounces, pages that get a lot of views. All of that stuff is really important, and we can learn a lot from it. But with our interviews, what we're learning about is the why.


This is the stuff that online data just can't tell us. This is about those offline behaviors, the emotions, beliefs, attitudes that drive the behaviors and ultimately the purchase decisions. So these two things working together can help us get a really great picture of the whole story and make smarter decisions.

So say, for example, you have an online retailer. They sell mainly chocolate-dipped berries. They've done their homework. They've seen that most of the keywords people are using tend to be something like "chocolate dipped strawberries gifts" or "chocolate dipped strawberries delivered." And they've done the work to make sure their on-page optimization is in place, and they're doing a lot of other smart things with that data too.

But then they also noticed that their Mother's Day packages and their graduation gifts are not doing so well. They're starting to see a lot of drop-offs around that product description page and a higher cart abandonment rate than usual.

Now, given the data they had, they might make decisions like, "Well, let's see if we can do a little more on-page keyword optimization to reflect what's special about the graduation and Mother's Day gifts, or maybe we can refine the user experience of the checkout process." But if they talk to some real users -- which they did, this is a real story -- they might learn that people who send food gift items worry about: Is the person I'm sending the gift to going to be home when this gift arrives? Because this is a perishable item, like chocolate-dipped berries, will it melt?

Now, this company, they do a lot of work to protect the berries. The box that they arrive in is super insulated. It's like its own cooler. They have really great content that tells that story. The problem is that content is buried in the FAQs instead of on the pages in places it matters most -- the product detail, the checkout flow.


So you can see here how there's an opportunity to use the data and the interview insights together to make smarter decisions. You can get to insights like that for your organization too. Let's talk about some tips that are going to help you make smarter interview decisions.

So the first one is to talk to a spectrum of users who represent your ideal audience. Maybe, like with this berry example, their ideal customer tends to skew slightly female. You would want that group of people, that you're talking to, to skew that way too. Perhaps they have a little more disposable income. That should be reflected in the group of people that you're interviewing and so forth. You get it.

The next one is to ask day-in-the-life, open-ended questions. This is really important. If you ask typical marketing questions like, "How likely are you to do this or that?" or, "Tell me on a scale of 1 to 10 how great this was," you'll get typical marketing answers. What we want is real nuanced answers that tell us about someone's real experience.

So I'll ask questions like, "Tell me about the last time you bought a food gift online? What was that like?" We're trying to get that person to walk us through their journey from the minute they're considering something to how they vet the solutions to actually making that purchase decision.

Next is don't influence the answers. You don't want to bias someone's response by introducing an idea. So I wouldn't say something like, "Tell me about the last time you bought a food gift online. Were you worried that it would spoil?" Now I've set them on a path that maybe they wouldn't have gone on to begin with. It's much better to let that story unfold naturally.

Moving on, dig deeper. Uncover the why, really important. Maybe when you're talking to people you realize that they like to cook, and by sharing a food item gift with someone who's far away, they can feel closer to them. Maybe they like gifts to reflect how thoughtful they are or what good taste they have. You always want to uncover the underlying motives behind the actions people are taking.

So don't be too rushed in skipping to the next question. If you hear something that's a little bit vague or maybe you see a point that's interesting, follow up with some probes. Ask things like, "Tell me more about that," or, "Why is that? What did you like about it?" and so on.

Next, listen more than you talk. You have maybe 30 to 45 minutes max with each one of these interviews. You don't want to waste time by inserting yourself into their story. If that happens, it's cool, totally natural. Just find a way to back yourself out of that and bring the focus back to the person you're interviewing as quickly and naturally as possible.

Take note of phrases and words that they use. Do they say things like "dipped berries" instead of "chocolate-dipped strawberries?" You want to pay attention to the different ways and phrases that they use. Are there regional differences? What kinds of words do they use to describe your product or service or experience? Are the berries fun, decadent, luxurious? By learning what kind of language and vocabulary people use, you can have copy, meta descriptions, emails that take that into account and reflect that.

Find the friction. So in every experience that we have, there's always something that's kind of challenging. We want to get to the bottom of that with our users so we can find ways to mitigate that point of friction earlier on in the journey. So I might ask someone a question like, "What's the most challenging thing about the last time you bought a food gift?"

If that doesn't kind of spark an idea with them, I might say something even a little more broad, like, "Tell me about a time you were really disappointed in a gift that you bought or a food gift that you bought," and see where that takes them.

Be prepared. Great interviews don't happen by accident. Coming up with all these questions takes time and preparation. You want to put a lot of thought into them. While you want to ask questions that tell you about the nature of the whole journey, you also want to be clear about your priorities. Know which questions are most important to you and which ones are must-have pieces of information. That way you can use your time wisely while you still let the conversation flow where it takes you.

Finally, relax and breathe. The people you're interviewing are only going to be as relaxed as you are. If you're stiff or overly formal or treating this like it's a chore and you're bored, they're going to pick up on that energy and they're probably not going to feel comfortable sharing their thoughts with you, or there won't be space for that to happen.

Make sure you let them know ahead of time, like, "Hey, feel free to be honest. These answers aren't going to be shared in a way that can be attributed directly to you, just in aggregate."

And have fun with it. Be genuinely curious and excited about what you're going to learn. They'll appreciate that too.

So once you've kind of finished and you've wrapped up those interviews, take a step back. Don't get too focused or caught up on just one of the results. You want to kind of look at the data in aggregate, the qualitative data and let it talk to you.


What stories are there? Are you seeing any patterns or themes that you can take note of, kind of like the theme around people being worried about the berries melting? Then you can organize those findings and make sure you summarize it and synthesize it in a way that the people who have to use those insights that you've gotten can make sense of.

Make sure that you tell real stories and humanize this information. Maybe you recorded the interviews, which is always a really good idea. You can go back and pull out little sound bites or clips of the people saying these really impactful things and use that when you're presenting the data.

So going back to that berry example, if you recall, we had that data around: Hey, we're seeing a lot of drop-offs on the product description page. We're seeing a higher cart abandonment rate. But maybe during the user interviews, we noticed a theme of people talking about how they obsessively click the tracking link on the packages, or they wait for those gift recipients to send them a text message to say, "Hey, I got this present." As you kind of unraveled why, you noticed that it had to do with the fact that these berries might melt and they're worried about that.

Well, now you can elevate the content that you have around how those berries are protected in a little cooler-like box on the pages and the places it matters most. So maybe there's a video or an animated GIF that shows people how the berries are protected, right there in the checkout flow.


I hope that this encourages you to get out there and talk to real users, find out about their context and use that information to really elevate your search data. It's not about having a big sample size or a huge survey. It's much more about getting to real life experiences around your product or service that adds depth to the data that you have. In doing that, hopefully you'll be able to increase some conversions and maybe even improve behavioral metrics, so those UX metrics that, I don't know, theoretically could lead to higher organic visibility anyway.

That's all for now. Thanks so much. Take care.

Video transcription by

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

Duplicate Listings and the Case of the Nomadic New Mexican Restaurant

Posted by MiriamEllis


Albuquerque’s locals and tourists agree, you can’t find a more authentic breakfast in town than at Perea’s New Mexican Restaurant. Yelp reviewers exclaim, "Best green chile ever!!", "Soft, chewy, thick-style homemade flour tortillas soak up all the extra green chili," "My go-to for great huevos rancheros," and "Carne was awesome! Tender, flavorful, HOT!" The descriptions alone are enough to make one salivate, but the Yelp reviews for this gem of an eatery also tell another story — one so heavily spiced with the potential of duplicate listings that it may take the appetite of any hard-working local SEO away:

“Thru all of the location changes, this is a true family restaurant with home cooking.”

“This restaurant for whatever reason, changes locations every couple years or so.”

“They seem to wander from different locations”

“As other reviews have already mentioned, Perea's changes locations periodically (which is puzzling/inconvenient — the only reason they don't get 5 stars)”

“They switch locations every few years and the customers follow this place wherever it goes.”

Reading those, the local SEO sets aside sweet dreams of sopapillas because he very much doubts the accuracy of that last review comment. Are all customers really following this restaurant from place to place, or are visitors (with money to spend) being misdirected to false locations via outdated, inconsistent, and duplicate listings?

The local SEO can’t stand the suspense, so he fires up Moz Check Listing.

He types in the most recent name/zip code combo he can find, and up comes:


A total of 2 different names, 3 different phone numbers, and 4 different addresses! In 5 seconds, the local SEO has realized that business listings around the web are likely misdirecting diners left and right, undoubtedly depriving the restaurant of revenue as locals fail to keep up with the inconvenient moves or travelers simply never find the right place at all. Sadly, two of those phone numbers return an out-of-service message, further lessening the chances that patrons will get to enjoy this establishment’s celebrated food. Where is all this bad data coming from?

The local SEO clicks on just the first entry to start gaining clues, and from there, he clicks on the duplicates tab for a detailed, clickable list of duplicates that Check Listing surfaces for that particular location:


From this simple Duplicates interface, you can immediately see that 1 Google My Business listing, 1 Foursquare listing, 3 Facebook Places, 1 Neustar Localeze listing, and 1 YP listing bear further investigation. Clicking the icons takes you right to the sources. You’ve got your clues now, and only need to solve your case. Interested?

The paid version of Moz Local supports your additions of multiple variants of the names, addresses, and phone numbers of clients to help surface further duplicates. Finally, your Moz Local dashboard also enables you to request closure of duplicates on our Direct Network partners. What a relief!

Chances are, most of your clients don’t move locations every couple of years (at least, we hope not!), but should an incoming client alert you to a move they’ve made in the past decade or so, it’s likely that a footprint of their old location still exists on the web. Even if they haven’t moved, they may have changed phone numbers or rebranded, and instead of editing their existing listings to reflect these core data changes, they may have ended up with duplicate listings that are then auto-replicating themselves throughout the ecosystem.

Google and local SEOs share a common emotion about duplicate listings: both feel uneasy about inconsistent data they can’t trust, knowing the potential to misdirect and frustrate human users. Feeling unsettled about duplicates for an incoming client today?

Get your appetite back for powerful local SEO with our free Check Listing tool!


Weird, Crazy Myths About Link Building in SEO You Should Probably Ignore - Whiteboard Friday

Posted by randfish

The rules of link building aren't always black and white, and getting it wrong can sometimes result in frustrating consequences. But where's the benefit in following rules that don't actually exist? In today's Whiteboard Friday, Rand addresses eight of the big link building myths making their rounds across the web.


Click on the whiteboard image above to open a high-resolution version in a new tab!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we're going to chat about some of the weird and crazy myths that have popped up around link building. We've actually been seeing them in the comments of some of our blog posts and Whiteboard Fridays and Q&A. So I figured, hey, let's try and set the record straight here.

1. Never get links from sites with a lower domain authority than your own


What? No, that is a terrible idea. Domain authority, just to be totally clear, it's a machine learning system that we built here at Moz. It takes and looks at all the metrics. It builds the best correlation it can against Google's rankings across a broad set of keywords, similar to the MozCast 10K. Then it's trying to represent, all other things being equal and just based on raw link authority, how well would this site perform against other sites in Google's rankings for a random keyword? That does not in any way suggest whether it is a quality website that gives good editorial links, that Google is likely to count, that are going to give you great ranking ability, that are going to send good traffic to you. None of those things are taken into account with domain authority.

So when you're doing link building, I think DA can be a decent sorting function, just like Spam Score can. But those two metrics don't mean that something is necessarily a terrible place or a great place to get a link from. Yes, it tends to be the case that links from 80- or 90-plus DA sites tend to be very good, because those sites tend to give a lot of authority. It tends to be the case that links from sub-10 or 20 tend to not add that much value and may carry a higher Spam Score. You might want to look more closely at them before deciding whether you should get a link.

But new websites that have just popped up or sites that have very few links or local links, that is just fine. If they are high-quality sites that give out links editorially and they link to other good places, you shouldn't fret or worry that just because their DA is low, they're going to provide no value or low value or hurt you. None of those things are the case.

2. Never get links from any directories


I know where this one comes from. We have talked a bunch about how low-quality directories, SEO-focused directories, paid link directories tend to be very bad places to get links from. Google has penalized not just a lot of those directories, but many of the sites whose link profiles come heavily from those types of domains.

However, lots and lots of resource lists, link lists, and directories are also of great quality. For example, I searched for a list of Portland bars — Portland, Oregon, of course known for their amazing watering holes. I found PDX Monthly's list of Portland's best bars and taverns. What do you know? It's a directory. It's a total directory of bars and taverns in Portland. Would you not want to be on there if you were a bar in Portland? Of course, you would want to be on there. You definitely want those. There's no question. Give me that link, man. That is a great freaking link. I totally want it.

This is really about using your good judgment and about saying there's a difference between SEO and paid link directories and a directory that lists good, authentic sites because it's a resource. You should definitely get links from the latter, not so much from the former.

3. Don't get links too fast or you'll get penalized


Let's try and think about this. Like Google has some sort of penalty line where they look at, "Oh, well, look at that. We see in August, Rand got 17 links. He was at 15 in July, but then he got 17 links in August. That is too fast. We're going to penalize him."

No, this is definitely not the case. I think what is the case, and Google has filed some patent applications around this in the past with spam, is that a pattern of low-quality links or spammy-looking links that are coming at a certain pace may trigger Google to take a more close look at a site's link profile or at their link practices and could trigger a penalty.

Yes. If you are doing sketchy, grey hat/black hat link building with your private networks, your link buys, and your swapping schemes, and all these kinds of things, yeah, it's probably the case that if you get them too fast, you'll trip over some sort of filter that Google has got. But if you're doing the kind of link building that we generally recommend here on Whiteboard Friday and at Moz more broadly, you don't have risk here. I would not stress about this at all. So long as your links are coming from good places, don't worry about the pace of them. There's no such thing as too fast.

4. Don't link out to other sites, or you'll leak link equity, or link juice, or PageRank


...or whatever it is. I really like this illustration of the guys who are like, "My link juice. No!" This is just crap.

All right, again, it's a myth rooted in some fact. Historically, a long time ago, PageRank used to flow in a certain way, and it was the case that if a page had lots of links pointing out from it, that if I had four links, that a quarter each of the PageRank that this page could pass would go to each of them. So if I added one more, oh, now that's one-fifth, then that becomes one-fifth, and that becomes one-fifth. This is old, old, old-school SEO. This is not the way things are anymore.

PageRank is not the only piece of ranking algorithmic goodness that Google is using in their systems. You should not be afraid of linking out. You should not be afraid of linking out without a "nofollow" link. You, in fact, should link out. Linking out is not only correlated with higher rankings. There have also been a bunch of studies and research suggesting that there's something causal going on, because when followed links were added to pages, those pages actually outranked their non-link-carrying brethren in a bunch of tests. I'll try and link to that test in the Whiteboard Friday. But regardless, don't stress about this.

5. Variations in anchor text should be kept to precise proportions


So there's this idea that essentially there's some magic formula for how many of your anchor phrases should be branded, partially branded, keyword-match (links carrying anchor text that specifically targets the keywords you're trying to rank for), and random assorted anchor text, and that you need to hit some precise numbers like these. Also a crazy idea.

Again, rooted in some fact, the fact being if you are doing sketchy forms of link building of any kind, it's probably the case that Google will take a look at the anchor text. If they see that lots of things are kind of keyword-matchy and very few things contain your brand, that might be a trigger for them to look more closely. Or it might be a trigger for them to say, "Hey, there's some kind of problem. We need to do a manual review on this site."

So yes, if you are in the grey/black hat world of link acquisition, sure, maybe you should pay some attention to how the anchor text looks. But again, if you're following the advice that you get here on Whiteboard Friday and at Moz, this is not a concern.

6. Never ask for a link directly or you risk penalties


This one I understand, because there have been a bunch of cases where folks or organizations have sent out emails, for example, to their customers saying, "Hey, if you link to us from your website, we'll give you a discount," or, "Hey, we'd like you to link to this resource, and in exchange this thing will happen," something or other. I get that those penalties and that press around those types of activities has made certain people sketched out. I also get that a lot of folks use it as kind of blackmail against someone. That sucks.

Google may take action against people who engage in manipulative link practices. But for example, let's say the press writes about you, but they don't link to you. Is asking for a link from that piece a bad practice? Absolutely not. Let's say there's a directory like the PDX Monthly, and they have a list of bars and you've just opened a new one. Is asking them for a link directly against the rules? No, certainly not. So there are a lot of good ways that you can directly ask for links and it is just fine. When it's appropriate and when you think there's a match, and when there's no sort of bribery or paid involvement, you're good. You're fine. Don't stress about it.

7. More than one link from the same website is useless


This one is rooted in the idea that, essentially, diversity of linking domains is an important metric. It tends to be the case that sites that have more unique domains linking to them tend to outrank their peers who have only a few sites linking to them, even if lots of pages on those individual sites are providing those links.

But again, I'm delighted with my animation here of the guys like, "No, don't link to me a second time. Oh, my god, Smashing Magazine." If Smashing Magazine is going to link to you from 10 pages or 50 pages or 100 pages, you should be thrilled about that. Moz has several links from Smashing Magazine, because folks have written nice articles there and pointed to our tools and resources. That is great. I love it, and I also want more of those.

You should definitely not be saying "no." You shouldn't be stopping your link efforts around a site, especially if it's providing great traffic and high-quality visits from those links pointing to you. It's not just the case that links are there for SEO. They're also there for the direct traffic that they pass, and so you should definitely be investing in those.

8. Links from non-relevant sites, pages, or content outside your niche won't help you rank better


This one, I think, is rooted in that idea that Google is essentially looking and saying like, "Hey, we want to see that there's relevance and a real reason for Site A to link to Site B." But if a link is editorial, if it's coming from a high-quality place, if there's a reason for it to exist beyond just, "Hey, this looks like some sort of sketchy SEO ploy to boost rankings," Googlebot is probably going to count that link and count it well.

I would not be worried about this. If I'm selling coffee online or have a bunch of coffee resources and a site outside my niche wants to link to me, or they happen to link to me, I'm not going to be scared about that. In fact, I would say that, the vast majority of the time, off-topic links from places that have nothing to do with your website are actually very, very helpful. They tend to be hard for your competitors to get. They're almost always editorially given, especially when they're earned links rather than sort of cajoled or bought links or manipulative links. So I like them a lot, and I would not urge you to avoid those.

So with that in mind, if you have other link ideas, link myths, or link facts that you think you've heard and you want to verify them, please, I invite you to leave them in the comments below. I'll jump in there, a bunch of our associates will jump in there, folks from the community will jump in, and we'll try and sort out what's myth versus reality in the link building world.

Take care. We'll see you again next week for another edition of Whiteboard Friday.

Video transcription by

Feeling inspired by reality? Start building quality links with OSE.


Introducing Progressive Web Apps: What They Might Mean for Your Website and SEO

Posted by petewailes

Progressive Web Apps. Ah yes, those things that Google would have you believe are a combination of Gandhi and Dumbledore, come to save the world from the terror that is the Painfully Slow Website™.

But what actually makes a PWA? Should you have one? And if you create one, how will you make sure it ranks? Well, read on to find out...

What's a PWA?

Given that Google came up with the term, I thought we'd kick off with their definition:

"A Progressive Web App uses modern web capabilities to deliver an app-like user experience."
Progressive Web Apps

The really exciting thing about PWAs: they could make app development less necessary. Your mobile website becomes your app. Speaking to some of my colleagues at Builtvisible, this seemed to be a point of interesting discussion: do brands need an app and a website, or a PWA?

Fleshing this out a little, this means we'd expect things like push notifications, background sync, the site/app working offline, having a certain look/design to feel like a native application, and being able to be set on the device home screen.

These are things we traditionally haven't had available to us on the web. But thanks to new browsers supporting more and more of the HTML5 spec and advances in JavaScript, we can start to create some of this functionality. On the whole, Progressive Web Apps:

  • Work for every user, regardless of browser choice, because they're built with progressive enhancement as a core tenet.
  • Fit any form factor: desktop, mobile, tablet, or whatever is next.
  • Are connectivity independent, enhanced with service workers to work offline or on low-quality networks.
  • Feel like an app to the user, with app-style interactions and navigation, because they're built on the app shell model.
  • Are always up-to-date thanks to the service worker update process.
  • Are served via HTTPS to prevent snooping and to ensure content hasn't been tampered with.
  • Are identifiable as "applications" thanks to W3C manifests and service worker registration scope, allowing search engines to find them (a sample manifest appears just below).
  • Make re-engagement easy through features like push notifications.
  • Allow users to "keep" apps they find most useful on their home screen without the hassle of an app store.
  • Are easily shared via URL and don't require complex installation.
Source: Your First Progressive Web App (Google)
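
To make the "identifiable as applications" and home screen points a little more concrete, here's roughly what a minimal W3C web app manifest looks like. The names, colors, and icon paths below are purely illustrative; the file is typically saved as manifest.json and referenced from the page head with <link rel="manifest" href="/manifest.json">.

{
  "name": "Example Progressive Web App",
  "short_name": "Example",
  "start_url": "/",
  "display": "standalone",
  "background_color": "#ffffff",
  "theme_color": "#2d72d9",
  "icons": [
    { "src": "/icons/icon-192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "/icons/icon-512.png", "sizes": "512x512", "type": "image/png" }
  ]
}

Supporting browsers read this file to decide what the "installed" version of the site should look like on a home screen.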

It's worth taking a moment to unpack the "app-like" part of that. Fundamentally, there are two parts to a PWA: service workers (which we'll come to in a minute), and application shell architecture. Google defines this as:

...the minimal HTML, CSS, and JavaScript powering a user interface. The application shell should:
  • load fast
  • be cached
  • dynamically display content
An application shell is the secret to reliably good performance. Think of your app's shell like the bundle of code you'd publish to an app store if you were building a native app. It's the load needed to get off the ground, but might not be the whole story. It keeps your UI local and pulls in content dynamically through an API.
Instant Loading Web Apps with an Application Shell Architecture

This method of loading content allows for incredibly fast perceived speed. We are able to get something that looks like our site in front of a user almost instantly, just without any content. The page will then go and fetch the content and all's well. Obviously, if we actually did things this way in the real world, we'd run into SEO issues pretty quickly, but we'll address that later too.
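
As a rough sketch (the endpoint and markup here are invented for illustration, not taken from any real site), an application shell can be little more than this, with the critical CSS and bootstrap JavaScript inlined so the frame paints before any content arrives:

<!DOCTYPE html>
<html>
<head>
  <title>Example Store</title>
  <style>
    /* Critical CSS inlined so the shell renders immediately */
    header { height: 56px; background: #2d72d9; color: #fff; padding: 16px; }
    #content { padding: 16px; }
  </style>
</head>
<body>
  <header>Example Store</header>
  <div id="content">Loading...</div>
  <script>
    // The shell is already on screen; now fetch the real content for this page
    fetch('/api/content?page=home')
      .then(function (response) { return response.json(); })
      .then(function (data) {
        document.getElementById('content').innerHTML = data.html;
      });
  </script>
</body>
</html>

Served like this, the first response contains no real content, which is exactly the SEO issue picked up again in the section on SEO implications below.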

If, then, a Progressive Web App is at its core just a website served in a clever way with extra features for loading stuff, why would we want one?

The use case

Let me be clear before I get into this: for most people, a PWA is something you don't need. That's important enough that it bears repeating, so I'll repeat it:

You probably don't need a PWA.

The reason for this is that most websites don't need to be able to behave like an app. This isn't to say that there's no benefit to having the things that PWA functionality can bring, but for many sites, the benefits don't outweigh the time it takes to implement the functionality at the moment.

When should you look at a PWA then? Well, let's look at a checklist of things that may indicate that you do need one...

Signs a PWA may be appropriate

You have:

  • Content that regularly updates, such as stock tickers, rapidly changing prices or inventory levels, or other real-time data
  • A chat or comms platform, requiring real-time updates and push notifications for new items coming in
  • An audience likely to pull data and then browse it offline, such as a news app or a blog publishing many articles a day
  • A site with regularly updated content which users may check in to several times a day
  • Users who are mostly using a supported browser

In short, you have something beyond a normal website, with interactive or time-sensitive components, or rapidly released or updated content. A good example is the Google Weather PWA:


If you're running a normal site, with a blog that maybe updates every day or two, or even less frequently, then whilst it might be nice to have a site that acts as a PWA, there are probably more useful things you can be doing with your time for your business.

How they work

So, you have something that would benefit from this sort of functionality, but need to know how these things work. Welcome to the wonder that is the service worker.

Service workers can be thought of as a proxy that sits between your website and the browser. They let you intercept the requests you ask the browser to make and hijack the responses given back. That means we can do things like, for example, hold a copy of data requested, so when it's asked for again, we can serve it straight back (this is called caching). This means we can fetch data once, then replay it a thousand times without having to fetch it again. Think of it like a musician recording an album — it means they don't have to play a concert every time you want to listen to their music. Same thing, but with network data.
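
As a minimal sketch (the file names and the cache list are assumptions for the example), a cache-first service worker looks something like this:

// sw.js -- cache the shell on install, answer repeat requests from the cache
var CACHE_NAME = 'shell-v1';

self.addEventListener('install', function (event) {
  event.waitUntil(
    caches.open(CACHE_NAME).then(function (cache) {
      // The "album recording": fetch these once, replay them on every visit
      return cache.addAll(['/', '/styles/shell.css', '/scripts/shell.js']);
    })
  );
});

self.addEventListener('fetch', function (event) {
  event.respondWith(
    caches.match(event.request).then(function (cached) {
      // Serve from the cache if we have it; otherwise fall back to the network
      return cached || fetch(event.request);
    })
  );
});

The first visit still hits the network; every visit after that can be answered from the cache, even offline.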

If you want a more thorough explanation of service workers, check out this moderately technical talk given by Jake Archibald from Google.


What service workers can do

Service workers fundamentally exist to deliver extra features, which have not been available to browsers until now. These include things like:

  • Push notifications, for telling a user that something has happened, such as receiving a new message, or that the page they're viewing has been updated
  • Background sync, for updating data while a user isn't using the page/site
  • Offline caching, to allow for an experience where a user may still be able to access some functionality of a site while offline
  • Handling geolocation or other device hardware-querying data (such as device gyroscope data)
  • Pre-fetching data a user will soon require, such as images further down a page

It's planned that in the future, they'll be able to do even more than they currently can. For now though, these are the sorts of features you'll be able to make use of. Obviously these mostly load data via AJAX, once the app is already loaded.
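
For instance, registering a worker from the page and showing a push notification inside it takes only a few lines (the file name and message text are illustrative, and the notification assumes the user has already granted permission):

// In the page: register the service worker if the browser supports it
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js');
}

// In sw.js: show a notification when a push message arrives
self.addEventListener('push', function (event) {
  var text = event.data ? event.data.text() : 'Something new happened';
  event.waitUntil(
    self.registration.showNotification('Example Store', { body: text })
  );
});

Actually delivering pushes also requires subscribing the page to a push service, which is beyond this sketch.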

What are the SEO implications?

So you're sold on Progressive Web Apps. But if you create one, how will you make sure it ranks? As with any new front-end technology, there are always implications for your SEO visibility. But don't panic; the potential issues you'll encounter with a PWA have been solved before by SEOs who have worked on JavaScript-heavy websites. For a primer on that, take a look at this article on JS SEO.

There are a few issues you may encounter if you're going to have a site that makes use of application shell architecture. Firstly, it's pretty much required that you're going to be using some form of JS framework or view library, like Angular or React. If this is the case, you're going to want to take a look at some Angular.JS or React SEO advice. If you're using something else, the short version is you'll need to be pre-rendering pages on the server, then picking up with your application when it's loaded. This enables you to have all the good things these tools give you, whilst also serving something Google et al can understand. Despite their recent advice that they're getting good at rendering this sort of application, we still see plenty of examples in the wild of them flailing horribly when they crawl heavy JS stuff.

Assuming you're in the world of clever JS front-end technologies, to make sure you do things the PWA way, you'll also need to be delivering the CSS and JS required to make the page work along with the HTML. Not just including script tags with the src attribute, but the whole file, inline.

Obviously, this means you're going to increase the size of the page you're sending down the wire, but it has the upside of meaning that the page will load instantly. More than that, though, with all the JS (required for pick-up) and CSS (required to make sense of the design) delivered immediately, the browser will be able to render your content and deliver something that looks correct and works straightaway.

Again, as we're going to be using service workers to cache content once it's arrived, this shouldn't have too much of an impact. We can also cache all the CSS and JS external files required separately, and load them from the cache store rather than fetching them every time. This does make it very slightly more likely that the PWA will fail on the first time that a user tries to request your site, but you can still handle this case gracefully with an error message or default content, and re-try on the next page view.

There are other potential issues people can run in to, as well. The Washington Post, for example, built a PWA version of their site, but it only works on a mobile device. Obviously, that means the site can be crawled nicely by Google's mobile bots, but not the desktop ones. It's important to respect the P part of the acronym — the website should enable features that a user can make use of, but still work in a normal manner for those who are using browsers that don't support them. It's about enhancing functionality progressively, not demanding that people upgrade their browser.

The only slightly tricky thing with all of this is that it requires that, for best experience, you design your application for offline-first experiences. How that's done is referenced in Jake's talk above. The only issue with going down that route: you're only serving content once someone's arrived at your site and waited long enough to load everything. Obviously, in the case of Google, that's not going to work well. So here's what I'd suggest...

Rather than just sending your application shell, and then using AJAX to request content on load, and then picking up, use this workflow instead:

  • User arrives at site
  • Site sends back the application shell (the minimum HTML, JS, and CSS to make everything work immediately), along with...
  • ...the content AJAX response, pre-loaded as state for the application
  • The application loads that immediately, and then picks up the front end.

Adding in the data required means that, on load, we don't have to make an AJAX call to get the initial data required. Instead, we can bundle that in too, so we get something that can render content instantly as well.

As an example of this, let's think of a weather app. Now, the basic model would be that we send the user all the content to show a basic version of our app, but not the data to say what the weather is. In this modified version, we also send along what today's weather is, but for any subsequent data request, we then go to the server with an AJAX call.
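
As a sketch of what that first response could contain (the property names and markup are invented for the example), the server sends the shell plus today's data embedded as state:

<div id="app"></div>
<script>
  // Today's forecast is shipped with the page itself, so no AJAX call is needed on first load
  var initialState = { city: "Portland", today: { summary: "Rain", high: 18, low: 9 } };

  // The client-side app "picks up" from the embedded state; later requests
  // (tomorrow's forecast, other cities) go out as normal AJAX calls
  document.getElementById('app').textContent =
    initialState.city + ': ' + initialState.today.summary +
    ', high ' + initialState.today.high + '°';
</script>

In a real build you'd normally pair this with server-side rendering of the markup itself, per the pre-rendering advice above, so crawlers that don't execute JavaScript still see the content.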

This means we still deliver content that Google et al can index, without possible issues from our AJAX calls failing. From Google and the user's perspective, we're just delivering a very high-performance initial load, then registering service workers to give faster experiences for every subsequent page and possibly extra functionality. In the case of a weather app, that might mean pre-fetching tomorrow's weather each day at midnight, or notifying the user if it's going to rain, for example.

Going further

If you're interested in learning more about PWAs, I highly recommend reading this guide to PWAs by Addy Osmani (a Google Chrome engineer), and then putting together a very basic working example, like the train one Jake mentions in his YouTube talk referenced earlier. If you're interested in that, I recommend Jake's Udacity course on creating a PWA available here.


A Dozen Digestible Takeaways from 2016's E-Commerce Benchmarks Study

Posted by Alan_Coleman

Hey Moz Blog readers.

I’m delighted to share with you a big body of work the Wolfgang team has just completed. It’s our E-commerce Benchmarks 2016 study. We dove into Google Analytics insights from over 80 million website sessions and over one-quarter of a billion dollars in online revenue for travel and retail websites, calculating average e-commerce website key performance indicators (KPIs) for you to use as benchmarks.

I hope these findings help you benchmark your KPIs and gain deeper insights into what you can do to boost conversion.

There are a number of unique features to this study:

  • We’ve divvied the results up into overall, travel, and retail. Within the retail cohort, we’ve broken out results for our "online only" retailers and "multichannel" retailers. The KPIs are distinctly different for the two sets of retailers.
  • We’ve conducted a correlation study in which we correlate all the factors of the study with conversion rate and with average order value.
  • We’ve expanded the scope of the study since last time and based on your comments, we’ve included site speed analysis, as well as more info around paths to conversion and assisted conversions.

In this post I’m going to give you an overview of 12 key takeaways. You can read the full report here. Or grab some quick insights from our infographic here.

1/ The average e-commerce conversion rate is 1.48%.

  • Retail websites averaged 1.36%.
  • Online-only retailers converted almost twice as well as their multichannel counterparts, at 2% compared to 1.12%.
  • The travel websites in the study averaged a 2.04% conversion rate.

It was notable that the travel websites enjoyed higher conversion rates but lower engagement rates than the average retailer. This piqued my curiosity, as that just seemed too darn easy for the travel retailers. After deep-diving the data, I found that the committed retail customer would visit the one retail website multiple times on their journey to purchase. On the other hand, the travel shopper does a lot of research, but on other websites, review sites, via online travel agents, travel bloggers, etc. before arriving at the e-commerce website to merely check price and availability before booking. This finding illuminates the fact that the retailer has more influence on its customers' journey to purchase than the travel website, which is more dependent on an ecosystem of travel websites to warm up the prospect.


Click the image to open a still image in a new tab

2/ The death of SEO?

The data states it emphatically: "Hell no!"

Google organic is the largest source of both traffic (43%) and revenue (42%). SEO traffic from Google organic has actually increased by 5% since our last study.

There was also a strong correlation between websites with a high percentage of traffic from Google organic and higher-than-average Average Order Values (AOVs).

From this finding, we can infer that broad organic coverage will be rewarded by transactions from research-heavy, high-value customers.

3/ AdWords is the king of conversion

The strongest correlation we saw with higher conversion rates was higher-than-average traffic and revenue from AdWords.

In my experience, Google AdWords is the best-converting traffic source. So my take is that, when a website increases its spend on AdWords, it adds more high-conversion traffic to its profile and increases its average conversion rate.

AdWords accounts for 26% of traffic and 25% of revenue on average.

4/ Google makes the World Wide Web go 'round

When you combine Google organic and PPC, you see that Google accounts for 69% of traffic and 67% of revenue. More than two-thirds! Witness the absolute dominance of “The Big G” as our window to the web.


Click the image to open a still image in a new tab

5/ Facebook traffic quadruples!

In our last study, Facebook accounted for a meagre 1.3% of traffic. This time around, it's leapt up to 5%, with Facebook CPC emerging from nowhere to 2%. When better cross-device measurement becomes available in Google Analytics, I expect Facebook to be seen as an assisted conversion power player.

6/ Don’t discount email

Email delivers 6% of traffic, which is actually as much as all the social channels combined — and treble the revenue. In fact, with a 6% share of revenue, Google is the only medium that delivers more revenue than email. Digital marketers often lust after shiny new toys (hello, Snapchat!), but the advice here is to look after the old reliables first. And this 40-year-old technology we all use every day is about as old and reliable as it gets.

7/ Site speed matters most

This section was added to the study after comments from you, the Moz Blog readers, last time around, so thanks for your input. The server response time correlation with conversion rate (-0.31) was one of the strongest we saw. It was dramatically stronger than engagement metrics, such as time on site (0.11) or pages viewed (0.10). We also found that for every two-tenths of a second you shave off your server response time, you'll increase conversion rate by 8%. Don’t forget that site speed is a Google ranking factor, so by optimizing for it you'll benefit from a "multiplier effect" of more traffic and a higher conversion rate on all your traffic. Google’s page speed tool is a great place to start your speed optimization journey.

Check out our conversion rate correlation chart below to get more insights on which metrics can move conversion rate.


Click the image to open a still image in a new tab

8/ Mobile is our "decision device"

2015 was finally "the year of mobile." Mobile became the largest traffic source of the devices, but seriously underperforms for revenue. Its 42% share of traffic becomes a miserly 21% share of revenue, and it suffers the lowest average conversion rate and AOV. Despite these lowly conversion metrics, our correlation study found that websites with a larger-than-average portion of mobile traffic benefitted from larger-than-average conversion rates. This indicates that the "PA in your pocket" is the device upon which decisions are arrived at before being completed on desktop. We can deduce that while desktop remains our "transaction device," mobile has become our "decision device," where research is carried out and purchase decisions arrived at.

9/ Digital marketers are over-indexing on display advertising

Despite accounting for 38% of digital marketers’ budgets (IAB Europe), display failed to register as a top ten traffic source. This means it contributed less than 1% of e-commerce website traffic.

10/ Bounce rate don’t mean diddly squat

Bounce rate actually has zero correlation with conversion rate! Digital marketers feel a deep sense of rejection when they see a high bounce rate. However, as an overall website metric, it’s a dud. While admittedly there are bad bounces, there are many good bounces accounted for in the number.

11/ Digital marketing "economies of scale"

Interestingly, websites that enjoyed more-than-average traffic levels enjoyed higher-than-average conversion rates.

This illustrates a digital marketing version of "economies of scale"; more traffic equals better conversion rates.

The corollary of this is lower CPAs (Cost Per Acquisitions).

12/ People are buying more frequently and spending more per order online.

Average conversion rates have increased 10% since the last study. Retail average order value has shot up a whopping 25%! This demonstrates people are migrating more and more of their shopping behavior off the high street and onto the Internet. There’s never been a better time to be an e-commerce digital marketer.

You can deep-dive the above digestibles by reading the full study here.

How do these benchmarks compare to your personal experience? Anything you're surprised by, or that confirms your long-held suspicions?

I’d love to hear your thoughts in the comments below.

Optimize hard,



Here’s How to Generate and Insert Rel Canonical with Google Tag Manager

Posted by luciamarin

This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author's views are entirely his or her own and may not reflect the views of Moz, Inc.

In this article, we’re going to learn how to create the rel canonical URL tag using Google Tag Manager, and how to insert it in every page of our website so that the correct canonical is automatically generated in each URL.

We’ll do it using Google Tag Manager and its variables.

Why send a canonical from each page to itself?

Javier Lorente gave us a very good explanation/reminder at the 2015 SEO Salad event in Zaragoza (Spain). In short, there may be various factors that cause Google to index unexpected variants of a URL, and this is often beyond our control:

  • External pages that display our website but use another URL (e.g., Google’s own cache, other search engines and content aggregators, etc.). This way, Google will know which one is the original page at all times.
  • Parameters that are irrelevant to SEO/content such as certain filters and order sequences

By including this “standard” canonical in every URL, we are making it easy for Google to identify the original content.

How do we generate the dynamic value of the canonical URL?

To generate the canonical URL dynamically, we need to force it to always correspond to the “clean” (i.e., absolute, unique, and simplified) URL of each page (taking into account the www, URL query string parameters, anchors, etc.).

Remember that, in summary, the URL variables that can be created in GTM (Google Tag Manager) correspond to the following components:


We want to create a unique URL for each page, without queries or anchors. We need a “clean” URL variable, and we can’t use the {{Page URL}} built-in variable, for two reasons:

  1. Although the fragment doesn’t form part of the URL by default, the query string parameters do
  2. Potential problems with protocol and hostname, if different options are allowed (e.g., SSL and www)

Therefore, we need to combine Protocol + Host + Path into a single variable.

Now, let's take a step-by-step look at how to create our {{Page URL Canonical}} variable.

1. Create {{Page Protocol}} to capture that part of the URL, according to whether it’s http:// or https://


Note: We’re assuming that the entire website will always function under a single protocol. If that’s not the case, then we should substitute the {{Page Protocol}} variable for plain text in the final variable of Step #4. (This will allow us to force it to always be http/https, without exception.)
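
The screenshot for this step isn't reproduced here, but as a sketch, a Custom JavaScript variable along these lines does the job (consistent with the summary note further down that the protocol comes back without the "://"):

function () {
  // document.location.protocol returns "http:" or "https:"; strip the trailing colon
  return document.location.protocol.replace(':', '');
}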

2. Create {{Page Hostname Canonical}}

We need a variable in which the hostname is always unique, whether or not it’s entered into the browser with the www. The hostname canonical must always be the same, regardless of whether or not it has the www. We can decide based on which one of the domains is redirected to the other, and then keep the destination domain as the canonical.

How do we create the canonical domain?

  • Option 2.1: Redirect the domain with www. to a domain without www. via 301
    Our canonical URL is WITHOUT www. We need to create Page Hostname, but make sure we always remove the www:
  • Option 2.2: Redirect the domain without www. to a domain with www. via 301
    Our canonical URL is WITH www. We need to create Page Hostname without www (like before), and then insert the www in front using a constant variable:
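
As a sketch for Option 2.1, the Custom JavaScript variable only needs to strip a leading www from the hostname:

function () {
  // Always return the hostname without a leading "www."
  return document.location.hostname.replace(/^www\./, '');
}

For Option 2.2, you keep the same stripped hostname and put the www. back in front, e.g. by writing {{Page Protocol}}://www.{{Page Hostname Canonical}}{{Page Path}} as the constant variable in Step 4.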

3. Enable the {{Page Path}} built-in variable


Note: Although we have the {{Page Hostname}} built-in variable, for this exercise it’s preferable not to use it, as we’re not 100% sure how it will behave in relation to the www (e.g., in this instance, it’s not configurable, unlike when we create it as a GTM custom variable).

4. Create {{Page URL Canonical}}

Link the three previous variables to form a constant variable:

{{Page Protocol}}://{{Page Hostname Canonical}}{{Page Path}}

Summary/Important notes:

  1. Protocol: returns http / https (without ://), which is why we enter this part by hand
  2. Hostname: we can force removal of the www. or not
  3. Path: included from the slash /. Does not include the query, so it's perfect. We use the built-in option for Page Path.


Now that we have created {{Page URL Canonical}}, we could even populate it into Google Analytics via custom dimensions. You can learn to do that in this Google Analytics custom dimensions guide.

How can we insert the canonical into a page using Tag Manager?

Let’s suppose we’ve already got a canonical URL generated dynamically via GTM: {{Page URL Canonical}}.

Now, we need to look at how to insert it into the page using a GTM tag. We should emphasize that this is NOT the “ideal” solution, as it’s always preferable to insert the tag into the <head> of the source code. But, we have confirming evidence from various sources that it DOES work if it’s inserted via GTM. And, as we all know, in most companies, the ideal doesn’t always coincide with the possible!

If we could insert content directly into the <head> via GTM, it would be sufficient to use the following custom HTML tag:

<link rel="canonical" href="{{Page URL Canonical}}" />

But, we know that this won’t work because the content inserted by custom HTML tags usually goes at the end of the <body>, meaning Google won’t accept or read a <link rel="canonical"> tag there.

So then, how do we do it? We can use JavaScript code to generate the tag and insert it into the <head>, as described in this article, but in a form that has been adapted for the canonical tag:

<script>
  var c = document.createElement('link');
  c.rel = 'canonical';
  c.href = '{{Page URL Canonical}}';
  document.head.appendChild(c);
</script>

And then, we can set it to fire on the “All Pages” trigger. Seems almost too easy, doesn’t it?


How do we check whether our rel canonical is working?

Very simple: Check whether the code is generated correctly on the page.

How do we do that?

By looking at the DevTools Console in Chrome, or by using a browser plugin like Firebug that returns the code generated on the page in the DOM (document object model). We won't find it in the source code (Ctrl+U).

Here’s how to do this step-by-step:

  1. Open Chrome
  2. Press F12
  3. Click on the first tab in the console (Elements)
  4. Press Ctrl+F and search for “canonical”
  5. If the URL appears in the correct form at the end of the <head>, that means the tag has been generated correctly via Tag Manager
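
Alternatively, a quick check from the same DevTools console (a convenience sketch, not part of the original step-by-step) is:

var tag = document.querySelector('link[rel="canonical"]');
console.log(tag ? tag.href : 'No canonical tag found in the DOM');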

That's it. Easy-peasy, right?

So, what are your thoughts?

Do you also use Google Tag Manager to improve your SEO? Why don’t you give us some examples of when it’s been useful (or not)?


Search in Pics: Bing Ads agency awards, Google baby bib & crossroads

In this week’s Search In Pictures, here are the latest images culled from the web, showing what people eat at the search engine companies, how they play, who they meet, where they speak, what toys they have and more. Bing Ads agency awards event: Source: Twitter Google baby bib: Source:...

Please visit Search Engine Land for the full article.

How to Generate Content Ideas Using Screaming Frog in 20(ish) Minutes

Posted by Todd_McDonald

A steady rise in content-related marketing disciplines and an increasing connection between effective SEO and content have made the benefits of harnessing strategic content clearer than ever. However, success isn't always easy. It's often quite difficult, as I’m sure many of you know.

A number of challenges must be overcome for success to be realized from end-to-end, and finding quick ways to keep your content ideas fresh and relevant is invaluable. To help with this facet of developing strategic content, I’ve laid out a process below that shows how a few SEO tools and a little creativity can help you identify content ideas based on actual conversations your audience is having online.

What you’ll need

Screaming Frog: The first thing you’ll need is a copy of Screaming Frog (SF) and a license. Fortunately, it isn’t expensive (around $150 USD for a year), and there are a number of tutorials if you aren’t familiar with the program. After you’ve downloaded and set it up, you’re ready to get to work.

Google AdWords Account: Most of you will have access to an AdWords account due to actually running ads through it. If you aren’t active with the AdWords system, you can still create an account and use the tools for free, although the process has gotten more annoying over the years.

Excel/Google Drive (Sheets): Either one will do. You'll need something to work with the data outside of SF.

Browser: We’ll walk through the examples below using Chrome.

The concept

One way to gather ideas for content is to aggregate data on what your target audience is talking about. There are a number of ways to do this, including utilizing search data, but search data lags behind real-time social discussions, and the various tools we have at our disposal as SEOs rarely show the full picture without A LOT of monkey business. In some situations, determining intent can be tricky and require further digging and research. On the flipside, gathering information on social conversations isn’t necessarily that quick either (Twitter threads, Facebook discussions, etc.), and many tools that have been built to enhance this process are cost-prohibitive.

But what if you could efficiently uncover hundreds of specific topics, long-tail queries, questions, and more that your audience is talking about, and you could do it in around 20 minutes of focused work? That would be sweet, right? Well, it can be done by using SF to crawl discussions that your audience is having online in forums, on blogs, Q&A sites, and more.

Still here? Good, let’s do this.

The process

Step 1 – Identifying targets

The first thing you’ll need to do is identify locations where your ideal audience is discussing topics related to your industry. While you may already have a good sense of where these places are, expanding your list or identifying sites that match well with specific segments of your audience can be very valuable. In order to complete this task, I'll utilize Google’s Display Planner. For the purposes of this article, I'll walk through this process for a pretend content-driven site in the Home and Garden vertical.

Please note, searches within Google or other search engines can also be a helpful part of this process, especially if you're familiar with advanced operators and can identify platforms with obvious signatures that sites in your vertical often use for community areas. WordPress and vBulletin are examples of that.
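For instance, a couple of hypothetical footprint queries for a vertical like gardening might look like this (swap in your own keyword and platform footprint):

  inurl:forum gardening
  "powered by vBulletin" garden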

Google’s Display Planner

Before getting started, I want to note I won’t be going deep on how to use the Display Planner for the sake of time, and because there are a number of resources covering the topic. I highly suggest some background reading if you’re not familiar with it, or at least do some brief hands-on experimenting.

I’ll start by looking for options in Google’s Display Planner by entering keywords related to my website and the topics of interest to my audience. I’ll use the single word “gardening.” In the screenshot below, I’ve selected “individual targeting ideas” from the menu mid-page, and then “sites.” This allows me to see specific sites the system believes match well with my targeting parameters.


I'll then select a top result to see a variety of information tied to the site, including demographics and main topics. Notice that I could refine my search results further by utilizing the filters on the left side of the screen under “Campaign Targeting.” For now, I'm happy with my results and won’t bother adjusting these.

Step 2 – Setting up Screaming Frog

Next, I'll take the website URL and open it in Chrome.

Once on the site, I need to first confirm that there's a portion of the site where discussion is taking place. Typically, you’ll be looking for forums, message boards, comment sections on articles or blog posts, etc. Essentially, any place where users are interacting can work, depending on your goals.

In this case, I'm in luck. My first target has a “Gardening Questions” section that's essentially a message board.


A quick look at a few of the thread names shows a variety of questions being asked and a good number of threads to work with. The specific parameters around this are up to you — just a simple judgment call.

Now for the fun part — time to fire up Screaming Frog!

I’ll utilize the “Custom Extraction” feature found here:

Configuration → Custom → Extraction

...within SF (you can find more details and broader use-case documentation for this feature here). Utilizing Custom Extraction will allow me to grab specific text (or other elements) off of a set of pages.

Configuring extraction parameters

I'll start by configuring the extraction parameters.


In this shot I've opened the custom extraction settings and have set the first extractor to XPath. I need multiple extractors set up, because multiple thread titles on the same URL need to be grabbed. You can simply cut and paste the code into the next extractors — but be sure to update the number sequence (outlined in orange) at the end to avoid grabbing the same information over and over.
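As a purely hypothetical illustration (the real expressions come from Chrome’s “Copy XPath” option and depend entirely on the site’s markup), a set of extractors for the first three thread titles might look something like this, with only the index at the end changing from one extractor to the next:

  //*[@id="thread-list"]/div[1]/h3/a
  //*[@id="thread-list"]/div[2]/h3/a
  //*[@id="thread-list"]/div[3]/h3/a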

Notice as well, I've set the extraction type to “extract text.” This is typically the cleanest way to grab the information needed, although experimentation with the other options may be required if you’re having trouble getting the data you need.

Tip: As you work on this, you might find you need to grab different parts of the HTML than what you thought. This process of getting things dialed can take some trial-and-error (more on this below).

Grabbing Xpath code

To grab the actual extraction code we need (visible in the middle box above):

  1. Use Chrome
  2. Navigate to a URL with the content you want to capture
  3. Right-click on the text you’d like to grab and select “inspect” or “inspect element”


Make sure you see the text you want highlighted in the code view, then right-click and select “Copy → Copy XPath” (you can use other options, but I recommend reviewing the SF documentation mentioned above first).


It’s worth noting that many times, when you're trying to grab the XPath for the text you want, you’ll actually need to select the HTML element one level above the text selected in the front-end view of the website (step three above).

At this point, it’s not a bad idea to run a very brief test crawl to make sure the desired information is being pulled. To do this:

  1. Start the crawler on the URL of the page where the XPath information was copied from
  2. Stop the crawler after about 10–15 seconds and navigate to the “custom” tab of SF, set the filter to “extraction” (or something different if you adjusted naming in some way), and look for data in the extractor fields (scroll right). If this is done right, I’ll see the text I wanted to grab next to one of the first URLs crawled. Bingo.

Resolving extraction issues & controlling the crawl

Everything looks good in my example, on the surface. What you’ll likely notice, however, is that there are other URLs listed without extraction text. This can happen when the code is slightly different on certain pages, or SF moves on to other site sections. I have a few options to resolve this issue:

  1. Crawl other batches of pages separately walking through this same process, but with adjusted XPath code taken from one of the other URLs.
  2. Switch to using regex or another option besides XPath to help broaden parameters and potentially capture the information I'm after on other pages.
  3. Ignore the pages altogether and exclude them from the crawl.

In this situation, I'm going to exclude the pages I can’t pull information from based on my current settings and lock SF into the content we want. This may be another point of experimentation, but it doesn’t take much experience for you to get a feel for the direction you’ll want to go if the problem arises.

In order to lock SF to URLs I would like data from, I’ll use the “include” and “exclude” options under the “configuration” menu item. I’ll start with include options.


Here, I can configure SF to only crawl specific URLs on the site using regex. In this case, what’s needed is fairly simple — I just want to include anything in the /questions/ subfolder, which is where I originally found the content I want to scrape. One parameter is all that’s required, and it happens to match the example given within SF ☺:
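For illustration, assuming everything I want lives under that /questions/ subfolder, an include pattern along these lines does the trick:

  .*/questions/.*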


The “excludes” are where things get slightly (but only slightly) trickier.

During the initial crawl, I took note of a number of URLs that SF was not extracting information from. In this instance, these pages are neatly tucked into various subfolders. This makes exclusion easy as long as I can find and appropriately define them.


In order to cut these folders out, I’ll add one exclude entry per subfolder to the exclude filter.
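The subfolder names themselves will vary from site to site, but as a purely illustrative sketch, the exclude entries take the same regex form as the include, one line per folder:

  .*/events/.*
  .*/members/.*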


Upon further testing, I discovered a few additional folders that needed to be excluded in the same way.


It’s worth noting that you don’t HAVE to work through this part of configuring SF to get the data you want. If SF is let loose, it will crawl everything within the start folder, which would also include the data I want. The refinements above are far more efficient from a crawl perspective and also lessen the chance I'll be a pest to the site. It’s good to play nice.

Completed crawl & extraction example

Here’s how things look now that I've got the crawl dialed:


Now I'm 99.9% good to go! The last crawl configuration is to reduce speed to avoid negatively impacting the website (or getting throttled). This can easily be done by going to Configuration → Speed and reducing the number of threads and the maximum URI/s (URLs crawled per second). I usually stick with something at or under 5 threads and 2 URI/s.

Step 3 – Ideas for analyzing data

After the end goal is reached (run time, URIs crawled, etc.), it’s time to stop the crawl and move on to data analysis. There are a number of helpful ways to start breaking apart the information grabbed, but for now I'll walk through one approach with a couple of variations.

Identifying popular words and phrases

My objective is to help generate content ideas and identify words and phrases that my target audience is using in a social setting. To do that, I’ll use a couple of simple tools to help me break apart my information: Tagcrowd and Online-Utility.

Both perform text analysis, and some of you may already be familiar with Tagcrowd’s basic word-cloud generation. Online-Utility won’t pump out pretty visuals, but it provides a helpful breakout of common 2- to 8-word phrases, as well as occurrence counts on individual words. There are many tools that perform these functions; find the ones you like best if these don’t work!

I’ll start with Tagcrowd.

Utilizing Tagcrowd for analysis

The first thing I need to do is export a .csv of the data scraped from SF and combine all the extractor data columns into one. I can then remove blank rows, and after that scrub my data a little. Typically, I remove things like:

  • Punctuation
  • Extra spaces (the Excel “trim” function often works well)
  • Odd characters
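A couple of spreadsheet formulas can handle most of that scrubbing. This is just a sketch, assuming the combined text sits in column A starting at A2 (both functions work in Excel and Google Sheets):

  =TRIM(CLEAN(A2))  removes non-printable characters and trims extra spaces
  =SUBSTITUTE(SUBSTITUTE(A2,"?",""),",","")  strips specific punctuation marks, nesting one SUBSTITUTE per character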

Now that I've got a clean data set free of extra characters and odd spaces, I'll copy the column and paste it into a plain text editor to remove formatting. I often use the online editor Editpad.

That leaves me with this:


From Editpad, you can easily copy your clean data and paste it into the entry box on Tagcrowd. Once you’ve done that, hit visualize and you’re there.


There are a few settings in Tagcrowd that can be edited, such as minimum word occurrence, similar word grouping, etc. I typically use a minimum word occurrence of 2 (which is what I’ve used for this example), so there’s some level of frequency and the clutter gets cut out. You may set a higher threshold depending on how many words you want to look at.

For my example, I've highlighted a few items in the cloud that are somewhat informational.

Clearly, there’s a fair amount of discussion around “flowers,” “seeds,” and the words “identify” and “ID.” While I have no doubt my gardening sample site is already discussing most of these major topics, such as flowers, seeds, and trees, perhaps they haven’t realized how common questions around identification are. This one item could lead to a world of new content ideas.

In my example, I didn’t crawl my sample site very deeply, so my data was fairly limited. Deeper crawling will yield more interesting results, and you’ve likely already realized that, in this example, crawling during various seasons could highlight topics and issues that are currently important to gardeners.

It’s also interesting that the word “please” shows up. Many would probably ignore this, but to me, it’s likely a subtle signal about the communication style of the target market I'm dealing with. This is polite and friendly language that I'm willing to bet would not show up on message boards and forums in many other verticals ☺. Often, the greatest insights from this type of study, beyond the popular topics themselves, come from a better understanding of the communication style and phrasing your audience uses. All of this information can help you craft your strategy for connection, content, and outreach.

Utilizing Online-Utility for analysis

Since I've already scrubbed and prepared my data for Tagcrowd, I can paste it into the Online-Utility entry box and hit “process text.”

After doing this, I end up with the following output:



There’s more information available, but for the sake of space, I've grabbed only a couple of shots to give you the idea of most of what you’ll see.

Notice in the first image, the phrases “identify this plant” & “what is this” both show up multiple times in the content I grabbed, further supporting the likelihood that content developed around plant identification is a good idea and something that seems to be in demand.

Utilizing Excel for analysis

Let’s take a quick look at one other method for analyzing my data.

One of the simplest ways to digest the information is in Excel. After scrubbing the data and combining it into one column, a simple A→Z sort puts the information in a format that helps bring patterns to light.


Here, I can see a list of specific questions ripe for content development! This type of information, combined with data from keyword research tools, can help identify and capture long-tail search traffic and topics of interest that would otherwise be hidden.

Tip: Extracting information this way sets you up for very simple promotion opportunities. If you build great content that answers one of these questions, go share it back at the site you crawled! There’s nothing spammy about providing a good answer with a link to more information if the content you’ve developed is truly an asset.

It’s also worth noting that since this site was discovered through the Display Planner, I already have demographic information on the folks who are likely posting these questions. I could also do more research on who is interested in this brand (and likely posting this type of content) utilizing the powerful ad tools at Facebook.

This information allows me to quickly connect demographics with content ideas and keywords.

While intent has proven to be very powerful and will sometimes outweigh misaligned messaging, it’s always great to know as much as possible about who you're talking to and to be able to cater your messaging to them.

Wrapping it up

This is just the beginning and it’s important to understand that.

The real power of this process lies in its use of simple, affordable tools to gain information efficiently — making it accessible to many people on your team and an easy sell to those who hold the purse strings, no matter your organization's size. The process is affordable for small and mid-size businesses, and those at the enterprise level are far less likely to be left waiting on approval for larger purchases.

What information is gathered and how it is analyzed can vary wildly, even within my stated objective of generating content ideas. All of it can be right. The variations on this method are numerous and allow for creative problem solvers and thinkers to easily gather data that can bring them great insight into their audiences’ wants, needs, psychographics, demographics, and more.

Be creative and happy crawling!


Two weeks to SMX East – Register now!

In two weeks, SEOs and SEMs will gather at the largest search marketing conference on the East Coast: SMX East. SMX East will be held in New York City, September 27-29. Don’t miss your chance to get the latest SEO and SEM tactics and connect with other search marketers. Here’s what you’ll get: Tactics...

Please visit Search Engine Land for the full article.

Digitizing traditional marketing methods to compete in local search

Columnist Lydia Jorden discusses techniques to digitize traditional marketing in a way that allows your brand to connect with its traditional audience while maintaining competitive local SERP positioning. The post Digitizing traditional marketing methods to compete in local search appeared first...

Please visit Search Engine Land for the full article.

Search in Pics: Google’s upcoming birthday, Bing 20% share balloons & fruit bar truck

In this week’s Search In Pictures, here are the latest images culled from the web, showing what people eat at the search engine companies, how they play, who they meet, where they speak, what toys they have and more. Bing Ads balloons for 20% UK marketshare: Source: Twitter Google Brazil...

Please visit Search Engine Land for the full article.
