3/15/2014

Google CSE: How to exclude parts of your page from your Custom Search Engine

Most web pages have an area for main content.

And then there is the sidebar. Usually, it's full of ads, or something more useful -- related links, the latest posts, most popular articles, tags.

When you create your Google Custom Search Engine, there's no obvious way to exclude the sidebar content from filling up your results with useless info.

Can you stop your sidebar and navigation keywords from being indexed and showing up in your CSE results?

It is possible, according to this Google page:
https://support.google.com/customsearch/answer/2364585?hl=en
(I'm still waiting to see if this really works, though. Some people have reported that their results still contain extraneous pages, but with reduced relevance.)

You have to wrap the sidebar content in a tag that includes this class:
class="nocontent"
Ex. <div id="mySidebar" class="nocontent graytext blackborder">

In this example, there are three classes, one of which is "nocontent".
(Did you know you can have multiple classes added to an element?)

Wait, too easy -- not done yet. Do this next:

  • Go to your Custom Search Engine admin page.
  • Look at the tabs -- it will start on [Basics].
  • Click on [Advanced].
  • Look for "CSE context".
  • Click the arrow to open the details.
  • Click on [Download (XML)].
  • Save the file. It will probably be called "cse.xml".
  • Open the file in a text editor. 
  • The second tag in the file is <CustomSearchEngine ...>
  • Add this setting inside that tag at the end:  enable_nocontent_tag="true"
Ex. <CustomSearchEngine id="vdu79999999" creator="0029715399999999" language="en" encoding="UTF-8" enable_suggest="true" enable_nocontent_tag="true">

  • Save the file.
  • Click on the button that says: [Upload XML file].
  • Upload the new (or altered) "cse.xml" file back to Google.
To make sure that it uploaded correctly, click on [View in browser].
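
If you'd rather not hand-edit the file, the edit from the steps above can be sketched in JavaScript. This is only a rough sketch under a few assumptions: enableNoContentTag is a made-up helper name (nothing Google provides), the XML is passed in as a plain string, and the file has one <CustomSearchEngine ...> opening tag like the example.

```javascript
// Hypothetical helper: returns a copy of the cse.xml text with
// enable_nocontent_tag="true" set on the <CustomSearchEngine ...> tag.
function enableNoContentTag(xml) {
  const openingTag = /<CustomSearchEngine\b[^>]*>/;
  if (!openingTag.test(xml)) {
    throw new Error("No <CustomSearchEngine ...> tag found");
  }
  return xml.replace(openingTag, (tag) => {
    const existing = /\benable_nocontent_tag\s*=\s*("[^"]*"|'[^']*')/;
    if (existing.test(tag)) {
      // The setting is already there: force its value to "true".
      return tag.replace(existing, 'enable_nocontent_tag="true"');
    }
    // Otherwise add it at the end of the tag, just before the closing ">".
    return tag.slice(0, -1).trimEnd() + ' enable_nocontent_tag="true">';
  });
}
```

In Node.js you could read "cse.xml" into a string, pass it through this function, and write the result back before uploading. Running it twice is harmless, because the second pass finds the setting already in place.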
-------

So what is this supposed to do?

According to Google, any content given a class of "nocontent" will be ignored by your Custom Search Engine. It should still be indexed by the regular Google Search, and Googlebot will still follow all of the links. What you have changed is a setting for CSE only.

How long will it take?

I don't know, I just did it. I don't even know if it really works. lol.
Google is awesome, but I cannot say this will or will not work.
Some have reported that they still get extraneous results, but that those pages have less relevance.

What's an alternative method of hiding content?

Well, you could keep your side content in a separate HTML document or script, and then bring it into your main page using JavaScript. The first potential problem is that your visitor might not have JavaScript enabled on their browser or device, so the sidebar would be blank. (It's rare, but it could happen.) Also, bots that scrape web pages are pretty smart, and some of them run JavaScript, so they might pick up the inserted text regardless. Another problem is that you might have JavaScript inside your sidebar, and it may not be invoked after loading.

Anyway, to do this, you would create an empty <div> with an ID where your sidebar would go, and then replace the contents: ex. <div id="mySidebar"></div>

To replace the contents, you could use jQuery and its load function.
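
Here is a minimal sketch of the jQuery version. It assumes jQuery is already included on the page and that the sidebar markup lives at a made-up URL like "/sidebar.html". The function name loadSidebarWithJQuery is also just for illustration.

```javascript
// Loads the sidebar markup into <div id="mySidebar"> using jQuery's .load().
// "$" is passed in so it is obvious which jQuery object is being used.
function loadSidebarWithJQuery($, url) {
  $("#mySidebar").load(url, function (responseText, textStatus) {
    if (textStatus === "error") {
      // Show a plain fallback so the sidebar is not silently empty.
      $("#mySidebar").text("Sidebar could not be loaded.");
    }
  });
}

// Typical call once the page is ready (the URL is hypothetical):
// jQuery(function ($) { loadSidebarWithJQuery($, "/sidebar.html"); });
```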

If you don't want to use jQuery, your JavaScript will need to fetch the file with XMLHttpRequest and put the response into the <div> with innerHTML.
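
A rough sketch of the plain JavaScript route might look like this. The function name loadSidebarWithXhr and the "/sidebar.html" URL are made up for the example, and the document and XMLHttpRequest objects are passed in as parameters only so the function is easy to try out in isolation.

```javascript
// Fetches the sidebar markup and drops it into <div id="mySidebar">.
function loadSidebarWithXhr(url, doc, XhrClass) {
  const target = doc.getElementById("mySidebar");
  if (!target) {
    return; // No placeholder <div> on this page, so there is nothing to fill.
  }
  const request = new XhrClass();
  request.open("GET", url); // asynchronous by default
  request.onload = function () {
    // Only replace the contents when the request succeeded.
    if (request.status >= 200 && request.status < 300) {
      target.innerHTML = request.responseText;
    }
  };
  request.send();
}

// In a real page (the URL is hypothetical):
// loadSidebarWithXhr("/sidebar.html", document, XMLHttpRequest);
```

One catch: <script> tags inserted through innerHTML are not executed by the browser, which is the "JavaScript inside your sidebar" problem mentioned earlier.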

You could also use an <iframe>.
