Articles and commentary on the 
art of building links
by Eric Ward (aka The Link Mensch)
 Updated February 2004
Link Mensch - Articles On Link Building Strategies and Tactics
 
Linking Mistakes To Avoid (Part 2): Removing Orphaned URLs

by Eric Ward, Link Mensch

Right now, as you read this, you probably have some orphaned URLs you don't know about, collecting dust in the forgotten pile at the bottom of the search engine indexes. It happens to the best of us. Even I, the self-proclaimed Link Mensch, was humbled recently to discover several old URLs in AltaVista's database that no longer physically exist on my web server. Some expert I am...

During the life span of any web site, you create and update and delete and remove URLs on a regular or semi-regular basis.  New files go up, old ones come down, or get renamed and archived. Sometimes entire web sites with thousands of pages get re-hosted on new servers using new content management tools. I've even seen cases where every URL on a site changed at once.

What we all must remember is that at the same time we've been diligently running our web sites, adding, deleting, moving and archiving files and URLs, the search engine crawlers have been cruising the Web and, on occasion, our own sites on a hit-and-run basis for years. Maybe a crawler came across one of your URLs as it crawled a newsgroup post at Deja News a couple of years ago. Maybe a newsletter wrote about your site, and just as they archived that issue a crawler wandered by and stumbled onto your URL. There are countless ways a crawler could have found your URLs without ever going near your server. In fact, most of the URLs in any search engine's database were found and followed from sources other than your own site.

The question that matters most:

Of all the URLs your site has ever had in its lifetime, how many of them are still in the database of any given search engine?

Search engines do not know if the URLs they have recorded and indexed are still in existence at any given moment. Thus you may have updated your web site and removed links/URLs that the search engines still think exist.  Search results are nothing but placeholders for the contents of the page. Search results are a list of links.

Every URL from your site that no longer exists but which a search engine thinks does still exist is like a lump of coal to be turned into a diamond.  With search engines charging for indexing of URLs, it becomes even more important to revive those dead links before the engines find out they are dead and purge them. A purged URL is forever lost.

Nearly every marketer tries to get their site fully indexed by the search engines. Most site owners wish they could get more of their sites' pages indexed. If you have old links showing up in search results, count yourself lucky. And get busy making those dead links live again.

Finding them and fixing them

Here's one way to find out how many URLs from your site a search engine has indexed. Go to Yahoo (since Google is stingy about giving you all the pages your site has), and in the search box type

site:your domain

(replacing your domain with whatever your domain is, for example site:urlwire.com)

Look at the results. What you see is every single file that Yahoo has in its index for my URLwire.com site and thus thinks is active. Peruse the list. Put your mouse cursor over the clickable link but don't click. Look at the bottom of your browser to see the actual filename of the URL you're studying. Are all the filenames you see still in existence? Probably not. If some of them no longer exist on your site, create a new page with EXACTLY the same filename as the old one Yahoo thinks is still around, and get it on your server ASAP. For my URLwire site, I happen to know for a fact that my site has fewer than 1,900 pages, yet Yahoo has 1,940 pages from my site indexed. That means Yahoo still lists pages that no longer actually exist.
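
If the list of indexed URLs is long, checking each filename by hand gets tedious. Here is a minimal sketch of the same comparison in Python, assuming you have copied the indexed URLs into a list and that your site is made of static files under one document root. The function name find_orphans, the example.com URLs, and the rule that a bare directory URL maps to index.html are all assumptions made for illustration, not part of any search engine's tools.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def find_orphans(indexed_urls, doc_root):
    """Return the indexed URLs that have no matching file under doc_root."""
    root = Path(doc_root)
    orphans = []
    for url in indexed_urls:
        path = urlparse(url).path.lstrip("/")
        # Assumption: a bare directory URL is served by index.html.
        if path == "" or path.endswith("/"):
            path += "index.html"
        if not (root / path).is_file():
            orphans.append(url)
    return orphans


# Demo with a throwaway directory standing in for the web server's files.
with tempfile.TemporaryDirectory() as doc_root:
    Path(doc_root, "map.html").write_text("<html>new sitemap</html>")
    indexed = [
        "http://www.example.com/map.html",
        "http://www.example.com/site-map.html",  # hypothetical old filename
    ]
    orphans = find_orphans(indexed, doc_root)
    for url in orphans:
        print("indexed but missing:", url)
```

Every URL it prints is a candidate for a new page with exactly the old filename. A site that builds its pages dynamically would need a different check, such as requesting each URL and looking at the response status.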

For example, let's say you used to have a sitemap page named site-map.html, and you see that file among the search results. Now let's say that six months ago you changed that file to map.html, and removed the site-map.html file from your server. The search engine has no idea you removed the URL, and still has a record of that page and what was on it.

You can also examine your own server logs to find all page requests that result in a 404 (file not found) response. This even works if you use custom 404 pages. This is how I discovered that on my site there was a file that had been returning 404 error messages about 30 times a day, or almost 1,000 times a month. I created a file that had the same name and content as the one that no longer existed, and bingo, I have recaptured every bit of that lost traffic. You can do the same thing. Start with your server logs and then try some test searches.
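
The log check can be scripted as well. The sketch below counts 404 responses per requested path. It assumes your server writes the widely used Common or Combined Log Format and that a custom 404 page still sends a 404 status code. The sample lines and the name count_404s are made up for illustration; in practice you would pass in the lines of your real access log.

```python
import re
from collections import Counter

# Pulls the requested path and the status code out of one log line.
# Assumes Common/Combined Log Format: ... "GET /path HTTP/1.0" 404 209
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def count_404s(lines):
    """Return a Counter mapping each requested path to its number of 404s."""
    misses = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            misses[match.group("path")] += 1
    return misses


# Made-up sample lines; replace with open("access.log") for a real log.
sample = [
    '1.2.3.4 - - [10/Feb/2004:13:55:36 -0500] "GET /site-map.html HTTP/1.0" 404 209',
    '1.2.3.4 - - [10/Feb/2004:13:56:01 -0500] "GET /map.html HTTP/1.0" 200 5120',
    '5.6.7.8 - - [10/Feb/2004:14:02:11 -0500] "GET /site-map.html HTTP/1.0" 404 209',
]

# Most-requested missing files first.
for path, hits in count_404s(sample).most_common():
    print(hits, path)
```

The paths at the top of the output are the ones worth reviving first, since they are the ones visitors and crawlers are still asking for.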

If you want to find out what URLs the engines have indexed from your site, Danny Sullivan's Search Engine Watch site has a section just for this at http://searchenginewatch.com/webmasters/checkurl.html

Until next time, I remain

Eric Ward, Link Mensch
 

About the Author
Eric Ward founded the Web's first service for announcing and linking Web sites back in 1994, and he still offers those services today. His client list is a who's who of online brands. Ward is best known as the person behind the original linking campaigns for Amazon.com Books, The Link Exchange, Microsoft, Rodney Dangerfield, WarnerBros, The Discovery Channel, the AMA, and The Weather Channel. His services won the 1995 Tenagra Award For Internet Marketing Excellence, and he was selected as one of the Web's 100 most influential people by Websight magazine. Eric also writes columns for ClickZ and Ad Age magazine, and is the author of The Professional Reference Guide to Portal and Search Engine Links. Eric and his wife Melissa split time between offices in Knoxville, Tennessee and Santa Rosa Beach, Florida.
 

© EricWard.com  - All Rights Reserved
Knoxville, Tennessee