There are many reasons you will see 404 errors in your log files. As a reminder, a 404 error means that a request was made for a file (or other object) that does not exist. Some of the more common reasons are listed below.
– Moving a page:
Sometimes webmasters decide, for whatever reason, to reorganize their site. In these instances, references to the old pages may still exist. This is especially true of off-site references such as search engines, link exchanges and others. When you reorganize a site, remember to set up 301 (permanent) redirects from the old URLs to the new pages. This informs search engines and browsers that your pages have moved.
– Renaming pages:
Sometimes pages are renamed. For example, you might have originally created “john.htm” and later changed it to “johnsmith.htm”. Whenever you rename a page, be sure to create a 301 redirect from the old name to the new one.
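The redirect logic for moved and renamed pages can be sketched as a simple lookup table. This is only an illustration in Python and is not tied to any particular web server; the function name `resolve`, the `REDIRECTS` table and the "/old-section/about.htm" path are made up for this example. In practice you would normally configure redirects in your web server itself.

```python
# Hypothetical map of old paths to new paths (illustrative entries only).
REDIRECTS = {
    "/john.htm": "/johnsmith.htm",           # renamed page
    "/old-section/about.htm": "/about.htm",  # moved page (made-up path)
}


def resolve(path, existing_pages):
    """Return (status, location) for a requested path.

    200 if the page exists, 301 plus the new location if the path has
    a redirect entry, and 404 otherwise.
    """
    if path in existing_pages:
        return 200, path
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    return 404, None
```

With this table, a request for "/john.htm" is answered with a 301 pointing at "/johnsmith.htm" instead of a 404, which is exactly what search engines and old links need.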
– Mistyped URLs:
Sometimes users do type URLs by hand, especially if the URL has been printed on a business card, in a magazine or in an advertisement. These URLs are often misspelled, and this causes 404 errors. To reduce the problem, create 301 redirects for the most common misspellings, and keep any URL that users are expected to type short.
– Misspellings in articles:
My articles get quoted and republished all over the web. Sometimes the URLs are misspelled, and this results in 404 errors. For example, I had a URL with a filename of “email.htm” and for some reason the author spelled the filename as “e-mail.htm”. I caught it quickly and created a 301 redirect. This is very important, as you don’t want visitors from those external links finding 404 errors.
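One way to spot likely misspellings is to compare each failed filename against the files that really exist. The sketch below uses Python's standard `difflib` module for this; the helper name `suggest_redirect` and the 0.8 similarity cutoff are assumptions chosen for illustration, so treat any suggestion as a candidate to review by hand before you create a redirect.

```python
from difflib import get_close_matches


def suggest_redirect(requested, real_files, cutoff=0.8):
    """Return the real filename closest to a misspelled one, or None.

    `cutoff` is a similarity threshold between 0 and 1; higher values
    demand a closer match before a suggestion is made.
    """
    matches = get_close_matches(requested, real_files, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Called with "e-mail.htm" and a file list that contains "email.htm", the helper suggests "email.htm" as the redirect target. A name that resembles nothing on the site yields no suggestion.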
– Some operating systems are case sensitive:
On Windows (IIS), the case (upper or lower) of file and folder names does not matter. Windows will treat “ThisIsAPage” and “thisisapage” as identical. However, on Unix and Linux, these names are not identical. In fact, you can have both at the same time. As a result, a link whose case does not exactly match the file name will cause a 404 error. This is very common when webmasters move their site from Windows to another operating system. In general, it’s best to stick to lower-case characters for file and folder names, regardless of the operating system.
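Before moving a site from Windows to a case-sensitive system, it can help to list the names that are likely to cause trouble. The following is a minimal sketch, assuming you already have the site's paths as a list of strings; the function name `find_case_problems` is made up for this example.

```python
from collections import defaultdict


def find_case_problems(paths):
    """Return (mixed_case, collisions) for a collection of paths.

    mixed_case: paths that contain upper-case characters.
    collisions: groups of paths that differ only by case.
    """
    unique = set(paths)
    by_lower = defaultdict(list)
    for p in unique:
        by_lower[p.lower()].append(p)
    mixed_case = sorted(p for p in unique if p != p.lower())
    collisions = sorted(
        sorted(group) for group in by_lower.values() if len(group) > 1
    )
    return mixed_case, collisions
```

Names in the first list are candidates for renaming to lower case (with a 301 redirect from the old name). Groups in the second list are distinct files on Unix and Linux but would clash on Windows.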
– Favicon.ico file:
Internet Explorer introduced the idea of a special icon graphic (called favicon.ico) for web sites. Each time a surfer bookmarks a page using Internet Explorer, the browser attempts to find that icon file. If the file does not exist, a 404 error is logged. To avoid this, be sure to include a favicon.ico file, at a minimum in the root directory of your site, where browsers look for it by default. This is a good idea anyway, as an icon on bookmarks is a great way to brand your site.
– Robots.txt file:
The robots exclusion standard specifies that web sites should include a file called robots.txt in their root directory. This file indicates which parts of the web site should NOT be spidered. Spiders will therefore attempt to open this file, and 404 errors will result if the file does not exist. You should always create a robots.txt file, even if it’s empty.
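To see how a spider might interpret a robots.txt file, you can feed the file's text to Python's standard `urllib.robotparser` module. This is a minimal sketch: the helper name `allowed` and the "/private/" rule are made up for illustration, and real spiders may differ in how they handle edge cases.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="*"):
    """Return True if `agent` may fetch `path` under this robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# An empty file places no restrictions on spiders.
EMPTY = ""

# Hypothetical rule that keeps all spiders out of one directory.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"
```

An empty file allows everything, so it removes the 404 from your logs without changing what gets spidered.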
– Used domains:
If you purchase a domain name which was previously owned by someone else, you might find yourself getting strange 404 errors for files and directories which have no relation to your site. These requests are left over from the previous owner. There probably isn’t much you can do about them, although you might set up 301 redirects for some of the more frequently requested pages.
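To find out which missing pages are requested most often, you can count the 404 entries in your log file. The sketch below assumes the log is in Common Log Format (the default on many servers, but check your own configuration); the function name `top_404s` is made up for this example.

```python
import re
from collections import Counter

# Picks the requested path and the status code out of a Common Log
# Format line, which looks roughly like this:
# 203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /old.htm HTTP/1.1" 404 512
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3})\b')


def top_404s(lines, n=10):
    """Return the n most requested paths that produced a 404."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

You could pass an open log file straight to `top_404s`, since iterating over a file yields one line at a time. The paths at the top of the result are the best candidates for a 301 redirect.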
– Used IP addresses:
Some time ago, my site was hosted on a dedicated box. I noticed I was getting some strange errors, but I had owned the domain name for some time, so I knew that was not the cause. After some investigation, I determined that someone else had been using the IP address before me, and I was catching some of their old traffic. This didn’t last long (just a month or so), but it produced a lot of 404 errors.