The need to structure web pages
Most people in SEO have heard the term "semantic search" - the idea is that a search engine understands the context of a query, often by focusing on a specific niche, so that you get more relevant results for your searches.
Let's say you search on Google for "Derby" - you could mean the Kentucky Derby, Derby hat, Derby in England, hotels in Derby and so on.
So there are a whole lot of people creating search engines for these niches - this is a "good thing". We've looked at it ourselves and have a few test sites out there. It's great that we have context for our results, but what about the results themselves?
If I'm looking for Derby hats, I want to see pictures and prices; if I'm looking for Derby in England, I want to see the local council site, the football club, the university and so on.
So you're looking at totally different results pages for each niche. This means you need a purpose-written crawler to get the information you need, in the format you need it in.
The problem then is (bear with me here) that every website is built differently. So let's go back to the Derby hats example - the crawler needs to be able to identify products, pictures and prices and produce a nicely structured result. This is almost impossible with a generic web crawler. Approaching each and every retailer to get a feed would be quite some task, so you're left with creating a specific crawler for each site.
Now this isn't a problem in itself - it's not a huge job to get structured content from most websites. But there could be 20,000 Derby hats websites - writing a crawler for each site is now a BIG job.
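To make the scaling problem concrete, here's a rough sketch of what "a specific crawler for each site" tends to look like. The two site names, the markup patterns and the extract_product helper are all made up for illustration - real sites will differ, and a real crawler would use a proper HTML parser rather than regular expressions.

```python
import re

# Hypothetical per-site rules: each retailer marks up the same facts differently,
# so every site needs its own hand-written patterns.
SITE_RULES = {
    "hats-a.example": {
        "title": r'<h1 class="product-name">(.*?)</h1>',
        "price": r'<span class="price">\$([\d.]+)</span>',
    },
    "hats-b.example": {
        "title": r'<div id="item-title">(.*?)</div>',
        "price": r'<td class="cost">USD ([\d.]+)</td>',
    },
}


def extract_product(site, html):
    """Apply one site's rules to a page; fields that don't match come back as None."""
    result = {}
    for field, pattern in SITE_RULES[site].items():
        match = re.search(pattern, html, re.DOTALL)
        result[field] = match.group(1).strip() if match else None
    return result


PAGE_A = '<h1 class="product-name">Derby Hat</h1><span class="price">$49.99</span>'
PAGE_B = '<div id="item-title">Derby Hat</div><td class="cost">USD 45.00</td>'

if __name__ == "__main__":
    print(extract_product("hats-a.example", PAGE_A))
    print(extract_product("hats-b.example", PAGE_B))
    # Site A's rules find nothing useful on site B's page.
    print(extract_product("hats-a.example", PAGE_B))
```

Two entries in SITE_RULES is manageable; 20,000 of them, each breaking whenever a retailer redesigns a page, is not.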
For me, this is why semantic search is always going to be limited. Until websites adhere to standards of some sort (ideally XML or meta tags), there will always be this problem.
Putting content into meta tags is not that big a deal. Various government websites are already expected to adhere to certain standards, which means that data such as source, author, FOI status and so on can be derived from a page. If someone came out with a meta standard for website content, this would really shake things up.
Imagine a whole load of metadata with product title, description, price and pictures - a crawler could simply pull that information down and display nicely formatted results.
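As a rough sketch of how simple the crawler side could become, here's one generic parser that would work on any site using such tags. The "product." naming scheme is purely an assumption for illustration (no such standard is implied to exist), as are the sample page and the extract_product helper; only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser

# Assumed naming scheme for the imagined standard; "product." is hypothetical.
PREFIX = "product."


class ProductMetaParser(HTMLParser):
    """Collects <meta name="product.*" content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.product = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.startswith(PREFIX) and content is not None:
            # Store the field without the prefix, e.g. "product.price" -> "price".
            self.product[name[len(PREFIX):]] = content


def extract_product(html):
    """Return the product fields found in a page's meta tags."""
    parser = ProductMetaParser()
    parser.feed(html)
    return parser.product


SAMPLE_PAGE = """
<html><head>
<title>Hat shop</title>
<meta name="product.title" content="Classic Derby Hat">
<meta name="product.description" content="Black wool felt Derby hat">
<meta name="product.price" content="49.99">
<meta name="product.image" content="/images/derby-hat.jpg">
</head><body>...</body></html>
"""

if __name__ == "__main__":
    print(extract_product(SAMPLE_PAGE))
```

The point is that the same few lines would work for every retailer, with no per-site rules to write or maintain.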
Google, Yahoo! and MSN/Live Search (whatever) are ideally placed to put such a standard forward. If Google said "if you put these tags into your pages then we'll rank you on X", you can imagine thousands of webmasters reacting immediately.
It's no small task but it's do'able (it's a word...) - over to you, Google - "organize the world's information and make it universally accessible and useful."

I'm not sure... I don't think it's that bad is it?!