Most people wildly overestimate the proportion of good technical content on their web pages. On websites with complex menu systems and footers full of links, the body text that makes a page unique can be just a small fraction of what’s presented. Why should we expect a search engine to identify a page as great background on blue widgets when 75% of the text presented is just page furniture that has nothing to do with blue widgets? But we’re misled by font sizes and colours, as well as by our habit of ignoring menus, into thinking that the main content jumping out at us represents the bulk of the text on display.
A little-noticed recent tweak to Google illustrates this well. Do a search, and under each result you’ll see a little green arrow next to the page address. This takes you to Google’s “cached” copy of the page – the one it found the last time it visited. Along with the page, there are now two other options: “text only” and “source”. The text-only version is particularly interesting, as it shows what Google sees when all the presentational elements are stripped away. Your 100 words about blue widgets can look rather insignificant in this context.
It’s worth looking at these “text only” versions of pages from different sites, to see the differences. Look at the text of a Wikipedia page if one crops up in your search, and see what a model of clarity, structure and focus it is.
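If you'd rather put a rough number on this for your own pages, the comparison can be approximated in a few lines of Python. This is only a sketch under some loud assumptions: it treats anything inside `nav`, `header`, `footer` and `aside` tags as page furniture (many sites build their menus from plain `div`s instead, so adjust the list), it ignores `script` and `style` content, and it says nothing about how Google actually weighs text. The names `WordCounter` and `body_text_share` are made up for this illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags hold "page furniture"; real sites vary widely.
FURNITURE_TAGS = {"nav", "header", "footer", "aside"}
# Text inside these tags is never shown to the reader, so it is skipped.
HIDDEN_TAGS = {"script", "style"}


class WordCounter(HTMLParser):
    """Counts visible words, split into furniture text and body text."""

    def __init__(self):
        super().__init__()
        self.furniture_depth = 0  # how many furniture tags we are inside
        self.hidden_depth = 0     # how many script/style tags we are inside
        self.furniture_words = 0
        self.body_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in FURNITURE_TAGS:
            self.furniture_depth += 1
        elif tag in HIDDEN_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in FURNITURE_TAGS and self.furniture_depth > 0:
            self.furniture_depth -= 1
        elif tag in HIDDEN_TAGS and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth:
            return
        words = len(data.split())
        if self.furniture_depth:
            self.furniture_words += words
        else:
            self.body_words += words


def body_text_share(html):
    """Return the fraction of visible words that sit outside furniture tags."""
    counter = WordCounter()
    counter.feed(html)
    counter.close()
    total = counter.furniture_words + counter.body_words
    return counter.body_words / total if total else 0.0


# A toy page: six words of menu, six of content, six of footer.
sample = """
<html><body>
<nav>Home Products Services About Contact Blog</nav>
<main><p>Blue widgets are small and blue.</p></main>
<footer>Privacy Terms Sitemap Careers Press Help</footer>
</body></html>
"""

print(f"{body_text_share(sample):.0%} of the words are body text")
# prints: 33% of the words are body text
```

Even on this toy page, the sentence about blue widgets accounts for only a third of the words. Run the same function over the saved source of a real page and the share is often a good deal lower.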