Monday, 21 August 2017

Morning musings

Waking up this morning, I got to thinking about one of the minor irritations on this blog.

Attentive readers will have noticed that I use unlikely three letter strings to group like or related posts together. So a group of posts about jelly lichen, perhaps posted over several weeks, might be grouped together by the inclusion of a line 'Group search key: jlb' at the end of each. Or 'hcf' might collect posts about a recent visit to Hampton Court Palace.

If one then keys 'hcf' into the search box (to the left of the Sherlock Holmes search icon, just above the top right of the blue square, bottom left in the illustration), Blogger will return all the posts which contain that string. Elementary testing suggests that it will only find an 'hcf' which is properly separated out, delimited for example by spaces, and that it does not do startswith or contains, a good thing in this context. I don't know about wild cards. In any event, the idea is for the blog writer to use a search string which is unlikely to pop up by chance.
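For what it is worth, the difference between matching a properly separated key and doing a 'contains' can be sketched in a few lines of Python. This is only a sketch of the behaviour my elementary testing suggests, not a description of how Blogger actually does it: the function names are my own invention and I have assumed that words are delimited by white space alone and that case does not matter.

```python
def whole_word_match(text: str, key: str) -> bool:
    """True if key appears in text as a separate, space-delimited word.

    Assumption: words are split on white space only and case is ignored.
    """
    return key.lower() in text.lower().split()


def contains_match(text: str, key: str) -> bool:
    """True if key appears anywhere in text, even inside a longer word."""
    return key.lower() in text.lower()


# Hypothetical post texts, made up for illustration.
post = "A visit to the palace. Group search key: hcf"
other = "Some post which happens to include the string hcfx"

print(whole_word_match(post, "hcf"))   # True: the key stands alone
print(whole_word_match(other, "hcf"))  # False: 'hcfx' is a different word
print(contains_match(other, "hcf"))    # True: a 'contains' search is fooled
```

The second function is the one that would make short search keys more or less useless, since any three letter string is likely to turn up inside some longer word sooner or later.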

Now this works fine when one is reading the blog in the way intended. You see the search key at the bottom of the post and, wanting to see the related posts, you put the key into the search box.

The irritation is that if I were to refer to all the posts with the key 'hcf' from some other post, that post would then itself become part of the group, by virtue of its inclusion of the key. Free text search is just that: it does not distinguish between attaching a key to a post and referring to that first post by use of that key from some second post.

One wheeze would be to refer to the first post by something like '#hcf', which works, in Blogger blog search at least, provided that the reader knows, if he or she wanted to check up what '#hcf' was all about, that the idea is to strip off the leading hash mark before popping it in the search box. Not so good for the casual reader; it is bad enough having to know how to search for a group search key.
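The irritation and the wheeze can both be shown with a toy search over three made-up posts. Again, this is a sketch under assumptions rather than anything I know about Blogger: the post titles and texts are invented, the `search` function is my own, and I have assumed that the search splits on white space only, so that '#hcf' counts as a different word from 'hcf'.

```python
# Hypothetical posts: title -> body text.
posts = {
    "palace visit": "Walked round the gardens. Group search key: hcf",
    "morning musings": "See the posts at #hcf for the palace visit.",
    "careless reference": "See the posts tagged hcf for the palace visit.",
}


def search(posts: dict[str, str], key: str) -> list[str]:
    """Return the titles of posts containing key as a separate word.

    Assumption: words are delimited by white space only, case ignored.
    """
    return [
        title
        for title, body in posts.items()
        if key.lower() in body.lower().split()
    ]


# The careless reference joins the group; the hashed reference does not.
print(search(posts, "hcf"))  # ['palace visit', 'careless reference']
```

The price of the wheeze is that the reader who keys in '#hcf' as written finds only the post doing the referring, not the posts being referred to.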

In many academic publications, the convention or custom is that one avoids this problem by the inclusion of a keywords clause somewhere in one's paper, usually somewhere near the beginning, perhaps just after the abstract, assuming, that is, that one is the sort of academic that does abstracts. One then does a keyword search rather than a free text search, a search in which inclusion of the keyword(s) in the body of the paper would no longer count, in the way that it does here.

And a keyword search could then be included in a free text search, if what looked like free text search terms were actually embedded in some full-on query language, which just defaulted to a simple, free text search. So one would have something like 'key:HCP', where 'HCP' was the keyword for Hampton Court Palace. I suspect that this is not the case in Google's Blogger search. So what about Microsoft?
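A minimal sketch of the sort of thing I have in mind follows, in Python. Everything in it is hypothetical: the `Post` record, the `run_query` function and the 'key:' prefix are my own inventions for the purpose of illustration and are not, as far as I know, features of Blogger or of any other product. Terms starting 'key:' are looked up in a separate keywords field; anything else defaults to free text search of the body, with an implicit 'and' between terms.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    body: str
    # Keywords are held apart from the body, as in an academic paper.
    keywords: set[str] = field(default_factory=set)


def run_query(posts: list[Post], query: str) -> list[str]:
    """Return titles of posts matching every term in the query.

    Assumptions: terms are separated by white space, case is ignored,
    'key:xyz' matches the keywords field only, and any other term
    matches whole words in the body only.
    """
    results = posts
    for term in query.lower().split():
        if term.startswith("key:"):
            wanted = term[len("key:"):]
            results = [p for p in results if wanted in p.keywords]
        else:
            results = [p for p in results if term in p.body.lower().split()]
    return [p.title for p in results]


# Hypothetical posts, made up for illustration.
posts = [
    Post("Palace visit", "Walked round the gardens", {"hcp"}),
    Post("Morning musings", "The posts with keyword hcp are about the palace"),
]

print(run_query(posts, "key:HCP"))         # ['Palace visit']
print(run_query(posts, "hcp"))             # ['Morning musings']
print(run_query(posts, "key:hcp walked"))  # ['Palace visit']
```

With this arrangement a post can talk about 'hcp' in its body as much as it likes without joining the group; only attaching the keyword does that.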

Asking Google about Microsoft query languages turns up various offerings:
  • The heavy-lifting SQL query language, the mainstay of programming against databases, often SQL databases. For many years now the subject of an international standard, a standard originally driven forward, ironically, by IBM, just before the explosion of Microsoft onto the scene.
  • Microsoft Power Query for Excel. Nothing yet known about this one, but I did find a specification.
  • Windows Search Service. This is getting a bit warmer, with talk both of an instant search box and of a searchable catalog of documents.
  • Windows Explorer search. Free text content search plus some property stuff, 'file type is PowerPoint' sort of thing, this last via the search tab which pops up if you click in the search box. Looks as if you can only do an implicit 'and', with no actual 'and', 'not' or 'or' and no brackets. But it looks as if you can do quote-enclosed text strings.
  • Word search. A search within a document rather than for a document, with rather different rules.
  • Start search. Bottom left on a Windows screen. Not yet worked out what this one is for.
But I found very little help about the last three of these, the three which are intended for use by end users like myself, and certainly no specification of the query languages involved. There must be such a thing somewhere, if only for the people coding these search features to work from.

I have also failed to find any help about the blog search feature that I started with, although there is plenty of stuff out there about searching blogs generally. Some moaning about Google's withdrawal of some blog search tool or other.

I didn't look for any help or support for Google search generally. Something they are clearly very good at and which no doubt comes in at least 57 varieties.

So after all this, I have not come up with a very neat solution to my irritation. I guess I will have to settle for '#hcf' and hope for the best.

PS: I could digress on the important role that search now has in computing generally. About how in the olden days everything was given an identifier, something like a postcode or a telephone number, and one organised both one's data and its processing around that identifier. Sorting the records of a big file by identifier was important, and much treasure was spent on getting good at it. Whereas now search rules. Search is to computing what sulfuric acid is to chemical engineering or paint is to painting. You don't ask for me by national insurance number any more, you ask for me by some search term like 'Epsom blogger nerd trolley' - not that we have yet got to the point where that does anything for either Google or Bing. But I leave all these interesting speculations for another day.
