The overlooked costs of poor indexing

Indexing methods; The benefits of good indexing;
Why it takes time to save time

What is indexing and why is it important?

Indexes make it easier to find information, or things.

Poor indexing (or a lack of indexing altogether) makes it harder to find information or things.

In this article:

We'll look at some examples of good indexing and how it makes life easier, plus an example of what happens when indexing isn't done so well.
We'll then explore the idea of indexing, getting a better idea of what indexing actually is with an emphasis on why it is important.
We'll also look at the benefits of having multiple indexes, i.e. different ways of indexing the same thing. (And that relates to the point about how indexing can aid learning!)
We'll also look at indexing by location versus indexing via name.
Along the way, we'll make some suggestions for indexing effectively.

The better we understand indexing, the better we can assess when we need to use it.

We can also determine when it is worth the time investment (and when it isn't).

With some understanding of indexes and indexing, we can then look at how our brain can index information.

This will be covered in what could be thought of as part 2 to this article: how we remember, indexing memories for better recall.

If we understand how our brain indexes memories and parts of memories, we can make it easier for our brain to do just that.

We can then begin applying these ideas to help ourselves get better at saving information and retrieving it or at the very least understand why we are having difficulty retrieving things we are trying to learn.

If we learn to understand the limits of our brain, we can then learn how to work within those limits, so that we reduce frustration when learning and even make the process of learning enjoyable.

Simply put, we can get better at saving information and retrieving it.

Even if you are not interested in how the brain might index information, the idea of indexing can still be helpful.

To find out how and why, please read on.

Google and indexing, finding webpages easily

Google is perhaps one of the biggest and best known examples of the importance of indexing. It is perhaps such an obvious example that we don't even think about what Google does.

When I graduated from university, the internet was still relatively new. To find information on a particular subject or business, there were many search engines, none of them all that reliable. And so one tool that we could use was called Dogpile.

Rather than us trying the same search term in four or five different search engines, Dogpile would do that for us automatically, providing results from all of these search engines. It was still hit and miss, but it was the best we had at the time.

When Google came along with its method of indexing web pages and ranking them, searching became a whole lot easier.

Nowadays Google is almost synonymous with the web. It provides an index of web pages, all ranked for relevancy with respect to various search term elements. Its indexing makes it easy for us to find what we are looking for on the web.

Google takes the time to index the internet. It constantly updates its database, canvassing the web for changes to web pages and for new web pages so that its index is as up to date, and as helpful, as possible. As a result Google can respond to search requests in a flash.
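As a rough illustration of why a prebuilt index makes searching fast, here is a minimal Python sketch of an inverted index, a structure that maps each word to the pages that contain it. The page names and text are invented, and this is a toy version of the general idea rather than a description of how Google actually works.

```python
from collections import defaultdict

# Hypothetical pages, invented for illustration.
pages = {
    "yoga-poses": "basic yoga poses for beginners",
    "pizza-dough": "how to make pizza dough",
    "yoga-breathing": "breathing exercises for yoga",
}


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages containing every word in the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


index = build_index(pages)           # the slow part, done once ahead of time
print(search(index, "yoga"))         # lookups afterwards are quick
print(search(index, "yoga poses"))
```

The time is spent up front in build_index. Each search then only touches the entries for the words in the query instead of reading every page again.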

Because of Google's indexing, we can find things easily on the internet.

Amazon and indexing, locating product quickly

Amazon provides another example of the importance of indexing, particularly with respect to their warehouses.

Now you might think that whenever stock comes in, each item is placed in a specific location, and that to get the needed quantity of that item a worker simply goes to that location. That's not the case. Instead, when stock is brought in, it is stored wherever there is free space.

So that a particular item can be retrieved, every item's location is entered into a database as it is brought into the warehouse. Every item's location is indexed in that database so that it can be retrieved easily.

When an order for a particular item comes through, a program accesses the database and then tells a worker or a robot exactly where to go to retrieve said item.

They even get a map of the quickest route to get there.
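The store-anywhere, look-up-later idea can be sketched in a few lines of Python. This is only a guess at the general shape of such a system. The item names, the (aisle, shelf) slots and the store and retrieve functions are all hypothetical.

```python
# Hypothetical location index: item name -> list of (aisle, shelf) slots.
locations = {}


def store(item, slot):
    """Record where an item was put, wherever there happened to be room."""
    locations.setdefault(item, []).append(slot)


def retrieve(item):
    """Return a slot holding the item and remove that slot from the index."""
    slots = locations.get(item)
    if not slots:
        return None          # nothing indexed under that name
    slot = slots.pop()
    if not slots:
        del locations[item]  # last one taken, so drop the entry
    return slot


store("kettle", ("A", 3))
store("yoga mat", ("F", 12))
store("kettle", ("K", 7))    # same item, completely different place

slot = retrieve("kettle")
print(slot)
```

The items themselves can be scattered anywhere. What matters is that every placement is written into the index at the moment it happens.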

In the case of both Google and Amazon, effective indexing makes it easy, and quick, to find things. This is despite the constant change that occurs in the respective domains of each.

Memories of defragging hard drives

For anyone who is old enough to remember having to occasionally defrag a hard drive, hard drives work in a similar fashion to an Amazon warehouse.

When storing files on a hard drive, the file is divided up into smaller chunks with those chunks being stored wherever there is room.

An index of files and their pieces is maintained so that whenever a request to access a file is made, the file can be assembled in memory and then opened.

Defragging was (and it seems, still is) a way of tidying up a disk, putting the separate pieces of a file closer together, and perhaps freeing up larger contiguous sectors, making it easier to store new information and at the same time making it quicker to access existing files from that disk.

And so here again the idea of an index is that it makes it easier to retrieve something, in this case, the pieces of a file.
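Here is a toy Python model of that file index. It is a deliberately simplified sketch, not how any real file system is implemented. The eight-block "disk", the block size and the function names are all assumptions made for the example.

```python
BLOCK_SIZE = 4             # tiny blocks so the example is easy to follow

disk = [None] * 8          # a pretend disk with 8 blocks
file_table = {}            # the index: file name -> ordered block numbers


def write_file(name, data):
    """Split data into chunks and store each one in any free block."""
    chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    free = [i for i, block in enumerate(disk) if block is None]
    if len(chunks) > len(free):
        raise OSError("not enough free blocks")
    used = free[:len(chunks)]
    for block_number, chunk in zip(used, chunks):
        disk[block_number] = chunk
    file_table[name] = used


def read_file(name):
    """Reassemble a file by following its entry in the index."""
    return "".join(disk[i] for i in file_table[name])


def delete_file(name):
    """Free the blocks of a file and remove it from the index."""
    for i in file_table.pop(name):
        disk[i] = None


write_file("a.txt", "aaaaaaaa")       # takes blocks 0 and 1
write_file("b.txt", "bbbb")           # takes block 2
delete_file("a.txt")                  # frees blocks 0 and 1
write_file("c.txt", "cccccccccccc")   # lands in blocks 0, 1 and 3
print(file_table["c.txt"], read_file("c.txt"))
```

The last file ends up split around b.txt, which is the fragmentation that defragging tidies up. It still reads back correctly because the index records which blocks belong to it and in what order.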

An example of inefficient indexing

What happens when indexing is less than good?

For a while I worked in a pizza restaurant. This was a small business with limited storage space. Like an Amazon warehouse, because space was limited, inventory was stored wherever there was room. Unlike an Amazon warehouse, the location of inventory was not logged or indexed in a central database.

As a result, any time we needed anything we had to search for it.

This could be troublesome for new employees, who often didn't even know what the thing they were looking for looked like. Our boss would tell them just to "look.... use your eyes..."

In a case like this, positioning items in the same place (or at the very least, in the same fridge) may have been a better approach.

Asking someone to retrieve something, we would have been able to tell them exactly where to go. Frustration would have been reduced all around and we could have got on with doing our jobs a lot faster.

Compounding the problem

Note that in the above case the problem was twofold. Not only were items not placed in specific locations, new employees often weren't familiar with what they were looking for. And so they'd be searching for something they might not even recognize in a vaguely specified location.

To compound the problem, we were also dealing with up to four different languages. The owner was Italian. His English wasn't always perfect. We all mainly spoke English amongst ourselves and bad to really bad Mandarin when dealing with customers (I live in Taiwan). Often the new hires were from places like Mongolia. And so both Mandarin and English were second languages to them.

Effective indexing might have made working in that place a lot less difficult.

The benefit of good indexing and the cost of bad indexing

Whether it is Google making information easier to find, or Amazon making product easier to retrieve, or a hard drive management system making it quicker to retrieve files, in all cases indexing makes it easier to find stuff, particularly when there is a lot of it or when the goal is to find stuff quickly and easily.

Bad indexing, or a lack of indexing altogether, makes it harder to locate things.

The cost of bad indexing is time. It takes longer to find things.

Indexing via location or indexing via name

I dabble in programming on the side, mainly simple stuff with a language called Python.

I generally use it to speed things up.

When programming, we can use different structures for storing information. One of these is an array or list. Another is a dictionary.

With an array, items are stored in numbered positions.
In a dictionary, each position has a unique name, termed a key.

To access items in an array, we refer to them via the position they occupy in the array. Positions are indexed by numbers starting from 0.

To call up a specific entry in a dictionary, we access it via its key. This can be any string of letters and numbers, so long as it is unique within the dictionary.

The nice thing with dictionaries is that the name you use for the thing that is indexed can be descriptive of the thing indexed. You just have to make sure you remember the name.

Arrays also have their benefits, in particular the ability to sort or arrange.

In one of the above cases, entries are called up via their position, in the other via their unique name.

Both have their uses.
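A short Python example shows the two side by side. The cheeses and fridge locations are made up for the example.

```python
# A list (array): items are reached by numbered position, starting from 0.
cheeses = ["pecorino", "parmesan", "mozzarella"]
first = cheeses[0]         # the item in position 0
cheeses.sort()             # lists can be sorted or rearranged in place

# A dictionary: items are reached by a unique, descriptive key.
fridge = {
    "pecorino": "top shelf, left",
    "parmesan": "top shelf, right",
    "red pesto": "door",
}
where = fridge["red pesto"]

print(first, cheeses, where)
```

Note that after sorting, position 0 holds a different cheese, while the key "red pesto" keeps pointing at the same entry however the dictionary grows.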

Learning to recognize cheese

The two cases above have remarkable similarities to the problems I highlighted in the Italian place.

On one hand, if things were stored in clearly defined locations we'd be able to find things easily, even if we didn't know what said thing looked like.

On the other hand, if we are intimately familiar with, for example, all the different types of cheeses, we don't have to look in too many different fridges, we can identify that cheese simply by look.

So if one of us was sent up for pecorino cheese, which looks a little bit similar to parmesan if you are new to cheeses, and if pecorino is stored in a specific location, we can grab it easily, even if we aren't quite sure what it looks like.

This is the equivalent to indexing by position.

However, once we learn to recognize that cheese, and can tell the difference between it and other cheeses, we can simply look for it.

This is a rough equivalent to indexing by name.

Indexing can make up for a lack of training

In lieu of good indexing systems, a good training program might have sorted out a lot of the above difficulties, but that was something else missing at the Italian restaurant.

What constitutes a good training program? That's one of the things that will be covered in part two of this article.

Good training aside, with a reasonably consistent indexing system in place, we could easily have told new hires where to go to retrieve particular items. We could have used labelling, or even color coded labels (big ones) to make things easier to locate and easier to identify.

There then would have been less frustration ("why the heck am I paying you guys") and we could have gotten on with the main task, preparing and serving food, and maybe even have enjoyed the process of doing it.

A practical lesson on the benefits of indexing

On my last day at the Italian place I was showing a new person the location of certain ingredients in the kitchen fridge.

Since I was, at the time, in charge of the kitchen, I took it upon myself to place certain items in particular places. So for example, two jars we used constantly, and often together, were the jars of red pesto and truffle sauce.

I placed them next to each other on the top shelf and I told the new guy where to find these two things, since he might on occasion be asked to grab them for me.

(While there are a lot of other ingredients he could help retrieve, I focused only on these two. He had a lot of other things to remember also.)

Note that prior to that day, those two jars weren't always placed in that location. And also, there were three or four other jars that looked the same but had different ingredients. Choosing to place these two jars in that specific location was the basic equivalent of indexing them by location. I was making them easy to find even if someone didn't know exactly what was in those jars.

Explaining to the new guy what to grab and from where, I had a realization of how much easier this could make things if we did that with all of the other ingredients.

I had a flash of inspiration where I realized that it was also something that I needed to apply in another arena. I needed to apply it to the indexing of my yoga website.

Indexing a web site

I have a 500 page yoga website and at the time it was not indexed well at all. By that I mean even I had difficulty finding particular pages, and I both built the website and wrote all the articles (bar one) that it contains.

Not long after my flash of inspiration at the Italian restaurant, I started work on a categorized index page for my website, with each page assigned to one (and only one) category.

Now from any page on that particular site, a user can navigate to its particular category and thus see all the other articles in that category. Or they can scan the different categories to find the particular page that they are looking for.
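The structure behind a categorized index page like this can be sketched in Python. The page titles and category names below are placeholders, not the real contents of the site.

```python
from collections import defaultdict

# Hypothetical pages, each assigned to one (and only one) category.
page_category = {
    "Downward dog": "Poses",
    "Warrior 1": "Poses",
    "Breathing basics": "Breathing",
    "Hip anatomy": "Anatomy",
}


def build_category_index(page_category):
    """Invert page -> category into category -> sorted list of pages."""
    index = defaultdict(list)
    for page, category in page_category.items():
        index[category].append(page)
    return {category: sorted(pages) for category, pages in sorted(index.items())}


category_index = build_category_index(page_category)


def siblings(page):
    """Other pages in the same category as the given page."""
    category = page_category[page]
    return [p for p in category_index[category] if p != page]


print(category_index)
print(siblings("Warrior 1"))
```

Because each page belongs to exactly one category, going from a page to its category is a single lookup, and the index page is just the inverted mapping printed out.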

Finally I had a (relatively) decent and easy to use index page for my website.

This is only one of many indexes on my website.

Up to this point I'd experimented with quite a few indexing options.

I'll suggest here that one of the problems I've had was that I tended to build the indexing into the content of web pages so that my pages "ranked" with Google.

While not a bad thing, I should have also provided more "out of content indexing", i.e. indexing that wasn't built into the content of various pages.

Choosing effective categories

So why did it take so long for me to come up with an effective index for that website, particularly since I'm here writing about indexes?

One of the main reasons is that the pages on that website tend to cover a lot. And so it can be difficult to peg each webpage down to one particular category. And that can be one of the challenges with indexing, clearly defining what it is that we are trying to index.

Another problem I had was deciding how to categorise my pages.

What categories did I want to use?
What categories would be most helpful?
What categories would people be searching for?

And this can be one of the challenges when indexing, choosing which categories to use.

The thing with categories is, they can always be changed.

And so one suggestion is to implement a set of categories and then try using them. And then change them, if required, based on how useful they are.

Choosing which categories to use is one of the challenges of indexing. And it's one thing that may have to be changed or fine tuned.

That being said, another option is to create multiple indexes.

This can involve a lot more work. But it can be useful for more than just making things easier to find.

Chinese characters: Categories and multi-index systems

One of my other projects is a database of over 7000 Chinese characters and words along with associated meanings, pronunciation and other helpful information. Building it offered an excellent opportunity to explore the ideas of index categories and multiple indexes.

To start with, this project was simply a database without any indexing or categories. How then did I look up characters in this database without having an index?

When I started building my Chinese character database, I initially relied on phonetic input, like pretty much most people. This was problematic because a lot of times I didn't know the pronunciation. As a result, I had to look the character up in the dictionary first to find its pronunciation. Then, after typing the pinyin for the character, I had to search for the character in the resulting drop down menus.

It was a time consuming process.

Eventually, I found out about a method of inputting Chinese characters that was based on shape.

A shape-based system for inputting Chinese characters

This shape based input system for Chinese characters came included for free with the operating system I was using. And it was relatively easy to learn.

It simply matched 24 basic character shapes to 24 letters of the alphabet. Each character could be broken down into elements that corresponded to one of these 24 shapes, or some combination thereof.

Once I'd learned the 24 basic character shapes and their alphabetic counterparts, and once I'd learned how to deconstruct characters into their basic elements, I could basically touch-type Chinese characters on sight. (I already knew how to touch type in English.)

Now I no longer needed to look the character up in a dictionary to find out how to pronounce it. I could type it using its cangjie input code.

And this was my initial method of looking characters up in my Excel database. I'd simply type them using their cangjie input code. Excel could then do the looking up.

Because I could type Chinese characters based on their shape, I didn't need an index to look them up.

Being able to touch-type Chinese characters, I gradually increased the content of my database. As well as definitions for each character, it also included their pronunciation. I also began to include their cangjie input codes.

Alphabetising Chinese characters by shape

Even though I could touch type Chinese characters, I still thought that I would need a way of indexing Chinese characters to make my database useful for others. Bearing in mind my experience with using a radical index, I wanted an index that made characters easy to look up.

I eventually realized that the shape based input system I was using provided a simple and consistent way of indexing Chinese characters. And because each of its elements corresponded to a letter of the alphabet, what it actually offered was a way of alphabetising Chinese characters by shape.

A consistent shape-based indexing method

One of the really nice things about the method was its consistency.

The first element in a character's input code was always the left most, top most, or outermost element. That meant with an index based on these codes, when looking up a character, you always used the first element of the character.

This is in direct contrast to a radical index where the radical can be any part of the character, the top, the left, even the bottom or right.

When using a radical index to find a character, often times you simply had to guess as to which part was the radical.

But with the cangjie index, the guess work was removed. It made character lookup a lot easier.

Note that this input system required some fine-tuning to use as an effective index system. Since the code for each character was limited to a maximum of five letters, some elements could be left out of the coding. Plus, each letter corresponded to a range of roughly related shapes.

And so I had to do a lot of manual sorting to make the index useful.

But it provided my first consistent and relatively easy to use index for finding Chinese characters.

Indexing from the back end instead of the beginning

An interesting point is that there were variations of this input system.

One such system was based on the first and last part of a character's cangjie code. This system was handy if you were only vaguely familiar with the cangjie system. You could easily figure out the first and last elements of a character's cangjie code and then select the desired characters from a drop down list.

While I occasionally used this method when it was available, the more important point it passed across was the idea of indexing Chinese characters via their last element.

One of the benefits, if this was done in English, is that we'd see all words that end in "ed" or "ing" grouped together.

And so this was one of the other indexes that I created using the cangjie input system as a base. A tail-end sorted alphabetic index of Chinese characters.
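The English version of a tail-end index is easy to try in Python. One simple way to do it, sketched here with a made-up word list, is to sort on each word's reversed spelling so that endings are compared first.

```python
words = ["walked", "sing", "red", "jumping", "talked", "ring", "bed"]

# Ordinary index: compare words from their first letter.
by_start = sorted(words)

# Tail-end index: compare words from their last letter backwards.
by_end = sorted(words, key=lambda word: word[::-1])

print(by_start)
print(by_end)
```

In by_end the words ending in "ed" sit together, followed by the words ending in "ing". The same key should work on a list of cangjie code strings, giving an index sorted by final element.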

A rhyming index

Note that if we apply this same idea to a phonetic index for Chinese characters, what we end up with is a rhyming index!

If we sub-sort such an index by shape, we can end up with a rhyming index that also has characters with the same phonetic elements grouped together.

Indexing with re-usable categories and sub-categories

Whether sorting using a character cangjie code's initial element or final element, the 24 shapes and their associated letters offered a set of main categories. From there, the various derived shapes associated with a single letter or combinations of letters offered sub-categories.

These categories and sub-categories would be based on a character's initial element.

For larger groups of characters, the characters could be further sorted via their second element using the same categories and sub-categories.

Thus this shape based input system offered an indexing system that could be applied repeatedly. It allowed for Chinese characters to be sorted first via their initial element and then sub-sorted via their second and even third elements.

This could be used whether an index was based on the initial element of characters or their final elements (or even their second elements).
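Sorting by first element and then sub-sorting by second element using the same categories can be sketched as a small recursive function in Python. The handful of characters and codes below are sample data for illustration only, and the real index also involved a lot of manual sorting that this sketch ignores.

```python
from itertools import groupby

# Sample (character, code) pairs; treat the codes as illustrative data.
entries = [
    ("森", "DDD"), ("明", "AB"), ("林", "DD"), ("晶", "AAA"),
    ("昌", "AA"), ("日", "A"), ("木", "D"),
]


def element(code, position):
    """Letter at a position in the code, or "" if the code is too short."""
    return code[position] if position < len(code) else ""


def nested_index(entries, depth=0, max_depth=2):
    """Group by the element at `depth`, then sub-group by the next one."""
    if depth == max_depth:
        return [char for char, _ in entries]

    def key(entry):
        return element(entry[1], depth)

    ordered = sorted(entries, key=key)   # groupby needs sorted input
    return {
        letter: nested_index(list(group), depth + 1, max_depth)
        for letter, group in groupby(ordered, key=key)
    }


index = nested_index(entries)
print(index)
```

The same set of letters serves as categories at every level, which is what makes the scheme reusable. Raising max_depth to 3 adds a third level without any other change, and reversing each code first would give the final-element version.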

There was a slight problem though.

Not beginner friendly

The problem with this index was in terms of beginner friendliness.

Not everybody was familiar with the cangjie input system. And I didn't think a lot of people would be interested in taking the time to learn the 24 elements, their derivatives, and their alphabetic correspondences.

And so I refined this index.

Note that I didn't get rid of the cangjie-based indexes. They are still useful, particularly if someone is interested in learning the cangjie input method.

Making the cangjie input method easier to learn

And actually, that's one way in which having both initial element and final element indexes is handy.

With initial and final element indexes based on cangjie input codes, these can make the cangjie input method easier to learn.

In one instance you can see characters together that begin with the same cangjie elements. In another you can see characters together that end with the same cangjie elements.

Rather than trying to parse "rules" for understanding cangjie codes, you can simply see the effects of when those rules are applied.

Making Chinese character indexes more beginner friendly

To make the initial indexing idea more beginner friendly, I reduced it to 12 basic shapes, each with up to 16 sub-shapes.

I picked shapes that would be familiar to someone, even if they weren't familiar with Chinese characters.

These indexing shapes weren't associated with letters of the alphabet or any other system for sorting. And so to make them easier to remember and easier to use, I ordered them into a sequence and gave each of them names.

So for example, the first shape is called "Box" and is literally a box shape.

Its sub-shapes are all variations of the box shape or include the box shape as their top-most, left most or top-left most element.

The next shape is a variation of the box, except it's missing the bottom edge. And so this is called "top".

The shape after is missing its top edge. And so this is called "bottom".

Other shapes include a horizontal line called simply "horizontal" and a vertical line which is called "vertical". Then there's the shape that combines these two elements to form the shape called "cross".

I repeated this process when creating an index based on the final element of characters. One difference was that for the final element index I didn't name the shapes, at least not to begin with. And while I did arrange them in a particular order, I didn't pay as much attention to the ordering as I did for the initial elements.

Later, I did name the final element shapes and put more thought into their ordering, simply because I found this index more difficult to use.

What I realize now, as I write this, is that the names themselves are like index keys. They point to the individual shapes. And they actually offer handy links for referring to each of the shapes.

For example, if I see a Chinese character on a street sign and I want to remember it so that I can look it up later, I'll often remember it in terms of the shape elements that make it up.

Cangjie input codes offer a similar kind of hook. Though when using cangjie input codes I'll generally refer to them in terms of their associated letters.

From there the next challenge was creating sub-groups.

Sub-sorting using the same indexing elements

I limited the number of subgroups to a maximum of sixteen to again make the index as useful as possible.

Sixteen subgroups are reasonably easy to scan. But it meant that I also had to do some creative combining for some elements.

The real challenge was then doing the actual sorting of over 7000 characters.

Note that it wasn't just sorting them via their first element, but also by second and in some cases third element to make each character relatively easy to find.

But one of the nice things was, I could simply reuse the same shape-based system for sub sorting the characters by second element and in some cases even by their third element.

The easy differentiability of Chinese characters

I should point out here that one of the things that made this doable is the fact that Chinese characters are easy to break down into elements. They themselves are made up of sequences of brush strokes and these brush strokes make it easy to break down characters into easy to differentiate elements.

This easy differentiability made it possible to create an effective index for Chinese characters, and not just a single index, but a system of indexes.

A taste of how our brain indexes memories

I'll suggest here that this is, in some ways, more or less what Google does when it canvasses the web to create a ranked index of web pages. It looks for distinguishable features to help it index individual web pages.

And this may be what our brain does when it indexes the contents of our memories and/or the things that we learn. It looks for distinguishable, or easy-to-recognize, features.

The point here is, if we learn to differentiate, we may be able to assist our brain in indexing our memories, or at the very least, figure out how we can retrieve memories a little more easily.

And that's what I'll talk a little bit more about in part 2 to this article.

The additional benefits of multiple indexes

Before finishing I should point out that there have been benefits to having multiple indexes for Chinese characters.

With an initial cangjie index, I am able to see Chinese characters that begin with the same cangjie code all grouped together. That can make it easier to understand how the cangjie coding system is applied at the beginning of a character.

With a final element cangjie index I am able to see Chinese characters grouped together that have the same final cangjie code elements. Thus I can better understand how the cangjie coding system is applied at the end of Chinese characters. Basically it codes the final brush stroke or combination of brush strokes. Seeing this in practice makes it easier to understand or grok.

With a variety of indexes for Chinese characters, say initial element and final element shape based indexes, I can see Chinese characters in two separate groups. Add different ways of indexing Chinese characters phonetically, and even by radical, and again I have opportunities to see Chinese characters in different contexts. This can make them easier to understand and easier to learn. And given the ability to move easily between characters and indexes, say via an app or website, this provides an interesting way of touring a dictionary.

It makes the dictionary more like a website, or more like a segment of the internet where each character is like a page and the indexes provide links for moving between those pages.
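A small Python sketch shows how several indexes can sit on top of the same records and act as links between them. The four records are sample data, the "final" field (the rhyming part of the pinyin) is a field added for this example, and build_index and neighbours are hypothetical helper names.

```python
from collections import defaultdict

# Sample records; the field values are illustrative.
characters = [
    {"char": "明", "pinyin": "ming", "final": "ing", "code": "AB"},
    {"char": "晶", "pinyin": "jing", "final": "ing", "code": "AAA"},
    {"char": "林", "pinyin": "lin", "final": "in", "code": "DD"},
    {"char": "森", "pinyin": "sen", "final": "en", "code": "DDD"},
]


def build_index(records, key_func):
    """Group characters under whatever key the given function extracts."""
    index = defaultdict(list)
    for record in records:
        index[key_func(record)].append(record["char"])
    return dict(index)


# Several indexes over the same records.
indexes = {
    "first element": build_index(characters, lambda r: r["code"][0]),
    "last element": build_index(characters, lambda r: r["code"][-1]),
    "rhyme": build_index(characters, lambda r: r["final"]),
}


def neighbours(char):
    """For each index, list the other characters filed alongside this one."""
    found = {}
    for name, index in indexes.items():
        for group in index.values():
            if char in group:
                found[name] = [c for c in group if c != char]
    return found


print(neighbours("明"))
```

Each index puts the same character next to a different set of neighbours, which is the "different contexts" effect described above. Adding another index is one more entry in the indexes dictionary.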

Now, imagine something like this built into our brains.

by: Neil Keleher

Published: 2021 11 12

Indexing takes time. It also takes effort. Why bother? Neil Keleher.