Creating an easy-lookup indexing system for Chinese Characters

The importance of indexing in general


One of the most difficult aspects of learning to read (and perhaps even write) Chinese, particularly when you are teaching yourself, is finding an easy way to look up Chinese characters.

These days it's a lot easier with smartphones and apps that allow you to write characters or take pictures of them, but that doesn't address the underlying problem. Chinese characters lack an efficient indexing system, particularly for beginners trying to learn Chinese characters, and even for locals.

You either have to know the radical, or be able to figure it out, or failing that, simply guess. Or you have to count strokes. If you have a modicum of experience with Chinese you may even be able to guess the character's pronunciation.

With English, looking up a new word is easy. You just go to your dictionary, and look it up alphabetically.

Wouldn't it be nice if there were a similar way of looking up Chinese characters?

Well, now there is.

Why bother with an efficient indexing system for Chinese characters?

Now before I tell you about this method for looking up Chinese characters, you might ask, why bother? We have Google, we have smartphones, we can use trackpads or touch screens to write Chinese characters (and thus look them up). Why bother with old-school indexes for looking up characters, even if they are new and improved?

Google got big because it learned how to index the web

You know how Google got big? By indexing the web. They not only got big, they are now synonymous with the web. And this is because they learned how to index (and rank) web pages. And while Google does rely on technology, at the heart of that technology isn't smartphone cameras or writing apps, it's actual indexing of pages based on content. With that indexing Google can do lots and has done lots. Indexing has given them power.

Because of Google we can find stuff easily, whether it is porn, the equivalent of a cup of flour in ounces, or whether or not Brad Pitt is still married.

Indexing takes time

Note that indexing is not a simple process. Google has to research each page it indexes, and while this process is automated using algorithms, it is still an important and time-consuming process. However, the end result of this process is that Google can provide search results in a flash.

Google spends the time and resources to index and rank web pages so that we can all find things more easily. And to top it off, it has a method for doing so. It studies pages, looking for ways to index them and checking whether they fit within its current indexing criteria (i.e. something that is clearly defined or definable).

Even with good indexing, we sometimes have to work, and that's a good thing

Note that even with Google, sometimes finding what we want isn't easy. We have to think about how a particular item or area of interest is indexed. We are forced to ask questions of ourselves about what we are looking for.

When we use a smartphone camera and app, or copy and paste, to find a character, we forsake the process of looking at the character itself. We give up the chance to notice its details. After all, what reason is there for doing this when our smart device can look up the character for us?

The point here is twofold. If we have an effective way of sorting and indexing characters, then we not only have a way of indexing characters in a database (whether that is on paper, in a dictionary or in the database of some app or program), we also have a reason to look at characters and break them down. Thus we learn to "see" the character in terms of its component parts, with the benefit that not only can we look it up easily, we also have a possible hook for accessing the character from our own memory.

So, if you understand a system (or systems) for indexing Chinese characters, that gives you a way of recognizing different elements of Chinese characters. It forces you to interact with them more fully than simply taking a picture (or even typing the character phonetically).

Aspects of effective indexing

As mentioned, current indexing methods for characters tend to be antiquated. It can be useful to know these methods, or at least know about them, particularly if you use old-fashioned dictionaries, say for calligraphy or more in-depth study of Chinese characters. Even so, I'd suggest that it would be nice to have an alternative means of indexing and looking up characters that is easier and that also gives us potential "hooks" for remembering those characters, or simply helps us look at them with some measure of discernment.

What does discernment mean?

It's simply the ability to notice differences, or if you like, relationships. Rather than looking at a whole character and thinking "that looks complicated", you zero in on a particular part so that you can take in the details, the relationships within that part and the relationship of that element to other parts of the character.

So then, what might an effective indexing system for Chinese characters involve? For a start, a shape-based system that is consistently applied. We can use a dictionary easily because we know that English words are always sorted and listed alphabetically, starting with their first letter.

Indexing Chinese characters via elements vs brush strokes

We could use a similar idea to sort and index Chinese characters, by using their first or initial element. Note I'm not talking about brush strokes here. While brush strokes are a possible sorting mechanism for Chinese characters, sorting via brush strokes requires experience and understanding of brush stroke order, something a beginner is not likely to have.

An "element" in terms of Chinese characters is one or more brush strokes that make up a clearly distinguishable component part. Note that identifying elements is not always easy. But it gives us a starting point.

The initial element of a character

The structure of Chinese characters is varied. Some characters have surrounding parts, some characters are vertically oriented, others horizontally, and yet others are a mix or are simply solid characters where it's hard to define where they begin. A simple set of rules would be helpful for determining the initial element.

The initial element is the leftmost part, the topmost part or the surrounding part.

Note that the radical index is a sort of shape-based index. However, the position of the actual radical will vary. Plus, radicals themselves often relate to the meaning of the character.

Anyway, getting back to shape-based indexing, we have an idea of where to start.

It should be pointed out here that the initial element isn't always the first element that is painted or written. This is the first element as it appears after the character has been written, not the first element that appears as it is written!

The next question is, what shapes or elements do we use and how do we actually sort them?

Turning an input method into an indexing method

Not long after I first moved to Taiwan, I began learning an input method called the Cangjie input method. I started learning this method because it was shape-based, and that was handy since I was copying characters from menus and other sources that didn't show the characters' pronunciation. With the Cangjie method I didn't need to know a character's pronunciation in order to input it; I could input it based on its shape.

It actually allowed me to learn to touch type Chinese characters. That's because this method maps 24 basic shapes to 24 letters of the alphabet. I simply had to learn these associations. I also had to learn the rules of decomposition. These are important because the Cangjie input code for any character has a maximum length of 5. The rules of decomposition are useful because they make it easy to figure out how a character's code is reduced to fit within the 5 character limit.

Because there were some rule breakers with this code, I did have some trouble learning to type some characters, and so I entered their codes into the database I was building as I learned them.

I eventually realized that the input system was itself a way of indexing Chinese characters by shape. How? Because it maps elements to 24 letters of the alphabet. And while there are actually more than 24 basic shapes, it still provides a means of sorting and even alphabetising Chinese characters.
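As a rough illustration of "alphabetising" characters, here is a minimal Python sketch. It assumes a small hand-entered table of Cangjie codes (sample data, worth checking against a Cangjie reference before relying on it). The names CANGJIE_CODES and sort_by_code are made up for this example.

```python
# Sample Cangjie codes, entered by hand for illustration only.
# Verify them against a Cangjie reference table before relying on them.
CANGJIE_CODES = {
    "好": "VND",
    "明": "AB",
    "林": "DD",
    "你": "ONF",
    "早": "AJ",
    "森": "DDD",
    "休": "OD",
}


def sort_by_code(codes):
    """Return the characters ordered alphabetically by their input code."""
    return sorted(codes, key=lambda char: codes[char])


# Print the characters in "dictionary order" of their codes.
for char in sort_by_code(CANGJIE_CODES):
    print(CANGJIE_CODES[char], char)
```

Because the codes are ordinary strings of letters, the usual alphabetical ordering applies, so a shorter code such as DD sorts ahead of DDD.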

Fine tuning Cangjie Codes for Indexing Purposes

Note that the Cangjie input method uses an abbreviated code, so it's not perfect for sorting. In addition, some characters can begin with the same code but have different initial shapes. So to make this a useful method of looking up Chinese characters, a lot of manual sorting and grouping is required. But it is the basis for two types (and possibly three) of indexing I use for Chinese characters.

The first type indexes characters by their first element. This can correspond to either the first letter or the first two letters of a character's Cangjie code.

Because the final letter of a character's input code (if the code is longer than one letter) represents the final stroke or element of the character, Cangjie input codes also allow the sorting of characters by their final element.

There is a variation of the input system where you type the first and last letters of the character's input code to get a drop-down list of candidate characters.
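The two kinds of index described above can be sketched in a few lines of Python. This is a simplified sketch, not the full method: it groups purely on the first or last code letter and ignores the manual regrouping mentioned earlier. The sample codes and the function names are assumptions made for this example.

```python
from collections import defaultdict

# Sample Cangjie codes for illustration only; verify before relying on them.
SAMPLE_CODES = {
    "日": "A",
    "明": "AB",
    "早": "AJ",
    "朋": "BB",
    "林": "DD",
    "休": "OD",
    "好": "VND",
}


def index_by_first(codes):
    """Group characters by the first letter of their code."""
    groups = defaultdict(list)
    for char, code in codes.items():
        groups[code[0]].append(char)
    return {letter: groups[letter] for letter in sorted(groups)}


def index_by_final(codes):
    """Group characters by the last letter of their code.

    Codes of a single letter are skipped, since they have no separate
    final element.
    """
    groups = defaultdict(list)
    for char, code in codes.items():
        if len(code) > 1:
            groups[code[-1]].append(char)
    return {letter: groups[letter] for letter in sorted(groups)}


first_index = index_by_first(SAMPLE_CODES)
final_index = index_by_final(SAMPLE_CODES)
print(first_index)
print(final_index)
```

In this sample, 林, 休 and 好 all land in the same final-element group because their codes all end in D.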
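A first-and-last lookup of this kind might work roughly as follows. This is only a sketch under the assumption that candidates are matched on the first and last letters of the full code; the sample codes and the find_candidates helper are hypothetical and are not taken from any particular input method's implementation.

```python
# Sample Cangjie codes for illustration only; verify before relying on them.
SAMPLE_CODES = {
    "明": "AB",
    "昌": "AA",
    "晶": "AAA",
    "林": "DD",
    "森": "DDD",
    "休": "OD",
    "好": "VND",
}


def find_candidates(codes, first, last):
    """Return characters whose code starts with `first` and ends with `last`.

    Only codes of two or more letters are considered, so that the first
    and last letters refer to different positions in the code.
    """
    return [
        char
        for char, code in codes.items()
        if len(code) > 1 and code[0] == first and code[-1] == last
    ]


print(find_candidates(SAMPLE_CODES, "D", "D"))
print(find_candidates(SAMPLE_CODES, "A", "B"))
```

The same pair of letters can match several characters, which is why such a lookup produces a list to choose from rather than a single character.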

Indexing via second element

When sorting by initial element, some groups of characters are quite large because they share the same element. With groups like this, another possibility is to sort and index by second element.
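To show how a large group might be broken down further, here is a minimal Python sketch that sub-groups one initial-letter group by the second letter of each code. It assumes the second code letter stands in for the second element, which will not always hold since the codes are abbreviated. The sample codes and the subindex_by_second name are made up for this example.

```python
from collections import defaultdict

# Sample Cangjie codes for illustration only; verify before relying on them.
SAMPLE_CODES = {
    "日": "A",
    "昌": "AA",
    "晶": "AAA",
    "明": "AB",
    "時": "AGDI",
    "早": "AJ",
    "是": "AMYO",
    "林": "DD",
}


def subindex_by_second(codes, first_letter):
    """Sub-group the characters of one initial-letter group by second letter.

    Characters whose code has only one letter are filed under "".
    """
    groups = defaultdict(list)
    for char, code in codes.items():
        if code[0] != first_letter:
            continue
        second = code[1] if len(code) > 1 else ""
        groups[second].append(char)
    return {key: groups[key] for key in sorted(groups)}


a_group = subindex_by_second(SAMPLE_CODES, "A")
for second, chars in a_group.items():
    print(repr(second), chars)
```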

Problems with Using Cangjie Codes for Indexing Chinese Characters

When indexing with the Cangjie input method, there are a total of 24 groups when indexing by the initial element. For second element and final element indexing there are 25 groups, because of one additional shape/letter that never appears at the beginning of a character but can appear in the middle or at the end.

One of the problems with this method is that there are some inconsistencies or challenges, particularly for beginners. Plus there is also the fact that there are 24 different shapes that you need to learn (plus their derivatives). I wasn't sure this would be acceptable as a lookup method for a majority of beginner learners.

An alternative shape-based indexing system

Thus I developed an alternative method, which I call the Easy Shape Lookup System, that uses only 12 basic shapes to index and sort characters.

This is in some ways simply a refinement of the 24 elements used in the Cangjie input system. And while this method does offer an easy way to look up characters via their initial, final and even second element, it doesn't offer a way of inputting Chinese characters.

And that's fine because this method makes it easy to look up Chinese characters, and once you can find a character, you can also find its input code.

As a bonus, this can be used as an initial lookup method from which you can then easily graduate to Cangjie lookup methods if you choose.

Note that I use both methods to index and sort Chinese characters.

Making indexes more useful

With English dictionaries the tendency is to list words alphabetically. We don't use a separate index; the entries themselves are organized in indexing order. If you like, the whole dictionary is an index which lists not only words but also their definitions.

With Chinese characters we can build character information into the index itself. If we put too much information in, this can make character lookup unwieldy. However, if we limit information, we can turn indexes into effective tools for not just looking up a character, but in the same step also providing information about the character.

Character Maps, indexes with character information built in!

One way that I've played with this idea is with something I call character maps. If printed, these are essentially giant posters of indexed and sorted Chinese characters. These posters can include Cangjie input codes. Or they can include pinyin (along with radical and stroke count). They can also include English definitions.

The result is a vast array of characters that are sorted so that they appear next to characters with the same first or last element (if those exist). Because definitions are reduced to the bare minimum key meaning, you can also see how the definitions of characters with the same elements or pronunciations are similar or vastly different. Those differences can give you hooks to help remember characters via their relationship to other characters.

This all comes about from taking the time to create an effective indexing method for Chinese characters, one that makes it easy even for beginners to find a particular character.
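As a very small-scale illustration of the idea, here is a Python sketch that lays out a handful of characters as a text "map", grouped by the first letter of their code with a code, pinyin and key meaning beside each. The entries are sample data and the build_character_map function is hypothetical; it is not how the actual posters are produced.

```python
# Each entry: (character, Cangjie code, pinyin, short key meaning).
# Sample data for illustration only; verify before relying on it.
ENTRIES = [
    ("好", "VND", "hǎo", "good"),
    ("林", "DD", "lín", "forest"),
    ("明", "AB", "míng", "bright"),
    ("休", "OD", "xiū", "rest"),
    ("早", "AJ", "zǎo", "early"),
    ("森", "DDD", "sēn", "dense forest"),
]


def build_character_map(entries):
    """Return text rows sorted by code, with a heading row per first letter."""
    rows = []
    current_letter = None
    for char, code, pinyin, meaning in sorted(entries, key=lambda e: e[1]):
        if code[0] != current_letter:
            # Start a new group whenever the first code letter changes.
            current_letter = code[0]
            rows.append(f"[{current_letter}]")
        rows.append(f"  {char}  {code:<5} {pinyin:<6} {meaning}")
    return rows


print("\n".join(build_character_map(ENTRIES)))
```

Keeping each row to a single short meaning is what lets neighbouring characters be compared at a glance.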

Published: 2020 10 05