There's an interesting paper titled The Latent Structure of Dictionaries floating around the Internet. Written by a Canadian-led team, it forces clearer thinking about words.
Dictionaries rest on a well-known paradox. They use words to define words. So I might look up the word justice and read "the quality of being just; fairness." Ok. So I look up fairness and find "free from favoritism, self-interest, or preference in judgment." Oh, boy. I could look up all those words too, but a black hole emerges before me. The task stretches out to infinity.
Thanks to the computer, however, the endless task can be accomplished. There are, after all, a finite number of words in a dictionary. Let D be the set of words the dictionary defines. Not all of these words are used in defining other words. For example, the dictionary defines the word cockroach but does not (I'm guessing, here) use the word anywhere in its vast text of definitions. Call this set of unused words C and remove all of them from D. That process leaves us with a shorter list, call it D1. (D1 = D – C)
Now we repeat the process, looking for words in D1 that are not used in defining any member of the list. That process gives us a new list, D2. We keep going through this process until we at last come to some Dn that has no C, that is to say, every word in Dn is used in Dn's definitions. We have created a list of perfect circularity, and to escape that circle we have to point outside the dictionary to the world of objects, actions, and sensations. The words in Dn are where a language is grounded in something beyond itself.
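The pruning process described above is simple enough to sketch in a few lines. Here is a minimal illustration in Python, using an invented toy dictionary (the words and definitions are made up for the example; the paper works on full dictionaries):

```python
# Each defined word maps to the set of words used in its definition.
toy_dictionary = {
    "just": {"fair"},
    "fair": {"just"},
    "quality": {"fair"},
    "justice": {"just", "quality"},
    "cockroach": {"just", "fair"},   # defined, but never used to define
}

def find_core(dictionary):
    """Repeatedly remove words that appear in no remaining definition
    (each round's C) until every word left is used in some remaining
    definition -- the self-defining list Dn."""
    d = dict(dictionary)
    while True:
        used = set().union(*d.values()) if d else set()
        unused = [w for w in d if w not in used]   # this round's C
        if not unused:
            return set(d)                          # Dn: the core
        for w in unused:
            del d[w]

print(sorted(find_core(toy_dictionary)))   # ['fair', 'just']
```

In this toy case the loop first strips justice and cockroach (used in no definition), then quality (used only by the now-removed justice), leaving just and fair defining each other in perfect circularity.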
The authors call this list of grounded words the dictionary's core. Not surprisingly, core words are among the first learned, most frequently used, and oldest words in a language. Regrettably, the authors do not provide any lists of their findings. Their paper focuses more on how to graph the findings than on the contents themselves. But that shortcoming is easily curable, and the math matters in its own right, for it provides a way of relating other words to the core.
The words in a dictionary should be definable by the "grounded words or, recursively, using further words that can themselves be defined by those grounded words." This observation gives us a whole new way of categorizing a word. How many recursions does it take to get from grounded words to a particular word?
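One way to make the recursion count concrete is to assign core words depth 0 and give every other word a depth one greater than the deepest word in its definition. That convention is my own invention for illustration, not taken from the paper, and the toy dictionary below is made up:

```python
# Toy data: each word maps to the set of words in its definition.
toy_dictionary = {
    "just": {"fair"},
    "fair": {"just"},
    "quality": {"fair"},
    "justice": {"just", "quality"},
    "cockroach": {"just", "fair"},
}
core = {"just", "fair"}   # the grounded words

def recursion_depths(dictionary, core):
    """Return, for each word, how many recursions separate it from the
    core: core words are depth 0; a word's depth is one more than the
    deepest word in its definition. Words whose definitions never
    resolve back to the core get no entry."""
    depth = {w: 0 for w in core}
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            if word in depth:
                continue
            if all(w in depth for w in definition):
                depth[word] = 1 + max(depth[w] for w in definition)
                changed = True
    return depth

depths = recursion_depths(toy_dictionary, core)
print(depths)
```

Here quality sits one recursion out, cockroach one, and justice two, since its definition leans on quality, which itself leans on the core.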
This process suggests what to look for in the growth of languages: starting with core words, then expanding to the first recursion, the second, and so on. Without an appendix listing the words, though, I can't quite grasp the reality of it.