Copyright (c) 1995 by Alexey Andreyev
-------------------------------------
PHONOSEMANTIC VISUALIZATION OF TEXT:
A NEW WAY OF ART AND ANALYSIS
(an abstract of one fancy idea)
Dreaming during a lecture on AI, I heard a phrase that basically said:
"the grand old man Donald Knuth couldn't find a publisher to have his book
published the way he wanted it to look, so he invented TeX". Being rather a
humble person, I started thinking, though: how would I like my book to look?
Every idea dies after its implementation. The idea of TeX died (for me) in
primary school: at 10 years old, knowing nothing about computers, I already
knew how to underline and thicken headings, frame formulas with a pen of
another color, etc. So this is no longer interesting.
What else could I do with my book, which is supposed to contain my sonnets,
haiku, palindromes and other poems but no boring formulas? Let's dream.
As Plato said, "the idea is everything, the implementation is crap". Here is
the idea. It's not exactly mine. It's based on a couple of books whose titles
and authors I cannot recall now, because my flat was robbed once and the books
are gone. But the idea is alive, and with modern means such as the SGML markup
language it could lead to very interesting programming tools, helpful in the
fine arts as well as in serious analytical problems of linguistics and text
processing.
Text and word processing is in full flight now. But the atoms of a language
are smaller than words: sounds and letters also carry pieces of meaning. They
form a sort of linguistic core around which words are built. That is exactly
what foreigners do not catch so easily, even after learning great bunches of
words and phrases of a new language. On the other hand, as a native speaker
you can sometimes tell the meaning, or just "the mood", of a word in your own
language even if you have never heard that word before. This "mood", or
"color", of sounds can considerably affect the color of the word as well as
that of the whole text.
So let's start with simple things and then move from color to other scales
of characteristics and to more sophisticated analysis, which should show
(in my dream, of course) how a nice mosaic emerges from small colored pieces.
In my book of poems I would probably first paint all the Russian vowels: "a"
red, "o" yellow, "e" green, "u" blue, etc. (this is not a random choice of
colors; unfortunately, as I said, I cannot cite the source of this research;
anyway, real poets of all ages knew this matter very well: alliteration and
assonance are used in poetry in all languages).
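The vowel painting just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the Russian vowels are written in Latin transliteration, only the four colors named above are used (the "etc." is not guessed), and the names `VOWEL_COLORS`, `paint_vowels` and `color_counts` are hypothetical. Each letter is tagged with its color, or `None` if it stays unpainted, and the colors are tallied to give a rough picture of the mosaic.

```python
from collections import Counter

# Vowel-to-color mapping from the essay, written in Latin transliteration
# (an assumption: the essay refers to Cyrillic letters). Only the four
# colors actually named are included.
VOWEL_COLORS = {"a": "red", "o": "yellow", "e": "green", "u": "blue"}


def paint_vowels(text):
    """Return (letter, color) pairs; letters without a color get None."""
    return [(ch, VOWEL_COLORS.get(ch.lower())) for ch in text]


def color_counts(text):
    """Tally how many letters of each color the text contains."""
    return Counter(color for _, color in paint_vowels(text) if color is not None)


if __name__ == "__main__":
    sample = "moroz i solntse"  # transliterated sample line
    print(" ".join(f"{ch}:{color or '-'}" for ch, color in paint_vowels(sample)))
    print(dict(color_counts(sample)))  # {'yellow': 3, 'green': 1}
```

Rendering the pairs as actually colored glyphs is left to whatever typesetting system draws the page; the tally alone already shows which colors dominate.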
In English the palette will be different, but it will exist: English is still
not so dull a language, and it still rests on a human, rather than computer,
"foundation".
Even these colored vowels are enough to look at the mosaic and see what in
this particular text is "out" of the normal, average distribution of letters
(colors). Going further, we can consider how the sounds color the word:
- the rarer the letter, the greater its influence (what do you think or feel
about a word or text with a high concentration of "z"?);
- first and stressed sounds are painted with higher intensity (this should
probably be normalized by the length of the word);
- not only single letters, but also special combinations of sounds (clusters
of phonemes) could be taken into account;
- and so forth...
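The weighting rules in the list above can be sketched as a comparison against a baseline letter distribution. Everything here is an assumption made for illustration: the baseline frequencies would have to come from some reference corpus (the ones in the demo are invented), the first letter of each word gets an extra weight `first_boost` divided by the word length as one possible normalization, and "the rarer the letter, the greater its influence" is implemented by measuring each letter's deviation relative to its baseline frequency. Stress is left out because plain spelling does not mark it; `weighted_counts` and `deviations` are hypothetical names.

```python
from collections import Counter


def weighted_counts(text, first_boost=1.0):
    """Count letters; each word's first letter gets an extra weight.

    The extra weight is first_boost / word length (an assumed way of
    normalizing the boost by the length of the word).
    """
    counts = Counter()
    for word in text.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        for i, ch in enumerate(letters):
            bonus = first_boost / len(letters) if i == 0 else 0.0
            counts[ch] += 1.0 + bonus
    return counts


def deviations(text, baseline, first_boost=1.0):
    """Relative deviation of each baseline letter from its expected frequency.

    (observed - expected) / expected: the same surplus counts for more when
    the expected frequency is small, so rarer letters get greater influence.
    Letters missing from the baseline are ignored.
    """
    counts = weighted_counts(text, first_boost)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        ch: (counts[ch] / total - expected) / expected
        for ch, expected in baseline.items()
        if expected > 0
    }


if __name__ == "__main__":
    # Invented baseline frequencies, for illustration only.
    baseline = {"a": 0.08, "o": 0.10, "z": 0.002}
    for ch, dev in sorted(deviations("zaza roza", baseline).items()):
        print(f"{ch}: {dev:+.2f}")
```

Letter pairs could be treated the same way by counting `a + b for a, b in zip(letters, letters[1:])` alongside single letters.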
I used the example of "natural colors" of letters. But it's possible to
visualize other characteristics of the text as well: here we artificially let
a certain color mark a certain characteristic of a sound (letter). Thus, we
would be able to say that a text is "funny" or "sharp" just by glancing at it!
Every letter could be rated and given values on different scales (for example,
on the scale "friendly <---0---> warlike" the Russian letter "z" would be
closer to the right end).
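Rating letters on scales can be sketched as a table of numbers per scale plus an average over the text. The scales and all numbers below are invented placeholders (only "z" leaning toward "warlike" comes from the text), and `SCALES`, `scale_score` and `profile` are hypothetical names. A text's position on a scale is taken here as the mean rating of its rated letters, which is one simple choice among many.

```python
# Hypothetical letter ratings on two scales, from -1 (left end) to +1 (right
# end). The numbers are invented for illustration; only "z" leaning toward
# "warlike" follows the essay.
SCALES = {
    "friendly-warlike": {"z": 0.8, "r": 0.6, "k": 0.5, "a": -0.2, "m": -0.5, "l": -0.6},
    "serious-funny": {"b": 0.6, "p": 0.5, "u": 0.4, "t": -0.4, "s": -0.2},
}


def scale_score(text, ratings):
    """Mean rating of the rated letters in text; 0.0 (neutral) if none are rated."""
    values = [ratings[ch] for ch in text.lower() if ch in ratings]
    return sum(values) / len(values) if values else 0.0


def profile(text, scales=SCALES):
    """Position of the text on every scale, keyed by scale name."""
    return {name: scale_score(text, ratings) for name, ratings in scales.items()}


if __name__ == "__main__":
    for word in ("zork", "lama"):
        print(word, {k: round(v, 2) for k, v in profile(word).items()})
```

Coloring each letter by its rating (say, from blue at the left end to red at the right end) would give the "glance at it" view; the averages summarize the same ratings as numbers.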
The method described here could be used even without visualization: instead
of looking at the phonosemantic layout, we can simply calculate the
frequencies. The implications are wide: for example, we can check whether a
certain name (or text) has the required characteristics. A report for an
international workshop shouldn't sound funny (unlike this essay of mine); the
name of a new kind of food shouldn't remind anyone of nausea. Moreover, it's
possible to generate a new name with given characteristics. The idea could
also be interesting for developing AI software that deals with human
languages: treating "phonosemantic scales" as an extra database, such programs
could
a) generate "more natural-sounding" pieces of text, and
b) use smaller dictionaries more effectively.
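Checking and generating names can be sketched on top of per-letter ratings like those on the "friendly <---0---> warlike" scale. This is a minimal sketch under loudly stated assumptions: the "serious <---0---> funny" ratings are invented, a name "passes" if its mean rating falls in a chosen range, and new names are produced by brute force from consonant-vowel syllables, which is just the simplest generator and not a method described in the essay. `FUNNY`, `acceptable` and `generate_names` are hypothetical names.

```python
from itertools import product

# Invented ratings on a "serious <---0---> funny" scale, from -1 to +1.
FUNNY = {"b": 0.6, "p": 0.5, "u": 0.4, "i": 0.3, "o": 0.1,
         "a": 0.0, "e": -0.1, "s": -0.2, "r": -0.3, "t": -0.4}


def scale_score(text, ratings):
    """Mean rating of the rated letters in text; 0.0 if none are rated."""
    values = [ratings[ch] for ch in text.lower() if ch in ratings]
    return sum(values) / len(values) if values else 0.0


def acceptable(name, ratings, low, high):
    """True if the name's mean rating lies within [low, high]."""
    return low <= scale_score(name, ratings) <= high


def generate_names(consonants, vowels, syllables, ratings, low, high):
    """Yield consonant-vowel names of the given syllable count whose score
    lies within [low, high] (brute-force enumeration)."""
    for parts in product(product(consonants, vowels), repeat=syllables):
        name = "".join(c + v for c, v in parts)
        if acceptable(name, ratings, low, high):
            yield name


if __name__ == "__main__":
    # A name that should not sound funny: require a score of at most 0.
    print(acceptable("tesar", FUNNY, -1.0, 0.0))
    print(list(generate_names("tsb", "eu", 2, FUNNY, -1.0, -0.1)))
```

Brute force is fine for short names; longer ones would call for a smarter search, but the filtering step stays the same.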
Still, visualization can help greatly in analysis: to calculate certain
characteristics, you first need to know what to calculate, whereas in a
phonosemantic layout you can see deviations you haven't even thought about.
Of course, every language is a rather dynamic thing: words "float" in it all
the time, and many of them drift far away from the phonosemantic core
(however, when there are two new foreign words naming the same thing, the one
that "suits" the native phonetics is generally picked up, and it is often
changed further to bring it closer to "the native roots"). Various slangs,
scientific terms, a not-so-strong correlation between sounds and letters...
there are many things that wash away this mosaic. Nevertheless, this picture
exists and can be seen. At least in my dream, where I'm Donald Knuth thinking
about a fancy layout for my new book.
------------------------------------------------------------------