Seminatural language processing: Short description

Background

There are two fundamentally different ways of analysing human language. In the early days of natural language parsing, the usual approach was to use a formal grammar: the language was treated as a formal language and analysed in largely the same way as a programming language. Nowadays statistical methods are more common. One starts with a corpus, a large body of text which has been analysed by hand, and uses machine learning techniques so that a program learns the patterns in it and can then interpret new text.

One application of language understanding is giving instructions to robots. Depending on the situation, either natural or formal language may be more appropriate. Sometimes a formal language is necessary to avoid errors, but where it can be used, natural language has some advantages. One is that the user already knows the language. Another is that, as robots develop and find use in everyday applications, it may be good to have a language which shares many of the properties of a natural language. Even if it differs from ordinary human language, a naturalistic language is likely to be easier to learn than a formal one.

Problem

Using a formal grammar to analyse natural language turned out to be difficult. Natural languages do not follow the simple grammars found in grammar books; their rules are much more complex. They also contain ambiguities which can only be resolved with extensive knowledge of the real world and of the actual meaning of what is being said. For these reasons, statistical methods turned out to be more effective. But it is in the nature of statistical methods that they are not always right, and for many sensitive applications the error rate is unacceptable.

Idea

To get around the problem, one could use a language which has a formal grammar but is otherwise similar to a natural language. Such a seminatural language would be more intuitive, and therefore easier for humans to learn, than a traditional formal language, and it would offer a variety of forms of expression. It would also be easier for machines to interpret than a purely natural language, and in some situations its regularity might make it easier for humans to understand as well.

My idea is thus to investigate which properties such a language could have, and to build an application: a program which reads text in a seminatural language and responds by manipulating objects in a 3D world representing the environment a robot might interact with.
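To make the idea concrete, here is a minimal sketch of such a program in Python. Everything in it is an assumption made for illustration: the two-rule grammar, the vocabulary, the `Thing` class and the function names are hypothetical and not part of any existing system. The "3D world" is reduced to a list of objects with coordinates.

```python
from dataclasses import dataclass

# Hypothetical vocabulary, chosen only for this illustration.
COLORS = {"red", "blue", "green"}
SHAPES = {"cube", "ball"}


@dataclass
class Thing:
    color: str
    shape: str
    position: tuple  # (x, y, z) in the simulated world


def parse_noun_phrase(words, i):
    """noun_phrase := 'the' COLOR SHAPE; returns ((color, shape), next index)."""
    if len(words) < i + 3 or words[i] != "the":
        raise SyntaxError("expected 'the COLOR SHAPE'")
    color, shape = words[i + 1], words[i + 2]
    if color not in COLORS or shape not in SHAPES:
        raise SyntaxError(f"unknown object: {color} {shape}")
    return (color, shape), i + 3


def parse_command(text):
    """command := 'put' noun_phrase 'on' noun_phrase"""
    words = text.lower().split()
    if not words or words[0] != "put":
        raise SyntaxError("expected 'put'")
    moved, i = parse_noun_phrase(words, 1)
    if i >= len(words) or words[i] != "on":
        raise SyntaxError("expected 'on'")
    target, i = parse_noun_phrase(words, i + 1)
    if i != len(words):
        raise SyntaxError("unexpected trailing words")
    return moved, target


def find(world, description):
    """Return the first object matching a (color, shape) description."""
    for thing in world:
        if (thing.color, thing.shape) == description:
            return thing
    raise LookupError(f"no such object: {description}")


def execute(world, text):
    """Parse a command and place the moved object one unit above the target."""
    moved, target = parse_command(text)
    a, b = find(world, moved), find(world, target)
    x, y, z = b.position
    a.position = (x, y, z + 1)
    return a
```

With a world containing a red cube and a blue ball, `execute(world, "put the red cube on the blue ball")` moves the cube to the position directly above the ball, while a sentence outside the grammar is rejected with a `SyntaxError` rather than guessed at. That rejection is the point of the formal grammar: the program either understands a sentence exactly or says that it does not.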

Specifically for communicating with robots, it could be useful to look at aspects of natural and formal languages other than the obvious difference that formal languages have a formal grammar. For example, a natural language allows information to be both omitted and repeated, which can be useful in many situations. Even if a language is syntactically unambiguous, there may then be semantic ambiguities, depending on the situation. Those ambiguities can be resolved using feedback. A graphical simulation gives the user visual feedback: the user sees the effect of what they have said, and can confirm or reject the result. Just as in a natural language, this can be combined with verbal feedback.
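The feedback loop described here can be sketched as follows. This is only an illustration under stated assumptions: objects are plain dictionaries, the function name `resolve` is hypothetical, and the user's confirmation is modelled as a callback `confirm` which a real application would replace with a question to the user or a highlighted object in the simulation. The sketch also assumes that confirmation is only requested when more than one object matches.

```python
def resolve(description, world, confirm):
    """Pick the object a partial description refers to.

    description: dict of properties the user mentioned, e.g. {"shape": "cube"}
    world:       list of dicts describing the objects in the simulation
    confirm:     callback standing in for user feedback; it receives a
                 candidate object and returns True to accept it
    """
    candidates = [
        obj for obj in world
        if all(obj.get(key) == value for key, value in description.items())
    ]
    if len(candidates) == 1:
        return candidates[0]  # unambiguous in this world: no feedback needed
    for candidate in candidates:
        if confirm(candidate):  # propose each candidate until one is accepted
            return candidate
    return None  # nothing matched, or every proposal was rejected
```

The sentence "the cube" is syntactically unambiguous, but in a world with two cubes it is semantically ambiguous. Here the program proposes each cube in turn and the user's answer settles the matter, so the user may omit the colour whenever the situation allows it.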

On the other hand, there are properties of formal languages such as programming languages which could be useful in a communication language, even more so if the communication is with machines. One example is the ability to express exact definitions of new words. Natural languages, including many of the constructed languages made for human communication, have a great number of concepts which are not strictly defined within the language; a large part of the lexicon cannot be explained, only translated. One can also try to simplify grammatical rules to decrease the risk of misunderstanding, and add things like clearer logical expressions and more recursive structures. By adding such syntactic structures we pave the way for languages which can be analysed partly or wholly deterministically, not only syntactically but also semantically.
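As an illustration of exact definitions, the sketch below shows a lexicon in which every new word must be defined in terms of words that are already known, so that any sentence can be expanded deterministically into primitive words. The class `Lexicon`, its methods and the example vocabulary are hypothetical, chosen only for this example.

```python
class Lexicon:
    """A vocabulary in which every non-primitive word has an exact definition."""

    def __init__(self, primitives):
        self.primitives = set(primitives)
        self.definitions = {}

    def knows(self, word):
        return word in self.primitives or word in self.definitions

    def define(self, word, phrase):
        """Define a new word as a phrase built only from known words."""
        if self.knows(word):
            raise ValueError(f"{word!r} is already defined")
        body = phrase.split()
        unknown = [w for w in body if not self.knows(w)]
        if unknown:
            raise ValueError(f"undefined words in definition: {unknown}")
        self.definitions[word] = body

    def expand(self, text):
        """Rewrite a sentence as a list of primitive words only."""
        result = []
        for word in text.split():
            if word in self.definitions:
                # Definitions may use earlier definitions, so expand recursively.
                result.extend(self.expand(" ".join(self.definitions[word])))
            elif word in self.primitives:
                result.append(word)
            else:
                raise ValueError(f"unknown word: {word!r}")
        return result
```

For example, after `define("brick", "red cube")` and `define("stack", "put the brick on")`, the sentence "stack the blue ball" expands to "put the red cube on the blue ball". Because a definition may only use words that are already known, definitions cannot be circular and expansion always terminates. This is the property which most natural-language lexicons lack.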

2010-10-09
