Seminatural language processing: The structure of the program

The visualiser shows the current state of the world and displays an input box asking the user to say something. The input is written to a file. (Passing the data between steps as files makes it possible to check for errors at each step, and to intervene easily at any point in the process when developing the system.)

The translator reads the file and replaces the lexical words with the interlinear words. If the input is in the same language the visualiser uses (which happens to be English), this is a very simple process: translating the form words and marking the content words. By switching to a different translator, one can easily use a different language. The syntax analysis will work exactly the same, but for the final interpretation to work, the relevant content words also have to be translated to the target language. This only applies to predefined words; it is possible to define new words within the system. Depending on the application, the number of predefined words could be very small. It is easy to imagine an application which is self-lexicalising with regard to the content words: the computer has a preexisting notion of certain properties (objects, actions, etc.) but asks for the word to describe them, completely eliminating the need for predefined content words.
The results of the translation are written to a new file.
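
As a minimal sketch of the translation step for English input, something like the following would do. The form-word table, the interlinear tags and the "C:" marker for content words are all assumptions made for illustration; the actual vocabulary is not listed here.

```python
# Hypothetical form-word table; the real interlinear vocabulary may look different.
FORM_WORDS = {"the": "DEF", "a": "INDEF", "on": "LOC.on", "and": "CONJ"}


def translate(line, form_words=FORM_WORDS):
    """Replace form words with interlinear tags and mark the rest as content words."""
    result = []
    for word in line.lower().split():
        if word in form_words:
            result.append(form_words[word])
        else:
            result.append("C:" + word)  # assumed marker for content words
    return result


def translate_file(src_path, dst_path):
    """Read the visualiser's file and write the translated words to a new file."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(" ".join(translate(line)) + "\n")
```

Supporting another input language would then mostly be a matter of passing in a different table.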

The parser reads that file, and attempts to identify the head and the function of each word. For natural language this is a very complicated process, and can only be done to a limited degree of precision. But here, all other steps of the process (including those in the user's brain) have adapted in order to make this one step trivial.
I have used two different approaches in constructing the parser. The first acts as a pushdown state machine, much like a regular compiler. Each state corresponds to a unit in the analysis. For each state there is a subroutine, which reads the next word. It determines whether to finish this current state and go up to the calling function, or call another function corresponding to the unit for which the word in question is the head.
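
The first approach could be sketched roughly as below, with Python's call stack playing the role of the pushdown stack. The part-of-speech tags and the OPENS table (which units a given head may open) are hypothetical, and the sketch assumes that the first word is the root and that heads always precede their dependants.

```python
# Hypothetical table: for each head tag, the tags of the units it may open.
OPENS = {
    "VERB": {"NOUN", "PREP"},
    "PREP": {"NOUN"},
    "NOUN": {"ADJ", "DET"},
}


def parse(tagged):
    """Return a 1-based head index for each (form, tag) pair; 0 marks the root."""
    if not tagged:
        return []
    heads = [0] * len(tagged)
    pos = 1  # index of the next unread word; word 0 is assumed to be the root

    def unit(head_index):
        # One "state": keep reading words for as long as this head may open them.
        nonlocal pos
        head_tag = tagged[head_index][1]
        while pos < len(tagged):
            if tagged[pos][1] not in OPENS.get(head_tag, set()):
                return  # finish this state and go back up to the caller
            current = pos
            heads[current] = head_index + 1
            pos += 1
            unit(current)  # enter the state for the unit this word heads

    unit(0)
    if pos != len(tagged):
        raise ValueError("no head found for word %d" % (pos + 1))
    return heads
```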
The other method is to first note the part of speech and form of the words, and then go through them backwards. For each word, the parser simply steps backward until it finds a word which can act as the head of the current word. This method is simpler, but puts stricter limits on the structures used in the language.
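
A minimal sketch of this backward search, assuming the words have already been tagged. The tags and the CAN_HEAD table are hypothetical stand-ins for whatever parts of speech and forms the language actually uses.

```python
# Hypothetical table: for each tag, the tags of the words it can be head of.
CAN_HEAD = {
    "VERB": {"NOUN", "PREP"},
    "PREP": {"NOUN"},
    "NOUN": {"ADJ", "DET"},
}


def find_heads(tagged):
    """Return a 1-based head index for each (form, tag) pair; 0 means no head (root)."""
    heads = [0] * len(tagged)
    # Go through the words backwards.
    for i in range(len(tagged) - 1, -1, -1):
        tag = tagged[i][1]
        # Step backward until a word is found that can act as head of word i.
        for j in range(i - 1, -1, -1):
            if tag in CAN_HEAD.get(tagged[j][1], set()):
                heads[i] = j + 1
                break
    return heads
```

Because the nearest possible head to the left always wins, a dependant can never attach across an intervening candidate, which is one way of seeing the stricter limits on structure.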
The parser writes another file, in a table format similar to the CoNLL standard.
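
A table of this kind could be written and read back as sketched below: one word per line, tab-separated. The choice of columns is an assumption; real CoNLL formats have more columns, and the exact layout used here is not specified.

```python
# Assumed columns, loosely modelled on CoNLL-style dependency tables.
COLUMNS = ("id", "form", "pos", "head", "function")


def format_table(rows):
    """Turn a list of row dicts into tab-separated text, one word per line."""
    return "".join(
        "\t".join(str(row[column]) for column in COLUMNS) + "\n" for row in rows
    )


def parse_table(text):
    """Read the text back into row dicts, with id and head as integers."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = dict(zip(COLUMNS, line.split("\t")))
        row["id"] = int(row["id"])
        row["head"] = int(row["head"])
        rows.append(row)
    return rows
```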

The treemaker turns the list into an object-oriented semantic tree. Each word is represented by a word object, with links to its head and dependants.
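
The tree-building step might look like the sketch below. The Word class, its attribute names and the row layout (index, form, head, function) are assumptions for illustration, with head 0 marking the root.

```python
class Word:
    """One word in the tree, linked to its head and its dependants."""

    def __init__(self, index, form, function):
        self.index = index
        self.form = form
        self.function = function
        self.head = None      # stays None for the root
        self.dependants = []


def build_tree(rows):
    """Build the tree from (index, form, head, function) rows and return the root."""
    words = {index: Word(index, form, function) for index, form, _, function in rows}
    root = None
    for index, _, head, _ in rows:
        word = words[index]
        if head == 0:
            root = word
        else:
            word.head = words[head]
            words[head].dependants.append(word)
    return root
```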

The semantic interpreter goes through the sentence tree, and performs actions described by the predicates. It compares the properties in the tree to the properties of the objects in a collection.
Finally it calls the various methods of the visual representations of the objects. The visualiser automatically updates to view the changes, and then asks for new input.
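
A very reduced sketch of the interpretation step: the predicate at the root selects an action, and each argument subtree is turned into a set of properties that is matched against the objects in the collection. The Node and Thing classes, the "hide" and "show" predicates and the matching rule are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified tree node: a word form plus its dependants."""
    form: str
    dependants: list = field(default_factory=list)


@dataclass
class Thing:
    """An object in the world; hide() and show() stand in for its visual methods."""
    properties: set
    visible: bool = True

    def hide(self):
        self.visible = False

    def show(self):
        self.visible = True


# Assumed predicates, mapped to the methods they trigger.
ACTIONS = {"hide": Thing.hide, "show": Thing.show}


def properties_of(node):
    """Collect the properties named by a node and everything below it."""
    props = {node.form}
    for dependant in node.dependants:
        props |= properties_of(dependant)
    return props


def interpret(root, world):
    """Apply the root predicate to every object matching one of its arguments."""
    action = ACTIONS[root.form]
    for argument in root.dependants:
        wanted = properties_of(argument)
        for thing in world:
            if wanted <= thing.properties:  # all named properties must match
                action(thing)
```

With a world containing a red box and a blue box, the tree for "hide box red" would hide only the first.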

2010-10-09