A Knowledge-based Segmentation Method for Document Understanding

Junichi Higashino, Hiromichi Fujisawa, Yasuaki Nakano and Masakazu Ejiri
Proceedings of 8th ICPR, pp.745-748 (1986)

Abstract

In this paper, the design and implementation of a programming language for knowledge-based document segmentation are described. This approach makes it possible to write layout models of documents and provides a mechanism that analyzes the document image in a top-down manner. Most documents, particularly printed matter, have their own layout rules. Within each class of documents, these rules are common, creating nearly fixed forms. Therefore, by incorporating models of layout as document knowledge, regions that contain bibliographic items, such as the title, authors and other important items, can be identified and extracted from document images automatically. Presented here is the Form Definition Language (FDL), a language for representing the layout rules of a document. In FDL, the layout is described as a set of rectangular regions in a frame structure, where the regions are repeatedly defined in terms of smaller rectangular regions. Some illustrative programming techniques and their experimental results are also presented.
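The idea of a layout described as rectangular regions that are in turn defined by smaller rectangular regions can be illustrated with a minimal sketch in Python. This is not FDL syntax, which the abstract does not show. The class name Region, its fields, the helper functions, the coordinates, and the region names "page", "header" and "body" are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Region:
    """Hypothetical frame: a named rectangle plus its nested sub-regions."""
    name: str
    x: int
    y: int
    width: int
    height: int
    children: List["Region"] = field(default_factory=list)

    def contains(self, other: "Region") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (self.x <= other.x
                and self.y <= other.y
                and other.x + other.width <= self.x + self.width
                and other.y + other.height <= self.y + self.height)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Region"]]:
        """Visit regions top-down: a parent is yielded before its children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

    def find(self, name: str) -> Optional["Region"]:
        """Return the first region with the given name, or None."""
        for _, region in self.walk():
            if region.name == name:
                return region
        return None


def is_well_nested(region: Region) -> bool:
    """Check that every sub-region fits inside its parent, at all levels."""
    return all(region.contains(c) and is_well_nested(c)
               for c in region.children)


# Illustrative layout model for one document class (made-up coordinates).
page = Region("page", 0, 0, 1000, 1400, [
    Region("header", 50, 50, 900, 300, [
        Region("title", 50, 50, 900, 120),
        Region("authors", 50, 190, 900, 80),
    ]),
    Region("body", 50, 400, 900, 950),
])

if __name__ == "__main__":
    for depth, region in page.walk():
        print("  " * depth + region.name)
```

In this sketch a bibliographic item such as the title is looked up by name with page.find("title"). A real system would additionally have to match each region against the scanned image, which this sketch does not attempt.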


mail address: ← Sorry for the trouble, but please type it in by hand

First Written Before June 17, 1998
Transplanted to KSU Before June 19, 2003
Transplanted to So-net April 22, 2007
Last Update April 22, 2007

© Yasuaki Nakano 1998-2007