A Knowledge-based Segmentation Method for Document Understanding

Junichi Higashino, Hiromichi Fujisawa, Yasuaki Nakano and Masakazu Ejiri

Proceedings of 8th ICPR, pp.745-748 (1986)

Abstract

In this paper, the design and implementation of a programming language for knowledge-based document segmentation are described. This approach makes it possible to write layout models of documents and provides a mechanism that analyzes the document image in a top-down manner. Most documents, particularly printed matter, have their own layout rules. Within each class of documents, these rules are common, creating nearly fixed forms. Therefore, by incorporating models of layout as document knowledge, regions that contain bibliographic items, such as the title, authors and other important items, can be identified and extracted automatically from document images. Presented here is the Form Definition Language (FDL), a language for representing the layout rules of a document. In the FDL, the layout is described as a set of rectangular regions in a frame structure, where the regions are repeatedly defined in terms of smaller rectangular regions. Some illustrative programming techniques and their experimental results are also presented.
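
The idea of regions defined in terms of smaller regions, analyzed top-down, can be illustrated with a minimal Python sketch. This is not FDL syntax, which the abstract does not show. The `Region` class, the `resolve` function, the fractional coordinates, and the sample page layout are all hypothetical, chosen only to show how a nested layout model might be walked from the page down to its items. A real system would presumably match regions against image features rather than use fixed fractions.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A named rectangle given as fractions (0..1) of its parent's box."""
    name: str
    left: float
    top: float
    right: float
    bottom: float
    children: list["Region"] = field(default_factory=list)


def resolve(region, box):
    """Walk the model top-down; return {name: (x0, y0, x1, y1)} in pixels."""
    px0, py0, px1, py1 = box
    width, height = px1 - px0, py1 - py0
    abs_box = (
        round(px0 + region.left * width),
        round(py0 + region.top * height),
        round(px0 + region.right * width),
        round(py0 + region.bottom * height),
    )
    found = {region.name: abs_box}
    # Each child is placed relative to the box just computed for its parent.
    for child in region.children:
        found.update(resolve(child, abs_box))
    return found


# Hypothetical layout model for the first page of a paper (an assumption).
header = Region("header", 0.0, 0.0, 1.0, 0.25, [
    Region("title", 0.1, 0.0, 0.9, 0.5),
    Region("authors", 0.1, 0.5, 0.9, 1.0),
])
page = Region("page", 0.0, 0.0, 1.0, 1.0, [
    header,
    Region("body", 0.0, 0.25, 1.0, 1.0),
])

# Resolve the model against a 1000 x 1400 pixel page image.
regions = resolve(page, (0, 0, 1000, 1400))
```

Here `regions["title"]` holds the pixel rectangle where the title is expected, obtained by first placing the header on the page and then placing the title inside the header.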



First Written Before June 16, 1998
Transplanted to KSU Before May 16, 2003
Transplanted to So-net May 3, 2005
Last Update April 10, 2007

© Yasuaki Nakano 1998-2007