Segmentation Methods for Character Recognition

Hiromichi Fujisawa, Yasuaki Nakano and Kiyomichi Kurino
Proceedings of the IEEE, Vol. 80, No. 7, pp. 1079-1092 (1992)

Abstract

This paper discusses character segmentation methods, a key technology for character recognition that determines the usability and applicability of optical character readers. A pattern-oriented segmentation method that leads to document structure analysis is presented. A first example of advanced character segmentation is the segmentation of touching handwritten numerals. Connected pattern components are extracted instead of a pixel image, and spatial interrelations between components are measured to group them into meaningful character patterns. Stroke shapes are analyzed in the case of touching characters. A method of finding the touching positions can separate about 95% of connected numerals correctly. Ambiguities are handled by multiple hypotheses and verification by recognition. An extended form of pattern-oriented segmentation is also discussed by presenting another example, tabular form recognition. Document images of tabular forms are analyzed, and frames in the tabular structure can be extracted. By identifying semantic relationships between label frames and data frames, information on the form can be properly recognized. Advanced character segmentation with a document structure analysis capability is becoming increasingly significant in automating information extraction from various kinds of documents.
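
The idea of working on connected components rather than raw pixels, and grouping them by their spatial relations, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the '#' foreground convention, the 8-connectivity, and the rule "merge components whose horizontal extents overlap" are all assumptions chosen for illustration. It only reunites the pieces of a broken character; it does not separate touching characters, which the paper handles by stroke shape analysis.

```python
from collections import deque

def connected_components(image):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected '#' regions."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for y in range(rows):
        for x in range(cols):
            if image[y][x] != "#" or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < cols and 0 <= ny < rows
                                and image[ny][nx] == "#" and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def group_by_horizontal_overlap(boxes):
    """Merge boxes whose x-ranges overlap into candidate character boxes.

    Assumed grouping rule (illustrative only): components stacked above one
    another are taken to belong to the same character.
    """
    groups = []
    for box in sorted(boxes):  # sorted by left edge x0
        if groups and box[0] <= groups[-1][2]:
            g = groups[-1]
            groups[-1] = (g[0], min(g[1], box[1]),
                          max(g[2], box[2]), max(g[3], box[3]))
        else:
            groups.append(box)
    return groups

# A "5" whose top stroke is detached, followed by a "1".
IMAGE = [
    "###..#.",
    ".....#.",
    "###..#.",
    "..#..#.",
    "###..#.",
]

components = connected_components(IMAGE)
characters = group_by_horizontal_overlap(components)
```

Here the image yields three components, and the grouping step merges the detached top stroke with the body below it, leaving two candidate character boxes. A real system would measure richer spatial relations than horizontal overlap and would verify ambiguous groupings by recognition, as the abstract describes.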

First Written Before June 17, 1998
Transplanted to KSU Before June 19, 2003
Transplanted to So-net April 22, 2007
Last Update April 22, 2007

© Yasuaki Nakano 1998-2007