DOCLAB
Rensselaer Polytechnic Institute
George Nagy
tel: (518) 276-6078
fax: (518) 276-6261
email:
Department of Electrical, Computer, and Systems Engineering
Department of Computer Science
OVERVIEW
DocLab offers facilities for
document image analysis (DIA) and
optical character recognition (OCR)
to graduate and undergraduate students in computer engineering and computer
science. We conduct research on the extraction of information from both
hard-copy and computer-generated images. Our emphasis has been on printed text,
but we are also interested in hand print and script, tables, business forms,
engineering drawings, utility maps, ballots, and 2-D and 3-D images. We focus
on the development and experimental verification of algorithms, but we have
occasionally delivered commercial-grade software modules.
DocLab was established in
1985. We are located in the Department of Electrical,
Computer, and Systems Engineering (Head: Kim Boyer) in the
We are interested in all
aspects of Document Image Analysis,
Optical Character Recognition, and Graphics Conversion:
style and language context
optical scanner and facsimile characterization
document and character defect modeling
format, layout, and functional component analysis
character segmentation and classification
consolidation of information from semiformatted web data
ontologies from tables and table ontologies
error recovery and error classification
efficient interaction for document and picture recognition
Applications include:
automated distribution and filing systems
op-scan election technologies and camera-based ballot interpretation
transaction processing, postal address reading
digital libraries, information retrieval
personal record keeping
document markup for the web
automated construction of web ontologies
conversion of medical images to symbolic form
All of our projects must
meet the academic requirements of graduate research and scholarly publication,
and must lead commercial initiatives by several years. The common thread among
our projects is recognition systems that
improve with use.
George Nagy, Professor, Computer
Engineering
tel: (518) 276-6078; fax:
(518) 276-6162; email:
Nagy has published over a hundred research and survey
papers on OCR and DIA. Some of the ideas that he and his colleagues have
originated have been widely adopted: prototype-based compression for printed
text, self-corrective (adaptive) classification, probabilistic decision trees,
and X-Y trees for document format representation. He is also known for his
early work with R. Casey at the
Mukkai Krishnamoorthy, Associate
Professor, Computer Science
PhD Indian Institute of
Technology,
tel: (518) 276-6993; fax:
(518) 276-4033; email: moorthy@cs.rpi.edu
Krishnamoorthy's research interests are software
engineering, graph algorithms, compilers, and combinatorics. In document image
analysis, he has contributed significantly to syntactic segmentation and the
analysis of printed tables.
MS Candidates:
Raghav Padmanabhan | Query by Table |
Ramana Chakradhar | |
Anne Miller | Usability test for op-scan election ballots |
Former DocLab students (in image analysis and OCR):
Piyushee Jha | MS May 2008 | Table analysis for generating ontologies |
B.S. Yanamadala | MS Dec 2007 | Hypothesis tests for classifier evaluation |
Xiaoli Zhang | PhD May 2007 | Style quantification |
Srinivas Andra | PhD Dec 2006 | Style-constrained character recognition with SVMs |
Ashutosh Joshi | PhD May 2006 | Symbolic Indirect Correlation |
Abhishek Gattani | MS August 2005 | Mobile interactive visual pattern recognition |
Jie Zou | PhD May 2004 | Interactive visual pattern recognition |
Tong Zhang | PhD May 2004 | Volumetric change detection in concrete under compression |
Adnan El-Nasan | PhD December 2003 | On-line handwritten word recognition |
Grant Deffenbaugh | PhD December 2002 | Vision-based robot gripper |
Asad Abu-Tarif | PhD December 2002 | Volumetric brain image registration |
Harsha Veeramachaneni | PhD December 2002 | Style-conscious classification |
Prateek Sarkar | PhD May 2000 | Style consistency in pattern fields |
Elisa Barney Smith | PhD December 1998 | Characterization of scanning defects |
Sutha Sivasubramaniam | MS December 1998 | Icon-label association in digitized maps |
Yihong Xu | PhD August 1998 | Prototype extraction and adaptive OCR |
Asad Abu Tarif | MS May 1998 | Table processing and understanding |
Kerim Kalafala | MS December 1997 | Identification of street lines and intersections in a scanned urban topographic map |
Andrew Shapira | PhD December 1997 | Cycle parity generators and a general random number library |
 | MS 1990 | Terrain visibility |
Shuvayu Kanjilal | MS May 1997 | Directory assistance query generation |
Ed Green | PhD May 1996 | Table image analysis |
Tasso Anagnostopoulos | MS May 1996 | Preclassifier for OCR |
Edison de Jesus | | Schematic diagram conversion |
Xiaoyin Wang | MS December 1995 | Reliable n-tuple features for OCR |
Dz Mou Jung | PhD May 1995 | Joint feature and classifier design for OCR based on a small training set |
 | MS 1990 | Comparison of algorithms for terrain visibility |
Trevor Salla | MS May 1995 | The validation and evaluation of synthetic image sets in OCR |
Prateek Sarkar | MS December 1994 | Random phase spatial sampling effects in digitized patterns |
James Waclawik | MS December 1991 | Parallel extraction of hierarchic projection profiles for very large binary images |
Junichi Kanai | PhD December 1990 | Knowledge-based document image analysis system |
Mahesh Viswanathan | PhD December 1990 | A syntactic approach to document representation and labeling |
Mathews Thomas | MS May 1988 | Knowledge representation schemes for a document analysis system |
Marina Maculotti | Laureate (U. Genoa) 1988 | Strutture dati ed algoritmi per il riconoscimento di testi con formule matematiche |
CERVITOR
We develop computational methods that help doctors
organize, represent, and query information in large image databases. The
repertoire of images under consideration (all images that reside in a
cervigram archive of 60,000 images from NCI/NLM) forms a narrow image domain
with limited and predictable variability. In such a domain, explicit
representation of domain knowledge narrows the semantic gap between the raw
sensory recordings of a scene (i.e., raw image data) and the objects and processes
implied by the images (i.e., semantic interpretation). We explore, in the domain
of cervical images, an information hierarchy that proceeds from raw image data
to low-level image features, to recognition of objects, and finally to
knowledge. (NSF-supported joint project with Profs. D. Lopresti, X. Huang, G.
Tang at Lehigh U.)
PERFECT
PERFECT
stands for "Paper and Electronic Records for Elections:
Cultivating Trust." We study the reliable processing of paper ballots and
other hard-copy election records. Participating institutions include Lehigh
University, Boise State University, Muhlenberg College, and Rensselaer Polytechnic
Institute. Of current interest in DocLab is the development and evaluation of a
camera-based portable ballot counter. (NSF-supported joint project with Profs. C. Borick, Z. Munson, D. Lopresti, E.
Barney Smith.)
TANGO (Table ANalysis for
Generating Ontologies).
TANGO
aims to develop new conceptual-model-based extraction and table recognition
methods that: (i) understand a table's structure and its conceptual
content; (ii) discover the constraints that hold between concepts extracted
from the table; (iii) match the recognized concepts with ones recognized in
other tables; (iv) merge the resulting structures to create a domain ontology;
and (v) adjust the created domain ontology so that it is a clean, complete,
accurate, and redundancy-free conceptualization of the source tables.
(NSF-supported joint project with Profs. D. Embley, D. Lonsdale et al. at BYU.)
Style context and adaptation
in OCR and ICR
We achieve near single-font and single-writer classification
performance in a multifont or multi-writer environment by taking advantage of
style consistency among characters in the same field, word, line, or document.
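The effect of style consistency can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not DocLab's classifier: the two styles, the two classes, the one-dimensional Gaussian features, and all names and numbers below are hypothetical. It compares labeling each pattern alone (style unknown, so marginalized per pattern) with labeling a whole field jointly under the assumption that every pattern in the field shares one style.

```python
import itertools
import math

# Hypothetical toy model: mean of a 1-D feature for each (style, class) pair.
MEANS = {
    "style_a": {"0": 0.0, "1": 2.0},
    "style_b": {"0": 1.5, "1": 3.5},
}
STYLE_PRIOR = {"style_a": 0.5, "style_b": 0.5}
CLASSES = ["0", "1"]
SIGMA = 0.4  # assumed common standard deviation


def gaussian(x, mean, sigma=SIGMA):
    """Gaussian density of x around mean."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def classify_singlet(x):
    """Label one pattern in isolation, summing over the unknown style."""
    return max(
        CLASSES,
        key=lambda c: sum(STYLE_PRIOR[s] * gaussian(x, MEANS[s][c]) for s in MEANS),
    )


def classify_field(xs):
    """Label all patterns jointly, assuming they share a single style."""
    best_labels, best_score = None, -1.0
    for labels in itertools.product(CLASSES, repeat=len(xs)):
        # Likelihood of the field: mixture over styles of per-pattern products.
        score = sum(
            STYLE_PRIOR[s]
            * math.prod(gaussian(x, MEANS[s][c]) for x, c in zip(xs, labels))
            for s in MEANS
        )
        if score > best_score:
            best_labels, best_score = labels, score
    return "".join(best_labels)
```

With these toy numbers, a pattern at 1.8 looks more like a "1" when classified alone, but next to a clear style_b "1" at 3.4 the joint decision for the field `[1.8, 3.4]` labels it "0". The exhaustive search over label sequences is exponential in field length and is used here only to keep the sketch short.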
Electronic Ink
We apply string matching algorithms to on-line
handwriting analysis. Instead of training a recognizer, we compare bigram or
longer segments of features of an unknown word to segments of a reference
string of words at precomputed positions. Symbolic indirect correlation
is our approach to recognizing sequences of features whose relative order, but
not relative position, is preserved.
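The segment-matching step described above can be sketched as follows. This illustrates only the comparison of bigram (or longer) feature segments against a reference string through a precomputed position index; it is not the Symbolic Indirect Correlation algorithm itself, and the function name and the use of single characters as feature symbols are assumptions made for the example.

```python
from collections import defaultdict


def match_segments(unknown, reference, n=2):
    """Map each start position in `unknown` to the start positions in
    `reference` where the same length-n segment of features occurs."""
    # Precompute where every length-n segment occurs in the reference.
    index = defaultdict(list)
    for j in range(len(reference) - n + 1):
        index[tuple(reference[j:j + n])].append(j)

    # Look up each segment of the unknown word in the precomputed index.
    matches = {}
    for i in range(len(unknown) - n + 1):
        hits = index.get(tuple(unknown[i:i + n]))
        if hits:
            matches[i] = hits
    return matches
```

For instance, `match_segments("cab", "abcabd")` reports that the bigram starting at position 0 of the unknown occurs at reference position 2, and the bigram starting at position 1 occurs at reference positions 0 and 3. Only the order of matched segments is recorded, not their spacing.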
Camera Assisted Visual Interactive
Recognition (CAVIAR)
We
have developed algorithms and software for a camera-based recognition system.
CAVIAR draws on sequential pattern recognition, image database, expert systems,
pen computing, and digital camera technology. It recognizes wild flowers more
accurately than machine vision and faster than most laypersons. The novelty of
the approach is that human perceptual ability is exploited through interaction with
the image of the unknown object. The computer remembers the characteristics of
all previously seen classes, and displays the top-ranking candidates based on
already detected features.
The
interaction is based on the few primitive actions that can be executed easily
with a stylus and a small, touch-sensitive display. The richness of the
interaction results from its interpretation. When automated segmentation fails,
the operator need only point to an incorrectly segmented part of the picture.
Standard color, shape and texture features are instantly re-computed and the
new top candidates are displayed for operator acceptance or further search.
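The re-ranking step in this loop can be sketched in a few lines. The source says only that top candidates are ranked from the detected features; the nearest-prototype rule, the two-element feature vectors, and the class names below are assumptions chosen for illustration.

```python
import math


def top_candidates(features, prototypes, k=3):
    """Rank stored classes by Euclidean distance between the current
    feature vector and each class prototype; return the k closest."""
    ranked = sorted(
        prototypes,
        key=lambda name: math.dist(features, prototypes[name]),
    )
    return ranked[:k]


# Hypothetical stored class prototypes (e.g. a color and a shape feature).
PROTOTYPES = {
    "daisy": (1.0, 0.2),
    "violet": (0.1, 0.9),
    "clover": (0.6, 0.5),
}

# Features from the automated segmentation, then features recomputed
# after the operator points at a badly segmented region.
before = top_candidates((0.9, 0.3), PROTOTYPES, k=2)
after = top_candidates((0.2, 0.8), PROTOTYPES, k=2)
```

In this toy run the ranking changes from daisy, clover to violet, clover once the corrected features are supplied, which is the behavior the operator sees as a refreshed candidate list.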
We have developed an MS-Windows style prototype, using a public domain Intel
computer vision library. Collaborators have ported the system to a digital
camera and a pocket computer. Possible modes of deployment include web cameras
with server-mediated classification, camera-back interaction, PDA-camera
combinations, and self-contained stationary systems for industrial or luggage
inspection. Our principal research objective is to establish a sound basis for
partitioning the necessary tasks between the operator and the machine. We also
hope to find partners to apply CAVIAR to industrial classification and
training, and to K-12 and university-level education.
A current application is conversion of cervicograms to symbolic form.
Registration of multimodal brain images
Analysis of X-ray microtomographs of fracture in concrete
Segmentation and labeling of digitized pages from technical journals
Modeling random-phase spatial sampling of printed characters
Integrated feature and classifier design
Isolated hand-printed digit recognition
Validation of pseudo-random character defect models for OCR
SELECTED PUBLICATIONS (since 2003)
(For additional items, please see G. Nagy's list of publications.)
G. Nagy, S. Veeramachaneni, "Adaptive and interactive approaches to document analysis," in Machine Learning in Document Analysis and Recognition (S. Marinai, H. Fujisawa, editors), Springer, Studies in Computational Intelligence, Vol. 90, ISBN 978-3-540-76279-9, pp. 221-257, 2008.
D. Lopresti, G. Nagy, S. Seth, X. Zhang, "Multi-character Field Recognition for Arabic and Chinese Handwriting," in Arabic & Chinese Handwriting Recognition (D. Doermann, S. Jaeger, editors), Springer LNCS # 4768, pp. 218-230, 2008.
S. Veeramachaneni and G. Nagy, "Analytical results on style-constrained Bayesian classification of pattern fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, #7, pp. 1280-1285, July 2007.
G. Nagy, "Digitizing, coding, annotating, disseminating, and preserving documents," Procs. IWRIDL-2006 workshop on Digital Libraries,
J. Zou and G. Nagy, "Visible models for interactive pattern recognition," Pattern Recognition Letters, Vol. 28, pp. 2335-2342, 2007.
J. Zou, Q. Ji, and G. Nagy, "A Comparative Study of Local Matching Approach for Face Recognition," IEEE Transactions on Image Processing, Vol. 16, #10, pp. 2617-2628, October 2007.
Ashutosh Joshi, George Nagy, Daniel P. Lopresti and Sharad C. Seth, "A Maximum-Likelihood Approach to Symbolic Indirect Correlation," 18th ICPR, vol. 3, pp. 99-103, 2006.
Xiaoli Zhang and George Nagy, "Style Quantification of Scanned Multi-source Digits," 18th ICPR, vol. 2, pp. 1018-1021, 2006.
Srinivas Andra and George Nagy, "Combining Dichotomizers for MAP Field Classification," 18th ICPR, vol. 4, pp. 210-214, 2006.