I am a student, teacher, reviewer, referee, and critic, and lucky enough to have had half-a-dozen persistent ideas. I don’t consider these ideas original, because any idea can be traced to a precedent. I just took custody of these notions and nursed them along till they could stand on their own feet.
In spite of many diversions, I keep returning to two intertwined strands. The first is adaptation. I hesitate to call it unsupervised learning, because all learning requires some external information. It need not, however, be a labeled training set! The second strand is judicious use of human interaction. Learning here cuts both ways: the machine can learn from us, and we can learn from the machine. For five decades I have sought to incorporate these notions into engineering solutions in various problem domains.
I have been less active in organizing conferences than many of my colleagues. This I attribute to a traumatic organizational experience nearly fifty years ago, at the very first workshop on Pattern Recognition, which was chaired by Al Hoagland, my boss’s boss at the IBM TJ Watson Research Center. He let me help in inviting participants and setting up the program, and then sent me down to Puerto Rico three days early to make sure that everything was ready.
When I arrived, the magnificent El Conquistador hotel, overlooking the cliff at the tip of the island in Fajardo, was being expanded. The Swiss hotel manager allayed my concern about the pneumatic drills reverberating through the meeting rooms and assured me repeatedly that all construction work would stop when the workshop began. He also sent baskets of fruit and cognac to our room every night.
The construction did not stop: the danger of forfeiting the on-time-completion performance bond was too high. During the first morning session, I was asked to find an alternate venue, charter buses, procure sandwiches, and resettle the entire workshop at lunchtime. I did, and the second session took place, only one hour behind schedule, at the El Dorado 50 miles away.
During the next months I helped transcribe the conference talks from reel-to-reel tape. Laveen Kanal edited and published them under the title Pattern Recognition (Thompson 1968). My own report, "A Happening in Puerto Rico," appeared in Computer Group News (which eventually morphed into Computer) and in Spectrum. Azriel Rosenfeld gave me some excellent suggestions, and I received many compliments for my summary. Nevertheless, this episode left me skittish about local arrangements chair honors, and I was glad to return to my research in Yorktown.
My first assignment was Chinese character recognition to feed a translation program for the Air Force. Working with Dick Casey, whom I met while we were being “processed” by IBM Personnel, made hierarchical classification of 1000 classes of ideographs joyful and exciting. We had a chance to review progress on subsequent developments after 25 years at ICPR 1988, but most of the Chinese OCR experts stayed away because the conference was moved on short notice from Beijing to the outskirts of Rome.
The next project suggested by our insightful boss Glenn Shelton, self-corrective character recognition, had an even more lasting impact. The idea was simple but counterintuitive. We assigned alphanumeric labels to a set of printed characters from the same source using a garden-variety classifier. This naïve classifier got many of the labels wrong. Nevertheless, we pretended that all the labels were correct, used them to design a new classifier, and then had the new classifier reclassify all the patterns. The resulting error rate was lower than the original error rate, and it kept decreasing during additional iterations of the classify-redesign cycle.
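In miniature, the cycle looks like this (a minimal Python sketch with synthetic 1-D features and a nearest-mean classifier, both hypothetical stand-ins for the original system):

```python
# Classify-redesign cycle in miniature: treat the labels of a crude
# classifier as if they were correct, re-estimate the classifier from
# them, and repeat. Data and classifier are hypothetical stand-ins.
import random

random.seed(42)

# Two synthetic "character" classes: 1-D features around means 0.0 and 2.0.
data = [(random.gauss(0.0, 0.6), 0) for _ in range(300)] + \
       [(random.gauss(2.0, 0.6), 1) for _ in range(300)]

def nearest_mean_labels(points, m0, m1):
    """Assign each point to the closer of the two class means."""
    return [0 if abs(x - m0) <= abs(x - m1) else 1 for x, _ in points]

def error_rate(points, labels):
    return sum(l != t for (_, t), l in zip(points, labels)) / len(points)

# Deliberately poor initial means play the "garden-variety" classifier.
m0, m1 = -1.0, 0.5
labels = nearest_mean_labels(data, m0, m1)
errors = [error_rate(data, labels)]

for _ in range(5):                      # classify-redesign iterations
    # Pretend the current labels are correct and refit the class means.
    c0 = [x for (x, _), l in zip(data, labels) if l == 0]
    c1 = [x for (x, _), l in zip(data, labels) if l == 1]
    m0, m1 = sum(c0) / len(c0), sum(c1) / len(c1)
    labels = nearest_mean_labels(data, m0, m1)
    errors.append(error_rate(data, labels))

print(errors)   # the error rate typically drops over the iterations
```

On this toy data the bootstrapping succeeds because the initial classifier, however poor, is still correlated with the truth; with too bad a start, the labels can instead collapse onto a single class.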
Nearly thirty years later, the idea caught Henry Baird’s attention. He offered to replicate it with his own classifier, and on his huge, one-hundred-font data set. The experiments again demonstrated that the classifier lifted itself by its own bootstraps. Still later, Yihong Xu demonstrated that adaptation could reduce the size of the training set required to recognize degraded and touching print.
The matter rested there until two of my students, Prateek Sarkar and Harsha Veeramachaneni, brought their insight and formidable mathematical skills to bear on it. Prateek defined style consistency as the statistical dependence between features (not labels!) in a field of isogenous patterns. Unlike a font recognizer, the optimal style-constrained classifier never identifies the underlying style, but exploits the fact that the appearance of a 6 in a given style gives some information about the appearance of a 9 in the same style. Prateek modeled classes and styles by weighted multi-modal feature distributions, applied the EM algorithm to style-unsupervised training, and found a computable approximation for the most likely field class.
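A toy example (mine, not Prateek's actual model) conveys the flavor: two classes ("6" and "9"), two hypothetical styles with 1-D Gaussian features, and a field classifier that sums over the shared style without ever reporting it.

```python
# Style consistency in miniature: a feature that is ambiguous in isolation
# is resolved by a companion pattern from the same (unknown) style.
# The class/style means below are hypothetical.
import math

MEANS = {("6", "A"): 0.0, ("9", "A"): 1.0,
         ("6", "B"): 1.0, ("9", "B"): 2.0}
SIGMA = 0.3

def gauss(x, mu):
    """Unnormalized Gaussian likelihood (same sigma everywhere)."""
    return math.exp(-((x - mu) ** 2) / (2 * SIGMA ** 2))

def singlet(x):
    """Classify one pattern alone, marginalizing over the unknown style."""
    score = {c: sum(gauss(x, MEANS[(c, s)]) for s in "AB") for c in "69"}
    return max(score, key=score.get)

def field(xs):
    """Jointly classify a field, assuming all patterns share one style.
    The style is summed out, never identified."""
    best, best_score = None, -1.0
    for cs in [(a, b) for a in "69" for b in "69"]:
        score = sum(math.prod(gauss(x, MEANS[(c, s)])
                              for x, c in zip(xs, cs)) for s in "AB")
        if score > best_score:
            best, best_score = cs, score
    return best

# A "9" written in style A (feature 0.9) next to a clear style-A "6" (0.1):
print(singlet(0.9))        # alone, 0.9 is misread as a "6"
print(field((0.9, 0.1)))   # in the field, it is corrected to ("9", "6")
```

The singlet classifier must hedge between a 9 in style A and a 6 in style B; the companion pattern tilts the sum toward the consistent interpretation.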
Harsha proposed to classify fields of isogenous patterns with a style-conscious quadratic discriminant field classifier. He showed that the covariance matrix of a field of any length can be computed from a composition of pairwise cross-covariance matrices and optimized his classifier for singlet-error or field-error. He investigated order independence that distinguishes style from language context, intra-class style, inter-class style, and the effect of field length on error rate.
Xiaoli Zhang, my last PhD student, explored the nature of style consistency in high-dimensional feature spaces. Prateek and Harsha demonstrated the benefits of style classifiers with an informative series of experiments on simulated patterns, on printed digits, and on NIST handprinted digits. I learned a great deal from them, and keep learning more as they keep coming up with truly interesting ideas. On a recent visit to Beijing, I was glad to see that Cheng-Lin Liu – who participated in this work while at Hitachi Central Research – and his students at the Beijing Institute of Automation are integrating style-conscious classification with segmentation, geometric context and language context for handwritten Chinese text.
Contrapuntal to the idea of shape context is recognizing scanned printed text without any preconception whatever of the shape of the characters. Dick Casey and I proposed to cluster feature vectors, represent the text by the sequence of arbitrary cluster labels, and decode the resulting substitution cipher (the Romans could already crack such ciphers). One problem that we had to solve was stopping the cluster merge/divide process at the natural number of clusters. Another that later became important in building language models was the estimation of rare bigram frequencies (we modified Laplace’s Law of Succession). It all worked like a charm, so we published it in IEEE Trans. Comp in 1968.
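Our modified estimator is in the 1968 paper; the underlying idea can be illustrated with plain add-one smoothing, the simplest form of Laplace's Law of Succession, which gives never-observed bigrams a small nonzero probability instead of a fatal zero. The function and sample text below are hypothetical illustrations:

```python
# Add-one (Laplace) smoothing for bigram frequencies: an illustration of
# the underlying idea, not the modified estimator from the 1968 paper.
from collections import Counter

def laplace_bigram_probs(text, alphabet):
    """Estimate P(next char | current char); unseen bigrams still get
    a small nonzero probability."""
    bigrams = Counter(zip(text, text[1:]))
    unigrams = Counter(text[:-1])
    V = len(alphabet)
    return {(a, b): (bigrams[(a, b)] + 1) / (unigrams[a] + V)
            for a in alphabet for b in alphabet}

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
probs = laplace_bigram_probs("the theme thereof", ALPHABET)
print(probs[("t", "h")])   # frequent bigram: relatively large
print(probs[("q", "z")])   # never observed: small but nonzero
```

Each conditional distribution still sums to one, and the cost of the smoothing shrinks as the sample grows.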
A decade later, when there was enough memory to store a dictionary of a few hundred common words, Sharad Seth, Kent Einspahr, and I developed more elegant methods that could solve the puzzle with only a few lines of upper and lower case scanned text. At ICPR 2000, Tin Ho and I presented “OCR with no shape training” on Spitz glyphs that befuddle human readers. So our “Autonomous Reading Machine” had a long run.
There was one more invention in those early years, suggested by Peter Welch, that resurfaced later. That was the notion of symbol-based compression of text images. We could not patent it because of an on-going anti-trust suit, so IBM did not let us publish it until 1974. I received many inquiries about it twenty-five years later when DjVu adopted a variant of the method (as did, still later, JBIG2). We had, however, missed a critical point. The prototype that took the place of the current letter always looked right, but nevertheless the method was not lossless. Subsequent researchers encoded the difference between the current glyph and its prototype.
In 1968 I spent a reverse sabbatical at the Université de Montréal, half-time in Informatique teaching pattern recognition and half-time in Neurophysiologie. I poked micro-electrodes into the medial geniculate nucleus of cats to find an aural equivalent to Hubel and Wiesel's Nobel prize-winning discovery of the organization of the visual cortex. I did not find one, but I had my first wonderful graduate student, Kamal Abdali. He recently told me that the problem that I had set for him, optimal feature selection, was eventually proved NP-complete.
Whenever I ran out of ideas, I read voraciously and wrote surveys about whatever I read. My first one was mostly the literature review from my dissertation on analog memory systems for neural networks. The next one, "State of the Art in Pattern Recognition," was based on my preparation for Don Nelson's invitation to give some lectures on pattern recognition to the staff of his computing center at the University of Nebraska (UNL). It won a Citation Classic award.
Towards the end of my stay at IBM Research, I was asked to help define the Landsat missions, so I wrote a survey about remote sensing. After I moved to UNL, I wrote lengthy reviews on geographic data processing (later GIS) with Sharad Wagle, on image registration with Bill Brogan, on text editors with Dave Embley, on optical scanners, and on OCR.
At UNL I also guided some student research. In 1975, Gregory Harambopopoulos analyzed a whole year's worth of log tapes for the university's mainframe computer. The most cycle-intensive customer was the Chemistry Department. We discovered that more than 10% of all jobs failed because of JCL (Job Control Language) errors. The timing was lucky, because the proliferation of minicomputers soon ended the possibility of observing an entire university's computing activities from a single vantage point. Our paper won a prize at a computer performance analysis conference, and Gregory made a career in performance evaluation with the federal government.
MS student Anandan, Dave Embley, and I applied file comparison methods to study student programming errors. (Thirty years later, Anandan and I were inducted together into the UNL Hall of Computing. Imagine!)
Sharad Seth and I introduced X-Y trees at ICPR 1984. Researchers are still adapting them to applications far beyond what we had in mind (see Getting to Know...Andreas Dengel, IAPR Fellow in this issue). During a summer break at IBM San Jose, Dick Casey and I devised a probabilistic method of decision tree design that made its way into commercial OCR products.
Through Herb Freeman I linked up with Massimo Ancona, Leila De Floriani, Bianca Falcidieno, and Caterina Pienovi in Genoa. We exchanged visits (often with our students) and worked on triangulating topographic data, on terrain visibility, and on 3-D boundary models. This mix of topics triggered the conjecture that the triangles in any Delaunay tessellation could be ordered by geometric visibility. Finding a proof for this new theorem of planar Euclidean geometry took us almost a year. Within a few months, Herbert Edelsbrunner generalized it to N dimensions.
After I moved to Rensselaer Polytechnic Institute (RPI), I continued to write surveys until I could reestablish my research. A lengthy treatise on OCR with Sharad Seth had to be re-titled "Modern Optical Character Recognition," because the M volume of the Encyclopedia of Telecommunications was about to go to press. I wrote one about the history of neural networks, mainly because the 1990 International NN Conference was held in Paris. Dick Casey and I enjoyed another trip to France when we were invited to give a tandem survey of document analysis at the first ICDAR. Later I surveyed the frontiers of OCR and terrain visibility. Another review, "Twenty Years of Digital Image Analysis in PAMI," has already garnered more citations than my 1968 "State of the Art…," but I attribute that only to citation inflation. Dan, Dave, Sharad, Matthew Hurst, and I merged our references on table processing for IJDAR. Harsha and I contributed a survey to Simone Marinai and Hiromichi Fujisawa's fine book on Machine Learning in Document Analysis and Recognition. I never expected that I would get so much mileage out of my three-week McGill summer course on surveying!
RPI DocLab was also Randolph Franklin's Computational Geometry Lab. Maharaj Mukherjee (the second of my three consecutive PhD students from IIT Kharagpur and now a master inventor at IBM) and my former UNL student Shashank Mehta (now Professor at IIT Kanpur) used continued fractions to find the best integer-coefficient representation of geometric points and lines of intersection that are naturally calculated as floating-point numbers. Maharaj was inspired by David Dobkin's pentagon problem, but that turned out to be too hard.
We made some (small) waves in 1992 with a 2-D page-grammar-driven layout analysis for technical journals. Sharad, my RPI colleague Krishnamoorthy, and I just looked over the shoulders of PhD candidate Mahesh Viswanathan, who did all the work. Since then Mahesh has risen to Chief Cloud Architect and Master Inventor at IBM and no longer seems to fear us.
Tom Nartker invited me to spend parts of three summers at his Information Science Research Institute in Las Vegas with Steve Rice and my former student Junichi Kanai (now my colleague at RPI). Between expeditions to the twisting canyons surrounding the City of Sin, we devised evaluation methods for many different facets of OCR and layout analysis. The tests conducted by ISRI on commercial systems may still represent the best publicly available information on OCR accuracy on a variety of document types. The striking differences exhibited by OCR engines on the same scanned pages inspired Steve to create a coffee table book of error “snippets.” He enlisted Tom and me as co-authors, and Kluwer published it in 1999.
Dan Lopresti and I met at a DAIR conference in Las Vegas where we presented back-to-back papers on the validation of image defect models. With some help from Andrew Tompkins, we combined our ideas in a PAMI article. Since then we have collaborated on dozens of papers on table recognition and on mark-sense election technologies. Some of the table work was conducted with Sharad and Moorthy in the framework of Dave Embley’s long-lived TANGO (table ontologies) project, while my former student Elisa Barney Smith (now Associate Prof at BSU) contributed heavily to the ballot image processing. We also combined forces with Prateek and Jiangying Zhou for a formal analysis of the ever-present noise due to the random position of the spatial sampling grid relative to a scanned document. More recently, Dan and I have coauthored a series of white papers on prospects in DIA.
Let me return briefly to my interest in HCI (human computer interaction). It languished for quite a while because most of my students were so algorithmically inclined. Eventually, however, I persuaded an initially skeptical Jie Zou to develop an interactive classification system for his dissertation. Jie designed a user-friendly GUI and demonstrated that routine correction of errors can improve recognition on new data with minimal initial training. The human also learns from the system. Jie constructed CAVIAR (Computer Assisted Visual Interactive Recognition) systems for wildflowers and for face recognition. Among the best parts of the project was photographing the flowers in situ. Jie subsequently ported the interface to the web, so that it could be used with mobile devices. Sharad, Dave, and Dan liked the CAVIAR idea, so we are now applying it to tables (VeriClick), cervigrams (Cervitor), and calligraphy (CalliGUI). Perhaps the time has finally come for systems that improve with use!
Occasionally I am asked to name the best thing that I have ever written. My private hope is that I have not yet written my best, but I usually offer "Candide's Practical Principles of Experimental Pattern Recognition" (in PAMI 1983) because it is less than two pages long and has no references. The editors did ask me for references, but I refused for fear of retribution for whistle-blowing. I consider six of my articles tied for worst paper.
Only once have I ever written anything that I deemed poetic: “The Dimensions of Shape and Form.” I presented it at a delightful conference in Capri organized by my faithful Neapolitan friends Luigi Cordella, Carlo Arcelli, and Gabriella Sanniti di Baja. They kindly included it in their Visual Form (Plenum, 1991), but either it did not scan right, or poetry is not appreciated in our circles.
Research projects with my children rank high among my most pleasurable experiences. My students and I collaborated on restoring diacritics and on clustering dialects with my daughter, and on quantifying crack propagation in concrete with my son. In spite of any gripes that they may have overheard, both opted for university teaching careers.
These days I am often invited to give retro talks. I talked about OCR test data since the fifties at DAS 2010, about computer science research in the seventies at UNL, and about Frank Rosenblatt's perceptrons at Pace University. IAPR's TC-1 has entrusted me with the bittersweet task of delivering the Pierre Devijver Award Lecture at S+SSPR 2012. Recently ABBYY offered to digitize, OCR, and index 19 bankers' boxes of my OCR and DIA memorabilia. These cannot be posted on the web because of copyright issues, but I will make the searchable PDF files available to the IAPR community.
Now that I have time, I enjoy programming again – in M-code and Python. It is so much easier to run a large experiment these days. My only new project, on Chinese calligraphy, was initiated last year when Shanghai Professor Xiafen Zhang visited DocLab for six months.
There is no question that I have had a privileged life. My wife and I still enjoy breakfast and dinner together after 48 years. We look forward (and backward) to holidays with our children and their significant others. I have kept some friends from those 19 years between Grade 1 and PhD. From time to time Dick Casey and I find a day for wide-ranging discussion. When Hiromichi Fujisawa's international standards-chair schedule permits, we walk and talk. There are always one or two papers on the burner with my staunch collaborators Sharad, Dave, Dan, and Moorthy, or with one of my ex-students. Most of my wonderful graduate students keep in touch. Several parents, partners, children, and children's partners have also become part of our circle. Thank you all!
Getting to know…
A Self-serving Review of My Own Work
By George Nagy, IAPR Fellow (USA)
Professor George Nagy, IAPR Fellow (ICPR 1998, Brisbane, Australia): "For contributions to document image analysis and for service to IAPR"
George Nagy graduated from McGill University in Engineering Physics (fencing and chess). He earned his MS at McGill by solving Euler's Second Equation for the hysteresis motor. He was awarded the PhD at Cornell University in 1962 for helping Frank Rosenblatt build Tobermory, a sixteen-foot, four-layer neural network for speech recognition. After a short postdoc, he worked on character recognition and remote sensing at IBM, Yorktown (and claims credit for IBM's growth during this period). During a reverse sabbatical at the Université de Montréal he recorded pulse trains from cats' medial geniculate nuclei. In 1972, he was appointed chairman of the Department of Computer Science at the University of Nebraska, where he dabbled in computational geometry, GIS, and HCI, and was eventually inducted into the NHC. From 1985 until his retirement in 2011, he was Professor of Computer Engineering at RPI in Troy, NY. Nagy's credits in document analysis include Chinese character recognition with Dick Casey, "self-corrective" character recognition with Glenn Shelton (with a reprise twenty-eight years later with Henry Baird), character recognition via cipher substitution with Casey, Sharad Seth, and Tin Ho, growing X-Y trees with Seth, table interpretation with Dave Embley, Mukkai Krishnamoorthy, Dan Lopresti, and Seth, modeling random-phase noise with Prateek Sarkar and Lopresti, style-constrained classification with Sarkar, Harsha Veeramachaneni, Hiromichi Fujisawa, and Cheng-Lin Liu, and paper-based election systems research with Lopresti and Elisa Barney Smith. He tries to keep up with the state of the art by learning new ideas from former students. He is a Fellow of the IEEE and the IAPR, and received prematurely (in 2001) the ICDAR Lifetime Contributions Award. In his spare time Nagy enjoys skiing, sailing, and writing prolix surveys.
Since his retirement from RPI in June 2011, he enjoys having more time for family, friends, writing programs, reading (too often, patents), and septuagenarian outdoor activities.