NCGIA Workshop on Geographies of the Information Society, Statement by Wm. Randolph Franklin, (24-Feb-1997)

Background

This contains my preliminary statement for the NCGIA Workshop on Geographies of the Information Society, Feb 26-28, 1997.

Altho my degrees are in CS and Applied Math, I've been assisting geographers in programming since around 1970. Currently I research in geometric algorithms, most recently in terrain visibility and elevation data compression. For more information, follow links from my home page.

My comments are organized by the Project Varenius strategic areas.


Disclaimers

So far, this statement lacks qualifiers, introductory material, and moderating bulk. Sometimes it is written provocatively.


Cognitive Models of Geographic Space

Theory is nice, but it should be validated by experiment. Try this model in a new whatever, and see what happens:

Law of Unintended Consequences

The practical implications of most new technology are not predicted in advance, so this is a chance for some experimental science.

Some examples of new technology's unpredicted uses:


Computational Implementations of Geographic Concepts

A Plea for Simple Algorithms and Data Structures

There is a need to investigate simple algorithms and data structures in GIS. Simple algorithms have the advantage that you don't spend all your time getting the program syntactically legal, let alone executing correctly.

They are also implementable, which is saying something, and take a reasonable time on reasonable problems, regardless of their asymptotic complexity.

Also, after getting the algorithm implemented correctly, there is time to play with the program to understand its behavior and improve it. Simple algorithms also often parallelize more easily and better. Finally simple algorithms also tend to work very fast on large datasets since their constant factors in their times are often small.

A worthwhile question is this: Why do some simple data structures work so well in practice? What does in practice mean? It's unfairly optimistic to assume the data to be uniform and uncorrelated, but unfairly pessimistic to make no assumptions at all.

Simple algorithms with limitations can also sometimes do quite well in places where no theoretical, worst-case, algorithm exists. In Computational Geometry, for example, there is the red-blue edge segment intersection problem. We have N red line edges (line segments) and N blue ones. Most of the red edges intersect each other, and most of the blue edges intersect each other. We do not know, and don't want to have to determine, all those red-red and blue-blue intersections. Instead we want to find the few red-blue intersections.

So far as I know, there is no good theoretical, asymptotic algorithm. Nevertheless, a bucketing, uniform grid approach may find them all in O(size_in+size_out) time, w/o finding any red-red or blue-blue intersections, if the input is not too badly distributed.

Nevertheless, simple algorithms can still cause deep theoretical questions. They include expected performance (over what probability measure?), limitations, and how to work around those limitations. Is much known about expected performance of discrete data structures? Even the computer scientist Don Knuth once incorrectly analyzed the average depth of a 3 node binary search tree under uniform, random, insertion and deletion.

Democratization of Data Manipulation

One constant theme in engineering is that tasks that formerly required a specialist eventually can be done by the average user. No longer do we need to hire a clerk to write our letters or a chauffeur to drive our car.

This trend can go too far - it's not clear that voice mail is a benefit to society. Nevertheless, the general concept is desirable. It's a disintermediation or democratization process, that is, letting the end user perform the task himself, without needing the help of a professional priesthood.

Many people deplore this trend. E.g., we see warnings about the dangers of allowing ignorant lusers to play at land-use planning on a PC with some toy GIS.

However, we should encourage this trend. Tools that are susceptible of misuse should be designed, when possible, to make it easier to use them properly. Continuing the be-your-own-chauffeur analogy, some people do crash their cars. Sometimes we yank their licenses; more often we design safer cars and roads.

How can geographic information systems be designed so that they can be easily used by intelligent ignorant users, and still probably give correct results?

Formally Modeling Approximate Computations

There is a great need for a better formal model of approximate computations, where roundoff occurs. There are also surprisingly few people working in this field.

As a point of reference, it may be useful to look at what numerical analysts have learned about matrix calculations in the last 50 years.

At the start, they figured that inverting a 40x40 matrix would be impossible because of roundoff. Then they decided that the important properties were the largest and smallest eigenvalues, and devised methods that preserved those.

When trying to solve linear equations, they devised "conditioning" measures of how bad the problem was. E.g., when intersecting two lines, the more nearly parallel the lines are, the worse the condition number. If a matrix was ill-conditioned, they observed that the calculated solution to Ax=b could be arbitrarily far from the actual solution. So, they inverted things, and invented backwards error analysis.

Instead of asking how far the calculated solution was from the real solution, they asked how much the input would have to be perturbed to make the calculated solution correct. This was more bounded.

The problem with applying these lessons to geometry is that here the structure of the problem, and the set of relations that must be preserved, is much richer. In principle, any roundoff error at all could be enlarged to be significant. To be perfect, a robust geometry system would have to incorporate a geometry theorem prover to determine the implications of any roundoff. This is not practical.

Lazy evaluation combined with increased precision only when needed is probably about the best. This idea has been around for some time, and was mentioned at the Dagstuhl on Computational Cartography in November.


Geographies of the Information Society

Lack of Privacy

What are implications of being in a recorded society, as Bill Gates put it?

Equal Access to Personal Data

  • Should we imitate Europe, and give people the right to copies of their records in databases?
  • A de minimis exception could be necessary.
  • Should organizations, such as credit bureaus, take more responsibility for forwarding incorrect data?
  • Access to Class Data

    This means access to data about the whole group that I'm a member of. If I'm considering should I have the right to see, respectively,

    The data should be anonymous, but, apart from that, be individual records, not just aggregate.

    Access to the Analysis Tools Also

    We should make not only the data, but also the analysis tools available online, so that people can test hypotheses themselves. This makes it harder for insiders to cook the data. A current example: what do airbags do for safety? A few years ago, NHTSA did a major re-analysis of the data, changing numbers by a factor of two, in the obvious direction. Did they cook the data?

    What if a table containing every accident were online, and the code used to analyze it?

    Ditto for electric power lines and cancer. Nuclear power plants and cancer. Shoe sizes and death rates (Males have larger feet than, and double the death rate of, females). ).

    Geographic Restrictions Obsolete

    Geographic restrictions on data use are becoming silly. Examples include

    Preserve Public Access

    As the proposal states, the "commons" is shrinking.


    Wm. Randolph Franklin, Associate Professor
    Email: wrfATecse.rpi.edu
    http://wrfranklin.org/
    ☎ +1 (518) 276-6077; Fax: -6261
    ECSE Dept., 6026 JEC, Rensselaer Polytechnic Inst, Troy NY, 12180 USA
    (GPG and PGP keys available)