NCGIA Workshop on Geographies of the Information Society,
Statement by Wm. Randolph Franklin, (24-Feb-1997)
Background
This contains my preliminary statement for the
NCGIA Workshop on Geographies of the Information Society, Feb 26-28, 1997.
Altho my degrees are in CS and Applied Math, I've been assisting
geographers in programming since around 1970. Currently I research in
geometric algorithms, most recently in terrain visibility and
elevation data compression. For more information, follow links from
my
home page.
My comments are organized by the Project Varenius strategic areas.
Disclaimers
So far, this statement lacks qualifiers, introductory material, and
moderating bulk. Sometimes it is written provocatively.
Cognitive Models of Geographic Space
Theory is nice, but it should be validated by experiment.
Try this model in a new whatever,
and see what happens:
Law of Unintended Consequences
The practical implications of most new technology are not predicted in
advance, so this is a chance for some experimental science.
Some examples of new technology's unpredicted uses:
- The purpose of the Arpanet was sharing computing resources, but
the biggest use for a long time was email. This was opposed at one
point as being a frivolous waste of an expensive resource.
- The largest use of mainframe computers, at one point, certainly
in the liberal arts, was possibly word
processing. Computer centers didn't much like that.
Computational Implementations of Geographic Concepts
A Plea for Simple Algorithms and Data Structures
There is a need to investigate simple algorithms and data
structures in GIS. Simple algorithms have the advantage that you
don't spend all your time getting the program syntactically legal, let
alone executing correctly.
They are also implementable, which is saying something, and take a
reasonable time on reasonable problems, regardless of their
asymptotic complexity.
Also, after getting the algorithm implemented correctly, there is
time to play with the program to understand its behavior and
improve it. Simple algorithms also often parallelize more easily
and better. Finally simple algorithms also tend to work very fast
on large datasets since their constant factors in their times are
often small.
A worthwhile question is this: Why do some simple data
structures work so well in practice? What does in
practice mean? It's unfairly optimistic to assume the data to
be uniform and uncorrelated, but unfairly pessimistic to make no
assumptions at all.
Simple algorithms with limitations can also sometimes do quite
well in places where no theoretical, worst-case, algorithm exists. In
Computational Geometry, for example, there is the red-blue edge
segment intersection problem. We have N red line edges (line
segments) and N blue ones. Most of the red edges intersect each
other, and most of the blue edges intersect each other. We do not
know, and don't want to have to determine, all those red-red and
blue-blue intersections. Instead we want to find the few red-blue
intersections.
So far as I know, there is no good theoretical, asymptotic
algorithm. Nevertheless, a bucketing, uniform grid approach may
find them all in O(size_in+size_out) time, w/o finding any red-red
or blue-blue intersections, if the input is not too badly
distributed.
Nevertheless, simple algorithms can still cause deep theoretical
questions. They include expected performance (over what probability
measure?), limitations, and how to work around those limitations. Is
much known about expected performance of discrete data structures?
Even the computer scientist Don Knuth once incorrectly analyzed the
average depth of a 3 node binary search tree under uniform, random,
insertion and deletion.
Democratization of Data Manipulation
One constant theme in engineering is that tasks that formerly required
a specialist eventually can be done by the average user. No longer do
we need to hire a clerk to write our letters or a chauffeur to drive
our car.
This trend can go too far - it's not clear that voice mail is a
benefit to society. Nevertheless, the general concept is desirable.
It's a disintermediation or democratization process, that is, letting
the end user perform the task himself, without needing the help of a
professional priesthood.
Many people deplore this trend. E.g., we see warnings about the
dangers of allowing ignorant lusers to play at land-use planning on a PC
with some toy GIS.
However, we should encourage this trend. Tools that are susceptible
of misuse should be designed, when possible, to make it easier to use
them properly. Continuing the be-your-own-chauffeur analogy, some
people do crash their cars. Sometimes we yank their licenses; more
often we design safer cars and roads.
How can geographic information systems be designed so that they can be
easily used by intelligent ignorant users, and still probably give
correct results?
Formally Modeling Approximate Computations
There is a great need for a better formal model of approximate
computations, where roundoff occurs. There are also surprisingly few
people working in this field.
As a point of reference, it may be useful to look at what numerical
analysts have learned about matrix calculations in the last 50 years.
At the start, they figured that inverting a 40x40 matrix would be
impossible because of roundoff. Then they decided that the
important properties were the largest and smallest eigenvalues,
and devised methods that preserved those.
When trying to solve linear equations, they devised "conditioning"
measures of how bad the problem was. E.g., when intersecting two
lines, the more nearly parallel the lines are, the worse the
condition number. If a matrix was ill-conditioned, they observed
that the calculated solution to Ax=b could be arbitrarily far from
the actual solution. So, they inverted things, and invented
backwards error analysis.
Instead of asking how far the calculated solution was from the
real solution, they asked how much the input would have to be
perturbed to make the calculated solution correct. This was more
bounded.
The problem with applying these lessons to geometry is that here
the structure of the problem, and the set of relations that must
be preserved, is much richer. In principle, any roundoff error at
all could be enlarged to be significant. To be perfect, a robust
geometry system would have to incorporate a geometry theorem
prover to determine the implications of any roundoff. This is not
practical.
Lazy evaluation combined with increased precision only when needed is
probably about the best. This idea has been around for some time, and
was mentioned at the Dagstuhl on Computational Cartography in
November.
Geographies of the Information Society
Lack of Privacy
What are implications of being in a recorded society, as
Bill Gates put it?
- Everywhere you go will be known.
- There is a Scandinavian proposal to observe, in real time, the
locations of all cars. The stated purpose is to micro-charge tolls
for road use, varying by the time of day and location.
- It was reported that the license plates of all cars entering and
leaving the
City of London are photographed and OCR'ed and recorded.
- A database of all world air travelers is being compiled. Most
recently, the
Flight 800 explosion was used as a pretext to require future identification
of all passengers.
-
Small, cheap, unobtrusive surveillance devices are being used more in
cities. In 20 years, will they be ubiquitous in the wilderness also?
- Face recognition technology, perhaps by thermal imaging, is
being developed.
- Every piece of info you access will become known, when accesses
are online.
- This means, not only which newspapers, but which articles.
- If you make an expensive claim for service (insurance claim, lawsuit, etc.)
you may soon expect those records to be retrieved.
- Effects of this include the following.
- Positive correlations, data mining, will make it much less
expensive for them to cause us a lot of trouble.
E.g., if you're a male enrolled in college, but not registered for the
draft, be prepared to explain why, with proofs.
- Secret databases are being established containing lists of suspects. You
have no right to know that you're in them, or to dispute their contents.
- You won't even know why you're being hassled when you: cross the
border, file a claim, etc.
- Official misuse of this. Cf. political abuse of IRS.
- Private misuse might be worse since here, often, there aren't even formal safeguards.
- Contrary to popular impression, the internet might lead to greater
control, not greater freedom.
Equal Access to Personal Data
Access to Class Data
This means access to data about the whole group that I'm a member of.
If I'm considering
- being admitted to a hospital,
- buying insurance,
- applying to a college, etc
should I have the right to see, respectively,
- how many patients with the same operation died in the hospital,
- how many claims the insurance company denied,
- the average grade in each course.
The data should be anonymous, but, apart from that, be individual
records, not just aggregate.
Access to the Analysis Tools Also
We should make not only the data, but also the analysis tools
available online, so that people can test hypotheses themselves. This
makes it harder for insiders to cook the data. A current example:
what do airbags do for safety? A few years ago, NHTSA did a major
re-analysis of the data, changing numbers by a factor of two, in the
obvious direction. Did they cook the data?
What if a table containing every accident
were online, and the code used to analyze it?
Ditto for electric power lines and cancer. Nuclear power plants and
cancer.
Shoe sizes and death rates (Males have larger feet than, and double the
death rate of, females). ).
Geographic Restrictions Obsolete
Geographic restrictions on data use are becoming silly. Examples
include
- blacking out certain cities from satellite reception,
- country codes in DVD,
- geographic licensing of the Olympics.
Preserve Public Access
As the proposal states, the "commons" is shrinking.
Wm. Randolph Franklin,
Associate Professor
Email: wrfATecse.rpi.edu
http://wrfranklin.org/
☎ +1 (518) 276-6077; Fax: -6261
ECSE Dept., 6026 JEC, Rensselaer Polytechnic Inst, Troy NY, 12180 USA
(GPG and PGP keys available)