Terrain Elevation Data Structure Operations
Wm Randolph Franklin
Rensselaer Polytechnic Institute
http://wrfranklin.org/
Goal
-
Operate on terrain elevation data.
-
Compare competing representations:
-
Operations:
-
Visibility
-
Compression
-
Interpolate contours to grid
-
Approx grid to TIN
-
Drainage (future)
-
Integrate into a system
How the Parts Fit Together

Theme
-
Experimental Computational Cartography
-
Interface between the theoreticians and the computer
-
Design then implement algorithms and data structures
-
efficiently
-
on the largest (not toy sized!) datasets
-
then
-
observe results and
-
suggest new research problems
-
use general purpose data structures and algorithms,
-
benefit from larger community effort
-
like using Open Source Software instead of proprietary packages.
-
Notable special-purpose failures: Lisp machines, floating point processors,
database engines, special graphics engines, and most parallel machines.
What's New?
... since people have converted data for several decades
now.
-
hardware,
-
SW packages,
-
specific algorithms.
Historical Context
-
Various formats possible for Hypsographic (elevation) data
-
Triangulated Irregular Networks (TINs),
-
grids or arrays, and
-
contour lines.
-
Which is best?
-
Criteria:
-
complexity, size, and accuracy
-
need for conversion algorithms.
-
need for a measure of goodness for approx conversions.
-
simplest: RMS error.
-
more useful: effect on
-
visibility, drainage, slope
Research Program Components
-
Conversion from DEM to TIN
-
Visibility
-
DEM Compression
-
Interpolating from Contours to DEM
Conversion from DEM to TIN
-
I designed possibly the first TIN program in GIS, in 1973.
-
Now, can easily completely TIN a complete 1201x1201 DEM.
Extensions:
-
How does error correlate to number of triangles?
-
Is a Delauney triangulation the best?
-
Is a constrained triangulation, forcing features to be included, necessary?
-
Should we alternate inserting and deleting points?
Visibility
-
viewshed: What points can an observer see?
-
Visibility indices: How much area can each
possible observer see?
Extensions:
-
Compute confidence intervals.
-
Unshaded (
): almost certainly visible.
-
Lightly shaded (
): probably visible.
-
Darkly shaded (
): probably hidden.
-
Black (
): almost certainly hidden.
Sample viewshed output
Following are two images:
-
some sample terrain from South Korea, color-coded by elevation, and
-
the visibility indices of every point, colored similarly.

Sample Terrain; Visibility Indices
Extensions:
-
siting observers: alternate the above 2 programs.
-
feature determination
-
small relation of height to visibility
Siting Observers
Programs viewshed and vix may be used to find
a set of observers that jointly can see every point as follows.
-
Use vix to list all the points by visibility index, and hence,
to find the most visible point. Place the first observer, O1,
there.
-
Use viewshed to find the points that O1 cannot see.
-
Filter the sorted list of points to delete points that O1 can
see.
-
Find the most visible point that O1 cannot see; that is the
second observer, O2.
-
Repeat until the set of observers can see every point.
DEM Compression
-
Hypothesis: The
effort spent on image compression might transfer to elevations.
-
Differences: one
channel, 16 bits, different measure of goodness
-
Uses: Said and Pearlman's
wavelet program
-
Test cases: 24 USGS
level-1 1201x1201 DEMs.
-
Today's example: level-2
30-meter DEM of Bountiful Utah.
-
Standard deviation of the elevations: 1096
meters,
-
= RMS error if the file was compressed to two bytes
-
W/o compression: 11 bits per point (bpp).
-
Lossless compression: 5.234 bpp
-
Lossy compression, size vs error:
| BPP |
RMS Error, m |
| 5.234 |
0.0 |
| 1.0 |
6.0 |
| 0.1 |
51.7 |
| 0.03 |
148 |
| 0.01 |
535 |
| 0.0 |
1096 |
Original
Lossy Compression to 1 Bit/Point, RMSE=6.0 meters
Lossy Compression to 0.1 Bit/Point, RMSE=51.7 m
Lossy Compression to 0.03 Bit/Point, RMSE=148 m
Lossy Compression to 0.01 Bit/Point, RMSE=535
Interpolating from Contours to DEM
Prior Art
-
Interpolate in vertical planes.
-
PDE - Lagrange (heat flow) or thin-plate
-
Extend straight lines out in 8 directions to the
nearest contour
-
Voronoi diagram / Delaunay triangulation, e.g. area
stealing
-
Medial axis
-
Inverse distance weighting.
-
Detect features, interpolate them, then fill in.
Limits of Existing Methods
-
Often tested only on synthetic, small, datasets.
-
May require unbroken contours.
-
May require contours be thinned.
-
Generated triangles may be horizontal, or long & thin.
-
Generated peaks may be flat.
-
May generate terraces and ringing.
Figure: Interpolating 3 Nested Squares with Inverse Distance
Weighting, and 3 Ways with Delaunay Triangles
Interpolation & Approximation Introduction
Desirable properties:
-
local control
-
variation minimization
-
interpolation
-
conformal
-
Don't want to see the contours in the result.
-
Peak inside top contour and valley outside outer contour.
The math is counterintuitive, and the desired properties mutually contradictory.
Goodness Criteria
Ideal:
-
Maximum likelihood surface fitting the data.
-
Requires: Formal probability model of terrain elevation.
-
which is still incompletely solved. (e.g., long range correlations from
drainage patterns).
Next best:
-
What looks good?
-
Shaded plots
-
Profiles
-
Statistically compare curvature at generated points on/near data points
against farther points. (future)
Partial differential equations (PDEs)
Laplacian, or heat-flow equation,
zxx+zyy = 0,
where
zxx = d2z/dx2
solved by iteration on a grid,
4zij = zi-1,j + zi+1,j + zi,j-1
+ zi,j+1.
Problem: demonstrates the terracing artifact.
The more complicated thin plate model minimizes total curvature,
similar to fitting a thin sheet of metal to fixed points, while minimizing
the energy of bending. Gaussian curvature, commonly used in geometry, is
inappropriate here, since it is zero on a ``developable'' surface, such
as a bent sheet of paper. Instead we use the (scaled) divergence, or
zi-1,j + zi+1,j + zi,j-1 +
zi,j+1 - 4zij
for the curvature at a point.
The partial differential equation is:
zxx2 + 2zxy2 + zyy2
= 0;
while the corresponding iterative equation on a grid is:
20zij = 8(zi-1,j + zi+1,j + zi,j-1
+ zi,j+1) - 2(zi-1,j-1+zi-1,j+1+zi+1,j-1+zi+1,j+1)
- (zi-2,j + zi+2,j + zi,j-2 + zi,j+2)
For example, see this:
The following shows a vertical slice thru the interpolating surfaces.
It runs thru opposite corners, the better to show any artifacts. The triangulated
line shows the original points. The other lines show the cubic, inverse
distance, linear, and nearest neighbor interpolations. We see that none
of the interpolations is acceptable since all show where the original contours
are.
Figure: Figure: Vertical Slice Thru Four Interpolating
Surfaces
Desirable properties include:
-
local control
-
variation minimization
-
interpolation
-
conformal
-
Don't want to see the contours in the result.
-
Peak inside top contour and valley outside outer contour.
The new methods described here are based on the PhD work of Gousie (1998).
Intermediate Contours
Our first new method repeatedly interpolates new contour lines between
the original ones, somewhat similar to the medial axis. If two adjacent
contour lines are: A, at elevation
a, and B, at b,
then we interpolate at elevation (a+b) thus.
-
Pick a point on A.
-
Find the closest point on B (approximating the gradient).
-
The midpoint is on the intermediate contour.
Our contrarian philosophy is that we don't find the elevation at certain
points, but rather find points with a certain elevation.

Figure: Crater Lake: Original Contours and Intermediate Contour
Interpolation
The Maximum Intermediate Contours (MIC) method iterates the process,
filling ever-finer contours. Alternatively, we can find intermediate contours
one or more times, then do a thin-plate, Hermite-spline the peaks, inverse-distance
weight other small gaps, and Gaussian-smooth the surface.
Gradient Lines
This method answers the problem of the generated surfaces terracing between
contour lines by importing the idea of lofting from CAGD. It proceeds
as follows. Generate a first version of a surface by any method. Find its
gradient
lines. Note that, on each gradient line, we know only the elevations
where it crosses the contours. Interpolate its elevations in-between.
This process smooths the surface, while keeping the interpolation. Springs
may be added to any method, producing an approximation.
The following figures show three approximation methods applied to a
900x900
piece of the Crater Lake DEM. The first shows the original contour lines,
which are seen to be separated by many pixels, which makes the process
harder. The next three show the classic thin plate, and our intermediate
contour and gradient line techniques. We see that both our techniques are
much smoother than the thin plate, and that the gradient method is best.
Our longer papers quantify this, and show more examples.
Figure 6. Crater Lake: Original Contours
|
Figure 7. Thin Plate Approx
|
Figure 8. Intermediate Contour Approx
|
Figure 9. Gradient Line Approx
|
Overdetermined Laplacian Solution
The problem with many existing methods is that no information flows across
the contours. This causes terracing. Our solution is to make the known
data points also be the average of their neighbors. The method is to assume
that there are N2 unknowns; i.e., even the known points
are unknown. There are now two types of equations. First, for all points:
4zij = zi-1,j + zi+1,j + zi,j-1
+ zi,j+1
Second, for the known points, we make an equation that they are equal
to their known values:
zij = hij
Since there are now more equations than unknowns, none of the equations
will be satisfied exactly. That is, no points will be exactly the average
of their neighbors, and the known points will not be exactly their known
values. If there are K points whose elevations we know, then there
are N2+K equations for the N2 unknowns.
We then do a least-squares solution in Matlab, to give an approximate
solution, rather than an interpolation. With a least-squares solution,
scaling an equation up makes it more important. Let R be the relative weight
of the average-value equations compared to the others. A lower R will cause
a more accurate surface, while a higher R will cause a smoother surface.
A little inaccuracy goes a long way towards smoothing the surface.
Here is a test case constructed to stress any contour interpolation
algorithm. It consists of four concentric squares, at a distance of five
from each other. The squares' sharp corners facilitate artifacts, while
the large gap between adjacent contours allows interpolated surfaces to
droop or terrace. Our intuition would desire that a square pyramid with
triangular sides result. However, few algorithms are likely to produce
such a surface with discontinuities in the slope.
The next figure shows the surfaces that do result. R is the relative
weight of the average-value equations. The four subfigures in order are
the original contours, and the approximated surfaces with R=0.1, 1.0, and
3.0.
Figure: Overdetermined Laplacian PDE, Weighting Different
Equations Differently
This figure shows the same four surfaces cut by a vertical plane cutting
from corner to opposite corner, the better to show any artifacts. We see
that with R=3, the location of the original contours is completely hidden,
which is desirable.
Figure: Vertical Slice of the Surfaces
As the following table shows, the average error at the known points
in this case is under 3%. That is, by allowing the elevations of known
contour points to vary by an average of 3%, we can fit a quite smooth surface
to the nested squares. Considering the accuracy standards of most hypsographic
data, that may be acceptable.
Table: Error versus Weight
| R |
Max % Error |
Mean % Error |
| 0.1 |
0.27 |
0.01 |
| 1.0 |
5.5 |
0.6 |
| 3.0 |
12 |
2.7 |
Note that our overdetermined solution concept is quite different from
putting springs on the data points. That is also an approximation, but
it does not try to make the data points to be the average of their neighbors.
Absent that, it is impossible to make the surface not show the original
contours.
Various extensions of the overdetermined solution technique are possible.
We might calculate the error on each data point, then for the less-accurate
points, increase their weights and re-solve. This reduces the max absolute
error, but increases the mean. The resulting surface is slightly less smooth.
Alternatively, we might guess that the points with large errors represent
breaks in the surface slope. Then an reasonable response might be to reduce
their weights and re-solve.
We have tested the overdetermined solution technique on grids of up
to 257x257 points, and are working on larger cases.
3D Connected Components
-
Given: a 500x500x500 volume of bits
-
Application: scanning a concrete block being
stressed to failure
-
To Find: connected components
-
Algorithm: Union-find, 1960s
-
What's new: N=108
-
Implication: Inserting 100,000,000 voxels
into a tree maybe is a bad idea.
-
Method:
-
Think for 6 months before coding.
-
Join voxels into 1-D runs.
-
Make 2 passes thru them to link connected components.
-
Time: Joining components is only several times
slower than reading the data.
-
Moral: Careful choice of data structure is
important.
Proposed Future Work
Short Term
-
Multiple observers covering a region.
-
Fast graphics cards?
-
Test the largest DEM datasets available.
-
Study other serendipitous questions that will arise.
Medium Term
-
Leverage from array operations tools, like Matlab?
-
Output sensitivity to input?
-
If visibility indices and viewsheds are sensitive to small input perturbations,
then
-
Try to establish error bounds on the output.
-
Faster and less accurate output when input is inaccurate?
-
Feature recognition: feed into constrained TIN.
Long Term
-
Investigate the general question of the proper representation of terrain
elevation data.
-
If using TINs, use triangular splines, not just piecewise multilinear flat
triangles.
-
Consider the idea of a conceptually deeper representation, based on the
geomorphological forces that created the terrain. Devise a basis set of
operators, such as uplift, downcut, etc. Deduce the operators
that created the particular terrain under consideration, and store them.
-
Slightly differently, represent the terrain by the features that people
would use to describe it. This idea is many decades old. Can we get it
to work? The problem is that you can't just say that there is a hill
over there, you have to specify the hill in considerable detail,
which would seem to take more space than simply listing all the elevations
with a grid or TIN.
-
Extend to geological data structures. Plan excavation strategy for open
pit mine.
-
The major creative force for terrestrial terrain is water erosion.
This does not apply to the moon, Venus, and, to the same extent, to Mars.
Therefore those surfaces might be statistically different. What impact,
if any, would this have on the speed and output distribution of our algorithms
and data structures?
How the Parts Fit Together
