(in WR FranklinResearch)

Here are some ideas that I'm pursuing, or would pursue given time. Collaborators and advice are welcome. Good PhD topics are flagged. This page is under construction.

1.  Recent changes

27 June 2009
minor additions.

2.  Boundary conditions of this work

  1. Use linux when possible.
  2. Use the most powerful tools, such as C++, STL, Boost, Mathematica and Matlab.
  3. Document with tools like LaTeX, beamer1, and a group in my pmwiki wiki.
  4. Test everything on the largest, baddest, possible examples. If my algorithm can be broken, I want to know now, not later. Also, my algorithms might perhaps be more robust than others'.
  5. Present results as vividly as possible, with color animations, using the best tools, such as POVRAY, Mathematica, VTK, or VISIT. It's ok to use ten times the computing power to present the results as it took to compute them.

3.  Themes

  1. massive db
  2. map reduce
  3. dynamic

4.  Computational cartography

4.1  Terrain representation by scooping

Phd topic!

The goal is to represent terrain with a set of operators that model, to the appropriate extent, the geophysical processes that created it. This representation would

  1. be more compact
  2. facilitate, for the first time, a theoretical analysis of terrain operators like compression and progressive transmission, and
  3. allow only legal terrain to be represented.

This is my most powerful long term idea, but hasn't been pushed adequately.

4.2  Terrain Drainage - Computation or Surface Representation

Phd topic!

Identifying and 'fixing' sinks or local minima is as important as computing the drainage networks. Large horizontal or almost horizontal regions are an equal problem. Current methods like incrementally filling in the sink until it flows over its lowest border do not scale up well.

  1. My connected component program should do this faster using less storage. The idea is that two pixels will be in the same connected watershed component iff one flows into the other. Compute an effective adjacency matrix from the terrain and feed it into connect.
  2. The question is, is there a need for faster hydrology computations, or are current techniques adequate? My guess is that my idea will scale up better, but this has to be demonstrated.
  3. Also, drainage networks, and inverted networks, computed on the negated terrain, can be used to represent and compress the terrain; see Jon Muckell's thesis and papers. The idea is to extend this.

Note that ANUDEM is already doing most or all of this. So, any contribution would need to be to do it faster on larger datasets.

4.3  ODETLAP on large terrains

There is a need to process large terrain databases, perhaps {$10^5\times10^5$} points. Extend ODETLAP to do this, perhaps using ideas of Jake Stookey.

  1. . Jared Stookey, Zhongyi Xie, Barbara Cutler, W. Randolph Franklin, Daniel M. Tracy and Marcus V.A. Andrade. In Walid G. Aref and others, editor, 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008), Irvine CA, 5-7 Nov 2008. (URL) (paper, talk: ppt, pdf, poster: ppt, pdf).

You Li observes that solving the normalized equation for ODETLAP is 100x faster, with the improvement growing with N. Wow!

4.4  ODETLAP in {$E^3$}

Phd topic!

ODETLAP has proven very effective at compressing {$E^2$} terrain. Extend it to compressing medical or geological data in {$E^3$}.

  1. With {$E^2$} ODETLAP, there is an equation for every interior point, setting it to the average of its 4 neighbors in the grid.
  2. With {$E^3$} ODETLAP, there is an equation for every interior point, setting it to the average of its 6 neighbors in the volume.
  3. In either case, border points are a separate issue.
    1. Try to ignore them for now.
    2. Otherwise, set each border point to the average of all of its legal neighbors, those that are not outside the universe.
    3. However, that biasses the slope of the surface towards horizontal at the edge.
  4. In either case, each known point has an additional equation setting it equal to its known value.
  5. This produces a sparse overdetermined system.
  6. The question in 3D is how fast can this be solved? That depends on how much the sparse matrix fills in as variables are eliminated.
  7. Fills in means that some zero entries become nonzero. That is undesirable since it costs time and space.

Li You is now working on this.

4.5  Slope compression with ODETLAP

Phd topic!

Xie and Tracy have extended ODETLAP to compress terrain so that the reconstructed terrain has accurate slope. The idea is to add new equations to encourage the reconstructed surface's slope to be close to the original surface's slope.

It may take some thinking to realize how major this is. We are not representing slope explicitly in the compressed surface. Rather, we are lossily compressing the surface so that its computed slope is reasonable accurate. SFAIK no one is doing compression like that.

Extend this work.

Zhongyi Xie is working on this.

4.6  ODETLAP with Other PDEs

Phd topic!

Our current version of ODETLAP uses the Laplacian (heat flow) PDE, possibly extended for slopes. Investigate other PDEs such as the thin plate equation. Note that the slope compression described above is one such extension.

It now appears that the thin-plate PDE is no better than the Laplacian, so this theme is now suspended.

4.7  Multiobserver Siting

Phd topic!

  1. Investigate the effect of compression and of errors on the quality of the multi-observer siting.
  2. Concerning 'quality', I don't care whether the observers moved, provided that they are just as effective in their new locations. That is a more sophisticated and realistic metric than other people may be using. It is also more difficult to understand. :-(
  3. I showed many years ago that viewshed computations have a large error bar. A minor change in interpolating terrain height between adjacent posts in the Lake Champlain West DEM changed the visibility of half of all the targets in the DEM. That apparent problem is really an opportunity. It means that faster algorithms might produce equally good results. Investigate this idea.

4.8  NPR terrain rendering

Test Non Photorealistic Rendering of terrain - does it bring out the details? This would be an undergrad or masters project.

My map overlay program can easily be extended to interpolate data from one map to another. E.g., we could estimate the population of drainage polygons by how much they overlap the census polygons, whose populations we know.

However this assumes that the population in each census polygon is even across the whole census polygon, and then jumps discontinuously at the border. Generalize the interpolation to assume the population is varying smoothly. That is, the census polygons are sampling a smooth population distribution. Resample it into drainage polygons.

This idea is from Fred Broome.

(masters - PhD)

4.9  TIN with Curved Patches

Extend the TIN to use triangular splines of higher degree, instead of triangular planar patches. The first step is to compute the curved patches for an existing TIN, then see if the fit is improved. If so, then this tells us something about the surface's continuity.

(masters - PhD)

5.  Computational geometry - specific

5.1  Applications of connected components

I have a very fast implementation of a union-find connected component algorithm, that runs on a {$20\,000\times20\,000$} bitmap in {$E^2$} or {$1000\times1000\times1000$} bitmap in {$E^3$}. The input is an array of bits, or voxels that have been classified, in {$E^3$}; the output a list of the connected components together with their volumes and surface areas. This is a solution waiting for a problem.

  1. One possible application would be to extend it to compute porosity of a material like soil. My program would compute the region(s) connecting the top and bottom of the soil sample. The research would start with computing the conductivity of such irregularly shaped regions.
  2. My program facilitates experiments on random connectivity, that is connectivity, fractal dimension, etc, of volumes where each voxel is randomly empty or solid. Take this idea somewhere.
  3. Compare this to Streaming Connected Component Computation for Trillion Voxel Images, Martin Isenburg and Jonathan Shewchuk, at the Workshop on Massive Data Algorithmics, June 11, 2009.

5.2  Nearest neighbors

I think we can beat ANN (Approximate NN); they disagree that I've shown that. Demonstrating this needs more work. My advantages are:

  1. Simple compact data structure allows large datasets to be processed internally.
  2. Constant preprocessing and query time per point (no {$\log N$}) factors. For large {$N$}, that's significant.
  3. The fast preprocessing time facilitates dynamic 'fixed' point sets. That is, if the 'fixed' points are moving, it may be easier for us to update the data structure or even to recompute it from scratch.
  4. My data structure might be more tolerant of very unevenly spaced data, such as data that is locally 2-D, perhaps derived from a laser range scanner. However, this needs to be demonstrated.

The project here is to run experiments and collect data to see whether I am right. Beating a widely used method would be fun.

ANN was reporting an erroneously large query time because of an overflow, so we're currently not faster than it for queries. We are faster and smaller for the preprocessing. We might have other advantages, but this is lower priority for the moment.

5.3  Local topological data structures

Phd topic!

  1. A local data structure for a polygon or polyhedron represents that object with a set of primitives that depend on only local properties of each vertex.
  2. No global info such, as complete edges, is explicit, altho it may be derived if necessary.
  3. Many properties, such as volume and surface area, and operations, may be computed with a reduction operation on that set.
  4. The idea may be extended to computing the volume of the union of polyhedra.
  5. An advantage of this method is that most of its steps are reduction operations that may be parallelized.

The goal here is to extend this idea. It is fun to go in the opposite direction from the widely held belief that we should use lots of global topology.

IMHO, this research direction is greatly underappreciated.

5.4  Unifying example - Volume of union of many polyhedra

Several of the ideas described here fit together nicely, e.g., in the application of computing mass properties of boolean combinations many polyhedra. I've an algorithm and implementation for theh volume of the union of many cubes, tested to {$2\cdot 10^7$} cubes. The theoretical analysis, substantiated by experimental measurements, is that the time is linear in the number of cubes.

This is another solution looking for a problem. This is also extremely parallelizable.

A theme would be to make this dynamic.

6.  Computational geometry - general

6.1  Turn static geometry algorithms into dynamic

Examples include nearest points and mass properties of the union of multiple polyhedra. My simpler data structures, such as the uniform grid, are more amenable to this than some competing ones.

6.2  External algorithms

  1. What big datasets are big enough? Clearly some datasets will be that large. However, with a compact representation, we can do a lot more internally than people realize. That means that when we do go external, the appropriate algorithm and data structure might be different.
  2. E.g., I originally designed my connected component algorithm to be efficient when external. However, even when processing more than {$10^9$} binary voxels, that wasn't necessary.
  3. One technique might be to add a new level to the uniform grid.
  4. The quasi-linear properties of external media have perhaps not been properly exploited. On the one hand, sequential reads are much faster than randomly seeking to another position in the file. On the other hand, large blocks can often be transposed in constant time merely by swapping pointers.

6.3  Parallel

Phd topic!

  1. I graduated 2 PhDs in parallel Computational Geometry in the 1980s. Maybe it's time to return to this.
  2. Many of my algorithms can be cast in the mapcar aka map-reduce aka scatter-gather paridigm that is recently popular with Google. E.g., consider the steps of the uniform grid for edge segment intersection:
    1. For each edge segment, determine what cells it passes thru.
    2. Invert that relation so that we know the segments contained in each cell.
    3. For each cell, intersect all pairs of segments in that cell.

Steps 1 and 3 are totally parallelizable. Simple data structures really pay off here.

7.  Computational Biology

Bio has lots of little objects called "atoms"; I efficiently process lots of little objects; is there a connection here?

However, perhaps Computational Geometry is by now mature enough that the biologists don't need any more geometry research; but see the next topic.

7.1  Identifying complete neurons from images.

One problem is to determine the neurons or blood vessels from a confocal microscope image. The image shows many disconnected pieces; the task is to connect them in a logical pattern.

My ideas is to impose structure on the space between object such as neuron parts, in order to fill in gaps. This is analogous to filling in drainage patterns in cartography.

7.2  Protein docking

Given two molecules, do an unconstrained optimization using the 6-12 law. This is an obvious idea that no one appears to be doing. Therefore it probably has a fatal flaw. That might be that it gets hung up in local optima or else it is too slow.

Or, it might be that no one has has used a good optimization algorithm that uses the 2nd derivative (Jacobian) of the objective function. Our objective function has an easy 2nd derivative, but optimization implementations that can use it are hard to find.

One possible solution to the local optima hang-up might be to start with thousands of random initial positions. That works if the process if fast enough.

The purpose of this exercise is to investigate this idea.

It's not clear that this topic needs more research.

 

1 Many topics mentioned on this page, such as this one, have useful wikipedia entries


<< Non Geometric Data Structures Algorithms | Research Page List | Old Program Solicitations >>