This page describes WRF's program to compute hydrography, or water flow on terrain. That is, it reads a square matrix of terrain elevations, and writes a square matrix of the water flowing through each pixel.
Contents (hide)
- 1. Assumptions
- 2. Definitions and specific assumptions
- 3. Reading the elevation data
- 4. Removing 1-pixel pits
- 5. Computing the adjacency matrix
- 6. Finding connected components
- 7. Joining basins - preliminaries
- 8. Fill depression
- 9. Joining two basins - details
- 10. Joining all basins
- 11. Compute flows
- 12. Misc notes about hydrog.cc
- 13. Implementation
- 14. Code fragments and tips
- 15. Data files
- 16. Misc files
- 17. Filename extensions
- 18. Web files
- 19. Misc notes about other programs
- 20. Results
- 21. Status
- 22. Possible optimizations
- 23. Research ideas
- 24. Prior Art
1. Assumptions
- Although datasets are getting larger, so are computers' internal memory. My latest notebook has 8GB of main memory. My 5-year-old lab machine has 64GB of main memory. Therefore internal algorithms, which do not use external storage, are still important.
- Although processing the largest databases would require external storage, the cutoff beyond which that is true is higher and higher, currently hundreds of millions of pixels.
- Space and time efficiency is more important than ever. Reduced space means that larger datasets can be processed internally. That means that reduced asymptotic time growth and multiplicative time constants become more important. Consider the size of {$ (\lg N) $} when {$N=10^{9} $}, the number of voxels that my connected components program can easily process.
- Speed is also important because a fast implementation becomes a feasible subroutine of a larger algorithm. E.g., hydrography might be used in terrain compression.
2. Definitions and specific assumptions
- Everything, terrain and hydrography, is discretized on a regular grid,
sometimes called the world.
- Water flows off the edge of the world.
- The grid has a regular spacing either in meters or in angle in lat/long.
- For convenience, we call one grid cell or quantum a pixel (since cell can have alternate meanings, such as a complete DEM-1 array of {$1201^2$} pixels).
- A particular pixel is denoted as {$p_{ij}$}.
- Terrain: elevation,
- above a reference surface (sea level, the geoid),
- expressed as an array of numbers {$z_{ij}$}.
- Hydrography: in this context, the water flow through each pixel.
- I don't think that 4-connected vs 8-connected makes much difference. Since people prefer 8-connected, that's what I now use.
- I don't process non-tree hydrography graphs, like islands, or lakes like Lake George NY with two exits.. They would add to the complexity but not, seemingly, require major new ideas.
- Each pixel gets one inch of rain.
- All the water entering a pixel flows out to its lowest neighbor. Ties are broken consistently.
The big problem is that there are a surprisingly large number of basins, or sections of terrain that trap their water. The reason is that all it takes to cause a separate basin is a 1-pixel wide 1 unit hi ridge at the bottom:
3. Reading the elevation data
- Binary input is surprisingly faster than ASCII input.
- Reading {$4097^2$} ASCII unsigned short ints took about 5 CPU seconds.
- Reading them in binary takes only 0.08 CPU seconds.
- Perhaps there is something wrong with how ASCII input is implemented; GNU has implemented C++ I/O badly in the past.
- So, for now, read binary.
4. Removing 1-pixel pits
- A 1-pixel pit is a pixel that is lower than (or the same elevation as) all of its 8 neighbors.
- Removing them is done before adjacencies are computed.
- It is not strictly necessary, but is very fast, and removes about 60% of the basins, which is a big win. Therefore the connected component computation is sufficiently faster that the total time does not increase. Even better, the basin joining step should be much faster.
- The process is this: If a pixel {$\le$} than the lowest of its 8 neighbors, then raise its elevation to the mean of its two lowest neighbors.
- That elevation is probably a fraction, so internal elevations are stored as doubles altho the input elevations are unsigned short ints.
- Using {$\le$} instead of {$<$} removes many more basins. The reason is that it raises pixels on the margins of flat regions and forestalls many ties that would have occurred in the adjacency matrix computation.
- This means that when the 2 lowest neighbors are at the same elevation, then this pixel is raised to their elevation. If it were raised higher, it would create a ridge between those pixels, preventing this pixel from joining them into the same basin.
- This process creates some new 1-pixel pits, from regions that were 2-pixel basins.
- Therefore, this whole process might be repeated. Each time, it would remove a smaller number of pits.
5. Computing the adjacency matrix
- A pair of pixels is adjacenct if one flows into the other, or if both flow into the same third pixel.
- As of April 4, 2011, we assume 8-connectivity.
- In this step, pixels on the terrain boundary are assumed to have a neighbor outside the edge of the world with a hi elevation. This keeps two basins that both flow off the edge of the world from joining.
- We assume that the water flowing thru a pixel all flows out to that pixel's lowest neighbor. (That would be easy to change.)
- When comparing elevations, ties are broken in a consistent manner. E.g., following the ideas of Simulation of Simplicity, instead of comparing just {$z_{i_1,j_1} $}, we lexicographically compare the triples {$ (z_{i_1,j_1}, i_1, j_1) $}.
- Two pixels {$p_{i_1,j_1}$} and {$p_{i_2,j_2}$} are geometrically adjacent iff the 2nd is one pixel away from the 1st in any of the 8 directions.
- Two geometrically adjacent pixels {$p_{i_1,j_1}$} and {$p_{i_2,j_2}$} are flow-adjacent if the water flowing out from one flows to the other. That is, either one of the two pixels is the other's lowest neighbor.
- A basin is a connected set in the transitive completion of flow adjacency.
- In what follows, we index arrays from 0.
- The basic idea is:
- Create a graph where the nodes are the pixels and there is an edge between 2 nodes that are flow-adjacent, then
- Find its connected components.
connect2works on a Euclidean bitmap, where 2 1-pixels are in the same component if they are adjacent. There are no explicit edges.- Therefore we will create such a bitmap from the terrain, and then find its connected components.
- From the {$n\times n$} elevation array {$z$}, create a
{$(3n-2)\times(3n-2)$} adjacency matrix {$\alpha$} (called
adjin the program) as follows 1.- {$\alpha_{3i,3j}$} corresponds to {$p_{ij}$}.
- Set {$\alpha_{i,j}=0$}.
- Then set {$\alpha_{3i,3j}=1$}.
- Set {$\alpha_{3i+1,3j}=\alpha_{3i+2,3j}=1 $} iff {$p_{ij}$} is flow-adjacent to {$p_{i+1,j}$}.
- Set {$\alpha_{3i+1,3j+1}=\alpha_{3i+2,3j+2}=1 $} iff {$p_{ij}$} is flow-adjacent to {$p_{i+1,j+1}$}.
- Proceed analogously for the other 7 directions.
6. Finding connected components
- Now we process {$\alpha$} to find its connected components, or basins.
- I am gradually changing the terminology in the program over from component, which was natural when connected components were the topic to basin, which is more natural here.
- First we pad {$\alpha$} on all 4 sides with a row of pixels of a high elevation. This is to prevent basins that both flow off the edge of the world from joining.
- The connection program is an extension of connect2, my fast Connected Components program. One change is that I am now connecting the 1 pixels, not the 0s that were more natural for connect2's original application (connecting voids).
- The result is a matrix
basinidgiving the component (basin) number of each pixel.
7. Joining basins - preliminaries
- Each basin has exactly one local minimum. (If there were a 2nd local min, the pixels flowing into it would not be flow connected to the others.)
- That local min may or may not be at the edge of the basin.
- In the normal case, a basin's local min is at the edge of the basin.
- However, many basins have interior local mins, sometimes genuine geographic features, but usually caused by data errors or too coarse quantization.
- These are unacceptable because they trap rivers and prevent any long rivers from forming. (The effect on the computed hydrography is quite dramatic.)
- We will fix this by raising the elevations of certain pixels to remove all interior local mins, before calculating the flow.
- This will result in all basins with interior local mins being destroyed by being joined to other basins.
- Let the unique lowest point of basin {${\cal B}_i$} be {$d_i$}. {$d_i$} is lower than any other point in {${\cal B}_i$} and also lower than all of {$d_i$}'s 8 neighbors, even if they be in a different basin. (If {$d_i$} had a lower neighbor, then {${\cal B}_i$} wouldn't be a separate basin.)
- That lowest point is unique because ties are broken consistently.
- All the water in {${\cal B}_i$} flows to {$d_i$}.
- The problem is to combine the many small basins into fewer larger ones.
- A sill is a pixel {$ (x_1,y_1) $} in a basin {$ {\cal B}_1 $}, which is adjacent to a pixel {$ (x_2,y_2) $} in basin {$ {\cal B}_2 $}. Any other pixel in {$b_2$} that is adjacent to a pixel in {$ {\cal B}_1 $} is higher than pixel {$(x_2,y_2)$}. IOW, if {$ {\cal B}_1 $} were to overflow into {$ {\cal B}_2 $}, it would overflow from {$ (x_1,y_1) $} to {$ (x_2,y_2) $}.
- {$ (x_1,y_1) $} may be lower, the same elevation as, or higher than {$(x_2,y_2)$}. Here are 1-D examples. In each case, red pixels are {${\cal B}_1$} and blue pixels {${\cal B}_2$}. lower: 1 2 3 1, same: 1 2 2 1, higher 1 3 2 1
- Note the effect of the tie-breaking rule: 2 1 3 3 3 2.
- The spill elevation of a sill of {$ {\cal B}_1 $} is the greater of the that sill's elevation and the elevation of the pixel in {$ {\cal B}_2$} that it would spill over into.
- {$ {\cal B}_1 $} usuall has many sills, each one to a different basin.
- We need to remember the higher sills because the lowest sill will be deleted should {$ {\cal B}_1 $} and {$ {\cal B}_2 $} be joined. That would cause the sill between them to be deleted.
- Conceptually, gradually fill a basin until it flows over a sill into a lower neighboring basin, and join the 2 basins.
- One complication is this case: 4 4 4 1 2 3 2 We want to fill (red) {$ {\cal B}_1$} until it overflows into (blue) {$ {\cal B}_2$}. However, raising {$ {\cal B}_1$} until the level of the adjacent pixel in {$ {\cal B}_2$}, creates a new low region.
8. Fill depression
- This routine processes one basin, {$ {\cal B}_1$}, each time it's called.
- It handles the case that some pixels in {$ {\cal B}_1$} might be lower than {$ {\cal B}_1$}'s spill elevation (because the spill elevation might be higher because the adjacent pixel in the adjacent basin might be higher).
- The process of raising certain pixels in {$ {\cal B}_1$} goes as follows.
- Create an empty deque {$\cal D$} of pixels in {$ {\cal B}_1$} waiting to be processed. (Using a deque causes the pixels to be processed in breadth-first order.)
- Create an empty set {$\cal S$} of pixels in {$ {\cal B}_1$} that have been processed.
- Assume a constant small delta elevation {$dz$}. It should be small enough that, even when added many times to a legal pixel elevation, we do not get as high as the next legal elevation. It should be large enough that it is different from zero when added.
- Initialize a current elevation {$z$} to the spill elevation plus {$dz$}.
- Insert the sill into {$\cal D$} and {$\cal S$}.
- Repeat the following until {$\cal D$} is empty.
- Pull the 1st pixel off of {$\cal D$}. Call it {$z_{ij}$}.
- If its elevation is greater than {$z$}, then this pixel was already elevated, so go back to pull another pixel. It is possible that its elevation is between the spill elevation and {$z$}, if it was earlier elevated by virtue of being in the depression of another sill. If so, it needs to be elevated again.
- Insert some or all of its 8 neighbors into {$\cal D$}. To be inserted, a
neighbor must -
- not be in {$\cal S$},
- be in {$ {\cal B}_1$}, and
- have elevation less than {$z$}.
- Increase {$z$} by {$dz$}.
- Set {$z_{ij}$} to {$z$}.
- Note: This can cause a pixel that would have flowed to another pixel in this basin, now to flow in a different direction into another basin. I don't think that this is a problem.
- This also means that the adjacent pixel to a spill pixel might now be higher. Therefore the spill elevation must be raised.
9. Joining two basins - details
The process of joining {$ {\cal B}_1$} into {$ {\cal B}_2$} goes as follows.
- Call fill_depression on {$ {\cal B}_1$}.
- Merge the sill list of {$ {\cal B}_1$} into {$ {\cal B}_2$}, keeping the list sorted by increasing spill elevation.
- Make a note in the data structure for {$ {\cal B}_1$}, that it has been joined into {$ {\cal B}_2$}. However, if {$ {\cal B}_2$} was earlier joined into {$ {\cal B}_3$}, then point {$ {\cal B}_1$} directly to {$ {\cal B}_3$}.
- If {$ {\cal B}_2$} now has two or more sills with the same spill basin, then delete the higher one(s).
- If there is a sill whose spill basin is {$ {\cal B}_2$}, which includes {$ {\cal B}_1$}, then delete it.
10. Joining all basins
- There may arise chains of joined basins, which is inefficient and makes it hard to determine if two basins are ultimately joined to each other. Therefore I frequently follow these chains and collapse them so that each basin points straight to the root of the tree of joined basins.
- The root basin id is the id of the original basin with the lowest spill elevation.
- The process of joining all joinable basins goes as follows.
- Create a heap (priority queue) {$\cal H$} of all the basins in increasing order of spill height.
- Repeat until {$\cal H$} is empty.
- Pop the first basin {$ {\cal B}_1$} off of {$\cal H$}.
- If {$ {\cal B}_1$}'s spill elevation has changed since it was originally pushed onto {$\cal H$}, then push it onto {$\cal H$} again with its new elevation and continue to the next iteration.
- Its spill elevation could have increased because two basins mutually
spilled into each other. When they were joined, the combined basin's
new spill elevation is now higher. This figure shows the problem.

- If {$ {\cal B}_1$} does not spill off the edge of the world, (indicated by a basinid of -1), then join it to the first basin on its sill list, {$ {\cal B}_2$}.
- If it does spill off the edge of the world, then call fill_depression because it might have a depression next to its sill.
- After joining, the remaining basins will be those that spill off the edge of the world.
11. Compute flows
Follow Eddie's suggestion to use the Terraflow idea, since, it's fast enough for the initial version here.
- Sort the pixels in descending order of elevation. Break ties as usual.
- Initialize each element of the flow array to 1, for the incoming rain.
- Traversing the sorted pixel array, add each pixel's flow to the flow of its lowest neighbor, provided that the neighbor is lower than the pixel, or is of equal elevation, breaking ties the same as during the sorting. If there is no lower neighbor, then we've found a surviving interior sink, which should have been removed by now.
- If the flow stops with this pixel, then sum its flow into
totflowas a check.
12. Misc notes about hydrog.cc
- Function
induses a variable that is a pair of indices to index into a 2D array. It would be more obvious to overloadoperator[]but I can't see an easy way to do that. Televisdoublebecause a lot of significant digits are needed to handle the small increments added infill_depression.
13. Implementation
hydrog.ccis the major program.- Usage:
hydrog NROWS < ELEVATION-FILE.elevb - Wrap function function don() { time hydrog $2 < data/$1.elevb convert flow.pgm $1-flow.png convert flow.pgm -gamma 5 -normalize $1-flow2.png rm -f flow.pgm xv $1-flow.png&
- Output image file:
FOO-flow.pgmThe water flow thru each pixel, scaled with sqrt to reduce the range and to keep the max flow under 65535, which image manipulation programs prefer. Converting this to png saves space. Converting it to jpg would distort it.
- Optional or obsolete output image files.
The optional outputs are enabled by setting conditional compilation macros
in the program. Enabling them significantly slows the program, since I/O
is slow. Some of them might not be correct if that code hasn't been updated.
FOO-pits.pbmThe 1-pixel pits (that were then removed).FOO-comps.pgmThe basins, before they were joined.FOO-edges.pbmThe edges between the basins.FOO-edgesover.pngImage of the edges overlaid on the original terrain.
14. Code fragments and tips
float2int.ccReads floats from cin, write ints to cout. Used to process ASCII data files written by Matlab: float2int < t10.dat >! t10.eleva cat head10.pgm t10.eleva >! t10.pgmint2bin.ccReads ascii unsigned short ints from cin and write in 2-byte binary to cout.- To convert 16-bit png to 16-bit ascii pgm:
convert -compress None ps_height_1k.png ps_height_1k.pgm - To delete 1st 3 lines of a file:
tail --lines=+4 in > out - To convert
elevafile topgmfile: Prepend a header likehead400.pgm - To cause ints to be written with commas thus: 12,345,678, set a locale in the program.
15. Data files
t10.eleva10x10 test with a saddle, from t10.dat from Matlab. A subset of hill2.{hill,mtn}{1,2,3}.elevaThe GeoStar 400x400 datasets.mtn1a.eleva100x100 top left corner ofmtn1ps{1,4,16}k.elevaPuget Sound, from http://www.cc.gatech.edu/projects/large_models/ps.html . {$1025^2,4097^2,16387^2 $}. Originally calledps_height_1k.elevaetc.FOO.elevbThe above elevation files in binary format.
16. Misc files
fnszsh functions to run the programs.head10.pgmetc. Head for a 10x10 ASCII PGM file with a particular max value for a pixel.
17. Filename extensions
.compOutput from connect2.bitaASCII bitmap, containing only '0' and '1' characters, of a 2D or 3D rectangular region intended to have the connected components of regions of 0s (or is it the 1s?) computed. If the region is {$ (n_x,n_y,n_z) $} voxels, then this file's size is {$ n_xn_yn_z $} bytes..bitbBinary bitmap. File size is {$ \lceil n_xn_yn_z/8 \rceil $} bytes..svgDrawings created in inkscape. (However, Firefox doesn't display svg files in an IMG tag.).pngDerivative drawings for display..elevaASCII file of square matrix of elevations, containing {$ N^2 $} blank-separated positive ints..elevbbinary file of square matrix of elevations, containing {$ N^2 $} 2-byte binary unsigned ints.
18. Web files
- hydrography/ - the code and smaller data files.
- Larger data files (Puget Sound):
19. Misc notes about other programs
- When xv reads an ascii pgm it automatically scales stated max pixel value up or down to be 255. Therefore, sometimes set stated max to 255.
- Ditto ImageMagick display apparently.
- Dunno how to handle max over 255.
- The display programs display pgm file pixels left to right, top to bottom.
- However if you probe a pixel in xv, it lists its coords in the order (col,row), e.g., (y,x).
20. Results
- This table shows the CPU time to compute the flow in each pixel from the elevation matrix.
- The CPU time is precise to about 5%.
- The number of basins is the original number of connected components.
- The number of watersheds is the number of basins after all interior sinks have been removed by joining basins with sills that are not at the edge of the world. I.e., it is the number of basins flowing off the edge of the world.
| Dataset | # rows | # pixels | CPU time (s) | # basins | # watersheds |
|---|---|---|---|---|---|
| mtn1 | 400 | 160,000 | 0.12 | 309 | 71 |
| mtn2 | 400 | 160,000 | 0.11 | 243 | 68 |
| mtn3 | 400 | 160,000 | 0.13 | 253 | 50 |
| hill1 | 400 | 160,000 | 0.13 | 1,345 | 18 |
| hill2 | 400 | 160,000 | 0.13 | 486 | 53 |
| hill3 | 400 | 160,000 | 0.14 | 1,075 | 12 |
| ps1k | 1,025 | 1,050,625 | 1.05 | 7,102 | 349 |
| ps4k | 4,097 | 16,785,409 | 28.5 | 39,779 | 675 |
Here are the computed flows. Zoom the larger cases to see details.

Mtn1

Mtn2

Mtn3

Hill1

Hill2

Hill3

Ps1k

Ps4k
Now with gamma=5 and normalized:

Mtn1 (intensity normalized)

Mtn2 (intensity normalized)

Mtn3 (intensity normalized)

Hill1 (intensity normalized)

Hill2 (intensity normalized)

Hill3 (intensity normalized)

Ps1k (intensity normalized)

Ps4k (intensity normalized)
21. Status
- The current version is hydrography-20110424.tar.gz.
- Hydrog apparently works, and very quickly, taking slightly worse than linear time per pixel.
- It is economical with storage. The {$4097^2$} case uses 690MB of memory
(including libraries, determined by viewing the process with
top). That is only 43B per pixel. It might be reduceable. - Since the flow is so uneven (most pixels are 1; the biggest flow may have much of all the water in the world), you need to gamma scale the image to see a lot.
22. Possible optimizations
If you want to make hydrog even faster or smaller, start here:
- Replace the STL
sortroutine, which is the slowest part of the whole program, with a radix sort. - Consider replacing the STL
setwith a boost set or a specially coded version. - Represent elevations with 4-byte fixed point numbers instead of 8-byte ints.
23. Research ideas
hydrogis fast enough that you can call it thousands of times. This introduces new research possibilities.- Compare to other methods, in speed, space, and in quality of output.
- Use it to postprocess terrain produced by other methods.
- Try on random terrain.
- Study effects on hydrography of lower res terrain, or terrain with noise added.
- Simple things:
- Overlay the flow.pgm image on the elevation image.
- Write new elevated terrain as a separate file.
24. Prior Art
1 {$\alpha_{3i,3j}$} is {$ \Large \alpha_{ (3i),(3j)}$} ⇑
<< | Research Page List | >>