UP | HOME

第二十九课 Graph-BFS

Table of Contents

1 GRAPHS (BFS)

Breadth-first search (BFS) is a little more complicated than depth-first search, because it's not naturally recursive. We use a queue so that vertices are visited in order according to their distance from the starting vertex. BFS is a lot like level-order tree traversal.

  public void bfs(Vertex u) {                                                   //       initial *
    for (each vertex v in V) {                                   // O(|V|) time //               |
      v.visited = false;                                                        //-+             v 访问
    }                                                                           // |             |
    u.visit(null);                            // Do some unspecified thing to u // |             v
    u.visited = true;                              // Mark the vertex u visited // |             进队:enqueue 求子
    q = new Queue();                                            // New queue... // |             |
    q.enqueue(u);                                  // ...initially containing u // |             v
    while (q is not empty) {                // With adjacency list, O(|E|) time // |  +--------> 出队:dequeue 得子
      v = q.dequeue();                                                          // |  |          |
      for (each vertex w such that (v, w) is an edge in E) {                    // |  |          +-------------------------------+ queue build (for-loop)
        if (!w.visited) {                                                       // |  |          |   |->|  |->|  |->|   ......   |
          w.visit(v);                         // Do some unspecified thing to w //-+  |          v   |  v  |  v  |  v            v
          w.visited = true;                        // Mark the vertex w visited // |  |         子 1 访问 子 2 访问 子 3 访问子 4 访问       子 n 访问
          q.enqueue(w);                                                         // |  |          |  /   |  /  |  /  |            |
        }                                                                       // |  |          v /    v /   v /   v            v
      }                                                                         // |  |        进队/    进/   进/    进队         进队
    }                                                                           // |  +----------+ queue FIFO (while-loop)
  }                                                                             // | 整体就是 访问-->儿子队:循环(求子->访问子->子进队)-->出子->访问
                                                                                   |
                                         public class Vertex {                     |
                                           protected Vertex parent;                |
                                           protected int depth;                    |
Notice that when we visit a vertex,        protected boolean visited;              |
we pass the edge's origin vertex                                                   |
as a parameter.  This allows us to       public void visit(Vertex origin) {    <---+
do a computation such as finding           this.parent = origin;
the distance of the vertex from            if (origin == null) {
the starting vertex, or finding              this.depth = 0;
the shortest path between them.            } else {
The visit() method at right                  this.depth = origin.depth + 1;
accomplishes both these tasks.             }
                                         }
                                       }
                                                                      +-4--+
                                                           forbidden  X    |
                                                                      |    |
                                                                      v  --3
                                                                    __2     \3
注意这里是如何保证每个 node 存储的都是最短路径?                          _1    \2    \3
1. 首先利用 queue 确保层层推进; 越先访问的,路径越短;                      \1   |2    |3
2. 其次利用 visited 保证每个 node's depth 只被写入一次;                *  |1   |2    |3
   所以后访问的,想改写前面 node's depth 是 forbidden 的;                /1   /2    /3
                                                                    --2     /3
                                                                        ---3

When an edge (v, w) is traversed to visit a Vertex w, the depth of w is set to the depth of v plus one, and v is set to become the parent of w.

The sequence of figures below shows BFS running on the city adjacency graph (Albany, Kensington, Emeryville, Berkeley, Oakland, Piedmont) from last lecture, starting from Albany. A "V" is currently visited; a digit shows the depth of a vertex that is marked visited; a "*" is a vertex which we try to visit but discover has already been visited. Underneath each figure of the graph, I depict the queue and the current value of the variable "v" in bfs().

V-K  0-V  0-1  *-1  0-1  *-1  0-*  0-1  0-1  0-1  0-1  0-1  0-1  0-1  0-1  0-1
 \|   \|   \|   \|   \|   \|   \|   \|   \|   \|   \|   \|   \|   \|   \|   \|
E-B  E-B  E-V  E-1  E-*  E-1  E-1  V-1  2-1  2-*  2-1  *-1  2-*  2-1  2-1  2-1
|/   |/   |/   |/   |/   |/   |/   |/   |/   |/   |/   |/   |/   |/   |/   |/
O-P  O-P  O-P  O-P  O-P  O-P  O-P  O-P  V-P  2-P  *-P  2-P  2-P  2-V  *-3  2-3

===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===
A    K    KB   B    B              E    EO   O    O              P
===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===  ===
     v=A  v=A  v=K  v=K  v=B  v=B  v=B  v=B  v=E  v=E  v=O  v=O  v=O  v=P

After we finish, we can find the shortest path from any vertex to the     0<--1
starting vertex by following the parent pointers (right).  These           ^
pointers form a tree rooted at the starting vertex.  Note that they         \
point in the direction _opposite_ the search direction that got us there.    \
                                                                          2-->1
Why does this work?  The starting vertex is enqueued first, then all the     ^
vertices at a distance of 1 from the start, then all the vertices at a      /
distance of 2, and so on.  Why?  When the starting vertex is dequeued,     /
all the vertices at a distance of 1 are enqueued, but no other vertex     2<--3
is.  When the depth-1 vertices are dequeued and processed, all the
vertices at a distance of 2 are enqueued, because every vertex at a distance of
2 must be reachable by a single edge from some vertex at a distance of 1.  No
other vertex is enqueued, because every vertex at a distance less than 2 has
been marked, and every vertex at a distance greater than 2 is not reachable by
a single edge from some vertex at a distance of 1.

Recommendation: pull out a piece of paper, draw a graph and a program stack, and simulate BFS, with you acting as the computer and executing bfs() line by line. You will understand it much better after taking the time to do this.

BFS, like DFS, runs in O(|V| + |E|) time if you use an adjacency list; O(|V|^2) time if you use an adjacency matrix.

2 Weighted Graphs

A weighted graph is a graph in which each edge is labeled with a numerical weight. A weight might express the distance between two nodes, the cost of moving from one to the other, the resistance between two points in an electrical circuit, or many other things.

In an adjacency matrix, each weight is stored in the matrix. Whereas an

  • unweighted graph uses an array of booleans;
  • weighted graph uses an array of ints, doubles, or some other numerical type.

Edges missing from the graph can be represented by a special number like Integer.MIN_VALUE, at the cost of declaring that number invalid as an edge weight. (If you want to permit every int to be a valid edge weight, you might use an additional array of booleans as well.)

In an adjacency list, recall that each edge is represented by a listnode. Each listnode must be enlarged to include a weight, in addition to the reference to the destination vertex. (If you're using an array implementation of lists, you'll need two separate arrays: one for weights, and one for destinations.)

There are two particularly common problems involving weighted graphs.

One is the shortest_path_problem. Suppose a graph represents a highway map, and each road is labeled with the amount of time it takes to drive from one interchange to the next. What's the fastest way to drive from Berkeley to Los Angeles? A shortest path algorithm will tell us. You'll learn several of these algorithms if you take CS 170.

The second problem is constructing a minimum_spanning_tree. Suppose that you're wiring a house for electricity. Each node of the graph represents an outlet, or the source of electricity. Every outlet needs to be connected to the source, but not necessarily directly–possibly routed via another outlet. The edges of the graph are labeled with the length of wire you'll need to connect one node to another. How do you connect all the nodes together with the shortest length of wire?

3 [greedy algo]Kruskal's Algorithm for Finding Mimumum Spanning Trees

Let G = (V, E) be an undirected graph. A spanning_tree T = (V, F) of G is a graph containing the same vertices as G, and |V| - 1 edges of G that form a tree. (Hence, there is exactly one path between any two vertices of T.)

If G is not connected, it has no spanning tree, but we can instead compute a spanning_forest, or collection of trees, having one tree for each connected component of G.

If G is weighted, then a minimum spanning tree T of G is a spanning tree of G whose total weight (summed over all edges of T) is minimal. In other words, no other spanning tree of G has a smaller total weight.

Kruskal's algorithm computes the mimimum spanning tree of G as follows.

[1]  Create a new graph T with the same vertices as G, but no edges (yet).
[2]  Make a list of all the edges in G.
[3]  Sort the edges by weight, from least to greatest.
[4]  Iterate through the edges in sorted order.
     For each edge (u, w):
[4a]   If u and w are not connected by a path in T, add (u, w) to T.
这个意思就是由图,生成树(没有环路),然后确保整棵树的 weight 最小。

Because this algorithm never adds (u, w) if some path already connects u and w, T is guaranteed to be a tree (if G is connected) or a forest (if G is not).

Why is T a minimum spanning tree in the end?

Suppose the algorithm is considering adding an edge (u, w) to T, and there is not yet a path connecting u to w. Let U be the set of vertices in T that are connected (so far) to u, and let W be a set containing all the other vertices, including w. Let the bridge_edges be any edges in G that have one end vertex in U and one end vertex in W. Any spanning tree must contain at least one of these bridge edges. As long as we choose a bridge edge with the least weight, we are safe. (There may be several bridge edges with the same least weight, in which case it doesn't matter which one we choose.)

+-------------+-----------------+-------------+  T
|             |                 |             |
| U           |    W            | other       |
|             |                 |             |
+---^---+-----+---------^--+----+-------------+
    |   |               |  |
    |   |connected      |  |connected
    |   |               |  |
    |   u               |  w
    |                   |
----+-------------------+--------------------------------------
    +---------------+   +--+                     G
                    |      |
                 ---+------+--
                (    E        )
                 -------------

Because we go through the edges of G in order by weight, (u, w) must have the least weight, because it's the first edge we encountered connecting U to W. (See Goodrich and Tamassia page 649 for a proof that choosing the bridge edge with least weight is always the right thing to do.)

What is the running time of Kruskal's algorithm?

  1. step [1] and [2],running time is O(|V|)
  2. step [3],as we'll discover in the next two lectures, sorting |E| edges takes O(|E| log |E|) time.
  3. step [4a], determining whether u and w are already connected by a path. The simplest way to do this is by doing a depth-first search on T starting at u, and seeing if we visit w. But if we do that, Kruskal's algorithm might take Theta(|E| |V|)) time.
  4. We can do better. In Lecture 33, we'll learn how to solve that problem quickly, so that all the iterations of [4a] together take less than O(|E| log |E|) time.
  5. add 1~4 up:
    • If we use an adjacency list, the running time is in O(|V| + |E| log |E|). But |E| < |V|^2, so log |E| < 2 log |V|. Therefore, Kruskal's algorithm runs in O(|V| + |E| log |V|) time.
    • If we use an adjacency matrix, the running time is in O(|V|^2 + |E| log |E|), because it takes Theta(|V|^2) time simply to make a list of all the edges.

4 [greedy algo]Prim's Algo for Finding minimum spanning tree

minimum spanning tree 肯定是有 n 个节点,且有 n-1 条连接的这么个树。

greedy algo: take the smallest as you can ,and keep adding. kruskal: let forest grow together. prim: let edge is connected to the tree you have started to grow let edge which is connected to the small tree that we've created. we could have a heap that has stored all the nodes, we are going to keep the smallest edge from that vertex to the frontier(可以理解为前线). we keep nodes on the heap that represents our frontier.

Prime algo, just pick a node randomly as root.
need 2 arrays:
parent array, the backer is the decendence, storing the tree simply by storing the
              partent, no left, no right. parent represents the current potential finge
value array, how dose it cost to include this node into the spanning tree.
             we only change it if it do better,means an item of array will smaller.
     1 heap: candidates for the next node, 为什么需要 heap,注意 heap 的特点,可以用 O(1)的速度取出最大 or 最小值。
             这里就可以从 candidates 中取出最小值。
             heap store all nodes in it,

running time: n + nlogn + elogn

注意,这里使用的是 heap + 2 arrays, 这种方法适用于稀疏矩阵。还可以使用二维矩阵, 他的复杂度最差是 O(n^2), 适合于非稀疏矩阵。

5 Shortest path tree and Distances

      B -----1------ F
    /   \          / |
   /     \        4  |
  2       2      /   |
 /         \    /    |
/           \  /     1
A ----1----- C       |
\          / \       |
 \        /   \      |
  1      4     3     |
   \    /       \    |
    \  /         \   |
     \/           \  |
      D -----2----- E

  1. $ = infinite

                                      array Dist                   array Parent
           B -----1------ F          A|  0   |                      A| nil |
         /   \          / |          B|  $   |                      B|     |
        /     \        4  |          C|  $   |                      C|     |
       4       2      /   |          D|  $   |                      D|     |
      /         \    /    |          E|  $   |                      E|     |
     /           \  /     1          F|  $   |                      F|     |
    *A ----1----- C       |
     \          / \       |
      \        /   \      |
       1      4     3     |
        \    /       \    |
         \  /         \   |
          \/           \  |
           D -----2----- E
    
    
  2. go from A to all nodes adjacented, record distance in Dist and parent in Parent then, cut A off, to guarantee every node will be visited ONLY ONCE.

                                      array Dist                   array Parent
           B -----1------ F         +A|  0   |+                    +A| nil |+
         /.  \          / |          B|  4   |                      B|  A  |
        /.    \        4  |          C|  1   |                      C|  A  |
       4.      2      /   |          D|  1   |                      D|  A  |
      /.        \    /    |          E|  $   |                      E|     |
     /.          \  /     1          F|  $   |                      F|     |
    *A ----1----- C       |
     \........../ \       |
      \.       /   \      |
       1.     4     3     |
        \.   /       \    |
         \. /         \   |
          \/           \  |
           D -----2----- E
    
    
  3. then, scan the node with smallest distance in Dist like (2) Dist[C] + CB < Dist[B], overwrite Dist[B] with 3, overwrite Parent[B] with C '$' changed with Dist[C] + C* cut C off, if scan finish

                                      array Dist                   array Parent
           B -----1------ F         +A|  0   |+                    +A| nil |+
         /.  \.        ./ |          B|+4+,3 |                      B|+A+,C|
        /.    \.      .4  |         +C|  1   |+                    +C|  A  |+
       4.      \.    ./   |          D|  1   |                      D|  A  |
      /.        \.  ./    |          E|+$+,4 |                      E|  C  |
     /.          \../     1          F|+$+,5 |                      F|  C  |
    *A ----1----- *C      |
     \........../.\.      |
      \.       /.  \.     |
       1.     4.    3.    |
        \.   /.      \.   |
         \. /.        \.  |
          \/.          \. |
           D -----2----- E
    
    
  4. then, scan the node smallest distance in Dist like (2) Dist[B] + BF < Dist[F], overwrite Dist[F] with 4, overwrite Parent[F] with B '$' changed with Dist[C] + C* cut C off, if scan finish

             ............             array Dist                   array Parent
          *B -----1------ F        +A|  0      |+                 +A| nil     |+
         /.  \.        ./ |         +B|+4+,3    |+                 +B|+A+,C    |+
        /.    \.      .4  |         +C|  1      |+                 +C|  A      |+
       4.      2.    ./   |          D|  1      |                   D|  A      |
      /.        \.  ./    |          E|+$+,4    |                   E|  C      |
     /.          \../     1          F|+$+,+5+,4|                   F|+C+,B    |
    *A ----1----- *C      |
     \........../.\.      |
      \.       /.  \.     |
       1.     4.    3.    |
        \.   /.      \.   |
         \. /.        \.  |
          \/.          \. |
           D -----2----- E
    
    (*) .... loop like upper steps, at last, then you will get the shottest distance
             in Dist array and the Path follow the Parent array.