Adjacency Matrix

Graph algorithms on GPUs

F. Busato, N. Bombieri, in Advances in GPU Research and Practice, 2017

1.1 Adjacency Matrices

An adjacency matrix represents a graph with a |V| × |V| matrix M = [f(i, j)], where each element f(i, j) contains the attributes of the edge (i, j). If the edges do not have attributes, the graph can be represented by a Boolean matrix to save memory space (Fig. 1).

Fig. 1. Matrix representation of a graph in memory.

Common algorithms that use this representation are all-pairs shortest path (APSP) and transitive closure [3–9]. If the graph is weighted, each value f(i, j) is defined as follows:

$$M_{i,j} = \begin{cases} 0 & \text{if } i = j \\ w_{i,j} & \text{if } i \neq j \text{ and } (i,j) \in E \\ \infty & \text{if } i \neq j \text{ and } (i,j) \notin E \end{cases}$$

On GPUs, both directed and undirected graphs represented by an adjacency matrix take O(|V|2) memory space, because the whole matrix is stored in memory as a single large contiguous array. On GPU architectures it is also important, for performance, to align the matrix in memory so as to improve coalescing of memory accesses. In this context, the Compute Unified Device Architecture (CUDA) runtime provides the function cudaMallocPitch [10] to pad the data allocation, with the aim of meeting the alignment requirements for memory coalescing. In this case the indexing changes as follows:

$$M[\,i \cdot |V| + j\,] \;\longrightarrow\; M[\,i \cdot \mathit{pitch} + j\,]$$
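As an illustration of the padded allocation and the M[i · pitch + j] indexing, here is a minimal host-side C sketch using the CUDA runtime API; the matrix size and the float element type are illustrative choices, not taken from the chapter.

/* Host-side sketch of a padded adjacency-matrix allocation (C, CUDA runtime API). */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const int V = 1024;              /* number of vertices               */
    float *d_matrix = NULL;          /* device adjacency matrix          */
    size_t pitch_bytes = 0;          /* padded row width in bytes        */

    /* Each of the V rows holds V floats; cudaMallocPitch pads every row
     * so that row starts satisfy the alignment requirements.            */
    cudaError_t err = cudaMallocPitch((void **)&d_matrix, &pitch_bytes,
                                      V * sizeof(float), V);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocPitch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* Pitch in elements, matching the M[i * pitch + j] indexing above. */
    size_t pitch_elems = pitch_bytes / sizeof(float);
    printf("row width: %d elements, padded pitch: %zu elements\n", V, pitch_elems);

    cudaFree(d_matrix);
    return 0;
}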

The O(|V|2) memory space required is the main limitation of adjacency matrices: even on recent GPUs, they allow handling of only fairly small graphs. For example, on a GPU device with 4 GB of DRAM, a graph represented through an adjacency matrix (with 4-byte elements) can have at most 32,768 vertices, which is restrictive for actual graph datasets. In general, adjacency matrices best represent small and dense graphs (i.e., |E| ≈ |V|2). In some cases, such as for the all-pairs shortest path problem, graphs larger than the GPU memory are partitioned and each part is processed independently [7–9].

URL: https://www.sciencedirect.com/science/article/pii/B9780128037386000070

Graph theory

Mary Attenborough, in Mathematics for Electrical Engineering and Computing, 2003

19.3 Matrix representation of a graph

The incidence matrix and adjacency matrix

The incidence matrix of a graph G is a |V| × |E| matrix. The element $a_{ij}$ is the number of times that vertex $v_i$ is incident with the edge $e_j$.

The adjacency matrix of G is the |V| × |V| matrix with $a_{ij}$ equal to the number of edges joining $v_i$ and $v_j$. The incidence matrix for the graph in Figure 19.2 is given by

$$\begin{array}{c|ccccccccc}
 & e_1 & e_2 & e_3 & e_4 & e_5 & e_6 & e_7 & e_8 & e_9 \\
\hline
v_1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
v_2 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\
v_3 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 2 & 0 \\
v_4 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
v_5 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 \\
v_6 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0
\end{array}$$

and the adjacency matrix by

$$\begin{array}{c|cccccc}
 & v_1 & v_2 & v_3 & v_4 & v_5 & v_6 \\
\hline
v_1 & 0 & 1 & 0 & 0 & 0 & 1 \\
v_2 & 1 & 0 & 1 & 0 & 2 & 1 \\
v_3 & 0 & 1 & 1 & 1 & 0 & 0 \\
v_4 & 0 & 0 & 1 & 0 & 1 & 0 \\
v_5 & 0 & 2 & 0 & 1 & 0 & 0 \\
v_6 & 1 & 1 & 0 & 0 & 0 & 0
\end{array}$$
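The adjacency matrix can be derived mechanically from the incidence matrix: each edge column contributes 1 to the entry of its two endpoints, and a column containing a 2 marks a loop. The following C sketch, added here for illustration (it is not code from the chapter), performs this conversion for the matrices above and reproduces the adjacency matrix.

#include <stdio.h>

#define NV 6   /* vertices v1..v6 */
#define NE 9   /* edges e1..e9    */

int main(void) {
    /* Incidence matrix from Figure 19.2 (rows = vertices, columns = edges). */
    int B[NV][NE] = {
        {1,0,0,0,0,0,1,0,0},
        {1,1,0,0,1,1,0,0,1},
        {0,1,1,0,0,0,0,2,0},
        {0,0,1,1,0,0,0,0,0},
        {0,0,0,1,1,0,0,0,1},
        {0,0,0,0,0,1,1,0,0}
    };
    int A[NV][NV] = {{0}};

    for (int e = 0; e < NE; e++) {
        int ends[2], n = 0, loop = -1;
        for (int v = 0; v < NV; v++) {
            if (B[v][e] == 2) loop = v;                 /* edge e is a loop at v */
            else if (B[v][e] == 1 && n < 2) ends[n++] = v;
        }
        if (loop >= 0) A[loop][loop] += 1;              /* loop counted once      */
        else if (n == 2) { A[ends[0]][ends[1]] += 1; A[ends[1]][ends[0]] += 1; }
    }

    /* Print the adjacency matrix; it matches the one given in the text. */
    for (int i = 0; i < NV; i++) {
        for (int j = 0; j < NV; j++) printf("%d ", A[i][j]);
        printf("\n");
    }
    return 0;
}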

URL: https://www.sciencedirect.com/science/article/pii/B9780750658553500450

Graphs in SQL

Joe Celko, in Joe Celko's SQL for Smarties (Fifth Edition), 2015

27.4 Adjacency Matrix Model

An adjacency matrix is a square array whose rows are the out-nodes and whose columns are the in-nodes of a graph. A one in a cell means that there is an edge between the two nodes. Using the graph in Figure 30.1, we would have an array like this:

A B C D E F G H
A 1 1 1 0 0 0 0 0
B 0 1 0 1 0 0 0 0
C 0 0 1 1 0 0 1 0
D 0 0 0 1 1 1 0 0
E 0 0 0 0 1 0 0 1
F 0 0 0 0 0 1 0 0
G 0 0 0 0 0 0 1 1
H 0 0 0 0 0 0 0 1

Many graph algorithms are based on the adjacency matrix model and can be translated into SQL. Go back to the chapter on modeling matrices in SQL and in particular to matrix multiplication in SQL. For example, the Floyd-Warshall algorithm for the shortest distances between each pair of nodes in a graph looks like this in pseudocode:

FOR k = 1 TO n
   DO FOR i = 1 TO n
      DO FOR j = 1 TO n
         IF a[i,k] + a[k,j] < a[i,j]
         THEN a[i,j] = a[i,k] + a[k,j]
         END IF;
      END FOR;
   END FOR;
END FOR;

You need to be warned that for a graph of n nodes, the table will be of size n². The algorithms often run in O(n³) time. The advantage it has is that once you have completed a table, it can be used for lookups rather than recomputing distances over and over.
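For comparison, here is the same triple loop rendered directly in C. This is a minimal sketch; the 4-node distance matrix and the INF sentinel are illustrative, not taken from the book.

#include <stdio.h>

#define N 4
#define INF 1000000   /* large value standing in for "no edge" */

int main(void) {
    /* Illustrative distance matrix; 0 on the diagonal, INF where no edge. */
    int a[N][N] = {
        {0,   3,   INF, 7  },
        {8,   0,   2,   INF},
        {5,   INF, 0,   1  },
        {2,   INF, INF, 0  }
    };

    /* The same relaxation as the pseudocode above. */
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (a[i][k] + a[k][j] < a[i][j])
                    a[i][j] = a[i][k] + a[k][j];

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) printf("%7d", a[i][j]);
        printf("\n");
    }
    return 0;
}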

URL: https://www.sciencedirect.com/science/article/pii/B9780128007617000279

Graphs in SQL

Joe Celko, in Joe Celko's Trees and Hierarchies in SQL for Smarties (Second Edition), 2012

12.1.4 Adjacency Matrix Model

An adjacency matrix is a square array whose rows are the out-nodes and whose columns are the in-nodes of a graph. A one in a cell means that there is an edge between the two nodes. Using the following graph, we would have an array like this:

A B C D E F G H
A 1 1 1 0 0 0 0 0
B 0 1 0 1 0 0 0 0
C 0 0 1 1 0 0 1 0
D 0 0 0 1 1 1 0 0
E 0 0 0 0 1 0 0 1
F 0 0 0 0 0 1 0 0
G 0 0 0 0 0 0 1 1
H 0 0 0 0 0 0 0 1

Many graph algorithms are based on the adjacency matrix model and can be translated into SQL. Go to the appropriate chapter for the details of modeling matrices in SQL and, in particular, look at the section on matrix multiplication in SQL. For example, the Floyd-Warshall algorithm for the shortest distances between each pair of nodes in a graph looks like this in array pseudocode:

FOR k = 1 TO n
   DO FOR i = 1 TO n
      DO FOR j = 1 TO n
         IF a[i,k] + a[k,j] < a[i,j]
         THEN a[i,j] = a[i,k] + a[k,j]
         END IF;
      END FOR;
   END FOR;
END FOR;

You need to be warned that for a graph of n nodes, the table will be of size n². The algorithms often run in O(n³) time. The advantage it has is that once you have completed a table, it can be used for lookups rather than recomputing distances over and over.

Running the query against the data set …

INSERT INTO AdjacencyListGraph
VALUES ('a', 'd', 1),
       ('d', 'e', 1),
       ('e', 'c', 1),
       ('c', 'b', 1),
       ('b', 'd', 1),
       ('a', 'e', 5);

gives the result set:

source_node dest_node min_wgt
a b 4
a c 3
a d 1
a e 2
b c 3
b d 1
b e 2
c b 1
c d 2
c e 3
d b 3
d c 2
d e 1
e b 2
e c 1
e d 3

Running the Floyd-Warshall algorithm would probably be significantly faster in a language with arrays than in SQL.
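As an illustration of that point, the following C sketch (added here, not from the book) loads the same six weighted edges into an array, applies the triple-loop relaxation, and prints the reachable pairs; the output matches the min_wgt values in the result set above.

#include <stdio.h>

#define N 5
#define INF 1000000            /* stands in for "no path" */

int main(void) {
    const char name[N] = {'a', 'b', 'c', 'd', 'e'};
    int d[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            d[i][j] = (i == j) ? 0 : INF;

    /* The six edges inserted into AdjacencyListGraph above. */
    d[0][3] = 1;   /* a -> d */
    d[3][4] = 1;   /* d -> e */
    d[4][2] = 1;   /* e -> c */
    d[2][1] = 1;   /* c -> b */
    d[1][3] = 1;   /* b -> d */
    d[0][4] = 5;   /* a -> e */

    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];

    /* Print the reachable pairs: source_node, dest_node, min_wgt. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (i != j && d[i][j] < INF)
                printf("%c %c %d\n", name[i], name[j], d[i][j]);
    return 0;
}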

URL: https://www.sciencedirect.com/science/article/pii/B9780123877338000045

A novel graph clustering algorithm based on discrete-time quantum random walk

S.G. Roy, A. Chakrabarti, in Quantum Inspired Computational Intelligence, 2017

12 Proposed Graph-Based Quantum Clustering Algorithm

We identify clusters within a given general graph G(V, E). If any cluster structure is present in the graph, the observed QCL simulation result identifies the number of clusters within it. The proposed quantum algorithm to identify clusters is given below:

Input: Graph adjacency matrix

Output: Number of clusters within the graph and the nodes within each cluster.

Quantum clustering algorithm: Design a discrete-time quantum random walk circuit for the graph (as explained below):

1.

Start at the origin node: x = 0.

2.

Choose the coin operator (assumed here to be the Hadamard coin operator):

$$H|x,0\rangle \rightarrow \frac{|x,0\rangle + |x,1\rangle}{\sqrt{2}}, \qquad H|x,1\rangle \rightarrow \frac{|x,0\rangle - |x,1\rangle}{\sqrt{2}}.$$
3.

The shift operator S moves the quantum walker from one quantum state to another state of the superposition according to the permutation circuit.

4.

Apply $U = S(I \otimes \hat{C})$ to the initial quantum state, where I is the identity operator, S is the shift operator, and $\hat{C}$ is the coin operator. (If the transformation U is applied repeatedly to the initial state, the resulting superposition state |ψ〉 will contain more nodes of the graph; the random walker traverses the graph faster.)

5.

Repeat step 4 for r consecutive transformations of U until a steady state (convergence) is reached (i.e., all nodes, with all the coin states, are visited by the quantum walker).

6.

Interpret the quantum simulation result for the initial clustering. The nodes with the same visiting probability distribution values are now in the same cluster.

7.

Step 6 may generate many small clusters (clusters in which the number of nodes is less than the threshold value). Apply the merge and split model to fulfill the threshold criteria and to identify the actual cluster sets.

8.

End.

12.0.2 Merge and split model

1.

Let p_i be the visiting probability distribution value of node i, i = 0, 1, 2, …, n. Depending on the QCL simulation result, group the nodes into different clusters Cluster_j, where j = 1, 2, …, n, according to their visiting probability distribution values.

2.

The k-dimensional tree method is discussed in [34, 35]. We use the one-dimensional tree method (as our nodes are one-dimensional) in our merge and split model to identify the clusters.

3.

Assume we have a set P, which consists of the visiting probability distribution values of each cluster Cluster_j, where j = 1, 2, …, n.

4.

The set P is divided into two subregions in accordance with the average probability distribution value $\frac{1}{N}\sum_{i=1}^{N} P_i$ (N denotes the number of clusters obtained in step 6 of the proposed quantum clustering algorithm and P_i is the visiting probability distribution value of each cluster). One region has values greater than the average probability distribution value (p_avg); the other region has values less than it.

5.

Repeat until threshold criteria are reached.

Observation of the proposed algorithm shows that the nodes with the same or nearly the same visiting probability values tend to lie within the same cluster.
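To make the split step concrete, here is a small classical C sketch (an added illustration; the probability values and the threshold are made up, and this is not the authors' code) that recursively divides a set of per-cluster visiting probabilities around their average value, as in steps 4 and 5 of the merge and split model.

#include <stdio.h>

/* Recursively split a region of probability values around its average,
 * stopping once a region holds no more than 'threshold' values.  The
 * in-place partitioning scheme is an illustrative choice.              */
static void split_region(double *p, int n, int threshold, int depth) {
    if (n <= threshold || n <= 1) {
        printf("cluster at depth %d with %d value(s):", depth, n);
        for (int i = 0; i < n; i++) printf(" %.3f", p[i]);
        printf("\n");
        return;
    }
    double avg = 0.0;
    for (int i = 0; i < n; i++) avg += p[i];
    avg /= n;

    /* Partition: values <= average to the front, values > average after. */
    int lo = 0;
    for (int i = 0; i < n; i++) {
        if (p[i] <= avg) {
            double t = p[lo]; p[lo] = p[i]; p[i] = t;
            lo++;
        }
    }
    if (lo == 0 || lo == n) {            /* all values equal: cannot split */
        printf("cluster at depth %d with %d equal value(s)\n", depth, n);
        return;
    }
    split_region(p, lo, threshold, depth + 1);
    split_region(p + lo, n - lo, threshold, depth + 1);
}

int main(void) {
    /* Made-up visiting probabilities for eight clusters. */
    double p[] = {0.02, 0.03, 0.02, 0.11, 0.12, 0.13, 0.28, 0.29};
    split_region(p, 8, 3, 0);
    return 0;
}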

URL: https://www.sciencedirect.com/science/article/pii/B9780128044094000115

Computer Aided Molecular Design: Theory and Practice

P.M. Harper, ... R. Gani, in Computer Aided Chemical Engineering, 2003

Generation Algorithm for Level-3

The level-3 generation algorithm transforms the group-based connectivity information (the adjacency matrix from level-2) into atom-based information. This is achieved by expanding each group into its corresponding atom-based adjacency matrix and replacing the groups in the group-based description with additional rows and columns to allow for the group expansion. When performing the group expansion into an atomic representation, one group-based description may yield more than one atomic description. This is the case for compounds containing any of the groups listed in Table 5. Note that the additional representations appear where the original groups have a ring element with one or more free connections, because of the ambiguously defined distance (in the ring) between the free bonds (as in ortho/meta/para) or between hetero-atoms and bonds in aromatic rings (as in pyridine derivatives).

Table 5. Examples of first-order groups with multiple atomic representations

First-order group Number of isomers on an atomic basis
C5H4N 3
C5H3N 6
C4H3S 2
C4H2S 4

The algorithm for generation of atomic adjacency matrices from group-based ones consists of the following steps:

1.

Set the matrix A equal to the group based matrix from Level 2

2.

List the groups in the compound

3.

For each of the groups in the compound:

(a).

Load the corresponding atom-based matrix or matrices (for groups with an ambiguous two-dimensional representation)

(b).

Insert the atom-based matrix in the place of the corresponding group in A. If the particular group has multiple representations, create a corresponding number of copies of A

4.

Identify the atoms taking part in the original bonds between groups

5.

Reconnect the molecule by establishing connections between the atoms identified in point 4

6.

Stop

After performing the conversion, the net result is a series of compounds described using atoms and how they are interconnected. Furthermore, all 2D structural variations on the atomic level have been generated. This conversion process is illustrated in Figure 16.

Figure 16. Illustration of the conversion from group to atomic representation
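To make the matrix manipulation in steps 3-5 concrete, here is a small C sketch (an added illustration, not code from the chapter) that expands one group of a group-based adjacency matrix into a made-up two-atom block and reconnects the original bonds through an assumed attachment atom.

#include <stdio.h>

#define NG 3                 /* groups in the group-based description     */
#define K  2                 /* atoms inside the group being expanded     */
#define NA (NG - 1 + K)      /* size of the resulting atom-level matrix   */

int main(void) {
    int G[NG][NG] = {        /* group-based adjacency (group 1 expanded)  */
        {0, 1, 0},
        {1, 0, 1},
        {0, 1, 0}
    };
    int B[K][K] = {          /* atom-based block for group 1              */
        {0, 1},
        {1, 0}
    };
    int attach = 0;          /* atom of the block carrying external bonds */
    int g = 1;               /* index of the group to expand              */

    int A[NA][NA] = {{0}};

    /* Map each remaining group index to its position in the expanded matrix. */
    int pos[NG];
    for (int i = 0, p = 0; i < NG; i++) {
        if (i == g) { pos[i] = -1; p += K; }   /* block occupies K slots   */
        else        { pos[i] = p++; }
    }
    int block_start = (g == 0) ? 0 : pos[g - 1] + 1;

    /* Copy the untouched group-group bonds. */
    for (int i = 0; i < NG; i++)
        for (int j = 0; j < NG; j++)
            if (i != g && j != g && G[i][j])
                A[pos[i]][pos[j]] = 1;

    /* Insert the atom-based block in place of group g. */
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            A[block_start + i][block_start + j] = B[i][j];

    /* Reconnect the original bonds of g through the attachment atom. */
    for (int j = 0; j < NG; j++)
        if (j != g && G[g][j]) {
            A[block_start + attach][pos[j]] = 1;
            A[pos[j]][block_start + attach] = 1;
        }

    for (int i = 0; i < NA; i++) {
        for (int j = 0; j < NA; j++) printf("%d ", A[i][j]);
        printf("\n");
    }
    return 0;
}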

URL: https://www.sciencedirect.com/science/article/pii/S1570794603800081

Elements of Graph Theory

Jean-Michel Réveillac, in Optimization Tools for Logistics, 2015

2.3 Directed graph or digraph

When the edges in a graph have a direction, we get a digraph or oriented graph. An oriented graph G is formed of two sets: the first, S, is the set containing the vertices {s1, s2,…, sn}, and the second, C, contains the edges {c1, c2,…, cn}; it is denoted G = (S, C).

An edge c is an ordered pair (couple) of vertices. If c = (x, y), then edge c goes from x to y: the initial endpoint of c is x and the final endpoint is y.

The external degree d⁺(x) denotes the number of edges with x as the initial endpoint, and the internal degree d⁻(x) the number of edges with x as the final endpoint, meaning that the degree of a vertex of a digraph is d(x) = d⁺(x) + d⁻(x).

Reversing the direction of all the edges of a digraph yields its symmetric graph.

2.3.1 Path and circuit in a digraph

A path is a sequence of vertices and edges that link one vertex to another.

In a digraph, what we call distance is the length of the shortest path linking two vertices.

In Figure 2.20, the distances are d(s4, s1) = 1, d(s5, s2) = 3, and d(s2, s3) = ∞ (no path).

Figure 2.20. A digraph

A digraph is said to be strongly connected if each vertex can be reached from every other vertex by at least one path.

2.3.2 Absence of circuit in a digraph

For applications related particularly to task scheduling, digraphs without circuits are very important. This type of graph is also said to be acyclic.

Let G = (S, C) be a digraph. It is without circuit if and only if the set of its vertices can be topologically sorted.

Topological sorting is a traversal of the graph in which a vertex is always visited before its successors, each vertex being given a number denoting the order.

In Figure 2.21, we consider digraph G=(S, C) with S={A, B, C, D, E, F, G, H} and C={(A, B), (A, C), (B, D), (B, E), (C, B), (C, D), (C, E), (D, F), (D, E), (E, G), (F, G), (F, H), (G, H)}.

Figure 2.21. A digraph without circuit

An acceptable topological order could be the following sequence of vertices: A, C, B, D, E, F, G, H, meaning that digraph G has no circuit.

A digraph without circuit contains at least one vertex x whose internal degree is equal to 0, that is, d⁻(x) = 0.

To express the fact that a digraph has no circuit, we can also use the notions of rank r(x) and level n(x).

Let G = (S, C) be a digraph. The rank r(x), with x ∈ S, is the number of edges in the longest path with x as terminal endpoint. The level n(x), with x ∈ S, is the number of edges in the longest path with x as initial endpoint.

Each digraph without circuit can be ordered by ascending rank or descending level.

Figure 2.22. A digraph without circuit with its ranks and levels

2.3.3 Adjacency matrix

A digraph can be represented by an adjacency matrix. This is a double-entry table whose n rows and n columns represent the vertices of the digraph and whose intersections indicate the presence of an edge. This matrix is always square, and it has 0 on its diagonal unless there is a loop. It is not necessarily symmetric.

On the right, Figure 2.23 shows the adjacency matrix representing the graph. When an edge joins two vertices, the corresponding value in the matrix is 1.

Figure 2.23. An example of a graph and its adjacency matrix

There are other possible uses for the adjacency matrix, which has very interesting properties. These uses will be described in the following chapters of this book.
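As an illustration (added here, not taken from the book), the following C sketch builds the adjacency matrix of the circuit-free digraph G of Figure 2.21 from the edge set listed above, mapping vertices A-H to indices 0-7, and prints it.

#include <stdio.h>

#define N 8   /* vertices A..H mapped to indices 0..7 */

int main(void) {
    /* Edge set C of the digraph in Figure 2.21. */
    const char *edges[][2] = {
        {"A","B"}, {"A","C"}, {"B","D"}, {"B","E"}, {"C","B"},
        {"C","D"}, {"C","E"}, {"D","F"}, {"D","E"}, {"E","G"},
        {"F","G"}, {"F","H"}, {"G","H"}
    };
    int n_edges = (int)(sizeof edges / sizeof edges[0]);
    int M[N][N] = {{0}};

    for (int e = 0; e < n_edges; e++) {
        int from = edges[e][0][0] - 'A';   /* row = initial endpoint  */
        int to   = edges[e][1][0] - 'A';   /* column = final endpoint */
        M[from][to] = 1;
    }

    /* Print the (non-symmetric) adjacency matrix with 0 on the diagonal. */
    printf("  ");
    for (int j = 0; j < N; j++) printf(" %c", 'A' + j);
    printf("\n");
    for (int i = 0; i < N; i++) {
        printf("%c ", 'A' + i);
        for (int j = 0; j < N; j++) printf(" %d", M[i][j]);
        printf("\n");
    }
    return 0;
}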

2.3.4 Valued graph matrix

As with the adjacency matrix, a valued graph can be represented by a square matrix. Each coefficient corresponds to the value (weight, cost) carried by an edge.

If G = (S, C, v) is the graph in Figure 2.24, the valuation matrix of G can be defined as the square matrix M = [m(i, j)] of size n × n satisfying:

Figure 2.24. A valued digraph

$$M_{ij} = \begin{cases} v(i,j) & \text{if } (i,j) \in C \\ \infty & \text{otherwise} \end{cases}$$

We get the associated matrix:

[ 2 6 3 4 1 2 7 2 11 ]

URL: https://www.sciencedirect.com/science/article/pii/B9781785480492500025

Nodes, Edges, and Network Measures

Jennifer Golbeck, in Analyzing the Social Web, 2013

Adjacency matrix

An alternative to the adjacency list is an adjacency matrix. In an adjacency matrix, a grid is set up that lists all the nodes on both the X-axis (horizontal) and the Y-axis (vertical). Then, values are filled into the matrix to indicate whether or not there is an edge between every pair of nodes. Typically, a 0 indicates no edge and a 1 indicates an edge.

The Adjacency Matrix for the Apollo 13 Network

Notice a couple of things about this matrix. First, the diagonal is all zeroes because there are no edges between a node and itself in our example. Some networks do allow for self-loops. For example, in an email network, if a person emails himself, there could be a link from one node to itself, and thus there would be a 1 on the diagonal. Second, the matrix is symmetric: the numbers in the first row are the same as the numbers in the first column, the numbers in the second row are the same as the numbers in the second column, and so on. This is because the graph is undirected; just as in the adjacency list, where the order of pairs in an undirected graph didn't matter, the entry for a pair of nodes is the same in both directions.

Notice that the Diagonal, Indicating a Person's Link to Himself, is all 0s

If we have a directed network, the matrix will not necessarily be symmetric. For example, consider the small network in Figure 2.5. In this case, there are edges from A to C, from C to A, and from A to B, but the reciprocal edge from B to A is absent. Thus, we record a 1 for the A–B edge and a 0 for the B–A edge. The adjacency matrix would look like this:

A Small Adjacency Matrix for a Directed Network
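A small C sketch (assuming only the three nodes A, B, and C and exactly the edges just described, which may omit edges present in Figure 2.5) makes the asymmetry explicit.

#include <stdio.h>

int main(void) {
    /* Rows are the source node, columns the destination: A=0, B=1, C=2.
     * Edges from the text: A->C, C->A, and A->B (no B->A).             */
    int m[3][3] = {
        /*        A  B  C */
        /* A */ { 0, 1, 1 },
        /* B */ { 0, 0, 0 },
        /* C */ { 1, 0, 0 }
    };
    const char label[3] = {'A', 'B', 'C'};

    /* m[0][1] is 1 but m[1][0] is 0, so the matrix is not symmetric. */
    printf("  A B C\n");
    for (int i = 0; i < 3; i++) {
        printf("%c ", label[i]);
        for (int j = 0; j < 3; j++) printf("%d ", m[i][j]);
        printf("\n");
    }
    return 0;
}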

In the examples we have seen so far, we have been recording a 1 in the matrix to indicate an edge is present, and a 0 when there is no edge. This scheme can be altered to show the weight of an edge as well. To do this, we replace the 1 with the edge weight. Using the values from Figure 2.4, we would have a weight of 4 between Tom Hanks and Gary Sinise. The matrix would look like this:

The Adjacency Matrix for the Apollo 13 Network with Edge Weights

URL: https://www.sciencedirect.com/science/article/pii/B978012405531500002X

Parallel Program Development

Peter S. Pacheco, in An Introduction to Parallel Programming, 2011

6.2.11 Implementation of tree search using MPI and static partitioning

The vast majority of the code used in the static parallelizations of tree search using Pthreads and OpenMP is taken straight from the second implementation of serial, iterative tree search. In fact, the only differences are in starting the threads, the initial partitioning of the tree, and the Update_best_tour function. We might therefore expect that an MPI implementation would also require relatively few changes to the serial code, and this is, in fact, the case.

There is the usual problem of distributing the input data and collecting the results. In order to construct a complete tour, a process will need to choose an edge into each vertex and out of each vertex. Thus, each tour will require an entry from each row and each column for each city that's added to the tour, so it would clearly be advantageous for each process to have access to the entire adjacency matrix. Note that the adjacency matrix is going to be relatively small. For example, even if we have 100 cities, it's unlikely that the matrix will require more than 80,000 bytes of storage, so it makes sense to simply read in the matrix on process 0 and broadcast it to all the processes.

Once the processes have copies of the adjacency matrix, the bulk of the tree search can proceed as it did in the Pthreads and OpenMP implementations. The principal differences lie in

partitioning the tree,

checking and updating the best tour, and

after the search has terminated, making sure that process 0 has a copy of the best tour for output.

We'll discuss each of these in turn.

Partitioning the tree

In the Pthreads and OpenMP implementations, thread 0 uses breadth-first search to search the tree until there are at least thread_count partial tours. Each thread then determines which of these initial partial tours it should get and pushes its tours onto its local stack. Certainly MPI process 0 can also generate a list of comm_sz partial tours. However, since memory isn't shared, it will need to send the initial partial tours to the appropriate process. We could do this using a loop of sends, but distributing the initial partial tours looks an awful lot like a call to MPI_Scatter. In fact, the only reason we can't use MPI_Scatter is that the number of initial partial tours may not be evenly divisible by comm_sz. When this happens, process 0 won't be sending the same number of tours to each process, and MPI_Scatter requires that the source of the scatter send the same number of objects to each process in the communicator.

Fortunately, there is a variant of MPI_Scatter, MPI_Scatterv, which can be used to send different numbers of objects to different processes. First recall the syntax of MPI_Scatter:

int MPI_Scatter(
      void*          sendbuf     /* in  */,
      int            sendcount   /* in  */,
      MPI_Datatype   sendtype    /* in  */,
      void*          recvbuf     /* out */,
      int            recvcount   /* in  */,
      MPI_Datatype   recvtype    /* in  */,
      int            root        /* in  */,
      MPI_Comm       comm        /* in  */);

Process root sends sendcount objects of type sendtype from sendbuf to each process in comm. Each process in comm receives recvcount objects of type recvtype into recvbuf. Most of the time, sendtype and recvtype are the same and sendcount and recvcount are also the same. In any case, it's clear that the root process must send the same number of objects to each process.

MPI_Scatterv, on the other hand, has syntax

int MPI_Scatterv(
      void*          sendbuf         /* in  */,
      int*           sendcounts      /* in  */,
      int*           displacements   /* in  */,
      MPI_Datatype   sendtype        /* in  */,
      void*          recvbuf         /* out */,
      int            recvcount       /* in  */,
      MPI_Datatype   recvtype        /* in  */,
      int            root            /* in  */,
      MPI_Comm       comm            /* in  */);

The single sendcount argument in a call to MPI_Scatter is replaced by two array arguments: sendcounts and displacements. Both of these arrays contain comm_sz elements: sendcounts[q] is the number of objects of type sendtype being sent to process q. Furthermore, displacements[q] specifies the start of the block that is being sent to process q. The displacement is calculated in units of type sendtype. So, for example, if sendtype is MPI_INT, and sendbuf has type int*, then the data that is sent to process q will begin in location

  sendbuf + displacements[q]

In general, displacements[q] specifies the offset into sendbuf of the data that will go to process q. The "units" are measured in blocks with extent equal to the extent of sendtype.
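As a sketch of how the counts and displacements can be computed when the number of items isn't evenly divisible by comm_sz, here is a minimal, self-contained C/MPI program; the item count and variable names are illustrative, and it scatters single ints rather than partial tours.

/* Compile with mpicc; run with mpiexec -n <p>. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int comm_sz, my_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;
    MPI_Comm_size(comm, &comm_sz);
    MPI_Comm_rank(comm, &my_rank);

    int n = 10;                                 /* total items to scatter */
    int *sendbuf = NULL;
    int *sendcounts = malloc(comm_sz * sizeof(int));
    int *displacements = malloc(comm_sz * sizeof(int));

    /* Give each process floor(n/comm_sz) or ceil(n/comm_sz) items. */
    int offset = 0;
    for (int q = 0; q < comm_sz; q++) {
        sendcounts[q] = n / comm_sz + (q < n % comm_sz ? 1 : 0);
        displacements[q] = offset;              /* start of q's block     */
        offset += sendcounts[q];
    }

    if (my_rank == 0) {                         /* root fills the data    */
        sendbuf = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) sendbuf[i] = i;
    }

    int my_count = sendcounts[my_rank];
    int *recvbuf = malloc(my_count * sizeof(int));
    MPI_Scatterv(sendbuf, sendcounts, displacements, MPI_INT,
                 recvbuf, my_count, MPI_INT, 0, comm);

    printf("process %d received %d item(s)\n", my_rank, my_count);

    free(recvbuf); free(sendcounts); free(displacements); free(sendbuf);
    MPI_Finalize();
    return 0;
}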

Similarly, MPI_Gatherv generalizes MPI_Gather:

int MPI_Gatherv(
      void*          sendbuf         /* in  */,
      int            sendcount       /* in  */,
      MPI_Datatype   sendtype        /* in  */,
      void*          recvbuf         /* out */,
      int*           recvcounts      /* in  */,
      int*           displacements   /* in  */,
      MPI_Datatype   recvtype        /* in  */,
      int            root            /* in  */,
      MPI_Comm       comm            /* in  */);

Maintaining the best tour

As we observed in our earlier discussion of parallelizing tree search, having each process use its own best tour is likely to result in a lot of wasted computation since the best tour on one process may be much more costly than most of the tours on another process (see Exercise 6.21). Therefore, when a process finds a new best tour, it should send it to the other processes.

First note that when a process finds a new best tour, it really only needs to send its cost to the other processes. Each process only makes use of the cost of the current best tour when it calls Best_tour. Also, when a process updates the best tour, it doesn't care what the actual cities on the former best tour were; it only cares that the cost of the former best tour is greater than the cost of the new best tour.

During the tree search, when one process wants to communicate a new best cost to the other processes, it's important to recognize that we can't use MPI_Bcast, for recall that MPI_Bcast is blocking and every process in the communicator must call MPI_Bcast. However, in parallel tree search the only process that will know that a broadcast should be executed is the process that has found a new best cost. If it tries to use MPI_Bcast, it will probably block in the call and never return, since it will be the only process that calls it. We therefore need to arrange that the new tour is sent in such a way that the sending process won't block indefinitely.

MPI provides several options. The simplest is to have the process that finds a new best cost use MPI_Send to send it to all the other processes:

for (dest = 0; dest < comm_sz; dest++)
   if (dest != my_rank)
      MPI_Send(&new_best_cost, 1, MPI_INT, dest, NEW_COST_TAG, comm);

Here, we're using a special tag defined in our program, NEW_COST_TAG. This tells the receiving process that the message is a new cost, as opposed to some other type of message (for example, a tour).

The destination processes can periodically check for the arrival of new best tour costs. We can't use MPI_Recv to check for messages since it's blocking; if a process calls

  MPI_Recv(&received_cost, 1, MPI_INT, MPI_ANY_SOURCE, NEW_COST_TAG, comm, &status);

the process will block until a matching message arrives. If no message arrives—for example, if no process finds a new best cost—the process will hang. Fortunately, MPI provides a function that only checks to see if a message is available; it doesn't actually try to receive a message. It's called MPI_Iprobe, and its syntax is

int MPI_Iprobe(
      int            source        /* in  */,
      int            tag           /* in  */,
      MPI_Comm       comm          /* in  */,
      int*           msg_avail_p   /* out */,
      MPI_Status*    status_p      /* out */);

It checks to see if a message from process rank source in communicator comm and with tag tag is available. If such a message is available, *msg_avail_p will be assigned the value true and the members of *status_p will be assigned the appropriate values. For example, status_p->MPI_SOURCE will be assigned the rank of the source of the message that's been received. If no message is available, *msg_avail_p will be assigned the value false. The source and tag arguments can be the wildcards MPI_ANY_SOURCE and MPI_ANY_TAG, respectively. So, to check for a message with a new cost from any process, we can call

  MPI_Iprobe(MPI_ANY_SOURCE, NEW_COST_TAG, comm, &msg_avail, &status);

If msg_avail is true, then we can receive the new cost with a call to MPI_Recv:

  MPI_Recv(&received_cost, 1, MPI_INT, status.MPI_SOURCE, NEW_COST_TAG, comm, MPI_STATUS_IGNORE);

A natural place to do this is in the Best_tour function. Before checking whether our new tour is the best tour, we can check for new tour costs from other processes with the code in Program 6.9.

Program 6.9. MPI code to check for new best tour costs

This code will continue to receive messages with new costs as long as they're available. Each time a new cost is received that's better than the current best cost, the variable best_tour_cost will be updated.
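The listing itself is not reproduced here; the following C sketch is consistent with that description. The function name, variable names, and tag value are assumptions for illustration rather than the book's actual Program 6.9.

#include <mpi.h>

#define NEW_COST_TAG 2   /* assumed tag value; the program defines its own */

/* Drain any pending "new best cost" messages and keep the smallest cost. */
static void check_for_new_costs(int *best_tour_cost_p, MPI_Comm comm) {
    int msg_avail;
    int received_cost;
    MPI_Status status;

    MPI_Iprobe(MPI_ANY_SOURCE, NEW_COST_TAG, comm, &msg_avail, &status);
    while (msg_avail) {
        MPI_Recv(&received_cost, 1, MPI_INT, status.MPI_SOURCE,
                 NEW_COST_TAG, comm, MPI_STATUS_IGNORE);
        if (received_cost < *best_tour_cost_p)
            *best_tour_cost_p = received_cost;   /* better cost received */
        MPI_Iprobe(MPI_ANY_SOURCE, NEW_COST_TAG, comm, &msg_avail, &status);
    }
}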

Did you spot the potential problem with this scheme? If there is no buffering available for the sender, then the loop of calls to MPI_Send can cause the sending process to block until a matching receive is posted. If all the other processes have completed their searches, the sending process will hang. The loop of calls to MPI_Send is therefore unsafe.

There are a couple of alternatives provided by MPI: buffered sends and nonblocking sends. We'll discuss buffered sends here. See Exercise 6.22 for a discussion of nonblocking operations in MPI.

Modes and Buffered Sends

MPI provides four modes for sends: standard, synchronous, ready, and buffered. The various modes specify different semantics for the sending functions. The send that we first learned about, MPI_Send, is the standard mode send. With it, the MPI implementation can decide whether to copy the contents of the message into its own storage or to block until a matching receive is posted. Recall that in synchronous mode, the send will block until a matching receive is posted. In ready mode, the send is erroneous unless a matching receive is posted before the send is started. In buffered mode, the MPI implementation must copy the message into local temporary storage if a matching receive hasn't been posted. The local temporary storage must be provided by the user program, not the MPI implementation.

Each mode has a different function: MPI_Send, MPI_Ssend, MPI_Rsend, and MPI_Bsend, respectively, but the argument lists are identical to the argument lists for MPI_Send:

int MPI_Xsend(
      void*          message        /* in */,
      int            message_size   /* in */,
      MPI_Datatype   message_type   /* in */,
      int            dest           /* in */,
      int            tag            /* in */,
      MPI_Comm       comm           /* in */);

The buffer that's used by MPI_Bsend must be turned over to the MPI implementation with a call to MPI_Buffer_attach:

int MPI_Buffer_attach(
      void*   buffer        /* in */,
      int     buffer_size   /* in */);

The buffer argument is a pointer to a block of memory allocated by the user program and buffer_size is its size in bytes. A previously "attached" buffer can be reclaimed by the program with a call to

int MPI_Buffer_detach(
      void*   buf_p        /* out */,
      int*    buf_size_p   /* out */);

The *buf_p argument returns the address of the block of memory that was previously attached, and *buf_size_p gives its size in bytes. A call to MPI_Buffer_detach will block until all messages that have been stored in the buffer are transmitted. Note that since buf_p is an output argument, it should probably be passed in with the ampersand operator. For example:

char buffer[1000];
char* buf;
int buf_size;

MPI_Buffer_attach(buffer, 1000);

/* Calls to MPI_Bsend */

MPI_Buffer_detach(&buf, &buf_size);

At any point in the program only one user-provided buffer can be attached, so if there may be multiple buffered sends that haven't been completed, we need to estimate the amount of data that will be buffered. Of course, we can't know this with any certainty, but we do know that in any "broadcast" of a best tour, the process doing the broadcast will make comm_sz−1 calls to MPI_Bsend, and each of these calls will send a single int . We can thus determine the size of the buffer needed for a single broadcast. The amount of storage that's needed for the data that's transmitted can be determined with a call to MPI_Pack_size:

int MPI_Pack_size(
      int            count      /* in  */,
      MPI_Datatype   datatype   /* in  */,
      MPI_Comm       comm       /* in  */,
      int*           size_p     /* out */);

The output argument gives an upper bound on the number of bytes needed to store the data in a message. This won't be enough, however. Recall that in addition to the data, a message stores information such as the destination, the tag, and the communicator, so for each message there is some additional overhead. An upper bound on this additional overhead is given by the MPI constant MPI_BSEND_OVERHEAD. For a single broadcast, the following code determines the amount of storage needed:

int data_size;
int message_size;
int bcast_buf_size;

MPI_Pack_size(1, MPI_INT, comm, &data_size);
message_size = data_size + MPI_BSEND_OVERHEAD;
bcast_buf_size = (comm_sz - 1)*message_size;

We should guess a generous upper bound on the number of broadcasts and multiply that by bcast_buf_size to get the size of the buffer to attach.

Printing the best tour

When the program finishes, we'll want to print out the actual tour as well as its cost, so we do need to get the tour to process 0. It might at first seem that we could arrange this by having each process store its local best tour (the best tour that it finds), and when the tree search has completed, each process can check its local best tour cost and compare it to the global best tour cost. If they're the same, the process could send its local best tour to process 0. There are, however, several problems with this. First, it's entirely possible that there are multiple "best" tours in the TSP digraph, tours that all have the same cost, and different processes may find these different tours. If this happens, multiple processes will try to send their best tours to process 0, and all but one of the processes could hang in a call to MPI_Send. A second problem is that it's possible that one or more processes never received the best tour cost, and they may try to send a tour that isn't optimal.

We can avoid these problems by having each process store its local best tour, but after all the processes have completed their searches, they can all call MPI_Allreduce and the process with the global best tour can then send it to process 0 for output. The following pseudocode provides some details:

struct {
   int cost;
   int rank;
} loc_data, global_data;

loc_data.cost = Tour_cost(loc_best_tour);
loc_data.rank = my_rank;

MPI_Allreduce(&loc_data, &global_data, 1, MPI_2INT, MPI_MINLOC, comm);

if (global_data.rank == 0) return;   /* 0 already has the best tour */
if (my_rank == 0)
   Receive best tour from process global_data.rank;
else if (my_rank == global_data.rank)
   Send best tour to process 0;

The key here is the operation we use in the call to MPI_Allreduce. If we just used MPI_MIN, we would know what the cost of the global best tour was, but we wouldn't know who owned it. However, MPI provides a predefined operator, MPI_MINLOC, which operates on pairs of values. The first value is the value to be minimized (in our setting, the cost of the tour), and the second value is the location of the minimum (in our setting, the rank of the process that actually owns the best tour). If more than one process owns a tour with minimum cost, the location will be the lowest of the ranks of the processes that own a minimum-cost tour. The input and the output buffers in the call to MPI_Allreduce are two-member structs. Since both the cost and the rank are ints, both members are ints. Note that MPI also provides a predefined type, MPI_2INT, for this type. When the call to MPI_Allreduce returns, we have two alternatives:

If process 0 already has the best tour, we simply return.

Otherwise, the process owning the best tour sends it to process 0.

Unreceived messages

As we noted in the preceding discussion, it is possible that some messages won't be received during the execution of the parallel tree search. A process may finish searching its subtree before some other process has found a best tour. This won't cause the program to print an incorrect result; the call to MPI_Allreduce that finds the process with the best tour won't return until every process has called it, and some process will have the best tour. Thus, it will return with the correct least-cost tour, and process 0 will receive this tour.

However, unreceived messages can cause problems with the call to MPI_Buffer_detach or the call to MPI_Finalize. A process can hang in one of these calls if it is storing buffered messages that were never received, so before we attempt to shut down MPI, we can try to receive any outstanding messages by using MPI_Iprobe. The code is very similar to the code we used to check for new best tour costs. See Program 6.9. In fact, the only messages that are not sent in collectives are the "best tour" message sent to process 0, and the best tour cost broadcasts. The MPI collectives will hang if some process doesn't participate, so we only need to look for unreceived best tours.

In the dynamically load-balanced code (which we'll discuss shortly) there are other messages, including some that are potentially quite large. To handle this situation, we can use the status argument returned by MPI_Iprobe to determine the size of the message and allocate additional storage as necessary (see Exercise 6.23).

URL: https://www.sciencedirect.com/science/article/pii/B9780123742605000063

Mathematical background

Barry Dwyer, in Systems Analysis and Synthesis, 2016

2.7.4.1 Adjacency Matrices

The first structure is a two-dimensional array, or adjacency matrix. The rows of the matrix represent x values, and the columns represent y values. The entry in cell (x, y) is true if the pair (x, y) is a member of the relation, and false otherwise. By extension, a matrix can also represent a labelled graph, the entry in cell (x, y) containing the label of the edge from x to y. The matrix representation allows all (x, y) pairs with a given x value or a given y value to be enumerated easily (by scanning row x or column y), but it usually makes poor use of memory: the number of cells is |X| × |Y|, which may be many times more than the number of pairs in the relation.
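A minimal C sketch of this representation (the relation and its sizes are made up for illustration): a Boolean matrix indexed by x and y, with a row scan and a column scan enumerating the pairs.

#include <stdio.h>
#include <stdbool.h>

#define NX 4   /* size of the domain X   */
#define NY 3   /* size of the codomain Y */

int main(void) {
    /* Boolean adjacency matrix of a small, made-up relation on X x Y:
     * rel[x][y] is true exactly when the pair (x, y) is in the relation. */
    bool rel[NX][NY] = {
        {true,  false, true },
        {false, false, false},
        {true,  true,  false},
        {false, false, true }
    };

    int x = 2;
    printf("pairs with x = %d:", x);          /* scan row x    */
    for (int y = 0; y < NY; y++)
        if (rel[x][y]) printf(" (%d,%d)", x, y);
    printf("\n");

    int y = 2;
    printf("pairs with y = %d:", y);          /* scan column y */
    for (int i = 0; i < NX; i++)
        if (rel[i][y]) printf(" (%d,%d)", i, y);
    printf("\n");
    return 0;
}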

URL: https://www.sciencedirect.com/science/article/pii/B9780128053041000114