*Algorithm Design* is an excellent introduction to algorithms. The book’s official site is here:

http://www.aw-bc.com/info/kleinberg/

**Chp. 4 Q27:**

In trying to understand the combinatorial structure of spanning trees, we can consider the space of *all* possible spanning trees of a given graph, and study the properties of this space. This is a strategy that has been applied to many similar problems as well.

Here is one way to do this. Let G be a connected graph, and T and T’ two different spanning trees of G. We say that T and T’ are *neighbors* if T contains exactly one edge that is not in T’, and T’ contains exactly one edge that is not in T. Now, from any graph G, we can build a (large) graph H as follows. The nodes of H are the spanning trees of G, and there is an edge between two nodes H if the corresponding spanning trees are neighbors.

Is it true that for any connected graph G, the resulting graph H is connected? Give a proof that H is always connected, or provided an example (with explanation) of a connected graph G for which H is not connected.

A:

True that the resulting H is always connected. Let G=(V,E) be a connected graph, and T=(V, E1) and T’=(V, E2) be two different spanning trees of G [by definition the spanning trees contain all the vertices of G and some, maybe all, the edges of G].

Hypothesis: Let k be the difference in the number of edges, then there is a path of length k from T to T’ in H.

*Proof by induction on k:*

Base case: k=1. Say T’ contains exactly one edge not in T (it could be vice versa). Then they are neighbors, and by definition of how H is constructed, there is an edge between the two nodes of H corresponding to the two spanning trees because they are neighbors. Since one edge connects the two nodes in H, that edge is a path of length 1.

Inductive step: Assume hypothesis true for k=n. Need to show hypothesis true for k=n+1. T’=(V, E2) and T=(V, E1) have n+1 edges not in common, ie. |E1-E2|=n+1. Choose an edge e that is in E2 but is not in E1. Then tree T∪{e} contains a cycle, and the cycle contains an edge e’ not in E2. Let tree T’’=T∪{e}-{e’}=(V, E3). E3 and E2 have n edges not in common, so by the hypothesis, there is a path of length n from T’ to T’’ in H. T and T’’ are neighbors, so there is a path of length n+1 from T to T’.

**Chp. 5 Q3:**

Suppose you’re consulting for a bank that’s concerned about fraud detection, and they come to you with the following problem. They have a collection of n bank cards that they’ve confiscated, suspecting them of being used in fraud. Each bank card is a small plastic object, containing a magnetic stripe with some encrypted data, and it corresponds to a unique account in the bank. Each account can have many bank cards corresponding to it, and we’ll say that two bank cards are equivalent if they correspond to the same account.

It’s very difficult to read the account number off a bank card directly, but the bank has a high-tech “equivalence tester” that takes two bank cards and, after performing some computations, determines whether they are equivalent.

Their question is the following: among the collection of n cards, is there a set of more than n/2 of them that are allequivalent to one another? Assume that the only feasible operations you can do with the cards are to pick two of themand plug them in to the equivalence tester. Show how to decide the answer to their question with only O(n log n)invocations of the equivalence tester.

A:

Algorithm: EquivTester(set of cards S) n=|S| If n=1 return the only card If n=2 If card1 and card2 are equivalent return card1 Let S1 be the first ⌊n/2⌋ cards. Let S2 be the remaining ⌊n/2⌋ cards. If EquivTester(S1) returns a card test the returned card against all other cards in S If you have not found the card for which more than cards are equivalent If EquivTester(S2) returns a card test the returned card against all other cards in S Return the card for which more than n/2 cards are equivalent if found

Correctness: This algorithm is correct because if more than n/2 cards are equivalent, then when you divide the whole set into two sets, at least one of the half-sets will have more than half of the cards equivalent to the whole set’s majority equivalence. Thus, one of the two recursive calls must return a card equivalent to the whole set’s majority equivalence, and this algorithm compares all returned cards to the larger set, so the majority equivalence will be found.

Running time: Number of times the equivalence tester is invoked for n cards = T(n). There are two recursive calls, each of size n/2, and at most collectively 2n tests for the two returned cards at each level. So T(n) ≤ 2T(n/2) + 2n = O(n log n).

**Chp. 6 Q1:**

a) Take a path 3-5-3. The greedy algorithm will pick 5, but picking the two 3-weight nodes yields a larger total weight (3+3=6>5).

b) Consider the example path provided by the book, 1-8-6-3-6. S1 has total weight 1+6+6=13, and S2 has total weight 8+3=11, so this algorithm will return S1. However, choosing the 8-node and the right-most 6-node yields the maximum total weight 14, so S1 is not correct.

c) Have an array, Score, of size n+1. Score[i] contains the maximum total weight of an independent set on {v1, v2,…,vi}.

Score[0]=0 | O(1) | |

Score[1]=weight of v1 | O(1) | |

For i=2 to n | O(n) | |

Score[i]=max{Score[i-1], weight of vi + Score[i-2]} | O(1) |

Score[n] is the maximum total weight of the independent set. Looking at the max computations, we can follow what the independent set with maximum weight is.

Correctness: The null set has no weight (so Score[0]=0). The set of one node has weight of that node (so Score[1]=weight of v1). For i>1, vi is either in the maximum weight independent set or is not. If it is not, then the max weight is the max weight of the independent set on {v1, v2,…,vi-1}. If it is in the set, then vi-1 cannot be in the set, so we have the weight of vi plus the max weight of the independent set on {v1, v2,…,vi-2}.

Runtime: For some size i where we determine the maximum total weight of an independent set on {v1, v2,…,vi}, we have an operation that can be done in constant time (assignment of constant, or taking max of two values that have been calculated in previous steps). Since we examine max weights for size i from 0 to n, the running time is O(n).

**Chp. 6 Q4:**

Suppose you’re running a lightweight consulting business– just you, two associates,and some rented equipment. Your clients are distributed between the East Coast andthe West Coast, and this leads to the following question.

Each month, you can either run your business from an oce in New York (NY), orfrom an oce in San Francisco (SF). In month i, you’ll incur an operating cost of Niif you run the business out of NY; you’ll incur an operating cost of Si if you run thebusiness out of SF. (It depends on the distribution of client demands for that month.)However, if you run the business out of one city in month i, and then out of the othercity in month i + 1, then you incur a fixed moving cost of M to switch base offices.Given a sequence of n months, a plan is a sequence of n locations– each one equal toeither NY or SF– such that the ith location indicates the city in which you will bebased in the ith month. The cost of a plan is the sum of the operating costs for each ofthe n months, plus a moving cost of M for each time you switch cities. The plan canbegin in either city.

The problem is: Given a value for the moving cost M, and sequences of operatingcosts N1,…,Nn and S1,…,Sn, find a plan of minimum cost. (Such a plan will becalled optimal.)

(a) Show that the following algorithm does not correctly solve this problem, by givingan instance on which it does not return the correct answer.

For i = 1 to n If Ni < Si then Output "NY in Month i" Else Output "SF in Month i" End

In your example, say what the correct answer is and also what the above algorithmfinds.

(b) Give an example of an instance in which every optimal plan must move (i.e. changelocations) at least three times.

Provide an explanation saying why your example has thisproperty.

(c) Give an algorithm that takes values for n, M, and sequences of operating costsN1,...,Nn and S1,...,Sn, and returns the cost of an optimal plan.

The running time of your algorithm should be polynomial in n. You should prove thatyour algorithm works correctly, and include a brief analysis of the running time.

A:

a) Let M=1000. {N1, N2}={1, 2}, {S1, S2}={3, 1}. The optimal plan would be [NY, NY] with total cost 1+2=3, but this algorithm yields plan [NY, SF] with total cost 1+(1000+1)=1002.

b) Let M=2. {N1, N2, N3, N4}={1, 100, 1, 100}, {S1, S2, S3, S4}={100, 1, 100, 1}. The optimal plan would be [NY, SF, NY, SF] with total cost 1+(1+2)+(1+2)+(1+2)=10. The cost of operating in NY and SF are alternately extremely high relative to that of the other city, and since the moving fee is relatively low it is optimal to move three times.

c) Have two arrays, ScoreNY and ScoreSF, of size n+1. ScoreNY[i] is the optimal cost up to month i with operations in month i in NY. ScoreSF[i] is the optimal cost up to month i with operations in month i in SF.

ScoreNY[0]=0 | O(1) | ||

ScoreSF[0]=0 | O(1) | ||

ScoreNY[1]=N1 | O(1) | ||

ScoreSF[1]=S1 | O(1) | ||

For i=2 to n | O(n) | ||

ScoreNY[i]=Ni + min{ScoreNY[i-1], ScoreSF[i-1]+M} | O(1) | ||

ScoreSF[i]=Si + min{ScoreSF[i-1], ScoreNY[i-1]+M} | O(1) | ||

If ScoreNY[n]O(1) | |||

return ScoreNY[n] | O(1) | ||

Else | |||

return ScoreSF[n] | O(1) |

Correctness: The cost of operating for 0 months is 0. The cost of operating for the first month in NY and SF is N1 and S1, respectively, since there are no moving costs as you start out in that city. Say for the last month i you are ending in NY. Then you must add the cost of operating in NY in month i (Ni) plus either the cost of optimally operating and ending in NY the previous month, or the cost of optimally operating and ending in SF the previous month and the moving fee. We take the minimum of the two to determine the minimum cost ScoreNY[i]. Reversing NY and SF, the same holds for ScoreSF[i]. Finally, we return the cost that is cheaper: operating lastly in NY or in SF.

Runtime: For some length of time i where we determine the minimum cost up to i months, we have two operations that can be done in constant time (assignment of constant, or taking min of two values that have been calculated in previous steps). Since we examine min costs for length of time i from 0 to n, the running time is O(n).

**Chp. 8 Q1:**

For each of the two questions below, decide whether the answer is(i) "Yes," (ii) "No," or (iii) "Unknown, because it would resolve the questionof whether P = NP."

a) Let’s define the decision version of the Interval Scheduling Problemfrom Chapter 4 as follows: Given a collection of intervals ona time-line, and a bound k, does the collection contain a subset ofnonoverlapping intervals of size at least k?

Question: Is it the case that Interval Scheduling ≤P Vertex Cover?

b) Question: Is it the case that Independent Set ≤P Interval Scheduling?

A:

a) Yes. Recall that the greedy algorithm used to solve the Interval Scheduling problem is O(n log n) [or see p.121 of book]. The Interval Scheduling problem can be solved in polynomial time without making calls to a black box that solves the Vertex Cover problem. Therefore, the interval scheduling problem can be solved using a polynomial number of standard computational steps, plus a polynomial number of calls (zero calls) to a black box that solves Vertex Cover, hence Interval Scheduling ≤P Vertex Cover.

b) Unknown, because it would resolve the question of whether P=NP.

ie. Independent Set ≤P Interval Scheduling ←→ P=NP

Proof:

→ Independent Set ≤P Interval Scheduling.

(8.1) says: Suppose Y ≤P X. If X can be solved in polynomial time, then Y can be solved in polynomial time.

Interval scheduling can be solved in polynomial time, so Independent Set can be solved in polynomial time. I ndependent Set is NP-complete (8.16).

(8.12) says: Suppose X is an NP-complete problem. Then X is solvable in polynomial time if and only if P=NP.

So Independent Set being NP-complete and solvable in polynomial time imply P=NP.

← P=NP. Independent Set is NP. If P=NP, then Independent Set is P, i.e. Independent Set can be solved in polynomial time. Then Independent Set ≤P Interval Scheduling because Independent Set can be solved in polynomial time with a polynomial number of calls to a black box that solves Interval Scheduling (zero calls are needed).

**Chp. 8 Q2:**

A store trying to analyze the behavior of its customers will often maintaina two-dimensional array A, where the rows correspond to its customersand the columns correspond to the products it sells. The entry A[i, j]specifies the quantity of product j that has been purchased by customer i.

One thing that a store might want to do with this data is the following.Let us say that a subset S of the customers is diverse if no two of theof the customers in S have ever bought the same product (i.e., for eachproduct, at most one of the customers in S has ever bought it). A diverseset of customers can be useful, for example, as a target pool for marketresearch.

We can now define the Diverse Subset Problem’ as follows: Given anm x n array A as defined above, and a number k < m, is there a subset ofat least k of customers that is diverse?

Show that Diverse Subset is NP-complete.

A:

Diverse Subset Problem is NP: Given a set of k customers, it can be checked in polynomial time that no two customers in the set have ever bought the same product.

Independent Set is known to be NP-complete (8.16).

Independent Set ≤P Diverse Subset Problem: Suppose we have a black box for Diverse Subset Problem and want to solve an instance of Independent Set. For our Independent Set Problem, we have a graph G=(V,E) and a number k, and need to find out if G contains an independent set of size (at least) k. We need to reduce the Independent Set Problem to a Diverse Subset Problem. We do this by constructing an array where each v in V is a customer and each e in E is a product, and customer v purchased every product e for which the product edge e touches the customer node v. Then we ask the black box for the Diverse Subset Problem if there is a subset of k customers that is diverse.

The black box for the Diverse Subset Problem will return “Yes” ←→ the Independent Set Problem is “Yes”, i.e. graph G contains an independent set of size k.

→ The black box for the Diverse Subset Problem will return “Yes”: there is a subset of k customers that is diverse, no two customers in the subset have ever bought the same product. Then in the graph that can be constructed from this, no two customer nodes in the diverse subset share an edge, so it is an independent set of size k.

← The Independent Set Problem is “Yes”, i.e. graph G contains an independent set of size k. Then in the corresponding array, the independent set of size k corresponds to a set of customers where only one customer has purchased each product, so there is a diverse subset of size k

Independent Set Problem requires polynomial time to construct the problem as a Diverse Subset Problem and polynomial calls to the Diverse Subset Problem black box. Hence, Independent Set ≤P Diverse Subset Problem.

(8.14) If Y is an NP-complete problem, and X is a problem in NP with the property that Y ≤P X, then X is NP-complete.

Thus, Diverse Subset Problem is NP-complete.

**Chp. 8 Q3:**

Suppose you’re helping to organize a summer sports camp, and thefollowing problem comes up. The camp is supposed to have at least one counselor who’s skilled at each of the n sports covered by the camp(baseball, volleyball, and so on). They have received job applications fromm potential counselors. For each of the n sports, there is some subsetof the m applicants qualified in that sport. The question is: For a givennumber k < m, is it possible to hire at most k of the counselors and haveat least one counselor qualified in each of the n sports? We’ll call this theEfficient Recruiting Problem.

Show that Efficient Recruiting is NP-complete.

A:

Efficient Recruiting is NP: Given a set of k counselors, it can be checked in polynomial time that at least one counselor is qualified in each of the n sports.

Vertex Cover is known to be NP-complete (8.16).

Vertex Cover ≤P Efficient Recruiting: Suppose we have a black box for Efficient Recruiting and want to solve an instance of Vertex Cover. For our Vertex Cover Problem, we have a graph G=(V,E) and a number k, and need to find out if G contains a vertex cover of size (at most) k. We need to reduce the Vertex Cover Problem to an Efficient Recruiting Problem. We do this by assigning each counselor to a node and each edge represents some sport. Each counselor is qualified in the sports for which the sport edge is incident on their corresponding counselor node. Then we ask the black box for the Efficient Recruiting Problem if there is a subset of k counselors that are qualified in all sports.

The black box for Efficient Recruiting will return “Yes” ←→ the Vertex Cover Problem is “Yes”, i.e. graph G contains a vertex cover of size k.

→ The black box for Efficient Recruiting will return “Yes”: there is a subset of k counselors that are qualified in all sports, so every sport edge is incident on at least one node in the set of nodes corresponding to the counselor subset. Hence, this set of nodes is a vertex cover of size k.

← The Vertex Cover Problem is “Yes”, i.e. graph G contains a vertex cover of size k. Then for the subset of counselors that are assigned to the nodes in the vertex cover, each sport edge is incident on at least one node in the vertex cover, so there is a subset of k counselors corresponding to the vertex cover nodes that are qualified in all sports. The black box for Efficient Recruiting will return “Yes”.

Vertex Cover Problem requires polynomial time to construct the problem as an Efficient Recruiting Problem and polynomial calls to the Efficient Recruiting Problem black box. Hence, Vertex Cover ≤P Efficient Recruiting.

(8.14) If Y is an NP-complete problem, and X is a problem in NP with the property that Y ≤P X, then X is NP-complete.

Thus, Efficient Recruiting is NP-complete.

**Chp.8 Q9:**

You are managing a communication network, modeled by a directed graph, G = (V, E). There are cusers who are interested in making use of this network. User i (for each i = 1, 2, . . . , c) issues a requestto reserve a speciﬁc path Pi in G on which to transmit data.

You are interested in accepting as many of the path requests as possible, subject to the followingrestriction: if you accept both Pi and Pj, then Pi and Pj cannot share any nodes.

Thus, the Path Selection Problem asks: Given a directed graph G = (V, E), a set of requests P1, P2, . . . , Pc–each of which must be a path in G–and a number k, is it possible to select at least k of the paths sothat no two of the selected paths share any node?

Prove that Path Selection is NP-complete.

A:

Path Selection Problem is NP: Given n paths, it can be checked in polynomial time that none of the paths share any nodes.

3-Dimensional Matching is NP-complete (8.20).

3-Dimensional Matching ≤P Path Selection Problem: Suppose we have a black box for Path Selection Problem and want to solve an instance of 3-Dimensional Matching. For our 3-Dimensional Matching Problem, we have disjoint sets X, Y, and Z, each of size n, and given a set T⊆ X x Y x Z of ordered triples, and need to find out if there exist a set of n triples in T so that each element of X ∪ Y ∪ Z is contained in exactly one of these triples. We need to reduce the 3-Dimensional Matching Problem to a Path Selection Problem. We do this by constructing a graph G.

For each triple (x, y, z) in T, add nodes x, y, and z to G. Add a directed edge from x to y and a directed edge from y to z to G. Let Pi be the path in G that goes from node x to node y to node z.

Then given a set of path requests we ask the black box for the Path Selection Problem if there are n paths such that no two of the selected paths share any nodes.

The black box for the Path Selection Problem will return “Yes” ←→ the 3-Dimensional Matching Problem is “Yes”, i.e. there exists a set of n triples in T so that each element of X ∪ Y ∪ Z (X, Y, Z are size n) is contained in exactly one of these triples.

→ The black box for the Path Selection Problem will return “Yes”: Given a directed graph G, a set of requests P1, P2, … Pc—each of which is a path in G—and a number n, it is possible to select n paths so that no two of the paths share any nodes. Then for the triples corresponding to the n paths, each triple contains an element of X, an element of Y, and element of Z that is not shared by another path, so not shared by another triple. These n triples form a perfect 3-dimensional matching.

← The 3-Dimensional Matching is “Yes”, i.e. there exists a set of n triples in T so that each element of X ∪ Y ∪ Z (X, Y, Z are size n) is contained in exactly one of these triples. Then the corresponding n paths in the graph do not share any nodes.

3-Dimensional Matching requires polynomial time to construct the problem as a Path Selection Problem and polynomial calls to the Path Selection Problem black box. Hence, 3-Dimensional Matching ≤P Path Selection Problem.

(8.14) If Y is an NP-complete problem, and X is a problem in NP with the property that Y ≤P X, then X is NP-complete.

Thus, Path Selection is NP-complete.