Commit a3738f5

initial version of manhattan mst
1 parent 7e4b0db commit a3738f5

2 files changed

+176
-0
lines changed

src/geometry/manhattan-distance.md

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
1+
---
2+
tags:
3+
- Original
4+
---
5+
6+
# Manhattan Distance
7+
8+
## Definition
9+
Suppose we have some points in the plane, and define the distance from point $p$ to point $q$ as the sum of the absolute differences of their $x$ and $y$ coordinates:
10+
11+
$d(p,q) = |p.x - q.x| + |p.y - q.y|$
12+
13+
This is informally known as the [Manhattan distance, or taxicab geometry](https://en.wikipedia.org/wiki/Taxicab_geometry), because we can think of the points as being intersections in a well-designed city, like Manhattan, where you can only move along the streets, as shown in the image below:
14+
15+
This image shows some of the shortest paths from one black point to the other, all of them with length $12$.
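
As a minimal sketch of the definition, the distance can be computed as follows. The `Pt` type and the `manhattan` function are hypothetical names used only for illustration; they are not part of the implementation given later.

```cpp
#include <cstdlib>

// Hypothetical point type, used only for this illustration.
struct Pt {
    long long x, y;
};

// d(p, q) = |p.x - q.x| + |p.y - q.y|
long long manhattan(const Pt& p, const Pt& q) {
    return std::llabs(p.x - q.x) + std::llabs(p.y - q.y);
}
```

For example, `manhattan({0, 0}, {6, 6})` evaluates to $12$: any path between these two points that only moves right or up has that length.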
16+
17+
There are some interesting tricks and algorithms that can be done with this distance, and we will show some of them here.
18+
19+
## Farthest pair of points in Manhattan Distance
20+
21+
Given a set $P$ of $n$ points, we want to find the pair of points $p,q$ that are farthest apart, that is, the pair that maximizes $d(p, q) = |p.x - q.x| + |p.y - q.y|$.
22+
23+
Let's first think in one dimension, so $y=0$. The main observation is that we can brute-force the sign of the absolute value, that is, try both $p.x - q.x$ and $-p.x + q.x$ in place of $|p.x - q.x|$: if we "miss the sign", we only get a smaller value, so it can't affect the maximum. More formally, we have that:
24+
25+
$|p.x - q.x| = \max(p.x - q.x, -p.x + q.x)$
26+
27+
So, for example, we can fix $p$ to be the point whose $x$ takes the plus sign, and then the $x$ of $q$ must take the minus sign. This way we want to find:
28+
$\max_{p, q \in P}(p.x + (-q.x)) = \max_{p \in P}(p.x) + \max_{q \in P}(-q.x)$.
29+
30+
Notice that we can extend this idea to 2 (or more!) dimensions. For $d$ dimensions, we must brute-force the $2^d$ possible sign patterns. For example, if we are in $2$ dimensions and fix that both coordinates of $p$ take the plus sign, we want to find:
31+
32+
$\max_{p, q \in P} \left[(p.x + (-q.x)) + (p.y + (-q.y))\right] = \max_{p \in P}(p.x + p.y) + \max_{q \in P}(-q.x - q.y)$.
33+
34+
Since $p$ and $q$ are now independent, it is easy to find the $p$ and $q$ that maximize the expression.
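
The idea above can be sketched in two dimensions as follows. This is only an illustration under stated assumptions: `Pt` and `farthest_manhattan` are hypothetical names, and the sketch returns only the maximum distance rather than the pair itself. It tries the sign patterns $(+,+)$ and $(+,-)$ for $p$; the remaining two patterns describe the same pairs with $p$ and $q$ swapped, so they can be skipped.

```cpp
#include <algorithm>
#include <climits>
#include <initializer_list>
#include <vector>

// Hypothetical point type, used only for this illustration.
struct Pt {
    long long x, y;
};

// Largest Manhattan distance between any two of the given points, in O(n).
long long farthest_manhattan(const std::vector<Pt>& pts) {
    if (pts.empty()) return 0;
    long long best = 0;
    for (int sy : {1, -1}) {  // sign of y for p; q takes the opposite signs
        long long hi = LLONG_MIN, lo = LLONG_MAX;
        for (const Pt& p : pts) {
            long long v = p.x + sy * p.y;
            hi = std::max(hi, v);  // max over p of ( p.x + sy * p.y)
            lo = std::min(lo, v);  // max over q of (-q.x - sy * q.y) equals -lo
        }
        best = std::max(best, hi - lo);
    }
    return best;
}
```

For the points $(0,0)$, $(1,5)$, $(4,1)$ and $(-2,-3)$ the function returns $11$, which is the distance between $(1,5)$ and $(-2,-3)$.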
35+
36+
## Rotating the points and Chebyshev distance
37+
38+
39+
40+
## Manhattan Minimum Spanning Tree
41+
42+
The Manhattan MST problem consists of, given some points in the plane, finding the edges that connect all the points and have a minimum total sum of weights. The weight of an edge that connects two points is their Manhattan distance. For simplicity, we assume that all points have different locations.
43+
Here we show a way of finding the MST in $O(n \log n)$ by finding for each point its nearest neighbor in each octant, as represented by the image below. This will give us $O(n)$ candidate edges, which are guaranteed to contain an MST. The final step is then to run a standard MST algorithm, for example, [Kruskal's algorithm using disjoint set union](https://cp-algorithms.com/graph/mst_kruskal_with_dsu.html).
44+
45+
The algorithm shown here was first presented in a paper by [H. Zhou, N. Shenoy, and W. Nicholls (2002)](https://ieeexplore.ieee.org/document/913303). There is also another known algorithm that uses a divide-and-conquer approach, by [J. Stolfi](https://www.academia.edu/15667173/On_computing_all_north_east_nearest_neighbors_in_the_L1_metric), which is also very interesting; the two differ only in the way they find the nearest neighbor in each octant.
46+
47+
First, let's understand why it is enough to consider only the nearest neighbor in each octant. The idea is to show that for a point $s$ and any two other points $p$ and $q$ in the same octant of $s$, $dist(p, q) < \max(dist(s, p), dist(s, q))$. This is important, because it shows that if there were an MST where $s$ is connected to both $p$ and $q$, we could erase the longer of these two edges and add the edge $(p,q)$, which would decrease the total cost. To prove this, we assume without loss of generality that $p$ and $q$ are in the octant $R_1$, which is defined by $x_s \leq x$ and $x_s - y_s > x - y$, and then do some casework. The images below give some intuition on why this is true.
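
The inequality can also be sanity-checked by brute force on a small integer grid. The sketch below is not a proof and is not part of the algorithm; `octant_property_holds` is a hypothetical helper that places $s$ at the origin and tests every pair of grid points inside $R_1$.

```cpp
#include <algorithm>
#include <cstdlib>

// Checks dist(p, q) < max(dist(s, p), dist(s, q)) for s = (0, 0) and all
// integer points p, q with coordinates in [0, limit] that lie in R_1 of s,
// i.e. that satisfy x_s <= x and x_s - y_s > x - y.
bool octant_property_holds(int limit) {
    auto in_r1 = [](int x, int y) { return 0 <= x && 0 > x - y; };
    auto dist = [](int ax, int ay, int bx, int by) {
        return std::abs(ax - bx) + std::abs(ay - by);
    };
    for (int px = 0; px <= limit; px++)
        for (int py = 0; py <= limit; py++) {
            if (!in_r1(px, py)) continue;
            for (int qx = 0; qx <= limit; qx++)
                for (int qy = 0; qy <= limit; qy++) {
                    if (!in_r1(qx, qy)) continue;
                    int longest = std::max(dist(0, 0, px, py), dist(0, 0, qx, qy));
                    if (dist(px, py, qx, qy) >= longest) return false;
                }
        }
    return true;
}
```

Note that the strict inequality in the definition of $R_1$ matters: with $s=(0,0)$, $p=(0,2)$ and the boundary point $q=(1,1)$ we get $dist(p,q) = 2 = \max(dist(s,p), dist(s,q))$.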
48+
49+
Therefore, the main question is how to find the nearest neighbor in each octant for each of the $n$ points.
50+
51+
## Nearest Neighbor in each Octant in $O(n \log n)$
52+
53+
For simplicity we focus on the north-east octant. All other directions can be found with the same algorithm by rotating the input.
54+
55+
We will use a sweep-line approach. We process the points from south-west to north-east, that is, by non-decreasing $x + y$. We also keep a set of points which don't have their nearest neighbor yet.
56+
57+
When we add a new point $p$, for every point $s$ in the active set that has $p$ in its octant we can safely assign $p$ as the nearest neighbor. This is true because their distance is $d(p,s) = |x_p - x_s| + |y_p - y_s| = (x_p + y_p) - (x_s + y_s)$, since $p$ is to the north-east of $s$. As none of the later points has a smaller value of $x + y$, because of the processing order, $p$ is guaranteed to have the smallest distance. We can then remove all such points from the active set, and finally add $p$ to this set.
58+
59+
The next question is how to efficiently find which points $s$ have $p$ in the north-east octant. That is, which points $s$ satisfy:
60+
61+
- $x_s \leq x_p$
62+
- $x_p - y_p < x_s - y_s$
63+
64+
Because no point in the active set is in the $R_1$ of another, we also have that for two points $q_1$ and $q_2$ in the active set, $x_{q_1} \neq x_{q_2}$ and $x_{q_1} < x_{q_2} \implies x_{q_1} - y_{q_1} \leq x_{q_2} - y_{q_2}$.
65+
66+
This means that if we keep the active set ordered by $x$, the candidates $s$ are placed consecutively. We can then find the largest $x_s \leq x_p$ and process the points in decreasing order of $x$ until the second condition $x_p - y_p < x_s - y_s$ breaks (we can actually allow $x_p - y_p = x_s - y_s$, which deals with the case of points with equal coordinates). Notice that because each point is removed from the set right after it is processed, the total cost over the whole sweep is $O(n \log n)$.
67+
Now that we have the nearest point in the north-east direction, we rotate the points and repeat. It is possible to show that this way we also find the nearest point in the south-west direction, so we only need to repeat 4 times, instead of 8.
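
To make the rotation concrete, the sketch below lists the coordinates that each of the four passes sees for one input point. It assumes the rotation rule used by the implementation further down (swap $x$ and $y$ after even passes, negate $x$ after odd passes); `Pt` and `pass_views` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <utility>

// Hypothetical point type, used only for this illustration.
struct Pt {
    int x, y;
};

// Coordinates of p as seen by passes 0..3 of the sweep.
std::array<Pt, 4> pass_views(Pt p) {
    std::array<Pt, 4> views{};
    for (int rot = 0; rot < 4; rot++) {
        views[rot] = p;             // what the sweep of this pass works with
        if (rot & 1) p.x = -p.x;    // after an odd pass: mirror x
        else std::swap(p.x, p.y);   // after an even pass: swap the axes
    }
    return views;
}
```

For a point $(x, y)$ the four views are $(x, y)$, $(y, x)$, $(-y, x)$ and $(x, -y)$. Each pass handles one octant directly, and since edges are undirected, the opposite octant comes for free.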
68+
69+
In summary we:
70+
- Sort the points by $x + y$ in non-decreasing order;
71+
- For every point $p$, we iterate over the active set starting with the point with the largest $x$ such that $x \leq x_p$, and we break the loop if $x_p - y_p \geq x_s - y_s$. For every valid point $s$ we add the edge $(s,p, dist(s,p))$ to our list and remove $s$ from the active set;
72+
- We add the point $p$ to the active set;
73+
- Rotate the points and repeat until we iterate over all the octants.
74+
- Apply Kruskal's algorithm to the list of edges to get the MST.
75+
76+
Below you can find an implementation, based on the one from [KACTL](https://github.com/kth-competitive-programming/kactl/blob/main/content/geometry/ManhattanMST.h).
77+
78+
```{.cpp file=manhattan_mst.cpp}
79+
vector<tuple<long long,int,int> > manhattan_mst_edges(vector<point> ps){
80+
vector<int> ids(ps.size());
81+
iota(ids.begin(), ids.end(), 0);
82+
vector<tuple<long long,int,int> > edges;
83+
for(int rot = 0; rot < 4; rot++){ // for every rotation
84+
sort(ids.begin(), ids.end(), [&](int i,int j){
85+
return (ps[i].x + ps[i].y) < (ps[j].x + ps[j].y);
86+
});
87+
map<int, int, greater<int> > active; // (xs, id)
88+
for(auto i : ids){
89+
for(auto it = active.lower_bound(ps[i].x); it != active.end();
90+
active.erase(it++)){
91+
int j = it->second;
92+
if(ps[i].x - ps[i].y > ps[j].x - ps[j].y)break;
93+
assert(ps[i].x >= ps[j].x && ps[i].y >= ps[j].y);
94+
edges.push_back({(ps[i].x - ps[j].x) + (ps[i].y - ps[j].y), i, j});
95+
}
96+
active[ps[i].x] = i;
97+
}
98+
for(auto &p : ps){ // rotate
99+
if(rot&1)p.x *= -1;
100+
else swap(p.x, p.y);
101+
}
102+
}
103+
return edges;
104+
}
105+
```
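
The candidate edges returned above still have to be reduced to a spanning tree. A minimal sketch of that last step, assuming edges in the same `(weight, i, j)` format with 0-indexed vertices, could look like this. The `DSU` struct and `mst_cost_from_edges` are hypothetical names, and the sketch only computes the total cost.

```cpp
#include <algorithm>
#include <numeric>
#include <tuple>
#include <vector>

// Minimal disjoint set union with path compression (illustrative only).
struct DSU {
    std::vector<int> parent;
    explicit DSU(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);  // every vertex is its own root
    }
    int find(int v) { return parent[v] == v ? v : parent[v] = find(parent[v]); }
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Kruskal's algorithm: total weight of a minimum spanning forest of the
// given edges over vertices 0..n-1.
long long mst_cost_from_edges(std::vector<std::tuple<long long, int, int>> edges, int n) {
    std::sort(edges.begin(), edges.end());  // by weight first
    DSU dsu(n);
    long long cost = 0;
    for (const auto& [w, a, b] : edges)
        if (dsu.unite(a, b)) cost += w;  // keep the edge only if it joins two components
    return cost;
}
```

With the points $(0,0)$, $(1,0)$ and $(5,0)$, the edges `{1, 0, 1}`, `{4, 1, 2}` and `{5, 0, 2}` give a cost of $5$.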

test/manhattan_mst.cpp

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
#include <bits/stdc++.h>
2+
using namespace std;
3+
struct point {
4+
int x, y;
5+
};
6+
7+
#include "manhattan_mst.h" // included after `point`, which the snippet uses but does not define
8+
9+
struct DSU {
10+
int n;
11+
vector<int> p, ps;
12+
DSU(int _n){
13+
n = _n;
14+
p = ps = vector<int>(n+1, 1);
15+
for(int i=0;i<=n;i++)p[i] = i; // vertices are 0-indexed, so 0 must also start as its own root
16+
}
17+
int f(int x){return p[x]=(p[x]==x?x:f(p[x]));}
18+
bool join(int a,int b){
19+
a=f(a),b=f(b);
20+
if(a==b)return false;
21+
if(ps[a] > ps[b])swap(a,b);
22+
ps[b] += ps[a];
23+
p[a] = b;
24+
return true;
25+
}
26+
};
27+
28+
long long mst_cost(vector<tuple<long long,int ,int> > e,int n){
29+
sort(e.begin(), e.end());
30+
DSU dsu(n);
31+
long long c=0;
32+
for(auto &[w, i, j] : e){
33+
if(dsu.join(i, j))c += w;
34+
}
35+
return c;
36+
}
37+
38+
vector<tuple<long long,int,int> > brute(vector<point> ps){
39+
vector<tuple<long long,int,int> > e;
40+
for(int i=0;i<ps.size();i++)for(int j=i+1;j<ps.size();j++){
41+
e.push_back({abs(ps[i].x - ps[j].x) + abs(ps[i].y - ps[j].y), i, j});
42+
}
43+
return e;
44+
}
45+
46+
mt19937 rng(123);
47+
vector<point> get_random_points(int n,int maxC){
48+
vector<point> ps;
49+
for(int i=0;i<n;i++){
50+
int x = rng()%(2*maxC) - maxC, y = rng()%(2*maxC) - maxC;
51+
ps.push_back({x, y});
52+
}
53+
return ps;
54+
}
55+
56+
int32_t main(){
57+
vector<int> max_cs = {5, 1000, 100000, 100000000};
58+
vector<int> ns = {5, 100, 500};
59+
for(int maxC : max_cs)for(int n : ns){
60+
auto ps = get_random_points(n, maxC);
61+
auto e1 = brute(ps);
62+
auto e2 = manhattan_mst_edges(ps);
63+
assert(mst_cost(e1, n) == mst_cost(e2, n));
64+
}
65+
auto time_begin = clock();
66+
auto ps = get_random_points(200000, 1000000);
67+
auto e = manhattan_mst_edges(ps);
68+
cerr << setprecision(5) << fixed;
69+
cerr << (double)(clock() - time_begin)/CLOCKS_PER_SEC << endl;
70+
assert((double)(clock() - time_begin)/CLOCKS_PER_SEC < 2);
71+
}
