A reference implementation of Propagational Proxy Voting.
Back of Napkin
- visitors create user accounts
- visitors can search for {users, policies, categories, groups}
- a user creates policies; the creating user becomes the manager of the policy. A policy has a title and metadata.
- a user creates categories by selecting one policy to start with; the creating user becomes the manager of the category.
- a user creates groups by assigning themselves as the first member and becomes the manager.
- a user votes on users, policies, categories, and groups
- a manager can edit the {policy, category, group}
- a manager can change the manager to another user
- a manager can change the metadata of the {policy, category, group}; the metadata is a JSON document
- a manager can delete the {policy, category, group} (a rough sketch of this data model follows below)
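As a rough, non-authoritative sketch of the data model described above (class and field names here are illustrative assumptions, not an existing schema), it could look something like this in Python:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: class and field names are assumptions, not the real schema.

@dataclass
class User:
    id: str

@dataclass
class Policy:
    id: str
    title: str
    manager: User                                   # the creating user becomes the manager
    metadata: dict = field(default_factory=dict)    # arbitrary JSON metadata

@dataclass
class Category:
    id: str
    manager: User
    policies: list = field(default_factory=list)    # starts with one selected policy
    metadata: dict = field(default_factory=dict)

@dataclass
class Group:
    id: str
    manager: User                                   # creator is the first member and the manager
    members: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

@dataclass
class Vote:
    source: User
    target_id: str                                  # id of a user, policy, category, or group
    weight: float = 1.0
```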
API and model
To make it usable from other programming languages, the most straightforward API takes a map of votes as input:
```go
votes := make(map[string]map[string]float64)
if _, ok := votes["source"]; !ok {
    votes["source"] = make(map[string]float64)
}
votes["source"]["dest"] = 1.0

ppv := PPV{votes} // potentially expensive
ppv.run()
```
Thinking about using this library from different languages, the wrapper should be as thin as possible.
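As an illustration of how thin such a wrapper could stay, a hypothetical Python binding mirroring the Go example above might look like the sketch below; the `PPV` class and `run()` method are assumptions taken from that snippet, not an existing package:

```python
# Hypothetical thin wrapper sketch: PPV and run() mirror the Go example above
# and are assumptions, not an existing Python package.
from collections import defaultdict


class PPV:
    def __init__(self, votes):
        # votes: {source_id: {target_id: weight}}; stored as-is, no preprocessing
        self.votes = votes

    def run(self):
        # potentially expensive; would delegate straight to the core implementation
        raise NotImplementedError("delegate to the core PPV implementation")


votes = defaultdict(dict)
votes["source"]["dest"] = 1.0
ppv = PPV(dict(votes))
# result = ppv.run()  # potentially expensive
```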
python
naive implementation
We skip the most naive implementation that iterates through the matrix to compute the dot products and use numpy instead.
The following would be the most straightforward implementation: given the matrix \(V\), we do something like…
```python
# a is the accumulated matrix, v is the vote matrix V
while True:
    a = a @ v
    # total weight each policy has accumulated from the delegates
    result_seq = a[SOURCES:, :N_DELEGATE].sum(axis=1)
    if np.sum(np.abs(result_seq - p_result)) < EPSILON:
        break
    p_result = result_seq
    i += 1
```
EPSILON is a constant that defines 'convergence'.
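The snippet above is only the inner loop; calculate_ppv.py itself is not shown in this post, so the following is a minimal self-contained sketch. The setup (random votes, sources only voting "forward" so the iteration terminates) is an assumption for illustration, not the actual script:

```python
import numpy as np

# Minimal self-contained sketch around the loop above. The random vote setup is
# an assumption for illustration, not the actual calculate_ppv.py.
N_DELEGATE, N_INTERMEDIARY, N_POLICY = 100, 0, 10
SOURCES = N_DELEGATE + N_INTERMEDIARY
N = SOURCES + N_POLICY
EPSILON = 1e-9

rng = np.random.default_rng(0)
v = np.zeros((N, N))                      # v[target, source]: columns distribute a source's vote
for s in range(SOURCES):
    # each source splits its vote over a few later nodes (other sources or policies)
    targets = rng.choice(np.arange(s + 1, N), size=3, replace=False)
    v[targets, s] = 1.0 / len(targets)
v[SOURCES:, SOURCES:] = np.eye(N_POLICY)  # policies keep the weight they receive

a = v.copy()
p_result = np.zeros(N_POLICY)
i = 0
while True:
    a = a @ v
    result_seq = a[SOURCES:, :N_DELEGATE].sum(axis=1)
    if np.sum(np.abs(result_seq - p_result)) < EPSILON:
        break
    p_result = result_seq
    i += 1

print(f"converged after {i} iterations: {result_seq}")
```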
Command | Mean [ms] | Min [ms] | Max [ms] |
---|---|---|---|
delegates: 10 | 91.2 ± 11.2 | 73.7 | 115.0 |
delegates: 60 | 100.7 ± 12.0 | 81.1 | 136.6 |
delegates: 110 | 140.3 ± 55.3 | 99.9 | 333.3 |
delegates: 160 | 174.6 ± 24.2 | 140.6 | 242.8 |
delegates: 210 | 274.7 ± 41.1 | 209.1 | 362.5 |
delegates: 260 | 380.6 ± 36.7 | 318.2 | 433.9 |
delegates: 310 | 524.7 ± 65.8 | 477.4 | 694.1 |
delegates: 360 | 719.8 ± 37.7 | 654.6 | 791.3 |
delegates: 410 | 1063.9 ± 88.2 | 985.8 | 1297.1 |
delegates: 460 | 1380.5 ± 104.2 | 1226.6 | 1525.1 |
delegates: 1000 | 161740 ± 258 | 15843 | 16719 |
So just having 1,000 delegates can already be quite slow… We want to handle at least 10,522 voters; this is never going to converge in any reasonable time.
breaking up the matrix
One thing we observed is that when we have more policies, this converges faster…
Command | Mean [ms] | Min [ms] | Max [ms] |
---|---|---|---|
100 0 10 | 129.5 ± 16.0 | 106.3 | 185.2 |
100 0 60 | 105.7 ± 11.0 | 89.2 | 132.6 |
100 0 110 | 122.2 ± 13.0 | 98.9 | 153.8 |
100 0 160 | 133.8 ± 14.7 | 111.5 | 162.3 |
100 0 210 | 150.0 ± 15.2 | 123.6 | 182.8 |
100 0 260 | 165.1 ± 61.0 | 129.9 | 406.9 |
100 0 310 | 177.9 ± 25.8 | 147.8 | 246.0 |
100 0 360 | 185.3 ± 10.6 | 170.2 | 204.8 |
100 0 410 | 211.9 ± 16.1 | 191.2 | 245.4 |
100 0 460 | 222.2 ± 18.8 | 196.3 | 269.2 |
We see that having only 10 policies is slower than having 60–110, even though more policies mean a larger voting matrix overall. This hints that the balance between sources and policies has an effect on the computational load.
In the end it comes down to this matrix, and we are only interested in \(V_1\) and \(V_2\).
\begin{aligned}
V^{t+1} &= V^t \cdot A \\
&= \left[ \begin{array}{c|c} V_{1}^{t} & \mathbf{0} \\ \hline V_{2}^{t} & I \end{array} \right] \cdot \left[ \begin{array}{c|c} A_1 & \mathbf{0} \\ \hline A_2 & I \end{array} \right] \\
V_1^{t+1} &= V_1^{t} \cdot A_1 \\
V_2^{t+1} &= V_2^{t} \cdot A_1 + A_2 \\
\end{aligned}

This helps when there are lots of policies relative to sources (delegates and intermediaries).
```python
sm_1 = v[:SOURCES, :SOURCES]     # votes among sources
sm_2 = v[SOURCES:, :SOURCES].T   # votes from sources to policies, transposed
# sma_1 / sma_2 accumulate the iterated blocks (initialisation not shown in the post)
while True:
    sma_2 = sma_1 @ sm_2 + sma_2
    sma_1 = sma_1 @ sm_1
    result_sm = sma_2[:N_DELEGATE, :].sum(axis=0)
    if np.sum(np.abs(result_sm - p_result)) < EPSILON:
        break
    p_result = result_sm
    i += 1
```
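A quick way to convince ourselves the block split in the formula is consistent is to check the recursion against the full matrix product on a tiny example; the block sizes and random values below are assumptions made purely for this check:

```python
import numpy as np

# Sanity check: the block recursion reproduces the full matrix product.
# Block sizes and random values are made up just for this check.
SOURCES, POLICIES = 5, 3
rng = np.random.default_rng(1)

a1 = np.tril(rng.random((SOURCES, SOURCES)), k=-1)   # source -> source votes
a2 = rng.random((POLICIES, SOURCES))                 # source -> policy votes
col = a1.sum(axis=0) + a2.sum(axis=0)
a1, a2 = a1 / col, a2 / col                          # each source column sums to 1

v = np.zeros((SOURCES + POLICIES, SOURCES + POLICIES))
v[:SOURCES, :SOURCES] = a1
v[SOURCES:, :SOURCES] = a2
v[SOURCES:, SOURCES:] = np.eye(POLICIES)             # policies keep their votes

full = v.copy()
v1, v2 = a1.copy(), a2.copy()
for _ in range(10):
    full = full @ v
    v2 = v2 @ a1 + a2        # V2^{t+1} = V2^t · A1 + A2
    v1 = v1 @ a1             # V1^{t+1} = V1^t · A1
    assert np.allclose(full[:SOURCES, :SOURCES], v1)
    assert np.allclose(full[SOURCES:, :SOURCES], v2)

print("sub-matrix recursion matches the full product")
```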
Command | Mean [ms] | Min [ms] | Max [ms] |
---|---|---|---|
100 0 10 | 132.0 ± 14.7 | 106.6 | 158.1 |
100 0 60 | 102.3 ± 10.7 | 85.0 | 132.3 |
100 0 110 | 115.0 ± 23.0 | 86.3 | 204.4 |
100 0 160 | 119.6 ± 20.9 | 89.5 | 166.9 |
100 0 210 | 114.7 ± 19.4 | 93.9 | 164.9 |
100 0 260 | 120.8 ± 15.4 | 84.0 | 153.1 |
100 0 310 | 118.9 ± 15.0 | 99.1 | 150.5 |
100 0 360 | 129.6 ± 20.1 | 105.1 | 194.6 |
100 0 410 | 130.0 ± 20.3 | 101.8 | 181.5 |
100 0 460 | 140.3 ± 38.6 | 106.6 | 303.3 |
We see the same trend here: 60–360 policies is faster than 10. It's nice to see a small speed bump, though the differences become negligible when the number of sources gets big…
```
Benchmark 1: python calculate_ppv.py 500 0 50
  Time (mean ± σ):      1.706 s ±  0.091 s    [User: 14.242 s, System: 1.692 s]

Benchmark 1: python calculate_ppv.py 500 0 50
  Time (mean ± σ):      1.891 s ±  0.089 s    [User: 15.547 s, System: 1.837 s]
```
The case below is the worst case for the sub-matrix approach relative to the sequential version.
```
# sequential
  Time (mean ± σ):     24.685 s ±  0.887 s    [User: 217.134 s, System: 16.602 s]
  Range (min … max):   23.928 s … 27.032 s    10 runs

# sub matrix
  Time (mean ± σ):     27.390 s ±  2.447 s    [User: 237.191 s, System: 16.035 s]
  Range (min … max):   22.516 s … 30.215 s    10 runs
```
But as soon as we have more policies, breaking this up into sub-matrices gives some speed increase.
```
# sequential
  Time (mean ± σ):     12.831 s ±  1.350 s    [User: 112.784 s, System: 9.302 s]
  Range (min … max):   12.039 s … 16.446 s    10 runs

# sub matrix
  Time (mean ± σ):     12.467 s ±  0.241 s    [User: 110.049 s, System: 9.258 s]
  Range (min … max):   12.097 s … 12.904 s    10 runs
```
floating point arithmetic
But it gets interesting when we look at the results. Mathematically these are the same, yet we see some discrepancies.
```
sequential iteration: 7677 took: 21.688924542000223
sub matrix iteration: 7524 took: 21.49630766600012
result delta: 0.13598852449656817
```
We see a smaller delta when we have more policies.
nodes | seq iteration | sub iteration | delta |
---|---|---|---|
500 0 02 | 7677 | 7524 | 0.13598852449656817 |
500 0 04 | 4034 | 3984 | 0.0996262102025618 |
500 0 50 | 380 | 364 | 0.006377236222159261 |
500 0 100 | 207 | 194 | 0.003150795373400985 |
So the more iterations, the bigger the difference. This comes from multiplying values that are orders of magnitude apart. We try to mitigate this by making sure we use f64, but we will keep an eye on it.
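A tiny, standalone illustration of the effect (the numbers below are made up and unrelated to the PPV matrices): float64 addition is not associative, so two mathematically identical update orders can drift apart over thousands of iterations.

```python
import numpy as np

# float64 addition is not associative: identical math, different grouping,
# different result. The values are made up purely for illustration.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False

# adding a small value to a huge one loses it entirely
x = np.float64(1e16)
print((x + 1.0) - x)    # 0.0 -> the 1.0 disappeared next to 1e16
print(1.0 + (x - x))    # 1.0 -> same expression, different grouping
```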
n to the power of 2
We know that we can speed things up by multiplying the accumulated matrix with itself, i.e. squaring it each step: \(V^{2t} = V^{t} \cdot V^{t}\). Let's do this with both the sequential pattern and the sub-matrix version.
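The post does not show the squaring loop itself, so here is a sketch using the same variable names as the earlier snippets (v, SOURCES, N_DELEGATE, EPSILON); after \(i\) steps the accumulated matrix holds \(V^{2^i}\):

```python
import numpy as np

# Sketch of the squaring variant (not shown in the post); v, SOURCES,
# N_DELEGATE and EPSILON are the same names as in the earlier snippets.
a = v.copy()
p_result = np.zeros(v.shape[0] - SOURCES)   # one entry per policy
i = 0
while True:
    a = a @ a                               # after i steps: a == V^(2^i)
    result_sq = a[SOURCES:, :N_DELEGATE].sum(axis=1)
    if np.sum(np.abs(result_sq - p_result)) < EPSILON:
        break
    p_result = result_sq
    i += 1
```

For the sub-matrix split, the same trick squares \(V_1\) and folds \(V_2\) in: \(V_1^{2t} = (V_1^{t})^2\) and \(V_2^{2t} = V_2^{t} \cdot V_1^{t} + V_2^{t}\).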
Command | Mean [ms] | Min [ms] | Max [ms] |
---|---|---|---|
10 | 81.9 ± 8.6 | 69.8 | 106.6 |
60 | 84.4 ± 8.4 | 72.6 | 99.8 |
110 | 92.0 ± 13.1 | 75.2 | 140.5 |
160 | 99.1 ± 9.2 | 81.4 | 115.6 |
210 | 108.8 ± 8.7 | 89.8 | 130.9 |
260 | 117.8 ± 7.4 | 105.1 | 134.8 |
310 | 136.6 ± 13.0 | 118.5 | 163.6 |
360 | 150.6 ± 6.2 | 140.5 | 166.9 |
410 | 187.6 ± 36.2 | 162.5 | 321.0 |
460 | 198.2 ± 12.8 | 170.7 | 216.2 |
1000 | 625.3 ± 24.5 | 595.3 | 662.5 |
5000 | 20862 ± 441 | 19915 | 21301 |
Wow, that is a 271x speed bump for larger matrices, and 20 seconds is still a wait we can live with.
sub matrix version
Command | Mean [ms] | Min [ms] | Max [ms] |
---|---|---|---|
10 | 98.1 ± 12.9 | 83.6 | 145.1 |
60 | 102.9 ± 7.8 | 92.2 | 121.4 |
110 | 110.4 ± 10.1 | 91.7 | 131.8 |
160 | 122.2 ± 12.3 | 101.5 | 146.6 |
210 | 141.5 ± 18.2 | 119.2 | 178.2 |
260 | 148.0 ± 9.5 | 125.4 | 162.9 |
310 | 177.5 ± 43.8 | 154.4 | 350.1 |
360 | 180.4 ± 14.6 | 162.4 | 218.9 |
410 | 222.0 ± 17.0 | 198.8 | 252.7 |
460 | 233.9 ± 19.9 | 207.9 | 283.7 |
1000 | 714.2 ± 13.7 | 681.0 | 732.6 |
5000 | 26627 ± 803 | 25886 | 28594 |
Here we see a slight slowdown compared to the plain multiplication.
zig
Let's use Zig and see if this helps. numpy uses BLAS under the hood, so we should use BLAS as well.
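To see which BLAS numpy is actually linked against (i.e. what we are comparing ourselves with), numpy can print its build configuration:

```python
import numpy as np

# Print the BLAS/LAPACK libraries this numpy build is linked against.
np.show_config()
```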
```sh
zig test src/main.zig -lc -L/System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/libBLAS.dynlib -lBLAS -OReleaseFast
```
Command | Mean [µs] | Min [µs] | Max [µs] |
---|---|---|---|
./main 10 | 747.7 ± 155.1 | 503.6 | 1555.9 |
./main 60 | 1173.5 ± 143.3 | 936.6 | 1961.8 |
./main 110 | 1959.6 ± 157.7 | 1701.4 | 2811.6 |
./main 160 | 3253.8 ± 156.4 | 2915.4 | 4063.2 |
./main 210 | 5047.6 ± 149.9 | 4683.0 | 5937.6 |
./main 260 | 6630.8 ± 179.3 | 6199.3 | 7487.5 |
./main 310 | 8609.8 ± 170.4 | 8167.1 | 9566.0 |
./main 360 | 11320.0 ± 194.3 | 10877.6 | 12363.8 |
./main 410 | 14868.5 ± 185.5 | 14407.6 | 15442.5 |
./main 460 | 18525.8 ± 290.8 | 18026.3 | 20759.3 |
We dipped a whole order of magnitude lower!
Command | Mean | Min | Max |
---|---|---|---|
./main 1000 | 105.7 ms ± 1.5 ms | 103.2 ms | 109.2 ms |
./main 5000 | 8.316 s ± 0.033 s | 8.281 s | 8.379 s |
./main 10000 | 64.337 s ± 0.162 s | 64.122 s | 64.657 s |
That is a 1530x speed increase over the original Python numpy implementation with 1000 nodes! The other thing is that the calculation doesn't hog the CPU!
To push the numbers beyond this, I think we need to go deeper into GPUs.