libppv

A reference implementation Propagational Proxy Voting.

Back of Napkin

visitors creates user accounts
visitors can search for {users, policies, categories, groups}. The
user creates policies. This user becomes the manager of this policy. The policy has a title and metadata.
user creates categories, by selecting one policy to start with. This user becomes the manager of this policy.
user creates groups, by assinging themselves as the first member and becomes the manager.
user votes to users, policies, categories, groups
A manager can edit the {policy, category, group}.
- manager can change the manager to another user
- manager can change the metadata of the {policy, category, group}, meta data is a json file
- manager can delete the {policy, category, group}

API and model

from other programming languages

The most straight forward api will be, using a map of votes as input

votes := make(map[string]map[string]float64)

if _, ok := votes["source"]; !ok {
  votes["source"] = make(map[string]float64)
}

votes["source"]["dest"] = 1.0

ppv := PPV{votes}

// potentially expensive
ppv.run()

thinking to use this library in different languages, the wrapper should be as thin as possible.

python

naive implementation

we skip the most naive implementation to iterate through the matrix to calculate the dot product. we use numpy instead.

The following would be the most straight forward implementation, given the matrix \(V\), we will do something like..

while True:
    a = a @ v
    result_seq = a[SOURCES:, :N_DELEGATE].sum(axis=1)
    if np.sum(np.abs(result_seq - p_result)) < EPSILON:
        break
    p_result = result_seq
    i += 1

EPSILON is a constant to define ’convergence’.

Command	Mean [ms]	Min [ms]	Max [ms]
`delegates: 10`	91.2 ± 11.2	73.7	115.0
`delegates: 60`	100.7 ± 12.0	81.1	136.6
`delegates: 110`	140.3 ± 55.3	99.9	333.3
`delegates: 160`	174.6 ± 24.2	140.6	242.8
`delegates: 210`	274.7 ± 41.1	209.1	362.5
`delegates: 260`	380.6 ± 36.7	318.2	433.9
`delegates: 310`	524.7 ± 65.8	477.4	694.1
`delegates: 360`	719.8 ± 37.7	654.6	791.3
`delegates: 410`	1063.9 ± 88.2	985.8	1297.1
`delegates: 460`	1380.5 ± 104.2	1226.6	1525.1
`delegates: 1000`	161740 ± 258	15843	16719

so just by having 1000 delegates can be quite slow…

We want to handle at least 10,522 voters. This will never going to converge.

breaking up the matrix

One thing we observed is that when we have more policies, this converges faster…

Command	Mean [ms]	Min [ms]	Max [ms]
`100 0 10`	129.5 ± 16.0	106.3	185.2
`100 0 60`	105.7 ± 11.0	89.2	132.6
`100 0 110`	122.2 ± 13.0	98.9	153.8
`100 0 160`	133.8 ± 14.7	111.5	162.3
`100 0 210`	150.0 ± 15.2	123.6	182.8
`100 0 260`	165.1 ± 61.0	129.9	406.9
`100 0 310`	177.9 ± 25.8	147.8	246.0
`100 0 360`	185.3 ± 10.6	170.2	204.8
`100 0 410`	211.9 ± 16.1	191.2	245.4
`100 0 460`	222.2 ± 18.8	196.3	269.2

we see that policies with only 10 is slower than having 60-110 nodes…. Even the fact that it contributes to having a large voting matrix overall.

This give a hint that the balance between the sources and policies do have an effect on the computational load.

At the end it is this matrix, and we are only interested in \(V_1\) and \(V_2\).

\begin{aligned} V^{t+1} &= V^t \cdot A \\ &= \left[ \begin{array}{c|c} V_{1}^{t} & \mathbf{0} \\ \hline V_{2}^{t} & I \\ \end{array} \right] \cdot \left[ \begin{array}{c|c} A_1 & \mathbf{0} \\ \hline A_2 & I \end{array} \right] \\ V_1^{t+1} &= V_1^{t} \cdot A_1 \\ V_2^{t+1} &= V_1^{t} \cdot A_2 + A_2 \\ \end{aligned}

This helps when there are lots of policies relative to sources(delegates and intermediaries)

sm_1 = v[:SOURCES, :SOURCES]
sm_2 = v[SOURCES:, :SOURCES].T

while True:

    sma_2 = sma_1 @ sm_2 + sma_2
    sma_1 = sma_1 @ sm_1

    result_sm = sma_2[:N_DELEGATE, :].sum(axis=0)

    if np.sum(np.abs(result_sm - p_result)) < EPSILON:
        break

    p_result = result_sm

    i += 1

Command	Mean [ms]	Min [ms]	Max [ms]
`100 0 10`	132.0 ± 14.7	106.6	158.1
`100 0 60`	102.3 ± 10.7	85.0	132.3
`100 0 110`	115.0 ± 23.0	86.3	204.4
`100 0 160`	119.6 ± 20.9	89.5	166.9
`100 0 210`	114.7 ± 19.4	93.9	164.9
`100 0 260`	120.8 ± 15.4	84.0	153.1
`100 0 310`	118.9 ± 15.0	99.1	150.5
`100 0 360`	129.6 ± 20.1	105.1	194.6
`100 0 410`	130.0 ± 20.3	101.8	181.5
`100 0 460`	140.3 ± 38.6	106.6	303.3

We see the same trend here. 60-360 is faster than 10. It’s nice to see a small speed bump though.

but the differences get negligible when the number of sources gets big….

Benchmark 1: python calculate_ppv.py 500 0 50
  Time (mean ± σ):      1.706 s ±  0.091 s    [User: 14.242 s, System: 1.692 s]

Benchmark 1: python calculate_ppv.py 500 0 50
  Time (mean ± σ):      1.891 s ±  0.089 s    [User: 15.547 s, System: 1.837 s]

This case below is the worst case to the sequential version.

# sequential
  Time (mean ± σ):     24.685 s ±  0.887 s    [User: 217.134 s, System: 16.602 s]
  Range (min … max):   23.928 s … 27.032 s    10 runs
# sub matrix
  Time (mean ± σ):     27.390 s ±  2.447 s    [User: 237.191 s, System: 16.035 s]
  Range (min … max):   22.516 s … 30.215 s    10 runs

but as soon as we have more policies, breaking this up to sub matrix has some speed increase.

# sequential
  Time (mean ± σ):     12.831 s ±  1.350 s    [User: 112.784 s, System: 9.302 s]
  Range (min … max):   12.039 s … 16.446 s    10 runs
# sub matrix
  Time (mean ± σ):     12.467 s ±  0.241 s    [User: 110.049 s, System: 9.258 s]
  Range (min … max):   12.097 s … 12.904 s    10 runs

floating point arithmatic

but, it gets interesting when we see the results. Mathmatically this is the same, yet we see some discrepancies.

sequential
    iteration:  7677
    took:  21.688924542000223

sub matrix
    iteration:  7524
    took:  21.49630766600012

result delta:  0.13598852449656817

we see smaller delta when we have more policies.

nodes	seq iteration	sub iteration	delta
500 0 02	7677	7524	0.13598852449656817
500 0 04	4034	3984	0.0996262102025618
500 0 50	380	364	0.006377236222159261
500 0 100	207	194	0.003150795373400985

so the more the iteration, we have more differences. This comes fome multipling two values that are order of magnitudes different to each other. We are trying to mitigate this by making sure we use f64, but we will see this.

n to the power of 2

we know that we can speed things up by multipling \(V\) together, meaning we can \(V^{t+1} = V \cdot V\). Lets do this with both the sequential pattern and sub matrix.

Command	Mean [ms]	Min [ms]	Max [ms]
`10`	81.9 ± 8.6	69.8	106.6
`60`	84.4 ± 8.4	72.6	99.8
`110`	92.0 ± 13.1	75.2	140.5
`160`	99.1 ± 9.2	81.4	115.6
`210`	108.8 ± 8.7	89.8	130.9
`260`	117.8 ± 7.4	105.1	134.8
`310`	136.6 ± 13.0	118.5	163.6
`360`	150.6 ± 6.2	140.5	166.9
`410`	187.6 ± 36.2	162.5	321.0
`460`	198.2 ± 12.8	170.7	216.2
`1000`	625.3 ± 24.5	595.3	662.5
`5000`	20862 ± 441	19915	21301

wow, that is a x271 speed bump for larger matrices. We can still wait for 20 seconds.

sub matrix version

Command	Mean [ms]	Min [ms]	Max [ms]
`10`	98.1 ± 12.9	83.6	145.1
`60`	102.9 ± 7.8	92.2	121.4
`110`	110.4 ± 10.1	91.7	131.8
`160`	122.2 ± 12.3	101.5	146.6
`210`	141.5 ± 18.2	119.2	178.2
`260`	148.0 ± 9.5	125.4	162.9
`310`	177.5 ± 43.8	154.4	350.1
`360`	180.4 ± 14.6	162.4	218.9
`410`	222.0 ± 17.0	198.8	252.7
`460`	233.9 ± 19.9	207.9	283.7
`1000`	714.2 ± 13.7	681.0	732.6
`5000`	26.627 ± 803	25886	28594

here we see a slight speed down compared to the normal multiplitication.

zig

lets use zig, see if this will help. numpy uses BLAS under the hood, so we should also use this.

zig test src/main.zig -lc -L/System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/libBLAS.dynlib -lBLAS -OReleaseFast

Command	Mean	Min	Max
`./main 10`	747.7µs ± 155.1	503.6µs	1555.9µs
`./main 60`	1173.5µs ± 143.3	936.6µs	1961.8µs
`./main 110`	1959.6µs ± 157.7	1701.4µs	2811.6µs
`./main 160`	3253.8µs ± 156.4	2915.4µs	4063.2µs
`./main 210`	5047.6µs ± 149.9	4683.0µs	5937.6µs
`./main 260`	6630.8µs ± 179.3	6199.3µs	7487.5µs
`./main 310`	8609.8µs ± 170.4	8167.1µs	9566.0µs
`./main 360`	11320.0µs ± 194.3	10877.6µs	12363.8µs
`./main 410`	14868.5µs ± 185.5	14407.6µs	15442.5µs
`./main 460`	18.5258ms ± .2908	18.0263ms	20759.3µs

we dipped into sub order!

Command	Mean	Min	Max
`./main 1000`	105.7 ms ± 1.5 ms	103.2 ms	109.2 ms
`./main 5000`	8.316 s ± 0.033 s	8.281 s	8.379 s
`./main 10000`	64.337 s ± 0.162 s	64.122 s	64.657 s

thats is an 1530x speed increase than the original python numpy implmentation with 1000 nodes! The other thing is that the calculation doensn’t hog the cpu!

To increase the number than this, I think we need to go deeper into GPUs.