# Pubsub Test Analysis Notebook

## Usage

This notebook analyzes the output of a single pubsub test execution.

You must run all the cells (`Cell > Run All` menu) to load the data and generate the charts.

The first run may take a long time and use considerable RAM, but the
results are cached to `ANALYSIS_DIR/pandas` for future runs.
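
The notebook can also be run non-interactively: the parameters cell further down notes that its values can be overridden using papermill. A minimal sketch of building such a command line follows. The notebook file names are hypothetical placeholders, and `papermill_command` is an illustrative helper, not part of this repository.

```python
import shlex

def papermill_command(notebook, output, params):
    """Build a papermill CLI invocation that overrides notebook parameters.

    Uses papermill's `-p NAME VALUE` option once per parameter.
    """
    cmd = ['papermill', notebook, output]
    for name, value in params.items():
        cmd += ['-p', name, str(value)]
    return cmd

# Hypothetical file names; substitute the real notebook path.
cmd = papermill_command(
    'analysis.ipynb',
    'analysis-out.ipynb',
    {'ANALYSIS_DIR': './my-test-output', 'FIGURE_OUT': './my-test-output/figures'},
)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run`.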

<p/>

<details>
    <summary>Expand to show example data</summary>

### `scores`

The `scores` `DataFrame` contains peer score events, indexed by timestamp.

**Example**:

| timestamp                     | observer                                             | peer                                                 |     score |
|-------------------------------|------------------------------------------------------|------------------------------------------------------|-----------|
| 2020-03-31 16:46:22.675526100 | 12D3KooWM1Q8EazdTBaYidtmAnaEqjtAzCGkYypzjgUmDU3bNpAS | 12D3KooWGHEfmKenMpsGDu1xEuVw7itsEMMT6oBVbYx642627Etw |  0.0027   |
| 2020-03-31 16:46:22.675526100 | 12D3KooWM1Q8EazdTBaYidtmAnaEqjtAzCGkYypzjgUmDU3bNpAS | 12D3KooWN5fa46NhuVP9as8ANmZ54gAM5cPuhTuu1bMAY5wvgoPG | 16.5617   |


- `observer` is the peer assigning the score
- `peer` is the peer receiving the score
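
As a rough illustration of this layout, the sketch below builds a tiny `scores`-shaped frame by hand and computes the mean score each peer received. The peer names (`peerA`, `peerB`, `peerC`) and all values are invented for the example.

```python
import pandas as pd

# Toy stand-in for the real `scores` table: timestamp index plus
# observer / peer / score columns. All values are invented.
toy_scores = pd.DataFrame(
    {
        'observer': ['peerA', 'peerA', 'peerB'],
        'peer': ['peerB', 'peerC', 'peerC'],
        'score': [0.5, 16.0, 4.0],
    },
    index=pd.to_datetime([
        '2020-03-31 16:46:22',
        '2020-03-31 16:46:22',
        '2020-03-31 16:46:27',
    ]),
)
toy_scores.index.name = 'timestamp'

# Mean score received by each peer, across all observers.
mean_received = toy_scores.groupby('peer')['score'].mean()
print(mean_received)
```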

### `metrics`

The `metrics` `DataFrame` contains aggregated tracer metrics.

**Example**:

|    |   published |   rejected |   delivered |   duplicates |   droppedrpc |   peersadded |   peersremoved |   topicsjoined |   topicsleft | peer                                                 |   sent_rpcs |   sent_messages |   sent_grafts |   sent_prunes |   sent_iwants |   sent_ihaves |   recv_rpcs |   recv_messages |   recv_grafts |   recv_prunes |   recv_iwants |   recv_ihaves |
|----|-------------|------------|-------------|--------------|--------------|--------------|----------------|----------------|--------------|------------------------------------------------------|-------------|-----------------|---------------|---------------|---------------|---------------|-------------|-----------------|---------------|---------------|---------------|---------------|
|  0 |           0 |          0 |         721 |            0 |            0 |            2 |              1 |              1 |            0 | 12D3KooWM1Q8EazdTBaYidtmAnaEqjtAzCGkYypzjgUmDU3bNpAS |         722 |             721 |             0 |             0 |             0 |             0 |         727 |             721 |             2 |             0 |             0 |             0 |
|  1 |           0 |          0 |         721 |            0 |            0 |            2 |              1 |              1 |            0 | 12D3KooWN5fa46NhuVP9as8ANmZ54gAM5cPuhTuu1bMAY5wvgoPG |         724 |             721 |             1 |             0 |             0 |             0 |         726 |             721 |             1 |             0 |             0 |             0 |
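
Because each row describes one peer, per-peer ratios fall out of simple column arithmetic. The sketch below uses a hand-made frame with a subset of the columns above and invented values. The ratio it computes is only an illustration, not a figure the notebook itself reports.

```python
import pandas as pd

# Toy stand-in for `metrics` with a subset of its columns; values are invented.
toy_metrics = pd.DataFrame({
    'peer': ['peerA', 'peerB'],
    'delivered': [720, 700],
    'duplicates': [80, 0],
})

# Duplicates seen per delivered message, per peer.
toy_metrics['dup_per_delivered'] = (
    toy_metrics['duplicates'] / toy_metrics['delivered']
)
ratios = toy_metrics.set_index('peer')['dup_per_delivered']
print(ratios)
```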

    
</details>

In [None]:
# Parameters in this cell can be overridden using papermill

# path to directory containing output from the extract_test_outputs method in analyze.py
ANALYSIS_DIR="."

# dir to save figure images
FIGURE_OUT="./figures"

# path to zip file containing all figures
FIGURE_ZIP_OUT="./figures.zip"

# font sizes
SMALL_SIZE = 8
MEDIUM_SIZE = 10
BIGGER_SIZE = 18

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import toml
import ipywidgets as widgets
from pprint import pprint
import pathlib
import seaborn as sns
from durations import Duration

import notebook_helper
from notebook_helper import no_scores_message, load_pandas, archive_figures, p25, p50, p75, p95, p99

# load a helper to save figures to FIGURE_OUT
save_fig = notebook_helper.save_fig_fn(FIGURE_OUT)

# render charts in a larger, zoomable style
%matplotlib notebook

# turn off autosaving for the notebook
%autosave 0

# prettify the colors
sns.set(color_codes=True)

# helper to set font sizes for charts
def set_chart_fontsize(size):
    plt.rc('font', size=size)          # controls default text sizes
    plt.rc('axes', titlesize=size)     # fontsize of the axes title
    plt.rc('axes', labelsize=size)     # fontsize of the x and y labels
    plt.rc('xtick', labelsize=size)    # fontsize of the tick labels
    plt.rc('ytick', labelsize=size)    # fontsize of the tick labels
    plt.rc('legend', fontsize=size)    # legend fontsize
    plt.rc('figure', titlesize=size)   # fontsize of the figure title

    
# set chart fonts to BIGGER_SIZE by default
set_chart_fontsize(BIGGER_SIZE)

# load data
print('loading test data from ' + ANALYSIS_DIR)

tables = load_pandas(ANALYSIS_DIR)
scores = tables['scores']
metrics = tables['metrics']
cdf = tables['cdf']
pdf = tables['pdf']
peers = tables['peers']

# resample score index into 5s windows for plotting later
if not scores.empty:
    print('resampling peer scores')
    resample_interval = '5s'
    sampled = scores.resample(resample_interval)

t_warm = peers['t_warm'].max()
t_run = peers['t_run'].max()
t_cool = peers['t_cool'].max()
t_complete = peers['t_complete'].max()

params_panel, test_params = notebook_helper.test_params_panel(ANALYSIS_DIR)

## Test Parameters

The cell below shows the parameters that were used to create the composition file for this test run.

You can access the parameter values from other cells via the `test_params` dict, e.g.:

```python
from durations import Duration
warmup = Duration(test_params['T_WARM'])
print('warmup seconds: {}'.format(warmup.to_seconds()))
```

In [None]:
params_panel

### Aggregations

#### Latency Distribution

#### CDF
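
For readers unfamiliar with the chart, an empirical CDF gives, for each latency value, the fraction of samples at or below it. The sketch below computes one with NumPy from made-up latency samples. It is independent of how the `cdf` table and `plot_latency_cdf` are actually implemented, which is not shown in this notebook.

```python
import numpy as np

def empirical_cdf(samples):
    """Return sorted values and the fraction of samples <= each value."""
    values = np.sort(np.asarray(samples, dtype=float))
    fractions = np.arange(1, len(values) + 1) / len(values)
    return values, fractions

# Invented delivery latencies in milliseconds.
latencies_ms = [120, 80, 200, 95, 150]
values, fractions = empirical_cdf(latencies_ms)
for v, f in zip(values, fractions):
    print(f'{v:6.1f} ms -> {f:.2f}')
```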

In [None]:
fig = notebook_helper.plot_latency_cdf(cdf)
save_fig(fig, 'latency-cdf')

#### PDF

In [None]:
fig = notebook_helper.plot_latency_pdf(pdf)
save_fig(fig, 'latency-pdf')

#### PDF (above p99)

In [None]:
fig = notebook_helper.plot_latency_pdf_above_quantile(pdf, quantile=0.99)
save_fig(fig, 'latency-pdf-over-p99')

#### Tracestat summary
Only the Publish and Deliver counts are accurate; the rest are filtered.

In [None]:
print(notebook_helper.tracestat_summary(ANALYSIS_DIR))

#### Aggregated tracer metrics (per-peer)

In [None]:
# string aggregation names label the result rows 'min', 'max', 'median', 'mean' directly
metrics[['published', 'delivered', 'rejected', 'duplicates', 'droppedrpc']].agg(['min', 'max', 'median', 'mean'])


#### All peer scores, aggregated across the test runtime

In [None]:
from IPython.display import display

if not scores.empty:
    # a bare expression inside an `if` block is not rendered, so display it explicitly
    display(scores['score'].agg(['min', 'max', 'median', 'mean']))
else:
    no_scores_message()

#### Aggregated score values for peers with negative scores

- `observer` is the peer assigning the score. 
- `peer` is the peer receiving the score.
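
A minimal sketch of this kind of aggregation on a hand-made frame (peer names and scores are invented): keep only the negative scores, then summarize them per (`peer`, `observer`) pair.

```python
import pandas as pd

# Toy score events; all values are invented.
toy_scores = pd.DataFrame({
    'observer': ['peerA', 'peerA', 'peerB', 'peerB'],
    'peer':     ['peerX', 'peerX', 'peerX', 'peerY'],
    'score':    [-4.0, -2.0, -10.0, 3.0],
})

# Boolean indexing drops the non-negative rows before grouping.
negative = toy_scores[toy_scores['score'] < 0]
summary = negative.groupby(['peer', 'observer'])['score'].agg(['min', 'max', 'mean'])
print(summary)
```

`peerY` never received a negative score, so it does not appear in the summary.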

In [None]:
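from IPython.display import display

if not scores.empty:
    # keep only negative scores, then summarize per (peer, observer) pair
    neg = scores[scores['score'] < 0].groupby(['peer', 'observer'])
    n = neg.agg({'score': ['min', 'max', 'median', 'mean']})
    display(n)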
else:
    no_scores_message()

#### Show honest peers with negative scores, joined with tracer metrics

In [None]:
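from IPython.display import display

if not scores.empty:
    # select columns from metrics table
    m = metrics[['peer', 'published', 'delivered', 'rejected']]

    # flatten the aggregated scores so `peer` is a plain column before merging
    flat = n['score'].reset_index()
    display(flat.merge(m, on='peer').groupby('peer').head())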
else:
    no_scores_message()

#### Global min/max score over time
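
The chart in this section is drawn from scores resampled into 5-second windows. As a sketch of what that resampling does, the example below applies the same `resample('5s')` call to a few invented score events and takes the min and max per window.

```python
import pandas as pd

# Invented score events, indexed by timestamp like the real `scores` table.
toy = pd.DataFrame(
    {'score': [1.0, -3.0, 5.0, 2.0]},
    index=pd.to_datetime([
        '2020-03-31 16:46:20',
        '2020-03-31 16:46:23',
        '2020-03-31 16:46:26',
        '2020-03-31 16:46:29',
    ]),
)

# Each row of the result covers one 5-second window.
windowed = toy['score'].resample('5s').agg(['min', 'max'])
print(windowed)
```

The first two events fall into the window starting at 16:46:20 and the last two into the window starting at 16:46:25.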

In [None]:
time_annotations = [
    {'label': 'warmup complete', 'time': t_run},
    {'label': 'cooldown begin', 'time': t_cool},
]

def annotate_score_plot(plot, label, legend_anchor=None):
    notebook_helper.annotate_score_plot(plot, label, legend_anchor=legend_anchor, time_annotations=time_annotations)

if not scores.empty:
    fig, ax = plt.subplots(2, figsize=(11, 8))
    plt.subplots_adjust(hspace=0.5)
    fig.suptitle("Min / Max Peer Scores")
    plot = sampled['score'].agg(['min', 'max']).plot(ax=ax[0])
    annotate_score_plot(plot, 'All Peers min/max', legend_anchor=(0, -0.2))
    
    # hide bottom plot so legend is visible
    ax[1].set_visible(False)
    
    # write png
    save_fig(fig, 'score-min-max')
else:
    no_scores_message()

#### Global mean/median score over time

In [None]:
if not scores.empty:
    # create 2 subplots: one for the chart and one that we'll hide to make
    # blank space for the legend
    fig, ax = plt.subplots(2, figsize=(11, 8))
    plt.subplots_adjust(hspace=0.5)
    fig.suptitle('Mean / Median Peer Scores')
    plot = sampled['score'].agg(['mean', 'median']).plot(ax=ax[0])
    annotate_score_plot(plot, 'All Peers mean/median', legend_anchor=(0, -0.2))

    # hide bottom plot so legend is visible
    ax[1].set_visible(False)
    
    save_fig(fig, 'score-mean-median')
else:
    no_scores_message()

#### Mean score distribution (all peers)

In [None]:
if not scores.empty:
    plot = scores[['peer', 'score']].groupby('peer').mean().plot.hist(bins=20)
    fig = plot.get_figure()
    fig.suptitle('Mean score distribution (all peers)')
    save_fig(fig, 'score-global-mean-distribution')
else:
    no_scores_message()

#### Score distributions (all peers)
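
The `p25`, `p75`, and `p95` aggregations used in this section are imported from `notebook_helper`, whose source is not shown here. One common way to write such helpers, offered as an assumption rather than the actual implementation, is a small factory around `numpy.percentile`. The names `make_percentile`, `p25_sketch`, and `p75_sketch` are hypothetical.

```python
import numpy as np
import pandas as pd

def make_percentile(q):
    """Return an aggregation function computing the q-th percentile."""
    def percentile(values):
        return np.percentile(values, q)
    # pandas uses the function name as the result column label
    percentile.__name__ = f'p{q}'
    return percentile

# Hypothetical stand-ins for the helpers in notebook_helper.
p25_sketch, p75_sketch = make_percentile(25), make_percentile(75)

# Invented scores for a single peer.
toy = pd.DataFrame({
    'peer': ['peerA'] * 5,
    'score': [0.0, 1.0, 2.0, 3.0, 4.0],
})
result = toy.groupby('peer')['score'].agg([p25_sketch, p75_sketch])
print(result)
```

Setting `__name__` gives each aggregation a distinct column label, which pandas requires when several functions are passed in one list.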

In [None]:
aggregations = ['min', 'max', p25, p75, p95]
kwargs = {'kind': 'hist', 'subplots': True, 'sharex': True, 'sharey': True, 'figsize': (8, 10)}

if not scores.empty:
    scores_by_peer = scores.groupby('peer')
    
    plots = scores_by_peer['score'].agg(aggregations).plot(title='Score distributions (all peers)', **kwargs)
    for p in plots:
        p.set_ylabel('freq')
    fig = plots[0].get_figure()
    save_fig(fig, 'score-distributions-all-peers')

else:
    no_scores_message()

In [None]:
# zip all figure images into a bundle
archive_figures(FIGURE_OUT, FIGURE_ZIP_OUT)