{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "P8ihE6AA_nRH"
},
"source": [
"# Modeling of replication state\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gHIcGWr6u5I6"
},
"source": [
"## Initial \"clairvoyant\" model based on\n",
"\n",
"*`Giroire, Frederic, Julian Monteiro, and Stéphane Pérennes. Peer-to-Peer Storage Systems: A Practical Guideline to Be Lazy. In 2010 IEEE Global Telecommunications Conference GLOBECOM 2010, 16, 2010. https://doi.org/10/c47cmb.`*\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mHR26MQvLgh8"
},
"source": [
"Code based on\n",
"- https://github.com/TommasoBelluzzo/PyDTMC\n",
"- https://ipython-books.github.io/131-simulating-a-discrete-time-markov-chain/"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "hyN4uhLwJ8FR"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"!pip install PyDTMC --quiet\n",
"import pydtmc\n",
"plt.rcParams['figure.figsize'] = [15, 8]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9sBOkAyjBMVW"
},
"source": [
"Our first model, based on [Giroire2010], assumes perfect visibility of redundancy state of individual erasure coded blocks, and the consequent immediate triggering of the reconstruction process. It models $(n,k)$ MDS erasure coding, with any $k$ of $n$ chunks enough to successfully reconstruct all $n$ erasure coded chunks.\n",
"\n",
"It models $r$, the current level of redundancy, which is the current amount of chunks available over $k$\n",
"\n",
"It assumes an $r_0$ threashold. If redundancy reaches that level, reconstruction starts immediately.\n",
"\n",
"It is not modelling behvious below $k$ available chunks. As soon as the number of chunks go below $k$, the block is assumed to be lost. (For reasons of modeling, it is assumed that a lost block is replaced by a new fully redundant block.)\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DmlI4_mzn64C"
},
"outputs": [],
"source": [
"k = 16 # (K) initial fragments of a block\n",
"n = 32 # coded fragments\n",
"# r = n-k # redundancy fragments\n",
"r0 = 8 # reconstruction threshold\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EErtFXELVw67"
},
"source": [
"### The DTMC model\n",
"\n",
"Lets use a discrete time model with time step τ.\n",
"\n",
"The basis of state transition probabilities is disk (node) failure rates and reconstruction time. There are various statistics about disk failures, see e.g. https://www.usenix.org/conference/fast-07/disk-failures-real-world-what-does-mttf-1000000-hours-mean-you\n",
"\n",
"As a first approximation, one could start from MTTF numbers and try to factor in other reasons of permanent node failure. \n",
"\n",
"With a given MTTF and assuming i.i.d. disk failures, the probability for a disk to fail during a timestep is α = 1/MTTF. Reconstruction might also be going on. If reconstruction is on-going, and reconstruction of a block takes MTTR time steps, the probability of a block beeing reconstructed in a time step can be modeled as γ = 1/MTTR.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XJ0VypCTWu9R"
},
"outputs": [],
"source": [
"τ = 1 # time step [hours].\n",
"MTTF = 1e+4 # mean time to failure [hours]. Although disks are specified to MTTF in the range of 1e+6, these numbers are irrealistic for node failures.\n",
"MTTR = 12 # mean time to reapair [hours]"
]
},
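{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (a minimal illustrative sketch using the parameters above), the per-timestep probabilities these values imply:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"alpha = τ / MTTF # per-timestep disk failure probability\n",
"gamma = min(1, τ / MTTR) # per-timestep reconstruction probability\n",
"print(f\"alpha = {alpha:.1e}, gamma = {gamma:.3f}\")\n",
"print(f\"expected chunk losses per timestep for a fresh block: {n * alpha:.1e}\")"
]
},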
{
"cell_type": "markdown",
"metadata": {
"id": "qPnSbur7BFhp"
},
"source": [
"The model has r+2 states: \n",
"- r+1 states of enough redundancy, from 0 to r chunks of redundancy,\n",
"- and one special state where redundancy is not enough, and the block is lost.\n",
"\n",
"The original figure from the paper is shown below\n",
"\n",
"![image.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABBQAAAH4CAYAAAD6ob7CAAAMbmlDQ1BJQ0MgUHJvZmlsZQAASImVVwdYU8kWnluSkJDQAghICb0J0gkgJYQWQHoRRCUkgYQSY0JQsZdFBdcuoljRVRHFtgIiNixYWBR7XyyoKOuiLjZU3oQEdN1Xvne+b+79c+bMf8qdyb0HAM0PXIkkH9UCoEBcKE0ID2aMSUtnkJ4CBBgAPWAD1Lk8mYQVFxcNoAze/y7vbkBrKFedFFz/nP+vosMXyHgAIBkQZ/FlvAKITwCAr+dJpIUAEBV6y8mFEgWeDbGuFAYI8SoFzlHinQqcpcRNAzZJCWyILwOgRuVypTkAaNyDekYRLwfyaHyG2EXMF4kB0BwBcQBPyOVDrIh9REHBRAWugNgO2ksghvEAZtZ3nDl/488a4udyc4awMq8BUQsRyST53Kn/Z2n+txTkywd92MBBFUojEhT5wxreypsYpcBUiLvFWTGxilpD/EHEV9YdAJQilEckK+1RY56MDesH9CF24XNDoiA2hjhMnB8TrdJnZYvCOBDD3YJOERVykiA2gHihQBaaqLLZLJ2YoPKF1mVL2SyV/hxXOuBX4euBPC+ZpeJ/IxRwVPyYRrEwKRViCsRWRaKUGIg1IHaW5SVGqWxGFQvZMYM2UnmCIn4riBME4vBgJT9WlC0NS1DZlxbIBvPFNgtFnBgVPlAoTIpQ1gc7zeMOxA9zwS4LxKzkQR6BbEz0YC58QUioMnfsuUCcnKji+SApDE5QrsUpkvw4lT1uIcgPV+gtIPaQFSWq1uIphXBzKvnxbElhXJIyTrw4lxsZp4wHXwaiARuEAAaQw5EFJoJcIGrrru+Gv5QzYYALpCAHCICTSjO4InVgRgyviaAY/AGRAMiG1gUPzApAEdR/GdIqr04ge2C2aGBFHngKcQGIAvnwt3xglXjIWwp4AjWif3jnwsGD8ebDoZj/9/pB7TcNC2qiVRr5oEeG5qAlMZQYQowghhHtcSM8APfDo+E1CA43nIn7DObxzZ7wlNBOeES4Tugg3J4gmiv9IcrRoAPyh6lqkfV9LXAbyOmJB+P+kB0y4/q4EXDCPaAfFh4IPXtCLVsVt6IqjB+4/5bBd09DZUd2IaPkYeQgst2PKzUcNDyHWBS1/r4+ylizhurNHpr50T/7u+rz4T3qR0tsIXYQa8FOYuexJqweMLDjWAPWih1V4KHd9WRgdw16SxiIJw/yiP7hj6vyqaikzKXGpcvls3KuUDClUHHw2BMlU6WiHGEhgwXfDgIGR8xzHsFwc3FzA0DxrlH+fb2NH3iHIPqt33TzfgfA/3h/f/+Rb7rI4wDs94bH//A3nR0TAG11AM4d5smlRUodrrgQ4L+EJjxphsAUWAI7mI8b8AJ+IAiEgkgQC5JAGhgPqyyE+1wKJoPpYA4oAWVgGVgN1oFNYCvYCfaAA6AeNIGT4Cy4CC6D6+Au3D2d4CXoAe9AH4IgJISG0BFDxAyxRhwRN4SJBCChSDSSgKQhmUgOIkbkyHRkHlKGrEDWIVuQamQ/chg5iZxH2pHbyEOkC3mDfEIxlIrqoiaoDToSZaIsNApNQsehOegktBidjy5BK9AqdDdah55EL6LX0Q70JdqLAUwd08fMMSeMibGxWCwdy8ak2EysFCvHqrBarBE+56tYB9aNfcSJOB1n4E5wB0fgyTgPn4TPxBfj6/CdeB1+Gr+KP8R78K8EGsGY4EjwJXAIYwg5hMmEEkI5YTvhEOEMPEudhHdEIlGfaEv0hmcxjZhLnEZcTNxA3Es8QWwnPib2kkgkQ5IjyZ8US+KSCkklpLWk3aTjpCukTtIHNXU1MzU3tTC1dDWx2ly1crVdasfUrqg9U+sja5Gtyb7kWDKfPJW8lLyN3Ei+RO4k91G0KbYUf0oSJZcyh1JBqaWcodyjvFVXV7dQ91GPVxepz1avUN+nfk79ofpHqg7VgcqmZlDl1CXUHdQT1NvUtzQazYYWREunFdKW0Kppp2gPaB806BrOGhwNvsYsjUqNOo0rGq80yZrWmizN8ZrFmuWaBzUvaXZrkbVstNhaXK2ZWpVah7VuavVq07VdtWO1C7QXa+/SPq/9XIekY6MTqsPXma+zVeeUzmM6Rreks+k8+jz6NvoZeqcuUddWl6Obq1umu0e3TbdHT0fPQy9Fb4pepd5RvQ59TN9Gn6Ofr79U/4D+Df1Pw0yGsYYJhi0aVjvsyrD3BsMNggwEBqUGew2uG3wyZBiGGuYZLjesN7xvhBs5GMUbTTbaaHTGqHu47nC/4bzhpcMPDL9jjBo7GCcYTzPeatxq3GtiahJuIjFZa3LKpNtU3zTINNd0lekx0y4zulmAmchsldlxsxcMPQaLkc+oYJxm9Jgbm0eYy823mLeZ91nYWiRbzLXYa3HfkmLJtMy2XGXZbNljZWY12mq6VY3VHWuyNdNaaL3GusX6vY2tTarNApt6m+e2BrYc22LbGtt7djS7QLtJdlV21+yJ9kz7PPsN9pcdUAdPB6FDpcMlR9TRy1HkuMGxfQRhhM8I8YiqETedqE4spyKnGqeHzvrO0c5zneudX420Gpk+cvnIlpFfXTxd8l22udx11XGNdJ3r2uj6xs3BjedW6XbNneYe5j7LvcH9tYejh8Bjo8ctT7rnaM8Fns2eX7y8vaRetV5d3lbemd7rvW8ydZlxzMXMcz4En2CfWT5NPh99vXwLfQ/4/unn5Jfnt8vv+SjbUYJR20Y99rfw5/pv8e8IYARkBmwO6Ag0D+QGVgU+CrIM4gdtD3rGsmflsnazXgW7BEuDDwW/Z/uyZ7BPhGAh4SGlIW2hOqHJoetCH4RZhOWE1YT1hHuGTws/EUGIiIpYHnGTY8Lhcao5PZHekTMiT0dRoxKj1kU9inaIlkY3jkZHR45eOfpejHWMOKY+FsRyYlfG3o+zjZsUdySeGB8XXxn/NME1YXpCSyI9cULirsR3ScFJS5PuJtsly5ObUzRTMlKqU96nhqSuSO0YM3LMjDEX04zSRGkN6aT0lPTt6b1jQ8euHtuZ4ZlRknFjnO24KePOjzcanz/+6ATNCdwJBzMJmamZuzI/c2O5VdzeLE7W+qweHpu3hveSH8Rfxe8S+AtWCJ5l+2evyH6e45+zMqdLGCgsF3aL2KJ1ote5Ebmbct/nxebtyOvPT83fW6BWkFlwWKwjzhOfnmg6ccrEdomjpETSMcl30upJPdIo6XYZIhsnayjUhR/1rXI7+U/yh0UBRZVFHyanTD44RXuKeErrVIepi6Y+Kw4r/mUaPo03rXm6+fQ50x/OYM3YMhOZmTWzeZblrPmzOmeHz945hzInb85vc13mrpj717zUeY3zTebPnv/4p/Cfako0SqQlNxf4Ldi0EF8oWti2yH3R2kVfS/mlF8pcysrLPi/mLb7ws+vPFT/3L8le0rbUa+nGZcRl4mU3lgcu37lCe0XxiscrR6+sW8VYVbrqr9UTVp8v9yjftIayRr6moyK6omGt1dplaz+
vE667XhlcuXe98fpF699v4G+4sjFoY+0mk01lmz5tFm2+tSV8S12VTVX5VuLWoq1Pt6Vsa/mF+Uv1dqPtZdu/7BDv6NiZsPN0tXd19S7jXUtr0Bp5TdfujN2X94Tsaah1qt2yV39v2T6wT77vxf7M/TcORB1oPsg8WPur9a/rD9EPldYhdVPreuqF9R0NaQ3thyMPNzf6NR464nxkR5N5U+VRvaNLj1GOzT/Wf7z4eO8JyYnukzknHzdPaL57asypa6fjT7ediTpz7mzY2VMtrJbj5/zPNZ33PX/4AvNC/UWvi3Wtnq2HfvP87VCbV1vdJe9LDZd9Lje2j2o/diXwysmrIVfPXuNcu3g95nr7jeQbt25m3Oy4xb/1/Hb+7dd3iu703Z19j3Cv9L7W/fIHxg+qfrf/fW+HV8fRhyEPWx8lPrr7mPf45RPZk8+d85/SnpY/M3tW/dzteVNXWNflF2NfdL6UvOzrLvlD+4/1r+xe/fpn0J+tPWN6Ol9LX/e/WfzW8O2Ovzz+au6N633
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XYA2qEo0SpQb"
},
"outputs": [],
"source": [
"def dtmc_clearvoyant(k, n, r0, tau, mttf, mttr):\n",
"\n",
" r = n-k\n",
" alpha = tau/mttf # probability of disk failure in a timestep\n",
" gamma = min(1, tau/mttr) # probabilty of disk reconstructed in a timestep\n",
"\n",
" # State transition matrix:\n",
" # index is the amount of redundancy, shifted by 1 \n",
" # - 0: less that k good chunks, i.e. dead block\n",
" # - i: k+(i-1) chunks are good, r-(i-1) chunks erased \n",
" p = np.zeros((r+2,r+2))\n",
"\n",
" def delta(i):\n",
" # chunk loss probability in a timestep given i redundancy\n",
" return (k+i) * alpha\n",
"\n",
" for i in range(0,r+1):\n",
" p[i+1,i] = delta(i)\n",
" for i in range(0, r0+1):\n",
" p[i+1,r+1] = gamma * (1 - delta(i))\n",
" p[0,r+1] = 1\n",
" for i in range(0, r+2):\n",
" t = 0\n",
" for j in range(0, r+2):\n",
" t += p[i,j]\n",
" p[i,i] = 1 - t\n",
"\n",
" mc = pydtmc.MarkovChain(p)\n",
" return mc"
]
},
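{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the matrix structure concrete, here is a tiny instance (an illustrative sketch with arbitrary small parameters: $k=2$, $n=4$, $r_0=1$). The subdiagonal holds the chunk-loss probabilities, the last column the repair and dead-block replacement transitions, and the diagonal the probability of staying in place."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tiny = dtmc_clearvoyant(2, 4, 1, 1, 1e3, 10)\n",
"print(np.round(tiny.p, 4))"
]
},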
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7e_4tUa4w3Gl"
},
"outputs": [],
"source": [
"mc = dtmc_clearvoyant(k, n, r0, τ, MTTF, MTTR)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "P7J0rm_fVsJB"
},
"outputs": [],
"source": [
"#pydtmc.plot_graph(mc)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vxDA6HUYyW4e"
},
"source": [
"Let's derive the expected distribution of replication state (measured as available redundancy) for individual blocks. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "J0_BvLhso7v9"
},
"outputs": [],
"source": [
"statdist = mc.stationary_distributions[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nk0pa4XcVChZ"
},
"outputs": [],
"source": [
"def plot_dist(statdist):\n",
" x = np.arange(len(statdist))\n",
" plt.rcParams['figure.figsize'] = [15, 8]\n",
" plt.subplot(211)\n",
" plt.xticks(x-1)\n",
" plt.xlabel(\"redundancy (extra chunks)\")\n",
" plt.bar(x-1, statdist)\n",
" plt.subplot(212)\n",
" plt.yscale('log')\n",
" plt.xticks(x-1)\n",
" plt.xlabel(\"redundancy (extra chunks)\")\n",
" plt.bar(x-1, statdist)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 497
},
"id": "qHs1pnBnM8zU",
"outputId": "cd484d0f-c143-4241-a399-0686e1bd1028"
},
"outputs": [],
"source": [
"plot_dist(statdist)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f0N8GWrJ80Q0"
},
"source": [
"### Loss Rate"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8u-Vx34EAjsr"
},
"source": [
"Loss rate is derived from the state representing dead blocks with r below 0.\n",
"$$LossRate = P (dead)/τ$$ "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "BBmMD-5_ISqQ",
"outputId": "ce3ca093-c042-47a4-ef49-e605577a9196"
},
"outputs": [],
"source": [
"def lossrate(mc, tau): \n",
" statdist = mc.stationary_distributions[0]\n",
" LossRateBlock = statdist[0] / τ \n",
" return LossRateBlock\n",
"\n",
"print(\"block loss rate:\", lossrate(mc, τ), \"/hour\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bUSYxLSVDXqL"
},
"source": [
"System level loss rate is derived from the state representing dead blocks with r below 0.\n",
"$$LossRate = B · P (dead)/τ$$ \n",
"Where $B$ is the total number of blocks in the system, and $τ$ is the time unit (used to derive transition probabilities)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Qvq5-PTYS9Sf"
},
"outputs": [],
"source": [
"N = 500 #number of peers\n",
"D = 20e+12 # total amount of data in the system [bytes]\n",
"l_f = 320e+3 # chunk size [bytes]\n",
"l_b = k * l_f # erasure block size"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dulN07WjT_Se",
"outputId": "3b8261e1-28a4-4b93-d2fb-5fc20b69ab39"
},
"outputs": [],
"source": [
"B = D/l_b #number of blocks in the system\n",
"print(\"number of blocks:\", B)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5lWenTGcCpOl",
"outputId": "a7dbc95d-33b5-45c6-baa6-89c9e57479d8"
},
"outputs": [],
"source": [
"LossRateSystem = B * statdist[0] / τ \n",
"print(\"block loss rate at system level:\", LossRateSystem, \"/hour\")"
]
},
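{
"cell_type": "markdown",
"metadata": {},
"source": [
"Equivalently (a small derived sketch), the reciprocal of the system-level loss rate gives the expected time between block losses:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"expected time between block losses:\", 1/LossRateSystem, \"hours\")\n",
"print(\"expected time between block losses:\", 1/LossRateSystem/24/365, \"years\")"
]
},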
{
"cell_type": "markdown",
"metadata": {
"id": "DtJa1whS9Icf"
},
"source": [
"### Repair Bandwidth\n",
"\n",
"Lazy repair allows for the optimization of repair bandwith by starting repair only when several chunks are missing.\n",
"\n",
"The amount of data needed to reconstruct a missing block is k chunks, independent of the number of missing chunks (which is $r-i$ with replication state $i$ . If repair is done independently for each missing chunk, the traffic generated is $(r-i) * k$ chunks, meaning that is does not really matter when we start repair.\n",
"\n",
"If, instead, there is a repair node, it can generate all $r-i$ missing chunks and distribute these, at a cost of $k + r-i-1$ chunk transfers (assuming one of the chunks is stored by the repair node)."
]
},
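{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the two cost models, a quick arithmetic sketch with example values ($k=16$, $r=16$, redundancy $i=8$, i.e. 8 missing chunks):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"k_, r_, i_ = 16, 16, 8 # illustrative values\n",
"independent = (r_ - i_) * k_ # each missing chunk repaired independently: k transfers each\n",
"repair_node = k_ + r_ - i_ - 1 # one repair node regenerates all missing chunks\n",
"print(\"independent repair:\", independent, \"chunk transfers\")\n",
"print(\"repair node:\", repair_node, \"chunk transfers\")"
]
},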
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qPOC33LL_DzF"
},
"outputs": [],
"source": [
"def repair_bw(mc, k, n, tau):\n",
" r = n-k\n",
" statdist = mc.stationary_distributions[0]\n",
" bw = 0\n",
" for i in range(1, r+1):\n",
" bw += statdist[i] * mc.p[i,r+1] * (k + r - i - 1) / tau\n",
"\n",
" return bw/k\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 501
},
"id": "A5RpvECfDk23",
"outputId": "68a895b8-c60d-44d3-ff07-b9e31fe7ac61"
},
"outputs": [],
"source": [
"k=16\n",
"n=32\n",
"tau=1\n",
"mttf=1e4\n",
"mttr=12\n",
"r0=np.arange(0,n-k)\n",
"\n",
"def plot_clearvoyant(k, n, r0, tau, mttf, mttr):\n",
" clearvoyant = np.vectorize(dtmc_clearvoyant)(k, n, r0, tau, mttf, mttr)\n",
" bw = np.vectorize(repair_bw)(clearvoyant, k, n, tau)\n",
" loss = np.vectorize(lossrate)(clearvoyant, tau)\n",
"\n",
" plt.plot(loss, bw, '-o', label=f'RS({n},{k})')\n",
" plt.ylabel(\"repair bandwidth [chunks/hour]\")\n",
" plt.xlabel(\"loss rate [per hour]\")\n",
" plt.xscale('log')\n",
" for i, txt in enumerate(r0):\n",
" plt.annotate(txt, (loss[i], bw[i]))\n",
"\n",
"def plot_clearvoyant_r0(k, n, tau, mttf, mttr):\n",
" r0=np.arange(0,n-k)\n",
" plot_clearvoyant(k, n, r0, tau, mttf, mttr)\n",
"\n",
"\n",
"tau=1\n",
"mttf=1e4\n",
"mttr=240\n",
"plot_clearvoyant_r0(10, 15, tau, mttf, mttr)\n",
"plot_clearvoyant_r0(16, 32, tau, mttf, mttr)\n",
"plot_clearvoyant_r0(50, 100, tau, mttf, mttr)\n",
"plt.legend()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "63EgFQq0Ylix"
},
"source": [
"# Modeling of verification\n",
"\n",
"Until now, we have considered a system where the state of each erasure coded block is perfectly visible, and thus reconstruction is based on perfect information. Here we model what happens if information is only partial.\n",
"\n",
"We can differentiate between the following two repair approaches:\n",
"- performs a single test and start repair as soon as it fails\n",
"- estimate $r$ based on a test and start repair based on this estimate\n",
"\n",
"First, lets assume a repair process that is triggered by a simple verification process with the following parameters:\n",
"- $fail(r)$ is the negative outcome of the verification (test failed) as a function of the actual state of redundancy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cmshiH8jfY7e"
},
"outputs": [],
"source": [
"def fail(i): # assuming the simplest test of checking one block\n",
" return 1 - (k+i) / (k+r)\n"
]
},
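{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the parameters above ($k=16$, $n=32$), the single-chunk test fails with probability $(r-i)/n$ (a quick illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"r = n - k\n",
"print([round(fail(i), 3) for i in range(0, r+1)])"
]
},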
{
"cell_type": "markdown",
"metadata": {
"id": "kVUuEXEqZpMZ"
},
"source": [
"Lets see how these modify our transition matrix. If a block is not verified, it cannot start repair. If instead a block is being verified, there is $fail(r)$ chance it starts repair, while $1 - fail(r)$ nothing will happen."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_UVj9nBVcoqf"
},
"outputs": [],
"source": [
"def dtmc_simplequery(k, n, tau, mttf, mttr):\n",
"\n",
" r=n-k\n",
" alpha = tau/mttf # probability of disk failure in a timestep\n",
" gamma = min(1, tau/mttr) # probabilty of disk reconstructed in a timestep\n",
"\n",
" def fail(i): # assuming the simplest test of checking one block\n",
" return 1 - (k+i) / (k+r)\n",
"\n",
" def delta(i):\n",
" # chunk loss probability in a timestep given i redundancy\n",
" return (k+i) * alpha\n",
"\n",
" p = np.zeros((r+2,r+2))\n",
" for i in range(0,r+1):\n",
" p[i+1,i] = delta(i)\n",
" for i in range(0, r):\n",
" p[i+1,r+1] = gamma * fail(i+1)\n",
" p[0,r+1] = 1\n",
" for i in range(0, r+2):\n",
" t = 0\n",
" for j in range(0, r+2):\n",
" t += p[i,j]\n",
" p[i,i] = 1 - t\n",
"\n",
" mc = pydtmc.MarkovChain(p)\n",
" return mc"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Xje_lCGeznI9"
},
"outputs": [],
"source": [
"mc = dtmc_simplequery(k, n, τ, MTTF, MTTR)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 497
},
"id": "J5WDCPtmeXle",
"outputId": "d75ef6a3-eb83-4839-a0ad-0ef1b746fec9"
},
"outputs": [],
"source": [
"statdist = mc.stationary_distributions[0]\n",
"plot_dist(statdist)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "TXzWIje4f8jL",
"outputId": "97d6c8a9-b9ad-425e-cc74-f5d90409f3f5"
},
"outputs": [],
"source": [
"LossRate = B * statdist[0] / τ \n",
"print(\"block loss rate:\", LossRate, \"/hour\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 515
},
"id": "iKJTxHoWQqjg",
"outputId": "b1f52dfb-1a62-41f9-b9b0-cd2d898fb729"
},
"outputs": [],
"source": [
"from scipy.stats import hypergeom\n",
"\n",
"def dtmc_multiquery(k, n, l, maxfail, tau, mttf, mttr):\n",
" # k: initial fragments of a block\n",
" # n: coded fragments\n",
" # l: query length\n",
" # maxfail: max failures allowed (-1: test always fails, 0: test passes only if all are good, 1: allows one to fail ... l: always pass)\n",
" # tau: time step [hours].\n",
" # mttf: mean time to failure [hours].\n",
"\n",
" alpha = tau/mttf # probability of disk failure in a timestep\n",
" gamma = min(1, tau/mttr) # probabilty of disk reconstructed in a timestep, lower bound to 1 timestep\n",
" r = n-k # redundancy fragments\n",
" p_v = 1 \n",
"\n",
" def fail(i): # prob. test failed with i redundancy of r remaining\n",
" #return 1 - hypergeom(n, k+i, l).sf(r-maxfail) #n chunks, of which k+i are good (r-i are bad), l are tested, fail if at least f fail\n",
" return hypergeom(n, r-i, l).sf(maxfail) #n chunks, of which k+i are good (r-i are bad), l are tested, fail if at least maxfail fail\n",
"\n",
" def delta(i):\n",
" return (k+i) * alpha\n",
"\n",
" p = np.zeros((r+2,r+2))\n",
" for i in range(0,r+1):\n",
" p[i+1,i] = delta(i)\n",
" for i in range(0, r):\n",
" p[i+1,r+1] = gamma * p_v * fail(i+1)\n",
" p[0,r+1] = 1\n",
" for i in range(0, r+2):\n",
" t = 0\n",
" for j in range(0, r+2):\n",
" t += p[i,j]\n",
" p[i,i] = 1 - t\n",
"\n",
" mc = pydtmc.MarkovChain(p)\n",
" return mc\n",
"\n",
"τ = 1 # time step [hours].\n",
"MTTF = 1e+4 # mean time to failure [hours]. Although disks are specified to MTTF in the range of 1e+6, these numbers are irrealistic for node failures.\n",
"alpha = 1/MTTF # probability of disk failure in a timestep\n",
"MTTR = 50 # probabilty of disk reconstructed in a timestep\n",
"K = 16 # (K) initial fragments of a block\n",
"N = 32 # coded fragments\n",
"L = 6 #query length\n",
"MAXFAIL = 2 #max failures allowed (-1: test always fails, 0: test passes only if all are good, 1: allows one to fail ... l: always pass)\n",
"\n",
"mc = dtmc_multiquery(K, N, L, MAXFAIL, L, MTTF, MTTR)\n",
"statdist = mc.stationary_distributions[0]\n",
"\n",
"#plt.yscale('log')\n",
"plot_dist(statdist)\n",
"LossRate = B * statdist[0] / τ \n",
"print(\"block loss rate:\", LossRate, \"/hour\")"
]
},
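{
"cell_type": "markdown",
"metadata": {},
"source": [
"To build intuition for the multi-chunk test (an illustrative sketch using the parameters above): the probability that a query of $L$ chunks fails, as a function of the number of missing chunks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for missing in range(0, N - K + 1):\n",
"    p_fail = hypergeom(N, missing, L).sf(MAXFAIL)\n",
"    print(f\"{missing:2d} missing -> test fails with prob {p_fail:.4f}\")"
]
},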
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "v1cgSNHNpnP6"
},
"outputs": [],
"source": [
"def plot_multiquery(k, n, l, maxfail, tau, mttf, mttr):\n",
" mcs = np.vectorize(dtmc_multiquery)(k, n, l, maxfail, tau, mttf, mttr)\n",
" bw = np.vectorize(repair_bw)(mcs, k, n, tau)\n",
" loss = np.vectorize(lossrate)(mcs, tau)\n",
"\n",
" plt.plot(loss, bw, '-o', label=f'RS({n},{k}) - mq({maxfail}/{l})')\n",
" plt.ylabel(\"repair bandwidth [chunks/hour]\")\n",
" plt.xlabel(\"loss rate [per hour]\")\n",
" plt.xscale('log')\n",
" plt.yscale('log')\n",
" for i, txt in enumerate(maxfail):\n",
" plt.annotate(txt, (loss[i], bw[i]))\n",
"\n",
"def plot_multiquery_maxfail(k, n, l, tau, mttf, mttr):\n",
" maxfail=np.arange(0, l)\n",
" plot_multiquery(k, n, l, maxfail, tau, mttf, mttr)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 501
},
"id": "dW5J4tFAqXaQ",
"outputId": "b8cf0919-8d9c-4094-b2b1-b247c6581a33"
},
"outputs": [],
"source": [
"tau=1\n",
"mttf=1e4\n",
"mttr=24\n",
"plot_multiquery_maxfail(16, 32, 8, 8*tau, mttf, mttr)\n",
"plot_multiquery_maxfail(16, 32, 4, 4*tau, mttf, mttr)\n",
"plot_multiquery_maxfail(16, 32, 1, 1*tau, mttf, mttr)\n",
"plot_multiquery_maxfail(50, 100, 8, 8*tau, mttf, mttr)\n",
"#plot_clearvoyant_r0(10, 15, tau, mttf, mttr)\n",
"plot_clearvoyant_r0(16, 32, tau, mttf, mttr)\n",
"plot_clearvoyant_r0(50, 100, tau, mttf, mttr)\n",
"plt.legend()\n",
"plt.show()\n"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "Modeling durability using replication state and related metrics (Markov Chain Model)",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}