{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "P8ihE6AA_nRH"
   },
   "source": [
    "# Modeling of replication state\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gHIcGWr6u5I6"
   },
   "source": [
    "## Initial \"clairvoyant\" model based on\n",
    "\n",
    "*`Giroire, Frederic, Julian Monteiro, and Stéphane Pérennes. ‘Peer-to-Peer Storage Systems: A Practical Guideline to Be Lazy’. In 2010 IEEE Global Telecommunications Conference GLOBECOM 2010, 1–6, 2010. https://doi.org/10/c47cmb.`*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "mHR26MQvLgh8"
   },
   "source": [
    "Code based on\n",
    "- https://github.com/TommasoBelluzzo/PyDTMC\n",
    "- https://ipython-books.github.io/131-simulating-a-discrete-time-markov-chain/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "hyN4uhLwJ8FR"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "!pip install PyDTMC --quiet\n",
    "import pydtmc\n",
    "plt.rcParams['figure.figsize'] = [15, 8]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9sBOkAyjBMVW"
   },
   "source": [
    "Our first model, based on [Giroire2010], assumes perfect visibility of the redundancy state of each erasure-coded block, and hence immediate triggering of the reconstruction process. It models $(n,k)$ MDS erasure coding, in which any $k$ of the $n$ chunks are enough to reconstruct all $n$ erasure-coded chunks.\n",
    "\n",
    "It tracks $i$, the current level of redundancy, i.e. the number of available chunks in excess of $k$; $i$ ranges from $0$ to $r = n - k$.\n",
    "\n",
    "It assumes a reconstruction threshold $r_0$: as soon as the redundancy drops to $r_0$ or below, reconstruction starts.\n",
    "\n",
    "It does not model behavior below $k$ available chunks: as soon as fewer than $k$ chunks are available, the block is considered lost. (For modeling purposes, a lost block is assumed to be replaced by a new, fully redundant block.)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "DmlI4_mzn64C"
   },
   "outputs": [],
   "source": [
    "k = 16  # (K) initial fragments of a block\n",
    "n = 32  # coded fragments\n",
    "r = n - k  # redundancy fragments\n",
    "r0 = 8  # reconstruction threshold\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EErtFXELVw67"
   },
   "source": [
    "### The DTMC model\n",
    "\n",
    "Let's use a discrete-time model with time step τ.\n",
    "\n",
    "State transition probabilities are derived from disk (node) failure rates and from the reconstruction time. Various statistics on disk failures are available, see e.g. https://www.usenix.org/conference/fast-07/disk-failures-real-world-what-does-mttf-1000000-hours-mean-you\n",
    "\n",
    "As a first approximation, one could start from MTTF numbers and try to factor in other causes of permanent node failure.\n",
    "\n",
    "Given an MTTF and assuming i.i.d. disk failures, the probability of a disk failing during a time step is approximately α = τ/MTTF (valid for τ ≪ MTTF). Reconstruction may also be in progress: if reconstructing a block takes MTTR on average, the probability of the block being reconstructed in a time step can be modeled as γ = τ/MTTR (capped at 1).\n"
   ]
  },
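The linear expressions α = τ/MTTF and γ = τ/MTTR are first-order approximations. The minimal sketch below compares them with the exact per-step probabilities 1 − e^(−τ/MTTF) and 1 − e^(−τ/MTTR). It assumes exponentially distributed times to failure and to repair, which the text above does not state. The helper name `step_probabilities` is hypothetical.

```python
import math

def step_probabilities(tau, mttf, mttr):
    """Per-step failure (alpha) and repair (gamma) probabilities.

    Returns the linear forms used in this notebook together with the values
    obtained if lifetimes and repair times are assumed to be exponentially
    distributed (constant rates 1/MTTF and 1/MTTR).
    """
    alpha_lin = tau / mttf
    gamma_lin = min(1.0, tau / mttr)
    alpha_exp = 1 - math.exp(-tau / mttf)  # P(failure within one step), exponential lifetime
    gamma_exp = 1 - math.exp(-tau / mttr)  # P(repair completes within one step), exponential repair
    return alpha_lin, alpha_exp, gamma_lin, gamma_exp

a_lin, a_exp, g_lin, g_exp = step_probabilities(tau=1, mttf=1e4, mttr=12)
print(f"alpha: linear {a_lin:.6g}, exponential {a_exp:.6g}")
print(f"gamma: linear {g_lin:.6g}, exponential {g_exp:.6g}")
```

With τ = 1 h, MTTF = 1e+4 h and MTTR = 12 h, the two forms of α differ by only about 5·10⁻⁵ in relative terms. γ differs by roughly 4%, because τ/MTTR = 1/12 is not very small.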
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "XJ0VypCTWu9R"
   },
   "outputs": [],
   "source": [
    "τ = 1  # time step [hours]\n",
    "MTTF = 1e+4  # mean time to failure [hours]. Disks are specified with MTTF values around 1e+6 hours, but such figures are unrealistic for node failures.\n",
    "MTTR = 12  # mean time to repair [hours]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qPnSbur7BFhp"
   },
   "source": [
    "The model has $r+2$ states, where $r = n - k$:\n",
    "- $r+1$ states with enough redundancy, from $0$ to $r$ chunks of redundancy,\n",
    "- and one special state where redundancy is insufficient and the block is lost.\n",
    "\n",
    "The original figure from the paper:\n",
    "\n",
    "*[Figure from [Giroire2010]: Markov chain model of the redundancy of a block.]*\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "XYA2qEo0SpQb"
   },
   "outputs": [],
   "source": [
    "def dtmc_clearvoyant(k, n, r0, tau, mttf, mttr):\n",
    "\n",
    "    r = n - k\n",
    "    alpha = tau / mttf  # probability of a disk failure in a timestep\n",
    "    gamma = min(1, tau / mttr)  # probability of a block being reconstructed in a timestep\n",
    "\n",
    "    # State transition matrix; the index is the amount of redundancy, shifted by 1:\n",
    "    # - 0: fewer than k good chunks, i.e. dead block\n",
    "    # - i: k+(i-1) chunks are good, r-(i-1) chunks erased\n",
    "    p = np.zeros((r + 2, r + 2))\n",
    "\n",
    "    def delta(i):\n",
    "        # probability of losing a chunk in a timestep, given redundancy i\n",
    "        return (k + i) * alpha\n",
    "\n",
    "    # chunk loss: redundancy i -> i-1\n",
    "    for i in range(0, r + 1):\n",
    "        p[i + 1, i] = delta(i)\n",
    "    # at or below the threshold r0, repair restores full redundancy\n",
    "    for i in range(0, r0 + 1):\n",
    "        p[i + 1, r + 1] = gamma * (1 - delta(i))\n",
    "    # a dead block is replaced by a new, fully redundant block\n",
    "    p[0, r + 1] = 1\n",
    "    # self-loops take the remaining probability mass of each row\n",
    "    for i in range(0, r + 2):\n",
    "        p[i, i] = 0\n",
    "        p[i, i] = 1 - p[i].sum()\n",
    "\n",
    "    mc = pydtmc.MarkovChain(p)\n",
    "    return mc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7e_4tUa4w3Gl"
   },
   "outputs": [],
   "source": [
    "mc = dtmc_clearvoyant(k, n, r0, τ, MTTF, MTTR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "P7J0rm_fVsJB"
   },
   "outputs": [],
   "source": [
    "#pydtmc.plot_graph(mc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "vxDA6HUYyW4e"
   },
   "source": [
    "Let's derive the stationary (long-run) distribution of the replication state, measured as available redundancy, of individual blocks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "J0_BvLhso7v9"
   },
   "outputs": [],
   "source": [
    "statdist = mc.stationary_distributions[0]"
   ]
  },
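For parameters like these, the stationary probability of the dead state can be extremely small, far below 1e-16. A generic eigenvector or linear solve typically has absolute errors around machine precision, so `statdist[0]` is worth cross-checking. The sketch below is an independent check under that concern, not the notebook's own method. It rebuilds the same transition matrix in plain NumPy and applies the Grassmann-Taksar-Heyman (GTH) state-reduction algorithm, which avoids subtractions and so keeps small probabilities relatively accurate. The helper names `clairvoyant_matrix` and `gth_stationary` are hypothetical.

```python
import numpy as np

def clairvoyant_matrix(k, n, r0, tau, mttf, mttr):
    """Transition matrix built as in dtmc_clearvoyant, as a plain NumPy array.

    Index 0 = dead block; index j >= 1 = redundancy j-1 (k+j-1 chunks available).
    """
    r = n - k
    alpha = tau / mttf
    gamma = min(1.0, tau / mttr)
    p = np.zeros((r + 2, r + 2))
    for i in range(r + 1):                 # loss of one of the k+i available chunks
        p[i + 1, i] = (k + i) * alpha
    for i in range(r0 + 1):                # repair at or below the threshold r0
        p[i + 1, r + 1] = gamma * (1 - (k + i) * alpha)
    p[0, r + 1] = 1.0                      # a lost block is replaced by a full one
    np.fill_diagonal(p, 0.0)
    np.fill_diagonal(p, 1.0 - p.sum(axis=1))  # self-loops complete each row
    return p

def gth_stationary(p):
    """Stationary distribution via GTH state reduction.

    Uses only additions, multiplications and divisions of non-negative numbers.
    Requires every state s >= 1 to have a transition to a lower-indexed state.
    """
    a = np.array(p, dtype=float)
    m = a.shape[0]
    for s in range(m - 1, 0, -1):          # eliminate states m-1, ..., 1
        out = a[s, :s].sum()               # probability of leaving s towards lower states
        a[:s, s] /= out
        a[:s, :s] += np.outer(a[:s, s], a[s, :s])
    pi = np.zeros(m)
    pi[0] = 1.0
    for j in range(1, m):                  # back-substitution
        pi[j] = pi[:j] @ a[:j, j]
    return pi / pi.sum()

p = clairvoyant_matrix(k=16, n=32, r0=8, tau=1, mttf=1e4, mttr=12)
pi = gth_stationary(p)
print("P(dead) =", pi[0])
```

Comparing `pi[0]` with `statdist[0]` indicates whether the loss rates computed from `statdist` are numerically meaningful at a given parameter setting.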
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "nk0pa4XcVChZ"
   },
   "outputs": [],
   "source": [
    "def plot_dist(statdist):\n",
    "    # bar at x = -1 is the dead state; bar at x = j >= 0 is redundancy j\n",
    "    x = np.arange(len(statdist))\n",
    "    plt.rcParams['figure.figsize'] = [15, 8]\n",
    "    # linear scale\n",
    "    plt.subplot(211)\n",
    "    plt.xticks(x - 1)\n",
    "    plt.xlabel(\"redundancy (extra chunks)\")\n",
    "    plt.bar(x - 1, statdist)\n",
    "    # log scale, to make the small probabilities of low-redundancy states visible\n",
    "    plt.subplot(212)\n",
    "    plt.yscale('log')\n",
    "    plt.xticks(x - 1)\n",
    "    plt.xlabel(\"redundancy (extra chunks)\")\n",
    "    plt.bar(x - 1, statdist)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 497
    },
    "id": "qHs1pnBnM8zU",
    "outputId": "cd484d0f-c143-4241-a399-0686e1bd1028"
   },
   "outputs": [],
   "source": [
    "plot_dist(statdist)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "f0N8GWrJ80Q0"
   },
   "source": [
    "### Loss Rate"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8u-Vx34EAjsr"
   },
   "source": [
    "The loss rate is derived from the state representing dead blocks, i.e. redundancy below 0. A dead block is replaced after exactly one time step, so $P(dead)$ is the long-run fraction of time steps in which a block is lost:\n",
    "$$LossRate = P(dead)/τ$$"
   ]
  },
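The same reasoning gives the mean time between losses of a block. By Kac's lemma, in an irreducible, positive-recurrent chain the mean return time to a state equals the reciprocal of its stationary probability. The mean time between two losses of the same block is therefore τ/P(dead). The minimal check below verifies this on a hypothetical 3-state chain whose numbers are made up for illustration and are not from the paper. It compares the stationary probability of the dead state with the reciprocal of the mean return time computed from hitting-time equations.

```python
import numpy as np

# Hypothetical 3-state toy chain (illustrative numbers, not from the paper):
# 0 = dead, 1 = degraded, 2 = full redundancy. As in the model above, the
# dead state is left with probability 1 after a single step.
P = np.array([[0.0, 0.0, 1.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.1, 0.9]])

# Stationary distribution: solve pi (P - I) = 0 together with sum(pi) = 1.
A = P.T - np.eye(3)
A[-1, :] = 1.0            # replace one (redundant) balance equation by normalization
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)

# Mean number of steps to reach state 0 from states 1 and 2: (I - Q) h = 1.
Q = P[1:, 1:]
h = np.linalg.solve(np.eye(2) - Q, np.ones(2))
mean_return = 1 + P[0, 1:] @ h    # one step out of state 0, then the hitting time

print("P(dead)              =", pi[0])
print("1 / mean return time =", 1 / mean_return)
```

Both printed values equal 1/31 up to rounding for this toy chain.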
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "BBmMD-5_ISqQ",
    "outputId": "ce3ca093-c042-47a4-ef49-e605577a9196"
   },
   "outputs": [],
   "source": [
    "def lossrate(mc, tau):\n",
    "    statdist = mc.stationary_distributions[0]\n",
    "    LossRateBlock = statdist[0] / tau  # use the time step passed in, not the global τ\n",
    "    return LossRateBlock\n",
    "\n",
    "print(\"block loss rate:\", lossrate(mc, τ), \"/hour\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bUSYxLSVDXqL"
   },
   "source": [
    "The system-level loss rate is derived from the same dead-block state:\n",
    "$$LossRate = B · P(dead)/τ$$\n",
    "where $B$ is the total number of blocks in the system, and $τ$ is the time step used to derive the transition probabilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Qvq5-PTYS9Sf"
   },
   "outputs": [],
   "source": [
    "N = 500  # number of peers\n",
    "D = 20e+12  # total amount of data in the system [bytes]\n",
    "l_f = 320e+3  # chunk size [bytes]\n",
    "l_b = k * l_f  # erasure block size [bytes]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "dulN07WjT_Se",
    "outputId": "3b8261e1-28a4-4b93-d2fb-5fc20b69ab39"
   },
   "outputs": [],
   "source": [
    "B = D / l_b  # number of blocks in the system\n",
    "print(\"number of blocks:\", B)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "5lWenTGcCpOl",
    "outputId": "a7dbc95d-33b5-45c6-baa6-89c9e57479d8"
   },
   "outputs": [],
   "source": [
    "LossRateSystem = B * statdist[0] / τ\n",
    "print(\"block loss rate at system level:\", LossRateSystem, \"/hour\")"
   ]
  },
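A small helper can put the system-level loss rate into more tangible units. It recomputes B = D/(k·l_f) from the parameters above and converts the per-hour rate B·P(dead)/τ into expected block losses per year, assuming a 365-day year. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year

def blocks_in_system(total_bytes, k, chunk_bytes):
    """Number of erasure-coded blocks: B = D / (k * l_f)."""
    return total_bytes / (k * chunk_bytes)

def expected_losses_per_year(p_dead, blocks, tau_hours):
    """System-level loss rate B * P(dead) / tau, converted from per hour to per year."""
    return blocks * p_dead / tau_hours * HOURS_PER_YEAR

# D = 20e+12 bytes, k = 16, l_f = 320e+3 bytes, as in the cells above
B_example = blocks_in_system(20e12, 16, 320e3)
print("blocks:", B_example)
# In the notebook, with the stationary distribution computed above:
# expected_losses_per_year(statdist[0], B, τ)
```

With these parameters, B = 20e+12 / (16 · 320e+3) = 3,906,250 blocks.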
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DtJa1whS9Icf"
   },
   "source": [
    "### Repair Bandwidth\n",
    "\n",
    "Lazy repair makes it possible to optimize repair bandwidth by starting repair only when several chunks are missing.\n",
    "\n",
    "Reconstructing a missing chunk requires $k$ chunks, regardless of how many chunks are missing ($r-i$ in replication state $i$). If each missing chunk is repaired independently, the traffic generated is $(r-i) · k$ chunks, so it does not matter when repair starts.\n",
    "\n",
    "If, instead, there is a repair node, it can generate all $r-i$ missing chunks and distribute them, at a cost of $k + r-i-1$ chunk transfers (assuming the repair node already stores one of the chunks)."
   ]
  },
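The two cost formulas can be compared directly. The minimal sketch below uses the RS(32,16) parameters from above and a hypothetical function name. With a repair node, the cost per missing chunk is k when one chunk is missing and falls to 1 + (k−1)/(r−i) as more chunks are repaired together. This is the saving that lazy repair exploits.

```python
def repair_traffic(k, r, i):
    """Chunk transfers needed to restore full redundancy from redundancy i.

    Returns (independent, repair_node):
    - independent: each of the r-i missing chunks is rebuilt separately from k chunks
    - repair_node: a node already holding one chunk downloads k-1 chunks and uploads r-i
    """
    missing = r - i
    independent = missing * k
    repair_node = k - 1 + missing if missing > 0 else 0
    return independent, repair_node

k, r = 16, 16
for i in (15, 12, 8, 4, 0):
    ind, node = repair_traffic(k, r, i)
    print(f"redundancy {i:2d}: independent {ind:3d}, repair node {node:2d}, "
          f"repair node per missing chunk {node / (r - i):.2f}")
```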
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qPOC33LL_DzF"
   },
   "outputs": [],
   "source": [
    "def repair_bw(mc, k, n, tau):\n",
    "    # expected repair traffic of one block with a repair node, in units of\n",
    "    # blocks per hour (chunk transfers per hour divided by k)\n",
    "    r = n - k\n",
    "    statdist = mc.stationary_distributions[0]\n",
    "    bw = 0\n",
    "    # state index i has redundancy i-1, i.e. r-i+1 missing chunks;\n",
    "    # a repair then costs k + (r-i+1) - 1 = k + r - i chunk transfers\n",
    "    for i in range(1, r + 1):\n",
    "        bw += statdist[i] * mc.p[i, r + 1] * (k + r - i) / tau\n",
    "\n",
    "    return bw / k\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 501
    },
    "id": "A5RpvECfDk23",
    "outputId": "68a895b8-c60d-44d3-ff07-b9e31fe7ac61"
   },
   "outputs": [],
   "source": [
    "k=16\n",
    "n=32\n",
    "tau=1\n",
    "mttf=1e4\n",
    "mttr=12\n",
    "r0=np.arange(0,n-k)\n",
    "\n",
    "def plot_clearvoyant(k, n, r0, tau, mttf, mttr):\n",
    "    # one Markov chain per threshold value in r0\n",
    "    clearvoyant = np.vectorize(dtmc_clearvoyant, otypes=[object])(k, n, r0, tau, mttf, mttr)\n",
    "    bw = np.vectorize(repair_bw)(clearvoyant, k, n, tau)\n",
    "    loss = np.vectorize(lossrate)(clearvoyant, tau)\n",
    "\n",
    "    plt.plot(loss, bw, '-o', label=f'RS({n},{k})')\n",
    "    plt.ylabel(\"repair bandwidth per block [blocks/hour]\")\n",
    "    plt.xlabel(\"loss rate [per hour]\")\n",
    "    plt.xscale('log')\n",
    "    # annotate each point with its threshold r0\n",
    "    for i, txt in enumerate(r0):\n",
    "        plt.annotate(txt, (loss[i], bw[i]))\n",
    "\n",
    "def plot_clearvoyant_r0(k, n, tau, mttf, mttr):\n",
    "    r0=np.arange(0,n-k)\n",
    "    plot_clearvoyant(k, n, r0, tau, mttf, mttr)\n",
    "\n",
    "\n",
    "tau=1\n",
    "mttf=1e4\n",
    "mttr=240\n",
    "plot_clearvoyant_r0(10, 15, tau, mttf, mttr)\n",
    "plot_clearvoyant_r0(16, 32, tau, mttf, mttr)\n",
    "plot_clearvoyant_r0(50, 100, tau, mttf, mttr)\n",
    "plt.legend()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Qm6mJ5Oj_DfF"
   },
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "63EgFQq0Ylix"
   },
   "source": [
    "# Modeling of verification\n",
    "\n",
    "Until now, we have considered a system in which the state of each erasure-coded block is perfectly visible, so that reconstruction is based on perfect information. Here we model what happens if this information is only partial.\n",
    "\n",
    "We can differentiate between the following two repair approaches:\n",
    "- perform a single test and start repair as soon as it fails;\n",
    "- estimate the redundancy based on a test and start repair based on this estimate.\n",
    "\n",
    "First, let's assume a repair process that is triggered by a simple verification process with the following parameter:\n",
    "- $fail(i)$ is the probability of a negative outcome of the verification (test failed), as a function of the actual redundancy $i$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "cmshiH8jfY7e"
   },
   "outputs": [],
   "source": [
    "def fail(i):  # simplest test: check one randomly chosen chunk\n",
    "    # probability that the checked chunk is one of the r-i missing ones (n = k + r)\n",
    "    return 1 - (k + i) / n\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kVUuEXEqZpMZ"
   },
   "source": [
    "Let's see how this modifies the transition matrix. If a block is not verified, repair cannot start. If a block is verified, repair starts with probability $fail(i)$; with probability $1 - fail(i)$, nothing happens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_UVj9nBVcoqf"
   },
   "outputs": [],
   "source": [
    "def dtmc_simplequery(k, n, tau, mttf, mttr):\n",
    "\n",
    "    r = n - k\n",
    "    alpha = tau / mttf  # probability of a disk failure in a timestep\n",
    "    gamma = min(1, tau / mttr)  # probability of a block being reconstructed in a timestep\n",
    "\n",
    "    def fail(i):  # simplest test: check one chunk; probability that it is missing, given redundancy i\n",
    "        return 1 - (k + i) / (k + r)\n",
    "\n",
    "    def delta(i):\n",
    "        # probability of losing a chunk in a timestep, given redundancy i\n",
    "        return (k + i) * alpha\n",
    "\n",
    "    # same state indexing as dtmc_clearvoyant: index i+1 has redundancy i\n",
    "    p = np.zeros((r + 2, r + 2))\n",
    "    for i in range(0, r + 1):\n",
    "        p[i + 1, i] = delta(i)\n",
    "    # repair starts only if the test fails; no repair at full redundancy (i = r)\n",
    "    for i in range(0, r):\n",
    "        p[i + 1, r + 1] = gamma * fail(i)\n",
    "    p[0, r + 1] = 1\n",
    "    for i in range(0, r + 2):\n",
    "        p[i, i] = 0\n",
    "        p[i, i] = 1 - p[i].sum()\n",
    "\n",
    "    mc = pydtmc.MarkovChain(p)\n",
    "    return mc"
   ]
  },
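A single-chunk test detects missing chunks slowly when only a few are missing. In `dtmc_simplequery`, repair is triggered in a step with probability γ·fail(i). Ignoring further chunk losses in the meantime (a simplifying assumption), the wait is geometric with mean 1/(γ·fail(i)) steps. The sketch below tabulates this for RS(32,16) with τ = 1 h and MTTR = 12 h; the function name is hypothetical.

```python
def expected_steps_to_repair(k, n, tau, mttr, i):
    """Mean number of steps until repair is triggered at redundancy i,
    ignoring further chunk losses in the meantime (simplifying assumption).

    Per step, repair is triggered with probability gamma * fail(i), as in
    dtmc_simplequery, so the waiting time is geometric with that parameter.
    """
    gamma = min(1.0, tau / mttr)
    fail = 1 - (k + i) / n     # single-chunk test: P(checked chunk is missing)
    q = gamma * fail
    return float('inf') if q == 0 else 1 / q

k, n = 16, 32
for i in (15, 12, 8, 0):
    print(f"redundancy {i:2d}: ~{expected_steps_to_repair(k, n, 1, 12, i):.0f} steps")
```

With one chunk missing (i = 15), the expected wait is about 384 steps. With half of the chunks missing (i = 0), it is about 24 steps.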
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Xje_lCGeznI9"
   },
   "outputs": [],
   "source": [
    "mc = dtmc_simplequery(k, n, τ, MTTF, MTTR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 497
    },
    "id": "J5WDCPtmeXle",
    "outputId": "d75ef6a3-eb83-4839-a0ad-0ef1b746fec9"
   },
   "outputs": [],
   "source": [
    "statdist = mc.stationary_distributions[0]\n",
    "plot_dist(statdist)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "TXzWIje4f8jL",
    "outputId": "97d6c8a9-b9ad-425e-cc74-f5d90409f3f5"
   },
   "outputs": [],
   "source": [
    "LossRate = B * statdist[0] / τ\n",
    "print(\"block loss rate at system level:\", LossRate, \"/hour\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 515
    },
    "id": "iKJTxHoWQqjg",
    "outputId": "b1f52dfb-1a62-41f9-b9b0-cd2d898fb729"
   },
   "outputs": [],
   "source": [
    "from scipy.stats import hypergeom\n",
    "\n",
    "def dtmc_multiquery(k, n, l, maxfail, tau, mttf, mttr):\n",
    "    # k: initial fragments of a block\n",
    "    # n: coded fragments\n",
    "    # l: query length (number of chunks tested)\n",
    "    # maxfail: max failures allowed (-1: test always fails, 0: test passes only if all are good, 1: allows one to fail ... l: always passes)\n",
    "    # tau: time step [hours]\n",
    "    # mttf: mean time to failure [hours]\n",
    "    # mttr: mean time to repair [hours]\n",
    "\n",
    "    alpha = tau / mttf  # probability of a disk failure in a timestep\n",
    "    gamma = min(1, tau / mttr)  # probability of a block being reconstructed in a timestep, capped at 1\n",
    "    r = n - k  # redundancy fragments\n",
    "    p_v = 1  # probability that a block is verified in a timestep (here: every timestep)\n",
    "\n",
    "    def fail(i):  # probability that the test fails, given redundancy i\n",
    "        # n chunks, of which k+i are good (r-i are missing), l are tested;\n",
    "        # the test fails if more than maxfail of the tested chunks are missing\n",
    "        return hypergeom(n, r - i, l).sf(maxfail)\n",
    "\n",
    "    def delta(i):\n",
    "        # probability of losing a chunk in a timestep, given redundancy i\n",
    "        return (k + i) * alpha\n",
    "\n",
    "    # same state indexing as dtmc_clearvoyant: index i+1 has redundancy i\n",
    "    p = np.zeros((r + 2, r + 2))\n",
    "    for i in range(0, r + 1):\n",
    "        p[i + 1, i] = delta(i)\n",
    "    for i in range(0, r):\n",
    "        p[i + 1, r + 1] = gamma * p_v * fail(i)\n",
    "    p[0, r + 1] = 1\n",
    "    for i in range(0, r + 2):\n",
    "        p[i, i] = 0\n",
    "        p[i, i] = 1 - p[i].sum()\n",
    "\n",
    "    mc = pydtmc.MarkovChain(p)\n",
    "    return mc\n",
    "\n",
    "τ = 1  # time step [hours]\n",
    "MTTF = 1e+4  # mean time to failure [hours]; disk MTTF specs (around 1e+6) are unrealistic for node failures\n",
    "MTTR = 50  # mean time to repair [hours]\n",
    "K = 16  # (K) initial fragments of a block\n",
    "N = 32  # coded fragments\n",
    "L = 6  # query length\n",
    "MAXFAIL = 2  # max failures allowed (-1: test always fails, 0: test passes only if all are good, 1: allows one to fail ... L: always passes)\n",
    "tau_mq = L * τ  # time step of this model: one step per query of L chunks, as in the plots below (assumption)\n",
    "\n",
    "mc = dtmc_multiquery(K, N, L, MAXFAIL, tau_mq, MTTF, MTTR)\n",
    "statdist = mc.stationary_distributions[0]\n",
    "\n",
    "plot_dist(statdist)\n",
    "LossRate = B * statdist[0] / tau_mq\n",
    "print(\"block loss rate at system level:\", LossRate, \"/hour\")"
   ]
  },
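The test-failure probability in `fail(i)` is a hypergeometric tail. In SciPy's parameterization, `hypergeom(n, r - i, l).sf(maxfail)` is the probability that more than `maxfail` of the `l` tested chunks are among the `r - i` missing ones. The dependency-free sketch below makes the counting explicit with `math.comb` and can be compared against the SciPy call; the function name `fail_prob` is hypothetical. For `l = 1` and `maxfail = 0`, it reduces to the single-chunk test `1 - (k + i)/n`.

```python
from math import comb

def fail_prob(n, missing, l, maxfail):
    """P(test fails): more than `maxfail` of `l` chunks, sampled without
    replacement from `n`, are among the `missing` ones (hypergeometric tail).
    """
    total = comb(n, l)
    # number of samples containing at most maxfail missing chunks
    passing = sum(comb(missing, x) * comb(n - missing, l - x)
                  for x in range(0, min(maxfail, l) + 1))
    return 1 - passing / total

n, k, l, maxfail = 32, 16, 6, 2   # values from the cell above
r = n - k
for i in (15, 12, 8, 4, 0):
    print(f"redundancy {i:2d}: fail probability {fail_prob(n, r - i, l, maxfail):.4f}")
```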
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "v1cgSNHNpnP6"
   },
   "outputs": [],
   "source": [
    "def plot_multiquery(k, n, l, maxfail, tau, mttf, mttr):\n",
    "    # one Markov chain per value in maxfail\n",
    "    mcs = np.vectorize(dtmc_multiquery, otypes=[object])(k, n, l, maxfail, tau, mttf, mttr)\n",
    "    bw = np.vectorize(repair_bw)(mcs, k, n, tau)\n",
    "    loss = np.vectorize(lossrate)(mcs, tau)\n",
    "\n",
    "    plt.plot(loss, bw, '-o', label=f'RS({n},{k}) - mq(l={l})')\n",
    "    plt.ylabel(\"repair bandwidth per block [blocks/hour]\")\n",
    "    plt.xlabel(\"loss rate [per hour]\")\n",
    "    plt.xscale('log')\n",
    "    plt.yscale('log')\n",
    "    # annotate each point with its maxfail value\n",
    "    for i, txt in enumerate(maxfail):\n",
    "        plt.annotate(txt, (loss[i], bw[i]))\n",
    "\n",
    "def plot_multiquery_maxfail(k, n, l, tau, mttf, mttr):\n",
    "    maxfail = np.arange(0, l)\n",
    "    plot_multiquery(k, n, l, maxfail, tau, mttf, mttr)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 501
    },
    "id": "dW5J4tFAqXaQ",
    "outputId": "b8cf0919-8d9c-4094-b2b1-b247c6581a33"
   },
   "outputs": [],
   "source": [
    "tau=1\n",
    "mttf=1e4\n",
    "mttr=24\n",
    "plot_multiquery_maxfail(16, 32, 8, 8*tau, mttf, mttr)\n",
    "plot_multiquery_maxfail(16, 32, 4, 4*tau, mttf, mttr)\n",
    "plot_multiquery_maxfail(16, 32, 1, 1*tau, mttf, mttr)\n",
    "plot_multiquery_maxfail(50, 100, 8, 8*tau, mttf, mttr)\n",
    "#plot_clearvoyant_r0(10, 15, tau, mttf, mttr)\n",
    "plot_clearvoyant_r0(16, 32, tau, mttf, mttr)\n",
    "plot_clearvoyant_r0(50, 100, tau, mttf, mttr)\n",
    "plt.legend()\n",
    "plt.show()\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "Modeling durability using replication state and related metrics (Markov Chain Model)",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}