sourcecred/CHANGELOG.md
William Chargin 7f81337d74
Store GitHub data gzipped at rest (#751)
Summary:
We store the relational view in `view.json.gz` instead of `view.json`,
taking advantage of the isomorphic `pako` library for gzip encoding and
decoding.

Sample space savings (note that post bodies are included; i.e., #747 has
not been applied):

       SAVE     OLD (B)     NEW (B) REPO
      89.7%       25326        2617 sourcecred/example-github
      82.9%     3257576      555948 sourcecred/sourcecred
      85.2%    11287621     1665884 ipfs/js-ipfs
      88.0%    20953425     2520358 gitcoinco/web
      84.4%    38196825     5951459 ipfs/go-ipfs
      84.9%   205770642    31101452 tensorflow/tensorflow

<details>
<summary>Script to generate space savings output</summary>

```shell
savings() {
    printf '% 7s % 11s % 11s %s\n' 'SAVE' 'OLD (B)' 'NEW (B)' 'REPO'
    for repo; do
        file="${SOURCECRED_DIRECTORY}/data/${repo}/github/view.json.gz"
        if ! [ -f "${file}" ]; then
            printf >&2 'warn: no such file %s\n' "${file}"
            continue
        fi
        script="$(sed -e 's/^ *//' <<EOF
            repo = '${repo}'
            pre_size = $(<"${file}" gzip -dc | wc -c)
            post_size = $(<"${file}" wc -c)
            percentage = '%0.1f%%' % (100 * (1 - post_size / pre_size))
            p = '% 7s % 11d % 11d %s' % (percentage, pre_size, post_size, repo)
            print(p)
EOF
        )"
        python3 -c "${script}"
    done
}
```

</details>

Closes #750.

Test Plan:
Comparing the raw old version with the decompressed new version shows
that they are identical:

```
$ <~/tmp/sourcecred/data/sourcecred/example-github/github/view.json \
> shasum -a 256 -
63853b9d3f918274aafacf5198787e18185a61b9c95faf640a1e61f5d11fa19f  -
$ <~/tmp/sourcecred/data/sourcecred/example-github/github/view.json.gz \
> gzip -dc | shasum -a 256
63853b9d3f918274aafacf5198787e18185a61b9c95faf640a1e61f5d11fa19f  -
```

Additionally, `yarn test --full` passes, and `yarn start` still loads
data and runs PageRank properly.

wchargin-branch: gzip-relational-view
2018-09-01 10:42:30 -07:00

906 B
Raw Blame History

Changelog

[Unreleased]

  • Store GitHub data compressed at rest, reducing space usage by 68× (#750)
  • Improve weight sliders display (#736)
  • Separate bots from users in the UI (#720)
  • Add a feedback link to the prototype (#715)
  • Support combining multiple repositories into a single graph (#711)
  • Normalize scores so that 1000 cred is split amongst users (#709)
  • Stop persisting weights in local store (#706)
  • Execute GraphQL queries with exponential backoff (#699)
  • Introduce a simplified Git plugin that only tracks commits (#685)
  • Rename cred explorer table columns (#680)
  • Display version string in the app's footer
  • Support hosting SourceCred instances at arbitrary gateways, not just the root of a domain (#643)
  • Aggregate over connection types in the cred explorer (#502)
  • Start tracking changes in CHANGELOG.md