0c2908dbfb
Summary:
This patch adds independent exponential backoff to each individual GitHub GraphQL query. We remove the fixed `GITHUB_DELAY_MS` delay before each query in favor of this solution, which requires no additional configuration (thus resolving a TODO in the process).

We use the NPM module `retry` with its default settings: namely, a maximum of 10 retries with factor-2 backoff starting at 1000ms. Empirically, it seems very unlikely that we should require much more than 2 retries for a query. (See Test Plan for more details.)

This is both a short-term unblocker and a good kind of thing to have in the long term.

Test Plan:
Note that `yarn test --full` passes, including `fetchGithubRepoTest.sh`.

Consider manual testing as follows. Add `console.info` statements in `retryGithubFetch`, then load a large repository like TensorFlow, and observe the output:

```shell
$ node bin/sourcecred.js load --plugin github tensorflow/tensorflow 2>&1 | ts -s '%.s'
0.252566 Fetching repo...
0.258422 Trying...
5.203014 Trying...
[snip]
1244.521197 Trying...
1254.848044 Will retry (n=1)...
1260.893334 Trying...
1271.547368 Trying...
1282.094735 Will retry (n=1)...
1283.349192 Will retry (n=2)...
1289.188728 Trying...
[snip]
1741.026869 Ensuring no more pages...
1742.139978 Creating view...
1752.023697 Stringifying...
1754.697116 Writing...
1754.697772 Done.
```

This took just under half an hour, with 264 queries total, of which:

- 225 queries required 0 retries;
- 38 queries required exactly 1 retry;
- 1 query required exactly 2 retries; and
- 0 queries required 3 or more retries.

wchargin-branch: github-backoff
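The backoff schedule described in the commit summary above can be illustrated with a small, dependency-free sketch. It is not the patch's `retryGithubFetch` and does not use the `retry` module; the names `backoffDelays` and `retryWithBackoff` are hypothetical, and the defaults simply mirror the settings quoted in the summary (10 retries, factor 2, starting at 1000ms).

```javascript
// Hypothetical helpers for illustration only; not SourceCred's code.

// Compute the wait (in ms) before each retry: minTimeout * factor^attempt.
function backoffDelays({retries = 10, factor = 2, minTimeout = 1000} = {}) {
  const delays = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(minTimeout * Math.pow(factor, attempt));
  }
  return delays;
}

// Run `operation` once, then retry after each delay until it succeeds
// or the retries are exhausted. `sleep` is injectable so tests need not wait.
async function retryWithBackoff(operation, options = {}) {
  const {
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  const delays = backoffDelays(options);
  let lastError;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await operation(attempt);
    } catch (e) {
      lastError = e;
      if (attempt < delays.length) {
        await sleep(delays[attempt]);
      }
    }
  }
  throw lastError;
}
```

Under these assumptions the first waits are 1000ms, 2000ms, and 4000ms, so a query that needs two retries (the worst case in the test plan above) spends about three seconds waiting.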
README.md
SourceCred
Vision
Open source software is amazing, and so are its creators and maintainers. How amazing? It's difficult to tell, since we don't have good tools for recognizing those people. Many amazing open-source contributors labor in the shadows, going unappreciated for the work they do.
SourceCred will empower projects to track contributions and create cred, a reputational measure of how valuable each contribution was to the project. Algorithmically, contributions will be organized into a graph, with edges representing connections between contributions. Then, a configurable PageRank algorithm will distill that graph into a cred attribution.
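To make the graph-plus-PageRank idea concrete, here is a minimal sketch of textbook PageRank over a toy contribution graph. It is an illustration under stated assumptions, not SourceCred's implementation: the function name `pageRank`, the damping factor of 0.85, the fixed iteration count, and the edge format are all hypothetical choices.

```javascript
// Textbook PageRank by power iteration; not SourceCred's actual algorithm.
// `nodes` is an array of ids; `edges` is an array of [source, target] pairs.
function pageRank(nodes, edges, {damping = 0.85, iterations = 50} = {}) {
  const n = nodes.length;
  const outDegree = new Map(nodes.map((node) => [node, 0]));
  for (const [src] of edges) {
    outDegree.set(src, outDegree.get(src) + 1);
  }
  // Start from a uniform distribution.
  let scores = new Map(nodes.map((node) => [node, 1 / n]));
  for (let i = 0; i < iterations; i++) {
    // Score held by nodes with no outgoing edges is spread evenly.
    let danglingMass = 0;
    for (const node of nodes) {
      if (outDegree.get(node) === 0) {
        danglingMass += scores.get(node);
      }
    }
    const base = (1 - damping) / n + (damping * danglingMass) / n;
    const next = new Map(nodes.map((node) => [node, base]));
    // Each node passes its score along its outgoing edges in equal shares.
    for (const [src, dst] of edges) {
      const share = (damping * scores.get(src)) / outDegree.get(src);
      next.set(dst, next.get(dst) + share);
    }
    scores = next;
  }
  return scores;
}

// Hypothetical example: two pull requests that both reference one issue.
const example = pageRank(
  ["issue", "pr1", "pr2"],
  [
    ["pr1", "issue"],
    ["pr2", "issue"],
  ]
);
```

In this toy graph the issue ends up with the highest score because both pull requests point to it, and the scores always sum to 1.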
SourceCred is dogfooding itself. People who contribute to SourceCred (by writing bug reports, participating in design discussions, or writing pull requests) will receive cred in SourceCred.
Design Goals
SourceCred development is organized around the following high-level goals.
- Transparent
It should be easy to see why cred is attributed as it is, and link a person's cred directly to contributions they've made.
- Community Controlled
Each community has the final say on what that community's cred is. We don't expect an algorithm to know what's best, so we'll empower communities to use algorithmic results as a starting point, and improve results with their knowledge.
- Decentralized
Individual projects and communities will control their own SourceCred instances, and own their own data. The SourceCred creators won't have the power to control or modify other projects' cred.
- Forkable
Forking is important to open source, and gives people the freedom to vote with their feet. SourceCred will support forking, and forks will be able to modify their cred independently of the original.
- Flexible & Extensible
SourceCred is focused on open-source projects for now, but we think it can be a general system for building reputation networks. We're organizing around very flexible core abstractions, and a plugin architecture for specific domains.
Current Status
As of July 2018, it's still early days for SourceCred! So far, we've set the following foundations:
- the graph class is the heart of SourceCred, and we've spent a lot of time polishing those APIs 🙂
- the GitHub plugin downloads data from GitHub and imports it into a graph
- the Git plugin clones a Git repository and imports it into a graph
- our PageRank implementation does cred attribution on the graph
- the cred explorer makes the PageRank results transparent
The PageRank results aren't very good yet; we need to add more configurability to get higher-quality results. We're working out improvements in this issue.
Roadmap
The team is focused right now on building an end-to-end beta that can import GitHub repositories and produce a reasonable and configurable cred attribution. We hope to have the beta ready by November 2018.
Running the Prototype
If you'd like to try it out, you can run a local copy of SourceCred using the following commands. You need to have node and yarn installed first. This repo is stable and tested on Node version 8.x.x, and Yarn version 1.7.0. You also need to get a GitHub API access token. This token does not need any specific permissions.
```
git clone https://github.com/sourcecred/sourcecred.git
cd sourcecred
yarn install
yarn backend
export SOURCECRED_GITHUB_TOKEN=YOUR_GITHUB_TOKEN
node bin/sourcecred.js load REPO_OWNER/REPO_NAME
# this loads sourcecred data for a particular repository
yarn start
# then navigate to localhost:3000 in your browser
```
For example, if you wanted to look at cred for ipfs/js-ipfs, you could run:
```
$ export SOURCECRED_GITHUB_TOKEN=0000000000000000000000000000000000000000
$ node bin/sourcecred.js load ipfs/js-ipfs
```
replacing the big string of zeros with your actual token.
Contributing
We’d love to accept your contributions! Please join our Discord to get in touch with us, and check out our contributing guide to get started.