Summary:
This commit slightly reorganizes the internals of `basicPagerank` to use
the `SparseMarkovChain` type from the `markovChain` module.
Test Plan:
Behavior of `yarn start` is unchanged.
wchargin-branch: use-sparsemarkovchain
Summary:
This function is mostly useful for easily describing Markov chains in
test cases.
Test Plan:
Unit tests added. Run `yarn test`.
wchargin-branch: sparseMarkovChainFromTransitionMatrix
Summary:
For now, this module has just two types: `Distribution` and
`SparseMarkovChain`. We’ll gradually pull code from `basicPagerank` into
this module, adding unit tests along the way.
Test Plan:
None required.
wchargin-branch: markov-chain-module
Summary:
See #66 for more context. This yields the following performance
improvements for me, on the SourceCred graph with 11 072 nodes and
20 250 edges:
- Loading a graph from disk is improved overall from 1172 ms to 292 ms
(4.0× improvement).
- The full pipeline for basic PageRank, from button press to final
render, is improved from 8.44 s to 4.39 s (1.9× improvement).
- The PageRank preprocessing step, which involves turning the graph
into a typed array Markov chain, is improved from 2430 ms to 573 ms
(4.2× improvement).
- The PageRank postprocessing step, which involves turning the typed
array distribution into an `AddressMap` distribution, is improved
from 83.53 ms to 4.81 ms (17× improvement).
- The `PagerankTable` React component `render` method (constructing
the virtual tree only, not diffing or embedding into the DOM) is
improved from 1708 ms to 332 ms (5× improvement).
The core matrix computations of PageRank are unaffected, because they do
not use the `AddressMap` abstraction.
Test Plan:
Existing unit tests suffice.
wchargin-branch: fast-addressmap
Summary:
This takes `AddressMap` access, and therefore JSON stringification, off
the critical path, resulting in a significant performance increase. The
resulting code is much faster than the original TFJS implementation. On
my laptop, we can run about 300 iterations of PageRank per second on a
graph with 10 000 nodes and 18 000 edges (namely, the SourceCred graph).
Paired with @decentralion.
Test Plan:
Run `yarn start` and note that the cred attribution for SourceCred is
roughly the same as before… but is created faster.
wchargin-branch: pagerank-typed-arrays
Summary:
We’re not convinced that using TFJS at this time is worth it, for two
reasons. First, our matrix computations can be expressed using sparse
matrices, which will improve the performance by orders of magnitude.
Sparse matrices do not appear to be supported by TFJS (the layers API
makes some use of them, but it is not clear that they have much support
their, either). Second, having to deal with GPU memory and WebGL has
already been problematic. When WebGL PageRank is running, the machine is
mostly unusable, and other applications’ video output is negatively
affected (!).
This commit rewrites the internals of `basicPagerank.js` while retaining
its end-to-end public interface. We also add a test file with a trivial
test. The resulting code is not faster yet—in fact, it’s a fair amount
slower. But this is because our use of `AddressMap`s puts JSON
stringification on the critical path, which is obviously a bad idea. In
a subsequent commit, we will rewrite the internals again to use typed
arrays.
Paired with @decentralion.
Test Plan:
The new unit test is not sufficient. Instead, run `yarn start` and
re-run PageRank on SourceCred; note that the results are roughly
unchanged.
wchargin-branch: pagerank-without-tfjs
We were sorting low-to-high, and then reversing. We can just sort
high-to-low.
Test plan: No behavior change (confirm by interacting with the Cred
Explorer).
Paired with @wchargin
Test plan: Open the cred explorer, and use the + sign to expand the
neighbors. Observe that those neighbors are now sorted by cred.
Paired with @wchargin
Test plan: Open the cred explorer, and try clicking the + signs. They
will expand a recursive table showing the neighbor nodes and their cred.
Paired with @wchargin
`PagerankTable` is forked from `ContributionList`.
Test plan: I took it for a spin and it seemed OK. I'm not inclined to
write formal tests because we are in rapid prototyping mode stil.
Test Plan:
Load the cred explorer for the first time to see two empty boxes.
Refresh to see the same thing. Then, add some content to each box.
Refresh and see the same content.
wchargin-branch: cred-explorer-localstorage
This re-organizes the GitHub porcelain tests to be:
- organized by each method signature, rather than having blocks that
test many different methods on each wrapper
- make extensive use of snapshot tests for convenience
Test plan: Inspect the new unit tests, and the corresponding snapshots.
It should be relatively easy to do this because you can copy+paste the
urls to verify the properties.
Summary:
This fixes two problems in the previous version:
- A new JS file not checked into git, but with a `@flow` directive,
would cause `ensure-flow` to fail, because one list of files was
from `git grep` and the other was from `find`.
- Only the hard-coded directories `src config scripts` were searched.
Now, we search all JS files checked into Git, except for some hard-coded
exceptions, namely `flow-typed`.
Test Plan:
1. Add `foo.js`, not checked into Git. Note that `ensure-flow` passes.
2. Add `@flow` to `foo.js`, and note that `ensure-flow` still passes.
3. Remove `@flow` from `.eslintrc.js`, and note that `ensure-flow`
fails and nicely prints the filename. (Note: this file is at the
repository root.)
4. Create a file `echo stuff >$'naughty\nfilename.js'`, and note that
`ensure-flow` has the correct behavior in both positive and
negative cases.
wchargin-branch: ensure-flow-improvements
Summary:
Even though it’s not really a source file, and it lives at the
repository root, it might as well have typing to make sure that we don’t
do anything really dumb.
Test Plan:
`yarn flow` still passes.
wchargin-branch: flow-eslintrc
Test plan: Unit tests were added.
(Note: I haven't tested the error case, when there are an invalid number
of parents. I think this is OK for now, but I'm disclosing this so
reviewers can easily take issue with it. I'm planning to re-organize the
test code to be by method rather than by wrapper type (so the wrappers
section doesn't keep being a kitchen sink) and will hopefully
put that test in.)
Also updates the GitHub porcelain.
Existing observable behavior is unchanged, except that performance may
be improved for issueOrPrByNumber.
A bug that would afflict a multi-repository graph (namely, that calling
`repo.issues()` would get all issues for all repositories) is
pre-emptively removed. No test cases were added as we do not yet support
multi-repository graphs.
Test plan: existing unit test coverage is sufficient.
Currently, the `authors` method is attached at the Repository level.
This is incorrect; it actually finds all the authors in the graph, not
all the authors of that repository. This commit moves the method to the
correct class.
Test plan: This function is only used in test code. The tests still
pass.
Summary:
We don’t expect the results to be of good quality right now. Rather,
this gives us a starting point from which to iterate the algorithm.
The convergence criterion also needs to be adjusted. (In particular, it
should almost certainly not be a constant.)
Test Plan:
Run `yarn start`. Select a graph, like `sourcecred/example-github`. Open
the JS console and click “Run basic PageRank”. Watch the console.
wchargin-branch: basic-pagerank
Summary:
This commit enables the cred explorer to fetch pre-generated graphs. The
form has poor UX, but gets the job done. (To do later: saving in
localStorage, allowing fetching the list of possible graphs, linking the
submit button so that it’s triggered by `Enter`, etc.).
Test Plan:
Set up with `node ./bin/sourcecred.js graph sourcecred sourcecred`, then
run `yarn start` and check the following cases:
- success case: `sourcecred`/`sourcecred`
- error case: `sarcecrod`/`sarcecrod` (nice console error)
- error case: the repository owner or name is an empty string or has
invalid characters, like `../../secret.txt` (nice console error)
wchargin-branch: fetch-graphs
Summary:
This commit changes `yarn start` to run a production version of the API
server, which serves static files from a pre-built directory and also
handles API requests.
Test Plan:
```shell
$ yarn build
$ yarn backend
$ node ./bin/sourcecred.js start -d /tmp/srccrd &
$ mkdir -p /tmp/srccrd
$ echo hello >/tmp/srccrd/world
$ curl -s localhost:4000/ | file -
/dev/stdin: HTML document, ASCII text, with very long lines, with no line terminators
$ curl -s localhost:4000/api/v1/data/world
hello
```
wchargin-branch: cli-start-prod-server
Test plan: Run `yarn start`, and observe that the Cred Explorer is now
included in the nav bar, and can be selected, in which case a short
message is displayed.
testUtil contains some useful configuration endpoints for our frontend
testing. This commit moves it out of the artifact editor and up to
`src/app`.
Test plan: `yarn travis` passes.
@wchargin suggested that the entity-wrapping logic in porcelain
reference handling should be factored out as its own method. This was a
great suggestion; it will be very useful for plugin consumers, and it
also results in better test coverage.
Test plan: The new unit tests are nice. For your own safety, do not
question or quibble with the magnificent foo plugin.
Summary:
This way, our frontend can talk to a backend that can read from the
filesystem (among other things).
Paired with @decentralion.
Test Plan:
```
$ yarn backend
$ SOURCECRED_DIRECTORY=/tmp/srccrd yarn start
$ # verify that the browser looks good
$ mkdir /tmp/srccrd
$ echo hello >/tmp/srccrd/world
$ curl localhost:3000/api/v1/data/world
hello
$ curl localhost:4000/api/v1/data/world
hello
```
wchargin-branch: webpack-proxy
It was doing some clever array construction that added possible booleans
to the array, then filtered them out. To make the typing simpler for
Flow's inspection, we now only put string elements in the array.
Test plan: `yarn travis --full` passed, and the CLI still works.
Summary:
- The value of `process.stdout.isTTY` is either `true` or `undefined`.
Flow (reasonably) dislikes this, so we add an explicit check.
- More `package.json` burnination.
Test Plan:
Note that `require("./package.json").proxy === undefined` in the Node
console, and that `yarn start` works.
wchargin-branch: flow--scripts-start
- scripts/backend.js: We incorrectly set an environment variable to
a boolean, when in fact it must be a string. Fixed it to set a string
value "true", and updated usage in config/babel.js
- scripts/test.js: No changes
- scripts/build.js: Removed a call to printHostingInstructions, so that
we don't need to require the package.json.
Test plan:
`yarn travis --full` passes, and the SourceCred cli still works.
Fixing the flow error corresponded to (correctly) documenting that the
GitHub token is mandatory, not optional.
Test plan: `yarn travis --full` still passes.
- Fix accidental string-to-NaN coercion in ensureSlash
- Don't dynamically require package.json; to determine public url, just
use the environment variable or "/"
Test plan: `yarn start` and travis still work
Summary:
This way, different plugins can have `LocalStore`s with different cache
keys.
Test Plan:
Note that the app (`yarn start`) preserves the local store values from
before this change, and that updates continue to work.
wchargin-branch: generic-localstore
This commit modifies App.js to use routing, such that it's possible to
navigate between a home screen, and the Artifact Editor.
Test plan: Run `yarn start`, and navigate between the home screen and
artifact editor. Verify that the artifact editor still works as
expected.
This commit adds src/app/App.js, which proxies in the frontend from
src/plugins/artifact/editor/App.js. The observed behavior (run `yarn
start`; see Artifact Editor) is unchanged.
Test plan: Observe that `yarn start` has the same behavior, and travis
passes.
This commit executes a micro-refactor to move all top-level app setup
code out of src/plugins/artifact/editor and into src/app. The observed
behavior from `yarn start`, which is to show the artifact editor, is
unchanged.
Summary:
We need a way for our web applications to interact with data on the
filesystem. In this commit, we introduce a webserver that serves
statically from two directory trees: first, the result of a live-updated
Webpack build; second, the SourceCred data directory.
Test Plan:
Run `yarn backend` and `node ./bin/sourcecred.js start`. When ready,
navigate to the server’s root route in a web browser. Note that a nice
React app is displayed. Then, change something in that React app source.
Note that the server console displays Webpack’s update messages, and
that refreshing the page in the browser renders the new version of the
app. Finally, visit
/__data__/graphs/sourcecred/example-github/graph.json
in the browser to see the graph for the example repository, assuming
that you had generated its graph previously.
wchargin-branch: start
This script ensures that either //@flow or //@no-flow is present in
every js file. Every existing js file that would fail this check has
been given //@no-flow, we should work to remove all of these in the
future.
Test plan:
I verified that `yarn travis` fails before fixing the other js files,
and passes afterwards.