Commit Graph

416 Commits

Author SHA1 Message Date
William Chargin 719bf47156
Merge `merge`s (#297)
Summary:
We had three graph merging functions: `merge`, `mergeConservative`, and
`mergeManyConservative`. Of these, `merge` was never used outside of
code directly testing its behavior, and `mergeConservative` is a
strictly inferior version of `mergeManyConservative`. This commit
removes `merge` and `mergeConservative`, and renames
`mergeManyConservative` to `mergeConservative`.

Paired with @decentralion.

Test Plan:
Existing unit tests suffice; some useless tests pruned.

wchargin-branch: mmeerrggee
2018-05-25 13:55:44 -07:00
Dandelion Mané 418d046691
Remove Node/Edge polymorphism on Graph (#289)
As previously implemented, the Graph was polymorphic in its NodePayload
and EdgePayload. This was a lot of bookkeeping for very
little apparent benefit. In general, graphs may be constructed with
foreign plugins, so it's hard to do anything with this information.

In my experience, having the Graph polymorphism has never caught a bug,
and has led to lots of boilerplate code, especially typing `Graph<any,
any>`.  I view the fact that in #286 we added a new core `NodeReference`
concept, which always types its Graph as `<any, any>`, as strongly
suggestive that this was not going to provide any lasting value.

In this commit, I've removed the Graph polymorphism. Note how in many
cases where we were typing the graph, it provided no value, as evidenced
by the fact that the imported Node and Edge types were used no-where
else in the file other than in the Graph declaration.

Test plan:
Removing extra typing information is very unlikely to cause regressions.
`yarn flow` and `yarn lint` both pass.
2018-05-24 12:03:47 -07:00
William Chargin e20b794edc
Add design doc for address–payload unification (#296)
Summary:
See #190, and discussion on this PR, for context.

wchargin-branch: address-payload-unification-design
2018-05-24 11:37:04 -07:00
William Chargin 2b301f9159
Use indexed edges in graph internals (#295)
Summary:
This is an implementation-only, API-preserving change to the `Graph`
class. Edges’ `src` and `dst` attributes are now internally represented
as integer indices into a fixed ordering of nodes, which may depend on
non-logical properties such as insertion order. The graph’s serialized
form also now stores edges with integer `src`/`dst` keys, but the node
ordering is canonicalized so that two graphs are logically equal if and
only if their serialized forms are equal. This change substantially
reduces the rest storage space for graphs: the `sourcecred/sourcecred`
graph drops from 39MB to 30MB.

Currently, the graph will have to translate between integer indices and
full addresses at each client operation. This is not actually a big
performance regression, because it is just one more integer-index
dereference over the previous behavior, but it does indicate that the
optimization is not living up to its full potential. In subsequent
changes, the `NodeReference` class will be outfitted with facilities to
take advantage of the internal indexing; a long-term goal is that
roughly all operations should be able to be performed within the indexed
layer, and that translating between integers and addresses should only
happen at non-hot-path API boundaries.

This diff is considerably smaller and easier to read with `-w`.

Paired with @decentralion.

Test Plan:
I inspected the snapshots for general form, and manually verified that
the indices for one edge were correct (the MERGED_AS edge for the head
commit of the example repo). Other than that, existing unit tests mostly
suffice; one new unit test added.

wchargin-branch: graph-indexed-edges
2018-05-22 13:15:39 -07:00
Dandelion Mané 5a40bb0a30
Use versioning for `Graph` and `AddressMap` (#293)
This commit adds explicit versioning to the `Graph` and `AddressMap`
JSON representations, using the new `compat` module. This will make it
safer to change the serialization format for these classes.

(Note: I don't expect we'll add backcompat handlers for these classes
soon, but having versioning means that we can change the serialization
format in a way that breaks old data cleanly and explicitly, rather than
introducing undefined behavior.)

Test plan: The changes are slight, and well-captured by the snapshot
tests. Note that after this commit, the SourceCred commands will fail on
old data, so old data will need to be regenerated.
2018-05-21 11:47:05 -07:00
Dandelion Mané 1fc860bd56
Add data versioning utilities (#292)
This commit adds the `src/util/compat` module, which is responsible for
data compatibility. It provides two methods: `toCompat` and
`fromCompat`.

`toCompat` allows an object to be versioned with a type and version
string. Doing so wraps the object in an array whose first element has
`version` and `type` fields.

`fromCompat` allows loading compatibilized data. It takes a `type` and
`version` expectation, as well as optional processors for various data
versions. It throws an error if the data was not compatibilized, or if
it is the wrong type, or if it has an unsupported version.

The versioning is designed to be composable; see the final test block
for an example of loading a structure with nested versioning.

Test plan: Carefully inspect the included unit tests, and feel free to
propose more.

For context, see #280.
2018-05-21 11:21:05 -07:00
Dandelion Mané 180c3454af
Upgrade babel-plugin-flow-react-proptypes to 23 (#294)
Will allow us to use opaque types in #292 without breaking the build
in #293.

Test plan:
If travis passes, we're good.
2018-05-21 11:11:21 -07:00
Dandelion Mané 610a92a683
Run commands/graph with more heap available (#291)
`sourcecred graph` tends to die due to lack of heap during the Git
plugin. Node defaults to ~1.76GB of heap available, which is just not
that much. This commit uses the `--max_old_space_size` argument to
`node` to increase the limit. We use a default value of `8192`, and it's
configurable by the user via a flag.

This command is only available natively in `sourcecred graph`, because
that command turns on other node processes. For commands that run in
their original process, you would need to set the value yourself.
2018-05-16 20:07:40 -07:00
Dandelion Mané 64a8514cf8
markovChain should scale to 1m nodes (#290)
`markovChain.findStationaryDistribution` is currently written such that
it blows the function call stack size if it is called with >200k nodes.

This commit effects a small refactor so that the delta is computed in a
loop over elements in the distribution, rather than creating a massive
function arguments list. This results in the correct behavior.

Test plan:
We discussed offline and decided not to test it. Reasons:

- We could factor out `computeDelta` and test that function in
isolation, but I think this provides very, very little value. The class
of regression it protects against is almost precisely the case where
someone reverts this commit.

- We could test the entire `findStationaryDistribution` function, which
is more valuable, but requires a large, slow test case (~1 sec)

- This is a bug that is unlikely to re-occur or to slip by unnoticed if
it does. It's also unlikely to surface in future refactoring.
2018-05-16 19:27:41 -07:00
Dandelion Mané 1bd444a33b
Refactor Git plugin to use NodeReference (#288)
See #286 for context.

I also upgraded client code in src/app.

Test plan:
Unit tests are extensive, including testing that `get`, `ref`, and
constructors are overriden on every `GitReference` and `GitPorcelain`
type that is exposed to clients.

Paired with @wchargin
2018-05-15 19:10:30 -07:00
Dandelion Mané bb77c36626
Update GitHub to use NodeReference & NodePorcelain (#287)
See #286 for context. There are a few miscellaneous changes in
src/app/credExplorer to change clients to use the new API.

Test plan:
New unit test were added, and existing behavior is preserved. Most of
the functionality of the GitHub porcelain was already well tested.

Paired with @wchargin
2018-05-15 17:41:36 -07:00
Dandelion Mané 7ccef98c87
Add `NodeReference` and `NodePorcelain` to core (#286)
`NodeReference` and `NodePorcelain` act as abstractions over the two
states a Node can be in.

- We might have the address of a node (because some edge pointed to it),
but don't actually have the Node in the graph. In that case, we can do
some queries on the node (e.g. find its neighbors), but can't access
the payload. This corresponds to having a `NodeReference`.

- We might have the node in the graph. In that case, we can access a
`NodePorcelain`.

The main benefit this abstraction brings is type-safety over accessing
data from a `NodePayload`. Previously, the coding conventions encouraged
clients to ignore the distinction, and the type signatures incorrectly
reported that many payload-level properties were non-nullable. Now, the
`get` method that mapp a `Reference` to a `Payload` is explicilty
nullable.

Given a `NodePorcelain`, it's always possible to retrieve the reference
via `ref()`. Given the `NodeReference`, you might be able to retrieve
the `NodePorcelain` via `get()`.

Clients that subtype `NodePorcelain` and `NodeReference` should, in
general, override the `ref()` and `get()` methods to return their
subtype. We also recommend having subclasses overwrite the constructors
to take a base `NodePorcelain` and `NodeReference` respectively
(although the base classes take a `Graph` and `address` as constructor
arguments).

Test Plan: Inspect the unit tests, they are pretty thorough.

Paired with @wchargin
2018-05-15 17:25:20 -07:00
William Chargin f31d2c517d
Upgrade Flow to v0.72.0 (#285)
Summary:
A few changes were made to code that is correct (as far as I can tell),
but for which Flow can no longer infer a type parameter. The change is a
bit more annoying than it otherwise would be, because this particular
file is run directly via node and so must use Flow’s comment syntax for
type annotations, but Prettier breaks such comments in the cases that we
need. We work around this by rewriting the original code to avoid the
need for comments.

Test Plan:
In addition to standard CI, run `yarn build` and then run a server from
`build/`, to see that the production build produces a working bundle.
(That the app loads and renders is sufficient.)

wchargin-branch: upgrade-flow-v0.72.0
2018-05-15 17:09:29 -07:00
William Chargin 9591792f59
Separate GitHub `issueByNumber`/`prByNumber` (#284)
Summary:
Instead of having one function that returns a union, we present two
functions, each of which returns a more specific type.

Paired with @decentralion.

Test Plan:
Existing unit tests pass, with sufficient modifications.

wchargin-branch: separate-issueByNumber-prByNumber
2018-05-14 18:15:45 -07:00
Dandelion Mané fb8da7fcdb
Add porcelain for the Git plugin (#281)
This commit adds "Node Porcelain" for the Git plugin. Node porcelain is
a wrapper over a graph and node address, which makes it possible to
access payload data and adjacent nodes via a familiar object-oriented
API.

I believe this porcelain provides substantially better legibility and
usability for the Git plugin. Consider that it is now easy to see what
relationships each Git node type can have by reading the method
signatures in the porcelain, rather than needing to inspect all of the
Edge types in types.js.

This porcelain has slightly different conventions from the porcelain in
the GitHub plugin, although the APIs are very similar. I intend to
follow this commit with two more: one that switches clients of the Git
plugin to use the porcelain, and another that refactors the GitHub and
Git porcelains to use a base Porcelain implementation in src/core.

Test plan:
Examine the public API of the Git porcelain (this is unlikely to change
much), and its corresponding test code.

`yarn travis --full` passes.
2018-05-14 12:46:20 -07:00
William Chargin 115d7f3921
Extract `findStationaryDistribution` (#277)
Test Plan:
Unit tests added. Run `yarn test`.

wchargin-branch: extract-findStationaryDistribution
2018-05-11 21:56:52 -07:00
William Chargin 9d7f9f78cd
Make `findStationaryDistribution` configurable (#276)
Summary:
There are substantive options for `convergenceThreshold` and
`maxIterations`, as well as the output option `verbose`. This change is
made in preparation for extracting this function into `markovChain`,
where we will add unit tests for it.

Test Plan:
Behavior of `yarn start` is unchanged.

wchargin-branch: configurable-findstationarydistribution
2018-05-11 21:45:26 -07:00
William Chargin 0a608acbff
Extract `sparseMarkovChainAction` (#275)
Test Plan:
Unit tests added. Run `yarn test`.

wchargin-branch: extract-sparseMarkovChainAction
2018-05-11 21:38:07 -07:00
William Chargin 69b9f6657d
Extract `uniformDistribution` (#274)
Test Plan:
Unit tests added. Run `yarn test`.

wchargin-branch: extract-uniformDistribution
2018-05-11 21:35:02 -07:00
William Chargin 017fbd774a
Use `SparseMarkovChain` in `basicPagerank` (#273)
Summary:
This commit slightly reorganizes the internals of `basicPagerank` to use
the `SparseMarkovChain` type from the `markovChain` module.

Test Plan:
Behavior of `yarn start` is unchanged.

wchargin-branch: use-sparsemarkovchain
2018-05-11 21:28:58 -07:00
William Chargin e5472752ac
Allow converting transition matrix to sparse chain (#272)
Summary:
This function is mostly useful for easily describing Markov chains in
test cases.

Test Plan:
Unit tests added. Run `yarn test`.

wchargin-branch: sparseMarkovChainFromTransitionMatrix
2018-05-11 21:22:10 -07:00
William Chargin 3bd449d1c3
Create a `markovChain.js` module (#271)
Summary:
For now, this module has just two types: `Distribution` and
`SparseMarkovChain`. We’ll gradually pull code from `basicPagerank` into
this module, adding unit tests along the way.

Test Plan:
None required.

wchargin-branch: markov-chain-module
2018-05-11 21:14:56 -07:00
William Chargin e9e001b894
Switch AddressMap implementation to nested maps (#278)
Summary:
See #66 for more context. This yields the following performance
improvements for me, on the SourceCred graph with 11 072 nodes and
20 250 edges:

  - Loading a graph from disk is improved overall from 1172 ms to 292 ms
    (4.0× improvement).

  - The full pipeline for basic PageRank, from button press to final
    render, is improved from 8.44 s to 4.39 s (1.9× improvement).

  - The PageRank preprocessing step, which involves turning the graph
    into a typed array Markov chain, is improved from 2430 ms to 573 ms
    (4.2× improvement).

  - The PageRank postprocessing step, which involves turning the typed
    array distribution into an `AddressMap` distribution, is improved
    from 83.53 ms to 4.81 ms (17× improvement).

  - The `PagerankTable` React component `render` method (constructing
    the virtual tree only, not diffing or embedding into the DOM) is
    improved from 1708 ms to 332 ms (5× improvement).

The core matrix computations of PageRank are unaffected, because they do
not use the `AddressMap` abstraction.

Test Plan:
Existing unit tests suffice.

wchargin-branch: fast-addressmap
2018-05-11 19:46:54 -07:00
William Chargin 3e70edb3be
Use typed arrays for PageRank (#267)
Summary:
This takes `AddressMap` access, and therefore JSON stringification, off
the critical path, resulting in a significant performance increase. The
resulting code is much faster than the original TFJS implementation. On
my laptop, we can run about 300 iterations of PageRank per second on a
graph with 10 000 nodes and 18 000 edges (namely, the SourceCred graph).

Paired with @decentralion.

Test Plan:
Run `yarn start` and note that the cred attribution for SourceCred is
roughly the same as before… but is created faster.

wchargin-branch: pagerank-typed-arrays
2018-05-11 13:22:36 -07:00
William Chargin 7e97ba6bf3
Rewrite basic PageRank without TFJS (#266)
Summary:
We’re not convinced that using TFJS at this time is worth it, for two
reasons. First, our matrix computations can be expressed using sparse
matrices, which will improve the performance by orders of magnitude.
Sparse matrices do not appear to be supported by TFJS (the layers API
makes some use of them, but it is not clear that they have much support
their, either). Second, having to deal with GPU memory and WebGL has
already been problematic. When WebGL PageRank is running, the machine is
mostly unusable, and other applications’ video output is negatively
affected (!).

This commit rewrites the internals of `basicPagerank.js` while retaining
its end-to-end public interface. We also add a test file with a trivial
test. The resulting code is not faster yet—in fact, it’s a fair amount
slower. But this is because our use of `AddressMap`s puts JSON
stringification on the critical path, which is obviously a bad idea. In
a subsequent commit, we will rewrite the internals again to use typed
arrays.

Paired with @decentralion.

Test Plan:
The new unit test is not sufficient. Instead, run `yarn start` and
re-run PageRank on SourceCred; note that the results are roughly
unchanged.

wchargin-branch: pagerank-without-tfjs
2018-05-11 13:11:14 -07:00
Dandelion Mané 2a52ff85f8 Cred Explorer: Modify color based on table depth
Test plan: Open the cred explorer, and expand some nodes.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 880b0099e9 Remove unnecessary reversals in sort routine
We were sorting low-to-high, and then reversing. We can just sort
high-to-low.

Test plan: No behavior change (confirm by interacting with the Cred
Explorer).

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 8a4b9592b1 Sort within recursive neighbor enumeration
Test plan: Open the cred explorer, and use the + sign to expand the
neighbors. Observe that those neighbors are now sorted by cred.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 121b83717f Cred Explorer: Recursively show neighbor nodes
Test plan: Open the cred explorer, and try clicking the + signs. They
will expand a recursive table showing the neighbor nodes and their cred.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 4bea0133db Set expanded state, update via +/− buttons
Test plan: Try clicking on the buttons and see that they toggle between
plus and minus. They don't do anything else.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 6714d4b95e Reorder cred explorer columns
Now the node description is first.

Test plan: Observe that the behavior is unchanged, except for the
columnar order.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 0dae0c995f Factor out the non-recursive RecursiveTable
Test plan: Behavior is unchanged; manually verify.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 2f0f523065 s/typeFilter/topLevelFilter/g
Test plan: No change, it's just a variable rename.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 7fc31f6a26
Add `PagerankTable` for exploring PageRank results (#264)
`PagerankTable` is forked from `ContributionList`.

Test plan: I took it for a spin and it seemed OK. I'm not inclined to
write formal tests because we are in rapid prototyping mode stil.
2018-05-10 14:49:00 -07:00
William Chargin bb2ec756a4
Save explorer’s repo settings in localStorage (#263)
Test Plan:
Load the cred explorer for the first time to see two empty boxes.
Refresh to see the same thing. Then, add some content to each box.
Refresh and see the same content.

wchargin-branch: cred-explorer-localstorage
2018-05-10 14:42:54 -07:00
Dandelion Mané ed1f17f8ca
Add `nodeDescription` for GitHub nodes (#261)
`nodeDescription` gives a short, readable description of the content at
a given node.

Test plan: View the included snapshot test.
2018-05-10 14:08:06 -07:00
William Chargin d21ad1312b
Add `git/render.js` with `nodeDescription` (#262)
Test Plan:
Unit tests added, with full coverage of reachable cases.

wchargin-branch: git-render
2018-05-10 14:02:19 -07:00
Dandelion Mané 2a88bbc091
Reorganize GitHub porcelain tests (#260)
This re-organizes the GitHub porcelain tests to be:
- organized by each method signature, rather than having blocks that
test many different methods on each wrapper
- make extensive use of snapshot tests for convenience

Test plan: Inspect the new unit tests, and the corresponding snapshots.
It should be relatively easy to do this because you can copy+paste the
urls to verify the properties.
2018-05-10 12:53:32 -07:00
William Chargin 47bec6cc10
Make `ensure-flow.sh` more precise and accurate (#259)
Summary:
This fixes two problems in the previous version:
  - A new JS file not checked into git, but with a `@flow` directive,
    would cause `ensure-flow` to fail, because one list of files was
    from `git grep` and the other was from `find`.
  - Only the hard-coded directories `src config scripts` were searched.

Now, we search all JS files checked into Git, except for some hard-coded
exceptions, namely `flow-typed`.

Test Plan:
  1. Add `foo.js`, not checked into Git. Note that `ensure-flow` passes.
  2. Add `@flow` to `foo.js`, and note that `ensure-flow` still passes.
  3. Remove `@flow` from `.eslintrc.js`, and note that `ensure-flow`
     fails and nicely prints the filename. (Note: this file is at the
     repository root.)
  4. Create a file `echo stuff >$'naughty\nfilename.js'`, and note that
     `ensure-flow` has the correct behavior in both positive and
     negative cases.

wchargin-branch: ensure-flow-improvements
2018-05-10 12:38:39 -07:00
William Chargin 1901d471f3
Add `@flow` to `.eslintrc.js` (#258)
Summary:
Even though it’s not really a source file, and it lives at the
repository root, it might as well have typing to make sure that we don’t
do anything really dumb.

Test Plan:
`yarn flow` still passes.

wchargin-branch: flow-eslintrc
2018-05-10 12:27:46 -07:00
Dandelion Mané 0d6742be39
Add `parent()` relationships for GitHub porcelain (#255)
Test plan: Unit tests were added.
(Note: I haven't tested the error case, when there are an invalid number
of parents. I think this is OK for now, but I'm disclosing this so
reviewers can easily take issue with it. I'm planning to re-organize the
test code to be by method rather than by wrapper type (so the wrappers
section doesn't keep being a kitchen sink) and will hopefully
put that test in.)
2018-05-10 11:41:19 -07:00
Dandelion Mané 04390e5609
Add CONTAINS edges from Repositories to Issues/PRs (#253)
Also updates the GitHub porcelain.
Existing observable behavior is unchanged, except that performance may
be improved for issueOrPrByNumber.

A bug that would afflict a multi-repository graph (namely, that calling
`repo.issues()` would get all issues for all repositories) is
pre-emptively removed. No test cases were added as we do not yet support
multi-repository graphs.

Test plan: existing unit test coverage is sufficient.
2018-05-10 11:34:53 -07:00
Dandelion Mané 9d24190c03
GH Porcelain: add `Repository.from` (#257)
I think the absence of this method when I added the `Repository` class
was a bug.

Test plan: There are new unit tests.
2018-05-10 11:24:58 -07:00
Dandelion Mané 5c44dd0373
GH Porcelain: Move `authors` top-level Porcelain (#254)
Currently, the `authors` method is attached at the Repository level.
This is incorrect; it actually finds all the authors in the graph, not
all the authors of that repository. This commit moves the method to the
correct class.

Test plan: This function is only used in test code. The tests still
pass.
2018-05-10 11:24:24 -07:00
William Chargin 61d3cb3f52
Implement basic PageRank analysis (#252)
Summary:
We don’t expect the results to be of good quality right now. Rather,
this gives us a starting point from which to iterate the algorithm.

The convergence criterion also needs to be adjusted. (In particular, it
should almost certainly not be a constant.)

Test Plan:
Run `yarn start`. Select a graph, like `sourcecred/example-github`. Open
the JS console and click “Run basic PageRank”. Watch the console.

wchargin-branch: basic-pagerank
2018-05-10 11:21:18 -07:00
William Chargin 8e4668cc91
Fetch generated graphs on the frontend (#251)
Summary:
This commit enables the cred explorer to fetch pre-generated graphs. The
form has poor UX, but gets the job done. (To do later: saving in
localStorage, allowing fetching the list of possible graphs, linking the
submit button so that it’s triggered by `Enter`, etc.).

Test Plan:
Set up with `node ./bin/sourcecred.js graph sourcecred sourcecred`, then
run `yarn start` and check the following cases:
  - success case: `sourcecred`/`sourcecred`
  - error case: `sarcecrod`/`sarcecrod` (nice console error)
  - error case: the repository owner or name is an empty string or has
    invalid characters, like `../../secret.txt` (nice console error)

wchargin-branch: fetch-graphs
2018-05-09 12:47:56 -07:00
Dandelion Mané 6ca4f77b6d
Add dependency on tfjs-core (#250) 2018-05-09 10:22:48 -07:00
William Chargin cb1339a0a7
Start a production server from `sourcecred start` (#247)
Summary:
This commit changes `yarn start` to run a production version of the API
server, which serves static files from a pre-built directory and also
handles API requests.

Test Plan:
```shell
$ yarn build
$ yarn backend
$ node ./bin/sourcecred.js start -d /tmp/srccrd &
$ mkdir -p /tmp/srccrd
$ echo hello >/tmp/srccrd/world
$ curl -s localhost:4000/ | file -
/dev/stdin: HTML document, ASCII text, with very long lines, with no line terminators
$ curl -s localhost:4000/api/v1/data/world
hello
```

wchargin-branch: cli-start-prod-server
2018-05-08 22:00:36 -07:00
Dandelion Mané ac8d0ff66c
Setup credExplorer app scaffold (#249)
Test plan: Run `yarn start`, and observe that the Cred Explorer is now
included in the nav bar, and can be selected, in which case a short
message is displayed.
2018-05-08 17:05:56 -07:00
Dandelion Mané 0c59435a2b
Move testUtil.js into src/app (#248)
testUtil contains some useful configuration endpoints for our frontend
testing. This commit moves it out of the artifact editor and up to
`src/app`.

Test plan: `yarn travis` passes.
2018-05-08 17:03:01 -07:00