Commit Graph

348 Commits

William Chargin 9d7f9f78cd
Make `findStationaryDistribution` configurable (#276)
Summary:
There are substantive options for `convergenceThreshold` and
`maxIterations`, as well as the output option `verbose`. This change is
made in preparation for extracting this function into `markovChain`,
where we will add unit tests for it.
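
As a minimal sketch only, assuming the `uniformDistribution` and `sparseMarkovChainAction` helpers extracted in the adjacent commits: the option names come from this commit, but the import path, signature, defaults, and convergence test below are assumptions, not the actual code.

```js
import {uniformDistribution, sparseMarkovChainAction} from "./markovChain";

// Sketch of a configurable power-iteration loop.
export function findStationaryDistribution(
  chain,
  {convergenceThreshold = 1e-7, maxIterations = 255, verbose = false} = {}
) {
  let pi = uniformDistribution(chain.length);
  for (let i = 0; i < maxIterations; i++) {
    const next = sparseMarkovChainAction(chain, pi);
    let delta = 0;
    for (let j = 0; j < pi.length; j++) {
      delta = Math.max(delta, Math.abs(next[j] - pi[j]));
    }
    pi = next;
    if (verbose) {
      console.log(`[${i}] max delta: ${delta}`);
    }
    if (delta < convergenceThreshold) {
      break;
    }
  }
  return pi;
}
```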

Test Plan:
Behavior of `yarn start` is unchanged.

wchargin-branch: configurable-findstationarydistribution
2018-05-11 21:45:26 -07:00
William Chargin 0a608acbff
Extract `sparseMarkovChainAction` (#275)
Test Plan:
Unit tests added. Run `yarn test`.

wchargin-branch: extract-sparseMarkovChainAction
2018-05-11 21:38:07 -07:00
William Chargin 69b9f6657d
Extract `uniformDistribution` (#274)
Test Plan:
Unit tests added. Run `yarn test`.

wchargin-branch: extract-uniformDistribution
2018-05-11 21:35:02 -07:00
William Chargin 017fbd774a
Use `SparseMarkovChain` in `basicPagerank` (#273)
Summary:
This commit slightly reorganizes the internals of `basicPagerank` to use
the `SparseMarkovChain` type from the `markovChain` module.

Test Plan:
Behavior of `yarn start` is unchanged.

wchargin-branch: use-sparsemarkovchain
2018-05-11 21:28:58 -07:00
William Chargin e5472752ac
Allow converting transition matrix to sparse chain (#272)
Summary:
This function is mostly useful for easily describing Markov chains in
test cases.
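
A hedged example of the intended test usage (the import path and the matrix orientation, rows as source states summing to 1, are assumptions):

```js
import {sparseMarkovChainFromTransitionMatrix} from "./markovChain";

// Sketch: a two-state chain where state 0 stays put with probability 0.75
// and state 1 always transitions to state 0.
const chain = sparseMarkovChainFromTransitionMatrix([
  [0.75, 0.25],
  [1.0, 0.0],
]);
```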

Test Plan:
Unit tests added. Run `yarn test`.

wchargin-branch: sparseMarkovChainFromTransitionMatrix
2018-05-11 21:22:10 -07:00
William Chargin 3bd449d1c3
Create a `markovChain.js` module (#271)
Summary:
For now, this module has just two types: `Distribution` and
`SparseMarkovChain`. We’ll gradually pull code from `basicPagerank` into
this module, adding unit tests along the way.
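
One plausible shape for the two types, as a sketch (the real definitions may differ):

```js
// @flow
// Sketch: a distribution over nodes as a typed array, and a sparse chain
// storing, for each node, its in-neighbors and the corresponding weights.
export type Distribution = Float64Array;

export type SparseMarkovChain = $ReadOnlyArray<{|
  +neighbor: Uint32Array,
  +weight: Float64Array,
|}>;
```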

Test Plan:
None required.

wchargin-branch: markov-chain-module
2018-05-11 21:14:56 -07:00
William Chargin e9e001b894
Switch AddressMap implementation to nested maps (#278)
Summary:
See #66 for more context. This yields the following performance
improvements for me, on the SourceCred graph with 11 072 nodes and
20 250 edges:

  - Loading a graph from disk is improved overall from 1172 ms to 292 ms
    (4.0× improvement).

  - The full pipeline for basic PageRank, from button press to final
    render, is improved from 8.44 s to 4.39 s (1.9× improvement).

  - The PageRank preprocessing step, which involves turning the graph
    into a typed array Markov chain, is improved from 2430 ms to 573 ms
    (4.2× improvement).

  - The PageRank postprocessing step, which involves turning the typed
    array distribution into an `AddressMap` distribution, is improved
    from 83.53 ms to 4.81 ms (17× improvement).

  - The `PagerankTable` React component `render` method (constructing
    the virtual tree only, not diffing or embedding into the DOM) is
    improved from 1708 ms to 332 ms (5× improvement).

The core matrix computations of PageRank are unaffected, because they do
not use the `AddressMap` abstraction.
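
The gist of the change, sketched below: instead of keying one flat object by a JSON-stringified address, values are reached through one level of nesting per address field. The field names used here (`pluginName`, `type`, `id`) are assumptions about the address shape, and this is not the actual `AddressMap` implementation.

```js
class NestedAddressMapSketch {
  constructor() {
    // pluginName -> type -> id -> value
    this._data = new Map();
  }
  add(value) {
    const {pluginName, type, id} = value.address;
    if (!this._data.has(pluginName)) this._data.set(pluginName, new Map());
    const byType = this._data.get(pluginName);
    if (!byType.has(type)) byType.set(type, new Map());
    byType.get(type).set(id, value);
    return this;
  }
  get(address) {
    const byType = this._data.get(address.pluginName);
    const byId = byType && byType.get(address.type);
    return byId && byId.get(address.id);
  }
}
```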

Test Plan:
Existing unit tests suffice.

wchargin-branch: fast-addressmap
2018-05-11 19:46:54 -07:00
William Chargin 3e70edb3be
Use typed arrays for PageRank (#267)
Summary:
This takes `AddressMap` access, and therefore JSON stringification, off
the critical path, resulting in a significant performance increase. The
resulting code is much faster than the original TFJS implementation. On
my laptop, we can run about 300 iterations of PageRank per second on a
graph with 10 000 nodes and 18 000 edges (namely, the SourceCred graph).

Paired with @decentralion.
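
A sketch of the kind of inner loop this enables, touching only typed arrays; the chain layout matches the `SparseMarkovChain` sketch above and is likewise an assumption.

```js
// One step of the chain: result[dst] = sum over in-neighbors src of
// weight(src -> dst) * pi[src].
function sparseMarkovChainActionSketch(chain, pi) {
  const result = new Float64Array(pi.length);
  for (let dst = 0; dst < chain.length; dst++) {
    const {neighbor, weight} = chain[dst];
    let sum = 0;
    for (let k = 0; k < neighbor.length; k++) {
      sum += weight[k] * pi[neighbor[k]];
    }
    result[dst] = sum;
  }
  return result;
}
```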

Test Plan:
Run `yarn start` and note that the cred attribution for SourceCred is
roughly the same as before… but is created faster.

wchargin-branch: pagerank-typed-arrays
2018-05-11 13:22:36 -07:00
William Chargin 7e97ba6bf3
Rewrite basic PageRank without TFJS (#266)
Summary:
We’re not convinced that using TFJS at this time is worth it, for two
reasons. First, our matrix computations can be expressed using sparse
matrices, which will improve the performance by orders of magnitude.
Sparse matrices do not appear to be supported by TFJS (the layers API
makes some use of them, but it is not clear that they have much support
there, either). Second, having to deal with GPU memory and WebGL has
already been problematic. When WebGL PageRank is running, the machine is
mostly unusable, and other applications’ video output is negatively
affected (!).

This commit rewrites the internals of `basicPagerank.js` while retaining
its end-to-end public interface. We also add a test file with a trivial
test. The resulting code is not faster yet—in fact, it’s a fair amount
slower. But this is because our use of `AddressMap`s puts JSON
stringification on the critical path, which is obviously a bad idea. In
a subsequent commit, we will rewrite the internals again to use typed
arrays.

Paired with @decentralion.

Test Plan:
The new unit test is not sufficient. Instead, run `yarn start` and
re-run PageRank on SourceCred; note that the results are roughly
unchanged.

wchargin-branch: pagerank-without-tfjs
2018-05-11 13:11:14 -07:00
Dandelion Mané 2a52ff85f8 Cred Explorer: Modify color based on table depth
Test plan: Open the cred explorer, and expand some nodes.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 880b0099e9 Remove unnecessary reversals in sort routine
We were sorting low-to-high, and then reversing. We can just sort
high-to-low.
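
In code, the change amounts to flipping the comparator instead of reversing afterward; a hypothetical example:

```js
// `rows` and the `cred` field are hypothetical.
const rows = [{cred: 1}, {cred: 3}, {cred: 2}];
// Before: rows.sort((a, b) => a.cred - b.cred).reverse();
// After: sort high-to-low directly.
rows.sort((a, b) => b.cred - a.cred);
```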

Test plan: No behavior change (confirm by interacting with the Cred
Explorer).

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 8a4b9592b1 Sort within recursive neighbor enumeration
Test plan: Open the cred explorer, and use the + sign to expand the
neighbors. Observe that those neighbors are now sorted by cred.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 121b83717f Cred Explorer: Recursively show neighbor nodes
Test plan: Open the cred explorer, and try clicking the + signs. They
will expand a recursive table showing the neighbor nodes and their cred.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 4bea0133db Set expanded state, update via +/− buttons
Test plan: Try clicking on the buttons and see that they toggle between
plus and minus. They don't do anything else.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 6714d4b95e Reorder cred explorer columns
Now the node description is first.

Test plan: Observe that the behavior is unchanged, except for the
columnar order.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 0dae0c995f Factor out the non-recursive RecursiveTable
Test plan: Behavior is unchanged; manually verify.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 2f0f523065 s/typeFilter/topLevelFilter/g
Test plan: No change, it's just a variable rename.

Paired with @wchargin
2018-05-10 17:05:01 -07:00
Dandelion Mané 7fc31f6a26
Add `PagerankTable` for exploring PageRank results (#264)
`PagerankTable` is forked from `ContributionList`.

Test plan: I took it for a spin and it seemed OK. I'm not inclined to
write formal tests because we are in rapid prototyping mode still.
2018-05-10 14:49:00 -07:00
William Chargin bb2ec756a4
Save explorer’s repo settings in localStorage (#263)
Test Plan:
Load the cred explorer for the first time to see two empty boxes.
Refresh to see the same thing. Then, add some content to each box.
Refresh and see the same content.

wchargin-branch: cred-explorer-localstorage
2018-05-10 14:42:54 -07:00
Dandelion Mané ed1f17f8ca
Add `nodeDescription` for GitHub nodes (#261)
`nodeDescription` gives a short, readable description of the content at
a given node.

Test plan: View the included snapshot test.
2018-05-10 14:08:06 -07:00
William Chargin d21ad1312b
Add `git/render.js` with `nodeDescription` (#262)
Test Plan:
Unit tests added, with full coverage of reachable cases.

wchargin-branch: git-render
2018-05-10 14:02:19 -07:00
Dandelion Mané 2a88bbc091
Reorganize GitHub porcelain tests (#260)
This re-organizes the GitHub porcelain tests to:
- organize them by method signature, rather than having blocks that
test many different methods on each wrapper
- make extensive use of snapshot tests for convenience

Test plan: Inspect the new unit tests, and the corresponding snapshots.
It should be relatively easy to do this because you can copy+paste the
urls to verify the properties.
2018-05-10 12:53:32 -07:00
Dandelion Mané 0d6742be39
Add `parent()` relationships for GitHub porcelain (#255)
Test plan: Unit tests were added.
(Note: I haven't tested the error case, when there are an invalid number
of parents. I think this is OK for now, but I'm disclosing this so
reviewers can easily take issue with it. I'm planning to re-organize the
test code to be by method rather than by wrapper type (so the wrappers
section doesn't keep being a kitchen sink) and will hopefully
put that test in.)
2018-05-10 11:41:19 -07:00
Dandelion Mané 04390e5609
Add CONTAINS edges from Repositories to Issues/PRs (#253)
Also updates the GitHub porcelain.
Existing observable behavior is unchanged, except that performance may
be improved for issueOrPrByNumber.

A bug that would afflict a multi-repository graph (namely, that calling
`repo.issues()` would get all issues for all repositories) is
pre-emptively removed. No test cases were added as we do not yet support
multi-repository graphs.

Test plan: existing unit test coverage is sufficient.
2018-05-10 11:34:53 -07:00
Dandelion Mané 9d24190c03
GH Porcelain: add `Repository.from` (#257)
I think the absence of this method when I added the `Repository` class
was a bug.

Test plan: There are new unit tests.
2018-05-10 11:24:58 -07:00
Dandelion Mané 5c44dd0373
GH Porcelain: Move `authors` top-level Porcelain (#254)
Currently, the `authors` method is attached at the Repository level.
This is incorrect; it actually finds all the authors in the graph, not
all the authors of that repository. This commit moves the method to the
correct class.

Test plan: This function is only used in test code. The tests still
pass.
2018-05-10 11:24:24 -07:00
William Chargin 61d3cb3f52
Implement basic PageRank analysis (#252)
Summary:
We don’t expect the results to be of good quality right now. Rather,
this gives us a starting point from which to iterate the algorithm.

The convergence criterion also needs to be adjusted. (In particular, it
should almost certainly not be a constant.)

Test Plan:
Run `yarn start`. Select a graph, like `sourcecred/example-github`. Open
the JS console and click “Run basic PageRank”. Watch the console.

wchargin-branch: basic-pagerank
2018-05-10 11:21:18 -07:00
William Chargin 8e4668cc91
Fetch generated graphs on the frontend (#251)
Summary:
This commit enables the cred explorer to fetch pre-generated graphs. The
form has poor UX, but gets the job done. (To do later: saving in
localStorage, allowing fetching the list of possible graphs, linking the
submit button so that it’s triggered by `Enter`, etc.).

Test Plan:
Set up with `node ./bin/sourcecred.js graph sourcecred sourcecred`, then
run `yarn start` and check the following cases:
  - success case: `sourcecred`/`sourcecred`
  - error case: `sarcecrod`/`sarcecrod` (nice console error)
  - error case: the repository owner or name is an empty string or has
    invalid characters, like `../../secret.txt` (nice console error)

wchargin-branch: fetch-graphs
2018-05-09 12:47:56 -07:00
William Chargin cb1339a0a7
Start a production server from `sourcecred start` (#247)
Summary:
This commit changes `yarn start` to run a production version of the API
server, which serves static files from a pre-built directory and also
handles API requests.
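
A sketch of the server shape, assuming an Express-style stack; the `/api/v1/data` prefix comes from the test plan below, while the function and parameter names are illustrative:

```js
const express = require("express");

function productionServerSketch({buildDirectory, sourcecredDirectory}) {
  const app = express();
  // Serve the pre-built frontend assets.
  app.use(express.static(buildDirectory));
  // Serve the SourceCred data directory under the API prefix.
  app.use("/api/v1/data", express.static(sourcecredDirectory));
  return app;
}

// e.g.: productionServerSketch({...}).listen(4000);
```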

Test Plan:
```shell
$ yarn build
$ yarn backend
$ node ./bin/sourcecred.js start -d /tmp/srccrd &
$ mkdir -p /tmp/srccrd
$ echo hello >/tmp/srccrd/world
$ curl -s localhost:4000/ | file -
/dev/stdin: HTML document, ASCII text, with very long lines, with no line terminators
$ curl -s localhost:4000/api/v1/data/world
hello
```

wchargin-branch: cli-start-prod-server
2018-05-08 22:00:36 -07:00
Dandelion Mané ac8d0ff66c
Setup credExplorer app scaffold (#249)
Test plan: Run `yarn start`, and observe that the Cred Explorer is now
included in the nav bar, and can be selected, in which case a short
message is displayed.
2018-05-08 17:05:56 -07:00
Dandelion Mané 0c59435a2b
Move testUtil.js into src/app (#248)
testUtil contains some useful configuration endpoints for our frontend
testing. This commit moves it out of the artifact editor and up to
`src/app`.

Test plan: `yarn travis` passes.
2018-05-08 17:03:01 -07:00
Dandelion Mané d9b4673dbd
Factor out `github.porcelain.asEntity` (#246)
@wchargin suggested that the entity-wrapping logic in porcelain
reference handling should be factored out as its own method. This was a
great suggestion; it will be very useful for plugin consumers, and it
also results in better test coverage.

Test plan: The new unit tests are nice. For your own safety, do not
question or quibble with the magnificent foo plugin.
2018-05-08 16:33:19 -07:00
William Chargin 9ea1f981aa
Proxy Webpack dev server through to an API server (#245)
Summary:
This way, our frontend can talk to a backend that can read from the
filesystem (among other things).

Paired with @decentralion.

Test Plan:
```
$ yarn backend
$ SOURCECRED_DIRECTORY=/tmp/srccrd yarn start
$ # verify that the browser looks good
$ mkdir /tmp/srccrd
$ echo hello >/tmp/srccrd/world
$ curl localhost:3000/api/v1/data/world
hello
$ curl localhost:4000/api/v1/data/world
hello
```

wchargin-branch: webpack-proxy
2018-05-08 16:09:37 -07:00
Dandelion Mané 1647c1abac
Fix flow errors in fetchAndPrintGithubRepo.js (#240)
Fixing the flow error corresponded to (correctly) documenting that the
GitHub token is mandatory, not optional.

Test plan: `yarn travis --full` still passes.
2018-05-08 14:24:11 -07:00
Dandelion Mané d34503799c
Enable flow: sourcecred.js and editor/App.test.js (#239)
They were already correct from a typing perspective, so no other changes
needed.
2018-05-08 14:21:26 -07:00
William Chargin 7d9a98128d
Extract a generic `LocalStore` module (#235)
Summary:
This way, different plugins can have `LocalStore`s with different cache
keys.

Test Plan:
Note that the app (`yarn start`) preserves the local store values from
before this change, and that updates continue to work.

wchargin-branch: generic-localstore
2018-05-08 13:40:48 -07:00
Dandelion Mané 372f8f9bd6 Setup routing within App.js
This commit modifies App.js to use routing, such that it's possible to
navigate between a home screen, and the Artifact Editor.

Test plan: Run `yarn start`, and navigate between the home screen and
artifact editor. Verify that the artifact editor still works as
expected.
2018-05-08 12:55:38 -07:00
Dandelion Mané e1808d1126 Add src/app/App.js
This commit adds src/app/App.js, which proxies in the frontend from
src/plugins/artifact/editor/App.js. The observed behavior (run `yarn
start`; see Artifact Editor) is unchanged.

Test plan: Observe that `yarn start` has the same behavior, and travis
passes.
2018-05-08 12:55:38 -07:00
Dandelion Mané 63351e6149 Move app scaffolding to src/app
This commit executes a micro-refactor to move all top-level app setup
code out of src/plugins/artifact/editor and into src/app. The observed
behavior from `yarn start`, which is to show the artifact editor, is
unchanged.
2018-05-08 12:55:38 -07:00
Dandelion Mané c2fb88b11a Turn on flow for index.js
Test plan: `yarn travis` passes
2018-05-08 12:55:38 -07:00
William Chargin 57682065fd
Add `sourcecred start` (#234)
Summary:
We need a way for our web applications to interact with data on the
filesystem. In this commit, we introduce a webserver that serves
statically from two directory trees: first, the result of a live-updated
Webpack build; second, the SourceCred data directory.

Test Plan:
Run `yarn backend` and `node ./bin/sourcecred.js start`. When ready,
navigate to the server’s root route in a web browser. Note that a nice
React app is displayed. Then, change something in that React app source.
Note that the server console displays Webpack’s update messages, and
that refreshing the page in the browser renders the new version of the
app. Finally, visit

    /__data__/graphs/sourcecred/example-github/graph.json

in the browser to see the graph for the example repository, assuming
that you had generated its graph previously.

wchargin-branch: start
2018-05-07 20:10:49 -07:00
Dandelion Mané 93e2798f37
Ensure that flow is used in all js files (#232)
This script ensures that either //@flow or //@no-flow is present in
every js file. Every existing js file that would fail this check has
been given //@no-flow; we should work to remove all of these in the
future.
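
A sketch of the kind of check involved (the real script may discover files and report failures differently):

```js
const {execFileSync} = require("child_process");
const fs = require("fs");

// Sketch: every tracked .js file must carry a flow pragma.
const files = execFileSync("git", ["ls-files", "*.js"])
  .toString()
  .split("\n")
  .filter((f) => f.length > 0);

const offenders = files.filter(
  (f) => !/@(no-)?flow/.test(fs.readFileSync(f, "utf8"))
);

if (offenders.length > 0) {
  console.error("Files missing //@flow or //@no-flow:");
  offenders.forEach((f) => console.error("  " + f));
  process.exitCode = 1;
}
```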

Test plan:
I verified that `yarn travis` fails before fixing the other js files,
and passes afterwards.
2018-05-07 20:02:19 -07:00
Dandelion Mané ed1adc7b37
Rename `src/plugins/github/{api,porcelain}` (#231)
I also added a module-level docstring for the porcelain.
2018-05-07 19:18:01 -07:00
Dandelion Mané 9b3019434d
Create `github.Porcelain`: whole-graph porcelain (#230)
Now that we have repository nodes (#171), it makes sense that the Github
porcelain should provide a way to wrap the entire graph, and provide
easy access for the various repositories. This adds a `Porcelain` class
to fulfill that need.

The `Porcelain` is very straightforward: it takes in the whole graph,
and gives a way to get all the Repositories, or to request a particular
Repository by owner/name. In the odd case wherein a graph contains
multiple repository nodes with the same owner and name, an error is
thrown. Per standard JS map semantics (bleh), it can return undefined if
there is no matching repository.

Test plan:
See that the unit tests now use the standard behavior, and a test
verifies behavior for non-existent repositories. I don't have a test
case where there are multiple repo nodes, but that itself would be an
error, so throwing an error in that case is just defensive programming.
2018-05-07 17:56:33 -07:00
Dandelion Mané f219636a56
Create "REPOSITORY" nodes in GitHub plugin graph (#229)
This commit creates a new node type in the GitHub graph: the REPOSITORY
node. The REPOSITORY node has the following payload properties:
- url (string)
- name (string)
- owner (string)

Things this commit does:
- Add new node type and payload type (RepositoryNodePayload)
- Update parser to instantiate the new node type
- Update api.js to have Repository wrap the new node type (thus
Repository is a GitHub entity)
- Update snapshots
- Update users of GitHub node types to ensure they are exhaustive

Things that will come in a follow-on commit:
- Add CONTAINS edges from the repository to all its PRs and Issues
- Update the Repository porcelain to use those edges, rather than
scanning the graph for every possible Issue/PR (eventually those might
belong to other Repositories)
- Create a GitHubGraph abstraction in the porcelain, which makes it easy
to find all of the Repositories in a graph

Note that retrieving the repository owner technically involved fetching
the whole owner representation (as a GitHub user). I could have chosen
to add that user to the graph, with a "OWNS" edge pointing to the
repository. For simplicity's sake, I've declined to do that, and instead
just parse the owner's name directly.

Test plan:
Added tests to verify that the Repository porcelain entity has the right
properties. Combined with the snapshot tests, that should be sufficient.
2018-05-07 17:28:47 -07:00
Dandelion Mané 9d4ae8b901
Deleted users no longer break GitHub parser (#228)
When a GitHub user deletes their account, all of their comments remain,
but with a `null` author. Previously, our code did not account for this
possibility. Now it does (by simply not adding an AUTHORSHIP edge).
Conveniently, our porcelain API already represents authors as a list, so
this doesn't require any change in porcelain API usage.
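
A hedged sketch of the guard this implies in the parser; the method and field names are assumptions, not the actual parser code.

```js
function addAuthorshipSketch(parser, post) {
  if (post.author == null) {
    // Deleted account: keep the post in the graph, but add no edge.
    return;
  }
  parser.addAuthorshipEdge(post, post.author); // hypothetical method
}
```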

Test plan:
I did not add any unit tests, simply because
creating-and-deleting a GitHub user to make a repro seemed like a bit of
a pain. However, it is very unlikely that this bug will re-occur,
because the nullability of AuthorJSON is now enforced at the type level
inside graphql.js.

Also, running `node bin/sourcecred.js src-d go-git` now succeeds instead
of failing.
2018-05-07 17:06:26 -07:00
William Chargin 0cae9d742d
Extract a common SOURCECRED_DIRECTORY flag (#227)
Summary:
This solves two problems:

 1. The “output directory” argument to `sourcecred graph` is also the
    input directory to other commands, like `sourcecred analyze`
    (hypothetically). In such cases, it would be nice for the flag to
    have the same name, but clearly `--output-directory` does not always
    make sense.

 2. In addition to storing graphs, we’ll need to store other kinds of
    data: settings, intermediate data for plugins to cache results, etc.
    We should store these under a single umbrella.

With both of these problems in mind, it makes sense to create a
`SOURCECRED_DIRECTORY` flag under which we store all relevant data.
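
A sketch of the resolution order this implies (explicit flag, then the environment variable, then a default); the default path matches the test plan below, but the helper itself is illustrative.

```js
const os = require("os");
const path = require("path");

function sourcecredDirectorySketch(flagValue) {
  if (flagValue != null) {
    return flagValue; // explicit -d flag wins
  }
  if (process.env.SOURCECRED_DIRECTORY != null) {
    return process.env.SOURCECRED_DIRECTORY;
  }
  return path.join(os.tmpdir(), "sourcecred"); // assumed default
}
```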

Test Plan:
Run:
```shell
$ yarn backend
$ node bin/sourcecred.js help graph
$ node bin/sourcecred.js graph sourcecred example-github
$ node bin/sourcecred.js graph sourcecred example-github -d /tmp/sorcecrod
$ SOURCECRED_DIRECTORY=/tmp/srccrd node bin/sourcecred.js graph sourcecred example-github
$ for dir in /tmp/{sourcecred,sorcecrod,srccrd}; do find "${dir}"; done
```

wchargin-branch: graph-directory
2018-05-07 17:05:58 -07:00
William Chargin 498480db06
Factor out `defaultStorageDirectory` function (#226)
Test Plan:
Run `yarn backend && node bin/sourcecred.js help graph`.

wchargin-branch: defaultstoragedirectory
2018-05-07 16:23:53 -07:00
Dandelion Mané 61635a14a7
Remove redundant scripts (#225)
Our SourceCred CLI tool now implements printCombinedGraph and
cloneAndPrintGitGraph, but with more principled implementations and
interfaces :)

Test plan:
`yarn travis --full` passes, so I didn't delete any needed test infra.
2018-05-07 16:15:00 -07:00
Dandelion Mané 55856d7a46
CLI commands error on unhandled promise rejection (#224)
Previously, if a CLI command had an unhandled promise rejection, this
would result in a spurious success and zero exit value.

This commit causes all of our CLI commands to instead fail if they have
an unhandled promise rejection.
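
The standard Node hook for this, as a sketch of what each command likely installs:

```js
// Turn any unhandled promise rejection into a failing exit status
// instead of a spurious success.
process.on("unhandledRejection", (err) => {
  console.error(err);
  process.exitCode = 1;
});
```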

Test plan: Previously, `sourcecred graph src-d go-git` would claim to
succeed, although it actually fails due to an unrelated bug. After this
change is applied, it correctly fails to retrieve the GitHub graph (and
the combine step is never run).
2018-05-07 16:04:54 -07:00
Dandelion Mané 82bf739f35
Query GitHub for repository information (#222)
This commit pulls new information from GitHub about the url, name, and
owner of a GitHub repository. Part of #171.

Test plan: example-github-repo.json has been updated. `yarn travis
--full` passes. This should be sufficient.
2018-05-07 15:48:30 -07:00
Dandelion Mané f3bfed3deb
Split `GithubResponseJSON` and `RepositoryJSON` (#219)
Currently, we generate a `RepositoryJSON` object via querying GitHub.
That `RepositoryJSON` object has a `repository` field... which is weird,
and suggests we got the names slightly wrong.

This commit renames the top-level response to `GithubResponseJSON`, and
factors the `repository` field out as `RepositoryJSON`. Correspondingly,
the `addData` and `addRepository` methods on the parser are now
distinct.

This is a precursor for #171.

Test plan: This is a simple refactor; the fact that yarn travis passes
should be sufficient.
2018-05-07 14:49:59 -07:00
William Chargin 1e0d846675
Change plugin graph label from "PASS" to "DONE" (#221)
Test Plan:
Run `node bin/sourcecred.js graph sourcecred example-github` and note
the new output:
```
Storing graphs into: /tmp/sourcecred/sourcecred/example-github

Starting tasks
  GO   create-git
  GO   create-github
 DONE  create-git
 DONE  create-github
  GO   combine
 DONE  combine

Full results
 DONE  create-git
 DONE  create-github
 DONE  combine

Overview
Final result:  SUCCESS
```

wchargin-branch: plugin-graph-label
2018-05-07 14:47:43 -07:00
William Chargin 4e8d5b574a
Make `execDependencyGraph` labels configurable (#220)
Summary:
The `"PASS"` label only makes sense for tests. This commit makes the
labels configurable, so that the verbiage can make more sense in other
contexts, too.

Test Plan:
Apply a patch like
```diff
diff --git a/config/travis.js b/config/travis.js
index af0996b..b0ab3b6 100644
--- a/config/travis.js
+++ b/config/travis.js
@@ -10,7 +10,11 @@ function main() {
     process.argv.includes("--full")
       ? "FULL"
       : "BASIC";
-  execDependencyGraph(makeTasks(mode)).then(({success}) => {
+  execDependencyGraph(makeTasks(mode), {
+    taskLaunchLabel: " YO ",
+    taskPassLabel: "WHEE",
+    taskFailLabel: "UHOH",
+  }).then(({success}) => {
     process.exitCode = success ? 0 : 1;
   });
 }
```
and note that `GITHUB_TOKEN=none yarn travis --full` exhibits all
desired messages.

wchargin-branch: configurable-labels
2018-05-07 14:39:59 -07:00
William Chargin 2aeeca9a13
Implement a command-line interface (#217)
Summary:
This commit implements the `sourcecred` command-line utility, which has
three subcommands:
  - `plugin-graph` creates one plugin’s graph;
  - `combine` combines multiple on-disk graphs; and
  - `graph` creates all plugins’ graphs and combines them.

As an implementation detail, the `into.sh` script is very convenient,
avoiding needing to do any pipe management in Node (which is Not Fun).
When we build for release, we may want to factor that differently.

Test Plan:
To see it all in action, run `yarn backend`, and then try:
```
$ export SOURCECRED_GITHUB_TOKEN="your_token_here"
$ node ./bin/sourcecred.js graph sourcecred sourcecred
Using output directory: /tmp/sourcecred/sourcecred

Starting tasks
  GO   create-git
  GO   create-github
 PASS  create-github
 PASS  create-git
  GO   combine
 PASS  combine

Full results
 PASS  create-git
 PASS  create-github
 PASS  combine

Overview
Final result:  SUCCESS

$ ls /tmp/sourcecred/sourcecred/
graph-github.json  graph-git.json  graph.json

$ jq '.nodes | length' /tmp/sourcecred/sourcecred/*.json
1000
7302
8302
```
The `node sourcecred.js graph` command takes 9.8s for me.

(The salient point of the last command is that the two small graphs have
node count adding up to the node count of the big graph. Incidentally,
we are [almost][1] at a nice round number of nodes in the GitHub graph.)

[1]: https://xkcd.com/1000/

wchargin-branch: cli
2018-05-07 12:23:09 -07:00
William Chargin d7bfa02a54
Change `execDependencyGraph` export format (#216)
Summary:
To be honest, I have no idea what exactly this does or why it’s
necessary, but if we don’t do this then it is not possible to `import`
the exported member from a Webpack-bundled script. I’ve seen this
pattern before; one day I’ll actually figure out what it does. :-)

Test Plan:
Note that `yarn travis` (success) and `yarn travis --full` (failure; no
GitHub token) both have the expected behaviors.

wchargin-branch: execdependencygraph-export
2018-05-07 10:30:56 -07:00
Dandelion Mané fa4082c95b
Minimal toy oclif integration (#214)
This commit adds [oclif] as a command-line framework. It is successfully
integrated with webpack.

[oclif]: https://github.com/oclif/oclif

Usage:
`yarn backend` to build the cli.
`node bin/sourcecred.js` to launch the CLI and see usage
`node bin/sourcecred.js example` for one example command
`node bin/sourcecred.js goodbye` for another example command
2018-05-04 19:28:37 -07:00
William Chargin d3443a3d4c
Extract `execDependencyGraph` core from CI script (#208)
Summary:
We’d like to use the same abstraction for creating multiple cred graphs
and then combining them together. This will enable us to do that.

Test Plan:
Run `yarn travis` to test the success case, and `yarn travis --full`
(without setting a `GITHUB_TOKEN`) to test the failure case.

wchargin-branch: execdepgraph
2018-05-04 15:47:26 -07:00
William Chargin a642ed46b9
Expose `Graph.mergeManyConservative` (#209)
Summary:
This offers #205 to general users.

Test Plan:
Existing tests suffice.

wchargin-branch: merge-many-conservative
2018-05-04 15:42:39 -07:00
Dandelion Mané e3469f157d
Add `src/tools/bin/printCombinedGraph.js` (#207)
`printCombinedGraph` loads and prints a cross-plugin combined
contribution graph for a given GitHub repository.

It is a simple executable wrapper around `src/tools/loadCombinedGraph`.

Example usage:
`node bin/printCombinedGraph.js sourcecred example-git $GITHUB_TOKEN`
2018-05-04 12:10:20 -07:00
Dandelion Mané e66ed45cba
Add CLI for printing a fresh Git graph (#206)
`cloneAndPrintGitGraph` clones a git repository, and generates a Git
object graph for that repository.

This can be run as follows:
```
yarn backend;
node bin/cloneAndPrintGitGraph sourcecred example-git
```

This commit also adds two utility modules:
* `cloneAndLoadRepository`, which clones a Git repository to a tmpdir,
parses the `Repository` data out, and then cleans up.
* `cloneGitGraph`, which calls `cloneAndLoadRepository` and `createGraph`

Test plan: These don't fit well into our CI, because they require
network access to clone repositories from GitHub. I verified that the
functions work via the demo script above.
2018-05-04 11:35:14 -07:00
William Chargin d3dcf1ef5a
Speed up Git graph creation (#205)
Summary:
Because of `mergeConservative`’s naive implementation, using it as a
reducer results in a roughly quadratic algorithm. Replacing this with a
mutative accumulation has the same semantics but goes much faster.
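
The shape of the fix, as a sketch: rather than folding `mergeConservative` over the list (which copies the accumulated graph on every step), accumulate into one mutable graph. The import path and method names below are assumptions about the Graph API, and conflict handling is elided; another commit in this log (#209) exposes this as `Graph.mergeManyConservative`.

```js
import {Graph} from "../../core/graph"; // path assumed

// Before (sketch): roughly quadratic.
//   const graph = subgraphs.reduce(
//     (acc, g) => Graph.mergeConservative(acc, g),
//     new Graph()
//   );

// After (sketch): a single mutable accumulator.
function mergeAllSketch(subgraphs) {
  const result = new Graph();
  for (const g of subgraphs) {
    g.nodes().forEach((node) => result.addNode(node));
    g.edges().forEach((edge) => result.addEdge(edge));
  }
  return result;
}
```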

Test Plan:
For correctness: tests pass. For performance: apply the following patch
to collect timing data. Then run:
```shell
$ NODE_ENV=production yarn backend
$ node bin/loadAndPrintGitRepository.js . >/tmp/sourcecred-git
$ node bin/createGitGraph.js /tmp/sourcecred-git
```
to run against the current state of SourceCred. Before this patch, the
elapsed time is about 6m00s; after this patch, it is about 0m1.3s.
Specifically:
```
$ cat timing_before
[0] Git graph creation: 239593.958ms
[1] Git graph creation: 240380.557ms
[2] Git graph creation: 241687.042ms

$ cat timing_after
[0] Git graph creation: 1585.903ms
[1] Git graph creation: 1315.430ms
[2] Git graph creation: 1373.833ms
```

<details>
<summary>Patch to collect timing data</summary>

```diff
diff --git a/config/paths.js b/config/paths.js
index f875eee..1bc1469 100644
--- a/config/paths.js
+++ b/config/paths.js
@@ -64,5 +64,6 @@ module.exports = {
     loadAndPrintGitRepository: resolveApp(
       "src/plugins/git/bin/loadAndPrintRepository.js"
     ),
+    createGitGraph: resolveApp("src/plugins/git/bin/createGraph.js"),
   },
 };
diff --git a/src/plugins/git/bin/createGraph.js b/src/plugins/git/bin/createGraph.js
new file mode 100644
index 0000000..a35ca1b
--- /dev/null
+++ b/src/plugins/git/bin/createGraph.js
@@ -0,0 +1,25 @@
+// @flow
+
+import {readFileSync} from "fs";
+
+import {createGraph} from "../createGraph";
+
+function main() {
+  const filename = process.argv[2];
+  const raw = JSON.parse(readFileSync(filename).toString());
+  const results = [];
+  for (let i = 0; i < 3; i++) {
+    const timer = `[${i}] Git graph creation`;
+    console.time(timer);
+    results.push(createGraph(raw));
+    console.timeEnd(timer);
+  }
+  console.log(
+    "Checksum: " +
+      JSON.stringify(
+        results.map((graph) => graph.nodes().length ^ graph.edges().length)
+      )
+  );
+}
+
+main();
```

</details>

wchargin-branch: collect-gold-rings
2018-05-04 10:52:41 -07:00
Dandelion Mané 0bf4f73f86
Add fetchGithubGraph (#204)
fetchGithubGraph is a tiny module which is responsible for fetching
GitHub contribution data, and parsing it into a graph.

Test plan:
The function is trivial and does not merit explicit testing.
2018-05-04 10:21:21 -07:00
William Chargin 315f66cc4c
Add BECOMES edges in the Git graph (#203)
Summary:
If a commit causes a tree entry to change hash while keeping the same
name, we now add a BECOMES edge between the corresponding entries.

Test Plan:
Snapshot changes are readable enough to manually verify. Programmatic
tests also added.

wchargin-branch: graph-becomes-edges
2018-05-03 14:16:18 -07:00
William Chargin e9ecb8c608
Find BECOMES edges for a high-level repository (#202)
Test Plan:
For the snapshot: verify that two of the BECOMES edges are the same as
those tested in `findBecomesEdgesForCommits` and that they have the
right commit hashes; then, verify that the remaining edge is correct.

wchargin-branch: high-level-becomes-repository
2018-05-03 14:09:13 -07:00
William Chargin c572b7f880
Find BECOMES edges for high-level commits (#201)
Test Plan:
Unit tests included. I verified that the hashes in the snapshot are
correct.

wchargin-branch: high-level-becomes-commits
2018-05-03 13:46:03 -07:00
Dandelion Mané a76d01ab75
Connect PRs and commits via MERGED_AS edges (#200)
This adds MERGED_AS edges which link from a PullRequest to a Commit. It
adds a corresponding `mergedCommitHash` method on the porcelain PR that
returns the hash of the merged commit (if available).

I would have preferred to return a porcelain wrapper over the commit,
but since we don't have a porcelain Git api, it seemed preferable to
return the hash as a string. Returning a Node would both break
consistency in the porcelain api, and be problematic as the node does
not necessarily exist in the api. To ensure that the hash is available
without parsing Addresses, I used the edge payload. :)
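
A sketch of the resulting edge, with hypothetical field names and an assumed import path (the real edge also carries its own address, elided here):

```js
import {commitAddress} from "../git/address"; // path assumed

function mergedAsEdgeSketch(pullRequestAddress, mergeCommitHash) {
  return {
    type: "MERGED_AS",
    src: pullRequestAddress,
    dst: commitAddress(mergeCommitHash),
    // The hash rides along in the payload so that porcelain code can
    // read it without parsing addresses.
    payload: {hash: mergeCommitHash},
  };
}
```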

Test plan:
Inspect the snapshot changes in the graph (they are fairly readable) and
the api testing in api.test.js.
2018-05-03 13:29:44 -07:00
Dandelion Mané 723efeb05f
Pull merge commit SHAs from GitHub (#198)
This commit adds a few fields to the PullRequest query fragment so that
we now retrieve merge commit shas. In cases where there is no merge
commit (i.e., the PR was not merged), the field is null. Observe that this
is the case for our example unmerged pull request.

Test plan: Inspect the changes to the demo data, and verify that they
are appropriate.
2018-05-03 12:41:20 -07:00
Dandelion Mané 9cbfa35a3a
Expose `commitAddress` from the Git plugin (#199)
For the GitHub plugin to create edges pointing to commits from the Git
plugin, it needs a way to create the appropriate address given the
commit's hash. This commit exposes that functionality by moving
`makeAddress` out of the "createGraph" module and into a new "address"
module, and using it to implement `commitAddress`.

Test plan: The code is so trivial that I don't think it merits testing.
2018-05-03 12:41:01 -07:00
Dandelion Mané 136cfa839c
Update example-github data (#197) 2018-05-03 12:03:19 -07:00
Dandelion Mané ce11a1c4e3
Rename sourcecred/example-{repo,github} (#196)
Our repository containing example GitHub data has been renamed from
"sourcecred/example-repo" to "sourcecred/example-github". This commit
updates all snapshots and tests to reflect this rename.
2018-05-03 11:51:12 -07:00
Dandelion Mané b4474e6bd1
Remove the `repositoryName` field from Addresses (#195)
See [#190] for context.

The change is almost entirely straightforward; the only "interesting"
decision I made was to move the repo owner and repo name into the string
id for the Artifact Plugin addresses, as the id would otherwise not be
unique.

[#190]: https://github.com/sourcecred/sourcecred/issues/190#issuecomment-386362870
2018-05-03 11:12:02 -07:00
William Chargin 082515e16a
Create nodes for submodule commits (#186)
Summary:
Previously, a tree entry had exactly one `HAS_CONTENTS` edge, unless the
tree entry corresponded to a submodule commit, in which case we had no
information at all. Now, submodule commit tree entries point to zero or
more `SUBMODULE_COMMIT` nodes. In the vast majority of cases, there will
be exactly one such node—however, it is possible that a particular tree
entry could correspond to multiple submodules (clone two identical
submodules with different URLs) or none at all (some manually edited
`.gitmodules` or other corruption).

Test Plan:
The snapshot changes are easy enough to read and verify (two new nodes
and five new edges). Additionally, the path-to-submodule `createGraph`
test has been updated.

wchargin-branch: graph-submodule-urls
2018-05-03 10:44:06 -07:00
William Chargin 7dbecfdac6
Load submodule URLs at each commit (#185)
Summary:
In Git, a tree may point to a commit directly. In our graph, we’d like
to represent “submodule commits” explicitly, because, a priori, we do
not know the repository to which the commit belongs. A submodule commit
node will store the hash of the referent commit, as well as the URL to
the subproject as listed in the .gitmodules file. In this commit, we
load the list of those URLs into the in-memory repository.

Shout-out to `git` for having an excellent command-line API:
the `--blob` argument to `git-config` is perfect for this situation.

Test Plan:
Snapshot updates are readable and sufficient.

wchargin-branch: load-submodule-urls
2018-05-03 10:39:03 -07:00
William Chargin bbb05c9508
Store `TreeEntry` metadata in non-string form (#184)
Summary:
Prior to this commit, given a `Tree` node with an edge to a `TreeEntry`
node, there was no way to tell what the entry name was other than
parsing the ID (which should never be required). This adds appropriate
data to the payload of a `TreeEntry`, and also to the inclusion edge (so
that if you only have the edge, you don’t have to fetch the entry).

Test Plan:
Snapshot changes are readable.

wchargin-branch: treeentry-metadata
2018-05-03 10:33:25 -07:00
William Chargin 79dff9a083
Add options to not rebuild on shell script tests (#188)
Summary:
This can be useful for speed, but it can also be important for
correctness (at least theoretically): if we run both these scripts
concurrently, then we don’t want one of them to squash the `bin`
directory while the other is about to invoke an executable therein.

One might note that the diffs to the two files in this commit are
virtually identical, and indeed the files themselves are quite similar.
I’d prefer to keep the duplication for now; if we really need a Bash
snapshot testing framework, we can factor one out.

Test Plan:
Run each script with `--help`, with `--build` and `--no-build`, and with
and without `-u`.

wchargin-branch: optional-rebuild
2018-05-02 15:09:46 -07:00
William Chargin ee03c58357
Exclude punctuation surrounding URL references (#183)
Summary:
To avoid confusion, we simultaneously remove unused capturing groups.
This is not strictly necessary, but it makes the code less brittle.

Test Plan:
The newly added test fails before the change to `findReferences.js`.

wchargin-branch: url-punctuation
2018-05-01 14:51:18 -07:00
Dandelion Mané acf5000547
Create GitHub reference edges (#182)
This commit adds the `addReferenceEdges()` method to the GitHub parser,
which examines all of the posts in the parsed graph and adds References
edges when it detects references between posts. As an example, `Hey
@wchargin, take a look at #1337` would generate two references.

We currently parse the following kinds of references:
- Numeric references to issues/PRs.
- Explicit in-repository url references (to any entity)
- @-author references

We do not parse:
- Cross-repository urls
- Cross-repository shortform (e.g. `sourcecred/sourcecred#100`)

`Parser.parse` calls `addReferenceEdges()`, so no change is required by
consumers to have reference edges added to their graphs.

The GitHub porcelain API layer now includes methods for retrieving the
entities referenced by a post.
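
For illustration only, detecting the reference kinds parsed above can look roughly like the following; these patterns are not the project's actual `findReferences` implementation.

```js
// Sketch: find numeric (#123) and @-author references in a post body.
function findReferencesSketch(body) {
  const numeric = body.match(/(?:^|\s)#\d+/g) || [];
  const authors = body.match(/(?:^|\s)@[A-Za-z0-9-]+/g) || [];
  return {numeric, authors};
}

findReferencesSketch("Hey @wchargin, take a look at #1337");
// -> one @-author reference and one numeric reference
```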

Test plan:
This commit is tested both via snapshot tests, and explicit testing at
api layer. (Actually, the creation of the porcelain API layer was
prompted by wanting a cleaner way to test this commit.) I recommend
inspecting the snapshot tests for sanity, but mostly relying on the
tested behavior in api.test.js.
2018-04-30 20:19:38 -07:00
Dandelion Mané f358c33e2a
findReferences now finds url references to users (#181)
For example, `https://github.com/decentralion` is now a valid url
reference to a GitHub author.

Test plan: Check the added test case.
2018-04-30 19:23:58 -07:00
Dandelion Mané 0c0bbf58e2
Update example repo data (#180)
I added a lot of new comments that have url references to different
types of GitHub entities, e.g. to pull request review comments.

The commit was generated by running the example repo fetcher, and
running yarn test and updating snapshots.
2018-04-30 19:21:41 -07:00
Dandelion Mané a1d072846d
Add PR reviews and comments to GitHub api (#179)
Also, a slight re-organization of the GitHub api test code.
2018-04-30 18:22:03 -07:00
William Chargin 16e8e399e6
Add commit parent edges in the Git graph (#178)
Test Plan:
To verify the snapshot change, either believe the programmatic tests, or
use the following script to verify that the right edges were added:
```bash
#!/bin/bash
set -eu
example_repo="$(mktemp -d)"
yarn backend >/dev/null 2>/dev/null
node bin/createExampleRepo.js "${example_repo}"
expected() {
    git -C "${example_repo}" log --format='%H %P' \
        | awk '{ if (NF > 1) { print $2 " " $1 } }' \
        | sort
}
actual() {
    git diff HEAD^..HEAD | grep -A 1 -F -e src -e dst \
        | sed -n 's/^+.*"id": "\(.\+\)".*/\1 /p' \
        | tr -d $'\n' | cat - <(echo) \
        | fold -s -w 82 | sed 's/ *$//' \
        | sort
}
diff -u <(expected) <(actual)
```

wchargin-branch: graph-commit-parents
2018-04-30 18:08:40 -07:00
William Chargin 56ddb5cf9b
Load commit parent hashes into memory (#177)
Test Plan:
Snapshot updated with `./src/plugins/git/loadRepositoryTest.sh -u`; unit
tests suffice.

wchargin-branch: load-commit-parents
2018-04-30 18:01:16 -07:00
William Chargin d5f468ca68
Fix Git plugin `NodePayload` definition (#176)
Summary:
Flow didn’t catch this because all the payloads are `{}` anyway.

Test Plan:
Note that every node and edge payload is now listed exactly once in the
correct spot for each of `{Node,Edge}{Type,Payload}`.

wchargin-branch: git-nodepayload
2018-04-30 17:08:49 -07:00
Dandelion Mané 0609201af4
Remove Graph.{in,out}Edges (#174)
This commit removes `Graph.inEdges` and `Graph.outEdges`. As a
replacement to these two functions, `graph.neighborhood` now takes an
optional `direction` flag, which can be set to `"IN" | "OUT" | "ANY"`.
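
A hedged sketch of the call-site change; whether the flag is passed as an options object, and the exact return shape, are assumptions.

```js
// Old (sketch): graph.inEdges(a) / graph.outEdges(a), returning edges.
// New (sketch): one method with a direction flag, returning neighbors.
function outgoingNeighborsSketch(graph, address) {
  // Each neighborhood entry pairs an edge with the neighboring address.
  return graph
    .neighborhood(address, {direction: "OUT"})
    .map(({neighbor}) => neighbor);
}
```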

This reduces the surface area of the Graph API, and means that the same
pattern can be used when requesting in or out neighbors as is used when
requesting all neighbors.

This change generates significant churn in the test files, and in some
cases the tests are less elegant / show historicity, as they were
written for the type signature of `{in,out}Edges`, which just returns an
array of edges, and now receive an array of neighbors. I think this is
acceptable, and it's not worth re-writing the test.

In many cases, replacing existing calls to `{in,out}Edges` in our actual
codebase resulted in cleaner code, as `neighborhood` successfully
abstracts over the common patterns that users of `{in,out}Edges` were
implementing.

As a fly-by refactor, I also changed the `neighborAddress` part of the
`neighborhood` return value to `neighbor`. It's a little less
descriptive, but it's more concise, and flow is there to help ensure
it's used correctly.

Test plan: Note that CI passes. Inspect the test changes, and verify
that they are appropriate transformations for consuming the new API.
2018-04-30 15:32:28 -07:00
William Chargin 5af5748ed7
Convert in-memory Git repos to cred graphs (#169)
Test Plan:
This snapshot test is too unwieldy to actually read—it’s 1000 lines of
opaque SHAs and thrice-stringified JSON objects—so it should be
interpreted as a regression test only. The programmatic tests should
suffice.

wchargin-branch: wip-git-create-graph
2018-04-30 15:23:37 -07:00
William Chargin f3a440244e
Fix all lint errors, adding a lint CI step (#175)
Test Plan:
Run `yarn lint` and `yarn travis` and observe success. Add something
that triggers a lint warning, like `const zzz = 3;`; re-run and observe
failures.

wchargin-branch: lint
2018-04-30 14:52:28 -07:00
Dandelion Mané 22ca77ed05
Add safe type coercion for GitHub api (#173)
In general, methods in the porcelain GitHub api may return multiple
types; e.g. a reference could be to an Issue, PullRequest, Comment,
Author (or more). To make working with the api more convenient while
maintaining safety, this commit adds a static `asType` method to each
Entity class, which confirms that type coercion is safe, and errors if
not.

This commit also adds `issueOrPRByNumber`, a convenience method, to
api.test.js.

Test plan: Check the API usage and verify that it is reasonable.
2018-04-30 10:07:23 -07:00
Dandelion Mané d878be6550
Update the GitHub example repo data (#172)
Commit generated by running src/plugins/github/fetchGithubRepoTest -u
2018-04-29 22:13:00 -07:00
Dandelion Mané 7158deaad3
Add a porcelain api for Github data (#170)
Interacting with raw contribution graphs is cumbersome. We'll need
more fluent and convenient ways to retrieve data from them; we can do
this by creating porcelain APIs that wrap the underlying graph.

This commit adds a simple porcelain API for the GitHub data. It creates
the following classes:

* `api.Repository`
* `api.Issue`
* `api.PullRequest`
* `api.Comment`
* `api.Author`

The classes all wrap a graph and a nodeAddress. They provide read-only
functions for retrieving data from the graph; that data might be a part
of the node payload, or it might do some graph traversal under the hood.
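
A minimal sketch of that wrapper pattern; the class, method, and payload field names here are illustrative, not the actual porcelain API.

```js
class IssueSketch {
  constructor(graph, nodeAddress) {
    this._graph = graph;
    this._address = nodeAddress;
  }
  node() {
    return this._graph.node(this._address);
  }
  title() {
    // Some data is read straight off the node payload...
    return this.node().payload.title;
  }
  authorAddresses() {
    // ...while other accessors traverse the graph under the hood.
    // (Filtering the neighborhood down to AUTHORS edges is elided.)
    return this._graph
      .neighborhood(this._address)
      .map(({neighbor}) => neighbor);
  }
}
```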

The choice to have the wrapper hold onto the Address rather than the
node itself was deliberate; in the future, the graph may contain nodes
that are not synchronously reachable, so this approach allows us to
create wrappers for nodes we can't synchronously reach. When this comes
up in practice, we can then add async methods to the wrapper.

Note that some data already included in our graph, such as
PullRequestReviews and PullRequestReviewComments, were deliberately
excluded, so as to allow the core ideas to be reviewed without
unnecessary clutter.

Test plan:
Check that the unit tests appropriately test the behavior, and that the
API seems pleasant to use.
2018-04-27 21:45:30 -07:00
William Chargin 1c28c75e39
Check in example repo’s in-memory representation (#166)
Summary:
Two reasons for this. First, we want tests to be able to operate on this
data without having to generate repositories via `git(1)`. (Doing that
is slow, and requires a Git installation, and makes it less clear that
the tests are correctly isolated/provides more surface area for
something to go wrong.) Second, in general plugins will need a canonical
source of test data, so setting/continuing this precedent is a good
thing.

Test Plan:
Observe that the old Jest snapshot must be equivalent to the new JSON
one, because the test criterion in `loadRepository.test.js` changed and
the test still passes. Then, run `loadRepositoryTest.sh` and note that
it passes; change the `example-git.json` file and note that the test
fails when re-run; then, run the test with `--updateSnapshot` and watch
it magically revert your changes.

wchargin-branch: check-in-git-repo
2018-04-27 20:51:54 -07:00
William Chargin 301e542ee1
Switch in-memory Git types from Maps to objects (#165)
Summary:
I’d like to use `Map`s whenever the keys are homogeneous (i.e.,
dictionaries, not structs). But this has proven infeasible. The primary
issue at this point is that `JSON.stringify(anyMap)` is `"{}"`—not
entirely unreasonable given that maps can have non-string keys, but
frustrating enough to not use them.
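
The frustration in question is easy to demonstrate:

```js
const asMap = new Map([["abc123", {type: "commit"}]]);
JSON.stringify(asMap); // -> "{}": the entries are silently dropped

const asObject = {abc123: {type: "commit"}};
JSON.stringify(asObject); // -> '{"abc123":{"type":"commit"}}'
```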

Test Plan:
Jest appears to order the snapshot keys differently for `Map`s and
objects (the former by insertion order and the latter alphabetical),
which makes the snapshot change harder to read. I verified that the
general structure is okay, and hand-verified some of the individual
changes. Noting that the number of lines added and deleted in the
snapshot is a good sanity check.

wchargin-branch: map-to-object
2018-04-27 20:46:55 -07:00
Dandelion Mané ec3d084ffc
Graph: type filter for `nodes()` and `edges()` (#168)
When requesting nodes and edges from the graph, it is convenient to
filter them by their type.

In the future, we should add plugin filtering as well, as we
expect type names to collide across plugins.

We may also want to consider keeping a cache of nodes and edges by type
to speed up these calls, if they become performance bottlenecks. (The
implementation in this commit naively iterates over every node/edge.)
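
A hedged sketch of the naive filter described above; the option shape and the address's `type` field are assumptions.

```js
function nodesOfTypeSketch(graph, type) {
  // (edges() would be filtered analogously.)
  const all = graph.nodes();
  return type == null ? all : all.filter((n) => n.address.type === type);
}
```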

Test plan:
Verify that the unit tests are appropriate.
2018-04-27 20:10:51 -07:00
Dandelion Mané 3d79f7680e
Collapse the 3 author types into 1 (#164)
Currently, we store GitHub Users, Organizations, and Bots as separate
nodetypes in the graph. This is inconvenient, as we don't care very much
what type of entity authored a node.

This commit collapses those three categories into one nodetype. The
extra information has migrated to the node payload, so it is still
possible to discover this information if it's important.

Test plan: There is some amount of snapshot churn because the author
node types and payloads have changed. Verify that the snapshot changes
are appropriate, and that CI passes.
2018-04-27 15:52:55 -07:00
Dandelion Mané dd48084810
Remove `get` prefix from getters in graph.js (#163)
This commit renames the following graph functions:

* `get{Node,Edge}{,s}` -> `{node,edge}{,s}`
* `get{In,Out}Edges` -> `{in,out}Edges`
* `getNeighborhood` -> `neighborhood`

The rename was effected across the repo by running:

```
$ find src -name "*.js" -exec sed -i 's/getNeighborhood/neighborhood/g' {} +
```

modified appropriately for each substitution.

Test plan:
Inspect the code to make sure nothing was erroneously renamed. Check that
CI passes.
2018-04-27 15:10:12 -07:00
Dandelion Mané 678924087a
Replace `Graph.{getAdjacentEdges,getNeighborhood}` (#162)
`Graph.getAdjacentEdges` had a serious defect: for the adjacent edges,
it's hard to tell which of the {src,dst} is the neighboring node address
and which is the node we called `getAdjacentEdges` on.

This commit fixes that limitation by replacing `getAdjacentEdges` with
`getNeighborhood`, with a return signature of
`{edge: Edge<EP>, neighborAddress: Address}[]`

Some yak shaving was required: we needed a version of `expectSameSorted`
and, by extension, `sortedByAddress` that takes an accessor to an
Addressable, so that we could test that the neighborhoods returned were
correct. To satisfy flow, I created `expectSameSortedAccessorized` and
`sortedByAddressAccessorized`. Cumbersome, but it worked. ¯\_(ツ)_/¯
2018-04-27 15:05:36 -07:00
Dandelion Mané 28e686c369
Remove `address.sortedByAddress` (#161)
Previously, the address module exported `sortedByAddress`, a utility
function that sorts an array of `Addressable`s. This function was only
used in test code.

This commit replaces it with generic usage of `lodash.sortBy`. This
reduces the API surface area of the module, and removes test-only code
from the exported api.
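
An illustrative replacement in test code; the sort key is an assumption about how addresses are made comparable.

```js
const sortBy = require("lodash.sortby");

// Sketch: order addressables deterministically by their address.
function sortedByAddressSketch(addressables) {
  return sortBy(addressables, (x) => JSON.stringify(x.address));
}
```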

New dependency added: `lodash.sortby`
https://www.npmjs.com/package/lodash.sortby
2018-04-27 14:29:49 -07:00
William Chargin 1550e6d05e
Add one-way GitHub sync for Git example repos (#160)
Test Plan:
Run the script with `--dry-run`, which currently prints
```shell
$ src/plugins/git/demoData/synchronizeToGithub.sh -n
yarn run v1.5.1
[build output truncated]
Build completed; results in 'bin'.
Done in 3.30s.

Synchronizing: example-git
warning: no common commits
To github.com:sourcecred/example-git.git
 + 3507b7c...3715ddf HEAD -> master (forced update)

Synchronizing: example-git-submodule
Everything up-to-date

Done.
```
This reflects the correct state of affairs, because #158 changed the
example repository. Note that the `3715ddf` SHA in the output of the
above script matches the SHA in the `exampleRepo.test.js.snap` snapshot.

wchargin-branch: sync-git-example-repos
2018-04-27 14:03:54 -07:00
William Chargin f4de3e2067
Standardize environment passed to Git (#159)
Summary:
When we shell out to `git`, we don’t want the end user’s environment
variables and Git configuration to influence the results. This commit
standardizes those inputs. Standardizing the environment has the side
benefit that the `GIT_DIR` environment variable is not set, which means
that the test suite will work properly when run from the `exec` step of
a Git rebase.

Test Plan:
Tests pass and snapshots are unchanged. Note that
```shell
$ git rebase HEAD --exec 'CI=1 yarn test'
```
works after this commit but not before it.

wchargin-branch: standardize-git-environment
2018-04-27 11:50:32 -07:00
William Chargin c7235f6e49
Remove superfluous commas from example repo README (#158)
Summary:
Using `array.join()` added commas at the start of some lines; I meant to
use `array.join("")`.

(I’ve now inspected the full generated contents of both repos, and they
look good.)

Test Plan:
It is expected that these attributes of the snapshots should change.
There’s no need to carefully check the SHAs.

wchargin-branch: readme-change
2018-04-27 11:48:40 -07:00
Dandelion Mané 8e9ddbf9fc
Add `Graph.getAdjacentEdges` for in and out edges (#157)
Some consumers of the graph may prefer to treat it as an undirected
graph. For example, when finding the author of an issue, it is wholly
sufficient to find an edge with the `AUTHORS` type; the caller may
prefer not to be bothered with remembering which end of the `AUTHORS`
edge is considered the `src` and which is the `dst`.

The `getAdjacentEdges` call enables that, by combining the output of
`getInEdges` and `getOutEdges`.

Test plan:
The new tests are pretty comprehensive.
2018-04-26 20:27:46 -07:00
William Chargin aa071ceab3
Include a submodule in the main example repository (#156)
Summary:
The main example repository now covers the currently desired features:
it has blobs, subtrees, and submodules, and commits that change each of
these. (We don’t have merge commits yet—we can add those once we start
to care about them.)

Once this is merged, I will push the two repositories to GitHub.

Test Plan:
Verifying and understanding is easier than ever before. You can run the
following commands to create the repositories in question on your disk:
```shell
$ yarn backend
$ node bin/createExampleRepo.js /tmp/repo
$ node bin/createExampleRepo.js --submodule /tmp/subrepo
```

You can then explore these repositories at your leisure. For instance,
to check that the `loadRepository` snapshot has the right set of
commits, inspect the output of the following command:
```shell
$ git -C /tmp/repo log --format='%H %T'
```
Or, to check that a particular tree has the right contents, just run:
```shell
$ git -C /tmp/repo ls-tree TREE_SHA
```
Verifying the `exampleRepo` snapshot is similarly easy: just check that
the lists of commit SHAs in `/tmp/repo` and `/tmp/subrepo` are correct.

wchargin-branch: include-submodule
2018-04-26 20:11:44 -07:00
William Chargin d6e9b0a72b
Add a command-line script to create example repos (#155)
Summary:
We’ll use this to create the repositories on disk and then push them to
GitHub.

Test Plan:
Generate both kinds of repository, and check out the SHAs:
```shell
$ yarn backend
$ node bin/createExampleRepo.js /tmp/repo
$ node bin/createExampleRepo.js --submodule /tmp/repo-submodule
$ node bin/createExampleRepo.js --no-submodule /tmp/repo-no-submodule
$ # (first and third lines do the same thing)
$ git -C /tmp/repo rev-parse HEAD
677b340674bde17fdaac3b5f5eef929139ef2a52
$ git -C /tmp/repo-submodule rev-parse HEAD
29ef158bc982733e2ba429fcf73e2f7562244188
$ git -C /tmp/repo-no-submodule rev-parse HEAD
677b340674bde17fdaac3b5f5eef929139ef2a52
```
Then, note that these SHAs are expected per the snapshot file in
`exampleRepo.test.js.snap`.

wchargin-branch: create-example-repo-command
2018-04-26 19:53:46 -07:00
William Chargin 28a118c814
Create an example submodule repository (#153)
Summary:
We want our main repository to include submodules so that we can test
submodule support. Here, we create a repository to be included as a
submodule.

wchargin-branch: example-submodule-repository
2018-04-26 19:43:01 -07:00
William Chargin 75fd068a35
Extract code to create the example repository (#152)
Test Plan:
Note that the snapshot change is simply a move: no SHAs were changed.

wchargin-branch: extract-example-repository-code
2018-04-26 19:38:29 -07:00
William Chargin 6f9941b526
Clean up temporary directories in tests (#151)
Summary:
The `loadRepository` test tries to clean up temporary directories, but
failed to do so because the directories were not empty. The cleanup hook
threw an error, but this error was silenced by Jest due to [a known
bug][1] that was fixed a few days ago. We can fix this by asking `tmp`
to clean up directories even if they are not empty, using the
`unsafeCleanup` option.

[1]: https://github.com/facebook/jest/issues/3266

Test Plan:
While running `watch -n 0.1 'ls /tmp | grep "tmp-.*" | wc -l'`, run
tests. Note that the number increases by five and then drops down again;
before this patch, it would increase by 5 and then stay there.

wchargin-branch: clean-up-tmpdirs
2018-04-26 19:34:23 -07:00
William Chargin 3679529bef
Move `localGit`/`GitDriver` into Git utils (#150)
Summary:
A few reasons for this:
 1. This _is_ a utility, so it makes sense semantically.
 2. This unifies the utilities API; clients like `loadRepository.test`
    don’t have to keep around both a `git` and a `gitUtils`.
 3. Most importantly, further scripts and tests shouldn’t depend on
    `loadRepository` just for `localGit`. Depending on `gitUtils` makes
    much more sense.

(Note that `makeUtils` is no longer dependency-injectable, but that’s
okay; I considered this and favored YAGNI on this one.)

Test Plan:
Existing unit tests pass.

wchargin-branch: move-localgit
2018-04-26 19:31:27 -07:00
William Chargin dad8777e6c
Add utility functions for working with Git repos (#149)
Summary:
Utilities like `deterministicCommit` provide valuable functionality that
we will want to use in other scripts and perhaps other test cases. It
makes sense to factor these out into utility functions.

Test Plan:
Existing tests pass.

wchargin-branch: git-utils
2018-04-26 19:17:16 -07:00
Dandelion Mané 087d8d561e
Enable type filtering on Graph.get{In,Out}Edges (#147)
This commit adds an optional `typeOptions` argument to Graph.getInEdges
and Graph.getOutEdges. The `typeOptions` allow filtering the returned
edges by the type of the edge, and the type of the node that the edge is
connected to. This makes it much easier to use these methods to find
connections that have a certain relationship, e.g. finding the author of
a commit or the comments on an issue.
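
A hedged usage sketch (the exact field names of `typeOptions` are
assumptions; the type strings are illustrative):

```js
// Hedged sketch: `edgeType`/`nodeType` field names are assumed.
const authorships = graph.getInEdges(commitAddress, {
  edgeType: "AUTHORS", // only authorship edges
  nodeType: "USER",    // only edges whose other endpoint is a user
});
```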

Test plan:
A new test suite was written that comprehensively tests this behavior,
both for getInEdges and getOutEdges.
2018-04-26 19:09:37 -07:00
Dandelion Mané c635034dab
Add richer types to graphDemoData (#146)
In preparation for using type info in the Graph apis, it is helpful to
have richer type info in the graph demo data.

Test plan: Check that the snapshot changes only consist of type changes,
and that CI passes.
2018-04-26 16:44:34 -07:00
Dandelion Mané 18ce9982d2
Refactor GH parser to expose a functional api (#145)
Our GitHub parser is implemented via a `GithubParser` class which builds
the GitHub graph. This is a convenient implementation, but an awkward
API. This commit refactors the module so that it exposes a clean `parse`
function, which ingests the GitHub JSON data and returns as completed
graph.

Test plan:
The unit tests have been re-written to use the new public API. All the
snapshots are unchanged, and flow passes. Additionally, I ran `yarn
start` and verified that the GithubGraphFetcher for the Artifact plugin
is still working.
2018-04-25 19:39:28 -07:00
William Chargin 5e4b7b1fcc
Extract repository types to a separate module (#141)
Summary:
We should be able to get the types without depending on the function to
load a Git repo from disk, and in particular without depending on
`child_process`.

Test Plan:
Flow and tests are sufficient.

wchargin-branch: extract-repository-types
2018-04-25 14:35:45 -07:00
Dandelion Mané ad56ba087c
Use urls as ids for GitHub nodes (#144)
There's some context at #127, in which I initially proposed this change.

In addition to the long-term benefits described in #127, there is a
short-term benefit which is that it makes snapshot tests easier to read,
because the GitHub ids are opaque and unreadable, while the GitHub urls
are relatively easy to parse.

This change results in significant snapshot churn.
2018-04-25 14:28:06 -07:00
Dandelion Mané 4f857a8bb1
Add return type info for fetchGithubRepo (#143)
Once the type was added, flow correctly discovered a bug in
GithubGraphFetcher.js, which resulted in broken graph fetching in the
ArtifactEditor. Oops! / Good work Flow!

I made `ensureNoMorePages` expect the result it is testing to be an
`any`, which is appropriate for how the function is written (i.e. it is
written in a way that is agnostic to the actual result).
2018-04-24 16:00:50 -07:00
Dandelion Mané 8fdfacd097
fetchGithubRepo: remove vestigial data field (#142)
The example-repo.json file is regenerated with large diffs due to the
change in indentation level throughout the file.

Test plan:
Sanity check the snapshot (close inspection is unnecessary due to the
simplicity of the code change). Check that CI passes.
2018-04-24 15:46:16 -07:00
Dandelion Mané 6a3e4d754c
Add flow types for GitHub graphql query response (#140)
This commit adds flow typing for the JSON result from hitting the GitHub
graphql api. We can't prove that the flow typing is correct, but since
the type definition is colocated with the corresponding fragment
definitions, we can hope that maintainers will maintain both together.

We update the parser to consume the new flow types. There are no flow
errors.

Test plan:
Inspect the flowtypes, verify that they correspond to the data in
example-repo.json, and that there are no flow errors.
2018-04-24 14:05:51 -07:00
William Chargin 418b745d7c
Load Git repositories into memory (#139)
Summary:
In this newly added module, we load the structural state of a git
repository into memory. We do not load into memory the contents of any
blobs, so this is not enough information to perform any analysis
requiring file diffing. However, it is sufficient to develop a notion of
“this file was changed in this commit”, by simply diffing the trees.

Test Plan:
Unit tests added; `yarn test` suffices. Reading these snapshots is
pretty easy, even though they’re filled with hashes:
  - First, read over the commit specifications on lines 69–83 of
    `loadRepository.test.js`, so you know what to expect.
  - In the snapshot file, keep handy the time-ordered list of commit
    SHAs at the bottom of the file, so that you know which commit SHA is
    which.
  - To verify that the large snapshot is correct: for each commit, read
    the corresponding tree object and make sure that the structure is
    correct.
  - To verify the small snapshot, just check that it’s the correct
    subset of the large snapshot.
  - If you want to verify that the SHA for a blob is correct, open a
    terminal and run `git hash-object -t blob --stdin`; then, enter the
    content of the blob and press `<C-d>`. The result is the blob SHA.

To run a sanity-check on a large repository: apply the following patch:

<details>
<summary>Patch to print out statistics about loaded repository</summary>

```diff
diff --git a/config/paths.js b/config/paths.js
index d2f25fb..8fa2023 100644
--- a/config/paths.js
+++ b/config/paths.js
@@ -62,5 +62,6 @@ module.exports = {
     fetchAndPrintGithubRepo: resolveApp(
       "src/plugins/github/bin/fetchAndPrintGithubRepo.js"
     ),
+    loadRepository: resolveApp("src/plugins/git/loadRepository.js"),
   },
 };
diff --git a/src/plugins/git/loadRepository.js b/src/plugins/git/loadRepository.js
index a76b66c..9380941 100644
--- a/src/plugins/git/loadRepository.js
+++ b/src/plugins/git/loadRepository.js
@@ -106,3 +106,7 @@ function findTrees(git: GitDriver, rootTrees: Set<Hash>): Tree[] {
   }
   return result;
 }
+
+const result = loadRepository(...process.argv.slice(2));
+console.log("commits", result.commits.size);
+console.log("trees", result.trees.size);
```
</details>

Then, run `yarn backend` and put the following script in `test.sh`:

<details>
<summary>Contents for `test.sh`</summary>

```shell
#!/bin/bash
set -eu

repo="$1"
ref="$2"

via_node() {
    node bin/loadRepository.js "${repo}" "${ref}"
}

via_git() (
    cd "${repo}"
    printf 'commits '
    git rev-list "${ref}" | wc -l
    printf 'trees '
    git rev-list "${ref}" |
        while read -r commit; do
            git rev-parse "${commit}^{tree}"
            git ls-tree -rt "${commit}" \
                | grep ' tree ' \
                | cut -f 1 | cut -d ' ' -f 3
        done | sort | uniq | wc -l
)

echo
printf 'Running directly via git...\n'
time a="$(via_git)"

echo
printf 'Running Node script...\n'
time b="$(via_node)"

diff -u <(cat <<<"${a}") <(cat <<<"${b}")
```
</details>

Finally, run `./test.sh /path/to/some/repo origin/master`, and verify
that it exits successfully (zero diff). Here are some timing results on
SourceCred and TensorBoard:

  - SourceCred: 0.973s via Node, 0.327s via git.
  - TensorBoard: 30.836s via Node, 6.895s via git.

For TensorFlow, running via git takes 7m33.995s. Running via Node fails
with an out-of-memory error after 39 minutes, with 10GB RAM and 4GB
swap. See details below.

<details>
<summary>
Full timing details, commit SHAs, and OOM error message
</summary>

```
+ ./test.sh /home/wchargin/git/sourcecred 01634aabcc

Running directly via git...

real	0m0.327s
user	0m0.016s
sys	0m0.052s

Running Node script...

real	0m0.973s
user	0m0.268s
sys	0m0.176s
+ ./test.sh /home/wchargin/git/tensorboard 7aa1ab9d60671056b8811b7099eec08650f2e4fd

Running directly via git...

real	0m6.895s
user	0m0.600s
sys	0m0.832s

Running Node script...

real	0m30.836s
user	0m3.216s
sys	0m10.588s
+ ./test.sh /home/wchargin/git/tensorflow 968addadfd4e4f5688eedc31f92a9066329ff6a7

Running directly via git...

real	7m33.995s
user	5m21.124s
sys	1m5.476s

Running Node script...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [node]
 2: 0x121a2cc [node]
 3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
 4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
 5: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [node]
 6: v8::internal::DeoptimizationInputData::New(v8::internal::Isolate*, int, v8::internal::PretenureFlag) [node]
 7: v8::internal::compiler::CodeGenerator::PopulateDeoptimizationData(v8::internal::Handle<v8::internal::Code>) [node]
 8: v8::internal::compiler::CodeGenerator::FinalizeCode() [node]
 9: v8::internal::compiler::PipelineImpl::FinalizeCode() [node]
10: v8::internal::compiler::PipelineCompilationJob::FinalizeJobImpl() [node]
11: v8::internal::Compiler::FinalizeCompilationJob(v8::internal::CompilationJob*) [node]
12: v8::internal::OptimizingCompileDispatcher::InstallOptimizedFunctions() [node]
13: v8::internal::Runtime_TryInstallOptimizedCode(int, v8::internal::Object**, v8::internal::Isolate*) [node]
14: 0x12dc8b08463d
```
</details>

wchargin-branch: load-git-repositories

2018-04-24 13:57:10 -07:00
Dandelion Mané 515c577825
Split github/githubPlugin.js into three files (#138)
github/githubPlugin.js was growing ungainly - it contained two major
pieces: all of the node and edge types, and the GitHub parser. As I
contemplated adding a third major new section of logic (an easy-to-use
api for traversing the GitHub graph, with first class support for
comments, authorship, etc), I found the prospect of adding even more
into that file quite unappealing. So, I have instead split it into three
files:

* github/pluginName.js: Exports the plugin name.
* github/types.js: Exports types of nodes, edges, and their payloads
* github/parser.js: Exports `GithubParser`

No logic has been changed whatsoever - this is purely a rename-refactor.

Test plan:
CI still passes. I manually verified that the Artifact Editor can still
load and display GitHub data.
2018-04-24 09:30:22 -07:00
Dandelion Mané 01634aabcc
Add url to author payloads (#137)
We have urls for all the author types, so for consistency across GitHub
payloads, I am adding urls to the author payloads.

Test plan:
Check that snapshot changes consist entirely of adding urls, and that
the urls are appropriate.
2018-04-23 18:30:19 -04:00
Dandelion Mané e191614dd4
Find GitHub username references (#136)
Add logic to findReferences for finding GitHub username references,
e.g. "Hello, @wchargin!". The API is unchanged.

Test plan:
There are new unit tests that verify this behavior works as expected.
2018-04-23 18:13:21 -04:00
Dandelion Mané d9d83c1157
Rename parseReferences.js to findReferences.js (#135)
This is consistent with the single export (findReferences.js now exports
`findReferences`)
2018-04-23 17:59:00 -04:00
Dandelion Mané 240f05899c
Simplify GH reference handling (#130)
Currently, every type of reference has its own type signature: numeric
references are returned as numbers, url references are a complicated
object containing url parts, and so forth.

Since ultimately the references are just strings, it makes more sense to
treat references as plain strings. This allows a much simpler
implementation of reference edge creation in the GitHub plugin. It also
results in a simpler API for the parseReferences file (it only exports a
single findReferences function).

Test plan:
Verify that the updated tests encode appropriate behavior.
2018-04-23 17:49:09 -04:00
William Chargin e0a5118f8d
Switch typed dispatch table to `empty` assertion (#133)
Summary:
This replaces the implementation of a static check from a somewhat
complicated use of higher-order types to a more simple empty-union
assertion, as suggested by jez in “Case Exhaustiveness in Flow”:
https://blog.jez.io/flow-exhaustiveness/

(I know; we’re not using Reason. One step at a time. :-) )

I adapted the implementation a bit because I prefer explicitly disabling
an ESLint warning over a no-op function call; it is not clear from the
latter that the purpose is to suppress a lint warning.

Test Plan:
In `githubPlugin.js`, add `| "ANOTHER"` to the `NodeType` type, and note
a compile-time Flow error on the appropriate line, with a very readable
error message. Note that all unit tests pass, and running the UI on
`sourcecred/sourcecred` yields correct titles for each node type present
(namely, all node types except for `ORGANIZATION` and `BOT`).

wchargin-branch: empty-union-assertion
2018-04-23 13:12:44 -07:00
Dandelion Mané 1e311c59f4
Add README and update flow for GitHub demo data (#129)
Keeping the GitHub demo data up-to-date is important, and there isn't
good documentation for how to do that.

This commit adds a short README.md for the demo data, and adds an update
flag to fetchGithubRepoTest.sh that can be used to easily update it.

Test plan:
Modify example-repo.json (e.g. by deleting it entirely). Run
fetchGithubRepoTest.sh -u and confirm that the data was regenerated
without change. Run fetchGithubRepoTest.sh and confirm the test passes.

Note: The end cursor is sensitive to the timezone, which seems to be
cached with the GitHub token. An erroneous switch to Israel timezone
made it into master; this commit reverts back to US/Pacific.
2018-04-23 14:21:01 -04:00
Dandelion Mané d025e78b1d
Get urls for all GitHub objects (#128)
This commit modifies our GitHub graphql query so that we request urls
for all objects (e.g. users, pull requests, pull request review
comments). Some change along these lines is necessary so that we can
correctly represent URL reference edges to e.g. issue comments. (It
might be possible to do without them by reverse-engineering from the ids, but
we are resolved to treat ids as opaque).

Strictly speaking, we don't need to collect urls for users, issues, and
pull requests - they are generated via simple schema. However, for
consistency, I think it's better to just take URLs on everything.

Test plan: The example-repo.json has been regenerated. The diffs are as
expected.
2018-04-20 10:50:03 -04:00
William Chargin 9d1200275e
Implement exhaustive fetching for our GitHub query (#123)
Summary:
This commit completes the ad hoc pagination solution described in #117:
we implement pagination specifically for our current query against the
GitHub API. This is done in such a way that reasonable additions to the
query will not be hard to implement—for instance, if we want to fetch
a new kind of field, the marginal cost is at most a bit of extra
copy-and-paste and some modifications to tests. However, we should
certainly still plan to implement the fully automatic pagination system
described in #117.

Running on TensorBoard with the default page limits takes 30–33 seconds
over 7 queries, uses 103 GitHub API points (out of 5000 for any given
hour), and produces a JSON file that is 8.7MB (when pretty-printed).
This all seems quite reasonable to me.

Test Plan:
Extensive unit tests added. The snapshots are quite readable, by design.

For a real-world test, you can `yarn start` the artifact viewer and use
the GUI to fetch the data for tensorflow/tensorboard.

To demonstrate that the fetching process gives the same results
regardless of the page size, follow these steps:

 1. In `fetchGithubRepo.js`’s `postQuery` function, insert a new
    statement `console.error("Posting query...")` at the beginning. (It
    is important to print to stderr instead of stdout.)
 2. Run `yarn backend` and then invoke `fetchGithubRepo.js` on a repo
    large enough to require pagination, like SourceCred or TensorBoard.
    Pipe the result to `shasum -a 256` and note the SHA.
 3. In `github/graphql.js`, change the page size constants near the top
    of the file. Re-run step 2. The number of queries that are posted
    will vary significantly as the page size constants vary, but the
    checksum should remain the same.
 4. Repeat until satisfied. (I tried three sets of values: the standard
    values, the standard values but all divided by 10, and all 5s.)

wchargin-branch: ad-hoc-pagination
2018-04-06 07:29:55 -07:00
William Chargin e82b56e52c
Implement a function to merge query results (#122)
Summary:
Once we execute the root query, find continuations, embed the
continuations into queries, and execute the continuation query, we will
need to merge the continuations’ results back into the root results.
This commit adds a function `merge` that will be suitable for doing just
that.

Test Plan:
New unit tests added, with 100% coverage. Run `yarn test`.

wchargin-branch: merge-query-results
2018-04-05 02:27:03 -07:00
William Chargin 751172ea77
Create pagination continuations for GitHub query (#121)
Summary:
Per #117, this is a first step toward at writing a pagination API that
specifically targets our current GitHub query. For design details, see
new module docs on `src/plugins/github/graphql.js`.

This commit modifies the core GitHub query and thus the
`example-repo.json` snapshot: we now request `endCursor` fields for all
pagination info, and we request the `id` of the root `repository` field.
The former is obviously necessary. The latter is necessary for the
repository to be consistent with other nodes that offer connections as
fields: we require an ID on the node containing the connection so that
we can have random access to it in a continuation selector.

Test Plan:
Unit tests added. You can also try out the generated continuation
queries for yourself: apply the patch below, run `yarn backend`, and
then run the `fetchGithubRepo.js` script on `sourcecred/sourcecred`.
This will output a nicely formatted query that you can paste directly
into GitHub’s API explorer and execute. (Note that, because this patch
is not fully polished, the query must be run against a repository that
has a continuation for every node type: more pages of issues, PRs,
comments, reviews, and review comments. This is due to an
easy-but-annoying-to-fix bug in the patch, not in the code included in
this commit.)

<details>
<summary>Patch for generating a continuations query</summary>

```diff
diff --git a/src/plugins/github/fetchGithubRepo.js b/src/plugins/github/fetchGithubRepo.js
index 789a20e..418c736 100644
--- a/src/plugins/github/fetchGithubRepo.js
+++ b/src/plugins/github/fetchGithubRepo.js
@@ -6,8 +6,13 @@

 import fetch from "isomorphic-fetch";

-import {stringify, inlineLayout} from "../../graphql/queries";
-import {createQuery, createVariables} from "./graphql";
+import {stringify, inlineLayout, multilineLayout} from "../../graphql/queries";
+import {
+  continuationsFromQuery,
+  continuationQuery,
+  createQuery,
+  createVariables,
+} from "./graphql";

 /**
  * Scrape data from a GitHub repo using the GitHub API.
@@ -66,8 +71,13 @@ function postQuery(payload, token) {
       if (x.errors) {
         return Promise.reject(x);
       }
-      ensureNoMorePages(x);
-      return Promise.resolve(x);
+      console.log(
+        stringify.body(
+          continuationQuery(Array.from(continuationsFromQuery(x.data))),
+          multilineLayout("  ")
+        )
+      );
+      throw new Error("STOPSHIP");
     });
 }

diff --git a/src/plugins/github/graphql.js b/src/plugins/github/graphql.js
index 9ea2592..9ead42b 100644
--- a/src/plugins/github/graphql.js
+++ b/src/plugins/github/graphql.js
@@ -39,11 +39,11 @@ import {build} from "../../graphql/queries";
  *
  * [1]: https://developer.github.com/v4/guides/resource-limitations/#node-limit
  */
-export const PAGE_LIMIT = 100;
-const PAGE_SIZE_ISSUES = 100;
-const PAGE_SIZE_PRS = 100;
-const PAGE_SIZE_COMMENTS = 20;
-const PAGE_SIZE_REVIEWS = 10;
+export const PAGE_LIMIT = 10;
+const PAGE_SIZE_ISSUES = 10;
+const PAGE_SIZE_PRS = 10;
+const PAGE_SIZE_COMMENTS = 3;
+const PAGE_SIZE_REVIEWS = 1;
 const PAGE_SIZE_REVIEW_COMMENTS = 10;

 /**
@@ -340,6 +340,36 @@ function* continuationsFromReview(
   }
 }

+/**
+ * Combine continuations into a query.
+ */
+export function continuationQuery(
+  continuations: $ReadOnlyArray<Continuation>
+): Body {
+  const nonces: string[] = continuations.map((_, i) => `_n${String(i)}`);
+  const nonceToIndex = {};
+  nonces.forEach((n, i) => {
+    nonceToIndex[n] = i;
+  });
+  const b = build;
+  const query = b.query(
+    "Continuations",
+    [],
+    continuations.map((continuation, i) =>
+      b.alias(
+        nonces[i],
+        b.field(
+          "node",
+          {id: b.literal(continuation.enclosingNodeId)},
+          continuation.selections.slice()
+        )
+      )
+    )
+  );
+  const body = [query, ...createFragments()];
+  return body;
+}
+
 /**
  * These fragments are used to construct the root query, and also to
  * fetch more pages of specific entity types.
```
</details>

wchargin-branch: ad-hoc-pagination-continuations
2018-04-05 02:23:17 -07:00
William Chargin 7711f01b84
Extract paginatable fragments of GitHub query (#120)
Summary:
Any time that we pull fields off a connection object, we may need to
repeat the query for subsequent pages. Therefore, such fragments will be
shared across multiple queries, and also shared within a query if we
need to fetch—say—more issue comments on two or more distinct issues.
This is a perfect use case for fragments.

This commit refactors the GitHub query to be organized in terms of
fragments, without changing the format of the results.

(We also take this opportunity to factor the page limits into
constants.)

Test Plan:
After running `yarn backend`, the `fetchGithubRepoTest.sh` test passes.

wchargin-branch: extract-github-query-fragments
2018-04-05 02:19:29 -07:00
William Chargin fbb6ec28db
Extract GitHub GraphQL code to a module (#119)
Summary:
Per #117, we want to develop an ad hoc pagination API written
specifically against the query that we use to interact with GitHub.
The pagination logic should be separate from the logic to actually fetch
the repo, but should be colocated with the query itself, so this commit
extricates the query from `fetchGithubRepo.js` into a new module.

Test Plan:
Existing tests pass, including `fetchGithubRepoTest.sh`.

wchargin-branch: extract-github-graphql
2018-04-05 00:08:52 -07:00
William Chargin 806c5e5687
Update example data for @decentralion name change (#118)
Summary:
This was created by re-crawling the GitHub repo via `fetchGithubRepo`,
and then updating Jest snapshots.

Test Plan:
Note that `fetchGithubRepoTest.sh` passes, so the data is now up to
date.

Inspect the snapshot, and note that the only changes are to change login
names from `dandelionmane` to `decentralion`. To do so automatically:
```bash
set -eu
diff_contents() {
    git difftool HEAD^ HEAD --extcmd=diff --no-prompt
}
! diff_contents | grep '^<' | grep -vF '"dandelionmane"'
! diff_contents | grep '^>' | grep -vF '"decentralion"'
```

wchargin-branch: decentralion-data
2018-04-04 23:29:30 -07:00
William Chargin e6f401df30
Add field aliases to structured GraphQL queries (#116)
Summary:
For pagination, we’ll want to query against multiple entities of the
same type. GraphQL uses aliases to facilitate this. This commit adds
support for aliases to our GraphQL query DSL.
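
For illustration, a sketch built from the query-DSL helpers that appear
elsewhere in this history (`b.query`, `b.alias`, `b.field`, `b.literal`);
the query content itself is made up:

```js
// Hedged sketch: the selected fields are illustrative only.
import {build, stringify, inlineLayout} from "../../graphql/queries";

const b = build;
const query = b.query("AliasDemo", [], [
  // The same `node` field fetched twice under distinct result keys.
  b.alias("first", b.field("node", {id: b.literal("SOME_ID")}, [])),
  b.alias("second", b.field("node", {id: b.literal("OTHER_ID")}, [])),
]);
console.log(stringify.body([query], inlineLayout()));
```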

Test Plan:
Inspect snapshot changes, and note that `yarn flow` and `yarn test`
pass.

wchargin-branch: graphql-aliases
2018-03-26 20:54:16 -07:00
William Chargin 2f50aa7364
Rename getAll{Nodes,Edges} to get{Nodes,Edges} (#115)
Test Plan:
Standard `yarn flow` and `yarn test` suffice.

wchargin-branch: get-no-all
2018-03-26 20:25:06 -07:00
William Chargin f93c9a8e42
Allow editing artifact descriptions (#114)
Summary:
It looks like this:
![Screenshot](https://user-images.githubusercontent.com/4317806/37943962-a8352b94-312e-11e8-9523-855a34020709.png)

Test Plan:
Interaction tests included. Run `yarn test`.

wchargin-branch: artifact-descriptions
2018-03-26 20:11:30 -07:00
William Chargin 99f24c420a
Convert ArtifactList to ArtifactGraphEditor (#112)
Summary:
This component now maintains a graph of just artifacts and the edges
among them. It owns the state, and notifies its parents of changes with
a callback. We treat the graph objects as properly immutable, copying
them on each change. So far, descriptions are always the empty string.

Test Plan:
Interaction tests added. Run `yarn test`.

wchargin-branch: artifact-graph-editor
2018-03-26 19:49:21 -07:00
Dandelion Mané 823e7da374
Cleanup GitHub plugin Node/Edge types (#113)
Update GitHub plugin to respect two new conventions:
- Node/Edge types are exported as UPPER_CASE_CONSTANTS
- Edge types are always verbs, which can be read as $src verb $dst

Test plan:
Flow passes. Inspect snapshot changes.
2018-03-26 19:24:14 -07:00
William Chargin 458744e77f
Redefine artifact plugin node and edge semantics (#111)
Summary:
This commit revises our implementations of node and edge types, and
specifies the semantics for artifact plugin IDs: we create IDs by
slugifying an artifact name and then resolving collisions

Test Plan:
Unit tests added. Run `yarn flow` and `yarn test`.

wchargin-branch: artifact-plugin-node-edge-semantics
2018-03-26 19:17:42 -07:00
William Chargin e57a16efbd Allow removing nodes and edges from the graph (#110)
Summary:
Wherein we change the semantics to allow\* dangling edges. This is
necessary for plugins that want to update nodes, such as changing a
description or other noncritical field.

\* (It was technically possible before by abusing `merge`, but now you
can just do it.)
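
A hedged sketch of the new semantics (the removal method name and the
payload update shape are assumptions):

```js
// Hedged sketch: `removeNode` is an assumed name; `nodeA`, `nodeB`, and
// `edgeAB` stand for valid Node/Edge objects.
const g = new Graph();
g.addNode(nodeA);
g.addNode(nodeB);
g.addEdge(edgeAB); // connects nodeA and nodeB

// Removing nodeA is now allowed even though edgeAB still references it;
// a plugin can then re-add the node with an updated payload.
g.removeNode(nodeA.address);
g.addNode({address: nodeA.address, payload: updatedPayload});
```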

Paired with @dandelionmane.

Test Plan:
Extensive tests added. Run `yarn flow` and `yarn test`.

wchargin-branch: allow-removing-from-graph
2018-03-26 17:40:19 -07:00
William Chargin 26508051a4 Add a covariant `copy` method on `Graph` (#109)
Summary:
Clients of `Graph` that wish to treat the graph as immutable will
benefit from a `copy` method. We should provide it on `Graph` instead of
asking clients to reimplement it because it affords us the opportunity
to get the type signature right: in particular, copying should allow
upcasting of the type parameters, even though `Graph` itself is
invariant.
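
A hedged illustration of the covariance (the payload type names are made
up):

```js
// Hedged sketch: payload types are illustrative.
type SpecificNodePayload = {|+url: string|};
type SpecificEdgePayload = {|+weight: number|};

declare var specific: Graph<SpecificNodePayload, SpecificEdgePayload>;

// Copying may upcast the type parameters, even though Graph itself is
// invariant; downcasting (the reverse) is rejected by Flow.
const general: Graph<mixed, mixed> = specific.copy();
```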

Paired with @dandelionmane.

Test Plan:
Unit tests added. Run `yarn flow` and `yarn test`. To check that
downcasting is not allowed, change the types in the new static test case
in `graph.test.js` to be contravariant instead of covariant, and note
that `yarn flow` fails.

wchargin-branch: graph-copy
2018-03-26 13:58:12 -07:00
William Chargin 007cf88172
Separate artifact settings from GitHub graph fetch (#108)
Summary:
We need to know the repo owner and name for purposes other than fetching
the GitHub graph: for instance, fetching the `artifacts.json` file that
describes the artifact subgraph. It makes sense that these should be
settings global to the application. This commit separates a settings
component and the original GitHub graph fetcher.

This invalidates localStorage; you can manually migrate.

Paired with @dandelionmane.

Test Plan:
Note that the data continues to be stored in localStorage and that it is
updated on each keypress. Note that the state is properly passed around:
if you change the repository name from `example-repo` to `sourcecred`,
e.g., and click “Fetch GitHub graph”, then the proper graph is fetched.

wchargin-branch: separate-artifact-settings
2018-03-26 13:26:44 -07:00
Dandelion Mané d310561b94
Generate Edge ids automatically (#107)
* Generate Edge ids automatically

Adds edgeID in graph, which creates a string id from src and dst.
Provided that the plugin only uses edgeID for generating edge ids of
that type, these ids will be unique.

Modify the GitHub plugin to use edgeID. This allows the code and
type signature to be simpler, and will be more consistent with other
plugins.
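
A hedged sketch of the intended usage; the exact signature of `edgeID` is
an assumption (the commit only describes it as building a string id from
the src and dst):

```js
// Hedged sketch: import path and signature are assumptions.
import {edgeID} from "./graph";

// One string id per (src, dst) pair; unique as long as the plugin only
// uses edgeID for edges of a single type between those endpoints.
const id = edgeID(srcAddress, dstAddress);
```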

Test plan:
Carefully inspect the snapshots.
2018-03-26 12:05:25 -07:00
Dandelion Mané 043c37f9c6
Add a toString and fromString method on addresses (#106)
* Add a toString and fromString method on addresses

The toString and fromString methods use json-stable-stringify,
and we've modified other address code to use these methods.
As such, a number of the snapshots have changed (ordering).
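
A hedged sketch of the round trip (whether these are module-level helpers
or methods on an address wrapper is an assumption here):

```js
// Hedged sketch: assumes module-level helpers in the address module.
import {toString, fromString} from "./address";

const s = toString(address);        // stable serialization (json-stable-stringify)
const roundTripped = fromString(s); // structurally equal to `address`
```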

Test plan:
Verify the new included unit tests are comprehensive. Inspect the
snapshots.
2018-03-26 11:32:25 -07:00
Dandelion Mané b2c13ac891
[refactor] Move node/edge types to addresses (#105)
This pull request adds a string type as a field of the address, making it canonical that all Nodes and Edges have a Type. This allows for simpler PluginAdapters, and a simpler implementation of the GitHub plugin (as it no longer needs to invent its own mechanism for storing types).

I explored making the Address interface generic, with a Type parameter that is a subtype of string. Unfortunately, Flow type resolution seems to have an exponential performance degradation with many subtyped parameters, and adding the extra type parameters to Graph.merge resulted in my flow instance locking. Maybe we can explore adding the subtypes later.

In the GitHub plugin, we entirely do away with the NodeIDs, but we still have EdgeIDs. I plan to remove the need for EdgeIDs in a separate PR which will enforce uniqueness of (srcAddress, dstAddress, pluginName, type) tuples, so that explicitly generating IDs for edges will be unnecessary. In the meantime, I bifurcated the makeAddress function in the GitHub plugin into makeEdgeAddress and makeNodeAddress.

Test plan:
Flow and unit tests all pass. Inspect snapshots, and verify that all the changes are reasonable. Note that since we order by serialized address, adding the type field to address has changed the snapshot order in a few cases.

Close #103
2018-03-26 10:36:49 -07:00
William Chargin 8fdf758cb9
Standardize on Enzyme shallow rendering (#104)
Summary:
This commit moves our existing frontend tests to use Enzyme’s shallow
rendering API <http://airbnb.io/enzyme/docs/api/shallow.html>. The
benefit over also using `react-test-renderer` is simply consistency (the
two are functionally equivalent); the benefits over `mount` are that
subcomponents cannot contaminate the test state (i.e., you’re only
testing one component at a time), that the resulting snapshots are more
readable because the root props are not shown, and that the
implementation is more efficient. This is a follow-up to #102.

In a case where we actually need a full DOM tree, we should still feel
free to use `mount`, but we haven’t needed that yet.

Test Plan:
Verify that the new `ContributionList.test.js.snap` represents the same
data as the old one.

wchargin-branch: standardize-enzyme-shallow
2018-03-21 18:28:06 -07:00
William Chargin feac85ad2c
Use Enzyme to test ContributionList dynamics (#102)
Summary:
This is our first dynamic test of a React component! Enzyme looks pretty
easy to use to me, for both snapshot tests and interaction simulation.

In doing so, we catch a minor bug in the edge case where a contribution
is not owned by any plugin (`colSpan`, not `colspan`). This edge case
does not appear in the sample data, but it does appear in the test data,
even prior to this commit. The previous renderer, `react-test-renderer`,
appears not to surface this error. Furthermore, this bug did not cause
any user-visible errors except a `console.error`.

Test Plan:
Inspect the snapshot file to make sure that it is reasonable. (The
existing test case has its snapshot regenerated due to formatting
differences between the two renderers.)

To test that the browser error is fixed, render a contribution list on a
GitHub graph but with an empty adapter set. One way to do this is to comment out line 7 of
`standardAdapterSet.js`; alternately, you can use the React Dev Tools to
select the `ContributionList` node, then run
```js
$r.props.adapters.adapters = {};
$r.forceUpdate();
```
Note subsequently that there is no console error and that the `<td>`s in
question span three columns.

wchargin-branch: contributionlist-dynamic-test
2018-03-21 17:35:17 -07:00
William Chargin 5dd5de306c
Allow filtering by type in the contribution list (#101)
Summary:
Filter options are “all contributions” or a specific plugin/type
combination. This includes a snapshot test for the static state. I’ll
add an interaction test in a subsequent commit.

Test Plan:
`yarn start`, fetch graph, play with the filtering options.

wchargin-branch: filter-contributions
2018-03-20 18:50:25 -07:00
Dandelion Mané 39fd3fa354
Make GitHub capitalization consistent within code (#100)
* Make GitHub capitalization consistent within code

We now never capitalize the H in GitHub within variable or function
names. We still capitalize it in comments or user facing strings.

Test plan:
Unit tests, the fetchGithubRepoTest.sh, and
`git grep itHub` only shows comment lines and print statements.

* Fix William's klaxon
2018-03-20 18:32:05 -07:00
Dandelion Mané 41cdf2d855
Implement GitHub reference detection (#98)
This commit only adds logic for finding references in GitHub posts,
either by #-numeric reference, or explicit urls. Adding the reference
edges to the graph will occur in a followon commit.

Test plan: New unit tests are included
2018-03-20 18:10:03 -07:00
William Chargin 61624a5dcf
Extract Aphrodite-management test code to testUtil (#99)
Summary:
Any tests that render Aphrodite-styled React elements will need to do
this, so it’s nice to have the code in one place.

wchargin-branch: aphrodite-testutils
2018-03-20 17:35:26 -07:00
Dandelion Mané 5b420c6294
Add EdgeTypes type in githubPlugin (#97) 2018-03-20 15:35:22 -07:00
William Chargin f02e0610be
Use structured GraphQL query API in GitHub fetcher (#94)
Summary:
In addition to being nicer on the eyes, this enables the query to be
statically analyzed (e.g., by an auto-pagination API) and used by other
modules.

Test Plan:
Manually running
```shell
$ yarn backend
$ GITHUB_TOKEN="<your_token>" src/plugins/github/fetchGitHubRepoTest.sh
```
succeeds.

wchargin-branch: use-structured-graphql-queries
2018-03-20 15:29:31 -07:00
William Chargin ab619432e1
Begin work on contributions and adapters (#93)
Summary:
This commit begins to extend the artifact editor to display
contributions. To display contributions from arbitrary plugins, we need
to communicate with those plugins somehow. We do so via an adapter
interface that plugins implement; included in this commit is an
implementation of this interface for the GitHub plugin (partially: we
punt on rendering).

This includes a snapshot test. The snapshot format is designed to be
human-readable and -auditable so that it can serve as documentation.

Test Plan:
Run the application with `yarn start`. Then, fetch a graph and watch as
its contributions appear in the view.

wchargin-branch: contributions-and-adapters
2018-03-20 14:26:02 -07:00
William Chargin e9ca833448
Expose `NodeTypes` from the GitHub plugin (#96)
Summary:
This is useful for metaprogramming. For instance, suppose we have an
object like this:
```js
const stringifiers = {
  ISSUE: (stringifyIssue: (Node<IssueNodePayload>) => string),
  COMMENT: (stringifyComment: (Node<CommentPayload>) => string),
  ...
}
```
How do we type this? We might try
```js
{[type: NodeType]: (Node<NodePayload>) => string}
```
but this is not correct, because `Node<IssueNodePayload>` is a subtype of
`Node<NodePayload>`, and `(_) => K` is contravariant, not covariant. (In
other words, a function from `Node<IssueNodePayload>` is not as general
as a function from `Node<NodePayload>`.) We need to express a dependency
between the object key and the value. We instead write:
```js
type TypedNodeToStringifier = <T: $Values<NodeTypes>>(
  T
) => (node: Node<$ElementType<T, "payload">>) => string;
(stringifiers: $Exact<$ObjMap<NodeTypes, TypedNodeToStringifier>>);
```
This expresses exactly (heh) the right type.

Test Plan:
Note that removing any of the elements of `NodeTypes` yields a Flow
error, due to the static assertion following the type definition.

wchargin-branch: node-types
2018-03-20 13:27:08 -07:00
Dandelion Mané 559ed393a9
Update example-repo json and snapshots (#95)
I added some new issues to sourcecred/example-repo to test unicode
support and parsing of extremely long issues titles.

This commit merely updates our example-repo json and the corresponding
snapshots.

Test plan:
Run testFetchGithubRepo.sh
Run unit tests
2018-03-20 13:18:38 -07:00
Dandelion Mané 02754d2523
GitHub parser now recognizes pull request reviews (#91)
Also, since there are now two types of things that are being
"contained" (comments and pull request reviews), I factored out an
addContainment method to avoid repeating that code.

To make our handling of PullRequestReviewComments and regular Comments
consistent, I modified our query string so that we now request urls on
PullRequestReviewComments. Also, since I didn't notice until closely
inspecting the snapshot that we had been adding payloads with some
undefined properties, I added a test to verify that every property on
every node and edge payload is defined.

I regenerated the example-repo data to reflect the change to query
string.

Test plan:
Verify that the snapshot changes are appropriate
Run standard tests
Run `yarn backend`
Run `GITHUB_TOKEN={your_token} ./src/plugins/github/fetchGithubRepoTest.sh`
2018-03-20 11:46:46 -07:00
William Chargin 55225fd53e Save graph fetcher credentials in local storage
Test Plan:
Make a request, then refresh, and note that the fields are populated.

Paired with @dandelionmane.

wchargin-branch: graph-fetcher-localstore
2018-03-19 20:06:52 -07:00
William Chargin 5d80e39473 Fetch and parse GitHub graphs on the frontend
Summary:
This is quick and dirty. No error handling yet. We’ll soon save
credentials and repository to local storage.

Paired with @dandelionmane.

Test Plan:
Run `yarn start`, then enter your API key and specify the
`sourcecred/example-repo` repo. Note that the resulting graph is shortly
logged to the console.

wchargin-branch: fetch-parse-github-frontend
2018-03-19 20:06:52 -07:00
William Chargin 0eed384850 Enable creating artifacts in the artifact list
Summary:
Paired with @dandelionmane.

wchargin-branch: create-artifacts
2018-03-19 20:06:52 -07:00
William Chargin 0c3aa9c7ba Create a view-only artifact viewer
Summary:
Paired with @dandelionmane.

wchargin-branch: create-artifact-viewer
2018-03-19 20:06:52 -07:00
William Chargin 5d042c0008 Use isomorphic-fetch instead of node-fetch
Summary:
Paired with @dandelionmane.

Test Plan:
```
$ CI=true yarn test
$ yarn backend
$ GITHUB_TOKEN="<your_token>" src/plugins/github/fetchGitHubRepoTest.sh
```

wchargin-branch: isomorphic-fetch
2018-03-19 20:06:52 -07:00
William Chargin d18cb945af Add style support to the artifacts app
Test Plan:
Note that the header, when rendered, is magenta.

wchargin-branch: stylish-artifacts
2018-03-19 20:06:52 -07:00
William Chargin bbecf00615
Repurpose React app as artifact editor (#89)
Summary:
We’ll now start creating the artifact plugin. A large part of this will
be the user interface, including a GUI. For now, our build system just
builds a single React app, so we’re cannibalizing the main explorer to
serve this purpose.

Paired with @dandelionmane.

Test Plan:
The following still work:
  - `yarn test`
  - `yarn start`
  - `yarn build; (cd build; python -m SimpleHTTPServer)`

wchargin-branch: repurpose-react-app-as-artifact-editor
2018-03-19 15:25:23 -07:00
William Chargin 8f8d9c4564 Strip down explorer app to a barebones React app (#88)
Summary:
We’re not deleting it because it works with the build system and has the
service worker stuff from create-react-app, but we’ll soon repurpose it.

Paired with @dandelionmane.

Test Plan:
The following still work:
  - `yarn test`
  - `yarn start`
  - `yarn build; (cd build; python -m SimpleHTTPServer)`

wchargin-branch: dismantle-explorer
2018-03-19 15:09:11 -07:00
William Chargin ca85fdf234 Reorganize `src/` directory (#87)
Test Plan:
Note that tests still pass, and all changes to snapshot files are
verbatim moves.

wchargin-branch: reorg
2018-03-19 14:31:50 -07:00
Dandelion Mané 30600004e4
Implement the GitHub graph parser (#81)
The GitHub parser transforms GraphQL api data from GitHub into our Graph
data structure. This commit focuses on properly parsing Issues, Pull
Requests, Comments, and Users.

Test Plan:
Run the unit tests. Inspect the snapshot results (particularly those for
individual pull requests or issues, which are easier to parse) and
verify that the output is appropriate.
2018-03-19 12:26:32 -07:00
William Chargin 274007c90d
Configure Webpack for backend applications (#84)
Summary:
Running `yarn backend` will now bundle backend applications. They’ll be
placed into the new `bin/` directory. This enables us to use ES6 modules
with the standard syntax, Flow types, and all the other goodies that
we’ve come to expect. A backend build takes about 2.5s on my laptop.

Created by forking the prod configuration to a backend configuration and
trimming it down appropriately.

To test out the new changes, this commit changes `fetchGitHubRepo` and
its driver to use the ES6 module system and Flow types, both of which
are properly resolved.

Test Plan:
Run `yarn backend`. Then, you can directly run an entry point via
```
$ node bin/fetchAndPrintGitHubRepo.js sourcecred example-repo "${TOKEN}"
```
or invoke the standard test driver via
```shell
$ GITHUB_TOKEN="${TOKEN}" src/backend/fetchGitHubRepoTest.sh
```
where `${TOKEN}` is your GitHub authentication token.

wchargin-branch: webpack-backend
2018-03-18 22:43:23 -07:00
William Chargin 291dcb17c3
Add a GraphQL structured query format (#77)
Summary:
See motivation in #76. Feel free to look at the new snapshot file to
inspect the structured representation and also the stringified output.

This implementation is sufficient to encode our query against the
GitHub v4 API; see the test plan below.

Test Plan:
Unit tests added; run `yarn flow && yarn test`.

This code has full coverage except for lines 260, 315, and 380 of
`queries.js`; these lines check invariants that should never be
violated.

You can also use the following steps to verify that the sample query is
valid GraphQL that produces the same results as our hand-written query:

 1. Apply the following hacky patch:

    ```diff
    diff --git a/src/backend/graphql/queries.test.js b/src/backend/graphql/queries.test.js
    index 52bdec7..c04a636 100644
    --- a/src/backend/graphql/queries.test.js
    +++ b/src/backend/graphql/queries.test.js
    @@ -3,6 +3,18 @@
     import type {Body} from "./queries";
     import {build, stringify, multilineLayout, inlineLayout} from "./queries";

    +function emitGitHubQuery(layout, filename) {
    +  const fs = require("fs");
    +  const path = require("path");
    +  const result = stringify.body(usefulQuery(), layout);
    +  const outputFilepath = path.join(__dirname, "..", filename);
    +  const outputText = `module.exports = ${JSON.stringify(result)};\n`;
    +  fs.writeFileSync(outputFilepath, outputText);
    +  console.log(`Wrote output to ${outputFilepath}.`);
    +}
    +emitGitHubQuery(multilineLayout("  "), "githubQueryMultiline.js");
    +emitGitHubQuery(inlineLayout(), "githubQueryInline.js");
    +
     describe("queries", () => {
       describe("end-to-end-test cases", () => {
         const testCases = {
    ```

 2. Run `CI=true yarn test`, and verify that the following two files
    written to `src/backend/` contain appropriate contents. You can just
    eyeball them, or check that they match my results:
    https://gist.github.com/wchargin/f37b99fd4ec345c9d2541c2dc53ceda9

 3. In `fetchGitHubRepo.js`, change the definition of `const query` to

    ```js
    const query = require("./githubQueryMultiline.js");
    ```

    Run

    ```shell
    GITHUB_TOKEN="<your_token_here>" src/backend/fetchGitHubRepoTest.sh
    ```

    and verify that it exits successfully.

 4. Repeat for `require("./githubQueryInline.js")`.

wchargin-branch: graphql-structured-queries
2018-03-18 22:35:20 -07:00
William Chargin 1083540d21
Parameterize `Graph` over node and edge payloads (#83)
Summary:
Closes #82. This affords clients type-safety without needing to
verbosely annotate every node or edge passed into the graph functions.
It also enables graph algorithms to be more expressive in their types:
for instance, the merge function now clearly indicates from its type
that the first graph’s nodes are passed as the first argument to the
node reducer, and the second graph’s nodes to the second. Clients can
upgrade immediately by using `Graph<*, *>`.

Thankfully, Flow supports variance well enough for this all to be
possible without too much trouble.

Test Plan:
Existing unit tests pass statically and at runtime. I added a test case
to demonstrate that merging works covariantly.

To see some failures, change `string` to an incompatible type, like
`number`, in the definitions of `makeGraph` in test functions for
conservatively rejecting graphs with conflicting nodes/edges
(ll. 446, 462).

wchargin-branch: parameterize-graph
2018-03-17 11:36:53 -07:00
Dandelion Mané 894d6a2291
Allow adding explicitly typed nodes/edges (#80)
Summary:
Flow doesn’t allow us to specify variance annotations in generic
function parameters, and doesn’t allow coercing `Node<T>` to
`Node<mixed>`. This forces us to put `any`s in our code, which…works.

Paired with @dandelionmane.

Test Plan:
New unit tests trivially pass dynamically, and now pass statically
(failing before the changes to `graph.js`).

wchargin-branch: explicitly-typed-nodes-edges
2018-03-17 00:28:33 -07:00
Dandelion Mané 1e791782d5
Allow redundant adds to the Graph (#79)
Graph.addNode and Graph.addEdge now allow adding the same node or edge
multiple times, provided that the duplicate adds are trying to insert
identical content.

This came up while prototyping the GitHub plugin; rather than create
myriad subgraphs and merge them, I found it convenient to construct a
single graph and iteratively add nodes. Since the same node may be
discovered multiple times (most notably user identities), there was a
need for a "conservative add" abstraction that adds a node if it doesn't
exist yet, but errors only if multiple adds conflict.

Since this behavior is generic and highly conservative, it seemed
appropriate to include in the graph class itself.
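
A hedged sketch of the conservative-add behavior (`node` stands for any
valid Node object):

```js
// Hedged sketch of the behavior described above.
const g = new Graph();
g.addNode(node);
g.addNode(node); // fine: identical content, effectively a no-op

const conflicting = {address: node.address, payload: {different: true}};
g.addNode(conflicting); // error: same address, conflicting content
```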

Test Plan:
The unit tests have been updated to include the new behavior.
2018-03-17 00:28:17 -07:00
Dandelion Mané d2501947a6
Regenerate example-repo testdata (#78)
I moved sourcecred/tiny-example-repository to sourcecred/example-repo
as it's simpler to remember. I also unarchived it and added comments
to an issue, so that we can create a simple test for issue parsing.

This commit merely updates SourceCred to point to example-repo with
the regenerated canonical output.
2018-03-16 11:22:17 -07:00
Dandelion Mané 0e57b42095 Fetch GitHub repos using the GitHub v4 API (#75)
Summary:
It’s a whole new world of GraphQL! Our parser is now just a GraphQL
query that asks for exactly what we want and dumps it to a file. The
data exposed by the v4 API is also in a much nicer format than that of
the v3 API, so this is pretty much a universal improvement.

Currently, we do not handle pagination. We require that the repository
in question have fewer than a fixed number of issues, and comments per
issue, and reviews per PR, and review comments per PR, and so on. If
this limit is exceeded, the script will fail-fast with a nice error
message. To fix this, we’ll need to write a general-purpose pagination
API that allows traversing cursors at any level of the query.

Paired with @wchargin.

Test Plan:
Run

    $ GITHUB_TOKEN="your_token_here" src/backend/fetchGitHubRepoTest.sh

and verify that it exits with 0. Note that if you change this script’s
repository from `tiny-example-repository` to `sourcecred`, the script
correctly fails and outputs a useful diff.

wchargin-branch: github-v4-graphql
2018-03-15 14:56:25 -07:00
Dandelion Mané fb00c35823
Factor eventide graph demo data to a new module (#71)
* Factor eventide graph demo data to a new module

It would be helpful to make our standard tiny graph available to other
test and demo instances, outside of just graph.test.js. This way we can
use it as a test case for the Graph Explorer.
2018-03-05 20:58:47 -08:00
Dandelion Mané 7ea8bdd964
Make App.js into skeleton for GraphExplorer (#70)
* Make App.js into skeleton for GraphExplorer

We make a very basic skeleton for the Graph Explorer as a basis
for future development.

This commit also removes the UserExplorer and FileExplorer from
App.js. Since we have changed the underlying data model, we are
unlikely to use the UserExplorer or FileExplorer in anything like
their current state, so they are effectively deprecated. I am deferring
removing them because it is nice to have some examples of working React
code to copy from, before the Graph Explorer is ready.

Test plan: run `yarn start`, and observe that the App displays the
words "Graph Explorer" underneath the "SourceCred Explorer" title bar.
2018-03-05 20:09:24 -08:00
William Chargin 5960eab6c1
Make `Graph` serializable (#69)
Summary:
This commit adds `toJSON()` and `static fromJSON()` on `Graph`. The main
benefit at this time is that this gets us free interoperability with
Jest’s snapshot testing.

The implementation of `fromJSON` is not performance-tuned, and could
probably be significantly optimized.

See #65 for discussion.
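
A hedged round-trip sketch (`graph` stands for an existing Graph
instance):

```js
// Hedged sketch of the round trip.
const json = graph.toJSON();          // plain data; plays well with Jest snapshots
const restored = Graph.fromJSON(json);
// `restored` should be logically equal to `graph` (per the equality
// function from #61), even though it is a fresh instance.
```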

Test Plan:
New unit tests added: `yarn flow && yarn test`.

wchargin-branch: make-graph-serializable
2018-03-05 16:22:58 -08:00
William Chargin cee90fd10f
Use `AddressMap` in `Graph` (#68)
Summary:
This commit simplifies the implementation of `Graph` without changing
its interface. We now use the `AddressMap` for all four instance fields
of `Graph`.

Test Plan:
All existing tests pass, and coverage is maintained.

wchargin-branch: use-address-map-in-graph
2018-03-05 16:20:52 -08:00
William Chargin a8da44c94b
Create an `AddressMap` abstraction (#67)
Summary:
This commit reifies the concept of an `Addressable`, which is any object
that has a covariant `address: Address` attribute, and implements a
simple data structure for storing addressable items keyed against their
addresses. Instances of `AddressMap` can replace the four fields of
`Graph`:
```js
_nodes: AddressMap<Node<mixed>>;
_edges: AddressMap<Edge<mixed>>;
_outEdges: AddressMap<{|+address: Address, +edges: Address[]|}>;
_inEdges: AddressMap<{|+address: Address, +edges: Address[]|}>;
```
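
For illustration, a minimal sketch of the concept (method names and the
address shape are assumptions, not the actual implementation):

```js
// Hedged sketch of the AddressMap idea; not the real implementation.
type Address = {+repositoryName: string, +pluginName: string, +id: string};
type Addressable = {+address: Address};

class AddressMapSketch<T: Addressable> {
  _data: {[serializedAddress: string]: T};
  constructor() {
    this._data = {};
  }
  add(item: T): this {
    // Items are keyed by a serialization of their address; the real
    // implementation would use a stable stringification.
    this._data[JSON.stringify(item.address)] = item;
    return this;
  }
  get(address: Address): T | void {
    return this._data[JSON.stringify(address)];
  }
}
```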

Test Plan:
New unit tests included, with 100% coverage: `yarn flow && yarn test`.

wchargin-branch: address-map
2018-03-05 16:18:20 -08:00
William Chargin a798f9bac2
Rewrite GitHub plugin payload type system (#64)
Summary:
We’re stripping down the payload types for the GitHub plugin, to only
include what we expect to use immediately. In doing so, we take the
opportunity to make the typing a little stronger, so that we can ensure
that the `type` field of a specific type of payload is set to a
particular constant.

Paired with @dandelionmane.

Test Plan:
Adding these lines to `githubPlugin.js` and running `yarn flow`
indicates that the typechecking is working as expected:
```js
("ISSUE" : NodeType);       // works
("WEIRD" : NodeType);       // fails
("AUTHORSHIP" : EdgeType);  // works
("UNEXPECTED" : EdgeType);  // fails
```

wchargin-branch: github-plugin-payload-types
2018-03-03 15:17:11 -08:00
William Chargin 50c575b2f9
Explicitly test address↔string function error case (#63)
Summary:
`graph.js` coverage is now 100% :-)

Test Plan:
`yarn jest --env=jsdom --coverage` shows no uncovered lines for
`graph.js`, and no failing tests.

wchargin-branch: coverage-gremlin
2018-03-03 15:10:57 -08:00
William Chargin 9b203e8489
Add graph merge functions (#62)
Summary:
Merging graphs will be a common operation. At a per-plugin level, it
will often be useful to build up graphs by creating many very small
graphs and then merging them together. At a cross-project level, we will
need to merge graphs across repositories to gain an understanding of how
value flows among these repositories. It’s important that the core graph
type provide useful functions for merging; this commit adds them.
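
A hedged sketch of a call site; the resolver-based signature shown here
is an assumption (later commits in this history mention `Graph.merge`
passing the first graph's nodes as the first reducer argument and the
second graph's as the second):

```js
// Hedged sketch: the exact merge signature is assumed.
const merged = Graph.merge(
  graphA,
  graphB,
  (nodeFromA, nodeFromB) => nodeFromA, // resolve conflicting nodes
  (edgeFromA, edgeFromB) => edgeFromA  // resolve conflicting edges
);
```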

Test Plan:
New unit tests added; run `yarn flow && yarn test`.

wchargin-branch: graph-merge
2018-03-02 21:35:51 -08:00
William Chargin 82dbf64a2c
Add an equality function for `Graph` (#61)
Summary:
We need this for testing graph equality: deep-equality is not sufficient
because two graphs can be logically equal even if, say, two nodes are
added in different orders.

This commit adds a dependency on `lodash.isequal` for deep equality.
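
A hedged sketch (the method name `equals` is an assumption; `node1` and
`node2` stand for valid Node objects):

```js
// Hedged sketch of logical equality.
const a = new Graph();
a.addNode(node1);
a.addNode(node2);

const b = new Graph();
b.addNode(node2); // same nodes, different insertion order
b.addNode(node1);

a.equals(b); // expected: true (logical equality, not identity of internals)
```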

Test Plan:
New unit tests added. Run `yarn flow && yarn test`.

wchargin-branch: graph-equals
2018-03-02 21:13:30 -08:00
William Chargin 5a2380d486
Move `addNode`/`addEdge` tests away from getters (#60)
Summary:
Nothing big; these were just organized wrong.

wchargin-branch: test-reorg
2018-03-02 20:46:35 -08:00
William Chargin 58410c62fa
Replace lingering `mealGraph`s in test case (#57)
Summary:
In merging #54, there was a semantic merge conflict that was not also a
textual merge conflict; this created a failure that only appeared once
that commit was merged.

We propose that to fix this in the future, we only merge commits that
are directly ahead of master.

Test Plan:
This fixes `yarn flow` and `yarn test`.

wchargin-branch: fix-merge-conflict
2018-03-02 14:16:51 -08:00
William Chargin 97446138ab
Make `Address`, `Node`, `Edge` read-only and exact (#56)
Summary:
Again: we assume these invariants, so we may as well encode them.
We should just keep in mind that non-Flow users may wantonly violate
these, so we should still code defensively.

wchargin-branch: readonly-exact
2018-03-02 13:49:34 -08:00
William Chargin f305a48391
Check for `null`/`undefined` in graph functions (#55)
Summary:
These will produce nicer error messages in cases where static analysis
doesn’t detect the pollution: e.g., a user isn’t using Flow, or an
expression like `arr[0]` introduces an `undefined`.

Paired with @dandelionmane.

Test Plan:
New unit tests added. Run `yarn test`.

wchargin-branch: null-undefined-check
2018-03-02 13:47:13 -08:00
Dandelion Mané ca3502009b
Create an 'advancedMealGraph' test case (#54)
Create an 'advancedMealGraph' test case

The advancedMealGraph will be a catch-all that holds all advanced and
edge-case behaviors, e.g. the crab self-referential loop, and the case
where there are multiple directed edges between the same two nodes.

Aggregating them into one test case will make it easier to test more
complex behaviors, like graph merging and serialization, on the
edge case graphs. However, it's still nice to have the simple graph
so that we can test simple things too. The specific tests for edge
case behavior are left mostly unchanged, in that they start from the
simple graph and add just the advanced feature that they want to test.
2018-03-02 13:45:52 -08:00
William Chargin cae3a92dc9
Add `getAllNodes` and `getAllEdges` functions (#53)
Summary:
Without these functions, it is not possible to meaningfully operate on
an arbitrary graph.

Paired with @dandelionmane.

Test Plan:
New unit tests included. Run `yarn flow && yarn test`.

wchargin-branch: get-all
2018-03-02 11:33:45 -08:00
William Chargin 01510ca63f
Make node and edge types exact (#51)
Summary:
We’ve realized that `u: Edge<T>` implies `u: Node<T>`. That certainly
wasn’t what we were expecting! We might want something like that
eventually, to capture the fact that valuations are themselves valuable,
but for now the type system should encode the assumptions that we’re
actually making. See also #50.

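For illustration, a Flow sketch of the width-subtyping surprise and how exact object types rule it out (the type and field names here are made up):

```js
// @flow
// With inexact object types, an object with extra fields still satisfies
// the smaller type, so an "edge" typechecks as a "node" -- surprising!
type LooseNode<T> = {address: string, payload: T};
type LooseEdge<T> = {address: string, payload: T, src: string, dst: string};
const e: LooseEdge<number> = {address: "e", payload: 1, src: "a", dst: "b"};
const n: LooseNode<number> = e; // accepted by Flow

// Exact object types reject the same assignment:
type ExactNode<T> = {|address: string, payload: T|};
type ExactEdge<T> = {|address: string, payload: T, src: string, dst: string|};
const e2: ExactEdge<number> = {address: "e", payload: 1, src: "a", dst: "b"};
// const n2: ExactNode<number> = e2; // Flow error with exact types
```
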
Paired with @dandelionmane.

wchargin-branch: exact-types
2018-03-02 11:31:38 -08:00
William Chargin 09156bf3f4
Promote `Graph` to a class with useful methods (#49)
Summary:
We had planned to expose our core types as simple Plain Old JavaScript
Objects, with accompanying standalone functions to act directly on these
data structures. We chose this instead of creating `class`es for the
types because it simplifies serialization interop: it obviates the need
for serialization and deserialization functions, because the code is
separated from the data entirely. Reconsidering, we now think that the
convenience benefits of using classes probably outweigh these
serialization cons. Furthermore, this design enables us to separate
ancillary data structures and caches from the raw data, presenting a
cleaner API for consumers of the data.

This commit introduces a `Graph` class and some related logic. With lots
of tests! And 100% code coverage! :-)

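A rough sketch of the class-based shape and the explicit (de)serialization it entails; the method names, internal storage, and JSON layout are assumptions for this sketch, not the real API:

```js
// Sketch only: method names and JSON format are illustrative assumptions.
class Graph {
  constructor() {
    // Ancillary structures (indexes, caches) can live here without
    // leaking into the serialized form.
    this._nodes = new Map(); // stringified address -> node
    this._edges = new Map(); // stringified address -> edge
  }
  addNode(node) {
    this._nodes.set(JSON.stringify(node.address), node);
    return this;
  }
  addEdge(edge) {
    this._edges.set(JSON.stringify(edge.address), edge);
    return this;
  }
  // Explicit (de)serialization recovers the interop that plain objects
  // gave us for free.
  toJSON() {
    return {
      nodes: Array.from(this._nodes.values()),
      edges: Array.from(this._edges.values()),
    };
  }
  static fromJSON(json) {
    const graph = new Graph();
    json.nodes.forEach((n) => graph.addNode(n));
    json.edges.forEach((e) => graph.addEdge(e));
    return graph;
  }
}
```
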
Paired with @dandelionmane.

Test Plan:
Run `yarn flow && yarn test` to see the new tests.

wchargin-branch: graph-class
2018-03-01 01:04:11 -08:00
William Chargin f5d486087d
Pull in-edges and out-edges up to top-level graph (#48)
Summary:
The main problem with having these fields on the node is that this
presents the illusion that the API surface area is larger than it
actually is. Clients with a reference to a node object could
somewhat reasonably expect that mutating these fields would be
sufficient to update the structure of the graph, but this isn’t the case
(as the edge objects would need to be updated, too). It’s a nice
semantic bonus, too, as edges aren’t conceptually “part of” nodes.

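A sketch of computing adjacency at the graph level rather than storing it on node objects; the function names and the stringified-address comparison are assumptions made to keep the example short:

```js
// Sketch only: derives in/out edges from the edge list on demand, so node
// objects stay plain data with no redundant adjacency fields.
function getInEdges(graph, address) {
  const key = JSON.stringify(address);
  return graph.getAllEdges().filter((e) => JSON.stringify(e.dst) === key);
}
function getOutEdges(graph, address) {
  const key = JSON.stringify(address);
  return graph.getAllEdges().filter((e) => JSON.stringify(e.src) === key);
}
```
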
wchargin-branch: top-level-edges
2018-03-01 00:48:23 -08:00
William Chargin 66243a16c1
Remove weights from the weighted graph (#47)
Summary:
This is an experiment. There are a couple of different meanings of
“weight” in play: most prominently, weights assigned by plugins versus
those suitable for comparison among other arbitrary weights. We’re not
sure what the right thing is to put in the actual graph object, so we’re
going to think about this a bit more before adding the field back in.

wchargin-branch: remove-weights
2018-03-01 00:45:52 -08:00
William Chargin 43450f18b1
Rename `sourceId`/`destId` to `src`/`dst` (#46)
Summary:
The “ID” parts were left over from the Great Address Migration, and we
think that abbreviations are fine here, anyway.

Test Plan:
`yarn flow && yarn test`

wchargin-branch: src-dst-rename
2018-03-01 00:32:30 -08:00
William Chargin 01df727c39
Add tiny-example-repository example data (#44)
Summary:
The sourcecred/tiny-example-repository repository stores some example
data that we can use to generate test cases. As of now, the repository
has been archived so that its state is stable. This commit checks in the
result of our scraper on the repository.

wchargin-branch: example-data
2018-02-28 20:36:18 -08:00
Dandelion Mané 9dc9d5e4f3 Change order of repositoryName and pluginName 2018-02-28 17:47:15 -08:00
Dandelion Mané 58ad1eb635 Add inEdges and outEdges for Nodes. 2018-02-28 17:47:15 -08:00
Dandelion Mané 2992a31157 Graph concept renames
- ID -> Address
- ID.name -> Address.id
- GraphEdge -> Edge
- GraphNode -> Node
2018-02-28 17:47:15 -08:00
William Chargin 791cad9059
Add conversion functions for id (#38)
Test Plan:
Run `yarn test` and note that tests pass.

wchargin-branch: id-conversion
2018-02-26 22:57:00 -08:00
Dandelion Mané bc2377448f
Move package json to root (#37)
Reorganize the code so that we have a single `package.json` file at the root.
All source code now lives under `src`, separated into `src/backend` and `src/explorer`.

Test plan:

- run `yarn start` - it works
- run `yarn test` - it finds the tests (all in src/explorer) and they pass
- run `yarn flow` - it works (also tested with an intentional error; Flow catches it as expected)
- run `yarn prettify` - it finds all the js files and writes to them
2018-02-26 22:32:23 -08:00