This commit modifies `PagerankGraph.runPagerank` so that the user can
provide an alpha and seed vector. The seed vector is specified via a map
of weights, which will be normalized into a probability distribution
over all the nodes in the graph. In the event that the map is empty (or
the total weight is otherwise 0), a uniform distribution is created.
To effect this change, a helper function called `weightedDistribution`
has been added (and thoroughly tested) in the `graphToMarkovChain`
module. Then, that function is used in `pagerankGraph.runPagerank`
(along with light testing).
Currently, the default alpha is set to 0, to ensure consistency with the
legacy pagerank implementation in `analysis/pagerank`. Once that has
been replaced with `PagerankGraph`, we can consider changing the defualt
alpha to non-zero (thus removing the need for synthetic self-loops).
I took a different approach in the [odyssey-hackathon repo][commit].
The previous approach was a much more complicated (and fairly redundant)
API, that allowed specifying "NO_SEED", "UNIFORM_SEED", "SELECTED_SEED",
and "SPECIFIED_SEED". I'm much happier with this API and implementation.
[commit]: ed07861073
Test plan: Unit tests included; run `yarn test`.
Summary:
the cred calculation is defined by a Markov Mixing process. By
introducing the seed vector and teleportation parameter alpha, the
Markov mixing process is augmented with a source of cred originating
from the seed vector. The resulting algorithm is the generalized
variation of PageRank, allowing computation of both canonical PageRank
where the seed vector is the uniform distribution and personalized
PageRank where the seed vector is an indicator distribution. It is still
possible to get the simple markov chain solution by setting alpha = 0.
Note that this changes the Markov process state update, but does not
provide updates to the APIs. All existing behavior is unchanged because
alpha is always set to 0.
This is a port of
https://github.com/sourcecred/odyssey-hackathon/pull/3,
which was created during the Odyssey Hackathon.
Test Plan:
Existing tests have been extended to include passing alpha = 0 to
reproduce exisiting test cases for the simple Markov Process. Additional
test cases include
- Verifying that resulting stationary distribution is unaffected by seed when alpha = 0
- Verifying that resulting stationary distribution is precisely equal to seed when alpha = 1
- Verifying that the resulting stationary distribution is linear in the seed vector
- Verifying that the correct stationary distribution is computed for non-zero alpha
- verify that the algorithm converges immediately when the initialDistribution is the stationary distribution
- verify that the changing the initialDistribution does not change the stationary distribution
Paired with @mzargham
Right now PagerankGraph requires that the user choose specific values
for maxIterations and convergenceThreshold when running PageRank.
I also rename `PagerankConvergenceOptions` to `PagerankOptions`.
The motivation is that I want to add future arguments to the same
options dict (e.g. alpha and the seed vector), so the rename is
appropriate, and allowing the options to be unset (and thus inherit
default values) will make the whole API much cleaner as I add more
options.
Test plan: Unit test added. `yarn test` passes.
In [#1128: Add support for seed vectors][#1128], we significantly
increase the number of arguments to
markovChain.findStationaryDistribution. To clean up the invocations, I
added a followon PR (#1129) which converts findStationaryDistribution to
use a `PagerankParams` object instead.
However, I think it will be cleaner to land the PagerankParams refactor
before adding new features in #1128, so I'm making this PR as
pre-cleanup.
Test plan: This is a trivial refactor. `yarn test` passes.
[#1128]: https://github.com/sourcecred/sourcecred/pull/1128
This is a minor refactor so that we use Aphrodite for styling on
HomePage.js. It's not super consequential, but I want to switch to using
Aphrodite more consistently in the codebase, so why not start here.
Test plan:
`yarn test` reveals no errors.
`yarn start` launches a correctly styled frontend.
I also used `build_static_site.sh` and the resultant site is also
correctly styled.
This commit updates the `sourcecred load` command so that it also
automatically runs PageRank on completion.
The implementation is slightly hacky, in that it prints two sets of
task status headers/footers to console, for reasons described in a
comment in the source code. The user-visible effect of this hack can
be seen below:
```
❯ node bin/sourcecred.js load sourcecred/example-github
Starting tasks
GO load-git
GO load-github
DONE load-github
DONE load-git
Overview
Final result: SUCCESS
Starting tasks
GO run-pagerank
DONE run-pagerank
Overview
Final result: SUCCESS
```
It would be good to clean this up, but for now I think it's acceptable.
Note that it is not safe to assume that a PagerankGraph always exists
for repos that are included in the RepoIdRegistry. The repo gets added
to the registry before the pagerank task runs. Consumers that are
loading the `PagerankGraph` can just check that the file exists, though.
Test plan: I've added unit tests that verify that the right tasks are
generated. Most importantly, the snapshot of the results of `sourcecred
load` now include a snapshotted pagerank graph.
(The snapshot was updated via `UPDATE_SNAPSHOT=1 yarn test --full`.)
Further progress on #967.
Across SourceCred usage, we depend on the `SOURCECRED_GITHUB_TOKEN`
environment variable being set. Confusingly, some tests expect
`GITHUB_TOKEN` instead of `SOURCECRED_GITHUB_TOKEN`.
This commit resolves that inconsistency, by having all tests that read
from the environment use `SOURCECRED_GITHUB_TOKEN`. This was already
available as a secret in our CI configuration, so no change is needed
there. (After this merges, we may remove the GITHUB_TOKEN variable from
the environment.)
Test plan: `yarn test --full` passes without the environment variable
set. Also, the following grep produces only innocuous hits.
```
git grep -P "(?<\!SOURCECRED_)GITHUB_TOKEN"
```
This commit refactors the `sourcecred load` CLI command so that it uses
dependency injection, much like the testing setup #1110. This makes it
feasible to test "surface logic" of how the CLI parses flags and
transforms them into data separately from the "piping logic" of invoking
the right API calls using that data.
This is motivated by the fact that I have other pulls on the way that
modify the `load` command (e.g. #1115) and testing them within the
current framework is onerous.
Test plan:
This is a pure refactoring commit, which substantially re-writes the
unit tests. The new unit tests pass (`yarn test --full` is happy).
Note that `yarn test -full` also includes a sharness test that does an
E2E usage of `sourcecred load`
(see sharness/test_load_example_github.t), so we may be confident that
the command still works as intended.
This commit fixes three broken links (two in the README, one in the prototype app) that were still pointing to https://discuss.sourcecred.io/.
Test plan:
Verify that there are no other bad links to the old Discourse location, by running `git grep "discuss.sourcecred.io"`.
This commit adds a new CLI command, `pagerank`, which runs PageRank on a
given repository. At present, the command only ever uses the default
weights, although I plan to make this configurable in the future. The
command then saves the resultant pagerank graph in the SourceCred
directory.
On its own, this command is not yet very compelling, as it doesn't
present any easily-consumed information (e.g. users' scores). However,
it is the first step for building other commands which do just that. My
intention is to make running this command the last step of `sourcecred
load`, so that future commands may assume the existence of pagerank
scores for any loaded repository.
Test plan: The new command is thoroughly tested; see
`cli/pagerank.test.js`. It also has nearly perfect code coverage (one
line missing, the dependency-injected real function for loading graphs).
Additionally, the following sequence of commands works:
```
$ yarn backend
$ node bin/sourcecred.js load sourcecred/pm
$ node bin/sourcecred.js pagerank sourcecred/pm
$ cat $SOURCECRED_DIRECTORY/data/sourcecred/pm/pagerankGraph.json
```
Material progress on #967.
This commit adds a module, `fetchGithubOrg`, which loads data on GitHub
organizations, most notably including the list of repositories in that
org.
The structure of this commit is heavily influenced by review feedback
from @wchargin's [review] of a related PR.
Test plan: This logic depends on actually hitting GitHub's API, so the
tests are modeled off the related `fetchGithubRepo` module. There is a
new shell test, `src/plugins/github/fetchGithubOrgTest.sh`, which
verifies that that the org loading logic works via a snapshot.
To verify the correctness of this commit, I've performed the following
checks:
- `yarn test --full` passes
- inspection of `src/plugins/github/example/example-organization.json`
confirms that the list of repositories matches the repos for the
"sourcecred-test-organization" organization
- manually breaking the snapshot (by removing a repo from the snapshot)
causes `yarn test --full` to fail
- running `src/plugins/github/fetchGithubOrgTest.sh -u` restores the
snapshot, afterwhich `yarn test --full` passes again.
[review]: https://github.com/sourcecred/sourcecred/pull/1089#pullrequestreview-204577637
This pulls the logic for loading a SourceCred graph from disk out
`cli/exportGraph` and into `analysis/loadGraph`. The rationale is that
`exportGraph` is not the only command that wants the ability to load a
graph from the analysis adapters.
The new command has a clean return signature that reports whether the
load was successful, or failed because the graph wasn't loaded, or
failed due to an error in plugin code.
Testing of the loading logic has been moved to `loadGraph.test`, and the
CLI has been refactored so that the loadGraph method is dependency
injected. This allows for (IMO) cleaner testing of the CLI method.
There is one (deliberate) change in behavior, which is that the command no
longer throws an error if no plugins are included; instead it will just
export an empty graph. I don't have a strong preference between the two
behaviors; changing it was just more convenient.
Test plan: New unit tests have been added, and tests of the cli command
have been re-written. As a sanity check, I've verified that the
following sequence still works:
```
$ yarn backend
$ node bin/sourcecred.js load sourcecred/pm
$ node bin/sourcecred.js export-graph sourcecred/pm
```
Nearly perfect code coverage is maintained. One line is uncovered, and
it's the line that injects in the actual graph loading behavior.
This commit adds a `neighbors` method to `PagerankGraph`. This is an
augmented version of `Graph.neighbors`. It returns the base data from
`Graph.neighbors` as well as the score, the edge weights, and the score
contribution. The score contribution basically means how much score was
contributed from the target node by this particular neighbor connection.
When the graph is well-converged, a node's score will be the sum of all
its neighbors' score contributions, as well as the contribution it
received from its synthetic loop edge. So, for completeness sake, I
added another method, `syntheticLoopScoreContribution`, which computes
how much score a node received from its synthetic loop edge. (This value
should usually be close to 0).
You can think of these two methods as providing a replacement for the
`PagerankNodeDecomposition` logic.
Test plan: I've added tests that verify:
- That neighbors returns results consistent with Graph.neighbors
- That neighbors' score contributions are computed correctly
- That neighbors errors if the graph has been modified
- That synthetic score contributions are computed correctly
- That a node's score is the sum of all its contributions
Test plan: Unit tests included. Run `yarn test`.
This commit moves the default Pagerank options out of
`analysis/pagerank` and to `core/pagerankGraph`. This reflects the
gradual migration of core pagerank logic into `pagerankGraph`.
Test plan: `yarn test` should suffice. It's a trivial change.
* Show tooltips in weightConfig UI
* Updated to pass checks from prettier
* Updates unit tests to check WeightSlider descriptions
* Update CHANGELOG.md to reflect PR #1081
* Add CLI command: `sourcecred export-graph`
This adds an `export-graph` command to the SourceCred CLI. It exports
the combined cred graphs for individual RepoIds, as was done for
[sourcecred/research#4].
Example usage:
```
$ node bin/sourcecred.js load sourcecred/mission
$ node bin/sourcecred.js export-graph sourcecred/mission >
/tmp/mission_graph.json
```
Test plan:
The new command is thoroughly unit tested.
[sourcecred/research#4]: https://github.com/sourcecred/research/pull/4
* Address review feedback by @wchargin
This commit makes several improvements to `repoIdRegistry`:
- Create `writeRegistry` and `getRegistry` methods to abstract over
needing to find the right file, populate an empty registry if its not
available, etc.
- Create `getEntry` for efficiently checking whether a RepoId is in the
registry
- Rename `addRepoId` to `addEntry` for consistency
- Add docstrings
The `load` command has been refactored to use the new methods.
Test plan: Unit tests added, and they pass. The `load` command is
already thoroughly tested, so regressions are very unlikely.
Part of ongoing work for #1020.
Test plan:
Added tests that mirror the edge filtering tests in `graph.test`
to check that `graph` and `pagerankGraph` return the same edges
with the given `EdgesOptions` parameter. Also added a sanity check
that a `weight` prop is returned from the iterator along with the edge.
Given the dependence on a helper function to test the edge
iterator's equality between graphs, I would suggest reviewers give
particular attention to that function:
`expectConsistentEdges()`
This commit adds a `totalOutWeight` method to `PagerankGraph`.
For any given node, `totalOutWeight` reports the total weight traveling
away from the node on edges (including the synthetic loop edge). Using
totalOutWeight makes it possible to normalize the weights to get the
actual markov transition probabilities.
Test plan: Unit tests verify the following properties:
- An error is thrown if the requested node does not exist.
- An error is thrown if the graph has been modified.
- The out weights are computed correctly in the standard case.
- The out weights are computed correctly in the case where there are no
weights (except the synthetic loop weight)
- The out weights are still computed correctly after
JSON-deserialization.
Inspired by a [suggestion] @decentralion made to improve #1105
This will enable `pagerankGraph` to throw an error when it is
called with invalid option parameters. Previously, to elicit
this error we had to access the iterator through `Array.from()`
or similar.
Test plan:
Yarn test passes.
Specifically, I removed the `Array.from()` wrapper around `pagerankGraph`
in the test that checks to see that `pagerankGraph` throws an error when
`nodes()` is passed invalid options.
[suggestion]: https://github.com/sourcecred/sourcecred/pull/1105#pullrequestreview-206496537
The motivation for this change is to make it easier
to access the selected Node's `name` prop for #576,
in which we plan to show a Card displaying summary
stats for the selected node. With only the `topLevelFilter`
available, it's trickier than it needs to be to find out
a node type's `name`.
Test Plan:
* Yarn test passes.
* Visual/Manual inspection of table doesn't surface any issues.
* Updated `it("filter defaults to defaultNodeFilter if available")`
to `it("selectedNodeType defaults to defaultNodeType if available")`.
* Verified that the above new test is failable in several ways by
mangling the tests to test for the wrong node type and mangling the
code to set the wrong node type.
* Since we factored out 'topLevelFilter' and 'defaultNodeFilter',
running `git grep -i topLevelFilter` and `git grep -i defaultNodeFilter`
turns up empty, just to make sure those terms aren't hanging
around to confuse anybody in the future.
* I don't think changing the `prop` parameter warrants any
additional tests, as the current tests verify that the prop
is passed in correctly.
This was at @decentralion's suggestion, following the Contributing
Guideline's Kent Beck quote of making the easy change to make the
change we were originally after (#576) easier. 🙌
* Highlight tableRows on :hover and :focus-within
Resolves#1041
The purpose of this commit is to make the pagerankTable easier
to read, as it's currently difficult to distinguish which score is
associated with which row because of the tight spacing of the
rows and the space between the score column and the row detail column.
@wchargin provided the implementation using `linearGradient()`
and `backgroundImage`s.
A side effect of highlighting the row on `focus-within` is that the rows
will become highlighted when the expand button is clicked, which we
decided was acceptable.
Test plan:
Yarn test passes.
To test the new highlight behavior, visual/manual inspection
passes.
Also added the Aphrodite className to the snapshot
tests. The combination of testing the className + inline style props
should make these tests sensitive to UI changes in the future.
Screenshots:
<img width="939" alt="screenshot 2019-02-17 15 46 34" src="https://user-images.githubusercontent.com/26695477/52918955-332f5280-32cb-11e9-87d3-887c8877116a.png">
<img width="931" alt="screenshot 2019-02-17 15 45 10" src="https://user-images.githubusercontent.com/26695477/52918953-2f9bcb80-32cb-11e9-9356-82c6dccab4ae.png">
* bump CI
Suggested by @decentralion in his review of #1090
Test plan:
yarn test passes. Also verified that the new test case is
failable, if you pass in the wrong array of nodes to `expect()` or
if you mangle the node filter code.
Continuing work on #1020.
Adding an optional parameter to `nodes()` which enables optional
node prefix filtering.
Test plan:
@decentralion suggested on Discord that the tests should verify:
1) the parameter was passed to `_graph` correctly
2) the augmentation logic was applied correctly
The tests I added are identical to the tests in `graph.test`, except
that they verify that the result of `pagerankGraph` matches that of
`graph`. On one hand, this creates a dependence on `graph`,
as these tests don't verify that the filter works correctly, only that
graph has applied the filter and returned the iterator.
However, my prevailing thought is that it isn't `pagerankGraph's` responsibility
to test the behavior of `graph`, and so testing the exact filter results
of `pagerankGraph` like we do in `graph.test` isn't the best strategy, and
testing that `pagerankGraph`'s results equal `graph`'s results is a better strategy.
The tests also check that a `score` is provided alongside each `node` in the iterator,
to minimally satisfy @decentralion's second spec.
yarn test passes.
* PagerankGraph: Add toJSON/fromJSON
This commit adds serialization logic to `PagerankGraph`. As with many
things in PagerankGraph, it's based on the corresponding logic in `Graph`.
Much like graph, it stores data associated with nodes and edges (in this
case, the scores and edge weights) in an ordered array rather than a
map, so as to avoid repetitiously serializing the node and edge
addresses.
Test plan: Unit tests added, and they should be sufficient. Also take a
look at the included snapshot.
Part of ongoing work for #1020.
Adds an equals method for the PagerankGraph. This is really quite
straightforward, the logic is based on the matching logic for
`Graph.equals`.
Tests added.
Test plan: The added tests are comprehensive, and they pass.
As discussed in #1004, we want to be able to package metadata with a
graph's nodes and edges. We can do this much more compactly if we store
the metadata as an array, ordered by the corresponding node/edge
address, rather than storing a map. The disadvantage is that clients
then need to manually sort the graph addresses to deserialize.
This commit adds public methods that allow a client to efficiently
retrieve the sorted addresses from the GraphJSON (where they are already
serialized). This behavior is tested. Note that we appropriately don't
allow clients to peek and directly depend on the exact representation of
GraphJSON, we just promise that sorted address retrieval is possible.
Test plan: Unit tests added (and I verified that breaking the sorting
will fail the test).
* Start work on the PagerankGraph
This commit begins work on the `PagerankGraph` class, as described in
[#1020]. As of this commit, the `PagerankGraph` has basic functionality
like retrieving nodes and edges, and running PageRank. However, it is
missing utility functionality like equality testing and serialization,
and doesn't yet have score decomposition logic.
This was mostly produced during a [live coding session]. Thanks to
@BrianLitwin, @anthrocypher, and @wchargin for participating.
Test plan:
The new code is thoroughly unit tested. Please review the test coverage,
and also the quality of the documentation.
[#1020]: https://github.com/sourcecred/sourcecred/issues/1020
[live coding session]: https://github.com/sourcecred/mission/issues/14
* Improvements from self-review
- Don't allow PRG around empty graph, as there's no way to make it
a valid probability distribution
* Add issue ref in TODOs
This commit modifies `markovChain.findStationaryDistribution` so that
in addition to returning the final distribution, it also reports the
final convergence delta.
This is motivated by the proposed API for the new PagerankGraph (see
[#1020]). Also, I think it makes a nice addition to the test code.
Note that this slightly changes the output from `findStationaryDistribution`,
because we now return the first distribution that is sufficiently converged,
rather than that distribution with one additional Markov action.
Test plan:
Unit tests are updated, and `yarn test` passes.
[#1020]: https://github.com/sourcecred/sourcecred/issues/1020
Thanks to @BrianLitwin for semi-pair-programming it
Thanks to @wchargin for extensive review feedback.
Pull #1080 added in a description field for edge types, but put in a
placeholder message for each actual description. This pull adds in
descriptions for each edge type.
Test plan: `yarn test` passes, and additionally
`git grep 'TODO: Add a description for this edge type'` returns no hits.
Reviewed by @BrianLitwin and @wchargin.
* Enable loading private git repositories
This commit enables loading private repositories, assuming that the user
has ssh-agent configured with keys to allow cloning the private
repository, and has provided a GitHub API token with permissions for the
repository in question.
I have not added automated testing. I don't think a cost-benefit
analysis favors adding such tests at this time:
- This code changes very infrequently, and so is unlikely to break
- If it does break, it will be pretty easy to catch and to fix
- the @sourcecred org is on a free plan, which doesn't allow private
repos, so setting up the test case is a bit of a pain
Test plan: `yarn test --full` passes, so I haven't broken existing Git
clone behavior. Locally, I am able to load private repositories.
* Remove unnecessary process import.
Fixes#1019.
Test plan: Loading the prototype works, as does clicking through to different prototype pages.
“Running `git grep -F '/prototypes/'` returns no results; before this commit, it yielded 2 results.”
This commit #811, allowing users to set the weights of node/edge types to 0.
The WeightSlider now sets the weight to 0 when its dragged to its minimum value.
The logic for converting between weights and sliders has also been made more robust,
and is more thoroughly tested.
In cases where we wanted to set the weight to 0 (e.g. backwards Reaction edges),
the default weight has been changed.
Test plan:
Loading the UI, check that the sliders still work as expected (dragging them changes the displayed weight, dragging to the far left sets weight to 0). Check that the weights are consumed as expected (setting weight for issues to 0 leads to no cred for issues). Check that the weights for backwards reaction edges now have 0 weight. `git grep "TODO(#811)"` returns no hits.
PR #1075 added a new EdgeType, and #1080 added a new field to EdgeTypes.
Both PRs merged and this broke the build.
This very trivial commit fixes the build breakage in a noncontroversial
way (copies the placeholder edge description used for every other edge
over).
Test plan: `yarn test` passes.
Updating github example data with support
for 🚀 and 👀 reaction types.
This follows #1068 and @decentralion updating
the archived repo with the new reaction types.
`src/plugins/github/fetchGithubRepoTest.sh -u`
(as @decentralion suggested) updated `example-github.json`
`yarn unit` caught two tests with failing snapshot
tests (`createGraph.test` and `relationalView.test`), so
I updated those with `yarn unit -u`
`yarn test -full` caught a failing snapshot test
at `sharness-full`, resolved by updating the
snapshot in `view.json.gz` with
`UPDATE_SNAPSHOT=1 yarn test --full`.
Thanks to @wchargin for the [explanation] on how
to resolve that issue.
[explanation]: https://github.com/sourcecred/sourcecred/pull/1077#pullrequestreview-196805017
**Test Plan:**
`yarn test --full` is passing.
Additionally, the commands:
```sh
filepath="./sharness/__snapshots__/example-github-load/data/sourcecred/example-github/github/view.json.gz" &&
[ -f "${filepath}" ] && # sanity check
diff -u \
<(git show "HEAD:${filepath}" | gzip -d | jq .) \
<(gzip -dc "${filepath}" | jq .) \
;
```
yields the following output:
```
--- /dev/fd/63 2019-01-27 08:34:15.020387301 -0500
+++ /dev/fd/62 2019-01-27 08:34:15.021095696 -0500
@@ -654,6 +654,22 @@
"subtype": "USER",
"login": "decentralion"
}
+ },
+ {
+ "content": "ROCKET",
+ "user": {
+ "type": "USERLIKE",
+ "subtype": "USER",
+ "login": "decentralion"
+ }
+ },
+ {
+ "content": "EYES",
+ "user": {
+ "type": "USERLIKE",
+ "subtype": "USER",
+ "login": "decentralion"
+ }
}
]
}
```
Again, thanks @wchargin's for providing those commands and accompanying
explanation.
Resolves#1054
Added "ROCKET" and "EYES" to the list of reaction types.
Added "ROCKET" as a valid cred signal, kept "EYES" invisible.
My approach was to use `git git grep THUMBS_UP '*.js'`
and `git grep ThumbsUp '*.js'` to find all the relevant files,
as suggested in #1054
**Test Plan**
1) Inspecting Sourcecred/Mission's UI:
[#13] contains: GOT 🚀 FROM 1 user
@BrianLitwin contains: REACTED 🚀 TO 1 issue
@BrianLitwin contains: REACTED 🚀 TO #13
2) Yarn Test passes
3) `github/edges.test` includes a snapshot test to verify
that we can create an edge using ROCKET
4) @wchargin also noted that:
```sh
diff -u <(git grep -c 'THUMBS_UP' '*.js') <(git grep -c 'ROCKET' '*.js')
diff -u <(git grep -c 'ThumbsUp' '*.js') <(git grep -c 'Rocket' '*.js')
```
passes.
`graphql/mirror.test` now includes "ROCKET" and "EYES" in the example
GithubSchema, but their inclusion has no effect
on any tests.
**Screenshots**
1.
<img width="378" alt="screenshot 2019-01-22 09 02 12" src="https://user-images.githubusercontent.com/26695477/51540428-6c87b600-1e24-11e9-8334-1d9d993dce01.png">
2.
<img width="525" alt="screenshot 2019-01-22 09 02 41" src="https://user-images.githubusercontent.com/26695477/51540472-84f7d080-1e24-11e9-8847-245c0c09ddd6.png">
<br>
Shoutout to [this comment], which saved me an untold amount of head-scratching,
and also @Decentralion's help debugging in the Issue thread.
[#13]: https://github.com/sourcecred/mission/issues/13
[this comment]: e0762303d4/src/plugins/github/graphqlTypes.test.js (L13-L15)
* Add descriptions for NodeTypes
As highlighted by @decentralion in issue #807, we need descriptions for Node and
Edge types in the UI to explain to users what each Node and Edge type does. This
PR modifies the type definition for `NodeType` and adds a `+description: string`
field, then updates all NodeTypes throughout the codebase with descriptions.
Test plan:
Verify that all tests pass and the descriptions makes sense.
This commit adds a new `modificationCount` method to `Graph`, which
exposes's that graph's modification count. This enables clients to write
cached data structures on top of Graph, knowing that they can
programatically detect when the cache has been invalidated.
Test plan: Unit tests have been addded; `yarn test` passes.
This commit is motivated by work on #1020.
There are two kinds of plugin adapters: adapters for doing cred
analysis, called "analysis adapters", and adapters for the cred
explorer, which are confusingly called "app adapters".
This commit decreases the confusion by renaming app adapters to explorer
adapters across the codebase. In a future commit, I will add
documentation to the adapter interfaces so that it is clearer to a
newcomer to the codebase why these interfaces exist.
Thanks to @BrianLitwin, who asked a question during [office hours]
that surfaced this issue.
[office hours]: https://github.com/sourcecred/mission/issues/12
Test plan: `yarn test` passes, suggests that this rename went off
without a hitch. Code review as a sanity check.
Also: grepping for `AppAdapter` returns 0 results:
```
$ git grep AppAdapter | wc -l
0
```
Note: After producing this commit, I can confirm that the word "adapter"
starts to look like utter gibberish after you type it often enough.
This commit adds some docstrings for the concepts of NodeType and
EdgeType. I also swapped the order so that `NodeType` comes first,
which is more consistent with usage across the codebase.
This commit makes no changes to the actual code; the only effects
are re-organization and documentation.
Test plan: `yarn test` && human inspection
Summary:
We use Aphrodite, not CSS imports, for styling. We do have a small
`index.css` file that is included during server-side rendering, and is
only referenced from `src/homepage/server.js`. But our `index.js` file
also has a superfluous `import "./style.css"`, which might suggest that
we support CSS imports more generally. This patch removes that import.
Thanks to @brianlitwin on Discord for pointing out that this might be
confusing.
Test Plan:
Verified that, under both `yarn start` and `yarn build`, the appearance
is the same, and the document still includes a `<style>` element with
the contents of `index.css` (which is included by `server.js`).
wchargin-branch: remove-css-import
* Add documentation to the Graph module
This commit adds a module-level docstring that gives an overview of the
Graph class and its importance to SourceCred, as well as adding
docstrings to specific methods.
Test plan:
In addition to review by the SourceCred maintainers, this should be
reviewed by at least one person who is not familiar with the codebase,
so that we can verify that it's actually working as documentation. :)
* Incorporate @wchargin's many suggestions.
Test plan: Human review.
This commit adds a new runOption for execDependencyGraph, namely
`printVerboseResults`. If this flag is true, then execDependencyGraph
will print a "Full Results" section along with the standard error and
standard out of every task, regardless of whether it failed or
succeeded. (Note, this is the existing behavior for all invocations
prior to this commit).
If the flag is not true, then execDependencyGraph will not print a full
results section, and stdout/stderr will be logged only for tasks that
fail.
This commit also modifies `yarn test` to use the new flag so that it
prints verbose tests only when the `--full` option is provided. This is
consistent with our sharness behavior: we print the full sharness logs
only when `--full` was provided.
This fixes#1035, and ensures that running `yarn test` has a high signal
to noise ratio (i.e. it only shows an enumeration of top level tasks).
This improves the developer ergonomics of SourceCred by not having a
super commonly used and core script spam the user with mostly irrelevant
information.
Test plan:
Run `yarn test` when all tests are passing, and observe that the output
has much less noise:
```
yarn run v1.12.3
$ node ./config/test.js
tmpdir for backend output: /tmp/sourcecred-test-6337SZ9smvWsWvqE
Starting tasks
GO ensure-flow-typing
GO check-stopships
GO check-pretty
GO lint
GO flow
GO unit
GO backend
PASS check-stopships
PASS ensure-flow-typing
PASS flow
PASS backend
GO sharness
PASS sharness
PASS check-pretty
PASS lint
PASS unit
Overview
Final result: SUCCESS
Done in 11.66s.
```
Run `yarn test` when there is a real failure (e.g. a unit test failure)
and observe that full details on the failure, including the output from
stdout/stderr, is still provided.
Run `yarn test --full` and observe that full, verbose logs are provided.
Resolves#1027
Using `repoId.owner/repoId.name` for the project title
because that is how projects are identified on `PrototypePage`.
Created a `<ProjectDetail />` component inside `<App />` that consumes a `RepoId`
and renders a title.
**Test Plan:**
Added two unit tests:
The first verifies that the parent `<App />` component
instantiates a `<ProjectDetai />` component with the correct props.
The current correct prop is a `RepoId` object.
The second test verifies that the `<ProjectDetail />` component renders
the title correctly given the `RepoId`, ie as a `<p>` element
with `repoId.owner/repoId.name` for text.
Visual tests verify that the title is above the Analyze Cred
button, and that clicking from one project to another renders
the appropriate title for separate projects.
Attaching a screenshot as a comment at #1032
for reference:
<img width="1253" alt="screenshot 2019-01-04 13 40 03" src="https://user-images.githubusercontent.com/26695477/50706562-34aeff00-102c-11e9-9c1c-6c1e3fa6c415.png">
This moves the invariant checking code from the top of the Graph class
to the bottom. Most readers of this file will probably be more
interested in seeing the API, and reading the invariant checker first
is likely to be confusing and off-putting.
Test plan: `yarn test` suffices. No semantic change.
This commit substantially improves SourceCred's performance in
production.
Measurement methodology: I create a new tab in Chrome, navigate to my local
prototypes, and select go-ipfs. I then turn on profiling, and click the
analyze button, and then turn off profiling when analysis is done. I
then go to the "bottom-up" tab in the JS analysis box on the bottom and
sort by "Total Time".
__Before this commit:__
| fn | total time | time as % |
|:---------------- | ----------:| ---------:|
| assertValid | 815ms | 8.6% |
| assertValidParts | 261ms | 2.7% |
__After this commit:__
| fn | total time | time as % |
|:---------------- | ----------:| ---------:|
| assertValid | 21ms | 0.2% |
| assertValidParts | 23ms | 0.3% |
Test plan: `yarn test`, also performance measurement as described above.
Fixes#1011.
This adds a blacklisted id for @greenkeeper, a bot which used to be a
user. This is a temporary fix until we solve #998.
Test plan: `yarn test` passes. Before this commit, attempting to load
`probot/probot` fails. After this commit, it succeeds.
When I implemented this function, I incorrectly assumed that
`lodash.sortBy` only calls subsequent accessor functions if there is a
tie from the first accessor. Actually, it calls it every time. We can
avoid lots of wasteful JSON.serialization by just grabbing the exact
properties of interest.
Test plan:
For correctness: `yarn test` suffices, as this functionality is already
tested.
For performance improvement: I ran the full load+analyze workflow, in
Chrome, on twbs/bootstrap. Before this change, decompose took 6.9s;
after this change, it takes 1.3s, for a 5.3x speedup.
Close#943.
Summary:
Our registry was defined to simply be a list of IDs. This is
insufficiently flexible; we want to be able to annotate these IDs with,
e.g., last-updated times (#989). This commit wraps the entries in a
simple object, updating clients appropriately.
Test Plan:
- Run `node ./bin/sourcecred.js load sourcecred/example-github` with a
repository registry in the old format, and note that it errors
appropriately.
- Run `yarn build` with a repository registry in the old format, and
note that it errors (“Compat mismatch”).
- Delete the old registry and re-run the `load` command. Note that it
runs successfully and outputs a registry. Run `yarn build`; note
that this works.
- Load data for two repositories. Run `yarn start`. Note that the list
of prototypes still works, and that you can navigate to and render
attributions for individual project pages.
- Verify that `yarn test --full` passes.
wchargin-branch: repo-id-registry-metadata
Summary:
There have been some breaking changes that require new type annotations,
which is a good thing: these prevent `any`-leakage.
Test Plan:
Run `yarn flow`.
wchargin-branch: flow-v0.86.0
Summary:
This serves as a regression test for #1000.
Test Plan:
Note that `yarn unit` passes with this patch but fails if the change to
the code is reverted, or if the patch in #1000 is reverted. Note that
`yarn build` also passes but fails if the patch in #1000 is reverted.
Note also that `yarn test --full` passes.
wchargin-branch: link-verify-trailing-slash
Summary:
Prior to this commit, clicking the in-copy link to the prototypes page
would raise a console error:
> Warning: [react-router] Location "/prototype" did not match any routes
Test Plan:
Run `yarn start` and click the link.
wchargin-branch: site-fix-homepage-prototype-link
Summary:
Prior to this commit, the prototypes page, which lists just a handful of
repositories, was rendered with a vertical scrollbar: you had to scroll
200px to see the version info. This is silly.
The `height: 100%` is necessary not to get it to fill up the whole page,
but to get it to _not_ fill up ~30 extra pixels. I have no idea why.
Test Plan:
Run `yarn start` and note that `/prototypes/` now renders without a
scrollbar, and with the version info in the bottom-right corner.
wchargin-branch: site-fix-prototypes-page-dimensions
This resolves an outstanding TODO in pagerankNodeDecomposition to remove
the unused sourceScore field.
I have removed it, and it was indeed unused.
Test plan: `yarn test` passes.
Summary:
The `MapUtil` map–object conversion functions used inexact objects for
both input and output. They are in fact stronger than that: they can
accept arbitrary inexact objects and return arbitrary exact outputs.
(Recall that exact objects are subtypes of their inexact counterparts,
so this is the maximally permissive combination.)
Test Plan:
Unit tests added. The “can return an exact object” test fails Flow
before this change. The other tests would have passed already.
wchargin-branch: maputil-exact-output
RepoIdRegistry is used across the project, but not in the explorer. So
it makes very little sense that it live in the explorer module. It's now
moved to core.
Test plan: `yarn test --full` passes
We added a configurable cred feedback url on the theory that we would
create a separate discourse post to collect feedback for each project in
particular.
We've now realized that no one is using this, so it's just vestigial
complexity now. So I'm removing the logic for configuring the feedback
url on a per-project basis.
Instead, we will always link to a Google form for collecting feedback.
Test plan: `yarn test --full` passes, and I manually checked the links.
Historically, a single cred explorer instance could load many different
repositories. This turned out to be an anti-feature: we'd rather have a
particular url hardlink to exploring the cred for a particular project.
This commit removes the repository select from the explorer, and instead
mandates that the explorer always has the RepoId passed down from above.
Besides providing a better UX, this also greatly simplifies the logic
for the explorer, since we no longer have an "initializing state" that
doesn't have any RepoId.
This builds on the work in #984, and swaps out the old "prototype" page
(which has been rendered non-functional by this change) for the new
"prototypes" page. Note that it stays at the same route, so links to
sourcecred.io/prototype will continue to function.
Test plan: Ran `yarn test --full`, and verified that `yarn start`
produces a working site.
Test Plan:
Apply the following patch:
```diff
diff --git a/src/homepage/routeData.js b/src/homepage/routeData.js
index 32d3eb65..aac7fc9a 100644
--- a/src/homepage/routeData.js
+++ b/src/homepage/routeData.js
@@ -38,7 +38,10 @@ const routeData /*: $ReadOnlyArray<RouteDatum> */ = [
path: "/prototypes/",
contents: {
type: "PAGE",
- component: () => require("./PrototypesPage").default([]),
+ component: () =>
+ require("./PrototypesPage").default([
+ {owner: "sourcecred", name: "example-github"},
+ ]),
},
title: "SourceCred prototypes",
navTitle: null, // for now
```
Then, load <http://localhost:8080/prototypes/> and see that there is an
entry in the list, and that it links to
<http://localhost:8080/prototypes/sourcecred/example-github/>. Note that
clicking the link raises a console error because there is no such route.
wchargin-branch: homepage-prototypes-page
Summary:
We want to remove the repository selector dropdown on the cred explorer
homepage and instead render a separate web page for each project. To do
this, we need to know which pages to render statically. We choose to
ingest this information from the state of the repository registry at
build time.
This commit adds an environment variable `REPO_REGISTRY` whose contents
are the stringified version of the repository registry, or `null` if
SourceCred has been built for the backend. This variable is defined with
Webpack’s `DefinePlugin`, so any code bundled by Webpack can refer to it
via `process.env.REPO_REGISTRY` both on the server and in the browser.
Paired with @wchargin.
Test Plan:
Sharness tests modified; running `yarn test --full` suffices.
The explorer no longer ships with a set of default plugins. (This made
an inappropriate dependency from explorer/ to plugins/, and complicated
explorer's contract as a generic component.) Instead, the homepage
module is responsible for choosing the plugins to display on the
homepage.
Test plan: `yarn test --full` passes, and `yarn start` reveals a
functioning homepage and prototype.
Currently version is located in `homepage/`, which doesn't make much
sense, since it's versioning the whole project.
We move it to core.
Test plan: `yarn test --full`
Summary:
This was used for ad hoc testing of the Mirror module before it was
integrated into SourceCred. We haven’t kept it up to date with schema
changes, and it is no longer needed: you can just run `sourcecred load`.
This was also the only untested code in the `graphql/` package, so it is
nice to remove it.
Test Plan:
Running `yarn test --full` passes.
wchargin-branch: remove-mirror-demo
Summary:
This adds object IDs to the GitHub GraphQL blacklist such that the
`twbs/bootstrap` repository can be loaded.
Ingesting the Mirror-extracted data into the RelationalView yields the
warnings
```
IssueComment[MDEyOklzc3VlQ29tbWVudDEwNTI4Mzk4Ng==].reactions: unexpected null value
IssueComment[MDEyOklzc3VlQ29tbWVudDI0NTQ3OTM3OA==].reactions: unexpected null value
IssueComment[MDEyOklzc3VlQ29tbWVudDMwNDE4NzIzMg==].reactions: unexpected null value
```
because we have nulled out these `Reaction`s in their enclosing
connections. This is expected.
Test Plan:
Run `yarn backend` and `node ./bin/sourcecred.js load twbs/bootstrap`.
Run `yarn start` and note that the cred attribution renders properly.
(Loading the GitHub data may take an hour or two. The resulting SQLite3
database is 172MB. Ingesting it into the `RelationalView` still takes
just a few seconds, and the cred attribution is rendered quickly.)
wchargin-branch: github-use-blacklists
Summary:
This enables us to deal with GraphQL remotes that violate their contract
guarantees and provide a node of the wrong type. An instance in which
the GitHub GraphQL API does this is documented here:
<https://gist.github.com/wchargin/a2b8561b81bcc932c84e493d2485ea8a>
A Mirror cache is only valid for a fixed set of blacklisted IDs. This is
necessary to ensure consistency. If the set of blacklisted IDs changes,
simply remove the existing database and download again from scratch.
Test Plan:
Unit tests added, with full coverage.
wchargin-branch: mirror-blacklist
Currently, the cred explorer is a submodule of `app`. This is somewhat
confusing, as `app` is essentially our homepage, and the explorer is a
standalone React application which happens to get embedded in our
homepage. This commit pulls the explorer from `app/credExplorer/` into
`explorer/`, which is a better organization.
The `app/adapters` were actually only used by the cred explorer, so
those files have been moved to `explorer/adapters`. We should rename
them from "App Adapters" to "Explorer Adapters", but I didn't do that in
this commit so as to minimize the (already substantial) size of the
change.
Also, we should rename `app/` to `homepage/` in a subsequent commit.
I encountered a nasty Flow bug, which I fixed with help from @wchargin.
The result is extra annotations on the demo and fallback dynamic
adapters (so that the `static()` method is type annotated).
Test plan: This change is massive, but it's just a rename. `yarn test`
suffices.
I'm planning to pull `credExplorer` out of `app` and into its own
top-level module. This is a bit awkward, as `credExplorer` depends on
a lot of little modules that are currently collected in `app/`.
To resolve this, I pull all of these little utility modules into
`webutil/`. It's not a totally principled grouping, but it's quite
convenient and keeps these rarely changing modules out of the way.
Test plan: It's a file move, `yarn test` suffices.
The logic for converting weights into an edge evaluator should not be
coupled to the frontend application.
Progress towards #967.
Test plan: Very straightforward rename; `yarn test` suffices.
Now that the `analysis` module owns the Node and Edge types, it should
own the "fallback plugin" too. (Note that it's not actually a plugin,
though it somewhat acts like one.)
We now declare the fallback type in `analysis`, along with a fallback
analysis adapter. `app/adapters` then declares a fallback app adapter.
Test plan: `yarn test`
Progress towards #967.
There's a folder called `app/credExplorer/weights` which contains the
type specification for weights (for PageRank configurability), and also
contains frontend code for specifying those weights. This commit creates
a `weights` module under `analysis` which will contain just the logic
for specifying and using the weights, without any frontend
consideration.
It's mostly a port of the existing logic in `credExplorer/weights`, with
the caveat that app adapter related concepts have been removed, in favor
of referencing the declaration instead.
We then remove the duplicated logic and re-route imports.
Test plan: `yarn test`
* Add the demo plugin
This ports the ad-hoc demo adapter defined in
`src/app/adapters/demoAdapters.js` into its own demo plugin.
This has the benefit that the demo plugin can now be depended on outside
the app module, e.g. for the analysis module as well. Correspondingly,
I've added a demo analysis adapter.
Test plan: `yarn test`. Note that no unit tests were added, as the demo
plugin is trivial.
* Delete `src/app/adapters/demoAdapters.js`
Now that we have an explicit demo plugin at `src/plugins/demo/`, we can
remove the legacy declaration of that plugin within the `app` module.
This commit deletes the old version, and re-writes all references to
point to the standalone plugin.
Test plan: `yarn test`
Summary:
It is time. (Replaced with #622.)
Test Plan:
Running `yarn flow` suffices. Running `yarn test --full` also passes.
wchargin-branch: remove-legacy-graphql
Summary:
This test has data in the old format, and uses the `RelationalView`
method that automatically translates it. As we prepare to delete that
code, we upgrade the underlying format of this test data. The end code
is nicer to read, too (e.g., we don’t need the `connection` helper
function).
Recommend reviewing with `git show -b`.
Test Plan:
Running `yarn test` suffices.
wchargin-branch: mentionsAuthorReference-remove-legacy-graphql
Summary:
A number of modules depended on the legacy `github/graphql.js` module
solely to get at the `Reactions` enum object. As of #961, that object is
exposed from the much lighter-weight `graphqlTypes.js`. This patch
switches over the relevant imports, reducing our dependencies on this
legacy module and its large bundle size.
Test Plan:
It suffices to run `yarn flow` and verify that the two values being
imported are identical.
wchargin-branch: github-use-generated-enums
Summary:
We have a `const Reactions` convenience enum in `github/graphql.js`.
That value is useful, but that module is slated to die. This commit
extends our Flow type generation script to include these values.
Test Plan:
Existing unit tests suffice.
wchargin-branch: schema-generate-enums
For #704, we're adding plugin adapters that are specific to the needs of
the analysis module. They have a simple scope: they just provide a way
to get the declaration, and to load the corresponding graph.
Adapters for the `git` and `github` plugins have been implemented, along
with unit tests.
Test plan: `yarn test` suffices.
Summary:
Fixes#953. See that issue for context.
Test Plan:
Unit tests updated. To see the change in action, load the SourceCred
data and expand @decentralion’s commits-authored to find commits that
were merged into non-`master` branches. Note that these commits are
rendered correctly (in the same way that they were before this patch),
and that there is no console error (new as of this patch).
![Screenshot](https://user-images.githubusercontent.com/4317806/47805669-1f98b580-dcf5-11e8-8683-8ee91f7f478a.png)
wchargin-branch: git-no-warn-on-unknown-commit
Test Plan:
Remove the SourceCred output directory, run `yarn backend`, and load
data for `sourcecred/example-github` and `sourcecred/sourcecred`. Then,
run `yarn start` and note that the cred explorer still works. Finally,
note that `yarn test --full` passes.
wchargin-branch: release-v0.2.0
Now that we're planning to add adapters for the `analysis` module, we
should rename the `PluginAdapter` to make it clear that it's scoped for
`app`.
Test plan: `yarn test` suffices
The plugin adapters are specific to `app/` and have logic for fetching
data from the backend, producing React nodes for descriptions, et
cetera.
However, they also have information that is generic to the plugin
itself: its name, its node/edge prefixes, and its types.
This method factors out the generic info into a `PluginDeclaration`,
which is a type (rather than an interface). Then, the plugin adapter has
a `declaration` method which returns the declaration.
Current users of the plugin adapters get additional mechanical
complexity because they need to call `.declaration().property` rather
than `.property()`, but this is not too significant.
The main benefit is that #704 is unblocked, as the cli `analyze` command
will be able to get necessary information from the declaration. As an
added benefit, the organization of plugin code is somewhat improved.
Test plan: `yarn test` sufficies, but I also ran `yarn start` and
checked the UI behavior to be safe.
The `core/attribution/` folder has some code that really is "core" in
that it deals with very basic concepts around converting graphs to
markov chains and running PageRank on them, and some code that is less
"core", like for normalizing scores and doing analysis on them.
To make progresson #704, we need an intermediary directory that has
analysis-related code that is e.g. aware of Node and Edge types, and
weights on those types, and can use them to run weight-informed
PageRank. That code shouldn't live in the app directory (since it is not
coupled to the frontend rendering), but also shouldn't live in core
(since "core" is basically finalized code with fully baked abstractions,
and per #710, this is not true of the node/edge type system).
Thus, I've decided to create the `analysis` directory. To get that
directory started, I've moved the non-core code in `core/attribution/`
to `analysis/`.
Test plan: `yarn test` passes, which is all we need, since this is a
straightforward file rename.
The `analyze` command is the first step towards #704 and #703. When
fully implemented, it will run PageRank for a loaded repository,
generating a complete graph and cred attribution.
For now, this just adds a scaffold. It does basic argument parsing, and
has help text, but the actual command is not yet implemented.
Test plan:
Unit tests verify that the analyze command is hooked into `sourcecred`
and `sourcecred help`, and that it responds to the `--help` command and
parses its arguments appropriately.
Summary:
As of this commit, `node ./bin/sourcecred.js load` uses the Mirror code,
and the legacy continuation-fetching code is not included in the
`sourcecred.js` bundle.
We do not yet perform the commit prefetching described in #923. The code
should be plenty fast for repositories that merge pull requests at least
occasionally.
Test Plan:
Running `yarn test --full` passes. Loading `sourcecred/sourcecred` works
and generates a reasonable credit attribution. Loading it again
completes immediately.
wchargin-branch: fetchGithubRepo-mirror
Summary:
This makes significant progress toward #923. As of this commit, it is
possible to use the Mirror module for the whole loading pipeline. This
process may be slow for repositories that do not use pull requests at
all (more precisely, that have large connected commit subgraphs none of
whose nodes is the merge commit of a pull request; see #920 for details)
so it is not yet the default codepath.
Test Plan:
Existing unit tests should suffice. For extra testing, I’ve added a
script that fetches a repository both via the old continuations logic
and the new Mirror logic, then constructs relational views and checks
whether the data is the same. For `example-github`, the views are
identical. For `sourcecred`, they are not: the old continuations logic
erroneously omits two commits, which the Mirror logic includes.
You can run the test like this:
```
$ node ./bin/testContinuations.js \
> sourcecred sourcecred MDEwOlJlcG9zaXRvcnkxMjAxNDU1NzA= \
> /tmp/continuations.json /tmp/mirror.json \
> 2> >(jq . >&2)
{
"child": "0d38dde23a6de831315f3643a7d2bc15e8df7678",
"parent": "cb8ba0eaa1abc1f921e7165bb19e29b40723ce65",
"type": "UNKNOWN_PARENT_OID"
}
{
"child": "d152f48ce4c2ed1d046bf6ed4f139e7e393ea660",
"parent": "de7a8723963d9cd0437ef34f5942a071b850c0e7",
"type": "UNKNOWN_PARENT_OID"
}
Different. Saving to disk...
```
Use `diff -u <(jq . /tmp/continuations.json) <(jq . /tmp/mirror.json)`
to inspect the differences, and note that exactly the two missing
commits have been added and that there are no other changes. (The diff
is small: just 51 lines of nicely formatted JSON.) The full log is here:
<https://gist.github.com/wchargin/e159cac9dcf3cc3b1efbd54f59e24e0b>
I also generated the `sourcecred/sourcecred` cred attribution and viewed
it with `yarn start`, which seems to work fine.
wchargin-branch: relationalview-new-data-format
Summary:
An upcoming commit will happen to change the order in which commits are
ingested. This is not an observable change, and should not cause a
snapshot failure.
Test Plan:
Inspection.
wchargin-branch: relationalview-snapshots-order-invariant
Summary:
This implements the translation module described in #923. See that issue
for context.
Test Plan:
This is a mostly straightforward translation from one strongly typed
data structure to another, so Flow handles most of it.
As a check on the snapshot, run:
```
$ grep -e oid -e target -e mergeCommit \
> src/plugins/github/__snapshots__/translateContinuations.test.js.snap
"target": Object {
"oid": "6bd1b4c0b719c22c688a74863be07a699b7b9b34",
"oid": "c430bd74455105f77215ece51945094ceeee6c86",
"oid": "6d5b3aa31ebb68a06ceb46bbd6cf49b6ccd6f5e6",
"oid": "0a223346b4e6dec0127b1e6aa892c4ee0424b66a",
"oid": "ec91adb718a6045b492303f00d8e8beb957dc780",
"oid": "ecc889dc94cf6da17ae6eab5bb7b7155f577519d",
"oid": "ec91adb718a6045b492303f00d8e8beb957dc780",
"mergeCommit": Object {
"oid": "0a223346b4e6dec0127b1e6aa892c4ee0424b66a",
"oid": "ec91adb718a6045b492303f00d8e8beb957dc780",
"oid": "ecc889dc94cf6da17ae6eab5bb7b7155f577519d",
"oid": "ec91adb718a6045b492303f00d8e8beb957dc780",
"mergeCommit": Object {
"oid": "6d5b3aa31ebb68a06ceb46bbd6cf49b6ccd6f5e6",
"oid": "0a223346b4e6dec0127b1e6aa892c4ee0424b66a",
"oid": "ec91adb718a6045b492303f00d8e8beb957dc780",
"oid": "ecc889dc94cf6da17ae6eab5bb7b7155f577519d",
"oid": "ec91adb718a6045b492303f00d8e8beb957dc780",
"mergeCommit": null,
```
Cross-check this against [the example-github commits][commits] thus:
- Note that commit `6bd1b4c` is the head commit, and is thus the root
commit of the `target` chain.
- Note that commits `0a22334` and `6d5b3aa`, which were merged via
pull request, appear twice each: once in the history from head, and
once as the merge commit of a pull request.
- Note that commit `0a22334` has two parents at each occurrence.
- Note that the unmerged pull request’s merge commit is `null`.
[commits]: https://github.com/sourcecred/example-github/commits/master
To run this on real-world data, apply the following patch:
```diff
diff --git a/src/plugins/github/fetchGithubRepo.js b/src/plugins/github/fetchGithubRepo.js
index 6ac201af..b14ca760 100644
--- a/src/plugins/github/fetchGithubRepo.js
+++ b/src/plugins/github/fetchGithubRepo.js
@@ -11,6 +11,7 @@ import {stringify, inlineLayout, type Body} from "../../graphql/queries";
import {createQuery, createVariables, postQueryExhaustive} from "./graphql";
import type {GithubResponseJSON} from "./graphql";
import type {RepoId} from "../../core/repoId";
+import translateContinuations from "./translateContinuations";
/**
* Scrape data from a GitHub repo using the GitHub API.
@@ -44,6 +45,11 @@ export default function fetchGithubRepo(
payload
).then((x: GithubResponseJSON) => {
ensureNoMorePages(x);
+ console.warn("Translating continuations...");
+ for (const w of translateContinuations(x).warnings) {
+ console.warn(w);
+ }
+ console.warn("Done.");
return x;
});
}
```
Then run:
```
$ yarn backend >/dev/null 2>/dev/null; echo $?
0
$ node ./bin/sourcecred.js load sourcecred/sourcecred --plugin github 2>&1 |
> ts -s '%.s'
55.015740 Translating continuations...
55.037217 { type: 'UNKNOWN_PARENT_OID',
55.037273 child: '0d38dde23a6de831315f3643a7d2bc15e8df7678',
55.037290 parent: 'cb8ba0eaa1abc1f921e7165bb19e29b40723ce65' }
55.037309 { type: 'UNKNOWN_PARENT_OID',
55.037336 child: 'd152f48ce4c2ed1d046bf6ed4f139e7e393ea660',
55.037359 parent: 'de7a8723963d9cd0437ef34f5942a071b850c0e7' }
55.037383 Done.
```
Note that the two commits in question were each merged into a non-master
branch, in #28 and #329 respectively. Note also that translating these
continuations took just 22 milliseconds.
wchargin-branch: github-translate-continuations
Summary:
This fixes the following issues:
- Pull request reviews actually do not have reactions.
- We must fetch the `id` of a `Ref`.
- We must fetch the `id` of a `Commit`, `Tree`, `Blob`, or `Tag`, and
should also fetch its `oid`.
- Repository owners cannot be bots.
- Commit and reaction authors cannot be bots, organizations, or
`undefined`.
Test Plan:
Running `yarn test --full` passes, and the snapshot diff is clearly
correct.
wchargin-branch: github-fix-up-continuations
Summary:
The included schema is forked from the one in `graphql/demo.js`.
Primitive types have been added, and the `parents` connection has been
added to commit objects per #920. (We do not include this in the demo
script because without prefetching it would take a long time to load.)
Test Plan:
Unit tests added; run `yarn unit`. Then run `yarn backend` and verify
that `node ./bin/generateGithubGraphqlFlowTypes.js` generates exactly
the same output as in the types file:
```
$ node ./bin/generateGithubGraphqlFlowTypes.js |
> diff -u - ./src/plugins/github/graphqlTypes.js
$ echo $?
0
```
Change the `graphqlTypes.js` file and verify that `yarn unit` fails.
As the build config has been changed, a `yarn test --full` is warranted.
It passes.
Finally, I have manually verified that the schema is consistent with the
documentation at <https://developer.github.com/v4/object/repository/>
and related pages.
wchargin-branch: github-schema-flow-types
Summary:
Generating Flow types from a structured schema is both straightforward
and terribly useful. This commit implements it.
Test Plan:
Inspect the snapshot for correctness manually. Then, copy it into a new
file, remove backslash-escapes, and verify that it passes Flow.
A subsequent commit will generate types for the actual GitHub schema.
wchargin-branch: graphql-generate-flow-types
Summary:
Prior to this change, primitive fields were un\[i\]typed. This commit
allows adding type annotations. Such annotations do not change the
semantics at all, but we will be able to use them to generate Flow types
corresponding to a schema.
This commit also strengthens the validation on schemata to catch some
errors that would previously have gone unnoticed at schema construction
time: e.g., a node reference to a type that does not exist.
Test Plan:
Unit tests updated, retaining full coverage. The demo script continues
to work when loading `example-github` or `sourcecred`.
wchargin-branch: schema-annotated-primitives
Summary:
To ease the transition from manual continuation resolution to the Mirror
API, we update the old system to fetch commit parent OIDs, as described
in #923.
Test Plan:
To check that the continuations are wired correctly, apply the following
patch to force continuations to be followed at every step:
```diff
diff --git a/src/plugins/github/graphql.js b/src/plugins/github/graphql.js
index 05761ca..a21a364 100644
--- a/src/plugins/github/graphql.js
+++ b/src/plugins/github/graphql.js
@@ -53,7 +53,7 @@ const PAGE_SIZE_COMMENTS = 20;
const PAGE_SIZE_REVIEWS = 5;
const PAGE_SIZE_REVIEW_COMMENTS = 10;
const PAGE_SIZE_COMMIT_HISTORY = 100;
-const PAGE_SIZE_COMMIT_PARENTS = 5;
+const PAGE_SIZE_COMMIT_PARENTS = 0;
const PAGE_SIZE_REACTIONS = 5;
/**
@@ -358,6 +358,7 @@ function* continuationsFromCommit(
) {
const b = build;
if (result.parents && result.parents.pageInfo.hasNextPage) {
+ console.warn(result.parents.pageInfo);
yield {
enclosingNodeType: "COMMIT",
enclosingNodeId: nodeId,
@@ -366,7 +367,7 @@ function* continuationsFromCommit(
b.field(
"parents",
{
- first: b.literal(PAGE_LIMIT),
+ first: b.literal(1),
after: b.literal(result.parents.pageInfo.endCursor),
},
[b.fragmentSpread("commitParents")]
```
(It is important that the initial page limit be `0` and not (say) `1`,
because the `defaultBranchRef` is likely to have just one parent; by
using `0`, we test its continuations as long as it has at least one
parent.)
Then run `./src/plugins/github/fetchGithubRepoTest.sh` and note that the
test passes and the output is
```
{ hasNextPage: true, endCursor: null }
{ hasNextPage: true, endCursor: null }
{ hasNextPage: true, endCursor: null }
{ hasNextPage: true, endCursor: null }
{ hasNextPage: true, endCursor: null }
{ hasNextPage: true, endCursor: null }
{ hasNextPage: true, endCursor: null }
{ hasNextPage: true, endCursor: 'MQ==' }
{ hasNextPage: true, endCursor: 'MQ==' }
```
Note that the test output (as updated in this commit) includes commits
with a unique parent, a commit with no parents, and a commit with two
parents.
I also ran `node ./bin/sourcecred.js load` on sourcecred/sourcecred and
ipfs/js-ipfs. Each worked, and the resulting credit attributions loaded
fine.
wchargin-branch: github-commit-parent-oids
Summary:
This has two benefits:
- The dates on commits are data that we will probably want when we add
timestamps to authorship edges to accommodate time-weighted cred.
- Once the mirror module is integrated with the GitHub plugin, we’ll
want to fetch dates on commits, because this is the only real-world
test case for a nested field that contains a primitive field (as
opposed to a node reference), so it’ll be nice to be continually
exercising that somewhat-edge case.
Date strings are in commit-local time and do not depend on the time zone
of the requester (in contrast to [cursors]). For example, on SourceCred:
```shell
$ time node ./bin/fetchAndPrintGithubRepo.js \
> sourcecred sourcecred "${GITHUB_TOKEN}" |
> jq -rc '
> .repository.defaultBranchRef.target.history.nodes[]
> .author?.date[-6:]
> ' | sort | uniq -c
1 +03:00
6 -04:00
717 -07:00
58 -08:00
```
[cursors]: <https://github.com/sourcecred/sourcecred/pull/129#issuecomment-382970474>
Test Plan:
The snapshot contains 8 instances of `oid` and 8 instances of `date`,
which is good (each of these properties appears exactly once on each
commit, and nowhere else). Running `yarn test --full` passes.
wchargin-branch: github-commit-dates