- PluginAdapters no longer expose a Renderer; instead, the render
methods are inlined on the PluginAdapter. The extra abstraction didn't
provide any lift in the current architecture.
- The edgeVerb function has been removed.
- PluginAdapters now enumerate EdgeTypes. Each has a prefix, and a
forward and a backward name.
Test plan: `yarn travis`, plus manual testing of the frontend and the
weight config.
Summary:
We don’t need this to be a “progressive web app”—certainly not now. The
n+1 caching problem is not a good tradeoff for us, and furthermore
service workers are causing flashes of content on server-side rendered
pages.
This commit is a quick fix to remove them. We can remove the code
entirely if we want, or just keep it as is.
Test Plan:
On a machine has the service worker registered, run `yarn build`, then
`node bin/sourcecred.js start`. Note in the network panel that the
service worker is loaded on the first page load, but then deregistered.
On subsequent refreshes, it should not activate. In the “Application”
panel of the Chrome dev tools, it should appear as “deleted”.
wchargin-branch: disable-sw
The WeightConfig is a power user feature. Now that we're building a
public-facing demo out of the Cred Explorer, it will be better to hide
the weight configuration by default.
This commit adds a button for showing/hiding the weight configuration.
The weights are still propagated correctly regardless of whether the
weight config is shown.
Test plan:
- Ensure that the site loads with weights hidden by default.
- Ensure that clicking the button causes the weight config to display.
- Ensure that PageRank loads and displays correctly with the weights
hidden.
- Ensure that changes to the weight config still propagate to PageRank
(with weights hidden or not hidden).
Test Plan:
Ensure that `require("./src/app/routeData")` works in `node` without any
preprocessing. Ensure that `yarn start` works, and that `yarn build`
then `node ./bin/sourcecred.js start` also works.
wchargin-branch: vanilla-route-data
Summary:
Some of the code here is adapted from my site (source available on
GitHub at wchargin/wchargin.github.io). It has been improved when
possible and made worse when necessary to fit into our existing build
system with minimal churn.
As of this commit, there remain the following outstanding tasks:
- Use a non-hardcoded list of paths in static site generation router.
This is not trivial. We have the paths nicely available in
`routes.js`, but this module is written in ES6, and transitively
depends on many files written in ES6 (i.e., the whole app). Yet
naïvely it would be required from a Webpack config file, which is
interpreted as vanilla JavaScript.
- Add `csso-loader` to minify our CSS. This is easy.
- Add unit tests for `dedent`. (As is, it comes from my site
verbatim. I wrote it. dmnd’s `dedent` package on npm is insufficient
because it dedents arguments as well as the format string, which is
incorrect at least for our purposes.)
- Link in canonical static data for the site.
- Rip out the whole build system and replace it with my build config,
which is orders of magnitude saner and less bad. (By “the whole
build system” I mostly mean `webpack.config.{dev,prod}.js`.)
Test Plan:
```shell
$ yarn backend
$ yarn build
$ node ./bin/sourcecred.js start
```
wchargin-branch: static-v0
Summary:
This adds a dummy landing page. We’ll want to actually put nice content
on it. For development convenience, I’m totally fine with having the
`yarn start` launch `/explorer` instead of just `/`.
Test Plan:
Run `yarn start` and note that the navigation works.
wchargin-branch: landing-page
Summary:
This commit hooks up the PageRank table to the PageRank node
decomposition developed previously. The new cred explorer displays one
entry per contribution to a node’s cred (i.e., one entry per in-edge,
per out-edge, and per synthetic loop), listing the proportion of the
node’s cred that is provided by this contribution. This makes it easy to
observe facts like, “90% of this issue’s cred is due to being written by
a particular author”.
Paired with @decentralion.
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: pagerank-table-node-decomposition
Summary:
The aesthetically nicest win is in `WeightConfig`. Other changes are
nice to have.
In many cases, we reduce the specificity of error messages thrown. For
instance, if an invariant was violated on an edge `e`, then we might
have thrown an error with message `EdgeAddress.toString(e.address)`. But
we did so not because we thought that this was genuinely worth it, but
only because we were forced to explicitly throw an error at all. These
errors should never be hit, anyway, so we don’t feel bad about replacing
these with errors that are simply the string `"null"` or `"undefined"`,
as appropriate.
Test Plan:
Running `yarn travis --full` passes, and the cred explorer still seems
to work with both populated and empty `localStorage`.
wchargin-branch: use-null-util
Summary:
This commit adds a module with four functions: `get`, `orThrow`, `map`,
and `orElse`.
Here is a common pattern wherein `get` is useful:
```js
sortBy(Array.from(map.keys()), (x) => {
const result = map.get(x);
if (result == null) {
throw new Error("Cannot happen");
}
return result.score;
});
// versus
sortBy(Array.from(map.keys()), (x) => NullUtil.get(map.get(x)).score)
```
(The variant `orThrow` allows specifying a custom message that is only
computed in the case where the error will be thrown.)
Here is a common pattern where `map` is useful:
```js
arr.map((x) => {
const result = complicatedComputation(x);
return result == null ? result : processResult(result);
});
// versus
arr.map((x) => NullUtil.map(complicatedComputation(x), processResult))
```
In each of these cases, by using these functions we gain a dose of
safety in addition to our concision: it is tempting to “shorten” the
expression `x == null ? y : z` to simply `x ? y : z`, while forgetting
that the latter behaves incorrectly for `0`, `false`, `""`, and `NaN`.
Similar patterns like `x || defaultValue` also suffer from this problem,
and can now be replaced with `orElse`.
Designed with @decentralion.
Test Plan:
Unit tests included; run `yarn travis`.
wchargin-branch: null-util
There's no sense having a landing page with no content and
a nav bar with only one meaningful options. We can re-add
them later if we actually need navigation
Test plan: Local testing
Summary:
When updating `PagerankTable` to work with contributions, we found it
difficult to keep track of everything when we tried to do two things
simultaneously: compute the values to be displayed, and render them
hierarchically. @decentralion suggested computing the relevant data
ahead of time, and then having a straightforward React component to
render this structure. This would incidentally make `PagerankTable`
easier to test.
This commit implements that data structure and the function to create
it from a `PagerankResult`. A subsequent commit will update
`PagerankTable` accordingly.
As evidence that this structure is well-designed, note that the main
contents of a contribution row can be rendered entirely from a
`ScoredContribution` datum (though the component will still of course
require the full `PagerankNodeDecomposition` to pass down to its
children). (At least, I think that it can be!)
Designed with @decentralion.
Test Plan:
Unit tests added. I have checked that the snapshot is structurally
correct: each node has contributions with the correct contributors.
I did not manually compute the stationary distribution and check the
snapshot for correctness. The snapshot is complemented by automated
tests.
wchargin-branch: pagerank-node-decomposition
Summary:
Now that `MapUtil` provides `toObject`/`fromObject`, we can keep storing
weights in localStorage while representing them as ES6 `Map`s in memory.
Here are some advantages:
- The code is genuinely more typesafe. While writing this,
I accidentally wrote `edgeWeights.get(key)`, where `edgeWeights`
should have been `nodeWeights`. This was caught at compile time, and
would not have been in the previous version.
- Relatedly, the code now has zero `any`-casts as opposed to five.
- The initialization of the default values is not abysmally ugly.
- Whenever we iterate over these maps, (a) we can use `.entries()`,
and (b) we don’t have to cast between string keys and semantic keys.
This simplifies some of the control flow.
- The extra null-checking on `get` forces us to either think about
ways in which the check might fail, or reuse a previously fetched
value that is known to be non-null (perhaps because it came from
`entries`).
- A particularly annoying Prettier line-break is avoided. :-)
Here are some disadvantages:
- The null-pipelining around the `rehydrate` function is a bit
annoying. As @decentralion pointed out, what we want here is not a
default value, but a default value and a function to transform a
present value. This is Haskell’s `maybe : β → (α → β) → Maybe α → β`
or Java’s `optional.map(fromObject).orElse(defaultValue)`. This
commit implements one approach; another is to note that `fromObject`
is invertible, writing `fromObject(LocalStore.get(k, toObject(d)))`.
- That’s it, I think?
Test Plan:
I’ve tested that the sliders for both edge and node weights correctly
influence the PageRank behavior, that the component is properly
initialized with an empty localStorage, and that the component properly
rehydrates from localStorage.
wchargin-branch: weightconfig-maps
Summary:
These call sites were selected from `git grep Map`. In this commit, we
only add usage of the utility functions; we do not change any existing
object types to maps.
Test Plan:
Running `yarn travis --full` passes.
wchargin-branch: use-map-util
Summary:
We’d like to like ES6 `Map`s, because they provide better type safety
than objects (primarily, `Map.prototype.get` has nullable result type).
However, the vanilla APIs are weak. Prominent problems are that `Map`s
always become `"{}"` under `JSON.stringify`, that there is no easy way
to convert between `Map`s and objects, and that there are no functions
to map over the keys and values of `Map`s.
In this commit, we add versions of those functions to a utility module.
The value-level implementations are straightforward, but these functions
nevertheless deserve a utility module because the types are somewhat
tricky to get right. The implementation requires casts through `any`,
and these should be written, analyzed, and proven correct just once. (In
particular, it would be easy to write an unsound type for `fromObject`.)
In a followup commit, we will amend existing portions of the codebase to
use these functions.
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: map-util
This commit adds another bank of sliders to the cred explorer, for
changing the directionality of edges. The sliders have the range [0,1]
with step size of 0.01.
The layout is pretty ugly and clearly should be refactored. But playing
with these sliders is interesting :)
Test plan: We don't have any unit tests on the WeightConfig, but I did
drive it by hand. An interesting experiment is to set the AUTHORS edge
directionality to 1, so that users get no credit for authoring posts. As
expected, this utterly tanks the users' scores; many users then have a
score of -Infinity.
Summary:
If we want to snapshot an edge, then none of the available options is
ideal:
- Snapshotting the edge directly includes literal NUL bytes in the
snapshot file, so it is treated as binary. This is bad.
- Using `edgeToString` works, but all fields of the edge are combined
into a single string, which is somewhat hard to read.
- Using `edgeToParts` works, but each address in the edge takes up a
lot of visual space: one line per part in the address. This is also
somewhat hard to read.
This commit adds `edgeToStrings`, which simply applies the appropriate
`toString` operation to each field of an edge.
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: edge-to-strings
The implementation is similar to the LocalStore usage in
`credExplorer/app.js`. Had to make a spurious refactor from Map to
Object because ES6 maps don't stringify by default, and I didn't feel
like writing a custom JSON serializer.
Test plan:
Didn't add unit tests, although at some point we should come up with a
nice LocalStore mock and test LocalStore code. I did, however, manually
try it out and verify that it works :)
Paired with @wchargin
Summary:
Prettier inserted these in a previous version of the code, but the lines
got shorter and so Prettier no longer minds if we remove the breaks.
Test Plan:
shipitquick
wchargin-branch: remove-line-breaks
Summary:
This change is motivated by the behavior of loops, but applies more
generally to edges. Previously, a loop would induce two contributions
with the same contributor, but possibly different weights: one with the
to-weight of the edge and one with the fro-weight. For one, this is
annoying to downstream clients, who would like to use the contributor as
a superkey. But it is also somewhat strange that a single contributor
could have two different weights.
The same applies to edges in general: every edge induces two
contributions with the same contributor, of type `NEIGHBOR`.
As of this commit, we replace `NEIGHBOR` with `IN_EDGE` and `OUT_EDGE`,
one of each induced by each edge. This has the effect that a contributor
maps to at most one contribution.
Test Plan:
Existing unit tests updated.
wchargin-branch: separate-in-out-edge-contributions
Summary:
When we convert a graph to a Markov chain, each cell in the transition
matrix is a sum of edge weights from the given `src` to the given `dst`,
plus the synthetic self-loop needed for stability. Performing this sum
loses information: given the transition matrix, a client cannot
determine how much a particular edge contributed to the score of a node
without redoing the relevant computations. In this commit, we expose the
structure of these contributions (i.e., edges and synthetic loops).
This changes the API of `graphToMarkovChain.js`, but it does not change
the resulting Markov chains. It also does not change the API of
`pagerank.js`. In particular, clients of `pagerank.js` will not have
access to the contributions structure that we have just created.
Test Plan:
Existing unit tests have been updated to use the new API, and pass
without change. An additional test is added for a newly exposed
function, even though this function is also tested extensively as part
of later downstream tests.
In one snapshot, one value changes from `0.25` to `0.25 + 1.7e-16`. The
other values in the enclosing distribution do not change, so I think
that it is more likely that this is due to floating-point instability
than an actual bug. (I’m not sure where exactly I commuted or associated
an operation, but it’s quite possible that I may have done so). To
compensate, I added an additional check that the values in the
stationary distribution sum to `1.0` within `1e-9` tolerance; this check
passes.
wchargin-branch: expose-contributions
Previously, when expanding a node in the cred explorer, it would display
the neighboring nodes, but not any information about the edges linking
to that node. If the same node was reached by multiple edges, this
information was not communicated to the user.
As of this commit, it now concisely communicates what kind of edge was
connecting the chosen node to its adjacencies. There's a new `edgeVerb`
method that plugin adapters must implement, which gives a
direction-based verb descriptiong of the edge, e.g. "authors" or "is
authored by".
Test plan:
Unit tests added to the PagerankTable tests, and hand inspection.
Paired with @wchargin
Summary:
This commit adds sliders for each node and edge type (hard-coded for
now), and hooks them up to the cred explorer so that re-running PageRank
uses the newly induced edge evaluator.
Paired with @decentralion.
Test Plan:
We will add tests later. We promise! In the meantime, the results that
appear when you drag a slider and re-run PageRank seem appropriate. For
instance, changing the “Git blob” node type from `0.0` to `-10.0`
results in the Git blobs not dominating the whole view.
wchargin-branch: configurable-weights-ui
Summary:
PageRank wants an _edge evaluator_: a function mapping an edge to its
to-weight and fro-weight. This commit provides functions for creating
edge evaluators based on the high-level, coarse notions of node and edge
types. We use [the formulation described in a comment on #476][1].
Paired with @decentralion.
[1]: https://github.com/sourcecred/sourcecred/issues/476#issuecomment-402576435
Test Plan:
None. We will add tests later. We promise!
wchargin-branch: edge-weight-generators
Summary:
As we add sliders for adjusting the PageRank parameters, we trigger a
bunch of unneeded renders on `PagerankTable`. As `PagerankTable` is a
pure component, we can [mark it and its children as such][1] to see notable
performance improvements: the `Array.from` and `sort` in its `render`
method are showing up on the flamegraph.
[1]: https://reactjs.org/docs/react-api.html#reactpurecomponent
Paired with @decentralion.
Test Plan:
Unit tests pass, whereas if we instead implement `shouldComponentUpdate`
by `return false` then the interaction tests fail. Also, `yarn start`
seems to behave as expected as we switch among different graphs.
wchargin-branch: pure-pageranktable
Summary:
To verify that the correct graph is loaded, we display the graph’s node
and edge count in the UI. As previously implemented, this would be
recomputed on every change, requiring iteration over all nodes and edges
of the graph. On my machine (T440s) and the SourceCred graph, this
induced an ~80ms performance floor for any operations that caused the
app to re-render, which is noticeable.
This commit retains the useful information, but computes it only at
graph load time.
Paired with @decentralion.
Test Plan:
Note that the behavior is unchanged.
wchargin-branch: cache-graph-size
Summary:
See #432. This commit switches the GitHub Markdown parser from matching
regular expressions against the raw post body to matching the same
regular expressions against the semantic text of a Markdown document.
See test cases for `parseReferences` and `parseMarkdown` for more
details.
There are no changes to snapshots or cached GitHub data because the
Markdown in the example repository is simple enough to be properly
parsed by a simple approach that treats everything as literal text.
The change to the “finds numeric references in a multiline string” test
case is to strip leading indentation from the string in question. As
previously written, the test had four spaces of leading indentation on
each line, causing the Markdown parser to interpret it as a code block,
in turn causing our logic to (correctly) omit it entirely. The revised
version of the test has no leading indentation.
Test Plan:
Light unit tests included; these tests are intended to verify that the
actual Markdown logic from `parseMarkdown` is being used, and that file
has more extensive tests. Additionally, `yarn travis --full` passes.
wchargin-branch: github-parse-markdown
Summary:
This commit exposes a function of type `(string): string[]` to
encapsulate the whole Markdown pipeline, from parsing to AST
transformation to text node extraction. Clients of this module do not
need to know about `commonmark`.
Test Plan:
A single comprehensive test case has been added.
wchargin-branch: text-blocks
Summary:
The regular expressions used to detect GitHub references were of the
form `/(?:\W|^)(things)(?:\W|$)/gm`, where the outer non-capturing
groups were intended to enforce a word boundary constraint. However,
this caused reference detection in strings like `"#1 #2"` to fail,
because the putative matches would be `"#1 "` and `" #2"`, but these
matches overlap, and the JavaScript RegExp API (like most such APIs)
finds only non-overlapping matches. Therefore, in a string of
space-separated references of the same kind, only every other reference
would be detected.
A solution is to use a positive lookahead instead of the second
non-capturing group: i.e., `/(?:\W|$)/` becomes `/(?=\W|$)/`. (Ideally,
the first non-capturing group would just be a lookbehind, but JavaScript
doesn’t support those.)
In some cases, using `\b` is a simpler solution. But this does not work
in all cases: for instance, it works for repo-numeric references, but
does not work for numeric references, because the presence of the hash
means that there cannot be an immediately preceding word boundary. For
consistency, I opted to use the lookahead solution in all cases, but any
solution that passes tests is okay with me.
Test Plan:
Regression tests added. They fail before this patch and pass with it.
wchargin-branch: fix-space-separated-github-references
Summary:
The `^` metacharacter only matches the start of a non-initial line when
the `m` (“multiline”) flag is set on the RegExp. This flag was set only
for the GitHub URL RegExp.
Test Plan:
Regression tests added. They fail before this patch and pass with it.
wchargin-branch: fix-sol-github-references
Test plan:
`git grep -i v3` only shows incidental hits in longer strings
`yarn travis --full` passes
`yarn backend` works
`yarn build` works
`yarn start` works
`node bin/sourcecred.js start` works
`node bin/sourcecred.js load sourcecred example-github` works
Paired with @wchargin
Summary:
The bridge introduced in #448 has now served its purpose, and may be
deconstructed. This implements the first part of the last step of the
plan described in that pull request.
Paired with @decentralion.
Test Plan:
After `yarn backend && yarn build`:
- `node bin/sourcecred.js start` works, and
- `yarn start` works, and
- `yarn travis --full` works.
wchargin-branch: demolish-bridge
Test plan:
`node bin/sourcecred.js load sourcecred example-github` works
`yarn start` works
`node bin/sourcecred.js start-v3` works
`yarn travis --full` passes
Paired with @wchargin
Summary:
This could also be moved into the bridge directory, but this way is
marginally easier, and it doesn’t really matter in the end.
Test Plan:
`yarn backend` followed by `node bin/sourcecredV3.js start-v3` works.
wchargin-branch: start-v3
The `load` command replaces `plugin-load`. By default, it loads data for
all plugins, and does so in parallel using execDependencyGraph. If
passed the optional `--plugin` flag, then it will load data just for
that plugin.
As an implementation detail, when loading all plugins, load calls itself
with the plugin flag set.
Usage:
`node bin/sourcecred.js load repoOwner repoName`
Test plan:
Tested by hand; I blew away my SourceCred directory and then loaded the
example-github repository.
This integrates the PageRank table from #466 into the v3 cred explorer
app, bringing the v3 frontend to better-than-parity with v1!
Test plan:
Some unit tests were included, and running `yarn start` and inspecting
the App reveals that it is working correctly. Loading a PageRank result
and then changing the repository no longer triggers a crash :).
Paired with @wchargin
Ports #265 to the v3 branch, along with some tweaks:
- Only display log score, and normalize them by adding 10 (so that most
are non-negative)
- Change colors to a soothing green
- Improve display, e.g. make overflowing node description text wrap
within the row
Implements most of the tests requested in #269
Test plan: Many unit tests added
Paired with @wchargin
Summary:
This enables plugins to specify different semantic types of nodes, along
with human-readable names. This will be used, for instance, in the cred
explorer, where users may filter to one of these node types.
Paired with @decentralion.
Test Plan:
Flow passes.
wchargin-branch: plugin-node-types
Summary:
This presents a human-readable name for a plugin. It’s not yet used
anywhere.
Paired with @decentralion.
Test Plan:
Flow passes.
wchargin-branch: plugin-name
Test plan:
Run the following commands:
```
node bin/sourcecredV3.js load-plugin-v3 sourcecred example-github --plugin=git
node bin/sourcecredV3.js load-plugin-v3 sourcecred example-github --plugin=github
yarn start
```
Then, navigate in-browser to the v3 cred explorer and load data for
`sourcecred/example-github`. The following messages are printed to
console:
```
GitHub: Loaded graph: 31 nodes, 73 edges.
Git: Loaded graph: 15 nodes, 19 edges.
Combined: Loaded graph: 44 nodes, 92 edges.
```
Paired with @wchargin
Summary:
This enables grabbing the GitHub relational view from disk and
converting it to a graph on the client.
Paired with @decentralion.
Test Plan:
For testing, information about the GitHub graph is printed when you
click the “Load data” button in the UI. Do so.
wchargin-branch: github-plugin-adapter
Summary:
Text input boxes for repository owner and name now appear. “Loading the
data” consists of logging the attempt to the console.
Test Plan:
Run `yarn start`, and note that the inputs are keyed against the same
local store key as their V1 equivalents. Note that clicking “Load data”
prints a message to the console.
Paired with @decentralion.
wchargin-branch: v3-load-data-ui
In GitHub, you can make cross repo references. For example,
sourcecred/sourcecred#459 is one such reference. This commit adds
support for detecting those references and adding them to the GitHub
graph.
Test plan:
See attached unit tests.
This commit adds `loadGitData`, which clones the git repository for a
given repo and saves the corresponding git graph. It also adds that
method to the `loadPlugin` command, so that the following command now
works:
```
$ node bin/sourcecredV3.js load-plugin-v3 sourcecred example-git --plugin=git
```
After running that command, the correct file is present:
```
$ du -sh tmp/sourcecred/data/sourcecred/example-git/git/graph.json
28K /home/dandelion/tmp/sourcecred/data/sourcecred/example-git/git/graph.json
```
The command takes:
| repository | time (s) |
:------------------------- | ----------:
| `sourcecred/example-git` | 1 |
| `sourcecred/sourcecred` | 5 |
| `ipfs/js-ipfs` | 18 |
| `ipfs/go-ipfs` | ∞ (OOM) |
This module exposes a method, `pagerank`, which is a convenient entry
point for taking a `Graph` and returning a `PagerankResult`. This
obviates the need for `src/v1/app/credExplorer/basicPagerank.js`.
Test plan: Unit tests included.
I'm planning to make a `pagerank.js` module that is a clean entry point
for all the graph-pagerank-related code, so it will be cleaner to expose all
the default options there.
Test plan: travis
Paired with @wchargin
If `findStationaryDistribution` is passed `0` as `maxIterations`, then
it should return the initial distribution.
Test plan: see new unit test
Paired with @wchargin
This commit enables paired authorship on GitHub authored entities.
If the entity has the string "Paired with" in the body, followed by a
username reference, that entity will be recorded as having dual
authorship, with the nominal author and the paired-with author being
treated identically in the relational view and the graph.
If there's a need to pair with more than one author, the "Paired with"
signifier may be repeated. The regex matcher is forgiving of
capitalizing the P or W, and an optional colon may be added immediately
after the word "with".
Note that the code assumes that every `TextContentEntity` is also an
`AuthoredEntity`. If that changes, it will cause a type error and we'll
need to refine the code somewhat.
As implemented, it is impossible for the same user to author a post
multiple time; if this is textually suggested (e.g. by a paired-with
reference to the post's nominal author), the extra paired-with
references are silently ignored. Also, having a paired-with reference
suppresses the basic reference (although it is possible to have a post
that is paired with someone, and additionally references them).
Test plan:
Tests have been updated, and the behavior of the parser is extensively
tested. For an end-to-end demonstration, I've also added a unit test in
the relational view that verifies that sourcecred/example-github#10 has
two authors. You can also see that the graph snapshot has updated to
include additional authorship edges (and that corresponding reference
edges have disappeared).
Closes#218
Summary:
This implements Step 3 of the plan described in #448.
Test Plan:
Run `yarn start` and navigate to `/v3` (by clicking the nav link).
wchargin-branch: branch-v3
Summary:
The bridge now lets you select any version of the app that you want, as
long as that’s V1, because that’s the only version that exists. We’ll
add a V3 version shortly.
This implements Step 2 of the plan described in #448.
Test Plan:
In `yarn start` and `node bin/sourcecred.js start`, note that navigating
to `/` redirects to `/v1`, and that the cred explorer works.
wchargin-branch: bridge-select
Summary:
Our build system doesn’t make it easy to have two separate React
applications, which we would like to have for the V1 and V3 branches.
Instead, we’ll implement a bridge to maintain compatibility.
The plan looks like this:
1. Change the app from pointing to V1 to pointing to a bridge
2. Move the router into the bridge and move the V1 app from the `/`
route to the `/v1` route (e.g., `/v1/explorer`)
3. Add a V3 app under the `/v3` route
4. ???
5. Delete the V1 app and remove it from the bridge
6. Delete the bridge and move the V3 app from the `/v3` route to `/`
This commit implements Step 1.
Test Plan:
To verify that the bridge is in fact showing, apply
```diff
diff --git a/src/bridge/app/index.js b/src/bridge/app/index.js
index 379e289..72e784c 100644
--- a/src/bridge/app/index.js
+++ b/src/bridge/app/index.js
@@ -9,5 +9,11 @@ const root = document.getElementById("root");
if (root == null) {
throw new Error("Unable to find root element!");
}
-ReactDOM.render(<V1App />, root);
+ReactDOM.render(
+ <React.Fragment>
+ <h1>Hello</h1>
+ <V1App />
+ </React.Fragment>,
+ root
+);
registerServiceWorker();
```
and say “hello” back to the app.
wchargin-branch: bridge
Summary:
This provides a command-line entry point `load-plugin-v3` (which will
become `load-plugin` eventually), which fetches the GitHub data via
GraphQL and saves the resulting `RelationalStore` to disk.
A change to the Babel config is needed to prevent runtime errors of the
form `_callee7` is not defined, where `_callee7` is a gensym that is
appears exactly once in the source (in use position, not definition
position). I’m not sure exactly what is causing the error or why this
config change fixes it. But while this patch may be fragile, I don’t
think that it’s likely to subtly break anything, so I’m okay with
pushing it for now and dealing with any resulting breakage as it arises.
Paired with @decentralion.
Test Plan:
Run `yarn backend`, then run something like:
```
node bin/sourcecredV3.js load-plugin-v3 \
sourcecred example-github --plugin github
```
Inspect results in `SOURCECRED_DIR/data/OWNER/NAME/github/view.json`,
where `SOURCECRED_DIR` is `/tmp/sourcecred` by default, and `OWNER` and
`NAME` are the repository owner and name.
This example repository takes about 1.1 seconds to run. The SourceCred
repository takes about 45 seconds.
wchargin-branch: cli-load-plugin
Summary:
Due to oclif’s structure, this entry point shares its `commands`
directory with that of the V1 entry point. We’ll therefore add commands
like `start-v3` as we go.
Test Plan:
`yarn backend` works, and `node bin/sourcecredV3.js start` launches the
V1 server.
wchargin-branch: v3-cli
This adds additions and deletions to the v3 Pull data model, and also
uses them in the pull descriptions.
It's basically a port of #340 to v3.
Test plan: Snapshots
This adds methods for serializing the GitHub RelationalView.
We have not put in the work to ensure that these methods generate
canonical data. Getting the issues in a different order, or finding
references in a different order, can change the JSON output even if the
resulting repositories are equivalent.
@decentralion think it's not worth putting in the effort, since we may
switch to a SQL database soon anyway.
Test plan: travis
Paired with @wchargin
Now that we want to implement RelationalView de/serialization, we need a
way to construct one without adding data to it.
Now that we're allowing `addData` to be called explicitly, we also want
to make sure it's idempotent, which necessitated a small change to
reference handling. A new test verifies idempotency.
Test plan: travis
Paired with @wchargin
Summary:
This makes progress on #432. We’d like to look for GitHub references
only within each text node of the Markdown AST. But there are two
complications:
- Text nodes split across formatting, and it’s valid for someone to
write `*Paired* with @decentralion, but *tested* independently`, or
`**Closes** #12345`, or something.
- Sometimes contiguous blocks of text expand to multiple text nodes,
because of how CommonMark approaches smart punctuation. For
instance: the document `It's got "punctuation" and stuff!` has eight
text nodes ([demo][1]).
In this commit, we introduce functions `deformat` and `coalesceText` to
solve these problems. (They go together because `coalesceText` is useful
for testing `deformat`.)
[1]: https://spec.commonmark.org/dingus/?text=It%27s%20got%20%22punctuation%22%20and%20stuff!
wchargin-branch: markdown-deformat
This commit adds a `description` method that takes a GitHub entity, and
returns a description of that entity. Based on the work in #261.
In contrast to the implementation in #261:
- It won't crash on entities without an author (although we don't have a
test case for this; see #389).
- It handles multi-authors reasonably (although we can't test that, as
we haven't implemented multi-authorship yet; see #218).
Test plan:
Inspect snapshot to see some examples.
Currently, GitHub tests load example data with ad-hoc methods. It makes
it easy for the author of a new test file to forget to clone the test
data (and risk cross-test-file state pollution), or to forget to apply
the correct typing.
This commit factors a shared `example` module which provides a safe way
to access the example data, along with some convenient helpers for
constructing a graph or relational view.
Test plan:
`yarn travis`
Fixes#430.
The Git and GitHub plugins have folders that contain small example data,
as used for tests and snapshots. These folders were called `demoData`
which is misleading since the data isn't used for demos. The folders
themselves contained files called "example", like "example-github.json"
or "exampleRepo.js". Renaming the folders to `example` is cleaner.
Test plan:
`yarn travis --full` passes.
This is a very simple extension of #431 to use the new reference
detection logic added in #429.
Test plan:
Inspect snapshot change for plausibility. Note that the snapshot adds
exactly 16 reference edges, which is the same as the number of
references in the reference snapshot test.
This commit modifies `github/createGraph` to use the `RelationalView`
class created in #424. The code is now much cleaner.
I also fixed some `any`s that were leaking in our test code (due to use
of runtime require for GitHub example data). These anys were discovered
by bumping into uncaught type errors. :)
This commit supersedes #413 and #419.
Test plan:
Observe that the graph snapshot was not changed.
For every `TextContentEntity` (`Issue`, `Pull`, `Review`, `Comment`),
this commit adds a `references` method that iterates over the entities
that the text content entity references.
For every `ReferentEntity` (actually, every entity), this commit adds a
`referencedBy` method which iterates over the text content entities that
reference that referent entity.
This method also adds `referentEntities` and `textContentEntities`
methods to the `RelationalView`, as they are used in both implementation
and test code.
Test plan:
The snapshot tests include every reference, in a format that is very
convenient for inspecting the ground truth on GitHub. For every
reference, it's easy to check that the reference actually exists by
copying the `from` url and pasting it into the browser. I've done this
and they check out. (It's not easy to prove that there are no missing
references, but I'm pretty confident that this code is working.)
Unit tests ensure that the `references` and `referencedBy`
methods are consistent.
Based on offline design discussion with @wchargin, we've decided to
upgrade the `RelationalView` to be *the* comprehensive source for GitHub
data inside SourceCred. The `RelationalView` will contain the full
dataset, including parsed relational information (such as
cross-references between GitHub entities). Then, we will project our
GitHub graph out of the `RelationalView`.
To that end, the `RelationalView` no longer exports raw data blobs.
Instead, it exports nice classes: `Repo`, `Issue`, `Pull`, `Review`,
and `Userlike`. These classes have convenient methods for accessing both
their own data and related entities, e.g. `repo.issues()` yields all
the issues in that repo.
This is effectively a port of #170 into the v3 API. The main difference
is that in v1, the Graph contained this data store, whereas in v3, we
will use this data store to generate the graph.
This supersedes #418.
Test plan:
The snapshot tests are quite readable.
Summary:
This is based on the V1 file `basicPagerank.js`. The API is necessarily
changed for the new graph format, and we export additional utilities
compared to the previous version of the module (useful for testing and
serialization). We also improve the implementation to make it simpler
and easier to understand.
Test Plan:
Unit tests included.
wchargin-branch: v3-graph-markov-chain
Summary:
We’d like to use this test case to generate a Markov chain, which
requires that it not be local to the `graph.js` tests.
Test Plan:
Existing unit tests suffice.
wchargin-branch: expose-advanced-graph
Summary:
This code is independent of the graph abstraction, and so is mostly
copied. The only change is to the structure of the test code (we now
prefer to wrap everything in a big `describe` block with an absolute
path to the module under test).
Test Plan:
Unit tests included.
wchargin-branch: v3-markov-chain
This code is about parsing references out of text, so `parseReferences`
is a better name.
The code that consumes this logic to find all the references in the
GitHub data shall be rightly called `findReferences`
Test plan:
`yarn travis`
Summary:
Closes#417. Submodule commits are dead; long live commits. The ontology
is now:
- A tree includes tree entries.
- A tree entry may have a blob as contents.
- A tree entry may have a tree as contents.
- A tree entry may have a commit as contents.
Test Plan:
Existing unit tests suffice, especially `#commits yields all commits`.
wchargin-branch: git-remove-submodule-commits
Summary:
Submodule commits need not have associated tree objects, in case the
repository to which they belong does not exist in our graph. We’d like
to represent submodule commits as actual commits, which necessitates
this change. See #417 for context.
Test Plan:
Existing unit tests suffice.
wchargin-branch: git-affine-trees
Summary:
This commit adds logic to create the Git graph, modeled after the GitHub
graph creator in #405. In this commit, we do not include the
corresponding porcelain; a Git `GraphView` will be added subsequently.
Kudos to @decentralion for suggesting in #187 that I write the logic to
detect BECOMES edges against the high-level data structures. Due to that
decision, the logic and tests are copied directly from the V1 code
without change, because the high-level data structures are the same. The
new code is exactly the body of the `GraphCreator` class.
Test Plan:
Verify that the new snapshot is likely equivalent to the V1 snapshot,
using the heuristic that the two graphs have the same numbers of nodes
(59) and edges (84). (I have performed this check.)
wchargin-branch: git-v3-create-graph
The `RelationalView` maps the GitHub GraphQL response data into a View
class, which makes it easy to access pieces of GitHub data by their
corresponding `StructuredAddress`.
This will be a valuable companion to the graph, making it possible to
access GitHub node data like the title or body of an issue via the
issue's address. This basically is the supplement to the GitHub graph
that includes the "payloads" from our v1 Graph.
It will also make creating the GitHub graph a lot more convenient,
although I've left that for another commit.
Designed with feedback from @wchargin.
Note: The `RelationalView` objects have a `nominalAuthor` rather than
`author`, so as to distinguish between authorship in the GitHub data
model (entities have at most one author) and in the SourceCred model
(entities may have multiple authors).
Test plan:
Inspect the included snapshots for reasonability, and run unit tests.
The code will be refactored so that references are expressed in terms of
the GitHub node address code; the implementation is copied first so that
the review will be cleaner.
Test plan:
`yarn travis` passes.
Summary:
The public method `checkInvariants` on graph is now cached. The cache is
invalidated when the graph is modified via the public API. As a result
of this change, the time of `yarn ci-test --testPathPattern src/v3/`
decreases from 5.631s to 3.866s (best-of-three timing, but low variance
anyway). This effect becomes much more pronounced as higher-level APIs
check their own invariants by themselves indirectly invoking the graph’s
`checkInvariants` method many times.
Test Plan:
Existing unit tests have been adapted and extended. Tests for the
invariant checking have been updated to call the internal, uncached
method, and new tests have been added to check that the caching behavior
is correct.
wchargin-branch: cache-graph-invariants
This commit:
- adds `github/createGraph.js`
- which ingests GitHub GraphQL response
- and creates a GitHub graph
- adds `github/graphView.js`,
- which takes a Graph
- and validates that all GitHub specific node and edge invariants hold
- every github node may be parsed by `github/node/fromRaw`
- with the right node type
- every github edge may be parsed by `github/edge/fromRaw`
- with the right edge type
- with the right src address prefix
- with the right dst address prefix
- every child node has exactly one parent
- of the right type
- and provides convenient porcelain methods for
- finding repos in the graph
- finding issues of a repo
- finding pulls of a repo
- finding reviews of a pull
- finding comments of a Commentable
- finding authors of Authorables
- finding parent of a ChildAddress
- tests `createGraph`
- via snapshot testing
- by checking the GraphView invariants hold
- tests `graphView`
- by checking individual entities in the example-git repository have
the proper relationships
- by checking that for every class of invariant, errors are thrown if
the invariant is violated
Test plan:
- Extensive unit and snapshot tests added. `yarn travis` passes.
Summary:
This is modeled after the GitHub edge module format. In particular, the
whole length encoding garbage is directly copied. As in that module, we
decline to test the error paths.
Test Plan:
Unit tests added; run `yarn travis`. Snapshots are readable.
wchargin-branch: git-v3-edges
Summary:
This is modeled after the GitHub node module format, with the obvious
alterations plus a bit more type safety in the implementation of `toRaw`
(namely, we check `type` exhaustively).
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: git-v3-nodes
Summary:
Many files are unchanged. Some files have had paths updated, or new
build/test targets added.
The `types.js` file includes payload type definitions. These are
technically independent of the graph abstraction (i.e., nothing from V1
is imported and the code all still works), but it of course implicitly
depends on the V1 model. For now, we include the entirety of this file,
just so that we have a clean copy operation. Subsequent commits will
strip out this extraneous code.
Suggest reviewing with the `--find-copies-harder` argument to Git’s
diffing functions.
Test Plan:
Running `yarn travis --full` passes. Running
./src/v3/plugins/git/demoData/synchronizeToGithub.sh --dry-run
yields “Everything up-to-date”.
wchargin-branch: git-v3-copy
Summary:
We had `edgeExamples`, wherein we constructed examples of edge addresses
(not actual edges) by manual instantiation. We checked that these
matched snapshots, and then we also called `createEdge` a bunch to
create actual edges, and checked that those matched snapshots, too.
Consequently, we had twice as many snapshots as we needed, and also
defined twice as many edge addresses as we needed.
Test Plan:
Note that snapshot contents are either deleted or unchanged.
wchargin-branch: simplify-github-edge-tests
Test Plan:
Observe that the new snapshots are easier to read. Might as well make
sure that they encode the same data as the old snapshots, too. Note that
the backslash character no longer appears in this snapshot file. :-)
wchargin-branch: use-edgeToParts
Summary:
We have `edgeToString`, which formats edges as nicely human-readable
strings. However, these strings have some quotes in them, and so when
they are themselves stringified (e.g., as part of a Jest snapshot), they
become much harder to read. We thus introduce `edgeToParts` to make our
snapshots more readable.
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: add-edgeToParts
Summary:
In #394, we uppercased the constants for GitHub node types. However, we
were using string literals instead of constants in the test cases. These
test cases were supposed to cover every error path, but instead ended up
just covering the “bad type” error path many times.
Any one of the following would have prevented this regression:
1. using string constants instead of literals in the test case;
2. throwing and checking more precise error messages; or
3. being alerted that coverage decreased as a result of the change.
In this commit, we enact the first of these options. I’m open to adding
a coverage bot, but don’t feel strongly about it at this time.
Test Plan:
Running `yarn coverage` now shows 100% coverage for the `nodes.js`
module, whereas previously almost all `throw fail();` lines were
uncovered (and the branch coverage was just 76%).
wchargin-branch: fix-github-node-error-tests
- Switch string constant node and edge types (e.g. "REPO") to exported
consts (eg `export const REPO_TYPE`).
- Add (and internally use) a `_Prefix` psuedomodule which contains
per-type address prefixes
- Test that constructing a StructuredAddress with the wrong type is an
error.
Test plan:
Unit tests pass, snapshots unchanged.
Paired with @wchargin
Summary:
This fixes up all instances of `fromParts([])` that are not in
`address.js` or `address.test.js`.
Paired with @decentralion.
Test Plan:
Running `git grep --name-only -F 'fromParts([])'` yields only the two
modules listed above. Existing unit tests suffice for correctness.
wchargin-branch: use-address-empty
Summary:
This can make invocations of `FooAddress.fromParts([])` a bit more
succinct.
Paired with @decentralion.
Test Plan:
Unit tests added. Run `yarn travis`.
wchargin-branch: address-empty
Similar to #390, we now allow filtering the results from `Graph.edges`
by address prefixes. It's a little more complicated than #390, as we
allow filtering by src, dst, or address.
Test plan:
Unit tests added. `yarn travis` passes.
Paired with @wchargin
Simple API addition to match v1/v2 semantics.
In the future, we can perf optimize this if we switch graph to
store nodes organized by shared prefixes.
Test plan:
Unit tests were added. `yarn travis` passes.
Paired with @wchargin
Summary:
This module includes a raw edge type, a structured edge type, and edge
creation functions that take source and destination and create an edge.
Test Plan:
Unit tests added. These cover all of the successful cases, and none of
the unsuccessful cases. We plan to refactor this code Soon™, and it is
hard to see how to nicely factor the tests without just testing the same
code paths over and over.
wchargin-branch: github-edges
Summary:
First, we rename the module itself from `address` to `nodes`: we’d like
to put the edge functions in a parallel `edges` module instead of
cramping it into this one, so it stands to reason that this one should
be called `nodes`.
We also rename the `GithubAddressT` type to `RawAddress`, so that the
module exports `RawAddress` and `StructuredAddress`. The functions then
have much better natural names of `toRaw` and `fromRaw`.
Test Plan:
Existing unit tests suffice.
wchargin-branch: rename-nodes
One of the slight modifications we've made in v3 is to effect the
following renames (as implemented in #380):
PullRequest -> Pull
PullRequestReview -> Review
PullRequestReviewComment -> ReviewComment
This commit just changes the rest of the github code in v3 to follow the
new convention.
Test plan:
`yarn travis --full` passes.
Summary:
This has two primary benefits:
- Humans can look at this snapshot file to see what’s being queried,
or to manually issue a query.
- When we change the programmatically generated query, we can easily
see what the results are in the GraphQL output. This makes it easy
to verify that a change is correct.
Test Plan:
None.
wchargin-branch: snapshot-query
Summary:
This module exposes a structured type `StructuredAddress`, an embedding
`GithubAddressT` of this type into the `NodeAddress` layer, and
functions to convert between the two.
Paired with @wchargin.
Test Plan:
Unit tests added, with full coverage. Snapshots are easily readable.
This commit copies the following logic necessary for downloading GitHub
data into v3. Minimal changes have been made to accomodate the new path
structure.
Test plan:
- Manually ran plugins/github/fetchGithubRepoTest.sh and verified that
it can correctly pass and fail
- Added the v3 github repo test to `yarn travis --full`
- Ran `yarn travis --full` and it passed
Paired with @wchargin
Summary:
We’ve added a comment directly on a pull request.
Paired with @decentralion.
Test Plan:
`yarn travis --full` passes.
wchargin-branch: update-example-github
Summary:
Each of the invariants listed at the top of the `Graph` class is now
explicitly checked by `checkInvariants`, which is called at the end of
each `Graph` method during tests only. This is powerful: it means that
not only do our tests for `Graph` test the graph, but also any tests
that depend on `Graph`—e.g., plugin code—will give us extra invariant
testing on `Graph`. As noted in a comment, if this becomes bad for
performance, we can blacklist expensive tests or whitelist tests that we
care about.
A graph method may assume that the graph invariants hold before the
method is invoked. Within the body of a graph method, invariants may be
violated, but the method must ensure that the invariants hold
immediately before it returns or yields. A consequence of this is that
if a graph function internally calls a public function (e.g., `addEdge`
might call `hasNode` to check that the source and destination exist),
then it must ensure that the invariants hold before the internal call.
This is not an “implementation detail” or “caveat”; it is simply part of
the interface of public functions. It is legal and reasonable for
private helper functions to explicitly not expect or not guarantee that
particular invariants hold, and in this case the exception should be
documented. (This is not yet the case in any of our code.)
Finally, note that the `checkInvariants` method should not call any
public methods, because those methods in turn call `checkInvariants`. If
this becomes a huge pain, we can look into implementing some kind of
“only check invariants if the invariants are not actively being
checked”, but I’d much rather not do so if we don’t have to.
Test Plan:
Running `yarn coverage` indicates that each of the failure cases is
verified. In principle, I’d be willing to add a test that parses the
source code for `graph.js` and verifies that each `return`, `yield`, or
implicit return is preceded by an invariant check. But I don’t really
want to implement that right now.
wchargin-branch: automatic-invariants
The `advancedGraph` is an example graph defined in `graph.test.js`.
It shows off many tricksy features, like having loop edges, multiple
edges from the same src to same dst, etc. We also provide two ways of
constructing it: `graph1` is straightforward, `graph2` adds tons of
spurious adds, removes, and odd ordering. This way we can ensure that
our functions treat `graph1` and `graph2` equivalently.
Test plan:
New unit tests are added verifying that `equals`, `merge`, and
`to/fromJSON` handle the advanced graph appropriately.
The serialization scheme uses `IndexedEdge`s:
```js
type Integer = number;
type IndexedEdge = {|
Address: EdgeAddressT,
srcIndex: Integer,
dstIndex: Integer,
|}
```
The nodes are first sorted. Then, we generate indexed edges from the
regular edges by replacing each node address with its index in the
sorted order. This encoding reduces the number of addresses serialized
from `n + 3e` to `n + e` (where `n` is the number of nodes and `e` is
the number of edges).
This is based on work in #295, but in contrast to that PR, we do not
index the in-memory representations of graphs. Only the JSON
representation is indexed.
Test plan:
Unit tests added. A snapshot test is also included, both to make it easy
to inspect an example of a JSON-serialized graph, and to ensure
backwards-compatibility. (The snapshot likely should not change
independent of the VERSION string.)
* Implement `Graph.merge`
Tests are mostly copied over from the v2, as implemented in #320.
Some new tests were added, e.g. checking that Merge correctly handles
10 small graphs combined.
Test plan:
See unit tests.
* Reimplement `Graph.copy` using `Graph.merge`
Test plan:
Existing unit tests suffice
Suggested by @wchargin
Tests are mostly copied over from the v2, as implemented in #320.
Some new tests were added, e.g. checking that Merge correctly handles
10 small graphs combined.
Test plan:
See unit tests.
Summary:
The previously listed invariants were weak on two counts. First, it was
unstated that the keys of `_inEdges`, `_outEdges`, and `_nodes` should
coincide. Second, the “exactly once” condition on edge inclusion had the
unintentional effect that edge absent in `_edges` but present twice or
more in each of `_inEdges` and `_outEdges` would not violate the
invariant.
Test Plan:
Stay tuned.
wchargin-branch: strengthen-invariants
The implementation is quite simple. The tests are somewhat more
comprehensive than in v2 or v1. We now test that copies are equal to the
original in a variety of situations.
Test plan:
Unit tests added.
It turns out we forgot to add this to the API, so I added it. I also
implemented it. The tests are pretty thorough; as an added innovation
over our previous tests (e.g. in #312 and #61), we now consistently test
that equality is commutative.
In contrast to our previous implementations, this one is massively
simpler. That's an upside of using primitive ES6 data structures to
store all of the graph's information... which is itself an upside of not
trying to store arbitrary additional information in the graph. Now we
can just do a deep equality check on the underlying nodes set and edges
map!
We might be able to performance tune this method by taking advantage of
the structure of our nodes and edges. This should suffice for now,
though.
Paired with @wchargin
Test plan:
Unit tests were added. Run `yarn travis`
This is the change that puts the Graph into `Graph` :) We add `_inEdges`
and `_outEdges`, and use them to identify the neighbors of a given
`node`.
The API is implemented pretty uncontroversially. (We've done this a few
times before: see #319, #162). As with other iterators, we check for
comodification and error if this has occurred.
The tests cover some interesting cases like absent nodes, loops, and
multiple edges with the same src and dst.
Test plan:
Unit tests have been added. Run `yarn travis`.
Paired with: @wchargin
Summary:
A client of `Graph` is able to (e.g.) invoke `nodes()` to get a node
iterator, iterate over some of the nodes, then change the nodes by
adding or removing the nodes on the original graph. The semantics of
what to do here are not clear: ES6 specifies semantics for `Map` and
`Set`, but they have counterintuitive consequences: for instance, you
can get a `Set` iterator to yield the same value twice. Most collection
implementations in the Java standard library prohibit this entirely. In
this commit, we adopt the latter approach.
A caveat of this implementation is that a graph object may not be
mutated more than 2^53 − 1 (`Number.MAX_SAFE_INTEGER`) times. Clients
who need to mutate a graph more than nine quadrillion times are
encouraged to reconsider their data model.
Paired with @decentralion.
Test Plan:
Unit tests added. Run `yarn travis`.
wchargin-branch: comod-check
Adding a conflicting edge (i.e. one with the same address, but different
`src` or `dst`) is an error. However, our coverage has determined that
the behavior isn't tested. This commit adds that test.
Test plan:
Only change is adding a new test. Verify `yarn travis` passes.
Summary:
These are superseded by the unified implementation in `address.js`.
Test Plan:
Existing unit tests suffice.
wchargin-branch: remove-legacy-address
Summary:
This is basically a textual substitution.
Test Plan:
Existing unit tests suffice. Note that `_address.js` (with underscore)
is no longer imported except from its own tests.
wchargin-branch: unified-addresses-in-graph
Summary:
This implements the following functions for the unified addresses:
- assertions: `assertValid`, `assertValidParts`
- injection: `fromParts`
- projection: `toParts`
(These functions depend on each other for testing, so we implement them
together.)
Test Plan:
Unit tests included. Run `yarn travis`.
wchargin-branch: address-assertion-injection-projection
Summary:
This commit implements all the code needed for the top-level
`makeAddressModule` function, without implementing any of the address
functions. This mostly comprises handling of errors in the module
options.
Test Plan:
Unit tests added. Run `yarn travis`.
wchargin-branch: address-error-handling
Summary:
We have `NodeAddress` and `EdgeAddress`, which are opaque aliases of
`string` each with separate associated functions. We really want to keep
this setup: having the address types be structurally distinct is very
nice. But currently the implementation is extremely repetitive. Core
functionality is implemented twice, once for nodes and once for edges.
The test code is even worse, as it is forced to include ugly, hacky,
parametric generalizations to test both implementations—which are really
the same code, anyway!
In this commit, we introduce a system to unify the _implementations_
while keeping the _APIs_ separate. That is, users still see separate
opaque types `NodeAddressT` and `EdgeAddressT`. Users now also see
separate modules `NodeAddress` and `EdgeAddress`, each of which
implements the same interface for its appropriate type. These modules
are each implemented by the same address module factory.
To get this to work, we clearly need to parameterize the module type
over the address type. The problem is getting this to work in a way that
interacts nicely with the opaque types. The trick is to let the factory
always return a transparent module at type `string`, but to then
specialize the type of the resulting module in the module in which the
underlying type of the opaque type is known.
This commit includes specifications for all functions that are in the
current version of the API, but includes only as much implementation
code as is needed to convince me that tests and Flow are actually
running (i.e., very little). I’ll send implementations in separate PRs
for easier review.
The preliminary modules created in this commit _are_ exported from the
graph module, even though they are incomplete. This is so that we can be
sure that nothing will catch fire in Flow-land when we try to export
them (which is plausible, given that we have nontrivial interactions
between opaque types and parametric polymorphism).
Test Plan:
Unit tests included. Run `yarn travis`.
wchargin-branch: address-unified-foundations
Summary:
In actual code, we almost always call `neighbors` with a specified
direction. Usually, you want to inspect some relation like “parents of
this commit” or “GitHub nodes referenced by this comment”, and so the
edge direction matters. In each of the above cases, forgetting to
include the direction would introduce a bug: you’d get parents _and
children_ of a commit, or GitHub nodes referenced by _or that refer to_
a comment. It’s easy to forget this when writing the code, so we prefer
to make an explicit direction required, and allow people to specify
`Direction.ANY` in the case that that is what they want.
(In other words: we want to make the common thing easy, the uncommon
thing possible, and the wrong thing impossible.)
A similar situation holds for filters. By forcing clients to think about
what kinds of edges they want to follow to what kinds of nodes, we
encourage them to write more robust code. As before, if clients do want
to consider all nodes or all edges, they can pass the appropriate empty
address (`nodeAddress([])` or `edgeAddress([]`), which is a prefix of
every address.
Therefore, we require that clients pass an `options` object to
`neighbors`, and we furthermore require that each of the three options
be present.
Paired with @decentralion, in spirit.
Test Plan:
None; this changes the API for a function that has no implementation or
clients.
wchargin-branch: neighbors-options-required
Summary:
This saves clients from having to pollute their global namespace with
`IN` and `OUT` (and, soon, `ANY`). Calls now look like:
```js
myNode.neighbors({direction: Direction.OUT});
```
Callers who prefer to pollute their namespaces are of course still
welcome to do so, at the cost of one `const {IN, OUT} = Direction`.
Test Plan:
New unit tests for frozenness included.
wchargin-branch: direction-enum
Summary:
These functions can be used for address filtering: finding all nodes
owned by a plugin, or all edges of a certain plugin-type, or similar.
Clients can implement this in terms of `toParts`, but when the
underlying type of the opaque type is known there exists a simpler and
more efficient implementation.
Test Plan:
Unit tests added. Run `yarn travis`.
wchargin-branch: address-prefix
Summary:
This fixes an organizational error on my part: the assertions were not
in `it`-blocks, so (a) no test cases were listed, and (b) assertion
failure would have been considered an error in the test suite itself.
(This wouldn’t have led to tests spuriously passing, though.)
Test Plan:
Note the new test case structure:
```
nodeToString
errors on
✓ null base input
✓ undefined base input
✓ wrong kind
works on
✓ the empty address
✓ the address with one empty component
✓ a normal address
edgeToString
errors on
✓ null base input
✓ undefined base input
✓ wrong kind
works on
✓ the empty address
✓ the address with one empty component
✓ a normal address
```
wchargin-branch: tostring-test-case-structure
On larger repos (e.g. `ipfs/js-ipfs` and `ipfs/go-ipfs`) our GitHub
query tends to fail with an opaque (possibly timeout) error message.
Based on a discussion with @mikeal, it sounds like this is probably
GitHub killing queries that "take too long". This commit reduces our
default page sizes so that we make more smaller queries. Empirically, it
seems to help, and it's very unlikely to break anything.
Test plan:
Run
```
yarn backend
node bin/sourcecred.js plugin-graph --plugin=github ipfs js-ipfs \
> graph.json
```
before and after this change. Before, it fails; after, it succeeds.
This commit implements the following edge related methods on graph:
- `Graph.addEdge(edge)`
- `Graph.hasEdge(address)`
- `Graph.edge(address)`
- `Graph.edges()`
We've decided to enforce an invariant that for every edge, its `src` and
`dst` nodes must be present in the graph. As such, `Graph.addEdge` may
error if this condition is not true for the proposed edge, and
`Graph.removeNode` may error if any edges depend on the node being
removed. This invariant is documented via comments, and checked by the
test code.
Test plan:
Extensive unit tests have been added. Run `yarn travis`.
Paired with @wchargin
Summary:
If you just print out an address, depending on output context the NUL
separators may just disappear, which is really bad. Using `stringify` is
better, but causes these characters to appear as `\u0000`, which is not
very pretty.
This commit adds pretty-printing functions that clearly display the
addresses. A node address is printed as:
nodeAddress(["array","of","parts"])
An edge address is printed as:
edgeAddress(["array","of","parts"])
An edge in the graph is printed as:
{address: edgeAddress(["one"]), src: nodeAddress(["two"]), dst: nodeAddress(["three"])}
Note that each of these is a valid JavaScript expression that evaluates
to the argument that was originally converted to string.
Paired with @decentralion.
Test Plan:
Unit tests added. Run `yarn travis`.
wchargin-branch: node-edge-tostring
Summary:
Now, a client calling `assertNodeAddress` can indicate the context to
clients, like `assertNodeAddress(foo.bar.node, "foo.bar.node")`.
Paired with @decentralion.
Test Plan:
Unit tests updated. Existing graph code does not need to be updated,
because the default values work fine.
wchargin-branch: assertion-context-parameters
Summary:
Using `throw err;` was not useful when the error’s message was an
object: `[object Object]` would be printed to the console, in lieu of
any useful information. We now additionally use `console.error` to print
the full object contents.
Test Plan:
Apply the following patch:
```diff
diff --git a/src/v1/cli/commands/combine.js b/src/v1/cli/commands/combine.js
index b60f91e..2fc4061 100644
--- a/src/v1/cli/commands/combine.js
+++ b/src/v1/cli/commands/combine.js
@@ -24,6 +24,7 @@ export default class CombineCommand extends Command {
" where each GRAPH is a JSON file generated by plugin-graph";
async run() {
+ Promise.reject({a: {b: {c: {d: {e: {f: "gee"}}}}}});
const {argv} = this.parse(CombineCommand);
combine(argv);
}
```
Then, run `yarn backend && node bin/sourcecred.js combine`, and note
that the full contents of the object are printed. (If the `util.inspect`
is replaced with a simple `console.error`, then the object’s
representation will be truncated, so the `util.inspect` is important.)
wchargin-branch: rejection-error-full-contents
Summary:
For now, this contains the logic to register an `unhandledRejection`
error. I’ve removed all instances of those handlers, and `require`d this
module at every top-level entry point. (The individual CLI commands had
the handler before, but didn’t need it; conversely, the top-level CLI
entry point did not have the handler, but should have.)
Test Plan:
To test that the CLI commands still error on unhandled rejections, apply
the following patch:
```diff
diff --git a/src/v1/cli/commands/combine.js b/src/v1/cli/commands/combine.js
index b60f91e..d55b965 100644
--- a/src/v1/cli/commands/combine.js
+++ b/src/v1/cli/commands/combine.js
@@ -24,6 +24,7 @@ export default class CombineCommand extends Command {
" where each GRAPH is a JSON file generated by plugin-graph";
async run() {
+ Promise.reject("wat");
const {argv} = this.parse(CombineCommand);
combine(argv);
}
```
Then run `yarn backend` and `node bin/sourcecred.js`, and note that the
rejection handler is triggered.
wchargin-branch: unify-entry
Right now, the cred explorer attempts to display every node in the
graph. As graphs easily grow to O(100k) nodes, this is not kind to the
browser.
This commit limits display to the first 100 entries. Since they are
sorted by percieved importance, and it's easy to filter by type (e.g. to
find all users), this limitation is fine in practice.
Test plan:
Run the cred explorer on a larger repo and observe that the performance
is enormously improved. No unit tests added, as the cred explorer is a
prototype which is basically untested (#269)
Summary:
A `RecursiveTable` shows the tree-view from a single node; a
`RecursiveTables` shows multiple `RecursiveTable`s, and is used both in
the recursion and at the top level. This is useful because we’ll want to
make changes to both entry points, for things like pagination. In
particular, this makes #342 nicer.
Test Plan:
Note that the behavior is unchanged. Then, apply
```diff
diff --git a/src/v1/app/credExplorer/pagerankTable.js b/src/v1/app/credExplorer/pagerankTable.js
index 624bd71..e343e95 100644
--- a/src/v1/app/credExplorer/pagerankTable.js
+++ b/src/v1/app/credExplorer/pagerankTable.js
@@ -220,6 +220,7 @@ class RecursiveTables extends React.Component<RecursiveTablesProps> {
const y = pagerankResult.get(b).probability;
return y - x;
})
+ .slice(0, 10)
.map((address) => (
<RecursiveTable
depth={depth}
```
and note that (a) the root view has only ten entries, and (b) any
expanded subview also has only ten entries.
wchargin-branch: recursive-tables
Our GitHub renderer tries to display the author name for a comments or
pull request reviews. However, as we've already discovered (#228), not
every GitHub post has an author available. This breaks attempts to demo
the v1 cred explorer on some interesting repos, e.g. ipfs/js-ipfs.
Since the v1 code is all deprecated, it's effectively a demo branch
pending a rewrite of the cred explorer to use the v3 graph. As such, I
didn't include unit tests.
Test plan:
Attempt to run the cred explorer on `ipfs/js-ipfs` prior to this PR.
Observe that it loudly fails. Attempt to run it after this PR, and
that it succeeds.
Consider a case where a user wants to construct many addresses sharing a
common prefix. For example, every GitHub entity may have an address
starting with the component sequence `["sourcecred", "github"]`.
Currently, users can implement this with `toParts`:
```js
const GITHUB_ADDRESS = Address.nodeAddress(["sourcecred", "github"]);
function createGithubAddress(ids: $ReadOnlyArray<string>): NodeAddress {
return nodeAddress([...Address.toParts(GITHUB_ADDRESS), ...ids]);
}
```
This is unfortunate from a performance standpoint, as we needlessly
perform string and array operations, when under the hood this is
basically a string concatenation.
This commit fixes this by adding functions `nodeAppend` and
`edgeAppend`, which take a node (resp. edge) and some components to
append, returning a new address. Consider how straightforward our
example case becomes:
```js
const GITHUB_NODE_PREFIX = Address.nodeAddress(["sourcecred", "github"]);
const ISSUE_NODE_PREFIX = Address.nodeAppend(GITHUB_NODE_PREFIX, "issue");
// There is no longer any need for an explicit creation function
const anIssueAddress = Address.nodeAppend(ISSUE_NODE_PREFIX, someIssueID);
```
Paired with @wchargin.
Test Plan:
Unit tests added; run `yarn travis`.
This commit implements and tests the following methods on the `Graph`:
- `addNode`
- `removeNode`
- `hasNode`
- `nodes`
Test plan: Unit tests are included.
Paired with @wchargin
Summary:
Closes#336.
Test Plan:
Snapshots updated; changes are easily readable. Existing tests pass.
Running the cred explorer on the `sourcecred/sourcecred` graph shows
pull requests like `#124 (+73/−3): Update the README`, which is good.
wchargin-branch: pr-addition-deletion
This commit re-organizes our addressing code according to the following
conventions:
- All address code is implemented in `core/_address.js`, but clients
should not import that file directly.
- Public address-related methods will be made available via the
`Address` object in `core/graph.js`.
- The inaugural set of public address methods are:
- `nodeAddress`, for constructing a new `NodeAddress` from parts
- `edgeAddress`, for constructing a new `EdgeAddress` from parts
- `toParts`, for turning an address back into parts
- The types `NodeAddress` and `EdgeAddress`are also exported from
`graph.js`
Test plan: The tests cover both runtime and type properties.
Paired with @wchargin.
For more context and design discussions, see:
https://github.com/sourcecred/sourcecred/pull/334
This very minor commit adds the basic data structures to the `Graph`
that will act as underlying storage for nodes and edges. This is similar
to how nodes and edges are stored in `Graph` v1 and v2, except much
simpler, because we now have string keys, no data stored with nodes, and
minimal data stored with edges.
Test plan: The only observable change is that the graph can now be
constructed without error. I added a unit test to verify this behavior
:) as well as that the Graph constructor properly returns an object of
type `Graph` and not an `any`.
(Context: https://github.com/facebook/flow/issues/6400)
Paired with @wchargin
This commit implements Node/Edge addresses, and helper functions for
generating and manipulating them.
Test plan: Unit tests included.
Paired with @wchargin
The newly added `v3/src/core/graph.js` module is the heart of our `v3`
refactor, which aims at having our `Graph` class be just a graph[1] and
not a key-value store to boot.
This commit adds a skeleton of our intended `Graph` API. We've mostly
ported features from our existing `v1` `Graph` class, but with a few
changes:
- `NodeAddress` and `EdgeAddress` are both strings under the hood, which
allows for better performance and use of native JS Maps and Sets.
- Addresses are now hierarchical paths, so the functionality previously expressed
through `pluginName` and `type` can now be encoded into the path
components.
- `NodeAddress` and `EdgeAddress` are distinct types, and have different
serializations, so they can't get mixed up at runtime.
- It is no longer possible to bind arbitrary payloads to nodes or edges.
The expectation is that external data stores will be responsible for
that.
Test plan: No code was implemented; flow and lint are sufficient.
Paired with @wchargin
[1]: Technically, it's a [quiver]
[quiver]: https://en.wikipedia.org/wiki/Quiver_(mathematics)
We want to reset some of our basic assumptions, and make `Graph` into a
pure graph implementation, rather than a hybrid graph and key-value
store.
This is a substantial rewrite, so we want to start from scratch in a v3/
directory and pull code into v3 as necessary. So that we can do this in
a relatively clean fashion, we're first moving the v1 and v2 code into
their own directories.
Paired with @wchargin
Test plan: Travis, and `yarn backend`, `node bin/sourcecred.js start`.
Note that `yarn backend` and `node bin/sourcecred.js start` both use the
v1 versions. We'll migrate those (by changing paths.js) to v3 when
appropriate.
Summary:
The storage format is similar to the V1 storage format. Nodes are sorted
by address and stored in an ordered list. Edges have `srcIndex` and
`dstIndex`, keyed against the node order. Each node stores the name of
its plugin and the result of invoking `toJSON` on its payload.
Test Plan:
Unit tests added; run `yarn travis`. The single added snapshot is easy
to read and verify.
wchargin-branch: v2-json
Summary:
A `Node` includes a ref and a payload. We should say that two nodes at
the same address are equal iff their payloads are equal, and two
payloads at the same address are equal iff their JSON contents are
equal. Importantly, `Node` equality does not depend on the `ref`’s
internal structure: this structure includes a reference to the enclosing
graph, and so using this notion of node equality caused `equals` to
incorrectly return `false` when the inputs were two logically equal
graphs with different internal structure. (It also caused `equals` to be
very slow, performing a deep equality check on the graphs for every
node.)
Test Plan:
A unit test has been strengthened. It fails before this patch, and
passes afterward.
wchargin-branch: v2-node-payload-json-equality
Summary:
Both the implementation and the tests here are pretty straightforward.
The only change from GraphV1 is that we should explicitly check that the
plugins are copied correctly, because graph equality does not check
this.
Test Plan:
New unit tests suffice.
wchargin-branch: copy-v2
Summary:
I dropped the `edgeDecomposition` and `neighborhoodDecomposition` test
cases, because (a) we don’t have a canonical “interesting graph” on
which to run them, compounding that (b) it’s not clear how much value
they add.
Test Plan:
New unit tests suffice.
wchargin-branch: mergeConservative-v2
The implementation is adapted from our previous implementation, but has
been refactored to be more appropriate for our generator-function
approach.
The unit tests comprehensively explore a simple example graph, testing
the following cases:
- Direction filtering (IN/OUT/ANY/unspecified)
- Node types
- Edge types
- Self-loop edges
- Dangling edges
- Removed edges don't appear
Test plan: Carefully inspect the added unit tests.
Paired with @wchargin
Summary:
Based on code originally paired with @decentralion.
Test Plan:
Unit tests added. Running `yarn travis` is sufficient.
wchargin-branch: v2-edges
Given that we have reified the type `PluginFilter` as a consistent way
to filter nodes and edges by plugin and type, we should also have
re-usable logic for that end.
This commit adds `addressFilterer`, which takes a `PluginFilter` and
returns a function that checks whether a given address matches the
filter. Right now, only the `nodes` function uses it.
For added safety, `addressFilterer` does runtime validation that the
required `plugin` property was set.
Test plan: See included unit tests.
Summary:
It’s critical, even more so than usual, that the “base reference”
property of a `DelegateNodeReference` be a private property, because
this class is designed for inheritance. In ECMAScript 6, we can achieve
this by giving the property a `Symbol` key instead of a string key.
Unfortunately, Flow doesn’t know about `Symbol`s, so we need a few casts
through `any`, but they are localized to as small a scope as possible.
Test Plan:
Unit tests added. Note that they pass both before and after this change.
wchargin-branch: symbol-base-ref
This commit adds the following methods to the `Graph`:
* `addNode`
* `removeNode`
* `ref`
* `node`
* `nodes`
* `equals`
The graph now supports adding nodes, with thorough testing. Other
methods were implemented as necessary to test that `addNode` was
implemented properly.
Also, we've made a slight change to spec: `nodes` (and other filter
options) accept a `PluginFilter` object, which, if present, must
specify a plugin and may specify a type.
I've taken the opportunity to re-write the graph test code. Instead of
having a complicated `graphDemoData` file that creates a graph with many
different nodes, I've created an `examplePlugin` which makes it trivial
to instantiate new simple Foo and Bar nodes on the fly. Then, test cases
can construct a small graph that is clearly appropriate for whatever
functionality they are testing.
Test plan: Unit tests were added, travis passes.
Currently, when filtering by type (e.g. in neighbors), we require only a
type string. This is a design flaw, as if two plugins both define an
"ISSUE" type, either plugin may unexpectedly receive the other plugin's
nodes or edges.
We fix the flaw by explicitly binding the plugin name and type field
together as `PluginType`.
Test plan: Flow passes, and so does the following invocation to check
our design doc:
```
cat <(printf '// @flow\n') \
<(awk -v RS='```[a-z]*' '(NR+1)%2' \
src/core2/address_payload_unification_design.md) \
| yarn -s flow check-contents
```
- Add the `plugins` method to graph, and have the constructor take them
- Add in demo `PluginHandlers` to the `graphDemoData`
- Add unit tests for `plugins` methods
Test plan: Travis
Paired with: @wchargin
Also, rename `NodeDelegateReference` to `DelegateNodeReference` in the
design doc.
Test plan:
Didn't add tests, as the implementation is trivial. Flow passes.
src/core2/graph.js has the types and Graph class scaffold as
described in address_payload_unification_design.md.
Also, we decided to modify the design doc to include a type parameter
for Edges (although we aer open to removing if it becomes cumbersome).
Test plan:
`yarn flow` passes, and the design doc still typechecks too.
Paired with @wchargin
Since this document was originally written, we've iterated on the design
a bit. @wchargin and I have re-reviewed the doc, and we think it's now
up-to-date with our plans.
Test plan: I ran the following command to bring the code into my
clipboard:
```
❯ awk -v RS='```[a-z]*' '(NR+1)%2' src/core2/address_payload_unification_design.md | xsel -ib
```
Test plan:
Pasting this into https://flow.org/try produced no errors.
Paired with @wchargin
In #190, @wchargin lays out an ambitious proposal for refactoring away
the graph's address-payload distinction. This has proven to be a
complicated refactor to land, so we are going to achieve it by forking
parts of the project into v2, updating incrementally in v2, and then
replacing original components with their v2 versions.
Test plan:
No new code added. `yarn travis` passes.
Paired with @wchargin
Given that we are undergoing a major world-changing refactor (#190), all
outstanding code needs to be refactored to use the new conventions. We
don't actually use the Artifact Plugin yet, and reading the code, it
needs non-trivial rewrites to be in sync with the new world.
Rather than maintain it now, I am deleting it; we can regain the context
when the time is ripe to setup and integrate the plugin.
Test plan: Travis passes. `yarn start` produces no references to the
artifact editor.
Summary:
Improves the result of the preceding commit to strip out cruft. Paired
with @decentralion.
Test Plan:
```shell
$ ! git grep -Fi -e simple -e advanced src/core/graph*
```
Also review `awk '/(describe|it)\(/' src/core/graph.test.js`.
wchargin-branch: simplify-simple-advanced
Summary:
These tests have apparently been borked since the beginning. They were
testing properties about nodes instead of edges, had the wrong test
names, and had the LHS/RHS backward… this should make them more useful.
Test Plan:
Unit tests suffice, I suppose.
wchargin-branch: tests-for-missing-edges
Summary:
We now use the “advanced” graph everywhere. Happily, all tests still
pass!
Starting from `:%s/\%(advanced\|simple\)MealGraph/mealGraph/gc`, only a
few tests had to be manually changed: in particular, the `#equals` tests
for “returns false when the ?HS has nodes missing in the ?HS” became
trivially incorrect by the above substitution, and were therefore
changed to use more appropriate inputs.
Paired with @decentralion.
Test Plan:
Existing unit tests suffice.
wchargin-branch: remove-simple-graph
Summary:
We had three graph merging functions: `merge`, `mergeConservative`, and
`mergeManyConservative`. Of these, `merge` was never used outside of
code directly testing its behavior, and `mergeConservative` is a
strictly inferior version of `mergeManyConservative`. This commit
removes `merge` and `mergeConservative`, and renames
`mergeManyConservative` to `mergeConservative`.
Paired with @decentralion.
Test Plan:
Existing unit tests suffice; some useless tests pruned.
wchargin-branch: mmeerrggee
As previously implemented, the Graph was polymorphic in its NodePayload
and EdgePayload. This was a lot of bookkeeping for very
little apparent benefit. In general, graphs may be constructed with
foreign plugins, so it's hard to do anything with this information.
In my experience, having the Graph polymorphism has never caught a bug,
and has led to lots of boilerplate code, especially typing `Graph<any,
any>`. I view the fact that in #286 we added a new core `NodeReference`
concept, which always types its Graph as `<any, any>`, as strongly
suggestive that this was not going to provide any lasting value.
In this commit, I've removed the Graph polymorphism. Note how in many
cases where we were typing the graph, it provided no value, as evidenced
by the fact that the imported Node and Edge types were used no-where
else in the file other than in the Graph declaration.
Test plan:
Removing extra typing information is very unlikely to cause regressions.
`yarn flow` and `yarn lint` both pass.
Summary:
This is an implementation-only, API-preserving change to the `Graph`
class. Edges’ `src` and `dst` attributes are now internally represented
as integer indices into a fixed ordering of nodes, which may depend on
non-logical properties such as insertion order. The graph’s serialized
form also now stores edges with integer `src`/`dst` keys, but the node
ordering is canonicalized so that two graphs are logically equal if and
only if their serialized forms are equal. This change substantially
reduces the rest storage space for graphs: the `sourcecred/sourcecred`
graph drops from 39MB to 30MB.
Currently, the graph will have to translate between integer indices and
full addresses at each client operation. This is not actually a big
performance regression, because it is just one more integer-index
dereference over the previous behavior, but it does indicate that the
optimization is not living up to its full potential. In subsequent
changes, the `NodeReference` class will be outfitted with facilities to
take advantage of the internal indexing; a long-term goal is that
roughly all operations should be able to be performed within the indexed
layer, and that translating between integers and addresses should only
happen at non-hot-path API boundaries.
This diff is considerably smaller and easier to read with `-w`.
Paired with @decentralion.
Test Plan:
I inspected the snapshots for general form, and manually verified that
the indices for one edge were correct (the MERGED_AS edge for the head
commit of the example repo). Other than that, existing unit tests mostly
suffice; one new unit test added.
wchargin-branch: graph-indexed-edges
This commit adds explicit versioning to the `Graph` and `AddressMap`
JSON representations, using the new `compat` module. This will make it
safer to change the serialization format for these classes.
(Note: I don't expect we'll add backcompat handlers for these classes
soon, but having versioning means that we can change the serialization
format in a way that breaks old data cleanly and explicitly, rather than
introducing undefined behavior.)
Test plan: The changes are slight, and well-captured by the snapshot
tests. Note that after this commit, the SourceCred commands will fail on
old data, so old data will need to be regenerated.
This commit adds the `src/util/compat` module, which is responsible for
data compatibility. It provides two methods: `toCompat` and
`fromCompat`.
`toCompat` allows an object to be versioned with a type and version
string. Doing so wraps the object in an array whose first element has
`version` and `type` fields.
`fromCompat` allows loading compatibilized data. It takes a `type` and
`version` expectation, as well as optional processors for various data
versions. It throws an error if the data was not compatibilized, or if
it is the wrong type, or if it has an unsupported version.
The versioning is designed to be composable; see the final test block
for an example of loading a structure with nested versioning.
Test plan: Carefully inspect the included unit tests, and feel free to
propose more.
For context, see #280.
`sourcecred graph` tends to die due to lack of heap during the Git
plugin. Node defaults to ~1.76GB of heap available, which is just not
that much. This commit uses the `--max_old_space_size` argument to
`node` to increase the limit. We use a default value of `8192`, and it's
configurable by the user via a flag.
This command is only available natively in `sourcecred graph`, because
that command turns on other node processes. For commands that run in
their original process, you would need to set the value yourself.
`markovChain.findStationaryDistribution` is currently written such that
it blows the function call stack size if it is called with >200k nodes.
This commit effects a small refactor so that the delta is computed in a
loop over elements in the distribution, rather than creating a massive
function arguments list. This results in the correct behavior.
Test plan:
We discussed offline and decided not to test it. Reasons:
- We could factor out `computeDelta` and test that function in
isolation, but I think this provides very, very little value. The class
of regression it protects against is almost precisely the case where
someone reverts this commit.
- We could test the entire `findStationaryDistribution` function, which
is more valuable, but requires a large, slow test case (~1 sec)
- This is a bug that is unlikely to re-occur or to slip by unnoticed if
it does. It's also unlikely to surface in future refactoring.
See #286 for context.
I also upgraded client code in src/app.
Test plan:
Unit tests are extensive, including testing that `get`, `ref`, and
constructors are overriden on every `GitReference` and `GitPorcelain`
type that is exposed to clients.
Paired with @wchargin
See #286 for context. There are a few miscellaneous changes in
src/app/credExplorer to change clients to use the new API.
Test plan:
New unit test were added, and existing behavior is preserved. Most of
the functionality of the GitHub porcelain was already well tested.
Paired with @wchargin
`NodeReference` and `NodePorcelain` act as abstractions over the two
states a Node can be in.
- We might have the address of a node (because some edge pointed to it),
but don't actually have the Node in the graph. In that case, we can do
some queries on the node (e.g. find its neighbors), but can't access
the payload. This corresponds to having a `NodeReference`.
- We might have the node in the graph. In that case, we can access a
`NodePorcelain`.
The main benefit this abstraction brings is type-safety over accessing
data from a `NodePayload`. Previously, the coding conventions encouraged
clients to ignore the distinction, and the type signatures incorrectly
reported that many payload-level properties were non-nullable. Now, the
`get` method that mapp a `Reference` to a `Payload` is explicilty
nullable.
Given a `NodePorcelain`, it's always possible to retrieve the reference
via `ref()`. Given the `NodeReference`, you might be able to retrieve
the `NodePorcelain` via `get()`.
Clients that subtype `NodePorcelain` and `NodeReference` should, in
general, override the `ref()` and `get()` methods to return their
subtype. We also recommend having subclasses overwrite the constructors
to take a base `NodePorcelain` and `NodeReference` respectively
(although the base classes take a `Graph` and `address` as constructor
arguments).
Test Plan: Inspect the unit tests, they are pretty thorough.
Paired with @wchargin
Summary:
Instead of having one function that returns a union, we present two
functions, each of which returns a more specific type.
Paired with @decentralion.
Test Plan:
Existing unit tests pass, with sufficient modifications.
wchargin-branch: separate-issueByNumber-prByNumber
This commit adds "Node Porcelain" for the Git plugin. Node porcelain is
a wrapper over a graph and node address, which makes it possible to
access payload data and adjacent nodes via a familiar object-oriented
API.
I believe this porcelain provides substantially better legibility and
usability for the Git plugin. Consider that it is now easy to see what
relationships each Git node type can have by reading the method
signatures in the porcelain, rather than needing to inspect all of the
Edge types in types.js.
This porcelain has slightly different conventions from the porcelain in
the GitHub plugin, although the APIs are very similar. I intend to
follow this commit with two more: one that switches clients of the Git
plugin to use the porcelain, and another that refactors the GitHub and
Git porcelains to use a base Porcelain implementation in src/core.
Test plan:
Examine the public API of the Git porcelain (this is unlikely to change
much), and its corresponding test code.
`yarn travis --full` passes.
Summary:
There are substantive options for `convergenceThreshold` and
`maxIterations`, as well as the output option `verbose`. This change is
made in preparation for extracting this function into `markovChain`,
where we will add unit tests for it.
Test Plan:
Behavior of `yarn start` is unchanged.
wchargin-branch: configurable-findstationarydistribution
Summary:
This commit slightly reorganizes the internals of `basicPagerank` to use
the `SparseMarkovChain` type from the `markovChain` module.
Test Plan:
Behavior of `yarn start` is unchanged.
wchargin-branch: use-sparsemarkovchain
Summary:
This function is mostly useful for easily describing Markov chains in
test cases.
Test Plan:
Unit tests added. Run `yarn test`.
wchargin-branch: sparseMarkovChainFromTransitionMatrix
Summary:
For now, this module has just two types: `Distribution` and
`SparseMarkovChain`. We’ll gradually pull code from `basicPagerank` into
this module, adding unit tests along the way.
Test Plan:
None required.
wchargin-branch: markov-chain-module
Summary:
See #66 for more context. This yields the following performance
improvements for me, on the SourceCred graph with 11 072 nodes and
20 250 edges:
- Loading a graph from disk is improved overall from 1172 ms to 292 ms
(4.0× improvement).
- The full pipeline for basic PageRank, from button press to final
render, is improved from 8.44 s to 4.39 s (1.9× improvement).
- The PageRank preprocessing step, which involves turning the graph
into a typed array Markov chain, is improved from 2430 ms to 573 ms
(4.2× improvement).
- The PageRank postprocessing step, which involves turning the typed
array distribution into an `AddressMap` distribution, is improved
from 83.53 ms to 4.81 ms (17× improvement).
- The `PagerankTable` React component `render` method (constructing
the virtual tree only, not diffing or embedding into the DOM) is
improved from 1708 ms to 332 ms (5× improvement).
The core matrix computations of PageRank are unaffected, because they do
not use the `AddressMap` abstraction.
Test Plan:
Existing unit tests suffice.
wchargin-branch: fast-addressmap
Summary:
This takes `AddressMap` access, and therefore JSON stringification, off
the critical path, resulting in a significant performance increase. The
resulting code is much faster than the original TFJS implementation. On
my laptop, we can run about 300 iterations of PageRank per second on a
graph with 10 000 nodes and 18 000 edges (namely, the SourceCred graph).
Paired with @decentralion.
Test Plan:
Run `yarn start` and note that the cred attribution for SourceCred is
roughly the same as before… but is created faster.
wchargin-branch: pagerank-typed-arrays
Summary:
We’re not convinced that using TFJS at this time is worth it, for two
reasons. First, our matrix computations can be expressed using sparse
matrices, which will improve the performance by orders of magnitude.
Sparse matrices do not appear to be supported by TFJS (the layers API
makes some use of them, but it is not clear that they have much support
their, either). Second, having to deal with GPU memory and WebGL has
already been problematic. When WebGL PageRank is running, the machine is
mostly unusable, and other applications’ video output is negatively
affected (!).
This commit rewrites the internals of `basicPagerank.js` while retaining
its end-to-end public interface. We also add a test file with a trivial
test. The resulting code is not faster yet—in fact, it’s a fair amount
slower. But this is because our use of `AddressMap`s puts JSON
stringification on the critical path, which is obviously a bad idea. In
a subsequent commit, we will rewrite the internals again to use typed
arrays.
Paired with @decentralion.
Test Plan:
The new unit test is not sufficient. Instead, run `yarn start` and
re-run PageRank on SourceCred; note that the results are roughly
unchanged.
wchargin-branch: pagerank-without-tfjs
We were sorting low-to-high, and then reversing. We can just sort
high-to-low.
Test plan: No behavior change (confirm by interacting with the Cred
Explorer).
Paired with @wchargin
Test plan: Open the cred explorer, and use the + sign to expand the
neighbors. Observe that those neighbors are now sorted by cred.
Paired with @wchargin
Test plan: Open the cred explorer, and try clicking the + signs. They
will expand a recursive table showing the neighbor nodes and their cred.
Paired with @wchargin
`PagerankTable` is forked from `ContributionList`.
Test plan: I took it for a spin and it seemed OK. I'm not inclined to
write formal tests because we are in rapid prototyping mode stil.
Test Plan:
Load the cred explorer for the first time to see two empty boxes.
Refresh to see the same thing. Then, add some content to each box.
Refresh and see the same content.
wchargin-branch: cred-explorer-localstorage
This re-organizes the GitHub porcelain tests to be:
- organized by each method signature, rather than having blocks that
test many different methods on each wrapper
- make extensive use of snapshot tests for convenience
Test plan: Inspect the new unit tests, and the corresponding snapshots.
It should be relatively easy to do this because you can copy+paste the
urls to verify the properties.
Test plan: Unit tests were added.
(Note: I haven't tested the error case, when there are an invalid number
of parents. I think this is OK for now, but I'm disclosing this so
reviewers can easily take issue with it. I'm planning to re-organize the
test code to be by method rather than by wrapper type (so the wrappers
section doesn't keep being a kitchen sink) and will hopefully
put that test in.)
Also updates the GitHub porcelain.
Existing observable behavior is unchanged, except that performance may
be improved for issueOrPrByNumber.
A bug that would afflict a multi-repository graph (namely, that calling
`repo.issues()` would get all issues for all repositories) is
pre-emptively removed. No test cases were added as we do not yet support
multi-repository graphs.
Test plan: existing unit test coverage is sufficient.
Currently, the `authors` method is attached at the Repository level.
This is incorrect; it actually finds all the authors in the graph, not
all the authors of that repository. This commit moves the method to the
correct class.
Test plan: This function is only used in test code. The tests still
pass.
Summary:
We don’t expect the results to be of good quality right now. Rather,
this gives us a starting point from which to iterate the algorithm.
The convergence criterion also needs to be adjusted. (In particular, it
should almost certainly not be a constant.)
Test Plan:
Run `yarn start`. Select a graph, like `sourcecred/example-github`. Open
the JS console and click “Run basic PageRank”. Watch the console.
wchargin-branch: basic-pagerank
Summary:
This commit enables the cred explorer to fetch pre-generated graphs. The
form has poor UX, but gets the job done. (To do later: saving in
localStorage, allowing fetching the list of possible graphs, linking the
submit button so that it’s triggered by `Enter`, etc.).
Test Plan:
Set up with `node ./bin/sourcecred.js graph sourcecred sourcecred`, then
run `yarn start` and check the following cases:
- success case: `sourcecred`/`sourcecred`
- error case: `sarcecrod`/`sarcecrod` (nice console error)
- error case: the repository owner or name is an empty string or has
invalid characters, like `../../secret.txt` (nice console error)
wchargin-branch: fetch-graphs
Summary:
This commit changes `yarn start` to run a production version of the API
server, which serves static files from a pre-built directory and also
handles API requests.
Test Plan:
```shell
$ yarn build
$ yarn backend
$ node ./bin/sourcecred.js start -d /tmp/srccrd &
$ mkdir -p /tmp/srccrd
$ echo hello >/tmp/srccrd/world
$ curl -s localhost:4000/ | file -
/dev/stdin: HTML document, ASCII text, with very long lines, with no line terminators
$ curl -s localhost:4000/api/v1/data/world
hello
```
wchargin-branch: cli-start-prod-server
Test plan: Run `yarn start`, and observe that the Cred Explorer is now
included in the nav bar, and can be selected, in which case a short
message is displayed.
testUtil contains some useful configuration endpoints for our frontend
testing. This commit moves it out of the artifact editor and up to
`src/app`.
Test plan: `yarn travis` passes.
@wchargin suggested that the entity-wrapping logic in porcelain
reference handling should be factored out as its own method. This was a
great suggestion; it will be very useful for plugin consumers, and it
also results in better test coverage.
Test plan: The new unit tests are nice. For your own safety, do not
question or quibble with the magnificent foo plugin.
Summary:
This way, our frontend can talk to a backend that can read from the
filesystem (among other things).
Paired with @decentralion.
Test Plan:
```
$ yarn backend
$ SOURCECRED_DIRECTORY=/tmp/srccrd yarn start
$ # verify that the browser looks good
$ mkdir /tmp/srccrd
$ echo hello >/tmp/srccrd/world
$ curl localhost:3000/api/v1/data/world
hello
$ curl localhost:4000/api/v1/data/world
hello
```
wchargin-branch: webpack-proxy
Fixing the flow error corresponded to (correctly) documenting that the
GitHub token is mandatory, not optional.
Test plan: `yarn travis --full` still passes.
Summary:
This way, different plugins can have `LocalStore`s with different cache
keys.
Test Plan:
Note that the app (`yarn start`) preserves the local store values from
before this change, and that updates continue to work.
wchargin-branch: generic-localstore
This commit modifies App.js to use routing, such that it's possible to
navigate between a home screen, and the Artifact Editor.
Test plan: Run `yarn start`, and navigate between the home screen and
artifact editor. Verify that the artifact editor still works as
expected.
This commit adds src/app/App.js, which proxies in the frontend from
src/plugins/artifact/editor/App.js. The observed behavior (run `yarn
start`; see Artifact Editor) is unchanged.
Test plan: Observe that `yarn start` has the same behavior, and travis
passes.
This commit executes a micro-refactor to move all top-level app setup
code out of src/plugins/artifact/editor and into src/app. The observed
behavior from `yarn start`, which is to show the artifact editor, is
unchanged.
Summary:
We need a way for our web applications to interact with data on the
filesystem. In this commit, we introduce a webserver that serves
statically from two directory trees: first, the result of a live-updated
Webpack build; second, the SourceCred data directory.
Test Plan:
Run `yarn backend` and `node ./bin/sourcecred.js start`. When ready,
navigate to the server’s root route in a web browser. Note that a nice
React app is displayed. Then, change something in that React app source.
Note that the server console displays Webpack’s update messages, and
that refreshing the page in the browser renders the new version of the
app. Finally, visit
/__data__/graphs/sourcecred/example-github/graph.json
in the browser to see the graph for the example repository, assuming
that you had generated its graph previously.
wchargin-branch: start
This script ensures that either //@flow or //@no-flow is present in
every js file. Every existing js file that would fail this check has
been given //@no-flow, we should work to remove all of these in the
future.
Test plan:
I verified that `yarn travis` fails before fixing the other js files,
and passes afterwards.
Now that we have repository nodes (#171), it makes sense that the Github
porcelain should provide a way to wrap the entire graph, and provide
easy access for the various repositories. This adds a `Porcelain` class
to fulfill that need.
The `Porcelain` is very straightforward: it takes in the whole graph,
and gives a way to get all the Repositories, or to request a particular
Repository by owner/name. In the odd case wherein a graph contains
multiple repository nodes with the same owner and name, an error is
thrown. Per standard JS map semantics (bleh), it can return undefined if
there is no matching repository.
Test plan:
See that the unit tests now use the standard behavior, and a test
verifies behavior for non-existant repositories. I don't have a test
case where there are multiple repo nodes, but that itself would be an
error, so throwing an error in that case is just defensive programming.
This commit creates a new node type in the GitHub graph: the REPOSITORY
node. The REPOSITORY node has the following payload properties:
- url (string)
- name (string)
- owner (string)
Things this commit does:
- Add new node type and payload type (RepositoryNodePayload)
- Update parser to instantiate the new node type
- Update api.js to have Repository wrap the new node type (thus
Repository is a GitHub entity)
- Update snapshots
- Update users of GitHub node types to ensure they are exhaustive
Things that will come in a followon commit:
- Add CONTAINS edges from the repository to all its PRs and Issues
- Update the Repository porcelain to use those edges, rather than
scanning the graph for every possible Issue/PR (eventually those might
belong to other Repositories)
- Create a GitHubGraph abstraction in the porcelain, which makes it easy
to find all of the Repositories in a graph
Note that retrieving the repository owner technically involved fetching
the whole owner representation (as a GitHub user). I could have chosen
to add that user to the graph, with a "OWNS" edge pointing to the
repository. For simplicity's sake, I've declined to do that, and instead
just parse the owner's name directly.
Test plan:
Added tests to verify that the Repository porcelain entity has the right
properties. Combined with the snapshot tests, that should be sufficient.
When a GitHub user delete their account, all of their comments remain,
but with a `null` author. Previously, our code did not account for this
possibility. Now it does (by simply not adding an AUTHORSHIP edge).
Conveniently, our porcelain API already represents authors as a list, so
this doesn't require any change in porcelain API usage.
Test plan:
I did not add any unit tests, simply because
creating-and-deleting a GitHub user to make a repro seemed like a bit of
a pain. However, it is very unlikely that this bug will re-occur,
because the nullability of AuthorJSON is now enforced at the type level
inside graphql.js.
Also, running `node bin/sourcecred.js src-d go-git` now succeedsinstead
of failing.
Summary:
This solves two problems:
1. The “output directory” argument to `sourcecred graph` is also the
input directory to other commands, like `sourcecred analyze`
(hypothetically). In such cases, it would be nice for the flag to
have the same name, but clearly `--output-directory` does not always
make sense.
2. In addition to storing graphs, we’ll need to store other kinds of
data: settings, intermediate data for plugins to cache results, etc.
We should store these under a single umbrella.
With both of these problems in mind, it makes sense to create a
`SOURCECRED_DIRECTORY` flag under which we store all relevant data.
Test Plan:
Run:
```shell
$ yarn backend
$ node bin/sourcecred.js help graph
$ node bin/sourcecred.js graph sourcecred example-github
$ node bin/sourcecred.js graph sourcecred example-github -d /tmp/sorcecrod
$ SOURCECRED_DIRECTORY=/tmp/srccrd node bin/sourcecred.js graph sourcecred example-github
$ for dir in /tmp/{sourcecred,sorcecrod,srccrd}; do find "${dir}"; done
```
wchargin-branch: graph-directory
Our SourceCred CLI tool now ipmlements printCombinedGraph and
cloneAndPrintGitGraph, but with more principled implementations and
interfaces :)
Test plan:
`yarn travis --full` passes, so I didn't delete any needed test infra.
Previously, if a CLI command had an unhandled promise rejection, this
would result in a spurious success and zero exit value.
This commit causes all of our CLI commands to instead fail if they have
an unhandled promise rejection.
Test plan: Previously, `sourcecred graph src-d go-git` would claim to
succeed, although it actually fails due to an unrelated bug. After this
change is applied, it correctly fails to retrieve the GitHub graph (and
hte combine step is never run).
This commit pulls new information from GitHub about the url, name, and
owner of a GitHub repository. Part of #171.
Test plan: example-github-repo.json has been updated. `yarn travis
--full` passes. This should be sufficient.
Currently, we generate a `RepositoryJSON` object via querying GitHub.
That `RepositoryJSON` object has a `repository` field... which is weird,
and suggests we got the names slightly wrong.
This commit renames the top-level response to `GithubResponseJSON`, and
factors the `repository` field out as `RepositoryJSON`. Correspondingly,
the `addData` and `addRepository` methods on the parser are now
distinct.
This is a precursor for #171.
Test plan: This is a simple refactor; the fact that yarn travis passes
should be sufficient.
Test Plan:
Run `node bin/sourcecred.js graph sourcecred example-github` and note
the new output:
```
Storing graphs into: /tmp/sourcecred/sourcecred/example-github
Starting tasks
GO create-git
GO create-github
DONE create-git
DONE create-github
GO combine
DONE combine
Full results
DONE create-git
DONE create-github
DONE combine
Overview
Final result: SUCCESS
```
wchargin-branch: plugin-graph-label
Summary:
The `"PASS"` label only makes sense for tests. This commit makes the
labels configurable, so that the verbiage can make more sense in other
contexts, too.
Test Plan:
Apply a patch like
```diff
diff --git a/config/travis.js b/config/travis.js
index af0996b..b0ab3b6 100644
--- a/config/travis.js
+++ b/config/travis.js
@@ -10,7 +10,11 @@ function main() {
process.argv.includes("--full")
? "FULL"
: "BASIC";
- execDependencyGraph(makeTasks(mode)).then(({success}) => {
+ execDependencyGraph(makeTasks(mode), {
+ taskLaunchLabel: " YO ",
+ taskPassLabel: "WHEE",
+ taskFailLabel: "UHOH",
+ }).then(({success}) => {
process.exitCode = success ? 0 : 1;
});
}
```
and note that `GITHUB_TOKEN=none yarn travis --full` exhibits all
desired messages.
wchargin-branch: configurable-labels
Summary:
This commit implements the `sourcecred` command-line utility, which has
three subcommands:
- `plugin-graph` creates one plugin’s graph;
- `combine` combines multiple on-disk graphs; and
- `graph` creates all plugins’ graphs and combines them.
As an implementation detail, the `into.sh` script is very convenient,
avoiding needing to do any pipe management in Node (which is Not Fun).
When we build for release, we may want to factor that differently.
Test Plan:
To see it all in action, run `yarn backend`, and then try:
```
$ export SOURCECRED_GITHUB_TOKEN="your_token_here"
$ node ./bin/sourcecred.js graph sourcecred sourcecred
Using output directory: /tmp/sourcecred/sourcecred
Starting tasks
GO create-git
GO create-github
PASS create-github
PASS create-git
GO combine
PASS combine
Full results
PASS create-git
PASS create-github
PASS combine
Overview
Final result: SUCCESS
$ ls /tmp/sourcecred/sourcecred/
graph-github.json graph-git.json graph.json
$ jq '.nodes | length' /tmp/sourcecred/sourcecred/*.json
1000
7302
8302
```
The `node sourcecred.js graph` command takes 9.8s for me.
(The salient point of the last command is that the two small graphs have
node count adding up to the node count of the big graph. Incidentally,
we are [almost][1] at a nice round number of nodes in the GitHub graph.)
[1]: https://xkcd.com/1000/
wchargin-branch: cli
Summary:
To be honest, I have no idea what exactly this does or why it’s
necessary, but if we don’t do this then it is not possible to `import`
the exported member from a Webpack-bundled script. I’ve seen this
pattern before; one day I’ll actually figure out what it does. :-)
Test Plan:
Note that `yarn travis` (success) and `yarn travis --full` (failure; no
GitHub token) both have the expected behaviors.
wchargin-branch: execdependencygraph-export
This commit adds [oclif] as a command-line framework. It is successfully
integrated with webpack.
[oclif]: https://github.com/oclif/oclif
Usage:
`yarn backend` to build the cli.
`node bin/sourcecred.js` to launch the CLI and see usage
`node bin/sourcecred.js example` for one example command
`node bin/sourcecred.js goodbye` for another example command
Summary:
We’d like to use the same abstraction for creating multiple cred graphs
and then combining them together. This will enable us to do that.
Test Plan:
Run `yarn travis` to test the success case, and `yarn travis --full`
(without setting a `GITHUB_TOKEN`) to test the failure case.
wchargin-branch: execdepgraph
`printCombinedGraph` loads and prints a cross-plugin combined
contribution graph for a given GitHub repository.
It is a simple executable wrapper around `src/tools/loadCombinedGraph`.
Example usage:
`node bin/printCombinedGraph.js sourcecred example-git $GITHUB_TOKEN`
`cloneAndPrintGitGraph` clones a git repository, and generates a Git
object graph for that repository.
This can be run as follows:
```
yarn backend;
node bin/cloneAndPrintGitGraph sourcecred example-git
```
This commit also adds two utility modules:
* `cloneAndLoadRepository` , which clones a Git repository to a tmpdir,
parses the `Repository` data out, and then cleans up.
* `cloneGitGraph`, which calls `cloneAndLoadRepository` and `createGraph`
Test plan: These don't fit well into our CI, because they require
network access to clone repositories from GitHub. I verified that the
functions work via the demo script above.
fetchGithubGraph is a tiny module which is responsible for fetching
GitHub contribution data, and parsing it into a graph.
Test plan:
The function is trivial and does not merit explicit testing.
Summary:
If a commit causes a tree entry to change hash while keeping the same
name, we now add a BECOMES edge between the corresponding entries.
Test Plan:
Snapshot changes are readable enough to manually verify. Programmatic
tests also added.
wchargin-branch: graph-becomes-edges
Test Plan:
For the snapshot: verify that two of the BECOMES edges are the same as
those tested in `findBecomesEdgesForCommits` and that they have the
right commit hashes; then, verify that the remaining edge is correct.
wchargin-branch: high-level-becomes-repository
This adds MERGED_AS edges which link from a PullRequest to a Commit. It
adds a corresponding `mergedCommitHash` method on the porcelain PR that
returns the hash of the merged commit (if available).
I would have preferred to return a porcelain wrapper over the commit,
but since we don't have a porcelain Git api, it seemed preferrable to
return the hash as a string. Returning a Node would both break
consistency in the porcelain api, and be problematic as the node does
not necessarily exist in the api. To ensure that the hash is available
without parsing Addresses, I used the edge payload. :)
Test plan:
Inspect the snapshot changes in the graph (they are fairly readable) and
the api testing in api.test.js.
This commit adds a few fields to the PullRequest query fragment so that
we now retrieve merge commit shas. In cases where there is no merge
commit (ie the PR did not merge), the field is null. Observe that this
is the case for our example unmerged pull request.
Test plan: Inspect the changes to the demo data, and verify that they
are appropriate.
For the GitHub plugin to create edges pointing to commits from the Git
plugin, it needs a way to create the appropriate address given the
commit's hash. This commit exposes that functionality by moving
`makeAddress` out of the "createGraph" module and into a new "address"
module, and using it to implement `commitAddress`.
Test plan: The code is so trivial that I don't think it merits testing.
Our repository containing example GitHub data has been renamed from
"sourcecred/example-repo" to "sourcecred/example-github". This commit
updates all snapshots and tests to reflect this rename.
See [#190] for context.
The change is almost entirely straightforward; the only "interesting"
decision I made was to move the repo owner and repo name into the string
id for the Artifact Plugin addresses, as the id would otherwise not be
unique.
[#190]: https://github.com/sourcecred/sourcecred/issues/190#issuecomment-386362870
Summary:
Previously, a tree entry had exactly one `HAS_CONTENTS` edge, unless the
tree entry corresponded to a submodule commit, in which case we had no
information at all. Now, submodule commit tree entries point to zero or
more `SUBMODULE_COMMIT` nodes. In the vast majority of cases, there will
be exactly one such node—however, it is possible that a particular tree
entry could correspond to multiple submodules (clone two identical
submodules with different URLs) or none at all (some manually edited
`.gitmodules` or other corruption).
Test Plan:
The snapshot changes are easy enough to read and verify (two new nodes
and five new edges). Additionally, the path-to-submodule `createGraph`
test has been updated.
wchargin-branch: graph-submodule-urls
Summary:
In Git, a tree may point to a commit directly. In our graph, we’d like
to represent “submodule commits” explicitly, because, a priori, we do
not know the repository to which the commit belongs. A submodule commit
node will store the hash of the referent commit, as well as the URL to
the subproject as listed in the .gitmodules file. In this commit, we
load the list of those URLs into the in-memory repository.
Shout-out to `git` for having an excellent command-line API:
the `--blob` argument to `git-config` is perfect for this situation.
Test Plan:
Snapshot updates are readable and sufficient.
wchargin-branch: load-submodule-urls
Summary:
Prior to this commit, given a `Tree` node with an edge to a `TreeEntry`
node, there was no way to tell what the entry name was other than
parsing the ID (which should never be required). This adds appropriate
data to the payload of a `TreeEntry`, and also to the inclusion edge (so
that if you only have the edge, you don’t have to fetch the entry).
Test Plan:
Snapshot changes are readable.
wchargin-branch: treeentry-metadata
Summary:
This can be useful for speed, but it can also be important for
correctness (at least theoretically): if we run both these scripts
concurrently, then we don’t want one of them to squash the `bin`
directory while the other is about to invoke an executable therein.
One might note that the diffs to the two files in this commit are
virtually identical, and indeed the files themselves are quite similar.
I’d prefer to keep the duplication for now; if we really need a Bash
snapshot testing framework, we can factor one out.
Test Plan:
Run each script with `--help`, with `--build` and `--no-build`, and with
and without `-u`.
wchargin-branch: optional-rebuild
Summary:
To avoid confusion, we simultaneously remove unused capturing groups.
This is not strictly necessary, but it makes the code less brittle.
Test Plan:
The newly added test fails before the change to `findReferences.js`.
wchargin-branch: url-punctuation
This commit adds the `addReferenceEdges()` method to the GitHub parser,
which examines all of the posts in the parsed graph and adds References
edges when it detects references between posts. As an example, `Hey
@wchargin, take a look at #1337` would generate two references.
We currently parse the following kinds of references:
- Numeric references to issues/PRs.
- Explicit in-repository url references (to any entity)
- @-author references
We do not parse:
- Cross-repository urls
- Cross-repository shortform (e.g. `sourcecred/sourcecred#100`)
`Parser.parse` calls `addReferenceEdges()`, so no change is required by
consumers to have reference edges added to their graphs.
The GitHub porcelain API layer now includes methods for retreiving the
entities referenced by a post.
Test plan:
This commit is tested both via snapshot tests, and explicit testing at
api layer. (Actually, the creation of the porcelain API layer was
prompted by wanting a cleaner way to test this commit.) I recommend
inspecting the snapshot tests for sanity, but mostly relying on the
tested behavior in api.test.js.
I added a lot of new comments that have url references to different
types of GitHub entities, e.g. to pull request review comments.
The commit was generated by running the example repo fetcher, and
running yarn test and updating snapshots.
Summary:
Flow didn’t catch this because all the payloads are `{}` anyway.
Test Plan:
Note that every node and edge payload is now listed exactly once in the
correct spot for each of `{Node,Edge}{Type,Payload}`.
wchargin-branch: git-nodepayload
This method removes `Graph.inEdges` and `Graph.outEdges`. As a
replacement to these two functions, `graph.neighborhood` now takes an
optional `direction` flag, which can be set to `"IN" | "OUT" | "ANY"`.
This reduces the surface area of the Graph API, and means that the same
pattern can be used when requesting in or out neighbors as is used when
requesting all neighbors.
This change generates significant churn in the test files, and in some
cases the tests are less elegant / show historicity, as they were
written for the type signature of `{in,out}Edges`, which just returns an
array of edges, and now receive an array of neighbors. I think this is
acceptable, and it's not worth re-writing the test.
In many cases, replacing existing calls to `{in,out}Edges` in our actual
codebase resulted in cleaner code, as `neighborhood` successfully
abstracts over the common patterns that users of `{in,out}Edges` were
implementing.
As a fly-by refactor, I also changed the `neighborAddress` part of the
`neighborhood` return value to `neighbor`. It's a little less
descriptive, but it's more concise, and flow is there to help ensure
it's used correctly.
Test plan: Note that CI passes. Inspect the test changes, and verify
that they are appropriate transformations for consuming the new API.
Test Plan:
This snapshot test is too unwieldy to actually read—it’s 1000 lines of
opaque SHAs and thrice-stringified JSON objects—so it should be
interpreted as a regression test only. The programmatic tests should
suffice.
wchargin-branch: wip-git-create-graph
Test Plan:
Run `yarn lint` and `yarn travis` and observe success. Add something
that triggers a lint warning, like `const zzz = 3;`; re-run and observe
failures.
wchargin-branch: lint
In general, methods in the porcelain GitHub api may return multiple
types; e.g. a reference could be to an Issue, PullRequest, Comment,
Author (or more). To make working with the api more convenient while
maintaining safety, this commit adds a static `asType` method to each
Entity class, which confirms that type coercion is safe, and errors if
not.
This commit also adds `issueOrPRByNumber`, a convenience method, to
api.test.js.
Test plan: Check the API usage and verify that it is reasonable.
Interacting with raw contribution graphs is cumbersome. We'll need
more fluent and convenient ways to retrieve data from them; we can do
this by creating porcelain APIs that wrap the underlying graph.
This commit adds a simple porcelain API for the GitHub data. It creates
the following classes:
* `api.Repository`
* `api.Issue`
* `api.PullRequest`
* `api.Comment`
* `api.Author`
The classes all wrap a graph and a nodeAddress. They provide read-only
functions for retreiving data from the graph; that data might be a part
of the node payload, or it might do some graph traversal under the hood.
The choice to have the wrapper hold onto the Address rather than the
node itself was deliberate; in the future, the graph may contain nodes
that are not synchronously reachable, so this approach allows us to
create wrappers for nodes we can't synchronously reach. When this comes
up in practice, we can then add async methods to the wrapper.
Note that some data already included in our graph, such as
PullRequestReviews and PullRequestReviewComments, were deliberately
excluded, so as to allow the core ideas to be reviewed without
unnecessary clutter.
Test plan:
Check that the unit tests appropriately test the behavior, and that the
API seems pleasant to use.
Summary:
Two reasons for this. First, we want tests to be able to operate on this
data without having to generate repositories via `git(1)`. (Doing that
is slow, and requires a Git installation, and makes it less clear that
the tests are correctly isolated/provides more surface area for
something to go wrong.) Second, in general plugins will need a canonical
source of test data, so setting/continuing this precedent is a good
thing.
Test Plan:
Observe that the old Jest snapshot must be equivalent to the new JSON
one, because the test criterion in `loadRepository.test.js` changed and
the test still passes. Then, run `loadRepositoryTest.sh` and note that
it passes; change the `example-git.json` file and note that the test
fails when re-run; then, run the test with `--updateSnapshot` and watch
it magically revert your changes.
wchargin-branch: check-in-git-repo
Summary:
I’d like to use `Map`s whenever the keys are homogeneous (i.e.,
dictionaries, not structs). But this has proven infeasible. The primary
issue at this point is that `JSON.stringify(anyMap)` is `"{}"`—not
entirely unreasonable given that maps can have non-string keys, but
frustrating enough to not use them.
Test Plan:
Jest appears to order the snapshot keys differently for `Map`s and
objects (the former by insertion order and the latter alphabetical),
which makes the snapshot change harder to read. I verified that the
general structure is okay, and hand-verified some of the individual
changes. Noting that the number of lines added and deleted in the
snapshot is a good sanity check.
wchargin-branch: map-to-object
When requesting nodes and edges from the graph, it is convenient to
filter them by their type.
In the future, we should add plugin filtering as well, as we
expect type names to collide across plugins.
We may also want to consider keeping a cache of nodes and edges by type
to speed up these calls, if they become performance bottlenecks. (The
implementation in this commit naively iterates over every node/edge.)
Test plan:
Verify that the unit tests are appropriate.
Currently, we store GitHub Users, Organizations, and Bots as separate
nodetypes in the graph. This is inconvenient, as we don't care very much
what type of entity authored a node.
This commit collapses those three categories into one nodetype. The
extra information has migrated to the node payload, so it is still
possible to discover this information if it's important.
Test plan: There is some amount of snapshot churn because the author
node types and payloads have changed. Verify that the snapshot changes
are appropriate, and that CI passes.
This commit renames the following graph functions:
* `get{Node,Edge}{,s}` -> `{node,edge}{,s}`
* `get{In,Out}Edges` -> `{in,out}Edges`
* `getNeighborhood` -> `neighborhood`
The rename was effected across the repo by running:
```
$ find src -name "*.js" -exec sed -i 's/getNeighborhood/neighborhood/g' {} +
```
modified appropriately for each subsitution.
Test plan:
Inspect the code to make sure nothing was erronously renamed. Check that
CI passes.
`Graph.getAdjacentEdges` had a serious defect: for the adjacent edges,
it's hard to tell which of the {src,dst} is the neighboring node address
and which is the node we called `getAdjacentEdges` on.
This commit fixes that limitation by replacing `getAdjacentEdges` with
`getNeighborhood`, with a return signature of
`{edge: Edge<EP>, neighborAddress: Address}[]`
Some yak shaving was required: we needed a version of `expectSameSorted`
and, by extension, `sortedByAddress` that takes an accessor to an
Addressable, so that we could test that the neighborhoods returned were
correct. To satisfy flow, I created `expectSameSortedAccessorized` and
`sortedByAddressAccessorized`. Cumbersome, but it worked. ¯\_(ツ)_/¯
Previously, the address module exported `sortedByAddress`, a utility
function that sorts an array of `Addressable`s. This function was only
used in test code.
This commit replaces it with generic usage of `lodash.sortBy`. This
reduces the API surface area of the module, and removes test-only code
from the exported api.
New dependency added: `lodash.sortby`
https://www.npmjs.com/package/lodash.sortby
Test Plan:
Run the script with `--dry-run`, which currently prints
```shell
$ src/plugins/git/demoData/synchronizeToGithub.sh -n
yarn run v1.5.1
[build output truncated]
Build completed; results in 'bin'.
Done in 3.30s.
Synchronizing: example-git
warning: no common commits
To github.com:sourcecred/example-git.git
+ 3507b7c...3715ddf HEAD -> master (forced update)
Synchronizing: example-git-submodule
Everything up-to-date
Done.
```
This reflects the correct state of affairs, because #158 changed the
example repository. Note that the `3715ddf` SHA in the output of the
above script matches the SHA in the `exampleRepo.test.js.snap` snapshot.
wchargin-branch: sync-git-example-repos
Summary:
When we shell out to `git`, we don’t want the end user’s environment
variables and Git configuration to influence the results. This commit
standardizes those inputs. Standardizing the environment has the side
benefit that the `GIT_DIR` environment variable is not set, which means
that the test suite will work properly when run from the `exec` step of
a Git rebase.
Test Plan:
Tests pass and snapshots are unchanged. Note that
```shell
$ git rebase HEAD --exec 'CI=1 yarn test'
```
works after this commit but not before it.
wchargin-branch: standardize-git-environment
Summary:
Using `array.join()` added commas at the start of some lines; I meant to
use `array.join("")`.
(I’ve now inspected the full generated contents of both repos, and they
look good.)
Test Plan:
It is expected that these attributes of the snapshots should change.
There’s no need to carefully check the SHAs.
wchargin-branch: readme-change
Some consumers of the graph may prefer to treat it as an undirected
graph. For example, when finding the author of an issue, it is wholly
sufficient to find an edge with the `AUTHORS` type; the caller may
prefer not to be bothered with remembering which end of the `AUTHORS`
end is considered the `src` and which is the `dst`.
The `getAdjacentEdges` call enables that, by combining the output of
`getInEdges` and `getOutEdges`.
Test plan:
The new tests are pretty comprehensive.
Summary:
The main example repository now covers the currently desired features:
it has blobs, subtrees, and submodules, and commits that change each of
these. (We don’t have merge commits yet—we can add those once we start
to care about them.)
Once this is merged, I will push the two repositories to GitHub.
Test Plan:
Verifying and understanding is easier than ever before. You can run the
following commands to create the repositories in question on your disk:
```shell
$ yarn backend
$ node bin/createExampleRepo.js /tmp/repo
$ node bin/createExampleRepo.js --submodule /tmp/subrepo
```
You can then explore these repositories at your leisure. For instance,
to check that the `loadRepository` snapshot has the right set of
commits, inspect the output of the following command:
```shell
$ git -C /tmp/repo log --format='%H %T'
```
Or, to check that a particular tree has the right contents, just run:
```shell
$ git -C /tmp/repo ls-tree TREE_SHA
```
Verifying the `exampleRepo` snapshot is similarly easy: just check that
the lists of commit SHAs in `/tmp/repo` and `/tmp/subrepo` are correct.
wchargin-branch: include-submodule
Summary:
We’ll use this to create the repositories on disk and then push them to
GitHub.
Test Plan:
Generate both kinds of repository, and check out the SHAs:
```shell
$ yarn backend
$ node bin/createExampleRepo.js /tmp/repo
$ node bin/createExampleRepo.js --submodule /tmp/repo-submodule
$ node bin/createExampleRepo.js --no-submodule /tmp/repo-no-submodule
$ # (first and third lines do the same thing)
$ git -C /tmp/repo rev-parse HEAD
677b340674bde17fdaac3b5f5eef929139ef2a52
$ git -C /tmp/repo-submodule rev-parse HEAD
29ef158bc982733e2ba429fcf73e2f7562244188
$ git -C /tmp/repo-no-submodule rev-parse HEAD
677b340674bde17fdaac3b5f5eef929139ef2a52
```
Then, note that these SHAs are expected per the snapshot file in
`exampleRepo.test.js.snap`.
wchargin-branch: create-example-repo-command
Summary:
We want our main repository to include submodules so that we can test
submodule support. Here, we create a repository to be included as a
submodule.
wchargin-branch: example-submodule-repository
Summary:
The `loadRepository` test tries to clean up temporary directories, but
failed to do so because the directories were not empty. The cleanup hook
threw an error, but this error was silenced by Jest due to [a known
bug][1] that was fixed a few days ago. We can fix this by asking `tmp`
to clean up directories even if they are not empty, using the
`unsafeCleanup` option.
[1]: https://github.com/facebook/jest/issues/3266
Test Plan:
While running `watch -n 0.1 'ls /tmp | grep "tmp-.*" | wc -l'`, run
tests. Note that the number increases by five and then drops down again;
before this patch, it would increase by 5 and then stay there.
wchargin-branch: clean-up-tmpdirs
Summary:
A few reasons for this:
1. This _is_ a utility, so it makes sense semantically.
2. This unifies the utilities API; clients like `loadRepository.test`
don’t have to keep around both a `git` and a `gitUtils`.
3. Most importantly, further scripts and tests shouldn’t depend on
`loadRepository` just for `localGit`. Depending on `gitUtils` makes
much more sense.
(Note that `makeUtils` is no longer dependency-injectable, but that’s
okay; I considered this and favored YAGNI on this one.)
Test Plan:
Existing unit tests pass.
wchargin-branch: move-localgit
Summary:
Utilities like `deterministicCommit` provide valuable functionality that
we will want to use in other scripts and perhaps other test cases. It
makes sense to factor these out into utility functions.
Test Plan:
Existing tests pass.
wchargin-branch: git-utils
This commit adds an optional `typeOptions` argument to Graph.getInEdges
and Graph.getOutEdges. The `typeOptions` allow filtering the returned
edges by the type of the edge, and the type of the node that the edge is
connected to. This makes it much easier to use these methods to find
connections that have a certain relationship, e.g. finding the author of
a commit or the comments on an issue.
Test plan:
A new test suite was written that comprehensively tests this behavior,
both for getInEdges and getOutEdges.
In preparation for using type info in the Graph apis, it is helpful to
have richer type info in the graph demo data.
Test plan: Check that the snapshot changes only consist of type changes,
and that CI passes.
Our GitHub parser is implemented via a `GithubParser` class which builds
the GitHub graph. This is a convenient implementation, but an awkward
API. This commit refactors the module so that it exposes a clean `parse`
function, which ingests the GitHub JSON data and returns as completed
graph.
Test plan:
The unit tests have been re-written to use the new public API. All the
snapshots are unchanged, and flow passes. Additionally, I ran `yarn
start` and verified that the GithubGraphFetcher for the Artifact plugin
is still working.
Summary:
We should be able to get the types without depending on the function to
load a Git repo from disk, and in particular without depending on
`child_process`.
Test Plan:
Flow and tests are sufficient.
wchargin-branch: extract-repository-types
There's some context at #127, in which I initially proposed this change.
In addition to the long-term benefits described in #127, there is a
short-term benefit which is that it makes snapshot tests easier to read,
because the GitHub ids are opaque and unreadable, while the GitHub urls
are relatively easy to parse.
This change results in significant snapshot churn.
Once the type was added, flow correctly discovered a bug in
GithubGraphFetcher.js, which resulted in broken graph fetching in the
ArtifactEditor. Oops! / Good work Flow!
I made `ensureNoMorePages` expect the result it is testing to be an
`any`, which is appropriate for how the function is written (i.e. it is
written in a way that is agnostic to the actual result).
The example-repo.json file is regenerated with large diffs due to the
change in indentation level throughout the file.
Test plan:
Sanity check the snapshot (close inspection is unnecessary due to the
simplicity of the code change). Check that CI pases.
This commit adds flow typing for the JSON result from hitting the GitHub
graphql api. We can't prove that the flow typing is correct, but since
the type definition is colocated with the corresponding fragment
definitions, we can hope that maintainers will maintain both together.
We update the parser to consume the new flow types. There are no flow
errors.
Test plan:
Inspect the flowtypes, verify that they correspond to the data in
example-repo.json, and that there are no flow errors.
Summary:
In this newly added module, we load the structural state of a git
repository into memory. We do not load into memory the contents of any
blobs, so this is not enough information to perform any analysis
requiring file diffing. However, it is sufficient to develop a notion of
“this file was changed in this commit”, by simply diffing the trees.
Test Plan:
Unit tests added; `yarn test` suffices. Reading these snapshots is
pretty easy, even though they’re filled with hashes:
- First, read over the commit specifications on lines 69–83 of
`loadRepository.test.js`, so you know what to expect.
- In the snapshot file, keep handy the time-ordered list of commit
SHAs at the bottom of the file, so that you know which commit SHA is
which.
- To verify that the large snapshot is correct: for each commit, read
the corresponding tree object and make sure that the structure is
correct.
- To verify the small snapshot, just check that it’s the correct
subset of the large snapshot.
- If you want to verify that the SHA for a blob is correct, open a
terminal and run `git hash-object -t blob --stdin`; then, enter the
content of the blob and press `<C-d>`. The result is the blob SHA.
To run a sanity-check on a large repository: apply the following patch:
<details>
<summary>Patch to print out statistics about loaded repository</summary>
```diff
diff --git a/config/paths.js b/config/paths.js
index d2f25fb..8fa2023 100644
--- a/config/paths.js
+++ b/config/paths.js
@@ -62,5 +62,6 @@ module.exports = {
fetchAndPrintGithubRepo: resolveApp(
"src/plugins/github/bin/fetchAndPrintGithubRepo.js"
),
+ loadRepository: resolveApp("src/plugins/git/loadRepository.js"),
},
};
diff --git a/src/plugins/git/loadRepository.js b/src/plugins/git/loadRepository.js
index a76b66c..9380941 100644
--- a/src/plugins/git/loadRepository.js
+++ b/src/plugins/git/loadRepository.js
@@ -106,3 +106,7 @@ function findTrees(git: GitDriver, rootTrees: Set<Hash>): Tree[] {
}
return result;
}
+
+const result = loadRepository(...process.argv.slice(2));
+console.log("commits", result.commits.size);
+console.log("trees", result.trees.size);
```
</details>
Then, run `yarn backend` and put the following script in `test.sh`:
<details>
<summary>Contents for `test.sh`</summary>
```shell
#!/bin/bash
set -eu
repo="$1"
ref="$2"
via_node() {
node bin/loadRepository.js "${repo}" "${ref}"
}
via_git() (
cd "${repo}"
printf 'commits '
git rev-list "${ref}" | wc -l
printf 'trees '
git rev-list "${ref}" |
while read -r commit; do
git rev-parse "${commit}^{tree}"
git ls-tree -rt "${commit}" \
| grep ' tree ' \
| cut -f 1 | cut -d ' ' -f 3
done | sort | uniq | wc -l
)
echo
printf 'Running directly via git...\n'
time a="$(via_git)"
echo
printf 'Running Node script...\n'
time b="$(via_node)"
diff -u <(cat <<<"${a}") <(cat <<<"${b}")
```
</details>
Finally, run `./test.sh /path/to/some/repo origin/master`, and verify
that it exits successfully (zero diff). Here are some timing results on
SourceCred and TensorBoard:
- SourceCred: 0.973s via Node, 0.327s via git.
- TensorBoard: 30.836s via Node, 6.895s via git.
For TensorFlow, running via git takes 7m33.995s. Running via Node fails
with an out-of-memory error after 39 minutes, with 10GB RAM and 4GB
swap. See details below.
<details>
<summary>
Full timing details, commit SHAs, and OOM error message
</summary>
```
+ ./test.sh /home/wchargin/git/sourcecred 01634aabcc
Running directly via git...
real 0m0.327s
user 0m0.016s
sys 0m0.052s
Running Node script...
real 0m0.973s
user 0m0.268s
sys 0m0.176s
+ ./test.sh /home/wchargin/git/tensorboard 7aa1ab9d60671056b8811b7099eec08650f2e4fd
Running directly via git...
real 0m6.895s
user 0m0.600s
sys 0m0.832s
Running Node script...
real 0m30.836s
user 0m3.216s
sys 0m10.588s
+ ./test.sh /home/wchargin/git/tensorflow 968addadfd4e4f5688eedc31f92a9066329ff6a7
Running directly via git...
real 7m33.995s
user 5m21.124s
sys 1m5.476s
Running Node script...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [node]
2: 0x121a2cc [node]
3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
5: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [node]
6: v8::internal::DeoptimizationInputData::New(v8::internal::Isolate*, int, v8::internal::PretenureFlag) [node]
7: v8::internal::compiler::CodeGenerator::PopulateDeoptimizationData(v8::internal::Handle<v8::internal::Code>) [node]
8: v8::internal::compiler::CodeGenerator::FinalizeCode() [node]
9: v8::internal::compiler::PipelineImpl::FinalizeCode() [node]
10: v8::internal::compiler::PipelineCompilationJob::FinalizeJobImpl() [node]
11: v8::internal::Compiler::FinalizeCompilationJob(v8::internal::CompilationJob*) [node]
12: v8::internal::OptimizingCompileDispatcher::InstallOptimizedFunctions() [node]
13: v8::internal::Runtime_TryInstallOptimizedCode(int, v8::internal::Object**, v8::internal::Isolate*) [node]
14: 0x12dc8b08463d
```
</details>
wchargin-branch: load-git-repositories
# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# Date: Mon Apr 23 23:02:14 2018 -0700
#
# HEAD detached at origin/wchargin-load-git-repositories
# Changes to be committed:
# modified: package.json
# new file: src/plugins/git/__snapshots__/loadRepository.test.js.snap
# new file: src/plugins/git/loadRepository.js
# new file: src/plugins/git/loadRepository.test.js
#
# Untracked files:
# out
# runtests.sh
# src/plugins/artifact/editor/ArtifactSetInput.js
# src/plugins/git/repository.js
# test.sh
# todo
#
github/githubPlugin.js was growing ungainly - it contained two major
pieces: all of the node and edge types, and the GitHub parser. As I
contemplated adding a third major new section of logic (an easy-to-use
api for traversing the GitHub graph, with first class support for
comments, authorship, etc), I found the prospect of adding even more
into that file quite unappealing. So, I have instead split it into three
files:
* github/pluginName.js: Exports the plugin name.
* github/types.js: Exports types of nodes, edges, and their payloads
* github/parser.js: Exports `GithubParser`
No logic has been changed whatsoever - this is purely a rename-refactor.
Test plan:
CI still passes. I manually verified that the Artifact Editor can still
load and display GitHub data.
We have urls for all the author types, so for consistency across GitHub
payloads, I am adding urls to the author paylods.
Test plan:
Check that snapshot changes consist entirely of adding urls, and that
the urls are appropraite.
Add logic to findReferences for finding GitHub username references,
e.g. "Hello, @wchargin!". The API is unchanged.
Test plan:
There are new unit tests that verify this behavior works as expected.
Currently, every type of reference has its own type signature: numeric
references are returned as numbers, url references are a complicated
object containing url parts, and so forth.
Since ultimately the references are just strings, it makes more sense to
treat references as plain strings. This allows a much simpler
implementation of reference edge creation in the GitHub plugin. It also
results in a simpler API for the parseReferences file (it only exports a
single findReferences function).
Test plan:
Verify that the updated tests encode appropriate behavior.
Summary:
This replaces the implementation of a static check from a somewhat
complicated use of higher-order types to a more simple empty-union
assertion, as suggested by jez in “Case Exhaustiveness in Flow”:
https://blog.jez.io/flow-exhaustiveness/
(I know; we’re not using Reason. One step at a time. :-) )
I adapted the implementation a bit because I prefer explicitly disabling
an ESLint warning over a no-op function call; it is not clear from the
latter that the purpose is to suppress a lint warning.
Test Plan:
In `githubPlugin.js`, add `| "ANOTHER"` to the `NodeType` type, and note
a compile-time Flow error on the appropriate line, with a very readable
error message. Note that all unit tests pass, and running the UI on
`sourcecred/sourcecred` yields correct titles for each node type present
(namely, all node types except for `ORGANIZATION` and `BOT`).
wchargin-branch: empty-union-assertion
Keeping the GitHub demo data up-to-date is important, and there isn't
good documentation for how to do that.
This commit adds a short README.md for the demo data, and adds an update
flag to fetchGithubRepoTest.sh that can be used to easily update it.
Test plan:
Modify example-repo.json (e.g. by deleting it entirely). Run
fetchGithubRepoTest.sh -u and confirm that the data was regenerated
without change. Run fetchGithubRepoTest.sh and confirm the test passes.
Note: The end cursor is sensitive to the timezone, which seems to be
cached with the GitHub token. An erroneous switch to Israel timezone
made it into master; this commit reverts back to US/Pacific.
This commit modifies our GitHub graphql query so that we request urls
for all objects (e.g. users, pull requests, pull request review
comments). Some change along these lines is necessary so that we can
correctly represent URL reference edges to e.g. issue comments. (It
might be possible to do without by reverse-enginering from the ids, but
we are resolved to treat ids as opaque).
Strictly speaking, we don't need to collect urls for users, issues, and
pull requests - they are generated via simple schema. However, for
consistency, I think it's better to just take URLs on everything.
Test plan: The example-repo.json has been regenerated. The diffs are as
expected.
Summary:
This commit completes the ad hoc pagination solution described in #117:
we implement pagination specifically for our current query against the
GitHub API. This is done in such a way that reasonable additions to the
query will not be hard to implement—for instance, if we want to fetch
a new kind of field, the marginal cost is at most a bit of extra
copy-and-paste and some modifications to tests. However, we should
certainly still plan to implement the fully automatic pagination system
described in #117.
Running on TensorBoard with the default page limits takes 30–33 seconds
over 7 queries, uses 103 GitHub API points (out of 5000 for any given
hour), and produces a JSON file that is 8.7MB (when pretty-printed).
This all seems quite reasonable to me.
Test Plan:
Extensive unit tests added. The snapshots are quite readable, by design.
For a real-world test, you can `yarn start` the artifact viewer and use
the GUI to fetch the data for tensorflow/tensorboard.
To demonstrate that the fetching process gives the same results
regardless of the page size, follow these steps:
1. In `fetchGithubRepo.js`’s `postQuery` function, insert a new
statement `console.error("Posting query...")` at the beginning. (It
is important to print to stderr instead of stdout.)
2. Run `yarn backend` and then invoke `fetchGithubRepo.js` on a repo
large enough to require pagination, like SourceCred or TensorBoard.
Pipe the result to `shasum -a 256` and note the SHA.
3. In `github/graphql.js`, change the page size constants near the top
of the file. Re-run step 2. The number of queries that are posted
will vary significantly as the page size constants vary, but the
checksum should remain the same.
4. Repeat until satisfied. (I tried three sets of values: the standard
values, the standard values but all divided by 10, and all 5s.)
wchargin-branch: ad-hoc-pagination
Summary:
Once we execute the root query, find continuations, embed the
continuations into queries, and execute the continuation query, we will
need to merge the continuations’ results back into the root results.
This commit adds a function `merge` that will be suitable for doing just
that.
Test Plan:
New unit tests added, with 100% coverage. Run `yarn test`.
wchargin-branch: merge-query-results
Summary:
Per #117, this is a first step toward at writing a pagination API that
specifically targets our current GitHub query. For design details, see
new module docs on `src/plugins/github/graphql.js`.
This commit modifies the core GitHub query and thus the
`example-repo.json` snapshot: we now request `endCursor` fields for all
pagination info, and we request the `id` of the root `repository` field.
The former is obviously necessary. The latter is necessary for the
repository to be consistent with other nodes that offer connections as
fields: we require an ID on the node containing the connection so that
we can have random access to it in a continuation selector.
Test Plan:
Unit tests added. You can also try out the generated continuation
queries for yourself: apply the patch below, run `yarn backend`, and
then run the `fetchGithubRepo.js` script on `sourcecred/sourcecred`.
This will output a nicely formatted query that you can paste directly
into GitHub’s API explorer and execute. (Note that, because this patch
is not fully polished, the query must be run against a repository that
has a continuation for every node type: more pages of issues, PRs,
comments, reviews, and review comments. This is due to an
easy-but-annoying-to-fix bug in the patch, not in the code included in
this commit.)
<details>
<summary>Patch for generating a continuations query</summary>
```diff
diff --git a/src/plugins/github/fetchGithubRepo.js b/src/plugins/github/fetchGithubRepo.js
index 789a20e..418c736 100644
--- a/src/plugins/github/fetchGithubRepo.js
+++ b/src/plugins/github/fetchGithubRepo.js
@@ -6,8 +6,13 @@
import fetch from "isomorphic-fetch";
-import {stringify, inlineLayout} from "../../graphql/queries";
-import {createQuery, createVariables} from "./graphql";
+import {stringify, inlineLayout, multilineLayout} from "../../graphql/queries";
+import {
+ continuationsFromQuery,
+ continuationQuery,
+ createQuery,
+ createVariables,
+} from "./graphql";
/**
* Scrape data from a GitHub repo using the GitHub API.
@@ -66,8 +71,13 @@ function postQuery(payload, token) {
if (x.errors) {
return Promise.reject(x);
}
- ensureNoMorePages(x);
- return Promise.resolve(x);
+ console.log(
+ stringify.body(
+ continuationQuery(Array.from(continuationsFromQuery(x.data))),
+ multilineLayout(" ")
+ )
+ );
+ throw new Error("STOPSHIP");
});
}
diff --git a/src/plugins/github/graphql.js b/src/plugins/github/graphql.js
index 9ea2592..9ead42b 100644
--- a/src/plugins/github/graphql.js
+++ b/src/plugins/github/graphql.js
@@ -39,11 +39,11 @@ import {build} from "../../graphql/queries";
*
* [1]: https://developer.github.com/v4/guides/resource-limitations/#node-limit
*/
-export const PAGE_LIMIT = 100;
-const PAGE_SIZE_ISSUES = 100;
-const PAGE_SIZE_PRS = 100;
-const PAGE_SIZE_COMMENTS = 20;
-const PAGE_SIZE_REVIEWS = 10;
+export const PAGE_LIMIT = 10;
+const PAGE_SIZE_ISSUES = 10;
+const PAGE_SIZE_PRS = 10;
+const PAGE_SIZE_COMMENTS = 3;
+const PAGE_SIZE_REVIEWS = 1;
const PAGE_SIZE_REVIEW_COMMENTS = 10;
/**
@@ -340,6 +340,36 @@ function* continuationsFromReview(
}
}
+/**
+ * Combine continuations into a query.
+ */
+export function continuationQuery(
+ continuations: $ReadOnlyArray<Continuation>
+): Body {
+ const nonces: string[] = continuations.map((_, i) => `_n${String(i)}`);
+ const nonceToIndex = {};
+ nonces.forEach((n, i) => {
+ nonceToIndex[n] = i;
+ });
+ const b = build;
+ const query = b.query(
+ "Continuations",
+ [],
+ continuations.map((continuation, i) =>
+ b.alias(
+ nonces[i],
+ b.field(
+ "node",
+ {id: b.literal(continuation.enclosingNodeId)},
+ continuation.selections.slice()
+ )
+ )
+ )
+ );
+ const body = [query, ...createFragments()];
+ return body;
+}
+
/**
* These fragments are used to construct the root query, and also to
* fetch more pages of specific entity types.
```
</details>
wchargin-branch: ad-hoc-pagination-continuations
Summary:
Any time that we pull fields off a connection object, we may need to
repeat the query for subsequent pages. Therefore, such fragments will be
shared across multiple queries, and also shared within a query if we
need to fetch—say—more issue comments on two or more distinct issues.
This is a perfect use case for fragments.
This commit refactors the GitHub query to be organized in terms of
fragments, without changing the format of the results.
(We also take this opportunity to factor the page limits into
constants.)
Test Plan:
After running `yarn backend`, the `fetchGithubRepoTest.sh` test passes.
wchargin-branch: extract-github-query-fragments
Summary:
Per #117, we want to develop an ad hoc pagination API written
specifically against the query that we use to interact with GitHub.
The pagination logic should be separate from the logic to actually fetch
the repo, but should be colocated with the query itself, so this commit
extricates the query from `fetchGithubRepo.js` into a new module.
Test Plan:
Existing tests pass, including `fetchGithubRepoTest.sh`.
wchargin-branch: extract-github-graphql
Summary:
This was created by re-crawling the GitHub repo via `fetchGithubRepo`,
and then updating Jest snapshots.
Test Plan:
Note that `fetchGithubRepoTest.sh` passes, so the data is now up to
date.
Inspect the snapshot, and note that the only changes are to change login
names from `dandelionmane` to `decentralion`. To do so automatically:
```bash
set -eu
diff_contents() {
git difftool HEAD^ HEAD --extcmd=diff --no-prompt
}
! diff_contents | grep '^<' | grep -vF '"dandelionmane"'
! diff_contents | grep '^>' | grep -vF '"decentralion"'
```
wchargin-branch: decentralion-data
Summary:
For pagination, we’ll want to query against multiple entities of the
same type. GraphQL uses aliases to facilitate this. This commit adds
support for aliases to our GraphQL query DSL.
Test Plan:
Inspect snapshot changes, and note that `yarn flow` and `yarn test`
pass.
wchargin-branch: graphql-aliases
Summary:
This component now maintains a graph of just artifacts and the edges
among them. It owns the state, and notifies its parents of changes with
a callback. We treat the graph objects as properly immutable, copying
them on each change. So far, descriptions are always the empty string.
Test Plan:
Interaction tests added. Run `yarn test`.
wchargin-branch: artifact-graph-editor
Update GitHub plugin to respect two new conventions:
- Node/Edge types are exported as UPPER_CASE_CONSTANTS
- Edge types are always verbs, which can be read as $src verb $dst
Test plan:
Flow passes. Inspect snapshot changes.
Summary:
This commit revises our implementations of node and edge types, and
specifies the semantics for artifact plugin IDs: we create IDs by
slugifying an artifact name and then resolving collisions
Test Plan:
Unit tests added. Run `yarn flow` and `yarn test`.
wchargin-branch: artifact-plugin-node-edge-semantics
Summary:
Wherein we change the semantics to allow\* dangling edges. This is
necessary for plugins that want to update nodes, such as changing a
description or other noncritical field.
\* (It was technically possible before by abusing `merge`, but now you
can just do it.)
Paired with @dandelionmane.
Test Plan:
Extensive tests added. Run `yarn flow` and `yarn test`.
wchargin-branch: allow-removing-from-graph
Summary:
Clients of `Graph` that wish to treat the graph as immutable will
benefit from a `copy` method. We should provide it on `Graph` instead of
asking clients to reimplement it because it affords us the opportunity
to get the type signature right: in particular, copying should allow
upcasting of the type parameters, even though `Graph` itself is
invariant.
Paired with @dandelionmane.
Test Plan:
Unit tests added. Run `yarn flow` and `yarn test`. To check that
downcasting is not allowed, change the types in the new static test case
in `graph.test.js` to be contravariant instead of covariant, and note
that `yarn flow` fails.
wchargin-branch: graph-copy
Summary:
We need to know the repo owner and name for purposes other than fetching
the GitHub graph: for instance, fetching the `artifacts.json` file that
describes the artifact subgraph. It makes sense that these should be
settings global to the application. This commit separates a settings
component and the original GitHub graph fetcher.
This invalidates localStorage; you can manually migrate.
Paired with @dandelionmane.
Test Plan:
Note that the data continues to be stored in localStorage and that it is
updated on each keypress. Note that the state is properly passed around:
if you change the repository name from `example-repo` to `sourcecred`,
e.g., and click “Fetch GitHub graph”, then the proper graph is fetched.
wchargin-branch: separate-artifact-settings
* Generate Edge ids automatically
Adds edgeID in graph, which creates a string id from src and dest.
Provided that the plugin only uses edgeID for generating edge ids of
that type, these ids will be unique.
Modify the GitHub plugin to use edgeID. This allows the code and
typesignature to be simpler, and will be more consistent with other
plugins.
Test plan:
Carefully inspect the snapshots.
* Add a toString and fromString method on addresses
The toString and fromString methods use json-stable-stringify,
and we've modified other address code to use these methods.
As such, a number of the snapshots have changed (ordering).
Test plan:
Verify the new included unit tests are comprehensive. Inspect the
snapshots.
This pull request adds a string type as a field of the address, thus canonicalizing that all Nodes and Edges have a Type. This allows for simpler PluginAdapters, and simpler implementation of the GitHub plugin (as it no longer needs to invent its own mechanism for storing types).
I explored making the Address interface generic, with a Type parameter that is a subtype of string. Unfortunately, Flow type resolution seems to have an exponential performance degradation with many subtyped parameters, and adding the extra type parameters to Graph.merge resulted in my flow instance locking. Maybe we can explore adding the subtypes later.
In the GitHub plugin, we entirely do away with the NodeIDs, but we still have EdgeIDs. I plan to remove the need for EdgeIDs in a separate PR which will enforce uniqueness of (srcAddress, dstAddress, pluginName, type) tuples, so that explicitly generating IDs for edges will be unnecessary. In the mean time, I bifurcated the makeAddress function in the GitHub plugin to makeEdgeAddress and makeNodeAddress.
Test plan:
Flow and unit tests all pass. Inspect snapshots, and verify that all the changes are reasonable. Note that since we order by serialized address, adding the type field to address has changed the snapshot order in a few cases.
Close#103
Summary:
This commit moves our existing frontend tests to use Enzyme’s shallow
rendering API <http://airbnb.io/enzyme/docs/api/shallow.html>. The
benefit over also using `react-test-renderer` is simply consistency (the
two are functionally equivalent); the benefits over `mount` are that
subcomponents cannot contaminate the test state (i.e., you’re only
testing one component at a time), that the resulting snapshots are more
readable because the root props are not shown, and that the
implementation is more efficient. This is a follow-up to #102.
In a case where we actually need a full DOM tree, we should still feel
free to use `mount`, but we haven’t needed that yet.
Test Plan:
Verify that the new `ContributionList.test.js.snap` represents the same
data as the old one.
wchargin-branch: standardize-enzyme-shallow
Summary:
This is our first dynamic test of a React component! Enzyme looks pretty
easy to use to me, for both snapshot tests and interaction simulation.
In doing so, we catch a minor bug in the edge case where a contribution
is not owned by any plugin (`colSpan`, not `colspan`). This edge case
does not appear in the sample data, but it does appear in the test data,
even prior to this commit. The previous renderer, `react-test-renderer`,
appears not to surface this error. Furthermore, this bug did not cause
any user-visible errors except a `console.error`.
Test Plan:
Inspect the snapshot file to make sure that it is reasonable. (The
existing test case has its snapshot regenerated due to formatting
differences between the two renderers.)
To test that the browser error is fixed, render a contribution list on a
GitHub graph but with an empty adapter set. One way to do this is to comment out line 7 of
`standardAdapterSet.js`; alternately, you can use the React Dev Tools to
select the `ContributionList` node, then run
```js
$r.props.adapters.adapters = {};
$r.forceUpdate();
```
Note subsequently that there is no console error and that the `<td>`s in
question span three columns.
wchargin-branch: contributionlist-dynamic-test
Summary:
Filter options are “all contributions” or a specific plugin/type
combination. This includes a snapshot test for the static state. I’ll
add an interaction test in a subsequent commit.
Test Plan:
`yarn start`, fetch graph, play with the filtering options.
wchargin-branch: filter-contributions
* Make GitHub capitalization consistent within code
We now never capitalize the H in GitHub within variable or function
names. We still capitalize it in comments or user facing strings.
Test plan:
Unit tests, the fetchGithubRepoTest.sh, and
`git grep itHub` only shows comment lines and print statements.
* Fix William's klaxon
This commit only adds logic for finding references in GitHub posts,
either by #-numeric reference, or explicit urls. Adding the reference
edges to the graph will occur in a followon commit.
Test plan: New unit tests are included
Summary:
Any tests that render Aphrodite-styled React elements will need to do
this, so it’s nice to have the code in one place.
wchargin-branch: aphrodite-testutils
Summary:
In addition to being nicer on the eyes, this enables the query to be
statically analyzed (e.g., by an auto-pagination API) and used by other
modules.
Test Plan:
Manually running
```shell
$ yarn backend
$ GITHUB_TOKEN="<your_token>" src/plugins/github/fetchGitHubRepoTest.sh
```
succeeds.
wchargin-branch: use-structured-graphql-queries
Summary:
This commit begins to extend the artifact editor to display
contributions. To display contributions from arbitrary plugins, we need
to communicate with those plugins somehow. We do so via an adapter
interface that plugins implement; included in this commit is an
implementation of this interface for the GitHub plugin (partially: we
punt on rendering).
This includes a snapshot test. The snapshot format is designed to be
human-readable and -auditable so that it can serve as documentation.
Test Plan:
Run the application with `yarn start`. Then, fetch a graph and watch as
its contributions appear in the view.
wchargin-branch: contributions-and-adapters
Summary:
This is useful for metaprogramming. For instance, suppose we have an
object like this:
```js
const stringifiers = {
ISSUE: (stringifyIssue: (Node<IssueNodePayload>) => string),
COMMENT: (stringifyComment: (Node<CommentPayload>) => string),
...
}
```
How do we type this? We might try
```js
{[type: NodeType]: (Node<NodePayload>) => string}
```
but this is not correct, because `Node<IssueNodePayload>` is a subtype of
`Node<NodePayload>`, and `(_) => K` is contravariant, not covariant. (In
other words, a function from `Node<IssueNodePayload>` is not as general
as a function from `Node<NodePayload>`.) We need to express a dependency
between the object key and the value. We instead write:
```js
type TypedNodeToStringifier = <T: $Values<NodeTypes>>(
T
) => (node: Node<$ElementType<T, "payload">>) => string;
(stringifiers: $Exact<$ObjMap<NodeTypes, TypedNodeToStringifier>>);
```
This expresses exactly (heh) the right type.
Test Plan:
Note that removing any of the elements of `NodeTypes` yields a Flow
error, due to the static assertion following the type definition.
wchargin-branch: node-types
I added some new issues to sourcecred/example-repo to test unicode
support and parsing of extremely long issues titles.
This commit merely updates our example-repo json and the corresponding
snapshots.
Test plan:
Run testFetchGithubRepo.sh
Run unit tests
Also, since there are now two types of things that are being
"contained" (comments and pull request reviews), I factored out an
addContainment method to avoid repeating that code.
To make our handling of PullRequestReviewComments and regular Comments
consistent, I modified our query string so that we now request urls on
PullRequestReviewComments. Also, since I didn't notice until closely
inspecting the snapshot that we had been adding payloads with some
undefined properties, I added a test to verify that every property on
every node and edge payload is defined.
I regenerated the example-repo data to reflect the change to query
string.
Test plan:
Verify that the snapshot changes are appropriate
Run standard tests
Run `yarn backend`
Run `GITHUB_TOKEN={your_token} ./src/plugins/github/fetchGithubRepoTest.sh`
Summary:
This is quick and dirty. No error handling yet. We’ll soon save
credentials and repository to local storage.
Paired with @dandelionmane.
Test Plan:
Run `yarn start`, then enter your API key and specify the
`sourcecred/example-repo` repo. Note that the resulting graph is shortly
logged to the console.
wchargin-branch: fetch-parse-github-frontend
Summary:
We’ll now start creating the artifact plugin. A large part of this will
be the user interface, including a GUI. For now, our build system just
builds a single React app, so we’re cannibalizing the main explorer to
serve this purpose.
Paired with @dandelionmane.
Test Plan:
The following still work:
- `yarn test`
- `yarn start`
- `yarn build; (cd build; python -m SimpleHTTPServer)`
wchargin-branch: repurpose-react-app-as-artifact-editor
Summary:
We’re not deleting it because it works with the build system and has the
service worker stuff from create-react-app, but we’ll soon repurpose it.
Paired with @dandelionmane.
Test Plan:
The following still work:
- `yarn test`
- `yarn start`
- `yarn build; (cd build; python -m SimpleHTTPServer)`
wchargin-branch: dismantle-explorer
The GitHub parser transforms GraphQL api data from GitHub into our Graph
data structure. This commit focuses on properly parsing Issues, Pull
Requests, Comments, and Users.
Test Plan:
Run the unit tests. Inspect the snapshot results (particuarly those for
individual pull requests or issues, which are easier to parse) and
verify that the output is appropriate.
Summary:
Running `yarn backend` will now bundle backend applications. They’ll be
placed into the new `bin/` directory. This enables us to use ES6 modules
with the standard syntax, Flow types, and all the other goodies that
we’ve come to expect. A backend build takes about 2.5s on my laptop.
Created by forking the prod configuration to a backend configuration and
trimming it down appropriately.
To test out the new changes, this commit changes `fetchGitHubRepo` and
its driver to use the ES6 module system and Flow types, both of which
are properly resolved.
Test Plan:
Run `yarn backend`. Then, you can directly run an entry point via
```
$ node bin/fetchAndPrintGitHubRepo.js sourcecred example-repo "${TOKEN}"
```
or invoke the standard test driver via
```shell
$ GITHUB_TOKEN="${TOKEN}" src/backend/fetchGitHubRepoTest.sh
```
where `${TOKEN}` is your GitHub authentication token.
wchargin-branch: webpack-backend
Summary:
See motivation in #76. Feel free to look at the new snapshot file to
inspect the structured representation and also the stringified output.
This implementation is sufficient to encode our query against the
GitHub v4 API; see the test plan below.
Test Plan:
Unit tests added; run `yarn flow && yarn test`.
This code has full coverage except for lines 260, 315, and 380 of
`queries.js`; these lines check invariants that should never be
violated.
You can also use the following steps to verify that the sample query is
valid GraphQL that produces the same results as our hand-written query:
1. Apply the following hacky patch:
```diff
diff --git a/src/backend/graphql/queries.test.js b/src/backend/graphql/queries.test.js
index 52bdec7..c04a636 100644
--- a/src/backend/graphql/queries.test.js
+++ b/src/backend/graphql/queries.test.js
@@ -3,6 +3,18 @@
import type {Body} from "./queries";
import {build, stringify, multilineLayout, inlineLayout} from "./queries";
+function emitGitHubQuery(layout, filename) {
+ const fs = require("fs");
+ const path = require("path");
+ const result = stringify.body(usefulQuery(), layout);
+ const outputFilepath = path.join(__dirname, "..", filename);
+ const outputText = `module.exports = ${JSON.stringify(result)};\n`;
+ fs.writeFileSync(outputFilepath, outputText);
+ console.log(`Wrote output to ${outputFilepath}.`);
+}
+emitGitHubQuery(multilineLayout(" "), "githubQueryMultiline.js");
+emitGitHubQuery(inlineLayout(), "githubQueryInline.js");
+
describe("queries", () => {
describe("end-to-end-test cases", () => {
const testCases = {
```
2. Run `CI=true yarn test`, and verify that the following two files
written to `src/backend/` contain appropriate contents. You can just
eyeball them, or check that they match my results:
https://gist.github.com/wchargin/f37b99fd4ec345c9d2541c2dc53ceda9
3. In `fetchGitHubRepo.js`, change the definition of `const query` to
```js
const query = require("./githubQueryMultiline.js");
```
Run
```shell
GITHUB_TOKEN="<your_token_here>" src/backend/fetchGitHubRepoTest.sh
```
and verify that it exits successfully.
4. Repeat for `require("./githubQueryInline.js")`.
wchargin-branch: graphql-structured-queries
Summary:
Closes#82. This affords clients type-safety without needing to
verbosely annotate every node or edge passed into the graph functions.
It also enables graph algorithms to be more expressive in their types:
for instance, the merge function now clearly indicates from its type
that the first graph’s nodes are passed as the first argument to the
node reducer, and the second graph’s nodes to the second. Clients can
upgrade immediately by using `Graph<*, *>`.
Thankfully, Flow supports variance well enough for this all to be
possible without too much trouble.
Test Plan:
Existing unit tests pass statically and at runtime. I added a test case
to demonstrate that merging works covariantly.
To see some failures, change `string` to an incompatible type, like
`number`, in the definitions of `makeGraph` in test functions for
conservatively rejecting graphs with conflicting nodes/edges
(ll. 446, 462).
wchargin-branch: parameterize-graph
Summary:
Flow doesn’t allow us to specify variance annotations in generic
function parameters, and doesn’t allow coercing `Node<T>` to
`Node<mixed>`. This forces us to put `any`s in our code, which…works.
Paired with @dandelionmane.
Test Plan:
New unit tests trivially pass dynamically, and now pass statically
(failing before the changes to `graph.js`).
wchargin-branch: explicitly-typed-nodes-edges
Graph.addNode and Graph.addEdge now allow adding the same node or edge
multiple times, provided that the duplicate adds are trying to insert
identical content.
This came up while prototyping the GitHub plugin; rather than create
myriad subgraphs and merge them, I found it convenient to construct a
single graph and iteratively add nodes. Since the same node may be
discovered multiple times (most notably user identities), there was a
need for a "conservative add" abstraction that adds a node if it doesn't
exist yet, but errors only if multiple adds conflict.
Since this behavior is generic and highly conservative, it seemed
appropriate to include in the graph class itself.
Test Plan:
The unit tests have been updated to include the new behavior.
I moved sourcecred/tiny-example-repository to sourcecred/example-repo
as it's simpler to remember. I also unarchived it and added comments
to an issue, so that we can create a simple test for issue parsing.
This commit merely updates SourceCred to point to example-repo with
the regenerated canoncial output.
Summary:
It’s a whole new world of GraphQL! Our parser is now just a GraphQL
query that asks for exactly what we want and dumps it to a file. The
data exposed by the v4 API is also in a much nicer format than that of
the v3 API, so this is pretty much a universal improvement.
Currently, we do not handle pagination. We require that the repository
in question have fewer than a fixed number of issues, and comments per
issue, and reviews per PR, and review comments per PR, and so on. If
this limit is exceeded, the script will fail-fast with a nice error
message. To fix this, we’ll need to write a general-purpose pagination
API that allows traversing cursors at any level of the query.
Paired with @wchargin.
Test Plan:
Run
$ GITHUB_TOKEN="your_token_here" src/backend/fetchGitHubRepoTest.sh
and verify that it exits with 0. Note that if you change this script’s
repository from `tiny-example-repository` to `sourcecred`, the script
correctly fails and outputs a useful diff.
wchargin-branch: github-v4-graphql
* Factor evertide graph demo data to a new module
It would be helpful to make our standard tiny graph available to other
test and demo instances, outside of just graph.test.js. This way we can
use it as a test case for the Graph Explorer.
* Make App.js into skeleton for GraphExplorer
We make a very basic skeleton for the Graph Explorer as a basis
for future development.
This commit also removes the UserExplorer and FileExplorer from
App.js. Since we have changed the underlying data model, we are
unlikely to use the UserExplorer or FileExplorer in anything like
their current state, so they are effectively deprecated. I am deferring
removing them because it is nice to have some examples of working React
code to copy from, before the Graph Explorer is ready.
Test plan: run `yarn start`, and observe that the App displays the
words "Graph Explorer" underneath the "SourceCred Explorer" title bar.
Summary:
This commit adds `toJSON()` and `static fromJSON()` on `Graph`. The main
benefit at this time is that this gets us free interoperability with
Jest’s snapshot testing.
The implementation of `fromJSON` is not performance-tuned, and could
probably be significantly optimized.
See #65 for discussion.
Test Plan:
New unit tests added: `yarn flow && yarn test`.
wchargin-branch: make-graph-serializable
Summary:
This commit simplifies the implementation of `Graph` without changing
its interface. We now use the `AddressMap` for all four instance fields
of `Graph`.
Test Plan:
All existing tests pass, and coverage is maintained.
wchargin-branch: use-address-map-in-graph
Summary:
This commit reifies the concept of an `Addressable`, which is any object
that has a covariant `address: Address` attribute, and implements a
simple data structure for storing addressable items keyed against their
addresses. Instances of `AddressMap` can replace the four fields of
`Graph`:
```js
_nodes: AddressMap<Node<mixed>>;
_edges: AddressMap<Edge<mixed>>;
_outEdges: AddressMap<{|+address: Address, +edges: Address[]|}>;
_inEdges: AddressMap<{|+address: Address, +edges: Address[]|}>;
```
Test Plan:
New unit tests included, with 100% coverage: `yarn flow && yarn test`.
wchargin-branch: address-map
Summary:
We’re stripping down the payload types for the GitHub plugin, to only
include what we expect to use immediately. In doing so, we take the
opportunity to make the typing a little stronger, so that we can ensure
that the `type` field of a specific type of payload is set to a
particular constant.
Paired with @dandelionmane.
Test Plan:
Adding these lines to `githubPlugin.js` and running `yarn flow`
indicates that the typechecking is working as expected:
```js
("ISSUE" : NodeType); // works
("WEIRD" : NodeType); // fails
("AUTHORSHIP" : EdgeType); // works
("UNEXPECTED" : EdgeType); // fails
```
wchargin-branch: github-plugin-payload-types
Summary:
`graph.js` coverage is now 100% :-)
Test Plan:
`yarn jest --env=jsdom --coverage` shows no uncovered lines for
`graph.js`, and no failing tests.
wchargin-branch: coverage-gremlin
Summary:
Merging graphs will be a common operation. At a per-plugin level, it
will often be useful to build up graphs by creating many very small
graphs and then merging them together. At a cross-project level, we will
need to merge graphs across repositories to gain an understanding of how
value flows among these repositories. It’s important that the core graph
type provide useful functions for merging; this commit adds them.
Test Plan:
New unit tests added; run `yarn flow && yarn test`.
wchargin-branch: graph-merge
Summary:
We need this for testing graph equality: deep-equality is not sufficient
because two graphs can be logically equal even if, say, two nodes are
added in different orders.
This commit adds a dependency on `lodash.isequal` for deep equality.
Test Plan:
New unit tests added. Run `yarn flow && yarn test`.
wchargin-branch: graph-equals
Summary:
In merging #54, there was a semantic merge conflict that was not also a
textual merge conflict; this created a failure that only appeared once
that commit was merged.
We propose that to fix this in the future, we only merge commits that
are directly ahead of master.
Test Plan:
This fixes `yarn flow` and `yarn test`.
wchargin-branch: fix-merge-conflict
Summary:
Again: we assume these invariants, so we may as well encode them.
We should just keep in mind that non-Flow users may wantonly violate
these, so we should still code defensively.
wchargin-branch: readonly-exact
Summary:
These will make nicer error functions in cases where static analysis
doesn’t detect the pollution: e.g., a user isn’t using Flow, or an
expression like `arr[0]` introduces an `undefined`.
Paired with @dandelionmane.
Test Plan:
New unit tests added. Run `yarn test`.
wchargin-branch: null-undefined-check
Create an 'advancedMealGraph' test case
The advancedMealGraph will be a grab-all that holds all advanced and
edge behaviors, e.g. the crab-self-referential loop, and the case
where there are multiple directed edges between the same two nodes.
Aggregating them into one test case will make it easier to test more
complex behaviors, like graph merging and serialization, on the
edge case graphs. However, it's still nice to have the simple graph
so that we can test simple things too. The specific tests for edge
case behavior are left mostly unchanged, in that they start from the
simple graph and add just the advanced feature that they want to test.
Summary:
Without these functions, it is not possible to meaningfully operate on
an arbitrary graph.
Paired with @dandelionmane.
Test Plan:
New unit tests included. Run `yarn flow && yarn test`.
wchargin-branch: get-all
Summary:
We’ve realized that `u: Edge<T>` implies `u: Node<T>`. That certainly
wasn’t what we were expecting! We might want something like that
eventually, to capture the fact that valuations are themselves valuable,
but for now the type system should encode the assumptions that we’re
actually making. See also #50.
Paired with @dandelionmane.
wchargin-branch: exact-types
Summary:
We had planned to expose our core types as simple Plain Old JavaScript
Objects, with accompanying standalone functions to act directly on these
data structures. We chose this instead of creating `class`es for the
types because it simplifies serialization interop: it obviates the need
for serialization and deserialization functions, because the code is
separated from the data entirely. Reconsidering, we now think that the
convenience benefits of using classes probably outweigh these
serialization cons. Furthermore, this design enables us to separate
ancillary data structures and caches from the raw data, presenting a
cleaner API for consumers of the data.
This commit introduces a `Graph` class and some related logic. With lots
of tests! And 100% code coverage! :-)
Paired with @dandelionmane.
Test Plan:
Run `yarn flow && yarn test` to see the new tests.
wchargin-branch: graph-class
Summary:
The main problem with having these fields on the node is that this
presents the illusion that the API surface area is larger than it
actually is. Clients with reference to a node object could
somewhat-reasonably expect that mutating these fields would be
sufficient to update the structure of the graph, but this isn’t the case
(as the edge objects would need to be updated, too). It’s a nice
semantic bonus, too, as edges aren’t conceptually “part of” nodes.
wchargin-branch: top-level-edges
Summary:
This is an experiment. There are a couple diffferent meanings of
“weight” in play: most prominently, weights assigned by plugins versus
those suitable for comparison among other arbitrary weights. We’re not
sure what the right thing is to put in the actual graph object, so we’re
going to think about this a bit more before adding the field back in.
wchargin-branch: remove-weights
Summary:
The “ID” parts were left-over from the Great Address Migration, and we
think that abbreviations are fine here, anyway.
Test Plan:
`yarn flow && yarn test`
wchargin-branch: src-dst-rename
Summary:
The sourcecred/tiny-example-repository repository stores some example
data that we can use to generate test cases. As of now, the repository
has been archived so that its state is stable. This commit checks in the
result of our scraper on the repository.
wchargin-branch: example-data
Reorganize the code so that we have a single package.json file, which is at the root.
All source code now lives under `src`, separated into `src/backend` and `src/explorer`.
Test plan:
- run `yarn start` - it works
- run `yarn test` - it finds the tests (all in src/explorer) and they pass
- run `yarn flow` - it works. (tested with an error, that works too)
- run `yarn prettify` - it finds all the js files and writes to them