Summary:
The previously listed invariants were weak on two counts. First, it was
unstated that the keys of `_inEdges`, `_outEdges`, and `_nodes` should
coincide. Second, the “exactly once” condition on edge inclusion had the
unintentional effect that edge absent in `_edges` but present twice or
more in each of `_inEdges` and `_outEdges` would not violate the
invariant.
Test Plan:
Stay tuned.
wchargin-branch: strengthen-invariants
The implementation is quite simple. The tests are somewhat more
comprehensive than in v2 or v1. We now test that copies are equal to the
original in a variety of situations.
Test plan:
Unit tests added.
It turns out we forgot to add this to the API, so I added it. I also
implemented it. The tests are pretty thorough; as an added innovation
over our previous tests (e.g. in #312 and #61), we now consistently test
that equality is commutative.
In contrast to our previous implementations, this one is massively
simpler. That's an upside of using primitive ES6 data structures to
store all of the graph's information... which is itself an upside of not
trying to store arbitrary additional information in the graph. Now we
can just do a deep equality check on the underlying nodes set and edges
map!
We might be able to performance tune this method by taking advantage of
the structure of our nodes and edges. This should suffice for now,
though.
Paired with @wchargin
Test plan:
Unit tests were added. Run `yarn travis`
This is the change that puts the Graph into `Graph` :) We add `_inEdges`
and `_outEdges`, and use them to identify the neighbors of a given
`node`.
The API is implemented pretty uncontroversially. (We've done this a few
times before: see #319, #162). As with other iterators, we check for
comodification and error if this has occurred.
The tests cover some interesting cases like absent nodes, loops, and
multiple edges with the same src and dst.
Test plan:
Unit tests have been added. Run `yarn travis`.
Paired with: @wchargin
Summary:
A client of `Graph` is able to (e.g.) invoke `nodes()` to get a node
iterator, iterate over some of the nodes, then change the nodes by
adding or removing the nodes on the original graph. The semantics of
what to do here are not clear: ES6 specifies semantics for `Map` and
`Set`, but they have counterintuitive consequences: for instance, you
can get a `Set` iterator to yield the same value twice. Most collection
implementations in the Java standard library prohibit this entirely. In
this commit, we adopt the latter approach.
A caveat of this implementation is that a graph object may not be
mutated more than 2^53 − 1 (`Number.MAX_SAFE_INTEGER`) times. Clients
who need to mutate a graph more than nine quadrillion times are
encouraged to reconsider their data model.
Paired with @decentralion.
Test Plan:
Unit tests added. Run `yarn travis`.
wchargin-branch: comod-check
Adding a conflicting edge (i.e. one with the same address, but different
`src` or `dst`) is an error. However, our coverage has determined that
the behavior isn't tested. This commit adds that test.
Test plan:
Only change is adding a new test. Verify `yarn travis` passes.
Summary:
We’re not mandating anything about coverage right now, but by making it
easier to track coverage perhaps people will organically become more
motivated to write good tests.
Test Plan:
Run `yarn coverage`, and then open `coverage/lcov-report/index.html`.
wchargin-branch: coverage
Test Plan:
Run `yarn test --env=jsdom --coverage`, and note that files are
generated into `coverage/`. Then, run `yarn travis`, which fails before
this patch (on `check-pretty`) and passes after it.
wchargin-branch: prettier-ignore-coverage
Summary:
These are superseded by the unified implementation in `address.js`.
Test Plan:
Existing unit tests suffice.
wchargin-branch: remove-legacy-address
Summary:
This is basically a textual substitution.
Test Plan:
Existing unit tests suffice. Note that `_address.js` (with underscore)
is no longer imported except from its own tests.
wchargin-branch: unified-addresses-in-graph
Summary:
This implements the following functions for the unified addresses:
- assertions: `assertValid`, `assertValidParts`
- injection: `fromParts`
- projection: `toParts`
(These functions depend on each other for testing, so we implement them
together.)
Test Plan:
Unit tests included. Run `yarn travis`.
wchargin-branch: address-assertion-injection-projection
Summary:
This commit implements all the code needed for the top-level
`makeAddressModule` function, without implementing any of the address
functions. This mostly comprises handling of errors in the module
options.
Test Plan:
Unit tests added. Run `yarn travis`.
wchargin-branch: address-error-handling
Summary:
We have `NodeAddress` and `EdgeAddress`, which are opaque aliases of
`string` each with separate associated functions. We really want to keep
this setup: having the address types be structurally distinct is very
nice. But currently the implementation is extremely repetitive. Core
functionality is implemented twice, once for nodes and once for edges.
The test code is even worse, as it is forced to include ugly, hacky,
parametric generalizations to test both implementations—which are really
the same code, anyway!
In this commit, we introduce a system to unify the _implementations_
while keeping the _APIs_ separate. That is, users still see separate
opaque types `NodeAddressT` and `EdgeAddressT`. Users now also see
separate modules `NodeAddress` and `EdgeAddress`, each of which
implements the same interface for its appropriate type. These modules
are each implemented by the same address module factory.
To get this to work, we clearly need to parameterize the module type
over the address type. The problem is getting this to work in a way that
interacts nicely with the opaque types. The trick is to let the factory
always return a transparent module at type `string`, but to then
specialize the type of the resulting module in the module in which the
underlying type of the opaque type is known.
This commit includes specifications for all functions that are in the
current version of the API, but includes only as much implementation
code as is needed to convince me that tests and Flow are actually
running (i.e., very little). I’ll send implementations in separate PRs
for easier review.
The preliminary modules created in this commit _are_ exported from the
graph module, even though they are incomplete. This is so that we can be
sure that nothing will catch fire in Flow-land when we try to export
them (which is plausible, given that we have nontrivial interactions
between opaque types and parametric polymorphism).
Test Plan:
Unit tests included. Run `yarn travis`.
wchargin-branch: address-unified-foundations
Summary:
In actual code, we almost always call `neighbors` with a specified
direction. Usually, you want to inspect some relation like “parents of
this commit” or “GitHub nodes referenced by this comment”, and so the
edge direction matters. In each of the above cases, forgetting to
include the direction would introduce a bug: you’d get parents _and
children_ of a commit, or GitHub nodes referenced by _or that refer to_
a comment. It’s easy to forget this when writing the code, so we prefer
to make an explicit direction required, and allow people to specify
`Direction.ANY` in the case that that is what they want.
(In other words: we want to make the common thing easy, the uncommon
thing possible, and the wrong thing impossible.)
A similar situation holds for filters. By forcing clients to think about
what kinds of edges they want to follow to what kinds of nodes, we
encourage them to write more robust code. As before, if clients do want
to consider all nodes or all edges, they can pass the appropriate empty
address (`nodeAddress([])` or `edgeAddress([]`), which is a prefix of
every address.
Therefore, we require that clients pass an `options` object to
`neighbors`, and we furthermore require that each of the three options
be present.
Paired with @decentralion, in spirit.
Test Plan:
None; this changes the API for a function that has no implementation or
clients.
wchargin-branch: neighbors-options-required
Summary:
This saves clients from having to pollute their global namespace with
`IN` and `OUT` (and, soon, `ANY`). Calls now look like:
```js
myNode.neighbors({direction: Direction.OUT});
```
Callers who prefer to pollute their namespaces are of course still
welcome to do so, at the cost of one `const {IN, OUT} = Direction`.
Test Plan:
New unit tests for frozenness included.
wchargin-branch: direction-enum
Summary:
These functions can be used for address filtering: finding all nodes
owned by a plugin, or all edges of a certain plugin-type, or similar.
Clients can implement this in terms of `toParts`, but when the
underlying type of the opaque type is known there exists a simpler and
more efficient implementation.
Test Plan:
Unit tests added. Run `yarn travis`.
wchargin-branch: address-prefix
Summary:
This fixes an organizational error on my part: the assertions were not
in `it`-blocks, so (a) no test cases were listed, and (b) assertion
failure would have been considered an error in the test suite itself.
(This wouldn’t have led to tests spuriously passing, though.)
Test Plan:
Note the new test case structure:
```
nodeToString
errors on
✓ null base input
✓ undefined base input
✓ wrong kind
works on
✓ the empty address
✓ the address with one empty component
✓ a normal address
edgeToString
errors on
✓ null base input
✓ undefined base input
✓ wrong kind
works on
✓ the empty address
✓ the address with one empty component
✓ a normal address
```
wchargin-branch: tostring-test-case-structure
On larger repos (e.g. `ipfs/js-ipfs` and `ipfs/go-ipfs`) our GitHub
query tends to fail with an opaque (possibly timeout) error message.
Based on a discussion with @mikeal, it sounds like this is probably
GitHub killing queries that "take too long". This commit reduces our
default page sizes so that we make more smaller queries. Empirically, it
seems to help, and it's very unlikely to break anything.
Test plan:
Run
```
yarn backend
node bin/sourcecred.js plugin-graph --plugin=github ipfs js-ipfs \
> graph.json
```
before and after this change. Before, it fails; after, it succeeds.
This commit implements the following edge related methods on graph:
- `Graph.addEdge(edge)`
- `Graph.hasEdge(address)`
- `Graph.edge(address)`
- `Graph.edges()`
We've decided to enforce an invariant that for every edge, its `src` and
`dst` nodes must be present in the graph. As such, `Graph.addEdge` may
error if this condition is not true for the proposed edge, and
`Graph.removeNode` may error if any edges depend on the node being
removed. This invariant is documented via comments, and checked by the
test code.
Test plan:
Extensive unit tests have been added. Run `yarn travis`.
Paired with @wchargin
Summary:
If you just print out an address, depending on output context the NUL
separators may just disappear, which is really bad. Using `stringify` is
better, but causes these characters to appear as `\u0000`, which is not
very pretty.
This commit adds pretty-printing functions that clearly display the
addresses. A node address is printed as:
nodeAddress(["array","of","parts"])
An edge address is printed as:
edgeAddress(["array","of","parts"])
An edge in the graph is printed as:
{address: edgeAddress(["one"]), src: nodeAddress(["two"]), dst: nodeAddress(["three"])}
Note that each of these is a valid JavaScript expression that evaluates
to the argument that was originally converted to string.
Paired with @decentralion.
Test Plan:
Unit tests added. Run `yarn travis`.
wchargin-branch: node-edge-tostring
Summary:
Now, a client calling `assertNodeAddress` can indicate the context to
clients, like `assertNodeAddress(foo.bar.node, "foo.bar.node")`.
Paired with @decentralion.
Test Plan:
Unit tests updated. Existing graph code does not need to be updated,
because the default values work fine.
wchargin-branch: assertion-context-parameters
Summary:
Using `throw err;` was not useful when the error’s message was an
object: `[object Object]` would be printed to the console, in lieu of
any useful information. We now additionally use `console.error` to print
the full object contents.
Test Plan:
Apply the following patch:
```diff
diff --git a/src/v1/cli/commands/combine.js b/src/v1/cli/commands/combine.js
index b60f91e..2fc4061 100644
--- a/src/v1/cli/commands/combine.js
+++ b/src/v1/cli/commands/combine.js
@@ -24,6 +24,7 @@ export default class CombineCommand extends Command {
" where each GRAPH is a JSON file generated by plugin-graph";
async run() {
+ Promise.reject({a: {b: {c: {d: {e: {f: "gee"}}}}}});
const {argv} = this.parse(CombineCommand);
combine(argv);
}
```
Then, run `yarn backend && node bin/sourcecred.js combine`, and note
that the full contents of the object are printed. (If the `util.inspect`
is replaced with a simple `console.error`, then the object’s
representation will be truncated, so the `util.inspect` is important.)
wchargin-branch: rejection-error-full-contents
Summary:
For now, this contains the logic to register an `unhandledRejection`
error. I’ve removed all instances of those handlers, and `require`d this
module at every top-level entry point. (The individual CLI commands had
the handler before, but didn’t need it; conversely, the top-level CLI
entry point did not have the handler, but should have.)
Test Plan:
To test that the CLI commands still error on unhandled rejections, apply
the following patch:
```diff
diff --git a/src/v1/cli/commands/combine.js b/src/v1/cli/commands/combine.js
index b60f91e..d55b965 100644
--- a/src/v1/cli/commands/combine.js
+++ b/src/v1/cli/commands/combine.js
@@ -24,6 +24,7 @@ export default class CombineCommand extends Command {
" where each GRAPH is a JSON file generated by plugin-graph";
async run() {
+ Promise.reject("wat");
const {argv} = this.parse(CombineCommand);
combine(argv);
}
```
Then run `yarn backend` and `node bin/sourcecred.js`, and note that the
rejection handler is triggered.
wchargin-branch: unify-entry
Right now, the cred explorer attempts to display every node in the
graph. As graphs easily grow to O(100k) nodes, this is not kind to the
browser.
This commit limits display to the first 100 entries. Since they are
sorted by percieved importance, and it's easy to filter by type (e.g. to
find all users), this limitation is fine in practice.
Test plan:
Run the cred explorer on a larger repo and observe that the performance
is enormously improved. No unit tests added, as the cred explorer is a
prototype which is basically untested (#269)
Summary:
A `RecursiveTable` shows the tree-view from a single node; a
`RecursiveTables` shows multiple `RecursiveTable`s, and is used both in
the recursion and at the top level. This is useful because we’ll want to
make changes to both entry points, for things like pagination. In
particular, this makes #342 nicer.
Test Plan:
Note that the behavior is unchanged. Then, apply
```diff
diff --git a/src/v1/app/credExplorer/pagerankTable.js b/src/v1/app/credExplorer/pagerankTable.js
index 624bd71..e343e95 100644
--- a/src/v1/app/credExplorer/pagerankTable.js
+++ b/src/v1/app/credExplorer/pagerankTable.js
@@ -220,6 +220,7 @@ class RecursiveTables extends React.Component<RecursiveTablesProps> {
const y = pagerankResult.get(b).probability;
return y - x;
})
+ .slice(0, 10)
.map((address) => (
<RecursiveTable
depth={depth}
```
and note that (a) the root view has only ten entries, and (b) any
expanded subview also has only ten entries.
wchargin-branch: recursive-tables
Our GitHub renderer tries to display the author name for a comments or
pull request reviews. However, as we've already discovered (#228), not
every GitHub post has an author available. This breaks attempts to demo
the v1 cred explorer on some interesting repos, e.g. ipfs/js-ipfs.
Since the v1 code is all deprecated, it's effectively a demo branch
pending a rewrite of the cred explorer to use the v3 graph. As such, I
didn't include unit tests.
Test plan:
Attempt to run the cred explorer on `ipfs/js-ipfs` prior to this PR.
Observe that it loudly fails. Attempt to run it after this PR, and
that it succeeds.
Consider a case where a user wants to construct many addresses sharing a
common prefix. For example, every GitHub entity may have an address
starting with the component sequence `["sourcecred", "github"]`.
Currently, users can implement this with `toParts`:
```js
const GITHUB_ADDRESS = Address.nodeAddress(["sourcecred", "github"]);
function createGithubAddress(ids: $ReadOnlyArray<string>): NodeAddress {
return nodeAddress([...Address.toParts(GITHUB_ADDRESS), ...ids]);
}
```
This is unfortunate from a performance standpoint, as we needlessly
perform string and array operations, when under the hood this is
basically a string concatenation.
This commit fixes this by adding functions `nodeAppend` and
`edgeAppend`, which take a node (resp. edge) and some components to
append, returning a new address. Consider how straightforward our
example case becomes:
```js
const GITHUB_NODE_PREFIX = Address.nodeAddress(["sourcecred", "github"]);
const ISSUE_NODE_PREFIX = Address.nodeAppend(GITHUB_NODE_PREFIX, "issue");
// There is no longer any need for an explicit creation function
const anIssueAddress = Address.nodeAppend(ISSUE_NODE_PREFIX, someIssueID);
```
Paired with @wchargin.
Test Plan:
Unit tests added; run `yarn travis`.
When the cron build fails, we want to make sure that we know about it.
This commit ensures that Travis emails @wchargin and @decentralion on
any build failure; our email filters will ensure that cron job failures
get noticed.
Based on documentation at: https://docs.travis-ci.com/user/notifications
This commit implements and tests the following methods on the `Graph`:
- `addNode`
- `removeNode`
- `hasNode`
- `nodes`
Test plan: Unit tests are included.
Paired with @wchargin
Summary:
Closes#336.
Test Plan:
Snapshots updated; changes are easily readable. Existing tests pass.
Running the cred explorer on the `sourcecred/sourcecred` graph shows
pull requests like `#124 (+73/−3): Update the README`, which is good.
wchargin-branch: pr-addition-deletion
This commit re-organizes our addressing code according to the following
conventions:
- All address code is implemented in `core/_address.js`, but clients
should not import that file directly.
- Public address-related methods will be made available via the
`Address` object in `core/graph.js`.
- The inaugural set of public address methods are:
- `nodeAddress`, for constructing a new `NodeAddress` from parts
- `edgeAddress`, for constructing a new `EdgeAddress` from parts
- `toParts`, for turning an address back into parts
- The types `NodeAddress` and `EdgeAddress`are also exported from
`graph.js`
Test plan: The tests cover both runtime and type properties.
Paired with @wchargin.
For more context and design discussions, see:
https://github.com/sourcecred/sourcecred/pull/334
This very minor commit adds the basic data structures to the `Graph`
that will act as underlying storage for nodes and edges. This is similar
to how nodes and edges are stored in `Graph` v1 and v2, except much
simpler, because we now have string keys, no data stored with nodes, and
minimal data stored with edges.
Test plan: The only observable change is that the graph can now be
constructed without error. I added a unit test to verify this behavior
:) as well as that the Graph constructor properly returns an object of
type `Graph` and not an `any`.
(Context: https://github.com/facebook/flow/issues/6400)
Paired with @wchargin
This commit implements Node/Edge addresses, and helper functions for
generating and manipulating them.
Test plan: Unit tests included.
Paired with @wchargin
The newly added `v3/src/core/graph.js` module is the heart of our `v3`
refactor, which aims at having our `Graph` class be just a graph[1] and
not a key-value store to boot.
This commit adds a skeleton of our intended `Graph` API. We've mostly
ported features from our existing `v1` `Graph` class, but with a few
changes:
- `NodeAddress` and `EdgeAddress` are both strings under the hood, which
allows for better performance and use of native JS Maps and Sets.
- Addresses are now hierarchical paths, so the functionality previously expressed
through `pluginName` and `type` can now be encoded into the path
components.
- `NodeAddress` and `EdgeAddress` are distinct types, and have different
serializations, so they can't get mixed up at runtime.
- It is no longer possible to bind arbitrary payloads to nodes or edges.
The expectation is that external data stores will be responsible for
that.
Test plan: No code was implemented; flow and lint are sufficient.
Paired with @wchargin
[1]: Technically, it's a [quiver]
[quiver]: https://en.wikipedia.org/wiki/Quiver_(mathematics)
We want to reset some of our basic assumptions, and make `Graph` into a
pure graph implementation, rather than a hybrid graph and key-value
store.
This is a substantial rewrite, so we want to start from scratch in a v3/
directory and pull code into v3 as necessary. So that we can do this in
a relatively clean fashion, we're first moving the v1 and v2 code into
their own directories.
Paired with @wchargin
Test plan: Travis, and `yarn backend`, `node bin/sourcecred.js start`.
Note that `yarn backend` and `node bin/sourcecred.js start` both use the
v1 versions. We'll migrate those (by changing paths.js) to v3 when
appropriate.
Summary:
This fixes a bug introduced in #317, which only occurred in the cron job
variant of the CI script (`yarn travis --full`): the two scripts run in
the cron job depend on `yarn backend` having previously written to the
`bin/` directory, but this is precisely what we wanted to prevent. To
fix this, we simply add an additional target for `yarn backend` during
the cron job. This is a little bit wasteful in that we compile the
backend applications twice, but it’s not a big deal because (a) it only
runs in cron jobs, so it won’t slow down normal builds, and (b) it only
takes about 5 seconds, anyway.
Test Plan:
Export a `GITHUB_TOKEN` and run `yarn travis --full`, which fails before
this change and passes after it.
wchargin-branch: cron-ci-overwrite-bin
Summary:
The storage format is similar to the V1 storage format. Nodes are sorted
by address and stored in an ordered list. Edges have `srcIndex` and
`dstIndex`, keyed against the node order. Each node stores the name of
its plugin and the result of invoking `toJSON` on its payload.
Test Plan:
Unit tests added; run `yarn travis`. The single added snapshot is easy
to read and verify.
wchargin-branch: v2-json
Summary:
A `Node` includes a ref and a payload. We should say that two nodes at
the same address are equal iff their payloads are equal, and two
payloads at the same address are equal iff their JSON contents are
equal. Importantly, `Node` equality does not depend on the `ref`’s
internal structure: this structure includes a reference to the enclosing
graph, and so using this notion of node equality caused `equals` to
incorrectly return `false` when the inputs were two logically equal
graphs with different internal structure. (It also caused `equals` to be
very slow, performing a deep equality check on the graphs for every
node.)
Test Plan:
A unit test has been strengthened. It fails before this patch, and
passes afterward.
wchargin-branch: v2-node-payload-json-equality
Summary:
Both the implementation and the tests here are pretty straightforward.
The only change from GraphV1 is that we should explicitly check that the
plugins are copied correctly, because graph equality does not check
this.
Test Plan:
New unit tests suffice.
wchargin-branch: copy-v2
Summary:
I dropped the `edgeDecomposition` and `neighborhoodDecomposition` test
cases, because (a) we don’t have a canonical “interesting graph” on
which to run them, compounding that (b) it’s not clear how much value
they add.
Test Plan:
New unit tests suffice.
wchargin-branch: mergeConservative-v2
The implementation is adapted from our previous implementation, but has
been refactored to be more appropriate for our generator-function
approach.
The unit tests comprehensively explore a simple example graph, testing
the following cases:
- Direction filtering (IN/OUT/ANY/unspecified)
- Node types
- Edge types
- Self-loop edges
- Dangling edges
- Removed edges don't appear
Test plan: Carefully inspect the added unit tests.
Paired with @wchargin