Summary:
This commit hooks up the PageRank table to the PageRank node
decomposition developed previously. The new cred explorer displays one
entry per contribution to a node’s cred (i.e., one entry per in-edge,
per out-edge, and per synthetic loop), listing the proportion of the
node’s cred that is provided by this contribution. This makes it easy to
observe facts like, “90% of this issue’s cred is due to being written by
a particular author”.
Paired with @decentralion.
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: pagerank-table-node-decomposition
Summary:
The aesthetically nicest win is in `WeightConfig`. Other changes are
nice to have.
In many cases, we reduce the specificity of error messages thrown. For
instance, if an invariant was violated on an edge `e`, then we might
have thrown an error with message `EdgeAddress.toString(e.address)`. But
we did so not because we thought that this was genuinely worth it, but
only because we were forced to explicitly throw an error at all. These
errors should never be hit, anyway, so we don’t feel bad about replacing
these with errors that are simply the string `"null"` or `"undefined"`,
as appropriate.
Test Plan:
Running `yarn travis --full` passes, and the cred explorer still seems
to work with both populated and empty `localStorage`.
wchargin-branch: use-null-util
Summary:
This commit adds a module with four functions: `get`, `orThrow`, `map`,
and `orElse`.
Here is a common pattern wherein `get` is useful:
```js
sortBy(Array.from(map.keys()), (x) => {
const result = map.get(x);
if (result == null) {
throw new Error("Cannot happen");
}
return result.score;
});
// versus
sortBy(Array.from(map.keys()), (x) => NullUtil.get(map.get(x)).score)
```
(The variant `orThrow` allows specifying a custom message that is only
computed in the case where the error will be thrown.)
Here is a common pattern where `map` is useful:
```js
arr.map((x) => {
const result = complicatedComputation(x);
return result == null ? result : processResult(result);
});
// versus
arr.map((x) => NullUtil.map(complicatedComputation(x), processResult))
```
In each of these cases, by using these functions we gain a dose of
safety in addition to our concision: it is tempting to “shorten” the
expression `x == null ? y : z` to simply `x ? y : z`, while forgetting
that the latter behaves incorrectly for `0`, `false`, `""`, and `NaN`.
Similar patterns like `x || defaultValue` also suffer from this problem,
and can now be replaced with `orElse`.
Designed with @decentralion.
Test Plan:
Unit tests included; run `yarn travis`.
wchargin-branch: null-util
There's no sense having a landing page with no content and
a nav bar with only one meaningful options. We can re-add
them later if we actually need navigation
Test plan: Local testing
Summary:
When updating `PagerankTable` to work with contributions, we found it
difficult to keep track of everything when we tried to do two things
simultaneously: compute the values to be displayed, and render them
hierarchically. @decentralion suggested computing the relevant data
ahead of time, and then having a straightforward React component to
render this structure. This would incidentally make `PagerankTable`
easier to test.
This commit implements that data structure and the function to create
it from a `PagerankResult`. A subsequent commit will update
`PagerankTable` accordingly.
As evidence that this structure is well-designed, note that the main
contents of a contribution row can be rendered entirely from a
`ScoredContribution` datum (though the component will still of course
require the full `PagerankNodeDecomposition` to pass down to its
children). (At least, I think that it can be!)
Designed with @decentralion.
Test Plan:
Unit tests added. I have checked that the snapshot is structurally
correct: each node has contributions with the correct contributors.
I did not manually compute the stationary distribution and check the
snapshot for correctness. The snapshot is complemented by automated
tests.
wchargin-branch: pagerank-node-decomposition
Summary:
Now that `MapUtil` provides `toObject`/`fromObject`, we can keep storing
weights in localStorage while representing them as ES6 `Map`s in memory.
Here are some advantages:
- The code is genuinely more typesafe. While writing this,
I accidentally wrote `edgeWeights.get(key)`, where `edgeWeights`
should have been `nodeWeights`. This was caught at compile time, and
would not have been in the previous version.
- Relatedly, the code now has zero `any`-casts as opposed to five.
- The initialization of the default values is not abysmally ugly.
- Whenever we iterate over these maps, (a) we can use `.entries()`,
and (b) we don’t have to cast between string keys and semantic keys.
This simplifies some of the control flow.
- The extra null-checking on `get` forces us to either think about
ways in which the check might fail, or reuse a previously fetched
value that is known to be non-null (perhaps because it came from
`entries`).
- A particularly annoying Prettier line-break is avoided. :-)
Here are some disadvantages:
- The null-pipelining around the `rehydrate` function is a bit
annoying. As @decentralion pointed out, what we want here is not a
default value, but a default value and a function to transform a
present value. This is Haskell’s `maybe : β → (α → β) → Maybe α → β`
or Java’s `optional.map(fromObject).orElse(defaultValue)`. This
commit implements one approach; another is to note that `fromObject`
is invertible, writing `fromObject(LocalStore.get(k, toObject(d)))`.
- That’s it, I think?
Test Plan:
I’ve tested that the sliders for both edge and node weights correctly
influence the PageRank behavior, that the component is properly
initialized with an empty localStorage, and that the component properly
rehydrates from localStorage.
wchargin-branch: weightconfig-maps
Summary:
These call sites were selected from `git grep Map`. In this commit, we
only add usage of the utility functions; we do not change any existing
object types to maps.
Test Plan:
Running `yarn travis --full` passes.
wchargin-branch: use-map-util
Summary:
We’d like to like ES6 `Map`s, because they provide better type safety
than objects (primarily, `Map.prototype.get` has nullable result type).
However, the vanilla APIs are weak. Prominent problems are that `Map`s
always become `"{}"` under `JSON.stringify`, that there is no easy way
to convert between `Map`s and objects, and that there are no functions
to map over the keys and values of `Map`s.
In this commit, we add versions of those functions to a utility module.
The value-level implementations are straightforward, but these functions
nevertheless deserve a utility module because the types are somewhat
tricky to get right. The implementation requires casts through `any`,
and these should be written, analyzed, and proven correct just once. (In
particular, it would be easy to write an unsound type for `fromObject`.)
In a followup commit, we will amend existing portions of the codebase to
use these functions.
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: map-util
This commit adds another bank of sliders to the cred explorer, for
changing the directionality of edges. The sliders have the range [0,1]
with step size of 0.01.
The layout is pretty ugly and clearly should be refactored. But playing
with these sliders is interesting :)
Test plan: We don't have any unit tests on the WeightConfig, but I did
drive it by hand. An interesting experiment is to set the AUTHORS edge
directionality to 1, so that users get no credit for authoring posts. As
expected, this utterly tanks the users' scores; many users then have a
score of -Infinity.
Summary:
If we want to snapshot an edge, then none of the available options is
ideal:
- Snapshotting the edge directly includes literal NUL bytes in the
snapshot file, so it is treated as binary. This is bad.
- Using `edgeToString` works, but all fields of the edge are combined
into a single string, which is somewhat hard to read.
- Using `edgeToParts` works, but each address in the edge takes up a
lot of visual space: one line per part in the address. This is also
somewhat hard to read.
This commit adds `edgeToStrings`, which simply applies the appropriate
`toString` operation to each field of an edge.
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: edge-to-strings
The implementation is similar to the LocalStore usage in
`credExplorer/app.js`. Had to make a spurious refactor from Map to
Object because ES6 maps don't stringify by default, and I didn't feel
like writing a custom JSON serializer.
Test plan:
Didn't add unit tests, although at some point we should come up with a
nice LocalStore mock and test LocalStore code. I did, however, manually
try it out and verify that it works :)
Paired with @wchargin
Summary:
Prettier inserted these in a previous version of the code, but the lines
got shorter and so Prettier no longer minds if we remove the breaks.
Test Plan:
shipitquick
wchargin-branch: remove-line-breaks
Summary:
This change is motivated by the behavior of loops, but applies more
generally to edges. Previously, a loop would induce two contributions
with the same contributor, but possibly different weights: one with the
to-weight of the edge and one with the fro-weight. For one, this is
annoying to downstream clients, who would like to use the contributor as
a superkey. But it is also somewhat strange that a single contributor
could have two different weights.
The same applies to edges in general: every edge induces two
contributions with the same contributor, of type `NEIGHBOR`.
As of this commit, we replace `NEIGHBOR` with `IN_EDGE` and `OUT_EDGE`,
one of each induced by each edge. This has the effect that a contributor
maps to at most one contribution.
Test Plan:
Existing unit tests updated.
wchargin-branch: separate-in-out-edge-contributions
Summary:
When we convert a graph to a Markov chain, each cell in the transition
matrix is a sum of edge weights from the given `src` to the given `dst`,
plus the synthetic self-loop needed for stability. Performing this sum
loses information: given the transition matrix, a client cannot
determine how much a particular edge contributed to the score of a node
without redoing the relevant computations. In this commit, we expose the
structure of these contributions (i.e., edges and synthetic loops).
This changes the API of `graphToMarkovChain.js`, but it does not change
the resulting Markov chains. It also does not change the API of
`pagerank.js`. In particular, clients of `pagerank.js` will not have
access to the contributions structure that we have just created.
Test Plan:
Existing unit tests have been updated to use the new API, and pass
without change. An additional test is added for a newly exposed
function, even though this function is also tested extensively as part
of later downstream tests.
In one snapshot, one value changes from `0.25` to `0.25 + 1.7e-16`. The
other values in the enclosing distribution do not change, so I think
that it is more likely that this is due to floating-point instability
than an actual bug. (I’m not sure where exactly I commuted or associated
an operation, but it’s quite possible that I may have done so). To
compensate, I added an additional check that the values in the
stationary distribution sum to `1.0` within `1e-9` tolerance; this check
passes.
wchargin-branch: expose-contributions
Previously, when expanding a node in the cred explorer, it would display
the neighboring nodes, but not any information about the edges linking
to that node. If the same node was reached by multiple edges, this
information was not communicated to the user.
As of this commit, it now concisely communicates what kind of edge was
connecting the chosen node to its adjacencies. There's a new `edgeVerb`
method that plugin adapters must implement, which gives a
direction-based verb descriptiong of the edge, e.g. "authors" or "is
authored by".
Test plan:
Unit tests added to the PagerankTable tests, and hand inspection.
Paired with @wchargin
Summary:
This commit adds sliders for each node and edge type (hard-coded for
now), and hooks them up to the cred explorer so that re-running PageRank
uses the newly induced edge evaluator.
Paired with @decentralion.
Test Plan:
We will add tests later. We promise! In the meantime, the results that
appear when you drag a slider and re-run PageRank seem appropriate. For
instance, changing the “Git blob” node type from `0.0` to `-10.0`
results in the Git blobs not dominating the whole view.
wchargin-branch: configurable-weights-ui
Summary:
PageRank wants an _edge evaluator_: a function mapping an edge to its
to-weight and fro-weight. This commit provides functions for creating
edge evaluators based on the high-level, coarse notions of node and edge
types. We use [the formulation described in a comment on #476][1].
Paired with @decentralion.
[1]: https://github.com/sourcecred/sourcecred/issues/476#issuecomment-402576435
Test Plan:
None. We will add tests later. We promise!
wchargin-branch: edge-weight-generators
Summary:
As we add sliders for adjusting the PageRank parameters, we trigger a
bunch of unneeded renders on `PagerankTable`. As `PagerankTable` is a
pure component, we can [mark it and its children as such][1] to see notable
performance improvements: the `Array.from` and `sort` in its `render`
method are showing up on the flamegraph.
[1]: https://reactjs.org/docs/react-api.html#reactpurecomponent
Paired with @decentralion.
Test Plan:
Unit tests pass, whereas if we instead implement `shouldComponentUpdate`
by `return false` then the interaction tests fail. Also, `yarn start`
seems to behave as expected as we switch among different graphs.
wchargin-branch: pure-pageranktable
Summary:
To verify that the correct graph is loaded, we display the graph’s node
and edge count in the UI. As previously implemented, this would be
recomputed on every change, requiring iteration over all nodes and edges
of the graph. On my machine (T440s) and the SourceCred graph, this
induced an ~80ms performance floor for any operations that caused the
app to re-render, which is noticeable.
This commit retains the useful information, but computes it only at
graph load time.
Paired with @decentralion.
Test Plan:
Note that the behavior is unchanged.
wchargin-branch: cache-graph-size
Rewrites the README to be a lot more concrete about what SourceCred is
doing, and to give more up-to-date information overall.
Test plan: The README includes a small code block with instructions for
turning on a local copy of the Cred Explorer. I ran this on my machine
(inside /tmp) and it worked locally.
Summary:
See #432. This commit switches the GitHub Markdown parser from matching
regular expressions against the raw post body to matching the same
regular expressions against the semantic text of a Markdown document.
See test cases for `parseReferences` and `parseMarkdown` for more
details.
There are no changes to snapshots or cached GitHub data because the
Markdown in the example repository is simple enough to be properly
parsed by a simple approach that treats everything as literal text.
The change to the “finds numeric references in a multiline string” test
case is to strip leading indentation from the string in question. As
previously written, the test had four spaces of leading indentation on
each line, causing the Markdown parser to interpret it as a code block,
in turn causing our logic to (correctly) omit it entirely. The revised
version of the test has no leading indentation.
Test Plan:
Light unit tests included; these tests are intended to verify that the
actual Markdown logic from `parseMarkdown` is being used, and that file
has more extensive tests. Additionally, `yarn travis --full` passes.
wchargin-branch: github-parse-markdown
Summary:
This commit exposes a function of type `(string): string[]` to
encapsulate the whole Markdown pipeline, from parsing to AST
transformation to text node extraction. Clients of this module do not
need to know about `commonmark`.
Test Plan:
A single comprehensive test case has been added.
wchargin-branch: text-blocks
Summary:
The regular expressions used to detect GitHub references were of the
form `/(?:\W|^)(things)(?:\W|$)/gm`, where the outer non-capturing
groups were intended to enforce a word boundary constraint. However,
this caused reference detection in strings like `"#1 #2"` to fail,
because the putative matches would be `"#1 "` and `" #2"`, but these
matches overlap, and the JavaScript RegExp API (like most such APIs)
finds only non-overlapping matches. Therefore, in a string of
space-separated references of the same kind, only every other reference
would be detected.
A solution is to use a positive lookahead instead of the second
non-capturing group: i.e., `/(?:\W|$)/` becomes `/(?=\W|$)/`. (Ideally,
the first non-capturing group would just be a lookbehind, but JavaScript
doesn’t support those.)
In some cases, using `\b` is a simpler solution. But this does not work
in all cases: for instance, it works for repo-numeric references, but
does not work for numeric references, because the presence of the hash
means that there cannot be an immediately preceding word boundary. For
consistency, I opted to use the lookahead solution in all cases, but any
solution that passes tests is okay with me.
Test Plan:
Regression tests added. They fail before this patch and pass with it.
wchargin-branch: fix-space-separated-github-references
Summary:
The `^` metacharacter only matches the start of a non-initial line when
the `m` (“multiline”) flag is set on the RegExp. This flag was set only
for the GitHub URL RegExp.
Test Plan:
Regression tests added. They fail before this patch and pass with it.
wchargin-branch: fix-sol-github-references
Test plan:
`git grep -i v3` only shows incidental hits in longer strings
`yarn travis --full` passes
`yarn backend` works
`yarn build` works
`yarn start` works
`node bin/sourcecred.js start` works
`node bin/sourcecred.js load sourcecred example-github` works
Paired with @wchargin
Summary:
The bridge introduced in #448 has now served its purpose, and may be
deconstructed. This implements the first part of the last step of the
plan described in that pull request.
Paired with @decentralion.
Test Plan:
After `yarn backend && yarn build`:
- `node bin/sourcecred.js start` works, and
- `yarn start` works, and
- `yarn travis --full` works.
wchargin-branch: demolish-bridge
Test plan:
`node bin/sourcecred.js load sourcecred example-github` works
`yarn start` works
`node bin/sourcecred.js start-v3` works
`yarn travis --full` passes
Paired with @wchargin
Summary:
This could also be moved into the bridge directory, but this way is
marginally easier, and it doesn’t really matter in the end.
Test Plan:
`yarn backend` followed by `node bin/sourcecredV3.js start-v3` works.
wchargin-branch: start-v3
The `load` command replaces `plugin-load`. By default, it loads data for
all plugins, and does so in parallel using execDependencyGraph. If
passed the optional `--plugin` flag, then it will load data just for
that plugin.
As an implementation detail, when loading all plugins, load calls itself
with the plugin flag set.
Usage:
`node bin/sourcecred.js load repoOwner repoName`
Test plan:
Tested by hand; I blew away my SourceCred directory and then loaded the
example-github repository.
This integrates the PageRank table from #466 into the v3 cred explorer
app, bringing the v3 frontend to better-than-parity with v1!
Test plan:
Some unit tests were included, and running `yarn start` and inspecting
the App reveals that it is working correctly. Loading a PageRank result
and then changing the repository no longer triggers a crash :).
Paired with @wchargin
Ports #265 to the v3 branch, along with some tweaks:
- Only display log score, and normalize them by adding 10 (so that most
are non-negative)
- Change colors to a soothing green
- Improve display, e.g. make overflowing node description text wrap
within the row
Implements most of the tests requested in #269
Test plan: Many unit tests added
Paired with @wchargin
Summary:
This enables plugins to specify different semantic types of nodes, along
with human-readable names. This will be used, for instance, in the cred
explorer, where users may filter to one of these node types.
Paired with @decentralion.
Test Plan:
Flow passes.
wchargin-branch: plugin-node-types
Summary:
This presents a human-readable name for a plugin. It’s not yet used
anywhere.
Paired with @decentralion.
Test Plan:
Flow passes.
wchargin-branch: plugin-name
Test plan:
Run the following commands:
```
node bin/sourcecredV3.js load-plugin-v3 sourcecred example-github --plugin=git
node bin/sourcecredV3.js load-plugin-v3 sourcecred example-github --plugin=github
yarn start
```
Then, navigate in-browser to the v3 cred explorer and load data for
`sourcecred/example-github`. The following messages are printed to
console:
```
GitHub: Loaded graph: 31 nodes, 73 edges.
Git: Loaded graph: 15 nodes, 19 edges.
Combined: Loaded graph: 44 nodes, 92 edges.
```
Paired with @wchargin
Summary:
This enables grabbing the GitHub relational view from disk and
converting it to a graph on the client.
Paired with @decentralion.
Test Plan:
For testing, information about the GitHub graph is printed when you
click the “Load data” button in the UI. Do so.
wchargin-branch: github-plugin-adapter
Summary:
Text input boxes for repository owner and name now appear. “Loading the
data” consists of logging the attempt to the console.
Test Plan:
Run `yarn start`, and note that the inputs are keyed against the same
local store key as their V1 equivalents. Note that clicking “Load data”
prints a message to the console.
Paired with @decentralion.
wchargin-branch: v3-load-data-ui
In GitHub, you can make cross repo references. For example,
sourcecred/sourcecred#459 is one such reference. This commit adds
support for detecting those references and adding them to the GitHub
graph.
Test plan:
See attached unit tests.
This commit adds `loadGitData`, which clones the git repository for a
given repo and saves the corresponding git graph. It also adds that
method to the `loadPlugin` command, so that the following command now
works:
```
$ node bin/sourcecredV3.js load-plugin-v3 sourcecred example-git --plugin=git
```
After running that command, the correct file is present:
```
$ du -sh tmp/sourcecred/data/sourcecred/example-git/git/graph.json
28K /home/dandelion/tmp/sourcecred/data/sourcecred/example-git/git/graph.json
```
The command takes:
| repository | time (s) |
:------------------------- | ----------:
| `sourcecred/example-git` | 1 |
| `sourcecred/sourcecred` | 5 |
| `ipfs/js-ipfs` | 18 |
| `ipfs/go-ipfs` | ∞ (OOM) |
Summary:
Pending the resolution of brigand/babel-plugin-flow-react-proptypes#201,
we’re removing this plugin from our build, because it results in
incorrect code generation. We’ll be happy to add it back if the bug is
fixed.
Test Plan:
Fingers crossed.
wchargin-branch: remove-bpfrpt
This module exposes a method, `pagerank`, which is a convenient entry
point for taking a `Graph` and returning a `PagerankResult`. This
obviates the need for `src/v1/app/credExplorer/basicPagerank.js`.
Test plan: Unit tests included.
I'm planning to make a `pagerank.js` module that is a clean entry point
for all the graph-pagerank-related code, so it will be cleaner to expose all
the default options there.
Test plan: travis
Paired with @wchargin
If `findStationaryDistribution` is passed `0` as `maxIterations`, then
it should return the initial distribution.
Test plan: see new unit test
Paired with @wchargin
This commit enables paired authorship on GitHub authored entities.
If the entity has the string "Paired with" in the body, followed by a
username reference, that entity will be recorded as having dual
authorship, with the nominal author and the paired-with author being
treated identically in the relational view and the graph.
If there's a need to pair with more than one author, the "Paired with"
signifier may be repeated. The regex matcher is forgiving of
capitalizing the P or W, and an optional colon may be added immediately
after the word "with".
Note that the code assumes that every `TextContentEntity` is also an
`AuthoredEntity`. If that changes, it will cause a type error and we'll
need to refine the code somewhat.
As implemented, it is impossible for the same user to author a post
multiple time; if this is textually suggested (e.g. by a paired-with
reference to the post's nominal author), the extra paired-with
references are silently ignored. Also, having a paired-with reference
suppresses the basic reference (although it is possible to have a post
that is paired with someone, and additionally references them).
Test plan:
Tests have been updated, and the behavior of the parser is extensively
tested. For an end-to-end demonstration, I've also added a unit test in
the relational view that verifies that sourcecred/example-github#10 has
two authors. You can also see that the graph snapshot has updated to
include additional authorship edges (and that corresponding reference
edges have disappeared).
Closes#218