In #529, I made the cred explorer populate a dropdown with the list of
repositories that are available to explore. That dropdown defaults to
selecting the alphabetically first repository.
This has an unfortunate consequence in that it makes it impossible for
us to explicitly set a default - for example, we would like
sourcecred.github.io/explorer to show sourcecred/sourcecred by default,
but instead it shows example-git.
So that we can choose the default, I've changed the logic so that it
instead shows the most-recently-loaded data first. This required
a breaking change to the repoRegistry serialized format, so I've also
refactored the module to use compat, which I should have done from the
beginning.
Test plan:
Unit tests for the repo selector are updated. The CLI load command
unfortunately has no tests, so I manually tested that it always provides
the latest repository last, and appropriately handles the case where
the same repository is loaded multiple times.
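The ordering rule described above (most recently loaded last, with no duplicate entry when a repository is loaded again) can be sketched roughly as follows. The names `addRepo` and `defaultRepo` and the `{owner, name}` entry shape are assumptions for illustration, not the actual `repoRegistry` API or its serialized format.

```javascript
// Hypothetical sketch: keep registry entries in load order, newest last.
function addRepo(registry, repo) {
  const sameRepo = (r) => r.owner === repo.owner && r.name === repo.name;
  // Drop any earlier entry for this repo so that it appears exactly once...
  const rest = registry.filter((r) => !sameRepo(r));
  // ...then append it, making it the most recently loaded entry.
  return [...rest, repo];
}

// Under this assumption, the default is simply the last entry, if any.
function defaultRepo(registry) {
  return registry.length > 0 ? registry[registry.length - 1] : null;
}
```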
Showing our example-github and example-git repos on sourcecred.github.io
is not particularly interesting. Let's show ipfs/js-ipfs instead!
Since sourcecred/sourcecred is the last repo to load, as of #531 it will
be the default option.
Test plan: Dry run of deploy script
Context: The Cred Explorer loads data (currently on a per-repository
basis) that has previously been prepared by running the `sourcecred
load` cli command.
Currently, to select a repository to load, the user must manually type
the repository owner and name. This is a confusing UI, because it
suggests that any repository may be chosen, when in fact only repos
already loaded into the data store are available. The user is given no
feedback as to which repositories are valid options.
As of #516, the backend stores a registry listing available
repositories. This commit adds a `RepositorySelect` component which
loads the available repositories from that registry and presents them in a
dropdown, in sorted order.
When the user manually selects one of the repositories, that selection
is persisted into localStorage and respected on future loads. If the
user hasn't made such a choice, then the first repository is selected by
default.
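A minimal sketch of that selection rule, assuming repositories are represented as `"owner/name"` strings and that the injected store exposes `get(key, defaultValue)`. The key name `REPO_KEY`, the function `chooseRepo`, and the validity check on the stored value are assumptions, not taken from the actual component.

```javascript
// Hypothetical key under which the user's choice is persisted.
const REPO_KEY = "selectedRepository";

// Pick the repository to show: the persisted choice if it is still
// available, otherwise the first repository in sorted order.
function chooseRepo(availableRepos, localStore) {
  if (availableRepos.length === 0) {
    return null;
  }
  const sorted = [...availableRepos].sort();
  const stored = localStore.get(REPO_KEY, null);
  return stored != null && sorted.includes(stored) ? stored : sorted[0];
}
```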
The implementation is highly influenced by testability considerations.
The default export, `<RepositorySelect onChange={} localStore={} />`, is
pretty straightforward. The `RepositorySelect` is somewhat cumbersome to
test because it asynchronously fetches data and then updates its state,
which affects the render output. So as to avoid testing inside async
react components wherever possible, I've factored out:
* `loadStatus`, which uses fetch and localStore to get the status of the
selector.
* `PureRepositorySelect`, which just renders a `Status`, such as
loading, failure, or valid
* `LocalStoreRepositorySelect`, which wraps the `PureRepositorySelect`
with logic to bind the repository select to localStore on change.
Test plan: Extensive unit tests were added. Also, to ensure that the
tests were testing the right thing, I manually tested:
- attempting to load invalid registry
- attempting to load with no registry
- attempting to load with empty registry
- loading without valid localStore
- changing the setting via dropdown
- loading from localStore after changing the dropdown
And all behavior was as expected.
Thanks to @wchargin for considerable help testing this PR.
Also add config/jest/setupJest.js so we can configure jest-fetch-mock.
Test plan: I have verified that mocked fetch works as expected in a
downstream commit.
Summary:
Test code should probably always use a checked, memory-backed local
storage implementation. This endpoint will help users not forget to
include the checks.
wchargin-branch: test-local-store
Summary:
Might as well have runtime type safety, in case we accidentally try to
store any more `Map`s or `undefined`s.
Test Plan:
Tests pass, but are likely not sufficient. Manual testing indicates that
the local storage still works, for both reads and writes, on a fresh
profile or with existing data, for both the repository owner/name and
the weight configuration.
wchargin-branch: use-checked-local-store
Summary:
We can use this in tests. If need be, we can enhance this class to allow
simulating failures, low storage limits, etc., but just having a pure
implementation at all is all we need right now.
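For illustration, a memory-backed store along these lines might look like the sketch below. The `get`/`set`/`del` method names are an assumed interface, and values are serialized through JSON on the assumption that this mirrors what the browser-backed store does.

```javascript
// Hypothetical in-memory store; nothing here touches the browser.
class MemoryLocalStore {
  constructor() {
    this._data = new Map();
  }

  // Return the stored value, or `whenUnavailable` if the key is absent.
  get(key, whenUnavailable) {
    const raw = this._data.get(key);
    return raw === undefined ? whenUnavailable : JSON.parse(raw);
  }

  // Store a JSON-serialized copy, so later mutation of `data` has no effect.
  set(key, data) {
    this._data.set(key, JSON.stringify(data));
  }

  del(key) {
    this._data.delete(key);
  }
}
```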
Test Plan:
Unit tests added.
wchargin-branch: memory-local-store
Summary:
This provides some extra checking around `LocalStore` calls. In
particular, it fails fast on the nasty bug where storing a `Map`
actually stores the empty object (`JSON.stringify(new Map()) === "{}"`).
Similarly, retrieving a value that was stored as `undefined` will raise
an error, because `JSON.parse(JSON.stringify(undefined))` raises an
error.
This should have negligible performance impact—local storage access
should never be on a critical path. We can choose to elide this in
production if we want.
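One way to get this kind of fail-fast behavior is to validate values before delegating to an underlying store. The sketch below is an assumption about the approach, not the actual implementation: `assertPlainJson` and the wrapper's method names are hypothetical, and this version checks only on writes. It is also stricter than JSON itself, since it rejects `undefined` nested inside objects.

```javascript
// Throw unless `value` is built only from null, booleans, finite numbers,
// strings, arrays, and plain objects.
function assertPlainJson(value, path = "value") {
  if (value === null) return;
  const t = typeof value;
  if (t === "string" || t === "boolean") return;
  if (t === "number") {
    if (!Number.isFinite(value)) throw new Error(`${path}: non-finite number`);
    return;
  }
  if (Array.isArray(value)) {
    value.forEach((v, i) => assertPlainJson(v, `${path}[${i}]`));
    return;
  }
  if (t === "object" && Object.getPrototypeOf(value) === Object.prototype) {
    for (const k of Object.keys(value)) assertPlainJson(value[k], `${path}.${k}`);
    return;
  }
  // Reaches here for undefined, functions, Maps, Sets, class instances, etc.
  throw new Error(`${path}: not a plain JSON value`);
}

// Hypothetical wrapper around any store with get/set/del.
class CheckedLocalStore {
  constructor(base) {
    this._base = base;
  }
  get(key, whenUnavailable) {
    return this._base.get(key, whenUnavailable);
  }
  set(key, data) {
    assertPlainJson(data);
    this._base.set(key, data);
  }
  del(key) {
    this._base.del(key);
  }
}
```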
Test Plan:
Unit tests added. Manual testing of the cred explorer yields no errors.
wchargin-branch: checked-local-store
Summary:
This commit modifies components that directly depend on the
browser-specific local store implementation to instead have their
dependencies injected.
Test Plan:
Tests pass, but are likely not sufficient. Manual testing indicates that
the local storage still works, for both reads and writes, on a fresh
profile or with existing data, for both the repository owner/name and
the weight configuration.
wchargin-branch: di-localstore
Summary:
We’d really like to be able to test components that use `LocalStore`. We
can do this by dependency-injecting the storage backend. This commit
begins that process by extracting `LocalStore` to its interface,
preserving the unique existing implementation.
wchargin-branch: extract-localstore
Summary:
This commit switches to a double-buffered PageRank implementation. When
benchmarked on `ipfs/js-ipfs`, the critical section improves from
3059 ms to 2433 ms (79.5% of original), and peak heap usage drops from
342 MB to 207 MB. (Tested non-rigorously in Chrome 67.)
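Here, "double-buffered" is taken to mean that two preallocated arrays are reused and swapped on each iteration, rather than allocating a fresh distribution per step. The sketch below illustrates that idea under an assumed sparse chain format; `stepInto`, `stationaryDistribution`, and the `{neighbor, weight}` layout are hypothetical and may differ from the real code.

```javascript
// Assumed format: chain[dst] = {neighbor, weight}, where weight[i] is the
// transition probability from node neighbor[i] into node dst.
function stepInto(chain, pi, out) {
  for (let dst = 0; dst < chain.length; dst++) {
    const {neighbor, weight} = chain[dst];
    let sum = 0;
    for (let i = 0; i < neighbor.length; i++) {
      sum += pi[neighbor[i]] * weight[i];
    }
    out[dst] = sum;
  }
}

function stationaryDistribution(chain, iterations) {
  const n = chain.length;
  let current = new Float64Array(n).fill(1 / n);
  let scratch = new Float64Array(n);
  for (let k = 0; k < iterations; k++) {
    // Write the next distribution into the scratch buffer, then swap roles.
    stepInto(chain, current, scratch);
    [current, scratch] = [scratch, current];
  }
  return current;
}
```

Only two arrays are ever allocated, regardless of the iteration count, which is the property that would account for lower heap usage.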
Test Plan:
Existing unit tests for `sparseMarkovChainAction`,
`findStationaryDistribution`, and `pagerank` are sufficient.
wchargin-branch: pagerank-dbuf
Summary:
The PageRank functions can take a long time to compute. We’d like them
to not lock the browser, and we’d also like them to communicate with
their clients (e.g., to update a progress bar). This code updates
`findStationaryDistribution` and downstream `pagerank` to return
promises.
Test Plan:
Unit tests updated. The cred explorer (`yarn start`) still works.
Applying
```diff
diff --git a/src/core/attribution/markovChain.js b/src/core/attribution/markovChain.js
index 2acce9c..c7a7159 100644
--- a/src/core/attribution/markovChain.js
+++ b/src/core/attribution/markovChain.js
@@ -166,6 +166,7 @@ export function findStationaryDistribution(
return;
}
} while (Date.now() - start < yieldAfterMs);
+ console.log("Yielding.");
setTimeout(tick, 0);
};
tick();
```
causes the appropriate log messages to be printed in the browser—about
once every ten iterations for `sourcecred/sourcecred`.
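The yielding pattern visible in the diff above can be isolated into a small helper, sketched here. `runCooperatively` and `stepOnce` are hypothetical names; the real function interleaves this loop with the PageRank iteration itself.

```javascript
// Call `stepOnce` repeatedly until it returns true. After each time slice of
// roughly `yieldAfterMs`, hand control back to the event loop via setTimeout
// so that the browser stays responsive.
function runCooperatively(stepOnce, yieldAfterMs = 30) {
  return new Promise((resolve, reject) => {
    const tick = () => {
      try {
        const start = Date.now();
        do {
          if (stepOnce()) {
            resolve();
            return;
          }
        } while (Date.now() - start < yieldAfterMs);
        // Time slice used up: continue in a later task.
        setTimeout(tick, 0);
      } catch (e) {
        reject(e);
      }
    };
    tick();
  });
}
```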
wchargin-branch: asynchronous-pagerank
We want the UI to offer a list of available repositories, rather than
using a text input box. To do this, we first need the backend to include
a registry of all available repositories.
Test plan:
Sadly we don't have CLI testing, so I manually verified this by doing
the following:
```
$ yarn backend
$ rm -r $SOURCECRED_DIRECTORY
$ node bin/sourcecred.js load sourcecred example-github
$ cat $SOURCECRED_DIRECTORY/repositoryRegistry.json
{"sourcecred/example-github":true}
$ node bin/sourcecred.js load sourcecred example-github
$ cat $SOURCECRED_DIRECTORY/repositoryRegistry.json
{"sourcecred/example-github":true}
$ node bin/sourcecred.js load sourcecred example-git
$ cat $SOURCECRED_DIRECTORY/repositoryRegistry.json
{"sourcecred/example-git":true,"sourcecred/example-github":true}
```
Previously, WeightConfig hackily contained its own enumeration of all
node and edge types. Now, it loads them from the StaticPluginAdapter.
Test plan:
Unit tests pass, as does manual inspection of the frontend.
In some cases (e.g. WeightConfig) we want to have information from the
PluginAdapter before loading any data from the server. In other cases,
we need to combine the PluginAdapter with actual data, e.g. so we can
get the description of a GitHub node.
To support this, we split the PluginAdapter into a Static and Dynamic
component. The Dynamic component has data needed to give node
descriptions, etc. Given a static adapter, you can get a promise to load
the dynamic adapter. Given the dynamic adapter, you can immediately get
the static adapter. (There's a parallel to NodeReference (static) and
NodePorcelain (dynamic)).
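The relationship between the two halves can be sketched as follows. All names here (`makeStaticAdapter`, `load`, `static`, `nodeDescription`, and the example node types) are assumptions for illustration rather than the actual adapter interface.

```javascript
// `fetchDescriptions` stands in for whatever loads plugin data from the
// server; it is assumed to resolve to a Map from node address to description.
function makeStaticAdapter(fetchDescriptions) {
  const staticAdapter = {
    // Available immediately, before any data is loaded.
    nodeTypes: () => ["ISSUE", "USER"],
    // Static -> dynamic requires loading data, hence a promise.
    load: async () => {
      const descriptions = await fetchDescriptions();
      return {
        // Dynamic -> static is immediate.
        static: () => staticAdapter,
        nodeDescription: (address) =>
          descriptions.has(address) ? descriptions.get(address) : "(unknown node)",
      };
    },
  };
  return staticAdapter;
}
```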
Test plan:
Travis passes, as does manual testing of the frontend.
- PluginAdapters no longer expose a Renderer; instead, the render
methods are inlined on the PluginAdapter. The extra abstraction didn't
provide any lift in the current architecture.
- The edgeVerb function has been removed.
- PluginAdapters now enumerate EdgeTypes. Each has a prefix, and a
forward and a backward name.
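As a rough illustration of the last point, an enumeration of edge types might look like the sketch below. The field names, prefixes, and the `describeEdge` helper are made up for this example, and edge addresses are assumed to be plain strings.

```javascript
// Hypothetical edge type declarations: a prefix plus a name per direction.
const edgeTypes = [
  {prefix: "example/AUTHORS", forwardName: "authors", backwardName: "is authored by"},
  {prefix: "example/REFERENCES", forwardName: "references", backwardName: "is referenced by"},
];

// Find the edge type whose prefix matches the given edge address.
function edgeTypeFor(edgeAddress) {
  return edgeTypes.find((t) => edgeAddress.startsWith(t.prefix)) || null;
}

// Describe an edge as seen in the given direction.
function describeEdge(edgeAddress, direction) {
  const type = edgeTypeFor(edgeAddress);
  if (type == null) {
    return "(unknown edge type)";
  }
  return direction === "FORWARD" ? type.forwardName : type.backwardName;
}
```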
Test plan: `yarn travis`, plus manual testing of the frontend and the
weight config.
Summary:
We don’t need this to be a “progressive web app”—certainly not now. The
n+1 caching problem is not a good tradeoff for us, and furthermore
service workers are causing flashes of content on server-side rendered
pages.
This commit is a quick fix to remove them. We can remove the code
entirely if we want, or just keep it as is.
Test Plan:
On a machine that has the service worker registered, run `yarn build`, then
`node bin/sourcecred.js start`. Note in the network panel that the
service worker is loaded on the first page load, but then deregistered.
On subsequent refreshes, it should not activate. In the “Application”
panel of the Chrome dev tools, it should appear as “deleted”.
wchargin-branch: disable-sw
The WeightConfig is a power user feature. Now that we're building a
public-facing demo out of the Cred Explorer, it will be better to hide
the weight configuration by default.
This commit adds a button for showing/hiding the weight configuration.
The weights are still propagated correctly regardless of whether the
weight config is shown.
Test plan:
- Ensure that the site loads with weights hidden by default.
- Ensure that clicking the button causes the weight config to display.
- Ensure that PageRank loads and displays correctly with the weights
hidden.
- Ensure that changes to the weight config still propagate to PageRank
(with weights hidden or not hidden).
Summary:
This removes the hard-coded route data from the Webpack config,
replacing it with the list of paths exported by the route data module.
Test Plan:
Note that the output of `yarn build` is identical before and after this
change: namely,
```shell
$ find build -exec shasum -a 256 {} + | shasum -a 256
7610a61f8a977f1d8edd849fc81256ca15f41f366e5fdb4b59a5d5ce37d6d58e
```
wchargin-branch: non-hard-coded-route-data
Test Plan:
Ensure that `require("./src/app/routeData")` works in `node` without any
preprocessing. Ensure that `yarn start` works, and that `yarn build`
then `node ./bin/sourcecred.js start` also works.
wchargin-branch: vanilla-route-data
Summary:
Some of the code here is adapted from my site (source available on
GitHub at wchargin/wchargin.github.io). It has been improved when
possible and made worse when necessary to fit into our existing build
system with minimal churn.
As of this commit, there remain the following outstanding tasks:
- Use a non-hardcoded list of paths in static site generation router.
This is not trivial. We have the paths nicely available in
`routes.js`, but this module is written in ES6, and transitively
depends on many files written in ES6 (i.e., the whole app). Yet
naïvely it would be required from a Webpack config file, which is
interpreted as vanilla JavaScript.
- Add `csso-loader` to minify our CSS. This is easy.
- Add unit tests for `dedent`. (As is, it comes from my site
verbatim. I wrote it. dmnd’s `dedent` package on npm is insufficient
because it dedents arguments as well as the format string, which is
incorrect at least for our purposes.)
- Link in canonical static data for the site.
- Rip out the whole build system and replace it with my build config,
which is orders of magnitude saner and less bad. (By “the whole
build system” I mostly mean `webpack.config.{dev,prod}.js`.)
Test Plan:
```shell
$ yarn backend
$ yarn build
$ node ./bin/sourcecred.js start
```
wchargin-branch: static-v0
Summary:
This adds a dummy landing page. We’ll want to actually put nice content
on it. For development convenience, I’m totally fine with having the
`yarn start` launch `/explorer` instead of just `/`.
Test Plan:
Run `yarn start` and note that the navigation works.
wchargin-branch: landing-page
Summary:
This commit hooks up the PageRank table to the PageRank node
decomposition developed previously. The new cred explorer displays one
entry per contribution to a node’s cred (i.e., one entry per in-edge,
per out-edge, and per synthetic loop), listing the proportion of the
node’s cred that is provided by this contribution. This makes it easy to
observe facts like, “90% of this issue’s cred is due to being written by
a particular author”.
Paired with @decentralion.
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: pagerank-table-node-decomposition
Summary:
The aesthetically nicest win is in `WeightConfig`. Other changes are
nice to have.
In many cases, we reduce the specificity of error messages thrown. For
instance, if an invariant was violated on an edge `e`, then we might
have thrown an error with message `EdgeAddress.toString(e.address)`. But
we did so not because we thought that this was genuinely worth it, but
only because we were forced to explicitly throw an error at all. These
errors should never be hit, anyway, so we don’t feel bad about replacing
these with errors that are simply the string `"null"` or `"undefined"`,
as appropriate.
Test Plan:
Running `yarn travis --full` passes, and the cred explorer still seems
to work with both populated and empty `localStorage`.
wchargin-branch: use-null-util
Summary:
This commit adds a module with four functions: `get`, `orThrow`, `map`,
and `orElse`.
Here is a common pattern wherein `get` is useful:
```js
sortBy(Array.from(map.keys()), (x) => {
  const result = map.get(x);
  if (result == null) {
    throw new Error("Cannot happen");
  }
  return result.score;
});
// versus
sortBy(Array.from(map.keys()), (x) => NullUtil.get(map.get(x)).score)
```
(The variant `orThrow` allows specifying a custom message that is only
computed in the case where the error will be thrown.)
Here is a common pattern where `map` is useful:
```js
arr.map((x) => {
  const result = complicatedComputation(x);
  return result == null ? result : processResult(result);
});
// versus
arr.map((x) => NullUtil.map(complicatedComputation(x), processResult))
```
In each of these cases, by using these functions we gain a dose of
safety in addition to our concision: it is tempting to “shorten” the
expression `x == null ? y : z` to simply `x ? y : z`, while forgetting
that the latter behaves incorrectly for `0`, `false`, `""`, and `NaN`.
Similar patterns like `x || defaultValue` also suffer from this problem,
and can now be replaced with `orElse`.
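For reference, the four functions can be sketched in a few lines. This is a minimal untyped version under assumed signatures; the real module carries type annotations, and its exact signatures may differ.

```javascript
const NullUtil = {
  // Return `x` if it is non-null; otherwise throw with the message
  // "null" or "undefined", as appropriate.
  get(x) {
    if (x == null) {
      throw new Error(String(x));
    }
    return x;
  },
  // Like `get`, but the message is computed only if the error is thrown.
  orThrow(x, getMessage) {
    if (x == null) {
      throw new Error(getMessage());
    }
    return x;
  },
  // Apply `f` to a present value; pass null/undefined through unchanged.
  map(x, f) {
    return x == null ? x : f(x);
  },
  // Replace null/undefined (and only those) with a default.
  orElse(x, defaultValue) {
    return x == null ? defaultValue : x;
  },
};
```

Unlike `x || defaultValue`, `NullUtil.orElse(0, 5)` keeps the `0`.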
Designed with @decentralion.
Test Plan:
Unit tests included; run `yarn travis`.
wchargin-branch: null-util
There's no sense having a landing page with no content and
a nav bar with only one meaningful option. We can re-add
them later if we actually need navigation.
Test plan: Local testing
Summary:
When updating `PagerankTable` to work with contributions, we found it
difficult to keep track of everything when we tried to do two things
simultaneously: compute the values to be displayed, and render them
hierarchically. @decentralion suggested computing the relevant data
ahead of time, and then having a straightforward React component to
render this structure. This would incidentally make `PagerankTable`
easier to test.
This commit implements that data structure and the function to create
it from a `PagerankResult`. A subsequent commit will update
`PagerankTable` accordingly.
As evidence that this structure is well-designed, note that the main
contents of a contribution row can be rendered entirely from a
`ScoredContribution` datum (though the component will still of course
require the full `PagerankNodeDecomposition` to pass down to its
children). (At least, I think that it can be!)
Designed with @decentralion.
Test Plan:
Unit tests added. I have checked that the snapshot is structurally
correct: each node has contributions with the correct contributors.
I did not manually compute the stationary distribution and check the
snapshot for correctness. The snapshot is complemented by automated
tests.
wchargin-branch: pagerank-node-decomposition
Summary:
Now that `MapUtil` provides `toObject`/`fromObject`, we can keep storing
weights in localStorage while representing them as ES6 `Map`s in memory.
Here are some advantages:
- The code is genuinely more typesafe. While writing this,
I accidentally wrote `edgeWeights.get(key)`, where `edgeWeights`
should have been `nodeWeights`. This was caught at compile time, and
would not have been in the previous version.
- Relatedly, the code now has zero `any`-casts as opposed to five.
- The initialization of the default values is not abysmally ugly.
- Whenever we iterate over these maps, (a) we can use `.entries()`,
and (b) we don’t have to cast between string keys and semantic keys.
This simplifies some of the control flow.
- The extra null-checking on `get` forces us to either think about
ways in which the check might fail, or reuse a previously fetched
value that is known to be non-null (perhaps because it came from
`entries`).
- A particularly annoying Prettier line-break is avoided. :-)
Here are some disadvantages:
- The null-pipelining around the `rehydrate` function is a bit
annoying. As @decentralion pointed out, what we want here is not a
default value, but a default value and a function to transform a
present value. This is Haskell’s `maybe :: β → (α → β) → Maybe α → β`
or Java’s `optional.map(fromObject).orElse(defaultValue)`. This
commit implements one approach; another is to note that `fromObject`
is invertible, writing `fromObject(LocalStore.get(k, toObject(d)))`.
- That’s it, I think?
Test Plan:
I’ve tested that the sliders for both edge and node weights correctly
influence the PageRank behavior, that the component is properly
initialized with an empty localStorage, and that the component properly
rehydrates from localStorage.
wchargin-branch: weightconfig-maps
Summary:
These call sites were selected from `git grep Map`. In this commit, we
only add usage of the utility functions; we do not change any existing
object types to maps.
Test Plan:
Running `yarn travis --full` passes.
wchargin-branch: use-map-util
Summary:
We’d like to like ES6 `Map`s, because they provide better type safety
than objects (primarily, `Map.prototype.get` has nullable result type).
However, the vanilla APIs are weak. Prominent problems are that `Map`s
always become `"{}"` under `JSON.stringify`, that there is no easy way
to convert between `Map`s and objects, and that there are no functions
to map over the keys and values of `Map`s.
In this commit, we add versions of those functions to a utility module.
The value-level implementations are straightforward, but these functions
nevertheless deserve a utility module because the types are somewhat
tricky to get right. The implementation requires casts through `any`,
and these should be written, analyzed, and proven correct just once. (In
particular, it would be easy to write an unsound type for `fromObject`.)
In a followup commit, we will amend existing portions of the codebase to
use these functions.
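At the value level, such helpers might look like the sketch below. `toObject` and `fromObject` are the names used elsewhere in this history; `mapKeys` and `mapValues` are assumed names, and this untyped version omits the type signatures, which are the hard part. Keys are assumed to be strings.

```javascript
const MapUtil = {
  // Convert a Map with string keys into a plain object.
  toObject(map) {
    const result = {};
    for (const [k, v] of map.entries()) {
      result[k] = v;
    }
    return result;
  },
  // Convert a plain object into a Map over its own enumerable keys.
  fromObject(object) {
    return new Map(Object.keys(object).map((k) => [k, object[k]]));
  },
  // New Map with each key replaced by f(key, value). In this sketch, if two
  // keys map to the same result, the later entry silently wins.
  mapKeys(map, f) {
    const result = new Map();
    for (const [k, v] of map.entries()) {
      result.set(f(k, v), v);
    }
    return result;
  },
  // New Map with each value replaced by f(key, value).
  mapValues(map, f) {
    const result = new Map();
    for (const [k, v] of map.entries()) {
      result.set(k, f(k, v));
    }
    return result;
  },
};
```

With these, a `Map` can be persisted as `JSON.stringify(MapUtil.toObject(m))` instead of the uninformative `"{}"`.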
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: map-util
This commit adds another bank of sliders to the cred explorer, for
changing the directionality of edges. The sliders have the range [0,1]
with step size of 0.01.
The layout is pretty ugly and clearly should be refactored. But playing
with these sliders is interesting :)
Test plan: We don't have any unit tests on the WeightConfig, but I did
drive it by hand. An interesting experiment is to set the AUTHORS edge
directionality to 1, so that users get no credit for authoring posts. As
expected, this utterly tanks the users' scores; many users then have a
score of -Infinity.
Summary:
If we want to snapshot an edge, then none of the available options is
ideal:
- Snapshotting the edge directly includes literal NUL bytes in the
snapshot file, so it is treated as binary. This is bad.
- Using `edgeToString` works, but all fields of the edge are combined
into a single string, which is somewhat hard to read.
- Using `edgeToParts` works, but each address in the edge takes up a
lot of visual space: one line per part in the address. This is also
somewhat hard to read.
This commit adds `edgeToStrings`, which simply applies the appropriate
`toString` operation to each field of an edge.
Test Plan:
Unit tests added; run `yarn travis`.
wchargin-branch: edge-to-strings
The implementation is similar to the LocalStore usage in
`credExplorer/app.js`. Had to make a spurious refactor from Map to
Object because ES6 maps don't stringify by default, and I didn't feel
like writing a custom JSON serializer.
Test plan:
Didn't add unit tests, although at some point we should come up with a
nice LocalStore mock and test LocalStore code. I did, however, manually
try it out and verify that it works :)
Paired with @wchargin
Summary:
Prettier inserted these in a previous version of the code, but the lines
got shorter and so Prettier no longer minds if we remove the breaks.
Test Plan:
shipitquick
wchargin-branch: remove-line-breaks
Summary:
This change is motivated by the behavior of loops, but applies more
generally to edges. Previously, a loop would induce two contributions
with the same contributor, but possibly different weights: one with the
to-weight of the edge and one with the fro-weight. For one, this is
annoying to downstream clients, who would like to use the contributor as
a superkey. But it is also somewhat strange that a single contributor
could have two different weights.
The same applies to edges in general: every edge induces two
contributions with the same contributor, of type `NEIGHBOR`.
As of this commit, we replace `NEIGHBOR` with `IN_EDGE` and `OUT_EDGE`,
one of each induced by each edge. This has the effect that a contributor
maps to at most one contribution.
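To illustrate why the split makes contributors unique, here is a rough sketch. The data shapes (`{address, src, dst}` edges, contributor objects with a `type` field, and the `SYNTHETIC_LOOP` tag) and the function name are assumptions for this example, not the actual types.

```javascript
// Build, for each node, the list of contributors to that node's score.
function contributorsByNode(nodes, edges) {
  // Every node starts with its synthetic self-loop.
  const result = new Map(nodes.map((n) => [n, [{type: "SYNTHETIC_LOOP"}]]));
  for (const edge of edges) {
    // Each edge is an in-edge of its destination and an out-edge of its source.
    result.get(edge.dst).push({type: "IN_EDGE", edge});
    result.get(edge.src).push({type: "OUT_EDGE", edge});
  }
  return result;
}
```

For a loop edge (`src === dst`), the node receives one `IN_EDGE` and one `OUT_EDGE` contributor for the same edge. These are distinct contributors, so each can carry its own weight.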
Test Plan:
Existing unit tests updated.
wchargin-branch: separate-in-out-edge-contributions
Summary:
When we convert a graph to a Markov chain, each cell in the transition
matrix is a sum of edge weights from the given `src` to the given `dst`,
plus the synthetic self-loop needed for stability. Performing this sum
loses information: given the transition matrix, a client cannot
determine how much a particular edge contributed to the score of a node
without redoing the relevant computations. In this commit, we expose the
structure of these contributions (i.e., edges and synthetic loops).
This changes the API of `graphToMarkovChain.js`, but it does not change
the resulting Markov chains. It also does not change the API of
`pagerank.js`. In particular, clients of `pagerank.js` will not have
access to the contributions structure that we have just created.
Test Plan:
Existing unit tests have been updated to use the new API, and pass
without change. An additional test is added for a newly exposed
function, even though this function is also tested extensively as part
of later downstream tests.
In one snapshot, one value changes from `0.25` to `0.25 + 1.7e-16`. The
other values in the enclosing distribution do not change, so I think
that it is more likely that this is due to floating-point instability
than an actual bug. (I’m not sure where exactly I commuted or associated
an operation, but it’s quite possible that I may have done so.) To
compensate, I added an additional check that the values in the
stationary distribution sum to `1.0` within `1e-9` tolerance; this check
passes.
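A check of this kind is small enough to sketch directly. The function name `sumsToOne` is hypothetical, and the actual test may be phrased differently.

```javascript
// True if the entries of `distribution` sum to 1.0 within `tolerance`.
function sumsToOne(distribution, tolerance = 1e-9) {
  let total = 0;
  for (const p of distribution) {
    total += p;
  }
  return Math.abs(total - 1.0) <= tolerance;
}
```

A perturbation on the order of `1.7e-16` in one entry is far below this tolerance, so it does not affect the check.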
wchargin-branch: expose-contributions
Previously, when expanding a node in the cred explorer, it would display
the neighboring nodes, but not any information about the edges linking
to that node. If the same node was reached by multiple edges, this
information was not communicated to the user.
As of this commit, it now concisely communicates what kind of edge was
connecting the chosen node to its adjacencies. There's a new `edgeVerb`
method that plugin adapters must implement, which gives a
direction-based verb description of the edge, e.g. "authors" or "is
authored by".
Test plan:
Unit tests added to the PagerankTable tests, and hand inspection.
Paired with @wchargin
Summary:
This commit adds sliders for each node and edge type (hard-coded for
now), and hooks them up to the cred explorer so that re-running PageRank
uses the newly induced edge evaluator.
Paired with @decentralion.
Test Plan:
We will add tests later. We promise! In the meantime, the results that
appear when you drag a slider and re-run PageRank seem appropriate. For
instance, changing the “Git blob” node type from `0.0` to `-10.0`
results in the Git blobs not dominating the whole view.
wchargin-branch: configurable-weights-ui