Commit Graph

1033 Commits

Author SHA1 Message Date
Dandelion Mané 16edea6413
Remove the `graphView`s (#1171)
A long time ago, we made graph views for git and github. These are
interfaces over the graph which allow retrieving nodes' relations, e.g.
finding the parent address of a commit just using the graph.

These are fairly complex, and have seen almost no use at all. The one
thing they are used for is implementing invariant checking. The
invariant checking is nice in principle, but since we only apply it to
the example data, its of very limited value in practice.

Since I'm planning a significant Graph refactor (#1136), I'd rather delete this
code than continue to maintain it, since I think it's complexity/value
ratio is unfavorable.

Test plan: `yarn test`
2019-06-03 21:07:27 +03:00
Dandelion Mané fcbd024a83 Fix silently failing github token test
This is another minor silent test failure: the error message thrown by
loadIndividualPlugin when a GitHub token is not available is not quite
the error that was specified in the test.

There were two issues: that we were testing for the wrong error message,
and that the failure didn't fail the test. I fixed both issues (by
changing the message thrown to match the test, and by having the test
_return_ the expectation that the promise will reject, and by expecting
there to be one assertion.)

Test plan: `yarn unit cli/load` no longer shows any unhandled promise
rejection warnings. If the test is modified so that it checks for the
wrong string, it now properly fails rather than passing with an
unhandled rejection.
2019-06-02 11:27:36 +03:00
Dandelion Mané 7c4ff66907 Fix uncaught test failures
Prior to this commit, if you run `yarn unit cli/load`,
you would see a lot of unhandled promise rejection warnings related to
the fact that load calls a `saveTimestamps` function which expects the
SourceCred directory to contain the results of really loading SourceCred
plugins. However, when testing, these functions have been mocked, and so
saveTimestamps rejects. This rejection was not caught, and polluted the
test output without actually failing the tests.

This commit updates the tests so that saveTimestamps can be mocked (via
dependency injection) and we can both verify that it is invoked
correctly, and not pollute the test output with spurious warnings.

Test plan:
- `yarn test --full` passes
- `yarn unit cli/load` produces far fewer
UnhandledPromiseRejectionWarnings (there is still one unrelated one)
- loading sourcecred/research still works (as a canary)

Note: the PR which introduced this issue is #1162.
2019-06-02 11:27:36 +03:00
Dandelion Mané 1d32141f76
Fix `sourcecred load` for real repos (#1164)
As discussed in #1163, #1162 caused `sourcecred load` to start failing
for real repos (e.g. sourcecred/research). This commit merges a
somewhat hacky fix.

Test plan: `yarn test` passes, and `sourcecred load sourcecred/research`
works.
2019-05-31 20:16:58 +03:00
Tyler Mace 1fbf8cd587 Quicker failure and description when invalid token supplied (#1161)
Fixes #1156

When users export a GitHub API token that has insufficient privleges
or has been revoked, we have been using a catch all error with retry
to handle it. This change adds a new error type for bad credentials
and does not retry.

Test plan:
There are no unit tests that cover this, however, you can test the
change by supplying a revoked token and attempting to load a GitHub
repo.
2019-05-30 22:18:30 +03:00
Dandelion Mané ad2470e5c6
Aggregate timestamp information on sourcecred load (#1162)
This modifies `sourcecred load` so that it saves timestamp information
for all of the loaded plugins in a single aggregated map.

This is quite convenient, as it saves consumers of timestamp information
from needing to worry about the (rather hacky) implementation whereby
the data is fed from each adapter. Instead, consumers can just load the
timestamp map. This will also make it much easier to use timestamp info
in the research codebase.

Test plan: The timestampMap module has testing around generating the map
from the adapter and nodes, writing it, and reading it.

I haven't added any testing to the `load` CLI command. I think it would
be redundant as the updated snapshot test reveals that the map is
getting serialized properly.

Tests pass, and I have inspected the snapshot
2019-05-30 17:15:15 +03:00
Dandelion Mané 4dc97fcc57
PagerankGraph reuses existing distribution (#1160)
This commit modifies `PagerankGraph.runPagerank` so that rather than
always starting from a uniform distribution, it starts with the
PagerankGraph's existing score distribution. The PagerankGraph
initializes with a uniform score over nodes, so it has the exact same
behavior on the first time that runPagerank is called. On subsequent
calls, PageRank will likely converge a lot faster, because it's starting
from converged scores. (It should still be a lot faster in cases where
e.g. the user has tweaked the weights.)

In certain degerate cases, this could change the resultant scores.
Specifically, if there are isolated nodes in the graph and alpha=0, then
the isolated nodes' final scores depends on the initial score. In
general, I think this won't be an issue as we expect alpha > 0 in normal
usage.

Test plan: I added a unit test to verify this property, by checking that
running PageRank with maxIterations==0 on an already-converged graph
results in a still-converged graph. Also, existing tests pass.

I think we can now close #1020.
2019-05-29 20:29:28 +03:00
Dandelion Mané 14eee06799
Add a universal snapshot updater (#1159)
As SourceCred has evolved, we've grown more and more snapshot tests that
are not included in Jest. The GitHub plugin has two ad-hoc snapshot
tests, the Git plugin has one, and the sharness test suites have one.

This makes it difficult to keep track of where to update snapshots when
core changes are made. To fix this, I've added a script,
`scripts/update_snapshots.sh`, which updates snapshot tests across the
project.

Test plan: I removed existing snapshots across the codebase, ran the
snapshot tester, and they correctly regenerated.
2019-05-28 18:59:50 +03:00
Dandelion Mané e01247a642
Expose `createdAt` in `AnalysisAdapter` (#1157)
* Refactor Loader from AnalysisAdapter

At present, the only data the AnalysisAdapter provides is the Graph, so
the AnalysisAdapter has a `load` method which directly returns the
graph. I'm planning to add a `createdAt` getter to the adapter as well,
which also will depend on loading the data. To make this change
convenient, I'm starting by refactoring an AdapterLoader out, which
manages loading data from disk, so that once we have an AnalysisAdapter,
it already has all relevant data loaded. Then, it will be much easier to
add a `createdAt` method.

Test plan: Tests updated, flow passes.

* Add `createdAt` to the analysis adapter

A big step forward for time-varying cred. This will make `createdAt`
timestamps available for PageRank analysis.

Test plan: Added new unit tests. Inspect the snapshots. Run `yarn test`.
2019-05-28 18:03:44 +03:00
Dandelion Mané 61627531bb
Fix a build error induced by #1153 (#1154)
Thanks to @wchargin for [catching it].

[catching it]: https://github.com/sourcecred/sourcecred/issues/1151#issuecomment-494256526

Generated this change via:
```
$ yarn backend
$ (cd sharness; UPDATE_SNAPSHOT=1 ./test_load_example_github.t -l)
```

Test plan: `yarn test --full`, excluding the known (local-only) failure
described in #1151
2019-05-21 14:26:27 +03:00
Dandelion Mané da5bce255e
Add authorDate tracking to Git commits (#1153)
Modifies the Git plugin so that we now track commit author dates.
Similar to in #1152, they are encoded in MsSinceEpoch.

Test plan: `yarn test --full` passes, except for the pre-existing
failure discussed in #1151.

Thanks to @s-ben for a conversation which motivated these changes.
2019-05-21 06:56:57 +03:00
Dandelion Mané d51c0b6715
Add createdAt timestamp tracking to GitHub (#1152)
Updates github schema to include createdAt timestamps, and then updates
the RelationalView to provide those timestamps as MsSinceEpoch.

I added createdAt timestamps to Repos, Issues, Pulls, Reviews, and
Comments, as these correspond to GitHub graph nodes where I think
time-based filtering is relevant. I didn't add them to Users, Reactions,
or Commits. Reactions, because they correspond to edges not nodes. (We
could consider doing the time filtering on edges too, but I'd rather
keep it simple for now.) Commits, because they're owned by a different
plugin. Users, because... in a certain sense the user identity is
timeless, the time factoring is mostly so we can evaluate how users'
cred varies over time.

Anyway, it will be easy to add more fields later if we need them.

Test plan:
- Inspect snapshot changes
- Ran `yarn test --full`
  - Its only failure is pre-existing, per #1151

Thanks to @s-ben for some motivation and discussion about this change.
2019-05-21 06:16:16 +03:00
Dandelion Mané a831e05e5f
Add a WeightsFileManager (#1150)
This adds a WeightsFileManager component that allows the user to save or
load weights in the cred explorer. Clicking the download icon downloads
the weights, clicking the upload icon uploads them.

I also did a slight refactor to the FileUploader so that it no longer
always provides the file upload icon, instead the instantiator passes
children which act as the upload clickable. Seemed more consistent.

Test plan: No tests added, but I manually tested that upload and
download both work.
2019-05-21 04:41:00 +03:00
Dandelion Mané fb89559e44
Add a FileUploader component, with inspection test (#1149)
* Add FileUploader with inspection test

TODO: get it working

* Add a FileUploader component, with inspection test

This adds a FileUploader component, which allows the user to upload JSON
files. Rather than using automated testing, it has an inspection test.
The inspection test may be run by navigating to:
http://localhost:8080/test/FileUploader/

This commit also adds some basic utility functions for defining
inspection tests to `routeData.js`. We should improve support for
inspection tests in the future; see [#1148].

[#1148]: https://github.com/sourcecred/sourcecred/issues/1148

Test plan: Ran the included inspection test.
2019-05-20 17:12:57 +03:00
Dandelion Mané f6a4042342
Add JSON serialization support for weights (#1147)
Test plan: New unit tests added.
2019-05-20 14:12:12 +03:00
Dandelion Mané 03a8ab679b
Refactor the Weights module (#1146)
This commit refactors the `analysis/weights` module so that there's
a single top-level type called `Weights` which includes all of the
user-specified weights for running PageRank. This includes both manually
specified type weights, and per-node weights.

In contrast to the old weights module which had weighted types (i.e. the
type bundled with the weight), this only records the weight choice
(keyed by address). Similarly, the map is empty unless the user has
explicitly chosen a weight. This has a few advantages:

- For planned work on serializing weights, it's convenient that
there's a single unified object that bundles all the weights.

- For planned work on serializing weights, it's convenient that we only
record actual choices, not the default values (defaults are provided
within the WeightsToEdgeEvaluator). This way, we don't need to manually
filter out the default weights for serialization. (We want users who
have not made any choice about the type weights to automatically get new
type weightings we publish in sourcecred/sourcecred).

- Overall, the WeightConfig data piping has become a lot simpler.

While making these changes, I threw away the PluginWeightConfig
component, instead inlining it in the top-level WeightConfig. I also
moved responsibility for the WeightConfig out of the PagerankTable, and
into the containing App; the WeightConfig is created and wired by the
App, and then passed as a whole component into the PagerankTable via
props. This means that PagerankTable does not need to get a bunch of
extra props for piping the state changes into WeightConfig.

I also threw away some frontend test code for the WeightConfig
component. Over the course of the past few months, I've observed that
the value of a lot of the frontend-wiring-testing is pretty low (i.e.
they are not catching issues, because Flow catches them) and the cost of
maintaining the testing is high. I'm now inclined to believe that we
shouldn't be testing routine component wiring via explicit tests,
because it's very easy to verify that the behavior is correct by
interacting with the UI, and the tests are expensive to write and
maintain.

For "core logic" (e.g. behaviors of types and data structures,
algorithms, etc) I continue to believe testing is very valuable. One
heuristic I'll start applying is: "can I achieve high confidence that
this code is correct, without tests, using little effort by manually
inspecting the frontend?". If the answer is yes, I'm unlikely to write
tests.

Test plan: `yarn test` passes, and the UI continues to function,
including changing manual and type level weights and observing cred
changes.
2019-05-20 13:46:08 +03:00
Dandelion Mané 1ea3c1ec94 nit: remove fallbackDeclaration.js
In the previous commit, I failed to remove this file.
This is a cleanup.

Test plan: `yarn test`
2019-05-19 15:50:03 +03:00
Dandelion Mané 467b781788
Remove the fallback adapter (#1145)
Node and Edge types are increasingly important in SourceCred, as we use
them to decide what weights to provide, what description logic to use,
etc. In #631, we hit a bug around not finding a type matching a
particular node, and in #640 I responded by creating a
"FallbackAdapter", which matches any node, and a
"Static/DynamicAdapterSet", which are abstractions for finding adapters
in a way that guarantees the presence of the fallback adapter.

I think this solution is actually pretty brittle. It only works if we
are disciplined in only ever accessing node and edge types using a
context that included the FallbackAdapter, and it requires us to
centralize all of the "type-not-found" logic in one place (the fallback
declaration) irrespective of what the locally appropriate solution is.

Looking back at #631, the core issue was that we had a getter that
promised to always give a matching type and adapter, rather than
returning a possibly undefined adapter. We fixed this by attempting to
ensure that there always would be a matching adapter. I think it would
be better to have the methods honestly report that the adapter might be
null/undefined, and let the client code handle it appropriately.

This commit implements that change. It's motivated by me being
frustrated at the fallback adapter while doing other refactoring.

Test plan:
Unit tests have been updated and `yarn test` passes. Also, I
experimentally removed the Git plugin from the list of adapters for both
the backend and frontend, and verified that the frontend UI and the
pagerank and export-graph commands still worked.
2019-05-19 15:49:24 +03:00
Dandelion Mané 9d8d223653
Weights: Refactor around a new EdgeWeight type (#1144)
This refactors the weights module (and downstream consumers) so that
rather than tracking forwardWeight and backwardsWeight as separate
fields, we have an `EdgeWeight` type with a forwards and backwards
property. I feel this makes the code a little cleaner and wanted it a
few times while doing other refactors in this module.

Note that the graphToMarkovChain module has a distinct `EdgeWeight` type
which serves a similar purpose but happens to have different field
names. I've added a TODO to rename those fields for consistency.

Test plan: Since this is slight data structure re-organization, no new
tests are required; that `yarn test` passes is sufficient.
2019-05-19 14:17:46 +03:00
Dandelion Mané d2559960bb explorer: tweak weights on a per-node basis (#1143)
This pull request adds a weight slider to every NodeRow in the explorer,
enabling the user to manually set a weight for that node. The weights are
multiplicative with the type level weights, so that they can be changed
independently (e.g. you can have a comment that is weighted 2x higher than
regular comments, but still have comments get a low weight in general).

This pull coordinates a number of different changes across the codebase, all of
which are tested:

Adding support for manual weights in the weights and
weightsToEdgeEvaluator modules.
Modifying pagerankTable.TableRow so that it can show a slider in the second
column.
Adding piping for manual weights into the PagerankTable shared props, and
into the explorer app
Adding the slider to the NodeRow class that displays the current weight,
and can trigger the upstream weight change
Ensuring that the runPagerank call in the explorer actually uses the manual
weights
At present, there is no way to save these weights (they are ephemeral in the
frontend) and so this is clearly a prototype/tech demo level feature rather
than being ready for real usage. Correspondingly, CLI pagerank command always
uses an empty set of manual weights. I plan to remedy this in a follow-on pull
request.

Test plan: Run the included unit tests (yarn test) and also spin up the UI,
verify that it visually looks good in both Firefox and Chrome, and verify that
changing the weights and then re-running PageRank actually causes the cred of
the modified node to change.

Review plan: In addition to carefully reading the code, ensure that all of the
changes described a few paragraphs up are actually tested.

Merge plan: Squash and merge.

Thanks to @s-ben for proposing this feature in Discord, and to everyone
discussing its implications in this Discourse thread.
2019-05-18 19:21:17 +03:00
Dandelion Mané 2999d24593
Fix a conflict between #1140 and #1141 (#1142)
In #1140 I rename a field in PagerankGraph.ScoredNode, and in #1141 I
added a new test which referenced the field.

I merged them in quick succession, introducing a conflict. This PR fixes
it.

Test plan:
`yarn test` passes.
2019-05-17 17:11:22 +03:00
Dandelion Mané f046dd06a5
Add `setEdgeEvaluator` to PagerankGraph (#1141)
This commit adds a `setEdgeEvaluator` method to `PagerankGraph`,
and modifies the constructor to use that method. This will allow us to
use PagerankGraph in the explorer UI, so that we can update edge
weights without fully regenerating the graph.

Test plan: I've added new unit tests that verify basic properties of how
the edge weights are getting set and consumed.
2019-05-17 17:07:18 +03:00
Dandelion Mané f8c659a413
Rename `ScoredNode.node -> ScoredNode.address` (#1140)
It makes more sense that a ScoredNode has an address than that it has a
sub-node (which is an address).

Test plan: It's a simple field rename; `yarn test` suffices
2019-05-17 17:07:00 +03:00
Brian Litwin 0f038305a2 Add CLI command to clear sourcecred data directory (#1111)
Resolves #1067

Adds the CLI commands:
`sourcecred clear --all` -- removes the $SOURCECRED_DIRECTORY
`sourcecred clear --cache` -- removes the cache directory
`sourcecred clear --help` -- provides usage info
`sourcecred clear` -- prompts the user to be more specific

Test plan:
The unit tests ensure that the command is properly wired into the
 sourcecred CLI, including help text integration. However, just to be
safe, we can start by verifying that calling `sourcecred` without
arguments lists the `clear` command as a valid option, and that
calling `sourcecred help clear` prints help information. (Note: it's
necessary to run `yarn backend` before testing these changes)

The unit tests also ensure that the command removes the proper
directories, so there isn't really a need to manually test it,
although the reviewer may choose to do so to be safe.

Although out of scope for unit tests on this function, we can also do
integration tests, to make sure that running the clear command doesn't
leave the sourcecred directory in an invalid state from the perspective of the `load` command.

```js
$ yarn backend;
$ node bin/sourcecred.js load sourcecred/example-github;
$ node bin/sourcecred.js clear --cache;
$ node bin/sourcecred.js load sourcecred/example-github;
$ node bin/sourcecred.js clear --all;
$ node bin/sourcecred.js load sourcecred/example-github;
```
The expected behavior of the above command block is that the load command never fails or throws an error.

@decentralion and I discussed the scenario where `rimraf` errors.
We decided that testing this scenario wasn't necessary, because
`rimraf` doesn't error if a directory doesn't exist, and
rimraf's maintainer suggests [monkey-patching the fs module]
to get rimraf to error in testing scenarios.

Thanks @decentralion for reviewing and pair-programming this with me.

[monkey-patching the fs module]: https://github.com/isaacs/rimraf/issues/31#issuecomment-29534796
2019-05-13 12:59:58 +03:00
Dandelion Mané bed476517c
Port skeleton of Odyssey frontend (#1132)
This commit integrates an bare skeleton of the odyssey frontend that we
implemented in the [odyssey-hackathon] repository. You can see the
working frontend that we are trying to port over at
[sourcecred.io/odyssey-hackathon/][scio].

The prototype in the other repository has some tooling choices which are
incompatible/redundant with decisions in our codebase (sass vs
aphrodite), and requires some tools not yet present here
(svg-react-loader). This commit includes the build and integration work
needed to port the prototype frontend into mainline SourceCred. The
frontend scaffold isn't yet integrated with any "real" Odyssey data.

One potential issue: right now, every page that is rendered from the
SourceCred homepage is contained within a [homepage/Page], meaning that
it has full SourceCred website styling, along with the SourceCred
website header. The [application][scio] also has a header. Currently, I
work around this by having the Odyssey UI cover up the base header (via
absolute positioning), which works but is hacky. We can consider more
principled solutions:

- Finding a way to specify routes which aren't contained by
[homepage/Page]; maybe by adding a new top-level route
[here][route-alternative].
- Unify the headers for the Odyssey viewer and the page as a whole
(sounds like inappropriate entanglement?)
- Have a website header and also an application header (sounds ugly?)

[homepage/Page]: ee1d2fb996/src/homepage/Page.js
[route-alternative]: ee1d2fb996/src/homepage/createRoutes.js (L17)

Test plan: Run `yarn start`, and then navigate to
`localhost:8080/odyssey/`. observe that a working website is displayed,
and that the cred logo next to the word "SourceCred" is loaded properly
(i.e. svg-react-loader is integrated properly). Observe that there are
no build/compile errors from either `yarn start` or `yarn build`. Also,
observe that the UI looks passably nice, and that if the number of
elements in the entity lists is larger than can be displayed, the
sidebar pane scrolls independently.

The UI was tested in both Chrome and Firefox.

[odyssey-hackathon]: https://github.com/sourcecred/odyssey-hackathon
[scio]: https://sourcecred.io/odyssey-hackathon/

Thanks to @jmnemo, as the implementation is based on [his work].

[his work]: https://github.com/jmnemo/hackathon-event/
2019-05-06 18:15:39 +03:00
Dandelion Mané 9231085185
Initial data model for the Odyssey plugin (#1134)
This commit puts in a basic data model for the Odyssey plugin. It's
built around the `OdysseyInstance` class, which is basically a Graph
that keeps around descriptions for every node, and ensures that
nodes/edges are typed in accordance with the Odyssey plugin declaration.

In the future, I want to enable instances to declare their own node/edge
types, in which case the instance will assume responsibility for
tracking and serializing the types.

To make the Odyssey plugin a 'proper plugin', I've also added a plugin
declaration, as well as analysis and explorer adapters. I haven't
decided exactly where data for Odyssey instances should be stored, so
for now the plugin adapters always return an example instance which is
based on our experience at the Odyssey hackathon.

Test plan: The instance has unit tests for its logic.

If you want to see what the plugin looks like right now when it's
integrated, you can apply the following diff, and then load the
prototype. It will contain Odyssey nodes (find them using the node type
dropdown). Note that without a seed vector to move cred back to the
values/artifacts, the cred distribution in the Odyssey subgraph is
degenerate; the users are all sinks and have postiive cred scores, but
all the other nodes converge to 0 cred.

diff --git a/src/homepage/homepageExplorer.js b/src/homepage/homepageExplorer.js
index cae4548..48987f1 100644
--- a/src/homepage/homepageExplorer.js
+++ b/src/homepage/homepageExplorer.js
@@ -6,6 +6,7 @@ import type {Assets} from "../webutil/assets";
 import {StaticExplorerAdapterSet} from "../explorer/adapters/explorerAdapterSet";
 import {StaticExplorerAdapter as GithubAdapter} from "../plugins/github/explorerAdapter";
 import {StaticExplorerAdapter as GitAdapter} from "../plugins/git/explorerAdapter";
+import {StaticExplorerAdapter as OdysseyAdapter} from "../plugins/odyssey/explorerAdapter";
 import {GithubGitGateway} from "../plugins/github/githubGitGateway";
 import {AppPage} from "../explorer/App";
 import type {RepoId} from "../core/repoId";
@@ -14,6 +15,7 @@ function homepageStaticAdapters(): StaticExplorerAdapterSet {
   return new StaticExplorerAdapterSet([
     new GithubAdapter(),
     new GitAdapter(new GithubGitGateway()),
+    new OdysseyAdapter(),
   ]);
 }
2019-05-06 11:54:07 +03:00
Dandelion Mané 79017a477b
Add support for seed vectors to PagerankGraph (#1135)
This commit modifies `PagerankGraph.runPagerank` so that the user can
provide an alpha and seed vector. The seed vector is specified via a map
of weights, which will be normalized into a probability distribution
over all the nodes in the graph. In the event that the map is empty (or
the total weight is otherwise 0), a uniform distribution is created.

To effect this change, a helper function called `weightedDistribution`
has been added (and thoroughly tested) in the `graphToMarkovChain`
module. Then, that function is used in `pagerankGraph.runPagerank`
(along with light testing).

Currently, the default alpha is set to 0, to ensure consistency with the
legacy pagerank implementation in `analysis/pagerank`. Once that has
been replaced with `PagerankGraph`, we can consider changing the defualt
alpha to non-zero (thus removing the need for synthetic self-loops).

I took a different approach in the [odyssey-hackathon repo][commit].
The previous approach was a much more complicated (and fairly redundant)
API, that allowed specifying "NO_SEED", "UNIFORM_SEED", "SELECTED_SEED",
and "SPECIFIED_SEED". I'm much happier with this API and implementation.

[commit]: ed07861073

Test plan: Unit tests included; run `yarn test`.
2019-05-05 18:57:41 +03:00
Dandelion Mané e7bc025379
Add support for PageRank Seed Vectors (#1128)
Summary:
the cred calculation is defined by a Markov Mixing process. By
introducing the seed vector and teleportation parameter alpha, the
Markov mixing process is augmented with a source of cred originating
from the seed vector. The resulting algorithm is the generalized
variation of PageRank, allowing computation of both canonical PageRank
where the seed vector is the uniform distribution and personalized
PageRank where the seed vector is an indicator distribution. It is still
possible to get the simple markov chain solution by setting alpha = 0.

Note that this changes the Markov process state update, but does not
provide updates to the APIs. All existing behavior is unchanged because
alpha is always set to 0.

This is a port of
https://github.com/sourcecred/odyssey-hackathon/pull/3,
which was created during the Odyssey Hackathon.

Test Plan:

Existing tests have been extended to include passing alpha = 0 to
reproduce exisiting test cases for the simple Markov Process. Additional
test cases include
 - Verifying that resulting stationary distribution is unaffected by seed when alpha = 0
 - Verifying that resulting stationary distribution is precisely equal to seed when alpha = 1
 - Verifying that the resulting stationary distribution is linear in the seed vector
 - Verifying that the correct stationary distribution is computed for non-zero alpha
 - verify that the algorithm converges immediately when the initialDistribution is the stationary distribution
 - verify that the changing the initialDistribution does not change the stationary distribution

Paired with @mzargham
2019-04-24 16:37:16 +03:00
Dandelion Mané ee1d2fb996
Make PagerankGraph convergence options optional (#1131)
Right now PagerankGraph requires that the user choose specific values
for maxIterations and convergenceThreshold when running PageRank.

I also rename `PagerankConvergenceOptions` to `PagerankOptions`.

The motivation is that I want to add future arguments to the same
options dict (e.g. alpha and the seed vector), so the rename is
appropriate, and allowing the options to be unset (and thus inherit
default values) will make the whole API much cleaner as I add more
options.

Test plan: Unit test added. `yarn test` passes.
2019-04-21 15:35:09 +03:00
Dandelion Mané a8a3f4fc3a
refactor args to findStationaryDistribution (#1130)
In [#1128: Add support for seed vectors][#1128], we significantly
increase the number of arguments to
markovChain.findStationaryDistribution. To clean up the invocations, I
added a followon PR (#1129) which converts findStationaryDistribution to
use a `PagerankParams` object instead.

However, I think it will be cleaner to land the PagerankParams refactor
before adding new features in #1128, so I'm making this PR as
pre-cleanup.

Test plan: This is a trivial refactor. `yarn test` passes.

[#1128]: https://github.com/sourcecred/sourcecred/pull/1128
2019-04-21 14:00:30 +03:00
Dandelion Mané 6dd58a9c67
Use aphrodite for HomePage.js styling (#1127)
This is a minor refactor so that we use Aphrodite for styling on
HomePage.js. It's not super consequential, but I want to switch to using
Aphrodite more consistently in the codebase, so why not start here.

Test plan:
`yarn test` reveals no errors.
`yarn start` launches a correctly styled frontend.
I also used `build_static_site.sh` and the resultant site is also
correctly styled.
2019-04-19 17:10:38 +03:00
Dandelion Mané e465919281
Add sourcecred/{research,pm} to sourcecred.io (#1125)
Test plan: Carefully read the diff
2019-04-12 10:14:38 +02:00
Dandelion Mané 7efcc13618
Automatically run pagerank on `sourcecred load` (#1115)
This commit updates the `sourcecred load` command so that it also
automatically runs PageRank on completion.

The implementation is slightly hacky, in that it prints two sets of
task status headers/footers to console, for reasons described in a
comment in the source code. The user-visible effect of this hack can
be seen below:

```
❯ node bin/sourcecred.js load sourcecred/example-github

Starting tasks
  GO   load-git
  GO   load-github
 DONE  load-github
 DONE  load-git

Overview
Final result:  SUCCESS

Starting tasks
  GO   run-pagerank
 DONE  run-pagerank

Overview
Final result:  SUCCESS
```

It would be good to clean this up, but for now I think it's acceptable.

Note that it is not safe to assume that a PagerankGraph always exists
for repos that are included in the RepoIdRegistry. The repo gets added
to the registry before the pagerank task runs. Consumers that are
loading the `PagerankGraph` can just check that the file exists, though.

Test plan: I've added unit tests that verify that the right tasks are
generated. Most importantly, the snapshot of the results of `sourcecred
load` now include a snapshotted pagerank graph.
(The snapshot was updated via `UPDATE_SNAPSHOT=1 yarn test --full`.)

Further progress on #967.
2019-04-11 21:21:29 +02:00
Dandelion Mané fb6c9e1ba0
Make tests use SOURCECRED_GITHUB_TOKEN (#1124)
Across SourceCred usage, we depend on the `SOURCECRED_GITHUB_TOKEN`
environment variable being set. Confusingly, some tests expect
`GITHUB_TOKEN` instead of `SOURCECRED_GITHUB_TOKEN`.

This commit resolves that inconsistency, by having all tests that read
from the environment use `SOURCECRED_GITHUB_TOKEN`. This was already
available as a secret in our CI configuration, so no change is needed
there. (After this merges, we may remove the GITHUB_TOKEN variable from
the environment.)

Test plan: `yarn test --full` passes without the environment variable
set. Also, the following grep produces only innocuous hits.

```
git grep -P "(?<\!SOURCECRED_)GITHUB_TOKEN"
```
2019-04-11 21:18:24 +02:00
Dandelion Mané 320a69759e
refactor: `load` uses dependency injection (#1123)
This commit refactors the `sourcecred load` CLI command so that it uses
dependency injection, much like the testing setup #1110. This makes it
feasible to test "surface logic" of how the CLI parses flags and
transforms them into data separately from the "piping logic" of invoking
the right API calls using that data.

This is motivated by the fact that I have other pulls on the way that
modify the `load` command (e.g. #1115) and testing them within the
current framework is onerous.

Test plan:
This is a pure refactoring commit, which substantially re-writes the
unit tests. The new unit tests pass (`yarn test --full` is happy).

Note that `yarn test -full` also includes a sharness test that does an
E2E usage of `sourcecred load`
(see sharness/test_load_example_github.t), so we may be confident that
the command still works as intended.
2019-04-11 18:59:08 +02:00
Seth Benton 13a90675a8 Fixes broken link README (#1122)
This commit fixes three broken links (two in the README, one in the prototype app) that were still pointing to https://discuss.sourcecred.io/.

Test plan:
Verify that there are no other bad links to the old Discourse location, by running `git grep "discuss.sourcecred.io"`.
2019-03-26 16:01:39 -07:00
Dandelion Mané 012c4f3eb7
Add `sourcecred pagerank` for backend pagerank (#1114)
This commit adds a new CLI command, `pagerank`, which runs PageRank on a
given repository. At present, the command only ever uses the default
weights, although I plan to make this configurable in the future. The
command then saves the resultant pagerank graph in the SourceCred
directory.

On its own, this command is not yet very compelling, as it doesn't
present any easily-consumed information (e.g. users' scores). However,
it is the first step for building other commands which do just that. My
intention is to make running this command the last step of `sourcecred
load`, so that future commands may assume the existence of pagerank
scores for any loaded repository.

Test plan: The new command is thoroughly tested; see
`cli/pagerank.test.js`. It also has nearly perfect code coverage (one
line missing, the dependency-injected real function for loading graphs).

Additionally, the following sequence of commands works:
```
$ yarn backend
$ node bin/sourcecred.js load sourcecred/pm
$ node bin/sourcecred.js pagerank sourcecred/pm
$ cat $SOURCECRED_DIRECTORY/data/sourcecred/pm/pagerankGraph.json
```

Material progress on #967.
2019-03-25 18:05:58 -07:00
Dandelion Mané 669f34d009
Add `fetchGithubOrg` for loading organizations (#1117)
This commit adds a module, `fetchGithubOrg`, which loads data on GitHub
organizations, most notably including the list of repositories in that
org.

The structure of this commit is heavily influenced by review feedback
from @wchargin's [review] of a related PR.

Test plan: This logic depends on actually hitting GitHub's API, so the
tests are modeled off the related `fetchGithubRepo` module. There is a
new shell test, `src/plugins/github/fetchGithubOrgTest.sh`, which
verifies that that the org loading logic works via a snapshot.

To verify the correctness of this commit, I've performed the following
checks:

- `yarn test --full` passes
- inspection of `src/plugins/github/example/example-organization.json`
confirms that the list of repositories matches the repos for the
"sourcecred-test-organization" organization
- manually breaking the snapshot (by removing a repo from the snapshot)
causes `yarn test --full` to fail
- running `src/plugins/github/fetchGithubOrgTest.sh -u` restores the
snapshot, afterwhich `yarn test --full` passes again.

[review]: https://github.com/sourcecred/sourcecred/pull/1089#pullrequestreview-204577637
2019-03-19 19:00:08 -07:00
Dandelion Mané bd8be01958
Refactor loadGraph out of exportGraph (#1113)
This pulls the logic for loading a SourceCred graph from disk out
`cli/exportGraph` and into `analysis/loadGraph`. The rationale is that
`exportGraph` is not the only command that wants the ability to load a
graph from the analysis adapters.

The new command has a clean return signature that reports whether the
load was successful, or failed because the graph wasn't loaded, or
failed due to an error in plugin code.

Testing of the loading logic has been moved to `loadGraph.test`, and the
CLI has been refactored so that the loadGraph method is dependency
injected. This allows for (IMO) cleaner testing of the CLI method.

There is one (deliberate) change in behavior, which is that the command no
longer throws an error if no plugins are included; instead it will just
export an empty graph. I don't have a strong preference between the two
behaviors; changing it was just more convenient.

Test plan: New unit tests have been added, and tests of the cli command
have been re-written. As a sanity check, I've verified that the
following sequence still works:

```
$ yarn backend
$ node bin/sourcecred.js load sourcecred/pm
$ node bin/sourcecred.js export-graph sourcecred/pm
```

Nearly perfect code coverage is maintained. One line is uncovered, and
it's the line that injects in the actual graph loading behavior.
2019-03-13 00:24:09 -06:00
Dandelion Mané d1936fbf93
PagerankGraph: add neighbors + score decomposition (#1094)
This commit adds a `neighbors` method to `PagerankGraph`. This is an
augmented version of `Graph.neighbors`. It returns the base data from
`Graph.neighbors` as well as the score, the edge weights, and the score
contribution. The score contribution basically means how much score was
contributed from the target node by this particular neighbor connection.

When the graph is well-converged, a node's score will be the sum of all
its neighbors' score contributions, as well as the contribution it
received from its synthetic loop edge. So, for completeness sake, I
added another method, `syntheticLoopScoreContribution`, which computes
how much score a node received from its synthetic loop edge. (This value
should usually be close to 0).

You can think of these two methods as providing a replacement for the
`PagerankNodeDecomposition` logic.

Test plan: I've added tests that verify:
- That neighbors returns results consistent with Graph.neighbors
- That neighbors' score contributions are computed correctly
- That neighbors errors if the graph has been modified
- That synthetic score contributions are computed correctly
- That a node's score is the sum of all its contributions

Test plan: Unit tests included. Run `yarn test`.
2019-03-08 15:02:00 -07:00
Dandelion Mané 441d6df255
Move default pagerank settings to pagerankGraph (#1112)
This commit moves the default Pagerank options out of
`analysis/pagerank` and to `core/pagerankGraph`. This reflects the
gradual migration of core pagerank logic into `pagerankGraph`.

Test plan: `yarn test` should suffice. It's a trivial change.
2019-03-07 23:04:07 -07:00
Ana Noemi c48b2cd52e
Node and edge description tooltips (#1081)
* Show tooltips in weightConfig UI

* Updated to pass checks from prettier

* Updates unit tests to check WeightSlider descriptions

* Update CHANGELOG.md to reflect PR #1081
2019-03-07 18:49:27 +09:00
Dandelion Mané 996899ade3
Add CLI command: `sourcecred export-graph` (#1110)
* Add CLI command: `sourcecred export-graph`

This adds an `export-graph` command to the SourceCred CLI. It exports
the combined cred graphs for individual RepoIds, as was done for
[sourcecred/research#4].

Example usage:
```
$ node bin/sourcecred.js load sourcecred/mission
$ node bin/sourcecred.js export-graph sourcecred/mission >
  /tmp/mission_graph.json
```

Test plan:
The new command is thoroughly unit tested.

[sourcecred/research#4]: https://github.com/sourcecred/research/pull/4

* Address review feedback by @wchargin
2019-03-01 15:33:40 -07:00
Dandelion Mané b561b1728b
refactor repoIdRegistry (#1109)
This commit makes several improvements to `repoIdRegistry`:

- Create `writeRegistry` and `getRegistry` methods to abstract over
  needing to find the right file, populate an empty registry if its not
  available, etc.
- Create `getEntry` for efficiently checking whether a RepoId is in the
  registry
- Rename `addRepoId` to `addEntry` for consistency
- Add docstrings

The `load` command has been refactored to use the new methods.

Test plan: Unit tests added, and they pass. The `load` command is
already thoroughly tested, so regressions are very unlikely.
2019-03-01 11:25:10 -07:00
Brian Litwin b16c374a2b
pagerankGraph: add edge filter (#1105)
Part of ongoing work for #1020.

Test plan:
Added tests that mirror the edge filtering tests in `graph.test`
to check that `graph` and `pagerankGraph` return the same edges
with the given `EdgesOptions` parameter. Also added a sanity check
that a `weight` prop is returned from the iterator along with the edge.

Given the dependence on a helper function to test the edge
iterator's equality between graphs, I would suggest reviewers give
particular attention to that function:
`expectConsistentEdges()`
2019-02-27 21:44:21 -05:00
William Chargin 656a2d1543
meta: add .mailmap entry for Dandelion (#1108)
Summary:
@decentralion has used two emails to commit to Git: one exclusively
prior to 2018-05-21 and one exclusively after that date. This commit
adds a mailmap file to list their canonical email address. For more
information, see `man git-check-mailmap`:
<https://www.git-scm.com/docs/git-check-mailmap>

Test Plan:
See `git log --format=%ad --author=dl@` for dates of commits under the
old email, and `git log --format=%ad --author=dandelion@` for dates of
commits under the new email, to confirm the date ranges listed above.

Before this change:

    $ git shortlog -nse | head -3
       443	William Chargin <wchargin@gmail.com>
       291	Dandelion Mané <decentralion@dandelion.io>
       129	Dandelion Mané <dl@dandelion.io>

After this change:

    $ git shortlog -nse | head -3
       444	William Chargin <wchargin@gmail.com>
       420	Dandelion Mané <decentralion@dandelion.io>
        12	Brian Litwin <brian.n.litwin@gmail.com>

wchargin-branch: dandelion-mailmap
2019-02-26 15:46:06 +11:00
Brian Litwin 8772daa8b8
Update Contributing.md (#1107)
We switched from marking beginner-friendly issues "Contributions Welcome"
to "Good First Issue". See sourcecred/pm#15 for discussion.

Test Plan:
The new link works correctly on my local fork.
2019-02-23 08:37:32 -05:00
Dandelion Mané 8f6a3f30bd
PagerankGraph: Add `totalOutWeight` (#1092)
This commit adds a `totalOutWeight` method to `PagerankGraph`.
For any given node, `totalOutWeight` reports the total weight traveling
away from the node on edges (including the synthetic loop edge). Using
totalOutWeight makes it possible to normalize the weights to get the
actual markov transition probabilities.

Test plan: Unit tests verify the following properties:
- An error is thrown if the requested node does not exist.
- An error is thrown if the graph has been modified.
- The out weights are computed correctly in the standard case.
- The out weights are computed correctly in the case where there are no
weights (except the synthetic loop weight)
- The out weights are still computed correctly after
JSON-deserialization.
2019-02-22 15:14:38 -07:00
Brian Litwin bd669f292f
Refactor pagerankGraph's node filter to throw error at call site (#1106)
Inspired by a [suggestion] @decentralion made to improve #1105
This will enable `pagerankGraph` to throw an error when it is
called with invalid option parameters. Previously, to elicit
this error we had to access the iterator through `Array.from()`
or similar.

Test plan:
Yarn test passes.
Specifically, I removed the `Array.from()` wrapper around `pagerankGraph`
in the test that checks to see that `pagerankGraph` throws an error when
`nodes()` is passed invalid options.

[suggestion]: https://github.com/sourcecred/sourcecred/pull/1105#pullrequestreview-206496537
2019-02-21 19:25:28 -05:00
Brian Litwin 42669cd160
PagerankTable: Replace topLevelFilter with NodeType in props (#1103)
The motivation for this change is to make it easier
to access the selected Node's `name` prop for #576,
in which we plan to show a Card displaying summary
stats for the selected node. With only the `topLevelFilter`
available, it's trickier than it needs to be to find out
a node type's `name`.

Test Plan:
* Yarn test passes.
* Visual/Manual inspection of table doesn't surface any issues.
* Updated `it("filter defaults to defaultNodeFilter if available")`
to `it("selectedNodeType defaults to defaultNodeType if available")`.
* Verified that the above new test is failable in several ways by
mangling the tests to test for the wrong node type and mangling the
code to set the wrong node type.
* Since we factored out 'topLevelFilter' and 'defaultNodeFilter',
running `git grep -i topLevelFilter` and `git grep -i defaultNodeFilter`
turns up empty, just to make sure those terms aren't hanging
around to confuse anybody in the future.
* I don't think changing the `prop` parameter warrants any
additional tests, as the current tests verify that the prop
is passed in correctly.

This was at @decentralion's suggestion, following the Contributing
Guideline's Kent Beck quote of making the easy change to make the
change we were originally after (#576) easier. 🙌
2019-02-21 14:27:18 -05:00