731 Commits

Author SHA1 Message Date
William Chargin 85efa811e0
mirror: use `SchemaInfo` in `_initialize` (#858)
Summary:
This simplifies and clarifies the code with no observable change.

Test Plan:
Existing unit tests suffice; run `yarn unit`.

wchargin-branch: mirror-use-schemainfo
2018-09-18 16:30:43 -07:00
William Chargin e69ff57c58
mirror: precompute some useful schema info (#857)
Summary:
This is mostly useful not for computational efficiency, but for ease of
implementation: there end up being multiple places where we want to find
(say) the primitive fields on an object, and having to go through the
whole iterate-and-switch-and-push process repeatedly is annoying.
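
As a rough illustration of the precomputation described above, the sketch below walks a schema once and records which fields of each object are primitives, links, or connections. The schema shape (`type: "OBJECT"` / `"UNION"`, field kinds `"ID"`, `"PRIMITIVE"`, `"NODE"`, `"CONNECTION"`) and the name `computeSchemaInfo` are assumptions made for this example and may not match the real module.

```js
// Hypothetical schema shape and function name, for illustration only.
function computeSchemaInfo(schema) {
  const objectTypes = {};
  const unionTypes = {};
  for (const typename of Object.keys(schema)) {
    const type = schema[typename];
    switch (type.type) {
      case "OBJECT": {
        const entry = {
          primitiveFieldNames: [],
          linkFieldNames: [],
          connectionFieldNames: [],
        };
        // The iterate-and-switch-and-push step, done once up front.
        for (const fieldname of Object.keys(type.fields)) {
          const field = type.fields[fieldname];
          switch (field.type) {
            case "ID":
              break;
            case "PRIMITIVE":
              entry.primitiveFieldNames.push(fieldname);
              break;
            case "NODE":
              entry.linkFieldNames.push(fieldname);
              break;
            case "CONNECTION":
              entry.connectionFieldNames.push(fieldname);
              break;
            default:
              throw new Error(`unknown field type: ${field.type}`);
          }
        }
        objectTypes[typename] = entry;
        break;
      }
      case "UNION":
        unionTypes[typename] = {clauses: Object.keys(type.clauses)};
        break;
      default:
        throw new Error(`unknown type: ${type.type}`);
    }
  }
  return {objectTypes, unionTypes};
}
```

Callers can then read `info.objectTypes[typename].primitiveFieldNames` directly instead of re-walking the schema.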

Test Plan:
Unit tests included, with full coverage; run `yarn unit`.

wchargin-branch: mirror-schema-info
2018-09-18 16:24:38 -07:00
William Chargin 1b1a1e4d46
mirror: embed GraphQL schema into SQL (#849)
Summary:
This commit augments the `Mirror` constructor to turn the provided
GraphQL schema into a SQL schema, with which it initializes the backing
database. The schema is roughly as originally described in #622, with
some changes (primarily: we omit `WITHOUT ROWID`; we add indexes; we
store `total_count` on connections; and we use milliseconds instead of
seconds for epoch time).

Test Plan:
Unit tests included, with full coverage; run `yarn unit`.

wchargin-branch: mirror-sql-schema
2018-09-18 13:31:34 -07:00
Dandelion Mané 786a38e773
Improve CI performance by limiting max workers (#856)
According to the [jest docs], setting `maxWorkers=4` can substantially
improve performance on CI.

This commit sets `maxWorkers=4` when running `yarn unit` as a part of
`yarn test`. Based on my local testing (see data below), this improves
performance locally in addition to the expected performance improvement
on travis.

## Testing `yarn unit` by itself
```
yarn unit --maxWorkers=1
15.28s user 3.71s system 112% cpu 16.848 total
15.13s user 3.68s system 112% cpu 16.708 total
15.32s user 3.76s system 112% cpu 16.917 total
15.91s user 3.74s system 112% cpu 17.425 total
15.61s user 3.76s system 113% cpu 17.125 total

yarn unit --maxWorkers=2
19.43s user 4.03s system 212% cpu 11.061 total
19.86s user 4.19s system 210% cpu 11.407 total
21.19s user 4.26s system 213% cpu 11.902 total
20.68s user 4.51s system 212% cpu 11.873 total
20.78s user 4.26s system 212% cpu 11.780 total

yarn unit --maxWorkers=4
29.43s user 5.14s system 389% cpu 8.865 total
29.99s user 5.18s system 392% cpu 8.961 total
32.22s user 5.52s system 390% cpu 9.659 total
33.99s user 5.97s system 395% cpu 10.097 total
33.38s user 5.93s system 395% cpu 9.933 total

yarn unit --maxWorkers=8
48.21s user 6.57s system 621% cpu 8.815 total
51.61s user 7.16s system 610% cpu 9.622 total
59.48s user 7.82s system 621% cpu 10.833 total
58.18s user 8.10s system 624% cpu 10.607 total
58.92s user 8.22s system 620% cpu 10.817 total

unset
46.27s user 6.44s system 599% cpu 8.799 total
49.08s user 7.04s system 600% cpu 9.342 total
54.85s user 7.52s system 600% cpu 10.383 total
55.66s user 7.52s system 605% cpu 10.438 total
53.77s user 7.50s system 604% cpu 10.142 total
```

## Testing `yarn test`
```
maxWorkers=1
46.65s user 5.92s system 249% cpu 21.038 total
47.94s user 5.81s system 251% cpu 21.354 total
51.50s user 6.44s system 260% cpu 22.234 total
52.60s user 6.65s system 268% cpu 22.077 total
53.04s user 6.27s system 266% cpu 22.278 total

maxWorkers=2
56.13s user 6.13s system 409% cpu 15.204 total
63.32s user 7.22s system 412% cpu 17.091 total
64.82s user 7.19s system 422% cpu 17.027 total
64.59s user 7.41s system 417% cpu 17.227 total
65.40s user 7.30s system 419% cpu 17.318 total

maxWorkers=4
74.64s user 7.60s system 584% cpu 14.066 total
82.69s user 8.43s system 582% cpu 15.643 total
85.00s user 8.68s system 591% cpu 15.835 total
84.81s user 8.58s system 595% cpu 15.690 total
85.22s user 8.59s system 596% cpu 15.719 total

maxWorkers=4 and everything depends on unit
59.29s user 6.01s system 378% cpu 17.261 total
62.99s user 6.64s system 375% cpu 18.564 total
65.54s user 7.31s system 375% cpu 19.419 total
63.24s user 7.13s system 379% cpu 18.548 total
63.68s user 7.13s system 383% cpu 18.457 total

maxWorkers=8
92.85s user 8.13s system 643% cpu 15.702 total
101.63s user 9.21s system 632% cpu 17.510 total
101.63s user 9.23s system 636% cpu 17.428 total
101.81s user 9.32s system 633% cpu 17.546 total
101.62s user 9.39s system 632% cpu 17.542 total

unset
88.75s user 8.15s system 646% cpu 14.988 total
96.43s user 9.23s system 631% cpu 16.739 total
98.27s user 9.17s system 638% cpu 16.819 total
98.46s user 9.01s system 642% cpu 16.729 total
98.53s user 9.15s system 637% cpu 16.889 total

unset + everything depends on unit
76.02s user 7.61s system 486% cpu 17.208 total
79.14s user 8.26s system 484% cpu 18.030 total
84.32s user 9.19s system 488% cpu 19.136 total
84.92s user 9.14s system 497% cpu 18.919 total
84.46s user 8.94s system 492% cpu 18.965 total
```

Test plan: `yarn test` passes here and on travis

[jest docs]: https://jestjs.io/docs/en/troubleshooting
2018-09-18 13:17:51 -07:00
Dandelion Mané f03e3c83f2
Fix a crash when a review comment has >5 reactions (#855)
Our continuation-fetching code failed to properly get continuations for
pull request review comments, because it was only asking for more
reactions on `"IssueComment"` fragments. This caused the
`ensureNoMorePages` function to properly throw an error rather than
proceeding with incomplete data.

This commit fixes the root cause by splitting
`continuationsFromComment` into `continuationsFromReviewComment` and
`continuationsFromIssueComment`. (Pull and issue comments are both
considered 'IssueComment's.) The example-github repository has been
updated to include 10 reactions to a single review comment; the
example-data was updated in this commit, and all reactions have been
loaded.

I've also added a `console.error` statement in `ensureNoMorePages`. This
only triggers when the program is about to fail, and it's useful for
debugging.

Test plan: `yarn test --full` passes.

Paired with @wchargin
2018-09-18 12:52:05 -07:00
William Chargin fa81c4eaa9
git: properly load empty repositories (#851)
Summary:
Fixes #850.

Test Plan:
Regression test added; it fails before the change and passes after it.
Also, running `node ./bin/sourcecred.js load wchargin/mt` (which is a
GitHub repository with no commits) now successfully loads the
repository. (The cred explorer fails to process it, because it tries to
normalize across GitHub users, of which there are none, but this is a
known limitation and is unrelated.)

wchargin-branch: fix-empty-git-repository
2018-09-17 16:36:22 -07:00
Dandelion Mané 7259233f82
Prepare to enable flow-type eslint rules (#848)
This commit upgrades the flow-type eslint plugin to latest, and writes
new rules into the eslintrc. To keep the diff clean, the rules are
disabled: I will turn them on individually (fixing errors) in follow-on
commits.

Test plan: `yarn test`.
Uncommenting the lines produces many lint errors (but the linter still operates as expected).
2018-09-17 14:11:39 -07:00
William Chargin a93ad80ebc
mirror: initialize a GraphQL database mirror (#847)
Summary:
This commit introduces the `Mirror` class that will be the centerpiece
of the persistent-loading API as described in #622. An instance of this
class represents a mirror of a remote GraphQL database, defined by a
particular schema. In this commit, we add the construction logic, which
includes a safety measure to ensure that the database is used within one
version of the code and schema.

Test Plan:
Unit tests included, with full coverage; run `yarn unit`.

wchargin-branch: mirror-class
2018-09-17 13:53:08 -07:00
Dandelion Mané 62d3c180ee
Add GitHub reactions to the graph (#846)
* Define Reaction edges

This adds support to `github/edges` for creating edges representing
GitHub reactions. These edges are not actually added to the graph.

Test plan: Unit tests

* Add GitHub reactions to the graph

This commit adds functional support for reactions in SourceCred.
Only thumbs-up, heart, and hooray reactions are supported for now, as
they are all unambiguously positive; adding support for negative
reactions like thumbs-down will require some more thought.

The reactions are added to the graph, and new edge types have been added
to the UI.

Test plan:
The `graphView` class has been updated to do invariant checking for the
reaction edges, including that the unsupported reaction types like
"THUMBS_DOWN" aren't added to the graph.

I've tested this feature by downloading data for a large repository
(ipfs/go-ipfs). The reaction edges appear and transfer cred reasonably.
The edge types are displayed in the weight config appropriately.

Builds on #839, #840, and #845.
2018-09-17 13:44:11 -07:00
Dandelion Mané 488c98c3e1
Add tensorflow-gardener to the list of bots (#841)
Test plan: I verified that the spelling is correct
2018-09-17 13:44:02 -07:00
Dandelion Mané 33d14b9d1a
Define Reaction edges (#845)
This adds support to `github/edges` for creating edges representing
GitHub reactions. These edges are not actually added to the graph.

Test plan: Unit tests
2018-09-17 13:35:47 -07:00
William Chargin e9279bee90
mirror: add a helper function for transactions (#844)
Summary:
In implementing #622, we’ll want to run lots of things inside of
transactions. This commit introduces a JavaScript API to do so more
easily, properly handling success and failure cases.
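
A minimal sketch of what such a helper might look like follows. The name `inTransaction` is hypothetical, and the only thing assumed about `db` is that it exposes `prepare(sql).run()`, as `better-sqlite3` databases do; the real helper may differ in its details.

```js
// Hypothetical helper: run `fn` inside a transaction on `db`.
// Commits if `fn` returns normally; rolls back and rethrows if it throws.
function inTransaction(db, fn) {
  db.prepare("BEGIN").run();
  let result;
  try {
    result = fn();
  } catch (e) {
    db.prepare("ROLLBACK").run();
    throw e;
  }
  db.prepare("COMMIT").run();
  return result;
}
```

Because the callback is synchronous, the transaction cannot be left open across an `await`, which keeps the success and failure paths easy to reason about.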

Test Plan:
Unit tests included, with full coverage; run `yarn unit`.

wchargin-branch: mirror-transaction-helper
2018-09-17 13:33:10 -07:00
William Chargin f966ce300f
schema: make `fields` and `clauses` exact (#843)
Summary:
This affords more flexibility to clients, because an exact value can be
used in place of an inexact value, but not vice versa.

Test Plan:
Running `yarn flow` suffices.

wchargin-branch: schema-exact-type-fields
2018-09-17 12:07:52 -07:00
Dandelion Mané 1ad2cc0958
Request reactions data from GitHub (#839) (#840)
This commit updates the GitHub graphql query to also fetch reactions.
We update the JSON typedefs to include this new information, add
continuations from comments, and update existing continuation and query
code. Also, I added a safety check when updating comments for issues
that was previously unnecessary but is now needed.

Test plan:
- `yarn test --full` passes.
- Setting the page limits to 1 and running on the example-github does
not error with unexhausted pages, and loads all the expected reactions.
- Running on a larger repository (go-ipfs) works as expected.
- I have written dependent code that consumes these reactions in the
RelationalView, and works as intended, which suggests that the type
signatures are correct.
2018-09-17 11:47:37 -07:00
William Chargin 51461f4842
flow: mark bound parameters as covariant (#842)
Summary:
Before this patch, an object whose type has read-only attributes cannot
be passed to `stmt.run`/etc., because the libdef does not promise not to
mutate its argument. This patch fixes that oversight.

Test Plan:
Create the following test file, and note that it fails to typecheck
before this change but passes after:

```js
// @flow
import Database from "better-sqlite3";
const db = new Database("/tmp/foo");
const args: {|+id: number|} = {id: 1};
db.prepare("INSERT INTO foo (id) VALUES (?)").run(args);
```

wchargin-branch: flow-better-sqlite3-bound-parameters-covariant
2018-09-17 11:15:04 -07:00
Dandelion Mané a2ffdf5ca8
Request reactions data from GitHub (#839)
This commit updates the GitHub graphql query to also fetch reactions.
We update the JSON typedefs to include this new information, add
continuations from comments, and update existing continuation and query
code. Also, I added a safety check when updating comments for issues
that was previously unnecessary but is now needed.

Test plan:
- `yarn test --full` passes.
- Setting the page limits to 1 and running on the example-github does
not error with unexhausted pages, and loads all the expected reactions.
- Running on a larger repository (go-ipfs) works as expected.
- I have written dependent code that consumes these reactions in the
RelationalView, and works as intended, which suggests that the type
signatures are correct.
2018-09-14 16:10:31 -07:00
Dandelion Mané aecf64b026
Detect references to commits (#833)
Now that #832 gave us logic to parse references to commits, we have the
RelationalView find and add these references. The actual change is
a simple extension of existing reference detection logic.

Test plan: Observe that the snapshots are updated with references to
commits from the example-github repository.

Progress on #815.
2018-09-14 11:56:16 -07:00
William Chargin c616ec82fb
flow: add type definitions for `better-sqlite3` (#837)
Summary:
The `flow-typed` repository doesn’t have type definitions for
`better-sqlite3`, so I wrote some. I figure that we can use them for a
while, fix any problems that arise, and then PR them upstream.

I started from [the TypeScript definitions][1], but made some
improvements (like stronger typing for bound parameters), and also made
some necessary changes due to differences between Flow and TypeScript.

[1]: https://github.com/DefinitelyTyped/DefinitelyTyped/commits/master/types/better-sqlite3/index.d.ts

Prettier does not format this file (it is in `flow-typed`), so I
manually ran it through Prettier with the settings used by `flow-typed`
itself.

Test Plan:
None.

wchargin-branch: flow-better-sqlite3
2018-09-13 18:32:00 -07:00
Dandelion Mané c9a0d4b1b8
Modify parseReferences to detect refs to commits (#832)
We add a new function, `findCommitReferences`, which can find both
explicit url references to commits, and commit hashes.

Since the commit url includes the commit hash, some extra logic is added
to deduplicate them in this instance. Tests verify that this is done
properly.
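
The deduplication can be sketched roughly as below. The regular expressions, the restriction to full 40-character hashes, and the `{refType, hash}` result shape are assumptions for illustration, not necessarily what `findCommitReferences` actually does.

```js
// Illustrative sketch: find commit URLs and bare hashes in `body`,
// ignoring any bare-hash match that is really part of a commit URL.
function findCommitReferences(body) {
  const urlRegex =
    /https?:\/\/github\.com\/[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+\/commit\/([0-9a-f]{40})/g;
  const hashRegex = /\b[0-9a-f]{40}\b/g;
  const refs = [];
  const urlSpans = [];
  let match;
  while ((match = urlRegex.exec(body)) !== null) {
    refs.push({refType: "URL", hash: match[1], index: match.index});
    urlSpans.push([match.index, match.index + match[0].length]);
  }
  while ((match = hashRegex.exec(body)) !== null) {
    const i = match.index;
    // Skip hashes that fall inside a URL we have already recorded.
    const insideUrl = urlSpans.some(([start, end]) => i >= start && i < end);
    if (!insideUrl) {
      refs.push({refType: "HASH", hash: match[0], index: i});
    }
  }
  // Report references in the order they appear in the text.
  refs.sort((a, b) => a.index - b.index);
  return refs.map(({refType, hash}) => ({refType, hash}));
}
```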

Test plan: Unit tests cover the cases of having commit hashes, having
commit urls, and having both at once.
2018-09-13 18:21:06 -07:00
William Chargin 417cc231e9
deps: add `better-sqlite3` (#836)
Summary:
I selected this over the alternatives, `sqlite` and `sqlite3`, primarily
because its README explicitly acknowledges that using asynchronous APIs
for CPU-bound or serialized work units is worse than useless. To me,
this is a sign that the maintainer has his head on straight.

The many-fold performance increase over `sqlite` and `sqlite3` is nice
to have, too.

For now, we use my fork of the project, which includes a critical patch
to support private in-memory databases via SQLite’s standard `:memory:`
filepath. When this patch is merged upstream, we can move back to
mainline.

Test Plan:
The following session demonstrates the basic API and validates that the
install has completed successfully:

```js
const Database = require("better-sqlite3");
const db = new Database("/tmp/irrelevant", {memory: true});

db.prepare("CREATE TABLE pythagorean_triples (x, y, z)").run();
const insert = db.prepare("INSERT INTO pythagorean_triples VALUES (?, ?, ?)");
const get = db.prepare(
  "SELECT rowid, x * x + y * y AS xxyy, z * z AS zz FROM pythagorean_triples"
);

function print(x) {
  console.log(JSON.stringify(x));
}

print(insert.run(3, 4, 5));
print(get.all());
print(insert.run(5, 12, 13));
print(get.all());

db.prepare("DELETE FROM pythagorean_triples").run();
print(get.all());
```

It prints:

```js
{"changes":1,"lastInsertROWID":1}
[{"rowid":1,"xxyy":25,"zz":25}]
{"changes":1,"lastInsertROWID":2}
[{"rowid":1,"xxyy":25,"zz":25},{"rowid":2,"xxyy":169,"zz":169}]
[]
```

wchargin-branch: dep-better-sqlite3
2018-09-13 18:20:10 -07:00
William Chargin 4675b84443
graphql: validate well-foundedness of unions (#835)
Summary:
GraphQL unions are required to be unions specifically of object types.
They cannot contain primitives or other union types as clauses. This is
good: it means that we don’t have to worry about unions that recursively
reference each other or themselves.

Unions are also required to have at least one clause, but we don’t
validate this because it’s not helpful for us. An empty union is
perfectly well-defined, if useless, and shouldn’t cause any problems.

Relevant portion of the spec:
<https://facebook.github.io/graphql/October2016/#sec-Union-type-validation>
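
A sketch of the check described above, under an assumed schema shape (`type: "OBJECT"` or `"UNION"`, with union `clauses` keyed by type name); the function name `validateUnions` is hypothetical:

```js
// Illustrative only: every clause of every union must name an object type.
// Empty unions are deliberately allowed, as described above.
function validateUnions(schema) {
  for (const typename of Object.keys(schema)) {
    const type = schema[typename];
    if (type.type !== "UNION") {
      continue;
    }
    for (const clause of Object.keys(type.clauses)) {
      if (!Object.prototype.hasOwnProperty.call(schema, clause)) {
        throw new Error(`union ${typename} has unknown clause: ${clause}`);
      }
      if (schema[clause].type !== "OBJECT") {
        throw new Error(
          `union ${typename} has non-object clause: ${clause}`
        );
      }
    }
  }
}
```

Since clauses can only be object types, a union can never reach another union (or itself), so no cycle detection is needed.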

Test Plan:
Unit tests added, retaining full coverage; `yarn unit` suffices.

wchargin-branch: graphql-schema-union-validation
2018-09-13 18:11:26 -07:00
William Chargin 7da9ef3a94
graphql: add a schema module (#834)
Summary:
This commit introduces a module for declaratively specifying the schema
of a GraphQL database. See `buildGithubSchema` in `schema.test.js` for
an example of the API.

This makes progress toward #622, though the design has evolved some
since its original specification there.

Test Plan:
Unit tests added, with full coverage; `yarn unit` suffices.

wchargin-branch: graphql-schema
2018-09-13 18:02:14 -07:00
Dandelion Mané 7074a9dbd8
Fix the build (#831)
There was a bad interaction between #830 and #829, wherein they both
independently changed the snapshot. So they passed individually, and
failed once both merged together. This fixes it.

Test plan: `yarn test --full` passes.
2018-09-13 17:23:09 -07:00
Dandelion Mané ab108653f3
Update example-github (#830)
I added two new issues. One of them has references to commits, which is
relevant for work on #815.

Test plan: `yarn test --full` passes.
2018-09-13 15:46:51 -07:00
Dandelion Mané ab85c9785b
Detect references in commit messages (#829)
Now that the GitHub plugin knows about commit messages (#828), we can
parse those commit messages to find references to other GitHub entities.

Fixed a minor typing mistake along the way.

Test plan:
Observe that a number of references have been detected among the commits
in the example GitHub repository. We mistakenly find references to
wchargin because we don't have a proper tokenizer. (#481)

Progress on #815.
2018-09-13 15:46:39 -07:00
Dandelion Mané a1af9531ec
Request commit messages from GitHub (#828)
We could get this information from the Git plugin, but since we want to
use this for reference detection, it's much easier to have this follow
the same pipeline as all the other GitHub reference detection code.

I've updated the relational view to also remove the commit messages when
compressing by removing bodies. A unit test was added to check this
works as intended.

See #815 for tracking.

Test plan:
`yarn test --full` passes.
Snapshot changes are appropriate.
2018-09-13 15:15:21 -07:00
Dandelion Mané c68cb29769
Add commit authorship to the graph (#826)
In #824, we loaded every commit in the default branch's history into the
GitHub relational view, along with authorship info. This commit actually
uses that authorship info to create AUTHORS edges from the commit to the
user that authored it (whenever possible).

The implementation is quite simple: we just need to yield the commits
when we yield all the authored entities, so that we will process their
authors and add them to the graph. Also, I updated the invariant
declarations in `graphView.js`, and corrected a type signature so that the
new invariants would typecheck.

Test plan: The snapshot update shows that commits are being added to the
graph appropriately. Observe that commits which do not have a valid
GitHub user as their author do not correspond to edges in the graph.
See [example].

This is basically a solution to #815, but I'll defer closing that issue
until I've added a few more features, like reference detection.

[example]: 6bd1b4c0b7
2018-09-13 14:19:37 -07:00
Dandelion Mané 4ad9fcf259
Add commits in the history to the RelationalView (#824)
This builds on #821 so that every commit in the default ref's history is
added as a Commit entity to the GitHub relational view. This means that
these commits are also added to the graph by the GitHub plugin. In
general, this will have no effect on real graphs, because these commits
were already available via the Git plugin.

Test plan:
Observe that the snapshot changes just correspond to new commits being
available to the RelationalView, and correspondingly added to the GitHub
graph. `yarn test --full` passes.
2018-09-13 14:00:09 -07:00
Dandelion Mané 05f73f04ef
Change GitHub edge encoding strategy (#825)
GitHub has a procedure for encoding node addresses into sequences of
string "parts", so that we can generate unique edge addresses. Right
now, the encoding strategy assumes that when we encode a node address
into parts, that node address always starts with the prefix
`["sourcecred", "github"]`. However, #816 makes the Git commit address a
valid GitHub address, which means that this assumption no longer holds.

We could start adding special-cased logic to ensure that we de-serialize
Commit addresses properly, but what if we create edges between GitHub
entities and other plugins' nodes in the future? It is much cleaner to
remove the assumption, and serialize the full node address as parts in
the edge address. This makes the GitHub edge addresses somewhat longer,
but this is OK for now as we don't ever store those on disk. If, in the
future, node/edge address length is a problem, we can investigate more
principled and maintainable compression strategies at that time.
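
One way to embed full node addresses in an edge address without ambiguity is to length-prefix each one. The sketch below shows that idea only; the length-prefix scheme and the function names are assumptions for illustration, not a description of the encoding this commit actually uses.

```js
// Illustrative length-prefixed encoding of a node address (array of parts).
function encodeNodeParts(nodeParts) {
  return [String(nodeParts.length), ...nodeParts];
}

// Decode one node address starting at `offset` within `edgeParts`.
// Returns the decoded parts and the offset just past them.
function decodeNodeParts(edgeParts, offset) {
  const rawLength = edgeParts[offset];
  if (typeof rawLength !== "string" || !/^\d+$/.test(rawLength)) {
    throw new Error(`bad length at offset ${offset}`);
  }
  const n = Number(rawLength);
  const start = offset + 1;
  if (start + n > edgeParts.length) {
    throw new Error(`not enough parts at offset ${offset}`);
  }
  return {nodeParts: edgeParts.slice(start, start + n), next: start + n};
}
```

With this scheme, neither endpoint needs to share a common prefix such as `["sourcecred", "github"]`, so addresses from other plugins round-trip the same way.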

Test plan: `yarn test --full` passes.
2018-09-13 13:54:49 -07:00
Dandelion Mané 9aca956a64
Ensure that `RelationalView` is not typed as `any` (#823)
This fixes another instance of the notorious [facebook/flow#6400]. We
also fix some type errors that were being masked as a consequence.

Test plan: `yarn test` passes.

[facebook/flow#6400]: https://github.com/facebook/flow/issues/6400
2018-09-13 12:53:08 -07:00
Dandelion Mané 5c46636611
Clean up some minor mistakes (#822)
- the capitalization of the GitObject types was incorrect
- removed an outdated TODO

Test plan: `yarn test` passes.
2018-09-13 12:52:58 -07:00
Dandelion Mané 2a39bd075d
Load commit authorship from GitHub (#821)
This adds logic for retrieving every commit in the default branch's
history, along with authorship information connecting that commit to a
GitHub user (when available).

This will allow us to do better cred tracking, especially for projects
that don't always use pull requests for merging code.

This results in a moderate increase in load time for the GitHub plugin.
On my machine, loading SourceCred before this change takes 30s, and
after this change it takes 34s.

Test plan:
Observe that the example-github has been updated with commits and
authorship. Also, I ran the query for a larger repository
(`sourcecred/sourcecred`) to verify that the continuation logic works.
2018-09-13 11:43:36 -07:00
Dandelion Mané 2a5c093286
Update CHANGELOG.md (#820)
It now mentions that we added `MentionsAuthor` edges to the GitHub
graph in #808.

Thanks @whyrusleeping for suggesting this heuristic.

Test plan: n/a
2018-09-12 20:30:35 -07:00
Dandelion Mané 7d0d4fb2fa
Add GitHub commit entity (#819)
This adds a `Commit` entity to the GitHub relational view. It has all
the standard methods: commits can be retrieved en masse or by particular
address, they have a URL and authors, and (de)serialize appropriately.

The code for adding pull requests has been modified so that the merge
commits are added as commit entities. This does not have any effect on
the ultimate graph being created; the same edge is added either way.

Test plan: I've extended the standard RelationalView tests to cover the
`Commit` entity. The case where the commit has 0 authors is not yet
tested, but will be once I add support for getting all of the commits
from the example-github (we have one example of a commit that doesn't
map to a user).

Progress on #815.
2018-09-12 19:44:26 -07:00
Dandelion Mané 3e06c054db
Give GitHub plugin support for Commit addresses (#816)
The Git plugin owns Commits, but the GitHub plugin also creates commits.
This commit reifies that relationship by making a Git commit address a
valid GitHub structured address. This is precursor work for #815, which
will require adding a commit entity to the GitHub relational view.

Also, this commit surfaces and fixes a minor type bug, wherein a map
from strings to referent addresses was typed to hold any structured
address, rather than just referent addresses.

Test plan: The unit tests confirm that serializing/deserializing a Git
commit address using the GitHub plugin's methods works as intended.
Also, unit tests were added that verify that (de)serializing Git
addresses for non-commit objects is still an error.
2018-09-12 19:06:06 -07:00
Dandelion Mané 7dc9449fe7
GitHub: Request commit authorship info (#817)
This commit pulls the graphql fields to request commit information into
a fragment, and requests GitHub authorship information (when
available) for that fragment. We don't use that information yet, but we
will soon. Progress on #815.

Test plan: Observe that the example-github data is updated, so that we
now have urls and authorship for commits. Observe that the query has
updated, but no downstream code was affected. `yarn test --full` passes.
2018-09-12 18:51:06 -07:00
Dandelion Mané 335441e671
Expose node/edge prefixes publicly (#814)
Both the GitHub and Git plugins create a `_Prefix` object for nodes and
edges, which gives the respective prefixes for different node/edge
types. We named it `_Prefix` because we weren't sure if these should be
exported. In practice, these have proven quite useful to make generally
available, and despite the `_`-naming we expose the objects outside
their modules. This change renames `_Prefix` to `Prefix` to reflect the
reality that these are used as public consts.

Exporting them is safe as both objects are frozen.

Test plan: Simple rename, `yarn test` suffices.
2018-09-12 17:18:47 -07:00
Dandelion Mané 70fe677990
Update example GitHub data (#813)
Generated via `src/plugins/github/fetchGithubRepoTest.sh -u`

`yarn test --full` passes.

Closes #389. Thanks, @wchargin!
2018-09-12 17:18:37 -07:00
Dandelion Mané 737ed4d8b3
Add `MentionsAuthor` edges to the graph (#808)
This commit builds on the work in #806, adding the
`MentionsAuthorReference`s to the graph. It thus resolves #804.

Empirically, the addition of these edges does not change the users' cred
distribution much.  Consider the results with the following 3 forward
weights for the edge (results for ipfs/go-ipfs):

| User          | w=1/32 | w=1/2  | w=2    |
|---------------|-------:|-------:|-------:|
| whyrusleeping | 228.04 | 225.69 | 223.86 |
| jbenet        | 102.04 | 100.26 | 99.53  |
| kubuxu        | 66.60  | 67.80  | 69.36  |
| ...           | —      | —      | —      |
| btc           | 22.69  | 22.29  | 21.38  |

The small effect on users' cred is not that surprising: the
MentionsAuthor references always "shadow" a direct comment->user
reference. In principle, the overall cred going to the user should be
similar; the difference is that now some more cred flows in between the
various comments authored by that user, on the way to the user. (And if
those other comments had references, then it flows out from them, etc.)

Empirically, the variance on comments' scores seems to increase as a
result of having this heuristic, which is great—the fact that all
comments had about the same score was a bug, not a feature.

Sadly, we don't have good tooling for proper statistical analysis of the
effect this is having. We'll want to study the effect of this heuristic
more later, as we build tooling and canonical datasets that make that
analysis feasible.

We choose to add this heuristic, despite the ambiguous effect on users'
cred, because we think it is principled, and adds meaningful structure
to the graph.

Test plan:
The commit is a pretty straightforward generalization of our existing
GitHub edge logic. All of the interesting logic was thoroughly tested in
the preceding pull, so this commit just tests the integration. Observe
that standard (de)serialization of the edge works, that the snapshot is
updated with a MentionsAuthor reference edge, and that the graph
invariant checker, after update, does not throw errors. Also, I manually
tested this change on the ipfs/go-ipfs repo. (It does not require
regenerating data.)
2018-09-12 12:55:14 -07:00
Dandelion Mané 91f76393e8
Add logic for finding `MentionsAuthorReference`s (#806)
A `MentionsAuthorReference` is created when a post mentions a user, and
that user has authored at least one post in the same thread. Then there
is a `MentionsAuthorReference` from the post to the other posts by that
author.
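
The rule above can be sketched as follows. The post shape (`id`, `author`, `mentions`) and the function name are hypothetical, and skipping a post's reference to itself is an assumption based on the phrase "the other posts".

```js
// Illustrative sketch: `posts` are all the posts in one thread, each of the
// hypothetical shape {id, author, mentions: [username, ...]}.
function findMentionsAuthorReferences(posts) {
  const postsByAuthor = new Map();
  for (const post of posts) {
    if (!postsByAuthor.has(post.author)) {
      postsByAuthor.set(post.author, []);
    }
    postsByAuthor.get(post.author).push(post.id);
  }
  const refs = [];
  for (const post of posts) {
    // A Set collapses repeated mentions of the same user within one post.
    for (const who of new Set(post.mentions)) {
      for (const dst of postsByAuthor.get(who) || []) {
        if (dst === post.id) {
          continue; // assumption: a post does not reference itself
        }
        refs.push({src: post.id, dst, who});
      }
    }
  }
  return refs;
}
```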

For context, see the docstrings in `mentionsAuthorReference.js`, and
see #804.

Test plan:
Thorough unit tests have been added, which test the entire pipeline,
from ingesting the data via GitHub's graphql responses, through to
detecting the references. Edge cases such as self-reference and
multi-reference are tested.

Thanks to @wchargin for help writing this commit.
2018-09-11 20:27:31 -07:00
Dandelion Mané bf1e85d6f4
Add `MapUtil.pushValue` for maps of arrays (#805)
With some frequency we find ourselves needing to maintain maps whose
values are arrays that we append to. `MapUtil.pushValue` is a utility
method for these cases.

Existing usage in `aggregate.js` has been modified to use the new
function.
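
For readers unfamiliar with the pattern, a minimal sketch of such a utility is shown here as a standalone function. Returning the array is an assumption of this example; the real `MapUtil.pushValue` may have a different signature.

```js
// Append `value` to the array stored at `key`, creating the array if needed.
function pushValue(map, key, value) {
  let arr = map.get(key);
  if (arr === undefined) {
    arr = [];
    map.set(key, arr);
  }
  arr.push(value);
  return arr;
}
```

This replaces the usual three-line "get, create if missing, push" dance at each call site.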

Test plan: Unit tests included.
2018-09-11 15:26:54 -07:00
William Chargin 4b0693e2a7
test: invoke Jest in CI mode (#803)
Summary:
CI mode prevents Jest from automatically writing snapshots, and also
causes obsolete snapshots to be an error instead of a warning. This is
consistent with the behavior on Travis, where the `CI=1` environment
variable is set. It should thus be the default when running `yarn test`
(but not `yarn unit`).

Test Plan:
Add a file `src/foo.test.js`:

```js
// @flow
describe("foo", () => {
  it("bar", () => {
    expect("baz").toMatchSnapshot();
  });
});
```

Note that `yarn test` fails with the message, “new snapshot was not
written”.

Revert this change, then re-run `yarn test`; note that it passes,
writing a snapshot.

Then, reinstate this change and delete `src/foo.test.js`. Note that
running `yarn test` fails, due to an obsolete snapshot. Revert the
change again, and watch `yarn test` pass despite the obsolete snapshot.

Finally, remove the snapshot. :-)

wchargin-branch: test-jest-ci
2018-09-06 20:45:16 -07:00
William Chargin 2d4acf62c5
test: check that JS tests describe their filenames (#802)
Summary:
Resolves #800. The newly added test takes about 2ms per file.

Test Plan:
Run `yarn sharness`, and note that it passes.

Then, edit (say) `src/main/test.js` to change the top-level describe
block from `"cli/main"` to something else, or to remove it altogether.
Re-run `yarn sharness` and note that it fails with a helpful message:

```
test_js_tests_have_top_level_describe_block_with_filename.t .. 1/?
not ok 31 - test file: cli/main.test.js
test_js_tests_describe_filename.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/65 subtests
```

wchargin-branch: describe-test
2018-09-06 20:44:59 -07:00
William Chargin 5fa20ec89e
test: use proper top-level `describe` blocks (#801)
Summary:
Per #800, each test file should start with a `describe` block listing
its file path under `src`. Currently, nine of our tests do not do so.
Of these, eight had a top-level describe block with the wrong name
(either not a filepath or an outdated filepath), while only one short
test was missing a top-level describe block altogether. This patch fixes
each file to use the correct format.

Test Plan:
Apply the Sharness test in #802, and note that it fails before this
patch but passes after it.

wchargin-branch: describe-fix
2018-09-06 20:39:46 -07:00
Dandelion Mané 508fbc5d72
Release 0.1.0 (#799)
Test plan: I ran `yarn test --full`. I also regenerated data from
scratch and manually tested the cred explorer.
2018-09-06 19:06:16 -07:00
Dandelion Mané bf35bbbbda
Move WeightConfig into PagerankTable (#798)
Previously, the WeightConfig (and the button that expanded it) were in
the credExplorer App. This was a little weird, as there's no reason to
play with the weights before you have some Pagerank results to
investigate; additionally, it risked confusing new users with a concept
that was not yet applicable.

Also, the implementation was wonky: the WeightConfig had responsibility
for expanding/hiding itself, which gave poor ability to position the
button and the WeightConfig separately.

Finally, the codepath was untested (vestiges of #604).

This commit fixes all three issues:
- The WeightConfig and button have moved into PagerankTable
- The WeightConfig is now a stateless component, and the parent takes
responsibility for deciding when to mount it
- Logic for showing/hiding the WeightConfig is now tested.
2018-09-06 18:58:09 -07:00
Dandelion Mané b632bd6188
Move `WeightConfig` into the `weights` directory (#797)
Test plan: `yarn test` suffices for this simple move.
2018-09-06 17:29:15 -07:00
Dandelion Mané eb065f3634
Make `credExplorer/App` control the WeightedTypes (#796)
This commit implements a [suggestion] to make `credExplorer/App` a
single source of truth on the `WeightedTypes`. As such, both
`WeightConfig` and `PluginWeightConfig` have been refactored to be
(essentially) stateless components that are controlled from above. I say
essentially because `WeightConfig` still has its expanded state, but
that will go away soon.

Along the way, I've improved testing and added some new invariant
checking (e.g. that `PluginWeightConfig` was passed the correct set of
weights for its adapter). For the first time, there are now tests for
the `WeightConfig` itself! I'm not totally done with the weight
re-write, but this seems like a good time to close #604, as the whole
logical sequence for setting weights is now tested.

Test plan: There are new unit tests. Also, to be sure, I manually tested
the UI as well.

[suggestion]: https://github.com/sourcecred/sourcecred/pull/792#issuecomment-419234721
2018-09-06 17:17:57 -07:00
Dandelion Mané ead0157960
Change WeightedTypes to contain maps (#795)
This will make it easier to re-organize the weight components so that
the WeightedTypes have a single source of truth, as described in
https://github.com/sourcecred/sourcecred/pull/792#issuecomment-419234721

Test plan: Unit tests suffice.
2018-09-06 16:58:40 -07:00
Dandelion Mané ad5ea761ea
`credExplorer/App` stores weights, not evaluator (#792)
This commit refactors `credExplorer/App` so that instead of storing an
`EdgeEvaluator` in its state, it stores `WeightedTypes` instead. This
has a few benefits:

- It's trivial to generate the right default value for `WeightedTypes`,
so we no longer allow the variable to be nullable in the state. This
simplifies logic, removes an error case, and means that we don't require
the `WeightConfig` to mount before the app is usable.
- `WeightedTypes` are serializable and can be tested for equality, so
they are a better-behaved piece of state
- We put off the information-destroying transformation as long as
possible
- In the future, I think we may want to move the weights/types concept
into core, at which point the `WeightedTypes` will directly be consumed
by the `core/attribution` module.

Test plan: Unit tests are pretty thorough; to be safe, I tested the UI
myself.
2018-09-06 15:29:38 -07:00