548 Commits

Author SHA1 Message Date
William Chargin
4184e8594a
Save the GitHub relational store from the CLI (#447)
Summary:
This provides a command-line entry point `load-plugin-v3` (which will
become `load-plugin` eventually), which fetches the GitHub data via
GraphQL and saves the resulting `RelationalStore` to disk.

A change to the Babel config is needed to prevent runtime errors of the
form `_callee7` is not defined, where `_callee7` is a gensym that
appears exactly once in the source (in use position, not definition
position). I’m not sure exactly what is causing the error or why this
config change fixes it. But while this patch may be fragile, I don’t
think that it’s likely to subtly break anything, so I’m okay with
pushing it for now and dealing with any resulting breakage as it arises.

Paired with @decentralion.

Test Plan:
Run `yarn backend`, then run something like:

```
node bin/sourcecredV3.js load-plugin-v3 \
    sourcecred example-github --plugin github
```

Inspect results in `SOURCECRED_DIR/data/OWNER/NAME/github/view.json`,
where `SOURCECRED_DIR` is `/tmp/sourcecred` by default, and `OWNER` and
`NAME` are the repository owner and name.

This example repository takes about 1.1 seconds to run. The SourceCred
repository takes about 45 seconds.

wchargin-branch: cli-load-plugin
2018-06-29 12:12:37 -07:00
William Chargin
3835862f82
Create a V3 command-line entry point (#446)
Summary:
Due to oclif’s structure, this entry point shares its `commands`
directory with that of the V1 entry point. We’ll therefore add commands
like `start-v3` as we go.

Test Plan:
`yarn backend` works, and `node bin/sourcecredV3.js start` launches the
V1 server.

wchargin-branch: v3-cli
2018-06-29 11:47:24 -07:00
Dandelion Mané
3bf496b06f
Update example-github data (#445)
Generated via
```
$ src/v3/plugins/github/fetchGithubRepoTest.sh -u
```

Test plan: travis
2018-06-29 11:37:41 -07:00
Dandelion Mané
baec3c15dd
Include Pull additions/deletions (#444)
This adds additions and deletions to the v3 Pull data model, and also
uses them in the pull descriptions.

It's basically a port of #340 to v3.

Test plan: Snapshots
2018-06-29 11:28:51 -07:00
Dandelion Mané
6356c5477f
Add RelationalView.{to,from}JSON (#443)
This adds methods for serializing the GitHub RelationalView.

We have not put in the work to ensure that these methods generate
canonical data. Getting the issues in a different order, or finding
references in a different order, can change the JSON output even if the
resulting repositories are equivalent.
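
To illustrate the ordering concern, here is a minimal sketch using plain objects rather than the real `RelationalView`. The `serialize` and `serializeCanonically` helpers are hypothetical and are not part of this commit; sorting by URL is just one way canonical output could be obtained.

```js
// Hypothetical stand-in for a view: a map from issue URL to issue data.
function serialize(issues) {
  const view = {};
  for (const issue of issues) {
    view[issue.url] = {title: issue.title};
  }
  return JSON.stringify(view);
}

// One possible canonicalization (NOT implemented in this commit):
// sort the entries by URL before serializing.
function serializeCanonically(issues) {
  const byUrl = (x, y) => (x.url < y.url ? -1 : x.url > y.url ? 1 : 0);
  return serialize(issues.slice().sort(byUrl));
}

const a = {url: "https://example.com/issues/1", title: "First"};
const b = {url: "https://example.com/issues/2", title: "Second"};

// Same data, different arrival order, different JSON text.
console.log(serialize([a, b]) === serialize([b, a])); // false
console.log(
  serializeCanonically([a, b]) === serializeCanonically([b, a])
); // true
```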

@decentralion thinks it's not worth putting in the effort, since we may
switch to a SQL database soon anyway.

Test plan: travis

Paired with @wchargin
2018-06-28 18:39:31 -07:00
Dandelion Mané
64df5b09c3
Add RelationalView.addData (#442)
Now that we want to implement RelationalView de/serialization, we need a
way to construct one without adding data to it.

Now that we're allowing `addData` to be called explicitly, we also want
to make sure it's idempotent, which necessitated a small change to
reference handling. A new test verifies idempotency.
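
As a rough sketch of what idempotency means here: calling `addData` twice with the same input should leave the view in the same state as calling it once. The `TinyView` class below is hypothetical. The log does not say what the change to reference handling was, so storing references in a `Set` is only an assumed way of getting that property.

```js
// Hypothetical miniature of a view whose addData is idempotent.
class TinyView {
  constructor() {
    this._issues = new Map(); // url -> title
    this._references = new Map(); // source url -> Set of target urls
  }

  addData(data) {
    for (const issue of data.issues) {
      this._issues.set(issue.url, issue.title);
      // Storing targets in a Set (rather than pushing onto an array)
      // means re-adding the same data does not duplicate references.
      if (!this._references.has(issue.url)) {
        this._references.set(issue.url, new Set());
      }
      for (const target of issue.references) {
        this._references.get(issue.url).add(target);
      }
    }
  }

  referenceCount() {
    let count = 0;
    for (const targets of this._references.values()) {
      count += targets.size;
    }
    return count;
  }
}

const data = {
  issues: [
    {url: "issues/1", title: "First", references: ["issues/2"]},
    {url: "issues/2", title: "Second", references: []},
  ],
};
const view = new TinyView();
view.addData(data);
const before = view.referenceCount();
view.addData(data); // second call must not change anything
console.log(before === view.referenceCount()); // true
```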

Test plan: travis

Paired with @wchargin
2018-06-28 18:31:58 -07:00
William Chargin
4ee1ed54c8
Transform Markdown AST to strip formatting (#441)
Summary:
This makes progress on #432. We’d like to look for GitHub references
only within each text node of the Markdown AST. But there are two
complications:

  - Text nodes split across formatting, and it’s valid for someone to
    write `*Paired* with @decentralion, but *tested* independently`, or
    `**Closes** #12345`, or something.

  - Sometimes contiguous blocks of text expand to multiple text nodes,
    because of how CommonMark approaches smart punctuation. For
    instance: the document `It's got "punctuation" and stuff!` has eight
    text nodes ([demo][1]).

In this commit, we introduce functions `deformat` and `coalesceText` to
solve these problems. (They go together because `coalesceText` is useful
for testing `deformat`.)
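
A minimal sketch of the two transformations, assuming a plain-object tree in place of real commonmark nodes. The node shapes (`type`, `literal`, `children`) and both function bodies are illustrative and are not the code from this commit.

```js
// Hypothetical plain-object AST, standing in for commonmark nodes.
// deformat: replace formatting nodes (emph, strong) with their children.
function deformat(node) {
  if (!node.children) {
    return [node];
  }
  const children = [].concat(...node.children.map(deformat));
  if (node.type === "emph" || node.type === "strong") {
    return children; // splice the children into the parent
  }
  return [{...node, children}];
}

// coalesceText: merge runs of adjacent text nodes into one text node.
function coalesceText(node) {
  if (!node.children) {
    return node;
  }
  const children = [];
  for (const child of node.children.map(coalesceText)) {
    const last = children[children.length - 1];
    if (last && last.type === "text" && child.type === "text") {
      children[children.length - 1] = {
        type: "text",
        literal: last.literal + child.literal,
      };
    } else {
      children.push(child);
    }
  }
  return {...node, children};
}

// Roughly the tree for `**Closes** #12345`.
const paragraph = {
  type: "paragraph",
  children: [
    {type: "strong", children: [{type: "text", literal: "Closes"}]},
    {type: "text", literal: " #12345"},
  ],
};
const [flat] = deformat(paragraph);
const coalesced = coalesceText(flat);
console.log(JSON.stringify(coalesced.children));
// [{"type":"text","literal":"Closes #12345"}]
```

After both passes the reference text sits in a single text node, so a reference parser can scan it without caring where the formatting boundaries were.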

[1]: https://spec.commonmark.org/dingus/?text=It%27s%20got%20%22punctuation%22%20and%20stuff!

wchargin-branch: markdown-deformat
2018-06-28 17:30:59 -07:00
William Chargin
0cc2907e9e
Add dependency on commonmark (#440)
Summary:
We plan to use this to more intelligently extract references from GitHub
text content. See #432.

Test Plan:
In a Node shell, running

```js
const cm = require("commonmark");
var parser = new cm.Parser();
var ast = parser.parse("Hello\nworld");
var html = new cm.HtmlRenderer({softbreak: " "}).render(ast);
console.log(html);
```

prints `<p>Hello world</p>`.

wchargin-branch: commonmark
2018-06-28 17:01:31 -07:00
Dandelion Mané
607adeca29
GH: Add a description method for entities (#439)
This commit adds a `description` method that takes a GitHub entity, and
returns a description of that entity. Based on the work in #261.

In contrast to the implementation in #261:
- It won't crash on entities without an author (although we don't have a
test case for this; see #389).
- It handles multi-authors reasonably (although we can't test that, as
we haven't implemented multi-authorship yet; see #218).

Test plan:
Inspect snapshot to see some examples.
2018-06-28 16:53:52 -07:00
Dandelion Mané
40db3cdfa3
Add RelationalView.match (#438)
`match` implements pattern matching over `Entity`.
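
For readers unfamiliar with the pattern, a hedged sketch follows. The real signature of `match` is not shown in this log; the handler-object shape, the `type` field, and the `describeEntity` helper are all assumptions made for illustration.

```js
// Hypothetical: dispatch on the entity's type tag to a per-type handler.
function match(handlers, entity) {
  const handler = handlers[entity.type];
  if (handler == null) {
    throw new Error(`No handler for entity type: ${entity.type}`);
  }
  return handler(entity);
}

// A caller supplies one handler per entity type it cares about.
const describeEntity = (entity) =>
  match(
    {
      ISSUE: (x) => `issue #${x.number}`,
      PULL: (x) => `pull request #${x.number}`,
      USERLIKE: (x) => `@${x.login}`,
    },
    entity
  );

console.log(describeEntity({type: "PULL", number: 444}));
// pull request #444
```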

Test plan:
Unit tests included.
2018-06-28 14:57:23 -07:00
Dandelion Mané
e239fdfeeb
Export a clean Entity type from relationalView (#437)
Callers will want to write functions that are generic over `Entity`.
This makes those call signatures cleaner.

Test plan: travis
2018-06-28 14:52:24 -07:00
Dandelion Mané
a8f54530bc
Add a GitHub example module (#436)
Currently, GitHub tests load example data with ad-hoc methods. It makes
it easy for the author of a new test file to forget to clone the test
data (and risk cross-test-file state pollution), or to forget to apply
the correct typing.

This commit factors a shared `example` module which provides a safe way
to access the example data, along with some convenient helpers for
constructing a graph or relational view.

Test plan:
`yarn travis`

Fixes #430.
2018-06-28 14:52:16 -07:00
Dandelion Mané
529f7db374
Rename demoData folders to example (#435)
The Git and GitHub plugins have folders that contain small example data,
as used for tests and snapshots. These folders were called `demoData`
which is misleading since the data isn't used for demos. The folders
themselves contained files called "example", like "example-github.json"
or "exampleRepo.js". Renaming the folders to `example` is cleaner.

Test plan:
`yarn travis --full` passes.
2018-06-28 14:20:31 -07:00
Dandelion Mané
38942d1f7b
Add references to the GitHub graph (#434)
This is a very simple extension of #431 to use the new reference
detection logic added in #429.

Test plan:
Inspect snapshot change for plausibility. Note that the snapshot adds
exactly 16 reference edges, which is the same as the number of
references in the reference snapshot test.
2018-06-28 14:03:45 -07:00
Dandelion Mané
1421148a6d
Refactor GH createGraph to use RelationalView (#428)
This commit modifies `github/createGraph` to use the `RelationalView`
class created in #424. The code is now much cleaner.

I also fixed some `any`s that were leaking in our test code (due to use
of runtime require for GitHub example data). These anys were discovered
by bumping into uncaught type errors. :)

This commit supersedes #413 and #419.

Test plan:
Observe that the graph snapshot was not changed.
2018-06-28 13:45:41 -07:00
Dandelion Mané
c022b3f4d0
RelationalView tracks GitHub references (#431)
For every `TextContentEntity` (`Issue`, `Pull`, `Review`, `Comment`),
this commit adds a `references` method that iterates over the entities
that the text content entity references.

For every `ReferentEntity` (actually, every entity), this commit adds a
`referencedBy` method which iterates over the text content entities that
reference that referent entity.
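
The relationship between the two methods can be sketched as a pair of inverse indexes. The data shapes and helper names below are hypothetical and much simpler than the real `RelationalView`.

```js
// Hypothetical reference list: pairs of [source URL, destination URL].
function buildIndexes(pairs) {
  const references = new Map(); // src -> array of dst
  const referencedBy = new Map(); // dst -> array of src
  const push = (map, key, value) => {
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(value);
  };
  for (const [src, dst] of pairs) {
    push(references, src, dst);
    push(referencedBy, dst, src);
  }
  return {references, referencedBy};
}

// Consistency: src references dst exactly when dst is referencedBy src.
function isConsistent({references, referencedBy}) {
  for (const [src, dsts] of references) {
    for (const dst of dsts) {
      if (!(referencedBy.get(dst) || []).includes(src)) return false;
    }
  }
  for (const [dst, srcs] of referencedBy) {
    for (const src of srcs) {
      if (!(references.get(src) || []).includes(dst)) return false;
    }
  }
  return true;
}

const indexes = buildIndexes([
  ["issues/2", "issues/1"],
  ["pull/3", "issues/1"],
]);
console.log(isConsistent(indexes)); // true
console.log(indexes.referencedBy.get("issues/1").join(", "));
// issues/2, pull/3
```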

This commit also adds `referentEntities` and `textContentEntities`
methods to the `RelationalView`, as they are used in both implementation
and test code.

Test plan:
The snapshot tests include every reference, in a format that is very
convenient for inspecting the ground truth on GitHub. For every
reference, it's easy to check that the reference actually exists by
copying the `from` url and pasting it into the browser. I've done this
and they check out. (It's not easy to prove that there are no missing
references, but I'm pretty confident that this code is working.)

Unit tests ensure that the `references` and `referencedBy`
methods are consistent.
2018-06-28 13:32:47 -07:00
Dandelion Mané
6235febdac
Add porcelain-style classes to RelationalView (#424)
Based on offline design discussion with @wchargin, we've decided to
upgrade the `RelationalView` to be *the* comprehensive source for GitHub
data inside SourceCred. The `RelationalView` will contain the full
dataset, including parsed relational information (such as
cross-references between GitHub entities). Then, we will project our
GitHub graph out of the `RelationalView`.

To that end, the `RelationalView` no longer exports raw data blobs.
Instead, it exports nice classes: `Repo`, `Issue`, `Pull`, `Review`,
and `Userlike`. These classes have convenient methods for accessing both
their own data and related entities, e.g. `repo.issues()` yields all
the issues in that repo.

This is effectively a port of #170 into the v3 API. The main difference
is that in v1, the Graph contained this data store, whereas in v3, we
will use this data store to generate the graph.

This supersedes #418.

Test plan:
The snapshot tests are quite readable.
2018-06-27 15:25:20 -07:00
William Chargin
e7b28b81db
Convert V3 graphs to Markov chains (#427)
Summary:
This is based on the V1 file `basicPagerank.js`. The API is necessarily
changed for the new graph format, and we export additional utilities
compared to the previous version of the module (useful for testing and
serialization). We also improve the implementation to make it simpler
and easier to understand.
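
As a rough illustration of the general idea (not the module's actual API or weighting scheme, neither of which is described in this log), the sketch below turns a small directed graph into a row-stochastic transition matrix. The graph shape, the function name, the even split across outgoing edges, and the self-loop for sink nodes are all assumptions.

```js
// Hypothetical graph shape: node names plus directed edges.
// Each node spreads its probability evenly over its outgoing edges;
// a node with no outgoing edges keeps its mass via a self-loop.
function toMarkovChain(nodes, edges) {
  const index = new Map(nodes.map((n, i) => [n, i]));
  const chain = nodes.map(() => nodes.map(() => 0));
  for (const node of nodes) {
    const i = index.get(node);
    const out = edges.filter((e) => e.src === node);
    if (out.length === 0) {
      chain[i][i] = 1;
      continue;
    }
    for (const e of out) {
      chain[i][index.get(e.dst)] += 1 / out.length;
    }
  }
  return chain; // chain[i][j] = P(next = j | current = i)
}

const chain = toMarkovChain(
  ["a", "b", "c"],
  [
    {src: "a", dst: "b"},
    {src: "a", dst: "c"},
    {src: "b", dst: "c"},
  ]
);
console.log(JSON.stringify(chain));
// [[0,0.5,0.5],[0,0,1],[0,0,1]]
```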

Test Plan:
Unit tests included.

wchargin-branch: v3-graph-markov-chain
2018-06-27 15:24:47 -07:00
William Chargin
b9c67f447f
Expose advancedGraph test case (#426)
Summary:
We’d like to use this test case to generate a Markov chain, which
requires that it not be local to the `graph.js` tests.

Test Plan:
Existing unit tests suffice.

wchargin-branch: expose-advanced-graph
2018-06-27 15:18:47 -07:00
William Chargin
faa2f8c9d0
Copy Markov chain code from V1 to V3 (#425)
Summary:
This code is independent of the graph abstraction, and so is mostly
copied. The only change is to the structure of the test code (we now
prefer to wrap everything in a big `describe` block with an absolute
path to the module under test).

Test Plan:
Unit tests included.

wchargin-branch: v3-markov-chain
2018-06-27 15:14:42 -07:00
Dandelion Mané
659fc51d9b
Rename findReferences to parseReferences (#429)
This code is about parsing references out of text, so `parseReferences`
is a better name.

The code that consumes this logic to find all the references in the
GitHub data shall be rightly called `findReferences`.

Test plan:
`yarn travis`
2018-06-27 13:21:25 -07:00
William Chargin
518d5b819c
Represent submodule commits as normal commits (#423)
Summary:
Closes #417. Submodule commits are dead; long live commits. The ontology
is now:

  - A tree includes tree entries.
  - A tree entry may have a blob as contents.
  - A tree entry may have a tree as contents.
  - A tree entry may have a commit as contents.

Test Plan:
Existing unit tests suffice, especially `#commits yields all commits`.

wchargin-branch: git-remove-submodule-commits
2018-06-27 12:01:07 -07:00
William Chargin
38c364c916
Allow Git commits to have zero or one tree (#422)
Summary:
Submodule commits need not have associated tree objects, in case the
repository to which they belong does not exist in our graph. We’d like
to represent submodule commits as actual commits, which necessitates
this change. See #417 for context.

Test Plan:
Existing unit tests suffice.

wchargin-branch: git-affine-trees
2018-06-27 11:47:39 -07:00
William Chargin
dd83d7b4ab
Implement a Git graph view (#415)
Summary:
Similar in structure to the GitHub graph view.

Test Plan:
Unit tests added, with full coverage.

wchargin-branch: git-graph-view
2018-06-26 14:00:19 -07:00
William Chargin
0522894a8d
Create Git graph (#406)
Summary:
This commit adds logic to create the Git graph, modeled after the GitHub
graph creator in #405. In this commit, we do not include the
corresponding porcelain; a Git `GraphView` will be added subsequently.

Kudos to @decentralion for suggesting in #187 that I write the logic to
detect BECOMES edges against the high-level data structures. Due to that
decision, the logic and tests are copied directly from the V1 code
without change, because the high-level data structures are the same. The
new code is exactly the body of the `GraphCreator` class.

Test Plan:
Verify that the new snapshot is likely equivalent to the V1 snapshot,
using the heuristic that the two graphs have the same numbers of nodes
(59) and edges (84). (I have performed this check.)

wchargin-branch: git-v3-create-graph
2018-06-26 13:54:47 -07:00
Dandelion Mané
a470f28204
Add GitHub RelationalView (#411)
The `RelationalView` maps the GitHub GraphQL response data into a View
class, which makes it easy to access pieces of GitHub data by their
corresponding `StructuredAddress`.

This will be a valuable companion to the graph, making it possible to
access GitHub node data like the title or body of an issue via the
issue's address. This basically is the supplement to the GitHub graph
that includes the "payloads" from our v1 Graph.

It will also make creating the GitHub graph a lot more convenient,
although I've left that for another commit.

Designed with feedback from @wchargin.

Note: The `RelationalView` objects have a `nominalAuthor` rather than
`author`, so as to distinguish between authorship in the GitHub data
model (entities have at most one author) and in the SourceCred model
(entities may have multiple authors).

Test plan:
Inspect the included snapshots for reasonability, and run unit tests.
2018-06-22 20:32:07 -07:00
Dandelion Mané
2dec8868db
Copy github/findReferences from v1 to v3 (#410)
The code will be refactored so that references are expressed in terms of
the GitHub node address code; the implementation is copied first so that
the review will be cleaner.

Test plan:
`yarn travis` passes.
2018-06-22 16:16:48 -07:00
William Chargin
127200f67c
Cache core graph checkInvariants result (#408)
Summary:
The public method `checkInvariants` on graph is now cached. The cache is
invalidated when the graph is modified via the public API. As a result
of this change, the time of `yarn ci-test --testPathPattern src/v3/`
decreases from 5.631s to 3.866s (best-of-three timing, but low variance
anyway). This effect becomes much more pronounced as higher-level APIs
check their own invariants by themselves indirectly invoking the graph’s
`checkInvariants` method many times.
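
The caching pattern described above can be sketched as follows. `TinyGraph`, its field names, and the trivial invariant it checks are hypothetical; only the shape (a cached public method, an uncached internal method, and invalidation on mutation) comes from the description.

```js
// Hypothetical miniature graph; only the caching pattern is the point.
class TinyGraph {
  constructor() {
    this._nodes = new Set();
    this._invariantsChecked = false; // cache flag
    this.uncachedCalls = 0; // instrumentation for the demo
  }

  addNode(name) {
    this._nodes.add(name);
    this._invariantsChecked = false; // any mutation invalidates the cache
    return this;
  }

  // Public, cached entry point.
  checkInvariants() {
    if (!this._invariantsChecked) {
      this._checkInvariantsUncached();
      this._invariantsChecked = true;
    }
  }

  // Internal, uncached check; tests of the invariants call this directly.
  _checkInvariantsUncached() {
    this.uncachedCalls++;
    for (const name of this._nodes) {
      if (typeof name !== "string") {
        throw new Error(`bad node: ${String(name)}`);
      }
    }
  }
}

const g = new TinyGraph().addNode("a");
g.checkInvariants();
g.checkInvariants(); // served from the cache
g.addNode("b"); // invalidates the cache
g.checkInvariants();
console.log(g.uncachedCalls); // 2
```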

Test Plan:
Existing unit tests have been adapted and extended. Tests for the
invariant checking have been updated to call the internal, uncached
method, and new tests have been added to check that the caching behavior
is correct.

wchargin-branch: cache-graph-invariants
2018-06-22 16:16:43 -07:00
Dandelion Mané
a209caeec2
create the GitHub graph (#405)
This commit:
- adds `github/createGraph.js`
  - which ingests GitHub GraphQL response
  - and creates a GitHub graph
- adds `github/graphView.js`,
  - which takes a Graph
  - and validates that all GitHub specific node and edge invariants hold
    - every github node may be parsed by `github/node/fromRaw`
      - with the right node type
    - every github edge may be parsed by `github/edge/fromRaw`
      - with the right edge type
      - with the right src address prefix
      - with the right dst address prefix
    - every child node has exactly one parent
      - of the right type
  - and provides convenient porcelain methods for
    - finding repos in the graph
    - finding issues of a repo
    - finding pulls of a repo
    - finding reviews of a pull
    - finding comments of a Commentable
    - finding authors of Authorables
    - finding parent of a ChildAddress
- tests `createGraph`
  - via snapshot testing
  - by checking the GraphView invariants hold
- tests `graphView`
  - by checking individual entities in the example-github repository have
  the proper relationships
  - by checking that for every class of invariant, errors are thrown if
  the invariant is violated

Test plan:
- Extensive unit and snapshot tests added. `yarn travis` passes.
2018-06-22 13:10:19 -07:00
Dandelion Mané
24a7547e16
Use Git Commit type in GitHub mergedAs edge (#407)
Test plan:
`yarn travis` passes
2018-06-22 12:12:08 -07:00
William Chargin
448fb3e1a8
Add edge type definitions for V3 Git plugin (#404)
Summary:
This is modeled after the GitHub edge module format. In particular, the
whole length encoding garbage is directly copied. As in that module, we
decline to test the error paths.

Test Plan:
Unit tests added; run `yarn travis`. Snapshots are readable.

wchargin-branch: git-v3-edges
2018-06-20 15:49:50 -07:00
William Chargin
7c1b3ca835
Add node type definitions for V3 Git plugin (#403)
Summary:
This is modeled after the GitHub node module format, with the obvious
alterations plus a bit more type safety in the implementation of `toRaw`
(namely, we check `type` exhaustively).

Test Plan:
Unit tests added; run `yarn travis`.

wchargin-branch: git-v3-nodes
2018-06-20 15:40:08 -07:00
William Chargin
83151d9fac
Remove payload definitions from V3 git/types.js (#402)
Test Plan:
Existing Flow and unit tests suffice.

wchargin-branch: git-v3-remove-payloads
2018-06-20 15:32:12 -07:00
William Chargin
9347348dd7
Copy graph-independent V1 Git plugin code to V3 (#401)
Summary:
Many files are unchanged. Some files have had paths updated, or new
build/test targets added.

The `types.js` file includes payload type definitions. These are
technically independent of the graph abstraction (i.e., nothing from V1
is imported and the code all still works), but it of course implicitly
depends on the V1 model. For now, we include the entirety of this file,
just so that we have a clean copy operation. Subsequent commits will
strip out this extraneous code.

Suggest reviewing with the `--find-copies-harder` argument to Git’s
diffing functions.

Test Plan:
Running `yarn travis --full` passes. Running

    ./src/v3/plugins/git/demoData/synchronizeToGithub.sh --dry-run

yields “Everything up-to-date”.

wchargin-branch: git-v3-copy
2018-06-20 15:28:37 -07:00
William Chargin
281cb574d5
Greatly simplify GitHub edge tests (#400)
Summary:
We had `edgeExamples`, wherein we constructed examples of edge addresses
(not actual edges) by manual instantiation. We checked that these
matched snapshots, and then we also called `createEdge` a bunch to
create actual edges, and checked that those matched snapshots, too.
Consequently, we had twice as many snapshots as we needed, and also
defined twice as many edge addresses as we needed.

Test Plan:
Note that snapshot contents are either deleted or unchanged.

wchargin-branch: simplify-github-edge-tests
2018-06-19 16:01:26 -07:00
William Chargin
b9c01d13c9
Use edgeToParts in the GitHub edge tests (#399)
Test Plan:
Observe that the new snapshots are easier to read. Might as well make
sure that they encode the same data as the old snapshots, too. Note that
the backslash character no longer appears in this snapshot file. :-)

wchargin-branch: use-edgeToParts
2018-06-19 15:55:10 -07:00
William Chargin
aac2fc6792
Add edgeToParts convenience export from Graph (#398)
Summary:
We have `edgeToString`, which formats edges as nicely human-readable
strings. However, these strings have some quotes in them, and so when
they are themselves stringified (e.g., as part of a Jest snapshot), they
become much harder to read. We thus introduce `edgeToParts` to make our
snapshots more readable.
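
A small sketch of why structured parts read better once stringified. The edge shape and the field names `addressParts`, `srcParts`, and `dstParts` are assumptions for illustration; the real functions operate on the graph module's own edge type.

```js
// Hypothetical edge whose addresses are already arrays of parts.
function edgeToString(edge) {
  const fmt = (parts) => JSON.stringify(parts);
  return (
    `{address: ${fmt(edge.address)}, ` +
    `src: ${fmt(edge.src)}, dst: ${fmt(edge.dst)}}`
  );
}

function edgeToParts(edge) {
  return {
    addressParts: edge.address,
    srcParts: edge.src,
    dstParts: edge.dst,
  };
}

const edge = {
  address: ["authors", "1"],
  src: ["user", "alice"],
  dst: ["issue", "1"],
};

// Stringifying a string escapes every inner quote with a backslash.
console.log(JSON.stringify(edgeToString(edge)).includes("\\")); // true
// Stringifying structured parts needs no escaping.
console.log(JSON.stringify(edgeToParts(edge)).includes("\\")); // false
```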

Test Plan:
Unit tests added; run `yarn travis`.

wchargin-branch: add-edgeToParts
2018-06-19 15:48:04 -07:00
William Chargin
ea74955a66
Fix GitHub node fromRaw error-path test cases (#397)
Summary:
In #394, we uppercased the constants for GitHub node types. However, we
were using string literals instead of constants in the test cases. These
test cases were supposed to cover every error path, but instead ended up
just covering the “bad type” error path many times.

Any one of the following would have prevented this regression:

 1. using string constants instead of literals in the test case;
 2. throwing and checking more precise error messages; or
 3. being alerted that coverage decreased as a result of the change.

In this commit, we enact the first of these options. I’m open to adding
a coverage bot, but don’t feel strongly about it at this time.
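
A minimal reproduction of the failure mode, using a hypothetical `fromRaw` with two error paths. The names and messages are illustrative and are not taken from `nodes.js`.

```js
// Hypothetical module under test: node types were recently uppercased.
const ISSUE_TYPE = "ISSUE";

function fromRaw(parts) {
  const [type, id] = parts;
  if (type !== ISSUE_TYPE) {
    throw new Error("bad type");
  }
  if (id == null) {
    throw new Error("missing id");
  }
  return {type, id};
}

// Helper: run f and return the error message it throws, if any.
function errorMessage(f) {
  try {
    f();
  } catch (e) {
    return e.message;
  }
  return null;
}

// A stale literal silently exercises the wrong error path...
console.log(errorMessage(() => fromRaw(["issue"]))); // bad type
// ...while the constant reaches the path the test intended.
console.log(errorMessage(() => fromRaw([ISSUE_TYPE]))); // missing id
```

Both calls throw, so a test that only asserts "this throws" passes either way; that is why the regression went unnoticed.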

Test Plan:
Running `yarn coverage` now shows 100% coverage for the `nodes.js`
module, whereas previously almost all `throw fail();` lines were
uncovered (and the branch coverage was just 76%).

wchargin-branch: fix-github-node-error-tests
2018-06-19 14:57:14 -07:00
Dandelion Mané
ed3397f654
Add GitHub prefixes and const types (#395)
- Switch string constant node and edge types (e.g. "REPO") to exported
consts (e.g. `export const REPO_TYPE`).
- Add (and internally use) a `_Prefix` pseudomodule which contains
per-type address prefixes.
- Test that constructing a StructuredAddress with the wrong type is an
error.

Test plan:
Unit tests pass, snapshots unchanged.

Paired with @wchargin
2018-06-14 15:01:33 -07:00
Dandelion Mané
a8bf6a36bf
Add CommentableAddress (#396)
Test plan: Not needed.

Paired with @wchargin
2018-06-14 14:52:29 -07:00
William Chargin
b6eebddeb0
Use uppercase enum constants in GitHub addresses (#394)
Summary:
@decentralion wants this! :-)

Test Plan:
Verify that the case-insensitive diff is empty:
```
$ git config --global difftool.idiff.cmd 'diff -ui "$LOCAL" "$REMOTE"'
$ git difftool -y --tool idiff HEAD~1..HEAD
```

wchargin-branch: uppercase-enum
2018-06-14 13:45:55 -07:00
William Chargin
7ce4a0c32d
Use NodeAddress.empty and EdgeAddress.empty (#393)
Summary:
This fixes up all instances of `fromParts([])` that are not in
`address.js` or `address.test.js`.

Paired with @decentralion.

Test Plan:
Running `git grep --name-only -F 'fromParts([])'` yields only the two
modules listed above. Existing unit tests suffice for correctness.

wchargin-branch: use-address-empty
2018-06-14 13:11:08 -07:00
William Chargin
6ba6d885ad
Add empty (monoid identity) to address modules (#392)
Summary:
This can make invocations of `FooAddress.fromParts([])` a bit more
succinct.
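
To spell out "monoid identity": combining any address with `empty` gives that address back. The sketch below assumes an address is a frozen array of parts and uses a hypothetical `concat`; the real address modules may represent and combine addresses differently.

```js
// Hypothetical address representation: a frozen array of parts.
const fromParts = (parts) => Object.freeze(parts.slice());
const toParts = (address) => address.slice();
const concat = (a, b) => fromParts(a.concat(b));
const empty = fromParts([]);

const x = fromParts(["sourcecred", "example-github"]);
const same = (a, b) =>
  JSON.stringify(toParts(a)) === JSON.stringify(toParts(b));

// `empty` is a two-sided identity for concatenation.
console.log(same(concat(empty, x), x)); // true
console.log(same(concat(x, empty), x)); // true
```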

Paired with @decentralion.

Test Plan:
Unit tests added. Run `yarn travis`.

wchargin-branch: address-empty
2018-06-14 13:06:26 -07:00
Dandelion Mané
7199586262
Add Graph.edges filtering by prefixing (#391)
Similar to #390, we now allow filtering the results from `Graph.edges`
by address prefixes. It's a little more complicated than #390, as we
allow filtering by src, dst, or address.
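
A sketch of the three-way prefix filter, assuming addresses are arrays of parts. The option names `addressPrefix`, `srcPrefix`, and `dstPrefix`, and the standalone `filterEdges` function, are illustrative rather than taken from the real `Graph.edges` API.

```js
// Hypothetical: true if `parts` starts with every element of `prefix`.
function hasPrefix(parts, prefix) {
  return prefix.every((part, i) => parts[i] === part);
}

// An omitted prefix defaults to [], which matches everything.
function filterEdges(
  edges,
  {addressPrefix = [], srcPrefix = [], dstPrefix = []} = {}
) {
  return edges.filter(
    (e) =>
      hasPrefix(e.address, addressPrefix) &&
      hasPrefix(e.src, srcPrefix) &&
      hasPrefix(e.dst, dstPrefix)
  );
}

const edges = [
  {
    address: ["github", "authors", "1"],
    src: ["github", "user", "a"],
    dst: ["github", "issue", "1"],
  },
  {
    address: ["github", "references", "2"],
    src: ["github", "issue", "2"],
    dst: ["github", "issue", "1"],
  },
  {
    address: ["git", "has_tree", "3"],
    src: ["git", "commit", "c"],
    dst: ["git", "tree", "t"],
  },
];

const issue = ["github", "issue"];
console.log(filterEdges(edges, {dstPrefix: issue}).length); // 2
console.log(
  filterEdges(edges, {srcPrefix: issue, dstPrefix: issue}).length
); // 1
console.log(filterEdges(edges).length); // 3
```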

Test plan:
Unit tests added. `yarn travis` passes.

Paired with @wchargin
2018-06-14 11:59:58 -07:00
Dandelion Mané
1a08a48c03
Implement prefix filtering for Graph.nodes (#390)
Simple API addition to match v1/v2 semantics.
In the future, we can perf optimize this if we switch graph to
store nodes organized by shared prefixes.

Test plan:
Unit tests were added. `yarn travis` passes.

Paired with @wchargin
2018-06-14 11:18:47 -07:00
Dandelion Mané
95c5af36d9
Add methods for parsing Ids from GitHub urls (#388)
Test plan:
Unit tests added. Run `yarn travis`.

The GitHub regex code is inspired by work in #98.
2018-06-13 16:25:15 -07:00
William Chargin
2491fcd3cb
Add GitHub edges module (#385)
Summary:
This module includes a raw edge type, a structured edge type, and edge
creation functions that take source and destination and create an edge.

Test Plan:
Unit tests added. These cover all of the successful cases, and none of
the unsuccessful cases. We plan to refactor this code Soon™, and it is
hard to see how to nicely factor the tests without just testing the same
code paths over and over.

wchargin-branch: github-edges
2018-06-13 16:19:50 -07:00
Dandelion Mané
17b390afe9
Add trait-specific GitHub address types (#387)
Will be useful in graph creation logic, and in #385

Test plan: Only change is to add types. No testing needed.

See design discussion [on discord].

Paired with @wchargin

[on discord]: https://discordapp.com/channels/453243919774253079/454007907663740939?jump=456564101183832064
2018-06-13 14:42:37 -07:00
William Chargin
25f74b89e9
Export _githubAddress from GitHub nodes (#384)
Summary:
We’ll want to use this in the upcoming `edges` module.

Test Plan:
Existing unit tests suffice.

wchargin-branch: expose-githubaddress
2018-06-13 13:53:09 -07:00
Dandelion Mané
ad9ac55bef
Github Addresses: Rename fragment to id (#386)
Test plan:
`yarn travis`
2018-06-13 13:41:52 -07:00