Commit Graph

1125 Commits

Author SHA1 Message Date
William Chargin d3dcf1ef5a
Speed up Git graph creation (#205)
Summary:
Because of `mergeConservative`’s naive implementation, using it as a
reducer results in a roughly quadratic algorithm. Replacing this with a
mutative accumulation has the same semantics but goes much faster.
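
To illustrate the difference, here is a minimal sketch. `mergeCopy` is a hypothetical stand-in for a merge that copies its inputs (it is not the real `mergeConservative`, and it ignores conflict handling), and plain `Map`s stand in for graphs:

```js
// Hypothetical stand-in for a copying merge: builds a fresh Map on every
// call, so each call costs time proportional to the size of both inputs.
function mergeCopy(a, b) {
  const result = new Map(a);
  for (const [key, value] of b) {
    result.set(key, value);
  }
  return result;
}

// Reducer style: the accumulator is re-copied at every step, so merging
// n small maps does roughly 1 + 2 + ... + n units of work (quadratic).
function mergeAllByReducing(maps) {
  return maps.reduce((acc, next) => mergeCopy(acc, next), new Map());
}

// Mutative style: a single accumulator is updated in place, so each entry
// is touched once (linear in the total number of entries).
function mergeAllByMutating(maps) {
  const result = new Map();
  for (const map of maps) {
    for (const [key, value] of map) {
      result.set(key, value);
    }
  }
  return result;
}

const parts = [new Map([["a", 1]]), new Map([["b", 2]]), new Map([["c", 3]])];
console.log([...mergeAllByReducing(parts)]);
console.log([...mergeAllByMutating(parts)]);
```

Both functions produce the same entries; only the amount of copying differs.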

Test Plan:
For correctness: tests pass. For performance: apply the following patch
to collect timing data. Then run:
```shell
$ NODE_ENV=production yarn backend
$ node bin/loadAndPrintGitRepository.js . >/tmp/sourcecred-git
$ node bin/createGitGraph.js /tmp/sourcecred-git
```
to run against the current state of SourceCred. Before this patch, the
elapsed time is about 4m00s; after this patch, it is about 0m1.3s.
Specifically:
```
$ cat timing_before
[0] Git graph creation: 239593.958ms
[1] Git graph creation: 240380.557ms
[2] Git graph creation: 241687.042ms

$ cat timing_after
[0] Git graph creation: 1585.903ms
[1] Git graph creation: 1315.430ms
[2] Git graph creation: 1373.833ms
```

<details>
<summary>Patch to collect timing data</summary>

```diff
diff --git a/config/paths.js b/config/paths.js
index f875eee..1bc1469 100644
--- a/config/paths.js
+++ b/config/paths.js
@@ -64,5 +64,6 @@ module.exports = {
     loadAndPrintGitRepository: resolveApp(
       "src/plugins/git/bin/loadAndPrintRepository.js"
     ),
+    createGitGraph: resolveApp("src/plugins/git/bin/createGraph.js"),
   },
 };
diff --git a/src/plugins/git/bin/createGraph.js b/src/plugins/git/bin/createGraph.js
new file mode 100644
index 0000000..a35ca1b
--- /dev/null
+++ b/src/plugins/git/bin/createGraph.js
@@ -0,0 +1,25 @@
+// @flow
+
+import {readFileSync} from "fs";
+
+import {createGraph} from "../createGraph";
+
+function main() {
+  const filename = process.argv[2];
+  const raw = JSON.parse(readFileSync(filename).toString());
+  const results = [];
+  for (let i = 0; i < 3; i++) {
+    const timer = `[${i}] Git graph creation`;
+    console.time(timer);
+    results.push(createGraph(raw));
+    console.timeEnd(timer);
+  }
+  console.log(
+    "Checksum: " +
+      JSON.stringify(
+        results.map((graph) => graph.nodes().length ^ graph.edges().length)
+      )
+  );
+}
+
+main();
```

</details>

wchargin-branch: collect-gold-rings
2018-05-04 10:52:41 -07:00
Dandelion Mané 0bf4f73f86
Add fetchGithubGraph (#204)
fetchGithubGraph is a tiny module which is responsible for fetching
GitHub contribution data, and parsing it into a graph.

Test plan:
The function is trivial and does not merit explicit testing.
2018-05-04 10:21:21 -07:00
William Chargin 315f66cc4c
Add BECOMES edges in the Git graph (#203)
Summary:
If a commit causes a tree entry to change hash while keeping the same
name, we now add a BECOMES edge between the corresponding entries.
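
As a rough illustration of the rule, the sketch below compares two trees modeled as plain objects mapping entry name to hash. That shape, the function name `findChangedEntries`, and the sample hashes are assumptions made for this example, not the plugin's real types:

```js
// Returns one record per entry that keeps its name but changes its hash
// between a parent tree and a child tree.
function findChangedEntries(parentTree, childTree) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const changes = [];
  for (const name of Object.keys(childTree)) {
    // Entries that exist only in the child are additions, not changes.
    if (!has(parentTree, name)) continue;
    const before = parentTree[name];
    const after = childTree[name];
    // Same name, different hash: the old entry "becomes" the new one.
    if (before !== after) {
      changes.push({name, from: before, to: after});
    }
  }
  return changes;
}

const parentTree = {"README.md": "aaa111", "main.js": "bbb222"};
const childTree = {"README.md": "ccc333", "main.js": "bbb222", "new.js": "ddd444"};
console.log(findChangedEntries(parentTree, childTree));
```

Only `README.md` is reported: `main.js` is unchanged and `new.js` has no predecessor.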

Test Plan:
Snapshot changes are readable enough to manually verify. Programmatic
tests also added.

wchargin-branch: graph-becomes-edges
2018-05-03 14:16:18 -07:00
William Chargin e9ecb8c608
Find BECOMES edges for a high-level repository (#202)
Test Plan:
For the snapshot: verify that two of the BECOMES edges are the same as
those tested in `findBecomesEdgesForCommits` and that they have the
right commit hashes; then, verify that the remaining edge is correct.

wchargin-branch: high-level-becomes-repository
2018-05-03 14:09:13 -07:00
William Chargin c572b7f880
Find BECOMES edges for high-level commits (#201)
Test Plan:
Unit tests included. I verified that the hashes in the snapshot are
correct.

wchargin-branch: high-level-becomes-commits
2018-05-03 13:46:03 -07:00
Dandelion Mané a76d01ab75
Connect PRs and commits via MERGED_AS edges (#200)
This adds MERGED_AS edges which link from a PullRequest to a Commit. It
adds a corresponding `mergedCommitHash` method on the porcelain PR that
returns the hash of the merged commit (if available).

I would have preferred to return a porcelain wrapper over the commit,
but since we don't have a porcelain Git api, it seemed preferable to
return the hash as a string. Returning a Node would both break
consistency in the porcelain api, and be problematic as the node does
not necessarily exist in the api. To ensure that the hash is available
without parsing Addresses, I used the edge payload. :)

Test plan:
Inspect the snapshot changes in the graph (they are fairly readable) and
the api testing in api.test.js.
2018-05-03 13:29:44 -07:00
Dandelion Mané 723efeb05f
Pull merge commit SHAs from GitHub (#198)
This commit adds a few fields to the PullRequest query fragment so that
we now retrieve merge commit shas. In cases where there is no merge
commit (i.e., the PR did not merge), the field is null. Observe that this
is the case for our example unmerged pull request.

Test plan: Inspect the changes to the demo data, and verify that they
are appropriate.
2018-05-03 12:41:20 -07:00
Dandelion Mané 9cbfa35a3a
Expose `commitAddress` from the Git plugin (#199)
For the GitHub plugin to create edges pointing to commits from the Git
plugin, it needs a way to create the appropriate address given the
commit's hash. This commit exposes that functionality by moving
`makeAddress` out of the "createGraph" module and into a new "address"
module, and using it to implement `commitAddress`.

Test plan: The code is so trivial that I don't think it merits testing.
2018-05-03 12:41:01 -07:00
Dandelion Mané 136cfa839c
Update example-github data (#197)
2018-05-03 12:03:19 -07:00
Dandelion Mané ce11a1c4e3
Rename sourcecred/example-{repo,github} (#196)
Our repository containing example GitHub data has been renamed from
"sourcecred/example-repo" to "sourcecred/example-github". This commit
updates all snapshots and tests to reflect this rename.
2018-05-03 11:51:12 -07:00
Dandelion Mané b4474e6bd1
Remove the `repositoryName` field from Addresses (#195)
See [#190] for context.

The change is almost entirely straightforward; the only "interesting"
decision I made was to move the repo owner and repo name into the string
id for the Artifact Plugin addresses, as the id would otherwise not be
unique.

[#190]: https://github.com/sourcecred/sourcecred/issues/190#issuecomment-386362870
2018-05-03 11:12:02 -07:00
William Chargin 082515e16a
Create nodes for submodule commits (#186)
Summary:
Previously, a tree entry had exactly one `HAS_CONTENTS` edge, unless the
tree entry corresponded to a submodule commit, in which case we had no
information at all. Now, submodule commit tree entries point to zero or
more `SUBMODULE_COMMIT` nodes. In the vast majority of cases, there will
be exactly one such node—however, it is possible that a particular tree
entry could correspond to multiple submodules (clone two identical
submodules with different URLs) or none at all (some manually edited
`.gitmodules` or other corruption).

Test Plan:
The snapshot changes are easy enough to read and verify (two new nodes
and five new edges). Additionally, the path-to-submodule `createGraph`
test has been updated.

wchargin-branch: graph-submodule-urls
2018-05-03 10:44:06 -07:00
William Chargin 7dbecfdac6
Load submodule URLs at each commit (#185)
Summary:
In Git, a tree may point to a commit directly. In our graph, we’d like
to represent “submodule commits” explicitly, because, a priori, we do
not know the repository to which the commit belongs. A submodule commit
node will store the hash of the referent commit, as well as the URL to
the subproject as listed in the .gitmodules file. In this commit, we
load the list of those URLs into the in-memory repository.

Shout-out to `git` for having an excellent command-line API:
the `--blob` argument to `git-config` is perfect for this situation.

Test Plan:
Snapshot updates are readable and sufficient.

wchargin-branch: load-submodule-urls
2018-05-03 10:39:03 -07:00
William Chargin bbb05c9508
Store `TreeEntry` metadata in non-string form (#184)
Summary:
Prior to this commit, given a `Tree` node with an edge to a `TreeEntry`
node, there was no way to tell what the entry name was other than
parsing the ID (which should never be required). This adds appropriate
data to the payload of a `TreeEntry`, and also to the inclusion edge (so
that if you only have the edge, you don’t have to fetch the entry).

Test Plan:
Snapshot changes are readable.

wchargin-branch: treeentry-metadata
2018-05-03 10:33:25 -07:00
William Chargin eba1872495
Build backend applications in CI (#193)
Summary:
This could catch failures in build configuration or with Webpack. It’s
unlikely to catch any logic errors, because no production code is run.
In any case, it’s fast enough; it finishes at about the same time as
`ci-test` and `check-pretty`.

Test Plan:
From the repository root, run `rm -r bin; yarn travis`, and note that
the `bin/` directory is regenerated.

wchargin-branch: ci-backend
2018-05-02 22:16:48 -07:00
William Chargin 25d0106a33
Run npm scripts with `--silent` in CI (#191)
Summary:
This prevents the boilerplate output of the form
```

> sourcecred-explorer@0.1.0 check-pretty /home/wchargin/git/sourcecred
> prettier --list-different '**/*.js'

```
(superfluous linebreaks included). In the case that a script fails, it
also omits the giant “this is most likely not a problem with npm” block.

The downside to this is that it suppresses any errors in npm-run-script
itself. For instance, `npm run wat` produces “missing script: wat”,
while `npm run --silent wat` just silently exits with 1. This does not
silence the actual scripts themselves, so things like lint errors or
test failures will still appear.

Test Plan:
Run `yarn travis` before and after this commit, and note that the
resulting build log is prettier after.

wchargin-branch: ci-silent
2018-05-02 19:10:37 -07:00
William Chargin 38f4121ce9
Implement a custom CI script (#189)
Summary:
This CI script accomplishes two tasks:
 1. It speeds up our build by parallelizing where possible.
 2. It opens the possibility for running Travis cron jobs.

Currently, this script by default does the same amount of work as our
current CI script. However, I’d like to move `yarn backend` into the
list of basic actions: a backend build failure should fail CI.

Note: this script is written to be executable directly by Node, so we
can’t use Flow types with the standard syntax. Instead, we use the
comment syntax: https://flow.org/en/docs/types/comments/

Test Plan:
The following should pass with useful output:
  - `npm run travis`
  - `GITHUB_TOKEN="your_github_token" npm run travis -- --full`

The following should fail with useful output:
  - `npm run travis -- --full` (fail)

To test different failure modes, it can be helpful to add
```js
    {id: "doomed", cmd: ["false"], deps: []},
    {id: "orphan", cmd: ["whoami"], deps: ["who", "are", "you"]},
```
to the list of `basicTasks` in `travis.js`.
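
One way to think about tasks of the form `{id, cmd, deps}` is sketched below. This is not the code in `travis.js`; `planWaves` and the `shell-tests` task are hypothetical, and the sketch only plans an order rather than spawning processes:

```js
// Group tasks into "waves": every task in a wave depends only on tasks
// from earlier waves, so the tasks within one wave could run in parallel.
// Tasks whose dependencies can never be satisfied are reported separately.
function planWaves(tasks) {
  const done = new Set();
  let remaining = tasks.slice();
  const waves = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.deps.every((d) => done.has(d)));
    if (ready.length === 0) break; // nothing else can ever become ready
    waves.push(ready.map((t) => t.id));
    for (const t of ready) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return {waves, unrunnable: remaining.map((t) => t.id)};
}

const tasks = [
  {id: "lint", cmd: ["npm", "run", "lint"], deps: []},
  {id: "backend", cmd: ["npm", "run", "backend"], deps: []},
  {id: "shell-tests", cmd: ["true"], deps: ["backend"]},
  {id: "orphan", cmd: ["whoami"], deps: ["who", "are", "you"]},
];
console.log(planWaves(tasks));
```

Here `lint` and `backend` land in the first wave, `shell-tests` in the second, and `orphan` is flagged because its dependencies do not exist.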

To test performance:
```shell
$ time node ./config/travis.js >/dev/null 2>/dev/null

real    0m8.306s
user    0m20.336s
sys     0m1.364s

$ time bash -c \
>     'npm run check-pretty && npm run lint && npm run flow && CI=1 npm run test' \
>     >/dev/null 2>/dev/null

real    0m12.427s
user    0m13.752s
sys     0m0.804s
```
A 33% savings is not bad at all, and the raw time saved should only
improve from here on, as the individual steps start taking more time.

wchargin-branch: custom-ci
2018-05-02 16:10:03 -07:00
William Chargin 79dff9a083
Add options to not rebuild on shell script tests (#188)
Summary:
This can be useful for speed, but it can also be important for
correctness (at least theoretically): if we run both these scripts
concurrently, then we don’t want one of them to squash the `bin`
directory while the other is about to invoke an executable therein.

One might note that the diffs to the two files in this commit are
virtually identical, and indeed the files themselves are quite similar.
I’d prefer to keep the duplication for now; if we really need a Bash
snapshot testing framework, we can factor one out.

Test Plan:
Run each script with `--help`, with `--build` and `--no-build`, and with
and without `-u`.

wchargin-branch: optional-rebuild
2018-05-02 15:09:46 -07:00
William Chargin ee03c58357
Exclude punctuation surrounding URL references (#183)
Summary:
To avoid confusion, we simultaneously remove unused capturing groups.
This is not strictly necessary, but it makes the code less brittle.

Test Plan:
The newly added test fails before the change to `findReferences.js`.

wchargin-branch: url-punctuation
2018-05-01 14:51:18 -07:00
Dandelion Mané acf5000547
Create GitHub reference edges (#182)
This commit adds the `addReferenceEdges()` method to the GitHub parser,
which examines all of the posts in the parsed graph and adds References
edges when it detects references between posts. As an example, `Hey
@wchargin, take a look at #1337` would generate two references.

We currently parse the following kinds of references:
- Numeric references to issues/PRs.
- Explicit in-repository url references (to any entity)
- @-author references

We do not parse:
- Cross-repository urls
- Cross-repository shortform (e.g. `sourcecred/sourcecred#100`)
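
The sketch below shows the flavor of this detection for two of the supported kinds. The patterns and the name `findSimpleReferences` are illustrative assumptions; the real `findReferences.js` may use different expressions and also handles URLs:

```js
// Finds numeric references ("#1337") and @-author references
// ("@wchargin") that start the text or follow whitespace.
function findSimpleReferences(body) {
  const refs = [];
  // Requiring start-of-text or whitespace before "#" skips cross-repository
  // shortform such as "owner/repo#100".
  for (const match of body.matchAll(/(?:^|\s)#(\d+)\b/g)) {
    refs.push({kind: "number", value: match[1]});
  }
  // Login names are assumed to be letters, digits, and hyphens, so
  // trailing punctuation such as "," is not captured.
  for (const match of body.matchAll(/(?:^|\s)@([A-Za-z0-9-]+)/g)) {
    refs.push({kind: "author", value: match[1]});
  }
  return refs;
}

console.log(findSimpleReferences("Hey @wchargin, take a look at #1337"));
console.log(findSimpleReferences("see sourcecred/sourcecred#100"));
```

The first call yields two references, matching the example above; the second yields none.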

`Parser.parse` calls `addReferenceEdges()`, so no change is required by
consumers to have reference edges added to their graphs.

The GitHub porcelain API layer now includes methods for retrieving the
entities referenced by a post.

Test plan:
This commit is tested both via snapshot tests and via explicit tests at
the API layer. (Actually, the creation of the porcelain API layer was
prompted by wanting a cleaner way to test this commit.) I recommend
inspecting the snapshot tests for sanity, but mostly relying on the
tested behavior in api.test.js.
2018-04-30 20:19:38 -07:00
Dandelion Mané f358c33e2a
findReferences now finds url references to users (#181)
For example, `https://github.com/decentralion` is now a valid url
reference to a GitHub author.

Test plan: Check the added test case.
2018-04-30 19:23:58 -07:00
Dandelion Mané 0c0bbf58e2
Update example repo data (#180)
I added a lot of new comments that have url references to different
types of GitHub entities, e.g. to pull request review comments.

The commit was generated by running the example repo fetcher, and
running yarn test and updating snapshots.
2018-04-30 19:21:41 -07:00
Dandelion Mané a1d072846d
Add PR reviews and comments to GitHub api (#179)
Also, a slight re-organization of the GitHub api test code.
2018-04-30 18:22:03 -07:00
William Chargin 16e8e399e6
Add commit parent edges in the Git graph (#178)
Test Plan:
To verify the snapshot change, either believe the programmatic tests, or
use the following script to verify that the right edges were added:
```bash
#!/bin/bash
set -eu
example_repo="$(mktemp -d)"
yarn backend >/dev/null 2>/dev/null
node bin/createExampleRepo.js "${example_repo}"
expected() {
    git -C "${example_repo}" log --format='%H %P' \
        | awk '{ if (NF > 1) { print $2 " " $1 } }' \
        | sort
}
actual() {
    git diff HEAD^..HEAD | grep -A 1 -F -e src -e dst \
        | sed -n 's/^+.*"id": "\(.\+\)".*/\1 /p' \
        | tr -d $'\n' | cat - <(echo) \
        | fold -s -w 82 | sed 's/ *$//' \
        | sort
}
diff -u <(expected) <(actual)
```

wchargin-branch: graph-commit-parents
2018-04-30 18:08:40 -07:00
William Chargin 56ddb5cf9b
Load commit parent hashes into memory (#177)
Test Plan:
Snapshot updated with `./src/plugins/git/loadRepositoryTest.sh -u`; unit
tests suffice.

wchargin-branch: load-commit-parents
2018-04-30 18:01:16 -07:00
William Chargin d5f468ca68
Fix Git plugin `NodePayload` definition (#176)
Summary:
Flow didn’t catch this because all the payloads are `{}` anyway.

Test Plan:
Note that every node and edge payload is now listed exactly once in the
correct spot for each of `{Node,Edge}{Type,Payload}`.

wchargin-branch: git-nodepayload
2018-04-30 17:08:49 -07:00
Dandelion Mané 0609201af4
Remove Graph.{in,out}Edges (#174)
This commit removes `Graph.inEdges` and `Graph.outEdges`. As a
replacement to these two functions, `graph.neighborhood` now takes an
optional `direction` flag, which can be set to `"IN" | "OUT" | "ANY"`.
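
A minimal sketch of the idea follows. The `direction` option and its values come from this message; the standalone function shape, the string node ids, and the reading of `"OUT"` as "edges whose `src` is the node" are assumptions for illustration:

```js
// Returns {edge, neighbor} pairs for a node, optionally restricted to
// outgoing ("OUT") or incoming ("IN") edges. Defaults to "ANY".
function neighborhood(edges, node, options = {}) {
  const direction = options.direction || "ANY";
  const result = [];
  for (const edge of edges) {
    if (edge.src === node && direction !== "IN") {
      result.push({edge, neighbor: edge.dst});
    } else if (edge.dst === node && direction !== "OUT") {
      result.push({edge, neighbor: edge.src});
    }
  }
  return result;
}

const edges = [
  {id: "e1", src: "a", dst: "b"},
  {id: "e2", src: "c", dst: "a"},
];
console.log(neighborhood(edges, "a", {direction: "OUT"}).map((n) => n.neighbor));
console.log(neighborhood(edges, "a", {direction: "IN"}).map((n) => n.neighbor));
console.log(neighborhood(edges, "a").map((n) => n.neighbor));
```

The caller always reads `neighbor`, regardless of which end of the edge the queried node sits on.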

This reduces the surface area of the Graph API, and means that the same
pattern can be used when requesting in or out neighbors as is used when
requesting all neighbors.

This change generates significant churn in the test files, and in some
cases the tests are less elegant / show historicity, as they were
written for the type signature of `{in,out}Edges`, which just returns an
array of edges, and now receive an array of neighbors. I think this is
acceptable, and it's not worth re-writing the test.

In many cases, replacing existing calls to `{in,out}Edges` in our actual
codebase resulted in cleaner code, as `neighborhood` successfully
abstracts over the common patterns that users of `{in,out}Edges` were
implementing.

As a fly-by refactor, I also changed the `neighborAddress` part of the
`neighborhood` return value to `neighbor`. It's a little less
descriptive, but it's more concise, and flow is there to help ensure
it's used correctly.

Test plan: Note that CI passes. Inspect the test changes, and verify
that they are appropriate transformations for consuming the new API.
2018-04-30 15:32:28 -07:00
William Chargin 5af5748ed7
Convert in-memory Git repos to cred graphs (#169)
Test Plan:
This snapshot test is too unwieldy to actually read—it’s 1000 lines of
opaque SHAs and thrice-stringified JSON objects—so it should be
interpreted as a regression test only. The programmatic tests should
suffice.

wchargin-branch: wip-git-create-graph
2018-04-30 15:23:37 -07:00
William Chargin f3a440244e
Fix all lint errors, adding a lint CI step (#175)
Test Plan:
Run `yarn lint` and `yarn travis` and observe success. Add something
that triggers a lint warning, like `const zzz = 3;`; re-run and observe
failures.

wchargin-branch: lint
2018-04-30 14:52:28 -07:00
Dandelion Mané 22ca77ed05
Add safe type coercion for GitHub api (#173)
In general, methods in the porcelain GitHub api may return multiple
types; e.g. a reference could be to an Issue, PullRequest, Comment,
Author (or more). To make working with the api more convenient while
maintaining safety, this commit adds a static `asType` method to each
Entity class, which confirms that type coercion is safe, and errors if
not.
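
A simplified sketch of the pattern is shown below. The class bodies, the `instanceof` check, and the error text are assumptions for this example rather than the actual implementation; only the idea of a static `asType` that validates before returning comes from the description above:

```js
class Entity {
  constructor(type, title) {
    this._type = type;
    this._title = title;
  }
  type() {
    return this._type;
  }
}

class Issue extends Entity {
  constructor(title) {
    super("ISSUE", title);
  }
  title() {
    return this._title;
  }
  // Confirms the entity really is an Issue before handing it back,
  // and throws otherwise instead of silently miscasting.
  static asType(entity) {
    if (!(entity instanceof Issue)) {
      throw new Error(`Expected ISSUE, got ${entity.type()}`);
    }
    return entity;
  }
}

class PullRequest extends Entity {
  constructor(title) {
    super("PULL_REQUEST", title);
  }
}

const mixed = [new Issue("Bug report"), new PullRequest("Fix bug")];
console.log(Issue.asType(mixed[0]).title());
```

Calling `Issue.asType(mixed[1])` would throw, which is the safety this commit is after.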

This commit also adds `issueOrPRByNumber`, a convenience method, to
api.test.js.

Test plan: Check the API usage and verify that it is reasonable.
2018-04-30 10:07:23 -07:00
Dandelion Mané d878be6550
Update the GitHub example repo data (#172)
Commit generated by running src/plugins/github/fetchGithubRepoTest -u
2018-04-29 22:13:00 -07:00
Dandelion Mané 7158deaad3
Add a porcelain api for Github data (#170)
Interacting with raw contribution graphs is cumbersome. We'll need
more fluent and convenient ways to retrieve data from them; we can do
this by creating porcelain APIs that wrap the underlying graph.

This commit adds a simple porcelain API for the GitHub data. It creates
the following classes:

* `api.Repository`
* `api.Issue`
* `api.PullRequest`
* `api.Comment`
* `api.Author`

The classes all wrap a graph and a nodeAddress. They provide read-only
functions for retrieving data from the graph; that data might be a part
of the node payload, or it might do some graph traversal under the hood.

The choice to have the wrapper hold onto the Address rather than the
node itself was deliberate; in the future, the graph may contain nodes
that are not synchronously reachable, so this approach allows us to
create wrappers for nodes we can't synchronously reach. When this comes
up in practice, we can then add async methods to the wrapper.

Note that some data already included in our graph, such as
PullRequestReviews and PullRequestReviewComments, were deliberately
excluded, so as to allow the core ideas to be reviewed without
unnecessary clutter.

Test plan:
Check that the unit tests appropriately test the behavior, and that the
API seems pleasant to use.
2018-04-27 21:45:30 -07:00
William Chargin 1c28c75e39
Check in example repo’s in-memory representation (#166)
Summary:
Two reasons for this. First, we want tests to be able to operate on this
data without having to generate repositories via `git(1)`. (Doing that
is slow, and requires a Git installation, and makes it less clear that
the tests are correctly isolated/provides more surface area for
something to go wrong.) Second, in general plugins will need a canonical
source of test data, so setting/continuing this precedent is a good
thing.

Test Plan:
Observe that the old Jest snapshot must be equivalent to the new JSON
one, because the test criterion in `loadRepository.test.js` changed and
the test still passes. Then, run `loadRepositoryTest.sh` and note that
it passes; change the `example-git.json` file and note that the test
fails when re-run; then, run the test with `--updateSnapshot` and watch
it magically revert your changes.

wchargin-branch: check-in-git-repo
2018-04-27 20:51:54 -07:00
William Chargin 301e542ee1
Switch in-memory Git types from Maps to objects (#165)
Summary:
I’d like to use `Map`s whenever the keys are homogeneous (i.e.,
dictionaries, not structs). But this has proven infeasible. The primary
issue at this point is that `JSON.stringify(anyMap)` is `"{}"`—not
entirely unreasonable given that maps can have non-string keys, but
frustrating enough to not use them.
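
The behavior is easy to reproduce in plain JavaScript. The key and value below are made up for the demonstration, and `Object.fromEntries` assumes a reasonably recent Node version:

```js
const asMap = new Map([["abc123", "commit message"]]);
const asObject = {abc123: "commit message"};

// A Map has no enumerable own properties, so it serializes as "{}".
console.log(JSON.stringify(asMap));
// A plain object serializes its keys as expected.
console.log(JSON.stringify(asObject));
// Converting a string-keyed Map to an object first also works.
console.log(JSON.stringify(Object.fromEntries(asMap)));
```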

Test Plan:
Jest appears to order the snapshot keys differently for `Map`s and
objects (the former by insertion order and the latter alphabetical),
which makes the snapshot change harder to read. I verified that the
general structure is okay, and hand-verified some of the individual
changes. Comparing the number of lines added and deleted in the
snapshot is also a good sanity check.

wchargin-branch: map-to-object
2018-04-27 20:46:55 -07:00
Dandelion Mané ec3d084ffc
Graph: type filter for `nodes()` and `edges()` (#168)
When requesting nodes and edges from the graph, it is convenient to
filter them by their type.

In the future, we should add plugin filtering as well, as we
expect type names to collide across plugins.

We may also want to consider keeping a cache of nodes and edges by type
to speed up these calls, if they become performance bottlenecks. (The
implementation in this commit naively iterates over every node/edge.)
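
The naive approach can be sketched as a linear scan. The node shape `{id, type}` and the helper name `nodesOfType` are assumptions for this example, not the Graph class's real signature:

```js
// Returns all nodes when no type is given; otherwise scans every node
// and keeps those whose type matches.
function nodesOfType(nodes, typeFilter) {
  if (typeFilter == null) {
    return nodes.slice();
  }
  return nodes.filter((node) => node.type === typeFilter);
}

const sampleNodes = [
  {id: "1", type: "ISSUE"},
  {id: "2", type: "PULL_REQUEST"},
  {id: "3", type: "ISSUE"},
];
console.log(nodesOfType(sampleNodes, "ISSUE").map((n) => n.id));
console.log(nodesOfType(sampleNodes).length);
```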

Test plan:
Verify that the unit tests are appropriate.
2018-04-27 20:10:51 -07:00
Dandelion Mané 3d79f7680e
Collapse the 3 author types into 1 (#164)
Currently, we store GitHub Users, Organizations, and Bots as separate
nodetypes in the graph. This is inconvenient, as we don't care very much
what type of entity authored a node.

This commit collapses those three categories into one nodetype. The
extra information has migrated to the node payload, so it is still
possible to discover this information if it's important.

Test plan: There is some amount of snapshot churn because the author
node types and payloads have changed. Verify that the snapshot changes
are appropriate, and that CI passes.
2018-04-27 15:52:55 -07:00
Dandelion Mané dd48084810
Remove `get` prefix from getters in graph.js (#163)
This commit renames the following graph functions:

* `get{Node,Edge}{,s}` -> `{node,edge}{,s}`
* `get{In,Out}Edges` -> `{in,out}Edges`
* `getNeighborhood` -> `neighborhood`

The rename was effected across the repo by running:

```
$ find src -name "*.js" -exec sed -i 's/getNeighborhood/neighborhood/g' {} +
```

modified appropriately for each substitution.

Test plan:
Inspect the code to make sure nothing was erroneously renamed. Check that
CI passes.
2018-04-27 15:10:12 -07:00
Dandelion Mané 678924087a
Replace `Graph.{getAdjacentEdges,getNeighborhood}` (#162)
`Graph.getAdjacentEdges` had a serious defect: for the adjacent edges,
it's hard to tell which of the {src,dst} is the neighboring node address
and which is the node we called `getAdjacentEdges` on.

This commit fixes that limitation by replacing `getAdjacentEdges` with
`getNeighborhood`, with a return signature of
`{edge: Edge<EP>, neighborAddress: Address}[]`

Some yak shaving was required: we needed a version of `expectSameSorted`
and, by extension, `sortedByAddress` that takes an accessor to an
Addressable, so that we could test that the neighborhoods returned were
correct. To satisfy flow, I created `expectSameSortedAccessorized` and
`sortedByAddressAccessorized`. Cumbersome, but it worked. ¯\_(ツ)_/¯
2018-04-27 15:05:36 -07:00
Dandelion Mané 28e686c369
Remove `address.sortedByAddress` (#161)
Previously, the address module exported `sortedByAddress`, a utility
function that sorts an array of `Addressable`s. This function was only
used in test code.

This commit replaces it with generic usage of `lodash.sortBy`. This
reduces the API surface area of the module, and removes test-only code
from the exported api.

New dependency added: `lodash.sortby`
https://www.npmjs.com/package/lodash.sortby
2018-04-27 14:29:49 -07:00
William Chargin 1550e6d05e
Add one-way GitHub sync for Git example repos (#160)
Test Plan:
Run the script with `--dry-run`, which currently prints
```shell
$ src/plugins/git/demoData/synchronizeToGithub.sh -n
yarn run v1.5.1
[build output truncated]
Build completed; results in 'bin'.
Done in 3.30s.

Synchronizing: example-git
warning: no common commits
To github.com:sourcecred/example-git.git
 + 3507b7c...3715ddf HEAD -> master (forced update)

Synchronizing: example-git-submodule
Everything up-to-date

Done.
```
This reflects the correct state of affairs, because #158 changed the
example repository. Note that the `3715ddf` SHA in the output of the
above script matches the SHA in the `exampleRepo.test.js.snap` snapshot.

wchargin-branch: sync-git-example-repos
2018-04-27 14:03:54 -07:00
William Chargin f4de3e2067
Standardize environment passed to Git (#159)
Summary:
When we shell out to `git`, we don’t want the end user’s environment
variables and Git configuration to influence the results. This commit
standardizes those inputs. Standardizing the environment has the side
benefit that the `GIT_DIR` environment variable is not set, which means
that the test suite will work properly when run from the `exec` step of
a Git rebase.

Test Plan:
Tests pass and snapshots are unchanged. Note that
```shell
$ git rebase HEAD --exec 'CI=1 yarn test'
```
works after this commit but not before it.

wchargin-branch: standardize-git-environment
2018-04-27 11:50:32 -07:00
William Chargin c7235f6e49
Remove superfluous commas from example repo README (#158)
Summary:
Using `array.join()` added commas at the start of some lines; I meant to
use `array.join("")`.
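
The difference comes from the default separator of `Array.prototype.join`, which is a comma. The sample lines below are made up for the demonstration:

```js
const lines = ["# Example\n", "Some text\n"];

// With no argument, join uses "," as the separator, so a comma ends up
// at the start of every line after the first.
console.log(JSON.stringify(lines.join()));
// With an explicit empty separator, the pieces are simply concatenated.
console.log(JSON.stringify(lines.join("")));
```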

(I’ve now inspected the full generated contents of both repos, and they
look good.)

Test Plan:
It is expected that these attributes of the snapshots should change.
There’s no need to carefully check the SHAs.

wchargin-branch: readme-change
2018-04-27 11:48:40 -07:00
Dandelion Mané 8e9ddbf9fc
Add `Graph.getAdjacentEdges` for in and out edges (#157)
Some consumers of the graph may prefer to treat it as an undirected
graph. For example, when finding the author of an issue, it is wholly
sufficient to find an edge with the `AUTHORS` type; the caller may
prefer not to be bothered with remembering which end of the `AUTHORS`
edge is considered the `src` and which is the `dst`.

The `getAdjacentEdges` call enables that, by combining the output of
`getInEdges` and `getOutEdges`.

Test plan:
The new tests are pretty comprehensive.
2018-04-26 20:27:46 -07:00
William Chargin aa071ceab3
Include a submodule in the main example repository (#156)
Summary:
The main example repository now covers the currently desired features:
it has blobs, subtrees, and submodules, and commits that change each of
these. (We don’t have merge commits yet—we can add those once we start
to care about them.)

Once this is merged, I will push the two repositories to GitHub.

Test Plan:
Verifying and understanding is easier than ever before. You can run the
following commands to create the repositories in question on your disk:
```shell
$ yarn backend
$ node bin/createExampleRepo.js /tmp/repo
$ node bin/createExampleRepo.js --submodule /tmp/subrepo
```

You can then explore these repositories at your leisure. For instance,
to check that the `loadRepository` snapshot has the right set of
commits, inspect the output of the following command:
```shell
$ git -C /tmp/repo log --format='%H %T'
```
Or, to check that a particular tree has the right contents, just run:
```shell
$ git -C /tmp/repo ls-tree TREE_SHA
```
Verifying the `exampleRepo` snapshot is similarly easy: just check that
the lists of commit SHAs in `/tmp/repo` and `/tmp/subrepo` are correct.

wchargin-branch: include-submodule
2018-04-26 20:11:44 -07:00
William Chargin d6e9b0a72b
Add a command-line script to create example repos (#155)
Summary:
We’ll use this to create the repositories on disk and then push them to
GitHub.

Test Plan:
Generate both kinds of repository, and check out the SHAs:
```shell
$ yarn backend
$ node bin/createExampleRepo.js /tmp/repo
$ node bin/createExampleRepo.js --submodule /tmp/repo-submodule
$ node bin/createExampleRepo.js --no-submodule /tmp/repo-no-submodule
$ # (first and third lines do the same thing)
$ git -C /tmp/repo rev-parse HEAD
677b340674bde17fdaac3b5f5eef929139ef2a52
$ git -C /tmp/repo-submodule rev-parse HEAD
29ef158bc982733e2ba429fcf73e2f7562244188
$ git -C /tmp/repo-no-submodule rev-parse HEAD
677b340674bde17fdaac3b5f5eef929139ef2a52
```
Then, note that these SHAs are expected per the snapshot file in
`exampleRepo.test.js.snap`.

wchargin-branch: create-example-repo-command
2018-04-26 19:53:46 -07:00
William Chargin ef7610a440
Only exclude top-level directories from Prettier (#154)
Summary:
In particular, we excluded `bin`, but this was catching non-root
directories named `bin`, too, and so files like
`src/plugins/github/bin/fetchAndPrintGithubRepo.js` were not subject to
prettification. Happily, those files are all pretty enough, anyway.

Test Plan:
Note that mangling the format of `fetchAndPrintGithubRepo.js` prior to
this commit would not cause `yarn check-pretty` to fail, nor would the
manglings be fixed by `yarn prettify`—but that both of these behaviors
are reversed after this commit.

wchargin-branch: prettier-exclude-root-only
2018-04-26 19:47:58 -07:00
William Chargin 28a118c814
Create an example submodule repository (#153)
Summary:
We want our main repository to include submodules so that we can test
submodule support. Here, we create a repository to be included as a
submodule.

wchargin-branch: example-submodule-repository
2018-04-26 19:43:01 -07:00
William Chargin 75fd068a35
Extract code to create the example repository (#152)
Test Plan:
Note that the snapshot change is simply a move: no SHAs were changed.

wchargin-branch: extract-example-repository-code
2018-04-26 19:38:29 -07:00
William Chargin 6f9941b526
Clean up temporary directories in tests (#151)
Summary:
The `loadRepository` test tries to clean up temporary directories, but
failed to do so because the directories were not empty. The cleanup hook
threw an error, but this error was silenced by Jest due to [a known
bug][1] that was fixed a few days ago. We can fix this by asking `tmp`
to clean up directories even if they are not empty, using the
`unsafeCleanup` option.

[1]: https://github.com/facebook/jest/issues/3266

Test Plan:
While running `watch -n 0.1 'ls /tmp | grep "tmp-.*" | wc -l'`, run
tests. Note that the number increases by five and then drops down again;
before this patch, it would increase by 5 and then stay there.

wchargin-branch: clean-up-tmpdirs
2018-04-26 19:34:23 -07:00
William Chargin 3679529bef
Move `localGit`/`GitDriver` into Git utils (#150)
Summary:
A few reasons for this:
 1. This _is_ a utility, so it makes sense semantically.
 2. This unifies the utilities API; clients like `loadRepository.test`
    don’t have to keep around both a `git` and a `gitUtils`.
 3. Most importantly, further scripts and tests shouldn’t depend on
    `loadRepository` just for `localGit`. Depending on `gitUtils` makes
    much more sense.

(Note that `makeUtils` is no longer dependency-injectable, but that’s
okay; I considered this and favored YAGNI on this one.)

Test Plan:
Existing unit tests pass.

wchargin-branch: move-localgit
2018-04-26 19:31:27 -07:00