sourcecred

Commit Graph

Author	SHA1	Message	Date
Dandelion Mané	0c0bbf58e2	Update example repo data (#180 ) I added a lot of new comments that have url references to different types of GitHub entities, e.g. to pull request review comments. The commit was generated by running the example repo fetcher, and running yarn test and updating snapshots.	2018-04-30 19:21:41 -07:00
Dandelion Mané	a1d072846d	Add PR reviews and comments to GitHub api (#179 ) Also, a slight re-organization of the GitHub api test code.	2018-04-30 18:22:03 -07:00
William Chargin	16e8e399e6	Add commit parent edges in the Git graph (#178 ) Test Plan: To verify the snapshot change, either believe the programmatic tests, or use the following script to verify that the right edges were added: ```bash #!/bin/bash set -eu example_repo="$(mktemp -d)" yarn backend >/dev/null 2>/dev/null node bin/createExampleRepo.js "${example_repo}" expected() { git -C "${example_repo}" log --format='%H %P' \ \| awk '{ if (NF > 1) { print $2 " " $1 } }' \ \| sort } actual() { git diff HEAD^..HEAD \| grep -A 1 -F -e src -e dst \ \| sed -n 's/^+."id": "$.\+$"./\1 /p' \ \| tr -d $'\n' \| cat - <(echo) \ \| fold -s -w 82 \| sed 's/ *$//' \ \| sort } diff -u <(expected) <(actual) ``` wchargin-branch: graph-commit-parents	2018-04-30 18:08:40 -07:00
William Chargin	56ddb5cf9b	Load commit parent hashes into memory (#177 ) Test Plan: Snapshot updated with `./src/plugins/git/loadRepositoryTest.sh -u`; unit tests suffice. wchargin-branch: load-commit-parents	2018-04-30 18:01:16 -07:00
William Chargin	d5f468ca68	Fix Git plugin `NodePayload` definition (#176 ) Summary: Flow didn’t catch this because all the payloads are `{}` anyway. Test Plan: Note that every node and edge payload is now listed exactly once in the correct spot for each of `{Node,Edge}{Type,Payload}`. wchargin-branch: git-nodepayload	2018-04-30 17:08:49 -07:00
Dandelion Mané	0609201af4	Remove Graph.{in,out}Edges (#174 ) This method removes `Graph.inEdges` and `Graph.outEdges`. As a replacement to these two functions, `graph.neighborhood` now takes an optional `direction` flag, which can be set to `"IN" \| "OUT" \| "ANY"`. This reduces the surface area of the Graph API, and means that the same pattern can be used when requesting in or out neighbors as is used when requesting all neighbors. This change generates significant churn in the test files, and in some cases the tests are less elegant / show historicity, as they were written for the type signature of `{in,out}Edges`, which just returns an array of edges, and now receive an array of neighbors. I think this is acceptable, and it's not worth re-writing the test. In many cases, replacing existing calls to `{in,out}Edges` in our actual codebase resulted in cleaner code, as `neighborhood` successfully abstracts over the common patterns that users of `{in,out}Edges` were implementing. As a fly-by refactor, I also changed the `neighborAddress` part of the `neighborhood` return value to `neighbor`. It's a little less descriptive, but it's more concise, and flow is there to help ensure it's used correctly. Test plan: Note that CI passes. Inspect the test changes, and verify that they are appropriate transformations for consuming the new API.	2018-04-30 15:32:28 -07:00
William Chargin	5af5748ed7	Convert in-memory Git repos to cred graphs (#169 ) Test Plan: This snapshot test is too unwieldy to actually read—it’s 1000 lines of opaque SHAs and thrice-stringified JSON objects—so it should be interpreted as a regression test only. The programmatic tests should suffice. wchargin-branch: wip-git-create-graph	2018-04-30 15:23:37 -07:00
William Chargin	f3a440244e	Fix all lint errors, adding a lint CI step (#175 ) Test Plan: Run `yarn lint` and `yarn travis` and observe success. Add something that triggers a lint warning, like `const zzz = 3;`; re-run and observe failures. wchargin-branch: lint	2018-04-30 14:52:28 -07:00
Dandelion Mané	22ca77ed05	Add safe type coercion for GitHub api (#173 ) In general, methods in the porcelain GitHub api may return multiple types; e.g. a reference could be to an Issue, PullRequest, Comment, Author (or more). To make working with the api more convenient while maintaining safety, this commit adds a static `asType` method to each Entity class, which confirms that type coercion is safe, and errors if not. This commit also adds `issueOrPRByNumber`, a convenience method, to api.test.js. Test plan: Check the API usage and verify that it is reasonable.	2018-04-30 10:07:23 -07:00
Dandelion Mané	d878be6550	Update the GitHub example repo data (#172 ) Commit generated by running src/plugins/github/fetchGithubRepoTest -u	2018-04-29 22:13:00 -07:00
Dandelion Mané	7158deaad3	Add a porcelain api for Github data (#170 ) Interacting with raw contribution graphs is cumbersome. We'll need more fluent and convenient ways to retrieve data from them; we can do this by creating porcelain APIs that wrap the underlying graph. This commit adds a simple porcelain API for the GitHub data. It creates the following classes: * `api.Repository` * `api.Issue` * `api.PullRequest` * `api.Comment` * `api.Author` The classes all wrap a graph and a nodeAddress. They provide read-only functions for retreiving data from the graph; that data might be a part of the node payload, or it might do some graph traversal under the hood. The choice to have the wrapper hold onto the Address rather than the node itself was deliberate; in the future, the graph may contain nodes that are not synchronously reachable, so this approach allows us to create wrappers for nodes we can't synchronously reach. When this comes up in practice, we can then add async methods to the wrapper. Note that some data already included in our graph, such as PullRequestReviews and PullRequestReviewComments, were deliberately excluded, so as to allow the core ideas to be reviewed without unnecessary clutter. Test plan: Check that the unit tests appropriately test the behavior, and that the API seems pleasant to use.	2018-04-27 21:45:30 -07:00
William Chargin	1c28c75e39	Check in example repo’s in-memory representation (#166 ) Summary: Two reasons for this. First, we want tests to be able to operate on this data without having to generate repositories via `git(1)`. (Doing that is slow, and requires a Git installation, and makes it less clear that the tests are correctly isolated/provides more surface area for something to go wrong.) Second, in general plugins will need a canonical source of test data, so setting/continuing this precedent is a good thing. Test Plan: Observe that the old Jest snapshot must be equivalent to the new JSON one, because the test criterion in `loadRepository.test.js` changed and the test still passes. Then, run `loadRepositoryTest.sh` and note that it passes; change the `example-git.json` file and note that the test fails when re-run; then, run the test with `--updateSnapshot` and watch it magically revert your changes. wchargin-branch: check-in-git-repo	2018-04-27 20:51:54 -07:00
William Chargin	301e542ee1	Switch in-memory Git types from Maps to objects (#165 ) Summary: I’d like to use `Map`s whenever the keys are homogeneous (i.e., dictionaries, not structs). But this has proven infeasible. The primary issue at this point is that `JSON.stringify(anyMap)` is `"{}"`—not entirely unreasonable given that maps can have non-string keys, but frustrating enough to not use them. Test Plan: Jest appears to order the snapshot keys differently for `Map`s and objects (the former by insertion order and the latter alphabetical), which makes the snapshot change harder to read. I verified that the general structure is okay, and hand-verified some of the individual changes. Noting that the number of lines added and deleted in the snapshot is a good sanity check. wchargin-branch: map-to-object	2018-04-27 20:46:55 -07:00
Dandelion Mané	ec3d084ffc	Graph: type filter for `nodes()` and `edges()` (#168 ) When requesting nodes and edges from the graph, it is convenient to filter them by their type. In the future, we should add plugin filtering as well, as we expect type names to collide across plugins. We may also want to consider keeping a cache of nodes and edges by type to speed up these calls, if they become performance bottlenecks. (The implementation in this commit naively iterates over every node/edge.) Test plan: Verify that the unit tests are appropriate.	2018-04-27 20:10:51 -07:00
Dandelion Mané	3d79f7680e	Collapse the 3 author types into 1 (#164 ) Currently, we store GitHub Users, Organizations, and Bots as separate nodetypes in the graph. This is inconvenient, as we don't care very much what type of entity authored a node. This commit collapses those three categories into one nodetype. The extra information has migrated to the node payload, so it is still possible to discover this information if it's important. Test plan: There is some amount of snapshot churn because the author node types and payloads have changed. Verify that the snapshot changes are appropriate, and that CI passes.	2018-04-27 15:52:55 -07:00
Dandelion Mané	dd48084810	Remove `get` prefix from getters in graph.js (#163 ) This commit renames the following graph functions: * `get{Node,Edge}{,s}` -> `{node,edge}{,s}` * `get{In,Out}Edges` -> `{in,out}Edges` * `getNeighborhood` -> `neighborhood` The rename was effected across the repo by running: ``` $ find src -name "*.js" -exec sed -i 's/getNeighborhood/neighborhood/g' {} + ``` modified appropriately for each subsitution. Test plan: Inspect the code to make sure nothing was erronously renamed. Check that CI passes.	2018-04-27 15:10:12 -07:00
Dandelion Mané	678924087a	Replace `Graph.{getAdjacentEdges,getNeighborhood}` (#162 ) `Graph.getAdjacentEdges` had a serious defect: for the adjacent edges, it's hard to tell which of the {src,dst} is the neighboring node address and which is the node we called `getAdjacentEdges` on. This commit fixes that limitation by replacing `getAdjacentEdges` with `getNeighborhood`, with a return signature of `{edge: Edge<EP>, neighborAddress: Address}[]` Some yak shaving was required: we needed a version of `expectSameSorted` and, by extension, `sortedByAddress` that takes an accessor to an Addressable, so that we could test that the neighborhoods returned were correct. To satisfy flow, I created `expectSameSortedAccessorized` and `sortedByAddressAccessorized`. Cumbersome, but it worked. ¯\_(ツ)_/¯	2018-04-27 15:05:36 -07:00
Dandelion Mané	28e686c369	Remove `address.sortedByAddress` (#161 ) Previously, the address module exported `sortedByAddress`, a utility function that sorts an array of `Addressable`s. This function was only used in test code. This commit replaces it with generic usage of `lodash.sortBy`. This reduces the API surface area of the module, and removes test-only code from the exported api. New dependency added: `lodash.sortby` https://www.npmjs.com/package/lodash.sortby	2018-04-27 14:29:49 -07:00
William Chargin	1550e6d05e	Add one-way GitHub sync for Git example repos (#160 ) Test Plan: Run the script with `--dry-run`, which currently prints ```shell $ src/plugins/git/demoData/synchronizeToGithub.sh -n yarn run v1.5.1 [build output truncated] Build completed; results in 'bin'. Done in 3.30s. Synchronizing: example-git warning: no common commits To github.com:sourcecred/example-git.git + 3507b7c...3715ddf HEAD -> master (forced update) Synchronizing: example-git-submodule Everything up-to-date Done. ``` This reflects the correct state of affairs, because #158 changed the example repository. Note that the `3715ddf` SHA in the output of the above script matches the SHA in the `exampleRepo.test.js.snap` snapshot. wchargin-branch: sync-git-example-repos	2018-04-27 14:03:54 -07:00
William Chargin	f4de3e2067	Standardize environment passed to Git (#159 ) Summary: When we shell out to `git`, we don’t want the end user’s environment variables and Git configuration to influence the results. This commit standardizes those inputs. Standardizing the environment has the side benefit that the `GIT_DIR` environment variable is not set, which means that the test suite will work properly when run from the `exec` step of a Git rebase. Test Plan: Tests pass and snapshots are unchanged. Note that ```shell $ git rebase HEAD --exec 'CI=1 yarn test' ``` works after this commit but not before it. wchargin-branch: standardize-git-environment	2018-04-27 11:50:32 -07:00
William Chargin	c7235f6e49	Remove superfluous commas from example repo README (#158 ) Summary: Using `array.join()` added commas at the start of some lines; I meant to use `array.join("")`. (I’ve now inspected the full generated contents of both repos, and they look good.) Test Plan: It is expected that these attributes of the snapshots should change. There’s no need to carefully check the SHAs. wchargin-branch: readme-change	2018-04-27 11:48:40 -07:00
Dandelion Mané	8e9ddbf9fc	Add `Graph.getAdjacentEdges` for in and out edges (#157 ) Some consumers of the graph may prefer to treat it as an undirected graph. For example, when finding the author of an issue, it is wholly sufficient to find an edge with the `AUTHORS` type; the caller may prefer not to be bothered with remembering which end of the `AUTHORS` end is considered the `src` and which is the `dst`. The `getAdjacentEdges` call enables that, by combining the output of `getInEdges` and `getOutEdges`. Test plan: The new tests are pretty comprehensive.	2018-04-26 20:27:46 -07:00
William Chargin	aa071ceab3	Include a submodule in the main example repository (#156 ) Summary: The main example repository now covers the currently desired features: it has blobs, subtrees, and submodules, and commits that change each of these. (We don’t have merge commits yet—we can add those once we start to care about them.) Once this is merged, I will push the two repositories to GitHub. Test Plan: Verifying and understanding is easier than ever before. You can run the following commands to create the repositories in question on your disk: ```shell $ yarn backend $ node bin/createExampleRepo.js /tmp/repo $ node bin/createExampleRepo.js --submodule /tmp/subrepo ``` You can then explore these repositories at your leisure. For instance, to check that the `loadRepository` snapshot has the right set of commits, inspect the output of the following command: ```shell $ git -C /tmp/repo log --format='%H %T' ``` Or, to check that a particular tree has the right contents, just run: ```shell $ git -C /tmp/repo ls-tree TREE_SHA ``` Verifying the `exampleRepo` snapshot is similarly easy: just check that the lists of commit SHAs in `/tmp/repo` and `/tmp/subrepo` are correct. wchargin-branch: include-submodule	2018-04-26 20:11:44 -07:00
William Chargin	d6e9b0a72b	Add a command-line script to create example repos (#155 ) Summary: We’ll use this to create the repositories on disk and then push them to GitHub. Test Plan: Generate both kinds of repository, and check out the SHAs: ```shell $ yarn backend $ node bin/createExampleRepo.js /tmp/repo $ node bin/createExampleRepo.js --submodule /tmp/repo-submodule $ node bin/createExampleRepo.js --no-submodule /tmp/repo-no-submodule $ # (first and third lines do the same thing) $ git -C /tmp/repo rev-parse HEAD 677b340674bde17fdaac3b5f5eef929139ef2a52 $ git -C /tmp/repo-submodule rev-parse HEAD 29ef158bc982733e2ba429fcf73e2f7562244188 $ git -C /tmp/repo-no-submodule rev-parse HEAD 677b340674bde17fdaac3b5f5eef929139ef2a52 ``` Then, note that these SHAs are expected per the snapshot file in `exampleRepo.test.js.snap`. wchargin-branch: create-example-repo-command	2018-04-26 19:53:46 -07:00
William Chargin	28a118c814	Create an example submodule repository (#153 ) Summary: We want our main repository to include submodules so that we can test submodule support. Here, we create a repository to be included as a submodule. wchargin-branch: example-submodule-repository	2018-04-26 19:43:01 -07:00
William Chargin	75fd068a35	Extract code to create the example repository (#152 ) Test Plan: Note that the snapshot change is simply a move: no SHAs were changed. wchargin-branch: extract-example-repository-code	2018-04-26 19:38:29 -07:00
William Chargin	6f9941b526	Clean up temporary directories in tests (#151 ) Summary: The `loadRepository` test tries to clean up temporary directories, but failed to do so because the directories were not empty. The cleanup hook threw an error, but this error was silenced by Jest due to [a known bug][1] that was fixed a few days ago. We can fix this by asking `tmp` to clean up directories even if they are not empty, using the `unsafeCleanup` option. [1]: https://github.com/facebook/jest/issues/3266 Test Plan: While running `watch -n 0.1 'ls /tmp \| grep "tmp-.*" \| wc -l'`, run tests. Note that the number increases by five and then drops down again; before this patch, it would increase by 5 and then stay there. wchargin-branch: clean-up-tmpdirs	2018-04-26 19:34:23 -07:00
William Chargin	3679529bef	Move `localGit`/`GitDriver` into Git utils (#150 ) Summary: A few reasons for this: 1. This _is_ a utility, so it makes sense semantically. 2. This unifies the utilities API; clients like `loadRepository.test` don’t have to keep around both a `git` and a `gitUtils`. 3. Most importantly, further scripts and tests shouldn’t depend on `loadRepository` just for `localGit`. Depending on `gitUtils` makes much more sense. (Note that `makeUtils` is no longer dependency-injectable, but that’s okay; I considered this and favored YAGNI on this one.) Test Plan: Existing unit tests pass. wchargin-branch: move-localgit	2018-04-26 19:31:27 -07:00
William Chargin	dad8777e6c	Add utility functions for working with Git repos (#149 ) Summary: Utilities like `deterministicCommit` provide valuable functionality that we will want to use in other scripts and perhaps other test cases. It makes sense to factor these out into utility functions. Test Plan: Existing tests pass. wchargin-branch: git-utils	2018-04-26 19:17:16 -07:00
Dandelion Mané	087d8d561e	Enable type filtering on Graph.get{In,Out}Edges (#147 ) This commit adds an optional `typeOptions` argument to Graph.getInEdges and Graph.getOutEdges. The `typeOptions` allow filtering the returned edges by the type of the edge, and the type of the node that the edge is connected to. This makes it much easier to use these methods to find connections that have a certain relationship, e.g. finding the author of a commit or the comments on an issue. Test plan: A new test suite was written that comprehensively tests this behavior, both for getInEdges and getOutEdges.	2018-04-26 19:09:37 -07:00
Dandelion Mané	c635034dab	Add richer types to graphDemoData (#146 ) In preparation for using type info in the Graph apis, it is helpful to have richer type info in the graph demo data. Test plan: Check that the snapshot changes only consist of type changes, and that CI passes.	2018-04-26 16:44:34 -07:00
Dandelion Mané	18ce9982d2	Refactor GH parser to expose a functional api (#145 ) Our GitHub parser is implemented via a `GithubParser` class which builds the GitHub graph. This is a convenient implementation, but an awkward API. This commit refactors the module so that it exposes a clean `parse` function, which ingests the GitHub JSON data and returns as completed graph. Test plan: The unit tests have been re-written to use the new public API. All the snapshots are unchanged, and flow passes. Additionally, I ran `yarn start` and verified that the GithubGraphFetcher for the Artifact plugin is still working.	2018-04-25 19:39:28 -07:00
William Chargin	5e4b7b1fcc	Extract repository types to a separate module (#141 ) Summary: We should be able to get the types without depending on the function to load a Git repo from disk, and in particular without depending on `child_process`. Test Plan: Flow and tests are sufficient. wchargin-branch: extract-repository-types	2018-04-25 14:35:45 -07:00
Dandelion Mané	ad56ba087c	Use urls as ids for GitHub nodes (#144 ) There's some context at #127, in which I initially proposed this change. In addition to the long-term benefits described in #127, there is a short-term benefit which is that it makes snapshot tests easier to read, because the GitHub ids are opaque and unreadable, while the GitHub urls are relatively easy to parse. This change results in significant snapshot churn.	2018-04-25 14:28:06 -07:00
Dandelion Mané	4f857a8bb1	Add return type info for fetchGithubRepo (#143 ) Once the type was added, flow correctly discovered a bug in GithubGraphFetcher.js, which resulted in broken graph fetching in the ArtifactEditor. Oops! / Good work Flow! I made `ensureNoMorePages` expect the result it is testing to be an `any`, which is appropriate for how the function is written (i.e. it is written in a way that is agnostic to the actual result).	2018-04-24 16:00:50 -07:00
Dandelion Mané	8fdfacd097	fetchGithubRepo: remove vestigial data field (#142 ) The example-repo.json file is regenerated with large diffs due to the change in indentation level throughout the file. Test plan: Sanity check the snapshot (close inspection is unnecessary due to the simplicity of the code change). Check that CI pases.	2018-04-24 15:46:16 -07:00
Dandelion Mané	6a3e4d754c	Add flow types for GitHub graphql query response (#140 ) This commit adds flow typing for the JSON result from hitting the GitHub graphql api. We can't prove that the flow typing is correct, but since the type definition is colocated with the corresponding fragment definitions, we can hope that maintainers will maintain both together. We update the parser to consume the new flow types. There are no flow errors. Test plan: Inspect the flowtypes, verify that they correspond to the data in example-repo.json, and that there are no flow errors.	2018-04-24 14:05:51 -07:00
William Chargin	418b745d7c	Load Git repositories into memory (#139 ) Summary: In this newly added module, we load the structural state of a git repository into memory. We do not load into memory the contents of any blobs, so this is not enough information to perform any analysis requiring file diffing. However, it is sufficient to develop a notion of “this file was changed in this commit”, by simply diffing the trees. Test Plan: Unit tests added; `yarn test` suffices. Reading these snapshots is pretty easy, even though they’re filled with hashes: - First, read over the commit specifications on lines 69–83 of `loadRepository.test.js`, so you know what to expect. - In the snapshot file, keep handy the time-ordered list of commit SHAs at the bottom of the file, so that you know which commit SHA is which. - To verify that the large snapshot is correct: for each commit, read the corresponding tree object and make sure that the structure is correct. - To verify the small snapshot, just check that it’s the correct subset of the large snapshot. - If you want to verify that the SHA for a blob is correct, open a terminal and run `git hash-object -t blob --stdin`; then, enter the content of the blob and press `<C-d>`. The result is the blob SHA. To run a sanity-check on a large repository: apply the following patch: <details> <summary>Patch to print out statistics about loaded repository</summary> ```diff diff --git a/config/paths.js b/config/paths.js index d2f25fb..8fa2023 100644 --- a/config/paths.js +++ b/config/paths.js @@ -62,5 +62,6 @@ module.exports = { fetchAndPrintGithubRepo: resolveApp( "src/plugins/github/bin/fetchAndPrintGithubRepo.js" ), + loadRepository: resolveApp("src/plugins/git/loadRepository.js"), }, }; diff --git a/src/plugins/git/loadRepository.js b/src/plugins/git/loadRepository.js index a76b66c..9380941 100644 --- a/src/plugins/git/loadRepository.js +++ b/src/plugins/git/loadRepository.js @@ -106,3 +106,7 @@ function findTrees(git: GitDriver, rootTrees: Set<Hash>): Tree[] { } return result; } + +const result = loadRepository(...process.argv.slice(2)); +console.log("commits", result.commits.size); +console.log("trees", result.trees.size); ``` </details> Then, run `yarn backend` and put the following script in `test.sh`: <details> <summary>Contents for `test.sh`</summary> ```shell #!/bin/bash set -eu repo="$1" ref="$2" via_node() { node bin/loadRepository.js "${repo}" "${ref}" } via_git() ( cd "${repo}" printf 'commits ' git rev-list "${ref}" \| wc -l printf 'trees ' git rev-list "${ref}" \| while read -r commit; do git rev-parse "${commit}^{tree}" git ls-tree -rt "${commit}" \ \| grep ' tree ' \ \| cut -f 1 \| cut -d ' ' -f 3 done \| sort \| uniq \| wc -l ) echo printf 'Running directly via git...\n' time a="$(via_git)" echo printf 'Running Node script...\n' time b="$(via_node)" diff -u <(cat <<<"${a}") <(cat <<<"${b}") ``` </details> Finally, run `./test.sh /path/to/some/repo origin/master`, and verify that it exits successfully (zero diff). Here are some timing results on SourceCred and TensorBoard: - SourceCred: 0.973s via Node, 0.327s via git. - TensorBoard: 30.836s via Node, 6.895s via git. For TensorFlow, running via git takes 7m33.995s. Running via Node fails with an out-of-memory error after 39 minutes, with 10GB RAM and 4GB swap. See details below. <details> <summary> Full timing details, commit SHAs, and OOM error message </summary> ``` + ./test.sh /home/wchargin/git/sourcecred `01634aabcc` Running directly via git... real 0m0.327s user 0m0.016s sys 0m0.052s Running Node script... real 0m0.973s user 0m0.268s sys 0m0.176s + ./test.sh /home/wchargin/git/tensorboard 7aa1ab9d60671056b8811b7099eec08650f2e4fd Running directly via git... real 0m6.895s user 0m0.600s sys 0m0.832s Running Node script... real 0m30.836s user 0m3.216s sys 0m10.588s + ./test.sh /home/wchargin/git/tensorflow 968addadfd4e4f5688eedc31f92a9066329ff6a7 Running directly via git... real 7m33.995s user 5m21.124s sys 1m5.476s Running Node script... FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory 1: node::Abort() [node] 2: 0x121a2cc [node] 3: v8::Utils::ReportOOMFailure(char const, bool) [node] 4: v8::internal::V8::FatalProcessOutOfMemory(char const, bool) [node] 5: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [node] 6: v8::internal::DeoptimizationInputData::New(v8::internal::Isolate, int, v8::internal::PretenureFlag) [node] 7: v8::internal::compiler::CodeGenerator::PopulateDeoptimizationData(v8::internal::Handle<v8::internal::Code>) [node] 8: v8::internal::compiler::CodeGenerator::FinalizeCode() [node] 9: v8::internal::compiler::PipelineImpl::FinalizeCode() [node] 10: v8::internal::compiler::PipelineCompilationJob::FinalizeJobImpl() [node] 11: v8::internal::Compiler::FinalizeCompilationJob(v8::internal::CompilationJob) [node] 12: v8::internal::OptimizingCompileDispatcher::InstallOptimizedFunctions() [node] 13: v8::internal::Runtime_TryInstallOptimizedCode(int, v8::internal::Object*, v8::internal::Isolate) [node] 14: 0x12dc8b08463d ``` </details> wchargin-branch: load-git-repositories # Please enter the commit message for your changes. Lines starting # with '#' will be kept; you may remove them yourself if you want to. # An empty message aborts the commit. # # Date: Mon Apr 23 23:02:14 2018 -0700 # # HEAD detached at origin/wchargin-load-git-repositories # Changes to be committed: # modified: package.json # new file: src/plugins/git/__snapshots__/loadRepository.test.js.snap # new file: src/plugins/git/loadRepository.js # new file: src/plugins/git/loadRepository.test.js # # Untracked files: # out # runtests.sh # src/plugins/artifact/editor/ArtifactSetInput.js # src/plugins/git/repository.js # test.sh # todo #	2018-04-24 13:57:10 -07:00
Dandelion Mané	515c577825	Split github/githubPlugin.js into three files (#138 ) github/githubPlugin.js was growing ungainly - it contained two major pieces: all of the node and edge types, and the GitHub parser. As I contemplated adding a third major new section of logic (an easy-to-use api for traversing the GitHub graph, with first class support for comments, authorship, etc), I found the prospect of adding even more into that file quite unappealing. So, I have instead split it into three files: * github/pluginName.js: Exports the plugin name. * github/types.js: Exports types of nodes, edges, and their payloads * github/parser.js: Exports `GithubParser` No logic has been changed whatsoever - this is purely a rename-refactor. Test plan: CI still passes. I manually verified that the Artifact Editor can still load and display GitHub data.	2018-04-24 09:30:22 -07:00
Dandelion Mané	01634aabcc	Add url to author payloads (#137 ) We have urls for all the author types, so for consistency across GitHub payloads, I am adding urls to the author paylods. Test plan: Check that snapshot changes consist entirely of adding urls, and that the urls are appropraite.	2018-04-23 18:30:19 -04:00
Dandelion Mané	e191614dd4	Find GitHub username references (#136 ) Add logic to findReferences for finding GitHub username references, e.g. "Hello, @wchargin!". The API is unchanged. Test plan: There are new unit tests that verify this behavior works as expected.	2018-04-23 18:13:21 -04:00
Dandelion Mané	d9d83c1157	Rename parseReferences.js to findReferences.js (#135 ) This is consistent with the single export (findReferences.js now exports `findReferences`)	2018-04-23 17:59:00 -04:00
Dandelion Mané	240f05899c	Simplify GH reference handling (#130 ) Currently, every type of reference has its own type signature: numeric references are returned as numbers, url references are a complicated object containing url parts, and so forth. Since ultimately the references are just strings, it makes more sense to treat references as plain strings. This allows a much simpler implementation of reference edge creation in the GitHub plugin. It also results in a simpler API for the parseReferences file (it only exports a single findReferences function). Test plan: Verify that the updated tests encode appropriate behavior.	2018-04-23 17:49:09 -04:00
William Chargin	e0a5118f8d	Switch typed dispatch table to `empty` assertion (#133 ) Summary: This replaces the implementation of a static check from a somewhat complicated use of higher-order types to a more simple empty-union assertion, as suggested by jez in “Case Exhaustiveness in Flow”: https://blog.jez.io/flow-exhaustiveness/ (I know; we’re not using Reason. One step at a time. :-) ) I adapted the implementation a bit because I prefer explicitly disabling an ESLint warning over a no-op function call; it is not clear from the latter that the purpose is to suppress a lint warning. Test Plan: In `githubPlugin.js`, add `\| "ANOTHER"` to the `NodeType` type, and note a compile-time Flow error on the appropriate line, with a very readable error message. Note that all unit tests pass, and running the UI on `sourcecred/sourcecred` yields correct titles for each node type present (namely, all node types except for `ORGANIZATION` and `BOT`). wchargin-branch: empty-union-assertion	2018-04-23 13:12:44 -07:00
Dandelion Mané	1e311c59f4	Add README and update flow for GitHub demo data (#129 ) Keeping the GitHub demo data up-to-date is important, and there isn't good documentation for how to do that. This commit adds a short README.md for the demo data, and adds an update flag to fetchGithubRepoTest.sh that can be used to easily update it. Test plan: Modify example-repo.json (e.g. by deleting it entirely). Run fetchGithubRepoTest.sh -u and confirm that the data was regenerated without change. Run fetchGithubRepoTest.sh and confirm the test passes. Note: The end cursor is sensitive to the timezone, which seems to be cached with the GitHub token. An erroneous switch to Israel timezone made it into master; this commit reverts back to US/Pacific.	2018-04-23 14:21:01 -04:00
Dandelion Mané	d025e78b1d	Get urls for all GitHub objects (#128 ) This commit modifies our GitHub graphql query so that we request urls for all objects (e.g. users, pull requests, pull request review comments). Some change along these lines is necessary so that we can correctly represent URL reference edges to e.g. issue comments. (It might be possible to do without by reverse-enginering from the ids, but we are resolved to treat ids as opaque). Strictly speaking, we don't need to collect urls for users, issues, and pull requests - they are generated via simple schema. However, for consistency, I think it's better to just take URLs on everything. Test plan: The example-repo.json has been regenerated. The diffs are as expected.	2018-04-20 10:50:03 -04:00
William Chargin	9d1200275e	Implement exhaustive fetching for our GitHub query (#123 ) Summary: This commit completes the ad hoc pagination solution described in #117: we implement pagination specifically for our current query against the GitHub API. This is done in such a way that reasonable additions to the query will not be hard to implement—for instance, if we want to fetch a new kind of field, the marginal cost is at most a bit of extra copy-and-paste and some modifications to tests. However, we should certainly still plan to implement the fully automatic pagination system described in #117. Running on TensorBoard with the default page limits takes 30–33 seconds over 7 queries, uses 103 GitHub API points (out of 5000 for any given hour), and produces a JSON file that is 8.7MB (when pretty-printed). This all seems quite reasonable to me. Test Plan: Extensive unit tests added. The snapshots are quite readable, by design. For a real-world test, you can `yarn start` the artifact viewer and use the GUI to fetch the data for tensorflow/tensorboard. To demonstrate that the fetching process gives the same results regardless of the page size, follow these steps: 1. In `fetchGithubRepo.js`’s `postQuery` function, insert a new statement `console.error("Posting query...")` at the beginning. (It is important to print to stderr instead of stdout.) 2. Run `yarn backend` and then invoke `fetchGithubRepo.js` on a repo large enough to require pagination, like SourceCred or TensorBoard. Pipe the result to `shasum -a 256` and note the SHA. 3. In `github/graphql.js`, change the page size constants near the top of the file. Re-run step 2. The number of queries that are posted will vary significantly as the page size constants vary, but the checksum should remain the same. 4. Repeat until satisfied. (I tried three sets of values: the standard values, the standard values but all divided by 10, and all 5s.) wchargin-branch: ad-hoc-pagination	2018-04-06 07:29:55 -07:00
William Chargin	e82b56e52c	Implement a function to merge query results (#122 ) Summary: Once we execute the root query, find continuations, embed the continuations into queries, and execute the continuation query, we will need to merge the continuations’ results back into the root results. This commit adds a function `merge` that will be suitable for doing just that. Test Plan: New unit tests added, with 100% coverage. Run `yarn test`. wchargin-branch: merge-query-results	2018-04-05 02:27:03 -07:00
William Chargin	751172ea77	Create pagination continuations for GitHub query (#121 ) Summary: Per #117, this is a first step toward at writing a pagination API that specifically targets our current GitHub query. For design details, see new module docs on `src/plugins/github/graphql.js`. This commit modifies the core GitHub query and thus the `example-repo.json` snapshot: we now request `endCursor` fields for all pagination info, and we request the `id` of the root `repository` field. The former is obviously necessary. The latter is necessary for the repository to be consistent with other nodes that offer connections as fields: we require an ID on the node containing the connection so that we can have random access to it in a continuation selector. Test Plan: Unit tests added. You can also try out the generated continuation queries for yourself: apply the patch below, run `yarn backend`, and then run the `fetchGithubRepo.js` script on `sourcecred/sourcecred`. This will output a nicely formatted query that you can paste directly into GitHub’s API explorer and execute. (Note that, because this patch is not fully polished, the query must be run against a repository that has a continuation for every node type: more pages of issues, PRs, comments, reviews, and review comments. This is due to an easy-but-annoying-to-fix bug in the patch, not in the code included in this commit.) <details> <summary>Patch for generating a continuations query</summary> ```diff diff --git a/src/plugins/github/fetchGithubRepo.js b/src/plugins/github/fetchGithubRepo.js index 789a20e..418c736 100644 --- a/src/plugins/github/fetchGithubRepo.js +++ b/src/plugins/github/fetchGithubRepo.js @@ -6,8 +6,13 @@ import fetch from "isomorphic-fetch"; -import {stringify, inlineLayout} from "../../graphql/queries"; -import {createQuery, createVariables} from "./graphql"; +import {stringify, inlineLayout, multilineLayout} from "../../graphql/queries"; +import { + continuationsFromQuery, + continuationQuery, + createQuery, + createVariables, +} from "./graphql"; /** * Scrape data from a GitHub repo using the GitHub API. @@ -66,8 +71,13 @@ function postQuery(payload, token) { if (x.errors) { return Promise.reject(x); } - ensureNoMorePages(x); - return Promise.resolve(x); + console.log( + stringify.body( + continuationQuery(Array.from(continuationsFromQuery(x.data))), + multilineLayout(" ") + ) + ); + throw new Error("STOPSHIP"); }); } diff --git a/src/plugins/github/graphql.js b/src/plugins/github/graphql.js index 9ea2592..9ead42b 100644 --- a/src/plugins/github/graphql.js +++ b/src/plugins/github/graphql.js @@ -39,11 +39,11 @@ import {build} from "../../graphql/queries"; * * [1]: https://developer.github.com/v4/guides/resource-limitations/#node-limit / -export const PAGE_LIMIT = 100; -const PAGE_SIZE_ISSUES = 100; -const PAGE_SIZE_PRS = 100; -const PAGE_SIZE_COMMENTS = 20; -const PAGE_SIZE_REVIEWS = 10; +export const PAGE_LIMIT = 10; +const PAGE_SIZE_ISSUES = 10; +const PAGE_SIZE_PRS = 10; +const PAGE_SIZE_COMMENTS = 3; +const PAGE_SIZE_REVIEWS = 1; const PAGE_SIZE_REVIEW_COMMENTS = 10; /* @@ -340,6 +340,36 @@ function* continuationsFromReview( } } +/** + * Combine continuations into a query. + / +export function continuationQuery( + continuations: $ReadOnlyArray<Continuation> +): Body { + const nonces: string[] = continuations.map((_, i) => `_n${String(i)}`); + const nonceToIndex = {}; + nonces.forEach((n, i) => { + nonceToIndex[n] = i; + }); + const b = build; + const query = b.query( + "Continuations", + [], + continuations.map((continuation, i) => + b.alias( + nonces[i], + b.field( + "node", + {id: b.literal(continuation.enclosingNodeId)}, + continuation.selections.slice() + ) + ) + ) + ); + const body = [query, ...createFragments()]; + return body; +} + /* * These fragments are used to construct the root query, and also to * fetch more pages of specific entity types. ``` </details> wchargin-branch: ad-hoc-pagination-continuations	2018-04-05 02:23:17 -07:00
William Chargin	7711f01b84	Extract paginatable fragments of GitHub query (#120 ) Summary: Any time that we pull fields off a connection object, we may need to repeat the query for subsequent pages. Therefore, such fragments will be shared across multiple queries, and also shared within a query if we need to fetch—say—more issue comments on two or more distinct issues. This is a perfect use case for fragments. This commit refactors the GitHub query to be organized in terms of fragments, without changing the format of the results. (We also take this opportunity to factor the page limits into constants.) Test Plan: After running `yarn backend`, the `fetchGithubRepoTest.sh` test passes. wchargin-branch: extract-github-query-fragments	2018-04-05 02:19:29 -07:00
William Chargin	fbb6ec28db	Extract GitHub GraphQL code to a module (#119 ) Summary: Per #117, we want to develop an ad hoc pagination API written specifically against the query that we use to interact with GitHub. The pagination logic should be separate from the logic to actually fetch the repo, but should be colocated with the query itself, so this commit extricates the query from `fetchGithubRepo.js` into a new module. Test Plan: Existing tests pass, including `fetchGithubRepoTest.sh`. wchargin-branch: extract-github-graphql	2018-04-05 00:08:52 -07:00
William Chargin	806c5e5687	Update example data for @decentralion name change (#118 ) Summary: This was created by re-crawling the GitHub repo via `fetchGithubRepo`, and then updating Jest snapshots. Test Plan: Note that `fetchGithubRepoTest.sh` passes, so the data is now up to date. Inspect the snapshot, and note that the only changes are to change login names from `dandelionmane` to `decentralion`. To do so automatically: ```bash set -eu diff_contents() { git difftool HEAD^ HEAD --extcmd=diff --no-prompt } ! diff_contents \| grep '^<' \| grep -vF '"dandelionmane"' ! diff_contents \| grep '^>' \| grep -vF '"decentralion"' ``` wchargin-branch: decentralion-data	2018-04-04 23:29:30 -07:00
William Chargin	e6f401df30	Add field aliases to structured GraphQL queries (#116 ) Summary: For pagination, we’ll want to query against multiple entities of the same type. GraphQL uses aliases to facilitate this. This commit adds support for aliases to our GraphQL query DSL. Test Plan: Inspect snapshot changes, and note that `yarn flow` and `yarn test` pass. wchargin-branch: graphql-aliases	2018-03-26 20:54:16 -07:00
William Chargin	2f50aa7364	Rename getAll{Nodes,Edges} to get{Nodes,Edges} (#115 ) Test Plan: Standard `yarn flow` and `yarn test` suffice. wchargin-branch: get-no-all	2018-03-26 20:25:06 -07:00
William Chargin	f93c9a8e42	Allow editing artifact descriptions (#114 ) Summary: It looks like this: ![Screenshot](https://user-images.githubusercontent.com/4317806/37943962-a8352b94-312e-11e8-9523-855a34020709.png) Test Plan: Interaction tests included. Run `yarn test`. wchargin-branch: artifact-descriptions	2018-03-26 20:11:30 -07:00
William Chargin	99f24c420a	Convert ArtifactList to ArtifactGraphEditor (#112 ) Summary: This component now maintains a graph of just artifacts and the edges among them. It owns the state, and notifies its parents of changes with a callback. We treat the graph objects as properly immutable, copying them on each change. So far, descriptions are always the empty string. Test Plan: Interaction tests added. Run `yarn test`. wchargin-branch: artifact-graph-editor	2018-03-26 19:49:21 -07:00
Dandelion Mané	823e7da374	Cleanup GitHub plugin Node/Edge types (#113 ) Update GitHub plugin to respect two new conventions: - Node/Edge types are exported as UPPER_CASE_CONSTANTS - Edge types are always verbs, which can be read as $src verb $dst Test plan: Flow passes. Inspect snapshot changes.	2018-03-26 19:24:14 -07:00
William Chargin	458744e77f	Redefine artifact plugin node and edge semantics (#111 ) Summary: This commit revises our implementations of node and edge types, and specifies the semantics for artifact plugin IDs: we create IDs by slugifying an artifact name and then resolving collisions Test Plan: Unit tests added. Run `yarn flow` and `yarn test`. wchargin-branch: artifact-plugin-node-edge-semantics	2018-03-26 19:17:42 -07:00
William Chargin	e57a16efbd	Allow removing nodes and edges from the graph (#110 ) Summary: Wherein we change the semantics to allow\* dangling edges. This is necessary for plugins that want to update nodes, such as changing a description or other noncritical field. \* (It was technically possible before by abusing `merge`, but now you can just do it.) Paired with @dandelionmane. Test Plan: Extensive tests added. Run `yarn flow` and `yarn test`. wchargin-branch: allow-removing-from-graph	2018-03-26 17:40:19 -07:00
William Chargin	26508051a4	Add a covariant `copy` method on `Graph` (#109 ) Summary: Clients of `Graph` that wish to treat the graph as immutable will benefit from a `copy` method. We should provide it on `Graph` instead of asking clients to reimplement it because it affords us the opportunity to get the type signature right: in particular, copying should allow upcasting of the type parameters, even though `Graph` itself is invariant. Paired with @dandelionmane. Test Plan: Unit tests added. Run `yarn flow` and `yarn test`. To check that downcasting is not allowed, change the types in the new static test case in `graph.test.js` to be contravariant instead of covariant, and note that `yarn flow` fails. wchargin-branch: graph-copy	2018-03-26 13:58:12 -07:00
William Chargin	007cf88172	Separate artifact settings from GitHub graph fetch (#108 ) Summary: We need to know the repo owner and name for purposes other than fetching the GitHub graph: for instance, fetching the `artifacts.json` file that describes the artifact subgraph. It makes sense that these should be settings global to the application. This commit separates a settings component and the original GitHub graph fetcher. This invalidates localStorage; you can manually migrate. Paired with @dandelionmane. Test Plan: Note that the data continues to be stored in localStorage and that it is updated on each keypress. Note that the state is properly passed around: if you change the repository name from `example-repo` to `sourcecred`, e.g., and click “Fetch GitHub graph”, then the proper graph is fetched. wchargin-branch: separate-artifact-settings	2018-03-26 13:26:44 -07:00
Dandelion Mané	d310561b94	Generate Edge ids automatically (#107 ) * Generate Edge ids automatically Adds edgeID in graph, which creates a string id from src and dest. Provided that the plugin only uses edgeID for generating edge ids of that type, these ids will be unique. Modify the GitHub plugin to use edgeID. This allows the code and typesignature to be simpler, and will be more consistent with other plugins. Test plan: Carefully inspect the snapshots.	2018-03-26 12:05:25 -07:00
Dandelion Mané	043c37f9c6	Add a toString and fromString method on addresses (#106 ) * Add a toString and fromString method on addresses The toString and fromString methods use json-stable-stringify, and we've modified other address code to use these methods. As such, a number of the snapshots have changed (ordering). Test plan: Verify the new included unit tests are comprehensive. Inspect the snapshots.	2018-03-26 11:32:25 -07:00
Dandelion Mané	b2c13ac891	[refactor] Move node/edge types to addresses (#105 ) This pull request adds a string type as a field of the address, thus canonicalizing that all Nodes and Edges have a Type. This allows for simpler PluginAdapters, and simpler implementation of the GitHub plugin (as it no longer needs to invent its own mechanism for storing types). I explored making the Address interface generic, with a Type parameter that is a subtype of string. Unfortunately, Flow type resolution seems to have an exponential performance degradation with many subtyped parameters, and adding the extra type parameters to Graph.merge resulted in my flow instance locking. Maybe we can explore adding the subtypes later. In the GitHub plugin, we entirely do away with the NodeIDs, but we still have EdgeIDs. I plan to remove the need for EdgeIDs in a separate PR which will enforce uniqueness of (srcAddress, dstAddress, pluginName, type) tuples, so that explicitly generating IDs for edges will be unnecessary. In the mean time, I bifurcated the makeAddress function in the GitHub plugin to makeEdgeAddress and makeNodeAddress. Test plan: Flow and unit tests all pass. Inspect snapshots, and verify that all the changes are reasonable. Note that since we order by serialized address, adding the type field to address has changed the snapshot order in a few cases. Close #103	2018-03-26 10:36:49 -07:00
William Chargin	8fdf758cb9	Standardize on Enzyme shallow rendering (#104 ) Summary: This commit moves our existing frontend tests to use Enzyme’s shallow rendering API <http://airbnb.io/enzyme/docs/api/shallow.html>. The benefit over also using `react-test-renderer` is simply consistency (the two are functionally equivalent); the benefits over `mount` are that subcomponents cannot contaminate the test state (i.e., you’re only testing one component at a time), that the resulting snapshots are more readable because the root props are not shown, and that the implementation is more efficient. This is a follow-up to #102. In a case where we actually need a full DOM tree, we should still feel free to use `mount`, but we haven’t needed that yet. Test Plan: Verify that the new `ContributionList.test.js.snap` represents the same data as the old one. wchargin-branch: standardize-enzyme-shallow	2018-03-21 18:28:06 -07:00
William Chargin	feac85ad2c	Use Enzyme to test ContributionList dynamics (#102 ) Summary: This is our first dynamic test of a React component! Enzyme looks pretty easy to use to me, for both snapshot tests and interaction simulation. In doing so, we catch a minor bug in the edge case where a contribution is not owned by any plugin (`colSpan`, not `colspan`). This edge case does not appear in the sample data, but it does appear in the test data, even prior to this commit. The previous renderer, `react-test-renderer`, appears not to surface this error. Furthermore, this bug did not cause any user-visible errors except a `console.error`. Test Plan: Inspect the snapshot file to make sure that it is reasonable. (The existing test case has its snapshot regenerated due to formatting differences between the two renderers.) To test that the browser error is fixed, render a contribution list on a GitHub graph but with an empty adapter set. One way to do this is to comment out line 7 of `standardAdapterSet.js`; alternately, you can use the React Dev Tools to select the `ContributionList` node, then run ```js $r.props.adapters.adapters = {}; $r.forceUpdate(); ``` Note subsequently that there is no console error and that the `<td>`s in question span three columns. wchargin-branch: contributionlist-dynamic-test	2018-03-21 17:35:17 -07:00
William Chargin	5dd5de306c	Allow filtering by type in the contribution list (#101 ) Summary: Filter options are “all contributions” or a specific plugin/type combination. This includes a snapshot test for the static state. I’ll add an interaction test in a subsequent commit. Test Plan: `yarn start`, fetch graph, play with the filtering options. wchargin-branch: filter-contributions	2018-03-20 18:50:25 -07:00
Dandelion Mané	39fd3fa354	Make GitHub capitalization consistent within code (#100 ) * Make GitHub capitalization consistent within code We now never capitalize the H in GitHub within variable or function names. We still capitalize it in comments or user facing strings. Test plan: Unit tests, the fetchGithubRepoTest.sh, and `git grep itHub` only shows comment lines and print statements. * Fix William's klaxon	2018-03-20 18:32:05 -07:00
Dandelion Mané	41cdf2d855	Implement GitHub reference detection (#98 ) This commit only adds logic for finding references in GitHub posts, either by #-numeric reference, or explicit urls. Adding the reference edges to the graph will occur in a followon commit. Test plan: New unit tests are included	2018-03-20 18:10:03 -07:00
William Chargin	61624a5dcf	Extract Aphrodite-management test code to testUtil (#99 ) Summary: Any tests that render Aphrodite-styled React elements will need to do this, so it’s nice to have the code in one place. wchargin-branch: aphrodite-testutils	2018-03-20 17:35:26 -07:00
Dandelion Mané	5b420c6294	Add EdgeTypes type in githubPlugin (#97 )	2018-03-20 15:35:22 -07:00
William Chargin	f02e0610be	Use structured GraphQL query API in GitHub fetcher (#94 ) Summary: In addition to being nicer on the eyes, this enables the query to be statically analyzed (e.g., by an auto-pagination API) and used by other modules. Test Plan: Manually running ```shell $ yarn backend $ GITHUB_TOKEN="<your_token>" src/plugins/github/fetchGitHubRepoTest.sh ``` succeeds. wchargin-branch: use-structured-graphql-queries	2018-03-20 15:29:31 -07:00
William Chargin	ab619432e1	Begin work on contributions and adapters (#93 ) Summary: This commit begins to extend the artifact editor to display contributions. To display contributions from arbitrary plugins, we need to communicate with those plugins somehow. We do so via an adapter interface that plugins implement; included in this commit is an implementation of this interface for the GitHub plugin (partially: we punt on rendering). This includes a snapshot test. The snapshot format is designed to be human-readable and -auditable so that it can serve as documentation. Test Plan: Run the application with `yarn start`. Then, fetch a graph and watch as its contributions appear in the view. wchargin-branch: contributions-and-adapters	2018-03-20 14:26:02 -07:00
William Chargin	e9ca833448	Expose `NodeTypes` from the GitHub plugin (#96 ) Summary: This is useful for metaprogramming. For instance, suppose we have an object like this: ```js const stringifiers = { ISSUE: (stringifyIssue: (Node<IssueNodePayload>) => string), COMMENT: (stringifyComment: (Node<CommentPayload>) => string), ... } ``` How do we type this? We might try ```js {[type: NodeType]: (Node<NodePayload>) => string} ``` but this is not correct, because `Node<IssueNodePayload>` is a subtype of `Node<NodePayload>`, and `(_) => K` is contravariant, not covariant. (In other words, a function from `Node<IssueNodePayload>` is not as general as a function from `Node<NodePayload>`.) We need to express a dependency between the object key and the value. We instead write: ```js type TypedNodeToStringifier = <T: $Values<NodeTypes>>( T ) => (node: Node<$ElementType<T, "payload">>) => string; (stringifiers: $Exact<$ObjMap<NodeTypes, TypedNodeToStringifier>>); ``` This expresses exactly (heh) the right type. Test Plan: Note that removing any of the elements of `NodeTypes` yields a Flow error, due to the static assertion following the type definition. wchargin-branch: node-types	2018-03-20 13:27:08 -07:00
Dandelion Mané	559ed393a9	Update example-repo json and snapshots (#95 ) I added some new issues to sourcecred/example-repo to test unicode support and parsing of extremely long issues titles. This commit merely updates our example-repo json and the corresponding snapshots. Test plan: Run testFetchGithubRepo.sh Run unit tests	2018-03-20 13:18:38 -07:00
Dandelion Mané	02754d2523	GitHub parser now recognizes pull request reviews (#91 ) Also, since there are now two types of things that are being "contained" (comments and pull request reviews), I factored out an addContainment method to avoid repeating that code. To make our handling of PullRequestReviewComments and regular Comments consistent, I modified our query string so that we now request urls on PullRequestReviewComments. Also, since I didn't notice until closely inspecting the snapshot that we had been adding payloads with some undefined properties, I added a test to verify that every property on every node and edge payload is defined. I regenerated the example-repo data to reflect the change to query string. Test plan: Verify that the snapshot changes are appropriate Run standard tests Run `yarn backend` Run `GITHUB_TOKEN={your_token} ./src/plugins/github/fetchGithubRepoTest.sh`	2018-03-20 11:46:46 -07:00
William Chargin	55225fd53e	Save graph fetcher credentials in local storage Test Plan: Make a request, then refresh, and note that the fields are populated. Paired with @dandelionmane. wchargin-branch: graph-fetcher-localstore	2018-03-19 20:06:52 -07:00
William Chargin	5d80e39473	Fetch and parse GitHub graphs on the frontend Summary: This is quick and dirty. No error handling yet. We’ll soon save credentials and repository to local storage. Paired with @dandelionmane. Test Plan: Run `yarn start`, then enter your API key and specify the `sourcecred/example-repo` repo. Note that the resulting graph is shortly logged to the console. wchargin-branch: fetch-parse-github-frontend	2018-03-19 20:06:52 -07:00
William Chargin	0eed384850	Enable creating artifacts in the artifact list Summary: Paired with @dandelionmane. wchargin-branch: create-artifacts	2018-03-19 20:06:52 -07:00
William Chargin	0c3aa9c7ba	Create a view-only artifact viewer Summary: Paired with @dandelionmane. wchargin-branch: create-artifact-viewer	2018-03-19 20:06:52 -07:00
William Chargin	5d042c0008	Use isomorphic-fetch instead of node-fetch Summary: Paired with @dandelionmane. Test Plan: ``` $ CI=true yarn test $ yarn backend $ GITHUB_TOKEN="<your_token>" src/plugins/github/fetchGitHubRepoTest.sh ``` wchargin-branch: isomorphic-fetch	2018-03-19 20:06:52 -07:00
William Chargin	d18cb945af	Add style support to the artifacts app Test Plan: Note that the header, when rendered, is magenta. wchargin-branch: stylish-artifacts	2018-03-19 20:06:52 -07:00
William Chargin	bbecf00615	Repurpose React app as artifact editor (#89 ) Summary: We’ll now start creating the artifact plugin. A large part of this will be the user interface, including a GUI. For now, our build system just builds a single React app, so we’re cannibalizing the main explorer to serve this purpose. Paired with @dandelionmane. Test Plan: The following still work: - `yarn test` - `yarn start` - `yarn build; (cd build; python -m SimpleHTTPServer)` wchargin-branch: repurpose-react-app-as-artifact-editor	2018-03-19 15:25:23 -07:00
William Chargin	8f8d9c4564	Strip down explorer app to a barebones React app (#88 ) Summary: We’re not deleting it because it works with the build system and has the service worker stuff from create-react-app, but we’ll soon repurpose it. Paired with @dandelionmane. Test Plan: The following still work: - `yarn test` - `yarn start` - `yarn build; (cd build; python -m SimpleHTTPServer)` wchargin-branch: dismantle-explorer	2018-03-19 15:09:11 -07:00
William Chargin	ca85fdf234	Reorganize `src/` directory (#87 ) Test Plan: Note that tests still pass, and all changes to snapshot files are verbatim moves. wchargin-branch: reorg	2018-03-19 14:31:50 -07:00
Dandelion Mané	30600004e4	Implement the GitHub graph parser (#81 ) The GitHub parser transforms GraphQL api data from GitHub into our Graph data structure. This commit focuses on properly parsing Issues, Pull Requests, Comments, and Users. Test Plan: Run the unit tests. Inspect the snapshot results (particuarly those for individual pull requests or issues, which are easier to parse) and verify that the output is appropriate.	2018-03-19 12:26:32 -07:00
William Chargin	274007c90d	Configure Webpack for backend applications (#84 ) Summary: Running `yarn backend` will now bundle backend applications. They’ll be placed into the new `bin/` directory. This enables us to use ES6 modules with the standard syntax, Flow types, and all the other goodies that we’ve come to expect. A backend build takes about 2.5s on my laptop. Created by forking the prod configuration to a backend configuration and trimming it down appropriately. To test out the new changes, this commit changes `fetchGitHubRepo` and its driver to use the ES6 module system and Flow types, both of which are properly resolved. Test Plan: Run `yarn backend`. Then, you can directly run an entry point via ``` $ node bin/fetchAndPrintGitHubRepo.js sourcecred example-repo "${TOKEN}" ``` or invoke the standard test driver via ```shell $ GITHUB_TOKEN="${TOKEN}" src/backend/fetchGitHubRepoTest.sh ``` where `${TOKEN}` is your GitHub authentication token. wchargin-branch: webpack-backend	2018-03-18 22:43:23 -07:00
William Chargin	291dcb17c3	Add a GraphQL structured query format (#77 ) Summary: See motivation in #76. Feel free to look at the new snapshot file to inspect the structured representation and also the stringified output. This implementation is sufficient to encode our query against the GitHub v4 API; see the test plan below. Test Plan: Unit tests added; run `yarn flow && yarn test`. This code has full coverage except for lines 260, 315, and 380 of `queries.js`; these lines check invariants that should never be violated. You can also use the following steps to verify that the sample query is valid GraphQL that produces the same results as our hand-written query: 1. Apply the following hacky patch: ```diff diff --git a/src/backend/graphql/queries.test.js b/src/backend/graphql/queries.test.js index 52bdec7..c04a636 100644 --- a/src/backend/graphql/queries.test.js +++ b/src/backend/graphql/queries.test.js @@ -3,6 +3,18 @@ import type {Body} from "./queries"; import {build, stringify, multilineLayout, inlineLayout} from "./queries"; +function emitGitHubQuery(layout, filename) { + const fs = require("fs"); + const path = require("path"); + const result = stringify.body(usefulQuery(), layout); + const outputFilepath = path.join(__dirname, "..", filename); + const outputText = `module.exports = ${JSON.stringify(result)};\n`; + fs.writeFileSync(outputFilepath, outputText); + console.log(`Wrote output to ${outputFilepath}.`); +} +emitGitHubQuery(multilineLayout(" "), "githubQueryMultiline.js"); +emitGitHubQuery(inlineLayout(), "githubQueryInline.js"); + describe("queries", () => { describe("end-to-end-test cases", () => { const testCases = { ``` 2. Run `CI=true yarn test`, and verify that the following two files written to `src/backend/` contain appropriate contents. You can just eyeball them, or check that they match my results: https://gist.github.com/wchargin/f37b99fd4ec345c9d2541c2dc53ceda9 3. In `fetchGitHubRepo.js`, change the definition of `const query` to ```js const query = require("./githubQueryMultiline.js"); ``` Run ```shell GITHUB_TOKEN="<your_token_here>" src/backend/fetchGitHubRepoTest.sh ``` and verify that it exits successfully. 4. Repeat for `require("./githubQueryInline.js")`. wchargin-branch: graphql-structured-queries	2018-03-18 22:35:20 -07:00
William Chargin	1083540d21	Parameterize `Graph` over node and edge payloads (#83 ) Summary: Closes #82. This affords clients type-safety without needing to verbosely annotate every node or edge passed into the graph functions. It also enables graph algorithms to be more expressive in their types: for instance, the merge function now clearly indicates from its type that the first graph’s nodes are passed as the first argument to the node reducer, and the second graph’s nodes to the second. Clients can upgrade immediately by using `Graph<, >`. Thankfully, Flow supports variance well enough for this all to be possible without too much trouble. Test Plan: Existing unit tests pass statically and at runtime. I added a test case to demonstrate that merging works covariantly. To see some failures, change `string` to an incompatible type, like `number`, in the definitions of `makeGraph` in test functions for conservatively rejecting graphs with conflicting nodes/edges (ll. 446, 462). wchargin-branch: parameterize-graph	2018-03-17 11:36:53 -07:00
Dandelion Mané	894d6a2291	Allow adding explicitly typed nodes/edges (#80 ) Summary: Flow doesn’t allow us to specify variance annotations in generic function parameters, and doesn’t allow coercing `Node<T>` to `Node<mixed>`. This forces us to put `any`s in our code, which…works. Paired with @dandelionmane. Test Plan: New unit tests trivially pass dynamically, and now pass statically (failing before the changes to `graph.js`). wchargin-branch: explicitly-typed-nodes-edges	2018-03-17 00:28:33 -07:00
Dandelion Mané	1e791782d5	Allow redundant adds to the Graph (#79 ) Graph.addNode and Graph.addEdge now allow adding the same node or edge multiple times, provided that the duplicate adds are trying to insert identical content. This came up while prototyping the GitHub plugin; rather than create myriad subgraphs and merge them, I found it convenient to construct a single graph and iteratively add nodes. Since the same node may be discovered multiple times (most notably user identities), there was a need for a "conservative add" abstraction that adds a node if it doesn't exist yet, but errors only if multiple adds conflict. Since this behavior is generic and highly conservative, it seemed appropriate to include in the graph class itself. Test Plan: The unit tests have been updated to include the new behavior.	2018-03-17 00:28:17 -07:00
Dandelion Mané	d2501947a6	Regenerate example-repo testdata (#78 ) I moved sourcecred/tiny-example-repository to sourcecred/example-repo as it's simpler to remember. I also unarchived it and added comments to an issue, so that we can create a simple test for issue parsing. This commit merely updates SourceCred to point to example-repo with the regenerated canoncial output.	2018-03-16 11:22:17 -07:00
Dandelion Mané	0e57b42095	Fetch GitHub repos using the GitHub v4 API (#75 ) Summary: It’s a whole new world of GraphQL! Our parser is now just a GraphQL query that asks for exactly what we want and dumps it to a file. The data exposed by the v4 API is also in a much nicer format than that of the v3 API, so this is pretty much a universal improvement. Currently, we do not handle pagination. We require that the repository in question have fewer than a fixed number of issues, and comments per issue, and reviews per PR, and review comments per PR, and so on. If this limit is exceeded, the script will fail-fast with a nice error message. To fix this, we’ll need to write a general-purpose pagination API that allows traversing cursors at any level of the query. Paired with @wchargin. Test Plan: Run $ GITHUB_TOKEN="your_token_here" src/backend/fetchGitHubRepoTest.sh and verify that it exits with 0. Note that if you change this script’s repository from `tiny-example-repository` to `sourcecred`, the script correctly fails and outputs a useful diff. wchargin-branch: github-v4-graphql	2018-03-15 14:56:25 -07:00
Dandelion Mané	fb00c35823	Factor eventide graph demo data to a new module (#71 ) * Factor evertide graph demo data to a new module It would be helpful to make our standard tiny graph available to other test and demo instances, outside of just graph.test.js. This way we can use it as a test case for the Graph Explorer.	2018-03-05 20:58:47 -08:00
Dandelion Mané	7ea8bdd964	Make App.js into skeleton for GraphExplorer (#70 ) * Make App.js into skeleton for GraphExplorer We make a very basic skeleton for the Graph Explorer as a basis for future development. This commit also removes the UserExplorer and FileExplorer from App.js. Since we have changed the underlying data model, we are unlikely to use the UserExplorer or FileExplorer in anything like their current state, so they are effectively deprecated. I am deferring removing them because it is nice to have some examples of working React code to copy from, before the Graph Explorer is ready. Test plan: run `yarn start`, and observe that the App displays the words "Graph Explorer" underneath the "SourceCred Explorer" title bar.	2018-03-05 20:09:24 -08:00
William Chargin	5960eab6c1	Make `Graph` serializable (#69 ) Summary: This commit adds `toJSON()` and `static fromJSON()` on `Graph`. The main benefit at this time is that this gets us free interoperability with Jest’s snapshot testing. The implementation of `fromJSON` is not performance-tuned, and could probably be significantly optimized. See #65 for discussion. Test Plan: New unit tests added: `yarn flow && yarn test`. wchargin-branch: make-graph-serializable	2018-03-05 16:22:58 -08:00
William Chargin	cee90fd10f	Use `AddressMap` in `Graph` (#68 ) Summary: This commit simplifies the implementation of `Graph` without changing its interface. We now use the `AddressMap` for all four instance fields of `Graph`. Test Plan: All existing tests pass, and coverage is maintained. wchargin-branch: use-address-map-in-graph	2018-03-05 16:20:52 -08:00
William Chargin	a8da44c94b	Create an `AddressMap` abstraction (#67 ) Summary: This commit reifies the concept of an `Addressable`, which is any object that has a covariant `address: Address` attribute, and implements a simple data structure for storing addressable items keyed against their addresses. Instances of `AddressMap` can replace the four fields of `Graph`: ```js _nodes: AddressMap<Node<mixed>>; _edges: AddressMap<Edge<mixed>>; _outEdges: AddressMap<{\|+address: Address, +edges: Address[]\|}>; _inEdges: AddressMap<{\|+address: Address, +edges: Address[]\|}>; ``` Test Plan: New unit tests included, with 100% coverage: `yarn flow && yarn test`. wchargin-branch: address-map	2018-03-05 16:18:20 -08:00
William Chargin	a798f9bac2	Rewrite GitHub plugin payload type system (#64 ) Summary: We’re stripping down the payload types for the GitHub plugin, to only include what we expect to use immediately. In doing so, we take the opportunity to make the typing a little stronger, so that we can ensure that the `type` field of a specific type of payload is set to a particular constant. Paired with @dandelionmane. Test Plan: Adding these lines to `githubPlugin.js` and running `yarn flow` indicates that the typechecking is working as expected: ```js ("ISSUE" : NodeType); // works ("WEIRD" : NodeType); // fails ("AUTHORSHIP" : EdgeType); // works ("UNEXPECTED" : EdgeType); // fails ``` wchargin-branch: github-plugin-payload-types	2018-03-03 15:17:11 -08:00
William Chargin	50c575b2f9	Explicitly test address↔string function error case (#63 ) Summary: `graph.js` coverage is now 100% :-) Test Plan: `yarn jest --env=jsdom --coverage` shows no uncovered lines for `graph.js`, and no failing tests. wchargin-branch: coverage-gremlin	2018-03-03 15:10:57 -08:00
William Chargin	9b203e8489	Add graph merge functions (#62 ) Summary: Merging graphs will be a common operation. At a per-plugin level, it will often be useful to build up graphs by creating many very small graphs and then merging them together. At a cross-project level, we will need to merge graphs across repositories to gain an understanding of how value flows among these repositories. It’s important that the core graph type provide useful functions for merging; this commit adds them. Test Plan: New unit tests added; run `yarn flow && yarn test`. wchargin-branch: graph-merge	2018-03-02 21:35:51 -08:00
William Chargin	82dbf64a2c	Add an equality function for `Graph` (#61 ) Summary: We need this for testing graph equality: deep-equality is not sufficient because two graphs can be logically equal even if, say, two nodes are added in different orders. This commit adds a dependency on `lodash.isequal` for deep equality. Test Plan: New unit tests added. Run `yarn flow && yarn test`. wchargin-branch: graph-equals	2018-03-02 21:13:30 -08:00
William Chargin	5a2380d486	Move `addNode`/`addEdge` tests away from getters (#60 ) Summary: Nothing big; these were just organized wrong. wchargin-branch: test-reorg	2018-03-02 20:46:35 -08:00
William Chargin	58410c62fa	Replace lingering `mealGraph`s in test case (#57 ) Summary: In merging #54, there was a semantic merge conflict that was not also a textual merge conflict; this created a failure that only appeared once that commit was merged. We propose that to fix this in the future, we only merge commits that are directly ahead of master. Test Plan: This fixes `yarn flow` and `yarn test`. wchargin-branch: fix-merge-conflict	2018-03-02 14:16:51 -08:00
William Chargin	97446138ab	Make `Address`, `Node`, `Edge` read-only and exact (#56 ) Summary: Again: we assume these invariants, so we may as well encode them. We should just keep in mind that non-Flow users may wantonly violate these, so we should still code defensively. wchargin-branch: readonly-exact	2018-03-02 13:49:34 -08:00
William Chargin	f305a48391	Check for `null`/`undefined` in graph functions (#55 ) Summary: These will make nicer error functions in cases where static analysis doesn’t detect the pollution: e.g., a user isn’t using Flow, or an expression like `arr[0]` introduces an `undefined`. Paired with @dandelionmane. Test Plan: New unit tests added. Run `yarn test`. wchargin-branch: null-undefined-check	2018-03-02 13:47:13 -08:00
Dandelion Mané	ca3502009b	Create an 'advancedMealGraph' test case (#54 ) Create an 'advancedMealGraph' test case The advancedMealGraph will be a grab-all that holds all advanced and edge behaviors, e.g. the crab-self-referential loop, and the case where there are multiple directed edges between the same two nodes. Aggregating them into one test case will make it easier to test more complex behaviors, like graph merging and serialization, on the edge case graphs. However, it's still nice to have the simple graph so that we can test simple things too. The specific tests for edge case behavior are left mostly unchanged, in that they start from the simple graph and add just the advanced feature that they want to test.	2018-03-02 13:45:52 -08:00
William Chargin	cae3a92dc9	Add `getAllNodes` and `getAllEdges` functions (#53 ) Summary: Without these functions, it is not possible to meaningfully operate on an arbitrary graph. Paired with @dandelionmane. Test Plan: New unit tests included. Run `yarn flow && yarn test`. wchargin-branch: get-all	2018-03-02 11:33:45 -08:00
William Chargin	01510ca63f	Make node and edge types exact (#51 ) Summary: We’ve realized that `u: Edge<T>` implies `u: Node<T>`. That certainly wasn’t what we were expecting! We might want something like that eventually, to capture the fact that valuations are themselves valuable, but for now the type system should encode the assumptions that we’re actually making. See also #50. Paired with @dandelionmane. wchargin-branch: exact-types	2018-03-02 11:31:38 -08:00
William Chargin	09156bf3f4	Promote `Graph` to a class with useful methods (#49 ) Summary: We had planned to expose our core types as simple Plain Old JavaScript Objects, with accompanying standalone functions to act directly on these data structures. We chose this instead of creating `class`es for the types because it simplifies serialization interop: it obviates the need for serialization and deserialization functions, because the code is separated from the data entirely. Reconsidering, we now think that the convenience benefits of using classes probably outweigh these serialization cons. Furthermore, this design enables us to separate ancillary data structures and caches from the raw data, presenting a cleaner API for consumers of the data. This commit introduces a `Graph` class and some related logic. With lots of tests! And 100% code coverage! :-) Paired with @dandelionmane. Test Plan: Run `yarn flow && yarn test` to see the new tests. wchargin-branch: graph-class	2018-03-01 01:04:11 -08:00
William Chargin	f5d486087d	Pull in-edges and out-edges up to top-level graph (#48 ) Summary: The main problem with having these fields on the node is that this presents the illusion that the API surface area is larger than it actually is. Clients with reference to a node object could somewhat-reasonably expect that mutating these fields would be sufficient to update the structure of the graph, but this isn’t the case (as the edge objects would need to be updated, too). It’s a nice semantic bonus, too, as edges aren’t conceptually “part of” nodes. wchargin-branch: top-level-edges	2018-03-01 00:48:23 -08:00
William Chargin	66243a16c1	Remove weights from the weighted graph (#47 ) Summary: This is an experiment. There are a couple diffferent meanings of “weight” in play: most prominently, weights assigned by plugins versus those suitable for comparison among other arbitrary weights. We’re not sure what the right thing is to put in the actual graph object, so we’re going to think about this a bit more before adding the field back in. wchargin-branch: remove-weights	2018-03-01 00:45:52 -08:00
William Chargin	43450f18b1	Rename `sourceId`/`destId` to `src`/`dst` (#46 ) Summary: The “ID” parts were left-over from the Great Address Migration, and we think that abbreviations are fine here, anyway. Test Plan: `yarn flow && yarn test` wchargin-branch: src-dst-rename	2018-03-01 00:32:30 -08:00
William Chargin	01df727c39	Add tiny-example-repository example data (#44 ) Summary: The sourcecred/tiny-example-repository repository stores some example data that we can use to generate test cases. As of now, the repository has been archived so that its state is stable. This commit checks in the result of our scraper on the repository. wchargin-branch: example-data	2018-02-28 20:36:18 -08:00
Dandelion Mané	9dc9d5e4f3	Change order of repositoryName and pluginName	2018-02-28 17:47:15 -08:00
Dandelion Mané	58ad1eb635	Add inEdges and outEdges for Nodes.	2018-02-28 17:47:15 -08:00
Dandelion Mané	2992a31157	Graph concept renames - ID -> Address - ID.name -> Address.id - GraphEdge -> Edge - GraphNode -> Node	2018-02-28 17:47:15 -08:00
William Chargin	791cad9059	Add conversion functions for id (#38 ) Test Plan: Run `yarn test` and note that tests pass. wchargin-branch: id-conversion	2018-02-26 22:57:00 -08:00
Dandelion Mané	bc2377448f	Move package json to root (#37 ) Reorganize the code so that we have a single package.json file, which is at the root. All source code now lives under `src`, separated into `src/backend` and `src/explorer`. Test plan: - run `yarn start` - it works - run `yarn test` - it finds the tests (all in src/explorer) and they pass - run `yarn flow` - it works. (tested with an error, that works too) - run `yarn prettify` - it finds all the js files and writes to them	2018-02-26 22:32:23 -08:00

... 10 11 12 13 14

669 Commits