sourcecred

Author	SHA1	Message	Date
Dandelion Mané	c022b3f4d0	`RelationalView` tracks GitHub references (#431 ) For every `TextContentEntity` (`Issue`, `Pull`, `Review`, `Comment`), this commit adds a `references` method that iterates over the entities that the text content entity references. For every `ReferentEntity` (actually, every entity), this commit adds a `referencedBy` method which iterates over the text content entities that reference that referent entity. This method also adds `referentEntities` and `textContentEntities` methods to the `RelationalView`, as they are used in both implementation and test code. Test plan: The snapshot tests include every reference, in a format that is very convenient for inspecting the ground truth on GitHub. For every reference, it's easy to check that the reference actually exists by copying the `from` url and pasting it into the browser. I've done this and they check out. (It's not easy to prove that there are no missing references, but I'm pretty confident that this code is working.) Unit tests ensure that the `references` and `referencedBy` methods are consistent.	2018-06-28 13:32:47 -07:00
Dandelion Mané	6235febdac	Add porcelain-style classes to `RelationalView` (#424 ) Based on offline design discussion with @wchargin, we've decided to upgrade the `RelationalView` to be the comprehensive source for GitHub data inside SourceCred. The `RelationalView` will contain the full dataset, including parsed relational information (such as cross-references between GitHub entities). Then, we will project our GitHub graph out of the `RelationalView`. To that end, the `RelationalView` no longer exports raw data blobs. Instead, it exports nice classes: `Repo`, `Issue`, `Pull`, `Review`, and `Userlike`. These classes have convenient methods for accessing both their own data and related entities, e.g. `repo.issues()` yields all the issues in that repo. This is effectively a port of #170 into the v3 API. The main difference is that in v1, the Graph contained this data store, whereas in v3, we will use this data store to generate the graph. This supersedes #418. Test plan: The snapshot tests are quite readable.	2018-06-27 15:25:20 -07:00
William Chargin	e7b28b81db	Convert V3 graphs to Markov chains (#427 ) Summary: This is based on the V1 file `basicPagerank.js`. The API is necessarily changed for the new graph format, and we export additional utilities compared to the previous version of the module (useful for testing and serialization). We also improve the implementation to make it simpler and easier to understand. Test Plan: Unit tests included. wchargin-branch: v3-graph-markov-chain	2018-06-27 15:24:47 -07:00
William Chargin	b9c67f447f	Expose `advancedGraph` test case (#426 ) Summary: We’d like to use this test case to generate a Markov chain, which requires that it not be local to the `graph.js` tests. Test Plan: Existing unit tests suffice. wchargin-branch: expose-advanced-graph	2018-06-27 15:18:47 -07:00
William Chargin	faa2f8c9d0	Copy Markov chain code from V1 to V3 (#425 ) Summary: This code is independent of the graph abstraction, and so is mostly copied. The only change is to the structure of the test code (we now prefer to wrap everything in a big `describe` block with an absolute path to the module under test). Test Plan: Unit tests included. wchargin-branch: v3-markov-chain	2018-06-27 15:14:42 -07:00
Dandelion Mané	659fc51d9b	Rename `findReferences` to `parseReferences` (#429 ) This code is about parsing references out of text, so `parseReferences` is a better name. The code that consumes this logic to find all the references in the GitHub data shall be rightly called `findReferences`. Test plan: `yarn travis`	2018-06-27 13:21:25 -07:00
William Chargin	518d5b819c	Represent submodule commits as normal commits (#423 ) Summary: Closes #417. Submodule commits are dead; long live commits. The ontology is now: - A tree includes tree entries. - A tree entry may have a blob as contents. - A tree entry may have a tree as contents. - A tree entry may have a commit as contents. Test Plan: Existing unit tests suffice, especially `#commits yields all commits`. wchargin-branch: git-remove-submodule-commits	2018-06-27 12:01:07 -07:00
William Chargin	38c364c916	Allow Git commits to have zero or one tree (#422 ) Summary: Submodule commits need not have associated tree objects, in case the repository to which they belong does not exist in our graph. We’d like to represent submodule commits as actual commits, which necessitates this change. See #417 for context. Test Plan: Existing unit tests suffice. wchargin-branch: git-affine-trees	2018-06-27 11:47:39 -07:00
William Chargin	dd83d7b4ab	Implement a Git graph view (#415 ) Summary: Similar in structure to the GitHub graph view. Test Plan: Unit tests added, with full coverage. wchargin-branch: git-graph-view	2018-06-26 14:00:19 -07:00
William Chargin	0522894a8d	Create Git graph (#406 ) Summary: This commit adds logic to create the Git graph, modeled after the GitHub graph creator in #405. In this commit, we do not include the corresponding porcelain; a Git `GraphView` will be added subsequently. Kudos to @decentralion for suggesting in #187 that I write the logic to detect BECOMES edges against the high-level data structures. Due to that decision, the logic and tests are copied directly from the V1 code without change, because the high-level data structures are the same. The new code is exactly the body of the `GraphCreator` class. Test Plan: Verify that the new snapshot is likely equivalent to the V1 snapshot, using the heuristic that the two graphs have the same numbers of nodes (59) and edges (84). (I have performed this check.) wchargin-branch: git-v3-create-graph	2018-06-26 13:54:47 -07:00
Dandelion Mané	a470f28204	Add GitHub `RelationalView` (#411 ) The `RelationalView` maps the GitHub GraphQL response data into a View class, which makes it easy to access pieces of GitHub data by their corresponding `StructuredAddress`. This will be a valuable companion to the graph, making it possible to access GitHub node data like the title or body of an issue via the issue's address. This basically is the supplement to the GitHub graph that includes the "payloads" from our v1 Graph. It will also make creating the GitHub graph a lot more convenient, although I've left that for another commit. Designed with feedback from @wchargin. Note: The `RelationalView` objects have a `nominalAuthor` rather than `author`, so as to distinguish between authorship in the GitHub data model (entities have at most one author) and in the SourceCred model (entities may have multiple authors). Test plan: Inspect the included snapshots for reasonability, and run unit tests.	2018-06-22 20:32:07 -07:00
Dandelion Mané	2dec8868db	Copy `github/findReferences` from v1 to v3 (#410 ) The code will be refactored so that references are expressed in terms of the GitHub node address code; the implementation is copied first so that the review will be cleaner. Test plan: `yarn travis` passes.	2018-06-22 16:16:48 -07:00
William Chargin	127200f67c	Cache core graph `checkInvariants` result (#408 ) Summary: The public method `checkInvariants` on graph is now cached. The cache is invalidated when the graph is modified via the public API. As a result of this change, the time of `yarn ci-test --testPathPattern src/v3/` decreases from 5.631s to 3.866s (best-of-three timing, but low variance anyway). This effect becomes much more pronounced as higher-level APIs check their own invariants by themselves indirectly invoking the graph’s `checkInvariants` method many times. Test Plan: Existing unit tests have been adapted and extended. Tests for the invariant checking have been updated to call the internal, uncached method, and new tests have been added to check that the caching behavior is correct. wchargin-branch: cache-graph-invariants	2018-06-22 16:16:43 -07:00
Dandelion Mané	a209caeec2	create the GitHub graph (#405 ) This commit: - adds `github/createGraph.js` - which ingests GitHub GraphQL response - and creates a GitHub graph - adds `github/graphView.js`, - which takes a Graph - and validates that all GitHub specific node and edge invariants hold - every github node may be parsed by `github/node/fromRaw` - with the right node type - every github edge may be parsed by `github/edge/fromRaw` - with the right edge type - with the right src address prefix - with the right dst address prefix - every child node has exactly one parent - of the right type - and provides convenient porcelain methods for - finding repos in the graph - finding issues of a repo - finding pulls of a repo - finding reviews of a pull - finding comments of a Commentable - finding authors of Authorables - finding parent of a ChildAddress - tests `createGraph` - via snapshot testing - by checking the GraphView invariants hold - tests `graphView` - by checking individual entities in the example-git repository have the proper relationships - by checking that for every class of invariant, errors are thrown if the invariant is violated Test plan: - Extensive unit and snapshot tests added. `yarn travis` passes.	2018-06-22 13:10:19 -07:00
Dandelion Mané	24a7547e16	Use Git Commit type in GitHub mergedAs edge (#407 ) Test plan: `yarn travis` passes	2018-06-22 12:12:08 -07:00
William Chargin	448fb3e1a8	Add edge type definitions for V3 Git plugin (#404 ) Summary: This is modeled after the GitHub edge module format. In particular, the whole length encoding garbage is directly copied. As in that module, we decline to test the error paths. Test Plan: Unit tests added; run `yarn travis`. Snapshots are readable. wchargin-branch: git-v3-edges	2018-06-20 15:49:50 -07:00
William Chargin	7c1b3ca835	Add node type definitions for V3 Git plugin (#403 ) Summary: This is modeled after the GitHub node module format, with the obvious alterations plus a bit more type safety in the implementation of `toRaw` (namely, we check `type` exhaustively). Test Plan: Unit tests added; run `yarn travis`. wchargin-branch: git-v3-nodes	2018-06-20 15:40:08 -07:00
William Chargin	83151d9fac	Remove payload definitions from V3 `git/types.js` (#402 ) Test Plan: Existing Flow and unit tests suffice. wchargin-branch: git-v3-remove-payloads	2018-06-20 15:32:12 -07:00
William Chargin	9347348dd7	Copy graph-independent V1 Git plugin code to V3 (#401 ) Summary: Many files are unchanged. Some files have had paths updated, or new build/test targets added. The `types.js` file includes payload type definitions. These are technically independent of the graph abstraction (i.e., nothing from V1 is imported and the code all still works), but it of course implicitly depends on the V1 model. For now, we include the entirety of this file, just so that we have a clean copy operation. Subsequent commits will strip out this extraneous code. Suggest reviewing with the `--find-copies-harder` argument to Git’s diffing functions. Test Plan: Running `yarn travis --full` passes. Running ./src/v3/plugins/git/demoData/synchronizeToGithub.sh --dry-run yields “Everything up-to-date”. wchargin-branch: git-v3-copy	2018-06-20 15:28:37 -07:00
William Chargin	281cb574d5	Greatly simplify GitHub edge tests (#400 ) Summary: We had `edgeExamples`, wherein we constructed examples of edge addresses (not actual edges) by manual instantiation. We checked that these matched snapshots, and then we also called `createEdge` a bunch to create actual edges, and checked that those matched snapshots, too. Consequently, we had twice as many snapshots as we needed, and also defined twice as many edge addresses as we needed. Test Plan: Note that snapshot contents are either deleted or unchanged. wchargin-branch: simplify-github-edge-tests	2018-06-19 16:01:26 -07:00
William Chargin	b9c01d13c9	Use `edgeToParts` in the GitHub edge tests (#399 ) Test Plan: Observe that the new snapshots are easier to read. Might as well make sure that they encode the same data as the old snapshots, too. Note that the backslash character no longer appears in this snapshot file. :-) wchargin-branch: use-edgeToParts	2018-06-19 15:55:10 -07:00
William Chargin	aac2fc6792	Add `edgeToParts` convenience export from `Graph` (#398 ) Summary: We have `edgeToString`, which formats edges as nicely human-readable strings. However, these strings have some quotes in them, and so when they are themselves stringified (e.g., as part of a Jest snapshot), they become much harder to read. We thus introduce `edgeToParts` to make our snapshots more readable. Test Plan: Unit tests added; run `yarn travis`. wchargin-branch: add-edgeToParts	2018-06-19 15:48:04 -07:00
William Chargin	ea74955a66	Fix GitHub node `fromRaw` error-path test cases (#397 ) Summary: In #394, we uppercased the constants for GitHub node types. However, we were using string literals instead of constants in the test cases. These test cases were supposed to cover every error path, but instead ended up just covering the “bad type” error path many times. Any one of the following would have prevented this regression: 1. using string constants instead of literals in the test case; 2. throwing and checking more precise error messages; or 3. being alerted that coverage decreased as a result of the change. In this commit, we enact the first of these options. I’m open to adding a coverage bot, but don’t feel strongly about it at this time. Test Plan: Running `yarn coverage` now shows 100% coverage for the `nodes.js` module, whereas previously almost all `throw fail();` lines were uncovered (and the branch coverage was just 76%). wchargin-branch: fix-github-node-error-tests	2018-06-19 14:57:14 -07:00
Dandelion Mané	ed3397f654	Add GitHub prefixes and const types (#395 ) - Switch string constant node and edge types (e.g. "REPO") to exported consts (e.g. `export const REPO_TYPE`). - Add (and internally use) a `_Prefix` pseudomodule which contains per-type address prefixes - Test that constructing a StructuredAddress with the wrong type is an error. Test plan: Unit tests pass, snapshots unchanged. Paired with @wchargin	2018-06-14 15:01:33 -07:00
Dandelion Mané	a8bf6a36bf	Add `CommentableAddress` (#396 ) Test plan: Not needed. Paired with @wchargin	2018-06-14 14:52:29 -07:00
William Chargin	b6eebddeb0	Use uppercase enum constants in GitHub addresses (#394 ) Summary: @decentralion wants this! :-) Test Plan: Verify that the case-insensitive diff is empty: ``` $ git config --global difftool.idiff.cmd 'diff -ui "$LOCAL" "$REMOTE"' $ git difftool -y --tool idiff HEAD~1..HEAD ``` wchargin-branch: uppercase-enum	2018-06-14 13:45:55 -07:00
William Chargin	7ce4a0c32d	Use `NodeAddress.empty` and `EdgeAddress.empty` (#393 ) Summary: This fixes up all instances of `fromParts([])` that are not in `address.js` or `address.test.js`. Paired with @decentralion. Test Plan: Running `git grep --name-only -F 'fromParts([])'` yields only the two modules listed above. Existing unit tests suffice for correctness. wchargin-branch: use-address-empty	2018-06-14 13:11:08 -07:00
William Chargin	6ba6d885ad	Add `empty` (monoid identity) to address modules (#392 ) Summary: This can make invocations of `FooAddress.fromParts([])` a bit more succinct. Paired with @decentralion. Test Plan: Unit tests added. Run `yarn travis`. wchargin-branch: address-empty	2018-06-14 13:06:26 -07:00
Dandelion Mané	7199586262	Add `Graph.edges` filtering by prefixing (#391 ) Similar to #390, we now allow filtering the results from `Graph.edges` by address prefixes. It's a little more complicated than #390, as we allow filtering by src, dst, or address. Test plan: Unit tests added. `yarn travis` passes. Paired with @wchargin	2018-06-14 11:59:58 -07:00
Dandelion Mané	1a08a48c03	Implement prefix filtering for `Graph.nodes` (#390 ) Simple API addition to match v1/v2 semantics. In the future, we can perf optimize this if we switch graph to store nodes organized by shared prefixes. Test plan: Unit tests were added. `yarn travis` passes. Paired with @wchargin	2018-06-14 11:18:47 -07:00
Dandelion Mané	95c5af36d9	Add methods for parsing Ids from GitHub urls (#388 ) Test plan: Unit tests added. Run `yarn travis`. The GitHub regex code is inspired by work in #98	2018-06-13 16:25:15 -07:00
William Chargin	2491fcd3cb	Add GitHub `edges` module (#385 ) Summary: This module includes a raw edge type, a structured edge type, and edge creation functions that take source and destination and create an edge. Test Plan: Unit tests added. These cover all of the successful cases, and none of the unsuccessful cases. We plan to refactor this code Soon™, and it is hard to see how to nicely factor the tests without just testing the same code paths over and over. wchargin-branch: github-edges	2018-06-13 16:19:50 -07:00
Dandelion Mané	17b390afe9	Add trait-specific GitHub address types (#387 ) Will be useful in graph creation logic, and in #385 Test plan: Only change is to add types. No testing needed. See design discussion [on discord]. Paired with @wchargin [on discord]: https://discordapp.com/channels/453243919774253079/454007907663740939?jump=456564101183832064	2018-06-13 14:42:37 -07:00
William Chargin	25f74b89e9	Export `_githubAddress` from GitHub `nodes` (#384 ) Summary: We’ll want to use this in the upcoming `edges` module. Test Plan: Existing unit tests suffice. wchargin-branch: expose-githubaddress	2018-06-13 13:53:09 -07:00
Dandelion Mané	ad9ac55bef	Github Addresses: Rename `fragment` to `id` (#386 ) Test plan: `yarn travis`	2018-06-13 13:41:52 -07:00
William Chargin	748f9210a6	Rename various aspects of GitHub `nodes` module (#383 ) Summary: First, we rename the module itself from `address` to `nodes`: we’d like to put the edge functions in a parallel `edges` module instead of cramping it into this one, so it stands to reason that this one should be called `nodes`. We also rename the `GithubAddressT` type to `RawAddress`, so that the module exports `RawAddress` and `StructuredAddress`. The functions then have much better natural names of `toRaw` and `fromRaw`. Test Plan: Existing unit tests suffice. wchargin-branch: rename-nodes	2018-06-12 18:09:41 -07:00
Dandelion Mané	e4d9ce1565	gh plugin: use consistent concise naming for pulls (#381 ) One of the slight modifications we've made in v3 is to effect the following renames (as implemented in #380): PullRequest -> Pull; PullRequestReview -> Review; PullRequestReviewComment -> ReviewComment. This commit just changes the rest of the github code in v3 to follow the new convention. Test plan: `yarn travis --full` passes.	2018-06-12 14:15:55 -07:00
William Chargin	a3f2b82073	Add snapshot test for GitHub GraphQL query (#382 ) Summary: This has two primary benefits: - Humans can look at this snapshot file to see what’s being queried, or to manually issue a query. - When we change the programmatically generated query, we can easily see what the results are in the GraphQL output. This makes it easy to verify that a change is correct. Test Plan: None. wchargin-branch: snapshot-query	2018-06-12 14:09:58 -07:00
Dandelion Mané	773596755a	Add an address module for the GitHub plugin (#380 ) Summary: This module exposes a structured type `StructuredAddress`, an embedding `GithubAddressT` of this type into the `NodeAddress` layer, and functions to convert between the two. Paired with @wchargin. Test Plan: Unit tests added, with full coverage. Snapshots are easily readable.	2018-06-12 11:05:09 -07:00
Dandelion Mané	0339d9f41b	Port GitHub data ingestion into v3 (#378 ) This commit copies the following logic necessary for downloading GitHub data into v3. Minimal changes have been made to accommodate the new path structure. Test plan: - Manually ran plugins/github/fetchGithubRepoTest.sh and verified that it can correctly pass and fail - Added the v3 github repo test to `yarn travis --full` - Ran `yarn travis --full` and it passed Paired with @wchargin	2018-06-11 18:57:37 -07:00
William Chargin	ed70947c63	Update GitHub example repository data (#379 ) Summary: We’ve added a comment directly on a pull request. Paired with @decentralion. Test Plan: `yarn travis --full` passes. wchargin-branch: update-example-github	2018-06-11 18:53:57 -07:00
William Chargin	5d3cfd82e4	Check graph invariants during tests (#372 ) Summary: Each of the invariants listed at the top of the `Graph` class is now explicitly checked by `checkInvariants`, which is called at the end of each `Graph` method during tests only. This is powerful: it means that not only do our tests for `Graph` test the graph, but also any tests that depend on `Graph`—e.g., plugin code—will give us extra invariant testing on `Graph`. As noted in a comment, if this becomes bad for performance, we can blacklist expensive tests or whitelist tests that we care about. A graph method may assume that the graph invariants hold before the method is invoked. Within the body of a graph method, invariants may be violated, but the method must ensure that the invariants hold immediately before it returns or yields. A consequence of this is that if a graph function internally calls a public function (e.g., `addEdge` might call `hasNode` to check that the source and destination exist), then it must ensure that the invariants hold before the internal call. This is not an “implementation detail” or “caveat”; it is simply part of the interface of public functions. It is legal and reasonable for private helper functions to explicitly not expect or not guarantee that particular invariants hold, and in this case the exception should be documented. (This is not yet the case in any of our code.) Finally, note that the `checkInvariants` method should not call any public methods, because those methods in turn call `checkInvariants`. If this becomes a huge pain, we can look into implementing some kind of “only check invariants if the invariants are not actively being checked”, but I’d much rather not do so if we don’t have to. Test Plan: Running `yarn coverage` indicates that each of the failure cases is verified. In principle, I’d be willing to add a test that parses the source code for `graph.js` and verifies that each `return`, `yield`, or implicit return is preceded by an invariant check. But I don’t really want to implement that right now. wchargin-branch: automatic-invariants	2018-06-11 12:28:25 -07:00
Dandelion Mané	c352b5b8d6	Add an `advancedGraph` test case (#377 ) The `advancedGraph` is an example graph defined in `graph.test.js`. It shows off many tricksy features, like having loop edges, multiple edges from the same src to same dst, etc. We also provide two ways of constructing it: `graph1` is straightforward, `graph2` adds tons of spurious adds, removes, and odd ordering. This way we can ensure that our functions treat `graph1` and `graph2` equivalently. Test plan: New unit tests are added verifying that `equals`, `merge`, and `to/fromJSON` handle the advanced graph appropriately.	2018-06-11 12:08:53 -07:00
Dandelion Mané	5fc0d42c1f	Implement `Graph.toJSON` and `Graph.fromJSON` (#374 ) The serialization scheme uses `IndexedEdge`s: ```js type Integer = number; type IndexedEdge = {| Address: EdgeAddressT, srcIndex: Integer, dstIndex: Integer, |} ``` The nodes are first sorted. Then, we generate indexed edges from the regular edges by replacing each node address with its index in the sorted order. This encoding reduces the number of addresses serialized from `n + 3e` to `n + e` (where `n` is the number of nodes and `e` is the number of edges). This is based on work in #295, but in contrast to that PR, we do not index the in-memory representations of graphs. Only the JSON representation is indexed. Test plan: Unit tests added. A snapshot test is also included, both to make it easy to inspect an example of a JSON-serialized graph, and to ensure backwards-compatibility. (The snapshot likely should not change independent of the VERSION string.)	2018-06-11 11:57:29 -07:00
Dandelion Mané	6177f6c740	Reimplement `Graph.copy` using `Graph.merge` (#376 ) * Implement `Graph.merge` Tests are mostly copied over from the v2, as implemented in #320. Some new tests were added, e.g. checking that Merge correctly handles 10 small graphs combined. Test plan: See unit tests. * Reimplement `Graph.copy` using `Graph.merge` Test plan: Existing unit tests suffice Suggested by @wchargin	2018-06-11 10:55:52 -07:00
Dandelion Mané	46751d2707	Implement `Graph.merge` (#375 ) Tests are mostly copied over from the v2, as implemented in #320. Some new tests were added, e.g. checking that Merge correctly handles 10 small graphs combined. Test plan: See unit tests.	2018-06-11 10:55:30 -07:00
Dandelion Mané	5fde1c10a5	Copy `v1/util` to `v3/util` (#373 ) No changes made to the code - it's a straight copy. Test plan: Unit tests are included.	2018-06-11 10:42:25 -07:00
William Chargin	831f5f571c	Make invariants more precise (#371 ) Summary: The previously listed invariants were weak on two counts. First, it was unstated that the keys of `_inEdges`, `_outEdges`, and `_nodes` should coincide. Second, the “exactly once” condition on edge inclusion had the unintentional effect that an edge absent in `_edges` but present twice or more in each of `_inEdges` and `_outEdges` would not violate the invariant. Test Plan: Stay tuned. wchargin-branch: strengthen-invariants	2018-06-08 15:19:25 -07:00
Dandelion Mané	d9e2850eb3	Implement `Graph.copy` (#370 ) The implementation is quite simple. The tests are somewhat more comprehensive than in v2 or v1. We now test that copies are equal to the original in a variety of situations. Test plan: Unit tests added.	2018-06-08 15:19:13 -07:00
Dandelion Mané	feef119250	Add and implement `Graph.equals` (#369 ) It turns out we forgot to add this to the API, so I added it. I also implemented it. The tests are pretty thorough; as an added innovation over our previous tests (e.g. in #312 and #61), we now consistently test that equality is commutative. In contrast to our previous implementations, this one is massively simpler. That's an upside of using primitive ES6 data structures to store all of the graph's information... which is itself an upside of not trying to store arbitrary additional information in the graph. Now we can just do a deep equality check on the underlying nodes set and edges map! We might be able to performance tune this method by taking advantage of the structure of our nodes and edges. This should suffice for now, though. Paired with @wchargin Test plan: Unit tests were added. Run `yarn travis`	2018-06-08 14:17:57 -07:00
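
The indexed-edge serialization described in commit 5fc0d42c1f (#374) can be illustrated with a minimal sketch. This is not the SourceCred implementation: the function names `toIndexedJSON` and `fromIndexedJSON` are hypothetical, addresses are modeled as plain strings rather than `NodeAddressT`/`EdgeAddressT`, and edges are emitted in the order given (the commit message only states that nodes are sorted).

```javascript
// Hypothetical sketch of the indexed-edge scheme. Addresses are plain strings
// here; the real code uses SourceCred's address types.

// Serialize: sort the node addresses, then replace each edge's src/dst
// address with its index in the sorted node list.
function toIndexedJSON(nodes, edges) {
  const sortedNodes = Array.from(nodes).sort();
  const indexOf = new Map(sortedNodes.map((address, i) => [address, i]));
  const indexedEdges = edges.map(({address, src, dst}) => {
    const srcIndex = indexOf.get(src);
    const dstIndex = indexOf.get(dst);
    if (srcIndex === undefined || dstIndex === undefined) {
      throw new Error(`edge ${address} refers to a node not in the graph`);
    }
    return {address, srcIndex, dstIndex};
  });
  return {nodes: sortedNodes, edges: indexedEdges};
}

// Deserialize: look each index up in the stored node list.
function fromIndexedJSON(json) {
  const edges = json.edges.map(({address, srcIndex, dstIndex}) => ({
    address,
    src: json.nodes[srcIndex],
    dst: json.nodes[dstIndex],
  }));
  return {nodes: json.nodes.slice(), edges};
}
```

With `n` nodes and `e` edges, the serialized form stores each node address once and each edge address once (`n + e` addresses), instead of repeating the source and destination addresses on every edge (`n + 3e`).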

633 Commits

All Branches