sourcecred

mirror of https://github.com/status-im/sourcecred.git synced 2025-02-18 15:38:10 +00:00

Author	SHA1	Message	Date
William Chargin	838092194b	mirror: add support for connections of union types (#882 ) Summary: Almost every GitHub connection has nodes of an object type, like `User` or `IssueComment`. But a few have nodes of union type, including `IssueTimelineItemConnection` (which we will likely want to query), and those require special handling. This commit adds susupport for such connections. Analysis to determine which connections have non-object elements: <https://gist.github.com/wchargin/647fa7ed8d6d17ae2e204bd098104407> Test Plan: Unit tests modified appropriately, retaining full coverage. The easiest way to verify the snapshot is probably to copy the raw contents (everything inside the quotes) into `/tmp/snapshot`, then run: ```shell $ sed -e 's/\\//g' </tmp/snapshot >/tmp/query # Jest adds backslashes $ jq -csR '{query: ., variables: {}}' </tmp/query >/tmp/payload $ ENDPOINT='https://api.github.com/graphql' $ AUTH="Authorization: bearer ${SOURCECRED_GITHUB_TOKEN}" $ curl "$ENDPOINT" -X POST -H "$AUTH" -d @- </tmp/payload >/tmp/result ``` and then execute the JQ program mentioned in the comment in the test case, and verify that it prints `true`. wchargin-branch: connection-of-union	2018-09-21 16:20:58 -07:00
Dandelion Mané	23ae7e2f08	Rename the core `Repo` type to `RepoId` (#881 ) We have a core type called `Repo`, but it really is an identifier to a repo, rather than being a repo itself. This is confusing since we have a data type called `Repository` which actually represents the data in a repository. I've renamed `Repo` to `RepoId` for clarity. Test plan: `yarn test --full` passes. Running the frontend passes after wholly regenerating the sourcecred data directory.	2018-09-21 14:44:29 -07:00
Dandelion Mané	1e5f728e29	Cred explorer: display commit short hash + summary (#879 ) This modifies how commits are displayed in the cred explorer. Rather than printing the full hash, we now print a short hash followed by the summary. Test plan: Snapshot is updated, also I tested it by running SourceCred on a real repository.	2018-09-21 13:24:28 -07:00
William Chargin	09ed51ed6e	mirror: align test schema more with GitHub schema (#880 ) Summary: This commit changes the `Issue` type of the schema used in the `mirror` tests to have fields a faithful subset of those in the actual GitHub schema. The tests are self-contained, so this is not strictly required. However, it is convenient, because it means that we can snapshot a query that can actually be posted to GitHub. Test Plan: Running `yarn unit mirror` suffices for the code change. The GitHub schema docs at <https://developer.github.com/v4/object/issue/> indicate that each of `id`, `url`, `author`, `repository`, `title`, and `comments` is a valid field. wchargin-branch: mirror-test-schema	2018-09-20 17:17:27 -07:00
Dandelion Mané	0b3e2e7f7f	Load commits' short hash and summary (#876 ) This modifies the Git Commit type to includea short hash and a oneline summary, and modifies `loadRepository` so that we actually get that data. The example-git repository has been updated to include a commit with leading whitespace and a pipe in the summary, to ensure that these are respected. Test plan: Observe that the snapshot is updated, and the updates are correct. `yarn test --full` passes.	2018-09-20 15:51:58 -07:00
William Chargin	6ae5c56624	mirror: query and process connection updates (#878 ) Summary: This commit adds internal functions to (a) emit a GraphQL query to fetch data for a particular connection, and (b) ingest the results of said query back into the database. This commit makes progress toward #622. Test Plan: Unit tests included, with full coverage. While these tests check that the GraphQL queries are as expected, they cannot check that they are actually valid in production. To check this, follow the instructions in the added snapshot test. wchargin-branch: mirror-connection-updates	2018-09-20 15:46:03 -07:00
Dandelion Mané	5348fe68bf	Remove git adresses for trees, blobs, and entries (#877 ) In #873 I removed the data types for trees, blobs, and entries, but neglected to remove the address related code. This commit corrects that mistake. Some test cases in other modules have been removed because the failure is now structurally impossible, e.g. it is not possible that we would provide a non-commit address to the GitHub plugin, because non-commit addresses do not exist. Test plan: `yarn test --full` passes.	2018-09-20 15:18:01 -07:00
William Chargin	1dd8b7bcb7	mirror: add internal method `_findOutdated` (#875 ) Summary: This function finds all objects whose own data has not been updated since a given time, and all connections whose entries have not been updated since that time. Note that this is scoped to the entirety of the database. In #622, I discussed using a recursive common table expression to identify only those transitive dependencies of the root. I think that this is overkill for the `_findOutdated` method: you’ll usually want to update everything in the database. Don’t worry—the cool recursive query will still be used in the `extract` function. :-) This commit makes progress toward #622. Test Plan: Unit tests added, with full coverage; run `yarn unit`. wchargin-branch: mirror-findoutdated	2018-09-20 14:19:47 -07:00
Dandelion Mané	e3f04c5079	Git plugin serializes the repository and graph (#874 ) This modifies the behavior when loading the Git plugin so that it serializes the Repository as well as the graph. This will allow us to get extra information, like the commit headline, to the Git plugin in the frontend. As an added bonus, we can now refactor `loadRepositoryTest` to depend on `sourcecred.js load` rather than `loadAndPrintRepository`. As this was the only use for `loadAndPrintRepository`, we can safely delete it. This improves our test quality because it means we are also testing the actual CLI behavior. Note that the switch from using `stringify` to `json.tool` for pretty-printing has resulted in a trivial diff in the snapshot. Test plan: `yarn test --full` passes.	2018-09-20 14:07:58 -07:00
Dandelion Mané	5b8ec53b13	Remove all git tree related code (#873 ) In #627, I made a case for removing all trees and blobs from the cred graph. The issue was that the data was bloated and noisy, and did not provide much value in its current form. This commit follows on that by actually removing the code from the codebase (rather than keeping it unused). I want to delete this code because I believe: 1. It is unlikely to see further use in its current form (because collecting the entire Git tree structure is just too noisy for our purposes) 2. In the event that we do need it, reviving it will not be too difficult (because it is all quite locally scoped to the Git plugin). 3. Keeping unused code increases ongoing maintenance + development costs, and I'd like to bias towards keeping the codebase simple and lean. In the event that a future contributor is reviving this code and finds it a pain, I pre-emptively apologize to you. Test plan: `yarn test --full` passees.	2018-09-20 13:50:27 -07:00
William Chargin	ab5b6ecb68	mirror: add public method `registerObject` (#870 ) Summary: This function informs the GraphQL mirror of the existence of an object, specified by its global ID and its concrete typename (“concrete” meaning “object type”—like `User`, not `Actor`). The function will be called extensively internally as more objects are discovered while traversing the graph, but also needs to be exposed as a public entry point: a client needs to call this function at least once to register the root node of interest. A typical client workflow, once all of #622 is implemented, might be: 1. Issue a standalone GraphQL query to find the ID of a root node, like a GitHub repository: `repository(owner: "foo", name: "bar") { id }`. 2. Call `registerObject` with the ID found in the previous step. 3. Instruct the mirror to recursively update all dependencies. 4. Extract data from the mirror. As of this commit, steps (1) and (2) are possible. This commit makes progress toward #622. Test Plan: Unit tests included, with full coverage; run `yarn unit`. wchargin-branch: mirror-registerobject	2018-09-20 13:03:06 -07:00
Dandelion Mané	12aa5b7439	Add `mergeRepository` for merging Git repositories (#871 ) This commit adds a utility method, `mergeRepository`, which can merge multiple Git repository data structures. `loadGitData` has been updated to create a merged repository and then subsequently generate a graph from it. Test plan: New unit tests were added. `yarn test --full` passes. Loading a project and viewing its git data in the cred explorer works.	2018-09-20 12:57:57 -07:00
William Chargin	e572551cd8	mirror: add internal method `_createUpdate` (#868 ) Summary: It’s useful to add this simple function now because the rest of the commits required to implement #622 will want to use it extensively in test code. Actual clients of the API will not need to use it, because the concept of “updates” is an implementation detail: clients will always provide simple timestamps. Test Plan: Unit tests included, with full coverage; run `yarn unit`. wchargin-branch: mirror-createupdate	2018-09-20 11:12:21 -07:00
Dandelion Mané	cdceedef8d	Display urls in the cred explorer (#860 ) This commit modifies the plugin adapter's `nodeDescription` method so that it may return a React node. This enables the GitHub plugin's `nodeDescription` method to include hyperlinks directly to the referenced content on GitHub. This makes examining e.g. comment cred much easier. I've also made two other changes to the descriptions: - Pull requests diffs now color-encode the additions and deletions - Descriptions for comments and reviews no longer include the authors The Git plugin's behavior is unchanged. Test plan: I loaded a large repository in the cred explorer and verified that exploring comments and pulls and issues is much easier. The descriptions are as expected for every category of node. Snapshot tests updated. Fixes #590.	2018-09-20 10:48:05 -07:00
William Chargin	5debae414e	mirror: rename helper `_primitivesTableName` (#861 ) Summary: Some clients want to write const primitivesTableName = _primitivesTableName(typename); which they cannot if the function is also called `primitivesTableName`, due to ECMAScript shadowing semantics. Test Plan: Running `yarn flow` suffices; running `yarn unit` really suffices. wchargin-branch: mirror-rename-primitivestablename	2018-09-19 14:28:26 -07:00
William Chargin	1d18652459	graphql: improve coverage of `queries` module (#859 ) Summary: Each change provides real value, by either testing a plausible happy path that simply was not tested previously, or by adding an `empty`-assertion to a switch against a discriminated union type. Test Plan: For the snapshot change relating to the query formatter, note that Prettier formats the changed portion of the snapshot in the same way, by visiting <https://prettier.io/playground> and setting the parser to "graphql". (Prettier in general agrees with the stringification defined by this module, except for commas and spacing, for which we don’t bother to generate impeccably pretty output.) Run `yarn coverage` and note that the coverage of the whole `graphql` package is 100% on all axes. wchargin-branch: graphql-coverage	2018-09-18 17:57:16 -07:00
William Chargin	85efa811e0	mirror: use `SchemaInfo` in `_initialize` (#858 ) Summary: This simplifies and clarifies the code with no observable change. Test Plan: Existing unit tests suffice; run `yarn unit`. wchargin-branch: mirror-use-schemainfo	2018-09-18 16:30:43 -07:00
William Chargin	e69ff57c58	mirror: precompute some useful schema info (#857 ) Summary: This is mostly useful not for computational efficiency, but for ease of implementation: there end up being multiple places where we want to find (say) the primitive fields on an object, and having to go through the whole iterate-and-switch-and-push process repeatedly is annoying. Test Plan: Unit tests included, with full coverage; run `yarn unit`. wchargin-branch: mirror-schema-info	2018-09-18 16:24:38 -07:00
William Chargin	1b1a1e4d46	mirror: embed GraphQL schema into SQL (#849 ) Summary: This commit augments the `Mirror` constructor to turn the provided GraphQL schema into a SQL schema, with which it initializes the backing database. The schema is roughly as originally described in #622, with some changes (primarily: we omit `WITHOUT ROWID`; we add indexes; we store `total_count` on connections; and we use milliseconds instead of seconds for epoch time). Test Plan: Unit tests included, with full coverage; run `yarn unit`. wchargin-branch: mirror-sql-schema	2018-09-18 13:31:34 -07:00
Dandelion Mané	f03e3c83f2	Fix a crash when a review comment has >5 reactions (#855 ) Our continuation-fetching code failed to properly get continuations for pull request review comments, because it was only asking for more reactions on `"IssueComment"` fragments. This caused the `ensureNoMorePages` function to properly throw an error rather than proceding with incomplete data. This commit fixes the root cause by splitting `continuationsFromComment`into `continuationsFromReviewComment` and `continuationsFromIssueComment`. (Pull and issue comments are both considered 'IssueComment's.) The example-github repository has been updated to include 10 reactions to a single review comment; the example-data was updated in this commit, and all reactions have been loaded. I've also added a `console.error` statement in `ensureNoMorePages`. This only triggers when the program is about to fail, and it's useful for debugging. Test plan: `yarn test --full` passes. Paired with @wchargin	2018-09-18 12:52:05 -07:00
William Chargin	fa81c4eaa9	git: properly load empty repositories (#851 ) Summary: Fixes #850. Test Plan: Regression test added; it fails before the change and passes after it. Also, running `node ./bin/sourcecred.js load wchargin/mt` (which is a GitHub repository with no commits) now successfully loads the repository. (The cred explorer fails to process it, because it tries to normalize across GitHub users, of which there are none, but this is a known limitation and is unrelated.) wchargin-branch: fix-empty-git-repository	2018-09-17 16:36:22 -07:00
William Chargin	a93ad80ebc	mirror: initialize a GraphQL database mirror (#847 ) Summary: This commit introduces the `Mirror` class that will be the centerpiece of the persistent-loading API as described in #622. An instance of this class represents a mirror of a remote GraphQL database, defined by a particular schema. In this commit, we add the construction logic, which includes a safety measure to ensure that the database is used within one version of the code and schema. Test Plan: Unit tests included, with full coverage; run `yarn unit`. wchargin-branch: mirror-class	2018-09-17 13:53:08 -07:00
Dandelion Mané	62d3c180ee	Add GitHub reactions to the graph (#846 ) * Define Reaction edges This adds support to `github/edges` for creating edges representing GitHub reactions. These edges are not actually added to the graph. Test plan: Unit tests * Add GitHub reactions to the graph This commit adds functional support for reactions in SourceCred. Only thumbs-up, heart, and hooray reactions are supported for now, as they are all unambiguously positive; adding support for negative reactions like thumbs-down will require some more thought. The reactions are added to the graph, and new edge types have been added to the UI. Test plan: The `graphView` class has been updated to do invariant checking for the reaction edges, including that the unsupported reaction types like "THUMBS_DOWN" aren't added to the graph. I've tested this feature by downloading data for a large repository (ipfs/go-ipfs). The reaction edges appear and transfer cred reasonably. The edge types are displayed in the weight config appropriately. Builds on #839, #840, and #845.	2018-09-17 13:44:11 -07:00
Dandelion Mané	488c98c3e1	Add tensorflow-gardener to the list of bots (#841 ) Test plan: I verified that the spelling is correct	2018-09-17 13:44:02 -07:00
Dandelion Mané	33d14b9d1a	Define Reaction edges (#845 ) This adds support to `github/edges` for creating edges representing GitHub reactions. These edges are not actually added to the graph. Test plan: Unit tests	2018-09-17 13:35:47 -07:00
William Chargin	e9279bee90	mirror: add a helper function for transactions (#844 ) Summary: In implementing #622, we’ll want to run lots of things inside of transactions. This commit introduces a JavaScript API to do so more easily, properly handling success and failure cases. Test Plan: Unit tests included, with full coverage; run `yarn unit`. wchargin-branch: mirror-transaction-helper	2018-09-17 13:33:10 -07:00
William Chargin	f966ce300f	schema: make `fields` and `clauses` exact (#843 ) Summary: This affords more flexibility to clients, because an exact value can be used in place of an inexact value, but not vice versa. Test Plan: Running `yarn flow` suffices. wchargin-branch: schema-exact-type-fields	2018-09-17 12:07:52 -07:00
Dandelion Mané	1ad2cc0958	Request reactions data from GitHub (#839 ) (#840 ) This commit updates the GitHub graphql query to also fetch reactions. We update the JSON typedefs to include this new information, add continuations from comments, and update existing continuation and query code. Also, I added a safety check when updating comments for issues that was previously unnecessary but is now needed. Test plan: - `yarn test --full` passes. - Setting the page limits to 1 and running on the example-github does not error with unexhausted pages, and loads all the expected reactions. - Running on a larger repository (go-ipfs) works as expected. - I have written dependent code that consumes these reactions in the RelationalView, and works as intended, which suggests that the type signatures are correct.	2018-09-17 11:47:37 -07:00
Dandelion Mané	a2ffdf5ca8	Request reactions data from GitHub (#839 ) This commit updates the GitHub graphql query to also fetch reactions. We update the JSON typedefs to include this new information, add continuations from comments, and update existing continuation and query code. Also, I added a safety check when updating comments for issues that was previously unnecessary but is now needed. Test plan: - `yarn test --full` passes. - Setting the page limits to 1 and running on the example-github does not error with unexhausted pages, and loads all the expected reactions. - Running on a larger repository (go-ipfs) works as expected. - I have written dependent code that consumes these reactions in the RelationalView, and works as intended, which suggests that the type signatures are correct.	2018-09-14 16:10:31 -07:00
Dandelion Mané	aecf64b026	Detect references to commits (#833 ) Now that #832 gave us logic to parse references to commits, we have the RelationalView find and add these references. The actual change is a simple extension of existing reference detection logic. Test plan: Observe that the snapshots are updated with references to commits from the example-github repository. Progress on #815.	2018-09-14 11:56:16 -07:00
Dandelion Mané	c9a0d4b1b8	Modify parseReferences to detect refs to commits (#832 ) We add a new function, `findCommitReferences`, which can find both explicit url references to commits, and commit hashes. Since the commit url includes the commit hash, some extra logic is added to deduplicate them in this instance. Tests verify that this is done properly. Test plan: Unit tests cover the cases of having commit hashes, having commit urls, and having both at once.	2018-09-13 18:21:06 -07:00
William Chargin	4675b84443	graphql: validate well-foundedness of unions (#835 ) Summary: GraphQL unions are required to be unions specifically of object types. They cannot contain primitives or other union types as clauses. This is good: it means that we don’t have to worry about unions that recursively reference each other or themselves. Unions are also required to have at least one clause, but we don’t validate this because it’s not helpful for us. An empty union is perfectly well-defined, if useless, and shouldn’t cause any problems. Relevant portion of the spec: <https://facebook.github.io/graphql/October2016/#sec-Union-type-validation> Test Plan: Unit tests added, retaining full coverage; `yarn unit` suffices. wchargin-branch: graphql-schema-union-validation	2018-09-13 18:11:26 -07:00
William Chargin	7da9ef3a94	graphql: add a schema module (#834 ) Summary: This commit introduces a module for declaratively specifying the schema of a GraphQL database. See `buildGithubSchema` in `schema.test.js` for an example of the API. This makes progress toward #622, though the design has evolved some since its original specification there. Test Plan: Unit tests added, with full coverage; `yarn unit` suffices. wchargin-branch: graphql-schema	2018-09-13 18:02:14 -07:00
Dandelion Mané	7074a9dbd8	Fix the build (#831 ) There was a bad interaction between #830 and #829, wherein they both independently changed the snapshot. So they passed individually, and failed once both merged together. This fixes it. Test plan: `yarn test --full` passes.	2018-09-13 17:23:09 -07:00
Dandelion Mané	ab108653f3	Update example-github (#830 ) I added two new issues. One of them has references to commits, which is relevant for work on #815. Test plan: `yarn test --full` passes.	2018-09-13 15:46:51 -07:00
Dandelion Mané	ab85c9785b	Detect references in commit messages (#829 ) Now that the GitHub plugin knows about commit messages (#828), we can parse those commit messages to find references to other GitHub entities. Fixed a minor typing mistake along the way. Test plan: Observe that a number of references have been detected among the commits in the example GitHub repository. We mistakenly find references to wchargin because we don't have a proper tokenizer. (#481) Progress on #815.	2018-09-13 15:46:39 -07:00
Dandelion Mané	a1af9531ec	Request commit messages from GitHub (#828 ) We could get this information from the Git plugin, but since we want to use this for reference detection, it's much easier to have this follow the same pipeline as all the other GitHub reference detection code. I've updated the relational view to also remove the commit messages when compressing by removing bodies. A unit test was added to check this works as intended. See #815 for tracking. Test plan: `yarn test --full` passes. Snapshot changes are appropriate.	2018-09-13 15:15:21 -07:00
Dandelion Mané	c68cb29769	Add commit authorship to the graph (#826 ) In #824, we loaded every commit in the default branch's history into the GitHub relational view, along with authorship info. This commit actually uses that authorship info to create AUTHORS edges from the commit to the user that authored it (whenever possible). The implementation is quite simple: we just need to yield the commits when we yield all the authored entities, so that we will process their authors and add them to the graph. Also, I updated the invariant declarations in `graphView.js`, and corrected a type signature so that the new invariants would typecheck. Test plan: The snapshot update shows that commits are being added to the graph appropriately. Observe that commits which do not have a valid GitHub user as their author do not correspond to edges in the graph. See [example]. This is basically a solution to #815, but I'll defer closing that issue until I've added a few more features, like reference detection. [example]: `6bd1b4c0b7`	2018-09-13 14:19:37 -07:00
Dandelion Mané	4ad9fcf259	Add commits in the history to the RelationalView (#824 ) This builds on #821 so that every commit in the default ref's history is added as a Commit entity to the GitHub relational view. This means that these commits are also added to the graph by the GitHub plugin. In general, this will have no effect on real graphs, because these commits were already available via the Git plugin. Test plan: Observe that the snapshot changes just correspond to new commits being available to the RelationalView, and correspondingly added to the GitHub graph. `yarn test --full` passes.	2018-09-13 14:00:09 -07:00
Dandelion Mané	05f73f04ef	Change GitHub edge encoding strategy (#825 ) GitHub has a procedure for encoding node addresses into sequences of string "parts", so that we can generate unique edge addresses. Right now, the encoding strategy assumes that when we encode a node address into parts, that node address always starts with the prefix `["sourcecred", "github"]`. However, #816 makes the Git commit address a valid GitHub address, which means that this assumption no longer holds. We could start adding special-cased logic to ensure that we de-serialize Commit addresses properly, but what if we create edges between GitHub entities and other plugins' nodes in the future? It is much cleaner to remove the assumption, and serialize the full node address as parts in the edge address. This makes the GitHub edge addresses somewhat longer, but this is OK for now as we don't ever store those on disk. If, in the future, node/edge address length is a problem, we can investigate more principled and maintainable compression strategies at that time. Test plan: `yarn test --full` passes.	2018-09-13 13:54:49 -07:00
Dandelion Mané	9aca956a64	Ensure that `RelationalView` is not typed as `any` (#823 ) This fixes another instance of the notorious [facebook/flow#6400]. We also fix some type errors that were being masked as a consequence. Test plan: `yarn test` passes. [facebook/flow#6400]: https://github.com/facebook/flow/issues/6400	2018-09-13 12:53:08 -07:00
Dandelion Mané	5c46636611	Cleanup some minor mistakes (#822 ) - the capitalization of the GitObject types was incorrect - removed an outdated TODO Test plan: `yarn test` passes.	2018-09-13 12:52:58 -07:00
Dandelion Mané	2a39bd075d	Load commit authorship from GitHub (#821 ) This adds logic for retrieving every commit in the default branch's history, along with authorship information connecting that commit to a GitHub user (when available). This will allows us to do better cred tracking, especially for projects that don't always use pull requests for merging code. This results in a moderate increase in load time for the GitHub plugin. On my machine, loading SourceCred before this change takes 30s, and after this change it takes 34s. Test plan: Observe that the example-github has been updated with commits and authorship. Also, I ran the query for a larger repository (`sourcecred/sourcecred`) to verify that the continuation logic works.	2018-09-13 11:43:36 -07:00
Dandelion Mané	7d0d4fb2fa	Add GitHub commit entity (#819 ) This adds a `Commit` entity to the GitHub relational view. It has all the standard methods: commits can be retrieved en masse or by particular address, they have a URL and authors, and (de)serialize appropriately. The code for adding pull requests has been modified so that the merge commits are added as commit entities. This does not have any effect on the ultimate graph being created; the same edge is added either way. Test plan: I've extended the standard RelationalView tests to cover the `Commit` entity. The case where the commit has 0 authors is not yet tested, but will be once I add support for getting all of the commits from the example-github (we have one example of a commit that doesn't map to a user). Progress on #815.	2018-09-12 19:44:26 -07:00
Dandelion Mané	3e06c054db	Give GitHub plugin support for Commit addresses (#816 ) The Git plugin owns Commits, but the GitHub plugin also creates commits. This commit reifies that relationship by making a Git commit address a valid GitHub structured address. This is precursor work for #815, which will require adding a commit entity to the GitHub relational view. Also, this commit surfaces and fixes a minor type bug, wherein a map from strings to referent addresses was typed to hold any structured address, rather than just referent addresses. Test plan: The unit tests confirm that serializing/deserializing a Git commit address using the GitHub plugin's methods works as intended. Also, unit tests were added that verify that (de)serializing Git addresses for non-commit objects is still an error.	2018-09-12 19:06:06 -07:00
Dandelion Mané	7dc9449fe7	GitHub: Request commit authorship info (#817 ) This commit pulls the graphql fields to request commit information into a fragment, and requests GitHub authorship information (when available) for that fragment. We don't use that information yet, but we will soon. Progress on #815. Test plan: Observe that the example-github data is updated, so that we now have urls and authorship for commits. Observe that the query has updated, but no downstream code was affected. `yarn test --full` passes.	2018-09-12 18:51:06 -07:00
Dandelion Mané	335441e671	Expose node/edge prefixes publicly (#814 ) Both the GitHub and Git plugins create a `_Prefix` object for nodes and edges, which gives the respective prefixes for different node/edge types. We named it `_Prefix` because we weren't sure if these should be exported. In practice, these have proven quite useful to make generally available, and despite the `_`-naming we expose the objects outside their modules. This change renames `_Prefix` to `Prefix` to reflect the reality that these are used as public consts. Exporting them is safe as both objects are frozen. Test plan: Simple rename, `yarn test` suffices.	2018-09-12 17:18:47 -07:00
Dandelion Mané	70fe677990	Update example GitHub data (#813 ) Generated via `src/plugins/github/fetchGithubRepoTest.sh -u` `yarn test --full` passes. Closes #389. Thanks, @wchargin!	2018-09-12 17:18:37 -07:00
Dandelion Mané	737ed4d8b3	Add `MentionsAuthor` edges to the graph (#808 ) This commit builds on the work in #806, adding the `MentionsAuthorReference`s to the graph. It thus resolves #804. Empirically, the addition of these edges does not change the users' cred distribution much. Consider the results with the following 3 forward weights for the edge (results for ipfs/go-ipfs): \| User \| w=1/32 \| w=1/2 \| w=2 \| \|---------------\|-------:\|-------:\|-------:\| \| whyrusleeping \| 228.04 \| 225.69 \| 223.86 \| \| jbenet \| 102.04 \| 100.26 \| 99.53 \| \| kubuxu \| 66.60 \| 67.80 \| 69.36 \| \| ... \| — \| — \| — \| \| btc \| 22.69 \| 22.29 \| 21.38 \| The small effect on users' cred is not that surprising: the MentionsAuthor references always "shadow" a direct comment->user reference. In principle, the overall cred going to the user should be similar; the difference is that now some more cred flows in between the various comments authored by that user, on the way to the user. (And if those other comments had references, then it flows out from them, etc.) Empirically, the variance on comments' scores seems to increase as a result of having this heuristic, which is great—the fact that all comments had about the same score was a bug, not a feature. Sadly, we don't have good tooling for proper statistical analysis of the effect this is having. We'll want to study the effect of this heuristic more later, as we build tooling and canonical datasets that makes that analysis feasible. We choose to add this heuristic, despite the ambiguous effect on users' cred, because we think it is principled, and adds meaningful structure to the graph. Test plan: The commit is a pretty straightforward generalization of our existing GitHub edge logic. All of the interesting logic was thoroughly tested in the preceding pull, so this commit just tests the integration. Observe that standard (de)serialization of the edge works, that the snapshot is updated with a MentionsAuthor reference edge, and that the graph invariant checker, after update, does not throw errors. Also, I manually tested this change on the ipfs/go-ipfs repo. (It does not require regenerating data.)	2018-09-12 12:55:14 -07:00
Dandelion Mané	91f76393e8	Add logic for finding `MentionsAuthorReference`s (#806 ) A `MentionsAuthorReference` is created when a post mentions a user, and that user has authored at least one post in the same thread. Then there is a `MentionsAuthorReference` from the post to the other posts by that author. For context, see the docstrings in `mentionsAuthorReference.js`, and see #804. Test plan: Thorough unit tests have been added, which test the entire pipeline, from ingesting the data via GitHub's graphql responses, through to detecting the references. Edge cases such as self-reference and multi-reference are tested. Thanks to @wchargin for help writing this commit.	2018-09-11 20:27:31 -07:00

1 2 3 4 5 ...

645 Commits