I moved sourcecred/example-git{,hub} to the @sourcecred-test org.
This commit fixes the build given that move.
I've realized that in #1233 I in-advertently made some Git tests that
depend on a snapshot un-updateable. I'm going to compound on that slight
technical debt by skipping the tests that depended on that snapshot. I
recognize and accept that I'll need to pay this down when I resuscitate
the git plugin.
Test plan: `yarn test --full`.
This commit swaps usage over to the new implementation of `cli/load`
(the one that wraps `api/load`) and makes changes throughout the project
to accomodate that we now track instances by Project rather than by
RepoId.
Test plan: Unit tests updated; run `yarn test --full`. Also, for safety:
actually load a project (by whole org, why not) and verify that the
frontend still works.
The scores are lightly processed from their internal representation.
Example usage:
```
$ yarn backend;
$ node bin/sourcecred.js load sourcecred/sourcecred
$ node bin/sourcecred.js scores sourcecred/sourcecred > scores.json
```
The data structure is as follows:
```js
export type NodeOutput = {|
+id: string,
+totalCred: number,
+intervalCred: $ReadOnlyArray<number>,
|};
export type ScoreOutput = Compatible<{|
+users: $ReadOnlyArray<NodeOutput>,
+intervals: $ReadOnlyArray<Interval>,
|}>;
```
Test plan: I added sharness tests at `sharness/test_cli_scores.t`.
In the past, we've used javascript tests for CLI commands. However,
those are pretty time-consuming to write, and are less robust than
simply running the command from bash. Check the snapshot for a sense of
what the new data format looks like. Also, the snapshot updater now
updates this snapshot too.
Relevant for #1047.
Thanks to @Beanow for feedback on the output format and design.
Thanks to @wchargin for help in code review.
This fixes a build error in test_build_static_site.t.
It's been masked by a lot of ENOMEM issues I've been having in the full
build, so a few failures had actually had a chance to accrue:
- Fixes an issue wherein loading the sourcecred/example-git repo would
error (on failure to normalize scores, because there was no user
activity). Fixed it pretty crudely by adding an issue to that
repository.
- Fixes an issue where a deprecation warning caused the
build_static_site build to fail. We now permit that particular warning.
- Updates the example load snapshot.
Test plan: `yarn test --full` passes once again.
This commit disables the Git plugin by removing it from the default list
of plugins to load, or to display in the frontend.
Rationale: The git plugin doesn't currently add very much to cred
quality. Git commits have edges to their parent, which isn't a very
meaningful relationship for cred purposes. We'll want to re-enable the
Git plugin once we're ready to support e.g. file and directory level
cred tracking.
I've skipped a block of tests around the git analysisAdapter. (I intend
to deprecate the analysisAdapters, so skipping the tests seemed
preferrable to updating them). I also updated our sharness test for
catching test files without a proper describe block, so that it won't
error on skipped blocks.
Test plan: `yarn test --full` passes. Loading a new repository and
inspecting it in the frontend gives consistent results. There are no
references to Git plugin weights in the frontend, now that corresponding
nodes are not available.
This updates the graph `Node` type to include a string description.
The description should be a brief (ideally oneline) string giving
context on what the node is. All planned frontends will support
markdown, so linking to context (e.g. linking to the issue corresponding
to an ISSUE type node) is supported.
This commit updates the Git and GitHub plugins to use the new
description field.
Test plan: `yarn test --full` passes, and I've inspected snapshots and
made sure they look reasonable.
This commit modifies the Graph class so that it permits dangling edges;
that is to say, edges whose src or dst are not present in the graph.
Dangling edges may be directly added to the graph, or existing edges may
become dangling if their src or dst is removed.
This change is prerequisite to #1136; if we require that nodes have
metadata, we should also make it possible to add edges to nodes that
don't yet exist, as the plugin creating an edge may not have access to
the full metadata needed to add the node.
To support this change, there is now an `isDanglingEdge` method on the
graph, which reports whether or not the edge is dangling. Also,
`Graph.edges` requires that the client make an explicit choice on
whether dangling edges are desired. This ensures that we do not
accidentally include dangling edges in a case where they are
inappropriate (e.g. creating a Markov chain) or accidentally discard
dangling edges when they are needed (e.g. when merging or serializing).
The Graph's invariant checker has been updated to reflect the new
semantics.
The Graph compat version has been bumped, since this is a break in
backwards compatibility.
Note that this commit does not change the behavior of any plugins; that
is to say, no plugins create dangling edges (yet).
Test plan: The advanced graph test case has been updated to include
dangling edges. The tests for Graph, PagerankGraph, and
GraphToMarkovChain have been updated. `yarn test --full` passes.
This will allow timeline cred (#862) to do a better job of flowing cred
across reaction edges. (Very old reactions should not be moving a lot of
present-day cred.)
Test plan: Inspected snapshot changes.
I added `mentionsAuthorReference` based on an untested hypothesis that
they would be useful. With the passage of time, I've never seen any
evidence that they actually improve cred socres (their impact seems
negligible), and they add complexity.
In the future, "go-fishing" style heuristics like this should not merge
unless they are of clearly demonstrated value. Also, it would be better
to add stuff like this via a standalone plugin rather than in the core
GitHub logic.
Undoes #806.
Test plan: `yarn test`
As of the timeline cred work, I'm shifting emphasis away from raw
PageRank results, in favor of timeline pagerank results. As such,
there's no need to have load save the regular pagerank results on
creation.
As of #1136, there will be no need for timestampMap, as that data will
be present directly in the graph. As the timeline cred UI will depend on
the full graph for analysis, let's save the graph instead.
Test plan: `yarn test` and snapshot inspection.
At present, the Git commit node type lives in a strange state of shared
responsibility between GitHub and Git. The Git plugin is nominally
responsible for it, but its render method tries to show a hyperlink to
GitHub -- which is awkward for many reasons, including that the same Git
commit could have multiple hyperlinks on GitHub.
This commit resolves that issue by separating the existing commit type
into two: the Git Commit type, which is owned by the Git plugin and
doesn't have hyperlinks or any fancy GitHub metadata, and the GitHub
Commit, which is owned by the GitHub plugin, corresponds to a unique
database id in GitHub, and has a corresponding GitHub url.
The two commits are connected by a CorrespondsToCommit edge type, which
links from the GitHub commit to the corresponding Git commit.
This is necessary for #1136, as if we want to make descriptions a part
of the graph payload, we need for descriptions to be unique for a given
address--and descriptions are only unique if we identifiy each GitHub
commit pointer as a separate address.
Test plan: The unit testing in this part of the codebase is light, so I
verified that the frontend work as expected for `sourcecred/sourcecred`
and `sourcecred/research`. The new node type and edge type appear
properly in the UI, the GitHub commits are connected to their Git
counterparts, etc.
This modifies `sourcecred load` so that it saves timestamp information
for all of the loaded plugins in a single aggregated map.
This is quite convenient, as it saves consumers of timestamp information
from needing to worry about the (rather hacky) implementation whereby
the data is fed from each adapter. Instead, consumers can just load the
timestamp map. This will also make it much easier to use timestamp info
in the research codebase.
Test plan: The timestampMap module has testing around generating the map
from the adapter and nodes, writing it, and reading it.
I haven't added any testing to the `load` CLI command. I think it would
be redundant as the updated snapshot test reveals that the map is
getting serialized properly.
Tests pass, and I have inspected the snapshot
Thanks to @wchargin for [catching it].
[catching it]: https://github.com/sourcecred/sourcecred/issues/1151#issuecomment-494256526
Generated this change via:
```
$ yarn backend
$ (cd sharness; UPDATE_SNAPSHOT=1 ./test_load_example_github.t -l)
```
Test plan: `yarn test --full`, excluding the known (local-only) failure
described in #1151
Modifies the Git plugin so that we now track commit author dates.
Similar to in #1152, they are encoded in MsSinceEpoch.
Test plan: `yarn test --full` passes, except for the pre-existing
failure discussed in #1151.
Thanks to @s-ben for a conversation which motivated these changes.
Updates github schema to include createdAt timestamps, and then updates
the RelationalView to provide those timestamps as MsSinceEpoch.
I added createdAt timestamps to Repos, Issues, Pulls, Reviews, and
Comments, as these correspond to GitHub graph nodes where I think
time-based filtering is relevant. I didn't add them to Users, Reactions,
or Commits. Reactions, because they correspond to edges not nodes. (We
could consider doing the time filtering on edges too, but I'd rather
keep it simple for now.) Commits, because they're owned by a different
plugin. Users, because... in a certain sense the user identity is
timeless, the time factoring is mostly so we can evaluate how users'
cred varies over time.
Anyway, it will be easy to add more fields later if we need them.
Test plan:
- Inspect snapshot changes
- Ran `yarn test --full`
- Its only failure is pre-existing, per #1151
Thanks to @s-ben for some motivation and discussion about this change.
This commit updates the `sourcecred load` command so that it also
automatically runs PageRank on completion.
The implementation is slightly hacky, in that it prints two sets of
task status headers/footers to console, for reasons described in a
comment in the source code. The user-visible effect of this hack can
be seen below:
```
❯ node bin/sourcecred.js load sourcecred/example-github
Starting tasks
GO load-git
GO load-github
DONE load-github
DONE load-git
Overview
Final result: SUCCESS
Starting tasks
GO run-pagerank
DONE run-pagerank
Overview
Final result: SUCCESS
```
It would be good to clean this up, but for now I think it's acceptable.
Note that it is not safe to assume that a PagerankGraph always exists
for repos that are included in the RepoIdRegistry. The repo gets added
to the registry before the pagerank task runs. Consumers that are
loading the `PagerankGraph` can just check that the file exists, though.
Test plan: I've added unit tests that verify that the right tasks are
generated. Most importantly, the snapshot of the results of `sourcecred
load` now include a snapshotted pagerank graph.
(The snapshot was updated via `UPDATE_SNAPSHOT=1 yarn test --full`.)
Further progress on #967.
Updating github example data with support
for 🚀 and 👀 reaction types.
This follows #1068 and @decentralion updating
the archived repo with the new reaction types.
`src/plugins/github/fetchGithubRepoTest.sh -u`
(as @decentralion suggested) updated `example-github.json`
`yarn unit` caught two tests with failing snapshot
tests (`createGraph.test` and `relationalView.test`), so
I updated those with `yarn unit -u`
`yarn test -full` caught a failing snapshot test
at `sharness-full`, resolved by updating the
snapshot in `view.json.gz` with
`UPDATE_SNAPSHOT=1 yarn test --full`.
Thanks to @wchargin for the [explanation] on how
to resolve that issue.
[explanation]: https://github.com/sourcecred/sourcecred/pull/1077#pullrequestreview-196805017
**Test Plan:**
`yarn test --full` is passing.
Additionally, the commands:
```sh
filepath="./sharness/__snapshots__/example-github-load/data/sourcecred/example-github/github/view.json.gz" &&
[ -f "${filepath}" ] && # sanity check
diff -u \
<(git show "HEAD:${filepath}" | gzip -d | jq .) \
<(gzip -dc "${filepath}" | jq .) \
;
```
yields the following output:
```
--- /dev/fd/63 2019-01-27 08:34:15.020387301 -0500
+++ /dev/fd/62 2019-01-27 08:34:15.021095696 -0500
@@ -654,6 +654,22 @@
"subtype": "USER",
"login": "decentralion"
}
+ },
+ {
+ "content": "ROCKET",
+ "user": {
+ "type": "USERLIKE",
+ "subtype": "USER",
+ "login": "decentralion"
+ }
+ },
+ {
+ "content": "EYES",
+ "user": {
+ "type": "USERLIKE",
+ "subtype": "USER",
+ "login": "decentralion"
+ }
}
]
}
```
Again, thanks @wchargin's for providing those commands and accompanying
explanation.
Summary:
Our registry was defined to simply be a list of IDs. This is
insufficiently flexible; we want to be able to annotate these IDs with,
e.g., last-updated times (#989). This commit wraps the entries in a
simple object, updating clients appropriately.
Test Plan:
- Run `node ./bin/sourcecred.js load sourcecred/example-github` with a
repository registry in the old format, and note that it errors
appropriately.
- Run `yarn build` with a repository registry in the old format, and
note that it errors (“Compat mismatch”).
- Delete the old registry and re-run the `load` command. Note that it
runs successfully and outputs a registry. Run `yarn build`; note
that this works.
- Load data for two repositories. Run `yarn start`. Note that the list
of prototypes still works, and that you can navigate to and render
attributions for individual project pages.
- Verify that `yarn test --full` passes.
wchargin-branch: repo-id-registry-metadata
This will enable us to test code that needs to consume the results of
running `sourcecred load`, e.g. plugin adapter code.
If you need to update the snapshot, run
(cd sharness; UPDATE_SHAPSHOT=1 ./test_load_example_github.t)
Test plan: `yarn sharness-full` passes.
Paired with @wchargin