Now that the GitHub plugin knows about commit messages (#828), we can
parse those commit messages to find references to other GitHub entities.
Fixed a minor typing mistake along the way.
Test plan:
Observe that a number of references have been detected among the commits
in the example GitHub repository. We mistakenly find references to
wchargin because we don't have a proper tokenizer. (#481)
Progress on #815.
We could get this information from the Git plugin, but since we want to
use this for reference detection, it's much easier to have this follow
the same pipeline as all the other GitHub reference detection code.
I've updated the relational view to also remove the commit messages when
compressing by removing bodies. A unit test was added to check this
works as intended.
See #815 for tracking.
Test plan:
`yarn test --full` passes.
Snapshot changes are appropriate.
In #824, we loaded every commit in the default branch's history into the
GitHub relational view, along with authorship info. This commit actually
uses that authorship info to create AUTHORS edges from the commit to the
user that authored it (whenever possible).
The implementation is quite simple: we just need to yield the commits
when we yield all the authored entities, so that we will process their
authors and add them to the graph. Also, I updated the invariant
declarations in `graphView.js`, and corrected a type signature so that the
new invariants would typecheck.
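
The yielding approach described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `view` shape, the `authors` field, and the function names are hypothetical.

```js
// Hypothetical data model: each entity has an `address` and an `authors`
// array holding the addresses of GitHub users (possibly empty).
function* authoredEntities(view) {
  yield* view.issues;
  yield* view.pulls;
  yield* view.comments;
  // Assumption: yielding commits here is all it takes for their authors
  // to be processed like those of every other authored entity.
  yield* view.commits;
}

// Collect one AUTHORS record per (entity, author) pair.
function authorsEdges(view) {
  const edges = [];
  for (const entity of authoredEntities(view)) {
    for (const author of entity.authors) {
      edges.push({type: "AUTHORS", author, content: entity.address});
    }
  }
  return edges;
}

// Illustrative data: the second commit has no matching GitHub user.
const exampleView = {
  issues: [{address: "issue-1", authors: ["user-a"]}],
  pulls: [],
  comments: [],
  commits: [
    {address: "commit-1", authors: ["user-b"]},
    {address: "commit-2", authors: []},
  ],
};
const exampleEdges = authorsEdges(exampleView);
```

A commit with an empty `authors` list contributes no record, which matches the observation that commits without a valid GitHub author get no edge.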
Test plan: The snapshot update shows that commits are being added to the
graph appropriately. Observe that commits which do not have a valid
GitHub user as their author do not correspond to edges in the graph.
See [example].
This is basically a solution to #815, but I'll defer closing that issue
until I've added a few more features, like reference detection.
[example]: 6bd1b4c0b7
This builds on #821 so that every commit in the default ref's history is
added as a Commit entity to the GitHub relational view. This means that
these commits are also added to the graph by the GitHub plugin. In
general, this will have no effect on real graphs, because these commits
were already available via the Git plugin.
Test plan:
Observe that the snapshot changes just correspond to new commits being
available to the RelationalView, and correspondingly added to the GitHub
graph. `yarn test --full` passes.
GitHub has a procedure for encoding node addresses into sequences of
string "parts", so that we can generate unique edge addresses. Right
now, the encoding strategy assumes that when we encode a node address
into parts, that node address always starts with the prefix
`["sourcecred", "github"]`. However, #816 makes the Git commit address a
valid GitHub address, which means that this assumption no longer holds.
We could start adding special-cased logic to ensure that we de-serialize
Commit addresses properly, but what if we create edges between GitHub
entities and other plugins' nodes in the future? It is much cleaner to
remove the assumption, and serialize the full node address as parts in
the edge address. This makes the GitHub edge addresses somewhat longer,
but this is OK for now as we don't ever store those on disk. If, in the
future, node/edge address length is a problem, we can investigate more
principled and maintainable compression strategies at that time.
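
To make the idea concrete, here is a minimal sketch of serializing full node addresses as edge-address parts. It assumes addresses are plain arrays of strings and uses a length-prefixed layout; the helper names, the layout, and the example parts are illustrative assumptions, not necessarily what the GitHub plugin does.

```js
// Hypothetical sketch: each node address is serialized in full, prefixed by
// its length, so decoding makes no assumption about the address's prefix.
function encodeNodeParts(nodeParts) {
  return [String(nodeParts.length), ...nodeParts];
}

// Decode exactly `count` length-prefixed node addresses from `parts`.
function decodeNodeParts(parts, count) {
  const result = [];
  let i = 0;
  for (let n = 0; n < count; n++) {
    const length = Number(parts[i]);
    if (
      !Number.isInteger(length) ||
      length < 0 ||
      i + 1 + length > parts.length
    ) {
      throw new Error("Malformed encoded address at index " + i);
    }
    result.push(parts.slice(i + 1, i + 1 + length));
    i += 1 + length;
  }
  if (i !== parts.length) {
    throw new Error("Trailing parts after decoding");
  }
  return result;
}

// Illustrative addresses only: one with the GitHub prefix, one without.
const src = ["sourcecred", "github", "USER", "example-user"];
const dst = ["sourcecred", "git", "COMMIT", "0123abcd"];
const edgeParts = [...encodeNodeParts(src), ...encodeNodeParts(dst)];
const [decodedSrc, decodedDst] = decodeNodeParts(edgeParts, 2);
```

The cost is visible in `edgeParts`: it carries both full addresses plus two length markers, which is the "somewhat longer" trade-off mentioned above.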
Test plan: `yarn test --full` passes.
This adds logic for retrieving every commit in the default branch's
history, along with authorship information connecting that commit to a
GitHub user (when available).
This will allow us to do better cred tracking, especially for projects
that don't always use pull requests for merging code.
This results in a moderate increase in load time for the GitHub plugin.
On my machine, loading SourceCred before this change takes 30s, and
after this change it takes 34s.
Test plan:
Observe that the example-github has been updated with commits and
authorship. Also, I ran the query for a larger repository
(`sourcecred/sourcecred`) to verify that the continuation logic works.
This adds a `Commit` entity to the GitHub relational view. It has all
the standard methods: commits can be retrieved en masse or by particular
address, they have a URL and authors, and (de)serialize appropriately.
The code for adding pull requests has been modified so that the merge
commits are added as commit entities. This does not have any effect on
the ultimate graph being created; the same edge is added either way.
Test plan: I've extended the standard RelationalView tests to cover the
`Commit` entity. The case where the commit has 0 authors is not yet
tested, but will be once I add support for getting all of the commits
from the example-github (we have one example of a commit that doesn't
map to a user).
Progress on #815.
The Git plugin owns Commits, but the GitHub plugin also creates commits.
This commit reifies that relationship by making a Git commit address a
valid GitHub structured address. This is precursor work for #815, which
will require adding a commit entity to the GitHub relational view.
Also, this commit surfaces and fixes a minor type bug, wherein a map
from strings to referent addresses was typed to hold any structured
address, rather than just referent addresses.
Test plan: The unit tests confirm that serializing/deserializing a Git
commit address using the GitHub plugin's methods works as intended.
Also, unit tests were added that verify that (de)serializing Git
addresses for non-commit objects is still an error.
This commit pulls the graphql fields to request commit information into
a fragment, and requests GitHub authorship information (when
available) for that fragment. We don't use that information yet, but we
will soon. Progress on #815.
Test plan: Observe that the example-github data is updated, so that we
now have urls and authorship for commits. Observe that the query has
updated, but no downstream code was affected. `yarn test --full` passes.
Both the GitHub and Git plugins create a `_Prefix` object for nodes and
edges, which gives the respective prefixes for different node/edge
types. We named it `_Prefix` because we weren't sure if these should be
exported. In practice, these have proven quite useful to make generally
available, and despite the `_`-naming we expose the objects outside
their modules. This change renames `_Prefix` to `Prefix` to reflect the
reality that these are used as public consts.
Exporting them is safe as both objects are frozen.
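
As a small illustration of why freezing makes the export safe, consider the sketch below. The prefix values and the `tryToOverwrite` helper are hypothetical; only the use of `Object.freeze` reflects the claim above.

```js
// Hypothetical prefix values, for illustration only.
const Prefix = Object.freeze({
  base: "example/",
  commit: "example/COMMIT",
});

// Attempts to overwrite a property and reports whether it succeeded.
// The function-level "use strict" makes writes to a frozen object throw a
// TypeError instead of failing silently.
function tryToOverwrite(obj, key, value) {
  "use strict";
  try {
    obj[key] = value;
    return true;
  } catch (e) {
    return false;
  }
}

const overwritten = tryToOverwrite(Prefix, "commit", "something/else");
```

Because the write is rejected, code in other modules cannot change the prefixes that everyone else relies on.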
Test plan: Simple rename, `yarn test` suffices.
This commit builds on the work in #806, adding the
`MentionsAuthorReference`s to the graph. It thus resolves #804.
Empirically, the addition of these edges does not change the users' cred
distribution much. Consider the results with the following 3 forward
weights for the edge (results for ipfs/go-ipfs):
| User | w=1/32 | w=1/2 | w=2 |
|---------------|-------:|-------:|-------:|
| whyrusleeping | 228.04 | 225.69 | 223.86 |
| jbenet | 102.04 | 100.26 | 99.53 |
| kubuxu | 66.60 | 67.80 | 69.36 |
| ... | — | — | — |
| btc | 22.69 | 22.29 | 21.38 |
The small effect on users' cred is not that surprising: the
MentionsAuthor references always "shadow" a direct comment->user
reference. In principle, the overall cred going to the user should be
similar; the difference is that now some more cred flows in between the
various comments authored by that user, on the way to the user. (And if
those other comments had references, then it flows out from them, etc.)
Empirically, the variance on comments' scores seems to increase as a
result of having this heuristic, which is great—the fact that all
comments had about the same score was a bug, not a feature.
Sadly, we don't have good tooling for proper statistical analysis of the
effect this is having. We'll want to study the effect of this heuristic
more later, as we build tooling and canonical datasets that make that
analysis feasible.
We choose to add this heuristic, despite the ambiguous effect on users'
cred, because we think it is principled, and adds meaningful structure
to the graph.
Test plan:
The commit is a pretty straightforward generalization of our existing
GitHub edge logic. All of the interesting logic was thoroughly tested in
the preceding pull, so this commit just tests the integration. Observe
that standard (de)serialization of the edge works, that the snapshot is
updated with a MentionsAuthor reference edge, and that the graph
invariant checker, after update, does not throw errors. Also, I manually
tested this change on the ipfs/go-ipfs repo. (It does not require
regenerating data.)
A `MentionsAuthorReference` is created when a post mentions a user, and
that user has authored at least one post in the same thread. Then there
is a `MentionsAuthorReference` from the post to the other posts by that
author.
For context, see the docstrings in `mentionsAuthorReference.js`, and
see #804.
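
For a rough picture of the heuristic, here is a simplified sketch. The post shape (`id`, `author`, `mentions`) and the function name are hypothetical, and the handling of self-reference (a post never references itself) is an assumption; the real logic lives in `mentionsAuthorReference.js`.

```js
// Hypothetical data model: a thread is an array of posts, each with an `id`,
// an `author` login, and a `mentions` array of user logins.
function findMentionsAuthorReferences(posts) {
  // Index the thread's posts by author.
  const postsByAuthor = new Map();
  for (const post of posts) {
    if (!postsByAuthor.has(post.author)) {
      postsByAuthor.set(post.author, []);
    }
    postsByAuthor.get(post.author).push(post);
  }
  const references = [];
  for (const src of posts) {
    // De-duplicate mentions so a repeated mention yields one reference
    // per target post.
    for (const who of new Set(src.mentions)) {
      for (const dst of postsByAuthor.get(who) || []) {
        // Assumption: skip the post itself.
        if (dst.id === src.id) continue;
        references.push({src: src.id, dst: dst.id, who});
      }
    }
  }
  return references;
}

const thread = [
  {id: "post-1", author: "alice", mentions: []},
  {id: "post-2", author: "bob", mentions: ["alice"]},
  {id: "post-3", author: "alice", mentions: ["bob", "carol"]},
];
const refs = findMentionsAuthorReferences(thread);
```

In this thread, `post-2` references both of alice's posts, `post-3` references bob's post, and the mention of `carol` produces nothing because she has not posted in the thread.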
Test plan:
Thorough unit tests have been added, which test the entire pipeline,
from ingesting the data via GitHub's graphql responses, through to
detecting the references. Edge cases such as self-reference and
multi-reference are tested.
Thanks to @wchargin for help writing this commit.
With some frequency we find ourselves needing to maintain maps whose
values are arrays that we append to. `MapUtil.pushValue` is a utility
method for these cases.
Existing usage in `aggregate.js` has been modified to use the new
function.
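
A minimal sketch of what such a helper might look like follows. The signature and the return value are assumptions; see the `MapUtil` module for the real definition.

```js
// Appends `value` to the array stored at `key`, creating the array if the
// key is not yet present. Returning the array is an assumption of this sketch.
function pushValue(map, key, value) {
  let arr = map.get(key);
  if (arr == null) {
    arr = [];
    map.set(key, arr);
  }
  arr.push(value);
  return arr;
}

// Usage: group words by their first letter.
const wordsByInitial = new Map();
for (const word of ["apple", "avocado", "banana"]) {
  pushValue(wordsByInitial, word[0], word);
}
```

This replaces the usual "check, initialize, then push" boilerplate at each call site with a single call.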
Test plan: Unit tests included.
Summary:
CI mode prevents Jest from automatically writing snapshots, and also
causes obsolete snapshots to be an error instead of a warning. This is
consistent with the behavior on Travis, where the `CI=1` environment
variable is set. It should thus be the default when running `yarn test`
(but not `yarn unit`).
Test Plan:
Add a file `src/foo.test.js`:
```js
// @flow
describe("foo", () => {
it("bar", () => {
expect("baz").toMatchSnapshot();
});
});
```
Note that `yarn test` fails with the message, “new snapshot was not
written”.
Revert this change, then re-run `yarn test`; note that it passes,
writing a snapshot.
Then, reinstate this change and delete `src/foo.test.js`. Note that
running `yarn test` fails, due to an obsolete snapshot. Revert the
change again, and watch `yarn test` pass despite the obsolete snapshot.
Finally, remove the snapshot. :-)
wchargin-branch: test-jest-ci
Summary:
Resolves #800. The newly added test takes about 2ms per file.
Test Plan:
Run `yarn sharness`, and note that it passes.
Then, edit (say) `src/cli/main.test.js` to change the top-level describe
block from `"cli/main"` to something else, or to remove it altogether.
Re-run `yarn sharness` and note that it fails with a helpful message:
```
test_js_tests_have_top_level_describe_block_with_filename.t .. 1/?
not ok 31 - test file: cli/main.test.js
test_js_tests_describe_filename.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/65 subtests
```
wchargin-branch: describe-test
Summary:
Per #800, each test file should start with a `describe` block listing
its file path under `src`. Currently, nine of our tests do not do so.
Of these, eight had a top-level describe block with the wrong name
(either not a filepath or an outdated filepath), while only one short
test was missing a top-level describe block altogether. This patch fixes
each file to use the correct format.
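
To illustrate the convention, here is a rough sketch of the check in JavaScript. The actual enforcement is a Sharness test, so both helper names are hypothetical, and the naming rule (path under `src` minus the `.test.js` suffix) is inferred from the `"cli/main"` example.

```js
// Hypothetical helper: the describe-block name expected for a test file,
// i.e. its path under `src` without the `.test.js` suffix.
function expectedDescribeName(testFilePath) {
  const match = /^src\/(.+)\.test\.js$/.exec(testFilePath);
  if (match == null) {
    throw new Error("Not a test file under src: " + testFilePath);
  }
  return match[1];
}

// Rough textual check: does the first `describe(` call in the source use
// the expected name? This does not verify that the block is top-level.
function hasExpectedDescribe(testFilePath, source) {
  const match = /describe\(\s*(["'])(.*?)\1/.exec(source);
  return match != null && match[2] === expectedDescribeName(testFilePath);
}

const goodSource = 'describe("cli/main", () => {\n  it("works", () => {});\n});\n';
const staleSource = 'describe("main", () => {});\n';
```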
Test Plan:
Apply the Sharness test in #802, and note that it fails before this
patch but passes after it.
wchargin-branch: describe-fix
Previously, the WeightConfig (and the button that expanded it) were in
the credExplorer App. This was a little weird, as there's no reason to
play with the weights before you have some Pagerank results to
investigate; additionally, it risked confusing new users with a concept
that was not yet applicable.
Also, the implementation was wonky: the WeightConfig had responsibility
for expanding/hiding itself, which gave poor ability to position the
button and the WeightConfig separately.
Finally, the codepath was untested (vestiges of #604).
This commit fixes all three issues:
- The WeightConfig and button have moved into PagerankTable
- The WeightConfig is now a stateless component, and the parent takes
responsibility for deciding when to mount it
- Logic for showing/hiding the WeightConfig is now tested.
This commit implements a [suggestion] to make `credExplorer/App` a
single source of truth on the `WeightedTypes`. As such, both
`WeightConfig` and `PluginWeightConfig` have been refactored to be
(essentially) stateless components that are controlled from above. I say
essentially because `WeightConfig` still has its expanded state, but
that will go away soon.
Along the way, I've improved testing and added some new invariant
checking (e.g. that `PluginWeightConfig` was passed the correct set of
weights for its adapter). For the first time, there are now tests for
the `WeightConfig` itself! I'm not totally done with the weight
re-write, but this seems like a good time to close #604, as the whole
logical sequence for setting weights is now tested.
Test plan: There are new unit tests. Also, to be sure, I manually tested
the UI as well.
[suggestion]: https://github.com/sourcecred/sourcecred/pull/792#issuecomment-419234721
This commit refactors `credExplorer/App` so that instead of storing an
`EdgeEvaluator` in its state, it stores `WeightedTypes` instead. This
has a few benefits:
- It's trivial to generate the right default value for `WeightedTypes`,
so we no longer allow the variable to be nullable in the state. This
simplifies logic, removes an error case, and means that we don't require
the `WeightConfig` to mount before the app is usable.
- `WeightedTypes` are serializable and can be tested for equality, so
they are a better-behaved piece of state
- We put off the information-destroying transformation as long as
possible
- In the future, I think we may want to move the weights/types concept
into core, at which point the `WeightedTypes` will directly be consumed
by the `core/attribution` module.
Test plan: Unit tests are pretty thorough; to be safe, I tested the UI
myself.
Summary:
This upgrade didn’t require fixing any new errors, but Flow is a good
dependency to keep on top of.
Test Plan:
Running `yarn flow` suffices.
wchargin-branch: flow-v0.80.0
Summary:
Mostly Webpack loaders that have become unused through various config
changes.
Test Plan:
Check that these packages are not used anywhere except as transitive
dependencies:
```shell
$ git show --format= package.json |
> sed '1,4d' | grep '^-' | cut -d\" -f2 | git grep -cf -
yarn.lock:3
```
Also, `yarn && yarn test --full` works, and `yarn start` works, and
`yarn backend && node ./bin/sourcecred.js load sourcecred/example-git`
works.
wchargin-branch: remove-unused-deps
This refactors PluginWeightConfig so that it uses the
`defaultWeightsForAdapter` method introduced in #787.
The refactor is mildly invasive, as we switch the state from being a
(mutable) `WeightedTypes` to having a (regular, read-only)
`WeightedTypes`. I think this is an improvement in consistency.
Test plan: Trivial refactor; unit tests+flow pass.
This commit creates a central `weights` module that defines all of the
weight-related types, and provides some utilities for dealing with them.
This way users of weight-concepts do not need to depend on a lot of
random modules just to get the relevant types. The utility methods are
implicitly defined in a few places in the codebase: now we can avoid
re-writing them, and test them more thoroughly.
Test plan: Unit tests pass.
Currently, the `credExplorer` uses the `defaultStaticAdapters`, but it
imports these adapters in multiple places. If we decide to make the
adapters configurable (e.g. when we start supporting more plugins) this
will be a problem.
This change modifies the cred explorer so that the adapters always come
from a prop declaration on the app. Then the adapters are passed into
the `state` module's functional entry points, rather than letting
`state` depend on the default adapters directly.
This change is motivated by the fact that my WeightConfig cleanup can be
done more cleanly if the adapters are present as a prop on the App.
Test plan: Unit tests are updated. Also, `git grep
defaultStaticAdapters` reveals that the adapters are only consumed
once.
This commit adds `weightsToEdgeEvaluator`, a function for converting
weighted node types into an `EdgeEvaluator`. This replaces the
`edgeWeights` module (which was untested and had an outmoded API).
Test plan: The new `weightsToEdgeEvaluator` method is well-tested.
Since `WeightConfig` is still not tested, I manually verified that it
still works as anticipated.
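
For intuition, the conversion might look something like the sketch below. Everything here is an assumption made for illustration: the shapes of the weights and edges, the longest-prefix matching, the `{toWeight, froWeight}` result, and the way node and edge weights are multiplied. Consult the module itself for the real semantics.

```js
// Hypothetical shapes:
//   nodeWeights: [{prefix: string[], weight: number}]
//   edgeWeights: [{prefix: string[], forwardWeight, backwardWeight}]
//   edge: {address: string[], src: string[], dst: string[]}
function hasPrefix(parts, prefix) {
  return (
    prefix.length <= parts.length && prefix.every((p, i) => parts[i] === p)
  );
}

// Find the entry whose prefix is the longest one matching `parts`.
function longestMatch(entries, parts) {
  let best = null;
  for (const entry of entries) {
    const better = best == null || entry.prefix.length > best.prefix.length;
    if (hasPrefix(parts, entry.prefix) && better) {
      best = entry;
    }
  }
  if (best == null) {
    throw new Error("No type matches: " + parts.join("/"));
  }
  return best;
}

// Assumed rule: weight flowing toward a node is scaled by that node's
// type weight and by the edge type's weight in that direction.
function sketchEdgeEvaluator({nodeWeights, edgeWeights}) {
  return function evaluate(edge) {
    const srcWeight = longestMatch(nodeWeights, edge.src).weight;
    const dstWeight = longestMatch(nodeWeights, edge.dst).weight;
    const edgeType = longestMatch(edgeWeights, edge.address);
    return {
      toWeight: dstWeight * edgeType.forwardWeight,
      froWeight: srcWeight * edgeType.backwardWeight,
    };
  };
}

const evaluate = sketchEdgeEvaluator({
  nodeWeights: [
    {prefix: ["example", "USER"], weight: 2},
    {prefix: ["example", "COMMENT"], weight: 0.5},
  ],
  edgeWeights: [
    {prefix: ["example", "AUTHORS"], forwardWeight: 1, backwardWeight: 0.25},
  ],
});
const exampleWeights = evaluate({
  address: ["example", "AUTHORS", "1"],
  src: ["example", "COMMENT", "7"],
  dst: ["example", "USER", "alice"],
});
```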
Summary:
Lots of tests need the output of `yarn backend`. Before this commit,
they tended to create it themselves. This was slow and wasteful, and
also could in principle have race conditions (though in practice usually
tended not to).
This commit updates tests to respect a `SOURCECRED_BIN` environment
variable indicating the path to an existing directory of backend
applications.
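
A test helper honoring this variable could look roughly like the following sketch. The function name and the `buildBackend` callback are hypothetical, and falling back to a fresh build when the variable is unset is an assumption.

```js
const path = require("path");

// Returns the directory of backend applications that a test should use.
// `buildBackend` is a hypothetical callback standing in for whatever builds
// the backend (for example, by running `yarn backend`) and returns its
// output directory.
function resolveBackendBin(env, buildBackend) {
  const existing = env.SOURCECRED_BIN;
  if (existing != null && existing !== "") {
    // Reuse the existing build; no rebuild needed.
    return path.resolve(existing);
  }
  return buildBackend();
}
```

In a test file this would be called as `resolveBackendBin(process.env, build)`, so that a single build created up front can be shared by every test that needs it.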
Closes #765.
Test Plan:
Running `yarn test --full` passes.
Prepending `echo run >>/tmp/log &&` to the `backend` script in
`package.json` and running `yarn test --full` results in a log file
containing only one line, indicating that the script really is run only
once.
wchargin-branch: deduplicate-backend
This is convenient for testing other code, where we may want to directly
use the fallback types. One test has been updated in this way.
I also changed the names for the fallback adapter's edges to be somewhat
more readable.
Test plan: Tests improved.
Summary:
This commit removes the `config/backend.js` script and replaces it with
a direct invocation of Webpack. This enables us to use command-line
arguments to Webpack, like `--output-path`.
Test Plan:
Note that `rm -rf bin; yarn backend` still works, and that the resulting
applications work (`node bin/sourcecred.js load`). Note that `yarn test`
and `yarn test --full` still work.
wchargin-branch: backend-webpack-direct
Summary:
We currently configure the Babel config with environment variables: in
particular, the `SOURCECRED_BACKEND` environment variable causes Babel
to target Node instead of the browser. The relevant lines are copied
from `scripts/backend.js`.
The environment variable mechanism is slightly clunky, especially as it
requires the Webpack config module to be impure, but it works okay for
our purposes. We could adopt a more principled solution—setting the
`options` argument to the Babel loader in the backend Webpack config—but
this would require redesigning the Babel config system, which would take
a moderate amount of effort.
Test Plan:
As of this commit, `yarn backend` has bitwise identical output to
directly invoking Webpack:
```shell
$ yarn backend >/dev/null 2>/dev/null
$ shasum -a 256 bin/* | shasum -a 256
c4f7494c3ba70e5488ff4a8b44550e478a2a8b27fa96f286123f9566fd28f7be -
$ NODE_ENV=development node ./node_modules/.bin/webpack \
> --config ./config/webpack.config.backend.js >/dev/null 2>/dev/null
$ shasum -a 256 bin/* | shasum -a 256
c4f7494c3ba70e5488ff4a8b44550e478a2a8b27fa96f286123f9566fd28f7be -
```
wchargin-branch: backend-set-babel-flags
Summary:
Over the past few commits, I have accidentally removed all Flow errors
from the Webpack config. We can now use Flow on that file to prevent any
new errors from creeping in.
Test Plan:
Running `yarn flow` suffices.
wchargin-branch: backend-flow
Summary:
Previously, our `webpack.config.backend.js` file actually exported a
function that could be used to make a Webpack configuration object.
(This is not to be confused with the late `makeWebpackConfig.js`, which
actually exported a configuration object!)
In addition to being confusing nomenclature, this was a sneaky trap for
CLI users. Invoking `webpack --config config/webpack.config.backend.js`
would actually work, but do the wrong thing: Webpack _allows_ your
configuration object to be a function, but with different semantics. In
particular, the result was that Webpack would emit the build output into
your current directory instead of into `bin/`.
This commit fixes that by making `webpack.config.backend.js` export the
Webpack configuration object for the backend JavaScript applications.
The logic to change the path is now handled by the caller, by
overwriting `config.output.path`; this is exactly [the same approach
that the Webpack CLI takes when given an `--output-path`][1], so it’s
okay with me.
[1]: 368e2640e6/bin/convert-argv.js (L406-L409)
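
The caller-side override can be pictured with the sketch below. The `backendConfig` object is a hypothetical stand-in for the real exported config, and `withOutputPath` is an illustrative helper; unlike the approach described above, it copies the config instead of overwriting `config.output.path` in place.

```js
// Hypothetical stand-in for the object exported by
// `webpack.config.backend.js`; the real config has many more fields.
const backendConfig = {
  target: "node",
  output: {path: "/repo/bin", filename: "[name].js"},
};

// Return a copy of `config` whose `output.path` is replaced, leaving the
// original object untouched.
function withOutputPath(config, outputPath) {
  return {...config, output: {...config.output, path: outputPath}};
}

const overriddenConfig = withOutputPath(backendConfig, "/tmp/alternate-output");
```

Copying keeps the shared config object reusable by other callers, at the cost of differing slightly from what the Webpack CLI does.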
Test Plan:
Run `yarn backend` and `yarn backend --dry-run`. Note that each runs
with appropriate output (both emitted files and console logs).
wchargin-branch: backend-webpack-config-object
This commit adds PluginWeightConfig, which is responsible for
adding all the weights for an individual plugin. The top-level
WeightConfig now creates multiple PluginWeightConfigs. It also takes
responsibility for hiding the FallbackPlugin.
Test plan: The PluginWeightConfig is tested (and fairly simple). The
top-level WeightConfig is not yet tested (#604), so I manually tested
that the weights in the app still function.
Summary:
This logic isn’t currently useful to us. If we want this functionality,
we should consider using a Webpack plugin like `size-limit`. In the
meantime, this removes functionality from `scripts/backend.js`,
continuing on the path to #765.
Recommend reviewing with `-w`.
Test Plan:
Run `yarn backend` and `yarn backend --dry-run`, noting that each works.
wchargin-branch: backend-remove-file-sizes
Summary:
As of #775, this is no longer used.
Test Plan:
A `git grep eslint-loader` shows no results, and `yarn test --full`
passes.
wchargin-branch: remove-eslint-loader
Summary:
Webpack will fail (quickly) if any required entry points do not exist.
The `scripts/backend.js` script superfluously checks this, too. This
patch removes that check.
Test Plan:
In `config/paths.js`, change `src/cli/main.js` to `src/cli/wat.js`.
Then, `yarn backend`, and note that Webpack fails quickly, with an error
“entry module not found”. Note that Webpack fails equally quickly if you
change the path of the last entry point rather than the first one.
wchargin-branch: backend-remove-exists-check
Summary:
We lint separately, with `yarn lint`. There’s no need to duplicate this
effort.
Test Plan:
Introduce a lint error, for instance by adding `("unused expression");`
to `src/cli/main.js` and `src/app/App.js`. Note that `yarn lint` fails
but `yarn backend` and `yarn start` and `yarn build` succeed.
wchargin-branch: webpack-no-lint
Summary:
Both the backend and the web builds want to empty the build directory
before starting. This commit makes them use the same codepath, reducing
the amount of work that `scripts/backend.js` does so that we can more
easily remove it (#765).
Test Plan:
```shell
$ touch ./bin/wat
$ yarn backend >/dev/null 2>/dev/null
$ file ./bin/wat
./bin/wat: cannot open `./bin/wat' (No such file or directory)
```
wchargin-branch: backend-empty-backend-build-directory
`testUtil.configureEnzyme` now additionally asserts, after every test,
that `console.error` and `console.warn` were not called. Tests that
explicitly expect such calls can still be written by manually re-mocking
the relevant console method (and several examples already exist).
The code that explicitly specifies this for various enzyme test files
has been removed.
Test plan: `git grep "not.toHaveBeenCalled"` shows only unrelated usage.
`yarn test` passes. Adding a spurious console.warn to a passing test
causes it to fail.
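
The underlying idea can be sketched without Jest or Enzyme, as below. This is a standalone analogue rather than the code in `testUtil`: the function name is hypothetical, and it wraps a single test body instead of hooking into per-test setup and teardown.

```js
// Runs `testBody` with `console.error` and `console.warn` replaced by
// recording stubs, restores the originals, and then throws if either
// method was called.
function runWithConsoleCheck(testBody) {
  const original = {error: console.error, warn: console.warn};
  const calls = [];
  console.error = (...args) => calls.push({method: "error", args});
  console.warn = (...args) => calls.push({method: "warn", args});
  try {
    testBody();
  } finally {
    // Always restore, even if the test body itself throws.
    console.error = original.error;
    console.warn = original.warn;
  }
  if (calls.length > 0) {
    const methods = calls.map((c) => "console." + c.method).join(", ");
    throw new Error("Unexpected console output via: " + methods);
  }
}
```

A test that legitimately expects a warning would need an escape hatch, which corresponds to manually re-mocking the console method as described above.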
Fixes #668.
Summary:
This patch fixes a particularly sneaky bug. Our test script contains a
literal backtick inside single quotes. This is generally not a problem,
because backticks inside single quotes do nothing. But the contents of
the single quotes are interpreted as Bash by our test runner, and at
that time the backticks are expanded to a command substitution.
Therefore, `grep` is invoked as if writing
    grep -e "warning: running $(yarn backend)"
at the CLI. This will actually invoke `yarn backend`!
The magnificent aspect of this bug is that it both makes the test script
slower by about ten seconds _and_ completely and silently defeats the
assertion in which it’s contained. The output of `yarn backend` contains
several blank lines. Therefore, one of the literal patterns to `grep`
contains a blank line. This causes `grep` to match _every_ line in the
error file, regardless of whether it is one of the intended messages.
This patch is the 666th PR to SourceCred. In my opinion, it deserves
this dubious honor.
Test Plan:
Note that `yarn test --full` works, but fails if one of the expected
error message patterns is deleted or munged.
Confirm the behavior by prepending `echo backend >>/tmp/log &&` to the
`yarn backend` script in `package.json`, noting that the resulting log
file contains four lines before this patch and two lines after it.
(Don’t forget to delete/clear the log file before invocations.)
Confirm the behavior of `grep` by writing:
```shell
$ printf 'things went wrong!\n' >err
$ printf 'wat\n\nwot\n' >patterns
$ grep -vF -e "okay" -e "warn: `cat patterns`" err; echo $?
1
$ printf 'wat\nwot\n' >patterns # no empty line
$ grep -vF -e "okay" -e "warn: `cat patterns`" err; echo $?
things went wrong!
0
```
wchargin-branch: fix-build-test-quoting
Summary:
This change should have happened in #768. However, I didn’t catch it
then because `yarn test --full` passes even before this commit, despite
the expected error being clearly wrong! It turns out that a very sneaky
bug conspires with this one to result in the test passing no matter what
kinds of warnings `yarn backend` may output. This bug is fixed in #772.
Test Plan:
Observe that the error message is now correct by comparing against the
source in `config/RemoveBuildDirectoryPlugin.js`. Then, apply #772 and
note that `yarn test --full` still passes, but does not pass when #772
is applied and this change is reverted.
wchargin-branch: fix-expected-error-message
Summary:
This simplifies interfaces everywhere.
See also #216, which did the opposite of this as a temporary fix due to
a Babel/Webpack interaction that no longer exists as of #766.
Test Plan:
Note that `node bin/sourcecred.js load sourcecred/example-git` still
works (after `yarn backend`). Note that `yarn test` still works. These
demonstrate that the module works from both a Webpack context and a Node
context. Note that `git grep --name-only execDependencyGraph` yields
exactly those files touched in this commit. Note that `yarn test --full`
passes.
wchargin-branch: commonjs-execDependencyGraph