Summary:
This affords more flexibility to clients, because an exact value can be
used in place of an inexact value, but not vice versa.
Test Plan:
Running `yarn flow` suffices.
wchargin-branch: schema-exact-type-fields
This commit updates the GitHub graphql query to also fetch reactions.
We update the JSON typedefs to include this new information, add
continuations from comments, and update existing continuation and query
code. Also, I added a safety check when updating comments for issues
that was previously unnecessary but is now needed.
Test plan:
- `yarn test --full` passes.
- Setting the page limits to 1 and running on the example-github does
not error with unexhausted pages, and loads all the expected reactions.
- Running on a larger repository (go-ipfs) works as expected.
- I have written dependent code that consumes these reactions in the
RelationalView, and works as intended, which suggests that the type
signatures are correct.
Now that #832 gave us logic to parse references to commits, we have the
RelationalView find and add these references. The actual change is
a simple extension of existing reference detection logic.
Test plan: Observe that the snapshots are updated with references to
commits from the example-github repository.
Progress on #815.
We add a new function, `findCommitReferences`, which can find both
explicit url references to commits, and commit hashes.
Since the commit url includes the commit hash, some extra logic is added
to deduplicate them in this instance. Tests verify that this is done
properly.
Test plan: Unit tests cover the cases of having commit hashes, having
commit urls, and having both at once.
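
As a rough illustration of the deduplication described above, here is a minimal sketch. The regular expressions, the name `findCommitReferencesSketch`, and the return shape are all assumptions made for illustration; the real `findCommitReferences` may well differ.

```javascript
// Hypothetical sketch: patterns and return shape are assumptions,
// not the plugin's actual implementation.
const COMMIT_URL =
  /https:\/\/github\.com\/[\w.-]+\/[\w.-]+\/commit\/([0-9a-f]{40})/g;
const COMMIT_HASH = /\b[0-9a-f]{40}\b/g;

function findCommitReferencesSketch(body) {
  const refs = [];
  // First pass: explicit URLs. Blank each one out (same length) so that
  // the hash embedded in the URL is not matched again below.
  const withoutUrls = body.replace(COMMIT_URL, (url, hash) => {
    refs.push({ref: url, hash});
    return " ".repeat(url.length);
  });
  // Second pass: bare hashes in whatever text remains.
  for (const match of withoutUrls.matchAll(COMMIT_HASH)) {
    refs.push({ref: match[0], hash: match[0]});
  }
  return refs;
}
```

Blanking out matched URLs before scanning for bare hashes is just one way to avoid reporting the same commit twice; filtering by hash afterwards would work too.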
Summary:
GraphQL unions are required to be unions specifically of object types.
They cannot contain primitives or other union types as clauses. This is
good: it means that we don’t have to worry about unions that recursively
reference each other or themselves.
Unions are also required to have at least one clause, but we don’t
validate this because it’s not helpful for us. An empty union is
perfectly well-defined, if useless, and shouldn’t cause any problems.
Relevant portion of the spec:
<https://facebook.github.io/graphql/October2016/#sec-Union-type-validation>
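
For illustration only, a validation pass along these lines might look like the sketch below. The schema representation (a plain object mapping type names to `{type, clauses}`) and the function name `validateUnions` are assumptions, not the module's actual API.

```javascript
// Hypothetical schema shape: {[typeName]: {type: "OBJECT" | "UNION" | ..., clauses?: string[]}}
function validateUnions(schema) {
  for (const [name, def] of Object.entries(schema)) {
    if (def.type !== "UNION") continue;
    // An empty `clauses` list is deliberately allowed: the loop just
    // does nothing, matching the "empty union is fine" decision above.
    for (const clause of def.clauses) {
      const target = schema[clause];
      if (target == null) {
        throw new Error(`${name}: unknown type "${clause}"`);
      }
      if (target.type !== "OBJECT") {
        throw new Error(`${name}: union clause "${clause}" is not an object type`);
      }
    }
  }
}
```

Because every clause must be an object type, the check never has to recurse into other unions.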
Test Plan:
Unit tests added, retaining full coverage; `yarn unit` suffices.
wchargin-branch: graphql-schema-union-validation
Summary:
This commit introduces a module for declaratively specifying the schema
of a GraphQL database. See `buildGithubSchema` in `schema.test.js` for
an example of the API.
This makes progress toward #622, though the design has evolved some
since its original specification there.
Test Plan:
Unit tests added, with full coverage; `yarn unit` suffices.
wchargin-branch: graphql-schema
There was a bad interaction between #830 and #829, wherein they both
independently changed the snapshot. So they passed individually, and
failed once both merged together. This fixes it.
Test plan: `yarn test --full` passes.
Now that the GitHub plugin knows about commit messages (#828), we can
parse those commit messages to find references to other GitHub entities.
Fixed a minor typing mistake along the way.
Test plan:
Observe that a number of references have been detected among the commits
in the example GitHub repository. We mistakenly find references to
wchargin because we don't have a proper tokenizer. (#481)
Progress on #815.
We could get this information from the Git plugin, but since we want to
use this for reference detection, it's much easier to have this follow
the same pipeline as all the other GitHub reference detection code.
I've updated the relational view to also remove the commit messages when
compressing by removing bodies. A unit test was added to check this
works as intended.
See #815 for tracking.
Test plan:
`yarn test --full` passes.
Snapshot changes are appropriate.
In #824, we loaded every commit in the default branch's history into the
GitHub relational view, along with authorship info. This commit actually
uses that authorship info to create AUTHORS edges from the commit to the
user that authored it (whenever possible).
The implementation is quite simple: we just need to yield the commits
when we yield all the authored entities, so that we will process their
authors and add them to the graph. Also, I updated the invariant
declarations in `graphView.js`, and corrected a type signature so that the
new invariants would typecheck.
Test plan: The snapshot update shows that commits are being added to the
graph appropriately. Observe that commits which do not have a valid
GitHub user as their author do not correspond to edges in the graph.
See [example].
This is basically a solution to #815, but I'll defer closing that issue
until I've added a few more features, like reference detection.
[example]: 6bd1b4c0b7
This builds on #821 so that every commit in the default ref's history is
added as a Commit entity to the GitHub relational view. This means that
these commits are also added to the graph by the GitHub plugin. In
general, this will have no effect on real graphs, because these commits
were already available via the Git plugin.
Test plan:
Observe that the snapshot changes just correspond to new commits being
available to the RelationalView, and correspondingly added to the GitHub
graph. `yarn test --full` passes.
GitHub has a procedure for encoding node addresses into sequences of
string "parts", so that we can generate unique edge addresses. Right
now, the encoding strategy assumes that when we encode a node address
into parts, that node address always starts with the prefix
`["sourcecred", "github"]`. However, #816 makes the Git commit address a
valid GitHub address, which means that this assumption no longer holds.
We could start adding special-cased logic to ensure that we de-serialize
Commit addresses properly, but what if we create edges between GitHub
entities and other plugins' nodes in the future? It is much cleaner to
remove the assumption, and serialize the full node address as parts in
the edge address. This makes the GitHub edge addresses somewhat longer,
but this is OK for now as we don't ever store those on disk. If, in the
future, node/edge address length is a problem, we can investigate more
principled and maintainable compression strategies at that time.
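
A minimal sketch of one way to serialize full node addresses into edge address parts follows. The length-prefixing scheme and the helper names are assumptions chosen for illustration; the commit itself only says that the full node address is serialized.

```javascript
// Hypothetical sketch: each node address is stored in full, preceded by
// its length, so no shared prefix is assumed when decoding.
const EDGE_PREFIX = ["sourcecred", "github"];

function encodeEdgeAddress(kind, src, dst) {
  return [
    ...EDGE_PREFIX,
    kind,
    String(src.length), ...src,
    String(dst.length), ...dst,
  ];
}

function decodeEdgeAddress(parts) {
  let i = EDGE_PREFIX.length;
  const kind = parts[i++];
  const readAddress = () => {
    const length = Number(parts[i++]);
    const address = parts.slice(i, i + length);
    i += length;
    return address;
  };
  const src = readAddress();
  const dst = readAddress();
  return {kind, src, dst};
}
```

With this layout, a Git commit address and a GitHub address can appear in the same edge without any special-casing.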
Test plan: `yarn test --full` passes.
This adds logic for retrieving every commit in the default branch's
history, along with authorship information connecting that commit to a
GitHub user (when available).
This will allow us to do better cred tracking, especially for projects
that don't always use pull requests for merging code.
This results in a moderate increase in load time for the GitHub plugin.
On my machine, loading SourceCred before this change takes 30s, and
after this change it takes 34s.
Test plan:
Observe that the example-github has been updated with commits and
authorship. Also, I ran the query for a larger repository
(`sourcecred/sourcecred`) to verify that the continuation logic works.
This adds a `Commit` entity to the GitHub relational view. It has all
the standard methods: commits can be retrieved en masse or by particular
address, they have a URL and authors, and (de)serialize appropriately.
The code for adding pull requests has been modified so that the merge
commits are added as commit entities. This does not have any effect on
the ultimate graph being created; the same edge is added either way.
Test plan: I've extended the standard RelationalView tests to cover the
`Commit` entity. The case where the commit has 0 authors is not yet
tested, but will be once I add support for getting all of the commits
from the example-github (we have one example of a commit that doesn't
map to a user).
Progress on #815.
The Git plugin owns Commits, but the GitHub plugin also creates commits.
This commit reifies that relationship by making a Git commit address a
valid GitHub structured address. This is precursor work for #815, which
will require adding a commit entity to the GitHub relational view.
Also, this commit surfaces and fixes a minor type bug, wherein a map
from strings to referent addresses was typed to hold any structured
address, rather than just referent addresses.
Test plan: The unit tests confirm that serializing/deserializing a Git
commit address using the GitHub plugin's methods works as intended.
Also, unit tests were added that verify that (de)serializing Git
addresses for non-commit objects is still an error.
This commit pulls the graphql fields to request commit information into
a fragment, and requests GitHub authorship information (when
available) for that fragment. We don't use that information yet, but we
will soon. Progress on #815.
Test plan: Observe that the example-github data is updated, so that we
now have urls and authorship for commits. Observe that the query has
updated, but no downstream code was affected. `yarn test --full` passes.
Both the GitHub and Git plugins create a `_Prefix` object for nodes and
edges, which gives the respective prefixes for different node/edge
types. We named it `_Prefix` because we weren't sure if these should be
exported. In practice, these have proven quite useful to make generally
available, and despite the `_`-naming we expose the objects outside
their modules. This change renames `_Prefix` to `Prefix` to reflect the
reality that these are used as public consts.
Exporting them is safe as both objects are frozen.
Test plan: Simple rename, `yarn test` suffices.
This commit builds on the work in #806, adding the
`MentionsAuthorReference`s to the graph. It thus resolves #804.
Empirically, the addition of these edges does not change the users' cred
distribution much. Consider the results with the following 3 forward
weights for the edge (results for ipfs/go-ipfs):
| User | w=1/32 | w=1/2 | w=2 |
|---------------|-------:|-------:|-------:|
| whyrusleeping | 228.04 | 225.69 | 223.86 |
| jbenet | 102.04 | 100.26 | 99.53 |
| kubuxu | 66.60 | 67.80 | 69.36 |
| ... | — | — | — |
| btc | 22.69 | 22.29 | 21.38 |
The small effect on users' cred is not that surprising: the
MentionsAuthor references always "shadow" a direct comment->user
reference. In principle, the overall cred going to the user should be
similar; the difference is that now some more cred flows in between the
various comments authored by that user, on the way to the user. (And if
those other comments had references, then it flows out from them, etc.)
Empirically, the variance on comments' scores seems to increase as a
result of having this heuristic, which is great—the fact that all
comments had about the same score was a bug, not a feature.
Sadly, we don't have good tooling for proper statistical analysis of the
effect this is having. We'll want to study the effect of this heuristic
more later, as we build tooling and canonical datasets that make that
analysis feasible.
We choose to add this heuristic, despite the ambiguous effect on users'
cred, because we think it is principled, and adds meaningful structure
to the graph.
Test plan:
The commit is a pretty straightforward generalization of our existing
GitHub edge logic. All of the interesting logic was thoroughly tested in
the preceding pull, so this commit just tests the integration. Observe
that standard (de)serialization of the edge works, that the snapshot is
updated with a MentionsAuthor reference edge, and that the graph
invariant checker, after update, does not throw errors. Also, I manually
tested this change on the ipfs/go-ipfs repo. (It does not require
regenerating data.)
A `MentionsAuthorReference` is created when a post mentions a user, and
that user has authored at least one post in the same thread. Then there
is a `MentionsAuthorReference` from the post to the other posts by that
author.
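
The rule above can be sketched roughly as follows. The post shape (`{id, author, mentions}`) and the choice to skip a post referencing itself are assumptions for illustration; see `mentionsAuthorReference.js` for the real behavior.

```javascript
// Hypothetical sketch over one thread: posts are {id, author, mentions: string[]}.
function mentionsAuthorReferencesSketch(thread) {
  // Index the thread's posts by author.
  const postsByAuthor = new Map();
  for (const post of thread) {
    if (!postsByAuthor.has(post.author)) postsByAuthor.set(post.author, []);
    postsByAuthor.get(post.author).push(post.id);
  }
  const refs = [];
  for (const post of thread) {
    // A Set avoids duplicate references when a user is mentioned twice.
    for (const who of new Set(post.mentions)) {
      for (const dst of postsByAuthor.get(who) || []) {
        if (dst === post.id) continue; // assumption: no self-reference
        refs.push({src: post.id, dst, who});
      }
    }
  }
  return refs;
}
```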
For context, see the docstrings in `mentionsAuthorReference.js`, and
see #804.
Test plan:
Thorough unit tests have been added, which test the entire pipeline,
from ingesting the data via GitHub's graphql responses, through to
detecting the references. Edge cases such as self-reference and
multi-reference are tested.
Thanks to @wchargin for help writing this commit.
With some frequency we find ourselves needing to maintain maps whose
values are arrays that we append to. `MapUtil.pushValue` is a utility
method for these cases.
Existing usage in `aggregate.js` has been modified to use the new
function.
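
For readers unfamiliar with the pattern, a minimal sketch of such a helper is shown below. Returning the array is an assumption; the real `MapUtil.pushValue` may have a different signature.

```javascript
// Sketch: append `value` to the array stored at `key`, creating the
// array on first use. Returns the (possibly new) array.
function pushValue(map, key, value) {
  let values = map.get(key);
  if (values == null) {
    values = [];
    map.set(key, values);
  }
  values.push(value);
  return values;
}
```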
Test plan: Unit tests included.
Summary:
Per #800, each test file should start with a `describe` block listing
its file path under `src`. Currently, nine of our tests do not do so.
Of these, eight had a top-level describe block with the wrong name
(either not a filepath or an outdated filepath), while only one short
test was missing a top-level describe block altogether. This patch fixes
each file to use the correct format.
Test Plan:
Apply the Sharness test in #802, and note that it fails before this
patch but passes after it.
wchargin-branch: describe-fix
Previously, the WeightConfig (and the button that expanded it) were in
the credExplorer App. This was a little weird, as there's no reason to
play with the weights before you have some Pagerank results to
investigate; additionally, it risked confusing new users with a concept
that was not yet applicable.
Also, the implementation was wonky: the WeightConfig had responsibility
for expanding/hiding itself, which gave poor ability to position the
button and the WeightConfig separately.
Finally, the codepath was untested (vestiges of #604).
This commit fixes all three issues:
- The WeightConfig and button have moved into PagerankTable
- The WeightConfig is now a stateless component, and the parent takes
responsibility for deciding when to mount it
- Logic for showing/hiding the WeightConfig is now tested.
This commit implements a [suggestion] to make `credExplorer/App` a
single source of truth on the `WeightedTypes`. As such, both
`WeightConfig` and `PluginWeightConfig` have been refactored to be
(essentially) stateless components that are controlled from above. I say
essentially because `WeightConfig` still has its expanded state, but
that will go away soon.
Along the way, I've improved testing and added some new invariant
checking (e.g. that `PluginWeightConfig` was passed the correct set of
weights for its adapter). For the first time, there are now tests for
the `WeightConfig` itself! I'm not totally done with the weight
re-write, but this seems like a good time to close #604, as the whole
logical sequence for setting weights is now tested.
Test plan: There are new unit tests. Also, to be sure, I manually tested
the UI as well.
[suggestion]: https://github.com/sourcecred/sourcecred/pull/792#issuecomment-419234721
This commit refactors `credExplorer/App` so that instead of storing an
`EdgeEvaluator` in its state, it stores `WeightedTypes` instead. This
has a few benefits:
- It's trivial to generate the right default value for `WeightedTypes`,
so we no longer allow the variable to be nullable in the state. This
simplifies logic, removes an error case, and means that we don't require
the `WeightConfig` to mount before the app is usable.
- `WeightedTypes` are serializable and can be tested for equality, so
they are a better-behaved piece of state
- We put off the information-destroying transformation as long as
possible
- In the future, I think we may want to move the weights/types concept
into core, at which point the `WeightedTypes` will directly be consumed
by the `core/attribution` module.
Test plan: Unit tests are pretty thorough; to be safe, I tested the UI
myself.
This refactors PluginWeightConfig so that it uses the
`defaultWeightsForAdapter` method introduced in #787.
The refactor is mildly invasive, as we switch the state from being a
(mutable) `WeightedTypes` to having a (regular, read-only)
`WeightedTypes`. I think this is an improvement in consistency.
Test plan: Trivial refactor; unit tests+flow pass.
This commit creates a central `weights` module that defines all of the
weight-related types, and provides some utilities for dealing with them.
This way users of weight-concepts do not need to depend on a lot of
random modules just to get the relevant types. The utility methods are
implicitly defined in a few places in the codebase: now we can avoid
re-writing them, and test them more thoroughly.
Test plan: Unit tests pass.
Currently, the `credExplorer` uses the `defaultStaticAdapters`, but it
imports these adapters in multiple places. If we decide to make the
adapters configurable (e.g. when we start supporting more plugins) this
will be a problem.
This change modifies the cred explorer so that the adapters always come
from a prop declaration on the app. Then the adapters are passed into
the `state` module's functional entry points, rather than letting
`state` depend on the default adapters directly.
This change is motivated by the fact that my WeightConfig cleanup can be
done more cleanly if the adapters are present as a prop on the App.
Test plan: Unit tests are updated. Also, `git grep
defaultStaticAdapters` reveals that the adapters are only consumed
once.
This commit adds `weightsToEdgeEvaluator`, a function for converting
weighted node types into an `EdgeEvaluator`. This replaces the
`edgeWeights` module (which was untested, and an outmoded API).
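
To make the conversion concrete, here is a hedged sketch. The weights layout, the `srcType`/`dstType`/`type` fields on the edge, and the rule for combining node and edge weights are all assumptions for illustration, not the actual `weightsToEdgeEvaluator` contract.

```javascript
// Hypothetical shapes:
//   weights = {nodes: {[nodeType]: number},
//              edges: {[edgeType]: {forward: number, backward: number}}}
//   edge    = {type, srcType, dstType}
function weightsToEdgeEvaluatorSketch(weights) {
  return function evaluate(edge) {
    const {forward, backward} = weights.edges[edge.type];
    return {
      // Assumed rule: flow toward dst scales with dst's node weight.
      toWeight: weights.nodes[edge.dstType] * forward,
      // Assumed rule: flow back toward src scales with src's node weight.
      froWeight: weights.nodes[edge.srcType] * backward,
    };
  };
}
```

The point of the sketch is only that the evaluator is a pure function of the weights, which is what makes it easy to test.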
Test plan: The new `weightsToEdgeEvaluator` method is well-tested.
Since `WeightConfig` is still not tested, I manually verified that it
still works as anticipated.
Summary:
Lots of tests need the output of `yarn backend`. Before this commit,
they tended to create it themselves. This was slow and wasteful, and
also could in principle have race conditions (though in practice usually
tended not to).
This commit updates tests to respect a `SOURCECRED_BIN` environment
variable indicating the path to an existing directory of backend
applications.
Closes #765.
Test Plan:
Running `yarn test --full` passes.
Prepending `echo run >>/tmp/log &&` to the `backend` script in
`package.json` and running `yarn test --full` results in a log file
containing only one line, indicating that the script really is run only
once.
wchargin-branch: deduplicate-backend
This is convenient for testing other code, where we may want to directly
use the fallback types. One test has been updated in this way.
I also changed the names for the fallback adapter's edges to be somewhat
more readable.
Test plan: Tests improved.
This commit adds PluginWeightConfig, which is responsible for
adding all the weights for an individual plugin. The top-level
WeightConfig now creates multiple PluginWeightConfigs. It also takes
responsibility for hiding the FallbackPlugin.
Test plan: The PluginWeightConfig is tested (and fairly simple). The
top-level WeightConfig is not yet tested (#604), so I manually tested
that the weights in the app still function.
`testUtil.configureEnzyme` now additionally asserts, after every test,
that `console.error` and `console.warn` were not called. Tests that
explicitly expect such calls can still be written by manually re-mocking
the relevant console method (and several examples already exist).
The code that explicitly specifies this for various enzyme test files
has been removed.
Test plan: `git grep "not.toHaveBeenCalled"` shows only unrelated usage.
`yarn test` passes. Adding a spurious console.warn to a passing test
causes it to fail.
Fixes #668
Summary:
This simplifies interfaces everywhere.
See also #216, which did the opposite of this as a temporary fix due to
a Babel/Webpack interaction that no longer exists as of #766.
Test Plan:
Note that `node bin/sourcecred.js load sourcecred/example-git` still
works (after `yarn backend`). Note that `yarn test` still works. These
demonstrate that the module works from both a Webpack context and a Node
context. Note that `git grep --name-only execDependencyGraph` yields
exactly those files touched in this commit. Note that `yarn test --full`
passes.
wchargin-branch: commonjs-execDependencyGraph
Currently, it's possible to lock the cred explorer UI via the following
sequence of actions:
1. Press the analyze cred button
2. While it is running, modify the weights
After following this repro, the UI is stuck: it will say "loading"
forever, and the analyze cred button is disabled.
The issue is that loadGraph and runPagerank do not apply their success
(or failure) state transitions if they are pre-empted by another state
change. If a repo change occurs, that's the right behavior: the repo
change puts the state back to `"READY_TO_LOAD_GRAPH"`, which means the
UI is ready to re-load, and showing the PageRank results for the wrong
repo would be very confusing.
However, if an edge evaluator change occurs while loadGraph or
runPagerank is happening, the state is left in the "LOADING" status
(which means the analyze cred button is disabled).
This commit fixes the issue via a refactor: per @wchargin's suggestion,
responsibility for the edge evaluator moves from the state module out to
`credExplorer/App.js`. This dramatically simplifies the state module, as
we no longer need a `Substate` concept: we can simplify the state into a
single sequence of states.
As of the refactor, the bug is impossible.
Test plan: Unit tests have been updated to maintain coverage on the
refactored code. I manually tested that the bug no longer repro's, and
that the cred explorer UI continues to function. I did not add a new
test to protect against regression, because in the new codepath, the bug
is basically impossible.
This commit modifies the edge type so that it has a
`defaultForwardWeight` and `defaultBackwardWeight`, and these defaults
are respected by the `WeightConfig`.
I came up with reasonable-seeming defaults for the Git and GitHub
plugins; these will undoubtedly be more methodically tuned in the
future.
Test plan: `yarn test` passes. Also, opening the cred explorer now shows
the specified default weights in the WeightConfig. (Note that the
forward/backward directions are reversed as described in #749.)
This commit introduces a new component, `EdgeTypeConfig`, which is
responsible for configuring the weights for a given edge type. The
config creates two `WeightSlider`s: one for the forward direction, and
one for the backward direction. The `DirectionalitySlider` is no longer
used, and is removed. This fixes #596.
So as to avoid confusion, we now describe every edge with variables, as
in 'α REFERENCES β', and clarify that the weight modifies how cred flows
from β to α. This necessitated the creation of an `EdgeWeightSlider`,
local to the `EdgeTypeConfig`, which sets up a `WeightSlider` with the
necessary greek characters.
The EdgeTypeConfig is tested, so this is continuing progress towards
solving #604.
Test plan: I manually verified that modifying edge weights has the
expected effect on cred scores. Also, some new unit tests are included.
This factors `NodeTypeConfig` out of the `WeightConfig` component. The
scope for a `NodeTypeConfig` is that it configures a single node type.
Right now it just renders a single `WeightSlider`, but I like factoring
out both for consistency with the `EdgeTypeConfig` (see #749) and
because I expect we may want to add more complexity later.
Test plan: The new component has some tests, also I manually tested the
frontend.
* StateTransitionMachine.loadGraph reports success
Step one towards #586. This will enable us to chain runPagerank after
loadGraph only if the load went through successfully.
Test plan: Unit tests included.
* Add StateTransitionMachine.loadGraphAndRunPagerank
This method combines `loadGraph` and `runPagerank` into one method
which internally chains the two methods. `runPagerank` is only called if
`loadGraph` was successful.
Progress on #586.
Test plan:
The new method has attached unit tests. I implemented the unit tests via
mocking, which seemed quite convenient as the method is basically a
wrapper for chaining two other function calls.
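
The chaining described above amounts to something like the following sketch. The `stm` parameter and the assumption that `loadGraph` resolves to a boolean success flag are illustrative; the real method lives on `StateTransitionMachine`.

```javascript
// Sketch: run PageRank only if the graph load reported success.
// `stm` is any object with async `loadGraph()` (resolving to a boolean)
// and async `runPagerank()`; both are assumptions for this example.
async function loadGraphAndRunPagerank(stm) {
  const loaded = await stm.loadGraph();
  if (loaded) {
    await stm.runPagerank();
  }
}
```

Since the wrapper holds no logic beyond the conditional, mocking the two inner calls is enough to test it.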
* Combine loadGraph and runPagerank into one button
Resolves #586. The new button is called "Analyze cred".
Test plan: Unit tests, also I tested it manually.
Fixes #732; see that issue for context.
Test plan:
The success case still works (verified that loading
sourcecred/sourcecred works).
I haven't tested the error case, as getting a real RATE_LIMIT_EXCEEDED
from GitHub is time-consuming, and has only happened once in practice.
I'm pretty confident the code works because it's a simple adaptation of
the code that catches other cases.
Summary:
This commit changes the CLI to use the code in `cli` instead of `oclif`.
A subsequent commit will remove the dependency on OClif altogether.
Resolves #580.
Test Plan:
Note that `yarn backend; node bin/sourcecred.js help` works. Note that
the documentation in the README is still correct.
wchargin-branch: cli-replace-oclif
Summary:
This ports the OClif version of `sourcecred load` to the sane CLI
system. The functionality is similar, but the interface has been
changed a bit (mostly simplifications):
- The `SOURCECRED_GITHUB_TOKEN` can only be set by an environment
variable, not by a command-line argument. This is standard practice
because it is more secure: (a) other users on the same system can
see the full command line arguments, but not the environment
variables, and (b) it’s easier to accidentally leak a command line
(e.g., in CI) than a full environment.
- The `SOURCECRED_DIRECTORY` can only be set by an environment
variable, not by a command-line argument. This is mostly just to
simplify the interface, and also because we don’t really have a good
name for the argument: we had previously used `-d`, which is
unclear, but `--directory` is vague and `--sourcecred-directory` is
redundant.
This is an easy way out, but we can put the flag for this back in if
it becomes a problem.
- The `--max-old-space-size` argument has been removed in favor of a
fixed value. It’s unlikely that users should need to change it.
If we’re blowing an 8GB heap, we should try to not do that instead
of increasing the heap.
- Loading zero repositories, but specifying an output directory, is
now valid. This is the right thing to do, but OClif got in our way
in the previous implementation.
Test Plan:
Unit tests added, with full coverage; run `yarn unit`.
To try it out, run `yarn backend`, then `node bin/cli.js load --help` to
get started.
I also manually tested that the following invocations work (i.e., they
complete successfully, and `yarn start` shows good data):
- `load sourcecred/sourcecred`
- `load sourcecred/example-git{,hub} --output sourcecred/examples`
These work even when invoked from a different directory.
wchargin-branch: cli-load
Summary:
This includes environment variables to specify the SourceCred directory
and the GitHub token. Parts of this may change once #638 is resolved.
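
As a rough sketch of what such helpers could look like: the function names, the `env` parameter, and the fallback directory under the OS temp dir are assumptions for illustration, not necessarily what this module does. Only the environment variable names are taken from the commits above.

```javascript
const os = require("os");
const path = require("path");

// Sketch: resolve the SourceCred directory from the environment.
// The default location is an assumption for this example.
function sourcecredDirectory(env = process.env) {
  const dir = env.SOURCECRED_DIRECTORY;
  return dir != null ? dir : path.join(os.tmpdir(), "sourcecred");
}

// Sketch: read the GitHub token, or null if it is not set.
function githubToken(env = process.env) {
  const token = env.SOURCECRED_GITHUB_TOKEN;
  return token != null ? token : null;
}
```

Taking `env` as a parameter keeps the helpers easy to unit test without mutating `process.env`.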
Test Plan:
Unit tests included, with full coverage; run `yarn unit`.
wchargin-branch: cli-common
Summary:
This commit includes a minimal usage of an actual CLI application. It
provides the `help` command and no actual functionality.
Test Plan:
Unit tests added, with full coverage. To see it in action, first run
`yarn backend`, then run `node bin/cli.js help`.
wchargin-branch: cli-beginnings
Summary:
This commit introduces the notion of a `Command`, which is simply a
function that takes command-line arguments and interacts with the real
world. This infrastructure will enable us to write a well-tested CLI.
The `Command` interface is asynchronous because commands like `load`
need to block on promise resolution (for loading GitHub and Git data).
This is annoying for testing, but does not actually appear to be a
problem in practice.
Test Plan:
Unit tests added. See later commits for real-world usage.
wchargin-branch: cli-command-infrastructure
Summary:
Per #580, we aim to remove OClif. To do so, we move the old system to a
directory `oclif`, and will create the new system in the now-vacant
`cli` directory.
Test Plan:
Note that `yarn backend` still builds, that `node bin/sourcecred.js`
still has `help` and `load`, and that `git grep -wc cli` yields only
`yarn.lock:9`.
wchargin-branch: rename-cli-to-oclif
Our serialized RelationalView can get quite large - in the case of
TensorFlow it's over 190MB. This is a problem, as GitHub pages have a
hard cap of 100MB on hosted files.
As a temporary workaround, this commit introduces a method,
`compressByRemovingBody`, which strips away the bodies of every post. In
the longer term, we'll need a solution that scales with larger
repositories, e.g. sharding the relational view into smaller pieces.
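
In spirit, the workaround does something like the sketch below. The flat `posts` array and the non-mutating style are assumptions for illustration; the real `compressByRemovingBody` operates on the RelationalView's own internal structure.

```javascript
// Hypothetical sketch: replace every post body with the empty string,
// leaving all other fields (ids, authors, urls) intact.
function compressByRemovingBodySketch(view) {
  const stripBody = (post) => ({...post, body: ""});
  return {...view, posts: view.posts.map(stripBody)};
}
```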
Test plan: Unit tests were added. I've manually confirmed that the
newly-generated views are smaller (2.1MB vs 3.3MB), and that the
frontend continues to function.
Summary:
We store the relational view in `view.json.gz` instead of `view.json`,
taking advantage of the isomorphic `pako` library for gzip encoding and
decoding.
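
The round trip can be sketched as follows. Note that this example swaps in Node's built-in `zlib` module rather than `pako`, purely so that it runs without extra dependencies; the helper names are hypothetical.

```javascript
const zlib = require("zlib");

// Sketch using Node's zlib in place of pako (an intentional substitution).
function encodeView(view) {
  // JSON text -> gzip-compressed Buffer, as would be written to view.json.gz
  return zlib.gzipSync(JSON.stringify(view));
}

function decodeView(buffer) {
  // gzip-compressed Buffer -> parsed JSON value
  return JSON.parse(zlib.gunzipSync(buffer).toString("utf8"));
}
```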
Sample space savings (note that post bodies are included; i.e., #747 has
not been applied):
| SAVE  |   OLD (B) |  NEW (B) | REPO                      |
|------:|----------:|---------:|---------------------------|
| 89.7% |     25326 |     2617 | sourcecred/example-github |
| 82.9% |   3257576 |   555948 | sourcecred/sourcecred     |
| 85.2% |  11287621 |  1665884 | ipfs/js-ipfs              |
| 88.0% |  20953425 |  2520358 | gitcoinco/web             |
| 84.4% |  38196825 |  5951459 | ipfs/go-ipfs              |
| 84.9% | 205770642 | 31101452 | tensorflow/tensorflow     |
<details>
<summary>Script to generate space savings output</summary>
```shell
savings() {
    printf '% 7s % 11s % 11s %s\n' 'SAVE' 'OLD (B)' 'NEW (B)' 'REPO'
    for repo; do
        file="${SOURCECRED_DIRECTORY}/data/${repo}/github/view.json.gz"
        if ! [ -f "${file}" ]; then
            printf >&2 'warn: no such file %s\n' "${file}"
            continue
        fi
        script="$(sed -e 's/^ *//' <<EOF
            repo = '${repo}'
            pre_size = $(<"${file}" gzip -dc | wc -c)
            post_size = $(<"${file}" wc -c)
            percentage = '%0.1f%%' % (100 * (1 - post_size / pre_size))
            p = '% 7s % 11d % 11d %s' % (percentage, pre_size, post_size, repo)
            print(p)
EOF
        )"
        python3 -c "${script}"
    done
}
```
</details>
Closes #750.
Test Plan:
Comparing the raw old version with the decompressed new version shows
that they are identical:
```
$ <~/tmp/sourcecred/data/sourcecred/example-github/github/view.json \
> shasum -a 256 -
63853b9d3f918274aafacf5198787e18185a61b9c95faf640a1e61f5d11fa19f -
$ <~/tmp/sourcecred/data/sourcecred/example-github/github/view.json.gz \
> gzip -dc | shasum -a 256
63853b9d3f918274aafacf5198787e18185a61b9c95faf640a1e61f5d11fa19f -
```
Additionally, `yarn test --full` passes, and `yarn start` still loads
data and runs PageRank properly.
wchargin-branch: gzip-relational-view
The GitHub regex in urlIdParse.js incorrectly disallowed repo names with
underscores and dots. Fixes #721.
To mitigate errors like this in the future, code which uses regexes to
find owners and repos has been modified to all depend on the same regex
pattern.
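The idea of sharing one pattern can be sketched as follows. This is an illustration only: `NAME_PATTERN`, `repoUrlRegex`, `repoIdRegex`, and `parseRepoId` are hypothetical names, and the character class is an assumption rather than the pattern actually used in `urlIdParse.js`.

```javascript
// Hypothetical shared fragment: letters, digits, hyphens, underscores, dots.
const NAME_PATTERN = "[A-Za-z0-9_.-]+";

// Both regexes are built from the same fragment, so they cannot drift apart.
const repoUrlRegex = new RegExp(
  `^https://github\\.com/(${NAME_PATTERN})/(${NAME_PATTERN})/?$`
);
const repoIdRegex = new RegExp(`^(${NAME_PATTERN})/(${NAME_PATTERN})$`);

// Parse an "owner/name" string, throwing on malformed input.
function parseRepoId(id) {
  const match = repoIdRegex.exec(id);
  if (match == null) {
    throw new Error(`Invalid repo id: ${id}`);
  }
  return {owner: match[1], name: match[2]};
}

console.log(parseRepoId("ipfs/js.ipfs.io"));
```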
Test plan:
Unit tests have been updated to include the failure case (they correctly
failed), and then code was updated so that the tests pass again.
Also, I manually verified that loading ipfs/js.ipfs.io no longer fails.
Paired with @wchargin
This commit isolates all of the log-weight behavior in the weight
slider. That slider moves in log space, but the numbers printed and
passed around the WeightConfig code are now always in linear-space.
This should reduce confusion in the UI and for developers.
This commit contains two other improvements (#588):
- Change the (log space) range on the sliders from ±10 to ±5.
- Change the order from slider, weight, name to name, slider, weight, so
that there is more visual separation between the name and the weight.
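To make the log-space/linear-space split concrete, here is a minimal sketch of the conversion that a slider like this would need. It assumes a base-2 logarithm, which the commit message does not state; `sliderToWeight` and `weightToSlider` are hypothetical names.

```javascript
// Assumption: the slider position is the base-2 logarithm of the weight.
const MIN_SLIDER = -5;
const MAX_SLIDER = 5;

// Slider position (log space) -> weight (linear space).
function sliderToWeight(position) {
  if (position < MIN_SLIDER || position > MAX_SLIDER) {
    throw new Error(`Slider position out of range: ${position}`);
  }
  return Math.pow(2, position);
}

// Weight (linear space) -> slider position (log space).
function weightToSlider(weight) {
  if (!(weight > 0)) {
    throw new Error(`Weight must be positive: ${weight}`);
  }
  return Math.log2(weight);
}
```

With this split, only the slider itself ever sees log-space values; everything else handles the linear weight.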
Test plan: Changes to the weight slider are tested. Changes to the
WeightConfig aren't (#604) so I manually tested the UI.
PluginAdapters and Node/Edge types are increasingly fundamental to the
cred explorer. Prior to this commit, we had no canonical demo
adapters/types, and we would create ad-hoc and messy adapters whenever
we needed them. This creates unnecessary repetition and lowers test
quality.
This commit creates a canonical demo adapter (loosely themed based on
the wonderful game [Factorio]) and refactors most existing test cases to
use the demo adapters. In particular, the horrible mess of pagerankTable
adapters has been removed.
[Factorio]: https://www.factorio.com/
I left `aggregate.test.js` untouched because I would have needed to
materially re-write the tests to port them over. I added a comment so
that if we ever do re-write those tests, we'll use the new demo
adapters.
Test plan: `yarn test` passes.
This commit factors the weight sliders used for both node and edge
weights into a shared WeightSlider component, and factors out the
direction slider used for edge weights into a DirectionalitySlider.
Both of these components are tested. This is a step towards #604.
Test plan:
The specific behaviors of the sliders are well tested. Since the weight
config as a whole is not tested, I manually verified by messing with the
weights that node weights, edge weights, and edge directionality all
affect the cred distribution as anticipated.
Summary:
We currently load trees and then throw them away later, because we don’t
get useful signal from them. We should consider not doing that. This
will be faster.
Test Plan:
```
$ time node bin/sourcecred.js load tensorflow/tensorflow --plugin git
real 0m33.512s
user 0m35.196s
sys 0m12.489s
```
Also, `yarn test --full` passes.
wchargin-branch: git-deforestation
Adds a link titled "what is this?" that points to my gentle introduction
to cred. Also, move the feedback link to be next to it and get rid of
the prototype disclaimer.
Test plan: Visual inspection, also a test was updated.
Summary:
This fixes a bug where, if the `SOURCECRED_DIRECTORY` environment
variable is set to `foo` but the `-d bar` flag is passed, then the
repository registry will be written under `foo` but the plugin data will
be loaded under `bar`.
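One way to avoid this class of bug is to resolve the directory exactly once and pass it to every consumer. The sketch below illustrates that pattern; the function names and the precedence order (flag, then environment, then default) are assumptions, not the actual implementation.

```javascript
// Hypothetical helper: decide the SourceCred directory exactly once.
// Assumed precedence: explicit flag, then environment, then a default.
function resolveSourcecredDirectory(flagValue, env, defaultDirectory) {
  if (flagValue != null) {
    return flagValue;
  }
  if (env.SOURCECRED_DIRECTORY != null) {
    return env.SOURCECRED_DIRECTORY;
  }
  return defaultDirectory;
}

// Both consumers take the resolved directory as an argument instead of
// reading the environment themselves, so they cannot disagree.
function registryPath(directory) {
  return `${directory}/repositoryRegistry.json`;
}
function pluginDataPath(directory, repo, plugin) {
  return `${directory}/data/${repo}/${plugin}`;
}

const directory = resolveSourcecredDirectory(
  "/tmp/good",
  {SOURCECRED_DIRECTORY: "/tmp/bad"},
  "/tmp/sourcecred"
);
console.log(registryPath(directory));
console.log(pluginDataPath(directory, "sourcecred/example-github", "github"));
```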
Test Plan:
```
$ rm -rf /tmp/good /tmp/bad
$ SOURCECRED_DIRECTORY=/tmp/bad >/dev/null \
> node bin/sourcecred.js load sourcecred/example-github -d /tmp/good
$ [ -d /tmp/bad ]; echo $?
$ find /tmp/good
/tmp/good
/tmp/good/cache
/tmp/good/cache/sourcecred
/tmp/good/cache/sourcecred/example-github
/tmp/good/cache/sourcecred/example-github/github
/tmp/good/cache/sourcecred/example-github/git
/tmp/good/repositoryRegistry.json
/tmp/good/data
/tmp/good/data/sourcecred
/tmp/good/data/sourcecred/example-github
/tmp/good/data/sourcecred/example-github/github
/tmp/good/data/sourcecred/example-github/github/view.json
/tmp/good/data/sourcecred/example-github/git
/tmp/good/data/sourcecred/example-github/git/graph.json
```
wchargin-branch: load-pass-context
Fixes #696.
Test plan: This is basically a config change, so I manually tested it.
I ran SourceCred on gitcoinco/web, which has two bots,
and verified that the bots are correctly removed from the list of users.
Selecting "Bots" in the dropdown filter shows the two bots. Changing
the user weight does not affect the bots' scores, and changing the bot
weight does affect the bots' scores.
Summary:
We can now set, at build time, a URL to be displayed at the top of the
prototype, encouraging users to provide feedback. If the URL is not
provided, it defaults to the appropriate topic on the SourceCred
Discourse instance.
The result looks like this:
![Screenshot of the feedback URL in the prototype][screenshot]
[screenshot]: https://user-images.githubusercontent.com/4317806/44814824-a238b380-ab92-11e8-88c8-dfbae27ca496.png
Test Plan:
Unit tests added to `yarn sharness-full` and `yarn unit`.
You can run `yarn start` to see the message with the default URL, or
`SOURCECRED_FEEDBACK_URL=http://example.com/ yarn start` to specify a
custom URL.
wchargin-branch: feedback-url
This commit adds a hardcoded list of known bots. Building on #713, it
categorizes those userlikes with the bot subtype. (Note that those users
may not be bots in the GitHub ontology - GitHub doesn't actually have a
clear record of which userlikes are bots.)
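A rough sketch of the categorization follows. The names `KNOWN_BOTS`, `userlikeSubtype`, and `userlikeAddress` are hypothetical, the address layout is an assumption, and the only list entry taken from this commit is `credbot`.

```javascript
// Hypothetical hardcoded list; this commit only confirms that @credbot
// is treated as a bot.
const KNOWN_BOTS = new Set(["credbot"]);

// Classify a userlike login by membership in the hardcoded list.
function userlikeSubtype(login) {
  return KNOWN_BOTS.has(login) ? "BOT" : "USER";
}

// Assumed address layout: the subtype is one component of the address,
// so a prefix match can select all users or all bots.
function userlikeAddress(login) {
  return ["sourcecred", "github", "USERLIKE", userlikeSubtype(login), login];
}

console.log(userlikeAddress("credbot"));
```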
Progress towards #696.
Test plan:
Observe the single snapshot change, which demonstrates that @credbot is
now correctly categorized as a bot.
Summary:
As a first pass toward support for analyzing whole organizations, we
allow loading multiple repositories with `sourcecred load`, combining
them into a single relational view and a single Git graph at load time.
Test Plan:
Run
```
node bin/sourcecred.js \
load \
sourcecred/example-git \
sourcecred/example-github \
sourcecred/sourcecred \
--output sourcecred/examples \
;
```
and select `sourcecred/examples` from the web view. Filter “Repository”
nodes, and note that there are three.
Note that loading a single repository without `--output` still works,
that loading a single repository with `--output` still works (respecting
the alias name), and loading not exactly one repository without
`--output` yields an appropriate error message.
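The naming rule described above can be sketched as a small function. `outputName` is a hypothetical name and the error text is made up; only the rule itself comes from this commit.

```javascript
// Hypothetical helper: choose the name under which loaded data is stored.
function outputName(repos, output) {
  if (output != null) {
    // An explicit --output always wins, even for a single repository.
    return output;
  }
  if (repos.length === 1) {
    // A single repository without --output is stored under its own name.
    return repos[0];
  }
  throw new Error(
    "--output is required unless exactly one repository is given"
  );
}

console.log(outputName(["sourcecred/example-git"], null));
```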
Note that `yarn sharness-full` still works.
wchargin-branch: load-combined
Userlikes now have an additional piece of data encoded in their address:
whether they are a USER or a BOT. Userlikes are still handled
identically by the RelationalView, which cuts down on code duplication.
I haven't added ORGANIZATIONs but it will be trivial to do once we're
interested in tracking them.
Note that this is basically the same as how we treat comments: comments
are subtyped to review comments, issue comments, and pull comments.
This is the initial step towards solving #696.
Test plan: Existing unit tests pass (and caught a few bugs during
development!). New test cases were added to the parser. Observe that all
the snapshot changes make sense.
Note: As of this commit, every GitHub userlike is classified as a user,
and the subtypes are not used in the application, so this commit causes
no change in observable behavior.
This commit changes the cred normalization algorithm so that the total
cred of all GitHub user nodes always sums to 1000. For rationale on the
change, see #705.
Fixes #705.
Note that this introduces a new way for PageRank to fail: if the
graph has no GitHub userlike nodes, then PageRank will throw an error
when it attempts to normalize. This will result in a message being
displayed to the user, and a more helpful error being printed to
console. If we need the cred explorer to display graphs that have no
userlike nodes, then we can modify the codepath so that it falls back to
normalizing based on all nodes instead of on the GitHub userlike nodes
specifically.
Test plan: There is an included unit test which verifies that the
new argument gets threaded through the state properly. But this is
mostly a config change, so it's best tested by actually inspecting
the cred explorer. I have done so, and can verify that the behavior is
as expected: users' cred now sums to 1000, and e.g. modifying
the weight on the repository node doesn't produce drastic changes to
cred scores.
This commit adds the logic for computing scores so that the total score,
summed across all nodes matching a NodePrefix, is a fixed constant.
See #705 for context.
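As a rough illustration of the idea (not the code in this commit), the sketch below rescales every score by a single factor chosen so that the nodes matching a prefix sum to a target total. The node shape and the names `hasPrefix` and `normalizeScores` are hypothetical.

```javascript
// True if `address` (an array of parts) starts with `prefix`.
function hasPrefix(address, prefix) {
  return prefix.every((part, i) => address[i] === part);
}

// Scale all scores so that nodes matching `prefix` sum to `targetTotal`.
// Hypothetical node shape: {address: string[], score: number}.
function normalizeScores(nodes, prefix, targetTotal) {
  const matchingTotal = nodes
    .filter((node) => hasPrefix(node.address, prefix))
    .reduce((sum, node) => sum + node.score, 0);
  if (!(matchingTotal > 0)) {
    // With no matching nodes there is nothing to normalize against.
    throw new Error("No score on nodes matching prefix; cannot normalize");
  }
  const factor = targetTotal / matchingTotal;
  return nodes.map((node) => ({
    address: node.address,
    score: node.score * factor,
  }));
}

const nodes = [
  {address: ["github", "user", "a"], score: 1},
  {address: ["github", "user", "b"], score: 3},
  {address: ["github", "repo", "r"], score: 6},
];
console.log(normalizeScores(nodes, ["github", "user"], 1000));
```

Non-matching nodes are scaled by the same factor, so relative scores are preserved.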
Test plan: The logic is quite simple, and adequate unit tests are
included.
Note to reviewer: There is a spurious whitespace diff in the test file
because the tests for the previous test block were not correctly scoped.
Storing the user's weights in localStore enables a workflow where a
user chooses their preferred weights, and brings those weights with them
across projects and contexts. However, this is the wrong workflow:
actually, a project chooses its weights, and when a user visits a
particular project, they want to sync up with the project's choice.
Giving the user the ability to modify the weights and recalculate is
still important, so that they can propose improvements to the project
maintainer. But implicitly keeping their modified weights, and even
bringing them to other projects the user inspects, is
counter-productive.
This commit removes this dubious feature. (It's a feature we were likely
to drop anyway, as it conflicts with #703.) As an added bonus, this code
is untested, which means the feature is technical debt—so removing it
reduces our technical debt! It also removes at least one known bug.
Test plan: There are no tests. I manually verified that the frontend
still works, and that it no longer persists weights across refresh.
Summary:
This patch adds independent exponential backoff to each individual
GitHub GraphQL query. We remove the fixed `GITHUB_DELAY_MS` delay before
each query in favor of this solution, which requires no additional
configuration (thus resolving a TODO in the process).
We use the NPM module `retry` with its default settings: namely, a
maximum of 10 retries with factor-2 backoff starting at 1000ms.
Empirically, it seems very unlikely that we should require much more
than 2 retries for a query. (See Test Plan for more details.)
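To make the schedule concrete, here is a dependency-free sketch of factor-2 backoff with the parameters named above. It does not use the `retry` module's API; `backoffDelays` and `withBackoff` are hypothetical names, and `sleep` is injectable so that tests need not actually wait.

```javascript
// Delay before retry number `attempt` (0-based): minTimeout * factor^attempt.
function backoffDelays({retries = 10, factor = 2, minTimeout = 1000} = {}) {
  return Array.from(
    {length: retries},
    (_, attempt) => minTimeout * Math.pow(factor, attempt)
  );
}

// Run an async operation, retrying with increasing delays on failure.
async function withBackoff(operation, options = {}) {
  const sleep =
    options.sleep ||
    ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  const delays = backoffDelays(options);
  let lastError;
  // One initial attempt plus one attempt per configured retry.
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await operation(attempt);
    } catch (e) {
      lastError = e;
      if (attempt < delays.length) {
        await sleep(delays[attempt]);
      }
    }
  }
  throw lastError;
}

console.log(backoffDelays().slice(0, 3));
```

Because each query carries its own schedule, one slow query does not delay the others' first attempts.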
This is both a short-term unblocker and a good kind of thing to have in
the long term.
Test Plan:
Note that `yarn test --full` passes, including `fetchGithubRepoTest.sh`.
Consider manual testing as follows.
Add `console.info` statements in `retryGithubFetch`, then load a large
repository like TensorFlow, and observe the output:
```shell
$ node bin/sourcecred.js load --plugin github tensorflow/tensorflow 2>&1 | ts -s '%.s'
0.252566 Fetching repo...
0.258422 Trying...
5.203014 Trying...
[snip]
1244.521197 Trying...
1254.848044 Will retry (n=1)...
1260.893334 Trying...
1271.547368 Trying...
1282.094735 Will retry (n=1)...
1283.349192 Will retry (n=2)...
1289.188728 Trying...
[snip]
1741.026869 Ensuring no more pages...
1742.139978 Creating view...
1752.023697 Stringifying...
1754.697116 Writing...
1754.697772 Done.
```
This took just under half an hour, with 264 queries total, of which:
- 225 queries required 0 retries;
- 38 queries required exactly 1 retry;
- 1 query required exactly 2 retries; and
- 0 queries required 3 or more retries.
wchargin-branch: github-backoff
Summary:
The version number displayed in the application now displays much more
specific information. It now lists the Git commit from which the build
was constructed, and will identify whether we have accidentally deployed
a development instance (which would be slow) or an instance with
uncommitted changes (which would be bad).
The version information is computed during the initialization of the
Webpack config. For development, this means that it is computed when you
run `yarn start`, and not updated thenafter. If the stale information
presents actual confusion, we would need to backport Webpack 4’s support
for runtime values in `DefinePlugin` to Webpack 3 (or upgrade Webpack
by a major version).
Test Plan:
The logic for `GitState` and `Environment` has existing tests. With both
a clean tree and a dirty tree, run `yarn start` and build the static
site, and check that the resulting versions are correct.
wchargin-branch: use-rich-version-types
Summary:
These types will shortly be added to the global `VersionInfo`. For now,
we include the types and validation logic only.
Test Plan:
Unit tests suffice.
wchargin-branch: add-rich-version-types
This commit re-introduces the git plugin, now that it has been radically
simplified as described in [1]. The new git plugin only has nodes for
commits and only has commit has-parent edges. As compared to the version
that was removed in #628, this plugin is far leaner. It doesn't bloat
the graph (for `sourcecred/sourcecred`, the git plugin data is just
164k), and as such doesn't incur much performance penalty.
Re-incorporating the git plugin also brings some tangible benefits. We
already had git nodes in the graph, as the GitHub plugin attaches them
to pull requests. Without any git plugin, these nodes are displayed as
"unknown nodes" with ugly descriptions. Also, including a git plugin,
even one that is very minimal, communicates to users that git is a
source of information to SourceCred, and that they can expect more from
it in the future.
Note that this commit breaks backcompat for existing repositories that
were locally loaded after #628. As such, it is best to
`rm -rf $SOURCECRED_DIRECTORY` and start with fresh data. Also, due to a
known bug in the WeightConfig, you should reset your browser's local
storage.
Test plan: After removing the SourceCred directory and the stale
localStorage, the cred explorer nicely displays git commits, and
connects them via has_parent edges. The NodeType filter allows filtering
to commits as expected, and the WeightConfig shows node and edge weights
for the Git plugin's nodes and edges.
[1]: https://github.com/sourcecred/sourcecred/issues/627#issuecomment-413435447
The minimal git plugin adapter only provides commit nodes and has_parent
edges. See #627 for context.
I forked this from `git/pluginAdapter.js`, and then deleted the
nodeTypes and edgeTypes which are no longer in scope.
Test plan: This is a fork of untested "glue" code, and is itself still
untested.
This implements the approach suggested in [1]. Instead of forking the
git plugin entirely, we'll fork the createGraph method and the
pluginAdapter so that we have instances that produce a lightweight git
graph.
createMinimalGraph is a fork of createGraph that only adds commit nodes
and has_parent edges. New unit tests ensure that only the whitelisted
nodes and edges appear.
Supersedes #683 and #684.
Test plan: `yarn test`
[1]: https://github.com/sourcecred/sourcecred/issues/627#issuecomment-413623784
We often construct case statements over union-typed variables, and then
in the default case, we use a `(type: empty)` assertion to ensure that
failing to account for all the cases results in a flow error.
In the past, we created an extra line for this assertion, which required
some eslint suppressions. We've realized it's cleaner to inline the type
assertion in the runtime error that we throw in these defaults.
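The inlined style looks roughly like the sketch below. It uses Flow's comment syntax so that the file also runs as plain JavaScript; `Shape` and `area` are made-up names for illustration.

```javascript
/*::
type Shape =
  | {|+type: "CIRCLE", +radius: number|}
  | {|+type: "SQUARE", +side: number|};
*/

function area(shape /*: Shape */) /*: number */ {
  switch (shape.type) {
    case "CIRCLE":
      return Math.PI * shape.radius * shape.radius;
    case "SQUARE":
      return shape.side * shape.side;
    default:
      // If a new variant is added to Shape but not handled above, Flow
      // should report an error here, because `shape.type` would no
      // longer have type `empty`. No separate assertion line is needed.
      throw new Error((shape.type /*: empty */));
  }
}

console.log(area({type: "SQUARE", side: 2}));
```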
This code cleans everything to the new style, and removes every existing
`// no-unused-expressions` invocation in the codebase.
Test plan: `yarn test`
The 'Score' column is renamed to 'Cred' (and its prop is renamed as
well). The column which shows how a connection or aggregation
contributes to a node's cred, as a percentage, has been rendered
nameless. It is pretty self explanatory, and the previous name
("Connection") was meaningless.
Test plan: Unit tests, also I inspected the frontend.
Some CSS magic was required.
Also creates `src/app/version.js` for storing the version string.
Test plan: Visual inspection of the footer in both Chrome and Firefox,
both on a page with very little content (the cred explorer without a
repository loaded), and on a page with more than a screen height of
content (the homepage, or cred explorer with a large repository loaded).
In all cases, the footer unobtrusively appears in the lower left-hand
corner at the bottom of the screen (after scrolling past all content,
if applicable).
Summary:
The initial logo checkin in #637 included the 32px raster image, but
generated it in the _wrong order_ in the rasterizer script. This commit
fixes that heinous bug once and for all.
Test Plan:
Running `rasterize.sh` does not change the output.
wchargin-branch: rasterize-32px
Summary:
This commit approximately completes the implementation of #643.\* Plugin
adapters are now provided an `Assets` object at `load` time, which they
can use to resolve their plugin-specific API routes.
\* “Approximately” because there are some non-essential pieces of legacy
code that should be cleaned up.
Test Plan:
Unit tests modified, but it would be good to also manually test this.
Run `./scripts/build_static_site.sh` to build the site to, say,
`/tmp/gateway/`. Then spin up a static HTTP server serving `/tmp/` and
navigate to `/gateway/` in the browser. Note that the entire application
works.
wchargin-branch: use-assets-in-PluginAdapters
Summary:
This commit is the next step in #643. It makes the `RepositorySelect`
robust to being hosted at arbitrary gateways by accepting `Assets` and
resolving the repository registry API route appropriately.
Test Plan:
Unit tests modified, but it would be good to also manually test this.
Run `./scripts/build_static_site.sh` to build the site to, say,
`/tmp/gateway/`. Then spin up a static HTTP server serving `/tmp/` and
navigate to `/gateway/` in the browser. Note that you can navigate
around the application and load the repository registry on the prototype
without any console warnings or errors, although you cannot yet load
actual graph data.
wchargin-branch: use-assets-in-RepositorySelect
Summary:
This commit takes the next step toward #643 by exposing `Assets` to our
React components at top level. Components will be expected to pass them
down as appropriate; this commit does not add any actual uses.
Test Plan:
Apply the following patch:
```diff
diff --git a/src/app/Page.js b/src/app/Page.js
index 24c2602..7ac2641 100644
--- a/src/app/Page.js
+++ b/src/app/Page.js
@@ -24,6 +24,10 @@ export default class Page extends React.Component<{|
<Link to="/" className={css(style.navLink, style.navLinkTitle)}>
SourceCred
</Link>
+ <img
+ alt="fav"
+ src={this.props.assets.resolve("/favicon.png")}
+ />
</li>
{routeData.map(({navTitle, path}) =>
NullUtil.map(navTitle, (navTitle) => (
```
Then, observe that the favicon loads correctly and updates across page
loads and refreshes in the following situations:
- under `yarn start`;
- after building the static site and serving from root;
- after building the static site and serving from another gateway.
wchargin-branch: use-withAssets
Summary:
This is the last piece of major infrastructure for #643. It will enable
components like `Page` and `CredExplorerApp` to receive `Assets` as a
prop.
A previous iteration of the same functionality used the new Context API
in React v16.3. This did a good job of solving the problem in production
code, and was convenient. However, it is currently intractable to test
with the current state of Enzyme. It’s plausible that this might improve
in the future, so if threading down the props becomes too onerous, we
might check in to see how our testing libraries are doing. I expect that
the threading should not be too bad, given that we do the same thing
with `localStore`, which has worked (as far as I’m aware) without a
hitch.
Test Plan:
Unit tests added; `yarn test` suffices.
wchargin-branch: withAssets
Summary:
As the next step for #643, this patch enables the app to be rendered at
non-root gateways by incorporating the relative-path history
implementation developed in #666. The app is not fully functional:
our React components do not yet know how to resolve assets, and so
fetches of resources like the repository will be against the wrong URLs.
Test Plan:
- Note that `yarn start` still works.
- Run `./scripts/build_static_site.sh` to build the site into, say,
`/tmp/gateway`.
- Run a static web server from `/tmp/gateway/` and note that (a) the
paths listed in the page source are relative, and (b) everything
works as intended, with no console messages in either Chrome or
Firefox.
- Run a static web server from `/tmp/` and navigate to `/gateway/` in
the browser. Note that the app loads properly, and that refreshes
work (i.e., the `pushState` paths are real paths). Note that the
repository registry cannot yet be loaded, and so PageRank cannot be
run.
wchargin-branch: relative-router
Summary:
This is the first observable step toward #643. Assets whose paths are
known as literals at server-side rendering time are now referenced via
relative paths. This means that the favicon and JavaScript bundle can be
loaded from an arbitrary gateway. The actual bundle code will still only
work when loaded from `/`.
This commit stands alone so that the enclosing change to the Webpack
config can be in as small a change as possible.
Test Plan:
- Note that `yarn start` still works.
- Run `./scripts/build_static_site.sh` to build the site into, say,
`/tmp/gateway`.
- Run a static web server from `/tmp/gateway/` and note that (a) the
paths listed in the page source are relative, and (b) everything
works as intended, with no console messages in either Chrome or
Firefox.
- Run a static web server from `/tmp/` and navigate to `/gateway/` in
the browser. Note that the favicon and JavaScript are correctly
noted, but that the router raises an error because it is trying to
load a non-existent route. (This behavior is unchanged.)
wchargin-branch: relative-lexically-static
Previously, expanding a node would display the individual connections
that contributed cred to that node. For nodes with high degree, this was
a pretty noisy UI.
Now, expanding a node displays "aggregations": for every type of
adjacent connection (where type is the union of the edge type and the
adjacent node type), we show a summary of the total cred from
connections of that type. The result is a much more manageable summary
view. Naturally, these aggregations can be further expanded to see the
individual connections.
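The grouping can be sketched as follows. The connection shape and the name `aggregateConnections` are hypothetical; this illustrates the idea and is not the code added in this commit.

```javascript
// Group connections by (edge type, adjacent node type) and sum their scores.
// Hypothetical connection shape: {edgeType, nodeType, score}.
function aggregateConnections(connections) {
  const groups = new Map();
  for (const connection of connections) {
    const key = JSON.stringify([connection.edgeType, connection.nodeType]);
    let group = groups.get(key);
    if (group == null) {
      group = {
        edgeType: connection.edgeType,
        nodeType: connection.nodeType,
        score: 0,
        connections: [],
      };
      groups.set(key, group);
    }
    group.score += connection.score;
    // Keep the members so that the aggregation can be expanded later.
    group.connections.push(connection);
  }
  return Array.from(groups.values());
}

const aggregations = aggregateConnections([
  {edgeType: "AUTHORS", nodeType: "PULL", score: 2},
  {edgeType: "AUTHORS", nodeType: "ISSUE", score: 1},
  {edgeType: "AUTHORS", nodeType: "PULL", score: 3},
]);
console.log(aggregations.map((a) => [a.edgeType, a.nodeType, a.score]));
```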
Closes #502.
Test plan: The new behavior is unit tested. You can also launch the cred
explorer and experience the UI directly. I have used the new UI a lot,
as well as demo'd it to people, and I like it quite a bit.
For the group of aggregations returned by an aggregation operation (e.g.
the set of aggregations returned by a call to `flatAggregate`), the keys
are unique.
Test plan: `yarn test`
The TableRow currently has some margin on the left, but not on the
right. This is visually unbalanced, especially when expanded so depth>0,
as the content on the right is at the very edge of the shaded rectangle.
This commit cleans that up a bit!
Test plan: Visual inspection (see screenshots in the pull request). I
don't think unit tests are necessary for small visual tweaks like this.
Summary:
This is necessary for #643. If we’re serving `/prototype/index.html`, we
need to use `..` to refer to the root of the site. This patch adds
`rootFromPath`, which performs the relevant transformation. (The
implementation is trivial, but figuring out exactly what the
specification should be was not!)
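For a sense of the transformation, here is a sketch under one plausible reading of the specification. The edge cases chosen here (for instance, returning `.` for root-level routes, and not handling `..` or repeated slashes) are assumptions and may differ from what this patch settles on.

```javascript
// Assumed specification: `path` is an absolute route such as
// "/prototype/index.html" or "/prototype/". The result is a relative path
// from the directory containing that route back to the site root.
function rootFromPath(path) {
  if (!path.startsWith("/")) {
    throw new Error(`Expected absolute path: ${path}`);
  }
  // Everything before the last slash is the directory; count its segments.
  const directory = path.slice(0, path.lastIndexOf("/"));
  const depth = directory.split("/").filter((s) => s.length > 0).length;
  return depth === 0 ? "." : Array(depth).fill("..").join("/");
}

console.log(rootFromPath("/prototype/index.html"));
```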
Test Plan:
Unit tests added; `yarn test` suffices.
wchargin-branch: rootFromPath
Summary:
This will enable clients to obtain the path to a static asset, even when
the app is not hosted at the root of a server, as outlined in #643.
This module will be used for simple assets (images, etc.) and API data
(fetches from `/api/**`) alike.
This supersedes #663. It includes the logic from that PR (`Assets`)
without the React-specific context bindings (`AssetsContext`).
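A minimal sketch of such a resolver is shown below. Only the class name `Assets` and the method name `resolve` appear in this log; the constructor argument and the string handling are assumptions made for illustration.

```javascript
// Hypothetical minimal resolver: prefixes absolute asset paths with a
// (possibly relative) path to the site root.
class Assets {
  constructor(root) {
    // Drop trailing slashes so that joining never doubles them.
    this._root = root.replace(/\/+$/, "");
  }

  resolve(path) {
    if (!path.startsWith("/")) {
      throw new Error(`Expected absolute asset path: ${path}`);
    }
    return this._root + path;
  }
}

// From a page at /prototype/index.html, the root is "..".
const assets = new Assets("..");
console.log(assets.resolve("/favicon.png"));
console.log(assets.resolve("/api/v1/data/repositoryRegistry.json"));
```

The same call works for images and for API fetches, since both are just paths relative to the site root. The API route in the usage example is made up.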
Test Plan:
Unit tests included; `yarn test` suffices.
wchargin-branch: assets-resolver
Summary:
See #643 and the module docstring on `createRelativeHistory.js` for
context and explanation.
This patch adds `history@^3.0.0` as an explicit dependency—previously,
we were depending on it only implicitly through `react-router` (which
was fine then, but is not now). The dependency is chosen to match the
version specified in `react-router`’s `package.json`.
Test Plan:
Extensive unit tests included, with full coverage; `yarn test` suffices.
wchargin-branch: createRelativeHistory
This is preparation for #502 - we want to be able to describe groups of
nodes, e.g. "52 repositories", so we need a plural form for every node
name.
As a fly-by fix, I removed the parentheses around the node names in the
fallback adapter, as these proved to look ugly/inconsistent in the UI.
Test plan: `yarn test` is sufficient.
We humans tend to find information about humans more interesting than
information about commits or pulls. The UI should accommodate this by
defaulting to displaying GitHub user nodes in the cred explorer.
This is implemented as a new nullable argument to the PageRankTable. If
not present, then the filter defaults to showing all nodes. If the
default filter is present but doesn't match any available type, an error
is thrown.
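The selection logic amounts to something like the following sketch, in which `initialFilter` and the `{name, prefix}` type shape are hypothetical and the empty-string prefix standing for "all nodes" is an assumption.

```javascript
// Choose the initial filter for the table.
// Hypothetical type shape: {name: string, prefix: string}.
function initialFilter(types, defaultPrefix) {
  if (defaultPrefix == null) {
    // No default requested: the empty prefix matches every node.
    return "";
  }
  const match = types.find((type) => type.prefix === defaultPrefix);
  if (match == null) {
    throw new Error(`Default filter matches no type: ${defaultPrefix}`);
  }
  return match.prefix;
}

const types = [
  {name: "User", prefix: "github/user"},
  {name: "Pull request", prefix: "github/pull"},
];
console.log(initialFilter(types, "github/user"));
```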
Test plan: The new behavior is tested. Also, I checked it in the UI and
it works.
Closes #651
Previously, the ConnectionRow showed the score of the node that was the
source of the connection. I believe the UI will be more consistent and
useful if it instead shows the connection score, i.e. how important that
connection was to the node in scope. This combos well with PR #657.
Test plan: The change is very simple, and covered by unit tests. I also
verified the behavior by examining the cred explorer.
Summary:
This function normalizes paths like `foo/bar//../baz` to `foo/baz`. The
implementation is directly copied from Node’s source code, which is
available under an MIT License.
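As a quick illustration of the intended behavior, the sketch below calls Node's built-in `path.posix.normalize`, which is the reference that the copied implementation is meant to agree with. It is not the new function itself.

```javascript
const path = require("path");

// Repeated slashes collapse and ".." cancels the preceding segment.
const normalized = path.posix.normalize("foo/bar//../baz");
console.log(normalized); // foo/baz

// A zero-length path normalizes to the current directory.
const empty = path.posix.normalize("");
console.log(empty); // .
```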
I looked for a suitable NPM package, and rejected `path-normalize` and
`normalize-path`. (The former is closer and explicitly purports to be
the right thing, but actually isn’t.)
Test Plan:
Unit tests added, with full coverage except for one branch; I include a
proof that the branch is unreachable.
Tested on Node v8.9.4, Node v10.0.0, and Node v10.8.0. Tests pass in
each case. In the latter two cases, inconsistencies between the
implementation and the actual `path.posix.normalize` would cause a test
failure. In the former case, they do not. (All such cases verified.)
wchargin-branch: path-normalize
Currently, the PagerankTable creates components in the following
pattern:
```
NodeRow (depth=0)
> ConnectionRowList (depth=1)
> ConnectionRow (depth=2)
> ConnectionRowList (depth=3)
```
This commit changes the cycle to the following:
```
NodeRow (depth=0)
> ConnectionRowList (depth=0)
> ConnectionRow (depth=0)
NodeRow (padding=true, depth=1)
> ConnectionRowList (depth=1)
> ConnectionRow (depth=1)
```
This has some nice properties:
First, the context visually resets every time we return to a NodeRow, which
makes it feasible to change the score column to always have a
context-dependent meaning:
- for a node row, the score is the score of that node.
- for a connection row, the score is the score contribution of that
connection
- (as of #502): for an aggregation row, the score is the score
contribution of that aggregation
We think this will be visually clear thanks to the padding around the
new NodeRow, along with the new color block indicating a new scope.
This design also ensures that every NodeRow has the full width available
to it (rather than getting crushed into a progressively smaller area of
the table), which will be very convenient for when we add renderers for
the nodes.
Thanks to @theliamcrawford, as the idea for this came up during a user
study with him.
Test plan:
The updated unit tests should be comprehensive. Also, try expanding some
rows in the cred explorer and verify that the behavior is as described.
William and I were experimenting, and felt that this color is slightly
more pleasing / harmonious with the rest of the site, and still quite
legible.
Test plan: Examine the new UI, and conclude that the color choice is
harmonious and legible :). Screenshot included with the PR.
Paired with @wchargin
Summary:
Due to <https://github.com/facebook/flow/issues/6400>, patches like the
following weren’t raising Flow errors:
```diff
diff --git a/src/app/adapters/adapterSet.test.js b/src/app/adapters/adapterSet.test.js
index 67dd3ed..ccc6ac6 100644
--- a/src/app/adapters/adapterSet.test.js
+++ b/src/app/adapters/adapterSet.test.js
@@ -77,6 +77,7 @@ describe("app/adapters/adapterSet", () => {
const x = new TestStaticPluginAdapter();
const fallback = new FallbackStaticAdapter();
const sas = new StaticAdapterSet([x]);
+ sas.wat();
return {x, fallback, sas};
}
it("errors if two plugins have the same name", () => {
```
A `flow type-at-pos` check indicated that the type of `sas` was indeed
inferred as `any`.
This patch applies the usual nonsensical fix. Better safe than spooky.
Test Plan:
The above patch now raises a Flow error.
wchargin-branch: annotate-adapterset-constructors
Summary:
See justification in the added unit test.
Test Plan:
Added unit test, with justification.
Also, `yarn sharness-full` passes, and `yarn start` still works.
wchargin-branch: route-trailing-slashes
This commit adds the `showPadding` prop to `TableRow`s. If showPadding
is true, then the row will have vertical padding above the row, and
below the last child of the row. The padding will match the background
color of the given row. The padding is implemented as extra `tr`
elements that themselves contain empty tds.
Test plan: The new behavior is pretty thoroughly covered by new unit
tests. Additionally, if you want to see padding in the live UI, you can
apply the following (slightly contrived) diff.
```
diff --git a/src/app/credExplorer/pagerankTable/Connection.js b/src/app/credExplorer/pagerankTable/Connection.js
index 3a882cd..633525b 100644
--- a/src/app/credExplorer/pagerankTable/Connection.js
+++ b/src/app/credExplorer/pagerankTable/Connection.js
@@ -70,7 +70,7 @@ export class ConnectionRow extends React.PureComponent<ConnectionRowProps> {
depth={depth}
description={connectionView}
connectionProportion={connectionProportion}
- showPadding={false}
+ showPadding={depth % 3 === 0}
score={sourceScore}
>
<ConnectionRowList
```
Currently, as we expand nodes or connections in the PagerankTable, the
rows both get more indented and attain a deeper color. Both of these
behaviors are controlled by the `depth` parameter.
We're going to switch the UI to a cyclic structure, where as you drill
down, once you get back to a particular node, the indentation resets to
base, but the color - which now indicates nested depth - continues to
change. This commit sets that change up, by splitting the behavior into
two parameters: `depth`, which controls the color, and `indent`, which
controls the indentation level.
As a small additional tweak, the indentation formula is changed so that
buttons are always indented by 5 pixels. This results in a cleaner
display for nodes that have `depth>0` but `indent==0` (as the button
doesn't look squashed against the color boundary).
Test plan:
The change is very simple; inspecting the updated snapshots should be
persuasive.
We currently have two components which create rows in our PagerankTable:
the `NodeRow` and `ConnectionRow`. Work on #502 will result in the
addition of a new one, the `AggregationRow`. It's time to stop
duplicating logic (and testing) of the shared behavior for these rows,
like depth-based styling, row expansion/collapse, etc. This commit pulls
all the common logic to rendering rows into a single, thoroughly tested
`TableRow` component.
There is one observable change in the UI: when a connection percentage
is not available (i.e. for NodeRows), we now leave the column empty
rather than placing a dash in the column. I think this is visually
cleaner.
Test plan: Unit tests pass, and this part of the code base is thoroughly
tested, so that's a pretty reliable indicator. I also poked around the
live PagerankTable in the cred explorer, just to be safe.
When using Tries, we often want the last matching entry for the given
path, and to throw an error if one is not available. By adding this
method to the API, we avoid a lot of unnecessary repetition in the code
base.
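A minimal sketch of the idea, assuming a simplified trie keyed by arrays of address parts. The class name `SimpleTrie` and the method name `getLast` are illustrative assumptions, not necessarily the names used in the code base.

```javascript
// Simplified prefix-matching trie; the API here is assumed for illustration.
class SimpleTrie {
  constructor() {
    this._root = {children: new Map(), hasValue: false, value: undefined};
  }

  // Store `value` at exactly `path` (an array of parts).
  add(path, value) {
    let node = this._root;
    for (const part of path) {
      if (!node.children.has(part)) {
        node.children.set(part, {
          children: new Map(),
          hasValue: false,
          value: undefined,
        });
      }
      node = node.children.get(part);
    }
    if (node.hasValue) {
      throw new Error("Duplicate path: " + JSON.stringify(path));
    }
    node.hasValue = true;
    node.value = value;
    return this;
  }

  // All values stored at prefixes of `path`, shortest prefix first.
  get(path) {
    const results = [];
    let node = this._root;
    if (node.hasValue) results.push(node.value);
    for (const part of path) {
      node = node.children.get(part);
      if (node == null) break;
      if (node.hasValue) results.push(node.value);
    }
    return results;
  }

  // The most specific match, or an error if nothing matches.
  getLast(path) {
    const matches = this.get(path);
    if (matches.length === 0) {
      throw new Error("No matching entry for " + JSON.stringify(path));
    }
    return matches[matches.length - 1];
  }
}
```

Callers that previously wrote "get all matches, check for empty, take the last one" can then make a single call.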
Test plan: Unit tests pass. As this touches the untested WeightConfig,
I've also manually tested the weight config behavior.
Summary:
An `import *` was used for convenience, but this effects a value import
in addition to a type import. By exploding the wildcard import to
directly import the required types, we can shave off 2.3% of our
post-gzip bundle size (131.82 KB to 128.74 KB). It’s unfortunate that we
lose the namespacing, but c’est la vie.
Test Plan:
`yarn flow` suffices.
wchargin-branch: explode-wildcard-type-import
Thanks to #642, it should now be safe to disable the Git plugin, reaping
the benefits described in #628, without causing the cred explorer to
crash (#631).
Test plan:
- `yarn travis --full` passes
- The full cred explorer works:
- Running PageRank does not crash the explorer
- Expanding a pull request does not crash the explorer
- (After clearing state) the weight config doesn't show Git weights
- The filter doesn't show Git nodes
This modifies WeightConfig to properly use the fallback node type, as
created in #640 and merged in #642. As an additional change, it now
displays type names, rather than the address parts. For example, the
issue type is now displayed as `Issue`, not
`["sourcecred", "github", "issue"]`.
The WeightConfig is an untested mess, and I will likely re-write it
entirely. (See a bevy of WeightConfig related issues: #604, #595, #588).
So, not too much effort was invested in keeping high code quality in
this commit.
Test plan: The weight config has no tests, so I manually tested:
- weights persist after page reload
- node weights influence cred attribution
- edge weights influence cred attribution
- edge directionality influences cred attribution
- weights have reasonably pretty description messages
This takes the code from #640 and puts it into production.
Test plan: Unit tests pass. The observable behavior in the cred explorer
is unchanged; i.e. the addition of the FallbackAdapter did not produce
new entries in the WeightConfig or in the Pagerank table options. The
WeightConfig is untested, so we don't have verification of that behavior
(other than that I tested it and am reporting it here). The
PagerankTable code is tested, and a snapshot would fail if another
option group had appeared.
Issue #631 revealed that our current plugin-handling code is fragile -
we aren't robust to having nodes from a plugin without having that
plugin present in the frontend. This commit adds `StaticAdapterSet` and
`DynamicAdapterSet`, which are abstractions over finding the matching
plugin adapter or type for a node or edge. It's a robust abstraction,
because the adapter sets always include the `StaticFallbackAdapter` or
`DynamicFallbackAdapter`, which can match any node, so we'll never get
an error like #631 due to not having an adapter / type available.
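A rough sketch of why the fallback makes lookup total, assuming a hypothetical adapter shape of `{name, nodePrefix}` with array-valued prefixes. The real `StaticAdapterSet` and `DynamicAdapterSet` have richer interfaces; this only illustrates the matching guarantee.

```javascript
// True if `prefix` is a (possibly empty) prefix of `address`.
function hasPrefix(address, prefix) {
  return (
    prefix.length <= address.length &&
    prefix.every((part, i) => address[i] === part)
  );
}

// The empty prefix matches every address, so a lookup can never come up empty.
const FALLBACK_ADAPTER = {name: "fallback", nodePrefix: []};

// Hypothetical stand-in for an adapter set; not the project's actual class.
class AdapterSet {
  constructor(adapters) {
    this._adapters = [FALLBACK_ADAPTER, ...adapters];
  }

  // Returns the adapter with the longest matching prefix.
  adapterMatchingNode(address) {
    let best = FALLBACK_ADAPTER;
    for (const adapter of this._adapters) {
      if (
        hasPrefix(address, adapter.nodePrefix) &&
        adapter.nodePrefix.length > best.nodePrefix.length
      ) {
        best = adapter;
      }
    }
    return best;
  }
}
```

A node from a plugin that is not loaded in the frontend then resolves to the fallback adapter instead of raising an error.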
Also relevant: #465
Test plan:
Unit tests included.
Summary:
This patch considers an environment variable `GITHUB_DELAY_MS`. If the
value is set to a positive integer, a delay of the given number of
milliseconds will be incurred before each query to GitHub. This is to
decrease the probability of being rate-limited; see #350 for details.
This in turn unblocks us to load SourceCred data for larger
repositories.
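A minimal sketch of the behavior described above, assuming hypothetical helper names (`parseDelayMs`, `sleep`, `delayedQuery`); the patch's actual parsing code may differ in detail.

```javascript
// Anything other than a positive integer is treated as "no delay".
function parseDelayMs(raw) {
  if (raw == null) return 0;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : 0;
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// `runQuery` is a hypothetical callback that performs one GitHub query.
async function delayedQuery(runQuery, env = process.env) {
  const delayMs = parseDelayMs(env.GITHUB_DELAY_MS);
  if (delayMs > 0) {
    await sleep(delayMs);
  }
  return runQuery();
}
```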
The use of an environment variable is something of a hack to get this
off the ground. See #638 for long-term plans.
Test Plan:
Run `time node ./bin/sourcecred.js load sourcecred/example-github` with
varying values for `GITHUB_DELAY_MS`. Note that with the variable unset,
set to zero, set to a negative number, or set to a non-numeric value,
the job completes quickly; when the delay is set to `5000`, the job
takes an extra five seconds.
wchargin-branch: delay-github-queries
For #502: The UI that I currently have in mind displays aggregations
grouped by connection type and node type together, rather than nested.
I think it will be cumbersome to have multiple hierarchical
levels of expansion.
To make that UI easy to write, this commit adds some logic for
flattening the hierarchical aggregation from #624. I add an extra
translation to flatten, rather than just having the logic produce nested
structures, because it's convenient to keep around the nested structure
in case I decide to implement the hierarchical UI instead. Once we have
solidified how we want the UI to behave, we might choose to simplify
this code.
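A sketch of the flattening step, assuming a hypothetical nested shape in which each connection-type aggregation carries a `nodeAggregations` array; the field names are illustrative and may not match #624.

```javascript
// Turn [{connectionType, nodeAggregations: [...]}, ...] into one flat list
// in which every entry names both its connection type and its node type.
function flattenAggregations(connectionAggregations) {
  const result = [];
  for (const {connectionType, nodeAggregations} of connectionAggregations) {
    for (const nodeAggregation of nodeAggregations) {
      result.push({connectionType, ...nodeAggregation});
    }
  }
  return result;
}
```

Because this is a separate translation step, the nested structure stays available if the hierarchical UI is implemented later.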
Test plan: The implementation is rather simple. There are some unit
tests.
This commit creates the directory `src/app/adapters` and moves the
following three files into it:
- `src/app/pluginAdapter.js`
- `src/app/pluginAdapter.test.js`
- `src/app/defaultPlugins.js`
This is in preparation for a principled fix for #631, which will add a
base plugin and some logic that ensures it's always included.
Summary:
In addition to the obvious benefit of having a favicon, this gets rid of
a 404 Not Found error on our home page, tremendously boosting our hacker
cred.
Test Plan:
The favicon is displayed in both `yarn start` and the static site (as a
result of the build script). The added build test fails before this
change.
wchargin-branch: add-favicon
Summary:
The SVG was artisanally crafted by yours truly, and rasterized by the
accompanying script (which is fully deterministic, and also artisanal).
Test Plan:
Run `./src/assets/logo/rasterize.sh`, and note that the output is
unchanged.
wchargin-branch: add-logo
Summary:
This is a follow-up to #514, wherein we disabled new service workers and
instructed any existing service workers to self-destruct. (See that PR
for the rationale.) This commit removes them from our codebase entirely,
enabling us to slim down our build process and our build output.
Test Plan:
Running `yarn start` still works. Building the static site and exploring
it works, too.
wchargin-branch: remove-sw
Summary:
This fixes the following warning on our development instance:
> Warning: render(): Calling ReactDOM.render() to hydrate
> server-rendered markup will stop working in React v17. Replace the
> ReactDOM.render() call with ReactDOM.hydrate() if you want React to
> attach to the server HTML.
We do in fact want to attach to the server HTML, so we apply the
suggested patch.
(The warning of course also applies to production, but warnings do not
appear in production.)
Test Plan:
Running `yarn start` shows that the above warning has disappeared, and
that the cred explorer still works. (Also, `yarn test --full` passes,
but that tells us effectively nothing because this code path is never
hit in tests: it only affects the HTML that is executed in the browser.
Erasing the entire module, leaving only `// @flow`, still lets tests
pass.)
wchargin-branch: migrate-to-hydrate
This reverts commit 8c70f03122.
Context: This introduced a serious bug (#631), so we're reverting it to
get the codebase back in a working state. Meanwhile, I'll develop a
principled solution.
Test plan:
I rebuilt the backend, re-loaded a graph, and loaded it in the frontend.
PageRank, the cred explorer, and the weight config all work. Opening a
pull request does not trigger a crash.
This implements two methods:
`aggregateByNodeType` groups `scoredConnection`s by the specified
`NodeTypes`, along with summary statistics.
`aggregateByConnectionType` groups `scoredConnection`s by
`ConnectionType` at the top level, where `ConnectionType` includes
`EdgeType` and direction, (and also captures synthetic self-loops).
Then it also groups by `NodeType` within any aggregation.
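A simplified sketch of the node-type grouping, assuming string addresses matched by prefix and assuming that the summary statistics are a count and a summed score. The field names and the choice of statistics are illustrative, not taken from the implementation.

```javascript
// Group scored connections by the node type whose prefix matches the
// connection's source address, accumulating simple summary statistics.
function aggregateByNodeType(scoredConnections, nodeTypes) {
  const byType = new Map();
  for (const sc of scoredConnections) {
    const type = nodeTypes.find((t) => sc.source.startsWith(t.prefix));
    if (type == null) {
      throw new Error("No node type for " + sc.source);
    }
    const aggregation = byType.get(type.name) || {
      nodeType: type.name,
      size: 0,
      totalScore: 0,
      connections: [],
    };
    aggregation.size += 1;
    aggregation.totalScore += sc.connectionScore;
    aggregation.connections.push(sc);
    byType.set(type.name, aggregation);
  }
  return Array.from(byType.values());
}
```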
This is progress towards #502.
Test plan: unit tests included.
See #627 for context.
Removing the Git plugin results in an enormous performance improvement.
In my testing on `metamask/metamask-extension`: before this change, load
took 23s, and PageRank took >9 mins and then crashed. After this change,
load+PageRank took 5s combined.
Test plan: All unit tests pass; loading new data from the CLI works; and
I poked around the UI to make sure there were no spurious references to
the Git plugin.
Note: This does not break backcompat; there's no need to regenerate any
already-loaded data.
This is the first real step towards #502.
Factoring this out because deciding the type signatures was non-trivial,
and the work was paired with @wchargin.
Test plan: `yarn test`
PagerankTable is getting a bit unwieldy, especially as #502 will need to
add a new pair of components. This commit splits the erstwhile
PagerankTable.js into four files:
- `pagerankTable/shared`, shared utils and types
- `pagerankTable/Node`, the `NodeRow` and `NodeRowList`
- `pagerankTable/Connection`, the `ConnectionRow`, `ConnectionRowList`,
and `ConnectionView`
- `pagerankTable/Table`, the `PagerankTable` itself
This commit makes no logical changes; it is purely a reorganization.
Test plan: `yarn test`
PageRank outputs scores as components in a probability distribution.
This means that most scores are very small numbers, e.g. 0.00003. This
doesn't make for a great UI (humans don't like thinking in tiny
decimals).
Our first attempt to come up with a more readable UI was to use log
scores; in #265 we displayed the log score alongside (arbitrarily)
`rawScore * 100` in the UI. The log scores were more usable, so we kept
them, with subsequent modifications. In the original version, all the
log scores were negative. In #466, we arbitrarily added 10 to the
scores, which made most scores look nicer, but introduced a meaningless
switch where scores counter-intuitively become negative after a certain
point. That was bad, so in #535 we started displaying negative log
scores. This is also counter-intuitive: it's weird that lower scores are
better, and it's not clear that a score of (say) 3 is 20x better than a
score of 6.
I think we need to do away with the log scores; people just don't think
about numbers logarithmically. This commit switches to linear scores,
normalized so that the largest score is always 1000. I've tried this out
on a few repos and demo'd it to people, and it seems much clearer.
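A minimal sketch of the normalization, assuming scores are held in a `Map` from node key to raw PageRank probability; the function and constant names are hypothetical.

```javascript
const MAX_SCORE = 1000;

// Rescale raw scores linearly so that the largest becomes MAX_SCORE.
function normalizeScores(rawScores) {
  const max = Math.max(...rawScores.values());
  const normalized = new Map();
  for (const [key, score] of rawScores) {
    // Guard against an all-zero distribution to avoid dividing by zero.
    normalized.set(key, max > 0 ? (score / max) * MAX_SCORE : 0);
  }
  return normalized;
}
```

Unlike the log scores, ratios are preserved: a node with half the raw score of the top node is displayed with a score of 500.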
Test plan: Some unit tests added; also, I launched the cred explorer and
experienced the change on several projects.
We manually import the set of plugins to be used by the app in a few
different places. That's a bad smell. This commit creates a centralized
import point instead.
Test plan: `yarn test`
This commit adds some consistent and tested methods for getting the
appropriate plugin adapter for a given Node/Edge address. There are
methods for both static and dynamic adapters.
In the event that more than one plugin adapter matches the given
address, an error is thrown; likewise in the case where there is no
matching adapter.
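A sketch of the lookup contract, assuming string addresses and a hypothetical `nodePrefix` field on each adapter; the function name is illustrative.

```javascript
// Return the unique adapter whose prefix matches `address`.
// Both "no match" and "ambiguous match" are treated as errors.
function findAdapterForNode(adapters, address) {
  const matches = adapters.filter((a) => address.startsWith(a.nodePrefix));
  if (matches.length === 0) {
    throw new Error("No adapter matches address: " + address);
  }
  if (matches.length > 1) {
    throw new Error("Multiple adapters match address: " + address);
  }
  return matches[0];
}
```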
Test plan: `yarn test`
Relevant to #465
Summary:
The `node ./bin/sourcecred.js load` command invokes plugin code by
providing an output directory into which the plugin may store data.
As of this patch, it also provides a cache directory that the plugin may
use to store data that will not be available at runtime. For instance,
the Git plugin might choose to clone the repository herein, or the
GitHub plugin may choose to store partial GraphQL query results to deal
with interruptions. The contract is that the cache directory may be
removed at any time and that the plugin should continue to operate
normally.
Test Plan:
The build script has been updated and tested. Reverting the change to
the build script causes the newly added test to fail. (Each plugin has a
cache directory, though the cache directories are empty for now.)
wchargin-branch: create-plugin-cache
This reverts commit 25b0124b56.
@wchargin and I had an extensive offline conversation about PluginAdapters,
and decided that for now we want to punt on figuring out the right
abstraction for having a "core"-scoped plugin adapter. Instead, we'll
keep on using plugin adapters as something of a kitchen sink where we
throw in all the per-plugin logic that we need to run the app.
This necessitates reverting #615 because we don't think that React
should be a dependency in core, but we will need the
DynamicPluginAdapter to have a type dependency on React so that we can
solve issues like #590.
Test plan: Yarn test suffices.
There's no reason for it to be in `app` - the concepts it contains are
core concepts, e.g. node types and graphs.
For now I'm leaving the `NodeType` and `EdgeType` interfaces with the
PluginAdapter. They may move later, but I'm reluctant to clutter the
graph class more, and all I need for now is that they live in core.
Test plan: It's just a move/rename, so `yarn test` is amply sufficient.
Summary:
We never use the `node ./bin/sourcecred.js start` command. This command
contains an Express server to combine the static files with the build
output, which duplicates the logic in our Webpack config, which we
actually use (with `yarn start`). Once we actually want the command line
entry point to be a useful tool for end users, we can consider
reimplementing it the right way, whatever that may be. Until then, it’s
simply one more thing to keep in sync.
Test Plan:
Running `yarn test --full` passes; the `load` CLI command still works;
running `yarn start` still works.
wchargin-branch: remove-start
For #465, I'm planning to create an abstraction over NodeTypes and
EdgeTypes which traverses a hierarchy of types and aggregates/reduces
information across all the matching types for a given Node/Edge address.
To do that efficiently, we will want tries[1].
Thanks to @wchargin for helping me figure out how to implement this.
Test plan: Unit tests. The code is a little tricky so please review it
closely.
[1]: https://en.wikipedia.org/wiki/Trie
Consider the following types:
```
// Used to be called "Contributor"
export type Adjacency =
| {|+type: "SYNTHETIC_LOOP"|}
| {|+type: "IN_EDGE", +edge: Edge|}
| {|+type: "OUT_EDGE", +edge: Edge|};
// Used to be called "Contribution"
export type Connection = {|
+adjacency: Adjacency,
// This `weight` is a conditional probability: given that you're at
// the source of this connection's adjacency, what's the
// probability that you travel along this connection to the target?
+weight: Probability,
|};
// Used to be called "ScoredContribution"
export type ScoredConnection = {|
+connection: Connection,
+source: NodeAddressT,
+sourceScore: number,
+connectionScore: number,
|};
```
These types represent how a node's PagerankScore is influenced by its
connections in the markov chain. The previous names, "Contributor",
"Contribution" and "ScoredContribution", were quite confusing as
elsewhere in the project "contribution" means something that added value
to the project, and "contributor" means the author of contributions.
While these new names aren't necessarily much better a priori, in the
context of the project's vernacular they are much less confusing.
Test plan: It's just a rename, and `yarn test` passes.
Historically, `credExplorer/App.js` instantiated a PagerankTable
regardless of its state, and would pass null props when the App didn't
have data needed to load the table. As of #583, we just don't create the
PagerankTable before its data is available, which is a simpler/better
contract. As such, the type signature of PagerankTable's props can be
simplified, and some logic for handling the null case may be removed.
Test plan: `yarn test` passes, which is sufficient.
Pull #579 reifies the cred explorer state as an explicit state
transition machine, with a well-tested implementation. This pull
re-writes `credExplorer/App.js` to use that implementation, and thoroughly
tests it.
The result is that `credExplorer/App.js` has much simpler code
(because it just binds the rendered components to the state machine),
and is much more thoroughly tested.
To ensure easy testability of the `App` class, it was refactored so that
the module exports a factory function which takes a method for creating
the `AppStateTransitionMachine` and returns an `App` class. This ensures
that in test code, we can easily mock the state transition machine. This
had no effect on external callers, since the higher-level `<AppPage>`
class, which already wraps over `LocalStore` choice, was already the
preferred call site.
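A stripped-down sketch of the factory pattern, with a plain class standing in for the React component so that the example runs on its own. The state names and the `loadGraph` method are hypothetical.

```javascript
// The factory takes a function that builds the state machine and returns
// an App class bound to it. (Plain class here; the real one is a component.)
function createApp(createStateMachine) {
  return class App {
    constructor() {
      this.state = {type: "UNINITIALIZED"};
      this.stateMachine = createStateMachine(
        () => this.state,
        (newState) => {
          this.state = newState;
        }
      );
    }
  };
}

// In a test, a mock machine is injected instead of the real implementation.
const calls = [];
const createMockMachine = (getState, setState) => ({
  loadGraph: () => {
    calls.push("loadGraph");
    setState({type: "LOADING_GRAPH"});
  },
});
const TestApp = createApp(createMockMachine);
const app = new TestApp();
app.stateMachine.loadGraph();
```

Production code would call the same factory with the real state machine constructor, so the component itself never needs to know which one it received.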
I also added a loading indicator component, which presently displays a
status text corresponding to the state, such as "Loading graph...", or
"Error while running PageRank". This way, there is always some user
feedback during loading states (which could take a while).
Test plan:
Visual inspection, and the very thorough included unit tests.
Summary:
Here, we make the width consistent across the home page, the explorer,
and the navbar. Arguably, the PageRank table itself should be wider. We
can let it pop out of the box model with `relative`, `width`, and `left`
properties (using `calc`), but we don’t want to deal with the details
right now.
At some time in the future, we can also of course unify these styles.
Paired with @decentralion.
Test Plan:
Running `yarn start` and clicking around the various pages suffices. To
check the external redirect page, you can apply
```diff
diff --git a/src/app/HomePage.js b/src/app/HomePage.js
index 4d0f832..0eee519 100644
--- a/src/app/HomePage.js
+++ b/src/app/HomePage.js
@@ -143,7 +143,8 @@ export default class HomePage extends React.Component<{||}> {
<h2>Roadmap</h2>
<p>
SourceCred is under active development.{" "}
- <Link className={css(styles.link)} to="/prototype">
+ <Link className={css(styles.link)} to="/discord-invite">
+ {/* STOPSHIP */}
We have a prototype
</Link>{" "}
that ingests data from Git and GitHub, computes cred, and allows the
```
and then click the appropriate link on the home page.
wchargin-branch: max-width-900
Summary:
@decentralion and I have spent a bunch of time on this prose, and we
think that it’s pretty good. Let’s put some content on our otherwise
bare site.
Test Plan:
Running `yarn start` suffices.
Paired with @decentralion.
wchargin-branch: home-page-prose
The cred explorer app has a variety of valid states. Currently, it is
thrown together without explicit documentation of what its states are,
how they transition, or error handling or testing. I worry that this
will be hard to maintain.
This commit creates the AppState type which explicitly reifies every
reachable state for the app, and a StateTransitionMachine which handles
transitions between states. The transitions are thoroughly tested,
including edge cases where the user makes a change while waiting for a
promise to resolve, or where one of the promises fails.
Test plan:
The unit tests are comprehensive. `yarn test` passes.
Thanks to @wchargin for much discussion about how to structure the
states.
"Explore(r)" does not accurately convey the current state of the
project. In order to more accurately convey the current state,
"Explore(r)" has been updated to "Prototype".
Addresses #584
Test plan: Visual inspection and manual testing of pathing