I'm re-organizing SC data to be oriented on the graph, rather than on
plugin-specific data structures. So there is no longer a need for the
git loading logic which orients around saving a repository.json file
that's been potentially merged across repos, or indeed the logic for
merging repositories at all. So I'm removing `git/loadGitData`,
`git/mergeRepository`, and relatives.
Test plan: `yarn test --full` passes.
There's no need for us to depend on `mkdirp`, because the `fs-extra`
module already has `fs.mkdirp` and `fs.mkdirpSync`. This commit removes
the dep from our `package.json`, and removes all explicit imports of it.
Test plan: `yarn test --full` passes. `git grep "import mkdirp"` has no
hits.
This adds a new module, `api/load`, which implements the logic that will
underly the new `sourcecred load` command. The `api` package is a new
folder that will contain the logic that powers the CLI (but will be
callable directly as we improve SourceCred). As a heuristic, nontrivial
logic in `cli/` should be factored out to `api/`.
In the future, we will likely want to refactor these APIs to
make them more atomic/composable. `api/load` does "all the things" in
terms of loading data, computing cred, and writing it to disk. I'm going
with the simplest approach here (mirroring existing functionality) so
that we can merge #1233 and realize its many benefits more easily.
This work is factored out of #1233. Thanks to @Beanow for [review]
of the module, which resulted in several changes (e.g. organizing it
under api/, having the TaskReporter be dependency injected).
[review]: https://github.com/sourcecred/sourcecred/pull/1233#pullrequestreview-263633643
Test plan: `api/load` is tested (via mocking unit tests). Run `yarn test`
This commit refactors the `util/taskReporter` module so that
`TaskReporter` is an interface; the class previously called
`TaskReporter` is renamed to `LoggingTaskReporter`. We also export a
`TestTaskReporter` which implements the interface, and is very easy to
test.
The motivation: This will make it much easier to write tested code that
uses a `TaskReporter`, as now the test code can provide a
`TestTaskReporter` and check that all tasks get finished, that task
ids are as expected, etc.
Test plan: The `TestTaskReporter` is tested. Run `yarn test`.
Throughout the codebase, we freeze objects when we want to ensure that
their properties are never altered -- e.g. because they are a plugin
declaration, or are being re-used for various test cases.
We generally use `Object.freeze`. This has the disadvantage that it does
not work recursively, so a frozen object's mutable fields and properties
can still be mutated. (E.g. if `const obj = Object.freeze({foo: []})`,
then `obj.foo.push(1)` will succeed in mutating the 'frozen' object).
Sometimes we anticipate this and explicitly freeze the sub-fields (which
is tedious); sometimes we forget (which invites errors). This change
simply replaces all instances of Object.freeze with [deep-freeze], so we
don't need to worry about the issue at all anymore.
Test plan: `yarn test` passes (after updating snapshots);
`git grep Object.freeze` returns no hits.
[deep-freeze]: https://www.npmjs.com/package/deep-freeze
This is a replacement for `github/loadGithubData` which returns a
combined Graph rather than a combined RelationalView. This provides a
major benefit, which is that we can use the (robust) Graph merge logic
rather than the (buggy) relational view merge.
Test plan: This function is untested. It basically pipelines a few APIs
together. I think that flow is basically sufficient to validate that it
works, and writing a unit test will be frustrating (mostly will involve
re-integrating the funcitonality via mocks). A future commit makes this
part of the pipeline that generates snapshot tests, so it is de-facto
integration tested.
This module builds on the project logic added in #1238, and makes it
easy to create projects based on a simple string configuration.
Basically, the spec `foo/bar` creates a project containing just the repo
foo/bar, and the spec `@foo` creates a project containing all of the
repos from the user/organization named foo.
This is pulled out of #1233, but I've enhanced it to support
organizations out of the box.
The method is thoroughly tested.
Test plan: `yarn test --full` still passes. Also, I've ensured that the
async `_getProjectIds` is still usable in our webpack configs (via
modifying and testing the dependent commits).
This creates a new `Project` type which will replace `RepoId` as the
index type for saving and loading data.
The basic data type is added to `project.js`. Rather than having a
`RepoIdRegistry`, I intend to infer the registry at build time by
scanning for available projects saved in the sourcecred directory. I've
added the `project_io` module for this task. It has methods for setting
up a project subdirectory, and loading the `Project` info from that
subdirectory.
To ensure that projects ids can be encoded even if they have symbols
like `/` and `@`, we base64 encode them.
To ensure that project ids can be retrieved at build time, the
`getProjectIds` method is factored out into its own plain ECMAScript
module. For all non-build time needs, it is re-exported from
`project_io`.
Test plan: Unit tests added; run `yarn test`.
See #1243 for context. This is basically a more aggressive version of
pull #1230 -- instead of just running unit tests in isolation, we also
run flow in isolation, and kill the servers afterwards.
Test plan: See how this fares in CI :)
* chore(package): update @babel/core to version 7.5.5
* chore(package): update @babel/plugin-proposal-class-properties to version 7.5.5
* chore(package): update @babel/preset-env to version 7.5.5
Test plan: CI passes.
* chore(package): update dependencies
* revert tmp upgrade
I'm having test failures when `tmp` is upgraded; they seem to repro only
when many tests are running at once. Since we have no issues with the
older version of tmp, let's just keep an old tmp and inform Greenkeeper
not to touch it.
Test plan: `yarn test`
Ever since I upgraded all of the dependencies, we've been having
regular CI failures, which seem to share a common root cause of memory
exhaustion. Here are some examples: [1], [2].
[1]: https://circleci.com/gh/sourcecred/sourcecred/1246
[2]: https://circleci.com/gh/sourcecred/sourcecred/1239
After some experimentation, I've found that we can solve the
issue by ensuring that jest runs on its own in CI, so that it doesn't
contend with other tests for memories. Also, I reduce its max workers to
2, which matches the number of CPUs in the CircleCI containers.
Unfortunately, this does increase our build time. The postcommit (non
full) test now takes 45-60s (up from 30-50s), and the full test is also
a little slower. However, building in about one minute is still
acceptably fast, and having regular flakey test failures is not
acceptable, so this is still a win.
If we want to improve on this in the future, we should look into the git
shells getting spawned in `config/env.js`. I noticed that they were
often involved in the out-of-memory failures.
Also, I modified `.circleci/config.yml` so that any branch matching the
regular expression `/ci-.*/` will trigger a full build. That makes it
easier to test against CI failures.
Test plan: I ran about ~10 full builds with this change, and more with
similar variations, and they all passed. Verify that the full builds
that are run for this commit also all pass! Also, verify that running
yarn test locally has unchanged behavior, and running
`yarn test --ci` locally lets jest run to completion before running
any other test.
Includes a change to `cli/load` and `build_static_site.sh` to accept a `--weights WEIGHTS_FILE` argument.
This allows overriding the default weights at build-time using a `weights.json` that has the same format as previously generated in the frontend.
Test plan:
Adds an additional test-case as well for propagating the optional parameter.
The file I/O of loading and parsing a weights.json file was tested manually. As analysis/weights' fromJSON() is tested elsewhere as is passing weight parameters.
Prior to #1136, we needed an `ExplorerAdapter` abstraction to get node
description data to the frontend. Now that it's included in the graph,
we can throw away this abstraction, which is a big step towards plugin
simplification (work towards #1120).
Since it only affects a deprecated/legacy part of the code base, I
didn't put much effort into making the result super clean. I also
removed a few tests that became inconvenient.
Test plan: Verified that the legacy frontend still works. There's one
tiny regression, which is that the link color in the legacy frontend no
longer matches the rest of the UI, but that's actually consistent with
the timeline frontend, so no biggie.
`yarn test` passes.
This resolves an issue that caused jest tests to fail when depending on
a module that ships .mjs files (encountered via a transitive dep of
react-markdown).
See https://github.com/facebook/create-react-app/pull/4085 for context.
Test plan: I have a future commit which tests a file that depends on
react-markdown. The tests fail to run before this commit, and they run
without issue afterwards.
This resolves an issue where build_static_site.sh fails if the
$SOURCECRED_DIRECTORY is not set.
I tried to add a test for this, but was unsuccessful; see discussion in
[a575638].
Test plan: I manually verified that running build_static_site.sh with an
unset SOURCECRED_DIRECTORY fails before this commit, and passes
afterwards.
[a575638]: a575638fa4
The scores are lightly processed from their internal representation.
Example usage:
```
$ yarn backend;
$ node bin/sourcecred.js load sourcecred/sourcecred
$ node bin/sourcecred.js scores sourcecred/sourcecred > scores.json
```
The data structure is as follows:
```js
export type NodeOutput = {|
+id: string,
+totalCred: number,
+intervalCred: $ReadOnlyArray<number>,
|};
export type ScoreOutput = Compatible<{|
+users: $ReadOnlyArray<NodeOutput>,
+intervals: $ReadOnlyArray<Interval>,
|}>;
```
Test plan: I added sharness tests at `sharness/test_cli_scores.t`.
In the past, we've used javascript tests for CLI commands. However,
those are pretty time-consuming to write, and are less robust than
simply running the command from bash. Check the snapshot for a sense of
what the new data format looks like. Also, the snapshot updater now
updates this snapshot too.
Relevant for #1047.
Thanks to @Beanow for feedback on the output format and design.
Thanks to @wchargin for help in code review.
This fixes a build error in test_build_static_site.t.
It's been masked by a lot of ENOMEM issues I've been having in the full
build, so a few failures had actually had a chance to accrue:
- Fixes an issue wherein loading the sourcecred/example-git repo would
error (on failure to normalize scores, because there was no user
activity). Fixed it pretty crudely by adding an issue to that
repository.
- Fixes an issue where a deprecation warning caused the
build_static_site build to fail. We now permit that particular warning.
- Updates the example load snapshot.
Test plan: `yarn test --full` passes once again.
This modifies `scripts/build_static_site.sh` so that it uses the cache
available in `$SOURCECRED_DIRECTORY/cache`. This makes the script less
hermetic, but also enormously faster for regular usage.
We use a symbolic link for efficiency, and so that the user's main cache
is updated by the data loaded by build_static_site.
Test plan: I've manually verified that running build_static_site.sh is
now fast, when building repos which are already locally present in
cache. Existing tests pass. The user's cache is not removed.
This modifies config.yml so that it will run full tests on a branch
called "ci-test". This will make it easier for us to test attemtps to
fix full build issues, without needing to iterate against master
directly.
As of this commit, the main SourceCred prototypes page now links to
timeline cred, meaning that timeline cred is now live. I've added a
link from the legacy explorer to the timeline explorer (which already
has a link out to the legacy explorer).
Test plan: Careful inspection of the frontend by the committer.
Also, yarn test.
This is a bulk rename of all the old explorer code into
`explorer/legacy`. Now that the timeline explorer exists, I intend to
prioritize development on that going forwards. Once the timeline
explorer is as good as the old explorer at decomposing a node's sources
of cred, I will remove the legacy explorer entirely.
Test plan: `yarn test`
This commit adds a TimelineExplorer for visualizing timeline cred data.
The centerpiece is the TimelineCredChart, a d3-based line chart showing
how the top users' cred evolved over time. It has features like tooltips,
reasonable ticks on the x axis, a legend, and filtering out line
segments that stay on the x axis.
An inspection test is included, which you can check out here:
http://localhost:8080/test/TimelineCredView/
Also, you can run it for any loaded repository at:
http://localhost:8080/timeline/$repoOwner/$repoName
This commit also includes new dependencies:
- recharts (for the charts)
- react-markdown (for rendering the Markdown descriptions)
- remove-markdown (so the legend will be clean text)
- d3-time-format for date axis generation
- d3-scale and d3-scale-chromatic for color scales
Test plan: The frontend code is mostly untested, in keeping with my
observation that the costs of testing the old explorer were really high,
and the tests brought little benefit. However, I have manually tested it
thoroughly. Also, there is an inspection test for the TimelineCredView
(see above).
It's very simple: a method that creates a copy of a `Weights`.
While writing this, I realized I should probably refactor the weights
module so that it exports a class rather than a bunch of methods
operating on a data structure. It would just be a cleaner API. But I'm
leaving that for another day.
Test plan: Unit tests added.
Now that babel is upgraded, upgrading webpack was pretty
straightforward.
- We take advantage of the new `mode` config option, and no longer need
to manually set up Uglify plugin
- Uglifyjs is back, I checked the prod build output: it's very ugly
- I updated the RemoveBuildDirectoryPlugin per instructions, and
verified it still works.
- I verified that of `yarn backend`, `yarn build`, and `yarn start` all
still work as expected.
I moved `config/babel.js` to `.babelrc.js` because it seemed like babel
7 really wanted that. I also blew away our (complicated, copied from
create-react-app) config and replaced it with a much, much simpler one.
Test plan: `yarn test` passes, `yarn start` still serves a working
server, and `scripts/build_static_site.sh` still produces a working
site.
Possibly we lost some nice features re: React debugging; if so I'll add
them back as I miss them.
We have an old version of uglify, and it's causing problems with
compiling d3-array, and also when upgrading to a newer babel.
I'm going to disable it for now, then upgrade babel, and then upgrade
webpack. Upgrading webpack will get us a later version of uglify without
these issues.
As described in #987, we use a single TTL across GitHub types. Right
now, the TTL is set to 7 days. This means that it's possible to run
`sourcecred load`, but still be missing the last 7 days worth of issues.
Now that we're doing timeline cred (cf #1212), this is not acceptable.
As a workaround until we fix#987, I'm decreasing the TTL to 12 hours.
That's still long enough to make a good experience for someone who is
tweaking config and calling `sourcecred load` a lot, but ensures that
freshly-loaded results still have recent activity.
Test plan: `yarn test`
This adds a TimelineCred class which serves several functions:
- acts as a view on timeline cred data
- (lets you get highest scoring nodes, etc)
- has an interface for computing timeline cred
- lets you serialize cred along with the graph and paramter settings
that generated it in a single object
One upshot of this design is that now if we let the user provide weights
(or other config) on load time in the CLI, those weights will get
carried over to the frontend, since they are included along with the
cred results.
TimelineCred has 'Parameters' and 'Config'. The parameters are
user-specified and may change within a given instance. The config is
essentially codebase-level configuration around what types are used for
scoring, etc; I don't expect users to be changing this. To keep the
analysis module decoupled from the plugins module, I put a default
config in `src/plugins/defaultCredConfig`; I expect all users of
TimelineCred to use this config. (At least for a while!)
Test plan: I've added some tests to `TimelineCred`. Run `yarn test`. I
also have a yet-unmerged branch that builds a functioning cred display
UI using the `TimelineCred` class.
fixup tlc
This adds the `filterTimelineCred` module, which dramatically reduces
the size of timeline cred by throwing away all nodes that are not a user
or repository. It also supports serialization / deserialization.
Test plan: unit tests included
This module takes the timeline distributions created by
`timelinePagerank`, and re-normalizes the scores into cred. For details
on the algorithm, read comments and docstrings in the module.
Test plan: Unit tests added.
As the name would suggest, this module allows computing timeline
PageRank on a graph. See documentation in the module for details on the
algorithm.
Test plan: The module has incomplete testing. The timelinePagerank
function calls out to iterators for getting the time-decayed node
weights and the time-decayed markov chain; these iterators are tested.
However, the wrapper logic that composes these pieces together,
calculates seed vectors, and runs PageRank is not tested. I felt it
would be a pain to test and settled for reviewing the code carefully,
and putting a cautionary note at the top of the function.
This commit adds new weight evaluators for nodes and edges. Unlike the
previous evaluator, edges and nodes are handled as separate concerns,
rather than composing the node weights into the edge weights. I think
this separation is cleaner.
Both evaluators use only the address, not the full (Node or Edge)
object. Although we may want to give the edge evaluator access to the
full Edge later, if we decide we want node-type-differentiated edge
weights (e.g. if a hasParent edge has a different weight depending on
whether it is connected to an Issue or a Repository).
weightsToEdgeEvaluator has been refactored to use the new evaluators,
and has been given a deprecation notice.
Test plan: `yarn test`
This commit adds an `interval` module which defines intervals (time
ranges), and methods for slicing up a graph into its consistuent time
intervals. This is pre-requisite work for #862.
I've added a dep on d3-array.
Test plan: Unit tests added; run `yarn test`
#1167 added some info to the README about how the user needs to have ssh keys setup. This was true at the time, but changed as a result of #1210. This commit fixes that up by removing the now-outdated information.