Summary:
We want to change this configuration so that our compilation of backend
applications can target latest Node. This commit forks the current
configuration so that we can modify it easily.
Test Plan:
Both `yarn start` and `yarn travis` work. The generated backend
applications work, too.
wchargin-branch: fork-babel-config
Setup following directions from [webpack-node-externals]
[webpack-node-externals]: https://www.npmjs.com/package/webpack-node-externals
This unblocks #210.
Test plan: `yarn backend` still succeeds, and the binary scripts still
work. The resultant binaries are much smaller, as seen below (note build
time is the same).
before:
```
❯ yarn backend
yarn run v1.5.1
$ node scripts/backend.js
Building backend applications...
Compiled successfully.
File sizes after gzip:
231.37 KB bin/printCombinedGraph.js
199.5 KB bin/fetchAndPrintGithubRepo.js
46.41 KB bin/cloneAndPrintGitGraph.js
21.48 KB bin/createExampleRepo.js
17.71 KB bin/loadAndPrintGitRepository.js
Build completed; results in 'bin'.
Done in 4.46s.
```
after:
```
❯ yarn backend
yarn run v1.5.1
$ node scripts/backend.js
Building backend applications...
Compiled successfully.
File sizes after gzip:
27.78 KB bin/printCombinedGraph.js
12.73 KB bin/cloneAndPrintGitGraph.js
12.41 KB bin/fetchAndPrintGithubRepo.js
6.03 KB bin/loadAndPrintGitRepository.js
5.52 KB bin/createExampleRepo.js
Build completed; results in 'bin'.
Done in 4.28s.
```
Summary:
This CI script accomplishes two tasks:
1. It speeds up our build by parallelizing where possible.
2. It opens the possibility for running Travis cron jobs.
Currently, this script by default does the same amount of work as our
current CI script. However, I’d like to move `yarn backend` into the
list of basic actions: a backend build failure should fail CI.
Note: this script is written to be executable directly by Node, so we
can’t use Flow types with the standard syntax. Instead, we use the
comment syntax: https://flow.org/en/docs/types/comments/
Test Plan:
The following should pass with useful output:
- `npm run travis`
- `GITHUB_TOKEN="your_github_token" npm run travis -- --full`
The following should fail with useful output:
- `npm run travis -- --full` (fail)
To test different failure modes, it can be helpful to add
```js
{id: "doomed", cmd: ["false"], deps: []},
{id: "orphan", cmd: ["whoami"], deps: ["who", "are", "you"]},
```
to the list of `basicTasks` in `travis.js`.
To test performance:
```shell
$ time node ./config/travis.js >/dev/null 2>/dev/null
real 0m8.306s
user 0m20.336s
sys 0m1.364s
$ time bash -c \
> 'npm run check-pretty && npm run lint && npm run flow && CI=1 npm run test' \
> >/dev/null 2>/dev/null
real 0m12.427s
user 0m13.752s
sys 0m0.804s
```
A 50% savings is not bad at all—and the raw time saved should only
improve from here on, as the individual steps start taking more time.
wchargin-branch: custom-ci
Test Plan:
This snapshot test is too unwieldy to actually read—it’s 1000 lines of
opaque SHAs and thrice-stringified JSON objects—so it should be
interpreted as a regression test only. The programmatic tests should
suffice.
wchargin-branch: wip-git-create-graph
Test Plan:
Run `yarn lint` and `yarn travis` and observe success. Add something
that triggers a lint warning, like `const zzz = 3;`; re-run and observe
failures.
wchargin-branch: lint
Previously, the address module exported `sortedByAddress`, a utility
function that sorts an array of `Addressable`s. This function was only
used in test code.
This commit replaces it with generic usage of `lodash.sortBy`. This
reduces the API surface area of the module, and removes test-only code
from the exported api.
New dependency added: `lodash.sortby`
https://www.npmjs.com/package/lodash.sortby
Summary:
In this newly added module, we load the structural state of a git
repository into memory. We do not load into memory the contents of any
blobs, so this is not enough information to perform any analysis
requiring file diffing. However, it is sufficient to develop a notion of
“this file was changed in this commit”, by simply diffing the trees.
Test Plan:
Unit tests added; `yarn test` suffices. Reading these snapshots is
pretty easy, even though they’re filled with hashes:
- First, read over the commit specifications on lines 69–83 of
`loadRepository.test.js`, so you know what to expect.
- In the snapshot file, keep handy the time-ordered list of commit
SHAs at the bottom of the file, so that you know which commit SHA is
which.
- To verify that the large snapshot is correct: for each commit, read
the corresponding tree object and make sure that the structure is
correct.
- To verify the small snapshot, just check that it’s the correct
subset of the large snapshot.
- If you want to verify that the SHA for a blob is correct, open a
terminal and run `git hash-object -t blob --stdin`; then, enter the
content of the blob and press `<C-d>`. The result is the blob SHA.
To run a sanity-check on a large repository: apply the following patch:
<details>
<summary>Patch to print out statistics about loaded repository</summary>
```diff
diff --git a/config/paths.js b/config/paths.js
index d2f25fb..8fa2023 100644
--- a/config/paths.js
+++ b/config/paths.js
@@ -62,5 +62,6 @@ module.exports = {
fetchAndPrintGithubRepo: resolveApp(
"src/plugins/github/bin/fetchAndPrintGithubRepo.js"
),
+ loadRepository: resolveApp("src/plugins/git/loadRepository.js"),
},
};
diff --git a/src/plugins/git/loadRepository.js b/src/plugins/git/loadRepository.js
index a76b66c..9380941 100644
--- a/src/plugins/git/loadRepository.js
+++ b/src/plugins/git/loadRepository.js
@@ -106,3 +106,7 @@ function findTrees(git: GitDriver, rootTrees: Set<Hash>): Tree[] {
}
return result;
}
+
+const result = loadRepository(...process.argv.slice(2));
+console.log("commits", result.commits.size);
+console.log("trees", result.trees.size);
```
</details>
Then, run `yarn backend` and put the following script in `test.sh`:
<details>
<summary>Contents for `test.sh`</summary>
```shell
#!/bin/bash
set -eu
repo="$1"
ref="$2"
via_node() {
node bin/loadRepository.js "${repo}" "${ref}"
}
via_git() (
cd "${repo}"
printf 'commits '
git rev-list "${ref}" | wc -l
printf 'trees '
git rev-list "${ref}" |
while read -r commit; do
git rev-parse "${commit}^{tree}"
git ls-tree -rt "${commit}" \
| grep ' tree ' \
| cut -f 1 | cut -d ' ' -f 3
done | sort | uniq | wc -l
)
echo
printf 'Running directly via git...\n'
time a="$(via_git)"
echo
printf 'Running Node script...\n'
time b="$(via_node)"
diff -u <(cat <<<"${a}") <(cat <<<"${b}")
```
</details>
Finally, run `./test.sh /path/to/some/repo origin/master`, and verify
that it exits successfully (zero diff). Here are some timing results on
SourceCred and TensorBoard:
- SourceCred: 0.973s via Node, 0.327s via git.
- TensorBoard: 30.836s via Node, 6.895s via git.
For TensorFlow, running via git takes 7m33.995s. Running via Node fails
with an out-of-memory error after 39 minutes, with 10GB RAM and 4GB
swap. See details below.
<details>
<summary>
Full timing details, commit SHAs, and OOM error message
</summary>
```
+ ./test.sh /home/wchargin/git/sourcecred 01634aabcc
Running directly via git...
real 0m0.327s
user 0m0.016s
sys 0m0.052s
Running Node script...
real 0m0.973s
user 0m0.268s
sys 0m0.176s
+ ./test.sh /home/wchargin/git/tensorboard 7aa1ab9d60671056b8811b7099eec08650f2e4fd
Running directly via git...
real 0m6.895s
user 0m0.600s
sys 0m0.832s
Running Node script...
real 0m30.836s
user 0m3.216s
sys 0m10.588s
+ ./test.sh /home/wchargin/git/tensorflow 968addadfd4e4f5688eedc31f92a9066329ff6a7
Running directly via git...
real 7m33.995s
user 5m21.124s
sys 1m5.476s
Running Node script...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [node]
2: 0x121a2cc [node]
3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
5: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [node]
6: v8::internal::DeoptimizationInputData::New(v8::internal::Isolate*, int, v8::internal::PretenureFlag) [node]
7: v8::internal::compiler::CodeGenerator::PopulateDeoptimizationData(v8::internal::Handle<v8::internal::Code>) [node]
8: v8::internal::compiler::CodeGenerator::FinalizeCode() [node]
9: v8::internal::compiler::PipelineImpl::FinalizeCode() [node]
10: v8::internal::compiler::PipelineCompilationJob::FinalizeJobImpl() [node]
11: v8::internal::Compiler::FinalizeCompilationJob(v8::internal::CompilationJob*) [node]
12: v8::internal::OptimizingCompileDispatcher::InstallOptimizedFunctions() [node]
13: v8::internal::Runtime_TryInstallOptimizedCode(int, v8::internal::Object**, v8::internal::Isolate*) [node]
14: 0x12dc8b08463d
```
</details>
wchargin-branch: load-git-repositories
# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# Date: Mon Apr 23 23:02:14 2018 -0700
#
# HEAD detached at origin/wchargin-load-git-repositories
# Changes to be committed:
# modified: package.json
# new file: src/plugins/git/__snapshots__/loadRepository.test.js.snap
# new file: src/plugins/git/loadRepository.js
# new file: src/plugins/git/loadRepository.test.js
#
# Untracked files:
# out
# runtests.sh
# src/plugins/artifact/editor/ArtifactSetInput.js
# src/plugins/git/repository.js
# test.sh
# todo
#
Summary:
This commit moves our existing frontend tests to use Enzyme’s shallow
rendering API <http://airbnb.io/enzyme/docs/api/shallow.html>. The
benefit over also using `react-test-renderer` is simply consistency (the
two are functionally equivalent); the benefits over `mount` are that
subcomponents cannot contaminate the test state (i.e., you’re only
testing one component at a time), that the resulting snapshots are more
readable because the root props are not shown, and that the
implementation is more efficient. This is a follow-up to #102.
In a case where we actually need a full DOM tree, we should still feel
free to use `mount`, but we haven’t needed that yet.
Test Plan:
Verify that the new `ContributionList.test.js.snap` represents the same
data as the old one.
wchargin-branch: standardize-enzyme-shallow
Summary:
This is our first dynamic test of a React component! Enzyme looks pretty
easy to use to me, for both snapshot tests and interaction simulation.
In doing so, we catch a minor bug in the edge case where a contribution
is not owned by any plugin (`colSpan`, not `colspan`). This edge case
does not appear in the sample data, but it does appear in the test data,
even prior to this commit. The previous renderer, `react-test-renderer`,
appears not to surface this error. Furthermore, this bug did not cause
any user-visible errors except a `console.error`.
Test Plan:
Inspect the snapshot file to make sure that it is reasonable. (The
existing test case has its snapshot regenerated due to formatting
differences between the two renderers.)
To test that the browser error is fixed, render a contribution list on a
GitHub graph but with an empty adapter set. One way to do this is to comment out line 7 of
`standardAdapterSet.js`; alternately, you can use the React Dev Tools to
select the `ContributionList` node, then run
```js
$r.props.adapters.adapters = {};
$r.forceUpdate();
```
Note subsequently that there is no console error and that the `<td>`s in
question span three columns.
wchargin-branch: contributionlist-dynamic-test
Summary:
This commit begins to extend the artifact editor to display
contributions. To display contributions from arbitrary plugins, we need
to communicate with those plugins somehow. We do so via an adapter
interface that plugins implement; included in this commit is an
implementation of this interface for the GitHub plugin (partially: we
punt on rendering).
This includes a snapshot test. The snapshot format is designed to be
human-readable and -auditable so that it can serve as documentation.
Test Plan:
Run the application with `yarn start`. Then, fetch a graph and watch as
its contributions appear in the view.
wchargin-branch: contributions-and-adapters
Summary:
Running `yarn backend` will now bundle backend applications. They’ll be
placed into the new `bin/` directory. This enables us to use ES6 modules
with the standard syntax, Flow types, and all the other goodies that
we’ve come to expect. A backend build takes about 2.5s on my laptop.
Created by forking the prod configuration to a backend configuration and
trimming it down appropriately.
To test out the new changes, this commit changes `fetchGitHubRepo` and
its driver to use the ES6 module system and Flow types, both of which
are properly resolved.
Test Plan:
Run `yarn backend`. Then, you can directly run an entry point via
```
$ node bin/fetchAndPrintGitHubRepo.js sourcecred example-repo "${TOKEN}"
```
or invoke the standard test driver via
```shell
$ GITHUB_TOKEN="${TOKEN}" src/backend/fetchGitHubRepoTest.sh
```
where `${TOKEN}` is your GitHub authentication token.
wchargin-branch: webpack-backend
Summary:
It’s a whole new world of GraphQL! Our parser is now just a GraphQL
query that asks for exactly what we want and dumps it to a file. The
data exposed by the v4 API is also in a much nicer format than that of
the v3 API, so this is pretty much a universal improvement.
Currently, we do not handle pagination. We require that the repository
in question have fewer than a fixed number of issues, and comments per
issue, and reviews per PR, and review comments per PR, and so on. If
this limit is exceeded, the script will fail-fast with a nice error
message. To fix this, we’ll need to write a general-purpose pagination
API that allows traversing cursors at any level of the query.
Paired with @wchargin.
Test Plan:
Run
$ GITHUB_TOKEN="your_token_here" src/backend/fetchGitHubRepoTest.sh
and verify that it exits with 0. Note that if you change this script’s
repository from `tiny-example-repository` to `sourcecred`, the script
correctly fails and outputs a useful diff.
wchargin-branch: github-v4-graphql
Summary:
We need this for testing graph equality: deep-equality is not sufficient
because two graphs can be logically equal even if, say, two nodes are
added in different orders.
This commit adds a dependency on `lodash.isequal` for deep equality.
Test Plan:
New unit tests added. Run `yarn flow && yarn test`.
wchargin-branch: graph-equals
Reorganize the code so that we have a single package.json file, which is at the root.
All source code now lives under `src`, separated into `src/backend` and `src/explorer`.
Test plan:
- run `yarn start` - it works
- run `yarn test` - it finds the tests (all in src/explorer) and they pass
- run `yarn flow` - it works. (tested with an error, that works too)
- run `yarn prettify` - it finds all the js files and writes to them