This adds two methods to cli2/common.js:
- loadJson, which loads a JSON file from disk and then parses it
- loadJsonWithDefault, which loads a JSON file from disk and parses it,
or returns a default value if the file is not present.
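For illustration, a hedged usage sketch; the exact signatures (whether these are async, and whether they also take a parser) are assumptions, not taken from the patch, and `emptyWeights` is an illustrative helper:
```javascript
// Hypothetical call sites; signatures are assumptions, not shown in this message.
const config = loadJson("sourcecred.json"); // fails if missing or malformed
const weights = loadJsonWithDefault(
  "config/weights.json",
  emptyWeights() // returned only when the file is not present
);
```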
Both methods are well documented and well tested.
Test plan: `yarn test`; see included unit tests which are thorough.
This adds Combo parsing support to the compatible module. Now, rather
than writing `fromJSON` methods which implicitly take any, we can
instead write typesafe `parseJSON` methods which will parse compatible
headers, and then choose a version-appropriate parser.
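As a hedged sketch of the pattern this enables (the version strings and the `projectParserV050`-style names are illustrative, and the exact compatible-module API is not shown here), a `parseJSON` might first parse the compat header with Combo and then dispatch on the version:
```javascript
// Illustrative only: names and versions are made up. The header shape
// ({type, version} followed by the payload) matches the tuple example further down.
function parseProject(raw) {
  const [header, payload] = C.tuple([
    C.object({type: C.string, version: C.string}),
    C.raw,
  ]).parseOrThrow(raw);
  const versionedParsers = {
    "0.5.0": projectParserV050,
    "0.5.1": projectParserV051,
  };
  const parser = versionedParsers[header.version];
  if (parser == null) {
    throw new Error("unsupported version: " + header.version);
  }
  return parser.parseOrThrow(payload);
}
```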
Test plan: Added unit tests; `yarn test` passes.
Just a more sensible name since it's not a homepage of any sort.
Test plan: Flow passes, `yarn build2` and `yarn start2` both succeed.
Also: `git grep homepage2` gets no hits.
Now you pass `--instance PATH` rather than `--instance=PATH` which is
agreed to be much much better. (It tab completes.)
Test plan: Do the obvious thing. :)
Per discussion with @hammadj, @topocount, and @wchargin, we are planning
to have the frontend2 system use react-admin at the top level. Per
investigation by @topocount, react-admin conflicts with the older
version of react-router that we use.
As such, this commit wildly simplifies the homepage2 system so we no
longer have any routing, and instead we just statically render the
index.html file. We also removed the `Assets` type, not because we are
sure we don't need it, but because we didn't want to debug it while we
were all pairing. @wchargin offered to fix it up later.
Test plan:
- run `yarn start2 --instance=PATH` and observe that the "Under
Construction" message displays, along with console messages showing that
data loaded successfully.
- run `yarn build2` and copy files from `build2` into the root of a cli2
instance. Run an http server in that instance, and observe that the
frontend displays properly per instructions above.
Paired with: @wchargin
Paired with: @hammadj
Paired with: @topocount
This modifies the frontend2 so that we can load real data from cli2
instances, including:
- the root `sourcecred.json` config
- files inside the `config/` directory
- files inside the `output/` directory
This works both for the dev server and the compiled output.
In the case of the dev server, it's now necessary to provide the path to
the cli2 instance you're developing against. You can pass this via the
`--instance` argument, as in `yarn start2 --instance=/path/`, or via
the `$SOURCECRED_DEV_INSTANCE` environment variable (recommended). If
neither is provided, or if that path doesn't look like a valid instance,
an error is thrown.
In the case of the built output, given a valid sc2 instance, you can set
it up via the following:
```
cp -r build2/favicon.png build2/index.html build2/static $INSTANCE
```
Then spin up a simple http server in the $INSTANCE.
You can look at console messages in the frontend to verify that the
instance is working (note it expects to see a discourse config file, not
a GitHub config file as was used in some previous examples).
Test plan:
Set up an example Discourse instance, and then turn on the server via
`yarn start2 --instance=PATH`,
`SOURCECRED_DEV_INSTANCE=path yarn start2`,
and by manually copying in the built outputs using the instructions
above.
In each case, you can load the homepage and check the console to see
that assets loaded successfully.
The frontend build system has a bunch of logic for loading in the list
of projectIds and including them in the build. Fortunately, in
frontend2, we won't need this. This commit simply removes all that
logic.
Test plan: Grepping for `projectIds` finds no hits inside the frontend2
system, except for a comment noting that we'll be able to remove an
argument once the transition is complete. `yarn start2` still works as
expected.
This drastically streamlines the new frontend entry directory.
The logos, the nav bar, the pages, etc., are all gone.
Now there's just a landing page that reads "Under Construction".
Test plan: Run `yarn start2`, observe that there's just an empty landing
page that we'll rebuild from.
The cli2 ("instance") system has a foundationally different assumption
about how the frontend works: rather than having a unified frontend that
abstracts over many separate SourceCred projects, we'll have a single
frontend entry per instance. This means we no longer need (for example)
to make project IDs available at build time.
Our frontend setup and server side rendering is pretty complex, so
rather than rebuild it from scratch, I'm going to fork it into an
independent copy and then change it to suit our needs.
To start here, I've duplicated the `src/homepage` directory into
`src/homepage2`, duplicated the webpack config to
`config/webpack.config.web2.js`, and duplicated the paths and
package.json scripts.
Test plan:
Run `yarn start2` and it will start an identical frontend, using the
duplicated directory. Run `yarn build2` and it will build a viable
frontend into the `build2` directory. `build2` is gitignored.
As discussed [in Discord]:
At present, cli2 is organized around the following commands:
- init: help setting up a new SourceCred instance (not yet implemented)
- load [PLUGINS...]: load data from all active plugins (default) or
specific named plugins
- graph: regenerate graph for every plugin (will likely give this
[PLUGINS...] args in the future)
- merge: merge the generated graphs together
- score: compute scores over all the graphs
I'm going to change this to fold the merge command into the score
command. This is inspired by realizing that applying the weights.json
belongs in the merge command from an implementation perspective, but in
the score command conceptually.
Thus, having merge and score as separate commands is actually an
anti-affordance: it lets the user get into a confusing state where
the graph in their output directory is out of sync with the
scores.
(also, merge takes about 2s on sc's cred instance, so it isn't a huge
performance penalty on the flow of "recompute scores with new
parameters")
cc @wchargin
Also, I wonder whether we should persist the weightedGraph to disk at
all, seeing as it is already included in the credResult.json.
Storing the graph on its own is inefficient from a space perspective,
although it can be nice to get an eyeball sense of how big the
graph is compared to the associated cred data. For now, I'm not going to
write the weightedGraph to disk, simply because it's more convenient to
implement.
If there's a need, we can add it back later.
[in Discord]: https://discordapp.com/channels/453243919774253079/454007907663740939/721603543278157905
Test plan: manually tested it on a local instance
This modifies cli2/load so that you can now provide a list of fully
scoped plugin names (e.g. sourcecred/github) and it will load only the
mentioned plugins. If any plugins aren't available, an error is thrown.
If no plugins are listed, all of the activated plugins are loaded.
Test plan: Tested locally in success and failure cases. No unit tests
atm, which matches the rest of cli2--tbh this feels like the kind of
code where the likelihood of subtle failures or regressions is low, and
it's a pain to test, so I'm content with the status quo for now.
This adds a cliPlugin to the experimental Discord plugin, along the
lines of the cli 2 plugins for GitHub and Discourse.
Test plan: I set up a Cred instance for SourceCred including our Discord
config, and successfully loaded Cred for it. Not easy to give full
reproduction instructions, not least because it requires a private
Discord bot token.
This adds support for retrieving cred-augmented nodes and edges to the
CredView class. These methods wrap the underlying Graph methods, and
return nodes and edges that also include relevant cred information.
Test plan: Unit tests included; yarn test passes.
This commit adds the CredView class, an interface for Graph-aware
queries over a CredResult. To start, I just added accessor methods for
reaching the constituent data in the CredResult.
Test plan: `yarn test`
In #1738, @wchargin updated the svg logo with a cleaner path structure.
This commit re-runs the rasterize script with the new svg, which
results in much cleaner pngs without aliasing.
Test plan: The new pngs look a lot cleaner.
Summary:
The Sharness snapshots contain large data files that are effectively
blobs, but happen to be encoded as UTF-8 JSON. Consequently, they fill
up a terminal with junk when they wriggle into a `grep` or `diff` or
similar. This patch fixes that by teaching Git to treat them as binary.
If you ever want to see a raw diff (unlikely, since these files have
many kilobytes of data with no newlines), just pass `-a`/`--text`.
This isn’t perfect—e.g., it only works with Git (not, e.g., Ripgrep)—but
it’s better than nothing.
Test Plan:
Run `git grep 6961` and note that the result now shows just three lines,
including the desired text result and two “Binary file matches” notes.
Run `git grep -a 6961` and note that the monstrosity returns.
wchargin-branch: sharness-snapshots-binary
If a project.json file defines a field that belongs to a newer version but the compat version wasn't updated,
the auto-upgrade code would silently throw away the field, resetting it to the default value of null. This
fixes it by spreading the project object after adding the new field instead of before: that way, a field that
was defined in the project file won't get overwritten, while the default is still added if it wasn't defined.
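A minimal sketch of the spread-order change; the field name `params` here is illustrative:
```javascript
// Before the fix, the upgrade looked like {...project, params: null}, so an
// explicit `params` from project.json was silently replaced with the default.
// After the fix the spread comes last, so a value present in the file wins:
const upgraded = {
  params: null, // default for the newer field
  ...project,
};
```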
Test Plan: Add a field belonging to ProjectV052 in a project file with compat version 0.5.0 and ensure it doesn't get
overwritten.
This updates the alias "plugin" to support the experimental Discord
plugin that we recently merged into master. Without this, we can't
resolve Discord aliases.
Test plan: Validated on SourceCred's own cred instance. No unit tests
added since both modules in question (experimental discord and alias)
are going to be replaced or re-written, and the experimental-discord
plugin has no testing at all.
Since the discord branch is being used in production by many projects (MetaGame, AraCred, RaidGuild, etc.), and we also want to start using it for SourceCred, it makes sense to merge it into master as a separate experimental-discord plugin. That way, the proper Discord plugin can be developed in parallel while the other projects stay up to date with master.
paired with @decentralion
* Add params field to Project type
This will allow SourceCred instances to override the default alpha and interval decay parameters by adding an optional "params" field in their project.json file. If they don't include any of the fields, it will just fallback to use the default params.
Test Plan: Add params field in project.json file and see if the resulting SourceCred instance correctly picks up the custom alpha value.
* Rename params field in project to timelineCredParams
Makes the field more descriptive and allows it to be nullable
Test Plan: Ensure all usages of Project type are updated to use timelineCredParams field instead of params
* Update Snapshots
Update snapshots for new project version file
Test Plan: Ensure sharness tests pass in CI
This commit puts the lossy cred compression strategy from #1832 into
production.
When run on the MakerDAO forums, this drops the output size from 41MB
(close to the point where GitHub starts warning about file sizes) to
14MB (plenty of room to grow).
Test plan: I ran it on a local sc2 instance. Unlikely that this
introduced any subtle bugs.
This adds a new `compressByThreshold` method in the credResult module,
which compresses the CredResult by removing interval-level cred data for
flows of cred that are below a threshold.
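A hedged sketch of the intended call shape (the threshold value and argument order are illustrative, not taken from the patch):
```javascript
// Keep interval-level detail only for cred flows that meet the threshold;
// summary (summed-across-time) data is retained for everything.
const compressed = compressByThreshold(credResult, 10);
```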
Test plan: Unit tests included; `yarn test` passes.
This commit modifies the cli2 score command so that instead of writing
the output format described in #1773, it instead writes the new
CredResult format added in #1830. This is a much better engineered
format, and will be useful for building a new frontend as well as for
analysis by third parties (although we will need to provide client code
for parsing it).
Test plan: On a cli2 instance, run `sc2 score` and inspect the resulting
cred.json file.
This module has a concise and clean type for storing all the output data
from running SourceCred, including:
- The graph
- The weights
- The cred scores and flows
- The interval timing info
- The plugins used
- The parameters used
We also have a method `compute` which computes one of these given the
weighted graph, parameters, and plugins. It's all quite clean and
simple. I feel good about this API.
Test plan:
The main `compute` function is only sanity checked via flow and unit
testing, which is appropriate since it's a pure-piping function with
little that can go wrong. I've also added JSON serialization; this is
tested with round trip testing.
This commit adds the `analysis/credData` module, which processes raw
TimelineCredScores into a format which is better for serialization and
data analysis. In particular, this format explicitly stores the summary
(summed-across-time) data for nodes separately from the raw temporal
data, which will allow us to throw away raw data for uninteresting or
low-cred nodes, while keeping the summary.
Test plan: I've added some basic unit tests; run `yarn test`.
Summary:
Write `C.exactly(["red", "green", "blue"])` to admit any of a fixed set
of primitive values. This can already be implemented in terms of `raw`
and `fmap`, but it’s more convenient to have a dedicated helper.
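For instance, a minimal usage sketch (the failure comment paraphrases the error rather than quoting it):
```javascript
const color = C.exactly(["red", "green", "blue"]);
color.parseOrThrow("green"); // => "green"
color.parseOrThrow("mauve"); // throws: not one of the admitted values
```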
Test Plan:
Unit tests included, retaining full coverage.
wchargin-branch: combo-exactly
Summary:
The `orElse` combinator tries parsing the input with each of a
sequence of parsers, taking the first successful match (if any). It’s
handy for parsing union types; see examples in the patch.
This combinator treats all errors as recoverable. For instance, a parser
```javascript
C.orElse([
  C.object({type: C.exactly(["NUMBER"]), number: C.number}),
  C.object({type: C.exactly(["STRING"]), string: C.string}),
  C.object({type: C.exactly(["BOOLEAN"]), boolean: C.boolean}),
]);
```
on input `{type: "NUMBER", number: "wat"}` will see that the first
parser fails and try all the others, even though we can see that as soon
as any of these parsers matches the `type` field, we can commit to that
parser rather than backtracking on failure. In a pathological case, even
a small parser can run very slowly and generate very long errors:
```javascript
it("big", () => {
let p = C.null_;
for (let i = 0; i < 12; i++) {
p = C.orElse([p, p]); // exponential in loop iteration count
}
const result = p.parse("non-null!");
if (result.ok) throw new Error("unreachable");
expect(result.err.length).toEqual(22558015);
});
```
Hopefully, this won’t be a problem in practice.
Test Plan:
Unit tests included, retaining full coverage.
wchargin-branch: combo-orelse
Summary:
If you want to parse an arbitrary JSON object, or if you want to parse a
complicated dynamic type that can’t easily be expressed in terms of the
other base parsers and combinators, use `C.raw`. This combinator accepts
any input and returns it as a `JsonObject`—not an `any`, so it’s still
safely typed, but the client (either an `fmap` transform or the consumer
of the eventual `parse`/`parseOrThrow`) will need to destructure it by
pattern matching on the `JsonObject` structure.
Test Plan:
Unit tests included (though full coverage technically doesn’t require
them).
wchargin-branch: combo-raw
Summary:
This is for homogeneous object types with unbounded key sets: roughly,
`dict` is to `object` as `array` is to `tuple`. Its implementation
requires no terrifying type magic whatsoever.
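A small sketch under that analogy; whether the parsed result is a plain object or an ES6 `Map` is an assumption:
```javascript
// Homogeneous values, unbounded string keys.
const weightsParser = C.dict(C.number);
// Accepts {"alice": 1, "bob": 2.5}; rejects {"alice": "lots"}.
```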
Test Plan:
Unit tests included, retaining full coverage.
wchargin-branch: combo-dict
Summary:
This new combinator parses heterogeneous tuples: arrays of fixed length,
which may have different, fixed types at each index. For example:
```javascript
type CompactDistribution = [NodeAddressT, number][];
function compactDistributionParser(): C.Parser<CompactDistribution> {
  return C.array(
    C.tuple([
      C.fmap(C.array(C.string), NodeAddress.fromParts),
      C.number,
    ])
  );
}
```
Or:
```javascript
function compatParser<T>(underlying: C.Parser<T>): C.Parser<Compatible<T>> {
  return C.tuple([
    C.object({type: C.string, version: C.string}),
    underlying,
  ]);
}
```
Test Plan:
Unit tests included, retaining full coverage.
wchargin-branch: combo-tuple
Summary:
This patch changes the Discourse plugin’s config parsing to use the
parser combinator approach, which simplifies it and removes the need for
all the tests.
Test Plan:
Prior to deleting the tests, they passed, except for the test that
mirror options could not be partially populated; the partial parsing now
works correctly.
wchargin-branch: discourse-config-combinator
This commit modifies the distributionToCred module so that in addition
to normalizing the scores into cred, it also normalizes the score flows
(from #1802) into cred flows. This is another big part of #1773, so that
we can support the `OutputEdge` format and give meaningful info on cred
flows for each edge.
The change is quite simple: since we already compute a normalization
constant, we just need to apply it to a few more arrays.
Test plan: I added a new unit test that has values in all of the
different flows, so we can validate that they are normalized
consistently.
Summary:
This patch converts the `instanceConfig` and `github/config` parsers
from hand-written traversals to combinator-based structures. The
resulting code is much simpler and easier to read.
Test Plan:
Running the `sc2` CLI for `load` and `graph` still works in the happy
path, with the expected errors if keys in the config files are changed
to the wrong strings.
wchargin-branch: combo-instance-github
Summary:
This patch expands the API of `Combo.object` such that fields in the
input JSON may be renamed in the output JSON. This often occurs in
combination with `fmap`, as we convert a simple string field with a
user-facing name to a different structured representation. For example:
```javascript
C.object({
  repoIds: C.rename(
    "repositories",
    C.fmap(C.array(C.string), repoIdToString)
  ),
});
```
This is backward-compatible and invisible when not needed: the fields of
the argument to `C.object` may now be either parsers (as before) or
results of `C.rename`.
This patch also adds a check that the required and optional key sets
don’t overlap, which could technically have happened before but is more
important now that renames are possible.
Test Plan:
Unit tests included, retaining full coverage.
wchargin-branch: combo-rename-fields
Summary:
This combinator is critical to parsing real-world types, which almost
always contain objects at some level. This combinator supports objects
with both required keys and optional keys, as is often needed for
parsing config files and the like.
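A hedged usage sketch; how optional keys are supplied (here, as a second argument) is an assumption about the API, and the field names are illustrative:
```javascript
const configParser = C.object(
  {serverUrl: C.string}, // required keys
  {apiTimeoutMs: C.number} // optional keys
);
```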
There’s a bit of dark Flow magic here. In particular, I’ve found that
using a phantom datum in the parser type is necessary for Flow to
correctly extract the output type of a parser without collapsing
discriminated unions. I also ran into some bugs with `$Shape`, which are
avoided by using `$Rest<_, {}>` instead, per this helpful user comment:
<https://github.com/facebook/flow/issues/7566#issuecomment-526324094>
Finally, Flow’s greedy inference for multiple-arity functions causes it
to infer and propagate an empty type in some cases, so we type `object`
as an intersection of functions, or, equivalently, an interface with
multiple callable signatures; this is a trick borrowed from the standard
library for functions like `filter`.
Critically, the tests for this module include tests for expected Flow
errors, so we should be reasonably safe against regressions here.
Test Plan:
Unit tests included, retaining full coverage.
wchargin-branch: combo-object
Summary:
These are commonly used for building large parsers from small pieces,
especially `fmap`, which is instrumental for transformation and
validation. We pick the name `fmap` rather than `map` to avoid confusion
with ES6 `Map` values, which are unrelated.
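As a sketch of the transformation/validation use case, assuming `pure` has the conventional meaning of always succeeding with a fixed value and that an `fmap` transform can reject its input by throwing; the helper names are illustrative:
```javascript
// fmap transforms (and can validate) the result of an underlying parser.
const portParser = C.fmap(C.number, (n) => {
  if (!Number.isInteger(n) || n < 0 || n > 65535) {
    throw new Error("not a valid port: " + n);
  }
  return n;
});

// pure always succeeds with a fixed value, ignoring its input.
const defaultPort = C.pure(8080);
```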
Test Plan:
Unit tests and type-checking tests included, with full coverage.
wchargin-branch: combo-pure-fmap
Summary:
We often want to parse data from JSON files on disk into similar object
structures in memory. But `JSON.parse` is untyped both statically and
dynamically: it has type `(string) => any`, and it’s happy to accept
structures that aren’t in the shape that you expected. Whenever we write
something like `const c: MyConfig = JSON.parse(raw)` where `raw` comes
from a user-editable file on disk, we’re introducing a trivial soundness
hole. Furthermore, we often want to use a different in-memory state from
the serialized form: perhaps we use ES6 `Map`s in memory, or perhaps
we’ve refined a raw string type to an opaque validated type like
`RepoId` or `NodeAddressT`. These can be done by manually walking the
output of `JSON.parse`, but it’s not pretty: see `instanceConfig.js` or
`github/config.js`.
Parser combinators are a solution to this problem that enable building
parsers for simple primitives and composing them to form parsers for
larger structures. This patch introduces the skeleton of a parser
combinator library, supporting JSON primitives and arrays (but not
objects) along with tests that show its usage. Support for heterogeneous
object (“struct”) types will come in a subsequent patch because the
typing implementation is more complicated, though the interface to
clients is just as simple.
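A small sketch of the usage this enables; the import path, the names `rawFileContents` and `doSomethingWith`, and the result shape are assumptions, consistent with the examples in the commits above:
```javascript
import * as C from "../util/combo"; // path is an assumption

const parser: C.Parser<string[]> = C.array(C.string);

const result = parser.parse(JSON.parse(rawFileContents));
if (result.ok) {
  doSomethingWith(result.value);
} else {
  throw new Error(result.err);
}
```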
For comparison, this is essentially the `FromJSON` half of the Haskell
library [Aeson][aeson].
It’s possible that we’ll want to generalize this to a broader system of
profunctor optics, maybe over monad transformers, which would make it
easier to both parse and serialize these structures (using “isos” rather
than just parsers everywhere). But manually serializing the structures
is easier than manually parsing them, because they start out strongly
typed. The profunctor generalization is more complicated, and in the
meantime this solves a useful problem, so let’s defer the generality
until we decide that we need it.
[aeson]: https://hackage.haskell.org/package/aeson
Test Plan:
Unit tests included, with full coverage.
wchargin-branch: combo-init
This adds the missing `merge` and `score` commands to the sc2 CLI.
`merge` currently just merges the plugin graphs, and doesn't yet support
identity resolution or weight overrides.
`score` computes the new data output format (cf #1773) and writes it to
disk. It doesn't yet support using custom parameters.
Test plan:
Follow the test plans for the previous commits, then run `sc2 merge`.
It will create a combined graph at `output/graph.json`, which will
contain data for both Discourse and GitHub. Then run `sc2 score` and
the `output/cred.json` file will contain scores for the combined
graph.
Summary:
We’d written `die("usage: ...")`, but forgotten to actually return.
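The rough shape of the fix (the surrounding command code and the argument check are illustrative, not quoted from the patch):
```javascript
if (args.length !== 0) {
  // Previously the `return` was missing, so the command kept executing
  // after printing the usage message.
  return die("usage: ...");
}
```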
Test Plan:
After patching a test instance’s `sourcecred.json` to be invalid,
running `sc2 load extra-arg` now only prints a usage message rather than
an invalid configuration message.
wchargin-branch: cli2-actually-die
This updates the v2 CLI so that it now supports the Discourse plugin.
Test plan:
Modify the test instance described in the previous commits so that the
root `sourcecred.json` file includes "sourcecred/discourse" in the list
of bundled plugins. Then, add a
`config/sourcecred/discourse/config.json` file with the following
contents:
`{"serverUrl": "https://sourcecred-test.discourse.group/"}`
Now, running `sc2 load` will load Discourse data, and `sc2 graph` writes
a Discourse graph in the output directory.
This commit makes a big step towards realizing the v3 output format
(see #1773). Specifically, it modifies the timelinePagerank interval
result format so that in addition to the distribution (the score for
each node), we also track:
- How much score flowed to each node from the seed
- How much score flowed across each edge in a forwards (src->dst)
direction
- How much score flowed across each edge in a backwards (dst->src)
direction
- How much score flowed to each node from its synthetic self loop
The result is that we can now precisely decompose where a node's score
came from (and where it went to). Specifically, for every node we have
the invariant that the node's score equals the sum of its seed
score, its synthetic loop score, the forward flow on each edge for
which the node is dst, and the backward flow on each edge for which the
node is src.
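Stated as a check, roughly (the helper names here are illustrative, not the actual test code):
```javascript
// For each node, in every interval:
const inflow = edges
  .filter((e) => e.dst === node)
  .reduce((sum, e) => sum + forwardFlow(e), 0);
const backflow = edges
  .filter((e) => e.src === node)
  .reduce((sum, e) => sum + backwardFlow(e), 0);
expect(score(node)).toBeCloseTo(
  seedFlow(node) + syntheticLoopFlow(node) + inflow + backflow
);
```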
Test plan:
I've added unit tests which verify that the invariant above holds for
real PageRank results on a small example graph.
As part of work for #1773, I want to add a lot more complexity to the
logic for computing individual time-slices of pagerank scores, so that
we can trace how much score flowed on each individual edge. This means
adding more complexity to the _computeTimelineDistribution function;
however, that function is an untested wrapper, so I'm hesitant to add
more complexity directly.
Instead, I'm first factoring out an _intervalResult method and a
corresponding type, which computes the scores for a given timeline
interval. I've also added sanity checking tests for this method. In a
follow-on commit, I'll add more logic for tracking edge-level score
flows.
Test plan: This is just a refactor, maintaining existing behavior and
adding tests. `yarn test --full` passes. Since our sharness tests
include doing a full load (including timeline cred computation) on
realistic data from GitHub, this gives us confidence that there hasn't
been any change to cred semantics.
This commit refactors internal helper methods in timelinePagerank so
that rather than piping around an OrderedSparseMarkovChain, we instead
provide the NodeToConnections from which that OSMC may be derived. This
is important because the NodeToConnections has the information necessary
to derive how score flowed across individual edges, and not just on the
adjacency topology of the graph. This will allow us to compute the
OutputEdge format with edge-specific cred flows as documented in #1773.
Test plan: `yarn test` passes. It's a simple refactor.