Summary:
This patch converts the `instanceConfig` and `github/config` parsers
from hand-written traversals to combinator-based structures. The
resulting code is much simpler and easier to read.
Test Plan:
Running the `sc2` CLI for `load` and `graph` still works in the happy
path, with the expected errors if keys in the config files are changed
to the wrong strings.
wchargin-branch: combo-instance-github
Summary:
This patch expands the API of `Combo.object` such that fields in the
input JSON may be renamed in the output JSON. This often occurs in
combination with `fmap`, as we convert a simple string field with a
user-facing name to a different structured representation. For example:
```javascript
C.object({
  repoIds: C.rename(
    "repositories",
    C.fmap(C.array(C.string), repoIdToString)
  ),
});
```
This is backward-compatible and invisible when not needed: the fields of
the argument to `C.object` may now be either parsers (as before) or
results of `C.rename`.
This patch also adds a check that the required and optional key sets
don’t overlap, which could technically have happened before but is more
important now that renames are possible.
Test Plan:
Unit tests included, retaining full coverage.
wchargin-branch: combo-rename-fields
Summary:
This combinator is critical to parsing real-world types, which almost
always contain objects at some level. This combinator supports objects
with both required keys and optional keys, as is often needed for
parsing config files and the like.
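A minimal sketch of how such an object combinator might work, using a simple `{ok, value}` result encoding; the names and shapes here are illustrative only, not the actual `Combo` API:

```javascript
// Hypothetical sketch: a string parser and an object combinator with
// required and optional keys. Not the real library implementation.
function string() {
  return (x) =>
    typeof x === "string"
      ? {ok: true, value: x}
      : {ok: false, err: "expected string"};
}

function object(required, optional = {}) {
  return (x) => {
    if (typeof x !== "object" || x == null || Array.isArray(x)) {
      return {ok: false, err: "expected object"};
    }
    const out = {};
    for (const key of Object.keys(required)) {
      if (!(key in x)) return {ok: false, err: `missing key: ${key}`};
      const result = required[key](x[key]);
      if (!result.ok) return result;
      out[key] = result.value;
    }
    for (const key of Object.keys(optional)) {
      if (key in x) {
        const result = optional[key](x[key]);
        if (!result.ok) return result;
        out[key] = result.value;
      }
    }
    return {ok: true, value: out};
  };
}

// Required `name`, optional `nickname`:
const parser = object({name: string()}, {nickname: string()});
```

The Flow typing of the real combinator is considerably subtler, as described below; the runtime behavior, though, is roughly this simple.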
There’s a bit of dark Flow magic here. In particular, I’ve found that
using a phantom datum in the parser type is necessary for Flow to
correctly extract the output type of a parser without collapsing
discriminated unions. I also ran into some bugs with `$Shape`, which are
avoided by using `$Rest<_, {}>` instead, per this helpful user comment:
<https://github.com/facebook/flow/issues/7566#issuecomment-526324094>
Finally, Flow’s greedy inference for multiple-arity functions causes it
to infer and propagate an empty type in some cases, so we type `object`
as an intersection of functions, or, equivalently, an interface with
multiple callable signatures; this is a trick borrowed from the standard
library for functions like `filter`.
Critically, the tests for this module include tests for expected Flow
errors, so we should be reasonably safe against regressions here.
Test Plan:
Unit tests included, retaining full coverage.
wchargin-branch: combo-object
Summary:
These are commonly used for building large parsers from small pieces,
especially `fmap`, which is instrumental for transformation and
validation. We pick the name `fmap` rather than `map` to avoid confusion
with ES6 `Map` values, which are unrelated.
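A rough sketch of what `pure` and `fmap` could look like under a simple result-based encoding; the implementation here is illustrative, not the library's actual code:

```javascript
// `pure` always succeeds with a constant value; `fmap` post-processes
// the output of an existing parser. Encoding is a hypothetical
// {ok, value} result shape.
function pure(value) {
  return (_) => ({ok: true, value});
}

function fmap(parser, f) {
  return (x) => {
    const result = parser(x);
    return result.ok ? {ok: true, value: f(result.value)} : result;
  };
}

const number = (x) =>
  typeof x === "number"
    ? {ok: true, value: x}
    : {ok: false, err: "expected number"};

// Transformation: refine a raw number to a rounded integer.
const rounded = fmap(number, Math.round);
```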
Test Plan:
Unit tests and type-checking tests included, with full coverage.
wchargin-branch: combo-pure-fmap
Summary:
We often want to parse data from JSON files on disk into similar object
structures in memory. But `JSON.parse` is untyped both statically and
dynamically: it has type `(string) => any`, and it’s happy to accept
structures that aren’t in the shape that you expected. Whenever we write
something like `const c: MyConfig = JSON.parse(raw)` where `raw` comes
from a user-editable file on disk, we’re introducing a trivial soundness
hole. Furthermore, we often want to use a different in-memory state from
the serialized form: perhaps we use ES6 `Map`s in memory, or perhaps
we’ve refined a raw string type to an opaque validated type like
`RepoId` or `NodeAddressT`. These can be done by manually walking the
output of `JSON.parse`, but it’s not pretty: see `instanceConfig.js` or
`github/config.js`.
Parser combinators are a solution to this problem that enable building
parsers for simple primitives and composing them to form parsers for
larger structures. This patch introduces the skeleton of a parser
combinator library, supporting JSON primitives and arrays (but not
objects) along with tests that show its usage. Support for heterogeneous
object (“struct”) types will come in a subsequent patch because the
typing implementation is more complicated, though the interface to
clients is just as simple.
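As a sketch of the idea, assuming a simple `{ok, value}` result encoding (not the library's actual internal representation): primitives validate leaves, and `array` composes an element parser into a parser for lists.

```javascript
// Illustrative primitive and array combinators; names follow the
// description above but the encoding is a sketch, not the real API.
const string = (x) =>
  typeof x === "string"
    ? {ok: true, value: x}
    : {ok: false, err: "expected string"};

function array(element) {
  return (x) => {
    if (!Array.isArray(x)) return {ok: false, err: "expected array"};
    const out = [];
    for (const item of x) {
      const result = element(item);
      if (!result.ok) return result;
      out.push(result.value);
    }
    return {ok: true, value: out};
  };
}

// Unlike a bare `JSON.parse` cast, a shape mismatch is reported
// explicitly instead of flowing through as `any`.
const names = array(string)(JSON.parse('["alice", "bob"]'));
```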
For comparison, this is essentially the `FromJSON` half of the Haskell
library [Aeson][aeson].
It’s possible that we’ll want to generalize this to a broader system of
profunctor optics, maybe over monad transformers, which would make it
easier to both parse and serialize these structures (using “isos” rather
than just parsers everywhere). But manually serializing the structures
is easier than manually parsing them, because they start out strongly
typed. The profunctor generalization is more complicated, and in the
meantime this solves a useful problem, so let’s defer the generality
until we decide that we need it.
[aeson]: https://hackage.haskell.org/package/aeson
Test Plan:
Unit tests included, with full coverage.
wchargin-branch: combo-init
This adds the missing `merge` and `score` commands to the sc2 CLI.
`merge` currently just merges the plugin graphs, and doesn't yet support
identity resolution or weight overrides.
`score` computes the new data output format (cf #1773) and writes it to
disk. It doesn't yet support using custom parameters.
Test plan:
Follow the test plans for the previous commits, then run `sc2 merge`.
It will create a combined graph at `output/graph.json`, which will
contain data for both Discourse and GitHub. Then run `sc2 score` and
the `output/cred.json` file will contain scores for the combined
graph.
Summary:
We’d written `die("usage: ...")`, but forgotten to actually return.
Test Plan:
After patching a test instance’s `sourcecred.json` to be invalid,
running `sc2 load extra-arg` now only prints a usage message rather than
an invalid configuration message.
wchargin-branch: cli2-actually-die
This updates the v2 CLI so that it now supports the Discourse plugin.
Test plan:
Modify the test instance described in the previous commits so that the
root `sourcecred.json` file includes "sourcecred/discourse" in the list
of bundled plugins. Then, add a
`config/sourcecred/discourse/config.json` file with the following
contents:
`{"serverUrl": "https://sourcecred-test.discourse.group/"}`

Now, running `sc2 load` will load Discourse data, and `sc2 graph` writes
a Discourse graph in the output directory.
This commit makes a big step towards realizing the v3 output format
(see #1773). Specifically, it modifies the timelinePagerank interval
result format so that in addition to the distribution (the score for
each node), we also track:
- How much score flowed to each node from the seed
- How much score flowed across each edge in a forwards (src->dst)
direction
- How much score flowed across each edge in a backwards (dst->src)
direction
- How much score flowed to each node from its synthetic self loop
The result is that we can now precisely decompose where a node's score
came from (and where it went to). Specifically, for every node we have
the invariant that the node's score is equal to the sum of its seed
score, its synthetic loop score, and the forward flow on each edge for
which the node was dst, and the backward flow for each edge on which the
node was src.
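The invariant can be sketched as a check like the following, with hypothetical field names standing in for the real interval-result format:

```javascript
// Hypothetical shapes: a node with a total score plus its seed and
// synthetic-loop components, and edges carrying forward (src->dst)
// and backward (dst->src) flows. Field names are made up for this
// sketch, not the actual output format.
function checkScoreDecomposition(node, edges) {
  // A node's score should equal its seed flow, plus its synthetic
  // self-loop flow, plus forward flow on edges where it is dst,
  // plus backward flow on edges where it is src.
  let total = node.seedFlow + node.syntheticLoopFlow;
  for (const edge of edges) {
    if (edge.dst === node.address) total += edge.forwardFlow;
    if (edge.src === node.address) total += edge.backwardFlow;
  }
  return Math.abs(total - node.score) < 1e-9;
}
```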
Test plan:
I've added unit tests which verify that the invariant above holds for
real PageRank results on a small example graph.
As part of work for #1773, I want to add a lot more complexity to the
logic for computing individual time-slices of pagerank scores, so that
we can trace how much score flowed on each individual edge. This means
adding more complexity to the _computeTimelineDistribution function;
however, that function is an untested wrapper, so I'm hesitant to add
more complexity directly.
Instead, I'm first factoring out an _intervalResult method and a
corresponding type, which computes the scores for a given timeline
interval. I've also added sanity-checking tests for this method. In a
follow-on commit, I'll add more logic for tracking edge-level score
flows.
Test plan: This is just a refactor, maintaining existing behavior and
adding tests. `yarn test --full` passes. Since our sharness tests
include doing a full load (including timeline cred computation) on
realistic data from GitHub, this gives us confidence that there hasn't
been any change to cred semantics.
This commit refactors internal helper methods in timelinePagerank so
that rather than piping around an OrderedSparseMarkovChain, we instead
provide the NodeToConnections from which that OSMC may be derived. This
is important because the NodeToConnections has the information necessary
to derive how score flowed across individual edges, and not just on the
adjacency topology of the graph. This will allow us to compute the
OutputEdge format with edge-specific cred flows as documented in #1773.
Test plan: `yarn test` passes. It's a simple refactor.
This commit modifies the cli2 interfaces so that plugins may use task
reporters when loading, generating a reference detector, or creating a
graph. Also, the scaffold now automatically reports on task/plugin-level
progress as appropriate.
Test plan: Generate an example instance as described in previous
commits, then run `load` and `graph` and get timing info:
```
~/tmp/instance❯ node $sc/bin/sc2.js load
GO load
GO loading sourcecred/github
GO github: loading sourcecred/example-github
DONE github: loading sourcecred/example-github: 220ms
DONE loading sourcecred/github: 221ms
DONE load: 227ms
~/tmp/instance❯ node $sc/bin/sc2.js graph
GO graph
GO reference detector
GO reference detector for sourcecred/github
DONE reference detector for sourcecred/github: 296ms
DONE reference detector: 297ms
GO sourcecred/github: generating graph
DONE sourcecred/github: generating graph: 242ms
DONE graph: 544ms
```
Summary:
Paired with @decentralion.
Test Plan:
Follow the test plan for #1810, then additionally run
```
(cd /tmp/test-instance && node "$OLDPWD/bin/sc2.js" graph)
```
and note that the `output/graphs/...` directory has a graph JSON file.
wchargin-branch: cli2-graph
Summary:
This adds a `CliPlugin` interface and a basic implementation for the
GitHub plugin.
Paired with @decentralion.
Test Plan:
Create a new directory `/tmp/test-instance`, with:
```
// sourcecred.json
{"bundledPlugins": ["sourcecred/github"]}
// config/sourcecred/github/config.json
{"repositories": ["sourcecred/example-github"]}
```
Then, run
```
yarn backend &&
(cd /tmp/test-instance && node "$OLDPWD/bin/sc2.js" load)
```
and observe that the new instance has a cache directory containing a
GitHub database.
wchargin-branch: cli2-load
This commit refactors the TimelineCredScores data type so it is an
array-of-objects rather than an object-of-arrays. I want to add several
more fields (for forward cred flow, backwards cred flow, seed flow,
synthetic loop flow), and feel it will be a lot cleaner with an
array-of-objects.
This is a refactor of a local data type, and there's test coverage.
Likelihood of regression is very low.
Test plan: Updated tests; `yarn test` passes.
Summary:
This patch creates a new binary, `./bin/sc2`, which will be the home for
a rewrite of the CLI intended to implement an instance system. See:
<https://discourse.sourcecred.io/t/sourcecred-instance-system/244>
Paired with @decentralion.
Test Plan:
Run `yarn backend && node ./bin/sc2.js`, which should nicely fail with a
“not yet implemented” message.
wchargin-branch: cli2-skeleton
As requested by @s-ben, we may now include cred over time for all
contributions, not just contributors. Based on discussion with @Beanow,
we made it an optional field so that we can optionally filter to save
space instead.
I was initially concerned that we wouldn't be able to compute
credOverTime for non-user nodes in CredRank, which is why I left it out.
However, informed by discussions with @mZargham, I'm less concerned
because PageRank (and thus CredRank) is a linear operator on the seed
vector. So, if we want to compute the "cred over time" for individual
contributions in CredRank, we can do so by constructing time-specific
seed vectors (which flow only to activity minting cred in the specified
interval), and the sum of a contribution's time-scoped cred will be
equal to its non-time-scoped cred. It's good that we'll still have the epoch
nodes for users, as that will allow us to model sponsorship cred flow
dynamics.
cc @wchargin for CredRank considerations.
Test plan: Unit tests updated, `yarn test` passes.
This commit builds on the change in #1789 which made Timestamp
validation optional. Now, it is convenient to use Timestamps
consistently across the codebase. This is split out from #1788.
Test plan: `yarn test` and `git grep "timestampMs: number"`. The only
hit is within the timestamp module itself.
Currently, the Timestamp module requires that all TimestampMs types be
validated via the `fromNumber` method. However, as discussed in #1788,
this creates marginal benefit (just knowing that a TimestampMs is an
integer does not give great confidence that it's a meaningful and
correct timestamp for any use case), and creates a lot of overhead,
especially in test code. It makes adopting the TimestampMs type across
the codebase an arduous and mostly thankless task that would involve
re-writing dozens or hundreds of test instances to add zero actual
safety.
Therefore, I've made the type non-opaque, and renamed `fromNumber` to
`validate`. I then changed existing test code to not use the validator,
since we don't need any manual validation of our test cases, and `123`
is not a meaningful timestamp in any case :)
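A sketch of what the renamed `validate` helper might look like; the real signature and error message may differ:

```javascript
// Validation is now opt-in: the TimestampMs type is non-opaque, and
// `validate` (formerly `fromNumber`) checks a number when a caller
// actually wants the guarantee. Sketch only.
function validate(timestampMs) {
  if (!Number.isInteger(timestampMs)) {
    throw new Error(`invalid timestamp: ${timestampMs}`);
  }
  return timestampMs;
}
```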
Test plan: `yarn test`
This commit updates the alias module so that we may convert node
addresses into aliases. Naturally, the node address needs to be some
kind of user node address that is known to the aliasing scheme.
It's a bit inelegant that this creates a "hidden" integration point for
plugins, where plugins creating new user node types should add hardcoded
logic into the identity plugin's alias system. However, it is a
convenience and we currently use this system, so I'm just going to add
this functionality for now, and think about how the alias system should
work long term (or whether we should phase it out) for another
discussion.
This is needed for #1773.
I have `toAlias` return `null` when the address doesn't correspond to a
known aliasing scheme, rather than erroring. I think erroring would be
too harsh, given that it's quite possible that the user has loaded
third-party plugins that haven't registered aliasing schemes upstream
with us. In that case, the application should make a best effort attempt
to proceed without an alias (e.g. fallback to the full address), for
robustness.
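A hedged sketch of the best-effort lookup shape described here, with made-up address prefixes and alias formats standing in for the real plugin addresses:

```javascript
// Hypothetical aliasing schemes: each maps a known address prefix to
// an alias tag. Prefixes and alias strings here are illustrative,
// not the actual plugin node addresses.
function toAlias(address) {
  const schemes = [
    {prefix: ["sourcecred", "github", "user"], tag: "github"},
    {prefix: ["sourcecred", "discourse", "user"], tag: "discourse"},
  ];
  for (const {prefix, tag} of schemes) {
    if (
      address.length === prefix.length + 1 &&
      prefix.every((part, i) => address[i] === part)
    ) {
      return `${tag}/${address[address.length - 1]}`;
    }
  }
  // Unknown scheme (e.g. a third-party plugin): return null so the
  // caller can fall back to the full address.
  return null;
}
```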
Test plan: Unit tests included; `yarn test` passes.
* Add grain allocation module
This adds the `grain/allocation.js` module, which contains logic for computing "grain allocations" based on cred scores.
The resultant Allocation type contains the receipts (who gets how much Grain) and the
strategy informing the payout. We can then include these directly into the ledger
to keep a record of the grain balances.
The code is pretty well tested, hitting a number of potential edge cases in the
distribution logic.
Test plan: Inspect attached unit tests, verify CI passes.
Paired with @decentralion
* Rename "Lifetime" distribution strategy to "Balanced" distribution strategy
"Balanced" more accurately describes what the strategy is doing, since it's optimizing
to reduce the "imbalance" between cred scores and total grain earnings.
Test Plan: grep "lifetime" to ensure it's not used to refer to a distribution strategy
* Refactor strategy specific logic in distribution function
Cleans up the distribution function by moving all the logic for transforming data and handling errors
into the respective functions (computeImmediateReceipts and computeBalancedReceipts). Also added a test
to ensure error is thrown on an unsupported strategy with an empty credHistory.
Test Plan: Ensure unit tests pass
* Return an empty distribution if budget is zero
Adds a check for when budget === ZERO to return an empty array of receipts, preventing
receipts with zero grain.
Test Plan: Ensure unit tests pass
* Rename earnings to lifetimeEarningsMap
This makes it more descriptive and explicit that the parameter is supposed to be
the lifetime earnings of each contributor, vs. "earnings", which could mean many things.
Test Plan: Grep "earnings" (case-sensitive) to ensure it's no longer used in places
where it's representing the lifetime earnings of all contributors.
* Rename distribution to createGrainDistribution
Also renamed the related types / vars from "Distribution" to "GrainDistribution".
This makes it more descriptive and explicit that it is a distribution of Grain,
not a probability distribution or any other distribution. Prevents potential naming
conflicts / confusion with the core/algorithm/distribution module.
Test Plan: Grep "distribution" to ensure that the grain related code always uses
"GrainDistribution"
* Rename GrainDistribution to GrainAllocation
This simplifies the scope of the grain distribution module
to focus purely on the calculations for the Immediate and
Balanced allocation strategies instead of trying to solve
for the actual "ledger of events". The time-filtering has
been removed with the responsibility being delegated to the
caller of the functions. The actual "GrainDistributed" events
can contain the timestamp and an array of "GrainAllocation"s.
Test Plan: Grep "distribution" to ensure it's no longer used
in this PR when referring to a GrainAllocation. Also ensure
all the unit tests pass.
* Move "budget" field from Strategy to GrainAllocation
It feels more appropriate for the budget to be a property
of a GrainAllocation since we are allocating that amount of
grain based on a certain strategy. The strategy type is just
meant to describe how the grain was allocated, not how much
grain was allocated. It's also more appropriate since a "distribution"
will have an array of GrainAllocation's instead of GrainAllocation
having an array of strategies, each with different amounts
of grain budget.
Test Plan: Ensure there are no type errors and that Unit tests are passing
* Simplify test cases for GrainAllocation strategies
There was a lot of duplicated test code for the different
allocation strategies; this consolidates it by using `describe.each`
to run the same test suites on both strategies.
Test Plan: Ensure that unit tests pass and that both strategies
have test coverage for the common test cases
* Rename "lifetimeEarningsMap" to "lifetimeGrainAllocation" and "credMap" to "immediate/lifetimeCredMap"
Changes the naming of some params / variables to be more
descriptive / accurate.
Test Plan: grep "lifetimeEarnings" to ensure it's no longer used anywhere
* Improve clarity for balanced allocation test cases
Adds comments to explain the reasoning behind expected receipts
and update calculations to consistently use the "BUDGET"
variable
Test Plan: Ensure CI tests pass
This command is basically a fork of `cli/scores`, except it outputs the
format described in #1773. I started by copying cli/scores and
sharness/test_cli_scores.t, and made appropriate modifications.
You can check out the example-github-output.json to get a feel for the
new format. I also added a compat header in `analysis/output.js`, and
made the necessary adjustments to the CLI harness.
Test plan: The sharness test runs the real command and saves output in
its success case, looking at that JSON is sufficient. I also manually
ran it on the @sourcecred project.
This commit builds on #1781, adding the logic for computing the first
output format from TimelineCred. See #1773 for context.
Test plan:
The logic is simple, but has a couple interesting edge cases. I've added
unit tests to cover them. `yarn test` passes.
This includes packages that can be upgraded without making changes.
* Chore: minor version upgrades
* Chore: upgrade fs-extra
* Chore: upgrade chalk
* Chore: upgrade jest, babel-jest
Closes #1712. Closes #1734.
As explored in #1773, this commit adds some output data formats that we
can use to enable data analysis of cred instances, along with powering
new UIs. I've included three output formats in order of increasing
complexity, with the intention that we can ship the simpler output
formats quickly, and then enhance with more information to power better
analyses.
I've tried to ensure forward-compatibility with CredRank, so that we can
migrate to CredRank without needing to make major changes to this API.
I may want to include type information in the output format as well.
Test plan: Human review and consideration. `yarn test` passes.
Thanks to @s-ben and @Beanow for discussion, review, and inspiration.
This adds a bash script that fetches data from our test
Discord instance. We will be able to test against this
data and easily update the data if Discord's api changes.
Test plan:
After running the bash script, inspected the snapshot files
and verified that the data appears reasonable.
Verified that the check for `jq` and `SOURCECRED_DISCORD_TOKEN` both
fail and exit if `jq` isn't installed or the Discord bot token
hasn't been set.
As of this commit, the plugin should fully support EdgeSpec, meaning the
entries are included in the Graph and Cred computations.
As the champions field does support URLs but not NodeEntry from an
EdgeSpec, we're separating the URL handler from the EdgeSpec handler.
Summary:
This replaces the logo with another SVG document that looks (roughly)
the same but is implemented more cleanly. In particular, the segments of
rays now overlap properly and so are not subject to aliasing, which also
makes it easier to create color variations (e.g., for monochrome).
I would usually optimize this further with SVGO, but this document
appears to reveal a bug in SVGO’s “Round/rewrite paths” optimization
that causes the document to render correctly in Chrome and Firefox but
incorrectly in `rsvg-convert` and Inkscape, so I’m stopping here.
Test Plan:
An animation shows the structure of the new SVG:
![SVG structure animation][gif]
[gif]: https://user-images.githubusercontent.com/4317806/78756263-5da8f880-792f-11ea-9fd1-8e1380e3c530.gif
wchargin-branch: logo-overlapping-paths
Removes extraneous whitespace in comments in the Discord Plugin.
Test Plan:
Grepping `/*` in the `src/plugins/discord` directory should show
6 instances of this comment style, all of which should have no
whitespace after the final line of content.
`yarn test` passes.
Because the `Initiative` type now supports `EdgeSpec`, we're no longer
discarding entries when converting between `InitiativeFile` and
`Initiative`.
To make this commit smaller and easier to review, we're not yet adding
support to add `NodeEntry` to the graph though, instead
`createWeightedGraph` ignores the entries for now.
An additional change here is we're allowing more keys to be omitted in
the JSON format. This is both intuitive for data entry, and safer in
terms of Flow types (as JSON.parse returns any).
The test examples now cover a v0.1.0 (initiative-A), v0.2.0 with just
URLs (initiative-B) and one with just entries (initiative-C).
To make this commit smaller and easier to review, we're not yet adding
`EdgeSpec` to the `Initiative` type and will ignore the entries when
converting from `InitiativeFile` to `Initiative`.
Defines the NodeAddress format we want to use for a NodeEntry.
Because we want to use the parent InitiativeId as part of the address,
we'll need to read the underlying string[], changing the opaque type.
This commit modifies the resolveAlias function in the identity plugin's
alias module so that it now allows you to convert sourcecred identity
aliases (e.g. "sourcecred/user") back into NodeAddresses. This will be
necessary so that we can convert our ad-hoc distributions and transfers
(which use aliases, including SourceCred identity aliases) into the more
robust formats for the productionized grain ledger, which use full node
addresses.
I did a slight refactor on the identity module to expose a method for
constructing the address without the rest of the node.
Test plan: `yarn test`
Summary:
Version 5.0.0 of `better-sqlite3` redesigned the `Database.transaction`
method to do exactly what we want, obviating the need for our custom
helper function and tests. It’s actually more flexible: it allows the
transaction function to take arguments, and allows nested transactions.
Test Plan:
All tests pass as written. If this patch is instead made by changing the
implementation of `_inTransaction` to `return db.transaction(fn)()`,
most tests still pass, excepting only some tests about the behavior of
`_inTransaction` itself around transaction nesting and manual rollbacks;
those behavior changes are acceptable, especially given that they’re not
exercised in the code under test.
wchargin-branch: mirror-builtin-transaction
Summary:
I first wrote these type definitions for v4.x.x. The library API has
changed since then. This patch updates the type definitions to match the
current API as of the v7.0.0 release:
<https://github.com/JoshuaWise/better-sqlite3/blob/v7.0.0/docs/api.md>
There are breaking changes, but none among functions that we use. On the
other hand, there are new features that will be useful to us.
Test Plan:
Running `yarn flow` still passes. There may be some errors among typing
of functions that we don’t actually use (particularly `aggregate`, which
is more complicated than the others). If so, we can cross those bridges
when we come to them.
wchargin-branch: flow-better-sqlite3-v7-api
Summary:
In `_findOutdated`, we bound a query parameter that was not used by the
query. This is entirely harmless, but should still be fixed.
Test Plan:
That unit tests continue to pass suffices.
wchargin-branch: mirror-findoutdated-superfluous-param
Summary:
This fixes a bug introduced in #1665, where we added a `typenames`
clause to the query plan but didn’t update the termination checking
accordingly. As a result, query plans with only `typenames` left to
update would not execute, so `extract` would fail with a `NOT NULL`
constraint violation because not all transitively needed objects had
been fetched.
Databases created before this change are still valid. Re-running the
problematic `sourcecred load` command should successfully update the
cache and proceed.
Fixes #1762.
Test Plan:
Regression test added; `aracred/ideas` and `aracred/aracred-template`
both load successfully now.
wchargin-branch: mirror-typenames-only-plan
Adds a new NodeType for each NodeEntryField, allowing multipliers to be
set per field.
Since the "contributes to" edge is very generic and created a naming
conflict, this includes a slightly awkward "contributes to entry" addition;
the extra "entry" differentiates it from the existing edge type.
The edge weights for this loosely follow the current edge weight rationale.
When we've merged enough functionality to do in-browser testing of different
weights with an example scenario, I think we should revisit them.
This type will replace the current `$ReadOnlyArray<URL>` on the Initiative
fields to support NodeEntry arrays as well. Like with the NodeEntryJson
they will have a manual-entry convenience flavor and an internally
normalized one.
The normalization function aims to throw errors that help users notice any
input mistakes.
This is where most of the flexibility when hand-writing JSON files is
expected to come from, as it makes few assumptions about the formatting
while the internal normalized type stays consistent.
This is the generalized type that allows us to define contributions to an
Initiative from the same JSON file as the Initiative. See #1739.
The types distinguish between what a user is expected to enter and what
this is internally normalized to. The normalization logic is implemented
in a follow-up PR.
Often we can use representations like Set to avoid duplicates in the first
place. However, when duplicates are not allowed in some kind of user
input, we may want to present the user with a useful error message.
This util will find the duplicate elements, so they can be highlighted
in an error.
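One way such a util might be sketched (illustrative, not necessarily the actual implementation):

```javascript
// Returns the set of elements that appear more than once in `items`,
// so they can be highlighted in an error message.
function findDuplicates(items) {
  const seen = new Set();
  const duplicates = new Set();
  for (const item of items) {
    if (seen.has(item)) duplicates.add(item);
    seen.add(item);
  }
  return duplicates;
}
```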
Add a numerically-naive method for calculating the floating point ratio
between grain values, as it is needed in #1743.
Following discussion in [this review], we hope that @wchargin will
rewrite this method later to have better precision.
Test plan: Attached unit tests should pass.
[this review]: https://github.com/sourcecred/sourcecred/pull/1715#discussion_r396909459
Paired with @hammadj
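A sketch of the numerically-naive approach, assuming grain amounts are nonnegative BigInt values (the real Grain representation may differ):

```javascript
// Naive floating-point ratio between two grain amounts. Converting a
// BigInt to Number can lose precision for very large amounts, which
// is exactly the limitation noted above.
function ratio(numerator, denominator) {
  if (denominator === 0n) {
    throw new Error("division by zero");
  }
  return Number(numerator) / Number(denominator);
}
```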
Adds a class to persist a local mirror of data from a Discord Guild.
Implements create and read functionality.
Adds a function to `models` which converts an Emoji string-reference
into an Emoji object.
Test plan:
Unit tests added.
Paired with: @beanow
By adding information about configuration and what identity contractions
do, it provides more reference for instance maintainers and community
members to understand the plugin.
Includes rephrasing and feedback from @s-ben and @vsoch.
Closes #1725