This commit adds a `grain/grain.js` module, which contains a type and
logic for representing Grain balances with 18 digits of precision. We
use the native BigInt type (and add the necessary babel plugin to
support it).
Unfortunately, Flow does not yet support BigInts (see
[facebook/flow#6639]). To hack around this, we lie to Flow, claiming
that BigInts are numbers, and we expect/suppress the Flow errors
whenever we actually instantiate one. For example:
```js
// $ExpectFlowError
const myBigInt = 5n;
```
We can use the BigInt operators like `+`, `-`, `>` without Flow errors,
since these actually exist on numbers too. However, Flow will fail to
detect improper combinations of regular numbers and BigInts:
```js
// $ExpectFlowError
const x = 5n;
const y = x + 5;
// Uncaught TypeError: Cannot mix BigInt and other types
```
Since any improper mixing will result in a runtime error, these issues
will be easy to detect via unit tests.
In addition to adding the basic Grain type, I exported a `format`
function which will display Grain balances in a human readable way.
It supports arbitrary decimal precision, groups large amounts with comma
separators, handles negative numbers, and adds a suffix string.
The format method is thoroughly documented and tested. Thanks to @Beanow
for valuable feedback on its implementation.
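As an illustration of the formatting behavior described above, here is a
minimal sketch. It is not the module's actual implementation: the name
`formatGrain`, its parameters, and the choice to truncate rather than
round are all assumptions made for this example.

```js
// Hypothetical sketch; names and rounding behavior are assumptions.
const DECIMAL_PRECISION = 18;
const ONE = 10n ** BigInt(DECIMAL_PRECISION);

function formatGrain(grain, decimals = 0, suffix = "g") {
  if (
    !Number.isInteger(decimals) ||
    decimals < 0 ||
    decimals > DECIMAL_PRECISION
  ) {
    throw new Error(`decimals must be an integer in [0, ${DECIMAL_PRECISION}]`);
  }
  const isNegative = grain < 0n;
  const abs = isNegative ? -grain : grain;
  const whole = (abs / ONE).toString();
  // Digits beyond the requested precision are truncated, not rounded.
  const fraction = (abs % ONE)
    .toString()
    .padStart(DECIMAL_PRECISION, "0")
    .slice(0, decimals);
  // Insert a comma before each trailing group of three digits.
  const grouped = whole.replace(/\B(?=(\d{3})+$)/g, ",");
  const sign = isNegative ? "-" : "";
  const point = decimals > 0 ? "." : "";
  return sign + grouped + point + fraction + suffix;
}

console.log(formatGrain(1234567n * ONE + ONE / 2n, 2));
```

All arithmetic stays in BigInt space, so no precision is lost before the
digits are converted to strings.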
Test plan: See included unit tests. `yarn test` passes.
[facebook/flow#6639]: https://github.com/facebook/flow/issues/6639
Found this incorrectly encoded %93 in an actual forum post.
This change will make it so we will print the error and ignore the URL
for reference detection, rather than crash.
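A minimal sketch of the "print and ignore" behavior, assuming the crash
came from `decodeURIComponent` throwing a `URIError` on the stray `%93`
(the commit does not say which call failed). The helper name
`decodeOrNull` is hypothetical.

```js
// Hypothetical helper: decode a URL, or log and return null when the
// percent-encoding is not valid UTF-8 (as with a lone %93).
function decodeOrNull(url) {
  try {
    return decodeURIComponent(url);
  } catch (e) {
    if (!(e instanceof URIError)) throw e;
    console.error(`Ignoring malformed URL ${url}: ${e.message}`);
    return null;
  }
}

// Malformed URLs are dropped from reference detection instead of
// aborting the whole load.
function decodableUrls(urls) {
  return urls.map(decodeOrNull).filter((u) => u !== null);
}
```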
Preparing for release #1679
Note: due to a regression, not upgrading eslint-plugin-react
See https://github.com/yannickcr/eslint-plugin-react/issues/2570
Also updated package.json to latest semver in-range versions.
Note, this changes all packages (other than eslint-plugin-react)
to ^x.x.x format.
This commit modifies the default weights in the Discourse plugin. The
overall theme is to make the plugin flow less cred to "raw activity", in
favor of only flowing cred to posts where there is some explicit signal
that they were valuable.
Most significantly, we move over to fully [like-minted cred][1], instead
of minting cred directly to posts and topics. Also, we remove the edges
that tend to flow cred to posts indiscriminately. For example, topics no
longer flow cred to every post within the topic.
Based on local testing on a few forums I'm familiar with, I feel
confident that these cred scores are an improvement over the current
defaults, as we now have a few "real life test cases" of high-noise,
high-activity users, and these weights reduce the amount of cred that
accrues to such "stress testers". With these changes, I feel that we can
start cautiously using the Discourse plugin in [Trust Level 2][2]
communities.
Test plan: `yarn flow` and code inspection are sufficient to verify that
the new weights are technically valid. Because my calibration process
for validating these changes involves subjective judgements about
contributions from real people, I'm declining to publicly post any
specifics; reviewers are welcome to reach out to me offline for further
discussion.
[1]: https://discourse.sourcecred.io/t/minting-discourse-cred-on-likes-not-posts/603
[2]: https://github.com/sourcecred/docs/pull/24
Summary:
This is mostly a QoL improvement for maintainers. GraphQL mirrors are
stored in the SourceCred cache directory, but until now have been hard
to tell apart. All IDs looked like long, similar hex strings; e.g.:
```
github_736f75726365637265642d746573742f6578616d706c652d676974687562.db
github_736f75726365637265642f646f6373.db
github_736f75726365637265642f736f7572636563726564.db
```
It’s too hard to tell that these are `sourcecred-test/example-github`,
`sourcecred/docs`, and `sourcecred/sourcecred` in that order. (Yes, you
can work it out if you try; that’s not good enough.) With dozens of
caches loaded, finding the right one to poke at for debugging or
progress-checking takes way too much scripting.
The purpose of this abstruse encoding was portable filename safety, but
we can actually achieve this with human-readable names like these:
```
github_sourcecred-test_example-github.db
github_sourcecred_docs.db
github_sourcecred_sourcecred.db
```
This suffices because (a) login names cannot contain underscores, so we
can safely use that as a separator, and (b) GitHub disallows collisions
on names that are equal ignoring case, so we can convert all names to
lowercase without introducing collisions.
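The two naming schemes can be sketched as follows. This assumes Node.js
(for `Buffer`); the function names are hypothetical and not taken from
the codebase.

```js
// Old scheme (sketch): hex-encode the UTF-8 bytes of "owner/name".
function legacyDbFilename(owner, name) {
  const hex = Buffer.from(`${owner}/${name}`, "utf8").toString("hex");
  return `github_${hex}.db`;
}

// New scheme (sketch): owners cannot contain underscores, so the first
// underscore after the prefix unambiguously separates owner from name.
function readableDbFilename(owner, name) {
  return `github_${owner}_${name}.db`.toLowerCase();
}

console.log(legacyDbFilename("sourcecred", "docs"));
console.log(readableDbFilename("sourcecred", "docs"));
```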
This change orphans existing caches. Running `sourcecred clear --cache`
will clean those up.
Test Plan:
Try to create a GitHub repository with the same name as a repository
that you already have, but with inverted case; note that this is
disallowed. Then, note that loading `sourcecred-test/example-github`
still works and produces a database with a nicely readable name.
wchargin-branch: github-readable-db-names
Summary:
The `_addCommit` helper recurs against the list of parents of a commit.
We add pull requests to the repository in ascending numeric order, so in
many cases the depth of this recursion is bounded, because each PR’s
merge commit is only a few levels down from the previous one’s. But this
is not the case for repositories that have long sequences of commits
none of which was merged by a pull request. In such cases, we’ll simply
blow the stack.
To complicate matters, the conditions in which we observe this are hard
to predict, because V8 optimizes `_addCommit` when it determines that
it’s a hotspot. This means that the tests are a bit brittle, with
carefully chosen constants that balance test execution time against
regression-catching power. See [comment on #1354 with my analysis][c].
[c]: https://github.com/sourcecred/sourcecred/issues/1354#issuecomment-593062805
Fixes #1354.
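The general shape of the fix is to replace recursion over parents with an
explicit stack. The sketch below illustrates that technique on a
simplified commit map. The function name and data shape are assumptions,
not the relational view's real structures.

```js
// Sketch: visit a commit and all of its ancestors without recursion.
// `commitsByHash` is a hypothetical Map from hash to {parents: string[]}.
function collectAncestors(commitsByHash, rootHash) {
  const seen = new Set();
  const commitStack = [rootHash];
  while (commitStack.length > 0) {
    const hash = commitStack.pop();
    if (seen.has(hash)) continue;
    seen.add(hash);
    const commit = commitsByHash.get(hash);
    if (commit == null) continue;
    for (const parent of commit.parents) {
      // Push instead of recursing, so depth is bounded by heap, not stack.
      commitStack.push(parent);
    }
  }
  return seen;
}
```

With this shape, a long linear history costs memory proportional to the
pending work rather than one call frame per commit.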
Test Plan:
Tests pass as written, and fail if `commitStack.push(parent)` is
replaced with `this._addCommit(parent)`, even if the two `it`-blocks
under the new `describe` spec are commuted. An end-to-end test shows
that we can now compute cred for `passbolt/passbolt_api` (which takes
about an hour to load and compute).
wchargin-branch: rv-fix-addcommit-overflow
* Add serialization to TimelineCredScores
Right now the serialization is super trivial: we just attach a compat
header. In the future, we can try encoding the float64 arrays as
bytestrings to save space.
Test plan: Unit tests included; `yarn test` passes.
* Fix up serialization per @Beanow's review
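A minimal sketch of what "attaching a compat header" can look like. The
header contents, the `[header, payload]` layout, and the function names
are assumptions for illustration, and the payload uses plain arrays
rather than float64 typed arrays.

```js
// Hypothetical header; the real type name and version are not given here.
const COMPAT_INFO = {type: "example/timelineCredScores", version: "0.1.0"};

// Wrap the payload with a header that identifies its format.
function scoresToJSON(scores) {
  return [COMPAT_INFO, scores];
}

// Reject anything whose header does not match what we know how to read.
function scoresFromJSON(json) {
  if (!Array.isArray(json) || json.length !== 2) {
    throw new Error("Expected [header, payload]");
  }
  const [header, payload] = json;
  if (
    header == null ||
    header.type !== COMPAT_INFO.type ||
    header.version !== COMPAT_INFO.version
  ) {
    throw new Error(`Unsupported compat header: ${JSON.stringify(header)}`);
  }
  return payload;
}
```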
Summary:
The GraphQL Mirror module now supports fidelity annotations, so we can
remove the hard-coded object ID blacklist in favor of specifying which
fields are unfaithful. We no longer need to maintain the blacklist, and
we also successfully load data from even the formerly blacklisted nodes.
Closes #998.
Test Plan:
The following repositories\* previously could not load GitHub data without
a blacklist, and now all load successfully (network load times in
parentheses):
- (00:21:08) `ReactTraining/react-router`
- (00:14:05) `axios/axios`
- (00:02:58) `babel/babel-eslint`
- (00:01:35) `chimurai/http-proxy-middleware`
- (00:40:39) `eslint/eslint`
- (00:37:39) `facebook/jest`
- (00:36:01) `lodash/lodash`
- (00:05:52) `lovell/sharp`
- (00:49:08) `passbolt/passbolt_api`
- (00:27:24) `prettier/prettier`
- (00:16:18) `quasarframework/quasar`
- (00:01:57) `quasarframework/quasar-cli`
- (00:04:32) `recharts/recharts`
- (00:10:20) `sass/node-sass`
- (00:07:46) `sinonjs/sinon`
- (01:09:06) `twbs/bootstrap`
- (00:29:02) `vuejs/vue`
- (00:05:44) `vuejs/vuex`
- (00:05:01) `webpack-contrib/css-loader`
- (00:46:58) `webpack/webpack`
- (00:11:28) `webpack/webpack-dev-server`
- (00:09:34) `yannickcr/eslint-plugin-react`
All of these also compute cred correctly, with the following exceptions:
- `passbolt/passbolt_api` hits a stack overflow in the relational
view’s `_addCommit`;
- `twbs/bootstrap` hits a string overflow in `storeProject`.
\* List generated by running the following command on the old blacklist:
```
<src/plugins/github/blacklistedObjectIds.js \
awk '/^[^ ]/ { p = 0 }; p { gsub(".*// ", ""); print }; /reactions/ { p = 1 }' |
grep -Po '(?<=github.com/)[^/]*/[^/]*' | sort -u
```
wchargin-branch: github-fidelity
Summary:
Now that the database schema and logic are prepared to handle objects of
unknown typenames, we can support schemata that have unfaithful fields
by simply not requesting their typenames.
Test Plan:
Unit tests included.
wchargin-branch: mirror-unfaithful
Summary:
The database now stores objects without typenames, so we can emit
requests for those typenames in our GraphQL queries.
Test Plan:
Unit tests added; they’re lighter-weight than their siblings only
because querying typenames is intrinsically simpler than querying own
data or connections (in particular, the typenames query is a constant).
wchargin-branch: mirror-typename-queryfromplan
Summary:
The `objects.typename` column is now nullable, and `registerObject`
permits a `null` typename argument. Other methods are updated to avoid
relying on typenames being non-`null`, which is straightforward in all
cases.
Test Plan:
Unit tests included. Audited all queries selecting from `objects` to
verify that they behave correctly with missing typenames.
wchargin-branch: mirror-typename-storage
Summary:
The internal `UpdateResult` structure now lists IDs of objects whose
typename has been queried. This list is expected to be empty for now.
Test Plan:
Unit tests added.
wchargin-branch: mirror-typename-updateresult
Summary:
The internal `QueryPlan` structure now lists IDs of objects whose
typename is to be queried. This list is expected to be empty for now.
Test Plan:
Unit tests included.
wchargin-branch: mirror-typename-queryplan
Summary:
Fields in a GraphQL schema may now declare themselves as “unfaithful”,
indicating that the server is known to return responses with incorrect
types for that field. Future changes will teach the Mirror module to
query these fields more carefully, handling such incorrect responses
robustly. See the [tracking issue] and [project initiative] for details.
[project initiative]: https://discourse.sourcecred.io/t/graphql-mirror-fidelity-awareness/275
[tracking issue]: https://github.com/sourcecred/sourcecred/issues/998
This change is source-compatible and data-incompatible: the APIs are
backward-compatible, but the schema representation (JSON serialized
form) has changed, and so the `Mirror` constructor will reject old
caches.
Test Plan:
Unit tests included.
wchargin-branch: schema-fidelity
Requiring a weight for both the before-completion and after-completion
states is a simple way to start minting Cred on Initiatives. It sets
expectations by having both states defined in a version-controlled file.
Eventually all plugins are expected to use the ReferenceDetector. This
commit composes the GitHub and Discourse detectors that we can currently
create, for now passing the result to `createPluginGraphs`, where it is
not yet used.
Adding parameters to Project.
Assumes we're loading an InitiativesDirectory. We're not including the local path here, as this is environment dependent. It should be passed as an ENV or CLI parameter instead.
Summary:
Because `fromObject` does not mutate its input, it’s safe to accept
read-only inputs. This is needed for parts of the CredRank Markov
process graph implementation.
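For context, a non-mutating `fromObject` can be sketched as below. This
is an assumption about the helper's shape rather than a copy of the real
code. In Flow, the parameter would be annotated with a read-only object
type, which is omitted here so the snippet runs as plain JavaScript.

```js
// Sketch: build a Map from an object's own enumerable string keys.
// The input is only read, so frozen (read-only) objects are acceptable.
function fromObject(object) {
  const result = new Map();
  for (const key of Object.keys(object)) {
    result.set(key, object[key]);
  }
  return result;
}

const frozen = Object.freeze({alpha: 1, beta: 2});
console.log(fromObject(frozen).get("beta"));
```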
Test Plan:
Typing test added.
wchargin-branch: maputil-readonly-input
Summary:
Most of the blacklisted reactions helpfully link to their original
source, but some don’t. This patch adds the missing links.
Test Plan:
The following command now prints a URL on every line:
```
<src/plugins/github/blacklistedObjectIds.js \
awk '/^[^ ]/ { p = 0 }; p { gsub(".*// ", ""); print }; /reactions/ { p = 1 }'
```
wchargin-branch: blacklist-urls
This commit makes several small improvements to the distributionToCred
module:
- We rename the output `FullTimelineCred` data structure to
`TimelineCredScores`, which is more descriptive
- We re-organize that data structure so that rather than being an array
of `{interval, cred}` objects, it has an `intervals` property and a
`intervalCredScores` property, both of which are arrays. This will make
downstream usage cleaner.
- An unused variable is removed.
- We document invariants about the TimelineCredScores data type.
- We mark the TimelineCredScores data type opaque, so that clients
receiving a TimelineCredScores can trust that the invariants are
maintained.
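The structural change can be illustrated with a small conversion from the
old array-of-objects shape to the new shape with parallel arrays. The
helper name and the sample data are hypothetical. Only the property names
`interval`, `cred`, `intervals`, and `intervalCredScores` come from the
description above.

```js
// Sketch: convert [{interval, cred}, ...] into parallel arrays.
function toTimelineCredScores(fullTimelineCred) {
  return {
    intervals: fullTimelineCred.map((entry) => entry.interval),
    intervalCredScores: fullTimelineCred.map((entry) => entry.cred),
  };
}

// Hypothetical sample data, for illustration only.
const oldShape = [
  {interval: {startTimeMs: 0, endTimeMs: 100}, cred: [1, 2]},
  {interval: {startTimeMs: 100, endTimeMs: 200}, cred: [3, 4]},
];
const newShape = toTimelineCredScores(oldShape);
// Invariant in this sketch: both arrays have one entry per interval.
console.log(
  newShape.intervals.length === newShape.intervalCredScores.length
);
```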
Test plan:
- The rename is robustly tested by `yarn flow`.
- That the refactor lands without changing existing semantics is
robustly tested by `yarn test --full`, since we snapshot a full cred
load; thus we know that cred scores haven't changed. (Also, we have
existing unit tests).
- The newly documented invariants aren't robustly tested by the test
code, but it's easy to see that they hold by reading the algorithm.
Helper functions intended to be used in succession by `loadDirectory`.
Only `_validateUrl` provides helpful error messages. It's the caller's
responsibility to call it first.
Helper functions intended to be used in succession by `loadDirectory`.
Only `_validatePath` provides helpful error messages. It's the caller's
responsibility to call it first.
Introduces dependency `globby` for globbing with a Promises API.
The private function `_initiativeFileId` will be used as a helper to
load a directory. The public function `initiativeFileURL` will be used
to add links to the remote file in the node description.
This commit modifies the api declaration file so that it includes
`core.weights` and `core.weightedGraph`. These are both important
modules for interfacing with Cred (loading a WeightedGraph, and defining
custom weights), so it's natural for them to be included in the API.
Test plan: Change is a trivial extension of the pattern in api.js. `yarn
flow` passes, and I believe no further testing is _required_; however, I
will use this in an Observable notebook prior to merging.
Based on [forum discussion][1], Initiatives should be tracked in files.
The main issue with storing the existing Initiative type as JSON in a
file is that there's no natural NodeAddress or URL for a file-based tracker.
This type resolves that by using the file name within a directory as a
unique reference and requiring a remoteUrl for referencing. (See #1640)
[1]: https://discourse.sourcecred.io/t/576
"Trackers" were an idea to let Initiatives be aware of the medium that
declares them, such as a Discourse topic or a file.
With Discourse in mind this was really useful. We could add an automatic
contribution edge, enhance reference detection to "upgrade" a URL pointing
to that Topic to resolve to the Initiative instead, etc.
With files as the only source of Initiatives, this becomes less relevant.
So in the interest of reducing complexity, we'll remove tracker awareness.
Previously we were relying on the `Initiative.tracker` to define the
address of an Initiative. Based on feedback at #1643 we want to remove
trackers. So we'll need a replacement ID.
This will enforce a new InitiativeId convention, as well as how to derive
a NodeAddressT from it. In a follow-up we'll remove the tracker concept.
Currently, similar code to read and write Compatible JSON files is
copy-pasted across the codebase. This takes some common practices and
provides a generic utility for them.
Correct Flow type usage can't be detected if the JSON type is opaque,
though. GraphJSON is an example of this, so I removed its `opaque`
modifier as a smoke test.
There was a fair amount of copy-pasted lines in these tests, which is mostly a
good thing, because ESLint and Flow provide good errors when you're using the
variables wrong. But it wasn't great in terms of readability.
In upcoming PRs we'll add more to these test files, so I thought it was good to
improve this first.
This version:
- Still copy-pastes to get good ESLint and Flow errors.
- Doesn't repeat itself when it's not for better errors or readability.
- Uses a "spyBuilder", for readability in spite of prettier trying to collapse lines.
- Makes sure duplicates are exactly duplicates, easier to edit in IDEs.
example: `[githubDeclaration, "githubDeclaration"]`
instead of `[fakeGithubDec, "fake-github-dec"]`
Test plan: `yarn test`
Reviewer note: recommend looking at the split diff on GitHub, not the unified one.
Summary:
Upcoming changes will add support for field-level fidelity annotations
(see #998), at which point the `Reaction.user` field will be marked
unfaithful. This patch will surface that behavior change.
Test Plan:
The newly snapshotted query is valid, and returns a reaction whose
`user` property lists typename and ID.
wchargin-branch: mirror-snapshot-reaction
This commit moves a lot of code and algorithms for computing timeline
cred scores into `core/algorithm`. The `TimelineCred` module hasn't been
moved, because it isn't clean enough for core -- it has dependencies on
analysis and types, for example.
This is another material step towards consolidating all of the
SourceCred algorithm logic into `core/algorithm`, although there's still
more to be done.
Test plan: It's just a code reorg; `yarn test` is sufficient.
This commit modifies the timelinePagerank module so that it no longer
takes in node/edge types. Instead, the timelinePagerank just takes a
WeightedGraph and uses weights from that WeightedGraph. This is a key
part of decoupling the core cred computation logic from the plugin
logic, as described in #1557.
I also modified the timelinePagerank module's immediate dependencies
(the weightEvaluator module) to do the same. Since the weight evaluators
now have a simpler contract (no overriding, etc), the unit tests have
been simplified.
Test plan: It's a simple refactor, so `yarn test` should be sufficient.
As a bit of added caution, I manually tested changing weights in the
frontend, and verified that cred updates as expected.
This was the last usage of strings as tokens, other than at the edges of
the system, such as the CLI and bin code that read the arguments.
This means the tokens should now always be validated.
Closes #1626
As part of my cleanup to make it easy to document and re-implement the
SourceCred algorithm, I want a place in core where we can consolidate
the js implementation. I'm renaming `core/attribution` to
`core/algorithm` to make this clearer.
Test plan: It's just a rename. `yarn test` passing is sufficient to
assure us of correctness.
TimelineCred has a `reduceSize` method which discards cred for most
nodes, keeping only cred for the top nodes of each type across all time.
I've wanted to remove this for a while, because it is a bad fit for the
kind of experimentation we're starting to do with showing the top nodes
for recent activity periods. Since the recent nodes haven't had much
time to accumulate cred, they are almost all discarded by reduceSize. As
an added inducement, I want to get rid of reduceSize for #1557 because
it requires type information.
`reduceSize` still serves one function, which is enabling the frontend
to load faster because it only loads a smaller amount of data which is
discoverable in the UI. However, it doesn't make sense to discard most
of the data just for a fast UI load--we can later make another data
structure which is tuned for the needs of the frontend, and have that
data structure include only summary statistics.
This will make the cred.json file much larger for large repos, e.g. I
expect that loading tensorflow/tensorflow would now go over the 100MB
hard cap for GitHub pages. However, right now we're prioritizing using
SourceCred for medium-small projects (e.g. SourceCred itself, and maybe
Maker), so this isn't a current concern.
Test plan: `yarn test --full` passes.
This commit fixes an issue introduced in #1625, which caused the
`state.test.js` file to print some unhandled console errors when
running `yarn unit`.
First, this commit changes the file so that it properly errors if any
unexpected console errors are printed. Then it fixes the erroring tests.
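The idea of failing on unexpected console errors can be sketched in plain
JavaScript as below. This is a stand-in for illustration: the actual test
file presumably uses the test framework's own spying facilities, and the
helper name `failOnConsoleError` is hypothetical.

```js
// Sketch: run `fn`, capturing console.error; throw if it was called.
function failOnConsoleError(fn) {
  const original = console.error;
  const calls = [];
  console.error = (...args) => {
    calls.push(args);
  };
  try {
    fn();
  } finally {
    // Always restore the real console.error, even if `fn` throws.
    console.error = original;
  }
  if (calls.length > 0) {
    throw new Error(`Unexpected console.error calls: ${calls.length}`);
  }
}
```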
Test plan: `yarn test` passes; `yarn unit --watch legacy/state.test.js`
no longer prints any error messages to console.