1305 Commits

Author SHA1 Message Date
William Chargin
976eaf05ec
github: standardize blacklist URL comments (#1660)
Summary:
Most of the blacklisted reactions helpfully link to their original
source, but some don’t. This patch adds the missing links.

Test Plan:
The following command now prints a URL on every line:

```
<src/plugins/github/blacklistedObjectIds.js \
awk '/^[^ ]/ { p = 0 }; p { gsub(".*// ", ""); print }; /reactions/ { p = 1 }'
```

wchargin-branch: blacklist-urls
2020-02-14 00:11:46 -07:00
Dandelion Mané
8dac968a69
Improve distributionToCred (#1654)
This commit makes several small improvements to the distributionToCred
module:

- We rename the output `FullTimelineCred` data structure to
`TimelineCredScores`, which is more descriptive
- We re-organize that data structure so that rather than being an array
of `{interval, cred}` objects, it has an `intervals` property and a
`intervalCredScores` property, both of which are arrays. This will make
downstream usage cleaner.
- An unused variable is removed.
- We document invariants about the TimelineCredScores data type.
- We mark the TimelineCredScores data type opaque, so that clients
receiving a TimelineCredScores can trust that the invariants are
maintained.
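
As a rough illustration of the reorganization described above, the sketch below converts an array of `{interval, cred}` objects into an object with parallel `intervals` and `intervalCredScores` arrays. The helper name `toTimelineCredScores`, the interval fields, and the sample values are hypothetical and are not taken from the module.

```javascript
// Hypothetical helper: converts the old array-of-objects shape into the
// new shape with two parallel arrays. Names are illustrative only.
function toTimelineCredScores(fullTimelineCred) {
  return {
    intervals: fullTimelineCred.map((x) => x.interval),
    intervalCredScores: fullTimelineCred.map((x) => x.cred),
  };
}

// Example input in the old shape (all values are made up).
const oldShape = [
  {interval: {startTimeMs: 0, endTimeMs: 10}, cred: [1, 2]},
  {interval: {startTimeMs: 10, endTimeMs: 20}, cred: [3, 4]},
];
const scores = toTimelineCredScores(oldShape);

// Assumed invariant for this sketch: both arrays have the same length.
console.log(scores.intervals.length === scores.intervalCredScores.length);
```

Downstream code can then index `intervals[i]` and `intervalCredScores[i]` together instead of unpacking each object.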

Test plan:
- The rename is robustly tested by `yarn flow`.
- That the refactor lands without changing existing semantics is
robustly tested by `yarn test --full`, since we snapshot a full cred
load; thus we know that cred scores haven't changed. (Also, we have
existing unit tests).
- The newly documented invariants aren't robustly tested by the test
code, but it's easy to see that they hold by reading the algorithm.
2020-02-09 11:58:19 -08:00
Robin van Boven
db94bb50fb
Initiatives: implement loadDirectory (#1649)
Its tests are primarily smoke tests, as the underlying helper functions
have been tested more extensively.
2020-02-09 12:56:42 +01:00
Robin van Boven
32fe756b5b
Initiatives: implement conversion from InitiativeFiles (#1648)
Helper functions intended to be used in succession by loadDirectory.
Only `_validateUrl` provides helpful error messages. It's the caller's
responsibility to do this first.
2020-02-09 12:49:07 +01:00
Robin van Boven
4a4c35bfdc
Initiatives: validate and read files from local directory (#1644)
Helper functions intended to be used in succession by `loadDirectory`.
Only `_validatePath` provides helpful error messages. It's the caller's
responsibility to do this first.

Introduces dependency `globby` for globbing with a Promises API.
2020-02-09 00:46:23 +01:00
Robin van Boven
803a752d80
Initiatives: create and parse InitiativeFile InitiativeId (#1643)
The private function `_initiativeFileId` will be used as a helper to
load a directory. The public function `initiativeFileURL` will be used
to add links to the remote file in the node description.
2020-02-09 00:42:56 +01:00
Dandelion Mané
579554d265
Add core.weights and core.weightedGraph to api (#1652)
This commit modifies the api declaration file so that it includes
`core.weights` and `core.weightedGraph`. These are both important
modules for interfacing with Cred (loading a WeightedGraph, and defining
custom weights), so it's natural for them to be included in the API.

Test plan: Change is a trivial extension of the pattern in api.js. `yarn
flow` passes, and I believe no further testing is _required_; however, I
will use this in an Observable notebook prior to merging.
2020-02-08 15:05:06 -08:00
William Chargin
8c47dd1c14
mirror: add implementation docs for network_log (#1608)
Summary:
Follow-up to #1562. These docs are not user-facing.

Test Plan:
None.

wchargin-branch: mirror-network-log-impldocs
2020-02-08 08:57:53 -08:00
Robin van Boven
f924521fdd
Add InitiativesDirectory and InitiativeFile types (#1641)
Based on [forum discussion][1], Initiatives should be tracked in files.

The main issue with storing the existing Initiative type as JSON in a
file is that there's no natural NodeAddress or URL for a file-based tracker.
This type resolves that by using the file name within a directory as a
unique reference and requiring a remoteUrl for referencing. (See #1640)

[1]: https://discourse.sourcecred.io/t/576
2020-02-08 09:00:30 +01:00
Robin van Boven
84e658143e
Initiatives: replace trackers with InitiativeIds (#1647)
"Trackers" were an idea to let an Initiative be aware of the medium that
declares it, such as a Discourse topic or file.

With Discourse in mind this was really useful. We could add an automatic
contribution edge, enhance reference detection to "upgrade" a URL pointing
to that Topic to resolve to the Initiative instead, etc.

With files as the only source of Initiatives, this becomes less relevant.
So in the interest of reducing complexity, we'll remove tracker awareness.
2020-02-08 08:55:07 +01:00
Robin van Boven
8eb9312277
Initiatives: create InitiativeId concept (#1646)
Previously we were relying on the `Initiative.tracker` to define the
address of an Initiative. Based on feedback at #1643 we want to remove
trackers. So we'll need a replacement ID.

This will enforce a new InitiativeId convention, as well as how to derive
a NodeAddressT from it. In a follow-up we'll remove the tracker concept.
2020-02-08 08:46:32 +01:00
Robin van Boven
1a58745d3b
Create compatIO utility functions (#1637)
Currently, similar code to read/write Compatible JSON files is
copy-pasted across the codebase. This takes some common practices and
provides a generic utility for it.

Correct Flow type usage can't be detected if the JSON type is opaque,
though. GraphJSON is an example of this, so its opaque modifier was
removed for a smoke test.
2020-02-08 08:39:06 +01:00
Robin van Boven
b1a6865985
Improve test readability of loadContext and pluginLoaders (#1650)
There was a fair amount of copy-pasted lines in these tests, which is mostly a
good thing, because ESLint and Flow provide good errors when you're using the
variables wrong. But in terms of readability, it wasn't great.

In upcoming PRs we'll add more to these test files. So I thought it was good to
improve this first.

This version:
- Still copy-pastes to get good ESLint and Flow errors.
- Doesn't repeat itself when it's not for better errors or readability.
- Uses a "spyBuilder", for readability in spite of prettier trying to collapse lines.
- Makes sure duplicates are exactly duplicates, easier to edit in IDEs.
example: `[githubDeclaration, "githubDeclaration"]`
instead of `[fakeGithubDec, "fake-github-dec"]`

Test plan: `yarn test`

Reviewer note: recommend looking at the split diff on GitHub, not the unified one.
2020-02-08 08:35:39 +01:00
William Chargin
86c31415c7
mirror: add reaction to snapshot test query (#1615)
Summary:
Upcoming changes will add support for field-level fidelity annotations
(see #998), at which point the `Reaction.user` field will be marked
unfaithful. This patch will surface that behavior change.

Test Plan:
The newly snapshotted query is valid, and returns a reaction whose
`user` property lists typename and ID.

wchargin-branch: mirror-snapshot-reaction
2020-02-07 20:54:02 -08:00
greenkeeper[bot]
d6a2618e9e
Update flow-bin to the latest version 🚀 (#1645)
* chore(package): update flow-bin to version 0.118.0

* chore(package): update lockfile yarn.lock
2020-02-07 00:15:26 -08:00
Dandelion Mané
c9958e346e
Move timeline pagerank code to core/algorithm (#1632)
This commit moves a lot of code and algorithms for computing timeline
cred scores into `core/algorithm`. The `TimelineCred` module hasn't been
moved, because it isn't clean enough for core -- it has dependencies on
analysis and types, for example.

This is another material step towards consolidating all of the
SourceCred algorithm logic into `core/algorithm`, although there's still
more to be done.

Test plan: It's just a code reorg; `yarn test` is sufficient.
2020-02-05 11:25:06 -08:00
Dandelion Mané
5b2f38114b
Remove types from TimelinePagerank (#1627)
This commit modifies the timelinePagerank module so that it no longer
takes in node/edge types. Instead, the timelinePagerank just takes a
WeightedGraph and uses weights from that WeightedGraph. This is a key
part of decoupling the core cred computation logic from the plugin
logic, as described in #1557.

I also modified the timelinePagerank module's immediate dependencies
(the weightEvaluator module) to do the same. Since the weight evaluators
now have a simpler contract (no overriding, etc), the unit tests have
been simplified.

Test plan: It's a simple refactor, so `yarn test` should be sufficient.
As a bit of added caution, I manually tested changing weights in the
frontend, and verified that cred updates as expected.
2020-02-05 11:14:04 -08:00
Robin van Boven
9deeb93142
Github: use GithubToken type in fetchGithubRepo (#1636)
This was the last usage of strings as tokens, other than at the edges of
the system, like the cli and bin code which read the arguments.
This means the tokens should now always be validated.

Closes #1626
2020-02-05 19:59:54 +01:00
Dandelion Mané
36264ed85b
Rename core/attribution to core/algorithm (#1631)
As part of my cleanup to make it easy to document and re-implement the
SourceCred algorithm, I want a place in core where we can consolidate
the js implementation. I'm renaming `core/attribution` to
`core/algorithm` to make this clearer.

Test plan: It's just a rename. `yarn test` passing is sufficient to
assure us of correctness.
2020-02-05 10:51:57 -08:00
Dandelion Mané
e97a1209d0
Remove TimelineCred.reduceSize (#1633)
TimelineCred has a `reduceSize` method which discards cred for most
nodes, keeping only cred for the top nodes of each type across all time.
I've wanted to remove this for a while, because it is a bad fit for the
kind of experimentation we're starting to do with showing the top nodes
for recent activity periods. Since the recent nodes haven't had much
time to accumulate cred, they are almost all discarded by reduceSize. As
an added inducement, I want to get rid of reduceSize for #1557 because
it requires type information.
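
To illustrate why recent nodes are lost, here is a minimal sketch of a per-type top-N filter ranked by all-time cred. It is not the removed `reduceSize` implementation; the function name `keepTopPerType`, the record fields, and the sample values are all hypothetical.

```javascript
// Hypothetical sketch: keep only the `perType` highest-cred nodes of each
// type, ranked by cred summed across all time.
function keepTopPerType(nodes, perType) {
  const byType = new Map();
  for (const node of nodes) {
    if (!byType.has(node.type)) {
      byType.set(node.type, []);
    }
    byType.get(node.type).push(node);
  }
  const kept = [];
  for (const group of byType.values()) {
    // Sort descending by total cred, then keep the first `perType` entries.
    group.sort((a, b) => b.totalCred - a.totalCred);
    kept.push(...group.slice(0, perType));
  }
  return kept;
}

// A recent pull request has had little time to accumulate cred, so it is
// discarded in favor of an older one of the same type.
const nodes = [
  {id: "old-pr", type: "pull", totalCred: 50},
  {id: "new-pr", type: "pull", totalCred: 1},
  {id: "user", type: "user", totalCred: 20},
];
const kept = keepTopPerType(nodes, 1).map((n) => n.id);
console.log(kept);
```

Note that the filter needs each node's type, which is the dependency on type information mentioned above.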

`reduceSize` still serves one function, which is enabling the frontend
to load faster because it only loads a smaller amount of data which is
discoverable in the UI. However, it doesn't make sense to discard most
of the data just for a fast UI load--we can later make another data
structure which is tuned for the needs of the frontend, and have that
data structure include only summary statistics.

This will make the cred.json file much larger for large repos, e.g. I
expect that loading tensorflow/tensorflow would now go over the 100MB
hard cap for GitHub pages. However, right now we're prioritizing using
SourceCred for medium-small projects (e.g. SourceCred itself, and maybe
Maker), so this isn't a current concern.

Test plan: `yarn test --full` passes.
2020-02-05 10:46:17 -08:00
Dandelion Mané
378255e91d
Fix console errors from state.test.js (#1628)
This commit fixes an issue introduced in #1625, which caused the
`state.test.js` file to print some unhandled console errors when
running `yarn unit`.

First, this commit changes the file so that it properly errors if any
unexpected console errors are printed. Then it fixes the erroring tests.

Test plan: `yarn test` passes; `yarn unit --watch legacy/state.test.js`
no longer prints any error messages to console.
2020-02-05 10:45:35 -08:00
Robin van Boven
9cf412c437
Github: remove unused loaders (#1635)
2020-02-05 19:28:26 +01:00
Robin van Boven
3c971ebaef
Discourse: remove unused loaders (#1634)
2020-02-05 19:21:14 +01:00
Robin van Boven
e48902cfc9
Remove deprecated load function (#1630)
One of several cleanup commits. See #1629.

The previously created loadContext function (see #1622) will be used
instead and renamed to load.

We're removing significant amounts of test code as well. The individual
components that make up the replacement function already provide the
needed coverage, like `dataDirectory.test.js` and `loadContext.test.js`.
The integration provided by this function is also covered through the
sharness `test_load_example_github.t`.
2020-02-05 11:27:43 +01:00
Robin van Boven
361961c59f
Switch to LoadContext based loading (#1622)
This naming is temporary; once the old loading code is removed, it
will be renamed to load and replace the existing function.
2020-02-04 22:28:24 +01:00
Robin van Boven
f3cfab636e
Backend: implement LoadContext (#1621)
Follows the general outline of #1586.
It uses a new trick of aliasing external module functions as
private properties. This makes the spyOn / mock tests more robust,
while fitting in the composition responsibility.
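
The aliasing trick can be sketched roughly as follows. This is a simplified illustration, not the actual LoadContext code: `computeCred` stands in for a function imported from another module, and the class and property names are hypothetical.

```javascript
// Stand-in for a function imported from an external module (hypothetical).
function computeCred(graph) {
  return graph.nodes.length;
}

class LoadContextSketch {
  constructor() {
    // Alias the external function as a private property, so a test can
    // replace it on one instance without mocking the whole module.
    this._computeCred = computeCred;
  }

  // Composition step: delegates to whatever the alias currently points at.
  load(graph) {
    return this._computeCred(graph);
  }
}

const ctx = new LoadContextSketch();
const real = ctx.load({nodes: ["a", "b"]});

// In a test, the alias is swapped for a stub; a spy could be attached to
// the same instance property.
ctx._computeCred = () => 42;
const stubbed = ctx.load({nodes: ["a", "b"]});
console.log(real, stubbed);
```

Because the stub is installed on the instance rather than on the imported module, other tests that use the real function are unaffected.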
2020-02-04 22:14:21 +01:00
Robin van Boven
5fd43bf62d
Backend: implement PluginLoaders.contractPluginGraphs (#1619)
Note: this doesn't include a WeightedGraph.overrideWeights step.
Because overriding weights isn't related to plugins, this will be
handled as a separate feature, later in the load pipeline.
2020-02-04 21:08:51 +01:00
Robin van Boven
ffd6c38e4c
Backend: implement PluginLoaders.createPluginGraphs (#1618)
Similar to CachedProject, we're using an opaque PluginGraphs return
type. Because only PluginLoaders can add the semantic of this being
all plugin graphs, and to use this semantic in future functions.
2020-02-04 21:02:47 +01:00
Robin van Boven
4c53558c65
Backend: implement PluginLoaders.updateMirror (#1617)
Note, the return type is a CachedProject. See #1586 for discussion.
Having this type allows us to create new functions with a semantic
of requiring that the project is mirrored into cache. It is opaque,
because only the "all plugins" semantic which PluginLoaders has
could know when mirroring of a Project has been completed.

Additionally MirrorEnv is not a strict type. We're expecting this
to be a subset of parameters. We'll use Flow to ensure we only use
the ones we need from it.
2020-02-04 20:06:50 +01:00
Dandelion Mané
1073374dc7
Frontend gets plugins from disk, not TimelineCred (#1625)
This commit modifies the frontend so that it now pulls plugin
declarations from disk, rather than from the TimelineCred. This will
allow us to decouple the TimelineCred from the PluginDeclarations, which
is another step towards #1557.

As proof that the frontend no longer gets plugins from the TimelineCred,
I removed the public `plugins()` method on TimelineCred.

Test plan: The frontend is somewhat sketchily tested. `yarn test`
passing is good, manual inspection of the frontend is also necessary;
I've done this.
2020-02-04 10:44:42 -08:00
Dandelion Mané
aeaa945a27
api/load: save plugin declarations to disk (#1624)
This builds on #1623 and is another step towards separating cred
computation from plugin declarations, as described in #1557. Basically,
this will allow the frontend to get plugin declarations even if the
TimelineCred computation never saw them.

This commit modifies `api/load`, and adds a new facility to
`DataDirectory` for saving the PluginDeclarations (which will be used by
@Beanow's in-flight refactor of `api/load`).

Test plan: See included unit tests, also try loading a project and
inspect the newly saved file.
2020-02-04 10:30:14 -08:00
Robin van Boven
806919a6cc
Backend: add ComputeFunction (#1620)
Similar to PluginLoaders, this accepts an interface like
TimelineCred.compute. During a load, we have the added
responsibility of using the TaskReporter.

This lets us mock the concrete TimelineCred.compute, while testing
just the extra functionality. The responsibility to compose this
with the concrete TimelineCred.compute lies with LoadContext.
2020-02-04 12:45:48 +01:00
Robin van Boven
492f2ff6b4
Github: add fetchGithubRepoFromCache (#1569)
Updating the mirror and using the mirror data should be separated,
as unified reference detection requires that the mirrors of all plugins
are updated. Upcoming refactors to the load system will solidify this.

While this refactor is ongoing, we will ignore the fetchGithubRepo
return value, using it as a mirror update only. And use this new
function to extract the data as needed in other loading steps.
2020-02-04 12:09:54 +01:00
William Chargin
115740e7fe
mirror: remove docs about legacy extract options (#1612)
Summary:
The `options` argument was introduced during the EAV table refactor and
dropped once that was complete, so we can remove the docs.

Test Plan:
None.

wchargin-branch: mirror-extract-no-options
2020-02-03 18:39:35 -08:00
William Chargin
ba9cc43a39
mirror: add default typename-mismatch error message (#1610)
Summary:
Previously, `_nontransactionallyRegisterObject` differed from its
counterpart `registerObject` in two ways: the former does not enter a
transaction, but it also requires a function to format an error message.
The value provided by `registerObject` to the helper is a suitable
default, so we can use it to decouple the transaction semantics from the
error message.

Test Plan:
Existing tests suffice, retaining full coverage.

wchargin-branch: mirror-default-typename-message
2020-02-03 17:50:30 -08:00
Dandelion Mané
9539a18a67
Add to/fromJSON for plugin declarations (#1623)
This commit adds a simple method for (de-)serializing arrays of
PluginDeclarations. This will allow us to save PluginDeclarations for
consumption by the frontend, without having them bundled with cred
results in TimelineCred. Thus, we can simplify and clean up TimelineCred
as described in #1557.

Test plan: Inspect unit tests and snapshots; `yarn test` passes.
2020-02-03 16:16:53 -08:00
Robin van Boven
8838f856ab
Backend: implement PluginLoaders.declarations (#1616)
This is the first of several commits to create the PluginLoaders
abstraction. Using this allows us to define "for all plugins"
semantics, while keeping each underlying plugin interface flexible.
2020-02-04 00:02:54 +01:00
Robin van Boven
ca63ea00fb
Github: switch to CacheProvider for fetchGithubRepo (#1614)
Delegating to a CacheProvider instance will limit the number
of places where we need to handle filesystem details. It will
also allow a mock or in-memory cache to be provided.
2020-02-03 23:46:03 +01:00
Robin van Boven
6904621646
Backend: implement MemoryCacheProvider (#1613)
While so far tests haven't required this (by mocking), there are
scripts like plugins/github/bin/fetchAndPrintGithubRepo.js which
create a tmp directory as a single-use cache. Here we'll be able
to use the MemoryCacheProvider instead.

In future PRs we'll switch to an API which only accepts CacheProviders.
So having MemoryCacheProvider in place will provide a good alternative
to creating tmp directories.
2020-02-03 23:36:58 +01:00
greenkeeper[bot]
f22b6a539f
Update eslint-plugin-import to the latest version 🚀 (#1611)
* chore(package): update eslint-plugin-import to version 2.20.1

* chore(package): update lockfile yarn.lock
2020-02-03 12:56:12 -08:00
William Chargin
0aff3081ec
mirror: remove unnecessary shadowing (#1609)
Summary:
A common table expression shadows a (main or temporary) table name for
`SELECT` statements, but doing so is confusing and makes the code harder
to read.

Test Plan:
Existing unit tests suffice.

wchargin-branch: mirror-no-cte-shadow
2020-02-03 08:17:58 -08:00
Dandelion Mané
5de027e83d
Move weights out of TimelineCredParams (#1607)
This commit moves weights out of the "parameters" to TimelineCred. This
makes sense, because the Weights are now passed to TimelineCred via the
included WeightedGraph. As such, we now have the `api/load` options
include explicit Weights that are used as overrides, rather than having
them be included in the TimelineCred parameters.

Test plan: I've manually tested this commit by:
- Changing weights in the explorer, and verifying that the `recalculate
cred` button activates as expected, and the new weights are used
correctly in the resultant distribution.
- Verifying that downloading weights from the UI still works.
- Verifying that uploading weights to the UI still works.
- Verifying that passing command-line weights files still works.

Also, `yarn test` passes.
2020-01-30 16:20:31 -08:00
Dandelion Mané
eb47465421
Add WeightedGraph.overrideWeights (#1606)
On a few occasions in the codebase, we need the ability to take a
WeightedGraph and apply manual user overrides to its weights (keeping
the base weights wherever non-conflicting). It's actually a fairly
simple application of Weights.merge, but since it's of general utility
I'm adding it to the WeightedGraph API.
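
The override behavior described above can be sketched as follows, assuming a drastically simplified weights representation (a `Map` from an address string to a number). The real `Weights` type and `Weights.merge` are richer than this; the function name and sample addresses are hypothetical.

```javascript
// Simplified stand-in for weights: a Map from address string to number.
// This only illustrates "overrides win, other base weights are kept".
function overrideWeightsSketch(baseWeights, overrides) {
  // Copy the base weights so the input is not mutated.
  const result = new Map(baseWeights);
  for (const [address, weight] of overrides) {
    // On conflict, the user override replaces the base weight.
    result.set(address, weight);
  }
  return result;
}

const base = new Map([
  ["github/issue", 2],
  ["github/pull", 4],
]);
const userOverrides = new Map([["github/pull", 8]]);
const merged = overrideWeightsSketch(base, userOverrides);
console.log([...merged.entries()]);
```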

Test plan: I've added unit tests that validate its behavior; take a
look. `yarn test` passes.
2020-01-30 15:47:39 -08:00
Dandelion Mané
f557af9020
Convert load pipeline to pass WeightedGraphs (#1605)
This commit changes `api/load` and downstream consumers to use
WeightedGraphs instead of regular Graphs. In addition to `api/load`, we
also modify the frontends and the timeline cred calculation module.

However, we don't yet _use_ the weights from the WeightedGraph. So as to
make this commit easier to review, it only changes the data type being
passed around; however in practice the consumers ignore the weights and
simply use the underlying graph. A follow-on commit will modify the
consumers so that they properly retrieve weights from within the
WeightedGraph.

This is a major step towards #1557.

Test plan:

`yarn test --full` passes; manual testing verifies that the frontend
still displays cred properly, and that modifying the weights and
re-calculating shows that the weights are being used properly.
2020-01-30 15:42:23 -08:00
Dandelion Mané
b5be554d63
add api/loadWeightedGraph (#1604)
This adds a new module to the api directory which loads a combined
WeightedGraph across all available plugins. This is intended as a key
piece of a future, less-tightly-coupled load pipeline which will produce
WeightedGraphs, as required by #1557.

Test plan:
The "clean" logic (combining graphs, applying transformations,
overriding weights) is tested explicitly. The "unclean" logic, which
involves directly generating graphs from Discourse/GitHub, is untested.
Arguably we could test with mocks, but I'm dubious that doing so would add
real value. I think most of the potential issues (especially
refactoring-induced issues) would get caught by Flow. This is also one
of those "works perfectly or is totally broken" type situations. (Thus,
the likelihood of costly "subtle failures" is low.)
2020-01-30 15:22:57 -08:00
Dandelion Mané
1dd7e7a3c3
add loadWeightedGraph modules for plugins (#1602)
This commit adds `loadWeightedGraph` modules for both the GitHub and
Discourse plugins. They will replace existing (and inconsistently
named) modules which load regular graphs. In addition to loading
the underlying graph, they set weights according to the plugins'
default type-level weights.

The soon-to-be-replaced modules have been marked deprecated.

Another small and vital step towards #1557.

Test plan: The functions that these functions replace are not tested,
because they are IO-heavy composition methods which are painful to test
themselves, and directly depend on well-tested behavior. For the same
reason, no unit tests have been added. Given the nature of the methods
in question, it's unlikely that they'll be subtly broken.
2020-01-30 15:17:32 -08:00
Dandelion Mané
4407c4f9fc
Weights.merge: add support for resolvers (#1597)
This commit adds support for resolvers to `Weights.merge`. The change is
documented and unit tested. Another step towards #1557.
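
As a minimal sketch of what a resolver-based merge could look like, the example below merges simplified weight maps (a `Map` from address string to number) and calls a caller-supplied resolver when the same address appears more than once. The function name, the resolver signature, and the sample data are assumptions for illustration, not the actual `Weights.merge` API.

```javascript
// Merge several weight maps; when the same address appears more than once,
// a caller-supplied resolver decides the combined value.
function mergeWithResolver(weightMaps, resolver) {
  const result = new Map();
  for (const weights of weightMaps) {
    for (const [address, weight] of weights) {
      if (result.has(address)) {
        result.set(address, resolver(result.get(address), weight));
      } else {
        result.set(address, weight);
      }
    }
  }
  return result;
}

const a = new Map([
  ["foo", 2],
  ["bar", 3],
]);
const b = new Map([["foo", 5]]);

// Example resolvers: multiply conflicting weights, or keep the later one.
const multiplied = mergeWithResolver([a, b], (x, y) => x * y);
const lastWins = mergeWithResolver([a, b], (x, y) => y);
console.log(multiplied.get("foo"), lastWins.get("foo"));
```

A "later one wins" resolver is one way to express user overrides on top of base weights.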

Test plan: Inspect included tests; `yarn test` passes.
2020-01-30 10:37:05 -08:00
Dandelion Mané
566ecdd255
identity plugin provides a unified IdentitySpec (#1603)
This commit contains a slight refactor to the identity plugin so that it
provides a unified `IdentitySpec` type which wraps the list of
Identities with the metadata (currently a discourse server url) needed
to interpret those identities. This makes the API slightly nicer to use.

Test plan: Simple refactor; `yarn test` is sufficient.
2020-01-30 10:35:54 -08:00
greenkeeper[bot]
3d3c8c92b3
Update flow-bin to the latest version 🚀 (#1600)
* chore(package): update flow-bin to version 0.117.0

* chore(package): update lockfile yarn.lock

* Fixup flow error

From the [Flow 0.117.0 release notes](https://github.com/facebook/flow/releases/tag/v0.117.0)
> Removed uses of Symbol from libdefs in favor of symbol.

Test plan: `yarn flow`

Co-authored-by: Dandelion Mané <decentralion@dandelion.io>
2020-01-28 18:28:16 -08:00
Dandelion Mané
3f42d72467
Create a node for each Discourse like (#1587)
*Let's use the syntax `(node)` to represent some node, and `> edge >` to
represent some edge.*

In the past, for every like, we would create the following graph
structure:

`(user) > likes > (post)`

As of this commit, we instead create:

`(user) > createsLike > (like) > likes > (post)`
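
The new structure can be sketched as the pair of edges produced for a single like. The record shapes, the `edgesForLike` helper, and the string IDs are hypothetical; they do not reflect the plugin's real address format.

```javascript
// Illustrative only: nodes are "kind/id" strings and edges are plain objects.
function edgesForLike(userId, postId, likeId) {
  const likeNode = `like/${likeId}`;
  return [
    // The user creates the like node...
    {type: "createsLike", src: `user/${userId}`, dst: likeNode},
    // ...and the like node points at the post.
    {type: "likes", src: likeNode, dst: `post/${postId}`},
  ];
}

const likeEdges = edgesForLike("alice", "17", "alice-17");
for (const e of likeEdges) {
  console.log(`(${e.src}) > ${e.type} > (${e.dst})`);
}
```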

We make this change because we want to mint cred for likes. Arguably,
this is more robust than minting cred for activity: something being
liked signals that at least one person in the community found a post
valuable, so you can think of moving cred minting away from raw activity
and towards likes as a sort of implicit "cred review".

Creating a node for each like is a somewhat hacky way to do it--in
principle, we should have a heuristic which increases the cred weight of
a post based on the number of likes it has received--but it is expedient
so we can prototype this quickly.

Obviously, this is not robust to Sybil attacks. If we decide to adopt
this, in the medium term we can add some filtering logic so that e.g. a
user must be whitelisted for their likes to mint cred. (And, in a nice
recursive step, the whitelist can be auto-generated from the last week's
cred scores, so that e.g. every user with at least 50 cred can mint more
cred.) I think it's OK to put in a Sybil-vulnerable mechanism here
because SourceCred is still being designed for high trust-level
communities, and the existing system of minting cred for raw activity is
also vulnerable to Sybil and spam attacks.

Test plan: Unit tests updated; also @s-ben can report back on whether
this is useful to him in demo-ing SourceCred [on MakerDAO][1].

If we merge this, we should simultaneously explicitly set the weight of
like nodes to 0 in our cred instance, so that we can separate merging
this feature from actually changing our own cred (which should go
through a separate review).

[1]: https://forum.makerdao.com/t/possible-data-source-for-determining-compensation
2020-01-28 18:10:24 -08:00