This adds a new module to the api directory which loads a combined
WeightedGraph across all available plugins. This is intended as a key
piece of a future, less-tightly-coupled load pipeline which will produce
WeightedGraphs, as required by #1557.
Test plan:
The "clean" logic (combining graphs, applying transformations,
overriding weights) is tested explicitly. The "unclean" logic, which
involves directly generating graphs from Discourse/GitHub, is untested.
Arguably we could test with mocks, but I'm dubious that doing so would add
real value. I think most of the potential issues (especially
refactoring-induced issues) would get caught by Flow. This is also one
of those "works perfectly or is totally broken" type situations. (Thus,
the likelihood of costly "subtle failures" is low.)
This commit adds `loadWeightedGraph` modules for both the GitHub and
Discourse plugins. They will replace existing (and inconsistently
named) modules which load regular graphs. In addition to loading
the underlying graph, they set weights according to the plugins'
default type-level weights.
The soon-to-be-replaced modules have been marked deprecated.
Another small and vital step towards #1557.
Test plan: The functions that these functions replace are not tested,
because they are IO-heavy composition methods which are painful to test
themselves, and directly depend on well-tested behavior. For the same
reason, no unit tests have been added. Given the nature of the methods
in question, it's unlikely that they'll be subtly broken.
This commit adds support for resolvers to `Weights.merge`. The change is
documented and unit tested. Another step towards #1557.
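A minimal sketch of how a resolver could plug into a merge like this. It assumes a simplified `Weights` shape holding only a `nodeWeights` map keyed by address string, and a hypothetical resolver signature `(existing, incoming) => number`; the actual `Weights.merge` signature and types may differ.

```javascript
// Simplified, hypothetical Weights shape: node weights only, keyed by
// address string. The real Weights type also carries edge weights.
function emptyWeights() {
  return {nodeWeights: new Map()};
}

// Merge several Weights. If an address appears in more than one input,
// combine the values with `resolver(existing, incoming)` when a resolver
// is supplied; otherwise throw, as merge did before resolvers existed.
function merge(weightsList, resolver) {
  const result = emptyWeights();
  for (const weights of weightsList) {
    for (const [address, weight] of weights.nodeWeights) {
      const existing = result.nodeWeights.get(address);
      if (existing === undefined) {
        result.nodeWeights.set(address, weight);
      } else if (resolver != null) {
        result.nodeWeights.set(address, resolver(existing, weight));
      } else {
        throw new Error(`merge conflict at address: ${address}`);
      }
    }
  }
  return result;
}

// Usage: resolve overlaps by multiplying the conflicting weights.
const base = {nodeWeights: new Map([["foo", 2], ["bar", 1]])};
const override = {nodeWeights: new Map([["foo", 3]])};
const combined = merge([base, override], (a, b) => a * b);
console.log(combined.nodeWeights.get("foo")); // 6
```

Keeping "throw" as the default when no resolver is given means callers only opt into silent combination explicitly.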
Test plan: Inspect included tests; `yarn test` passes.
This commit contains a slight refactor to the identity plugin so that it
provides a unified `IdentitySpec` type which wraps the list of
Identities with the metadata (currently a Discourse server URL) needed
to interpret those identities. This makes the API slightly nicer to use.
Test plan: Simple refactor; `yarn test` is sufficient.
* chore(package): update flow-bin to version 0.117.0
* chore(package): update lockfile yarn.lock
* Fixup flow error
From the [Flow 0.117.0 release notes](https://github.com/facebook/flow/releases/tag/v0.117.0)
> Removed uses of Symbol from libdefs in favor of symbol.
Test plan: `yarn flow`
Co-authored-by: Dandelion Mané <decentralion@dandelion.io>
*Let's use the syntax `(node)` to represent some node, and `> edge >` to
represent some edge.*
In the past, for every like, we would create the following graph
structure:
`(user) > likes > (post)`
As of this commit, we instead create:
`(user) > createsLike > (like) > likes > (post)`
We make this change because we want to mint cred for likes. Arguably,
this is more robust than minting cred for activity: something being
liked signals that at least one person in the community found a post
valuable, so you can think of moving cred minting away from raw activity
and towards likes as a sort of implicit "cred review".
Creating a node for each like is a somewhat hacky way to do it (in
principle, we should have a heuristic which increases the cred weight of
a post based on the number of likes it has received), but it is expedient
so we can prototype this quickly.
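A minimal sketch of the new shape, using plain `"kind:id"` strings as hypothetical stand-ins for SourceCred's `NodeAddress` values, and assuming a user can like a given post at most once (so the username plus the post id identifies the like):

```javascript
// Build the nodes and edges contributed by a single like, following
// (user) > createsLike > (like) > likes > (post).
// Addresses are hypothetical "kind:id" strings, not real NodeAddresses.
function likeSubgraph(username, postId) {
  const user = `user:${username}`;
  const post = `post:${postId}`;
  // One node per like, so the like itself can be weighted and mint cred.
  const like = `like:${username}:${postId}`;
  return {
    nodes: [user, like, post],
    edges: [
      {type: "createsLike", src: user, dst: like},
      {type: "likes", src: like, dst: post},
    ],
  };
}

const example = likeSubgraph("alice", 42);
console.log(example.nodes); // [ 'user:alice', 'like:alice:42', 'post:42' ]
```

Compared with the old single `likes` edge from user to post, the extra node is what gives a like somewhere to receive a node weight.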
Obviously, this is not robust to Sybil attacks. If we decide to adopt
this, in the medium term we can add some filtering logic so that e.g. a
user must be whitelisted for their likes to mint cred. (And, in a nice
recursive step, the whitelist can be auto-generated from the last week's
cred scores, so that e.g. every user with at least 50 cred can mint more
cred.) I think it's OK to put in a Sybil-vulnerable mechanism here
because SourceCred is still being designed for high-trust
communities, and the existing system of minting cred for raw activity is
also vulnerable to Sybil and spam attacks.
Test plan: Unit tests updated; also @s-ben can report back on whether
this is useful to him in demo-ing SourceCred [on MakerDAO][1].
If we merge this, we should simultaneously explicitly set the weight of
like nodes to 0 in our cred instance, so that we can separate merging
this feature from actually changing our own cred (which should go
through a separate review).
[1]: https://forum.makerdao.com/t/possible-data-source-for-determining-compensation
This adds a simple method, `weightsForDeclaration`, which generates
weights from a plugin declaration. This is a small but important piece
of #1557, as it allows us to create appropriate Weights cleanly from the
plugin.
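A minimal sketch of the idea, assuming a simplified declaration whose node and edge types each carry a `prefix` and a `defaultWeight` (edge weights as `{forwards, backwards}`); the example declaration is invented for illustration and is not the exact SourceCred type. Since types are identified by address prefix, each type-level default becomes one prefix-keyed entry.

```javascript
// Turn the type-level default weights in a (simplified) plugin
// declaration into prefix-keyed weight maps.
function weightsForDeclaration(declaration) {
  const nodeWeights = new Map();
  for (const {prefix, defaultWeight} of declaration.nodeTypes) {
    nodeWeights.set(prefix, defaultWeight);
  }
  const edgeWeights = new Map();
  for (const {prefix, defaultWeight} of declaration.edgeTypes) {
    // Edge weights are directional: {forwards, backwards}.
    edgeWeights.set(prefix, defaultWeight);
  }
  return {nodeWeights, edgeWeights};
}

// Hypothetical declaration, for illustration only.
const exampleDeclaration = {
  name: "example",
  nodeTypes: [
    {name: "Post", prefix: "example/post/", defaultWeight: 1},
    {name: "User", prefix: "example/user/", defaultWeight: 0},
  ],
  edgeTypes: [
    {
      forwardName: "authors",
      prefix: "example/authors/",
      defaultWeight: {forwards: 1, backwards: 0.5},
    },
  ],
};
```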
Test plan: Unit tests added; `yarn test` passes.
This commit adds a `contractIdentities` method to the Identity plugin,
which allows contracting a WeightedGraph using the provided identities.
The method does not attempt to contract weights together, although as a
safety check it will error if weights have been explicitly provided for
any of the contracted nodes.
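A minimal sketch of a contraction with the safety check described above, using plain string addresses and a hypothetical `{old, replacement}` contraction format. It is illustrative only and not the actual `_contractWeightedGraph` implementation.

```javascript
// Hypothetical sketch: contract a weighted graph, refusing to touch any
// node that was given an explicit weight (weights are not merged).
function contractWeightedGraphSketch(weightedGraph, contractions) {
  const {nodes, edges, nodeWeights} = weightedGraph;
  const replacementFor = new Map();
  for (const {old, replacement} of contractions) {
    for (const address of old) {
      // Safety check: contracting a weighted node would silently drop
      // or misattribute its weight, so fail loudly instead.
      if (nodeWeights.has(address)) {
        throw new Error(`cannot contract explicitly weighted node: ${address}`);
      }
      replacementFor.set(address, replacement);
    }
  }
  const rewrite = (address) =>
    replacementFor.has(address) ? replacementFor.get(address) : address;
  return {
    // A Set drops duplicates once several accounts map to one identity.
    nodes: Array.from(new Set(nodes.map(rewrite))),
    edges: edges.map((edge) => ({
      ...edge,
      src: rewrite(edge.src),
      dst: rewrite(edge.dst),
    })),
    nodeWeights,
  };
}
```

Passing `nodeWeights` through unchanged is safe only because of the check: no contracted address can appear in it.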
This PR replaces #1591; see that pull for some context on why this
method is defined on the identity plugin rather than as part of the
WeightedGraph module.
Test plan: For ease of testing, `contractIdentities` is a thin wrapper
around `nodeContractions` (which is already tested) and a new private
`_contractWeightedGraph` method (for which tests have been added). Since
`contractIdentities` is a trivial one-line composition, it does not need
any additional explicit testing.
`yarn test` passes.
This module is a simple data type which contains a graph and associated
weights. It provides methods for JSON (de)serialization, constructing
new WeightedGraphs, and for merging them.
Test plan: See included unit tests. The unit tests are simple because
the data type and associated methods are quite simple; the underlying
functions for Graphs and Weights have more extensive testing.
Progress towards #1557.
scripts/update_snapshots.sh is intended as a general-purpose snapshot
updater for SourceCred. Currently, it includes updating Discourse
snapshots, but only if an obsolete Discourse API key is present.
Updating Discourse snapshots is very noisy, because the API responses
are not stable (they include the view count, which increments when
making API requests). Also, most times when we want to update our
snapshots, it's because we changed some core data structure, not because
we actually want new data from Discourse. Therefore, we should
disconnect the Discourse snapshot update process from the general
snapshot updating script.
Test plan:
Run `./scripts/update_snapshots.sh` and verify that it does not produce
Discourse update churn. Run
`./src/plugins/discourse/update_discourse_api_snapshots.sh` and verify
that it does update all the Discourse snapshots.
As discussed in [this GitHub comment][1], it doesn't make sense for user
node types (or user nodes) to have non-zero weight. The reason is that
we use weights for minting cred. Minting cred to users in general
doesn't make sense (having more user accounts is not intrinsically
valuable to a project) and minting cred to specific users is
inappropriate (it means that users' cred is being determined by their
power to influence the weights, rather than because of the value of
their contributions).
This commit makes two changes:
- It sets the default weight for all user types to 0. This has no
implications for cred, since the user weights were already (implicitly)
discarded because users all have null timestamps.
- It filters user node types from the weight config, so the UI no longer
incorrectly suggests that user node weights can be meaningfully changed.
As a result of the second change, the identity plugin now displays in
the weight change UI but has no node or edge types associated with it.
In a follow-on commit, we may want to add a bit of filtering logic to
clean that up.
Test plan:
Setting the default weights to 0 for the user types has no effect on
cred, as can be manually ascertained by taking an existing cred
instance, changing the user type weights, and re-calculating.
Filtering the user node types from the WeightConfig is validated through
manual inspection testing.
I've found that frontend unit testing of changes like this has limited
value; since there aren't subtle edge cases to validate, and regressions
are unlikely, I don't think we need a unit test at this time. Therefore,
I haven't added formal tests.
[1]: https://github.com/sourcecred/sourcecred/pull/1591#discussion_r370951707
This adds a `merge` method to the weights module, which allows combining
multiple weights together. If any of the weights overlap (i.e. the same
address is specified in multiple Weights), then an error will be thrown.
In the future, we will likely extend the API so that the client can
specify how to resolve cases where the same address is present in
multiple weights.
The method is thoroughly unit tested.
This is part of work on #1557 (we'll want to be able to merge
`WeightedGraph`s.)
Test plan: Run `yarn test`
Currently, responsibility for the SourceCred directory is spread
out across different places: some in core/project_io, some in
api/load, and some in the plugins.
This class is intended to centralize that IO using simple
interfaces we can depend on (and mock) instead.
`empty` is a more descriptive name than `defaultWeights` for a `Weights`
object that has no weights set.
In every case where we were importing `defaultWeights` as a direct
symbol, I switched to importing the whole module, as usage of
`Weights.empty` makes it clear that the empty object returned will be an
empty weights (as opposed to an empty list or some other empty type).
This is as proposed in the reviews from #1538.
Test plan: It's just a rename and change in imports, so `yarn flow` would
catch any error. `yarn test` passes.
Note: this is a port of #1583, which merged to the wrong branch.
Currently, to produce a GitHub graph from a populated mirror
there is an unexpected dependency on a GithubToken. See #1580.
This is step 1 to remove the dependency. It will allow us to
locate the Database without a GithubToken.
As part of #1557, I want to move the concept of weights into core, so
that plugins can produce a WeightedGraph rather than raw Graph. This
will allow us to do cred computation directly on the data we get from
the plugins, without recourse to plugin metadata.
Test plan: It's a simple file move; `yarn test` suffices.
The upcoming DataDirectory class will use stable stringify too.
But since that will affect the snapshots, make sure those are
updated before we switch to the new load implementation.
Logs network requests and responses into a table in the db. Also logs
an `UpdateId` that links the mutation to the network request/response
that generated it.
This is a breaking change to the Mirror database format; existing caches
will need to be regenerated.
Resolves #1450.
Test Plan:
Unit tests added, covering the core functionality and integration into
the public APIs. After loading a project and manually inspecting it in
an sqlite shell, the results in the `network_log` table looked
reasonable.
Resolves #1317.
Updates timeline cred to handle the case where the scoring nodes' total
cred sums to zero in an interval. In practice, we've encountered this
circumstance when a github.io repository, such as sfosc.github.io,
contains timestamps that predate any user's contributions by several
weeks.
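A simplified sketch of the guard, assuming each interval's cred is split among scoring nodes in proportion to their scores. The helper `distributeIntervalCred` is a hypothetical stand-in, not the actual timeline cred code; it only shows why a zero total needs special handling.

```javascript
// Split an interval's cred among nodes in proportion to their scores.
// Hypothetical helper: `scores` holds one non-negative score per node.
function distributeIntervalCred(scores, intervalCred) {
  const total = scores.reduce((sum, score) => sum + score, 0);
  if (total === 0) {
    // Nothing to be proportional to: give every node 0 instead of
    // computing 0 / 0, which would yield NaN.
    return scores.map(() => 0);
  }
  return scores.map((score) => (score / total) * intervalCred);
}

console.log(distributeIntervalCred([1, 3], 8)); // [ 2, 6 ]
console.log(distributeIntervalCred([0, 0], 8)); // [ 0, 0 ]
```

Returning zeros for an empty score list falls out of the same branch, which mirrors the consistency goal in the test plan below.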
Test Plan:
- Added a test case to handle this circumstance
- Updated a test case per discussion on #1317 to return a cred score
of 0 for all nodes in all intervals when there are no scoring nodes
passed to the function, so that we handle these cases consistently.
Also loaded sfosc.github.io and observed that the cred output appeared
to match expectations and didn't contain any `NaN` or `Infinity` values
as it did before.
Summary:
We’ve had this policy unspoken since the beginning; making it explicit
as a lint rule makes it one less thing to think about. The few existing
offenders can almost all be changed with no value lost; one is an extern
that needs a lint suppression.
Test Plan:
That Flow and tests still pass suffices.
wchargin-branch: lint-camelcase
I'm currently on a quest to separate cred computation away from any
plugin metadata (see #1557). This means we need a way to represent node
and edge weights without any explicit concept of 'types'.
This commit is a first step towards that. It removes the distinction
between 'type weights' and 'manual weights' in the weights data type.
Instead, we now just have node weights and edge weights. In contrast to
before, all weights are now interpreted as prefix matchers, so a
single node or edge may match multiple weights; when this occurs, the
weights compose multiplicatively.
Since types were already identified by prefix, if a plugin wants to
assign a weight to a particular type, it may do so by specifying a
weight for that type's prefix. As before, it's possible to have a
type-level weight and a weight on a specific node, and compose them
multiplicatively.
As an added bonus, we could now sensibly have 'plugin-level' weights and
'type-level weights' and compose them multiplicatively. Thus, if we
realized that the Foo plugin is undervalued relative to the Bar plugin,
we could increase the Foo weight rather than needing to adjust all of
its types individually.
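To make the composition rule concrete, here is a minimal sketch. It assumes addresses are arrays of string parts and that a node matching no weight gets a default of 1; both are simplifications for illustration, not the actual `Weights` implementation. Each matching prefix contributes a factor, so plugin-, type-, and node-level weights multiply together.

```javascript
// True if `prefix` is a part-wise prefix of `address`.
function isPrefix(prefix, address) {
  return (
    prefix.length <= address.length &&
    prefix.every((part, i) => part === address[i])
  );
}

// Multiply together every weight whose prefix matches the address.
// Assumption: with no matching weight, the result is 1.
function nodeWeight(address, weights) {
  let result = 1;
  for (const {prefix, weight} of weights) {
    if (isPrefix(prefix, address)) {
      result *= weight;
    }
  }
  return result;
}

const exampleWeights = [
  {prefix: ["foo"], weight: 2}, // plugin-level
  {prefix: ["foo", "issue"], weight: 3}, // type-level
  {prefix: ["foo", "issue", "42"], weight: 0.5}, // a single node
];
console.log(nodeWeight(["foo", "issue", "42"], exampleWeights)); // 3
console.log(nodeWeight(["foo", "pull", "7"], exampleWeights)); // 2
```

Matching part-wise rather than on raw strings means a `["foo"]` prefix does not accidentally match a `["foobar"]` address.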
So as to keep the scope for this commit somewhat manageable, I modified
the underlying data type for Weights, but not any of the cred
computation interfaces. The weights pipeline still takes the plugin
declarations, and we still get the default type-level weights from the
plugin's types. A future commit will modify the pipeline so that the
plugins provide default weights alongside the Graph.
I deliberately did not provide an upgrade handler for the old style
weights JSON. This is sensible as the semantics are now different. In
the past, it was possible to specify a weight for a single node without
affecting the weights of other nodes whose addresses have the first
node's address as a prefix. Since this is no longer possible, there is
no universally "correct" way to handle the old weights files. In
practice, there are so few users that it is not a big deal either way.
Test plan:
This change has implications across the codebase and UI. In addition to
`yarn test --full` passing, I verified that:
- updating and recomputing works in the mainline UI
- updating and recomputing works in the legacy UI
- downloading weights from the UI and then explicitly loading them still
works
As a RelationalView is not designed for multiple repositories, we
should implement our own merging of mappings obtained from
RelationalViews.
fromRelationalViews is a factory function which does this for us.
Accepting an array of RelationalViews also makes it more apparent
that it should be used this way.
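A minimal sketch of the merging step, assuming each RelationalView has already been reduced to a `Map` from URL to node address (what `urlReferenceMap` is described as producing). The name `mergeUrlReferenceMaps` and the choice to throw on conflicting addresses are assumptions for illustration, not the actual `fromRelationalViews` behavior.

```javascript
// Hypothetical helper: combine per-view URL -> address maps into one map.
function mergeUrlReferenceMaps(maps) {
  const merged = new Map();
  for (const map of maps) {
    for (const [url, address] of map) {
      const existing = merged.get(url);
      // The same URL seen in two views must point at the same node.
      if (existing !== undefined && existing !== address) {
        throw new Error(`conflicting addresses for ${url}`);
      }
      merged.set(url, address);
    }
  }
  return merged;
}
```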
RelationalView provides easy access to ReferentEntities, which we
can use for reference detection. The map produced by
urlReferenceMap can be used easily by a MappedReferenceDetector.
As discussed in #1532 we'll use RelationalView for this before
deprecating it.
Currently, we have robust GitHub token validation logic. However, at a
type level, usage of this logic is unenforced, so many places in the
codebase don't use validation; most crucially, the `Common.githubToken`
method doesn't, which means that the CLI doesn't validate GitHub tokens.
Instead, `Common.githubToken` currently provides a deceptive signature:
`function githubToken(): string | null`
One might reasonably think that the presence of a string means that
there is a GitHub token, and that you can test `if (token != null)`.
However, a command line user can easily provide an empty string:
`SOURCECRED_GITHUB_TOKEN=null node bin/sourcecred.js load ...`
In this case, the user was trying to unset the GitHub token, but this
actually provides a string-y GitHub token, so at a type level, it looks
like a GitHub token is present.
No more! This commit adds `opaque type GitHubToken: string = string` in
the `github/token.js` module. Since the type is opaque, it only has one
legal constructor: the `validateToken` method in `github/token.js`. The
functions that actually use the token have been updated to require this
type. Therefore, we now enforce at the type level that every usage of a
GitHub token needs to be validated, ensuring that we no longer confuse
empty strings for valid GitHub tokens.
Note that making the GitHub token an opaque subtype of string
(`GitHubToken: string`) is important because it means that consumers can
still pass or store the token as a string; however, no fresh ones can be
constructed except by the validator.
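Plain JavaScript has no opaque types, so this sketch shows only the runtime half of the idea: a single validating constructor, plus an environment reader that treats an empty variable as "no token". The 40-lowercase-hex-character rule is an assumption for illustration (it matches older GitHub personal access tokens), not necessarily what `validateToken` in `github/token.js` checks, and `githubTokenFromEnv` is a hypothetical name.

```javascript
// Hypothetical format rule, for illustration only.
const TOKEN_PATTERN = /^[0-9a-f]{40}$/;

// The single place a token value is "blessed". In the Flow module, this
// is where the opaque GitHubToken type would be produced.
function validateToken(token) {
  if (!TOKEN_PATTERN.test(token)) {
    // Deliberately avoid echoing the token into logs.
    throw new Error("invalid GitHub token");
  }
  return token;
}

// Read the token from an environment-like object. Unset and empty values
// both mean "no token"; every other value must pass validation.
function githubTokenFromEnv(env) {
  const raw = env.SOURCECRED_GITHUB_TOKEN;
  if (raw == null || raw === "") {
    return null;
  }
  return validateToken(raw);
}
```

Under this sketch, the `SOURCECRED_GITHUB_TOKEN=null` invocation shown earlier fails loudly instead of yielding a token-shaped string.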
Test plan: Implementation-wise, this is a simple refactor; `yarn test`
passes.
We defined a DiscourseQueries interface, intended as a subset of
the Discourse plugin's MirrorRepository methods. This subset is
used by the Initiatives plugin to source Initiative data.
We're now adding the new methods it needed to the MirrorRepository.
In the early days of the project, we used GitHub repository ids as the
core way of identifying projects. This was a weird choice, since it's
so specific to the GitHub plugin. In #1238 we added a (theoretically)
agnostic type in `Project`, although in practice it's still pretty
coupled. Still, it will be best to move the `RepoId` type out of `core`
and to the GitHub plugin where it belongs.
This leaves a few "awkward" imports from plugin code (e.g. in the api
module), but generally the places that are importing `RepoId` were
already importing stuff from Discourse and Identity plugins. In either
case, I'd rather have the awkwardness of depending on the RepoId in core
places be obvious (from the dependency on plugin code) rather than
giving a false appearance that RepoIds are really a core concept.
Test plan: `yarn test` passes.
After we wrote our Docker jobs, a `cache_from` parameter was added
to the orb. Using it is shorter and less error-prone, because we
don't need to provide the identical arguments twice.
The change was made in
https://github.com/CircleCI-Public/docker-orb/pull/27
by @vsoch, and was released in v0.5.15 of the Docker orb.
This change updates to the latest version of the orb currently
available.
Fixes #1512
In contrast with our previous "tags only" deploy job, this
configuration makes sure all its preceding jobs are also
set to a "tags only" filter, so that the deploy job can run.
Note, adding support in this single function doesn't solve some of the greater issues with HTTP/HTTPS. Because the protocol is included in the node addresses, converging nodes or canonicalizing on either protocol would be important for instances that support HTTP. That problem is outside of scope for the reference detector though.
Summary:
This commit adds a simple Python server for connecting the output of
`yarn api` (or `yarn api --watch`) to an Observable notebook. We need a
custom server rather than just `python3 -m http.server` to send CORS
headers properly. This server enables a very tight loop from editing
SourceCred core code on your local filesystem to seeing live updates in
an Observable notebook, with latency on the order of one second.
Test Plan:
Run `yarn api --watch` in the background. Launch the new API server.
Navigate to <https://observablehq.com/demo>. Copy the two paragraphs of
Observable code from `scripts/serve_api.py` into _separate_ Observable
cells, and execute them. Note that `myGraph` becomes a valid SourceCred
graph. Modify `src/core/graph.js` to add `this._aaa = 123;` to the top
of the `Graph` constructor. Re-execute the first Observable cell (the
one that loads the SourceCred module), and note that `myGraph` updates
to include the new `_aaa` attribute:
![Screenshot of Observable notebook after test plan][ss]
[ss]: https://user-images.githubusercontent.com/4317806/71958748-dddf8680-31a5-11ea-9016-5df76ceeea46.png
wchargin-branch: api-server
Summary:
This re-packages the build for the internal APIs exposed under #1526 to
be more browser-friendly. Removing `target: "node"` (and adding an
explicit `globalObject: "this"` for best-effort cross-compatibility) is
the biggest change from the backend build; removing all the extra
loaders and static site generation is the biggest change from the
frontend build.
This build configuration is forked from `webpack.config.backend.js`.
Test Plan:
Run `yarn api`, then upload the contents of `dist/api.js` to an
Observable notebook and require it as an ES module. Verify that the
SourceCred APIs are exposed: e.g., `sourcecred.core.graph.Graph` should
be a valid constructor.
wchargin-branch: api-build
The `pagerankGraph` module was an attempt to do a better job of
co-ordinating the data needed to run Pagerank, by wrapping the Graph
class alongside context on edge weights, etc. However, it was obsoleted
by work on TimelineCred. Thus, we can remove it entirely. I intend to
make another attempt at collecting all the data needed for cred analysis
in a way that doesn't couple with plugin code, and this time it will be
timeline-aware.
Test plan: `yarn test`
Summary:
For convenient import by scripts and Observable notebooks that want to
use SourceCred code outside its normal build system. We export a subset
of the codebase, including some core data structures and algorithms and
also some plugin metadata, but no plugin loading code.
To build, run `yarn backend` (or `yarn backend --watch`), then grab the
new `bin/api.js` file.
Test Plan:
Sample usage, with normal Node:
```javascript
const {
  core: {
    graph: {Graph, NodeAddress, EdgeAddress},
  },
} = require("./api").default;

function node(address) {
  return {
    address,
    description: "blurgh",
    timestampMs: -1,
  };
}

const g = new Graph();
g.addNode(node(NodeAddress.fromParts(["people", "alice"])));
g.addNode(node(NodeAddress.fromParts(["people", "bob"])));
g.addEdge({
  address: EdgeAddress.fromParts(["friendship"]),
  src: NodeAddress.fromParts(["people", "alice"]),
  dst: NodeAddress.fromParts(["people", "bob"]),
  timestampMs: 0,
});
console.log(require("json-stable-stringify")(g));
```
This prints a valid graph JSON object.
wchargin-branch: api-bundle
Before we added the concept of "SourceCred Projects", we tracked cred
instances via their GitHub repository id. The replacement for this
system was added in #1238, but I missed the RepoIdRegistry in the cleanup.
This commit removes all code pertaining to the now-obsolete
RepoIdRegistry.
Test plan:
- `yarn test --full` passes
- manual inspection of `yarn start`; it still loads properly
- manual inspection of the output for build_static_site.sh
- `git grep repoIdRegistry` returns no hits
Summary:
PRs created from forks don’t have credentials when running CI. This
commit causes the `test-full` job (which requires credentials) to fail
fast with a helpful error message.
Test Plan:
Push distinct versions of this commit to a fork and to the main
repository, and open pull requests for each. Note that the tests pass
from the main repository, but fail with a nice message from the fork:
![Screenshot of expected fast-fail behavior][ss]
The “team member pushes to trusted branch” workflow has already been
successfully exercised for #1521.
[ss]: https://user-images.githubusercontent.com/4317806/71707839-b782ab00-2da1-11ea-8aa9-7d8720538a87.png
wchargin-branch: forked-pr-fail-fast
See #1512 for full context.
Short explanation:
Because the job wants to run only on tag pushes, but requires
the `test` job (which doesn't run on tag pushes), the job
will never run.
Gets the username of a user, if it exists.
Helpful for fixing capitalization issues such as #1479,
and verifying the user exists for reference detection.
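As an illustration of the capitalization fix, here is a sketch that resolves a possibly mis-capitalized name to its canonical form, assuming usernames compare case-insensitively. The in-memory list of known usernames and the name `findUsername` are hypothetical; a mirror-backed version would presumably query its database instead.

```javascript
// Return the canonically capitalized username matching `name`, or null
// if no such user is known. `knownUsernames` is a hypothetical input.
function findUsername(knownUsernames, name) {
  const target = name.toLowerCase();
  for (const username of knownUsernames) {
    if (username.toLowerCase() === target) {
      return username;
    }
  }
  return null;
}
```

The `null` result doubles as the existence check that reference detection needs.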
Previously both node versions would share the same cache.
This caused one of the two versions to always rebuild
the `better-sqlite3` package, costing about 1 min per job.
Now that we're using a different cache key for each version,
rebuilding a cached `better-sqlite3` should no longer be
necessary.
The TranslatingReferenceDetector is an abstraction that is particularly useful for
the Initiatives reference detector, which should use the Discourse reference
detector as its base and translate the node address of the returned Discourse
topic to the initiative's node address.
The current reference detection implementation internal to the GitHub plugin
uses a map similar to this. Because this class closely mirrors that approach, it should be easy to adopt.
It's also very simple to use for tests.
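A minimal sketch of how a map-based detector (the `MappedReferenceDetector` mentioned earlier) and a `TranslatingReferenceDetector` could compose for the Initiatives use case. The method name `addressFromUrl`, the string addresses, and the example URL are assumptions for illustration, not the actual interface.

```javascript
// Looks up a URL in a fixed Map from URL string to node address.
class MappedReferenceDetector {
  constructor(map) {
    this.map = map;
  }
  addressFromUrl(url) {
    return this.map.get(url); // undefined when the URL is unknown
  }
}

// Delegates to a base detector, then translates whatever it returns.
class TranslatingReferenceDetector {
  constructor(base, translate) {
    this.base = base;
    this.translate = translate;
  }
  addressFromUrl(url) {
    const address = this.base.addressFromUrl(url);
    return address === undefined ? undefined : this.translate(address);
  }
}

// Usage: map a Discourse topic URL to a hypothetical initiative address.
const topics = new MappedReferenceDetector(
  new Map([["https://discourse.example.com/t/123", "discourse/topic/123"]])
);
const topicToInitiative = new Map([["discourse/topic/123", "initiative/123"]]);
const initiatives = new TranslatingReferenceDetector(topics, (address) =>
  topicToInitiative.get(address)
);
console.log(initiatives.addressFromUrl("https://discourse.example.com/t/123")); // initiative/123
```

Keeping translation separate from lookup lets the Initiatives detector reuse the Discourse detector unchanged.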
The core declaration of the ReferenceDetector interface.
The reason I'm adding an index.js file is to allow (core) classes that implement
this interface to live in separate files, while keeping redundancy out of the
import statements.