This adds rate limiting to the Discourse fetch logic, so that we
can actually load nontrivial servers without getting a 429 failure.
We could have used retry; I thought it was more polite to actually limit
the rate at which we make requests. However, to avoid seeing 429s in
practice, I left a bit of a buffer: we make only 55 requests per minute,
although 60 would be allowed.
If we want to improve Discourse loading time, we could boost up to the
full 60 requests/min, but add in retries. (Or we could switch to retries
entirely.)
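To illustrate the idea of limiting the request rate up front rather than retrying, here is a minimal sliding-window sketch. It is not the code in this commit: `RateLimiter`, `tryAcquire`, and the explicit `now` argument are hypothetical names chosen for the example.

```javascript
// Hypothetical sketch of a sliding-window rate limiter; not the code from
// this commit. It allows at most `maxRequests` per `windowMs`.
class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.timestamps = []; // times of the requests still inside the window
  }

  // Returns how long (in ms) a request made at time `now` must wait.
  // A result of 0 means the request may proceed, and it is recorded.
  tryAcquire(now) {
    // Drop requests that have aged out of the window.
    while (
      this.timestamps.length > 0 &&
      now - this.timestamps[0] >= this.windowMs
    ) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.maxRequests) {
      this.timestamps.push(now);
      return 0;
    }
    return this.timestamps[0] + this.windowMs - now;
  }
}

// 55 requests per minute: a buffer below the 60 that the server allows.
const limiter = new RateLimiter(55, 60 * 1000);
let accepted = 0;
for (let i = 0; i < 60; i++) {
  if (limiter.tryAcquire(1000) === 0) accepted++;
}
console.log(accepted); // how many of the 60 attempts were allowed at once
```

A caller would sleep for the returned number of milliseconds before trying again; passing the clock in explicitly keeps the sketch easy to test.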
Test plan: This logic is untested; however, my full discourse-plugin
branch uses it to do full Discourse loads without issue.
Adding Docker container recipe and instructions in README for running SourceCred
Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>
Test plan: @decentralion verified that the commands work on a fresh setup prior to merging.
Summary:
Generated by manually deleting the three `lodash` paragraphs from the
lockfile and then re-running `yarn`.
Test Plan:
Prior to this commit, running `yarn audit` noted 3011 high-severity
vulnerabilities; now, it notes none. Running `yarn test --full` still
passes.
wchargin-branch: security-upgrade-lodash
Summary:
In #1194, we upgraded Prettier from 1.13.4 to 1.18.2, but this upgrades
past <https://github.com/prettier/prettier/pull/5647>, which was first
released in Prettier 1.16.0. This commit fixes the uses of deprecated
code introduced as a result. It also upgrades the type definitions to
match, via `flow-typed install prettier@1.18.2`.
Addresses part of #1308.
Test Plan:
Prior to this commit, running `yarn unit` would print
```
console.warn node_modules/prettier/index.js:7934
{ parser: "babylon" } is deprecated; we now treat it as { parser: "babel" }.
```
in two test cases; it no longer prints any such warnings. Furthermore,
running `git grep 'parser.*babylon'` no longer finds any matches.
wchargin-branch: prettier-deprecations
Summary:
This dependency was added in #1249 without typedefs, and so is
implicitly `any`-typed.
Depends on #1309 to fix a bug that would otherwise be a true positive
type error.
Addresses part of #1308.
Generated with `flow-typed install deep-freeze@0.0.1`.
Test Plan:
Running `yarn flow` passes, but fails if you remove the `nodePrefix` or
`edgePrefix` attributes of the Discourse plugin declaration.
wchargin-branch: libdefs-deep-freeze
Summary:
A `PluginDeclaration` must have a `nodePrefix` and an `edgePrefix`, but
the Discourse plugin declaration was missing these. This was not caught
by Flow because `deep-freeze` was introduced in #1249 without type
definitions; see #1308.
Test Plan:
Apply the following patch:
```diff
diff --git a/src/plugins/discourse/declaration.js b/src/plugins/discourse/declaration.js
index 246a0a28..36ae5f13 100644
--- a/src/plugins/discourse/declaration.js
+++ b/src/plugins/discourse/declaration.js
@@ -1,6 +1,6 @@
// @flow
-import deepFreeze from "deep-freeze";
+declare function deepFreeze<T>(x: T): T;
import type {PluginDeclaration} from "../../analysis/pluginDeclaration";
import type {NodeType, EdgeType} from "../../analysis/types";
import {NodeAddress, EdgeAddress} from "../../core/graph";
```
Note that, with this patch, `yarn flow` fails before this change but
passes after it. Running `yarn unit` still passes.
wchargin-branch: discourse-plugin-prefixes
Summary:
Generated by running `flow-typed install --skip --overwrite` and
reverting a minimal set of libdefs such that the change does not
introduce any Flow errors (except Prettier, which is covered by #1307).
Addresses parts of #1308.
Changes:
- `chalk`: upgraded v1.x.x to v2.x.x
- `flow-bin`: no-op; explicit Flow window widening
- `isomorphic-fetch`: no-op; formatting change
- `jest`: updates for Flow v0.104.x (explicit inexact objects), and
also some functional additions
- `object-assign`: no-op; explicit Flow window widening
- `rimraf`: added new at v2.x.x
Test Plan:
Flow passes, by construction.
wchargin-branch: libdefs-clean
Summary:
These can be updated cleanly after applying the SourceCred-specific
patch. I’ve modified the comment on that patch to be clear that it *is*
SourceCred-specific—after updating, I spent a while trying to find why
it was deleted from upstream, before eventually realizing that it never
existed upstream anyway.
Generated by running `flow-typed install express@4.16.3 --overwrite` and
then manually inserting the three “SourceCred-specific hack” comment
blocks.
Addresses part of #1308.
Test Plan:
Running `yarn flow` still passes (but warns if the hacks are removed).
wchargin-branch: libdefs-express
Summary:
These can be updated cleanly now that an upstream pull request has been
merged: <https://github.com/flow-typed/flow-typed/pull/3522>
Generated by running `flow-typed install enzyme@3.3.0 --overwrite`.
Addresses part of #1308.
Test Plan:
Running `yarn flow` still passes.
wchargin-branch: libdefs-enzyme
Summary:
All links in SourceCred must use the `Link` component, providing either
an external URL `href={…}` or an internal route `to={…}`. Any uses of a
raw `<a>` element for internal routes will incur 404s when the
application is hosted on a non-root path, as is currently the case on
the production website.
The change to `FileUploader` is not strictly necessary, as the link has
no styled text and uses a `data:` URL, but there’s no reason not to.
Fixes #1304.
Test Plan:
Build the static site:
```
scripts/build_static_site.sh --target cred --project sourcecred/example-github
```
Then run `python3 -m http.server` from the repository root directory—not
the `cred/` subdirectory—and navigate to the timeline cred view:
<http://localhost:8000/cred/timeline/sourcecred/example-github/>
Observe that the “(legacy)” link now has the correct styling and
correctly navigates to the legacy mode page when clicked: prior to this
change, it would navigate to a URL without the proper `/cred/` path
prefix, yielding a 404. On the legacy page, verify that the “timeline
mode” link has the same properties.
Then, visit <http://localhost:8000/cred/test/FileUploader/> and verify
that the inspection test still passes.
Added a regression test to catch further such errors. Note that
reverting the code changes in this commit causes the test to fail, and
that running it with `--verbose` prints the problematic files.
wchargin-branch: fix-bad-routing-404s
Summary:
This is firing on a production page load of the “prototype” link from
the homepage, and does not seem to actually be an error condition.
Test Plan:
Run `yarn start`, navigate to `/timeline/sourcecred/example-github/`,
and observe that the console error has disappeared.
wchargin-branch: defaultloader-console-error
Summary:
When inserting a “like” action with `INSERT OR IGNORE` semantics, we
also learn whether the action had any effect. We can use this bit to
avoid a separate query checking whether the “like” already exists.
As mentioned here:
<https://github.com/sourcecred/sourcecred/pull/1298#discussion_r314994911>
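A minimal sketch of the idea, assuming a statement object shaped like a better-sqlite3 prepared statement (whose `run` returns an object with a `changes` count). The in-memory stand-in and the `username`/`postId` fields are hypothetical, included only so that the sketch runs without a database:

```javascript
// Sketch only. `insertLikeStmt` is assumed to behave like a better-sqlite3
// prepared statement for an `INSERT OR IGNORE` into the likes table.
function addLike(insertLikeStmt, like) {
  const info = insertLikeStmt.run(like);
  // `changes` is 0 when the row already existed and the insert was ignored,
  // so no separate "does this like exist?" query is needed.
  return {changed: info.changes > 0};
}

// Hypothetical in-memory stand-in for the statement.
function fakeInsertOrIgnore() {
  const seen = new Set();
  return {
    run({username, postId}) {
      const key = `${username}/${postId}`;
      if (seen.has(key)) return {changes: 0};
      seen.add(key);
      return {changes: 1};
    },
  };
}

const stmt = fakeInsertOrIgnore();
const first = addLike(stmt, {username: "alice", postId: 7});
const second = addLike(stmt, {username: "alice", postId: 7});
console.log(first.changed, second.changed);
```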
Test Plan:
Running `yarn test` passes as is, and fails if you change `addLike` to
always return either `changed: true` or `changed: false`.
wchargin-branch: discourse-likes-one-query
Summary:
Calling `db.prepare(sql)` parses the text in `sql` and compiles it to a
prepared statement. This takes time, both for the parsing and allocation
itself and for the context switch from JavaScript to C (SQLite).
Prepared statements are designed to be invoked multiple times with
different bound values. This commit factors prepared statement creation
out of loops so that each call to `update` prepares only a constant
number of statements.
In doing so, we naturally factor out some light JS abstractions over the
raw SQL: `addTopic((topic: Topic))`, rather than `addTopicStmt.run(…)`.
In principle, these could be factored out of `update` entirely to
properties set on the class at initialization time, but, as described in
a comment on the GraphQL mirror, we defer this optimization for now as
it introduces additional complexity.
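The pattern can be sketched as follows. This is an illustration rather than the code in this commit: `db` is assumed to expose `prepare(sql)` returning a statement with `run(params)`, as better-sqlite3 does, and the table, columns, and counting stand-in are hypothetical.

```javascript
// Sketch only: prepare once, outside the loop, then reuse the statement.
function addTopics(db, topics) {
  const addTopicStmt = db.prepare(
    "INSERT OR REPLACE INTO topics (id, title) VALUES (:id, :title)"
  );
  // A light JS abstraction over the raw statement.
  const addTopic = (topic) =>
    addTopicStmt.run({id: topic.id, title: topic.title});
  for (const topic of topics) {
    addTopic(topic);
  }
}

// Hypothetical counting stand-in for a database, to show the effect.
const counts = {prepare: 0, run: 0};
const fakeDb = {
  prepare(_sql) {
    counts.prepare++;
    return {
      run(_params) {
        counts.run++;
      },
    };
  },
};
addTopics(fakeDb, [
  {id: 1, title: "a"},
  {id: 2, title: "b"},
  {id: 3, title: "c"},
]);
console.log(counts);
```

However many topics are passed, `prepare` is called a constant number of times, while `run` is called once per topic.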
Test Plan:
Running `yarn test --full` passes.
wchargin-branch: discourse-sql-cse
This commit changes the Discourse default weights around, most
significantly moving many weights (e.g. LIKES) that have a 0 backward
weight to have a small positive backward weight instead, like 1/16. In
practice, this mitigates an issue where users with few outbound edges
act as "cred sinks" because the cred gets stuck in a loop between the
user and content they've authored.
Test plan: In local experimentation, I've found the new weights produce
more reasonable-seeming cred attribution.
I've written the Discourse plugin with distinct edge types for post and
topic authorship; it allows us to have more precise control over how
cred flows (and mitigates the need for #968). However, I gave the two
types the same name, which is confusing in the weight config UI. Now
they are properly distinct.
Test plan: It's a simple string change. In (unpublished) commits with a
full Discourse integration, the new strings show up nicely in the UI.
The previous code incorrectly constructed a Discourse post url based on
the post's id, rather than its index within the containing topic. This
is now fixed.
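For illustration, the distinction looks roughly like this. The field names (`topicId`, `indexWithinTopic`) and the `/t/<topic>/<index>` URL shape are assumptions made for the sketch, not taken from this commit:

```javascript
// Sketch only: build the URL from the post's position within its topic,
// not from the post's global id.
function postUrl(serverUrl, post) {
  return `${serverUrl}/t/${post.topicId}/${post.indexWithinTopic}`;
}

// A post whose global id (40) differs from its index in the topic (3).
const post = {id: 40, topicId: 11, indexWithinTopic: 3};
console.log(postUrl("https://discourse.example.com", post));
```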
Test plan: There isn't actually a snapshot diff, because the post with
id 2 is also the second post in its thread. I'm not too worried about
this, though: this kind of code changes infrequently, and it's pretty
obvious when it's wrong.
The Discourse mirror class now keeps an up-to-date record of all of the
likes within an instance. It does this by iterating over every user in
the history, and requesting their likes. If at any point we hit a like
we've already seen, we move on to the next user. In the future, we can
improve this so we only query users we haven't checked in a while, or
users who were recently active.
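A simplified sketch of this loop follows. All names are hypothetical, and the fetcher is synchronous for brevity (a real one would be asynchronous); `likesByUser(username, offset)` is assumed to return one page of likes, newest first, and an empty page once the user's history is exhausted.

```javascript
// Sketch only; not the mirror's actual implementation.
function updateLikes(usernames, likesByUser, seenKeys) {
  const added = [];
  for (const username of usernames) {
    let offset = 0;
    let upToDate = false;
    while (!upToDate) {
      const page = likesByUser(username, offset);
      if (page.length === 0) break; // no more history for this user
      for (const like of page) {
        const key = `${like.username}/${like.postId}`;
        if (seenKeys.has(key)) {
          // Everything older was stored by a previous update.
          upToDate = true;
          break;
        }
        seenKeys.add(key);
        added.push(like);
      }
      offset += page.length;
    }
  }
  return added;
}

// Hypothetical data: alice has one new like; bob has never been checked.
const history = {
  alice: [
    {username: "alice", postId: 3},
    {username: "alice", postId: 2},
    {username: "alice", postId: 1},
  ],
  bob: [{username: "bob", postId: 3}],
};
let queries = 0;
const fetchPage = (username, offset) => {
  queries++;
  return history[username].slice(offset, offset + 2);
};
const seen = new Set(["alice/2", "alice/1"]);
const newLikes = updateLikes(["alice", "bob"], fetchPage, seen);
console.log(newLikes.length, queries);
```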
Test plan: Tests verify that we correctly store all the likes, including
after partial updates, and that we don't issue unnecessary queries.
This is a minor change to the Discourse mirror so that it supports a
query to get all users from the server. It will be convenient for a
follow-on change that makes `update` search for every user's likes.
I also modified createGraph so that it uses the new method, which
results in code that is cleaner and slightly more efficient.
Test plan: Unit tests updated.
For the Discourse plugin, we really want to be able to add a full record
of all of the users' liked posts as edges in the graph. It's a really
high-signal way to move cred that also gives individual users a lot of
agency and a way to engage.
However: we need an API to get this data. Initial searches of the API docs
were unpromising; first, we would need to query potentially every post
to get its likes individually (which makes it very expensive to find the
likes on old posts), and second, the likes did not come with timestamp
information. For a while, I thought we were at an impasse.
I then went fishing in the Discourse implementation for a solution (yay
open source!). Lots of the API is undocumented, since it's whatever
they happen to add to run Discourse. And it turns out there's a
`user_actions` API ([source]) which can provide all of a user's actions
in order, and having your content liked by someone else is considered an
action. Best of all, these actions come with timestamps.
The upshot is that instead of querying every post to get its likes, we
can query every user to get likes. Iterating over all users can still
be slow, but it's far better than iterating over all posts; plus we can
implement caching so that we only infrequently check in on inactive
users.
I've added a `likesByUser` method to the Discourse fetch interface that
provides this information. I've also added a snapshot test for it (and
updated all of the snapshots). I also rolled in a slight refactor to
error handling in the fetcher.
The mirror doesn't yet use this information (will come later).
[source]: 82e07cb0f4/app/controllers/user_actions_controller.rb (L3)
Test plan: `yarn test` passes. Snapshots look good.
This commit adds the logic needed for creating a contribution graph
based on the Discourse data. We first have a declaration with
specifications for the node and edge types in the plugin. We also have a
`createGraph` module which creates a conformant graph from the Mirror
data. The graph creation is thoroughly tested.
Test plan: Inspect unit tests, run `yarn test`. I also have (as yet
unpublished) code which loads the graph into the UI, and it appears
fine.
This is a quick fixup so that the coming createGraph module can be
properly tested.
Shout out to @Beanow for anticipating this need in a [review comment].
[review comment]: https://github.com/sourcecred/sourcecred/pull/1266#discussion_r314305108
Test plan: trivial refactor, run `yarn test`
The mirror wraps a SQLite database which will store all of the data we
download from Discourse.
On a call to `update`, it downloads new data from the server and stores
it. Then, when it is asked for information like the topics and posts, it
can just pull from its local copy. This means that we don't need to
re-download the content every time we load a Discourse instance, which
makes the load more performant, more robust to network failures, etc.
Thanks to @wchargin, whose work on the GraphQL mirror for GitHub (#622)
inspired this mirror.
Test plan: I've written unit tests that use a mock fetcher to validate
the update logic. I've also used this to do a full load of the real
SourceCred Discourse instance, and to create a corresponding graph
(using subsequent commits).
Progress towards #865.
The `DiscourseFetcher` class abstracts over fetching from the Discourse
API, and post-processing and filtering the result into a form that's
convenient for us.
Testing is a bit tricky because the Discourse API keys are sensitive
(they are admin keys) and so I'm reluctant to commit them, even for our
test instance. As a workaround, I've added a shell script which
downloads some data from the SourceCred test instance, and saves it with
a filename which is an encoding of the actual endpoint. Then, in
testing, we can use a mocked fetch which actually hits the snapshots
directory, and thus validate the processing logic on "real" data from
the server. We also test that the fetch headers are set correctly, and
that we handle non-200 error codes appropriately.
Test plan: In addition to the included tests, I have an end-to-end test
which actually uses this fetcher to fully populate the mirror and then
generate a valid SourceCred graph.
This builds on API investigations
[here](https://github.com/sourcecred/sourcecred/issues/865#issuecomment-478026449),
and is general progress towards #865. Thanks to @erlend-sh, without whom
we wouldn't have a test instance.
Summary:
In ES6, the [`try` statement grammar][1] requires a catch parameter; the
parameter is only optional in the latest draft of ECMAScript, which is
of course not yet ratified as any actual standard.
Even though we don’t officially pledge to support Node 8, this is
currently the only breakage, and it’s easy enough to fix.
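As an illustration of the compatible form (the function here is hypothetical and not taken from the codebase), the catch clause binds a parameter even when it goes unused:

```javascript
// ES6-compatible: `catch (e)` rather than a bare `catch`.
function parseOrNull(text) {
  try {
    return JSON.parse(text);
  } catch (e) {
    // Writing `catch {` without `(e)` is a syntax error on Node 8.
    return null;
  }
}

console.log(parseOrNull("{"), parseOrNull("[1]"));
```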
[1]: https://www.ecma-international.org/ecma-262/6.0/#sec-try-statement
Test Plan:
Running `yarn start` on Node v8.11.4 no longer raises a syntax error.
wchargin-branch: catch-parameter
Summary:
Introduced in #1277.
Test Plan:
Run `yarn start` and visit <http://localhost:8080/test/FileUploader/>.
Conduct the test plan as specified on that page.
wchargin-branch: fileuploader-target
I'm mostly motivated by wanting to get Greenkeeper lockfile
auto-updating working (see #1269), although this is also a first step
towards making SourceCred usable from NPM (#1232).
For now, see this as us making sure we claim the sourcecred package name
on npm (see: https://www.npmjs.com/package/sourcecred).
I also fixed the license spec so that it's valid SPDX.
Summary:
To elaborate a bit: The repository-level `.gitignore` file is for
artifacts that are generated _by the code/build of that project_. This
includes `node_modules/`, `bin/`, `build/`, etc. These patterns are
relevant to all users of the project.
The user-level `.gitignore_global` file is for files that _your system_
generates. These are swap files (`.swp` `.swo` `.swa` for Vim), file
system metadata (`.DS_Store` for macOS, `Thumbs.db` for Windows), trash
directories, etc.
(See `man gitignore` for details about the two files. Take a look at
[the `.gitignore` for Git itself][git-gitignore] as an example.)
[git-gitignore]: https://github.com/git/git/blob/master/.gitignore
It doesn’t make sense to put the latter category of patterns into the
project’s `.gitignore`. You can’t accommodate every programming
environment under the sun. The file would be hundreds of lines.
By removing these patterns from the `.gitignore`, we help teach users
about how to configure `.gitignore_global` to set up their own
environment properly, once and for all.
This reverts commit 816c954f3d.
Test Plan:
The `.gitignore` now only contains patterns specific to SourceCred.
wchargin-branch: gitignore-project-only
Summary:
Backticks are discouraged relative to the `$(…)` form for command
substitution, because they are harder to read and do not nest without
exponential escaping:
```shell
$ foo=$(echo $(echo hi $(echo bye))) # clear
$ foo=`echo \`echo hi \\\`echo bye\\\`\`` # hmm
```
In this context, command substitution should not be used at all; `PWD`
is a special variable that always contains the current working
directory. The new version of the code will be correct even if the
current working directory ends with newlines, which would be stripped
off by the command substitution.
Test Plan:
Prepended an `echo` to the relevant line, and verified that the script
has the same output before and after this change.
wchargin-branch: fix-backticks
This updates the deploy script so that we now load full projects for
@libp2p, @ipld, @sourcecred, and @filecoin-project.
I'd like to support @ipfs, but we need to tackle #1256 first.
The code is mostly ported from the legacy app. However, we no longer
assume that we are showing every type for every plugin. Instead, the
types are manually selected. For now, we permit the GitHub user type,
and the GitHub repo type, as these are the two types that are included
in filtered timeline cred.
Test plan: Manual inspection is necessary, since this frontend is mostly
untested. I've done that inspection. Also, `yarn test` passes.
Minor change to the API for MapUtil.pushValue. Now it returns the
resultant array. I've found this convenient in at least one case, and
previously we weren't returning anything, so it's a cheap change.
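A sketch of the described behavior (not the repository's implementation), written as a standalone function over a `Map` of arrays:

```javascript
// Appends `value` to the array stored at `key`, creating the array if
// needed, and returns the resultant array.
function pushValue(map, key, value) {
  let arr = map.get(key);
  if (arr == null) {
    arr = [];
    map.set(key, arr);
  }
  arr.push(value);
  return arr;
}

const m = new Map();
pushValue(m, "a", 1);
const result = pushValue(m, "a", 2);
console.log(result);
```

The returned array is the same object that is stored in the map, so callers can keep working with it without a second lookup.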
Test plan: Unit test added.
Summary:
In #1259, `flow-bin` was upgraded to 0.104.0 in `package.json`, but no
corresponding change was made in the lock file.
Test Plan:
Running `yarn` is now a no-op.
wchargin-branch: lock-flow-bin-0.104.0
Summary:
[Prettier docs] recommend pinning an exact version because their semver
policy does not extend to stylistic changes, and so patch releases may
change the formatting output.
Given some recent discussion about formatting skew of unknown cause,
this seems like a reasonable safety measure.
Generated with `yarn add --dev --exact prettier`.
[Prettier docs]: https://prettier.io/docs/en/install.html
wchargin-branch: prettier-exact