sourcecred


Author	SHA1	Message	Date
Dandelion Mané	b2943390dc	add discourse references to the graph (#1410 ) This commit modifies `discourse/createGraph` so that it finds all of the same-server Discourse references in Discourse posts, and creates appropriately typed reference edges in response. The unit tests have been updated with cases for both references that should exist, and references that shouldn't (e.g. post index out of bounds, or a reference to the wrong server). Test plan: `yarn test --full` along with snapshot update. This is progress towards [Discourse reference and mention detection][1]. [1]: https://discourse.sourcecred.io/t/discourse-reference-mention-detection/270	2019-10-18 10:56:53 -06:00
Robin van Boven	e043347526	Support dashes in alias usernames. (#1412 )	2019-10-17 13:21:39 -06:00
Dandelion Mané	78c34b5a36	Parse Discourse references from hyperlinks (#1405 ) The `discourse/references` module now has a `linksToReferences` method which extracts the parsed Discourse references from an array of hyperlinks. The method is tested. Test plan: Unit tests added; `yarn test` passes. This is progress towards [Discourse reference and mention detection][1]. [1]: https://discourse.sourcecred.io/t/discourse-reference-mention-detection/270	2019-10-16 18:39:46 -06:00
Robin van Boven	00cc8b2a54	Expand the blacklist, found new type inconsistencies (#1407 ) - Bots being Users as a commit author - Orgs being Users on a reaction Repositories affected (checked means tested after the patch): - [x] prettier/prettier - [x] lovell/sharp - [x] facebook/jest - [x] babel/babel-eslint - [x] recharts/recharts - [x] webpack-contrib/css-loader - [x] yannickcr/eslint-plugin-react - [x] vuejs/vuex - [x] chimurai/http-proxy-middleware - [x] sass/node-sass - [x] lodash/lodash - [x] vuejs/vue - [x] reacttraining/react-router - [x] axios/axios - [x] webpack/webpack-dev-middleware - [x] eslint/eslint - [x] webpack/webpack - [x] webpack/webpack-cli - [x] sinonjs/sinon - [x] neutrinojs/webpack-chain - [x] webpack/webpack-dev-server Found as part of https://github.com/teamopen-dev/sourcecred-stack-lookup Test after this patch: pending, it's a lot of data after the cache invalidated 😅	2019-10-15 08:37:24 -07:00
William Chargin	0380088af2	mirror: update implementation notes for EAV tables (#1345 ) Summary: The notes used to focus on the legacy implementation with a minor note about the EAV implementation; this change flips that relationship. Test Plan: None. wchargin-branch: mirror-eav-impl-notes	2019-10-12 11:36:16 -07:00
William Chargin	809fd23def	mirror: read from EAV tables by default (#1344 ) Summary: This flips the switch for all production `Mirror` reads to use the single `primitives` EAV table as their source of truth, rather than the legacy type-specific primitives tables. For context and design discussion, see issue #1313 and commits adjacent to this one. Test Plan: All relevant code paths are already tested (see test plans of commits adjacent to this one). Running `yarn test --full` passes. wchargin-branch: mirror-eav-flip	2019-10-12 11:28:55 -07:00
William Chargin	e5a77488de	mirror: add EAV reading to `extract`, behind flag (#1343 ) Summary: This completes the end-to-end EAV mode pipeline, but does not yet set it as default or use it in production. A note about indentation: we take care to avoid reindenting the entire block of `extract` test cases, which is over 900 lines long. As to the implementation code, reindenting the legacy type-specific primitives branch is not easily avoidable, but when we remove that branch we won’t have to reindent the EAV mode branch: we can replace its `if` block with two scope blocks (which is the right thing to do, anyway). Test Plan: We reuse existing tests, which suffice for full coverage in both implementation branches. Note that these tests cover the case of object types with no primitive fields (the `Feline` and `Socket` types), which are more likely to fail in a broken EAV implementation than in a broken type-specific primitives implementation due to deletion anomalies. To check that all relevant calls to `mirror.extract(…)` have been properly replaced with `extract(mirror, …)`, run `yarn coverage -f graphql/mirror -t 'EAV primitives'` and note that the “else” path of the `if (fullOptions.useEavPrimitives)` branch is not taken; then, run `yarn coverage -f graphql/mirror -t 'legacy type-specific primitives'` and note that the “if” path of the same branch is not taken. To check that the table hiding logic is working, invert the branch that checks `if (fullOptions.useEavPrimitives)`, and note that every test case using the table hiding logic fails (except for some of the error handling test cases, which do not actually need to read primitive data). Finally, `yarn test --full` passes after flipping the `useEavPrimitives` default to `true`. wchargin-branch: mirror-eav-extract	2019-10-12 11:23:35 -07:00
Dandelion Mané	e1a73ac368	refactor discourse createGraph (#1409 ) This is a minor refactor to re-organize the createGraph function in the Discourse plugin to use a class under the hood. Using a hidden class makes sense because there is a fair bit of shared state that's needed while creating the graph. The proximate cause for this refactor is that adding reference edges will bloat the `addPost` section of the function, which was already a little too complex. Simply shoving in more complexity would make it unwieldy. So I opted for this minor refactor. It's internal-only (no public APIs are changed). Test plan: `yarn test` passes. As noted, refactor is internal-only. This is progress towards [Discourse reference and mention detection][1]. [1]: https://discourse.sourcecred.io/t/discourse-reference-mention-detection/270	2019-10-11 13:46:49 -06:00
Dandelion Mané	d4804a7a68	Add edge types for Discourse references (#1406 ) Test plan: It's just a declaration change. `yarn flow` passes. This is progress towards [Discourse reference and mention detection][1]. [1]: https://discourse.sourcecred.io/t/discourse-reference-mention-detection/270	2019-10-11 13:46:35 -06:00
Dandelion Mané	eb008f40cc	discourse: factor out address module (#1404 ) This will make it possible to depend on addresses in the reference module. Test plan: `yarn test` passes. This is progress towards [Discourse reference and mention detection][1]. [1]: https://discourse.sourcecred.io/t/discourse-reference-mention-detection/270	2019-10-11 13:40:10 -06:00
Dandelion Mané	5e02a2caeb	Add logic for plucking hyperlinks from cooked html (#1403 ) This commit adds a `parseLinks` method to a new module, `plugins/discourse/references`. `parseLinks` allows us to extract the hyperlinks from `<a>` tags in "cooked" html. I added `htmlparser2` as a dependency to parse the html. There were a lot of options to choose from; I chose htmlparser2 because it has a lot of usage, reasonable performance, and suits our needs. We use this dependency in a lightweight and local way, so we can always change it later if needed. One thing which was a bit odd: I wasn't able to import it using `import`, and needed a `require` statement instead. Test plan: Unit tests added; `yarn test` passes. This is progress towards [Discourse reference and mention detection][1]. [1]: https://discourse.sourcecred.io/t/discourse-reference-mention-detection/270	2019-10-11 13:36:31 -06:00
Dandelion Mané	f82c1bfbbe	Add post contents to the Discourse mirror (#1402 ) This modifies the Discourse fetcher and mirror so that we now keep post contents around, thus enabling future reference detection (and other things). The post contents are stored and provided as retrieved from the API, which is in "cooked" HTML form. Test plan: Unit tests and snapshots updated. Observe that the snapshots now include Discourse post contents. This is progress towards [Discourse reference and mention detection][1]. [1]: https://discourse.sourcecred.io/t/discourse-reference-mention-detection/270	2019-10-11 13:31:01 -06:00
Dandelion Mané	026d3dc705	Upgrade flow to v109 (#1395 ) We need one tiny change in test code, where Flow (correctly) detects an error. I've added an error suppression comment because it is truly a Flow error, but is appropriate as we are testing an error condition. Test plan: `yarn test`	2019-10-03 10:41:51 -06:00
Dandelion Mané	64c17f7dba	Change default alpha to 0.2 (#1391 ) SourceCred is currently quite sensitive to inadvertent 'tight loops' in the cred, where (e.g.) one user receives cred but doesn't have many out edges, resulting in a feedback loop where that person gets disproportionate cred. See [1] and [2] for some examples. Per a [suggestion] from @mzargham, I'm going to band-aid this issue by increasing the alpha parameter; I've increased it 4x from 0.05 to 0.2. Subjectively, I think this improves the cred quality. [1]: https://discourse.sourcecred.io/t/sneak-peek-sourcecred-discourse-plugin/171 [2]: https://discourse.sourcecred.io/t/preliminary-credsperiment-cred/219 [suggestion]: https://discourse.sourcecred.io/t/preliminary-credsperiment-cred/219/16?u=decentralion	2019-09-30 10:49:25 -06:00
Dandelion Mané	6e2af1070f	Expose alpha in TimelineExplorer (#1390 ) This commit modifies the TimelineExplorer so that the user can both see the chosen alpha value, and change it. Alpha has a pretty profound impact on the final scores, and I want to tweak it for CredSperiment week two, so this is an important addition. Test plan: Modify the alpha, re-run cred calculation, and observe that the scores change. `yarn test` passes.	2019-09-30 10:33:15 -06:00
Dandelion Mané	54ece536d3	Integrate the identity plugin (#1385 ) This commit integrates the identity plugin, which was created in #1384. It does this by adding explicit identity fields to the project configuration, which are then applied when loading the graph in `api/load.js`. The actual integration is quite straightforward. Test plan: The underlying logic is thoroughly tested; I added one new test case to verify that it is integrated properly. Since the project compat has changed, I've updated all the snapshots. Prior to merging this PR, I will produce one "integration test", using this code to do identity resolution for a real project (i.e. on the SourceCred instance itself).	2019-09-20 12:08:27 +02:00
Dandelion Mané	9a9f211901	Add the identity plugin (#1384 ) This commit adds the new SourceCred identity plugin. As described in the README.md file: This folder contains the Identity plugin. Unlike most other plugins, the Identity plugin does not add any new contributions to the graph. Instead, it allows collapsing different user accounts together into a shared 'identity' node. To see why this is valuable, imagine that a contributor has an account on both GitHub and Discourse (potentially with a different username on each service). We would like to combine these two identities together, so that we can represent that user's combined cred properly. The Identity plugin enables this. Specifically, the instance maintainer can provide a (locally unique) username for the user, along with a list of aliases the user is known by, e.g. `github/username` and `discourse/other_username`. The aliases are simple string representations that are intended to be easy to maintain by hand in a configuration file. Then, the identity plugin will provide a list of `NodeContraction`s that can be used by `Graph.contractNodes` to combine the user identities as described. The plugin is broken up into a few submodules: - `declaration.js` provides the PluginDeclaration. It has a single node type (the identity node). - `identity.js` declares the `Identity` type (a username and list of aliases), allows constructing identity nodes, and does some validation on the identity username. - `alias.js` implements the logic for parsing aliases like "github/decentralion" or "discourse/s_ben" into a node address. - `nodeContractions.js` provides logic for turning a list of Identities into a list of NodeContractions, suitable for use in `Graph.contractNodes`. The plugin is not yet integrated; that will come in a follow-on commit. Test plan: Unit tests added; `yarn test` passes.	2019-09-20 11:50:59 +02:00
Dandelion Mané	b86dcf742e	Make the Discourse plugin robust to errors (#1387 ) Currently, attempting to load the SourceCred discourse instance fails with foreign key constraint errors. Basically, we have a few weird situations: - A post (which corresponds to the 'pseudo-topic' generated by creating a new category) is picked up, but its topic is not detected, because Discourse does not list these 'pseudo-topics' in the latest topic endpoint. Attempting to add the post breaks the foreign key constraint. - We have several likes which correspond to posts that don't exist. Possibly they were deleted? I'm not sure. Right now, the load process fails entirely when it hits these exceptions, which is bad. It should print a warning instead, and continue without the offending interactions. This commit effects that change in behavior. Test plan: Before this commit, loading the SourceCred discourse with a clean cache fails. After building with this commit, loading the SourceCred discourse with a clean cache works and prints the following warnings: ``` $ node bin/sourcecred.js discourse https://discourse.sourcecred.io credbot GO load-discourse.sourcecred.io GO discourse GO discourse/topics DONE discourse/topics: 3m 53s GO discourse/posts Warning: Encountered error 'FOREIGN KEY constraint failed' while adding post https://discourse.sourcecred.io/t/214/1. DONE discourse/posts: 2m 38s GO discourse/likes DONE discourse/likes: 50s DONE discourse: 7m 21s GO compute-cred DONE compute-cred: 547ms DONE load-discourse.sourcecred.io: 7m 22s ``` Also, unit tests have been added that verify the specific behavior changes.	2019-09-20 11:21:53 +02:00
Robin van Boven	d5d00aae5a	Blacklist techtribe org, thumbsup reaction (#1386 ) Fixes #1353 Tested manually by creating a docker image including the changes. Running the dev-preview @passbolt command until completion. (once hitting the github rate limit, once till #1354 happens) No more problematic interactions show up during load.	2019-09-20 11:20:14 +02:00
Robin van Boven	d6bbc939b2	Add more bots. (#1383 ) Fixes #1381	2019-09-19 17:52:20 +02:00
Dandelion Mané	8f46d7d812	Fix bug when selecting "All users" in explorer (#1388 ) This fixes a bug introduced in #1371, where selecting a type other than "All users" and then trying to reselect "All users" would break the UI. Test plan: Manual inspection; load an instance, try selecting a different type, and then go back to "All users". It now works as expected.	2019-09-19 14:01:17 +02:00
Dandelion Mané	007568d3f0	Add `sourcecred discourse` command (#1374 ) This adds a new command, `discourse`, which makes it convenient to load Discourse servers as standalone SourceCred projects. For example, you could load the official SourceCred discourse via the following: ```sh export SOURCECRED_DISCOURSE_KEY=.... yarn backend node bin/sourcecred.js discourse https://discourse.sourcecred.io credbot yarn start ``` I've updated the README with instructions for using the plugin. Test plan: No automated testing because I see this tool as a temporary placeholder until we get the SourceCred instances set up. I manually tested the error cases (e.g. providing an invalid server URL) as well as success cases like the one above. I validated that the weights file argument is being interpreted correctly (i.e. trying to load invalid weights produces an expected error message, loading valid weights results in those weights being present in the UI).	2019-09-19 12:32:49 +02:00
Dandelion Mané	1449935651	GitHub plugin: Expose user addresses (#1382 ) Allow getting the node address for a user, given the user's login. This will be needed by the upcoming identity plugin. If the login in question corresponds to a bot, then a bot address will be returned. When we make the bot set configurable (rather than hardcoded), we'll need to change the signature of this function; I think that's fine. Test plan: Unit tests added. (Also, it's really simple.)	2019-09-18 14:50:52 +02:00
Dandelion Mané	ac8ac7051f	add `Graph.contractNodes` (#1380 ) This commit adds Graph.contractNodes, which allows collapsing certain nodes in the graph into each other. This will enable the creation of a SourceCred "identity" plugin, allowing identity resolution between users' accounts on different services. Test plan: Thorough unit tests have been added. `yarn test` passes. Thanks to @wchargin for [review feedback][1] which significantly improved this API. [1]: https://github.com/sourcecred/sourcecred/pull/1380#discussion_r324958055	2019-09-18 13:59:49 +02:00
William Chargin	ddf07c6714	Replace `PartialTimelineCredParams` with `$Shape` (#1379 ) Summary: Flow provides a utility type for this purpose; there’s no need to implement, document, and keep it in sync ourselves: <https://flow.org/en/docs/types/utilities/#toc-shape> Test Plan: As written, `yarn flow` passes. Changing the definition of `params` on line 77 of `load.test.js` to add a key `foo: "wat"` or change the value of `weights` to `{hmm: "hmm"}` yield appropriate type errors. wchargin-branch: use-shape	2019-09-16 19:22:35 -07:00
William Chargin	3cb22565e5	mirror: update EAV primitives (#1342 ) Summary: This commit modifies `_updateOwnData` to write to both the old type-specific primitives tables as well as the new EAV table. This establishes the invariant that a node with non-null `last_update` will always have primitive data (if its object type has primitive fields). Test Plan: Existing tests expanded. Commenting out each of the `updateEavPrimitive` calls (independently) causes a test to fail. Note that every test that queries an internal `primitives_*` table to inspect the database state has been expanded to make an equivalent query against the `primitives` table as well. wchargin-branch: mirror-eav-update	2019-09-14 17:28:09 -07:00
William Chargin	463f3a073a	mirror: initialize EAV primitives at registration (#1341 ) Summary: This establishes the invariant that every object in the `objects` table has all relevant rows in the `primitives` table, though those rows’ values are never yet set. Test Plan: Unit tests updated. Manually loading `sourcecred/example-github` and running `.dump primitives` generates reasonable-looking output, with lots of rows, including entries for nested fields and eggs. Verified that the set of non-`id` columns on `Issue` equals the set of values for the `fieldname` column of an `Issue` object, and likewise for `Commit`s, thus covering each kind of field. wchargin-branch: mirror-eav-init	2019-09-14 17:24:58 -07:00
William Chargin	0418dfe9dd	mirror: add `primitives` table for EAV migration (#1340 ) Summary: See #1313 for context. The plan is to set up dual-writes with `extract` calls still reading from the old tables until the new ones are complete and tested. The primary risk to production would be a fatal exception in the new write paths, which seems like an acceptable risk. Test Plan: Unit tests pass. wchargin-branch: mirror-eav-schema	2019-09-14 17:21:42 -07:00
William Chargin	976afb6665	mirror: test `registerObject` with nested fields (#1339 ) Summary: Prior to this commit, removing the `addLink.run({id, fieldname})` on line 487 of `mirror.js` would cause test failures down the pipeline, but not at the root cause. Such an error is now caught earlier. Test Plan: Comment out line 487 of `mirror.js` and observe that the newly added test case fails, but the other `registerObject` test cases do not. wchargin-branch: mirror-test-registerobject-nested	2019-09-14 17:16:24 -07:00
Dandelion Mané	c58315fe4d	Hackily add support for mixed GitHub/Discourse projects (#1378 ) For phase one of the CredSperiment, I need a SourceCred instance which combines GitHub and Discourse servers. I'll also need to be able to give it very specific configuration to collapse certain user identities together. Shortly after launching the CredSperiment, I plan to come back and totally re-write SourceCred's command line interface and site building system, in a way that will throw away most of the existing codebase. As such, I found it expedient to add rather hacky and untested support for loading combined GitHub/Discourse instances, so I can land the promised features. This PR does so by: - adding sourcecred gen-project for constructing project.json files - adding sourcecred load --project for loading a project.json file - ensuring that load provides the right plugins based on the project that's in scope - updating build_static_site so that it can use the new --project flag Test plan: I have done some end-to-end testing, but the overall commit stack lacks automated testing. This is a deliberate tradeoff: I'm planning to re-write this section of the codebase, and the testing ergonomics are not great, so I'd rather accept some technical debt, especially since I plan to pay it off soon. See the pull request on GitHub for the individual constituent commits.	2019-09-12 17:35:21 +02:00
Dandelion Mané	7a0dd49b42	factor loadWeights into Common (#1377 ) As suggested by @Beanow in [a review comment][1], this commit factors loading weights from disk into a cli/common utility method. The actual method is really generic, and we have a number of similar constructions across the codebase (grep for `JSON.parse` to find them). I considered factoring out a generic utility for loading and deserializing JSON data from disk in general, but it didn't seem valuable enough at this time. Test plan: Unit tests added, existing tests pass. [1]: https://github.com/sourcecred/sourcecred/pull/1374#discussion_r323149740	2019-09-12 15:55:05 +02:00
Dandelion Mané	0a0010f38e	Share default TimelineCredParameters (#1376 ) At present, every place in the codebase that needs TimelineCredParameters constructs them ad-hoc, meaning we don't have any shared defaults across different consumers. This commit adds a new type, `PartialTimelineCredParameters`, which is basically `TimelineCredParameters` with every field marked optional. Callers can then choose to override any fields where they want non-default values. A new internal `partialParams` function promotes these partial parameters to full parameters. All the public interfaces for using params (namely, `TimelineCred.compute` and `TimelineCred.reanalyze`) now accept optional partial params. If the params are not specified, default values are used; if partial params are provided, all the explicitly provided values are used, and unspecified values are initialized to default values. Test plan: A simple unit test was added to ensure that weights overrides work as intended. `git grep "intervalDecay: "` reveals that there are no other explicit parameter constructions in the codebase. All existing unit tests pass.	2019-09-12 15:21:13 +02:00
Dandelion Mané	def1fef192	Factor TimelineCredParameters into new module (#1375 ) The `timelineCred.js` file is a bit of a beast. One way to start slimming it down is to pull the parameters into their own file. This is especially helpful as I'm planning a follow-on PR that will colocate the default parameter values with their declaration. The naming of everything in the `/timeline/` subdirectory is a bit wonky: it reflects that at the time of creation, "Timeline" designated an experimental version of SourceCred. Now, it is becoming canonical, but the cumbersome naming persists. I haven't made any effort to tackle the name debt here. Test plan: `yarn test` passes; since this is merely a code reorganization, this gives me great confidence that the change is correct. I also added a few small tests to the new module. Although the behavior in question is already tested, I think setting up test files liberally is a good practice, as the existence of the test file invites the creation of more tests.	2019-09-12 15:12:17 +02:00
Dandelion Mané	e1b9b07cac	group explorer types by plugin (#1373 ) Now that we're adding support for the Discourse plugin, we'll start having >1 plugin present in the frontend again. As such, we should provide clear grouping of types in the frontend so that it's possible to distinguish between a GitHub user and a Discourse user. This commit does just that, by resurrecting code that we used when the GitHub and Git plugins co-existed in the frontend. Test plan: Launch the frontend and observe that node types in the filter selection dropdown are grouped by the name of their plugin. Also, clicking on the name of a plugin should filter to all nodes from that plugin.	2019-09-11 02:28:42 +02:00
Dandelion Mané	093955dea1	scores command no longer assumes GitHub plugin (#1372 ) Previously, the `sourcecred scores` command assumed that all users are GitHub users, and assigned users an id based on their GitHub login. Now, the command returns information on all users, regardless of which plugin provided them. As such, we need to identify users differently. Instead of a string id, they now have an array of address parts. That array contains all of the parts of their corresponding node address. For example, the GitHub user `@Beanow` would correspond to the address array `["sourcecred", "github", "USERLIKE", "USER", "Beanow"]` As a general convention, the first two components of any node's address contain information about the plugin that owns that node. The first component is the owner of the plugin, and the second is the name of the plugin. Afterwards, the plugin may represent nodes in whatever manner it sees fit. Thanks to @Beanow and @vsoch for some feedback and discussion on this design. Test plan: Snapshots have been updated. `yarn test` passes.	2019-09-10 23:49:45 +02:00
Dandelion Mané	b3ffd3758b	TimelineExplorer defaults to showing all users (#1371 ) Now instead of always defaulting to GitHub users, it shows all user-typed nodes. This will make SourceCred work non-hackily when there is e.g. just a Discourse plugin in scope. I also fixed an issue where it was loading the GitHub declaration in a hardcoded way, instead of properly getting it from the TimelineCred's plugin array. Test plan: Manual UI inspection.	2019-09-10 22:50:39 +02:00
Dandelion Mané	8de57fdb7b	add TimelineCred.userNodes (#1369 ) This is a convenience method that extracts cred for all the user-typed nodes. It's basically an abstraction over calling `credSortedNodes` with the right set of prefixes. I foresee using it in at least two places (score retrieval in the CLI and score display in the frontend) so I decided to make it a method. Test plan: A very simple unit test was added. (It's a very simple wrapper function.)	2019-09-10 20:02:28 +02:00
Dandelion Mané	1079f5ec86	timelineCred.credSortedNodes takes prefixes (#1368 ) This lets us filter by a group of prefixes simultaneously, which enables e.g. seeing all user node types at once. I also tweaked the API to make it a bit more convenient; you can now pass no arguments and get all nodes in sorted order. Test plan: Unit tests updated.	2019-09-10 19:44:03 +02:00
Dandelion Mané	65f22a0a74	Replace TimelineCredConfig with array of plugins (#1367 ) The PluginDeclaration has all of the information we need to configure TimelineCred: it knows all the node and edge types, as well as which node types are user (or scoring) node types. Therefore, we can replace the ad-hoc config object with a simple array of plugin declarations. Since the plugins will be saved as part of the TimelineCred, it means the UI can configure to only show information for plugins that are actually in scope. Test plan: `yarn test` passes, and the prototype still works. Snapshots updated.	2019-09-10 19:36:12 +02:00
Dandelion Mané	dcf4010ff0	discourse: fix fetch failure on 410 (#1366 ) When a post or topic is deleted, Discourse fetch will give status 410. As with 404 and 403, we should just ignore the post and move on. I took the opportunity to slightly refactor the fetch error handling while I was there. Test plan: Previously, doing a load on the SourceCred discourse instance would fail due to a deleted topic. Now, it doesn't.	2019-09-10 19:13:13 +02:00
Dandelion Mané	aecd2864bf	Let plugins specify user types (#1365 ) This modifies the pluginDeclaration so that it can specify user node types. This will allow us to replace the TimelineCredConfig type with a plugin collection instead. It's expected that the user types will also be present in the node types, although this isn't validated anywhere at present. Test plan: `yarn flow`.	2019-09-10 19:09:01 +02:00
Dandelion Mané	dbb31a586c	Capitalize Discourse plugin name (#1364 ) This ensures consistency with GitHub, and will allow us to use plugin names in the UI. Test plan: Not needed, trivial change.	2019-09-10 19:06:05 +02:00
Dandelion Mané	e2e6c56650	Enable multiple scoring node types (#1361 ) This updates the cred computation logic so that we can have multiple "scoring node types". Context: Currently, we designate a single node type (GitHub users) as the scoring node type, and normalize so that all users have 1000 score in total. This commit updates the pipeline to admit using more than one prefix for scoring, meaning that we could have GitHub users, Discourse users, and more, and still have all users sum to 1000 score. We will still need to update the frontend so that it will have a user pane which aggregates across all users. Test plan: Unit tests updated. `yarn test` passes.	2019-09-10 19:05:46 +02:00
William Chargin	0d7db99d7f	Blacklist `@allcontributors` bot (#1363 ) Summary: This adds `MDM6Qm90NDY0NDczMjE=` (`@allcontributors`) to the blacklist to enable loading the `aragon/aragon` repository. See #1362 and #996 for context. Test Plan: Running `node ./bin/sourcecred.js load aragon/aragon` on a clean cache now completes successfully. wchargin-branch: blacklist-allcontributors	2019-09-10 08:55:16 -07:00
Dandelion Mané	545b084146	Change TimelineCred filtering strategy (#1358 ) This changes how TimelineCred filtering works. Instead of using the filterTimelineCred module, which includes all nodes matching filterPrefixes, we now take all nodes matching scorePrefixes and additionally the top `k` nodes for every other type. This ensures that we will have the top comments, pull requests, issues, etc in the UI, without needing to take every single comment or PR or issue. Concurrently, the UI is updated so that every type is included in the filter dropdown. CHANGELOG has been updated, since this is user facing. Test plan: `yarn test` passes, snapshots are updated, and I also tested the UI manually.	2019-09-08 00:32:10 +02:00
Dandelion Mané	f31a92874b	hide `filterTimelineCred` (#1357 ) TimelineCred computation is implemented as follows: - Compute Distribution - Filter it down to specified node types - Wrap the filtered results into a TimelineCred I want to change how the filtering works. The new filtering logic will depend on logic we've already implemented in TimelineCred; therefore filtering should be done on the TimelineCred object and not separately. Specifically, I want to be able to filter down to the highest-scored nodes by type (dependent on the type). As a first step, I've refactored the interface to TimelineCred so that the filtering is an implementation detail, i.e. the TimelineCred constructor doesn't expect objects defined in `filterTimelineCred`. Test plan: `yarn test` passes after a snapshot update.	2019-09-08 00:20:34 +02:00
Dandelion Mané	5996dd710a	timeline cred config is stored in JSON (#1356 ) This modifies the TimelineCred serialization so that it includes the CredConfig in the JSON. This means that it's easier to coordinate which plugins and types are in scope, as the data itself can contain that information. Rather than define a new hand-rolled serializer, I just passed the config directly through for stringification. Unit tests verify that this still works (round-trip serialization is tested). As an added sanity check, I generated a new small `cred.json`, and inspected the file via `cat` to ensure that it's still legible text, and isn't interpreted as a binary file due to the `NUL` bytes in node addresses. Every client that previously depended on the `DEFAULT_CRED_CONFIG` now properly gets its cred configuration from the JSON. Test plan: Unit tests for serialization already exist. Generated a fresh `cred.json` file and tested the frontend with it. Also, `yarn test --full` passes.	2019-09-08 00:04:01 +02:00
William Chargin	5bcec38e5b	Blacklist more problematic quasar interactions (#1335 ) Summary: Context: <https://github.com/sourcecred/sourcecred/issues/1256#issuecomment-526252852> Without also blacklisting the reaction, we hit an invariant violation in the relational view (reactions are expected to have exactly one author). Test Plan: Running `node ./bin/sourcecred.js load quasarframework/quasar-cli` now completes successfully (in about 2 minutes 40 seconds). It does emit a warning: ``` Issue[MDU6SXNzdWUzNDg0NjUzNDg=].reactions: unexpected null value ``` …because one of the reactions was blacklisted. But the relational view handles this correctly, it seems: timeline cred is still computed and renders without obvious error. wchargin-branch: blacklist-more-quasar	2019-09-02 08:18:36 -07:00
William Chargin	7d3d24e0ec	mirror: guess typenames and warn on mismatch (#1337 ) Summary: The format of GitHub’s GraphQL object IDs is explicitly opaque, and so we must not introspect them in any way that would influence our results. But it seems reasonable to introspect these IDs solely for diagnostic purposes, enabling us to proactively detect GitHub’s contract violations while we still have useful information about the root cause. This commit adds an optional `guessTypename` option to the Mirror constructor, which accepts a function that attempts to guess an object’s typename based on its ID. If the guess differs from what the server claims, we continue on as before, but emit a console warning to help diagnose the issue more quickly. Resolves #1336. See that issue for details. Test Plan: Unit tests for `mirror.js` updated, retaining full coverage. To test manually, revert #1335, then load `quasarframework/quasar-cli`. Note that it emits the following warning before failing: > Warning: when setting Reaction["MDg6UmVhY3Rpb24zNDUxNjA2MQ=="].user: > object "MDEyOk9yZ2FuaXphdGlvbjQzMDkzODIw" looks like it should have > type "Organization", but the server claims that it has type "User" Unit tests for the GitHub typename guesser added as well. Running `yarn test --full` passes. wchargin-branch: mirror-guess-typenames	2019-09-01 01:04:53 -07:00
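A guesser of the kind #1337 describes could exploit the fact that the IDs quoted in the warnings above decode from base64 to strings such as `012:Organization43093820` or `08:Reaction34516061`. That layout is a decimal length, a colon, the typename, and a numeric id. The sketch below is hypothetical and is not necessarily how SourceCred implements it. It assumes that layout. Because GitHub declares the format opaque, the sketch returns `null` for anything that doesn't fit, and it is suitable only for diagnostics.

```javascript
// Hypothetical diagnostic guesser for legacy GitHub node IDs. Assumes the
// decoded form "<length>:<Typename><digits>", e.g.
// "MDEyOk9yZ2FuaXphdGlvbjQzMDkzODIw" -> "012:Organization43093820".
// Returns the embedded typename, or null if the ID doesn't match.
function guessTypename(objectId) {
  const decoded = Buffer.from(objectId, "base64").toString("utf8");
  const match = /^(\d+):/.exec(decoded);
  if (match == null) return null;
  const length = parseInt(match[1], 10);
  const rest = decoded.slice(match[0].length);
  const typename = rest.slice(0, length);
  // Require a full-length typename followed by a purely numeric id.
  if (typename.length !== length || !/^\d+$/.test(rest.slice(length))) {
    return null;
  }
  return typename;
}
```

A mirror would compare this guess against the typename that the server reports, and log a warning on mismatch rather than change any stored data.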
William Chargin	ae8ab0d1bd	Check typesafety of `NullUtil.filterList` (#1328 ) Summary: The current implementation of `NullUtil.filterList` uses an `any`-cast. This is fine as long as the definition is actually typesafe; we should take at least a little care to ensure that it is. This commit adds a typesafe version, commented out but still typechecked, and refines the type around the `any`-cast to make the cast slightly more robust. Test Plan: Note that changing `$ReadOnlyArray<?T>` to `$ReadOnlyArray<?T | number>` in the declaration of `filterList` caused no Flow error prior to this commit, but now causes one. wchargin-branch: filter-list-typecheck	2019-08-26 10:35:08 -07:00
William Chargin	909045a7ec	Rename `NullUtil.filter` to `NullUtil.filterList` (#1327 ) Summary: The old name is misleading. There _is_ a function called `filter` on options, but its type is `(Option<T>, (T -> boolean)) -> Option<T>`: - Java: <https://docs.oracle.com/javase/8/docs/api/java/util/Optional.html#filter-java.util.function.Predicate-> - Rust: <https://doc.rust-lang.org/std/option/enum.Option.html#method.filter> - Haskell: <https://hackage.haskell.org/package/base-4.12.0.0/docs/Control-Monad.html#v:mfilter> - OCaml (Core): <https://ocaml.janestreet.com/ocaml-core/latest/doc/base/Base/Option/index.html#val-filter> This is even inconsistent with SourceCred’s own documentation: <`126332096f/src/util/null.js (L31)`> In general, a function called `foo` on options where `foo` also exists on lists has the meaning, “interpret an `Option<T>` as a subsingleton list, apply `foo` to the list, and reinterpret as an option”. Choosing the same name as a conflicting function is confusing. The function that was wanted is really just a special case of `flatMap`. For instance, in Java: ``` -> import java.util.stream.Collectors; -> (List.of(Optional.of(3), Optional.empty(), Optional.of(2)) >> .stream() >> .flatMap(Optional::stream) >> .collect(Collectors.toList())) | Expression value is: [3, 2] | assigned to temporary variable $2 of type List<? extends Object> ``` Yet some languages do provide it as a utility function: [`catMaybes`] in Haskell, or [`List.filter_opt`] in OCaml (Core). For parallelism with the latter, we define `NullUtil.filterList`. [`List.filter_opt`]: https://ocaml.janestreet.com/ocaml-core/latest/doc/base/Base/List/#val-filter_opt [`catMaybes`]: https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-Maybe.html#v:catMaybes Test Plan: That `yarn flow` passes suffices. wchargin-branch: filter-list	2019-08-26 10:16:44 -07:00
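At runtime, `filterList` is just the `Array.filter((x) => x != null)` wrapper described in #1324. The Flow type signature is what the module adds on top of that. The plain-JS sketch below omits the Flow types. It also shows the same operation written as the `flatMap` special case that the rename message argues for, mirroring the Java example.

```javascript
// Runtime behavior of NullUtil.filterList, sketched without Flow types:
// drop null and undefined, but keep other falsy values such as 0 and "".
function filterList(xs) {
  return xs.filter((x) => x != null);
}

// Equivalent formulation as a special case of flatMap: each element becomes
// a subsingleton list (empty for null-like values), and the lists are
// concatenated, like `.flatMap(Optional::stream)` in the Java example.
function filterListViaFlatMap(xs) {
  return xs.flatMap((x) => (x == null ? [] : [x]));
}
```

The `!= null` comparison (rather than `!== null`) is what lets a single test cover both `null` and `undefined`.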
Dandelion Mané	12a3321ea7	Fix failing snapshot test (#1329 ) PR #1325 introduced a failing snapshot test, which was promptly caught by @wchargin. This commit fixes it by running `./scripts/update_snapshots.sh`. Also, I bumped the project JSON version number, which also should have happened in #1325. Test plan: `yarn test --full` passes.	2019-08-26 18:23:11 +02:00
Dandelion Mané	b4463f2ab7	cli load uses discourse key (#1326 ) This commit modifies `cli/load` to appropriately load a Discourse key from the environment, if it is available. The mechanics are basically the same as with the GitHub token. Test plan: Unit tests added. `yarn test` passes.	2019-08-26 13:40:19 +02:00
Dandelion Mané	243437f1cd	api: add support for loading Discourse servers (#1325 ) This commit modifies the `Project` type so that it allows settings for a Discourse server, and ensures that `api/load` will appropriately load the server in question, and include it in the output graph. Putting the full Discourse declaration directly into the Project type is an unsustainable development practice—in general, adding plugins should not require changing core data types. However, at the moment I'm punting on polishing the plugin system, in favor of adding the Discourse plugin quickly, so I just put it into Project alongside the repo ids. In the future, I expect to refactor the plugins around a much cleaner interface; it's just not a priority as yet. (Tracking: #1120.) This commit also makes the GitHub token optional in `api/load`, since now it's very plausible that a user will want to only load a Discourse server, and therefore not require a GitHub token. As of this commit, it's still impossible to load Discourse projects, as the CLI always sets a null Discourse server; and in any case, the frontend would not properly display the project in question, as any Discourse types would get filtered out. Test plan: Mocking unit tests have been added to `api/load.test.js` to ensure that the Discourse graph is loaded and merged correctly.	2019-08-26 13:31:52 +02:00
Dandelion Mané	126332096f	NullUtil: add `filter` (#1324 ) This adds a new method called `filter` to the `NullUtil` module. `filter` enables you to filter all the null-like values out of an array in a convenient typesafe way. (It's really just a wrapper around `Array.filter((x) => x != null)` with a type signature.) Test plan: Unit tests added (for both functionality and type safety).	2019-08-26 13:20:40 +02:00
Dandelion Mané	ebdd2a05c5	add a loadDiscourse method (#1323 ) This is the analogue to `github/loadGraph`, but for Discourse. It basically pipes together the mechanisms for loading Discourse data and creating a Discourse graph from them, resulting in a single endpoint for consumption in the API. In contrast to github, the method is called `loadDiscourse` and not `loadGraph`, which seemed more appropriate to me. I haven't changed the corresponding GitHub method's name. (I'm currently knowingly letting conceptual debt accumulate around the plugin interface; I expect to do a full refactor within the next few months.) Test plan: This is the kind of "pipe together tested APIs involving IO" code which I have decided not to write explicit tests for. However, it is still protected by flow, and I have a branch (`discourse-plugin`) which uses this code to do a full Discourse load.	2019-08-26 12:25:42 +02:00
Dandelion Mané	0e3ce1c531	mirror: output progress to taskReporter (#1322 ) It's nice to get some sense of what is happening while waiting for a Discourse load. Test plan: See attached unit tests.	2019-08-24 12:41:19 +02:00
Dandelion Mané	012f19eb48	discourse fetch: add rate limiting (#1321 ) This adds rate limiting to the Discourse fetch logic, so that we can actually load nontrivial servers without getting a 429 failure. We could have used retries; I thought it was more polite to actually limit the rate at which we make requests. However, to avoid seeing 429s in practice, I left a bit of a buffer: we make only 55 requests per minute, although 60 would be allowed. If we want to improve Discourse loading time, we could boost up to the full 60 requests/min, but add in retries. (Or we could switch to retries entirely.) Test plan: This logic is untested; however, my full discourse-plugin branch uses it to do full Discourse loads without issue.	2019-08-23 17:51:35 +02:00
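One simple way to limit the request rate as #1321 describes is to space requests evenly, `60000 / requestsPerMinute` ms apart. The sketch below takes that approach. It is not the plugin's actual code: `makeRateLimiter` and `rateLimitedFetch` are hypothetical names, and the clock is injectable so that the spacing can be checked without real waiting.

```javascript
// Hypothetical fixed-rate limiter: requests are scheduled into evenly spaced
// slots, 60000 / requestsPerMinute ms apart. `now` is injectable for testing.
function makeRateLimiter(requestsPerMinute, now = () => Date.now()) {
  const intervalMs = 60000 / requestsPerMinute;
  let nextSlotMs = -Infinity;
  // Reserve the next free slot and return how long (ms) to wait for it.
  return function reserve() {
    const t = now();
    const start = Math.max(t, nextSlotMs);
    nextSlotMs = start + intervalMs;
    return start - t;
  };
}

// Wait for a reserved slot, then issue the request with any fetch-like function.
async function rateLimitedFetch(reserve, fetchFn, url) {
  const delayMs = reserve();
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  return fetchFn(url);
}
```

At 55 requests per minute, consecutive requests end up about 1.09 s apart. The retry-based alternative mentioned in the commit could be layered on top by re-reserving a slot after a rate-limit response.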
William Chargin	c162813a5e	Fix Prettier deprecations and typings post upgrade (#1307 ) Summary: In #1194, we upgraded Prettier from 1.13.4 to 1.18.2, but this upgrades past <https://github.com/prettier/prettier/pull/5647>, which was first released in Prettier 1.16.0. This commit fixes the uses of deprecated code introduced as a result. It also upgrades the type definitions to match, via `flow-typed install prettier@1.18.2`. Addresses part of #1308. Test Plan: Prior to this commit, running `yarn unit` would print ``` console.warn node_modules/prettier/index.js:7934 { parser: "babylon" } is deprecated; we now treat it as { parser: "babel" }. ``` in two test cases; it no longer prints any such warnings. Furthermore, running `git grep 'parser.*babylon'` no longer finds any matches. wchargin-branch: prettier-deprecations	2019-08-22 09:00:25 -07:00
William Chargin	0367c9e50c	Fix missing prefixes in Discourse plugin (#1309 ) Summary: A `PluginDeclaration` must have a `nodePrefix` and an `edgePrefix`, but the Discourse plugin declaration was missing these. This was not caught by Flow because `deep-freeze` was introduced in #1249 without type definitions; see #1308. Test Plan: Apply the following patch: ```diff diff --git a/src/plugins/discourse/declaration.js b/src/plugins/discourse/declaration.js index 246a0a28..36ae5f13 100644 --- a/src/plugins/discourse/declaration.js +++ b/src/plugins/discourse/declaration.js @@ -1,6 +1,6 @@ // @flow -import deepFreeze from "deep-freeze"; +declare function deepFreeze<T>(x: T): T; import type {PluginDeclaration} from "../../analysis/pluginDeclaration"; import type {NodeType, EdgeType} from "../../analysis/types"; import {NodeAddress, EdgeAddress} from "../../core/graph"; ``` Note that, with this patch, `yarn flow` fails before this change but passes after it. Running `yarn unit` still passes. wchargin-branch: discourse-plugin-prefixes	2019-08-22 08:52:52 -07:00
William Chargin	acdcbc29ed	Fix 404s in timeline and legacy-mode interlinks (#1305 ) Summary: All links in SourceCred must use the `Link` component, providing either an external URL `href={…}` or an internal route `to={…}`. Any uses of a raw `<a>` element for internal routes will incur 404s when the application is hosted on a non-root path, as is currently the case on the production website. The change to `FileUploader` is not strictly necessary, as the link has no styled text and uses a `data:` URL, but there’s no reason not to. Fixes #1304. Test Plan: Build the static site: ``` scripts/build_static_site.sh --target cred --project sourcecred/example-github ``` Then run `python3 -m http.server` from the repository root directory—not the `cred/` subdirectory—and navigate to the timeline cred view: <http://localhost:8000/cred/timeline/sourcecred/example-github/> Observe that the “(legacy)” link now has the correct styling and correctly navigates to the legacy mode page when clicked: prior to this change, it would navigate to a URL without the proper `/cred/` path prefix, yielding a 404. On the legacy page, verify that the “timeline mode” link has the same properties. Then, visit <http://localhost:8000/cred/test/FileUploader/> and verify that the inspection test still passes. Added a regression test to catch further such errors. Note that reverting the code changes in this commit causes the test to fail, and that running it with `--verbose` prints the problematic files. wchargin-branch: fix-bad-routing-404s	2019-08-18 14:43:34 -07:00
William Chargin	0eeda22e0c	Remove console error from default timeline loader (#1303 ) Summary: This is firing on a production page load of the “prototype” link from the homepage, and does not seem to actually be an error condition. Test Plan: Run `yarn start`, navigate to `/timeline/sourcecred/example-github/`, and observe that the console error has disappeared. wchargin-branch: defaultloader-console-error	2019-08-18 13:35:02 -07:00
William Chargin	9fc1482d9d	discourse: save superfluous query of `likes` table (#1302 ) Summary: When inserting a “like” action with `INSERT OR IGNORE` semantics, we also learn whether the action had any effect. We can use this bit to avoid a separate query checking whether the “like” already exists. As mentioned here: <https://github.com/sourcecred/sourcecred/pull/1298#discussion_r314994911> Test Plan: Running `yarn test` passes as is, and fails if you change `addLike` to always return either `changed: true` or `changed: false`. wchargin-branch: discourse-likes-one-query	2019-08-18 12:00:59 -07:00
William Chargin	5e26424d82	discourse: factor SQL parsing out of loops (#1301 ) Summary: Calling `db.prepare(sql)` parses the text in `sql` and compiles it to a prepared statement. This takes time, both for the parsing and allocation itself and for the context switch from JavaScript to C (SQLite). Prepared statements are designed to be invoked multiple times with different bound values. This commit factors prepared statement creation out of loops so that each call to `update` prepares only a constant number of statements. In doing so, we naturally factor out some light JS abstractions over the raw SQL: `addTopic((topic: Topic))`, rather than `addTopicStmt.run(…)`. In principle, these could be factored out of `update` entirely to properties set on the class at initialization time, but, as described in a comment on the GraphQL mirror, we defer this optimization for now as it introduces additional complexity. Test Plan: Running `yarn test --full` passes. wchargin-branch: discourse-sql-cse	2019-08-18 11:58:44 -07:00
Dandelion Mané	bf5c3a953b	discourse: tweak default weights This commit changes the Discourse default weights around, most significantly moving many weights (e.g. LIKES) that have a 0 backward weight to have a small positive backward weight instead, like 1/16. In practice, this mitigates an issue where users with few outbound edges act as "cred sinks" because the cred gets stuck in a loop between the user and content they've authored. Test plan: In local experimentation, I've found the new weights produce more reasonable-seeming cred attribution.	2019-08-18 19:49:08 +02:00
Dandelion Mané	0d6e868324	discourse: distinguish the different AUTHORS edges I've written the Discourse plugin with distinct edge types for post and topic authorship; it allows us to have more precise control over how cred flows (and mitigates the need for #968). However, I gave the two types the same name, which is confusing in the weight config ui. Now they are properly distinct. Test plan: It's a simple string change. In (unpublished) commits with a full Discourse integration, the new strings show up nicely in the UI.	2019-08-18 19:49:08 +02:00
Dandelion Mané	cedc8e8fe5	discourse: fix post urls The previous code incorrectly constructed a Discourse post url based on the post's id, rather than its index within the containing topic. This is now fixed. Test plan: There isn't actually a snapshot diff, because the post with id 2 is also the second post in its thread. I'm not too worried about this, though: this kind of code changes infrequently, and it's pretty obvious when it's wrong.	2019-08-18 19:49:08 +02:00
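The fix in cedc8e8fe5 amounts to building the URL from the post's position within its topic rather than from its server-wide id. The sketch below is hypothetical. It assumes Discourse's `/t/<topicId>/<postNumber>` route and a post object with `topicId` and `indexWithinTopic` fields; both field names are illustrative.

```javascript
// Hypothetical sketch: link to a post via its topic id and its 1-based
// position within that topic (NOT its server-wide post id), assuming
// Discourse's /t/<topicId>/<postNumber> route.
function postUrl(serverUrl, post) {
  const base = serverUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${base}/t/${post.topicId}/${post.indexWithinTopic}`;
}
```

As the commit notes, the old and new code agree whenever a post's id happens to equal its index, which is why the snapshot did not change.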
Dandelion Mané	c082b8faf2	discourse: add likes edges to the graph (#1299 ) A very simple commit; we add a type for likes edges, and add them to the graph. Test plan: Unit tests added; yarn test passes.	2019-08-18 19:38:32 +02:00
Dandelion Mané	bf68a4c01d	discourse mirror updates likes (#1298 ) The Discourse mirror class now keeps an up-to-date record of all of the likes within an instance. It does this by iterating over every user in the history, and requesting their likes. If at any point we hit a like we've already seen, we move on to the next user. In the future, we can improve this so we only query users we haven't checked in a while, or users who were recently active. Test plan: Tests verify that we correctly store all the likes, including after partial updates, and that we don't issue unnecessary queries.	2019-08-18 19:31:52 +02:00
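The update loop in #1298 iterates over users, pages through each user's likes, and moves on as soon as it hits a like it has already stored. The sketch below is simplified and synchronous. `fetchLikesPage(username, offset)` is a hypothetical stand-in for the fetcher. It is assumed to return a user's likes newest-first as `{username, postId}` objects (real data also carries timestamps), and to return an empty array when the likes run out. A `Set` of keys stands in for the SQLite table.

```javascript
// Hypothetical, synchronous sketch of the incremental likes update.
// Because likes arrive newest-first, the first already-stored like means
// everything older is stored too, so we skip to the next user.
function updateLikes(usernames, fetchLikesPage, store) {
  for (const username of usernames) {
    let offset = 0;
    let caughtUp = false;
    while (!caughtUp) {
      const page = fetchLikesPage(username, offset);
      if (page.length === 0) break; // no more likes for this user
      for (const like of page) {
        const key = `${like.username}/${like.postId}`;
        if (store.has(key)) {
          caughtUp = true;
          break;
        }
        store.add(key);
      }
      offset += page.length;
    }
  }
  return store;
}
```

Stopping at the first known like is what keeps repeated updates cheap. The trade-off is that the sketch relies on the server's newest-first ordering.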
Dandelion Mané	e7b1fbd681	mirror: allow fetching all usernames (#1293 ) This is a minor change to the Discourse mirror so that it supports a query to get all users from the server. It will be convenient for a follow-on change which makes `update` search for every user's likes. I also modified createGraph so that it uses the new method, which results in code that is cleaner and slightly more efficient. Test plan: Unit tests updated.	2019-08-18 16:27:11 +02:00
Dandelion Mané	f3ae0a8415	discourse fetcher: add support for likes (#1294 ) For the Discourse plugin, we really want to be able to add a full record of all of the users' liked posts as edges in the graph. It's a really high-signal way to move cred, one that also gives individual users a lot of agency and a way to engage. However: we need an API to get this data. Initial searches of API docs were unpromising; first, we would need to query potentially every post to get its likes individually (which makes it very expensive to find the likes on old posts), and second, the likes did not come with timestamp information. For a while, I thought we were at an impasse. I then went fishing in the Discourse implementation for a solution (yay open source!). Lots of the API is undocumented, since it's whatever they happen to add to run Discourse. And it turns out there's a `user_actions` API ([source]) which can provide all of a user's actions in order, and having your content liked by someone else is considered an action. Best of all, these actions come with timestamps. The upshot is that instead of querying every post to get its likes, we can query every user to get likes. Iterating over all users can still be slow, but it's far better than iterating over all posts; plus we can implement caching so that we only infrequently check in on inactive users. I've added a `likesByUser` method to the Discourse fetch interface that provides this information. I've also added a snapshot test for it (and updated all of the snapshots). I also rolled in a slight refactor to error handling in the fetcher. The mirror doesn't yet use this information (that will come later). [source]: `82e07cb0f4/app/controllers/user_actions_controller.rb (L3)` Test plan: `yarn test` passes. Snapshots look good.	2019-08-18 16:20:33 +02:00
William Chargin	64f282fff2	discourse: consolidate parallel `MAX` queries (#1296 ) Summary: As mentioned in <https://github.com/sourcecred/sourcecred/pull/1266#discussion_r312345441>. Test Plan: Running `yarn test` passes. wchargin-branch: discourse-single-max-query	2019-08-17 12:58:55 +02:00
Dandelion Mané	b50ba67797	add discourse declaration and createGraph (#1292 ) This commit adds the logic needed for creating a contribution graph based on the Discourse data. We first have a declaration with specifications for the node and edge types in the plugin. We also have a `createGraph` module which creates a conformant graph from the Mirror data. The graph creation is thoroughly tested. Test plan: Inspect unit tests, run `yarn test`. I also have (yet unpublished) code which loads the graph into the UI, and it appears fine.	2019-08-17 04:20:27 +02:00
Dandelion Mané	69831d6961	mirror: make a data interface (#1291 ) This is a quick fixup so that the coming createGraph module can be properly tested. Shout out to @Beanow for anticipating this need in a [review comment]. [review comment]: https://github.com/sourcecred/sourcecred/pull/1266#discussion_r314305108 Test plan: trivial refactor, run `yarn test`	2019-08-15 20:12:20 +02:00
Dandelion Mané	2f8e1c61e4	Add a Discourse API mirror (#1266 ) The mirror wraps a SQLite database which will store all of the data we download from Discourse. On a call to `update`, it downloads new data from the server and stores it. Then, when it is asked for information like the topics and posts, it can just pull from its local copy. This means that we don't need to re-download the content every time we load a Discourse instance, which makes the load more performant, more robust to network failures, etc. Thanks to @wchargin, whose work on the GraphQL mirror for GitHub (#622) inspired this mirror. Test plan: I've written unit tests that use a mock fetcher to validate the update logic. I've also used this to do a full load of the real SourceCred Discourse instance, and to create a corresponding graph (using subsequent commits). Progress towards #865.	2019-08-15 16:08:04 +02:00
Dandelion Mané	fd95be68a9	Add class for fetching data from Discourse (#1265 ) The `DiscourseFetcher` class abstracts over fetching from the Discourse API, and post-processing and filtering the result into a form that's convenient for us. Testing is a bit tricky because the Discourse API keys are sensitive (they are admin keys) and so I'm reluctant to commit them, even for our test instance. As a workaround, I've added a shell script which downloads some data from the SourceCred test instance, and saves it with a filename which is an encoding of the actual endpoint. Then, in testing, we can use a mocked fetch which actually hits the snapshots directory, and thus validate the processing logic on "real" data from the server. We also test that the fetch headers are set correctly, and that we handle non-200 error codes appropriately. Test plan: In addition to the included tests, I have an end-to-end test which actually uses this fetcher to fully populate the mirror and then generate a valid SourceCred graph. This builds on API investigations [here](https://github.com/sourcecred/sourcecred/issues/865#issuecomment-478026449), and is general progress towards #865. Thanks to @erlend-sh, without whom we wouldn't have a test instance.	2019-08-15 13:22:06 +02:00
William Chargin	68aad8b205	Remove non-standard bare-`catch` blocks (#1281 ) Summary: In ES6, the [`try` statement grammar][1] requires a catch parameter; the parameter is only optional in the latest draft of ECMAScript, which is of course not yet ratified as any actual standard. Even though we don’t officially pledge to support Node 8, this is currently the only breakage, and it’s easy enough to fix. [1]: https://www.ecma-international.org/ecma-262/6.0/#sec-try-statement Test Plan: Running `yarn start` on Node v8.11.4 no longer raises a syntax error. wchargin-branch: catch-parameter	2019-08-13 09:41:45 -07:00
William Chargin	607a6d426d	Fix type error in FileUploader (#1280 ) Summary: Introduced in #1277. Test Plan: Run `yarn start` and visit <http://localhost:8080/test/FileUploader/>. Conduct the test plan as specified on that page. wchargin-branch: fileuploader-target	2019-08-13 18:20:54 +02:00
William Chargin	09493eb368	rasterize: replace backtick command substitution Summary: Backticks are discouraged relative to the `$(…)` form for command substitution, because they are harder to read and do not nest without exponential escaping: ```shell $ foo=$(echo $(echo hi $(echo bye))) # clear $ foo=`echo \`echo hi \\\`echo bye\\\`\`` # hmm ``` In this context, command substitution should not be used at all; `PWD` is a special variable that always contains the current working directory. The new version of the code will be correct even if the current working directory ends with whitespace that would be stripped off by the command substitution. Test Plan: Prepended an `echo` to the relevant line, and verified that the script has the same output before and after this change. wchargin-branch: fix-backticks	2019-08-13 15:39:51 +02:00
Dandelion Mané	a7ccf7ff6d	Update logo This adds a new version of the logo, based on the work by @ericronne in PR #1261, but regenerated using an algorithmic approach, which can be found in [this notebook]. Also, the color scheme has changed. Thanks to @lbStrobbe for a lot of creative feedback, and to everyone who participated in the [feedback thread] and the original [logo issue] on GitHub. Besides committing the rasterized assets, this commit also updates the favicon. [feedback thread]: https://discourse.sourcecred.io/t/more-logo-explorations/142 [this notebook]: https://observablehq.com/@decentralion/sourcecred-logo-explorations [logo issue]: https://github.com/sourcecred/pm/issues/5	2019-08-09 14:28:23 +02:00
Eric Ronne	d205fe5a99	Revised color logo	2019-08-08 16:53:35 +02:00
Dandelion Mané	c62ddccfec	Release version 0.4.0 (#1271 ) Test plan: `yarn test --full`	2019-08-07 20:12:11 +02:00
Dandelion Mané	26c0910a1f	TimelineExplorer: Enable changing selected type (#1268 ) The code is mostly ported from the legacy app. However, we no longer assume that we are showing every type for every plugin. Instead, the types are manually selected. For now, we permit the GitHub user type, and the GitHub repo type, as these are the two types that are included in filtered timeline cred. Test plan: Manual inspection is necessary, since this frontend is mostly untested. I've done that inspection. Also, `yarn test` passes.	2019-08-07 17:54:04 +02:00
Dandelion Mané	ee5f2a9a56	MapUtil.pushValue returns the array Minor change to the API for MapUtil.pushValue. Now it returns the resultant array. I've found this convenient in at least one case, and previously we weren't returning anything, so it's a cheap change. Test plan: Unit test added.	2019-08-06 21:36:42 +02:00
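The changed `MapUtil.pushValue` contract can be sketched like this. The sketch assumes a `(map, key, value)` argument order and is not the module's actual source. It appends to the array stored under a key, creates that array if needed, and returns it.

```javascript
// Sketch of pushValue: append `value` to the array stored under `key`,
// creating the array on first use, and return that array.
function pushValue(map, key, value) {
  let arr = map.get(key);
  if (arr == null) {
    arr = [];
    map.set(key, arr);
  }
  arr.push(value);
  return arr;
}
```

Returning the array lets callers use the updated list immediately, without a second `map.get(key)`.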
Dandelion Mané	d5a1ca30b4	Fixup project for move of example repos I moved sourcecred/example-git{,hub} to the @sourcecred-test org. This commit fixes the build given that move. I've realized that in #1233 I inadvertently made some Git tests that depend on a snapshot un-updatable. I'm going to add to that slight technical debt by skipping the tests that depended on that snapshot. I recognize and accept that I'll need to pay this down when I resuscitate the git plugin. Test plan: `yarn test --full`.	2019-07-23 02:36:28 +01:00
Dandelion Mané	ca4fb2bc5d	Remove deprecated commands and adapters This commit removes the `pagerank` and `analyze` commands (both of which never saw real usage), removes the outdated adapter-based `loadGraph` method, and removes all traces of the analysis adapters. It builds on work in #1233 and #1136. Test plan: `yarn test --full` passes.	2019-07-23 01:31:18 +01:00
Dandelion Mané	c15e97b4d4	change the world: track projects not repos This commit swaps usage over to the new implementation of `cli/load` (the one that wraps `api/load`) and makes changes throughout the project to accommodate the fact that we now track instances by Project rather than by RepoId. Test plan: Unit tests updated; run `yarn test --full`. Also, for safety: actually load a project (by whole org, why not) and verify that the frontend still works.	2019-07-23 01:01:09 +01:00
Dandelion Mané	e31269283a	re-implement src/cli/load The new implementation wraps `api/load`. Test plan: I've ported over the tests from the old `cli/load`. Run `yarn test`.	2019-07-23 01:01:09 +01:00
Dandelion Mané	1f7ee2ed1c	deprecate cli/load This commit deprecates `cli/load` so that we can write a new implementation, and then make an atomic switch. Test plan: `yarn test --full`	2019-07-23 01:01:09 +01:00
Dandelion Mané	4ae502bfd3	remove old-style git loading, and its testing I'm re-organizing SC data to be oriented on the graph, rather than on plugin-specific data structures. So there is no longer a need for the git loading logic which orients around saving a repository.json file that's been potentially merged across repos, or indeed the logic for merging repositories at all. So I'm removing `git/loadGitData`, `git/mergeRepository`, and relatives. Test plan: `yarn test --full` passes.	2019-07-23 01:01:09 +01:00
Dandelion Mané	37b39a7e0a	remove dep on `mkdirp` (#1253 ) There's no need for us to depend on `mkdirp`, because the `fs-extra` module already has `fs.mkdirp` and `fs.mkdirpSync`. This commit removes the dep from our `package.json`, and removes all explicit imports of it. Test plan: `yarn test --full` passes. `git grep "import mkdirp"` has no hits.	2019-07-23 00:28:49 +01:00
Dandelion Mané	d0660c9366	add `api/load` (#1251 ) This adds a new module, `api/load`, which implements the logic that will underlie the new `sourcecred load` command. The `api` package is a new folder that will contain the logic that powers the CLI (but will be callable directly as we improve SourceCred). As a heuristic, nontrivial logic in `cli/` should be factored out to `api/`. In the future, we will likely want to refactor these APIs to make them more atomic/composable. `api/load` does "all the things" in terms of loading data, computing cred, and writing it to disk. I'm going with the simplest approach here (mirroring existing functionality) so that we can merge #1233 and realize its many benefits more easily. This work is factored out of #1233. Thanks to @Beanow for [review] of the module, which resulted in several changes (e.g. organizing it under api/, having the TaskReporter be dependency-injected). [review]: https://github.com/sourcecred/sourcecred/pull/1233#pullrequestreview-263633643 Test plan: `api/load` is tested (via mocking unit tests). Run `yarn test`	2019-07-22 15:41:24 +01:00
Dandelion Mané	aa72bee217	Make `TaskReporter` an easily-tested interface This commit refactors the `util/taskReporter` module so that `TaskReporter` is an interface; the class previously called `TaskReporter` is renamed to `LoggingTaskReporter`. We also export a `TestTaskReporter` which implements the interface, and is very easy to test. The motivation: This will make it much easier to write tested code that uses a `TaskReporter`, as now the test code can provide a `TestTaskReporter` and check that all tasks get finished, that task ids are as expected, etc. Test plan: The `TestTaskReporter` is tested. Run `yarn test`.	2019-07-22 15:15:33 +01:00
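The testing benefit described in aa72bee217 comes from a reporter that records events instead of logging them. The sketch below is hypothetical. The method names `start`, `finish`, `activeTasks`, and `entries` are assumptions, not necessarily the module's real API.

```javascript
// Hypothetical test-friendly task reporter: records START/FINISH events so a
// test can check afterwards that every task finished and ids are as expected.
class TestTaskReporter {
  constructor() {
    this._active = new Set();
    this._entries = [];
  }
  start(taskId) {
    if (this._active.has(taskId)) throw new Error(`task ${taskId} already active`);
    this._active.add(taskId);
    this._entries.push({type: "START", taskId});
    return this;
  }
  finish(taskId) {
    if (!this._active.has(taskId)) throw new Error(`task ${taskId} not active`);
    this._active.delete(taskId);
    this._entries.push({type: "FINISH", taskId});
    return this;
  }
  // Tasks started but not yet finished; empty once all work is done.
  activeTasks() {
    return [...this._active];
  }
  entries() {
    return this._entries.slice();
  }
}
```

Code under test would receive this reporter in place of a logging implementation. The test then asserts that `activeTasks()` is empty and that `entries()` contains the expected task ids.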
Dandelion Mané	6bd7fe1154	Replace `Object.freeze` with `deepFreeze` Throughout the codebase, we freeze objects when we want to ensure that their properties are never altered -- e.g. because they are a plugin declaration, or are being re-used for various test cases. We generally use `Object.freeze`. This has the disadvantage that it does not work recursively, so a frozen object's mutable fields and properties can still be mutated. (E.g. if `const obj = Object.freeze({foo: []})`, then `obj.foo.push(1)` will succeed in mutating the 'frozen' object). Sometimes we anticipate this and explicitly freeze the sub-fields (which is tedious); sometimes we forget (which invites errors). This change simply replaces all instances of Object.freeze with [deep-freeze], so we don't need to worry about the issue at all anymore. Test plan: `yarn test` passes (after updating snapshots); `git grep Object.freeze` returns no hits. [deep-freeze]: https://www.npmjs.com/package/deep-freeze	2019-07-21 16:21:12 +01:00
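The difference that 6bd7fe1154 relies on can be seen with a minimal recursive freeze. This is a sketch of the idea, not the deep-freeze package's source. It freezes the object and then every nested object or array reachable from it, skipping values that are already frozen (which also stops cycles).

```javascript
// Minimal recursive freeze: unlike Object.freeze, nested objects and arrays
// are frozen too. Already-frozen values are skipped, which also ends cycles.
function deepFreeze(obj) {
  Object.freeze(obj);
  for (const key of Object.getOwnPropertyNames(obj)) {
    const value = obj[key];
    if (typeof value === "object" && value !== null && !Object.isFrozen(value)) {
      deepFreeze(value);
    }
  }
  return obj;
}
```

With `Object.freeze({foo: []})`, a later `obj.foo.push(1)` still succeeds. After `deepFreeze({foo: []})`, the same push throws a `TypeError`, because the inner array is frozen as well.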
Dandelion Mané	e4c96f3a18	Add `github/loadGraph` This is a replacement for `github/loadGithubData` which returns a combined Graph rather than a combined RelationalView. This provides a major benefit, which is that we can use the (robust) Graph merge logic rather than the (buggy) relational view merge. Test plan: This function is untested. It basically pipelines a few APIs together. I think that flow is basically sufficient to validate that it works, and writing a unit test would be frustrating (it would mostly involve re-integrating the functionality via mocks). A future commit makes this part of the pipeline that generates snapshot tests, so it is de facto integration tested.	2019-07-21 15:38:10 +01:00
Dandelion Mané	0a34c8b036	add github/specToProject This module builds on the project logic added in #1238, and makes it easy to create projects based on a simple string configuration. Basically, the spec `foo/bar` creates a project containing just the repo foo/bar, and the spec `@foo` creates a project containing all of the repos from the user/organization named foo. This is pulled out of #1233, but I've enhanced it to support organizations out of the box. The method is thoroughly tested.	2019-07-21 13:12:34 +01:00
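The spec format described in this commit (`foo/bar` for one repo, `@foo` for every repo of a user or organization) can be sketched as a small parser. This is a hypothetical illustration. The regexes only approximate GitHub's naming rules. The `{id, repoIds}` project shape and the `listReposForOwner` callback, which stands in for a GitHub API query that would be asynchronous in practice, are assumptions.

```javascript
// Hypothetical spec parser: "owner/name" -> one repo, "@owner" -> all
// repos owned by that user or organization. Character classes are
// approximations of GitHub's rules, not an authoritative definition.
const OWNER = "[A-Za-z0-9-]+";
const NAME = "[A-Za-z0-9._-]+";
const REPO_SPEC = new RegExp(`^(${OWNER})/(${NAME})$`);
const OWNER_SPEC = new RegExp(`^@(${OWNER})$`);

function parseSpec(spec) {
  let match = spec.match(REPO_SPEC);
  if (match) {
    return {type: "REPO", owner: match[1], name: match[2]};
  }
  match = spec.match(OWNER_SPEC);
  if (match) {
    return {type: "OWNER", owner: match[1]};
  }
  throw new Error(`invalid spec: ${spec}`);
}

// `listReposForOwner(owner)` is a hypothetical lookup returning
// [{owner, name}, ...]; a real one would call the GitHub API.
function specToProject(spec, listReposForOwner) {
  const parsed = parseSpec(spec);
  const repoIds =
    parsed.type === "REPO"
      ? [{owner: parsed.owner, name: parsed.name}]
      : listReposForOwner(parsed.owner);
  return {id: spec, repoIds};
}
```

Keeping parsing separate from the lookup means the string format can be tested exhaustively without network access.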
Dandelion Mané	daa7409abb	Add util/taskReporter It's a lightweight utility for reporting task progress in a CLI. It's inspired by execDependencyGraph. Test plan: `yarn test`; unit tests included.	2019-07-21 12:32:38 +01:00
Dandelion Mané	0889a0a5d1	Use url encoding and make _getProjectIds async Test plan: `yarn test --full` still passes. Also, I've ensured that the async `_getProjectIds` is still usable in our webpack configs (via modifying and testing the dependent commits).	2019-07-21 12:24:10 +01:00
Dandelion Mané	9b105ee4ce	Add `core/project` and `core/project_io` This creates a new `Project` type which will replace `RepoId` as the index type for saving and loading data. The basic data type is added to `project.js`. Rather than having a `RepoIdRegistry`, I intend to infer the registry at build time by scanning for available projects saved in the sourcecred directory. I've added the `project_io` module for this task. It has methods for setting up a project subdirectory, and loading the `Project` info from that subdirectory. To ensure that project ids can be encoded even if they have symbols like `/` and `@`, we base64 encode them. To ensure that project ids can be retrieved at build time, the `getProjectIds` method is factored out into its own plain ECMAScript module. For all non-build time needs, it is re-exported from `project_io`. Test plan: Unit tests added; run `yarn test`.	2019-07-21 12:24:10 +01:00
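One subtlety in base64-encoding project ids for use as directory names is that the standard base64 alphabet itself contains `/` (and `+`). The sketch below assumes a URL-safe variant (`-` and `_`, no `=` padding) to sidestep this. The function names are hypothetical, and whether SourceCred uses exactly this variant is an assumption.

```javascript
// Hypothetical helpers: encode an arbitrary project id (which may contain
// "/" or "@") into a string that is safe as a single directory name.
// Standard base64 can emit "/" and "+", so map to the URL-safe alphabet
// and drop the "=" padding.
function encodeProjectId(id) {
  return Buffer.from(id, "utf8")
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

function decodeProjectId(encoded) {
  const base64 = encoded.replace(/-/g, "+").replace(/_/g, "/");
  // Restore the padding stripped by encodeProjectId (length must be a
  // multiple of 4).
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return Buffer.from(padded, "base64").toString("utf8");
}
```

Because the encoding is reversible, scanning the project directory at build time is enough to recover every project id.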
Dandelion Mané	aa28c932c5	Another stab at fixing CI flakiness See #1243 for context. This is basically a more aggressive version of pull #1230 -- instead of just running unit tests in isolation, we also run flow in isolation, and kill the servers afterwards. Test plan: See how this fares in CI :)	2019-07-19 15:35:44 +01:00
Robin van Boven	7509a78f65	Add --weights as load option (#1224) Includes a change to `cli/load` and `build_static_site.sh` to accept a `--weights WEIGHTS_FILE` argument. This allows overriding the default weights at build-time using a `weights.json` that has the same format as previously generated in the frontend. Test plan: Adds an additional test case for propagating the optional parameter. The file I/O of loading and parsing a weights.json file was tested manually, since analysis/weights' fromJSON() is tested elsewhere, as is passing weight parameters.	2019-07-15 15:25:28 +01:00
Dandelion Mané	7a88d32cb2	Remove the ExplorerAdapter from the legacy app Prior to #1136, we needed an `ExplorerAdapter` abstraction to get node description data to the frontend. Now that it's included in the graph, we can throw away this abstraction, which is a big step towards plugin simplification (work towards #1120). Since it only affects a deprecated/legacy part of the code base, I didn't put much effort into making the result super clean. I also removed a few tests that became inconvenient. Test plan: Verified that the legacy frontend still works. There's one tiny regression, which is that the link color in the legacy frontend no longer matches the rest of the UI, but that's actually consistent with the timeline frontend, so no biggie. `yarn test` passes.	2019-07-14 20:11:30 +01:00
Robin van Boven	8ae76f122e	Include quasar problematic interactions. (#1225)	2019-07-14 18:34:52 +01:00
Dandelion Mané	88f736d180	add `sourcecred/scores` (#1223) The scores are lightly processed from their internal representation. Example usage: ``` $ yarn backend; $ node bin/sourcecred.js load sourcecred/sourcecred $ node bin/sourcecred.js scores sourcecred/sourcecred > scores.json ``` The data structure is as follows: ```js export type NodeOutput = {| +id: string, +totalCred: number, +intervalCred: $ReadOnlyArray<number>, |}; export type ScoreOutput = Compatible<{| +users: $ReadOnlyArray<NodeOutput>, +intervals: $ReadOnlyArray<Interval>, |}>; ``` Test plan: I added sharness tests at `sharness/test_cli_scores.t`. In the past, we've used JavaScript tests for CLI commands. However, those are pretty time-consuming to write, and are less robust than simply running the command from bash. Check the snapshot for a sense of what the new data format looks like. Also, the snapshot updater now updates this snapshot too. Relevant for #1047. Thanks to @Beanow for feedback on the output format and design. Thanks to @wchargin for help in code review.	2019-07-14 17:05:13 +01:00
Dandelion Mané	8e0bbcf597	Change version to 0.3.0	2019-07-11 21:53:11 +01:00
Dandelion Mané	c0b207b989	Have the prototypes/ page point to Timeline Cred As of this commit, the main SourceCred prototypes page now links to timeline cred, meaning that timeline cred is now live. I've added a link from the legacy explorer to the timeline explorer (which already has a link out to the legacy explorer). Test plan: Careful inspection of the frontend by the committer. Also, yarn test.	2019-07-11 06:33:41 +01:00
Dandelion Mané	93ceb9ca05	Move the v1 explorer to `explorer/legacy` This is a bulk rename of all the old explorer code into `explorer/legacy`. Now that the timeline explorer exists, I intend to prioritize development on that going forwards. Once the timeline explorer is as good as the old explorer at decomposing a node's sources of cred, I will remove the legacy explorer entirely. Test plan: `yarn test`	2019-07-11 06:33:41 +01:00
Dandelion Mané	5dc7f440ce	Initial Timeline Explorer This commit adds a TimelineExplorer for visualizing timeline cred data. The centerpiece is the TimelineCredChart, a d3-based line chart showing how the top users' cred evolved over time. It has features like tooltips, reasonable ticks on the x axis, a legend, and filtering out line segments that stay on the x axis. An inspection test is included, which you can check out here: http://localhost:8080/test/TimelineCredView/ Also, you can run it for any loaded repository at: http://localhost:8080/timeline/$repoOwner/$repoName This commit also includes new dependencies: - recharts (for the charts) - react-markdown (for rendering the Markdown descriptions) - remove-markdown (so the legend will be clean text) - d3-time-format for date axis generation - d3-scale and d3-scale-chromatic for color scales Test plan: The frontend code is mostly untested, in keeping with my observation that the costs of testing the old explorer were really high, and the tests brought little benefit. However, I have manually tested it thoroughly. Also, there is an inspection test for the TimelineCredView (see above).	2019-07-11 06:33:41 +01:00
Dandelion Mané	b106326e0a	Add a copy method to `analysis/weights` It's very simple: a method that creates a copy of a `Weights`. While writing this, I realized I should probably refactor the weights module so that it exports a class rather than a bunch of methods operating on a data structure. It would just be a cleaner API. But I'm leaving that for another day. Test plan: Unit tests added.	2019-07-11 06:33:41 +01:00
Dandelion Mané	d13c040a5f	Decrease GitHub TTL from 7 days to 12 hours As described in #987, we use a single TTL across GitHub types. Right now, the TTL is set to 7 days. This means that it's possible to run `sourcecred load`, but still be missing the last 7 days' worth of issues. Now that we're doing timeline cred (cf #1212), this is not acceptable. As a workaround until we fix #987, I'm decreasing the TTL to 12 hours. That's still long enough to make a good experience for someone who is tweaking config and calling `sourcecred load` a lot, but ensures that freshly-loaded results still have recent activity. Test plan: `yarn test`	2019-07-11 01:39:51 +01:00
Dandelion Mané	a46500d704	Modify `sourcecred load` to save timeline cred Test plan: Observe changes to the snapshot for example-github-load. `yarn test --full` passes.	2019-07-11 01:30:27 +01:00
Dandelion Mané	aa7158dd95	add `analysis/timeline/timelineCred` This adds a TimelineCred class which serves several functions: - acts as a view on timeline cred data - (lets you get highest scoring nodes, etc) - has an interface for computing timeline cred - lets you serialize cred along with the graph and parameter settings that generated it in a single object One upshot of this design is that now if we let the user provide weights (or other config) on load time in the CLI, those weights will get carried over to the frontend, since they are included along with the cred results. TimelineCred has 'Parameters' and 'Config'. The parameters are user-specified and may change within a given instance. The config is essentially codebase-level configuration around what types are used for scoring, etc; I don't expect users to be changing this. To keep the analysis module decoupled from the plugins module, I put a default config in `src/plugins/defaultCredConfig`; I expect all users of TimelineCred to use this config. (At least for a while!) Test plan: I've added some tests to `TimelineCred`. Run `yarn test`. I also have a yet-unmerged branch that builds a functioning cred display UI using the `TimelineCred` class.	2019-07-11 01:30:27 +01:00
Dandelion Mané	9bd1e88bc9	add `analysis/timeline/filterTimelineCred` This adds the `filterTimelineCred` module, which dramatically reduces the size of timeline cred by throwing away all nodes that are not a user or repository. It also supports serialization / deserialization. Test plan: unit tests included	2019-07-11 01:30:27 +01:00
Dandelion Mané	162f73c3e9	add `analysis/timeline/distributionToCred` This module takes the timeline distributions created by `timelinePagerank`, and re-normalizes the scores into cred. For details on the algorithm, read comments and docstrings in the module. Test plan: Unit tests added.	2019-07-11 01:30:27 +01:00
Dandelion Mané	87720c4868	Add `analysis/timeline/timelinePagerank` As the name would suggest, this module allows computing timeline PageRank on a graph. See documentation in the module for details on the algorithm. Test plan: The module has incomplete testing. The timelinePagerank function calls out to iterators for getting the time-decayed node weights and the time-decayed markov chain; these iterators are tested. However, the wrapper logic that composes these pieces together, calculates seed vectors, and runs PageRank is not tested. I felt it would be a pain to test and settled for reviewing the code carefully, and putting a cautionary note at the top of the function.	2019-07-11 01:30:27 +01:00
Dandelion Mané	cb236eff5d	add `analysis/weightEvaluator` This commit adds new weight evaluators for nodes and edges. Unlike the previous evaluator, edges and nodes are handled as separate concerns, rather than composing the node weights into the edge weights. I think this separation is cleaner. Both evaluators use only the address, not the full (Node or Edge) object. We may want to give the edge evaluator access to the full Edge later, if we decide we want node-type-differentiated edge weights (e.g. if a hasParent edge has a different weight depending on whether it is connected to an Issue or a Repository). weightsToEdgeEvaluator has been refactored to use the new evaluators, and has been given a deprecation notice. Test plan: `yarn test`	2019-07-11 01:30:27 +01:00
Dandelion Mané	2335c5d844	add `analysis/timeline/interval` This commit adds an `interval` module which defines intervals (time ranges), and methods for slicing up a graph into its constituent time intervals. This is pre-requisite work for #862. I've added a dep on d3-array. Test plan: Unit tests added; run `yarn test`	2019-07-11 01:30:27 +01:00
Dandelion Mané	7d26c196f2	Disable the Git plugin This commit disables the Git plugin by removing it from the default list of plugins to load, or to display in the frontend. Rationale: The git plugin doesn't currently add very much to cred quality. Git commits have edges to their parent, which isn't a very meaningful relationship for cred purposes. We'll want to re-enable the Git plugin once we're ready to support e.g. file and directory level cred tracking. I've skipped a block of tests around the git analysisAdapter. (I intend to deprecate the analysisAdapters, so skipping the tests seemed preferable to updating them). I also updated our sharness test for catching test files without a proper describe block, so that it won't error on skipped blocks. Test plan: `yarn test --full` passes. Loading a new repository and inspecting it in the frontend gives consistent results. There are no references to Git plugin weights in the frontend, now that corresponding nodes are not available.	2019-07-09 19:40:17 +01:00
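The idea of slicing timestamped graph items into consecutive time intervals can be sketched without any dependencies. This is a hypothetical illustration, not the module's implementation (which the commit says depends on d3-array). The one-week default length, epoch-aligned boundaries, half-open `[startTimeMs, endTimeMs)` intervals, and the `timestampMs` field name are all assumptions.

```javascript
// Hypothetical sketch: partition items with a `timestampMs` field into
// consecutive, fixed-length, half-open intervals [start, end).
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function partitionByInterval(items, intervalMs = WEEK_MS) {
  if (items.length === 0) {
    return [];
  }
  const times = items.map((x) => x.timestampMs);
  // Align the first boundary to a multiple of intervalMs since the epoch.
  const earliest = times.reduce((a, b) => Math.min(a, b));
  const latest = times.reduce((a, b) => Math.max(a, b));
  const first = Math.floor(earliest / intervalMs) * intervalMs;
  const count = Math.floor((latest - first) / intervalMs) + 1;

  // Create every interval up front, so empty intervals are kept too.
  const partitions = [];
  for (let i = 0; i < count; i++) {
    const startTimeMs = first + i * intervalMs;
    partitions.push({
      interval: {startTimeMs, endTimeMs: startTimeMs + intervalMs},
      items: [],
    });
  }
  for (const item of items) {
    const index = Math.floor((item.timestampMs - first) / intervalMs);
    partitions[index].items.push(item);
  }
  return partitions;
}
```

Keeping empty intervals makes every per-interval score array line up with the same list of intervals, which is convenient when plotting cred over time.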
Dandelion Mané	e454a44c71	blacklist dependabot The dependabot bot has an inconsistent typename in GitHub's database. We'll blacklist it so we can continue loading `sourcecred/sourcecred`. Test plan: `node bin/sourcecred.js load sourcecred/sourcecred` fails before this commit, and succeeds after.	2019-07-09 19:33:16 +01:00
Dandelion Mané	302850202a	refactor: describe edge weights consistently This commit resolves an inconsistency where we called edge weights `toWeight` and `froWeight` in the core/attribution module, but `forwards` and `backwards` in the analysis module. I changed field names in the PagerankGraph JSON, so I bumped its compat. Test plan: `yarn test --full` passes.	2019-07-09 13:29:22 +01:00
Dandelion Mané	31ab767c03	verify graphToMarkovChain ordering is canonical This commit just adds a test which verifies that when an OrderedSparseMarkovChain is created by graphToMarkovChain, its nodeOrder is the graph's canonical node order. Test plan: `yarn test`	2019-07-09 13:08:23 +01:00
Dandelion Mané	df55b9c3c5	graph: canonicalize node and edge iteration order This means that we no longer need to expose methods for extracting the order from serialized JSON. We can always count on iterating over the nodes and edges in sorted order. Test plan: `yarn test`; tests updated.	2019-07-09 13:08:23 +01:00
Dandelion Mané	7a493b596d	Update eslint and eslint configuration This commit updates eslint from v4 to v6. In doing so, I've moved off of the create-react-app base eslint config. We were on an old version (v2) and it doesn't make sense to update to v4, as in v4 create-react-app uses typescript. Also, it didn't make sense to stay on create-react-app's v2 config, because then it had unmet peer dependency constraints on old versions of eslint. Instead, I've moved us to use the default rules for eslint, eslint-plugin-react, and eslint-plugin-flowtype. I also made some changes to the codebase to satisfy the new lint rules that came with this change. Test plan: `yarn test` passes.	2019-07-05 18:39:00 +01:00
Dandelion Mané	eadcca8999	Upgrade flow to 0.102.0 This necessitated a number of type fixes: - Upgraded the express flow-typed file to latest - Added manual flow error suppressions where the express flow-typed file is still using a deprecated utility type - Removed type polymorphism support on map.merge (see context here[1]). We weren't using the polymorphism anywhere so I figured it was simplest to just remove it. - Improved typing around jest mocks throughout the codebase. Test plan: `yarn test --full` passes. [1]: https://github.com/flow-typed/flow-typed/issues/2991	2019-07-05 17:21:56 +01:00
Dandelion Mané	6a13248b09	Upgrade prettier This commit updates our prettier version from `1.13` to `1.18`. Looks like software does get better over time! I like all of the changes. Test plan: `yarn test` passes. I've manually inspected the diffs.	2019-07-04 20:33:42 +03:00
Dandelion Mané	29c9229c28	Update better-sqlite3 to v5 When we took a dep on better-sqlite3 in #836, we used a fork, because better-sqlite3 did not yet support private in-memory databases via the `:memory:` filepath. As of better-sqlite3 v5, this has been added to mainline, so we no longer need the fork. The v4->v5 transition involves some breaking changes. The only ones that affected us were two field renames, from `lastUpdateROWID` to `lastUpdateRowid`, and `returnsData` to `reader`. Test plan: After updating the field accesses, `yarn test --full` passes. For added safety, I also blew away cache, loaded a nontrivial repository, and verified that the full cred workflow still works. cc @wchargin	2019-07-04 20:31:32 +03:00
Dandelion Mané	e47c9b3aba	graph: add node and edge timestamps This commit updates the Graph class so that both nodes and edges have timestamps. This is a big step for #862. Test plan: `yarn test --full` passes.	2019-07-04 13:44:28 +03:00
Dandelion Mané	6c5e8b70d6	graph: add descriptions This updates the graph `Node` type to include a string description. The description should be a brief (ideally one-line) string giving context on what the node is. All planned frontends will support markdown, so linking to context (e.g. linking to the issue corresponding to an ISSUE type node) is supported. This commit updates the Git and GitHub plugins to use the new description field. Test plan: `yarn test --full` passes, and I've inspected snapshots and made sure they look reasonable.	2019-07-04 13:44:28 +03:00
Dandelion Mané	e7add05df5	Plugins create dangling edges The GitHub plugin no longer adds a Node to the graph for Git commits. Instead, it creates a dangling edge to it. This frees the GitHub plugin from responsibility for setting the timestamp or other metadata for Git nodes. The Git plugin no longer adds a Commit Node to the graph immediately when encountering a commit's parent hash. Instead, it creates an edge to the parent, and then fills in the parent node once it is encountered in the commit store. Test plan: Load a real repository with merged pull requests (e.g. sourcecred/research) into the explorer, and verify that GitHub commit entities are still connected to Git commits, and that Git commits are still connected to their parents.	2019-07-03 15:19:11 +03:00
Dandelion Mané	02a8e02922	graph: add support for dangling edges This commit modifies the Graph class so that it permits dangling edges; that is to say, edges whose src or dst are not present in the graph. Dangling edges may be directly added to the graph, or existing edges may become dangling if their src or dst is removed. This change is prerequisite to #1136; if we require that nodes have metadata, we should also make it possible to add edges to nodes that don't yet exist, as the plugin creating an edge may not have access to the full metadata needed to add the node. To support this change, there is now an `isDanglingEdge` method on the graph, which reports whether or not the edge is dangling. Also, `Graph.edges` requires that the client make an explicit choice on whether dangling edges are desired. This ensures that we do not accidentally include dangling edges in a case where they are inappropriate (e.g. creating a Markov chain) or accidentally discard dangling edges when they are needed (e.g. when merging or serializing). The Graph's invariant checker has been updated to reflect the new semantics. The Graph compat version has been bumped, since this is a break in backwards compatibility. Note that this commit does not change the behavior of any plugins; that is to say, no plugins create dangling edges (yet). Test plan: The advanced graph test case has been updated to include dangling edges. The tests for Graph, PagerankGraph, and GraphToMarkovChain have been updated. `yarn test --full` passes.	2019-07-03 15:19:11 +03:00
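The dangling-edge semantics described here can be sketched with a small in-memory graph. This is a hypothetical class, not SourceCred's `Graph`. Addresses are plain strings. Only the `isDanglingEdge` name and the idea of an explicit dangling-edge choice for `edges` come from the commit message. The `{showDangling}` option shape is an assumption.

```javascript
// Hypothetical minimal graph illustrating dangling edges: an edge is
// dangling when its src or dst is not a node in the graph.
class TinyGraph {
  constructor() {
    this._nodes = new Map(); // address -> node object
    this._edges = new Map(); // address -> {address, src, dst}
  }
  addNode(node) {
    this._nodes.set(node.address, node);
    return this;
  }
  removeNode(address) {
    // Incident edges are kept; they simply become dangling.
    this._nodes.delete(address);
    return this;
  }
  addEdge(edge) {
    // No check that src/dst exist: dangling edges are allowed.
    this._edges.set(edge.address, edge);
    return this;
  }
  isDanglingEdge(address) {
    const edge = this._edges.get(address);
    if (edge === undefined) {
      return undefined; // unknown edge
    }
    return !this._nodes.has(edge.src) || !this._nodes.has(edge.dst);
  }
  // Callers must state explicitly whether they want dangling edges, so
  // they can't be included or dropped by accident.
  edges(options) {
    if (options == null || typeof options.showDangling !== "boolean") {
      throw new Error("edges: must specify showDangling");
    }
    return Array.from(this._edges.values()).filter(
      (edge) => options.showDangling || !this.isDanglingEdge(edge.address)
    );
  }
}
```

A Markov-chain builder would call `edges({showDangling: false})`, while serialization or merging would pass `true` so no edge is lost.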
Dandelion Mané	f934735afc	Graph: reify Object nodes (#1188) This commit modifies the base `Graph` class so that nodes are now represented by `Node` objects rather than `NodeAddressT`. The intention is to start adding additional fields (e.g. description and timestamp) to nodes, although that is not included in this commit. See #1136 for rationale. Test plan: The graph is very well tested, and this commit adds additional tests and invariant checking. Some additional test code needed updating. `yarn test --full` passes, and the SourceCred UI works as expected.	2019-06-14 03:22:31 +03:00
Dandelion Mané	7277867cc8	cleanup: PagerankGraph getters return undefined PagerankGraph's `node` and `edge` getters returned null for unavailable entries, rather than undefined. This is inconsistent with general JS behavior, and with the base Graph. I've now cleaned it up. Test plan: unit tests updated; `yarn test` passes.	2019-06-14 02:46:11 +03:00
Dandelion Mané	bcf805b9c8	cleanup: remove test duplication In a previous commit (#1182) I inadvertently duplicated some tests. They have now been removed. Test plan: `yarn test` passes.	2019-06-14 02:46:11 +03:00
Dandelion Mané	414fb9f89f	Add GitHub entity descriptions (#1186) Every GitHub entity now has a `description` method which returns a short markdown description. These will be added to the graph as part of #1136. Test plan: Inspected snapshots, `yarn test --full`.	2019-06-13 23:58:17 +03:00
Dandelion Mané	31abd380dc	Add GitHub reaction timestamps (#1185) This will allow timeline cred (#862) to do a better job of flowing cred across reaction edges. (Very old reactions should not be moving a lot of present-day cred.) Test plan: Inspected snapshot changes.	2019-06-13 23:43:54 +03:00
Dandelion Mané	b03f824e7d	GitHub entities expose `timestampMs` (#1184) Every GitHub entity from `RelationalView` now has a `timestampMs` method. This replaces the standalone `createdAt` method. Test plan: Snapshots look good.	2019-06-13 23:32:22 +03:00
Dandelion Mané	1ec3945cdb	Load commit authored datetimes from GitHub (#1183) It's an extension of #1152 induced by #1175. It's a very simple change; I just changed the schema, and `scripts/update_snapshots.sh` took care of the rest. Test plan: Inspected snapshots and generated flow types.	2019-06-13 23:27:56 +03:00
Dandelion Mané	4029458098	Factor out distribution modules (#1182) This pulls distribution related code out of `markovChain.js` into the new `distribution.js` module, and from `graphToMarkovChain.js` into `nodeDistribution.js`. Since the `computeDelta` method is now exported, I've added some unit tests. Test plan: `yarn test` passes.	2019-06-13 23:24:37 +03:00
Dandelion Mané	e47a5bd84e	Remove `createdAt` from the `AnalysisAdapter` As #1136 will be moving timestamps into the graph, we no longer need the `createdAt` method in the `AnalysisAdapter`. Actually, we no longer need the adapter/loader distinction introduced in #1157. I haven't taken the time to remove the `BackendAdapterLoader` concept because a) we may need it later, and b) if we don't, I'll likely remove the `AnalysisAdapter` concept entirely, in favor of having plugins directly save `graph.json` files to a known location. Test plan: `yarn test` passes.	2019-06-13 23:24:20 +03:00
Dandelion Mané	4b1763ebc6	Discard mentionsAuthorReference I added `mentionsAuthorReference` based on an untested hypothesis that they would be useful. With the passage of time, I've never seen any evidence that they actually improve cred scores (their impact seems negligible), and they add complexity. In the future, "go-fishing" style heuristics like this should not merge unless they are of clearly demonstrated value. Also, it would be better to add stuff like this via a standalone plugin rather than in the core GitHub logic. Undoes #806. Test plan: `yarn test`	2019-06-13 23:24:20 +03:00
Dandelion Mané	3179ba841b	remove `sourcecred export-graph` Now that the graph is saved by default as a part of load, users who need the graph can grab it directly from the `$SOURCECRED_DIRECTORY`. If we really need a command line util for grabbing it, we should rewrite that command to just grab the graph from that spot rather than re-computing it. Test plan: `yarn test`	2019-06-13 22:40:07 +03:00
Dandelion Mané	a348747aed	Remove `timestampMap` As of #1136, this will be redundant with raw information in the graph. Test plan: `yarn test`	2019-06-13 22:40:07 +03:00
Dandelion Mané	67baacd862	cli load: save graph, not pagerank or timestampMap As of the timeline cred work, I'm shifting emphasis away from raw PageRank results, in favor of timeline pagerank results. As such, there's no need to have load save the regular pagerank results on creation. As of #1136, there will be no need for timestampMap, as that data will be present directly in the graph. As the timeline cred UI will depend on the full graph for analysis, let's save the graph instead. Test plan: `yarn test` and snapshot inspection.	2019-06-13 22:40:07 +03:00
Dandelion Mané	e493af2307	Refactor graph-related test code (#1179) This commit adds new helper methods for creating test nodes (`node` and `partsNode`) and for creating test edges (`edge` and `partsEdge`) to graphTestUtil.js. This is very helpful in light of work related to #1136. I'm going to change the concept of "node" from a raw address to an object, add fields to that object, and add fields to the `Edge` type. If done naively, we would need to change all the test code across the project for every one of those changes. By centralizing the creation of test nodes and edges behind the new functions, we can update all the test code in a single place. This change is trivial from a conceptual perspective, and very broad-reaching from a code-touching perspective. It should be easy to review, because if tests pass then the change is probably working as intended. :) Test plan: `yarn test`	2019-06-13 22:16:26 +03:00
Dandelion Mané	e916bc91c8	Temporarily remove the odyssey plugin (#1178) In #1132 and #1134, I started work on the Odyssey plugin. However, before getting it to a state where it's usefully included in SourceCred, I decided to pivot to focus on timeline cred first. Now I'm merging significant refactors as a part of timeline cred (#1136). As a side effect of this refactor, the Odyssey plugin should undergo significant changes (OdysseyInstance is now basically redundant with base Graph.) Rather than incrementally update unused code, I elect to remove the plugin. This code should be revived on a side branch, and then merged into master once we have a fully functioning prototype. Test plan: `yarn test` passes.	2019-06-13 17:07:05 +03:00
Dandelion Mané	3c8fd0e701	Graph refactor: {inEdges, outEdges}->incidentEdges (#1173) This commit refactors the Graph class so that rather than having separate maps for inEdges and outEdges, there is a single incidentEdges map, which contains objects with inEdges and outEdges. This is motivated by a forthcoming big change as part of #1136; namely, to allow storing dangling edges in the graph. Once we do so, we'll need a consistent source of truth that enumerates all of the node addresses which are accessible in the graph (either because they correspond to a node in the graph, or because they are the src or dst of a dangling edge). We could do this by adding another field to graph which tracks this set, but by making this refactor, we can instead use the key set of _incidentEdges as the source of truth for which node addresses are present. Besides being motivated by #1136, I think it's cleaner in general. Note there are fewer ways for the graph to be inconsistent, as it's no longer possible for inEdges and outEdges to have inconsistent sets of node addresses. The most complicated piece of this change was updating the automatic invariant checker. It was no longer possible to test 3.1 and 4.1 separately, so they needed to be merged into a new invariant. Rather than re-enumerate the invariants, I called the new one the 'Temporary Invariant', because it is going to disappear in a subsequent commit. Test plan: `yarn test` passes. Since Graph has extremely thorough testing, this gives me great confidence in this commit. Note that no observable behavior has changed.	2019-06-13 15:49:12 +03:00
Dandelion Mané	ccfaa25e7b	Add a GitHub Commit node type (#1175) At present, the Git commit node type lives in a strange state of shared responsibility between GitHub and Git. The Git plugin is nominally responsible for it, but its render method tries to show a hyperlink to GitHub -- which is awkward for many reasons, including that the same Git commit could have multiple hyperlinks on GitHub. This commit resolves that issue by separating the existing commit type into two: the Git Commit type, which is owned by the Git plugin and doesn't have hyperlinks or any fancy GitHub metadata, and the GitHub Commit, which is owned by the GitHub plugin, corresponds to a unique database id in GitHub, and has a corresponding GitHub url. The two commits are connected by a CorrespondsToCommit edge type, which links from the GitHub commit to the corresponding Git commit. This is necessary for #1136, as if we want to make descriptions a part of the graph payload, we need for descriptions to be unique for a given address--and descriptions are only unique if we identify each GitHub commit pointer as a separate address. Test plan: The unit testing in this part of the codebase is light, so I verified that the frontend works as expected for `sourcecred/sourcecred` and `sourcecred/research`. The new node type and edge type appear properly in the UI, the GitHub commits are connected to their Git counterparts, etc.	2019-06-03 23:57:48 +03:00
Dandelion Mané	16edea6413	Remove the `graphView`s (#1171) A long time ago, we made graph views for git and github. These are interfaces over the graph which allow retrieving nodes' relations, e.g. finding the parent address of a commit just using the graph. These are fairly complex, and have seen almost no use at all. The one thing they are used for is implementing invariant checking. The invariant checking is nice in principle, but since we only apply it to the example data, it's of very limited value in practice. Since I'm planning a significant Graph refactor (#1136), I'd rather delete this code than continue to maintain it, since I think its complexity/value ratio is unfavorable. Test plan: `yarn test`	2019-06-03 21:07:27 +03:00
Dandelion Mané	fcbd024a83	Fix silently failing github token test This is another minor silent test failure: the error message thrown by loadIndividualPlugin when a GitHub token is not available is not quite the error that was specified in the test. There were two issues: that we were testing for the wrong error message, and that the failure didn't fail the test. I fixed both issues (by changing the message thrown to match the test, and by having the test _return_ the expectation that the promise will reject, and by expecting there to be one assertion.) Test plan: `yarn unit cli/load` no longer shows any unhandled promise rejection warnings. If the test is modified so that it checks for the wrong string, it now properly fails rather than passing with an unhandled rejection.	2019-06-02 11:27:36 +03:00
Dandelion Mané	7c4ff66907	Fix uncaught test failures Prior to this commit, if you ran `yarn unit cli/load`, you would see a lot of unhandled promise rejection warnings related to the fact that load calls a `saveTimestamps` function which expects the SourceCred directory to contain the results of really loading SourceCred plugins. However, when testing, these functions have been mocked, and so saveTimestamps rejects. This rejection was not caught, and polluted the test output without actually failing the tests. This commit updates the tests so that saveTimestamps can be mocked (via dependency injection), so that we can both verify that it is invoked correctly and avoid polluting the test output with spurious warnings. Test plan: - `yarn test --full` passes - `yarn unit cli/load` produces far fewer UnhandledPromiseRejectionWarnings (there is still one unrelated one) - loading sourcecred/research still works (as a canary) Note: the PR which introduced this issue is #1162.	2019-06-02 11:27:36 +03:00
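The dependency-injection fix described in this commit can be sketched as a function that takes its side-effecting step as a parameter with a production default. All names here (`load`, `defaultSaveTimestamps`, the option fields) are hypothetical and are not SourceCred's actual signatures.

```javascript
// Hypothetical sketch of dependency injection for a side-effecting step.

// Production implementation: expects real plugin output on disk, so it
// rejects when the loaders it depends on have been mocked out.
async function defaultSaveTimestamps(directory, repoId) {
  throw new Error(`would read plugin output for ${repoId} from ${directory}`);
}

// The dependency is a parameter with a production default, so a test can
// pass a stub and inspect how it was called.
async function load(options, saveTimestamps = defaultSaveTimestamps) {
  // ... load plugins and write their results here ...
  await saveTimestamps(options.sourcecredDirectory, options.repoId);
}
```

Because `load` awaits the injected function, any rejection surfaces as a rejection of `load` itself. A test can assert on it directly instead of leaking an unhandled promise rejection.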

987 Commits