Summary:
An update step is now roughly as simple as:
_updateData(postQuery(_queryFromPlan(_findOutdated())))
(with some option/config parameters thrown in).
This makes progress toward #622.
Test Plan:
Unit tests included. They are light, because these functions are light.
They still retain full coverage.
wchargin-branch: mirror-full-pipeline
Summary:
These typenames are often superfluous, but sometimes they are useful.
For instance, we might want to fetch the same data for `User`s, `Bot`s,
and `Organization`s, but still differentiate which kind of node we
fetched from an `Actor` union reference. Similarly, many timeline events
may have similar signatures (like, “issue closed” vs. “issue reopened”).
Test Plan:
Existing unit tests have been updated; run `yarn unit`.
wchargin-branch: mirror-extract-typenames
Summary:
The `extract` method lets you get data out of a mirror in a structured
format.
The mirror module now contains all the plumbing needed to provide
meaningful value. Remaining to be implemented are some internal
porcelain and a public method to perform an update step.
This makes progress toward #622.
Test Plan:
Comprehensive unit tests included, with full coverage; run `yarn unit`.
wchargin-branch: mirror-extract
Summary:
This reestablishes harmony in light of #882.
Test Plan:
Existing unit tests suffice; run `yarn unit`.
wchargin-branch: mirror-query-plan-connection-object-typename
Summary:
This commit adds internal functions to (a) emit a GraphQL query to fetch
data for own-data of an object, and (b) ingest the results of said query
back into the database.
The API and implementation differ from the connection-updating analogues
introduced in #878 in that the query for own data is independent of an
object’s ID: it depends only on the object’s type. This affords us more
flexibility in composing queries.
As described in a internal documentation comment, values are stored in
the database in JSON-stringified form: we cannot use the obvious raw SQL
values, because there is no native encoding of booleans (`0`/`1` is
conventional), and we need to distinguish them from other data types.
There are other ways to solve this problem. Notably:
1. We could take inspiration from OCaml: encode stronger static types
and a simpler runtime representation. That is, we could change the
schema field types from simply “primitive” to the various specific
primitive types. Then, when reading data out from the database, we
could reinterpret the values appropriately.
2. We could take advantage of the fact that we are not using all of
SQLite’s data types. In particular, we do not store anything as a
binary blob, so we could encode `false` as a length-0 zeroblob and
`true` as a length-1 zeroblob, for instance. Again, when reading
data out from the database, we would reinterpret the values—but in
this approach we would not need an explicit schema from the user.
For now, we take the easiest and simplest approach just to get ourselves
off the ground. We can easily move to the second option described above
later.
This commit makes progress toward #622.
Test Plan:
Unit tests included, with full coverage. While these tests check that
the GraphQL queries are as expected, they cannot check that they are
actually valid in production. To check this, follow the instructions in
the added snapshot test.
wchargin-branch: mirror-own-data-updates
Summary:
As <https://github.com/sourcecred/sourcecred/pull/883/files#r219648511>.
It is somewhat unfortunate that this mixes a command with a query, but
the concession is acceptable in this instance, I think.
Test Plan:
Existing unit tests suffice, retaining full coverage.
wchargin-branch: mirror-register-node-field-results
Summary:
This helpful utility is already used in some test code, and will shortly
be used in main code. @decentralion suggested factoring it out in
<https://github.com/sourcecred/sourcecred/pull/883#discussion_r219647781>.
Test Plan:
Unit tests included, with full coverage; run `yarn unit`.
wchargin-branch: mirror-make-update-helper
Summary:
Almost every GitHub connection has nodes of an object type, like `User`
or `IssueComment`. But a few have nodes of union type, including
`IssueTimelineItemConnection` (which we will likely want to query), and
those require special handling. This commit adds susupport for such
connections.
Analysis to determine which connections have non-object elements:
<https://gist.github.com/wchargin/647fa7ed8d6d17ae2e204bd098104407>
Test Plan:
Unit tests modified appropriately, retaining full coverage.
The easiest way to verify the snapshot is probably to copy the raw
contents (everything inside the quotes) into `/tmp/snapshot`, then run:
```shell
$ sed -e 's/\\//g' </tmp/snapshot >/tmp/query # Jest adds backslashes
$ jq -csR '{query: ., variables: {}}' </tmp/query >/tmp/payload
$ ENDPOINT='https://api.github.com/graphql'
$ AUTH="Authorization: bearer ${SOURCECRED_GITHUB_TOKEN}"
$ curl "$ENDPOINT" -X POST -H "$AUTH" -d @- </tmp/payload >/tmp/result
```
and then execute the JQ program mentioned in the comment in the test
case, and verify that it prints `true`.
wchargin-branch: connection-of-union
Summary:
This commit changes the `Issue` type of the schema used in the `mirror`
tests to have fields a faithful subset of those in the actual GitHub
schema. The tests are self-contained, so this is not strictly required.
However, it is convenient, because it means that we can snapshot a query
that can actually be posted to GitHub.
Test Plan:
Running `yarn unit mirror` suffices for the code change. The GitHub
schema docs at <https://developer.github.com/v4/object/issue/> indicate
that each of `id`, `url`, `author`, `repository`, `title`, and
`comments` is a valid field.
wchargin-branch: mirror-test-schema
Summary:
This commit adds internal functions to (a) emit a GraphQL query to fetch
data for a particular connection, and (b) ingest the results of said
query back into the database.
This commit makes progress toward #622.
Test Plan:
Unit tests included, with full coverage. While these tests check that
the GraphQL queries are as expected, they cannot check that they are
actually valid in production. To check this, follow the instructions in
the added snapshot test.
wchargin-branch: mirror-connection-updates
Summary:
This function finds all objects whose own data has not been updated
since a given time, and all connections whose entries have not been
updated since that time.
Note that this is scoped to the entirety of the database. In #622,
I discussed using a recursive common table expression to identify only
those transitive dependencies of the root. I think that this is overkill
for the `_findOutdated` method: you’ll usually want to update everything
in the database. Don’t worry—the cool recursive query will still be used
in the `extract` function. :-)
This commit makes progress toward #622.
Test Plan:
Unit tests added, with full coverage; run `yarn unit`.
wchargin-branch: mirror-findoutdated
Summary:
This function informs the GraphQL mirror of the existence of an object,
specified by its global ID and its concrete typename (“concrete” meaning
“object type”—like `User`, not `Actor`).
The function will be called extensively internally as more objects are
discovered while traversing the graph, but also needs to be exposed as a
public entry point: a client needs to call this function at least once
to register the root node of interest. A typical client workflow, once
all of #622 is implemented, might be:
1. Issue a standalone GraphQL query to find the ID of a root node, like
a GitHub repository: `repository(owner: "foo", name: "bar") { id }`.
2. Call `registerObject` with the ID found in the previous step.
3. Instruct the mirror to recursively update all dependencies.
4. Extract data from the mirror.
As of this commit, steps (1) and (2) are possible.
This commit makes progress toward #622.
Test Plan:
Unit tests included, with full coverage; run `yarn unit`.
wchargin-branch: mirror-registerobject
Summary:
It’s useful to add this simple function now because the rest of the
commits required to implement #622 will want to use it extensively in
test code. Actual clients of the API will not need to use it, because
the concept of “updates” is an implementation detail: clients will
always provide simple timestamps.
Test Plan:
Unit tests included, with full coverage; run `yarn unit`.
wchargin-branch: mirror-createupdate
Summary:
Some clients want to write
const primitivesTableName = _primitivesTableName(typename);
which they cannot if the function is also called `primitivesTableName`,
due to ECMAScript shadowing semantics.
Test Plan:
Running `yarn flow` suffices; running `yarn unit` really suffices.
wchargin-branch: mirror-rename-primitivestablename
Summary:
Each change provides real value, by either testing a plausible happy
path that simply was not tested previously, or by adding an
`empty`-assertion to a switch against a discriminated union type.
Test Plan:
For the snapshot change relating to the query formatter, note that
Prettier formats the changed portion of the snapshot in the same way, by
visiting <https://prettier.io/playground> and setting the parser to
"graphql". (Prettier in general agrees with the stringification defined
by this module, except for commas and spacing, for which we don’t bother
to generate impeccably pretty output.)
Run `yarn coverage` and note that the coverage of the whole `graphql`
package is 100% on all axes.
wchargin-branch: graphql-coverage
Summary:
This simplifies and clarifies the code with no observable change.
Test Plan:
Existing unit tests suffice; run `yarn unit`.
wchargin-branch: mirror-use-schemainfo
Summary:
This is mostly useful not for computational efficiency, but for ease of
implementation: there end up being multiple places where we want to find
(say) the primitive fields on an object, and having to go through the
whole iterate-and-switch-and-push process repeatedly is annoying.
Test Plan:
Unit tests included, with full coverage; run `yarn unit`.
wchargin-branch: mirror-schema-info
Summary:
This commit augments the `Mirror` constructor to turn the provided
GraphQL schema into a SQL schema, with which it initializes the backing
database. The schema is roughly as originally described in #622, with
some changes (primarily: we omit `WITHOUT ROWID`; we add indexes; we
store `total_count` on connections; and we use milliseconds instead of
seconds for epoch time).
Test Plan:
Unit tests included, with full coverage; run `yarn unit`.
wchargin-branch: mirror-sql-schema
Summary:
This commit introduces the `Mirror` class that will be the centerpiece
of the persistent-loading API as described in #622. An instance of this
class represents a mirror of a remote GraphQL database, defined by a
particular schema. In this commit, we add the construction logic, which
includes a safety measure to ensure that the database is used within one
version of the code and schema.
Test Plan:
Unit tests included, with full coverage; run `yarn unit`.
wchargin-branch: mirror-class
Summary:
In implementing #622, we’ll want to run lots of things inside of
transactions. This commit introduces a JavaScript API to do so more
easily, properly handling success and failure cases.
Test Plan:
Unit tests included, with full coverage; run `yarn unit`.
wchargin-branch: mirror-transaction-helper
Summary:
This affords more flexibility to clients, because an exact value can be
used in place of an inexact value, but not vice versa.
Test Plan:
Running `yarn flow` suffices.
wchargin-branch: schema-exact-type-fields
Summary:
GraphQL unions are required to be unions specifically of object types.
They cannot contain primitives or other union types as clauses. This is
good: it means that we don’t have to worry about unions that recursively
reference each other or themselves.
Unions are also required to have at least one clause, but we don’t
validate this because it’s not helpful for us. An empty union is
perfectly well-defined, if useless, and shouldn’t cause any problems.
Relevant portion of the spec:
<https://facebook.github.io/graphql/October2016/#sec-Union-type-validation>
Test Plan:
Unit tests added, retaining full coverage; `yarn unit` suffices.
wchargin-branch: graphql-schema-union-validation
Summary:
This commit introduces a module for declaratively specifying the schema
of a GraphQL database. See `buildGithubSchema` in `schema.test.js` for
an example of the API.
This makes progress toward #622, though the design has evolved some
since its original specification there.
Test Plan:
Unit tests added, with full coverage; `yarn unit` suffices.
wchargin-branch: graphql-schema
Summary:
Per #800, each test file should start with a `describe` block listing
its file path under `src`. Currently, nine of our tests do not do so.
Of these, eight had a top-level describe block with the wrong name
(either not a filepath or an outdated filepath), while only one short
test was missing a top-level describe block altogether. This patch fixes
each file to use the correct format.
Test Plan:
Apply the Sharness test in #802, and note that it fails before this
patch but passes after it.
wchargin-branch: describe-fix
Our data model orients on getting repos from GitHub, which are
alternatively represented as strings like "sourcecred/sourcecred", or
pairs of variables representing the owner and name, or objects with
owner and name properties. We also have a few different implementations
of repo validation, which are not applied consistently.
This commit changes all that. We now have a consistent Repo type which
is an object containing a string owner and string name. Thanks to a
clever suggestion by @wchargin, it is implemented as an opaque subtype
of an object containing those properties, so that the only valid way to
construct a Repo typed object is to use one of the functions that
consistently validates the repo.
As a fly-by fix, I noticed that there were some functions in the GitHub
query generation that didn't properly mark arguments as readOnly. I've
fixed these.
Test plan: No externally-observable behavior changes (except insofar as
there is a slight change in variable names in the GitHub graphql query,
which has also resulted in a snapshot diff). `yarn travis --full`
passes. `git grep repoOwner` presents no hits.
Test plan:
`git grep -i v3` only shows incidental hits in longer strings
`yarn travis --full` passes
`yarn backend` works
`yarn build` works
`yarn start` works
`node bin/sourcecred.js start` works
`node bin/sourcecred.js load sourcecred example-github` works
Paired with @wchargin
We want to reset some of our basic assumptions, and make `Graph` into a
pure graph implementation, rather than a hybrid graph and key-value
store.
This is a substantial rewrite, so we want to start from scratch in a v3/
directory and pull code into v3 as necessary. So that we can do this in
a relatively clean fashion, we're first moving the v1 and v2 code into
their own directories.
Paired with @wchargin
Test plan: Travis, and `yarn backend`, `node bin/sourcecred.js start`.
Note that `yarn backend` and `node bin/sourcecred.js start` both use the
v1 versions. We'll migrate those (by changing paths.js) to v3 when
appropriate.
Test Plan:
Run `yarn lint` and `yarn travis` and observe success. Add something
that triggers a lint warning, like `const zzz = 3;`; re-run and observe
failures.
wchargin-branch: lint
Summary:
For pagination, we’ll want to query against multiple entities of the
same type. GraphQL uses aliases to facilitate this. This commit adds
support for aliases to our GraphQL query DSL.
Test Plan:
Inspect snapshot changes, and note that `yarn flow` and `yarn test`
pass.
wchargin-branch: graphql-aliases