status-go/protocol/migrations/sqlite
Icaro Motta 6fa8c11382
URL unfurling (initial implementation) (#3471)
This is the initial implementation for the new URL unfurling requirements. The
most important one is that only the message sender will pay the privacy cost for
unfurling and extracting metadata from websites. Once the message is sent, the
unfurled data will be stored at the protocol level, and receivers will simply
decode the metadata to render it.

Further development of this URL unfurling capability will be guided mostly by
issues opened by clients. For the moment, these are tracked in status-mobile:
https://github.com/status-im/status-mobile/labels/url-preview

- https://github.com/status-im/status-mobile/issues/15918
- https://github.com/status-im/status-mobile/issues/15917
- https://github.com/status-im/status-mobile/issues/15910
- https://github.com/status-im/status-mobile/issues/15909
- https://github.com/status-im/status-mobile/issues/15908
- https://github.com/status-im/status-mobile/issues/15906
- https://github.com/status-im/status-mobile/issues/15905

### Terminology

In the code, I've tried to stick to the term "unfurl URL" to mean the
lower-level process of extracting metadata from a website. I use "link
preview" to mean a higher-level structure that is enriched by unfurled data.
"Link preview" is also how designers refer to it.

### User flows

1. Carol needs to see link previews while typing in the chat input field. Notice
   from the diagram that nothing is persisted and that status-go endpoints are
   essentially stateless.

```
#+begin_src plantuml :results verbatim
  Client->>Server: Call wakuext_getTextURLs
  Server-->>Client: Normalized URLs
  Client->>Client: Render cached unfurled URLs
  Client->>Server: Unfurl non-cached URLs.\nCall wakuext_unfurlURLs
  Server->>Website: Fetch metadata
  Website-->>Server: Metadata (thumbnail URL, title, etc)
  Server->>Website: Fetch thumbnail
  Server->>Website: Fetch favicon
  Website-->>Server: Favicon bytes
  Website-->>Server: Thumbnail bytes
  Server->>Server: Decode & process images
  Server-->>Client: Unfurled data (thumbnail data URI, etc)
#+end_src
```

```
     ,------.                                 ,------.                             ,-------.
     |Client|                                 |Server|                             |Website|
     `--+---'                                 `--+---'                             `---+---'
        |        Call wakuext_getTextURLs        |                                     |
        | --------------------------------------->                                     |
        |                                        |                                     |
        |             Normalized URLs            |                                     |
        | <- - - - - - - - - - - - - - - - - - - -                                     |
        |                                        |                                     |
        |----.                                   |                                     |
        |    | Render cached unfurled URLs       |                                     |
        |<---'                                   |                                     |
        |                                        |                                     |
        |         Unfurl non-cached URLs.        |                                     |
        |         Call wakuext_unfurlURLs        |                                     |
        | --------------------------------------->                                     |
        |                                        |                                     |
        |                                        |            Fetch metadata           |
        |                                        | ------------------------------------>
        |                                        |                                     |
        |                                        | Metadata (thumbnail URL, title, etc)|
        |                                        | <- - - - - - - - - - - - - - - - - -
        |                                        |                                     |
        |                                        |           Fetch thumbnail           |
        |                                        | ------------------------------------>
        |                                        |                                     |
        |                                        |            Fetch favicon            |
        |                                        | ------------------------------------>
        |                                        |                                     |
        |                                        |            Favicon bytes            |
        |                                        | <- - - - - - - - - - - - - - - - - -
        |                                        |                                     |
        |                                        |           Thumbnail bytes           |
        |                                        | <- - - - - - - - - - - - - - - - - -
        |                                        |                                     |
        |                                        |----.                                |
        |                                        |    | Decode & process images        |
        |                                        |<---'                                |
        |                                        |                                     |
        | Unfurled data (thumbnail data URI, etc)|                                     |
        | <- - - - - - - - - - - - - - - - - - - -                                     |
     ,--+---.                                 ,--+---.                             ,---+---.
     |Client|                                 |Server|                             |Website|
     `------'                                 `------'                             `-------'
```
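The first step of this flow, turning draft text into a deduplicated list of normalized URLs that the client can use as cache keys, can be sketched as below. This is a minimal illustration, not the code behind wakuext_getTextURLs: the function name `extractURLs` and the whitespace-based tokenization are assumptions (the real endpoint may, for example, parse markdown instead).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extractURLs returns the distinct http(s) URLs found in text, in order of
// first appearance. Hypothetical stand-in for the server side of
// wakuext_getTextURLs; tokenization here is just "split on whitespace".
func extractURLs(text string) []string {
	seen := make(map[string]bool)
	var urls []string
	for _, token := range strings.Fields(text) {
		// Drop trailing punctuation, e.g. "see https://status.im."
		token = strings.TrimRight(token, ".,;:!?)")
		u, err := url.Parse(token) // url.Parse lowercases the scheme
		if err != nil || u.Host == "" {
			continue
		}
		if u.Scheme != "http" && u.Scheme != "https" {
			continue
		}
		u.Host = strings.ToLower(u.Host) // hosts are case-insensitive
		s := u.String()
		if !seen[s] {
			seen[s] = true
			urls = append(urls, s)
		}
	}
	return urls
}

func main() {
	fmt.Println(extractURLs("Read https://status.im and HTTPS://Status.im, or ftp://x.org"))
}
```

Because both spellings of status.im collapse to the same string, the client only needs one cache lookup (and at most one wakuext_unfurlURLs call) for them.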

2. Carol sends the text message with link previews in the RPC request
   wakuext_sendChatMessages. status-go assumes the link previews are good
   because it can't and shouldn't attempt to re-unfurl them.

```
#+begin_src plantuml :results verbatim
  Client->>Server: Call wakuext_sendChatMessages
  Server->>Server: Transform link previews to\nbe proto-marshalled
  Server->DB: Write link previews serialized as JSON
  Server-->>Client: Updated message response
#+end_src
```

```
     ,------.                       ,------.                                  ,--.
     |Client|                       |Server|                                  |DB|
     `--+---'                       `--+---'                                  `+-'
        | Call wakuext_sendChatMessages|                                       |
        | ----------------------------->                                       |
        |                              |                                       |
        |                              |----.                                  |
        |                              |    | Transform link previews to       |
        |                              |<---' be proto-marshalled              |
        |                              |                                       |
        |                              |                                       |
        |                              | Write link previews serialized as JSON|
        |                              | -------------------------------------->
        |                              |                                       |
        |   Updated message response   |                                       |
        | <- - - - - - - - - - - - - - -                                       |
     ,--+---.                       ,--+---.                                  ,+-.
     |Client|                       |Server|                                  |DB|
     `------'                       `------'                                  `--'
```
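The "serialized as JSON" step can be illustrated with a small round trip between a slice of previews and the bytes stored in a BLOB column such as user_messages.unfurled_links. This sketch covers only the JSON persistence, not the protobuf transformation, and the `LinkPreview` and `Thumbnail` types with their JSON field names are hypothetical shapes chosen for illustration; they are not status-go's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Thumbnail and LinkPreview are hypothetical shapes for illustration only.
type Thumbnail struct {
	Width   int    `json:"width"`
	Height  int    `json:"height"`
	DataURI string `json:"dataUri,omitempty"`
}

type LinkPreview struct {
	URL         string    `json:"url"`
	Hostname    string    `json:"hostname"`
	Title       string    `json:"title"`
	Description string    `json:"description,omitempty"`
	Thumbnail   Thumbnail `json:"thumbnail"`
}

// marshalPreviews produces the bytes written to the BLOB column.
func marshalPreviews(previews []LinkPreview) ([]byte, error) {
	return json.Marshal(previews)
}

// unmarshalPreviews reverses marshalPreviews; an empty (or NULL) column
// means the message has no previews.
func unmarshalPreviews(blob []byte) ([]LinkPreview, error) {
	if len(blob) == 0 {
		return nil, nil
	}
	var previews []LinkPreview
	if err := json.Unmarshal(blob, &previews); err != nil {
		return nil, err
	}
	return previews, nil
}

func main() {
	blob, err := marshalPreviews([]LinkPreview{{URL: "https://status.im", Hostname: "status.im", Title: "Status"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(blob))
}
```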

3. The message was sent over waku and persisted locally on Carol's device. She
   should now see the link previews in the chat history. There can be many link
   previews shared by other chat members, so it is important to serve the
   assets via the media server to avoid overloading the React Native bridge with
   lots of big JSON payloads containing base64-encoded data URIs (maybe this
   concern is meaningless for desktop). When a client renders messages with
   link previews, those messages will have the field linkPreviews, and the
   thumbnail URL will point to the local media server.

```
 #+begin_src plantuml :results verbatim
   Client->>Server: GET /link-preview/thumbnail (media server)
   Server->>DB: Read from user_messages.unfurled_links
   Server->Server: Unmarshal JSON
   Server-->>Client: HTTP Content-Type: image/jpeg/etc
 #+end_src
```

```
     ,------.                                    ,------.                                  ,--.
     |Client|                                    |Server|                                  |DB|
     `--+---'                                    `--+---'                                  `+-'
        | GET /link-preview/thumbnail (media server)|                                       |
        | ------------------------------------------>                                       |
        |                                           |                                       |
        |                                           | Read from user_messages.unfurled_links|
        |                                           | -------------------------------------->
        |                                           |                                       |
        |                                           |----.                                  |
        |                                           |    | Unmarshal JSON                   |
        |                                           |<---'                                  |
        |                                           |                                       |
        |     HTTP Content-Type: image/jpeg/etc     |                                       |
        | <- - - - - - - - - - - - - - - - - - - - -                                        |
     ,--+---.                                    ,--+---.                                  ,+-.
     |Client|                                    |Server|                                  |DB|
     `------'                                    `------'                                  `--'
```

### Some limitations of the current implementation

The following points will become separate issues in status-go that I'll work on
over the next couple of weeks. In no particular order:

- Improve how multiple links are fetched: retries on failure, and testing how
  unfurling behaves around the timeout limits (deterministically, not by making
  real HTTP calls as I did). https://github.com/status-im/status-go/issues/3498
- Unfurl favicons and store them in the protobuf too.
- For this PR, I added unfurling support only for websites with OpenGraph
  (https://ogp.me/) meta tags. Other unfurlers will be implemented on demand. The
  next one will probably be for oEmbed (https://oembed.com/), the protocol
  supported by YouTube, for example.
- Resize and/or compress thumbnails (and favicons). Thumbnails are often huge
  for the purposes of link previews. There is already support for compressing
  JPEGs in status-go, but I prefer to work on compression in a separate PR
  because I'd like to also solve the problem for PNGs (probably convert them to
  JPEGs, plus compress them). This would be a safe choice for thumbnails, but
  not for favicons, where transparency is desirable.
- Editing messages is not yet supported.
- I haven't coded any artificial limit on the number of previews or on the size
  of the thumbnail payload. This will be done in a separate issue. I have heard
  the ideal solution may be to split messages into smaller chunks of ~125 KiB
  because of libp2p, but that might be too complicated at this stage of the
  product (?).
- Link preview deletion.
- For the moment, OpenGraph metadata is extracted by requesting data in English
  (falling back to whatever is available). In the future, we'll want to unfurl
  by respecting the user's device language. Some websites, like GoDaddy, are
  already localized based on the device's IP, but many aren't.
- The website's description text should be limited to a certain number of
  characters, especially because it's outside our control. Exactly how many has
  not been decided yet, so it'll be done separately.
- URL normalization can be tricky, so I implemented only the basics to help with
  caching. For example, the URLs https://status.im and HTTPS://status.im are
  considered identical. Also, a URL is considered valid for unfurling if its TLD
  exists according to publicsuffix.EffectiveTLDPlusOne. This was essential;
  otherwise, the default Go url.Parse approach would consider many invalid URLs
  valid, and the server would waste resources trying to unfurl URLs that cannot
  be unfurled.
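The basic normalization from the last bullet can be sketched as follows: lowercase the scheme and host (leaving the case-sensitive path alone) and reject hosts that cannot plausibly be unfurled. The host check here, `looksUnfurlable`, is a deliberately crude, hypothetical stand-in that only requires two or more non-empty dot-separated labels; the real check calls publicsuffix.EffectiveTLDPlusOne from golang.org/x/net/publicsuffix, which this sketch does not reproduce.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// normalizeURL makes "HTTPS://Status.im" and "https://status.im" share one
// cache key. The path and query are untouched because they can be
// case-sensitive.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw) // url.Parse already lowercases the scheme
	if err != nil {
		return "", err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", errors.New("unsupported scheme")
	}
	if !looksUnfurlable(strings.ToLower(u.Hostname())) {
		return "", errors.New("host is not unfurlable")
	}
	u.Host = strings.ToLower(u.Host)
	return u.String(), nil
}

// looksUnfurlable is a hypothetical placeholder for the publicsuffix-based
// check: it does not consult the Public Suffix List at all.
func looksUnfurlable(host string) bool {
	labels := strings.Split(host, ".")
	if len(labels) < 2 {
		return false
	}
	for _, label := range labels {
		if label == "" {
			return false
		}
	}
	return true
}

func main() {
	for _, raw := range []string{"HTTPS://Status.im", "https://localhost"} {
		normalized, err := normalizeURL(raw)
		fmt.Println(normalized, err)
	}
}
```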

### Other requirements

- If the message is edited, the link previews should reflect the edited text,
  not the original one. This has been aligned with the design team as well.
- If the website's thumbnail or the favicon can't be fetched, just ignore them.
  The only mandatory pieces of metadata are the website's title and URL.
- Link previews in clients should be generated in near real-time, that is, as
  the user types, previews are updated. In mobile this performs very well, and
  it's what other clients like WhatsApp, Telegram, and Facebook do.

### Decisions

- While the user is typing in the input field, the client constantly (debounced)
  asks status-go to parse the text and extract normalized URLs, and then the
  client checks if they're already in its in-memory cache. If they are, no RPC
  call is made. I chose this approach to achieve the best possible performance
  on mobile and avoid the whole RPC overhead, since the chat experience is
  already not smooth enough. The mobile client uses URLs as cache keys in a
  hashmap, i.e. if the key is present, it means the preview is readily available
  (naive, but good enough for now). This decision also gave me more flexibility
  to find the best UX at this stage of the feature.
- Due to the requirement that users should be able to see independent loading
  indicators for each link preview, when status-go can't unfurl a URL, it
  doesn't return it in the response.
- As an initial implementation, I added the BLOB column unfurled_links to the
  user_messages table. The preview data is then serialized as JSON before being
  stored in this column. I felt that creating a separate table and the related
  code for this initial PR would be inconvenient. Is that reasonable to you?
  Once things stabilize I can create a proper table if we want to avoid this
  kind of solution with serialized columns.
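The client-side cache from the first decision can be sketched as a map keyed by normalized URL: split the URLs returned by wakuext_getTextURLs into previews that render immediately and URLs that still need wakuext_unfurlURLs. The real cache lives in the mobile client, not in Go; it is expressed in Go here only for consistency with status-go, and `previewCache` with its title-only values is a hypothetical simplification.

```go
package main

import "fmt"

// previewCache maps a normalized URL to its preview (simplified to a title).
type previewCache map[string]string

// partition splits urls into cache hits and URLs that still need an RPC call.
func (c previewCache) partition(urls []string) (cached map[string]string, missing []string) {
	cached = make(map[string]string)
	for _, u := range urls {
		if p, ok := c[u]; ok {
			cached[u] = p
		} else {
			missing = append(missing, u)
		}
	}
	return cached, missing
}

// store adds freshly unfurled previews. URLs that failed to unfurl are absent
// from the response, so they are simply not cached and can be retried later.
func (c previewCache) store(unfurled map[string]string) {
	for u, p := range unfurled {
		c[u] = p
	}
}

func main() {
	cache := previewCache{"https://status.im": "Status"}
	cached, missing := cache.partition([]string{"https://status.im", "https://ogp.me"})
	fmt.Println(len(cached), missing)
	cache.store(map[string]string{"https://ogp.me": "The Open Graph protocol"})
	fmt.Println(len(cache))
}
```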
2023-05-18 15:43:06 -03:00

| File | Last commit | Date |
| --- | --- | --- |
| 000001_init.down.db.sql | Add Commands (#1731) | 2020-01-10 19:59:01 +01:00 |
| 000001_init.up.db.sql | Add Commands (#1731) | 2020-01-10 19:59:01 +01:00 |
| 000002_add_last_ens_clock_value.up.sql | Verify ENS in the background (#1824) | 2020-02-05 11:09:33 +01:00 |
| 1586358095_add_replace.up.sql | Add replies to messages | 2020-04-16 15:51:28 +02:00 |
| 1588665364_add_image_data.up.sql | Create different index for filtering | 2020-06-08 10:02:31 +02:00 |
| 1589365189_add_pow_target.up.sql | Create different index for filtering | 2020-06-08 10:02:31 +02:00 |
| 1591277220_add_index_messages.up.sql | Create different index for filtering | 2020-06-08 10:02:31 +02:00 |
| 1593087212_add_mute_chat_and_raw_message_fields.up.sql | Polish up and address review feedback | 2020-07-27 08:51:28 +02:00 |
| 1595862781_add_audio_data.up.sql | add audio duration | 2020-07-27 17:15:10 +02:00 |
| 1595865249_create_emoji_reactions_table.up.sql | Use local chat-id for matching messages | 2020-07-30 20:20:59 +02:00 |
| 1596805115_create_group_chat_invitations_table.up.sql | group chat invitation | 2020-09-07 12:15:58 +02:00 |
| 1597322655_add_invitation_admin_chat_field.up.sql | group chat invitation | 2020-09-07 12:15:58 +02:00 |
| 1597757544_add_nickname.up.sql | [#11046] Add local contact names | 2020-09-07 11:34:06 +02:00 |
| 1598955122_add_mentions.up.sql | Add parsing and storing of mentions | 2020-09-09 21:22:07 +02:00 |
| 1599641390_add_emoji_reactions_index.up.sql | Add index to emoji reactions | 2020-09-09 15:02:54 +02:00 |
| 1599720851_add_seen_index_remove_long_messages.up.sql | Force migration if dirty | 2020-10-01 13:47:59 +02:00 |
| 1603198582_add_profile_chat_field.up.sql | profile status updates | 2020-10-27 14:56:35 +01:00 |
| 1603816533_add_links.up.sql | Link previews support (#2059) | 2020-10-27 19:35:28 +02:00 |
| 1603888149_create_chat_identity_last_published_table.up.sql | remove photo path in favor of images in contact | 2020-12-17 14:10:00 +01:00 |
| 1605075346_add_communities.up.sql | Fix communities migration | 2021-01-08 08:43:16 +01:00 |
| 1610117927_add_message_cache.up.sql | Cache waku messages | 2021-01-18 09:38:27 +01:00 |
| 1610959908_add_dont_wrap_to_raw_messages.up.sql | Skip wrapping emojis in private group chats | 2021-01-26 09:39:47 +01:00 |
| 1610960912_add_send_on_personal_topic.up.sql | Use personal topic for push notification registration | 2021-01-26 09:39:53 +01:00 |
| 1612870480_add_datasync_id.up.sql | Listen for delivered messages (#2150) | 2021-02-23 17:47:45 +02:00 |
| 1614152139_add_communities_request_to_join.up.sql | Request/Decline access to communities | 2021-02-26 15:35:43 +01:00 |
| 1615374373_add_confirmations.up.sql | Expand confirmations to private group chat messages | 2021-03-16 13:41:14 +01:00 |
| 1617694931_add_notification_center.up.sql | Add activity center & messages from contacts only | 2021-04-16 20:42:40 +02:00 |
| 1618923660_create_pin_messages.up.sql | add PinMessage and PinnedMessage (#2180) | 2021-05-14 23:22:50 +02:00 |
| 1619094007_add_joined_chat_field.up.sql | feat: introduce new `joined` property in `Chat` struct | 2021-05-18 11:29:03 +02:00 |
| 1619099821_add_last_synced_field.up.sql | Add mailserver logic | 2021-05-21 07:22:58 +02:00 |
| 1621933219_add_mentioned.up.sql | Add mentioned field | 2021-05-26 08:33:38 +02:00 |
| 1622010048_add_unviewed_mentions_count.up.sql | Add unviewed mentions count | 2021-05-28 13:05:23 +02:00 |
| 1622061278_add_message_activity_center_notification_field.up.sql | Add mention notification to activity center (#2239) | 2021-05-29 14:05:25 -03:00 |
| 1622464518_set_synced_to_from.up.sql | Set synced_to/from to 24 hours ago | 2021-06-01 12:59:52 +02:00 |
| 1622464519_add_chat_description.up.sql | Fix order of migration | 2021-06-02 14:16:29 +02:00 |
| 1622622253_add_pinned_by_to_pin_messages.up.sql | add PinnedBy to PinnedMessage (#2250) | 2021-06-08 17:23:32 +02:00 |
| 1623938329_add_author_activity_center_notification_field.up.sql | feat: add author to notification in activity center (#2259) | 2021-06-17 11:08:28 -04:00 |
| 1623938330_add_edit_messages.up.sql | Handle edit first & then message | 2021-06-29 13:15:15 +02:00 |
| 1624978434_add_muted_community.up.sql | feat(community): add muted to community and function to set it (#2271) | 2021-06-30 09:29:43 -04:00 |
| 1625018910_add_repply_message_activity_center_notification_field.up.sql | Add reply message to activity center notification of type reply (#2272) | 2021-07-15 17:21:44 -03:00 |
| 1625762506_add_deleted_messages.up.sql | Delete messages (#2279) | 2021-07-26 17:06:32 -04:00 |
| 1627388946_add_communities_synced_at.up.sql | Sync Communities (#2253) | 2021-08-06 16:40:23 +01:00 |
| 1628280060_create-usermessages-index.sql | add index on user_messages to speed-up marking all messages as read (#2304) | 2021-08-11 08:26:15 -04:00 |
| 1632303896_modify_contacts_table.up.sql | Add HasAddedUs field | 2021-10-04 12:19:15 +02:00 |
| 1633349838_add_emoji_column_in_chats.up.sql | Storing emoji values for Custom Emoji Thumbnails for Community Channels (#2366) | 2021-10-04 18:32:25 +05:30 |
| 1634831235_add_highlight_column_in_chats.up.sql | Store Highlight field for identify new chats (#2384) | 2021-10-21 22:34:56 +05:30 |
| 1634896007_add_last_updated_locally_and_removed.up.sql | Backup removed & added by them contacts | 2021-11-15 18:53:35 +00:00 |
| 1635840039_add_clock_read_at_column_in_chats.up.sql | [pairing] Sync read messages | 2021-11-02 10:02:27 +02:00 |
| 1637852321_add_received_invitation_admin_column_in_chats.up.sql | Decline pending group invitations from user, when user is banned (#2437) | 2021-11-25 20:51:42 +05:30 |
| 1645034601_display_name.up.sql | feat: display name | 2022-03-14 13:48:34 -04:00 |
| 1645034602_add_mutual_contact_request.up.sql | Initial support for mutual contact requests | 2022-05-31 09:12:36 +01:00 |
| 1650373957_add_contact_request_state.up.sql | Initial support for mutual contact requests | 2022-05-31 09:12:36 +01:00 |
| 1656958989_contact_verification.up.sql | feat: contact verification request (#2586) | 2022-07-05 15:49:44 -04:00 |
| 1658236268_add_discord_message_authors_table.up.sql | feat: introduce `discord_message_authors` persistence APIs | 2022-08-10 10:13:55 +02:00 |
| 1659619997_add_discord_messages_table.up.sql | feat: add `discord_messages` table and persistence APIs | 2022-08-11 10:49:23 +02:00 |
| 1660226788_create_chat_identity_social_links.up.sql | feat: add social links | 2022-08-16 14:29:00 +02:00 |
| 1660226789_add_walletconnectsessions_table.up.sql | Implement wallet connect session CRUD API | 2022-08-19 12:32:00 +01:00 |
| 1661242854_add_communities_requests_to_leave.up.sql | feat: introduce and distribute RequestToLeave community | 2022-08-26 11:25:33 +02:00 |
| 1662044232_add_chat_image.up.sql | fix: change migration timestamp of group chat add image feature | 2022-09-01 17:55:46 +02:00 |
| 1662106895_add_chat_first_message_timestamp.up.sql | feat: add and distribute `chatIdentity.FirstMessageTimestamp` | 2022-09-09 08:59:39 +02:00 |
| 1662723928_add_discord_author_image_fields.up.sql | feat(message_persistence): add discord message author image payload fields | 2022-09-19 13:47:16 +02:00 |
| 1664195977_add_deleted_for_mes.up.sql | feat: delete for me (#2866) | 2022-09-28 19:42:17 +08:00 |
| 1664367420_add_discord_attachments_table.up.sql | feat: add `DiscordMessageAttachment` types and APIs | 2022-09-29 11:38:29 +02:00 |
| 1665079662_add_spectated_column_in_communities.up.sql | feat: add `SpectateCommunity` api | 2022-10-06 21:21:37 +02:00 |
| 1665479047_add_community_id_in_notifications.up.sql | feat(ActivityCenter): Add community membership AC notifications (#2886) | 2022-10-26 02:06:20 +04:00 |
| 1665484435_add_encrypted_messages.up.sql | Send all encryption keys | 2022-10-20 12:19:44 +01:00 |
| 1665560200_add_contact_verification_individual.up.sql | Handle identity verifications | 2022-10-26 17:19:44 +01:00 |
| 1670921937_add_album_id.up.sql | Images Album (#3021) | 2022-12-14 16:25:45 +04:00 |
| 1673373000_add_replied.up.sql | feat: make replies act as mentions | 2023-01-10 13:39:57 -05:00 |
| 1673428910_add_image_width_height.up.sql | Image width height (#3061) | 2023-01-12 13:43:14 +04:00 |
| 1674210659_add_contact_request_local_clock.up.sql | Fix broken migrations | 2023-02-01 18:31:32 +00:00 |
| 1675212323_add_deleted_by.up.sql | feat: add deleted by xxx support (#3077) | 2023-02-01 08:57:35 +08:00 |
| 1675247084_add_activity_center_states.up.sql | feat: Add seen/unseen activity center setting (#3148) | 2023-02-17 14:08:08 +04:00 |
| 1675272329_fix_protocol_migration.up.sql | Fix broken migrations | 2023-02-01 18:31:32 +00:00 |
| 1676998418_fix_activity_center_migration.up.sql | Add test for everyone tag & fix migration order | 2023-02-24 10:18:26 +00:00 |
| 1677278861_add_deleted_column_to_activity_center_notifications_table.up.sql | Support soft deletion for activity center notifications (#3201) | 2023-02-24 20:47:04 -03:00 |
| 1677486338_add_community_tokens_table.up.sql | feat(CommunityTokens): Keep community token details in database | 2023-02-27 10:37:54 +01:00 |
| 1678292329_add_collapsed_categories.up.sql | Add collapsed community categories | 2023-03-14 17:13:21 +00:00 |
| 1678800760_add_index_to_raw_messages.up.sql | Add index to raw messages | 2023-03-16 13:40:20 +00:00 |
| 1678877478_add_communities_requests_to_join_revealed_addresses_table.up.sql | feat: add verified wallet accounts to community requests | 2023-03-22 13:50:25 +01:00 |
| 1679326850_add_community_token_owners.up.sql | feat(MintTo): Add Airdrop functionality. | 2023-03-27 17:17:51 +02:00 |
| 1680011500_add_album_images_count.up.sql | Add album count key to messages (#3347) | 2023-03-30 12:02:20 +02:00 |
| 1680114896_add_index_on_album_id.up.sql | fix(unread_count): Skip extra count of new messages for album of images (#3345) | 2023-03-31 12:15:06 +03:00 |
| 1681655289_add_mute_till.up.sql | Add muted_till param for chats (#3258) | 2023-04-16 17:06:00 +02:00 |
| 1681934966_add_index_response_to.up.sql | Fix some issues with pinned messages | 2023-04-25 16:02:48 +01:00 |
| 1682528339_add_index_user_messages_unseen.up.sql | chore: add index idx_user_messages_unseen | 2023-05-02 09:21:58 +02:00 |
| 1683707289_recreate_deleted_for_mes.up.sql | Feat/sync local deleted message (#3476) | 2023-05-12 16:31:34 +08:00 |
| 1683725607_mark_discord_messages_as_seen.up.sql | fix: mark imported messages as seen | 2023-05-10 18:51:48 +02:00 |
| 1684174617_add_url_previews_to_user_messages.up.sql | URL unfurling (initial implementation) (#3471) | 2023-05-18 15:43:06 -03:00 |
| README.md | Fix communities migration | 2021-01-08 08:43:16 +01:00 |
| doc.go | Add replies to messages | 2020-04-16 15:51:28 +02:00 |

README.md

How to write migrations?

We only write up migrations; down migrations are not always possible in SQLite, or they are too complex or too expensive. For example, to remove a column you would have to duplicate the table, copy over the data, and delete and recreate the original table. This can be very expensive for some tables (user_messages, for example), so it should not be attempted.

Notes

One issue we faced multiple times is that updates to user_messages can be very expensive, leading to slow upgrade times and interrupted migrations. So avoid writes unless they are necessary.