2020-09-01 15:27:01 +02:00
|
|
|
package common
|
2019-08-06 23:50:13 +02:00
|
|
|
|
|
|
|
import (
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
"crypto/ecdsa"
|
2020-05-13 15:16:17 +02:00
|
|
|
"encoding/base64"
|
2021-02-23 07:37:08 +00:00
|
|
|
"encoding/hex"
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
"encoding/json"
|
2020-05-13 15:16:17 +02:00
|
|
|
"errors"
|
|
|
|
"fmt"
|
2023-02-02 17:59:48 +00:00
|
|
|
"io/ioutil"
|
URL unfurling (initial implementation) (#3471)
This is the initial implementation for the new URL unfurling requirements. The
most important one is that only the message sender will pay the privacy cost for
unfurling and extracting metadata from websites. Once the message is sent, the
unfurled data will be stored at the protocol level and receivers will just
profit and happily decode the metadata to render it.
Further development of this URL unfurling capability will be mostly guided by
issues created on clients. For the moment in status-mobile:
https://github.com/status-im/status-mobile/labels/url-preview
- https://github.com/status-im/status-mobile/issues/15918
- https://github.com/status-im/status-mobile/issues/15917
- https://github.com/status-im/status-mobile/issues/15910
- https://github.com/status-im/status-mobile/issues/15909
- https://github.com/status-im/status-mobile/issues/15908
- https://github.com/status-im/status-mobile/issues/15906
- https://github.com/status-im/status-mobile/issues/15905
### Terminology
In the code, I've tried to stick to the word "unfurl URL" to really mean the
process of extracting metadata from a website, sort of lower level. I use "link
preview" to mean a higher level structure which is enriched by unfurled data.
"link preview" is also how designers refer to it.
### User flows
1. Carol needs to see link previews while typing in the chat input field. Notice
from the diagram nothing is persisted and that status-go endpoints are
essentially stateless.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_getTextURLs
Server-->>Client: Normalized URLs
Client->>Client: Render cached unfurled URLs
Client->>Server: Unfurl non-cached URLs.\nCall wakuext_unfurlURLs
Server->>Website: Fetch metadata
Website-->>Server: Metadata (thumbnail URL, title, etc)
Server->>Website: Fetch thumbnail
Server->>Website: Fetch favicon
Website-->>Server: Favicon bytes
Website-->>Server: Thumbnail bytes
Server->>Server: Decode & process images
Server-->>Client: Unfurled data (thumbnail data URI, etc)
#+end_src
```
```
,------. ,------. ,-------.
|Client| |Server| |Website|
`--+---' `--+---' `---+---'
| Call wakuext_getTextURLs | |
| ---------------------------------------> |
| | |
| Normalized URLs | |
| <- - - - - - - - - - - - - - - - - - - - |
| | |
|----. | |
| | Render cached unfurled URLs | |
|<---' | |
| | |
| Unfurl non-cached URLs. | |
| Call wakuext_unfurlURLs | |
| ---------------------------------------> |
| | |
| | Fetch metadata |
| | ------------------------------------>
| | |
| | Metadata (thumbnail URL, title, etc)|
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Fetch thumbnail |
| | ------------------------------------>
| | |
| | Fetch favicon |
| | ------------------------------------>
| | |
| | Favicon bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Thumbnail bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| |----. |
| | | Decode & process images |
| |<---' |
| | |
| Unfurled data (thumbnail data URI, etc)| |
| <- - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,---+---.
|Client| |Server| |Website|
`------' `------' `-------'
```
2. Carol sends the text message with link previews in the RPC request
wakuext_sendChatMessages. status-go assumes the link previews are good
because it can't and shouldn't attempt to re-unfurl them.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_sendChatMessages
Server->>Server: Transform link previews to\nbe proto-marshalled
Server->DB: Write link previews serialized as JSON
Server-->>Client: Updated message response
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| Call wakuext_sendChatMessages| |
| -----------------------------> |
| | |
| |----. |
| | | Transform link previews to |
| |<---' be proto-marshalled |
| | |
| | |
| | Write link previews serialized as JSON|
| | -------------------------------------->
| | |
| Updated message response | |
| <- - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
3. The message was sent over waku and persisted locally in Carol's device. She
should now see the link previews in the chat history. There can be many link
previews shared by other chat members, therefore it is important to serve the
assets via the media server to avoid overloading the ReactNative bridge with
lots of big JSON payloads containing base64 encoded data URIs (maybe this
concern is meaningless for desktop). When a client is rendering messages with
link previews, they will have the field linkPreviews, and the thumbnail URL
will point to the local media server.
```
#+begin_src plantuml :results verbatim
Client->>Server: GET /link-preview/thumbnail (media server)
Server->>DB: Read from user_messages.unfurled_links
Server->Server: Unmarshal JSON
Server-->>Client: HTTP Content-Type: image/jpeg/etc
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| GET /link-preview/thumbnail (media server)| |
| ------------------------------------------> |
| | |
| | Read from user_messages.unfurled_links|
| | -------------------------------------->
| | |
| |----. |
| | | Unmarshal JSON |
| |<---' |
| | |
| HTTP Content-Type: image/jpeg/etc | |
| <- - - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
### Some limitations of the current implementation
The following points will become separate issues in status-go that I'll work on
over the next couple weeks. In no order of importance:
- Improve how multiple links are fetched; retries on failure and testing how
unfurling behaves around the timeout limits (deterministically, not by making
real HTTP calls as I did). https://github.com/status-im/status-go/issues/3498
- Unfurl favicons and store them in the protobuf too.
- For this PR, I added unfurling support only for websites with OpenGraph
https://ogp.me/ meta tags. Other unfurlers will be implemented on demand. The
next one will probably be for oEmbed https://oembed.com/, the protocol
supported by YouTube, for example.
- Resize and/or compress thumbnails (and favicons). Often times, thumbnails are
huge for the purposes of link previews. There is already support for
compressing JPEGs in status-go, but I prefer to work with compression in a
separate PR because I'd like to also solve the problem for PNGs (probably
convert them to JPEGs, plus compress them). This would be a safe choice for
thumbnails, favicons not so much because transparency is desirable.
- Editing messages is not yet supported.
- I haven't coded any artificial limit on the number of previews or on the size
of the thumbnail payload. This will be done in a separate issue. I have heard
the ideal solution may be to split messages into smaller chunks of ~125 KiB
because of libp2p, but that might be too complicated at this stage of the
product (?).
- Link preview deletion.
- For the moment, OpenGraph metadata is extracted by requesting data for the
English language (and fallback to whatever is available). In the future, we'll
want to unfurl by respecting the user's local device language. Some websites,
like GoDaddy, are already localized based on the device's IP, but many aren't.
- The website's description text should be limited by a certain number of
characters, especially because it's outside our control. Exactly how much has
not been decided yet, so it'll be done separately.
- URL normalization can be tricky, so I implemented only the basics to help with
caching. For example, the url https://status.im and HTTPS://status.im are
considered identical. Also, a URL is considered valid for unfurling if its TLD
exists according to publicsuffix.EffectiveTLDPlusOne. This was essential,
otherwise the default Go url.Parse approach would consider many invalid URLs
valid, and thus the server would waste resources trying to unfurl the
unfurleable.
### Other requirements
- If the message is edited, the link previews should reflect the edited text,
not the original one. This has been aligned with the design team as well.
- If the website's thumbnail or the favicon can't be fetched, just ignore them.
The only mandatory piece of metadata is the website's title and URL.
- Link previews in clients should be generated in near real-time, that is, as
the user types, previews are updated. In mobile this performs very well, and
it's what other clients like WhatsApp, Telegram, and Facebook do.
### Decisions
- While the user typing in the input field, the client is constantly (debounced)
asking status-go to parse the text and extract normalized URLs and then the
client checks if they're already in its in-memory cache. If they are, no RPC
call is made. I chose this approach to achieve the best possible performance
in mobile and avoid the whole RPC overhead, since the chat experience is
already not smooth enough. The mobile client uses URLs as cache keys in a
hashmap, i.e. if the key is present, it means the preview is readily available
(naive, but good enough for now). This decision also gave me more flexibility
to find the best UX at this stage of the feature.
- Due to the requirement that users should be able to see independent loading
indicators for each link preview, when status-go can't unfurl a URL, it
doesn't return it in the response.
- As an initial implementation, I added the BLOB column unfurled_links to the
user_messages table. The preview data is then serialized as JSON before being
stored in this column. I felt that creating a separate table and the related
code for this initial PR would be inconvenient. Is that reasonable to you?
Once things stabilize I can create a proper table if we want to avoid this
kind of solution with serialized columns.
2023-05-18 15:43:06 -03:00
|
|
|
"net/url"
|
2023-02-02 17:59:48 +00:00
|
|
|
"os"
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
"strings"
|
|
|
|
"unicode"
|
|
|
|
"unicode/utf8"
|
2019-08-06 23:50:13 +02:00
|
|
|
|
2020-07-25 17:13:08 +01:00
|
|
|
"github.com/golang/protobuf/proto"
|
2020-09-01 15:27:01 +02:00
|
|
|
|
2020-07-25 17:13:08 +01:00
|
|
|
"github.com/status-im/markdown"
|
2020-09-01 12:34:28 +02:00
|
|
|
"github.com/status-im/markdown/ast"
|
2020-11-25 14:11:12 +00:00
|
|
|
|
2023-03-06 10:51:09 +02:00
|
|
|
accountJson "github.com/status-im/status-go/account/json"
|
2021-02-23 07:37:08 +00:00
|
|
|
"github.com/status-im/status-go/eth-node/crypto"
|
2020-11-25 00:34:32 +00:00
|
|
|
"github.com/status-im/status-go/images"
|
2023-02-02 17:59:48 +00:00
|
|
|
"github.com/status-im/status-go/protocol/audio"
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
"github.com/status-im/status-go/protocol/protobuf"
|
2019-08-06 23:50:13 +02:00
|
|
|
)
|
|
|
|
|
2019-08-20 13:20:25 +02:00
|
|
|
// QuotedMessage contains the original text of the message replied to
|
|
|
|
type QuotedMessage struct {
|
2022-03-16 17:28:14 +01:00
|
|
|
ID string `json:"id"`
|
|
|
|
ContentType int64 `json:"contentType"`
|
2019-08-20 13:20:25 +02:00
|
|
|
// From is a public key of the author of the message.
|
2023-06-06 15:52:07 +04:00
|
|
|
From string `json:"from"`
|
|
|
|
Text string `json:"text"`
|
|
|
|
ParsedText json.RawMessage `json:"parsedText,omitempty"`
|
|
|
|
AlbumImagesCount int64 `json:"albumImagesCount"`
|
2022-03-16 17:28:14 +01:00
|
|
|
// ImageLocalURL is the local url of the image
|
|
|
|
ImageLocalURL string `json:"image,omitempty"`
|
2022-10-05 21:54:47 +04:00
|
|
|
// AudioLocalURL is the local url of the audio
|
|
|
|
AudioLocalURL string `json:"audio,omitempty"`
|
|
|
|
|
|
|
|
HasSticker bool `json:"sticker,omitempty"`
|
2020-11-18 10:16:51 +01:00
|
|
|
// CommunityID is the id of the community advertised
|
|
|
|
CommunityID string `json:"communityId,omitempty"`
|
2023-01-11 16:09:40 -05:00
|
|
|
|
|
|
|
Deleted bool `json:"deleted,omitempty"`
|
2023-01-13 16:28:49 +01:00
|
|
|
|
2023-04-17 15:44:48 +01:00
|
|
|
DeletedForMe bool `json:"deletedForMe,omitempty"`
|
|
|
|
|
2023-01-13 16:28:49 +01:00
|
|
|
DiscordMessage *protobuf.DiscordMessage `json:"discordMessage,omitempty"`
|
2019-08-20 13:20:25 +02:00
|
|
|
}
|
|
|
|
|
2020-01-10 19:59:01 +01:00
|
|
|
type CommandState int
|
|
|
|
|
|
|
|
const (
|
|
|
|
CommandStateRequestAddressForTransaction CommandState = iota + 1
|
|
|
|
CommandStateRequestAddressForTransactionDeclined
|
|
|
|
CommandStateRequestAddressForTransactionAccepted
|
|
|
|
CommandStateRequestTransaction
|
|
|
|
CommandStateRequestTransactionDeclined
|
|
|
|
CommandStateTransactionPending
|
|
|
|
CommandStateTransactionSent
|
|
|
|
)
|
|
|
|
|
2022-01-18 16:31:34 +00:00
|
|
|
type ContactRequestState int
|
|
|
|
|
|
|
|
const (
|
|
|
|
ContactRequestStatePending ContactRequestState = iota + 1
|
|
|
|
ContactRequestStateAccepted
|
|
|
|
ContactRequestStateDismissed
|
|
|
|
)
|
|
|
|
|
2022-08-31 15:41:58 +01:00
|
|
|
type ContactVerificationState int
|
|
|
|
|
|
|
|
const (
|
|
|
|
ContactVerificationStatePending ContactVerificationState = iota + 1
|
|
|
|
ContactVerificationStateAccepted
|
2022-10-24 12:33:47 +01:00
|
|
|
ContactVerificationStateDeclined
|
2022-10-28 12:37:46 +01:00
|
|
|
ContactVerificationStateTrusted
|
|
|
|
ContactVerificationStateUntrustworthy
|
2022-12-14 12:27:02 +04:00
|
|
|
ContactVerificationStateCanceled
|
2022-08-31 15:41:58 +01:00
|
|
|
)
|
|
|
|
|
URL unfurling (initial implementation) (#3471)
This is the initial implementation for the new URL unfurling requirements. The
most important one is that only the message sender will pay the privacy cost for
unfurling and extracting metadata from websites. Once the message is sent, the
unfurled data will be stored at the protocol level and receivers will just
profit and happily decode the metadata to render it.
Further development of this URL unfurling capability will be mostly guided by
issues created on clients. For the moment in status-mobile:
https://github.com/status-im/status-mobile/labels/url-preview
- https://github.com/status-im/status-mobile/issues/15918
- https://github.com/status-im/status-mobile/issues/15917
- https://github.com/status-im/status-mobile/issues/15910
- https://github.com/status-im/status-mobile/issues/15909
- https://github.com/status-im/status-mobile/issues/15908
- https://github.com/status-im/status-mobile/issues/15906
- https://github.com/status-im/status-mobile/issues/15905
### Terminology
In the code, I've tried to stick to the word "unfurl URL" to really mean the
process of extracting metadata from a website, sort of lower level. I use "link
preview" to mean a higher level structure which is enriched by unfurled data.
"link preview" is also how designers refer to it.
### User flows
1. Carol needs to see link previews while typing in the chat input field. Notice
from the diagram nothing is persisted and that status-go endpoints are
essentially stateless.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_getTextURLs
Server-->>Client: Normalized URLs
Client->>Client: Render cached unfurled URLs
Client->>Server: Unfurl non-cached URLs.\nCall wakuext_unfurlURLs
Server->>Website: Fetch metadata
Website-->>Server: Metadata (thumbnail URL, title, etc)
Server->>Website: Fetch thumbnail
Server->>Website: Fetch favicon
Website-->>Server: Favicon bytes
Website-->>Server: Thumbnail bytes
Server->>Server: Decode & process images
Server-->>Client: Unfurled data (thumbnail data URI, etc)
#+end_src
```
```
,------. ,------. ,-------.
|Client| |Server| |Website|
`--+---' `--+---' `---+---'
| Call wakuext_getTextURLs | |
| ---------------------------------------> |
| | |
| Normalized URLs | |
| <- - - - - - - - - - - - - - - - - - - - |
| | |
|----. | |
| | Render cached unfurled URLs | |
|<---' | |
| | |
| Unfurl non-cached URLs. | |
| Call wakuext_unfurlURLs | |
| ---------------------------------------> |
| | |
| | Fetch metadata |
| | ------------------------------------>
| | |
| | Metadata (thumbnail URL, title, etc)|
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Fetch thumbnail |
| | ------------------------------------>
| | |
| | Fetch favicon |
| | ------------------------------------>
| | |
| | Favicon bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Thumbnail bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| |----. |
| | | Decode & process images |
| |<---' |
| | |
| Unfurled data (thumbnail data URI, etc)| |
| <- - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,---+---.
|Client| |Server| |Website|
`------' `------' `-------'
```
2. Carol sends the text message with link previews in the RPC request
wakuext_sendChatMessages. status-go assumes the link previews are good
because it can't and shouldn't attempt to re-unfurl them.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_sendChatMessages
Server->>Server: Transform link previews to\nbe proto-marshalled
Server->DB: Write link previews serialized as JSON
Server-->>Client: Updated message response
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| Call wakuext_sendChatMessages| |
| -----------------------------> |
| | |
| |----. |
| | | Transform link previews to |
| |<---' be proto-marshalled |
| | |
| | |
| | Write link previews serialized as JSON|
| | -------------------------------------->
| | |
| Updated message response | |
| <- - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
3. The message was sent over waku and persisted locally in Carol's device. She
should now see the link previews in the chat history. There can be many link
previews shared by other chat members, therefore it is important to serve the
assets via the media server to avoid overloading the ReactNative bridge with
lots of big JSON payloads containing base64 encoded data URIs (maybe this
concern is meaningless for desktop). When a client is rendering messages with
link previews, they will have the field linkPreviews, and the thumbnail URL
will point to the local media server.
```
#+begin_src plantuml :results verbatim
Client->>Server: GET /link-preview/thumbnail (media server)
Server->>DB: Read from user_messages.unfurled_links
Server->Server: Unmarshal JSON
Server-->>Client: HTTP Content-Type: image/jpeg/etc
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| GET /link-preview/thumbnail (media server)| |
| ------------------------------------------> |
| | |
| | Read from user_messages.unfurled_links|
| | -------------------------------------->
| | |
| |----. |
| | | Unmarshal JSON |
| |<---' |
| | |
| HTTP Content-Type: image/jpeg/etc | |
| <- - - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
### Some limitations of the current implementation
The following points will become separate issues in status-go that I'll work on
over the next couple weeks. In no order of importance:
- Improve how multiple links are fetched; retries on failure and testing how
unfurling behaves around the timeout limits (deterministically, not by making
real HTTP calls as I did). https://github.com/status-im/status-go/issues/3498
- Unfurl favicons and store them in the protobuf too.
- For this PR, I added unfurling support only for websites with OpenGraph
https://ogp.me/ meta tags. Other unfurlers will be implemented on demand. The
next one will probably be for oEmbed https://oembed.com/, the protocol
supported by YouTube, for example.
- Resize and/or compress thumbnails (and favicons). Often times, thumbnails are
huge for the purposes of link previews. There is already support for
compressing JPEGs in status-go, but I prefer to work with compression in a
separate PR because I'd like to also solve the problem for PNGs (probably
convert them to JPEGs, plus compress them). This would be a safe choice for
thumbnails, favicons not so much because transparency is desirable.
- Editing messages is not yet supported.
- I haven't coded any artificial limit on the number of previews or on the size
of the thumbnail payload. This will be done in a separate issue. I have heard
the ideal solution may be to split messages into smaller chunks of ~125 KiB
because of libp2p, but that might be too complicated at this stage of the
product (?).
- Link preview deletion.
- For the moment, OpenGraph metadata is extracted by requesting data for the
English language (and fallback to whatever is available). In the future, we'll
want to unfurl by respecting the user's local device language. Some websites,
like GoDaddy, are already localized based on the device's IP, but many aren't.
- The website's description text should be limited by a certain number of
characters, especially because it's outside our control. Exactly how much has
not been decided yet, so it'll be done separately.
- URL normalization can be tricky, so I implemented only the basics to help with
caching. For example, the url https://status.im and HTTPS://status.im are
considered identical. Also, a URL is considered valid for unfurling if its TLD
exists according to publicsuffix.EffectiveTLDPlusOne. This was essential,
otherwise the default Go url.Parse approach would consider many invalid URLs
valid, and thus the server would waste resources trying to unfurl the
unfurleable.
### Other requirements
- If the message is edited, the link previews should reflect the edited text,
not the original one. This has been aligned with the design team as well.
- If the website's thumbnail or the favicon can't be fetched, just ignore them.
The only mandatory piece of metadata is the website's title and URL.
- Link previews in clients should be generated in near real-time, that is, as
the user types, previews are updated. In mobile this performs very well, and
it's what other clients like WhatsApp, Telegram, and Facebook do.
### Decisions
- While the user typing in the input field, the client is constantly (debounced)
asking status-go to parse the text and extract normalized URLs and then the
client checks if they're already in its in-memory cache. If they are, no RPC
call is made. I chose this approach to achieve the best possible performance
in mobile and avoid the whole RPC overhead, since the chat experience is
already not smooth enough. The mobile client uses URLs as cache keys in a
hashmap, i.e. if the key is present, it means the preview is readily available
(naive, but good enough for now). This decision also gave me more flexibility
to find the best UX at this stage of the feature.
- Due to the requirement that users should be able to see independent loading
indicators for each link preview, when status-go can't unfurl a URL, it
doesn't return it in the response.
- As an initial implementation, I added the BLOB column unfurled_links to the
user_messages table. The preview data is then serialized as JSON before being
stored in this column. I felt that creating a separate table and the related
code for this initial PR would be inconvenient. Is that reasonable to you?
Once things stabilize I can create a proper table if we want to avoid this
kind of solution with serialized columns.
2023-05-18 15:43:06 -03:00
|
|
|
type LinkPreviewThumbnail struct {
|
|
|
|
Width int `json:"width"`
|
|
|
|
Height int `json:"height"`
|
|
|
|
// Non-empty when the thumbnail is available via the media server, i.e. after
|
|
|
|
// the chat message is sent.
|
|
|
|
URL string `json:"url,omitempty"`
|
|
|
|
// Non-empty when the thumbnail payload needs to be shared with the client,
|
|
|
|
// but before it has been persisted.
|
|
|
|
DataURI string `json:"dataUri,omitempty"`
|
|
|
|
}
|
|
|
|
|
|
|
|
type LinkPreview struct {
|
|
|
|
URL string `json:"url"`
|
|
|
|
Hostname string `json:"hostname"`
|
|
|
|
Title string `json:"title,omitempty"`
|
|
|
|
Description string `json:"description,omitempty"`
|
|
|
|
Thumbnail LinkPreviewThumbnail `json:"thumbnail,omitempty"`
|
|
|
|
}
|
|
|
|
|
2023-02-21 16:20:10 +00:00
|
|
|
const EveryoneMentionTag = "0x00001"
|
2022-12-19 15:49:23 +01:00
|
|
|
|
2020-01-10 19:59:01 +01:00
|
|
|
type CommandParameters struct {
|
|
|
|
// ID is the ID of the initial message
|
|
|
|
ID string `json:"id"`
|
|
|
|
// From is the address we are sending the command from
|
|
|
|
From string `json:"from"`
|
|
|
|
// Address is the address sent with the command
|
|
|
|
Address string `json:"address"`
|
|
|
|
// Contract is the contract address for ERC20 tokens
|
|
|
|
Contract string `json:"contract"`
|
|
|
|
// Value is the value as a string sent
|
|
|
|
Value string `json:"value"`
|
|
|
|
// TransactionHash is the hash of the transaction
|
|
|
|
TransactionHash string `json:"transactionHash"`
|
|
|
|
// CommandState is the state of the command
|
|
|
|
CommandState CommandState `json:"commandState"`
|
|
|
|
// The Signature of the pk-bytes+transaction-hash from the wallet
|
|
|
|
// address originating
|
|
|
|
Signature []byte `json:"signature"`
|
|
|
|
}
|
|
|
|
|
2021-03-25 16:15:22 +01:00
|
|
|
// GapParameters is the From and To indicating the missing period in chat history
|
|
|
|
type GapParameters struct {
|
|
|
|
From uint32 `json:"from,omitempty"`
|
|
|
|
To uint32 `json:"to,omitempty"`
|
|
|
|
}
|
|
|
|
|
2020-01-10 19:59:01 +01:00
|
|
|
func (c *CommandParameters) IsTokenTransfer() bool {
|
|
|
|
return len(c.Contract) != 0
|
|
|
|
}
|
|
|
|
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
const (
|
2021-02-23 17:47:45 +02:00
|
|
|
OutgoingStatusSending = "sending"
|
|
|
|
OutgoingStatusSent = "sent"
|
|
|
|
OutgoingStatusDelivered = "delivered"
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
)
|
|
|
|
|
2022-09-20 11:13:44 +02:00
|
|
|
type Messages []*Message
|
|
|
|
|
|
|
|
func (m Messages) GetClock(i int) uint64 {
|
|
|
|
return m[i].Clock
|
|
|
|
}
|
|
|
|
|
2019-08-06 23:50:13 +02:00
|
|
|
// Message represents a message record in the database,
|
2020-01-10 19:59:01 +01:00
|
|
|
// more specifically in user_messages table.
|
2019-08-06 23:50:13 +02:00
|
|
|
type Message struct {
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
protobuf.ChatMessage
|
|
|
|
|
2019-08-06 23:50:13 +02:00
|
|
|
// ID calculated as keccak256(compressedAuthorPubKey, data) where data is unencrypted payload.
|
|
|
|
ID string `json:"id"`
|
|
|
|
// WhisperTimestamp is a timestamp of a Whisper envelope.
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
WhisperTimestamp uint64 `json:"whisperTimestamp"`
|
2019-08-06 23:50:13 +02:00
|
|
|
// From is a public key of the author of the message.
|
2019-08-20 13:20:25 +02:00
|
|
|
From string `json:"from"`
|
2019-09-26 09:01:17 +02:00
|
|
|
// Random 3 words name
|
|
|
|
Alias string `json:"alias"`
|
|
|
|
// Identicon of the author
|
|
|
|
Identicon string `json:"identicon"`
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
// The chat id to be stored locally
|
|
|
|
LocalChatID string `json:"localChatId"`
|
2021-02-23 17:47:45 +02:00
|
|
|
// Seen set to true when user have read this message already
|
2019-08-06 23:50:13 +02:00
|
|
|
Seen bool `json:"seen"`
|
|
|
|
OutgoingStatus string `json:"outgoingStatus,omitempty"`
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
|
2019-08-20 13:20:25 +02:00
|
|
|
QuotedMessage *QuotedMessage `json:"quotedMessage"`
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
|
2020-01-10 19:59:01 +01:00
|
|
|
// CommandParameters is the parameters sent with the message
|
|
|
|
CommandParameters *CommandParameters `json:"commandParameters"`
|
|
|
|
|
2021-03-25 16:15:22 +01:00
|
|
|
// GapParameters is the value from/to related to the gap
|
|
|
|
GapParameters *GapParameters `json:"gapParameters,omitempty"`
|
|
|
|
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
// Computed fields
|
Add replies to messages
Currently replies to messages are handled in status-react.
This causes some issues with the fact that sometimes replies might come
out of order, they might be offloaded to the database etc.
This commit changes the behavior so that status-go always returns the
replies, and in case a reply comes out of order (first the reply, later
the message being replied to), it will include in the messages the
updated message.
It also adds some fields (RTL,Replace,LineCount) to the database which
were not previously saved, resulting in some potential bugs.
The method that we use to pull replies is currently a bit naive, we just
pull all the message again from the database, but has the advantage of
being simple. It will go through performance testing to make sure
performnace are acceptable, if so I think it's reasonable to avoid some
complexity.
2020-04-08 15:42:02 +02:00
|
|
|
// RTL is whether this is a right-to-left message (arabic/hebrew script etc)
|
|
|
|
RTL bool `json:"rtl"`
|
|
|
|
// ParsedText is the parsed markdown for displaying
|
2020-07-30 22:54:33 +02:00
|
|
|
ParsedText []byte `json:"parsedText,omitempty"`
|
2021-03-31 18:23:45 +02:00
|
|
|
// ParsedTextAst is the ast of the parsed text
|
|
|
|
ParsedTextAst *ast.Node `json:"-"`
|
Add replies to messages
Currently replies to messages are handled in status-react.
This causes some issues with the fact that sometimes replies might come
out of order, they might be offloaded to the database etc.
This commit changes the behavior so that status-go always returns the
replies, and in case a reply comes out of order (first the reply, later
the message being replied to), it will include in the messages the
updated message.
It also adds some fields (RTL,Replace,LineCount) to the database which
were not previously saved, resulting in some potential bugs.
The method that we use to pull replies is currently a bit naive, we just
pull all the message again from the database, but has the advantage of
being simple. It will go through performance testing to make sure
performnace are acceptable, if so I think it's reasonable to avoid some
complexity.
2020-04-08 15:42:02 +02:00
|
|
|
// LineCount is the count of newlines in the message
|
|
|
|
LineCount int `json:"lineCount"`
|
2020-05-13 15:16:17 +02:00
|
|
|
// Base64Image is the converted base64 image
|
|
|
|
Base64Image string `json:"image,omitempty"`
|
|
|
|
// ImagePath is the path of the image to be sent
|
|
|
|
ImagePath string `json:"imagePath,omitempty"`
|
2020-06-17 20:55:49 +02:00
|
|
|
// Base64Audio is the converted base64 audio
|
|
|
|
Base64Audio string `json:"audio,omitempty"`
|
|
|
|
// AudioPath is the path of the audio to be sent
|
|
|
|
AudioPath string `json:"audioPath,omitempty"`
|
2022-02-10 18:19:34 +01:00
|
|
|
// ImageLocalURL is the local url of the image
|
|
|
|
ImageLocalURL string `json:"imageLocalUrl,omitempty"`
|
2022-02-23 15:34:16 +01:00
|
|
|
// AudioLocalURL is the local url of the audio
|
|
|
|
AudioLocalURL string `json:"audioLocalUrl,omitempty"`
|
2022-05-09 09:07:57 -04:00
|
|
|
// StickerLocalURL is the local url of the sticker
|
|
|
|
StickerLocalURL string `json:"stickerLocalUrl,omitempty"`
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
|
2020-11-18 10:16:51 +01:00
|
|
|
// CommunityID is the id of the community to advertise
|
|
|
|
CommunityID string `json:"communityId,omitempty"`
|
|
|
|
|
2021-06-03 15:11:55 +02:00
|
|
|
// Replace indicates that this is a replacement of a message
|
|
|
|
// that has been updated
|
|
|
|
Replace string `json:"replace,omitempty"`
|
|
|
|
New bool `json:"new,omitempty"`
|
2020-11-10 17:08:32 +02:00
|
|
|
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
SigPubKey *ecdsa.PublicKey `json:"-"`
|
2020-09-01 12:34:28 +02:00
|
|
|
|
|
|
|
// Mentions is an array of mentions for a given message
|
|
|
|
Mentions []string
|
2020-10-27 19:35:28 +02:00
|
|
|
|
2021-05-25 11:40:02 +02:00
|
|
|
// Mentioned is whether the user is mentioned in the message
|
|
|
|
Mentioned bool `json:"mentioned"`
|
|
|
|
|
2023-01-10 12:59:32 -05:00
|
|
|
// Replied is whether the user is replied to in the message
|
|
|
|
Replied bool `json:"replied"`
|
|
|
|
|
2020-10-27 19:35:28 +02:00
|
|
|
// Links is an array of links within given message
|
URL unfurling (initial implementation) (#3471)
This is the initial implementation for the new URL unfurling requirements. The
most important one is that only the message sender will pay the privacy cost for
unfurling and extracting metadata from websites. Once the message is sent, the
unfurled data will be stored at the protocol level and receivers will just
profit and happily decode the metadata to render it.
Further development of this URL unfurling capability will be mostly guided by
issues created on clients. For the moment in status-mobile:
https://github.com/status-im/status-mobile/labels/url-preview
- https://github.com/status-im/status-mobile/issues/15918
- https://github.com/status-im/status-mobile/issues/15917
- https://github.com/status-im/status-mobile/issues/15910
- https://github.com/status-im/status-mobile/issues/15909
- https://github.com/status-im/status-mobile/issues/15908
- https://github.com/status-im/status-mobile/issues/15906
- https://github.com/status-im/status-mobile/issues/15905
### Terminology
In the code, I've tried to stick to the word "unfurl URL" to really mean the
process of extracting metadata from a website, sort of lower level. I use "link
preview" to mean a higher level structure which is enriched by unfurled data.
"link preview" is also how designers refer to it.
### User flows
1. Carol needs to see link previews while typing in the chat input field. Notice
from the diagram nothing is persisted and that status-go endpoints are
essentially stateless.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_getTextURLs
Server-->>Client: Normalized URLs
Client->>Client: Render cached unfurled URLs
Client->>Server: Unfurl non-cached URLs.\nCall wakuext_unfurlURLs
Server->>Website: Fetch metadata
Website-->>Server: Metadata (thumbnail URL, title, etc)
Server->>Website: Fetch thumbnail
Server->>Website: Fetch favicon
Website-->>Server: Favicon bytes
Website-->>Server: Thumbnail bytes
Server->>Server: Decode & process images
Server-->>Client: Unfurled data (thumbnail data URI, etc)
#+end_src
```
```
,------. ,------. ,-------.
|Client| |Server| |Website|
`--+---' `--+---' `---+---'
| Call wakuext_getTextURLs | |
| ---------------------------------------> |
| | |
| Normalized URLs | |
| <- - - - - - - - - - - - - - - - - - - - |
| | |
|----. | |
| | Render cached unfurled URLs | |
|<---' | |
| | |
| Unfurl non-cached URLs. | |
| Call wakuext_unfurlURLs | |
| ---------------------------------------> |
| | |
| | Fetch metadata |
| | ------------------------------------>
| | |
| | Metadata (thumbnail URL, title, etc)|
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Fetch thumbnail |
| | ------------------------------------>
| | |
| | Fetch favicon |
| | ------------------------------------>
| | |
| | Favicon bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Thumbnail bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| |----. |
| | | Decode & process images |
| |<---' |
| | |
| Unfurled data (thumbnail data URI, etc)| |
| <- - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,---+---.
|Client| |Server| |Website|
`------' `------' `-------'
```
2. Carol sends the text message with link previews in the RPC request
wakuext_sendChatMessages. status-go assumes the link previews are good
because it can't and shouldn't attempt to re-unfurl them.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_sendChatMessages
Server->>Server: Transform link previews to\nbe proto-marshalled
Server->DB: Write link previews serialized as JSON
Server-->>Client: Updated message response
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| Call wakuext_sendChatMessages| |
| -----------------------------> |
| | |
| |----. |
| | | Transform link previews to |
| |<---' be proto-marshalled |
| | |
| | |
| | Write link previews serialized as JSON|
| | -------------------------------------->
| | |
| Updated message response | |
| <- - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
3. The message was sent over waku and persisted locally in Carol's device. She
should now see the link previews in the chat history. There can be many link
previews shared by other chat members, therefore it is important to serve the
assets via the media server to avoid overloading the ReactNative bridge with
lots of big JSON payloads containing base64 encoded data URIs (maybe this
concern is meaningless for desktop). When a client is rendering messages with
link previews, they will have the field linkPreviews, and the thumbnail URL
will point to the local media server.
```
#+begin_src plantuml :results verbatim
Client->>Server: GET /link-preview/thumbnail (media server)
Server->>DB: Read from user_messages.unfurled_links
Server->Server: Unmarshal JSON
Server-->>Client: HTTP Content-Type: image/jpeg/etc
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| GET /link-preview/thumbnail (media server)| |
| ------------------------------------------> |
| | |
| | Read from user_messages.unfurled_links|
| | -------------------------------------->
| | |
| |----. |
| | | Unmarshal JSON |
| |<---' |
| | |
| HTTP Content-Type: image/jpeg/etc | |
| <- - - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
### Some limitations of the current implementation
The following points will become separate issues in status-go that I'll work on
over the next couple weeks. In no order of importance:
- Improve how multiple links are fetched; retries on failure and testing how
unfurling behaves around the timeout limits (deterministically, not by making
real HTTP calls as I did). https://github.com/status-im/status-go/issues/3498
- Unfurl favicons and store them in the protobuf too.
- For this PR, I added unfurling support only for websites with OpenGraph
https://ogp.me/ meta tags. Other unfurlers will be implemented on demand. The
next one will probably be for oEmbed https://oembed.com/, the protocol
supported by YouTube, for example.
- Resize and/or compress thumbnails (and favicons). Often times, thumbnails are
huge for the purposes of link previews. There is already support for
compressing JPEGs in status-go, but I prefer to work with compression in a
separate PR because I'd like to also solve the problem for PNGs (probably
convert them to JPEGs, plus compress them). This would be a safe choice for
thumbnails, favicons not so much because transparency is desirable.
- Editing messages is not yet supported.
- I haven't coded any artificial limit on the number of previews or on the size
of the thumbnail payload. This will be done in a separate issue. I have heard
the ideal solution may be to split messages into smaller chunks of ~125 KiB
because of libp2p, but that might be too complicated at this stage of the
product (?).
- Link preview deletion.
- For the moment, OpenGraph metadata is extracted by requesting data for the
English language (and fallback to whatever is available). In the future, we'll
want to unfurl by respecting the user's local device language. Some websites,
like GoDaddy, are already localized based on the device's IP, but many aren't.
- The website's description text should be limited by a certain number of
characters, especially because it's outside our control. Exactly how much has
not been decided yet, so it'll be done separately.
- URL normalization can be tricky, so I implemented only the basics to help with
caching. For example, the url https://status.im and HTTPS://status.im are
considered identical. Also, a URL is considered valid for unfurling if its TLD
exists according to publicsuffix.EffectiveTLDPlusOne. This was essential,
otherwise the default Go url.Parse approach would consider many invalid URLs
valid, and thus the server would waste resources trying to unfurl the
unfurleable.
### Other requirements
- If the message is edited, the link previews should reflect the edited text,
not the original one. This has been aligned with the design team as well.
- If the website's thumbnail or the favicon can't be fetched, just ignore them.
The only mandatory piece of metadata is the website's title and URL.
- Link previews in clients should be generated in near real-time, that is, as
the user types, previews are updated. In mobile this performs very well, and
it's what other clients like WhatsApp, Telegram, and Facebook do.
### Decisions
- While the user typing in the input field, the client is constantly (debounced)
asking status-go to parse the text and extract normalized URLs and then the
client checks if they're already in its in-memory cache. If they are, no RPC
call is made. I chose this approach to achieve the best possible performance
in mobile and avoid the whole RPC overhead, since the chat experience is
already not smooth enough. The mobile client uses URLs as cache keys in a
hashmap, i.e. if the key is present, it means the preview is readily available
(naive, but good enough for now). This decision also gave me more flexibility
to find the best UX at this stage of the feature.
- Due to the requirement that users should be able to see independent loading
indicators for each link preview, when status-go can't unfurl a URL, it
doesn't return it in the response.
- As an initial implementation, I added the BLOB column unfurled_links to the
user_messages table. The preview data is then serialized as JSON before being
stored in this column. I felt that creating a separate table and the related
code for this initial PR would be inconvenient. Is that reasonable to you?
Once things stabilize I can create a proper table if we want to avoid this
kind of solution with serialized columns.
2023-05-18 15:43:06 -03:00
|
|
|
Links []string
|
|
|
|
LinkPreviews []LinkPreview `json:"linkPreviews"`
|
2021-06-07 10:31:27 +02:00
|
|
|
|
|
|
|
// EditedAt indicates the clock value it was edited
|
|
|
|
EditedAt uint64 `json:"editedAt"`
|
2021-07-26 17:06:32 -04:00
|
|
|
|
|
|
|
// Deleted indicates if a message was deleted
|
|
|
|
Deleted bool `json:"deleted"`
|
2022-01-18 16:31:34 +00:00
|
|
|
|
2023-02-01 08:57:35 +08:00
|
|
|
DeletedBy string `json:"deletedBy,omitempty"`
|
|
|
|
|
2022-09-28 19:42:17 +08:00
|
|
|
DeletedForMe bool `json:"deletedForMe"`
|
|
|
|
|
2022-01-18 16:31:34 +00:00
|
|
|
// ContactRequestState is the state of the contact request message
|
|
|
|
ContactRequestState ContactRequestState `json:"contactRequestState,omitempty"`
|
2022-08-04 18:16:56 +02:00
|
|
|
|
2022-08-31 15:41:58 +01:00
|
|
|
// ContactVerificationState is the state of the identity verification process
|
|
|
|
ContactVerificationState ContactVerificationState `json:"contactVerificationState,omitempty"`
|
|
|
|
|
2022-08-04 18:16:56 +02:00
|
|
|
DiscordMessage *protobuf.DiscordMessage `json:"discordMessage,omitempty"`
|
2020-01-10 19:59:01 +01:00
|
|
|
}
|
|
|
|
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
func (m *Message) MarshalJSON() ([]byte, error) {
|
2020-01-15 08:25:09 +01:00
|
|
|
type StickerAlias struct {
|
|
|
|
Hash string `json:"hash"`
|
|
|
|
Pack int32 `json:"pack"`
|
2022-05-09 09:07:57 -04:00
|
|
|
URL string `json:"url"`
|
2020-01-15 08:25:09 +01:00
|
|
|
}
|
2023-03-06 10:51:09 +02:00
|
|
|
|
|
|
|
type MessageStructType struct {
|
2022-10-24 12:33:47 +01:00
|
|
|
ID string `json:"id"`
|
|
|
|
WhisperTimestamp uint64 `json:"whisperTimestamp"`
|
|
|
|
From string `json:"from"`
|
|
|
|
Alias string `json:"alias"`
|
|
|
|
Identicon string `json:"identicon"`
|
|
|
|
Seen bool `json:"seen"`
|
|
|
|
OutgoingStatus string `json:"outgoingStatus,omitempty"`
|
|
|
|
QuotedMessage *QuotedMessage `json:"quotedMessage"`
|
|
|
|
RTL bool `json:"rtl"`
|
|
|
|
ParsedText json.RawMessage `json:"parsedText,omitempty"`
|
|
|
|
LineCount int `json:"lineCount"`
|
|
|
|
Text string `json:"text"`
|
|
|
|
ChatID string `json:"chatId"`
|
|
|
|
LocalChatID string `json:"localChatId"`
|
|
|
|
Clock uint64 `json:"clock"`
|
|
|
|
Replace string `json:"replace"`
|
|
|
|
ResponseTo string `json:"responseTo"`
|
|
|
|
New bool `json:"new,omitempty"`
|
|
|
|
EnsName string `json:"ensName"`
|
|
|
|
DisplayName string `json:"displayName"`
|
|
|
|
Image string `json:"image,omitempty"`
|
2022-12-14 16:25:45 +04:00
|
|
|
AlbumID string `json:"albumId,omitempty"`
|
2023-01-12 13:43:14 +04:00
|
|
|
ImageWidth uint32 `json:"imageWidth,omitempty"`
|
|
|
|
ImageHeight uint32 `json:"imageHeight,omitempty"`
|
2023-04-05 15:24:55 +02:00
|
|
|
AlbumImagesCount uint32 `json:"albumImagesCount,omitempty"`
|
2022-10-24 12:33:47 +01:00
|
|
|
Audio string `json:"audio,omitempty"`
|
|
|
|
AudioDurationMs uint64 `json:"audioDurationMs,omitempty"`
|
|
|
|
CommunityID string `json:"communityId,omitempty"`
|
|
|
|
Sticker *StickerAlias `json:"sticker,omitempty"`
|
|
|
|
CommandParameters *CommandParameters `json:"commandParameters,omitempty"`
|
|
|
|
GapParameters *GapParameters `json:"gapParameters,omitempty"`
|
|
|
|
Timestamp uint64 `json:"timestamp"`
|
|
|
|
ContentType protobuf.ChatMessage_ContentType `json:"contentType"`
|
|
|
|
MessageType protobuf.MessageType `json:"messageType"`
|
|
|
|
Mentions []string `json:"mentions,omitempty"`
|
|
|
|
Mentioned bool `json:"mentioned,omitempty"`
|
2023-01-10 12:59:32 -05:00
|
|
|
Replied bool `json:"replied,omitempty"`
|
2022-10-24 12:33:47 +01:00
|
|
|
Links []string `json:"links,omitempty"`
|
URL unfurling (initial implementation) (#3471)
This is the initial implementation for the new URL unfurling requirements. The
most important one is that only the message sender will pay the privacy cost for
unfurling and extracting metadata from websites. Once the message is sent, the
unfurled data will be stored at the protocol level and receivers will just
profit and happily decode the metadata to render it.
Further development of this URL unfurling capability will be mostly guided by
issues created on clients. For the moment in status-mobile:
https://github.com/status-im/status-mobile/labels/url-preview
- https://github.com/status-im/status-mobile/issues/15918
- https://github.com/status-im/status-mobile/issues/15917
- https://github.com/status-im/status-mobile/issues/15910
- https://github.com/status-im/status-mobile/issues/15909
- https://github.com/status-im/status-mobile/issues/15908
- https://github.com/status-im/status-mobile/issues/15906
- https://github.com/status-im/status-mobile/issues/15905
### Terminology
In the code, I've tried to stick to the word "unfurl URL" to really mean the
process of extracting metadata from a website, sort of lower level. I use "link
preview" to mean a higher level structure which is enriched by unfurled data.
"link preview" is also how designers refer to it.
### User flows
1. Carol needs to see link previews while typing in the chat input field. Notice
from the diagram nothing is persisted and that status-go endpoints are
essentially stateless.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_getTextURLs
Server-->>Client: Normalized URLs
Client->>Client: Render cached unfurled URLs
Client->>Server: Unfurl non-cached URLs.\nCall wakuext_unfurlURLs
Server->>Website: Fetch metadata
Website-->>Server: Metadata (thumbnail URL, title, etc)
Server->>Website: Fetch thumbnail
Server->>Website: Fetch favicon
Website-->>Server: Favicon bytes
Website-->>Server: Thumbnail bytes
Server->>Server: Decode & process images
Server-->>Client: Unfurled data (thumbnail data URI, etc)
#+end_src
```
```
,------. ,------. ,-------.
|Client| |Server| |Website|
`--+---' `--+---' `---+---'
| Call wakuext_getTextURLs | |
| ---------------------------------------> |
| | |
| Normalized URLs | |
| <- - - - - - - - - - - - - - - - - - - - |
| | |
|----. | |
| | Render cached unfurled URLs | |
|<---' | |
| | |
| Unfurl non-cached URLs. | |
| Call wakuext_unfurlURLs | |
| ---------------------------------------> |
| | |
| | Fetch metadata |
| | ------------------------------------>
| | |
| | Metadata (thumbnail URL, title, etc)|
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Fetch thumbnail |
| | ------------------------------------>
| | |
| | Fetch favicon |
| | ------------------------------------>
| | |
| | Favicon bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Thumbnail bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| |----. |
| | | Decode & process images |
| |<---' |
| | |
| Unfurled data (thumbnail data URI, etc)| |
| <- - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,---+---.
|Client| |Server| |Website|
`------' `------' `-------'
```
2. Carol sends the text message with link previews in the RPC request
wakuext_sendChatMessages. status-go assumes the link previews are good
because it can't and shouldn't attempt to re-unfurl them.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_sendChatMessages
Server->>Server: Transform link previews to\nbe proto-marshalled
Server->DB: Write link previews serialized as JSON
Server-->>Client: Updated message response
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| Call wakuext_sendChatMessages| |
| -----------------------------> |
| | |
| |----. |
| | | Transform link previews to |
| |<---' be proto-marshalled |
| | |
| | |
| | Write link previews serialized as JSON|
| | -------------------------------------->
| | |
| Updated message response | |
| <- - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
3. The message was sent over waku and persisted locally in Carol's device. She
should now see the link previews in the chat history. There can be many link
previews shared by other chat members, therefore it is important to serve the
assets via the media server to avoid overloading the ReactNative bridge with
lots of big JSON payloads containing base64 encoded data URIs (maybe this
concern is meaningless for desktop). When a client is rendering messages with
link previews, they will have the field linkPreviews, and the thumbnail URL
will point to the local media server.
```
#+begin_src plantuml :results verbatim
Client->>Server: GET /link-preview/thumbnail (media server)
Server->>DB: Read from user_messages.unfurled_links
Server->Server: Unmarshal JSON
Server-->>Client: HTTP Content-Type: image/jpeg/etc
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| GET /link-preview/thumbnail (media server)| |
| ------------------------------------------> |
| | |
| | Read from user_messages.unfurled_links|
| | -------------------------------------->
| | |
| |----. |
| | | Unmarshal JSON |
| |<---' |
| | |
| HTTP Content-Type: image/jpeg/etc | |
| <- - - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
### Some limitations of the current implementation
The following points will become separate issues in status-go that I'll work on
over the next couple weeks. In no order of importance:
- Improve how multiple links are fetched; retries on failure and testing how
unfurling behaves around the timeout limits (deterministically, not by making
real HTTP calls as I did). https://github.com/status-im/status-go/issues/3498
- Unfurl favicons and store them in the protobuf too.
- For this PR, I added unfurling support only for websites with OpenGraph
https://ogp.me/ meta tags. Other unfurlers will be implemented on demand. The
next one will probably be for oEmbed https://oembed.com/, the protocol
supported by YouTube, for example.
- Resize and/or compress thumbnails (and favicons). Often times, thumbnails are
huge for the purposes of link previews. There is already support for
compressing JPEGs in status-go, but I prefer to work with compression in a
separate PR because I'd like to also solve the problem for PNGs (probably
convert them to JPEGs, plus compress them). This would be a safe choice for
thumbnails, favicons not so much because transparency is desirable.
- Editing messages is not yet supported.
- I haven't coded any artificial limit on the number of previews or on the size
of the thumbnail payload. This will be done in a separate issue. I have heard
the ideal solution may be to split messages into smaller chunks of ~125 KiB
because of libp2p, but that might be too complicated at this stage of the
product (?).
- Link preview deletion.
- For the moment, OpenGraph metadata is extracted by requesting data for the
English language (and fallback to whatever is available). In the future, we'll
want to unfurl by respecting the user's local device language. Some websites,
like GoDaddy, are already localized based on the device's IP, but many aren't.
- The website's description text should be limited by a certain number of
characters, especially because it's outside our control. Exactly how much has
not been decided yet, so it'll be done separately.
- URL normalization can be tricky, so I implemented only the basics to help with
caching. For example, the url https://status.im and HTTPS://status.im are
considered identical. Also, a URL is considered valid for unfurling if its TLD
exists according to publicsuffix.EffectiveTLDPlusOne. This was essential,
otherwise the default Go url.Parse approach would consider many invalid URLs
valid, and thus the server would waste resources trying to unfurl the
unfurleable.
### Other requirements
- If the message is edited, the link previews should reflect the edited text,
not the original one. This has been aligned with the design team as well.
- If the website's thumbnail or the favicon can't be fetched, just ignore them.
The only mandatory piece of metadata is the website's title and URL.
- Link previews in clients should be generated in near real-time, that is, as
the user types, previews are updated. In mobile this performs very well, and
it's what other clients like WhatsApp, Telegram, and Facebook do.
### Decisions
- While the user typing in the input field, the client is constantly (debounced)
asking status-go to parse the text and extract normalized URLs and then the
client checks if they're already in its in-memory cache. If they are, no RPC
call is made. I chose this approach to achieve the best possible performance
in mobile and avoid the whole RPC overhead, since the chat experience is
already not smooth enough. The mobile client uses URLs as cache keys in a
hashmap, i.e. if the key is present, it means the preview is readily available
(naive, but good enough for now). This decision also gave me more flexibility
to find the best UX at this stage of the feature.
- Due to the requirement that users should be able to see independent loading
indicators for each link preview, when status-go can't unfurl a URL, it
doesn't return it in the response.
- As an initial implementation, I added the BLOB column unfurled_links to the
user_messages table. The preview data is then serialized as JSON before being
stored in this column. I felt that creating a separate table and the related
code for this initial PR would be inconvenient. Is that reasonable to you?
Once things stabilize I can create a proper table if we want to avoid this
kind of solution with serialized columns.
2023-05-18 15:43:06 -03:00
|
|
|
LinkPreviews []LinkPreview `json:"linkPreviews,omitempty"`
|
2022-10-24 12:33:47 +01:00
|
|
|
EditedAt uint64 `json:"editedAt,omitempty"`
|
|
|
|
Deleted bool `json:"deleted,omitempty"`
|
2023-02-01 08:57:35 +08:00
|
|
|
DeletedBy string `json:"deletedBy,omitempty"`
|
2022-10-24 12:33:47 +01:00
|
|
|
DeletedForMe bool `json:"deletedForMe,omitempty"`
|
|
|
|
ContactRequestState ContactRequestState `json:"contactRequestState,omitempty"`
|
|
|
|
ContactVerificationState ContactVerificationState `json:"contactVerificationState,omitempty"`
|
|
|
|
DiscordMessage *protobuf.DiscordMessage `json:"discordMessage,omitempty"`
|
2023-03-06 10:51:09 +02:00
|
|
|
}
|
|
|
|
item := MessageStructType{
|
2022-10-24 12:33:47 +01:00
|
|
|
ID: m.ID,
|
|
|
|
WhisperTimestamp: m.WhisperTimestamp,
|
|
|
|
From: m.From,
|
|
|
|
Alias: m.Alias,
|
|
|
|
Identicon: m.Identicon,
|
|
|
|
Seen: m.Seen,
|
|
|
|
OutgoingStatus: m.OutgoingStatus,
|
|
|
|
QuotedMessage: m.QuotedMessage,
|
|
|
|
RTL: m.RTL,
|
|
|
|
ParsedText: m.ParsedText,
|
|
|
|
LineCount: m.LineCount,
|
|
|
|
Text: m.Text,
|
|
|
|
Replace: m.Replace,
|
|
|
|
ChatID: m.ChatId,
|
|
|
|
LocalChatID: m.LocalChatID,
|
|
|
|
Clock: m.Clock,
|
|
|
|
ResponseTo: m.ResponseTo,
|
|
|
|
New: m.New,
|
|
|
|
EnsName: m.EnsName,
|
|
|
|
DisplayName: m.DisplayName,
|
|
|
|
Image: m.ImageLocalURL,
|
|
|
|
Audio: m.AudioLocalURL,
|
|
|
|
CommunityID: m.CommunityID,
|
|
|
|
Timestamp: m.Timestamp,
|
|
|
|
ContentType: m.ContentType,
|
|
|
|
Mentions: m.Mentions,
|
|
|
|
Mentioned: m.Mentioned,
|
2023-01-10 12:59:32 -05:00
|
|
|
Replied: m.Replied,
|
2022-10-24 12:33:47 +01:00
|
|
|
Links: m.Links,
|
URL unfurling (initial implementation) (#3471)
This is the initial implementation for the new URL unfurling requirements. The
most important one is that only the message sender will pay the privacy cost for
unfurling and extracting metadata from websites. Once the message is sent, the
unfurled data will be stored at the protocol level and receivers will just
profit and happily decode the metadata to render it.
Further development of this URL unfurling capability will be mostly guided by
issues created on clients. For the moment in status-mobile:
https://github.com/status-im/status-mobile/labels/url-preview
- https://github.com/status-im/status-mobile/issues/15918
- https://github.com/status-im/status-mobile/issues/15917
- https://github.com/status-im/status-mobile/issues/15910
- https://github.com/status-im/status-mobile/issues/15909
- https://github.com/status-im/status-mobile/issues/15908
- https://github.com/status-im/status-mobile/issues/15906
- https://github.com/status-im/status-mobile/issues/15905
### Terminology
In the code, I've tried to stick to the word "unfurl URL" to really mean the
process of extracting metadata from a website, sort of lower level. I use "link
preview" to mean a higher level structure which is enriched by unfurled data.
"link preview" is also how designers refer to it.
### User flows
1. Carol needs to see link previews while typing in the chat input field. Notice
from the diagram nothing is persisted and that status-go endpoints are
essentially stateless.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_getTextURLs
Server-->>Client: Normalized URLs
Client->>Client: Render cached unfurled URLs
Client->>Server: Unfurl non-cached URLs.\nCall wakuext_unfurlURLs
Server->>Website: Fetch metadata
Website-->>Server: Metadata (thumbnail URL, title, etc)
Server->>Website: Fetch thumbnail
Server->>Website: Fetch favicon
Website-->>Server: Favicon bytes
Website-->>Server: Thumbnail bytes
Server->>Server: Decode & process images
Server-->>Client: Unfurled data (thumbnail data URI, etc)
#+end_src
```
```
,------. ,------. ,-------.
|Client| |Server| |Website|
`--+---' `--+---' `---+---'
| Call wakuext_getTextURLs | |
| ---------------------------------------> |
| | |
| Normalized URLs | |
| <- - - - - - - - - - - - - - - - - - - - |
| | |
|----. | |
| | Render cached unfurled URLs | |
|<---' | |
| | |
| Unfurl non-cached URLs. | |
| Call wakuext_unfurlURLs | |
| ---------------------------------------> |
| | |
| | Fetch metadata |
| | ------------------------------------>
| | |
| | Metadata (thumbnail URL, title, etc)|
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Fetch thumbnail |
| | ------------------------------------>
| | |
| | Fetch favicon |
| | ------------------------------------>
| | |
| | Favicon bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Thumbnail bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| |----. |
| | | Decode & process images |
| |<---' |
| | |
| Unfurled data (thumbnail data URI, etc)| |
| <- - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,---+---.
|Client| |Server| |Website|
`------' `------' `-------'
```
2. Carol sends the text message with link previews in the RPC request
wakuext_sendChatMessages. status-go assumes the link previews are good
because it can't and shouldn't attempt to re-unfurl them.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_sendChatMessages
Server->>Server: Transform link previews to\nbe proto-marshalled
Server->DB: Write link previews serialized as JSON
Server-->>Client: Updated message response
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| Call wakuext_sendChatMessages| |
| -----------------------------> |
| | |
| |----. |
| | | Transform link previews to |
| |<---' be proto-marshalled |
| | |
| | |
| | Write link previews serialized as JSON|
| | -------------------------------------->
| | |
| Updated message response | |
| <- - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
3. The message was sent over waku and persisted locally in Carol's device. She
should now see the link previews in the chat history. There can be many link
previews shared by other chat members, therefore it is important to serve the
assets via the media server to avoid overloading the ReactNative bridge with
lots of big JSON payloads containing base64 encoded data URIs (maybe this
concern is meaningless for desktop). When a client is rendering messages with
link previews, they will have the field linkPreviews, and the thumbnail URL
will point to the local media server.
```
#+begin_src plantuml :results verbatim
Client->>Server: GET /link-preview/thumbnail (media server)
Server->>DB: Read from user_messages.unfurled_links
Server->Server: Unmarshal JSON
Server-->>Client: HTTP Content-Type: image/jpeg/etc
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| GET /link-preview/thumbnail (media server)| |
| ------------------------------------------> |
| | |
| | Read from user_messages.unfurled_links|
| | -------------------------------------->
| | |
| |----. |
| | | Unmarshal JSON |
| |<---' |
| | |
| HTTP Content-Type: image/jpeg/etc | |
| <- - - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
### Some limitations of the current implementation
The following points will become separate issues in status-go that I'll work on
over the next couple weeks. In no order of importance:
- Improve how multiple links are fetched; retries on failure and testing how
unfurling behaves around the timeout limits (deterministically, not by making
real HTTP calls as I did). https://github.com/status-im/status-go/issues/3498
- Unfurl favicons and store them in the protobuf too.
- For this PR, I added unfurling support only for websites with OpenGraph
https://ogp.me/ meta tags. Other unfurlers will be implemented on demand. The
next one will probably be for oEmbed https://oembed.com/, the protocol
supported by YouTube, for example.
- Resize and/or compress thumbnails (and favicons). Often times, thumbnails are
huge for the purposes of link previews. There is already support for
compressing JPEGs in status-go, but I prefer to work with compression in a
separate PR because I'd like to also solve the problem for PNGs (probably
convert them to JPEGs, plus compress them). This would be a safe choice for
thumbnails, favicons not so much because transparency is desirable.
- Editing messages is not yet supported.
- I haven't coded any artificial limit on the number of previews or on the size
of the thumbnail payload. This will be done in a separate issue. I have heard
the ideal solution may be to split messages into smaller chunks of ~125 KiB
because of libp2p, but that might be too complicated at this stage of the
product (?).
- Link preview deletion.
- For the moment, OpenGraph metadata is extracted by requesting data for the
English language (and fallback to whatever is available). In the future, we'll
want to unfurl by respecting the user's local device language. Some websites,
like GoDaddy, are already localized based on the device's IP, but many aren't.
- The website's description text should be limited by a certain number of
characters, especially because it's outside our control. Exactly how much has
not been decided yet, so it'll be done separately.
- URL normalization can be tricky, so I implemented only the basics to help with
caching. For example, the url https://status.im and HTTPS://status.im are
considered identical. Also, a URL is considered valid for unfurling if its TLD
exists according to publicsuffix.EffectiveTLDPlusOne. This was essential,
otherwise the default Go url.Parse approach would consider many invalid URLs
valid, and thus the server would waste resources trying to unfurl the
unfurleable.
### Other requirements
- If the message is edited, the link previews should reflect the edited text,
not the original one. This has been aligned with the design team as well.
- If the website's thumbnail or the favicon can't be fetched, just ignore them.
The only mandatory piece of metadata is the website's title and URL.
- Link previews in clients should be generated in near real-time, that is, as
the user types, previews are updated. In mobile this performs very well, and
it's what other clients like WhatsApp, Telegram, and Facebook do.
### Decisions
- While the user typing in the input field, the client is constantly (debounced)
asking status-go to parse the text and extract normalized URLs and then the
client checks if they're already in its in-memory cache. If they are, no RPC
call is made. I chose this approach to achieve the best possible performance
in mobile and avoid the whole RPC overhead, since the chat experience is
already not smooth enough. The mobile client uses URLs as cache keys in a
hashmap, i.e. if the key is present, it means the preview is readily available
(naive, but good enough for now). This decision also gave me more flexibility
to find the best UX at this stage of the feature.
- Due to the requirement that users should be able to see independent loading
indicators for each link preview, when status-go can't unfurl a URL, it
doesn't return it in the response.
- As an initial implementation, I added the BLOB column unfurled_links to the
user_messages table. The preview data is then serialized as JSON before being
stored in this column. I felt that creating a separate table and the related
code for this initial PR would be inconvenient. Is that reasonable to you?
Once things stabilize I can create a proper table if we want to avoid this
kind of solution with serialized columns.
2023-05-18 15:43:06 -03:00
|
|
|
LinkPreviews: m.LinkPreviews,
|
2022-10-24 12:33:47 +01:00
|
|
|
MessageType: m.MessageType,
|
|
|
|
CommandParameters: m.CommandParameters,
|
|
|
|
GapParameters: m.GapParameters,
|
|
|
|
EditedAt: m.EditedAt,
|
|
|
|
Deleted: m.Deleted,
|
2023-02-01 08:57:35 +08:00
|
|
|
DeletedBy: m.DeletedBy,
|
2022-10-24 12:33:47 +01:00
|
|
|
DeletedForMe: m.DeletedForMe,
|
|
|
|
ContactRequestState: m.ContactRequestState,
|
2022-08-31 15:41:58 +01:00
|
|
|
ContactVerificationState: m.ContactVerificationState,
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
}
|
2023-01-20 18:51:36 +00:00
|
|
|
|
2020-01-15 08:25:09 +01:00
|
|
|
if sticker := m.GetSticker(); sticker != nil {
|
|
|
|
item.Sticker = &StickerAlias{
|
|
|
|
Pack: sticker.Pack,
|
|
|
|
Hash: sticker.Hash,
|
2022-05-09 09:07:57 -04:00
|
|
|
URL: m.StickerLocalURL,
|
2020-01-15 08:25:09 +01:00
|
|
|
}
|
|
|
|
}
|
2020-06-23 16:30:39 +02:00
|
|
|
|
|
|
|
if audio := m.GetAudio(); audio != nil {
|
|
|
|
item.AudioDurationMs = audio.DurationMs
|
|
|
|
}
|
2022-08-04 18:16:56 +02:00
|
|
|
|
2023-02-02 12:00:49 +04:00
|
|
|
if image := m.GetImage(); image != nil {
|
|
|
|
item.AlbumID = image.AlbumId
|
|
|
|
item.ImageWidth = image.Width
|
|
|
|
item.ImageHeight = image.Height
|
2023-04-05 15:24:55 +02:00
|
|
|
item.AlbumImagesCount = image.AlbumImagesCount
|
2023-02-02 12:00:49 +04:00
|
|
|
}
|
|
|
|
|
2022-08-04 18:16:56 +02:00
|
|
|
if discordMessage := m.GetDiscordMessage(); discordMessage != nil {
|
|
|
|
item.DiscordMessage = discordMessage
|
|
|
|
}
|
2023-03-06 10:51:09 +02:00
|
|
|
if item.From != "" {
|
|
|
|
ext, err := accountJson.ExtendStructWithPubKeyData(item.From, item)
|
|
|
|
if err != nil {
|
|
|
|
return nil, err
|
|
|
|
}
|
|
|
|
|
|
|
|
return json.Marshal(ext)
|
|
|
|
}
|
2020-06-23 16:30:39 +02:00
|
|
|
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
return json.Marshal(item)
|
|
|
|
}
|
|
|
|
|
|
|
|
func (m *Message) UnmarshalJSON(data []byte) error {
|
|
|
|
type Alias Message
|
|
|
|
aux := struct {
|
|
|
|
*Alias
|
2023-05-08 18:02:54 +01:00
|
|
|
ResponseTo string `json:"responseTo"`
|
|
|
|
EnsName string `json:"ensName"`
|
|
|
|
DisplayName string `json:"displayName"`
|
|
|
|
ChatID string `json:"chatId"`
|
|
|
|
Sticker *protobuf.StickerMessage `json:"sticker"`
|
|
|
|
AudioDurationMs uint64 `json:"audioDurationMs"`
|
|
|
|
ParsedText json.RawMessage `json:"parsedText"`
|
|
|
|
ContentType protobuf.ChatMessage_ContentType `json:"contentType"`
|
|
|
|
AlbumID string `json:"albumId"`
|
|
|
|
ImageWidth uint32 `json:"imageWidth"`
|
|
|
|
ImageHeight uint32 `json:"imageHeight"`
|
|
|
|
AlbumImagesCount uint32 `json:"albumImagesCount"`
|
|
|
|
From string `json:"from"`
|
|
|
|
Deleted bool `json:"deleted,omitempty"`
|
|
|
|
DeletedForMe bool `json:"deletedForMe,omitempty"`
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
}{
|
|
|
|
Alias: (*Alias)(m),
|
|
|
|
}
|
|
|
|
if err := json.Unmarshal(data, &aux); err != nil {
|
|
|
|
return err
|
|
|
|
}
|
2020-07-20 22:36:03 +01:00
|
|
|
if aux.ContentType == protobuf.ChatMessage_STICKER {
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
m.Payload = &protobuf.ChatMessage_Sticker{Sticker: aux.Sticker}
|
|
|
|
}
|
2020-06-23 16:30:39 +02:00
|
|
|
if aux.ContentType == protobuf.ChatMessage_AUDIO {
|
|
|
|
m.Payload = &protobuf.ChatMessage_Audio{
|
|
|
|
Audio: &protobuf.AudioMessage{DurationMs: aux.AudioDurationMs},
|
|
|
|
}
|
|
|
|
}
|
2023-02-02 12:00:49 +04:00
|
|
|
|
|
|
|
if aux.ContentType == protobuf.ChatMessage_IMAGE {
|
|
|
|
m.Payload = &protobuf.ChatMessage_Image{
|
|
|
|
Image: &protobuf.ImageMessage{
|
2023-05-08 18:02:54 +01:00
|
|
|
AlbumId: aux.AlbumID,
|
|
|
|
Width: aux.ImageWidth,
|
|
|
|
Height: aux.ImageHeight,
|
|
|
|
AlbumImagesCount: aux.AlbumImagesCount},
|
2023-02-02 12:00:49 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
m.ResponseTo = aux.ResponseTo
|
|
|
|
m.EnsName = aux.EnsName
|
2022-02-17 11:13:10 -04:00
|
|
|
m.DisplayName = aux.DisplayName
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
m.ChatId = aux.ChatID
|
|
|
|
m.ContentType = aux.ContentType
|
2020-07-30 22:54:33 +02:00
|
|
|
m.ParsedText = aux.ParsedText
|
2023-05-08 18:02:54 +01:00
|
|
|
m.From = aux.From
|
|
|
|
m.Deleted = aux.Deleted
|
|
|
|
m.DeletedForMe = aux.DeletedForMe
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
return nil
|
|
|
|
}
|
|
|
|
|
|
|
|
// Check if the first character is Hebrew or Arabic or the RTL character
|
|
|
|
func isRTL(s string) bool {
|
|
|
|
first, _ := utf8.DecodeRuneInString(s)
|
|
|
|
return unicode.Is(unicode.Hebrew, first) ||
|
|
|
|
unicode.Is(unicode.Arabic, first) ||
|
|
|
|
// RTL character
|
|
|
|
first == '\u200f'
|
|
|
|
}
|
|
|
|
|
2020-05-13 15:16:17 +02:00
|
|
|
// parseImage check the message contains an image, and if so
|
|
|
|
// it creates the a base64 encoded version of it.
|
|
|
|
func (m *Message) parseImage() error {
|
|
|
|
if m.ContentType != protobuf.ChatMessage_IMAGE {
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
image := m.GetImage()
|
|
|
|
if image == nil {
|
|
|
|
return errors.New("image empty")
|
|
|
|
}
|
|
|
|
|
|
|
|
payload := image.Payload
|
|
|
|
|
|
|
|
e64 := base64.StdEncoding
|
|
|
|
|
|
|
|
maxEncLen := e64.EncodedLen(len(payload))
|
|
|
|
encBuf := make([]byte, maxEncLen)
|
|
|
|
|
|
|
|
e64.Encode(encBuf, payload)
|
|
|
|
|
2020-10-26 15:16:41 +00:00
|
|
|
mime, err := images.GetMimeType(image.Payload)
|
2020-05-14 07:40:40 +02:00
|
|
|
|
|
|
|
if err != nil {
|
|
|
|
return err
|
2020-05-13 15:16:17 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
m.Base64Image = fmt.Sprintf("data:image/%s;base64,%s", mime, encBuf)
|
|
|
|
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
|
2020-06-17 20:55:49 +02:00
|
|
|
// parseAudio check the message contains an audio, and if so
|
2020-07-16 08:21:02 +02:00
|
|
|
// it creates a base64 encoded version of it.
|
2020-06-17 20:55:49 +02:00
|
|
|
func (m *Message) parseAudio() error {
|
|
|
|
if m.ContentType != protobuf.ChatMessage_AUDIO {
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
audio := m.GetAudio()
|
|
|
|
if audio == nil {
|
|
|
|
return errors.New("audio empty")
|
|
|
|
}
|
|
|
|
|
|
|
|
payload := audio.Payload
|
|
|
|
|
|
|
|
e64 := base64.StdEncoding
|
|
|
|
|
|
|
|
maxEncLen := e64.EncodedLen(len(payload))
|
|
|
|
encBuf := make([]byte, maxEncLen)
|
|
|
|
|
|
|
|
e64.Encode(encBuf, payload)
|
|
|
|
|
|
|
|
mime, err := getAudioMessageMIME(audio)
|
|
|
|
|
|
|
|
if err != nil {
|
|
|
|
return err
|
|
|
|
}
|
|
|
|
|
|
|
|
m.Base64Audio = fmt.Sprintf("data:audio/%s;base64,%s", mime, encBuf)
|
|
|
|
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
|
2020-09-07 10:25:57 +02:00
|
|
|
// implement interface of https://github.com/status-im/markdown/blob/b9fe921681227b1dace4b56364e15edb3b698308/ast/node.go#L701
|
2021-03-31 18:23:45 +02:00
|
|
|
type SimplifiedTextVisitor struct {
|
|
|
|
text string
|
|
|
|
canonicalNames map[string]string
|
|
|
|
}
|
|
|
|
|
|
|
|
func (v *SimplifiedTextVisitor) Visit(node ast.Node, entering bool) ast.WalkStatus {
|
|
|
|
// only on entering we fetch, otherwise we go on
|
|
|
|
if !entering {
|
|
|
|
return ast.GoToNext
|
|
|
|
}
|
|
|
|
|
|
|
|
switch n := node.(type) {
|
|
|
|
case *ast.Mention:
|
|
|
|
literal := string(n.Literal)
|
|
|
|
canonicalName, ok := v.canonicalNames[literal]
|
|
|
|
if ok {
|
|
|
|
v.text += canonicalName
|
|
|
|
} else {
|
|
|
|
v.text += literal
|
|
|
|
}
|
|
|
|
case *ast.Link:
|
|
|
|
destination := string(n.Destination)
|
|
|
|
v.text += destination
|
|
|
|
default:
|
|
|
|
var literal string
|
|
|
|
|
|
|
|
leaf := node.AsLeaf()
|
|
|
|
container := node.AsContainer()
|
|
|
|
if leaf != nil {
|
|
|
|
literal = string(leaf.Literal)
|
|
|
|
} else if container != nil {
|
|
|
|
literal = string(container.Literal)
|
|
|
|
}
|
|
|
|
v.text += literal
|
|
|
|
}
|
|
|
|
|
|
|
|
return ast.GoToNext
|
|
|
|
}
|
|
|
|
|
|
|
|
// implement interface of https://github.com/status-im/markdown/blob/b9fe921681227b1dace4b56364e15edb3b698308/ast/node.go#L701
|
|
|
|
type MentionsAndLinksVisitor struct {
|
2021-05-25 11:40:02 +02:00
|
|
|
identity string
|
|
|
|
mentioned bool
|
|
|
|
mentions []string
|
|
|
|
links []string
|
2020-09-01 12:34:28 +02:00
|
|
|
}
|
|
|
|
|
URL unfurling (initial implementation) (#3471)
This is the initial implementation for the new URL unfurling requirements. The
most important one is that only the message sender will pay the privacy cost for
unfurling and extracting metadata from websites. Once the message is sent, the
unfurled data will be stored at the protocol level and receivers will just
profit and happily decode the metadata to render it.
Further development of this URL unfurling capability will be mostly guided by
issues created on clients. For the moment in status-mobile:
https://github.com/status-im/status-mobile/labels/url-preview
- https://github.com/status-im/status-mobile/issues/15918
- https://github.com/status-im/status-mobile/issues/15917
- https://github.com/status-im/status-mobile/issues/15910
- https://github.com/status-im/status-mobile/issues/15909
- https://github.com/status-im/status-mobile/issues/15908
- https://github.com/status-im/status-mobile/issues/15906
- https://github.com/status-im/status-mobile/issues/15905
### Terminology
In the code, I've tried to stick to the word "unfurl URL" to really mean the
process of extracting metadata from a website, sort of lower level. I use "link
preview" to mean a higher level structure which is enriched by unfurled data.
"link preview" is also how designers refer to it.
### User flows
1. Carol needs to see link previews while typing in the chat input field. Notice
from the diagram nothing is persisted and that status-go endpoints are
essentially stateless.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_getTextURLs
Server-->>Client: Normalized URLs
Client->>Client: Render cached unfurled URLs
Client->>Server: Unfurl non-cached URLs.\nCall wakuext_unfurlURLs
Server->>Website: Fetch metadata
Website-->>Server: Metadata (thumbnail URL, title, etc)
Server->>Website: Fetch thumbnail
Server->>Website: Fetch favicon
Website-->>Server: Favicon bytes
Website-->>Server: Thumbnail bytes
Server->>Server: Decode & process images
Server-->>Client: Unfurled data (thumbnail data URI, etc)
#+end_src
```
```
,------. ,------. ,-------.
|Client| |Server| |Website|
`--+---' `--+---' `---+---'
| Call wakuext_getTextURLs | |
| ---------------------------------------> |
| | |
| Normalized URLs | |
| <- - - - - - - - - - - - - - - - - - - - |
| | |
|----. | |
| | Render cached unfurled URLs | |
|<---' | |
| | |
| Unfurl non-cached URLs. | |
| Call wakuext_unfurlURLs | |
| ---------------------------------------> |
| | |
| | Fetch metadata |
| | ------------------------------------>
| | |
| | Metadata (thumbnail URL, title, etc)|
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Fetch thumbnail |
| | ------------------------------------>
| | |
| | Fetch favicon |
| | ------------------------------------>
| | |
| | Favicon bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Thumbnail bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| |----. |
| | | Decode & process images |
| |<---' |
| | |
| Unfurled data (thumbnail data URI, etc)| |
| <- - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,---+---.
|Client| |Server| |Website|
`------' `------' `-------'
```
2. Carol sends the text message with link previews in the RPC request
wakuext_sendChatMessages. status-go assumes the link previews are good
because it can't and shouldn't attempt to re-unfurl them.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_sendChatMessages
Server->>Server: Transform link previews to\nbe proto-marshalled
Server->DB: Write link previews serialized as JSON
Server-->>Client: Updated message response
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| Call wakuext_sendChatMessages| |
| -----------------------------> |
| | |
| |----. |
| | | Transform link previews to |
| |<---' be proto-marshalled |
| | |
| | |
| | Write link previews serialized as JSON|
| | -------------------------------------->
| | |
| Updated message response | |
| <- - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
3. The message was sent over waku and persisted locally in Carol's device. She
should now see the link previews in the chat history. There can be many link
previews shared by other chat members, therefore it is important to serve the
assets via the media server to avoid overloading the ReactNative bridge with
lots of big JSON payloads containing base64 encoded data URIs (maybe this
concern is meaningless for desktop). When a client is rendering messages with
link previews, they will have the field linkPreviews, and the thumbnail URL
will point to the local media server.
```
#+begin_src plantuml :results verbatim
Client->>Server: GET /link-preview/thumbnail (media server)
Server->>DB: Read from user_messages.unfurled_links
Server->Server: Unmarshal JSON
Server-->>Client: HTTP Content-Type: image/jpeg/etc
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| GET /link-preview/thumbnail (media server)| |
| ------------------------------------------> |
| | |
| | Read from user_messages.unfurled_links|
| | -------------------------------------->
| | |
| |----. |
| | | Unmarshal JSON |
| |<---' |
| | |
| HTTP Content-Type: image/jpeg/etc | |
| <- - - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
### Some limitations of the current implementation
The following points will become separate issues in status-go that I'll work on
over the next couple weeks. In no order of importance:
- Improve how multiple links are fetched; retries on failure and testing how
unfurling behaves around the timeout limits (deterministically, not by making
real HTTP calls as I did). https://github.com/status-im/status-go/issues/3498
- Unfurl favicons and store them in the protobuf too.
- For this PR, I added unfurling support only for websites with OpenGraph
https://ogp.me/ meta tags. Other unfurlers will be implemented on demand. The
next one will probably be for oEmbed https://oembed.com/, the protocol
supported by YouTube, for example.
- Resize and/or compress thumbnails (and favicons). Often times, thumbnails are
huge for the purposes of link previews. There is already support for
compressing JPEGs in status-go, but I prefer to work with compression in a
separate PR because I'd like to also solve the problem for PNGs (probably
convert them to JPEGs, plus compress them). This would be a safe choice for
thumbnails, favicons not so much because transparency is desirable.
- Editing messages is not yet supported.
- I haven't coded any artificial limit on the number of previews or on the size
of the thumbnail payload. This will be done in a separate issue. I have heard
the ideal solution may be to split messages into smaller chunks of ~125 KiB
because of libp2p, but that might be too complicated at this stage of the
product (?).
- Link preview deletion.
- For the moment, OpenGraph metadata is extracted by requesting data for the
English language (and fallback to whatever is available). In the future, we'll
want to unfurl by respecting the user's local device language. Some websites,
like GoDaddy, are already localized based on the device's IP, but many aren't.
- The website's description text should be limited by a certain number of
characters, especially because it's outside our control. Exactly how much has
not been decided yet, so it'll be done separately.
- URL normalization can be tricky, so I implemented only the basics to help with
caching. For example, the url https://status.im and HTTPS://status.im are
considered identical. Also, a URL is considered valid for unfurling if its TLD
exists according to publicsuffix.EffectiveTLDPlusOne. This was essential,
otherwise the default Go url.Parse approach would consider many invalid URLs
valid, and thus the server would waste resources trying to unfurl the
unfurleable.
### Other requirements
- If the message is edited, the link previews should reflect the edited text,
not the original one. This has been aligned with the design team as well.
- If the website's thumbnail or the favicon can't be fetched, just ignore them.
The only mandatory piece of metadata is the website's title and URL.
- Link previews in clients should be generated in near real-time, that is, as
the user types, previews are updated. In mobile this performs very well, and
it's what other clients like WhatsApp, Telegram, and Facebook do.
### Decisions
- While the user typing in the input field, the client is constantly (debounced)
asking status-go to parse the text and extract normalized URLs and then the
client checks if they're already in its in-memory cache. If they are, no RPC
call is made. I chose this approach to achieve the best possible performance
in mobile and avoid the whole RPC overhead, since the chat experience is
already not smooth enough. The mobile client uses URLs as cache keys in a
hashmap, i.e. if the key is present, it means the preview is readily available
(naive, but good enough for now). This decision also gave me more flexibility
to find the best UX at this stage of the feature.
- Due to the requirement that users should be able to see independent loading
indicators for each link preview, when status-go can't unfurl a URL, it
doesn't return it in the response.
- As an initial implementation, I added the BLOB column unfurled_links to the
user_messages table. The preview data is then serialized as JSON before being
stored in this column. I felt that creating a separate table and the related
code for this initial PR would be inconvenient. Is that reasonable to you?
Once things stabilize I can create a proper table if we want to avoid this
kind of solution with serialized columns.
2023-05-18 15:43:06 -03:00
|
|
|
type LinksVisitor struct {
|
|
|
|
Links []string
|
|
|
|
}
|
|
|
|
|
2021-03-31 18:23:45 +02:00
|
|
|
func (v *MentionsAndLinksVisitor) Visit(node ast.Node, entering bool) ast.WalkStatus {
|
2020-09-01 12:34:28 +02:00
|
|
|
// only on entering we fetch, otherwise we go on
|
|
|
|
if !entering {
|
|
|
|
return ast.GoToNext
|
|
|
|
}
|
|
|
|
switch n := node.(type) {
|
|
|
|
case *ast.Mention:
|
2021-05-25 11:40:02 +02:00
|
|
|
mention := string(n.Literal)
|
2023-02-21 16:20:10 +00:00
|
|
|
if mention == v.identity || mention == EveryoneMentionTag {
|
2021-05-25 11:40:02 +02:00
|
|
|
v.mentioned = true
|
|
|
|
}
|
|
|
|
v.mentions = append(v.mentions, mention)
|
2020-10-27 19:35:28 +02:00
|
|
|
case *ast.Link:
|
|
|
|
v.links = append(v.links, string(n.Destination))
|
2020-09-01 12:34:28 +02:00
|
|
|
}
|
2020-10-27 19:35:28 +02:00
|
|
|
|
2020-09-01 12:34:28 +02:00
|
|
|
return ast.GoToNext
|
|
|
|
}
|
|
|
|
|
URL unfurling (initial implementation) (#3471)
This is the initial implementation for the new URL unfurling requirements. The
most important one is that only the message sender will pay the privacy cost for
unfurling and extracting metadata from websites. Once the message is sent, the
unfurled data will be stored at the protocol level and receivers will just
profit and happily decode the metadata to render it.
Further development of this URL unfurling capability will be mostly guided by
issues created on clients. For the moment in status-mobile:
https://github.com/status-im/status-mobile/labels/url-preview
- https://github.com/status-im/status-mobile/issues/15918
- https://github.com/status-im/status-mobile/issues/15917
- https://github.com/status-im/status-mobile/issues/15910
- https://github.com/status-im/status-mobile/issues/15909
- https://github.com/status-im/status-mobile/issues/15908
- https://github.com/status-im/status-mobile/issues/15906
- https://github.com/status-im/status-mobile/issues/15905
### Terminology
In the code, I've tried to stick to the word "unfurl URL" to really mean the
process of extracting metadata from a website, sort of lower level. I use "link
preview" to mean a higher level structure which is enriched by unfurled data.
"link preview" is also how designers refer to it.
### User flows
1. Carol needs to see link previews while typing in the chat input field. Notice
from the diagram nothing is persisted and that status-go endpoints are
essentially stateless.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_getTextURLs
Server-->>Client: Normalized URLs
Client->>Client: Render cached unfurled URLs
Client->>Server: Unfurl non-cached URLs.\nCall wakuext_unfurlURLs
Server->>Website: Fetch metadata
Website-->>Server: Metadata (thumbnail URL, title, etc)
Server->>Website: Fetch thumbnail
Server->>Website: Fetch favicon
Website-->>Server: Favicon bytes
Website-->>Server: Thumbnail bytes
Server->>Server: Decode & process images
Server-->>Client: Unfurled data (thumbnail data URI, etc)
#+end_src
```
```
,------. ,------. ,-------.
|Client| |Server| |Website|
`--+---' `--+---' `---+---'
| Call wakuext_getTextURLs | |
| ---------------------------------------> |
| | |
| Normalized URLs | |
| <- - - - - - - - - - - - - - - - - - - - |
| | |
|----. | |
| | Render cached unfurled URLs | |
|<---' | |
| | |
| Unfurl non-cached URLs. | |
| Call wakuext_unfurlURLs | |
| ---------------------------------------> |
| | |
| | Fetch metadata |
| | ------------------------------------>
| | |
| | Metadata (thumbnail URL, title, etc)|
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Fetch thumbnail |
| | ------------------------------------>
| | |
| | Fetch favicon |
| | ------------------------------------>
| | |
| | Favicon bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Thumbnail bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| |----. |
| | | Decode & process images |
| |<---' |
| | |
| Unfurled data (thumbnail data URI, etc)| |
| <- - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,---+---.
|Client| |Server| |Website|
`------' `------' `-------'
```
2. Carol sends the text message with link previews in the RPC request
wakuext_sendChatMessages. status-go assumes the link previews are good
because it can't and shouldn't attempt to re-unfurl them.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_sendChatMessages
Server->>Server: Transform link previews to\nbe proto-marshalled
Server->DB: Write link previews serialized as JSON
Server-->>Client: Updated message response
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| Call wakuext_sendChatMessages| |
| -----------------------------> |
| | |
| |----. |
| | | Transform link previews to |
| |<---' be proto-marshalled |
| | |
| | |
| | Write link previews serialized as JSON|
| | -------------------------------------->
| | |
| Updated message response | |
| <- - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
3. The message was sent over waku and persisted locally in Carol's device. She
should now see the link previews in the chat history. There can be many link
previews shared by other chat members, therefore it is important to serve the
assets via the media server to avoid overloading the ReactNative bridge with
lots of big JSON payloads containing base64 encoded data URIs (maybe this
concern is meaningless for desktop). When a client is rendering messages with
link previews, they will have the field linkPreviews, and the thumbnail URL
will point to the local media server.
```
#+begin_src plantuml :results verbatim
Client->>Server: GET /link-preview/thumbnail (media server)
Server->>DB: Read from user_messages.unfurled_links
Server->Server: Unmarshal JSON
Server-->>Client: HTTP Content-Type: image/jpeg/etc
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| GET /link-preview/thumbnail (media server)| |
| ------------------------------------------> |
| | |
| | Read from user_messages.unfurled_links|
| | -------------------------------------->
| | |
| |----. |
| | | Unmarshal JSON |
| |<---' |
| | |
| HTTP Content-Type: image/jpeg/etc | |
| <- - - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
### Some limitations of the current implementation
The following points will become separate issues in status-go that I'll work on
over the next couple weeks. In no order of importance:
- Improve how multiple links are fetched; retries on failure and testing how
unfurling behaves around the timeout limits (deterministically, not by making
real HTTP calls as I did). https://github.com/status-im/status-go/issues/3498
- Unfurl favicons and store them in the protobuf too.
- For this PR, I added unfurling support only for websites with OpenGraph
https://ogp.me/ meta tags. Other unfurlers will be implemented on demand. The
next one will probably be for oEmbed https://oembed.com/, the protocol
supported by YouTube, for example.
- Resize and/or compress thumbnails (and favicons). Often times, thumbnails are
huge for the purposes of link previews. There is already support for
compressing JPEGs in status-go, but I prefer to work with compression in a
separate PR because I'd like to also solve the problem for PNGs (probably
convert them to JPEGs, plus compress them). This would be a safe choice for
thumbnails, favicons not so much because transparency is desirable.
- Editing messages is not yet supported.
- I haven't coded any artificial limit on the number of previews or on the size
of the thumbnail payload. This will be done in a separate issue. I have heard
the ideal solution may be to split messages into smaller chunks of ~125 KiB
because of libp2p, but that might be too complicated at this stage of the
product (?).
- Link preview deletion.
- For the moment, OpenGraph metadata is extracted by requesting data for the
English language (and fallback to whatever is available). In the future, we'll
want to unfurl by respecting the user's local device language. Some websites,
like GoDaddy, are already localized based on the device's IP, but many aren't.
- The website's description text should be limited by a certain number of
characters, especially because it's outside our control. Exactly how much has
not been decided yet, so it'll be done separately.
- URL normalization can be tricky, so I implemented only the basics to help with
caching. For example, the url https://status.im and HTTPS://status.im are
considered identical. Also, a URL is considered valid for unfurling if its TLD
exists according to publicsuffix.EffectiveTLDPlusOne. This was essential,
otherwise the default Go url.Parse approach would consider many invalid URLs
valid, and thus the server would waste resources trying to unfurl the
unfurleable.
### Other requirements
- If the message is edited, the link previews should reflect the edited text,
not the original one. This has been aligned with the design team as well.
- If the website's thumbnail or the favicon can't be fetched, just ignore them.
The only mandatory piece of metadata is the website's title and URL.
- Link previews in clients should be generated in near real-time, that is, as
the user types, previews are updated. In mobile this performs very well, and
it's what other clients like WhatsApp, Telegram, and Facebook do.
### Decisions
- While the user typing in the input field, the client is constantly (debounced)
asking status-go to parse the text and extract normalized URLs and then the
client checks if they're already in its in-memory cache. If they are, no RPC
call is made. I chose this approach to achieve the best possible performance
in mobile and avoid the whole RPC overhead, since the chat experience is
already not smooth enough. The mobile client uses URLs as cache keys in a
hashmap, i.e. if the key is present, it means the preview is readily available
(naive, but good enough for now). This decision also gave me more flexibility
to find the best UX at this stage of the feature.
- Due to the requirement that users should be able to see independent loading
indicators for each link preview, when status-go can't unfurl a URL, it
doesn't return it in the response.
- As an initial implementation, I added the BLOB column unfurled_links to the
user_messages table. The preview data is then serialized as JSON before being
stored in this column. I felt that creating a separate table and the related
code for this initial PR would be inconvenient. Is that reasonable to you?
Once things stabilize I can create a proper table if we want to avoid this
kind of solution with serialized columns.
2023-05-18 15:43:06 -03:00
|
|
|
func (v *LinksVisitor) Visit(node ast.Node, entering bool) ast.WalkStatus {
|
|
|
|
if !entering {
|
|
|
|
return ast.GoToNext
|
|
|
|
}
|
|
|
|
|
|
|
|
switch n := node.(type) {
|
|
|
|
case *ast.Link:
|
|
|
|
v.Links = append(v.Links, string(n.Destination))
|
|
|
|
}
|
|
|
|
|
|
|
|
return ast.GoToNext
|
|
|
|
}
|
|
|
|
|
2021-05-25 11:40:02 +02:00
|
|
|
func runMentionsAndLinksVisitor(parsedText ast.Node, identity string) *MentionsAndLinksVisitor {
|
|
|
|
visitor := &MentionsAndLinksVisitor{identity: identity}
|
2020-09-01 12:34:28 +02:00
|
|
|
ast.Walk(parsedText, visitor)
|
2021-05-25 11:40:02 +02:00
|
|
|
return visitor
|
2020-09-01 12:34:28 +02:00
|
|
|
}
|
|
|
|
|
URL unfurling (initial implementation) (#3471)
This is the initial implementation for the new URL unfurling requirements. The
most important one is that only the message sender will pay the privacy cost for
unfurling and extracting metadata from websites. Once the message is sent, the
unfurled data will be stored at the protocol level and receivers will just
profit and happily decode the metadata to render it.
Further development of this URL unfurling capability will be mostly guided by
issues created on clients. For the moment in status-mobile:
https://github.com/status-im/status-mobile/labels/url-preview
- https://github.com/status-im/status-mobile/issues/15918
- https://github.com/status-im/status-mobile/issues/15917
- https://github.com/status-im/status-mobile/issues/15910
- https://github.com/status-im/status-mobile/issues/15909
- https://github.com/status-im/status-mobile/issues/15908
- https://github.com/status-im/status-mobile/issues/15906
- https://github.com/status-im/status-mobile/issues/15905
### Terminology
In the code, I've tried to stick to the word "unfurl URL" to really mean the
process of extracting metadata from a website, sort of lower level. I use "link
preview" to mean a higher level structure which is enriched by unfurled data.
"link preview" is also how designers refer to it.
### User flows
1. Carol needs to see link previews while typing in the chat input field. Notice
from the diagram nothing is persisted and that status-go endpoints are
essentially stateless.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_getTextURLs
Server-->>Client: Normalized URLs
Client->>Client: Render cached unfurled URLs
Client->>Server: Unfurl non-cached URLs.\nCall wakuext_unfurlURLs
Server->>Website: Fetch metadata
Website-->>Server: Metadata (thumbnail URL, title, etc)
Server->>Website: Fetch thumbnail
Server->>Website: Fetch favicon
Website-->>Server: Favicon bytes
Website-->>Server: Thumbnail bytes
Server->>Server: Decode & process images
Server-->>Client: Unfurled data (thumbnail data URI, etc)
#+end_src
```
```
,------. ,------. ,-------.
|Client| |Server| |Website|
`--+---' `--+---' `---+---'
| Call wakuext_getTextURLs | |
| ---------------------------------------> |
| | |
| Normalized URLs | |
| <- - - - - - - - - - - - - - - - - - - - |
| | |
|----. | |
| | Render cached unfurled URLs | |
|<---' | |
| | |
| Unfurl non-cached URLs. | |
| Call wakuext_unfurlURLs | |
| ---------------------------------------> |
| | |
| | Fetch metadata |
| | ------------------------------------>
| | |
| | Metadata (thumbnail URL, title, etc)|
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Fetch thumbnail |
| | ------------------------------------>
| | |
| | Fetch favicon |
| | ------------------------------------>
| | |
| | Favicon bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Thumbnail bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| |----. |
| | | Decode & process images |
| |<---' |
| | |
| Unfurled data (thumbnail data URI, etc)| |
| <- - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,---+---.
|Client| |Server| |Website|
`------' `------' `-------'
```
2. Carol sends the text message with link previews in the RPC request
wakuext_sendChatMessages. status-go assumes the link previews are good
because it can't and shouldn't attempt to re-unfurl them.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_sendChatMessages
Server->>Server: Transform link previews to\nbe proto-marshalled
Server->DB: Write link previews serialized as JSON
Server-->>Client: Updated message response
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| Call wakuext_sendChatMessages| |
| -----------------------------> |
| | |
| |----. |
| | | Transform link previews to |
| |<---' be proto-marshalled |
| | |
| | |
| | Write link previews serialized as JSON|
| | -------------------------------------->
| | |
| Updated message response | |
| <- - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
3. The message was sent over waku and persisted locally in Carol's device. She
should now see the link previews in the chat history. There can be many link
previews shared by other chat members, therefore it is important to serve the
assets via the media server to avoid overloading the ReactNative bridge with
lots of big JSON payloads containing base64 encoded data URIs (maybe this
concern is meaningless for desktop). When a client is rendering messages with
link previews, they will have the field linkPreviews, and the thumbnail URL
will point to the local media server.
```
#+begin_src plantuml :results verbatim
Client->>Server: GET /link-preview/thumbnail (media server)
Server->>DB: Read from user_messages.unfurled_links
Server->Server: Unmarshal JSON
Server-->>Client: HTTP Content-Type: image/jpeg/etc
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| GET /link-preview/thumbnail (media server)| |
| ------------------------------------------> |
| | |
| | Read from user_messages.unfurled_links|
| | -------------------------------------->
| | |
| |----. |
| | | Unmarshal JSON |
| |<---' |
| | |
| HTTP Content-Type: image/jpeg/etc | |
| <- - - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
### Some limitations of the current implementation
The following points will become separate issues in status-go that I'll work on
over the next couple weeks. In no order of importance:
- Improve how multiple links are fetched; retries on failure and testing how
unfurling behaves around the timeout limits (deterministically, not by making
real HTTP calls as I did). https://github.com/status-im/status-go/issues/3498
- Unfurl favicons and store them in the protobuf too.
- For this PR, I added unfurling support only for websites with OpenGraph
https://ogp.me/ meta tags. Other unfurlers will be implemented on demand. The
next one will probably be for oEmbed https://oembed.com/, the protocol
supported by YouTube, for example.
- Resize and/or compress thumbnails (and favicons). Often times, thumbnails are
huge for the purposes of link previews. There is already support for
compressing JPEGs in status-go, but I prefer to work with compression in a
separate PR because I'd like to also solve the problem for PNGs (probably
convert them to JPEGs, plus compress them). This would be a safe choice for
thumbnails, favicons not so much because transparency is desirable.
- Editing messages is not yet supported.
- I haven't coded any artificial limit on the number of previews or on the size
of the thumbnail payload. This will be done in a separate issue. I have heard
the ideal solution may be to split messages into smaller chunks of ~125 KiB
because of libp2p, but that might be too complicated at this stage of the
product (?).
- Link preview deletion.
- For the moment, OpenGraph metadata is extracted by requesting data for the
English language (and fallback to whatever is available). In the future, we'll
want to unfurl by respecting the user's local device language. Some websites,
like GoDaddy, are already localized based on the device's IP, but many aren't.
- The website's description text should be limited by a certain number of
characters, especially because it's outside our control. Exactly how much has
not been decided yet, so it'll be done separately.
- URL normalization can be tricky, so I implemented only the basics to help with
caching. For example, the url https://status.im and HTTPS://status.im are
considered identical. Also, a URL is considered valid for unfurling if its TLD
exists according to publicsuffix.EffectiveTLDPlusOne. This was essential,
otherwise the default Go url.Parse approach would consider many invalid URLs
valid, and thus the server would waste resources trying to unfurl the
unfurleable.
### Other requirements
- If the message is edited, the link previews should reflect the edited text,
not the original one. This has been aligned with the design team as well.
- If the website's thumbnail or the favicon can't be fetched, just ignore them.
The only mandatory piece of metadata is the website's title and URL.
- Link previews in clients should be generated in near real-time, that is, as
the user types, previews are updated. In mobile this performs very well, and
it's what other clients like WhatsApp, Telegram, and Facebook do.
### Decisions
- While the user typing in the input field, the client is constantly (debounced)
asking status-go to parse the text and extract normalized URLs and then the
client checks if they're already in its in-memory cache. If they are, no RPC
call is made. I chose this approach to achieve the best possible performance
in mobile and avoid the whole RPC overhead, since the chat experience is
already not smooth enough. The mobile client uses URLs as cache keys in a
hashmap, i.e. if the key is present, it means the preview is readily available
(naive, but good enough for now). This decision also gave me more flexibility
to find the best UX at this stage of the feature.
- Due to the requirement that users should be able to see independent loading
indicators for each link preview, when status-go can't unfurl a URL, it
doesn't return it in the response.
- As an initial implementation, I added the BLOB column unfurled_links to the
user_messages table. The preview data is then serialized as JSON before being
stored in this column. I felt that creating a separate table and the related
code for this initial PR would be inconvenient. Is that reasonable to you?
Once things stabilize I can create a proper table if we want to avoid this
kind of solution with serialized columns.
2023-05-18 15:43:06 -03:00
|
|
|
func RunLinksVisitor(parsedText ast.Node) *LinksVisitor {
|
|
|
|
visitor := &LinksVisitor{}
|
|
|
|
ast.Walk(parsedText, visitor)
|
|
|
|
return visitor
|
|
|
|
}
|
|
|
|
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
// PrepareContent return the parsed content of the message, the line-count and whether
|
|
|
|
// is a right-to-left message
|
2021-05-25 11:40:02 +02:00
|
|
|
func (m *Message) PrepareContent(identity string) error {
|
2022-10-28 15:35:15 +02:00
|
|
|
var parsedText ast.Node
|
|
|
|
switch m.ContentType {
|
|
|
|
case protobuf.ChatMessage_DISCORD_MESSAGE:
|
|
|
|
parsedText = markdown.Parse([]byte(m.GetDiscordMessage().Content), nil)
|
|
|
|
default:
|
|
|
|
parsedText = markdown.Parse([]byte(m.Text), nil)
|
|
|
|
}
|
|
|
|
|
2021-05-25 11:40:02 +02:00
|
|
|
visitor := runMentionsAndLinksVisitor(parsedText, identity)
|
|
|
|
m.Mentions = visitor.mentions
|
|
|
|
m.Links = visitor.links
|
|
|
|
// Leave it set if already set, as sometimes we might run this without
|
|
|
|
// an identity
|
2023-04-27 10:22:26 -04:00
|
|
|
if !m.Mentioned || identity != "" {
|
2021-05-25 11:40:02 +02:00
|
|
|
m.Mentioned = visitor.mentioned
|
|
|
|
}
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
jsonParsedText, err := json.Marshal(parsedText)
|
|
|
|
if err != nil {
|
|
|
|
return err
|
|
|
|
}
|
2021-03-31 18:23:45 +02:00
|
|
|
m.ParsedTextAst = &parsedText
|
Move to protobuf for Message type (#1706)
* Use a single Message type `v1/message.go` and `message.go` are the same now, and they embed `protobuf.ChatMessage`
* Use `SendChatMessage` for sending chat messages, this is basically the old `Send` but a bit more flexible so we can send different message types (stickers,commands), and not just text.
* Remove dedup from services/shhext. Because now we process in status-protocol, dedup makes less sense, as those messages are going to be processed anyway, so removing for now, we can re-evaluate if bringing it to status-go or not.
* Change the various retrieveX method to a single one:
`RetrieveAll` will be processing those messages that it can process (Currently only `Message`), and return the rest in `RawMessages` (still transit). The format for the response is:
`Chats`: -> The chats updated by receiving the message
`Messages`: -> The messages retrieved (already matched to a chat)
`Contacts`: -> The contacts updated by the messages
`RawMessages` -> Anything else that can't be parsed, eventually as we move everything to status-protocol-go this will go away.
2019-12-05 17:25:34 +01:00
|
|
|
m.ParsedText = jsonParsedText
|
|
|
|
m.LineCount = strings.Count(m.Text, "\n")
|
|
|
|
m.RTL = isRTL(m.Text)
|
2020-06-17 20:55:49 +02:00
|
|
|
if err := m.parseImage(); err != nil {
|
|
|
|
return err
|
|
|
|
}
|
|
|
|
return m.parseAudio()
|
2019-08-06 23:50:13 +02:00
|
|
|
}
|
2020-05-14 07:40:40 +02:00
|
|
|
|
2021-03-31 18:23:45 +02:00
|
|
|
// GetSimplifiedText returns a the text stripped of all the markdown and with mentions
|
|
|
|
// replaced by canonical names
|
2021-05-25 11:40:02 +02:00
|
|
|
func (m *Message) GetSimplifiedText(identity string, canonicalNames map[string]string) (string, error) {
|
2021-03-31 18:23:45 +02:00
|
|
|
|
|
|
|
if m.ContentType == protobuf.ChatMessage_AUDIO {
|
|
|
|
return "Audio", nil
|
|
|
|
}
|
|
|
|
if m.ContentType == protobuf.ChatMessage_STICKER {
|
|
|
|
return "Sticker", nil
|
|
|
|
}
|
|
|
|
if m.ContentType == protobuf.ChatMessage_IMAGE {
|
|
|
|
return "Image", nil
|
|
|
|
}
|
|
|
|
if m.ContentType == protobuf.ChatMessage_COMMUNITY {
|
|
|
|
return "Community", nil
|
|
|
|
}
|
2021-09-27 11:02:25 -03:00
|
|
|
if m.ContentType == protobuf.ChatMessage_SYSTEM_MESSAGE_CONTENT_PRIVATE_GROUP {
|
|
|
|
return "Group", nil
|
|
|
|
}
|
2021-03-31 18:23:45 +02:00
|
|
|
|
|
|
|
if m.ParsedTextAst == nil {
|
2021-05-25 11:40:02 +02:00
|
|
|
err := m.PrepareContent(identity)
|
2021-03-31 18:23:45 +02:00
|
|
|
if err != nil {
|
|
|
|
return "", err
|
|
|
|
}
|
|
|
|
}
|
|
|
|
visitor := &SimplifiedTextVisitor{canonicalNames: canonicalNames}
|
|
|
|
ast.Walk(*m.ParsedTextAst, visitor)
|
|
|
|
return visitor.text, nil
|
|
|
|
}
|
|
|
|
|
2020-06-17 20:55:49 +02:00
|
|
|
func getAudioMessageMIME(i *protobuf.AudioMessage) (string, error) {
|
|
|
|
switch i.Type {
|
|
|
|
case protobuf.AudioMessage_AAC:
|
|
|
|
return "aac", nil
|
|
|
|
case protobuf.AudioMessage_AMR:
|
|
|
|
return "amr", nil
|
|
|
|
}
|
|
|
|
|
|
|
|
return "", errors.New("audio format not supported")
|
|
|
|
}
|
2020-07-25 15:16:00 +01:00
|
|
|
|
|
|
|
// GetSigPubKey returns an ecdsa encoded public key
|
2020-07-25 17:13:08 +01:00
|
|
|
// this function is required to implement the ChatEntity interface
|
2020-07-25 15:16:00 +01:00
|
|
|
func (m Message) GetSigPubKey() *ecdsa.PublicKey {
|
|
|
|
return m.SigPubKey
|
|
|
|
}
|
2020-07-25 17:13:08 +01:00
|
|
|
|
|
|
|
// GetProtoBuf returns the struct's embedded protobuf struct
|
|
|
|
// this function is required to implement the ChatEntity interface
|
|
|
|
func (m *Message) GetProtobuf() proto.Message {
|
|
|
|
return &m.ChatMessage
|
|
|
|
}
|
|
|
|
|
|
|
|
// SetMessageType a setter for the MessageType field
|
|
|
|
// this function is required to implement the ChatEntity interface
|
|
|
|
func (m *Message) SetMessageType(messageType protobuf.MessageType) {
|
|
|
|
m.MessageType = messageType
|
|
|
|
}
|
2021-01-15 18:47:30 +01:00
|
|
|
|
|
|
|
// WrapGroupMessage indicates whether we should wrap this in membership information
|
|
|
|
func (m *Message) WrapGroupMessage() bool {
|
|
|
|
return true
|
|
|
|
}
|
2021-02-23 07:37:08 +00:00
|
|
|
|
|
|
|
// GetPublicKey attempts to return or recreate the *ecdsa.PublicKey of the Message sender.
|
|
|
|
// If the m.SigPubKey is set this will be returned
|
|
|
|
// If the m.From is present the string is decoded and unmarshalled into a *ecdsa.PublicKey, the m.SigPubKey is set and returned
|
|
|
|
// Else an error is thrown
|
|
|
|
// This function differs from GetSigPubKey() as this function may return an error
|
|
|
|
func (m *Message) GetSenderPubKey() (*ecdsa.PublicKey, error) {
|
|
|
|
// TODO requires tests
|
|
|
|
|
|
|
|
if m.SigPubKey != nil {
|
|
|
|
return m.SigPubKey, nil
|
|
|
|
}
|
|
|
|
|
|
|
|
if len(m.From) > 0 {
|
|
|
|
fromB, err := hex.DecodeString(m.From[2:])
|
|
|
|
if err != nil {
|
|
|
|
return nil, err
|
|
|
|
}
|
|
|
|
|
|
|
|
senderPubKey, err := crypto.UnmarshalPubkey(fromB)
|
|
|
|
if err != nil {
|
|
|
|
return nil, err
|
|
|
|
}
|
|
|
|
|
|
|
|
m.SigPubKey = senderPubKey
|
|
|
|
return senderPubKey, nil
|
|
|
|
}
|
|
|
|
|
|
|
|
return nil, errors.New("no Message.SigPubKey or Message.From set unable to get public key")
|
|
|
|
}
|
2023-02-02 17:59:48 +00:00
|
|
|
|
|
|
|
func (m *Message) LoadAudio() error {
|
|
|
|
file, err := os.Open(m.AudioPath)
|
|
|
|
if err != nil {
|
|
|
|
return err
|
|
|
|
}
|
|
|
|
defer file.Close()
|
|
|
|
|
|
|
|
payload, err := ioutil.ReadAll(file)
|
|
|
|
if err != nil {
|
|
|
|
return err
|
|
|
|
|
|
|
|
}
|
|
|
|
audioMessage := m.GetAudio()
|
|
|
|
if audioMessage == nil {
|
|
|
|
return errors.New("no audio has been passed")
|
|
|
|
}
|
|
|
|
audioMessage.Payload = payload
|
|
|
|
audioMessage.Type = audio.Type(payload)
|
|
|
|
m.Payload = &protobuf.ChatMessage_Audio{Audio: audioMessage}
|
|
|
|
return os.Remove(m.AudioPath)
|
|
|
|
}
|
|
|
|
|
|
|
|
func (m *Message) LoadImage() error {
|
|
|
|
payload, err := images.OpenAndAdjustImage(images.CroppedImage{ImagePath: m.ImagePath}, false)
|
|
|
|
|
|
|
|
if err != nil {
|
|
|
|
return err
|
|
|
|
}
|
|
|
|
imageMessage := m.GetImage()
|
|
|
|
imageMessage.Payload = payload
|
|
|
|
imageMessage.Type = images.GetProtobufImageType(payload)
|
|
|
|
m.Payload = &protobuf.ChatMessage_Image{Image: imageMessage}
|
|
|
|
|
|
|
|
return nil
|
|
|
|
}
|
2023-03-23 19:06:50 +03:00
|
|
|
|
URL unfurling (initial implementation) (#3471)
This is the initial implementation for the new URL unfurling requirements. The
most important one is that only the message sender will pay the privacy cost for
unfurling and extracting metadata from websites. Once the message is sent, the
unfurled data will be stored at the protocol level and receivers will just
profit and happily decode the metadata to render it.
Further development of this URL unfurling capability will be mostly guided by
issues created on clients. For the moment in status-mobile:
https://github.com/status-im/status-mobile/labels/url-preview
- https://github.com/status-im/status-mobile/issues/15918
- https://github.com/status-im/status-mobile/issues/15917
- https://github.com/status-im/status-mobile/issues/15910
- https://github.com/status-im/status-mobile/issues/15909
- https://github.com/status-im/status-mobile/issues/15908
- https://github.com/status-im/status-mobile/issues/15906
- https://github.com/status-im/status-mobile/issues/15905
### Terminology
In the code, I've tried to stick to the word "unfurl URL" to really mean the
process of extracting metadata from a website, sort of lower level. I use "link
preview" to mean a higher level structure which is enriched by unfurled data.
"link preview" is also how designers refer to it.
### User flows
1. Carol needs to see link previews while typing in the chat input field. Notice
from the diagram nothing is persisted and that status-go endpoints are
essentially stateless.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_getTextURLs
Server-->>Client: Normalized URLs
Client->>Client: Render cached unfurled URLs
Client->>Server: Unfurl non-cached URLs.\nCall wakuext_unfurlURLs
Server->>Website: Fetch metadata
Website-->>Server: Metadata (thumbnail URL, title, etc)
Server->>Website: Fetch thumbnail
Server->>Website: Fetch favicon
Website-->>Server: Favicon bytes
Website-->>Server: Thumbnail bytes
Server->>Server: Decode & process images
Server-->>Client: Unfurled data (thumbnail data URI, etc)
#+end_src
```
```
,------. ,------. ,-------.
|Client| |Server| |Website|
`--+---' `--+---' `---+---'
| Call wakuext_getTextURLs | |
| ---------------------------------------> |
| | |
| Normalized URLs | |
| <- - - - - - - - - - - - - - - - - - - - |
| | |
|----. | |
| | Render cached unfurled URLs | |
|<---' | |
| | |
| Unfurl non-cached URLs. | |
| Call wakuext_unfurlURLs | |
| ---------------------------------------> |
| | |
| | Fetch metadata |
| | ------------------------------------>
| | |
| | Metadata (thumbnail URL, title, etc)|
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Fetch thumbnail |
| | ------------------------------------>
| | |
| | Fetch favicon |
| | ------------------------------------>
| | |
| | Favicon bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| | Thumbnail bytes |
| | <- - - - - - - - - - - - - - - - - -
| | |
| |----. |
| | | Decode & process images |
| |<---' |
| | |
| Unfurled data (thumbnail data URI, etc)| |
| <- - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,---+---.
|Client| |Server| |Website|
`------' `------' `-------'
```
2. Carol sends the text message with link previews in the RPC request
wakuext_sendChatMessages. status-go assumes the link previews are good
because it can't and shouldn't attempt to re-unfurl them.
```
#+begin_src plantuml :results verbatim
Client->>Server: Call wakuext_sendChatMessages
Server->>Server: Transform link previews to\nbe proto-marshalled
Server->DB: Write link previews serialized as JSON
Server-->>Client: Updated message response
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| Call wakuext_sendChatMessages| |
| -----------------------------> |
| | |
| |----. |
| | | Transform link previews to |
| |<---' be proto-marshalled |
| | |
| | |
| | Write link previews serialized as JSON|
| | -------------------------------------->
| | |
| Updated message response | |
| <- - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
3. The message was sent over waku and persisted locally in Carol's device. She
should now see the link previews in the chat history. There can be many link
previews shared by other chat members, therefore it is important to serve the
assets via the media server to avoid overloading the ReactNative bridge with
lots of big JSON payloads containing base64 encoded data URIs (maybe this
concern is meaningless for desktop). When a client is rendering messages with
link previews, they will have the field linkPreviews, and the thumbnail URL
will point to the local media server.
```
#+begin_src plantuml :results verbatim
Client->>Server: GET /link-preview/thumbnail (media server)
Server->>DB: Read from user_messages.unfurled_links
Server->Server: Unmarshal JSON
Server-->>Client: HTTP Content-Type: image/jpeg/etc
#+end_src
```
```
,------. ,------. ,--.
|Client| |Server| |DB|
`--+---' `--+---' `+-'
| GET /link-preview/thumbnail (media server)| |
| ------------------------------------------> |
| | |
| | Read from user_messages.unfurled_links|
| | -------------------------------------->
| | |
| |----. |
| | | Unmarshal JSON |
| |<---' |
| | |
| HTTP Content-Type: image/jpeg/etc | |
| <- - - - - - - - - - - - - - - - - - - - - |
,--+---. ,--+---. ,+-.
|Client| |Server| |DB|
`------' `------' `--'
```
### Some limitations of the current implementation
The following points will become separate issues in status-go that I'll work on
over the next couple weeks. In no order of importance:
- Improve how multiple links are fetched; retries on failure and testing how
unfurling behaves around the timeout limits (deterministically, not by making
real HTTP calls as I did). https://github.com/status-im/status-go/issues/3498
- Unfurl favicons and store them in the protobuf too.
- For this PR, I added unfurling support only for websites with OpenGraph
https://ogp.me/ meta tags. Other unfurlers will be implemented on demand. The
next one will probably be for oEmbed https://oembed.com/, the protocol
supported by YouTube, for example.
- Resize and/or compress thumbnails (and favicons). Often times, thumbnails are
huge for the purposes of link previews. There is already support for
compressing JPEGs in status-go, but I prefer to work with compression in a
separate PR because I'd like to also solve the problem for PNGs (probably
convert them to JPEGs, plus compress them). This would be a safe choice for
thumbnails, favicons not so much because transparency is desirable.
- Editing messages is not yet supported.
- I haven't coded any artificial limit on the number of previews or on the size
of the thumbnail payload. This will be done in a separate issue. I have heard
the ideal solution may be to split messages into smaller chunks of ~125 KiB
because of libp2p, but that might be too complicated at this stage of the
product (?).
- Link preview deletion.
- For the moment, OpenGraph metadata is extracted by requesting data for the
English language (and fallback to whatever is available). In the future, we'll
want to unfurl by respecting the user's local device language. Some websites,
like GoDaddy, are already localized based on the device's IP, but many aren't.
- The website's description text should be limited by a certain number of
characters, especially because it's outside our control. Exactly how much has
not been decided yet, so it'll be done separately.
- URL normalization can be tricky, so I implemented only the basics to help with
caching. For example, the url https://status.im and HTTPS://status.im are
considered identical. Also, a URL is considered valid for unfurling if its TLD
exists according to publicsuffix.EffectiveTLDPlusOne. This was essential,
otherwise the default Go url.Parse approach would consider many invalid URLs
valid, and thus the server would waste resources trying to unfurl the
unfurleable.
### Other requirements
- If the message is edited, the link previews should reflect the edited text,
not the original one. This has been aligned with the design team as well.
- If the website's thumbnail or the favicon can't be fetched, just ignore them.
The only mandatory piece of metadata is the website's title and URL.
- Link previews in clients should be generated in near real-time, that is, as
the user types, previews are updated. In mobile this performs very well, and
it's what other clients like WhatsApp, Telegram, and Facebook do.
### Decisions
- While the user typing in the input field, the client is constantly (debounced)
asking status-go to parse the text and extract normalized URLs and then the
client checks if they're already in its in-memory cache. If they are, no RPC
call is made. I chose this approach to achieve the best possible performance
in mobile and avoid the whole RPC overhead, since the chat experience is
already not smooth enough. The mobile client uses URLs as cache keys in a
hashmap, i.e. if the key is present, it means the preview is readily available
(naive, but good enough for now). This decision also gave me more flexibility
to find the best UX at this stage of the feature.
- Due to the requirement that users should be able to see independent loading
indicators for each link preview, when status-go can't unfurl a URL, it
doesn't return it in the response.
- As an initial implementation, I added the BLOB column unfurled_links to the
user_messages table. The preview data is then serialized as JSON before being
stored in this column. I felt that creating a separate table and the related
code for this initial PR would be inconvenient. Is that reasonable to you?
Once things stabilize I can create a proper table if we want to avoid this
kind of solution with serialized columns.
2023-05-18 15:43:06 -03:00
|
|
|
func isValidLinkPreviewForProto(preview LinkPreview) bool {
|
|
|
|
return preview.Title != "" && preview.URL != "" &&
|
|
|
|
((preview.Thumbnail.DataURI == "" && preview.Thumbnail.Width == 0 && preview.Thumbnail.Height == 0) ||
|
|
|
|
(preview.Thumbnail.DataURI != "" && preview.Thumbnail.Width > 0 && preview.Thumbnail.Height > 0))
|
|
|
|
}
|
|
|
|
|
|
|
|
// ConvertLinkPreviewsToProto expects previews to be correctly sent by the
|
|
|
|
// client because we can't attempt to re-unfurl URLs at this point (it's
|
|
|
|
// actually undesirable). We run a basic validation as an additional safety net.
|
|
|
|
func (m *Message) ConvertLinkPreviewsToProto() ([]*protobuf.UnfurledLink, error) {
|
|
|
|
if len(m.LinkPreviews) == 0 {
|
|
|
|
return nil, nil
|
|
|
|
}
|
|
|
|
|
|
|
|
unfurledLinks := make([]*protobuf.UnfurledLink, 0, len(m.LinkPreviews))
|
|
|
|
|
|
|
|
for _, preview := range m.LinkPreviews {
|
|
|
|
// Do not process subsequent previews because we do expect all previews to
|
|
|
|
// be valid at this stage.
|
|
|
|
if !isValidLinkPreviewForProto(preview) {
|
|
|
|
return nil, fmt.Errorf("invalid link preview, url='%s'", preview.URL)
|
|
|
|
}
|
|
|
|
|
|
|
|
var payload []byte
|
|
|
|
var err error
|
|
|
|
if preview.Thumbnail.DataURI != "" {
|
|
|
|
payload, err = images.GetPayloadFromURI(preview.Thumbnail.DataURI)
|
|
|
|
if err != nil {
|
|
|
|
return nil, fmt.Errorf("could not get data URI payload, url='%s': %w", preview.URL, err)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
ul := &protobuf.UnfurledLink{
|
|
|
|
Url: preview.URL,
|
|
|
|
Title: preview.Title,
|
|
|
|
Description: preview.Description,
|
|
|
|
ThumbnailWidth: uint32(preview.Thumbnail.Width),
|
|
|
|
ThumbnailHeight: uint32(preview.Thumbnail.Height),
|
|
|
|
ThumbnailPayload: payload,
|
|
|
|
}
|
|
|
|
unfurledLinks = append(unfurledLinks, ul)
|
|
|
|
}
|
|
|
|
|
|
|
|
return unfurledLinks, nil
|
|
|
|
}
|
|
|
|
|
|
|
|
func (m *Message) ConvertFromProtoToLinkPreviews(makeMediaServerURL func(msgID string, previewURL string) string) []LinkPreview {
|
|
|
|
var links []*protobuf.UnfurledLink
|
|
|
|
|
|
|
|
if links = m.GetUnfurledLinks(); links == nil {
|
|
|
|
return nil
|
|
|
|
}
|
|
|
|
|
|
|
|
previews := make([]LinkPreview, 0, len(links))
|
|
|
|
for _, link := range links {
|
|
|
|
parsedURL, err := url.Parse(link.Url)
|
|
|
|
var hostname string
|
|
|
|
// URL parsing in Go can fail with URLs that weren't correctly URL encoded.
|
|
|
|
// This shouldn't happen in general, but if an error happens we just reuse
|
|
|
|
// the full URL.
|
|
|
|
if err != nil {
|
|
|
|
hostname = link.Url
|
|
|
|
} else {
|
|
|
|
hostname = parsedURL.Hostname()
|
|
|
|
}
|
|
|
|
lp := LinkPreview{
|
|
|
|
Description: link.Description,
|
|
|
|
Hostname: hostname,
|
|
|
|
Title: link.Title,
|
|
|
|
URL: link.Url,
|
|
|
|
}
|
|
|
|
if payload := link.GetThumbnailPayload(); payload != nil {
|
|
|
|
lp.Thumbnail.Width = int(link.ThumbnailWidth)
|
|
|
|
lp.Thumbnail.Height = int(link.ThumbnailHeight)
|
|
|
|
lp.Thumbnail.URL = makeMediaServerURL(m.ID, link.Url)
|
|
|
|
}
|
|
|
|
previews = append(previews, lp)
|
|
|
|
}
|
|
|
|
|
|
|
|
return previews
|
|
|
|
}
|
|
|
|
|
2023-04-05 15:24:55 +02:00
|
|
|
func (m *Message) SetAlbumIDAndImagesCount(albumID string, imagesCount uint32) error {
|
2023-03-23 19:06:50 +03:00
|
|
|
imageMessage := m.GetImage()
|
|
|
|
if imageMessage == nil {
|
|
|
|
return errors.New("Image is empty")
|
|
|
|
}
|
|
|
|
imageMessage.AlbumId = albumID
|
2023-04-05 15:24:55 +02:00
|
|
|
imageMessage.AlbumImagesCount = imagesCount
|
2023-03-23 19:06:50 +03:00
|
|
|
m.Payload = &protobuf.ChatMessage_Image{Image: imageMessage}
|
|
|
|
|
|
|
|
return nil
|
|
|
|
}
|