Commit Graph

38650 Commits

Author SHA1 Message Date
Régis Hanol
501b19b6e0
FIX: server-side HtmlToMarkdown improvements (#9586)
TLDR; this commit vastly improves how whitespaces are handled when converting from HTML to Markdown.
It also adds support for converting HTML <tables> to markdown tables.

The previous 'remove_whitespaces!' method was traversing the whole HTML tree and used a heuristic to remove
leading and trailing whitespaces whenever it was appropriate (ie. mostly before and after HTML block elements)

It was a good idea, but it was very limited and leaded to bad conversion when the html had leading whitespaces on several lines for example.
One such example can be found [here](https://meta.discourse.org/t/86782).

For various reasons, most of the whitespaces in a HTML file is ignored when the page is being displayed in a browser.
The rules that the browsers follow are the [CSS' White Space Processing Rules](https://www.w3.org/TR/css-text-3/#white-space-rules).
They can be quite complicated when you take into account RTL languages and other various tidbits but they boils down to the following:

- Collapse whitespaces down to one space (0x20) inside an inline context (ie. nodes/tags that are being displaying on the same line)
- Remove any leading/trailing whitespaces inside an inline context

One quick & dirty way of getting this 90% solved would be to do 'HTML.gsub!(/[[:space:]]+/, " ")'.
We would also need to hoist <pre> elements in order to not mess with their whitespaces.
Unfortunately, this solution let some whitespaces creep around HTML tags which leads to more '.strip!' calls than I can bear.

I decided to "emulate" the browser's handling of whitespaces and came up with a solution in 4 parts

1. remove_not_allowed!

The HtmlToMarkdown library is recursively "visiting" all the nodes in the HTML in order to convert them to Markdown.
All the nodes that aren't handled by the library (eg. <script>, <style> or any non-textual HTML tags) are "swallowed".
In order to reduce the number of nodes visited, the method 'remove_not_allowed!' will automatically delete all the nodes
that have no "visitor" (eg. a 'visit_<tag>' method) defined.

2. remove_hidden!

Similar purpose as the previous method (eg. reducing number of nodes visited), there's no point trying to convert something that is hidden.
The 'remove_hidden!' method removes any nodes that was hidden using the "hidden" HTML attribute, some CSS or with a width or height equal to 0.

3. hoist_line_breaks!

The 'hoist_line_breaks!' method is there to handle <br> tags. I know those tiny <br> don't do much but they can be quite annoying.
The <br> tags are inline elements but they visually work like a block element (ie. they create a new line).
If you have the following HTML "<i>Foo<br>Bar</i>", it ends up visually similar to "<i>Foo</i><br><i>Bar</i>".
The latter being much more easy to process than the former, so that's what this method is doing.
The "hoist_line_breaks" will hoist <br> tags out of inline tags until their parent is a block element.

4. remove_whitespaces!

The "remove_whitespaces!" is where all the whitespace removal is happening. It's broken down into 4 methods as well

- remove_whitespaces!
- is_inline?
- collapse_spaces!
- remove_trailing_space!

The 'remove_whitespace!' method is recursively walking the HTML tree (skipping <pre> tags).
If a node has any children, they will be chunked into groups of inline elements vs block elements.
For each chunks of inline elements, it will call the "collapse_space!" and "remove_trailing_space!" methods.
For each chunks of block elements, it will call "remote_whitespace!" to keep walking the HTML tree recursively.

The "is_inline?" method determines whether a node is part of a inline context.
A node is inline iif it's a text node or it's an inline tag, but not <br>, and all its children are also inline.

The "collapse_spaces!" method will collapse any kind of (white) space into a single space (" ") character, even accros tags.
For example, if we have "  Foo \n<i> Bar </i>\t42", it will return "Foo <i>Bar </i>42".

Finally, the "remove_trailing_space!" method is there to remove any trailing space that might creep in at the end of the inline chunk.

This solution is not 100% bullet-proof.
It does not support RTL languages at all and has some caveats that I felt were not worth the work to get properly fixed.

FIX: better detection of hidden elements when converting HTML to Markdown
FIX: take into account the 'allowed_href_schemes' site setting when converting HTML <a> to Markdown
FIX: added support for 'mailto:' scheme when converting <a> from HTML to Markdown
FIX: added support for <img> dimensions when converting from HTML to Markdown
FIX: added support for <dl>, <dd> and <dt> when converting from HTML to Markdown
FIX: added support for multilines emphases, strongs and strikes when converting from HTML to Markdown
FIX: added support for <acronym> when converting from HTML to Markdown
DEV: remove unused 'sanitize' gem

Wow, did you just read all that?! Congratz, here's a cookie: 🍪.
2020-04-30 12:21:25 +02:00
Dan Ungureanu
fe51f7a863
FEATURE: More improvements to crawler and old browsers view
Related to c85018cdfd.
2020-04-30 12:07:51 +03:00
Dan Ungureanu
238cbff46f
Revert "HACK: Add dummy plugin folder"
This reverts commit 590830b931 because a
proper fix was implemented in discourse_docker@d587158.
2020-04-30 11:02:15 +03:00
Sam Saffron
d0d5a138c3
DEV: stop freezing frozen strings
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.

For files with `# frozen_string_literal: true`

```
puts %w{a b}[0].frozen?
=> true

puts "hi".frozen?
=> true

puts "a #{1} b".frozen?
=> true

puts ("a " + "b").frozen?
=> false

puts (-("a " + "b")).frozen?
=> true
```

For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
2020-04-30 16:48:53 +10:00
Andrew Schleifer
02ef88052d DEV: take out more trash (icons)
folloup to f755810906
2020-04-30 14:25:05 +08:00
Martin Brennan
10f9f295dc
DEV: Add acceptance tests for bookmarks with reminders (#9592) 2020-04-30 14:58:26 +10:00
Sam Saffron
d6df92b074
DEV: correct bad test
Test was unlisting a topic and then checking for the unlist info
in the whisper field
2020-04-30 14:27:24 +10:00
Sam Saffron
53e0d8fbab
DEV: correct drop logic for columns in post table
We need to recreate the badge_posts view each time we change
column on the posts table.

The p.* is auto expanded on view create, cascade should not have
been used
2020-04-30 12:58:09 +10:00
Guo Xiang Tan
3ac5df0546
UX: Add margin when displaying unlisted details in composer. 2020-04-30 10:50:26 +08:00
David Taylor
367cbf5d2b
FEATURE: Allow user creation with admin api when local logins disabled (#9587) 2020-04-30 11:39:24 +10:00
Sam Saffron
310a7edee5
DEV: remove unused columns from posts and topics
avg_time on posts and topics have not been used in a year.

This uses a re-runnable ddl transaction diasabled migration to
drop the column, cause it touchs very high traffic table and may
deadlock
2020-04-30 11:22:33 +10:00
Martin Brennan
ca539fdccf
FIX: Rename all instances of bookmarkWithReminder to just bookmark (#9579)
* Rename all instances of bookmarkWithReminder and bookmark_with_reminder to just bookmark
* Delete old bookmark code at the same time
* Add migration to remove the bookmarkWithReminder post menu item if people have it set in site settings
2020-04-30 10:09:22 +10:00
Sam Saffron
4f5ed8e781
DEV: pry-nav was holding back on pry upgrades
pry-nav is not yet supported on latest pry, this holds off on
upgrading pry, which in turn holds off on upgrading deps

Stripping pry-nav for now till it works with latest pry
2020-04-30 09:40:50 +10:00
Jeff Wong
8e4fea897e FIX: temporarily disable event listener for dismissing the first notification 2020-04-29 14:34:49 -07:00
Dan Ungureanu
590830b931
HACK: Add dummy plugin folder
On some installations, there may be a leftover symlink which uses the
old plugin name:

  public/plugins/discourse-internet-explorer ->
  -> plugins/discourse-internet-explorer/public
2020-04-29 23:18:57 +03:00
Robin Ward
f697133b7d DEV: Add Handlebars support to Ember CLI in discourse-common 2020-04-29 16:15:13 -04:00
Joffrey JAFFEUX
e1dbc700b1
FIX: ensures widget dropdown doesn't overflow document (#9590) 2020-04-29 21:12:32 +02:00
Dan Ungureanu
c85018cdfd
Improve support for old browsers (#9515)
* FEATURE: Improve crawler view

* FIX: Make lazyYT crawler-friendly

* DEV: Rename discourse-internet-explorer to discourse-unsupported-browser

* DEV: Detect more unsupported browsers

Follow-up to 4eebbd2212.

* FIX: Hide browser update notice in print view
2020-04-29 21:40:21 +03:00
Dan Ungureanu
402194f313
FIX: Enter selected link with 'o' on full page search
Follow-up to 70012f2027.
2020-04-29 21:25:59 +03:00
Joffrey JAFFEUX
816a08cf5f
UI: do not change widget dropdown separator background on hover (#9589) 2020-04-29 19:42:50 +02:00
Joffrey JAFFEUX
a1b9150512
DEV: adds a caret option to widget dropdown (#9588) 2020-04-29 19:37:21 +02:00
Robin Ward
08fbf199ad FIX: S3 rake task can ignore yarn.lock 2020-04-29 13:14:04 -04:00
Joshua Rosenfeld
bc1f7bd659
UX: Improve discobot random menion copy
As suggested in https://meta.discourse.org/t/-/148336/10
2020-04-29 13:09:16 -04:00
Robin Ward
9ec908950d DEV: Better error handling for s3 task 2020-04-29 12:54:39 -04:00
Robin Ward
cbb27241c4
DEV: Make discourse-common an Ember addon. (#9578)
This is to help with the migration to Ember CLI. In the current running
version of Discourse everything should be the same as before, just with
a few extra files that are not used. However, using Ember CLI this can
be installed as an Ember addon.

Co-Authored-By: Jarek Radosz <jradosz@gmail.com>
2020-04-29 12:18:21 -04:00
Robin Ward
3ec21b4124 SECURITY: Update onebox to add rel="noopener" 2020-04-29 10:57:05 -04:00
Joshua Rosenfeld
9dccf4f3b8
FIX: Copyedit for "You were logged out" modal (#9584)
See https://meta.discourse.org/t/-/149596
2020-04-29 07:37:06 -07:00
Joffrey JAFFEUX
db337b10ee
FIX: correctly hides timeline scroller for short posts (#9581)
* FIX: correctly hides timeline scroller for short posts

* fix linting
2020-04-29 16:24:53 +02:00
dependabot-preview[bot]
7ccfc73edb
Build(deps): Bump rqrcode_core from 0.1.1 to 0.1.2 (#9244)
Bumps [rqrcode_core](https://github.com/whomwah/rqrcode_core) from 0.1.1 to 0.1.2.
- [Release notes](https://github.com/whomwah/rqrcode_core/releases)
- [Commits](https://github.com/whomwah/rqrcode_core/commits)
2020-04-29 12:58:52 +01:00
David Taylor
6a9a7b56df
DEV: Bump Hashie and Faraday (#9583)
These were previously pinned due to a dependency in the zendesk plugin. That has now been resolved.
2020-04-29 12:55:30 +01:00
Joffrey JAFFEUX
4eed86919e
FIX: ensures card cloak is removed (#9582)
Repro steps for current failure:
- use mobile view
- click on a different user avatar to show user card
- click message
- close composer
- cloak is still showing and prevents any click
2020-04-29 13:30:34 +02:00
Dan Ungureanu
7a946b9f5c
DEV: Fix typo
Follow-up to 0a85a7aef9.
2020-04-29 12:38:57 +03:00
Dan Ungureanu
0a85a7aef9
FEATURE: Add user_profile to user_archive CSV export (#9571) 2020-04-29 12:09:50 +03:00
Guo Xiang Tan
ae54a33719
FIX: discourse-presence breaks composer for users. 2020-04-29 15:29:29 +08:00
Guo Xiang Tan
4fe4b3cce3
DEV: Revert quiet assets in dev.
Breaks with a gem.
2020-04-29 14:46:56 +08:00
Guo Xiang Tan
16ab6430fe
DEV: Follow up to a078feee07 2020-04-29 14:24:48 +08:00
Guo Xiang Tan
a078feee07
DEV: Turn off ActiveRecord development color and query log by default.
It breaks logster.
2020-04-29 14:19:14 +08:00
Guo Xiang Tan
fa21c03a1d
DEV: Minor follow up to 1d04fb24f8 2020-04-29 14:09:19 +08:00
Guo Xiang Tan
1d04fb24f8
DEV: Enable all the ActiveRecord goodness in development env. 2020-04-29 14:08:00 +08:00
Guo Xiang Tan
960fd94758
DEV: Run rubocop in parallel in pre-commit hook. 2020-04-29 13:48:59 +08:00
Guo Xiang Tan
197d0332e6
DEV: Missing import. 2020-04-29 13:48:31 +08:00
Guo Xiang Tan
8b3b5d1474
DEV: Disable discourse-presence in Ember test env. 2020-04-29 13:39:19 +08:00
Sam Saffron
2045f51312
FIX: correctly account for direct replies with presence
followup to 301a0fa5, we were not checking the action so we
were publishing some messages from composers we did not intend to
2020-04-29 15:23:17 +10:00
Alan Guo Xiang Tan
301a0fa54e
FEATURE: Redesign discourse-presence to track state on the client side. (#9487)
Before this commit, the presence state of users were stored on the
server side and any updates to the state meant we had to publish the
entire state to the clients. Also, the way the state of users were
stored on the server side meant we didn't have a way to differentiate
between replying users and whispering users.

In this redesign, we decided to move the tracking of users state to the client
side and have the server publish client events instead. As a result of
this change, we're able to remove the number of opened connections
needed to track presence and also reduce the payload that is sent for
each event.

At the same time, we've also improved on the restrictions when publishing message_bus messages. Users that
do not have permission to see certain events will not receive messages
for those events.
2020-04-29 14:48:55 +10:00
Guo Xiang Tan
5503eba924
DEV: Add env in dev to support verbose query log. 2020-04-29 11:10:57 +08:00
Kris
24b9759c60 UX: fix dashboard version panel width by removing extra wrapper 2020-04-28 21:14:29 -04:00
Martin Brennan
6cf31f16f7
FIX: Change bookmarks-with-reminders URL back to bookmarks for user activity (#9566)
* Bookmarks with reminders is a core feature now, no need to have a separate URL
* Keep around the old /u/:username/activity/bookmarks-with-reminders route for backwards compat in Ember but just redirect to user activity bookmarks.
2020-04-29 10:53:37 +10:00
Martin Brennan
17ca47af1a
FIX: Remove timezone in brackets from user card (#9567)
For clarity and to save space remove the timezone in brackets e.g. (EDT) from the user card. Also add a title to the user time span to say it is Local Time.
2020-04-29 08:45:38 +10:00
Martin Brennan
0d519c78fb
FIX: Do not save bookmark if close (X) on modal is clicked (#9541)
* After this change the bookmark will still be saved if clicking out of the modal or pressing escape
* To achieve this I implemented an initiatedBy parameter for modal closing from d-modal. If clicking on the cross it is initiated by close, if clicking out of the modal it is by click out.
* These options can then be compared in controllers consuming onClose

Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
2020-04-29 08:45:06 +10:00
Joffrey JAFFEUX
4ecc258835
DEV: adds support for header-title in topicTitleDecorators (#9562) 2020-04-28 21:27:07 +02:00