Nokogiri/libxml is now more strict in terms of params it receives.
It uses kwargs vs options object (I fixed an issue there in #30545) doesn't accept nil/blank html (fixed here) and most importantly handles encoding in a different way. It seems to require explicitly specifying UTF8.
* Build(deps): Bump nokogiri from 1.16.8 to 1.18.1
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.16.8 to 1.18.1.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sparklemotion/nokogiri/compare/v1.16.8...v1.18.1)
---
updated-dependencies:
- dependency-name: nokogiri
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
When serializing the `body_changes` in the `PostRevisionSerializer`, we create two diffs: one for the `cooked` and another one for the `raw` version of the post.
Inside `DiscourseDiff`, we generate both `html` and `markdown` diffs when we only need the `html` diffs for the `cooked` version of the post and the `markdown` diff for the `raw` version of the post.
This solves the issue repored in https://meta.discourse.org/t/server-error-accessing-topic-revisions-on-a-specific-topic/339185 where some revisions would return 500 because of a `ArgumentError : Attributes per element limit exceeded` exception when trying to generate the `html` diff on a very large `raw`.
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains.
We no longer need to use Rails "require_dependency" anywhere and instead can just use standard
Ruby patterns to require files.
This is a far reaching change and we expect some followups here.
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
* FIX: Don't diplay character reference in HTML diffs
Before this change, HTML escaping was done before splitting text into
tokens, so token splitter saw literals like "'", and split them as
it was normal text into parts into ["&", "#", "39", ";"]. This caused
diff to display character references, as those tokens used separate
HTML tags to display their insertion/deletion status.
* Avoid making one element arrays while generating diffs