Trying to truncate encoded slugs will mean that we have to keep the URL
valid, which can be tricky as you have to be aware of multibyte
characters.
Since we already have upper bounds for the title, the slug won't grow
for more than title*6 in the worst case. The slug column in the topic
table can store that just fine.
Added a test to ensure that a generated slug is a valid URL too, so we
don't introduce regressions in the future.
When an admin changes the site setting slug_generation_method to
encoded, we weren't really encoding the slug, but just allowing non-ascii
characters in the slug (unicode).
That brings problems when a user posts a link to topic without the slug, as
our topic controller tries to redirect the user to the correct URL that contains
the slug with unicode characters. Having unicode in the Location header in a
response is a RFC violation and some browsers end up in a redirection loop.
Bug report: https://meta.discourse.org/t/-/125371?u=falco
This commit also checks if a site uses encoded slugs and clear all saved slugs
in the db so they can be regenerated using an onceoff job.
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
- Moves the already existing transliteration rules into `transliterations.en.yml` (there's no need to translate this for every language). The same goes for the stringex configuration.
- Doesn't calculate the default slug for *zh_CN* and *ja* anymore. It hasn't been used anyway since stringex is used instead.
- Removes a wrong comment from the Russion transliteration file (I hate wrong comments)