discourse/lib/slug.rb
Rafael dos Santos Silva 76ab0350f1
FIX: Properly encode slugs when configured to (#8158)
When an admin changes the site setting slug_generation_method to
encoded, we weren't really encoding the slug, but just allowing non-ascii
characters in the slug (unicode).

That causes problems when a user posts a link to a topic without the slug, as
our topic controller tries to redirect the user to the correct URL, which contains
the slug with unicode characters. Having unicode in the Location header of a
response is an RFC violation, and some browsers end up in a redirection loop.

Bug report: https://meta.discourse.org/t/-/125371?u=falco

This commit also checks whether a site uses encoded slugs and, if so, clears all
saved slugs in the db so they can be regenerated by a onceoff job.
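
The difference between allowing unicode in a slug and actually encoding it can be sketched with Ruby's standard library alone. The title below is a made-up example, and the snippet is only an illustration of the idea, not the code path Discourse itself runs.

```ruby
require 'cgi'

# Hypothetical topic title, chosen only for illustration.
title = "熱帶風暴"

# Letting unicode through unchanged leaves non-ASCII characters in the slug.
unicode_slug = title

# Percent-encoding produces an ASCII-only string that is safe to place
# in a Location header.
encoded_slug = CGI.escape(title)

puts unicode_slug.ascii_only?  # non-ASCII characters are still present
puts encoded_slug.ascii_only?  # only ASCII characters remain
puts CGI.unescape(encoded_slug) == title  # the encoding is reversible
```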
2019-10-11 12:38:16 -03:00

62 lines
1.8 KiB
Ruby

# encoding: utf-8
# frozen_string_literal: true

module Slug
  CHAR_FILTER_REGEXP = /[:\/\?#\[\]@!\$&'\(\)\*\+,;=_\.~%\\`^\s|\{\}"<>]+/ # :/?#[]@!$&'()*+,;=_.~%\`^|{}"<>
  MAX_LENGTH = 255

  def self.for(string, default = 'topic', max_length = MAX_LENGTH)
    string = string.gsub(/:([\w\-+]+(?::t\d)?):/, '') if string.present? # strip emoji strings

    slug =
      case (SiteSetting.slug_generation_method || :ascii).to_sym
      when :ascii then self.ascii_generator(string)
      when :encoded then self.encoded_generator(string)
      when :none then self.none_generator(string)
      end

    # Reject slugs that only contain numbers, because they would be indistinguishable from id's.
    slug = (slug =~ /[^\d]/ ? slug : '')
    slug = self.prettify_slug(slug, max_length: max_length)
    slug.blank? ? default : slug
  end

  def self.sanitize(string, downcase: false, max_length: MAX_LENGTH)
    slug = self.encoded_generator(string, downcase: downcase)
    self.prettify_slug(slug, max_length: max_length)
  end

  private

  def self.prettify_slug(slug, max_length:)
    slug
      .tr("_", "-")
      .truncate(max_length, omission: '')
      .squeeze('-') # squeeze continuous dashes to prettify slug
      .gsub(/\A-+|-+\z/, '') # remove possible trailing and preceding dashes
  end

  def self.ascii_generator(string)
    I18n.with_locale(SiteSetting.default_locale) do
      string.tr("'", "").parameterize
    end
  end

  def self.encoded_generator(string, downcase: true)
    # This generator will sanitize almost all special characters,
    # including reserved characters from RFC3986.
    # See also URI::REGEXP::PATTERN.
    string = string.strip
      .gsub(/\s+/, '-')
      .gsub(CHAR_FILTER_REGEXP, '')
    string = string.downcase if downcase
    CGI.escape(string)
  end

  def self.none_generator(string)
    ''
  end
end
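
The encoded path above can be tried outside a Rails application with a rough standalone sketch. `SlugSketch` and its method names are hypothetical and exist only for this illustration. The sketch also assumes that plain string slicing is an acceptable stand-in for ActiveSupport's `truncate(max_length, omission: '')`, which is not available without Rails.

```ruby
require 'cgi'

# Hypothetical standalone approximation of the encoded generator plus
# prettify step; not the Discourse module itself.
module SlugSketch
  # Same character filter as CHAR_FILTER_REGEXP in the file above.
  CHAR_FILTER = /[:\/\?#\[\]@!\$&'\(\)\*\+,;=_\.~%\\`^\s|\{\}"<>]+/

  def self.encoded(string, downcase: true)
    # Trim, turn whitespace runs into dashes, then drop filtered characters.
    s = string.strip.gsub(/\s+/, '-').gsub(CHAR_FILTER, '')
    s = s.downcase if downcase
    # Percent-encode whatever non-ASCII characters remain.
    CGI.escape(s)
  end

  def self.prettify(slug, max_length: 255)
    # Assumption: slicing stands in for ActiveSupport's String#truncate.
    slug.tr('_', '-')[0, max_length].squeeze('-').gsub(/\A-+|-+\z/, '')
  end
end

slug = SlugSketch.prettify(SlugSketch.encoded("  Hello, World!  "))
puts slug # prints "hello-world"
```

Passing `downcase: false` to `SlugSketch.encoded` mirrors what `Slug.sanitize` does by default, so the original casing is kept.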