This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
* SPEC: PollFeedJob parsing atom feed
* add FeedItemAccessor
It is to provide a consistent interface to access a feed item's tag
content.
* add FeedElementInstaller
to install non-standard and non-namespaced feed elements
* FEATURE: replace SimpleRSS with Ruby RSS module
* get FinalDestination and download with Excon
* support namespaced element with FeedElementInstaller
Scrubbing an ASCII-8BIT string isn't ever going to remove anything, because
there's no code point that isn't valid 8-bit ASCII. Since we'd really
prefer it if everything were UTF-8 anyway, we'll just assume, for now, that
whatever comes out of SimpleRSS is probably UTF-8, and just nuke anything
that isn't a valid UTF-8 codepoint.
Of course, the *real* bug here is that SimpleRSS [unilaterally converts
everything to
ASCII-8BIT](https://github.com/cardmagic/simple-rss/issues/15). It's
presumably *far* too much to ask that it detects the encoding of the source
RSS feed and marks the parsed strings with the correct encoding...
There are people who have RSS feeds set up that do HTTPS -> HTTP
redirects which throw errors. Since RSS feeds are all configured
by admins I think it's OK if they allow an unsafe redirect as the
content is public anyway. This will reduce many server side errors.
SimpleRss is unreliable with parsing RSS feeds that contain German Umlauts.
For example this feed http://www.lauffeuer-lb.de/api/v2/articles.xml can't be
parsed by SimpleRss. Discourse's logs are full of
```
Job exception: Wrapped Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
Job exception: incompatible character encodings: ASCII-8BIT and UTF-8
```
The embedding fails because the feed can't be parsed.
This change forces the encoding (using #scrub) which prevents the numerous
encoding errors.
Feature to allow each imported post to be created using a different discourse
username. A possible use case of this is a multi-author blog where discourse
is being used to track comments. This feature allows authors to receive
updates when someone leaves a comment on one of their articles because each of
the imported posts can be created using the discourse username of the author.