Commit Graph

50 Commits

Author SHA1 Message Date
Arpit Jalan
b0e781e2d4 FIX: do not follow redirect on same host with path /login or /session 2019-08-07 16:26:55 +05:30
Sam Saffron
7429700389 FIX: ensure we can download maxmind without redis or db config
This also corrects FileHelper.download so it supports "follow_redirect"
correctly (it used to always follow 1 redirect) and adds a `validate_url`
param that will bypass all uri validation if set to false (default is true)
2019-05-28 10:28:57 +10:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Gerhard Schlager
92df6890df FIX: GET request didn't use headers 2019-03-08 21:36:49 +01:00
Sam
cfddfa6de2 SECURITY: bypass long GET requests
In some rare cases we would check URLs with very large payloads
this ensures we always bypass and do not read entire payloads
2019-02-27 14:51:28 +11:00
Arpit Jalan
1ab91f0474 FIX: preserve github fragment URL 2018-12-19 12:34:47 +05:30
Guo Xiang Tan
8dc1463ab3 Enable Lint/ShadowingOuterLocalVariable for Rubocop. 2018-09-04 10:16:42 +08:00
Bianca Nenciu
b6963b8ffb FIX: Ignore OneBox blacklisted domains. 2018-08-27 20:40:55 +02:00
Régis Hanol
de92913bf4 FIX: store the topic links using the cooked upload url 2018-08-14 12:23:32 +02:00
Robin Ward
7058205f70 FIX: Broken specs 2018-07-24 12:00:34 -04:00
Robin Ward
236243f38a SECURITY: Consider 0.0.0.0 a private IP 2018-07-24 11:16:27 -04:00
Guo Xiang Tan
d43895e2a0 Don't log 404s for FinalDestination.
* We can't do anything about 404s
2018-05-25 10:11:16 +08:00
Guo Xiang Tan
142571bba0 Remove use of rescue nil.
* `rescue nil` is a really bad pattern to use in our code base.
  We should rescue errors that we expect the code to throw and
  not rescue everything because we're unsure of what errors the
  code would throw. This would reduce the amount of pain we face
  when debugging why something isn't working as expexted. I've
  been bitten countless of times by errors being swallowed as a
  result during debugging sessions.
2018-04-02 13:52:51 +08:00
Guo Xiang Tan
ee69d58a59 FIX: Tests could get stucked in infinite loop if it fails to resolve IP of a hostname. 2018-03-28 14:49:05 +08:00
Gerhard Schlager
4a54c09e46 FIX: Retry with GET request when HEAD fails with error 400 2018-02-27 12:07:16 +01:00
Régis Hanol
0559a4736a FIX: don't double request when downloading a file 2018-02-24 12:35:57 +01:00
Gerhard Schlager
b6277e208b FIX: Cookies header didn't have the right format 2018-02-19 12:46:57 +01:00
Sam
fa5880e04f PERF: ability to crawl for titles without extra HEAD req
Also, introduces a much more aggressive timeout for title crawling
and introduces gzip to body that is crawled
2018-01-29 15:40:12 +11:00
Gerhard Schlager
e30851e45a Move escape_uri method to a more suitable place 2017-12-12 20:17:46 +01:00
Régis Hanol
de037da731 FIX: FinalDestination's small_get method wasn't using proper request headers 2017-11-17 17:24:35 +01:00
Régis Hanol
aebcd56300 FIX: try a GET for error code 406 2017-11-17 16:59:51 +01:00
Régis Hanol
221ff24418 SQL != Ruby 2017-11-17 16:12:20 +01:00
Régis Hanol
a0fc8bd924 don't log 404s to gravatar.com 2017-11-17 15:38:26 +01:00
Sam
3ac7d041ae UX: generic onebox treats all square images as avatars and renders them smaller 2017-11-13 11:21:19 +11:00
Gerhard Schlager
d1f257d275 FinalDestination should only log when verbose is enabled 2017-10-31 17:16:59 +01:00
Gerhard Schlager
8c27f28dcb add more logging to FinalDestination 2017-10-31 12:26:35 +01:00
Sam Saffron
8185b8cb06 FEATURE: cache https redirects per hostname
If a hostname does an https redirect we cache that so next
lookup does not incur it.

Also, only rate limit per ip once per final destination

Raise final destination protection to 1000 ip lookups an hour
2017-10-17 16:22:54 +11:00
Sam
70bb2aa426 FEATURE: allow specifying s3 config via globals
This refactors handling of s3 so it can be specified via GlobalSetting

This means that in a multisite environment you can configure s3 uploads
without actual sites knowing credentials in s3

It is a critical setting for situations where assets are mirrored to s3.
2017-10-06 16:20:01 +11:00
Sam
8ecf313a81 FIX: correctly raise errors when downloads fail
This corrects an issue where we are hitting Gravatar for 404 over and over

Also ensures file download properly reports errors
2017-09-28 16:35:43 +10:00
Guo Xiang Tan
5324c01209 FIX: Don't raise an error if reading from URL timeout. 2017-09-27 14:53:22 +08:00
Guo Xiang Tan
367fb1c524 FIX: Onebox fails on encoded URL.
https://meta.discourse.org/t/onebox-breaks-if-theres-chinese-text-in-url/67364
2017-09-26 18:34:54 +08:00
Joffrey JAFFEUX
6cd8203686 FIX: allows onebox to force GET hosts returning wrong headers on HEAD 2017-08-08 11:44:27 +02:00
Arpit Jalan
b059a0f789 extract url escaping to a dedicated class method and improved tests 2017-07-29 22:16:51 +05:30
Arpit Jalan
1fe553873c FIX: preserve fragment identifier when escaping url 2017-07-29 17:22:45 +05:30
Guo Xiang Tan
5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Guo Xiang Tan
b534778f46 FIX: Escape URL before attempting to resolve it. 2017-07-18 10:04:24 +09:00
Guo Xiang Tan
089a1bd3be Specify the error that we want to ignore instead of rescuing all errors. 2017-07-18 09:55:52 +09:00
Robin Ward
db485ae0da FIX: Support for skipping redirects on certain domains (like steam) 2017-06-26 15:38:43 -04:00
Robin Ward
7366f334b0 FIX: Try a GET for error code 409 too -- (Medium posts) 2017-06-15 15:09:59 -04:00
Robin Ward
009f0921dc FEATURE: Whitelist hosts for internal crawling 2017-06-13 12:59:54 -04:00
Robin Ward
a3729b51eb FIX: Always allow the host the forum is hosted on 2017-06-12 13:22:51 -04:00
Robin Ward
53b95f009f FIX: If HEAD is not supported, try GET. Also set cookies 2017-06-06 13:53:49 -04:00
Robin Ward
0a08c18a14 FIX: Don't rate limit gravatar downloads 2017-05-24 13:54:26 -04:00
Robin Ward
3b0cbf7013 FIX: Always allow downloads from CDN 2017-05-23 16:32:54 -04:00
Robin Ward
b81e7be9a1 FEATURE: Rate limit how often we'll crawl a destination IP 2017-05-23 15:03:04 -04:00
Robin Ward
36e477750c FIX: Use same code path for downloading images 2017-05-23 14:51:30 -04:00
Robin Ward
e5e7a15a85 SECURITY: Never crawl by IP 2017-05-23 13:07:18 -04:00
Robin Ward
93a5fc62bf FEATURE: A site setting to prevent crawling on private IP blocks 2017-05-23 11:56:06 -04:00
Robin Ward
b8d78b33c6 FIX: Other content types like images are fine 2017-05-22 16:51:37 -04:00
Robin Ward
b23fc2bf84 Helper to find the final destination for a URL 2017-05-22 15:52:41 -04:00