Commit Graph

72 Commits

Author SHA1 Message Date
Sam
8ecf313a81 FIX: correctly raise errors when downloads fail
This corrects an issue where we are hitting Gravatar for 404 over and over

Also ensures file download properly reports errors
2017-09-28 16:35:43 +10:00
Guo Xiang Tan
5324c01209 FIX: Don't raise an error if reading from URL timeout. 2017-09-27 14:53:22 +08:00
Guo Xiang Tan
367fb1c524 FIX: Onebox fails on encoded URL.
https://meta.discourse.org/t/onebox-breaks-if-theres-chinese-text-in-url/67364
2017-09-26 18:34:54 +08:00
Joffrey JAFFEUX
6cd8203686 FIX: allows onebox to force GET hosts returning wrong headers on HEAD 2017-08-08 11:44:27 +02:00
Arpit Jalan
b059a0f789 extract url escaping to a dedicated class method and improved tests 2017-07-29 22:16:51 +05:30
Arpit Jalan
1fe553873c FIX: preserve fragment identifier when escaping url 2017-07-29 17:22:45 +05:30
Guo Xiang Tan
5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Guo Xiang Tan
b534778f46 FIX: Escape URL before attempting to resolve it. 2017-07-18 10:04:24 +09:00
Guo Xiang Tan
089a1bd3be Specify the error that we want to ignore instead of rescuing all errors. 2017-07-18 09:55:52 +09:00
Robin Ward
db485ae0da FIX: Support for skipping redirects on certain domains (like steam) 2017-06-26 15:38:43 -04:00
Robin Ward
7366f334b0 FIX: Try a GET for error code 409 too -- (Medium posts) 2017-06-15 15:09:59 -04:00
Robin Ward
009f0921dc FEATURE: Whitelist hosts for internal crawling 2017-06-13 12:59:54 -04:00
Robin Ward
a3729b51eb FIX: Always allow the host the forum is hosted on 2017-06-12 13:22:51 -04:00
Robin Ward
53b95f009f FIX: If HEAD is not supported, try GET. Also set cookies 2017-06-06 13:53:49 -04:00
Robin Ward
0a08c18a14 FIX: Don't rate limit gravatar downloads 2017-05-24 13:54:26 -04:00
Robin Ward
3b0cbf7013 FIX: Always allow downloads from CDN 2017-05-23 16:32:54 -04:00
Robin Ward
b81e7be9a1 FEATURE: Rate limit how often we'll crawl a destination IP 2017-05-23 15:03:04 -04:00
Robin Ward
36e477750c FIX: Use same code path for downloading images 2017-05-23 14:51:30 -04:00
Robin Ward
e5e7a15a85 SECURITY: Never crawl by IP 2017-05-23 13:07:18 -04:00
Robin Ward
93a5fc62bf FEATURE: A site setting to prevent crawling on private IP blocks 2017-05-23 11:56:06 -04:00
Robin Ward
b8d78b33c6 FIX: Other content types like images are fine 2017-05-22 16:51:37 -04:00
Robin Ward
b23fc2bf84 Helper to find the final destination for a URL 2017-05-22 15:52:41 -04:00