40 Commits

Author SHA1 Message Date
Vytautas Šaltenis b44be78459 Allow rel attribute in sanitizer
Fixes issue #68.
2014-05-01 20:49:49 +03:00
Martin Probst 41251715ad Use go.net/html's parser to sanitize HTML.
Use an HTML5 compliant parser that interprets HTML as a browser would to parse
the Markdown result and then sanitize based on the result.
Escape unrecognized and disallowed HTML in the result.
Currently works with a hard coded whitelist of safe HTML tags and attributes.
2014-04-27 23:40:44 +02:00
Vytautas Šaltenis 55bb56bf9b Merge pull request #55 from rtfb/master
Autolink fixes
2014-03-30 19:58:39 +03:00
Vytautas Šaltenis d643453f1e Merge pull request #50 from rtfb/master
Better protection against JavaScript injection
2014-03-30 19:52:13 +03:00
Graham Miller d71c759108 add HTML_NOFOLLOW_LINKS 2014-02-25 09:21:57 -05:00
Vytautas Šaltenis e5937643a9 Fix bug in autolink with trailing semicolon
In case the link ends with escaped html entity, the semicolon is a part
of the link and should not be interpreted as punctuation.
2014-02-17 21:09:04 +02:00
Vytautas Šaltenis b0bdfbec4c Fix bug in autolink overescaping html entities
If autolink encounters a link which already has an escaped html entity,
it would escape the ampersand again, producing things like these:
    &amp;  --> &amp;amp;
    &quot; --> &amp;quot;
This commit solves that by first looking for all entity-looking things
in the link and copying those ranges verbatim, only considering the rest
of the string for escaping.
Doesn't seem to have considerable performance impact.
The mailto: links are processed the old way.
2014-02-17 21:09:04 +02:00
Vytautas Šaltenis f2d43f69a4 Fix bug in autolink termination
Detect the end of link when it is immediately followed by an element.
2014-02-17 21:09:03 +02:00
Vytautas Šaltenis 9fc8c9d866 Fix bug with overzealous autolink processing
When the source Markdown contains an anchor tag with URL as link text
(i.e. <a href=...>http://foo.bar</a>), autolink converts that link text
into another anchor tag, which is nonsense. Detect this situation with
regexp and early exit autolink processing.
2014-02-17 21:09:03 +02:00
Vytautas Šaltenis 2f50a53f8e Rename HTML_SKIP_SCRIPT to HTML_SANITIZE_OUTPUT 2014-01-22 01:23:43 +02:00
Vytautas Šaltenis 55cd82008e Rewrite protection against JavaScript injection
This drops the naive approach at <script> tag stripping and resorts to
full sanitization of html. The general idea (and the regexps) is grabbed
from Stack Exchange's PageDown JavaScript Markdown processor[1]. Like in
PageDown, it's implemented as a separate pass over resulting html.

Includes a metric ton (but not all) of test cases from here[2]. Several
are commented out since they don't pass yet.

Stronger (but still incomplete) fix for #11.

[1] http://code.google.com/p/pagedown/wiki/PageDown
[2] https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
2014-01-22 01:14:35 +02:00
Darren Coxall 607ec21435 Tests for links when using HTML_SAFELINK 2013-12-19 10:00:47 +00:00
Russ Ross ca82b8db3a panic fix (issue #33) with test case 2013-09-11 12:47:43 -06:00
Alex Xandra Albert Sim da8f2753e2 Added test for link inside image 2013-09-09 12:51:20 +07:00
athom 31798e0eab add testcase for GFM autolink 2013-08-09 17:24:26 +08:00
moshee 3ea84a5811 parser no longer returns prematurely from empty footnote ref 2013-07-08 22:34:12 +00:00
moshee 1a73bae554 added slice bounds check 2013-07-08 06:54:25 +00:00
moshee c23099e5ee Implementation and some tests for inline footnotes. Also I noticed the list items had the wrong ids, that was silly of me. 2013-07-01 01:37:52 +00:00
moshee 7bdb82c53a new tests pass but old tests now fail... 2013-06-26 15:57:51 +00:00
moshee be082a1ef2 First attempt at supporting Pandoc-style footnotes. The existing tests have not broken but the new functionality does not work yet. 2013-06-25 01:18:47 +00:00
Vytautas Šaltenis 8226238289 Improve html element stripping code 2013-04-18 03:15:47 +03:00
Vytautas Šaltenis 85e2207cd0 Couple more tests 2013-04-14 01:42:47 +03:00
Vytautas Šaltenis dcaaa9b5dc More <script> stripping
Partially addresses issue #11.
2013-04-13 23:24:30 +03:00
Vytautas Šaltenis fb923cdb78 Add an option to strip <script> elements
Partially addresses issue #11.
2013-04-13 22:57:16 +03:00
Vytautas Šaltenis b79e720a36 Make isHtmlTag() case insensitive 2013-04-13 22:34:37 +03:00
Vytautas Šaltenis d5a8df164b Fix bug in isHtmlTag()
Fix what seems to be a typo. j should iterate through all tagname, so it
should be initialized to zero. The test exposes this bug.
2013-04-13 22:21:47 +03:00
Vytautas Šaltenis 90509d39d4 Make a way to parameterize inline tests
Expose extensions and html flags parameters so that tests could specify
what code paths they want to exercise.
2013-04-13 22:18:14 +03:00
Russ Ross e35b4b66cc bounds checking stress tests 2011-07-03 10:51:07 -06:00
Russ Ross ae9562f685 move whitespace stripping to parser, not renderers 2011-06-29 15:38:35 -06:00
Russ Ross 2aca667078 simplify inline callback interface 2011-06-29 13:00:54 -06:00
Russ Ross 873a60ad49 complete page rendering is now an option in the library 2011-06-29 10:08:56 -06:00
Russ Ross c969dff782 added simplified interface for common usage 2011-06-28 15:55:27 -06:00
Russ Ross fde2c60665 version number, few more options for command-line tool 2011-06-28 11:30:10 -06:00
Russ Ross f8f70572a4 simplified BSD license 2011-06-27 20:11:32 -06:00
Russ Ross c8f7e789d4 more robust whitespace stripping and matching corrections to tests 2011-06-27 16:06:16 -06:00
Russ Ross 9a0217f7aa fixed minor bugs uncovered by more testing 2011-06-27 14:35:11 -06:00
Russ Ross 3af64a90ad fixed headers nested in lists, added prefix header unit tests 2011-06-27 10:13:13 -06:00
Russ Ross be0fb4602b more inline unit tests 2011-06-24 16:39:50 -06:00
Russ Ross 1e40ebaf47 unit test for linebreaks 2011-06-01 18:52:55 -06:00
Russ Ross 2abc3af015 starting inline unit tests, fix a few minor bugs they exposed 2011-06-01 12:17:17 -06:00