Commit Graph

57 Commits

Author SHA1 Message Date
Daniel Imfeld 5bf00efe39 Remove unnecessary HTML_ABSOLUTE_LINKS flag 2014-05-29 09:17:20 -05:00
Daniel Imfeld 4ccf982a9e Add tests for absolute prefix 2014-05-25 13:22:33 -05:00
Daniel Imfeld 2ce0592896 Add tests for new footnote functionality 2014-05-25 13:07:05 -05:00
Daniel Imfeld 628c02d37b Move footnote prefix to a better place 2014-05-24 14:28:37 -05:00
Daniel Imfeld ec41294bc4 Add footnote prefix option. Needs testing 2014-05-24 02:55:13 -05:00
Daniel Imfeld 5c12499aa1 Add ability to convert relative links to absolute 2014-05-18 01:28:15 -05:00
Martin Probst 7daa6e8b70 Move sanitization tests into their own file.
Also adds an explicit test for [link](...) syntax to be sanitized.
2014-05-03 14:37:23 +02:00
Vytautas Šaltenis 717a976f69 Merge pull request #76 from mprobst/self-closing
feat: Write self-closing tags with a />
2014-05-03 15:11:53 +03:00
Martin Probst 55d8f72dde feat: Write self-closing tags with a />
Adds tests for self-closing tags both for correct writing and for correct
sanitization, i.e. stripping attributes on them.
2014-05-03 13:59:10 +02:00
Martin Probst 11e042f6c1 Avoid raw mode parsing so that raw mode tags like <script> don't cause issues.
Certain tags like <script> but also <title> and others switch an HTML5 parser
into raw mode, which causes the rest of the HTML string to be always parsed as
text, including any elements or entities that we do want to support (e.g. <p>).

As we're going to escape any of the raw text elements anyway (it's e.g. script,
style, title, xmp, noframes, and a couple of others) we can just switch of raw
text parsing by disabling it after each starting tag.
2014-05-03 13:26:52 +02:00
Martin Probst 915f7049a0 Add a test for the correct handling of escaped entities in HTML.
The sanitization code does not retain any particular escaped entities - it
parses the HTML and thus loses the information on what entities were in the
original. The result is correct UTF-8 HTML though.
2014-05-03 12:34:16 +02:00
Martin Probst 8d2af3a21b Add support for a bunch more safe HTML element tags, and bring them into some order. 2014-05-01 22:08:32 +02:00
Vytautas Šaltenis aeb569ff46 Merge pull request #70 from mprobst/master
fix: Handle all different token types that the parser can emit (d'oh).
2014-05-01 21:59:07 +03:00
Martin Probst f9b7593e65 fix: Handle all different token types that the parser can emit (d'oh). 2014-05-01 20:55:53 +02:00
Vytautas Šaltenis 3dba5bc56e Merge branch 'master' of github.com:gihnius/blackfriday into gihnius-master
Conflicts:
	html.go
	inline_test.go
2014-05-01 21:43:42 +03:00
Vytautas Šaltenis b44be78459 Allow rel attribute in sanitizer
Fixes issue #68.
2014-05-01 20:49:49 +03:00
Martin Probst 41251715ad Use go.net/html's parser to sanitize HTML.
Use an HTML5 compliant parser that interprets HTML as a browser would to parse
the Markdown result and then sanitize based on the result.
Escape unrecognized and disallowed HTML in the result.
Currently works with a hard coded whitelist of safe HTML tags and attributes.
2014-04-27 23:40:44 +02:00
Vytautas Šaltenis 55bb56bf9b Merge pull request #55 from rtfb/master
Autolink fixes
2014-03-30 19:58:39 +03:00
Vytautas Šaltenis d643453f1e Merge pull request #50 from rtfb/master
Better protection against JavaScript injection
2014-03-30 19:52:13 +03:00
gihnius c9977f0c0b test: add nofollow ref for non internal links only 2014-03-21 11:17:31 +08:00
gihnius ecf59d4a55 add target blank attr 2014-03-21 10:52:46 +08:00
Graham Miller d71c759108 add HTML_NOFOLLOW_LINKS 2014-02-25 09:21:57 -05:00
Vytautas Šaltenis e5937643a9 Fix bug in autolink with trailing semicolon
In case the link ends with escaped html entity, the semicolon is a part
of the link and should not be interpreted as punctuation.
2014-02-17 21:09:04 +02:00
Vytautas Šaltenis b0bdfbec4c Fix bug in autolink overescaping html entities
If autolink encounters a link which already has an escaped html entity,
it would escape the ampersand again, producing things like these:
    &amp;  --> &amp;amp;
    &quot; --> &amp;quot;
This commit solves that by first looking for all entity-looking things
in the link and copying those ranges verbatim, only considering the rest
of the string for escaping.
Doesn't seem to have considerable performance impact.
The mailto: links are processed the old way.
2014-02-17 21:09:04 +02:00
Vytautas Šaltenis f2d43f69a4 Fix bug in autolink termination
Detect the end of link when it is immediately followed by an element.
2014-02-17 21:09:03 +02:00
Vytautas Šaltenis 9fc8c9d866 Fix bug with overzealous autolink processing
When the source Markdown contains an anchor tag with URL as link text
(i.e. <a href=...>http://foo.bar</a>), autolink converts that link text
into another anchor tag, which is nonsense. Detect this situation with
regexp and early exit autolink processing.
2014-02-17 21:09:03 +02:00
Vytautas Šaltenis 2f50a53f8e Rename HTML_SKIP_SCRIPT to HTML_SANITIZE_OUTPUT 2014-01-22 01:23:43 +02:00
Vytautas Šaltenis 55cd82008e Rewrite protection against JavaScript injection
This drops the naive approach at <script> tag stripping and resorts to
full sanitization of html. The general idea (and the regexps) is grabbed
from Stack Exchange's PageDown JavaScript Markdown processor[1]. Like in
PageDown, it's implemented as a separate pass over resulting html.

Includes a metric ton (but not all) of test cases from here[2]. Several
are commented out since they don't pass yet.

Stronger (but still incomplete) fix for #11.

[1] http://code.google.com/p/pagedown/wiki/PageDown
[2] https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
2014-01-22 01:14:35 +02:00
Darren Coxall 607ec21435 Tests for links when using HTML_SAFELINK 2013-12-19 10:00:47 +00:00
Russ Ross ca82b8db3a panic fix (issue #33) with test case 2013-09-11 12:47:43 -06:00
Alex Xandra Albert Sim da8f2753e2 Added test for link inside image 2013-09-09 12:51:20 +07:00
athom 31798e0eab add testcase for GFM autolink 2013-08-09 17:24:26 +08:00
moshee 3ea84a5811 parser no longer returns prematurely from empty footnote ref 2013-07-08 22:34:12 +00:00
moshee 1a73bae554 added slice bounds check 2013-07-08 06:54:25 +00:00
moshee c23099e5ee Implementation and some tests for inline footnotes. Also I noticed the list items had the wrong ids, that was silly of me. 2013-07-01 01:37:52 +00:00
moshee 7bdb82c53a new tests pass but old tests now fail... 2013-06-26 15:57:51 +00:00
moshee be082a1ef2 First attempt at supporting Pandoc-style footnotes. The existing tests have not broken but the new functionality does not work yet. 2013-06-25 01:18:47 +00:00
Vytautas Šaltenis 8226238289 Improve html element stripping code 2013-04-18 03:15:47 +03:00
Vytautas Šaltenis 85e2207cd0 Couple more tests 2013-04-14 01:42:47 +03:00
Vytautas Šaltenis dcaaa9b5dc More <script> stripping
Partially addresses issue #11.
2013-04-13 23:24:30 +03:00
Vytautas Šaltenis fb923cdb78 Add an option to strip <script> elements
Partially addresses issue #11.
2013-04-13 22:57:16 +03:00
Vytautas Šaltenis b79e720a36 Make isHtmlTag() case insensitive 2013-04-13 22:34:37 +03:00
Vytautas Šaltenis d5a8df164b Fix bug in isHtmlTag()
Fix what seems to be a typo. j should iterate through all tagname, so it
should be initialized to zero. The test exposes this bug.
2013-04-13 22:21:47 +03:00
Vytautas Šaltenis 90509d39d4 Make a way to parameterize inline tests
Expose extensions and html flags parameters so that tests could specify
what code paths they want to exercise.
2013-04-13 22:18:14 +03:00
Russ Ross e35b4b66cc bounds checking stress tests 2011-07-03 10:51:07 -06:00
Russ Ross ae9562f685 move whitespace stripping to parser, not renderers 2011-06-29 15:38:35 -06:00
Russ Ross 2aca667078 simplify inline callback interface 2011-06-29 13:00:54 -06:00
Russ Ross 873a60ad49 complete page rendering is now an option in the library 2011-06-29 10:08:56 -06:00
Russ Ross c969dff782 added simplified interface for common usage 2011-06-28 15:55:27 -06:00
Russ Ross fde2c60665 version number, few more options for command-line tool 2011-06-28 11:30:10 -06:00