Properly truncate your HTML strings in Ruby

When you have some HTML content in your webapp and want to show an extract of it, your first thought may be something like "well, easy, Rails has a truncate helper, I can use it". Then you discover that it's not such a simple problem.

For example, if you want to truncate <p>Foo bar <b>baz quux</b></p> to 3 words and use ... as the ellipsis, what would your solution be? Please, take 10 seconds to think about it… I'm waiting…

So, your suggestion is probably one of these:

  1. <p>Foo bar <b>baz...</b></p>
  2. <p>Foo bar <b>baz</b></p>...
  3. <p>Foo bar <b>baz</b>...</p>

First observation: each of these strings is a well-formed XML fragment (i.e. opened tags are closed correctly), and the tag names are not counted in the 3 words. So you can't use a simple truncate meant for raw text.

The documentation of the Rails truncate helper says as much:

Care should be taken if text contains HTML tags or entities, because truncation may produce invalid HTML (such as unbalanced or incomplete tags).
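
To make the warning concrete, here is a minimal sketch of a naive word-based truncation in plain Ruby. The naive_truncate method is a hypothetical helper written for this illustration; it is not the Rails helper and only mimics the idea of cutting raw text after a number of words.

```ruby
# Hypothetical helper: cuts raw text after max_words whitespace-separated
# chunks, with no knowledge of HTML tags at all.
def naive_truncate(text, max_words, ellipsis = "...")
  words = text.split(" ")
  return text if words.length <= max_words

  words.first(max_words).join(" ") + ellipsis
end

puts naive_truncate("<p>Foo bar <b>baz quux</b></p>", 3)
# prints: <p>Foo bar <b>baz...
```

The <b> and <p> tags are never closed, which is exactly the kind of unbalanced output the documentation warns about.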

So, are there other gems for doing this? Yes, several.

But I'm not pleased with the results of these scripts. On the example above, they typically produce the first or second output, but I prefer the third one: I think the ellipsis should be inside the paragraph (<p>), but not in bold (<b>).

I also have some requirements that are not always respected by them:

  • Do not leave empty DOM elements like <p>Foo Bar <b></b></p>
  • Do not cut words (<p>Foo Bar <b>B...</b></p>)
  • The ellipsis can be customized and can contain HTML (<p>Foo bar <b>Baz</b><a href="/more">(...)</a></p>).
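
The requirements above can be sketched in plain Ruby. This is only an illustration under strong assumptions, not the implementation of any gem: truncate_html_sketch, INLINE_TAGS and VOID_TAGS are hypothetical names, the input is assumed to be well-formed, and a regular expression stands in for a real HTML parser (so comments, entities and attributes containing > are not handled).

```ruby
# Assumption: a small, incomplete list of tags the ellipsis should stay out of.
INLINE_TAGS = %w[a b em i span strong].freeze
# Assumption: a small, incomplete list of tags that never have a closing tag.
VOID_TAGS = %w[br hr img].freeze

def truncate_html_sketch(html, max_words, ellipsis = "...")
  out        = +""
  open_tags  = []
  remaining  = max_words
  keep_len   = 0   # length of the output right after the last kept word
  keep_stack = []  # tags that were open at that point

  # Split the input into tags and text runs.
  html.scan(/<[^>]+>|[^<]+/) do |token|
    if token.start_with?("</")
      open_tags.pop
      out << token
    elsif token.start_with?("<")
      name = token[/\A<([a-zA-Z][a-zA-Z0-9]*)/, 1]
      if name && !VOID_TAGS.include?(name) && !token.end_with?("/>")
        open_tags.push(name)
      end
      out << token
    else
      text_start = out.length
      words = token.scan(/\s*\S+/)          # whole words only, never cut
      kept  = words.first(remaining).join
      unless kept.empty?
        keep_len   = text_start + kept.length
        keep_stack = open_tags.dup
      end
      out << token
      remaining -= words.length
      next unless remaining < 0

      # Too many words: go back to the last kept word, which also drops
      # any element opened after it (no empty <b></b> left behind).
      closing = keep_stack.reverse
      inline  = closing.take_while { |tag| INLINE_TAGS.include?(tag) }
      outer   = closing.drop(inline.length)
      return out[0, keep_len] +
             inline.map { |tag| "</#{tag}>" }.join +
             ellipsis +
             outer.map { |tag| "</#{tag}>" }.join
    end
  end
  out
end

puts truncate_html_sketch("<p>Foo bar <b>baz quux</b></p>", 3)
# prints: <p>Foo bar <b>baz</b>...</p>
puts truncate_html_sketch("<p>Foo bar <b>baz quux</b></p>", 2)
# prints: <p>Foo bar...</p>
```

The ellipsis is inserted after the inline tags are closed and before the remaining ones, which gives the third output from the list at the top. A real HTML parser would be a more robust base than this regular expression.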

So I wrote my own gem: HTML truncator. Install it with gem install html_truncator and use it in your Ruby code.

But first, take a look at this example:

HTML_Truncator.truncate("<p>Lorem ipsum dolor sit amet.</p>", 3)
# => "<p>Lorem ipsum dolor...</p>"

HTML_Truncator.truncate("<p>Lorem ipsum dolor sit amet.</p>", 3, '<a href="/more">...</a>')
# => "<p>Lorem ipsum dolor<a href=\"/more\">...</a></p>"
