Alexander Dickson

Replacing text with JavaScript

Imagine for a moment you are tasked with writing some code to highlight the search terms that were used to find a page. You need to do it on the client side because you can’t invalidate the server-side cache based on the referrer.

Once we have parsed the referrer for the search terms, how would we go about matching the term in the page and wrapping it with a span in JavaScript?
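For completeness, the referrer-parsing step assumed above might look something like the sketch below. The helper name getSearchTerms and the "q" query parameter are assumptions for illustration only; the parameter a given search engine uses (if it exposes the query at all) may differ.

```javascript
// Hypothetical helper: extracts search terms from a referrer URL.
// Assumes the query lives in a "q" parameter unless told otherwise.
function getSearchTerms(referrer, paramName) {
    var query;

    try {
        query = new URL(referrer).searchParams.get(paramName || "q");
    } catch (e) {
        // An empty or malformed referrer cannot be parsed as a URL.
        return [];
    }

    // Split the decoded query on whitespace and drop empty pieces.
    return query ? query.trim().split(/\s+/).filter(Boolean) : [];
}

// In the browser this would be called as:
// var terms = getSearchTerms(document.referrer);
```

Each returned term can then be fed to the highlighting code as searchTerm.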

The Wrong Way to do it

Sometimes, you see code like this being suggested…

document.body.innerHTML = document.body.innerHTML.replace(new RegExp("\\b" + searchTerm + "\\b", "g"),
                                                          "<span class='search-term'>$&</span>"); 

This has a number of problems…

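  1. The browser has to serialise the whole body to HTML and then parse it back, which rebuilds every element and discards event listeners and other state attached to the old nodes.
  2. This regular expression can match anywhere in the serialised HTML, including tag names and attributes, producing invalid markup.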

Please don’t ever use this. The risk of breakage is very high on any real-world page.
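To see the second problem without touching a real page, the same replacement can be run on a plain string. The markup and the search term here are made up for illustration.

```javascript
// Illustrative markup: the word "class" appears both as an attribute
// name and in the visible text.
var html = '<p class="note">A note about class names.</p>';
var searchTerm = "class";

var broken = html.replace(new RegExp("\\b" + searchTerm + "\\b", "g"),
                          "<span class='search-term'>$&</span>");

console.log(broken);
// <p <span class='search-term'>class</span>="note">A note about <span class='search-term'>class</span> names.</p>
```

The attribute name was wrapped as well as the visible word, so the opening p tag is no longer valid.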

Doing it The Right Way

Essentially, the right way to do it is…

  1. Iterate over all text nodes.
  2. Find the substring in text nodes.
  3. Split it at the offset.
  4. Insert a span element in between the split.
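Steps 2 to 4 can be sketched for a single text node and a plain string as follows. This is only a minimal illustration: wrapFirstMatch is a hypothetical name, it handles just the first occurrence, and the optional doc parameter exists only so a document other than the global one can be supplied.

```javascript
// Hypothetical helper: wraps the first occurrence of `term` inside
// `textNode` in a span. Returns the span, or null if there is no match.
function wrapFirstMatch(textNode, term, doc) {
    doc = doc || document;

    // Step 2: find the substring in the text node.
    var offset = textNode.data.indexOf(term);
    if (offset === -1) {
        return null;
    }

    // Step 3: split at the offset. `textNode` keeps the text before the
    // match and `rest` holds the match plus everything after it.
    var rest = textNode.splitText(offset);

    // Remove the matched text from `rest`; the span will carry it instead.
    rest.data = rest.data.slice(term.length);

    // Step 4: insert a span in between the two halves.
    var span = doc.createElement("span");
    span.className = "search-term";
    span.textContent = term;
    rest.parentNode.insertBefore(span, rest);

    return span;
}
```

Given a text node containing "find the needle here", wrapFirstMatch(node, "needle") leaves three siblings: the text "find the ", the span, and the text " here".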

Putting it all together

Let’s build a generic function that takes a reference to a DOM element, iterates over its text nodes, runs a regex on them and executes a callback.

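var matchText = function(node, regex, callback, excludeElements) {

    excludeElements || (excludeElements = ['script', 'style', 'iframe', 'canvas']);

    var child = node.firstChild,
        consumed;

    // A while loop (rather than do...while) copes with elements that have no children.
    while (child) {

        switch (child.nodeType) {

        case 1:
            // Element: recurse into it unless it is on the exclusion list.
            if (excludeElements.indexOf(child.tagName.toLowerCase()) === -1) {
                matchText(child, regex, callback, excludeElements);
            }
            break;

        case 3:
            // Text node: replace() is used only to iterate over the matches.
            // Offsets are relative to the original string, so `consumed` tracks
            // how many of its characters have already been split off.
            consumed = 0;
            child.data.replace(regex, function(all) {
                var args = [].slice.call(arguments),
                    // Assumes no named capture groups, which append an extra argument.
                    offset = args[args.length - 2],
                    newTextNode = child.splitText(offset - consumed);

                // Drop the matched text; the callback decides what replaces it.
                newTextNode.data = newTextNode.data.substr(all.length);
                consumed = offset + all.length;

                callback.apply(window, [child].concat(args));

                child = newTextNode;

            });
            break;

        }

        child = child.nextSibling;

    }

    return node;

};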

…and then we invoke it…

matchText(document.getElementsByTagName("article")[0], new RegExp("\\b" + searchTerm + "\\b", "g"), function(node, match, offset) {
    var span = document.createElement("span");
    span.className = "search-term";
    span.textContent = match;
    node.parentNode.insertBefore(span, node.nextSibling); 
});
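One caveat with the call above: searchTerm is interpolated into the pattern as is, so any regex metacharacters in it are interpreted rather than matched literally. A minimal sketch of one way to guard against that, where escapeRegExp is a hypothetical helper and not part of the original code:

```javascript
// Hypothetical helper: escapes characters that are special in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var searchTerm = "a.b";
var unescaped = new RegExp("\\b" + searchTerm + "\\b", "g");
var escaped = new RegExp("\\b" + escapeRegExp(searchTerm) + "\\b", "g");

console.log("axb".match(unescaped)); // matches "axb": the dot matched any character
console.log("axb".match(escaped));   // null
console.log("a.b".match(escaped));   // matches "a.b"
```

Note that \b only matches next to a word character, so terms that begin or end with punctuation would need a different boundary check.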

How does it work?

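We walk the child nodes by following nextSibling rather than indexing into childNodes. The script adds siblings as it goes, and because childNodes is a live list, a simple for loop with an index would go on to visit the spans it had just inserted, match the text inside them again, and get trapped in an infinite loop.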

While iterating, we check whether the node is an element or a text node. If it is an element, we first make sure it isn’t on the exclusion list (elements such as script and style are skipped because iterating over their text nodes is unnecessary), and then call the function again with that element as the new context.

If the node is a text node, the fun begins. We run the supplied regex against the text node’s data, and in the replace() callback we split the text node at the match’s offset with splitText() and trim the matched text off the start of the new node. The callback we passed in then does the job of inserting the new element, which in our example is a span containing the matched substring.
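The arguments that replace() hands to its callback can be inspected on a plain string. The sample text and pattern below are made up; the point is that the offset is always the second to last argument, however many capture groups the regex has (a regex with named capture groups receives one more trailing argument, so this indexing assumes there are none).

```javascript
var seen = [];

"one fish two fish".replace(/(f)ish/g, function(all) {
    var args = [].slice.call(arguments);

    seen.push({
        match: all,
        offset: args[args.length - 2], // second to last: where the match starts
        string: args[args.length - 1]  // last: the full original string
    });
});

console.log(seen[0].offset, seen[1].offset); // 4 13
```

Both offsets are measured from the start of the original string, which is worth keeping in mind when a text node contains more than one match.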

innerHTML bad, text nodes good!

Next time you reach for your innerHTML to do some text replacements, please do yourself, your visitors and your browser a favour by using text nodes.


Want to discuss this post? Just mention me @alexdickson.