This will properly extract the text used to generate MathJax expressions (in both inline and block display modes) and remove all the cruft that MathJax adds to the DOM. Internal ref - t/135307
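The extraction can be sketched roughly as follows. This is not the code from this change. It assumes MathJax 2.x-style output, where each expression is rendered as preview/frame elements (class names such as `MathJax_Preview`, `MathJax`, `MathJax_Display`) next to a `<script type="math/tex">` element that holds the original source, with `; mode=display` marking block math. The `SimpleNode` type is a hypothetical stand-in for DOM nodes that keeps the sketch runnable without a browser.

```typescript
// Hypothetical stand-in for a DOM node; real code would walk the browser DOM.
interface SimpleNode {
  tag: string; // element name, or "#text" for text nodes
  className?: string;
  type?: string; // the `type` attribute of <script> elements
  text?: string; // text content for "#text" nodes and <script> elements
  children?: SimpleNode[];
}

// Assumed MathJax 2.x class names for rendered output (the "cruft" to drop).
const CRUFT_CLASSES = ["MathJax_Preview", "MathJax", "MathJax_Display", "MathJax_CHTML"];

function isCruft(node: SimpleNode): boolean {
  if (!node.className) return false;
  return node.className.split(/\s+/).some((c) => CRUFT_CLASSES.includes(c));
}

// MathJax keeps the original TeX in <script type="math/tex"> (inline)
// or <script type="math/tex; mode=display"> (block).
function isMathScript(node: SimpleNode): boolean {
  return node.tag === "script" && (node.type ?? "").startsWith("math/tex");
}

// Re-wrap the TeX source in delimiters that match its display mode.
function toTex(node: SimpleNode): string {
  const src = node.text ?? "";
  const display = (node.type ?? "").includes("mode=display");
  return display ? `$$${src}$$` : `$${src}$`;
}

// Walk the tree: source scripts become delimited TeX, rendered output is dropped,
// and plain text is kept as-is.
function extractText(node: SimpleNode): string {
  if (isMathScript(node)) return toTex(node);
  if (isCruft(node)) return "";
  if (node.tag === "#text") return node.text ?? "";
  return (node.children ?? []).map(extractText).join("");
}
```

Checking the `<script>` element before the class names means the TeX source is recovered even though it sits directly beside the rendered markup that gets discarded.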
discourse-common/(utils|lib)
discourse/lib
discourse-common/config/environment