Removing Non-UTF-8 Characters

While generating a PDF from a dynamically created HTML file, I found that the PDF generation failed because the HTML file contained non-UTF-8 characters.

To find these characters, I used the strings command, which keeps only runs of printable characters, with the -n 8 switch (a minimum run length of 8 characters) to strip out the non-UTF-8 characters:

strings -n 8 original.html > nonUTF.html

I was then able to compare the two HTML files to see where the non-UTF-8 characters appeared.
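
As a rough sketch of another way to locate the offending lines without creating a second file, grep can be asked directly for any byte outside printable ASCII and whitespace. This is a different technique from the strings comparison above, and the function name find_non_ascii_lines is hypothetical. The check is also deliberately broad: it flags every non-ASCII byte, including valid multi-byte UTF-8 characters, so some of the lines it reports may be harmless.

```shell
# Hypothetical helper (not from the original post): print, with line numbers,
# every line of a file containing a byte outside printable ASCII and whitespace.
find_non_ascii_lines() {
    # LC_ALL=C makes grep work byte by byte instead of decoding characters.
    # -n prefixes each matching line with its line number.
    # -a forces text mode so stray bytes do not get the file reported as binary.
    LC_ALL=C grep -n -a '[^[:print:][:space:]]' "$1"
}
```

Calling find_non_ascii_lines original.html prints each matching line as its line number, a colon, and the line itself. grep exits with status 1 when no such line is found.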