Force UTF-8 in Filedotto’s Tika handler:
Edit tika-config.xml:
# Install Tesseract 5+
apt-get install tesseract-ocr tesseract-ocr-eng

# JVM system properties for OCR
-Dtika.ocr.language=eng
-Dtika.ocr.path=/usr/bin/tesseract
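Before restarting, it may help to confirm that the binary at the configured path really is Tesseract 5 or newer. The following is only a rough sketch: the helper names parse_major_version and tesseract_major_version are made up for illustration, and it assumes the binary prints a banner of the form "tesseract 5.x.y" when run with --version.

```python
import re
import shutil
import subprocess


def parse_major_version(version_output: str):
    """Return the major version from `tesseract --version` output, or None."""
    # Hypothetical helper: assumes a banner like "tesseract 5.3.0" (optionally "v5.3.0").
    match = re.search(r"tesseract\s+v?(\d+)", version_output)
    return int(match.group(1)) if match else None


def tesseract_major_version(binary: str = "/usr/bin/tesseract"):
    """Run `<binary> --version` and return its major version, or None if unavailable."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=False
    )
    # Some older builds print the banner to stderr, so inspect both streams.
    return parse_major_version(result.stdout + result.stderr)


if __name__ == "__main__":
    major = tesseract_major_version()
    if major is None:
        print("tesseract not found at /usr/bin/tesseract")
    elif major < 5:
        print(f"tesseract {major}.x found; this setup expects 5+")
    else:
        print(f"tesseract {major}.x OK")
```

If the script reports a missing or older binary, the apt-get step above is the first thing to revisit, since the distribution's package may ship an earlier major version.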