This is the closest match to what a Tesseract integration could look like. It's amazing how flexible Linux is: I used Windows for well over 15 years, from XP to 11, and Linux still keeps surprising me after more than 6 months on it.
Anyway, I went off on a tangent. I'm vouching for this idea: a Tesseract integration built right into Spectacle, similar to what ShareX does on Windows with its own tools, would be great.
Notes on my solution:
- The Xorg part is commented out (kept for compatibility).
- Everything is stored in the Screenshots directory.
- I experimented with resizing the picture, but it doesn't seem to affect OCR quality.
- Tesseract outputs only Unix line feeds (LF), so the output goes through `sed`, which appends a carriage return (CR) to every line for Windows-style line endings.
- I think it is pretty easy to understand.
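The `sed` step mentioned above can be isolated as a filter and checked without taking a screenshot. This is a minimal sketch; the function name `to_crlf` is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: the CR-appending step as a standalone filter.
# Tesseract ends lines with LF (0x0A); this turns each line ending into CR LF.
CR=$(printf '\r')

to_crlf() {
    # "$" matches the end of each line; a literal CR is inserted there,
    # before the LF that sed writes back out
    sed "s/\$/${CR}/"
}
```

Usage: `tesseract --psm 6 -l ces "${PICTURE}" - | to_crlf`. Note that the result must be quoted when it is echoed: with an unquoted `${TEXT}`, word splitting turns the line breaks into spaces.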
#!/usr/bin/env bash
# The screenshot directory must exist before Spectacle writes into it
SCREENSHOT_DIR="${HOME}/__TMP__/Screenshots"
mkdir -p "${SCREENSHOT_DIR}"
# Compute the timestamp once so both files share the same name stem
STAMP=$(date +%Y%m%d_%H%M%S)
PICTURE="${SCREENSHOT_DIR}/Screenshot_${STAMP}_OCR.png"
PICTURE_RESIZED="${SCREENSHOT_DIR}/Screenshot_${STAMP}_OCR_resized.png"
CR=$(printf '\r')
# -r: rectangular region, -b: background (no GUI), -n: no notification, -o: output file
spectacle -r -o "${PICTURE}" -b -n 2>/dev/null
if [ -s "${PICTURE}" ]; then
# if [ "$XDG_SESSION_TYPE" = "x11" ]; then
# tesseract -l eng+ces "${PICTURE}" - | xclip -selection clipboard
# fi
# if [ "$XDG_SESSION_TYPE" = "wayland" ]; then
# mogrify -modulate 100,0 -resize 400% png:- ${TMPFILE} | tesseract -l eng+ces $TMPFILE - | wl-copy
# I'm trying to resize the picture for better OCR by Tesseract, but maybe it's not necessary
convert "${PICTURE}" -resize 400% "${PICTURE_RESIZED}"
# Tesseract determines line feeds perfectly fine, but it only inserts
# the Line Feed character (0x0A) and not the carriage return character that
# a Windows text file expects (0x0D 0x0A)
TEXT="$(tesseract --psm 6 -l ces "${PICTURE_RESIZED}" - | sed "s/\$/${CR}/")"
# Quote TEXT so the line breaks survive (unquoted, they collapse into spaces)
printf '%s\n' "${TEXT}" | wl-copy
# fi
kdialog --passivepopup "${TEXT}" 7 --title "OCR"
fi
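The commented-out branches above choose between `xclip` on Xorg and `wl-copy` on Wayland. A minimal sketch of that choice as reusable functions, assuming the session sets `XDG_SESSION_TYPE` and that `xclip` or `wl-clipboard` is installed; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: pick the clipboard tool from the session type.
# Needs xclip (Xorg) or wl-clipboard (Wayland).

# Print the clipboard command for the current session; fail for other types.
clipboard_cmd() {
    case "${XDG_SESSION_TYPE:-}" in
        wayland) echo "wl-copy" ;;
        x11) echo "xclip -selection clipboard" ;;
        *) return 1 ;;
    esac
}

# Copy stdin to the clipboard with the command chosen above.
copy_to_clipboard() {
    local line
    line=$(clipboard_cmd) || {
        echo "unsupported session type: ${XDG_SESSION_TYPE:-unset}" >&2
        return 1
    }
    local -a cmd
    # Split the command string into words (read does no glob expansion)
    read -ra cmd <<< "$line"
    "${cmd[@]}"
}
```

Usage: `printf '%s\n' "${TEXT}" | copy_to_clipboard`, which replaces both commented-out branches with one call.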
Please bring this feature. It can be optional and disabled by default so that it doesn't consume resources for people who don't need it.
Thanks for this.
I set `LANGUAGES=jpn+eng` and added `-l $LANGUAGES` to the `tesseract` command. It dynamically selects the language based on the highest confidence, and thanks to this, it now works even better than the PowerToys implementation I was used to.
(So far, OCR accuracy is better with Japanese than with English.)
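With several languages in play, it can help to check that every requested language pack is installed before running OCR. A sketch under these assumptions: `missing_langs` is a hypothetical helper, and it reads the installed language names from stdin, one per line:

```bash
#!/usr/bin/env bash
# Sketch: report which languages of a "+"-separated Tesseract language spec
# (e.g. "jpn+eng") are missing. Installed names are read from stdin, one per line.
# Requires bash 4+ for the associative array.
missing_langs() {
    local spec="$1" name lang
    local -A installed=()
    while IFS= read -r name; do
        [ -n "$name" ] && installed["$name"]=1
    done
    local -a wanted
    # Split the spec on "+" without glob expansion
    IFS='+' read -ra wanted <<< "$spec"
    for lang in "${wanted[@]}"; do
        [ -n "$lang" ] || continue
        [ -n "${installed[$lang]:-}" ] || echo "$lang"
    done
}
```

For example, `tesseract --list-langs | tail -n +2 | missing_langs "jpn+eng"` prints any pack that still needs installing. The `tail` assumes `tesseract --list-langs` prints one header line before the list, as current versions do.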
Updated version of @waldauf's script:
- Deletes the temporary files after use
- Uses `magick` instead of the deprecated `convert` command
#!/usr/bin/env bash
# Compute the timestamp once so both files share the same name stem
STAMP=$(date +%Y%m%d_%H%M%S)
PICTURE="${HOME}/_temp_Screenshot_${STAMP}_OCR.png"
PICTURE_RESIZED="${HOME}/_temp_Screenshot_${STAMP}_OCR_resized.png"
CR=$(printf '\r')
spectacle -r -o "${PICTURE}" -b -n 2>/dev/null
if [ -s "${PICTURE}" ]; then
# Resize picture for better OCR processing (optional)
magick "${PICTURE}" -resize 400% "${PICTURE_RESIZED}"
# Perform OCR and format text for Windows compatibility
TEXT="$(tesseract --psm 6 -l ces "${PICTURE_RESIZED}" - | sed "s/\$/${CR}/")"
# Copy to the clipboard as in the original script (requires wl-clipboard)
printf '%s\n' "${TEXT}" | wl-copy
# Display extracted text
kdialog --passivepopup "${TEXT}" 7 --title "OCR"
fi
# Cleanup temporary files (also removes an empty file left by a cancelled capture)
rm -f "${PICTURE}" "${PICTURE_RESIZED}"
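An `rm -f` at the end of the script only runs if execution reaches that line. A sketch of an alternative, assuming a private directory from `mktemp -d` is acceptable instead of `$HOME`: an `EXIT` trap removes it however the script ends. `make_ocr_workdir` and `WORKDIR` are hypothetical names:

```bash
#!/usr/bin/env bash
# Sketch: keep the temporary screenshots in a private directory and remove it
# on exit. Sets WORKDIR, PICTURE and PICTURE_RESIZED.
# Note: this replaces any EXIT trap the script already had.
make_ocr_workdir() {
    WORKDIR=$(mktemp -d "${TMPDIR:-/tmp}/spectacle-ocr.XXXXXX") || return 1
    # Single quotes: $WORKDIR is expanded when the trap fires, not now
    trap 'rm -rf -- "$WORKDIR"' EXIT
    PICTURE="$WORKDIR/ocr.png"
    PICTURE_RESIZED="$WORKDIR/ocr_resized.png"
}
```

Call `make_ocr_workdir` at the top of the script instead of setting `PICTURE` and `PICTURE_RESIZED` by hand; the explicit `rm -f` then becomes unnecessary.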
This is indeed a great showcase of the power of Linux.
This version uses Kate to display the text instead of a dialog. I find it more convenient, plus you don't need the `wl-clipboard` or `kdialog` dependencies.
#!/usr/bin/env bash
PICTURE="$HOME/_temp_Screenshot_$(date +%Y%m%d_%H%M%S)_OCR.png"
PICTURE_RESIZED="$HOME/_temp_Screenshot_$(date +%Y%m%d_%H%M%S)_OCR_resized.png"
CR=$(printf '\r')
spectacle -r -o "$PICTURE" -b -n 2>/dev/null
if [ -s "$PICTURE" ]; then
# Resize picture for better OCR processing (optional)
magick "$PICTURE" -resize 400% "$PICTURE_RESIZED"
# Perform OCR and format text for Windows compatibility
TEXT="$(tesseract --psm 6 "$PICTURE_RESIZED" - | sed "s/$/$CR/")"
# Cleanup temporary files
rm -f "$PICTURE" "$PICTURE_RESIZED"
# Open text in Kate
echo "$TEXT" | kate -i
fi
Hey gbyte
Thanks for your version of the script. I changed it to suit my needs and it works great.
Vektor
I created a C++ program that uses Spectacle to take a screenshot and Tesseract OCR to extract the text. The extracted text is then displayed in a window built with Qt 6. Apart from Qt 6, the only dependencies are Spectacle, Tesseract, and the relevant language packs. You can also pass in language parameters.
Here is the C++ code (main.cpp):
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>
#include <QApplication>
#include <QClipboard>
#include <QCommandLineParser>
#include <QDateTime>
#include <QDir>
#include <QFile>
#include <QFileDialog>
#include <QHBoxLayout>
#include <QLabel>
#include <QMessageBox>
#include <QProcess>
#include <QPushButton>
#include <QTextEdit>
#include <QTextStream>
#include <QVBoxLayout>
#include <QWidget>
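// Run Spectacle in the background (-b), capture a region (-r) without a
// notification (-n) and write it to outputPath (-o).
bool takeScreenshot(const QString& outputPath) {
// Remove a screenshot left over from a previous run, so a cancelled
// capture cannot silently OCR the old image
QFile::remove(outputPath);
int exitCode = QProcess::execute("spectacle", QStringList()
<< "-b" << "-r" << "-n" << "-o" << outputPath);
return exitCode == 0 && QFile::exists(outputPath);
}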
struct OcrResult {
QString text;
bool success;
QString errorMessage;
};
OcrResult extractText(const QString& imagePath, const QString& language) {
OcrResult result;
result.success = true;
tesseract::TessBaseAPI* ocr = new tesseract::TessBaseAPI();
if (ocr->Init(nullptr, language.toUtf8().constData())) {
delete ocr;
result.success = false;
result.errorMessage =
"Error initializing Tesseract OCR for language: " + language;
return result;
}
Pix* image = pixRead(imagePath.toUtf8().constData());
if (!image) {
ocr->End();
delete ocr;
result.success = false;
result.errorMessage = "Failed to load image";
return result;
}
ocr->SetImage(image);
char* outText = ocr->GetUTF8Text();
result.text = QString::fromUtf8(outText);
delete[] outText;
pixDestroy(&image);
ocr->End();
delete ocr;
return result;
}
int main(int argc, char* argv[]) {
QApplication app(argc, argv);
QCommandLineParser parser;
parser.setApplicationDescription("Extract text from spectacle screenshots using OCR");
parser.addHelpOption();
QCommandLineOption langOption(
QStringList() << "lang",
"Language(s) for OCR (e.g., eng, hin, or eng+hin for multiple languages)",
"language", "eng");
parser.addOption(langOption);
parser.process(app);
QString language = parser.value(langOption);
QWidget window;
window.setWindowTitle("Spectacle Screenshot OCR - Language: " + language);
window.resize(500, 400);
QVBoxLayout* layout = new QVBoxLayout();
QLabel* label = new QLabel();
layout->addWidget(label);
QTextEdit* textEdit = new QTextEdit();
textEdit->setMinimumHeight(100);
layout->addWidget(textEdit);
QWidget* buttonContainer = new QWidget();
QHBoxLayout* buttonLayout = new QHBoxLayout(buttonContainer);
QPushButton* copyButton = new QPushButton("Copy Text");
QPushButton* saveButton = new QPushButton("Save Text");
QPushButton* saveImageButton = new QPushButton("Save Image");
buttonLayout->addWidget(copyButton);
buttonLayout->addWidget(saveButton);
buttonLayout->addWidget(saveImageButton);
layout->addWidget(buttonContainer);
window.setLayout(layout);
QString tempPath = QDir::tempPath() + "/screenshot.png";
QObject::connect(copyButton, &QPushButton::clicked, [&]() {
if (!textEdit->toPlainText().isEmpty()) {
QApplication::clipboard()->setText(textEdit->toPlainText());
label->setText("Text copied to clipboard");
}
else {
label->setText("No text to copy");
}
});
QObject::connect(saveButton, &QPushButton::clicked, [&]() {
if (!textEdit->toPlainText().isEmpty()) {
QString fileName = QFileDialog::getSaveFileName(
&window, "Save OCR Text", QDir::homePath(),
"Text Files (*.txt);;All Files (*)");
if (!fileName.isEmpty()) {
QFile file(fileName);
if (file.open(QIODevice::WriteOnly | QIODevice::Text)) {
QTextStream out(&file);
out << textEdit->toPlainText();
file.close();
label->setText("Text saved to file");
}
else {
label->setText("Failed to save file");
QMessageBox::critical(&window, "Error", "Failed to save the file");
}
}
}
else {
label->setText("No text to save");
}
});
QObject::connect(saveImageButton, &QPushButton::clicked, [&]() {
QString timestamp = QDateTime::currentDateTime().toString("yyyyMMdd_hhmmss");
QString defaultImageName = QDir::homePath() + "/Screenshot_" + timestamp + ".png";
QString imageFileName = QFileDialog::getSaveFileName(
&window, "Save Screenshot", defaultImageName,
"Image Files (*.png);;All Files (*)");
if (!imageFileName.isEmpty()) {
// QFile::copy() never overwrites; the dialog has already asked the user
// to confirm replacing an existing file, so remove it first
if (QFile::exists(imageFileName))
QFile::remove(imageFileName);
if (QFile::copy(tempPath, imageFileName))
label->setText("Screenshot saved successfully");
else {
label->setText("Failed to save screenshot");
QMessageBox::critical(&window, "Error", "Failed to save the screenshot file");
}
}
});
if (takeScreenshot(tempPath)) {
OcrResult result = extractText(tempPath, language);
if (!result.success) {
textEdit->setText("");
label->setText(result.errorMessage);
}
else {
textEdit->setText(result.text);
label->setText(
"Text extracted successfully.");
}
window.show();
}
else {
textEdit->setText("");
label->setText("Error occurred while taking screenshot");
window.show();
QMessageBox::critical(&window, "Error",
"Failed to launch Spectacle or take screenshot");
}
return app.exec();
}
To build it, create a file named `simple.pro` next to `main.cpp` with the following contents:
QT += core widgets gui
CONFIG += c++17
TARGET = spectacle-ocr-screenshot
TEMPLATE = app
SOURCES += main.cpp
# Use pkg-config to find Tesseract and Leptonica
unix:!macx {
CONFIG += link_pkgconfig
PKGCONFIG += tesseract lept
}
And then make it:
qmake6 simple.pro
make
You need `qt6-base` and `tesseract` installed.
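Before running `qmake6`, a quick check can tell you what is missing. A sketch assuming `pkg-config` is installed; the module names `tesseract` and `lept` are the ones `simple.pro` asks for, and the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Print each pkg-config module (given as arguments) that pkg-config cannot find.
missing_pkgconfig_modules() {
    local mod
    for mod in "$@"; do
        pkg-config --exists "$mod" 2>/dev/null || echo "$mod"
    done
}

# Print everything simple.pro needs that appears to be missing.
check_build_deps() {
    command -v qmake6 >/dev/null 2>&1 || echo "qmake6"
    missing_pkgconfig_modules tesseract lept
}
```

If `check_build_deps` prints nothing, `qmake6 simple.pro && make` should be able to find its dependencies.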
You can place the compiled binary somewhere in your PATH and assign it a keyboard shortcut for easy access.
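A sketch of the install step, assuming `~/.local/bin` as the target (it is on `PATH` in many, but not all, setups) and GNU `install`; the function name is hypothetical. The installed command can then be bound to a shortcut in System Settings:

```bash
#!/usr/bin/env bash
# Copy a binary into a bin directory (default ~/.local/bin) with mode 755,
# creating the directory if needed, and warn if that directory is not on PATH.
install_ocr_binary() {
    local src="$1"
    local bindir="${2:-$HOME/.local/bin}"
    # GNU install: -D creates missing leading directories, -m sets the mode
    install -D -m 755 "$src" "$bindir/$(basename "$src")" || return 1
    case ":$PATH:" in
        *":$bindir:"*) ;;
        *) echo "note: $bindir is not on PATH" >&2 ;;
    esac
}
```

Usage: `install_ocr_binary ./spectacle-ocr-screenshot` after `make` has finished.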
You can find more details on my GitHub: https://github.com/funinkina/spectacle-ocr-screenshot/
Very nice!
Here is the repository as a link.
You could write
auto ocr = std::make_unique<tesseract::TessBaseAPI>();
(with `#include <memory>`) and not have to care about the `delete` anymore.