OCR for Spectacle

xNefas · December 17, 2024, 12:29pm

This is the closest match to how an integration with Tesseract could work. It's amazing how flexible Linux is: I used Windows for well over 15 years, from XP to 11, and Linux never stops surprising me even after more than 6 months on it.
Ahem, I went off on a tangent. I'm vouching for this idea: an integration with Tesseract right in Spectacle, similar to what ShareX does on Windows with its own tools, would be great.

waldauf · December 17, 2024, 2:38pm

My solution is based on:

The Xorg part is commented out (kept for compatibility).
Everything is stored in the Screenshots directory.
I experimented with resizing the picture, but it doesn't seem to affect OCR quality.
Tesseract ends lines with a bare line feed only, so sed appends a carriage return to each line for Windows-style line endings.
I think it is pretty easy to understand.

#!/usr/bin/env bash

PICTURE="${HOME}/__TMP__/Screenshots/Screenshot_$(date +%Y%m%d_%H%M%S)_OCR.png"
PICTURE_RESIZED="${HOME}/__TMP__/Screenshots/Screenshot_$(date +%Y%m%d_%H%M%S)_OCR_resized.png"
CR=$(printf '\r')

spectacle -r -o "${PICTURE}" -b -n 2>/dev/null

if [ -s "${PICTURE}" ]; then
    # if [ "$XDG_SESSION_TYPE" = "x11" ]; then
    #     tesseract -l eng+ces $TMPFILE - | xclip -selection clipboard
    # fi

    # if [ "$XDG_SESSION_TYPE" = "wayland" ]; then
        # mogrify -modulate 100,0 -resize 400% png:- ${TMPFILE} | tesseract -l eng+ces $TMPFILE - | wl-copy
        # Resize the picture in the hope of better OCR results; this may not be necessary.
        convert "${PICTURE}" -resize 400% "${PICTURE_RESIZED}"
        # Tesseract detects line breaks perfectly fine, but it only inserts
        #  the line feed character (0x0A) and not the carriage return that
        #  a Windows text file expects (0x0D 0x0A).
        TEXT="$(tesseract --psm 6 -l ces "${PICTURE_RESIZED}" - | sed "s/\$/${CR}/")"
        # Quote the variable so the line breaks survive word splitting.
        printf '%s\n' "${TEXT}" | wl-copy
    # fi
    kdialog --passivepopup "${TEXT}" 7 --title "OCR"
fi
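
The line-ending step can be tried on its own, without Spectacle or Tesseract. The following is a minimal sketch: `to_crlf` is a hypothetical helper name, and the sample text is a made-up stand-in for OCR output. It also shows why the result has to be quoted when it is passed on, since an unquoted expansion is word-split and every line break turns into a space.

```shell
#!/usr/bin/env bash
# A literal carriage return; command substitution only strips trailing newlines.
CR=$(printf '\r')

# Hypothetical helper: append a carriage return to every line (LF -> CRLF).
to_crlf() {
    sed "s/\$/${CR}/"
}

# Made-up sample text standing in for Tesseract output.
sample=$(printf 'first line\nsecond line\n' | to_crlf)

# Quoted expansion: both lines are preserved.
printf '%s\n' "$sample" | wc -l

# Unquoted expansion: word splitting joins everything into a single line.
echo $sample | wc -l
```

The first count reports two lines and the second reports one, which is the difference between `"${TEXT}"` and `${TEXT}` when piping text to a clipboard tool.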

cleanerspam · January 3, 2025, 7:20am

Please add this feature. It could be optional and disabled by default so that it doesn't consume resources for people who don't need it.

AtaGunZ · January 13, 2025, 7:04pm

Thanks for this.

I specified “LANGUAGES=jpn+eng” and added “-l $LANGUAGES” to the tesseract part. It dynamically selects the language based on the highest confidence, and thanks to this, it now works even better than the PowerToys implementation I was used to.

(The OCR accuracy is better with Japanese than with English so far.)
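
As a rough sketch of that change, the language list can live in one variable near the top of the script and be passed to every Tesseract call. `ocr_image` is a hypothetical helper name used only for illustration; the `--psm 6` mode and the `-` output target are taken from the script earlier in the thread.

```shell
#!/usr/bin/env bash
# Languages to load, joined with "+". Each one needs its Tesseract
# language pack installed; "tesseract --list-langs" shows what is available.
LANGUAGES="jpn+eng"

# Hypothetical helper: run OCR on the image given as $1 and print the text.
ocr_image() {
    tesseract --psm 6 -l "${LANGUAGES}" "$1" -
}

# Usage inside the script (path variable as in the script above):
#   TEXT="$(ocr_image "${PICTURE_RESIZED}")"
```

Keeping the list in a variable means switching languages is a one-line edit rather than a change to each command.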

gbyte · February 18, 2025, 11:11am

Updated version of @waldauf's script:

Deletes the temporary files after use
Uses ‘magick’ instead of the deprecated ‘convert’ command

#!/usr/bin/env bash

PICTURE="${HOME}/_temp_Screenshot_$(date +%Y%m%d_%H%M%S)_OCR.png"
PICTURE_RESIZED="${HOME}/_temp_Screenshot_$(date +%Y%m%d_%H%M%S)_OCR_resized.png"
CR=$(printf '\r')

spectacle -r -o "${PICTURE}" -b -n 2>/dev/null

if [ -s "${PICTURE}" ]; then
  # Resize picture for better OCR processing (optional)
  magick "${PICTURE}" -resize 400% "${PICTURE_RESIZED}"
  
  # Perform OCR and format text for Windows compatibility
  TEXT="$(tesseract --psm 6 -l ces "${PICTURE_RESIZED}" - | sed "s/\$/${CR}/")"

  # Display extracted text
  kdialog --passivepopup "${TEXT}" 7 --title "OCR"

  # Cleanup temporary files
  rm -f "${PICTURE}" "${PICTURE_RESIZED}"
fi

This is indeed a great showcase of the power of Linux.

gbyte · February 18, 2025, 1:14pm

This version uses Kate to display the text instead of the dialog. I find it more convenient, and you need neither the wl-clipboard nor the kdialog dependency.

#!/usr/bin/env bash

PICTURE="$HOME/_temp_Screenshot_$(date +%Y%m%d_%H%M%S)_OCR.png"
PICTURE_RESIZED="$HOME/_temp_Screenshot_$(date +%Y%m%d_%H%M%S)_OCR_resized.png"
CR=$(printf '\r')

spectacle -r -o "$PICTURE" -b -n 2>/dev/null

if [ -s "$PICTURE" ]; then
  # Resize picture for better OCR processing (optional)
  magick "$PICTURE" -resize 400% "$PICTURE_RESIZED"

  # Perform OCR and format text for Windows compatibility
  TEXT="$(tesseract --psm 6 "$PICTURE_RESIZED" - | sed "s/$/$CR/")"

  # Cleanup temporary files
  rm -f "$PICTURE" "$PICTURE_RESIZED"

  # Open text in Kate
  echo "$TEXT" | kate -i
fi
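
One possible variation on the cleanup, not something the scripts above do: they write the images into `$HOME` and remove them with `rm -f` at the end, so an interrupted run can leave files behind. The sketch below assumes `mktemp -d` is available and uses an `EXIT` trap instead; the file names `capture.png` and `capture_resized.png` are made up for illustration.

```shell
#!/usr/bin/env bash
# Create a private scratch directory; stop if that fails.
WORKDIR=$(mktemp -d) || exit 1

# Remove the directory whenever the script exits, including after an error.
trap 'rm -rf "${WORKDIR}"' EXIT

# Illustrative file names inside the scratch directory.
PICTURE="${WORKDIR}/capture.png"
PICTURE_RESIZED="${WORKDIR}/capture_resized.png"

# The capture, resize and OCR steps from the script above would go here,
# reading and writing only these two paths. No explicit rm is needed.
```

With this layout the two timestamped names are no longer needed, because each run gets its own directory.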

vektor · February 18, 2025, 5:04pm

Hey gbyte

Thanks for your version of the script. I changed it to suit my needs and it works great.

Vektor

funinkina · March 12, 2025, 3:16pm

I created a C++ program that uses Spectacle to take a screenshot and Tesseract OCR to extract the text. The extracted text is then displayed in a window built with Qt 6. The only dependencies are Spectacle, Tesseract, and the relevant language packs. You can also pass in language parameters.

Here is the C++ code:

#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>

#include <QCommandLineParser>
#include <QDir>
#include <QProcess>
#include <QTemporaryFile>
#include <QTimer>
#include <QClipboard>
#include <QApplication>
#include <QFileDialog>
#include <QLabel>
#include <QMessageBox>
#include <QPushButton>
#include <QTextEdit>
#include <QVBoxLayout>
#include <QWidget>
#include <QHBoxLayout>
#include <QDateTime>
#include <QFile>
#include <QTextStream>

bool takeScreenshot(const QString& outputPath) {
    int exitCode = QProcess::execute("spectacle", QStringList()
        << "-b" << "-r" << "-n" << "-o" << outputPath);
    return exitCode == 0;
}

struct OcrResult {
    QString text;
    bool success;
    QString errorMessage;
};

OcrResult extractText(const QString& imagePath, const QString& language) {
    OcrResult result;
    result.success = true;

    tesseract::TessBaseAPI* ocr = new tesseract::TessBaseAPI();

    if (ocr->Init(nullptr, language.toUtf8().constData())) {
        delete ocr;
        result.success = false;
        result.errorMessage =
            "Error initializing Tesseract OCR for language: " + language;
        return result;
    }

    Pix* image = pixRead(imagePath.toUtf8().constData());
    if (!image) {
        ocr->End();
        delete ocr;
        result.success = false;
        result.errorMessage = "Failed to load image";
        return result;
    }

    ocr->SetImage(image);

    char* outText = ocr->GetUTF8Text();
    result.text = QString::fromUtf8(outText);

    delete[] outText;
    pixDestroy(&image);
    ocr->End();
    delete ocr;

    return result;
}

int main(int argc, char* argv[]) {
    QApplication app(argc, argv);

    QCommandLineParser parser;
    parser.setApplicationDescription("Extract text from spectacle screenshots using OCR");
    parser.addHelpOption();

    QCommandLineOption langOption(
        QStringList() << "lang",
        "Language(s) for OCR (e.g., eng, hin, or eng+hin for multiple languages)",
        "language", "eng");
    parser.addOption(langOption);
    parser.process(app);

    QString language = parser.value(langOption);

    QWidget window;
    window.setWindowTitle("Spectacle Screenshot OCR - Language: " + language);
    window.resize(500, 400);

    QVBoxLayout* layout = new QVBoxLayout();

    QLabel* label = new QLabel();
    layout->addWidget(label);

    QTextEdit* textEdit = new QTextEdit();
    textEdit->setMinimumHeight(100);
    layout->addWidget(textEdit);

    QWidget* buttonContainer = new QWidget();
    QHBoxLayout* buttonLayout = new QHBoxLayout(buttonContainer);

    QPushButton* copyButton = new QPushButton("Copy Text");
    QPushButton* saveButton = new QPushButton("Save Text");
    QPushButton* saveImageButton = new QPushButton("Save Image");

    buttonLayout->addWidget(copyButton);
    buttonLayout->addWidget(saveButton);
    buttonLayout->addWidget(saveImageButton);
    layout->addWidget(buttonContainer);

    window.setLayout(layout);

    QString tempPath = QDir::tempPath() + "/screenshot.png";

    QObject::connect(copyButton, &QPushButton::clicked, [&]() {
        if (!textEdit->toPlainText().isEmpty()) {
            QApplication::clipboard()->setText(textEdit->toPlainText());
            label->setText("Text copied to clipboard");
        }
        else {
            label->setText("No text to copy");
        }
        });

    QObject::connect(saveButton, &QPushButton::clicked, [&]() {
        if (!textEdit->toPlainText().isEmpty()) {
            QString fileName = QFileDialog::getSaveFileName(
                &window, "Save OCR Text", QDir::homePath(),
                "Text Files (*.txt);;All Files (*)");

            if (!fileName.isEmpty()) {
                QFile file(fileName);
                if (file.open(QIODevice::WriteOnly | QIODevice::Text)) {
                    QTextStream out(&file);
                    out << textEdit->toPlainText();
                    file.close();
                    label->setText("Text saved to file");
                }
                else {
                    label->setText("Failed to save file");
                    QMessageBox::critical(&window, "Error", "Failed to save the file");
                }
            }
        }
        else {
            label->setText("No text to save");
        }
        });

    QObject::connect(saveImageButton, &QPushButton::clicked, [&]() {
        QString timestamp = QDateTime::currentDateTime().toString("yyyyMMdd_hhmmss");
        QString defaultImageName = QDir::homePath() + "/Screenshot_" + timestamp;
        QString imageFileName = QFileDialog::getSaveFileName(
            &window, "Save Screenshot", defaultImageName,
            "Image Files (*.png);;All Files (*)");
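        if (!imageFileName.isEmpty()) {
            // QFile::copy() does not overwrite, so remove an existing target
            // first (the save dialog has already asked for confirmation).
            if (QFile::exists(imageFileName))
                QFile::remove(imageFileName);
            if (QFile::copy(tempPath, imageFileName))
                label->setText("Screenshot saved successfully");
            else {
                label->setText("Failed to save screenshot");
                QMessageBox::critical(&window, "Error", "Failed to save the screenshot file");
            }
        }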
        });

    if (takeScreenshot(tempPath)) {
        OcrResult result = extractText(tempPath, language);

        if (!result.success) {
            textEdit->setText("");
            label->setText(result.errorMessage);
        }
        else {
            textEdit->setText(result.text);
            label->setText(
                "Text extracted successfully.");
        }
        window.show();
    }
    else {
        textEdit->setText("");
        label->setText("Error occurred while taking screenshot");
        window.show();
        QMessageBox::critical(&window, "Error",
            "Failed to launch Spectacle or take screenshot");
    }

    return app.exec();
}

Then, to build it, create a file named simple.pro with the following contents:

QT += core widgets gui

CONFIG += c++17

TARGET = spectacle-ocr-screenshot
TEMPLATE = app

SOURCES += main.cpp

# Use pkg-config to find Tesseract and Leptonica
unix:!macx {
    CONFIG += link_pkgconfig
    PKGCONFIG += tesseract lept
}

And then make it:

qmake6 simple.pro
make

You should have qt6-base and tesseract installed.
The compiled binary can be placed in the PATH and a keyboard shortcut can be set to easily access it.
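
A minimal sketch of the install step, assuming `~/.local/bin` is on the PATH (common, but not guaranteed on every distribution). `install_binary` is a hypothetical helper name; the binary name comes from the TARGET line in simple.pro and `--lang` is the option defined in the program above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a built binary into a per-user bin directory.
# $1 is the binary, $2 an optional target directory (default: ~/.local/bin).
install_binary() {
    src=$1
    dest_dir=${2:-"${HOME}/.local/bin"}
    mkdir -p "${dest_dir}" && install -m 755 "${src}" "${dest_dir}/"
}

# Usage, run from the build directory:
#   install_binary ./spectacle-ocr-screenshot
#   spectacle-ocr-screenshot --lang eng+hin
```

The second usage line is also the command to bind to a custom keyboard shortcut.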

You can find more details on my GitHub.
https:// github. com/funinkina/ spectacle-ocr-screenshot/
(Sorry for the spaces, it doesn't allow links.)

krake · March 12, 2025, 3:42pm

Very nice!

Here is the repository as a link

You could do

auto ocr = std::make_unique<tesseract::TessBaseAPI>();

and not have to care about the delete anymore (this needs #include <memory>).