This is the closest match to how an integration with Tesseract could work. It’s amazing how flexible Linux is: I used Windows for well over 15 years, from XP to 11, and Linux still never stops surprising me after more than 6 months on it.
Ahem, I went on a tangent. I’m vouching for this idea: an integration with Tesseract right in Spectacle, similar to what ShareX does on Windows with its own tools, would be great.
My solution is based on:
- The Xorg part is commented out (kept for compatibility).
- Everything is stored in the Screenshots dir.
- I played with resizing the picture, but it doesn’t seem to affect OCR quality.
- Tesseract by default emits only UNIX newlines (LF), so there is sed processing to append a carriage return to each line.
- I think it is pretty easy to understand.
#!/usr/bin/env bash
PICTURE=${HOME}/__TMP__/Screenshots/Screenshot_$(date +%Y%m%d_%H%M%S)_OCR.png
PICTURE_RESIZED=${HOME}/__TMP__/Screenshots/Screenshot_$(date +%Y%m%d_%H%M%S)_OCR_resized.png
CR=$(printf '\r')
spectacle -r -o "${PICTURE}" -b -n 2>/dev/null
if [ -s "${PICTURE}" ]; then
# if [ "$XDG_SESSION_TYPE" = "x11" ]; then
# tesseract -l eng+ces $TMPFILE - | xclip -selection clipboard
# fi
# if [ "$XDG_SESSION_TYPE" = "wayland" ]; then
# mogrify -modulate 100,0 -resize 400% png:- ${TMPFILE} | tesseract -l eng+ces $TMPFILE - | wl-copy
# I'm trying to resize picture, for better OCR process by Tesseract. But maybe it's not necessary
convert "${PICTURE}" -resize 400% "${PICTURE_RESIZED}"
# Tesseract seems to determine line feeds perfectly fine, but it only inserts
# the Line Feed character (0x0A) and not the carriage return character that
# a Windows text file expects (0x0D 0x0A).
TEXT="$(tesseract --psm 6 -l ces "${PICTURE_RESIZED}" - | sed "s/\$/$CR/")"
# Quote the variable so the line breaks survive on the way to the clipboard
echo "${TEXT}" | wl-copy
# fi
kdialog --passivepopup "$(echo ${TEXT})" 7 --title "OCR"
fi
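To see what the sed step does in isolation, here is a minimal sketch. The function name to_crlf is made up for illustration, and it assumes sed and od are available as on a typical Linux install.

```shell
#!/usr/bin/env bash
# Append a carriage return to every line read from stdin (LF -> CRLF).
# to_crlf is a hypothetical helper name, not part of the script above.
to_crlf() {
    local cr
    cr=$(printf '\r')
    sed "s/\$/${cr}/"
}

# Feed two LF-terminated lines through the filter and dump the bytes,
# so the added \r before each \n is visible.
printf 'one\ntwo\n' | to_crlf | od -An -c
```

If the text is only ever pasted into Linux applications, this step can probably be dropped.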
Please bring this feature. It can be optional and disabled by default so that it does not consume resources for people who don’t need it.
Thanks for this.
I specified “LANGUAGES=jpn+eng” and added “-l $LANGUAGES” to the tesseract part. It dynamically selects the language based on the highest confidence, and thanks to this, it now works even better than the PowerToys implementation I was used to.
(The OCR accuracy is better with Japanese than with English so far.)
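A rough sketch of how that change could be guarded against missing language data. It relies on tesseract's --list-langs option. The helper name missing_langs is made up here, and I am assuming the installed names are printed one per line.

```shell
#!/usr/bin/env bash
# Languages passed to tesseract's -l option, joined with '+'.
LANGUAGES="jpn+eng"

# Print every language from a '+'-joined spec ($1) that does not appear in
# the installed-language list given on stdin (one name per line).
# missing_langs is a hypothetical helper name.
missing_langs() {
    local installed lang
    installed=$(cat)
    printf '%s\n' "$1" | tr '+' '\n' | while IFS= read -r lang; do
        printf '%s\n' "$installed" | grep -qxF -- "$lang" || printf '%s\n' "$lang"
    done
}

if command -v tesseract >/dev/null 2>&1; then
    # The header line of --list-langs never matches a language name exactly,
    # so it does not need to be stripped.
    missing=$(tesseract --list-langs 2>&1 | missing_langs "$LANGUAGES")
    [ -z "$missing" ] || echo "Missing language data: $missing" >&2
fi
```

In the scripts above, the OCR line would then use -l "$LANGUAGES" instead of -l ces.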
Updated version of @waldauf’s script:
- Deletes temp files after use
- Uses ‘magick’ instead of the deprecated ‘convert’ command
#!/usr/bin/env bash
PICTURE="${HOME}/_temp_Screenshot_$(date +%Y%m%d_%H%M%S)_OCR.png"
PICTURE_RESIZED="${HOME}/_temp_Screenshot_$(date +%Y%m%d_%H%M%S)_OCR_resized.png"
CR=$(printf '\r')
spectacle -r -o "${PICTURE}" -b -n 2>/dev/null
if [ -s "${PICTURE}" ]; then
# Resize picture for better OCR processing (optional)
magick "${PICTURE}" -resize 400% "${PICTURE_RESIZED}"
# Perform OCR and format text for Windows compatibility
TEXT="$(tesseract --psm 6 -l ces "${PICTURE_RESIZED}" - | sed "s/\$/${CR}/")"
# Display extracted text
kdialog --passivepopup "$(echo "${TEXT}")" 7 --title "OCR"
# Cleanup temporary files
rm -f "${PICTURE}" "${PICTURE_RESIZED}"
fi
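One possible variation on the cleanup, as a sketch only: keep the temp files in a directory from mktemp and remove it with an EXIT trap, so they are deleted even when a step fails partway. The names with_scratch_dir and capture_and_ocr are invented for this example.

```shell
#!/usr/bin/env bash
# Run a command with a fresh scratch directory appended as its last argument,
# and remove that directory afterwards, even if the command fails.
# with_scratch_dir is a hypothetical helper name.
with_scratch_dir() {
    (
        dir=$(mktemp -d) || exit 1
        trap 'rm -rf "$dir"' EXIT
        "$@" "$dir"
    )
}

# Hypothetical worker: capture a region into the scratch directory, then OCR it.
capture_and_ocr() {
    picture="$1/capture.png"
    spectacle -r -b -n -o "$picture" 2>/dev/null
    [ -s "$picture" ] || return 1
    tesseract --psm 6 "$picture" -
}

# Usage: TEXT=$(with_scratch_dir capture_and_ocr)
```

This also keeps the screenshots out of the home directory.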
This is indeed a great showcase of the power of Linux.
This version uses Kate to display the text instead of the dialog. I find it more convenient, plus you don’t need the wl-clipboard or kdialog dependencies.
#!/usr/bin/env bash
PICTURE="$HOME/_temp_Screenshot_$(date +%Y%m%d_%H%M%S)_OCR.png"
PICTURE_RESIZED="$HOME/_temp_Screenshot_$(date +%Y%m%d_%H%M%S)_OCR_resized.png"
CR=$(printf '\r')
spectacle -r -o "$PICTURE" -b -n 2>/dev/null
if [ -s "$PICTURE" ]; then
# Resize picture for better OCR processing (optional)
magick "$PICTURE" -resize 400% "$PICTURE_RESIZED"
# Perform OCR and format text for Windows compatibility
TEXT="$(tesseract --psm 6 "$PICTURE_RESIZED" - | sed "s/$/$CR/")"
# Cleanup temporary files
rm -f "$PICTURE" "$PICTURE_RESIZED"
# Open text in Kate
echo "$TEXT" | kate -i
fi
Hey gbyte
Thanks for your version of the script. I changed it to suit my needs and it works great.
Vektor
I created a C++ program that uses Spectacle to take a screenshot and Tesseract OCR to extract the text. The extracted text is then displayed in a window made with Qt 6. The only dependencies it needs are Spectacle, Tesseract, and the relevant language packs. You can also pass in language parameters.
Here is the C++ code:
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>
#include <QCommandLineParser>
#include <QDir>
#include <QProcess>
#include <QTemporaryFile>
#include <QTimer>
#include <QClipboard>
#include <QApplication>
#include <QFileDialog>
#include <QLabel>
#include <QMessageBox>
#include <QPushButton>
#include <QTextEdit>
#include <QVBoxLayout>
#include <QWidget>
#include <QHBoxLayout>
#include <QDateTime>
#include <QFile>
#include <QTextStream>
bool takeScreenshot(const QString& outputPath) {
int exitCode = QProcess::execute("spectacle", QStringList()
<< "-b" << "-r" << "-n" << "-o" << outputPath);
return exitCode == 0;
}
struct OcrResult {
QString text;
bool success;
QString errorMessage;
};
OcrResult extractText(const QString& imagePath, const QString& language) {
OcrResult result;
result.success = true;
tesseract::TessBaseAPI* ocr = new tesseract::TessBaseAPI();
if (ocr->Init(nullptr, language.toUtf8().constData())) {
delete ocr;
result.success = false;
result.errorMessage =
"Error initializing Tesseract OCR for language: " + language;
return result;
}
Pix* image = pixRead(imagePath.toUtf8().constData());
if (!image) {
ocr->End();
delete ocr;
result.success = false;
result.errorMessage = "Failed to load image";
return result;
}
ocr->SetImage(image);
char* outText = ocr->GetUTF8Text();
result.text = QString::fromUtf8(outText);
delete[] outText;
pixDestroy(&image);
ocr->End();
delete ocr;
return result;
}
int main(int argc, char* argv[]) {
QApplication app(argc, argv);
QCommandLineParser parser;
parser.setApplicationDescription("Extract text from spectacle screenshots using OCR");
parser.addHelpOption();
QCommandLineOption langOption(
QStringList() << "lang",
"Language(s) for OCR (e.g., eng, hin, or eng+hin for multiple languages)",
"language", "eng");
parser.addOption(langOption);
parser.process(app);
QString language = parser.value(langOption);
QWidget window;
window.setWindowTitle("Spectacle Screenshot OCR - Language: " + language);
window.resize(500, 400);
QVBoxLayout* layout = new QVBoxLayout();
QLabel* label = new QLabel();
layout->addWidget(label);
QTextEdit* textEdit = new QTextEdit();
textEdit->setMinimumHeight(100);
layout->addWidget(textEdit);
QWidget* buttonContainer = new QWidget();
QHBoxLayout* buttonLayout = new QHBoxLayout(buttonContainer);
QPushButton* copyButton = new QPushButton("Copy Text");
QPushButton* saveButton = new QPushButton("Save Text");
QPushButton* saveImageButton = new QPushButton("Save Image");
buttonLayout->addWidget(copyButton);
buttonLayout->addWidget(saveButton);
buttonLayout->addWidget(saveImageButton);
layout->addWidget(buttonContainer);
window.setLayout(layout);
QString tempPath = QDir::tempPath() + "/screenshot.png";
QObject::connect(copyButton, &QPushButton::clicked, [&]() {
if (!textEdit->toPlainText().isEmpty()) {
QApplication::clipboard()->setText(textEdit->toPlainText());
label->setText("Text copied to clipboard");
}
else {
label->setText("No text to copy");
}
});
QObject::connect(saveButton, &QPushButton::clicked, [&]() {
if (!textEdit->toPlainText().isEmpty()) {
QString fileName = QFileDialog::getSaveFileName(
&window, "Save OCR Text", QDir::homePath(),
"Text Files (*.txt);;All Files (*)");
if (!fileName.isEmpty()) {
QFile file(fileName);
if (file.open(QIODevice::WriteOnly | QIODevice::Text)) {
QTextStream out(&file);
out << textEdit->toPlainText();
file.close();
label->setText("Text saved to file");
}
else {
label->setText("Failed to save file");
QMessageBox::critical(&window, "Error", "Failed to save the file");
}
}
}
else {
label->setText("No text to save");
}
});
QObject::connect(saveImageButton, &QPushButton::clicked, [&]() {
QString timestamp = QDateTime::currentDateTime().toString("yyyyMMdd_hhmmss");
QString defaultImageName = QDir::homePath() + "/Screenshot_" + timestamp;
QString imageFileName = QFileDialog::getSaveFileName(
&window, "Save Screenshot", defaultImageName,
"Image Files (*.png);;All Files (*)");
if (!imageFileName.isEmpty()) {
if (QFile::copy(tempPath, imageFileName))
label->setText("Screenshot saved successfully");
else {
label->setText("Failed to save screenshot");
QMessageBox::critical(&window, "Error", "Failed to save the screenshot file");
}
}
});
if (takeScreenshot(tempPath)) {
OcrResult result = extractText(tempPath, language);
if (!result.success) {
textEdit->setText("");
label->setText(result.errorMessage);
}
else {
textEdit->setText(result.text);
label->setText(
"Text extracted successfully.");
}
window.show();
}
else {
textEdit->setText("");
label->setText("Error occurred while taking screenshot");
window.show();
QMessageBox::critical(&window, "Error",
"Failed to launch Spectacle or take screenshot");
}
return app.exec();
}
Then, to build it, create a file named simple.pro with these contents:
QT += core widgets gui
CONFIG += c++17
TARGET = spectacle-ocr-screenshot
TEMPLATE = app
SOURCES += main.cpp
# Use pkg-config to find Tesseract and Leptonica
unix:!macx {
CONFIG += link_pkgconfig
PKGCONFIG += tesseract lept
}
And then make it:
qmake6 simple.pro
make
You should have qt6-base and tesseract installed.
The compiled binary can be placed in the PATH and a keyboard shortcut can be set to easily access it.
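For the PATH part, something like the following sketch should do. It assumes ~/.local/bin is on your PATH, which is common but not guaranteed. The function name install_binary is just for illustration.

```shell
#!/usr/bin/env bash
# Copy a built binary into a directory on PATH and mark it executable.
# install_binary is a hypothetical helper; the default destination
# ~/.local/bin is an assumption about your setup.
install_binary() {
    local src="$1"
    local dest_dir="${2:-$HOME/.local/bin}"
    mkdir -p "$dest_dir" || return 1
    install -m 755 "$src" "$dest_dir/"
}

# Usage, from the build directory:
#   install_binary ./spectacle-ocr-screenshot
```

After that, the binary name can be entered as the command for a custom shortcut.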
You can find more details on my GitHub.
https:// github. com/funinkina/ spectacle-ocr-screenshot/
(Sorry for the spaces, it doesn’t allow links.)
Very nice!
Here is the repository as a link: https://github.com/funinkina/spectacle-ocr-screenshot/
You could do
auto ocr = std::make_unique<tesseract::TessBaseAPI>();
and not have to care about the delete anymore.
I was just about to post and request the same, but thought I could add to this instead.
I’m coming from the Windows world and recently (2-3 months back) moved entirely to Linux, as I hate all the AI crap being integrated into Windows, plus the ads and so on.
So I consider myself a Linux noob, but so far I’m loving it. I’m on Fedora with KDE Plasma and this is a solid experience. I use Spectacle A LOT. And yes, I also hugely miss the OCR capture I had with the snipping tool in Windows.
Spectacle works very well and is an impressive tool, so I wish the devs would build OCR into the app with a nice icon in the menu so we can capture text easily.
I see a ton of solutions above, but frankly nothing will replace a real integrated OCR reader. I would just like to have this built in and working as well as Spectacle works now, without tinkering or third-party tools.
I love the Linux community and the innovative thinking that goes into finding solutions, but in the end nothing beats a well-tested and nicely packaged product like Spectacle with the feature that so many are missing.
UPDATE:
An MR adding OCR to Spectacle, using Tesseract, is in the pipeline:
(remove spaces to access link)
invent.kde. org/ plasma/spectacle/-/merge_requests/462
Great news!!! Thanks for sharing. I’m looking forward to it.
I have this bound to Super+C on my machine:
spectacle -rnbo /proc/self/fd/1 | tesseract stdin stdout | wl-copy
It works in most cases, but Tesseract spits out gibberish in a few (including a string in the tesseract man page viewed in Konsole). Another notable failure case is chemical formulas.
It also does poorly with the capitalization of characters.
I wonder what I am doing wrong
I hope they can do it better.
When is this PR expected to land?
I’ve been using NormCap, and the recent versions work absolutely wonderfully. The OCR hasn’t let me down, and there are some additional features. You can also customize hotkeys.
I am pleased to announce that Spectacle now has OCR (this is the first iteration and only extracts text).