OCR scan of screenshot #603

Closed
hum4nizer opened this issue Mar 8, 2021 · 30 comments

@hum4nizer

hum4nizer commented Mar 8, 2021

Is your feature request related to a problem? Please describe.
No. It is not related to a problem.

Describe the solution you'd like
I would like the feature to OCR scan the screenshot when a screenshot is taken to extract text from the picture.

Additional context
Thanks a bunch for an awesome piece of software!

@DamirPorobic
Member

I've been thinking about this feature for quite some time but haven't found a simple solution. We need to figure out which external library can be used for this and how to interact with it. It would definitely be a nice addition to ksnip.

@DamirPorobic
Member

https://github.com/tesseract-ocr/ might be something that could work.

@hum4nizer
Author

Exactly! ShareX uses the tesseract OCR engine. And it works for them.

@DamirPorobic
Member

@fnkabit what do you think about this feature? Would be nice to have something like this in the application but I personally haven't worked with OCR libraries yet. Do you have any experience?

@fnkabit
Contributor

fnkabit commented Mar 11, 2021

@DamirPorobic Sure, I would like to work on this.
Apart from the Tesseract integration, how do you see this working? I.e., do we add a tool for making a selection, perform OCR, and then display a dialog with the transcription?

@fnkabit
Contributor

fnkabit commented Mar 11, 2021

@DamirPorobic To answer your question, I don't have any experience working with OCR.

@DamirPorobic
Member

Maybe @hum4nizer can describe how it works in ShareX, but for starters I had in mind a button in the file menu that triggers OCR and displays a dialog with all the text it has found, something in that direction. Later on we can get more fancy.

Regarding the code, I think it would be good to have a clean separation and hide the OCR stuff behind an adapter and an interface. Also, we should consider that Tesseract might not always be available when building ksnip, so CMake should check for it at build time and, when it is not found, the option in the file menu should be grayed out, something like that.

This is probably a larger feature, maybe start small and see how it works.
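
A minimal sketch of the separation described above, using only the C++ standard library. All names here (`IOcrEngine`, `NullOcrEngine`, `FakeOcrAdapter`, `OcrMenuAction`) are hypothetical and are not taken from the ksnip code base. A real adapter would wrap an OCR library such as Tesseract; the fake one below only returns canned text so the example stays self-contained.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface: the rest of the application only depends on this type.
class IOcrEngine {
public:
    virtual ~IOcrEngine() = default;
    // True when a usable OCR backend is present.
    virtual bool isAvailable() const = 0;
    // Returns the text recognized in the image at the given path.
    virtual std::string recognize(const std::string& imagePath) const = 0;
};

// Fallback for builds where no OCR backend was found.
class NullOcrEngine : public IOcrEngine {
public:
    bool isAvailable() const override { return false; }
    std::string recognize(const std::string&) const override { return {}; }
};

// Stand-in for a real adapter (assumption: a real one would call an OCR library here).
class FakeOcrAdapter : public IOcrEngine {
public:
    explicit FakeOcrAdapter(std::string cannedText) : text_(std::move(cannedText)) {}
    bool isAvailable() const override { return true; }
    std::string recognize(const std::string&) const override { return text_; }

private:
    std::string text_;
};

// Hypothetical menu entry: enabled only when the engine reports itself available.
class OcrMenuAction {
public:
    explicit OcrMenuAction(std::shared_ptr<IOcrEngine> engine) : engine_(std::move(engine)) {
        assert(engine_ != nullptr);
    }
    // A grayed-out menu entry corresponds to enabled() == false.
    bool enabled() const { return engine_->isAvailable(); }
    // Runs OCR when enabled; otherwise returns an empty string.
    std::string trigger(const std::string& imagePath) const {
        return enabled() ? engine_->recognize(imagePath) : std::string();
    }

private:
    std::shared_ptr<IOcrEngine> engine_;
};
```

With this shape, the build system would only decide which `IOcrEngine` implementation gets constructed: an `OcrMenuAction` built with a `NullOcrEngine` reports `enabled() == false`, while one built with a working adapter returns the recognized text from `trigger()`.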

@fnkabit
Contributor

fnkabit commented Mar 11, 2021

@DamirPorobic Sounds good.
Did a YT search to see how this works in ShareX: https://www.youtube.com/watch?v=t629fruq1Z0

@DamirPorobic
Member

Not far from what I had in mind. Would be cool to be able to do something like this.
ShareX is open source as far as I know, maybe you can have a look at their code for hints.

@fnkabit
Contributor

fnkabit commented Mar 11, 2021

@DamirPorobic Thanks! I will implement this next week.

@hum4nizer
Author

hum4nizer commented Mar 16, 2021

Hi! I'm really excited about this new feature. Good luck with the implementation.

@fnkabit
Contributor

fnkabit commented Mar 29, 2021

@hum4nizer Thank you !

Sorry guys, didn't have time to start working on this; have been really busy the last two weeks.
Next week would be better, I hope.

@DamirPorobic
Member

I'm hoping to look into this for the next minor release. An issue I'm still thinking about is which way to go: either implement it in ksnip or call an external tool from ksnip. What worries me about implementing it in ksnip directly is the size of such OCR software; it seems to be much larger than ksnip.

@DamirPorobic
Member

Maybe a plugin approach could be doable, something like what is done here: https://doc.qt.io/qt-5/qtwidgets-tools-echoplugin-example.html

@raphaelh
Contributor

There's https://ocr.space/OCRAPI

Each user could register her/his own API key

@DamirPorobic
Member

I had more of a local version in mind, without an API. You trigger OCR and get a dialog window with all the text that you can then copy or whatever.
@raphaelh Have you used this API? In what form do you get the return value?

@raphaelh
Contributor

I've contributed the API option because you said:

What worries me about implementing it in ksnip directly is the size of such OCR software; it seems to be much larger than ksnip.

When installing the Tesseract packages under Ubuntu, they take 16.3 MB (tesseract-ocr, tesseract-ocr-eng, tesseract-ocr-osd). Adding tesseract-ocr-fra (French language) takes another 1,145 KB.

ksnip-1.9.1.deb is 710 KB

There's also https://github.com/PaddlePaddle/PaddleOCR, it says on the github page:

Ultra lightweight PP-OCRv2 series models: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M

I haven't used https://ocr.space/OCRAPI directly. I know about it because I'm using the Copyfish browser extension (https://addons.mozilla.org/fr/firefox/addon/copyfish-ocr-software/) which works well for my needs (copy text from images or PDF files while I'm browsing).

DamirPorobic assigned DamirPorobic and unassigned fnkabit Nov 22, 2021

@DamirPorobic
Member

DamirPorobic commented Nov 27, 2021

I'm working on the OCR support and I must say that I'm a bit surprised by Tesseract's weak performance:

image

I thought the OCR development had achieved more by now.

@DamirPorobic
Member

The same image run via the command line looks better:
image

Maybe my API call requires some improvement.

@hum4nizer
Author

It looks way better in the command line test for sure! Good luck with the development. I'm really looking forward to this feature. Thanks!

DamirPorobic added a commit that referenced this issue Dec 11, 2021
DamirPorobic added a commit that referenced this issue Mar 6, 2022
@DamirPorobic
Member

This is implemented now. I still have to write some tests, but in general it can be tested now. Let me know what you think.

@SM-26
Contributor

SM-26 commented May 30, 2022

This is implemented now, I have to write some tests but in general can be tested now. Let me know what you think.

This looks really amazing, how can I test it?

When I click the OCR button under Options, a new window opens up, a blank text window.
What is the next step?

image

sm26@sm26-Latitude-3420:~$ ksnip -v
Debug: SingleInstance mode detected, we are the client.
Debug: X11ImageGrabber selected.
Version: 1.10.1-continuous
Build: 1-2009073

OCR used: ksnip-plugin-ocr-0.1.0-continuous.deb
Build Time: Sat, 19 Mar 2022 09:36:14

@DamirPorobic
Member

That should actually be working. Do you see any message there saying that the text is being processed?

@SM-26
Contributor

SM-26 commented May 30, 2022

@DamirPorobic Nope, I don't see any message. Where should I look?
image

The OCR window is a text box I can edit, but it doesn't have anything in it at the moment.

@DamirPorobic
Member

There, before the inner text box where you write comes up, there should be a label saying something like "Processing text..."; when processing is done, the label is hidden and the text box comes up. Strange, that doesn't look right.

One more thing, can you try some black text on a white background? Maybe with a few sentences so the process takes a few seconds longer.
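
The intended dialog behaviour described above (a label while OCR runs, then the text box once the result arrives) can be sketched as a small state model. This is only an illustration under stated assumptions: `OcrDialogModel` and its members are hypothetical names, not the actual ksnip or ksnip-plugin-ocr classes, and the real dialog would drive Qt widgets rather than plain booleans.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical model of the OCR result window and what it should show.
class OcrDialogModel {
public:
    enum class State { Idle, Processing, Done };

    // Called when OCR is triggered: show the label, clear any old text.
    void startProcessing() {
        state_ = State::Processing;
        text_.clear();
    }

    // Called when OCR has finished: store the result and switch to the text box.
    void finish(std::string recognizedText) {
        assert(state_ == State::Processing);
        text_ = std::move(recognizedText);
        state_ = State::Done;
    }

    // The "Processing text..." label is visible only while OCR is running.
    bool labelVisible() const { return state_ == State::Processing; }
    // The editable text box is visible only once a result is available.
    bool textBoxVisible() const { return state_ == State::Done; }

    std::string labelText() const { return labelVisible() ? "Processing text..." : ""; }
    const std::string& text() const { return text_; }

private:
    State state_ = State::Idle;
    std::string text_;
};
```

In this model, an empty editable text box that appears instantly with no label (as reported above) matches neither the `Processing` nor a populated `Done` state, which is consistent with the behaviour being unintended.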

@SM-26
Contributor

SM-26 commented May 30, 2022

Nope, sorry.
I don't see any label, and the OCR window pops up instantly.

The black text on white background test looks the same:
image

@DamirPorobic
Member

OK, thanks, I'll have to look into the code; it seems to have a bug.

@MichelDiz

Hey all, how do I build ksnip-plugin-ocr? There's no makefile, and no instructions for doing so on Windows. Thanks!

@DamirPorobic
Member

There are prebuilt binaries, also for Windows: https://github.com/ksnip/ksnip-plugin-ocr/releases

Building it locally is quite cumbersome due to the many dependencies of OCR. If you still want to build it locally, you can see how the pipeline builds it: https://github.com/ksnip/ksnip-plugin-ocr/blob/master/.github/workflows/windows.yml

@MichelDiz

Okay, thanks! That's good enough. For some reason I hadn't seen it. I thought it was a tarball or something.
