Scanned PDF files converted to Word files
Thread poster: Emilia Delibasheva
Hello, I have a large volume of PDF files that I need to edit. I did some research on the Internet and learned that there is software for converting PDF files to Word documents. However, mine are not true PDF files; they are scanned. Is it possible at all to perform this kind of conversion? Thank you.
Natalie Poland Local time: 12:24 Member (2002) English to Russian + ... Moderator of this forum SITE LOCALIZER Of course, it is possible | Feb 2, 2013
Please search this forum - you will find a number of previous threads on the same topic. Imho, the best software for doing this is FineReader, though there exist many other programs that do the same.
Triston Goodwin United States Local time: 04:24 Spanish to English + ... You sure can! | Feb 2, 2013
Natalie wrote: Please search this forum - you will find a number of previous threads on the same topic. Imho, the best software for doing this is FineReader, though there exist many other programs that do the same.
Here's a link to FineReader, since it can be a little tricky to find sometimes: http://www.abbyy.com/ I haven't had that much luck with these programs, since they don't catch accent marks, so I typically just translate directly or use Dragon and read it over first. If the image is clean and the program is set up right, you shouldn't have too much trouble though.
Michel de Ruyter Finland Local time: 13:24 Member (2011) English to Dutch + ...
Emilia Delibasheva Local time: 13:24 Member (2005) English to Bulgarian + ... TOPIC STARTER
finnword1 United States Local time: 06:24 English to Finnish + ...
I use a separate OCR program. I can then make the necessary adjustments, depending on the quality of the scanned image.
FineReader is what I use.
Emilia Delibasheva Local time: 13:24 Member (2005) English to Bulgarian + ... TOPIC STARTER
Emma Goldsmith Spain Local time: 12:24 Member (2004) Spanish to English Quality of scanned pdf | Feb 3, 2013
Triston Goodwin wrote: I haven't had that much luck with these programs, since they don't catch accent marks
If you set the language correctly before you OCR the document, ABBYY FineReader and other programs should certainly catch accents. Of course, much depends on the quality of the scanned PDF. If you have a lot of background noise (a vertical line crossing through all pages, stamps placed on top of text, etc.) then no program will be able to decipher what the text says. But real people might not be able to in that case, either!
Emma Goldsmith wrote:
Triston Goodwin wrote: I haven't had that much luck with these programs, since they don't catch accent marks
If you set the language correctly before you OCR the document, ABBYY FineReader and other programs should certainly catch accents.
Backed. I used to think that OCR was pretty much unusable, esp. for languages with accented characters. This might have been the case a decade ago, but it is definitely not any more. The programs use very smart algorithms to determine what each character might logically be, and they do a somewhat decent job of formatting. As an example, ABBYY consistently gets u, ü and ű right. Maybe it prints an ü instead of an ű or an o instead of an ö in one case out of a thousand. When you look it up in the source text, you're likely to find that the image quality was abysmal at that spot.
That said, for translation it's generally better to use a setting that does not preserve much of the formatting, and to format the output text at the end. Otherwise, you end up with text boxes all over the place, mis-recognized headers and so on.
ABBYY FineReader recognizes Hungarian text pretty much perfectly, even if the image quality leaves a lot to be desired. I'm impressed.
[Edited at 2013-02-03 09:57 GMT]
Rolf Keller Germany Local time: 12:24 English to German Catch false/missing accents etc. automatically | Feb 3, 2013
FarkasAndras wrote: As an example, ABBYY consistently gets u, ü and ű right. Maybe it prints an ü instead of an ű or an o instead of an ö in one case out of a thousand.
Just run a good spellchecker for that language over the OCR'ed text.
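For anyone who wants to see what such a check looks for, here is a minimal Python sketch of the idea, not a real spellchecker. It flags OCR'ed words that are missing from a word list and suggests list entries that differ only in their accents. The function names and the tiny Hungarian word list are made up for illustration; a real check would load a full dictionary for the language.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Return the word with combining diacritics removed (e.g. ű -> u, ö -> o)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def suggest_accent_fixes(text: str, vocabulary: set[str]) -> dict[str, list[str]]:
    """Map each unknown word to vocabulary entries that differ only in accents."""
    # Index the vocabulary by its accent-free form.
    by_base: dict[str, list[str]] = {}
    for entry in sorted(vocabulary):
        by_base.setdefault(strip_accents(entry.lower()), []).append(entry)

    suggestions: dict[str, list[str]] = {}
    for raw in text.split():
        # Drop surrounding punctuation and normalize to composed characters.
        word = unicodedata.normalize("NFC", raw.strip(".,;:!?()\"'").lower())
        if not word or word in vocabulary:
            continue
        candidates = by_base.get(strip_accents(word), [])
        if candidates:
            suggestions[word] = candidates
    return suggestions


# Hypothetical mini-vocabulary (lowercase); stands in for a real dictionary.
VOCAB = {"tűz", "könyv", "öt", "a", "van"}

# "tüz" and "konyv" simulate OCR output with a wrong and a missing accent.
print(suggest_accent_fixes("A tüz öt konyv van.", VOCAB))
```

With this sample input, the sketch flags "tüz" and "konyv" and proposes "tűz" and "könyv". It only catches accent errors that produce a non-word; if the misread form happens to be a valid word too, you still need to proofread against the source.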