scanned PDF files converted to Word files (Software applications)


Topic poster: Emilia Delibasheva

Emilia Delibasheva

Member (2005)
English to Bulgarian

Feb 2, 2013

Hello,

I have a large number of PDF files that I need to edit. I did some research online and found that there is software for converting PDF files to Word documents. However, mine are not text-based PDFs; they are scanned. Is it possible at all to perform this kind of conversion? Thank you.

Walter Landesman

Uruguay
English to Spanish

Nope

Feb 2, 2013

No, I don't think so.

Natalie

Poland
Member (2002)
English to Russian

Moderator of this forum

SITE LOCALIZER

Of course, it is possible

Feb 2, 2013

Please search this forum; you will find a number of previous threads on the same topic.
IMHO, the best software for this is FineReader, though many other programs do the same.

Triston Goodwin

United States
Spanish to English

You sure can!

Feb 2, 2013

Natalie wrote:

Please search this forum; you will find a number of previous threads on the same topic.
IMHO, the best software for this is FineReader, though many other programs do the same.

Here's a link to FineReader, since it can be a little tricky to find: http://www.abbyy.com/

I haven't had that much luck with these programs, since they don't catch accent marks, so I typically just translate directly or use Dragon and read the text over first.

If the image is clean and the program is set up correctly, though, you shouldn't have too much trouble.

Michel de Ruyter

Finland
Member (2011)
English to Dutch

Here, for example:

Feb 2, 2013

http://www.proz.com/forum/wordfast_support/195890-wordfast_anywhere_announces_support_for_scanned_pdfs.html

Emilia Delibasheva

Member (2005)
English to Bulgarian

TOPIC STARTER

Thanks

Feb 2, 2013

Thank you all very much!

finnword1
United States
English to Finnish

OCR

Feb 2, 2013

I use a separate OCR program. I can then make the necessary adjustments, depending on the quality of the scanned image.

Angelique Blommaert

Netherlands
Member (2012)
German to Dutch

Works for me

Feb 2, 2013

FineReader is what I use.

Emilia Delibasheva

Member (2005)
English to Bulgarian

TOPIC STARTER

Thanks

Feb 3, 2013

Thank you all.

Emma Goldsmith

Spain
Member (2004)
Spanish to English

Quality of scanned PDF

Feb 3, 2013

Triston Goodwin wrote:

I haven't had that much luck with these programs, since they don't catch accent marks

If you set the language correctly before you OCR the document, ABBYY FineReader and other programs should certainly catch accents.

Of course, much depends on the quality of the scanned PDF. If there is a lot of background noise (a vertical line running through every page, stamps placed on top of the text, etc.), then no program will be able to decipher what the text says. But in that case, a human reader might not be able to either!

FarkasAndras

English to Hungarian

They work

Feb 3, 2013

Emma Goldsmith wrote:

Triston Goodwin wrote:

I haven't had that much luck with these programs, since they don't catch accent marks

If you set the language correctly before you OCR the document, ABBYY FineReader and other programs should certainly catch accents.

Seconded. I used to think that OCR was pretty much unusable, especially for languages with accented characters. That might have been the case a decade ago, but it definitely isn't any more. These programs use very smart algorithms to determine what each character is most likely to be, and they do a somewhat decent job of formatting. As an example, ABBYY consistently gets u, ü and ű right. Maybe it prints an ü instead of an ű or an o instead of an ö in one case out of a thousand. When you look it up in the source text, you're likely to find that the image quality was abysmal at that spot. That said, for translation it's generally better to use a setting that does not preserve much of the formatting, and to format the output text at the end. Otherwise, you end up with text boxes all over the place, misrecognized headers, and so on.
ABBYY FineReader recognizes Hungarian text pretty much perfectly, even when the image quality leaves a lot to be desired. I'm impressed.

[Edited at 2013-02-03 09:57 GMT]

Rolf Keller
Germany
English to German

Catch false/missing accents etc. automatically

Feb 3, 2013

FarkasAndras wrote:

As an example, ABBYY consistently gets u, ü and ű right. Maybe it prints an ü instead of an ű or an o instead of an ö in one case out of a thousand.

Just run a good spellchecker for that language over the OCR'd text.
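To illustrate the kind of automatic check described above, here is a minimal Python sketch. It is not a real spellchecker such as Hunspell; it assumes you already have a set of correctly spelled words for the language (the tiny `wordlist` below is purely hypothetical), flags every OCR'd word that is not in that set, and suggests corrections by swapping characters that OCR tends to confuse, such as u/ü/ű and o/ö/ő from the Hungarian example. The confusion groups and the helper names `accent_variants` and `check_ocr_text` are illustrative assumptions.

```python
import itertools
import re

# Groups of characters that OCR may confuse in Hungarian text.
# This grouping is an illustrative assumption, not taken from any OCR tool.
CONFUSABLE = ["aá", "eé", "ií", "oóöő", "uúüű"]

# Map each character to the full group it belongs to.
VARIANTS = {ch: group for group in CONFUSABLE for ch in group}


def accent_variants(word):
    """Yield every spelling reachable by swapping confusable characters."""
    # Non-confusable characters map to themselves (a one-character string).
    options = [VARIANTS.get(ch, ch) for ch in word]
    for combo in itertools.product(*options):
        yield "".join(combo)


def check_ocr_text(text, wordlist):
    """Return {unknown_word: [suggested corrections]} for OCR'd text.

    A word counts as unknown if it is not in the (hypothetical) wordlist;
    suggestions are those accent variants of it that are in the wordlist.
    An empty suggestion list means "unknown, and no accent fix found".
    """
    report = {}
    # \w matches accented letters because Python str patterns are Unicode.
    for word in re.findall(r"\w+", text.lower()):
        if word in wordlist or word in report:
            continue
        report[word] = sorted(v for v in accent_variants(word) if v in wordlist)
    return report


if __name__ == "__main__":
    # Hypothetical mini wordlist; a real check would load a full dictionary.
    wordlist = {"az", "öröm", "és", "a", "tűz"}
    print(check_ocr_text("Az orom es a tüz", wordlist))
    # → {'orom': ['öröm'], 'es': ['és'], 'tüz': ['tűz']}
```

The number of variants grows with the number of vowels in a word (up to four options for each o or u), which is fine for ordinary words but would need pruning for very long ones.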
