|
I need to obtain the XML output that TaggedPdfReaderTool.ConvertToXml produces.
I have found a problem with words that contain accented characters.
For example, the word 'Número' becomes 'N*famero' and 'Página' becomes 'p*e1gina'. On the other hand, if the same PDF is processed with an iTextSharp.text.pdf.parser.ITextExtractionStrategy,
GetResultantText returns the whole text correctly.
What do I need to do so that the XML contains the same text that strategy.GetResultantText returns?
Thank you in advance.
Josep Maria Heras
ADHOC SYNECTIC SYSTEMS, S.A. - LEGAL NOTICE: The information contained in this e-mail is CONFIDENTIAL and is for the exclusive use of the addressee named above. If you are reading this message and are not the intended addressee, please be advised that any use, disclosure, distribution and/or reproduction of this communication, in whole or in part, without express authorization is strictly prohibited under current legislation. If you have received this message in error, please notify us immediately by this same means and delete it together with any attachments without reading or saving it.
iText-questions mailing list [hidden email] https://lists.sourceforge.net/lists/listinfo/itext-questions Buy the iText book: http://www.itextpdf.com/book/ Check the site with examples before you ask questions: http://www.1t3xt.info/examples/ You can also search the keywords list: http://1t3xt.info/tutorials/keywords/ |
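A note on the symptom reported above: in 'N*famero' and 'p*e1gina', the two characters after the asterisk are fa and e1, which are the hexadecimal code points of 'ú' (U+00FA) and 'á' (U+00E1). This suggests, though the thread does not confirm it, that the XML output contains an escaped hex code where the character itself should be. The sketch below only illustrates that reading of the garbled words. DecodeStarHex is a hypothetical helper, not part of iTextSharp, and it is not the fix that was applied to the library.

```vb
Imports System
Imports System.Text.RegularExpressions

Module HexEscapeCheck
    ' Hypothetical helper (not an iTextSharp API): replaces each "*hh" sequence,
    ' where hh is two hexadecimal digits, with the character whose code point is hh.
    Function DecodeStarHex(ByVal input As String) As String
        Return Regex.Replace(input, "\*([0-9A-Fa-f]{2})",
            Function(m As Match) Convert.ToChar(Convert.ToInt32(m.Groups(1).Value, 16)).ToString())
    End Function

    Sub Main()
        ' The two garbled words quoted in the question.
        Console.WriteLine(DecodeStarHex("N*famero"))  ' returns "Número"
        Console.WriteLine(DecodeStarHex("p*e1gina"))  ' returns "página"
    End Sub
End Module
```

Post-processing the XML this way would be fragile, because a literal asterisk followed by two hex-like letters in the real text would also be rewritten. It is meant as a diagnostic aid only.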
|
Hello! |
|
In reply to this post by jmheras
Please post the pdf and the code you are using so that we can
reproduce the problem.
Paulo
Disclaimer: This message is destined exclusively to the intended receiver. It may contain confidential or legally protected information. The incorrect transmission of this message does not mean the loss of its confidentiality. If this message is received by mistake, please send it back to the sender and delete it from your system immediately. It is forbidden to any person who is not the intended receiver to use, distribute or copy any part of this message. |
|
hi!
I've uploaded the PDF "problem.pdf": http://itext-general.2136553.n4.nabble.com/file/n2848929/problem.pdf

- My code to get the whole PDF text, with accents and the letter "ñ" intact, is:

    Dim reader As PdfReader = New PdfReader(pdfByte)
    ' Content parser used to run the extraction strategy on each page
    Dim parser As New iTextSharp.text.pdf.parser.PdfReaderContentParser(reader)
    Dim strategy As iTextSharp.text.pdf.parser.ITextExtractionStrategy
    For i As Integer = 1 To reader.NumberOfPages
        strategy = parser.ProcessContent(i, New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy)
        Dim sResult As String = strategy.GetResultantText
    Next

- My code to get the XML, in which words with accents come out wrong, is:

    Dim reader As PdfReader = New PdfReader(pdfByte)
    Dim parser As New iTextSharp.text.pdf.parser.PdfReaderContentParser(reader)
    Dim ms As New System.IO.MemoryStream
    ms.SetLength(0)
    Dim info As New iTextSharp.text.pdf.parser.TaggedPdfReaderTool
    ' Write the tagged structure of the PDF as XML into the memory stream
    info.ConvertToXml(reader, ms)
    IO.File.WriteAllBytes("Z:\usr\JM\bustia\pdf\problem.pdf.xml", ms.ToArray)

Thanks! Josep Maria |
|
It's fixed now in the SVN trunk.
Paulo
-----Original Message-----
From: jmheras
Sent: Friday, October 01, 2010 7:44 AM
Subject: Re: [iText-questions] Text encoding problem? |
