PdfContentStreamProcessor and changes in graphics state

Peter Schwalm
Hello,

I use PdfContentStreamProcessor and a listener to extract text from
certain rectangles on PDF pages, for instance a customer number that
always appears at the same position on every page of a file.

I assumed I could use a single PdfContentStreamProcessor object and call
ProcessContent() with it for every page of the file. This seemed to
work until I had to process a PDF file that modifies the CTM in the
graphics state on every page: a shift to the right of about 16 points.

I found that this shift was cumulative, i.e. the text positions were
shifted by 16 points on the first page, 32 points on the second page,
and so on.

I was able to fix the problem by creating a new PdfContentStreamProcessor
object for every page.
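The drift described above can be reproduced with a small self-contained model. Everything in it (`CtmAccumulationDemo`, `ToyProcessor`, `processPage`) is hypothetical and only mimics the relevant behaviour; it is not iText code. The CTM is reduced to a horizontal translation, and each "page" applies a 16-point translation that is never undone (as a `cm` operator without a surrounding `q`/`Q` pair would be), so a reused processor drifts while a fresh one does not.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (NOT the iText API) of a stateful content-stream
 * processor. The CTM is reduced to a single horizontal translation.
 */
public class CtmAccumulationDemo {

    /** Stand-in for a processor that keeps its graphics state between calls. */
    static class ToyProcessor {
        private double ctmTx = 0.0; // horizontal translation of the "CTM"

        /**
         * Processes one page: applies the page's translation, then reports
         * where text drawn at user-space x = textX ends up.
         */
        double processPage(double pageShift, double textX) {
            ctmTx += pageShift;   // translation-only concatenation = addition
            return textX + ctmTx; // transformed x position of the text
        }
    }

    /** Reuses one processor for all pages: the shift accumulates. */
    static List<Double> reuseOneProcessor(int pages, double shift, double textX) {
        ToyProcessor p = new ToyProcessor();
        List<Double> xs = new ArrayList<>();
        for (int i = 0; i < pages; i++) {
            xs.add(p.processPage(shift, textX));
        }
        return xs;
    }

    /** Creates a fresh processor per page: every page starts from a clean state. */
    static List<Double> freshProcessorPerPage(int pages, double shift, double textX) {
        List<Double> xs = new ArrayList<>();
        for (int i = 0; i < pages; i++) {
            xs.add(new ToyProcessor().processPage(shift, textX));
        }
        return xs;
    }

    public static void main(String[] args) {
        System.out.println(reuseOneProcessor(3, 16, 100));     // [116.0, 132.0, 148.0]
        System.out.println(freshProcessorPerPage(3, 16, 100)); // [116.0, 116.0, 116.0]
    }
}
```

With a reused instance the text at x = 100 lands 16, 32 and 48 points to the right on pages 1 to 3, matching the symptom; with a new instance per page it lands at the same place every time.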

My question: is this behaviour (changes to the graphics state
accumulating across calls to ProcessContent()) intended, or is it a bug?

Thank you in advance
Peter Schwalm


------------------------------------------------------------------------------
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php

Re: PdfContentStreamProcessor and changes in graphics state

Kevin Day
This is intentional and by design. PdfContentStreamProcessor is stateful, so create a new one for each page.

Re: PdfContentStreamProcessor and changes in graphics state

Peter Schwalm
Hi Kevin,

thank you for your answer.

Perhaps it would be good if this were mentioned in a prominent
place in the API docs and/or in the iText book. The examples I have
found make it look as if ProcessContent() were a "per page action"
that starts with a tabula rasa at each new call.

Peter



Re: PdfContentStreamProcessor and changes in graphics state

Kevin Day
Sure, I'd be happy to add a note to the processContent() method. Did you happen to notice the reset() method?
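The alternative mentioned here, calling reset() between pages instead of creating a new instance, can be sketched in the same kind of hypothetical toy model. Only the name reset() comes from this thread; `ResetDemo`, `ToyProcessor` and `processPage` are illustrative and are not iText code. Resetting before each page restores the initial translation, so every page reports the same position.

```java
import java.util.Arrays;

/**
 * Hypothetical toy model (NOT the iText API): one processor instance is
 * reused, but its state is reset before each page.
 */
public class ResetDemo {

    static class ToyProcessor {
        private double ctmTx; // horizontal translation of the "CTM"

        ToyProcessor() {
            reset(); // start from the initial state
        }

        /** Restores the initial graphics state (zero translation in this model). */
        void reset() {
            ctmTx = 0.0;
        }

        /** Applies the page's translation and returns the transformed text x. */
        double processPage(double pageShift, double textX) {
            ctmTx += pageShift;
            return textX + ctmTx;
        }
    }

    /** Reuses one instance for all pages, resetting it before every page. */
    static double[] reuseWithReset(int pages, double shift, double textX) {
        ToyProcessor p = new ToyProcessor();
        double[] xs = new double[pages];
        for (int i = 0; i < pages; i++) {
            p.reset(); // discard state left over from the previous page
            xs[i] = p.processPage(shift, textX);
        }
        return xs;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(reuseWithReset(3, 16, 100))); // [116.0, 116.0, 116.0]
    }
}
```

Either approach gives each page a clean graphics state; which one to use is mostly a matter of taste, since creating a processor per page is usually cheap.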

Re: PdfContentStreamProcessor and changes in graphics state

Peter Schwalm
Hello Kevin,

Yes, I considered using it. But by then I had already succeeded with
a new instance for each page, which I don't think is a very costly
operation in the context where I use it. I just think a hint about
the cumulative effects of processContent() in the API docs and/or
"the book" could keep some people from heading in the wrong direction.

Thank you
Peter





