|
Hi there,

I'm trying to retrieve images from a PDF document created with the Ghostscript virtual printer. Using iText, I'm able to retrieve some streams, which I guess correspond to the 4 images I see when I open the document in Acrobat Reader. These streams have a /FlateDecode entry in the stream dictionary, but they don't have /Width, /Height, or the other image metadata. I decode these streams with the following code and then try to get the image data with PdfImageObject, but I get an exception (since the other metadata is missing from the stream dictionary):

    PdfStream stream = (PdfStream) pdfobj;
    // Inflate the stream if its /Filter entry is /FlateDecode
    if (stream.get(PdfName.FILTER) != null
            && stream.get(PdfName.FILTER).equals(PdfName.FLATEDECODE)) {
        byte[] decodedBytes = PdfReader.FlateDecode(
                PdfReader.getStreamBytesRaw((PRStream) stream), true);
    }
    // This is the call that fails: the dictionary has no /Width, /Height, etc.
    PdfImageObject pio = new PdfImageObject((PRStream) stream);

Can somebody give me some advice?

Thank you

------------------------------------------------------------------------------
RSA® Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1
_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
|
On 1/11/2011 9:31, Giampaolo Capelli wrote:
> Such streams have the entry /Flatedecode in the stream dictionary, but
> don't have /Width, /Height, and other metadata for images.

Then why do you think they are Image XObjects? Maybe they are Form XObjects. In that case, the "images" are a sequence of PDF syntax; that is, vector data, as opposed to raster images.
|
Thank you for your reply,
could you give me a hint on how to get information about images that are stored as Form XObjects? I need to get the width and height of some images. If they are encoded as vectors, I guess I have to calculate a sort of convex hull to get the image sizes (each image would be a composite of different vectors).
|
On 1/11/2011 10:50, Giampaolo Capelli wrote:
> could you give me a hint on how to get information on images in the form of Form XObjects?

Please read ISO-32000-1 section 8.10 entitled Form XObjects.
|
The ISO standard says that the dictionary's BBox entry is required.
I can't see it among the stream's dictionary entries. Should I look for it somewhere else?

I've decoded one of my FlateDecode streams; it looks like this:

    q 0.12 0 0 0.12 0 0 cm
    0 G
    0 g
    q
    8.33333 0 0 8.33333 0 0 cm
    BT
    /R7 17.1576 Tf
    1 0 0 1 34.8 735.44 Tm
    [()3.12497()2.32073()-2.11647()1.1577()298.828]TJ
    ET
    Q
    0 0.601563 0 rg
    334 6748.67 m
    334 6745.67 l
    333 6739.67 l
    ...................

Does it look like a Form XObject? Basically, I need to somehow recognize in it the images that I see when I open the PDF, and I need to get their width, height, and position on the page.

Thanks
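For readers following along, here is a minimal sketch (plain Java, no iText) of how the two cm operators in the fragment above affect the path coordinates. It is only an illustration under stated assumptions: the class name CtmWalk and its methods are made up, it tracks only q, Q, cm, m and l, and it ignores everything else in the stream. Because the second cm sits inside a q ... Q pair, it no longer applies when the path starts, so the path operands are scaled by 0.12 only and 334 6748.67 ends up at roughly (40.08, 809.84).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper, not part of iText: follows q, Q, cm, m and l only. */
final class CtmWalk {

    /** Multiplies two PDF matrices [a b c d e f]; m is applied first, then ctm. */
    static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    /** Returns the operands of every m and l operator, mapped through the CTM. */
    static List<double[]> pathPoints(String content) {
        Deque<double[]> saved = new ArrayDeque<>();   // graphics state stack (CTM only)
        double[] ctm = {1, 0, 0, 1, 0, 0};            // identity
        List<Double> operands = new ArrayList<>();
        List<double[]> points = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            try {
                operands.add(Double.parseDouble(token));
                continue;                             // numeric operand, keep collecting
            } catch (NumberFormatException notANumber) {
                // the token is an operator, or something this sketch ignores
            }
            int n = operands.size();
            if (token.equals("q")) {
                saved.push(ctm);
            } else if (token.equals("Q") && !saved.isEmpty()) {
                ctm = saved.pop();
            } else if (token.equals("cm") && n >= 6) {
                double[] m = new double[6];
                for (int i = 0; i < 6; i++) {
                    m[i] = operands.get(n - 6 + i);
                }
                ctm = concat(m, ctm);
            } else if ((token.equals("m") || token.equals("l")) && n >= 2) {
                double x = operands.get(n - 2);
                double y = operands.get(n - 1);
                points.add(new double[] {
                    ctm[0] * x + ctm[2] * y + ctm[4],
                    ctm[1] * x + ctm[3] * y + ctm[5]
                });
            }
            operands.clear();                         // operands belong to one operator
        }
        return points;
    }

    public static void main(String[] args) {
        String fragment = "q 0.12 0 0 0.12 0 0 cm 0 G 0 g q 8.33333 0 0 8.33333 0 0 cm "
                + "BT /R7 17.1576 Tf 1 0 0 1 34.8 735.44 Tm ET Q 0 0.601563 0 rg "
                + "334 6748.67 m 334 6745.67 l 333 6739.67 l";
        for (double[] p : pathPoints(fragment)) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```

A real content stream needs a proper tokenizer (strings, arrays, names, inline images); splitting on whitespace is only good enough for this fragment.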
|
On 1/11/2011 11:39, Giampaolo Capelli wrote:
> That ISO reports the dictionary's BBox entry, that is required.
>
> I've decoded one of my Flatedecode streams, it looks like this
> Does it look like a form XObject?

Yes, that's PDF syntax. This means you don't have an image, but a Form XObject.
If you want the Form XObject to have a BBox, please ask the people at Ghostscript to add one.
If they don't provide the required BBox value, ask them how you're supposed to get the width and the height.
|
But since Acrobat Reader is able to render my PDF file properly,
I guess there should be some way to get the data I need.

Do I need to use a third-party library to parse that PDF syntax,
or to render the images and calculate their attributes myself?
|
You seem to have a problem with the concept of "images". There are two types:
- Raster (<http://en.wikipedia.org/wiki/Raster_graphics>)
- Vector (<http://en.wikipedia.org/wiki/Vector_graphics>)

An Image XObject is a raster image, while a Form XObject is a vector image.

So you need to determine how you wish to deal with the differences in your product.

Leonard
|
In reply to this post by Giampaolo Capelli
On 1/11/2011 12:40, Giampaolo Capelli wrote:
> But since Acrobat Reader is able to properly render my pdf file,
> I guess there should be some way to get the data I need.

Why would Adobe Acrobat or Adobe Reader need to know the width and the height of the Form XObject? The bounding box isn't drawn, so it isn't required if you want to draw a Form XObject. If you see a rectangle, it's not defined by the bounding box. If you see a rectangle, you'll find an "re" operator or a sequence of "m" and "l" operators in the stream containing the PDF syntax.

NOTE: if a bounding box was defined, this bounding box needn't coincide with a visual bounding box that is drawn in the syntax stream.

> Do I need to use a third party library to parse that pdf syntax,

iText can perfectly parse PDF syntax, but you're talking about getting information that isn't present in the PDF. My point was: if the information isn't present, you shouldn't expect any software to be able to get it for you.

> or to render the images and calculate myself their attributes?

I assume that you're NOT talking about Image XObjects; you're talking about Form XObjects. Form XObjects aren't images, so please refrain from saying "images" in this context. You're talking about paths, shapes and text that are defined using PDF syntax. These aren't images!

Let me try to explain this as simply as possible:

1. You have a Form XObject, NOT an image. Please don't confuse people by telling them you want the width and the height of an image when in fact you need the bounding box of a Form XObject. If you don't phrase your question correctly, you shouldn't expect a correct answer.

2. Normally a Form XObject SHOULD HAVE a BBox entry. As you're not providing any PDF sample, we can't check whether a bounding box was defined. We assume that your allegation that the bounding box is missing is correct. Note that the bounding box doesn't always correspond exactly with the minimum rectangle that encloses the paths, shapes and text that are drawn in the Form XObject (see the NOTE above).

3. We assume that you want to get the minimum rectangle that encloses the paths, shapes and text drawn in the Form XObject. For text, this is easy: there's an example in the book that explains how to do it. For paths and shapes it's more complex. You need to adapt the parser provided by iText so that you keep track of all the coordinates and all the transformations of the graphics state. This is doable if you have straight lines, but it's plenty of work. As soon as Bézier curves are involved, you'll be facing even more development work.

If I understood your question correctly, and you want to find out the width and the height of a Form XObject for which no BBox was defined, then you should either do a lot of development, or give up hope, because you won't find any API, tool or application that will find this BBox for you.
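The bookkeeping described in point 3 can be sketched roughly as follows. This is plain Java, not the iText parser, and PathBounds with its methods is a made-up name used only for illustration. It assumes the caller already knows the current transformation matrix for each operator. Curves are handled conservatively: a Bézier curve always lies inside the convex hull of its control points, so adding the control points gives a box that contains the curve but may be larger than the tight one. Stroke width and clipping are ignored.

```java
/** Hypothetical accumulator, not an iText class: grows a box around path operands. */
final class PathBounds {
    private double minX = Double.POSITIVE_INFINITY;
    private double minY = Double.POSITIVE_INFINITY;
    private double maxX = Double.NEGATIVE_INFINITY;
    private double maxY = Double.NEGATIVE_INFINITY;

    /** "x y m" or "x y l": maps the point through the PDF matrix [a b c d e f]. */
    void add(double[] ctm, double x, double y) {
        double ux = ctm[0] * x + ctm[2] * y + ctm[4];
        double uy = ctm[1] * x + ctm[3] * y + ctm[5];
        minX = Math.min(minX, ux);
        minY = Math.min(minY, uy);
        maxX = Math.max(maxX, ux);
        maxY = Math.max(maxY, uy);
    }

    /** "x y w h re": all four corners, because a rotating CTM moves each one differently. */
    void addRect(double[] ctm, double x, double y, double w, double h) {
        add(ctm, x, y);
        add(ctm, x + w, y);
        add(ctm, x, y + h);
        add(ctm, x + w, y + h);
    }

    /** "x1 y1 x2 y2 x3 y3 c": conservative bound from the control points and end point. */
    void addCurve(double[] ctm, double x1, double y1,
                  double x2, double y2, double x3, double y3) {
        add(ctm, x1, y1);
        add(ctm, x2, y2);
        add(ctm, x3, y3);
    }

    /** Returns {llx, lly, urx, ury}, or null if no point was added. */
    double[] toArray() {
        if (minX > maxX) {
            return null;
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    public static void main(String[] args) {
        double[] ctm = {0.12, 0, 0, 0.12, 0, 0};   // the scale used in the posted stream
        PathBounds bounds = new PathBounds();
        bounds.add(ctm, 334, 6748.67);             // m
        bounds.add(ctm, 334, 6745.67);             // l
        bounds.add(ctm, 333, 6739.67);             // l
        double[] box = bounds.toArray();
        System.out.println(box[0] + " " + box[1] + " " + box[2] + " " + box[3]);
    }
}
```

Deciding which paths belong to the same visual group is a separate problem that this sketch does not address; it would need one accumulator per group.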
|
In reply to this post by Leonard Rosenthol-3
I don't have problems with the concept of image; I know perfectly well the
difference between raster and vector. Rather, it seems you didn't get what I need, unlike the other participants in this thread. I need a way to render the object expressed by the PDF syntax so that it is possible to get its width, height and position, or I need to get this information from the PDF data, if possible.

--
Giampaolo Capelli
|
One key question, and this applies EQUALLY to both Image and Form
XObjects: do you want the physical size (what the box would be if rendered without any context) _OR_ the effective size (taking the current transformation matrix into account)?

If effective, then you need to consider that an XObject can be called from multiple places in the same document with different CTMs. What do you want to do in that case?

And this especially applies to the POSITION item you listed below. Position is determined by the CALLING CONTEXT (that is, the content stream that invoked the XObject via the Do operator). So that would imply you want the effective size, in which case you need to step back and look at using the page content parser feature of iText.

Leonard
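To make the physical versus effective distinction concrete, here is a small sketch in plain Java (no iText). The class EffectiveSize, the box, and both matrices are invented values for illustration only. It maps one and the same box through two different CTMs, as would happen if the XObject were invoked twice with Do, and shows that each invocation yields its own position and size. For a Form XObject that has a /Matrix entry, that matrix would additionally be concatenated with the CTM at the time of the Do; the sketch leaves this out.

```java
/** Hypothetical illustration, not an iText class. */
final class EffectiveSize {

    /** Maps a box {llx, lly, urx, ury} through a PDF matrix [a b c d e f]. */
    static double[] effectiveBox(double[] box, double[] ctm) {
        double[] xs = {box[0], box[2], box[0], box[2]};
        double[] ys = {box[1], box[1], box[3], box[3]};
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < 4; i++) {
            // transform each corner, then take the axis-aligned extent
            double x = ctm[0] * xs[i] + ctm[2] * ys[i] + ctm[4];
            double y = ctm[1] * xs[i] + ctm[3] * ys[i] + ctm[5];
            minX = Math.min(minX, x);
            minY = Math.min(minY, y);
            maxX = Math.max(maxX, x);
            maxY = Math.max(maxY, y);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    public static void main(String[] args) {
        double[] physical = {0, 0, 100, 50};            // made-up physical box
        double[] firstUse = {1, 0, 0, 1, 36, 700};      // drawn at full size, moved
        double[] secondUse = {0.5, 0, 0, 0.5, 300, 100}; // same object at half size
        for (double[] ctm : new double[][] {firstUse, secondUse}) {
            double[] e = effectiveBox(physical, ctm);
            System.out.println("position " + e[0] + "," + e[1]
                    + " size " + (e[2] - e[0]) + " x " + (e[3] - e[1]));
        }
    }
}
```

The physical size stays 100 x 50 in both cases; only the calling context decides where the box lands on the page and how large it appears.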
|
In the attachment
I'm providing an example of a PDF file in which I see 4 "conceptual images", in the psychological, visual sense (I understand that they are not images from the PDF point of view). My aim is to edit the PDF file by adding a (rectangular) border to these "conceptual images".

Is there an easy way to do it with iText, or should I handcraft some sort of low-level parsing myself?

Thanks
Controllato da AVG - www.avg.com Versione: 10.0.1411 / Database dei virus: 2092/3988 - Data di rilascio: 31/10/2011 ------------------------------------------------------------------------------ RSA® Conference 2012 Save $700 by Nov 18 Register now http://p.sf.net/sfu/rsa-sfdev2dev1 _______________________________________________ iText-questions mailing list [hidden email] https://lists.sourceforge.net/lists/listinfo/itext-questions iText(R) is a registered trademark of 1T3XT BVBA. Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/ Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php |
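Leonard's distinction between physical and effective size in the quoted reply can be made concrete with a little arithmetic. The following is only a sketch, not iText API: the class and method names are made up for illustration, and it assumes a box given as {llx, lly, urx, ury} and a CTM given in the PDF order [a b c d e f], where a point (x, y) maps to (a*x + c*y + e, b*x + d*y + f). It transforms the four corners and takes their extremes.

```java
import java.util.Arrays;

// Hypothetical helper, not part of iText: maps a "physical" box to its
// "effective" axis-aligned box under a given current transformation matrix.
final class EffectiveBoxSketch {

    /** box = {llx, lly, urx, ury}; ctm = {a, b, c, d, e, f} as in a PDF "cm" operator. */
    static double[] transform(double[] box, double[] ctm) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        double[] xs = {box[0], box[2]};
        double[] ys = {box[1], box[3]};
        // A rotated or skewed box is no longer axis-aligned, so all four
        // corners have to be transformed before taking min and max.
        for (double x : xs) {
            for (double y : ys) {
                double tx = ctm[0] * x + ctm[2] * y + ctm[4];
                double ty = ctm[1] * x + ctm[3] * y + ctm[5];
                minX = Math.min(minX, tx);
                minY = Math.min(minY, ty);
                maxX = Math.max(maxX, tx);
                maxY = Math.max(maxY, ty);
            }
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    public static void main(String[] args) {
        double[] physical = {0, 0, 100, 50};
        double[] scaleAndMove = {2, 0, 0, 2, 10, 20}; // scale by 2, translate by (10, 20)
        System.out.println(Arrays.toString(transform(physical, scaleAndMove)));
    }
}
```

Because the same XObject can be invoked under several CTMs, a caller would presumably run this once per invocation and get one effective box for each.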
|
On 1/11/2011 17:09, Giampaolo Capelli wrote:
> In the attachment
>
> I'm providing an example of a pdf file where I see 4 "conceptual images", in the psychological visual sense (I understood that they are not images from the pdf point of view).
>
> My aim is to edit the pdf file adding a (rectangular) border to such "conceptual images".
>
> Is there an easy way to do it with iText, or should I handcraft some sort of low level parsing myself?

[...] RUPS, which is a tool that takes X-ray photos of PDFs: it shows an "internal view" of your PDF. Look at the /Resources dictionary. It doesn't contain an /XObject entry. This means that there are NO Image XObjects (this you already knew), but NO Form XObjects either (we assumed there were Form XObjects, but that must have been a misunderstanding)!

Where are the four images? Well... as far as the PDF is concerned, there are no FOUR images. As you rightly point out, they are only there in the psychological, visual sense (you've phrased that very well). As far as the PDF is concerned, there's only ONE sequence of PDF syntax: the page content stream (object 5 in the PDF), which is the /Contents entry of the page dictionary (object 4). All the paths and shapes on that page are constructed using operators such as moveTo (m), lineTo (l) and curveTo (c).

I don't know of any software (not iText, not any other software) that is intelligent enough to find out which paths and shapes belong to which "conceptual" or "visual" image. I don't know of any way to automate the detection of these images. |
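To illustrate what "one sequence of PDF syntax" built from m, l and c means in practice, here is a deliberately naive sketch that scans an already decoded content-stream fragment and collects the coordinates passed to those three operators. Everything in it is an assumption made for illustration: the class name is invented, it is not how iText parses content, it ignores the CTM (cm), strings, arrays, inline images and every other operator, and for curves it uses the control points, which only gives an over-estimate of the painted area.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy scanner, not iText code: rough bounds of all m / l / c operands.
final class PathOperatorScanSketch {

    private static final Pattern NUMBER = Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)");

    /** Returns {minX, minY, maxX, maxY}, or null if no m, l or c operator was seen. */
    static double[] roughBounds(String content) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        boolean sawPoint = false;
        List<Double> operands = new ArrayList<Double>();

        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (NUMBER.matcher(token).matches()) {
                operands.add(Double.parseDouble(token)); // operands precede their operator
                continue;
            }
            int needed = 0;
            if (token.equals("m") || token.equals("l")) {
                needed = 2; // x y
            } else if (token.equals("c")) {
                needed = 6; // x1 y1 x2 y2 x3 y3
            }
            int n = operands.size();
            if (needed > 0 && n >= needed) {
                for (int i = n - needed; i < n; i += 2) {
                    double x = operands.get(i);
                    double y = operands.get(i + 1);
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                    sawPoint = true;
                }
            }
            operands.clear(); // any operator consumes the pending operands
        }
        return sawPoint ? new double[] {minX, minY, maxX, maxY} : null;
    }
}
```

Note that this yields a single box for the whole stream. Nothing in the syntax says where one "conceptual image" ends and the next begins, which is exactly the difficulty described above.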
|
In reply to this post by Giampaolo Capelli
It sounds like you are going to have to compute the bounding box by interpreting the content stream draw operations. The content parser (com.itextpdf.text.pdf.parser) has the scaffolding in place to help with this, but I have not added a renderDrawOperation() method (or something to that effect) to the RenderListener interface.
Actually interpreting the draw operations will be a bit of an undertaking (as pointed out in earlier posts, finding the true extents of a Bézier curve, for example, will take some math), but it is doable. If you want to take a crack at it, I can provide guidance. A starting point would be to add a renderDrawOperation() method to RenderListener, then add additional handlers to PdfContentStreamProcessor.populateOperators(). The real question is how much state needs to be tracked in the PdfContentStreamProcessor, and how much should be tracked in the listener. Tracking things like the current pen/color should probably be in the PdfContentStreamProcessor. At the end of the day, this will take a bit of delving into the PDF spec and understanding each operator that is involved in draw operations. If this is something you intend to do, let me know and I'll get you contact info so we can collaborate on any design decisions, etc. |
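The "some math" for Bézier extents mentioned above is fairly compact. The sketch below is standalone and uses invented names (it is not the proposed renderDrawOperation() hook, which does not exist in the thread's iText version). It assumes a cubic curve given by four points, as produced by a preceding current point plus a c operator. For each coordinate the derivative of the curve is a quadratic a*t^2 + b*t + c with a = p3 - 3*p2 + 3*p1 - p0, b = 2*(p2 - 2*p1 + p0), c = p1 - p0; the extremes are at the endpoints or at roots of that quadratic inside (0, 1).

```java
// Standalone sketch, not iText API: tight bounding box of one cubic Bezier segment.
final class BezierExtentsSketch {

    /** Value of the cubic Bezier polynomial for one coordinate at parameter t. */
    static double eval(double p0, double p1, double p2, double p3, double t) {
        double u = 1 - t;
        return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
    }

    /** Returns {min, max} of one coordinate over t in [0, 1]. */
    static double[] range(double p0, double p1, double p2, double p3) {
        double min = Math.min(p0, p3);
        double max = Math.max(p0, p3);
        // Derivative (up to a factor 3) is a*t^2 + b*t + c.
        double a = p3 - 3 * p2 + 3 * p1 - p0;
        double b = 2 * (p2 - 2 * p1 + p0);
        double c = p1 - p0;
        double[] roots;
        if (Math.abs(a) < 1e-12) {
            // Degenerates to a linear equation (or a constant).
            roots = Math.abs(b) < 1e-12 ? new double[0] : new double[] {-c / b};
        } else {
            double disc = b * b - 4 * a * c;
            if (disc < 0) {
                roots = new double[0];
            } else {
                double s = Math.sqrt(disc);
                roots = new double[] {(-b + s) / (2 * a), (-b - s) / (2 * a)};
            }
        }
        for (double t : roots) {
            if (t > 0 && t < 1) { // only interior extremes matter
                double v = eval(p0, p1, p2, p3, t);
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new double[] {min, max};
    }

    /** p = {x0, y0, x1, y1, x2, y2, x3, y3}; returns {minX, minY, maxX, maxY}. */
    static double[] bounds(double[] p) {
        double[] xr = range(p[0], p[2], p[4], p[6]);
        double[] yr = range(p[1], p[3], p[5], p[7]);
        return new double[] {xr[0], yr[0], xr[1], yr[1]};
    }
}
```

For the curve (0,0) (0,100) (100,100) (100,0), the control points reach y = 100 but the curve itself only reaches y = 75, so using control points alone would make the border too large.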
|
In reply to this post by Giampaolo Capelli
iText has the building blocks for what you need, but you'll need to put
them together in the right way. Nothing "out of the box".

Leonard |
|
In reply to this post by Kevin Day
On 1/11/2011 17:34, Kevin Day wrote:
> It sounds like you are going to have to compute the bounding box by interpreting the content stream draw operations.

But how will you decide which paths/shapes/text belong to which image? See the example: there's a pear and its description "pera". It may be possible to decide that the shape and the text belong to the same image, but that's a judgement call. Suppose that you have an image that consists of two smaller images, for instance a hammer and an anvil. How are you going to decide that both are part of the same image? |
|
In reply to this post by Giampaolo Capelli
The PDF content stream says "Go to position X,Y, then perform the following draw operations". There is no concept of a 'region' or 'area' for this type of thing. In some cases, the content stream might contain a crop region, but it doesn't have to, and if the draw operations don't need to be cropped, it's likely that it wouldn't have a crop region. (and oh yes - you will need to also take cropping into account if you are going to parse the draw operations to determine a bounding box - calculate the bounding box, then intersect that with the crop region) |
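The last step described above (calculate the bounding box, then intersect it with the crop region) could look like the following sketch. It assumes, for simplicity, that the crop region is itself an axis-aligned rectangle; in real PDFs a clipping path can be any shape, so this would only be an approximation. The class name is invented for illustration.

```java
// Illustrative only: intersection of two axis-aligned boxes {llx, lly, urx, ury}.
final class ClipIntersectSketch {

    /** Returns the overlapping rectangle, or null when nothing remains visible. */
    static double[] intersect(double[] box, double[] clip) {
        double llx = Math.max(box[0], clip[0]);
        double lly = Math.max(box[1], clip[1]);
        double urx = Math.min(box[2], clip[2]);
        double ury = Math.min(box[3], clip[3]);
        if (llx >= urx || lly >= ury) {
            return null; // the drawing is completely cropped away
        }
        return new double[] {llx, lly, urx, ury};
    }
}
```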
|
On 1/11/2011 17:56, Kevin Day wrote:
> oh yes - you will need to also take cropping into account if you are going to parse the draw operations to determine a bounding box - calculate the bounding box, then intersect that with the crop region

In short: the requirement is far from trivial. |
|
In reply to this post by Kevin Day
Thank you Kevin,
I'll study the material you've provided and get back to the mailing list. I would like to develop some way in iText to reach my goal. |
|
In reply to this post by iText Software
Hi 1T3XT,
thank you for the tips. I'm thinking of a computer vision approach to solve my problem: if I switch to the raster domain, that is, render the PDF syntax into a raster image, then I will be able to apply computer vision techniques to recognize the shapes (blobs) and get their convex hulls (or bounding boxes). It would be possible, for example, to work on a black-and-white version of the rendered image, apply a threshold to its pixels, and then apply an algorithm to label the connected components. Once I have the shapes (blobs), I will be able to get their bounding boxes, widths, heights and positions expressed in pixel values. The last step would be to convert these values back to the PDF domain. Do you think this makes sense?

-----Original message-----
From: 1T3XT BVBA [mailto:[hidden email]]
Sent: Tuesday, 1 November 2011 17:34
To: Post all your questions about iText here
Subject: Re: [iText-questions] R: R: R: R: image in Flatedecode stream without metadata in dictionary

On 1/11/2011 17:09, Giampaolo Capelli wrote:
> In the attachment
>
> I'm providing an example of a pdf file where I see 4 "conceptual images", in the psychological visual sense (I understood that they are not images from the pdf point of view).
>
> My aim is to edit the pdf file adding a (rectangular) border to such "conceptual images".
>
> Is there an easy way to do it with iText, or should I handcraft some sort of low level parsing myself?

RUPS, which is a tool that takes X-ray photos of PDFs: it shows an "internal view" of your PDF. Look at the /Resources dictionary. It doesn't contain an /XObject entry. This means that there are NO Image XObjects (this you already knew), but NO Form XObjects either (we assumed there were Form XObjects, but that must have been a misunderstanding)! Where are the four images? Well... as far as the PDF is concerned, there are no FOUR images. As you rightly point out, they are only there in the psychological, visual sense (you've phrased that very well). As far as the PDF is concerned, there's only ONE sequence of PDF syntax: the page content stream (object 5 in the PDF), which is the /Contents entry of the page dictionary (object 4). All the paths and shapes on that page are constructed using operators such as moveTo (m), lineTo (l) and curveTo (c). I don't know of any software (not iText, not any other software) that is intelligent enough to find out which paths and shapes belong to which "conceptual" or "visual" image. I don't know of any way to automate the detection of these images. |
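The raster-domain idea in the message above (threshold, label connected components, take bounding boxes, convert back to the PDF domain) can be sketched as follows. This is a minimal illustration under several assumptions that are not from the thread: the page has already been rendered and thresholded by some other tool into a boolean grid (true = ink, row 0 = top of the page), the raster covers the whole page, the page is unrotated with its origin at the lower-left corner, and 4-connectivity is good enough. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: blob bounding boxes on a thresholded raster, then back to PDF points.
final class BlobBoxesSketch {

    /** Returns one {minCol, minRow, maxCol, maxRow} box per 4-connected blob of ink pixels. */
    static List<int[]> findBoxes(boolean[][] ink) {
        int rows = ink.length;
        int cols = rows == 0 ? 0 : ink[0].length;
        boolean[][] seen = new boolean[rows][cols];
        List<int[]> boxes = new ArrayList<int[]>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (!ink[r][c] || seen[r][c]) {
                    continue;
                }
                int[] box = {c, r, c, r};
                Deque<int[]> stack = new ArrayDeque<int[]>(); // explicit stack avoids deep recursion
                stack.push(new int[] {r, c});
                seen[r][c] = true;
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    box[0] = Math.min(box[0], p[1]);
                    box[1] = Math.min(box[1], p[0]);
                    box[2] = Math.max(box[2], p[1]);
                    box[3] = Math.max(box[3], p[0]);
                    int[][] neighbours = {{p[0] - 1, p[1]}, {p[0] + 1, p[1]}, {p[0], p[1] - 1}, {p[0], p[1] + 1}};
                    for (int[] n : neighbours) {
                        if (n[0] >= 0 && n[0] < rows && n[1] >= 0 && n[1] < cols
                                && ink[n[0]][n[1]] && !seen[n[0]][n[1]]) {
                            seen[n[0]][n[1]] = true;
                            stack.push(n);
                        }
                    }
                }
                boxes.add(box);
            }
        }
        return boxes;
    }

    /** Converts a pixel box to {llx, lly, urx, ury} in points (1/72 inch), flipping the y axis. */
    static double[] toPdfPoints(int[] pixelBox, double dpi, double pageHeightPoints) {
        double scale = 72.0 / dpi;
        double llx = pixelBox[0] * scale;
        double urx = (pixelBox[2] + 1) * scale; // +1: the box includes the last pixel
        double ury = pageHeightPoints - pixelBox[1] * scale;
        double lly = pageHeightPoints - (pixelBox[3] + 1) * scale;
        return new double[] {llx, lly, urx, ury};
    }
}
```

A connected-component pass like this treats any two shapes that do not touch as separate blobs, so a figure and its caption (the pear and "pera") would come out as different boxes unless nearby boxes are merged afterwards; how close counts as "nearby" remains the judgement call raised earlier in the thread.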
