|
Hello, I'm trying to get the images from a PDF file.
I used the following code from the ExtractImages example, inside MyImageRenderListener:

    String filename;
    FileOutputStream os;
    if (PdfName.DCTDECODE.toString().equals(filter.toString())) {
        filename = String.format(path, iri.getRef().getNumber(), "jpg");
        logger.log(Level.INFO, filename);
        os = new FileOutputStream(filename);
        os.write(image.getImageAsBytes());
        os.flush();
        os.close();
    } else if (PdfName.JPXDECODE.toString().equals(filter.toString())) {
        filename = String.format(path, iri.getRef().getNumber(), "jp2");
        logger.log(Level.INFO, filename);
        os = new FileOutputStream(filename);
        os.write(image.getImageAsBytes());
        os.flush();
        os.close();
    } else {
        BufferedImage awtimage = iri.getImage().getBufferedImage();
        if (awtimage != null) {
            filename = String.format(path, iri.getRef().getNumber(), "png");
            logger.log(Level.INFO, filename);
            ImageIO.write(awtimage, "png", new FileOutputStream(filename));
        }
    }

It works fine for a document that I created in an office application (inserting 3 images): it exports 3 images and they are correct. But now I have received a PDF file created by a scanner (scan to PDF, three A4 pages were scanned), and the extracted images are wrong. This PDF yields 15 images in total, and they seem to be split into layers: the content of one image is spread over 5 different images. I tried to preprocess the PDF somehow (something like flattening or merging the layers) but did not find anything useful. I cannot even tell which image belongs to which layer.

Optionally, I would like to know if there is a way to extract the pages of a PDF as images, ignoring the original image organisation in the document.

I would appreciate any help with this one. Thank you.

Andrew

p.s. I wrote one mail to this list already, but after registration I can't find it, so I am writing this one again. |
|
On 15/05/2012 8:05, cccp14 wrote:
> Optionaly, i would like to know if there is a way to extract the pages of
> pdf as images, ignoring the original image organisation in the document.

You need another product for that. Try JPedal: http://www.jpedal.org/

_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions
iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php |
|
In reply to this post by cccp14
On Tue, May 15, 2012 08:05, cccp14 wrote:
> p.s. i wrote one mail already to this list, but after registration i cant
> find it so i write this one again.

You sent your mail to Nabble. Don't do that. Nabble is just a read-only proxy. You have to send your mails to the *real* mailing list. Bruno Lowagie explains why: http://lowagie.com/nabble

This is a footer from the fake mailing list:

> http://itext-general.2136553.n4.nabble.com/getting-images-from-pdf-tp4633970.html
> Sent from the iText - General mailing list archive at Nabble.com.

This is a footer from the real mailing list:

> _______________________________________________
> iText-questions mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/itext-questions

Just so you know. |
|
In reply to this post by cccp14
This can work (we do it in our internal systems) - but you have to be very aware of how PDF works. The image draw operations are laying the images down onto a canvas. You can't just blindly extract the images, because it's quite possible that the images are designed to be laid down on top of each other, etc... (as you've discovered).
We handle this by creating a BufferedImage, then actually rendering the images onto it (you'll have to do coordinate transformation, bit depth adjustments, etc...). It definitely works, but it's not a no-brainer that you can do with an existing API in iText. If you want something out of the box, consider licensing JPedal.

Cheers, |
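To illustrate the approach described above, here is a minimal sketch of the compositing step using only java.awt. It is not the poster's internal code. The class name `PageCompositor` and its methods are hypothetical, and it assumes you already have each image as a BufferedImage together with the six numbers {a, b, c, d, e, f} of its transformation matrix, which in PDF maps the unit square onto the page. In iText 5 that matrix should be obtainable from `ImageRenderInfo.getImageCTM()`, but check this against your version. Bit depth adjustments, masks and clipping are left out.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

/** Hypothetical helper: paints image fragments onto one page-sized canvas. */
final class PageCompositor {
    private final BufferedImage canvas;
    private final double scale;         // pixels per PDF point (1 pt = 1/72 inch)
    private final double pageHeightPts; // needed to flip the y axis

    PageCompositor(double pageWidthPts, double pageHeightPts, double dpi) {
        this.scale = dpi / 72.0;
        this.pageHeightPts = pageHeightPts;
        int w = (int) Math.ceil(pageWidthPts * scale);
        int h = (int) Math.ceil(pageHeightPts * scale);
        canvas = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        g.setColor(Color.WHITE); // start from a blank white page
        g.fillRect(0, 0, w, h);
        g.dispose();
    }

    /**
     * Draws one image. ctm = {a, b, c, d, e, f} maps the unit square
     * (origin bottom-left) to page coordinates in points.
     */
    void draw(BufferedImage img, double[] ctm) {
        // Page space (y up, points) -> canvas pixels (y down).
        AffineTransform t = new AffineTransform(
                scale, 0, 0, -scale, 0, scale * pageHeightPts);
        // Unit square -> page space; PDF and Java use the same element order.
        t.concatenate(new AffineTransform(
                ctm[0], ctm[1], ctm[2], ctm[3], ctm[4], ctm[5]));
        // Image pixels (y down) -> unit square (y up).
        t.concatenate(new AffineTransform(
                1.0 / img.getWidth(), 0, 0, -1.0 / img.getHeight(), 0, 1));
        Graphics2D g = canvas.createGraphics();
        g.drawImage(img, t, null);
        g.dispose();
    }

    BufferedImage getCanvas() {
        return canvas;
    }
}
```

Calling `draw` once per image, in the order the render listener reports them, lets later fragments paint over earlier ones, just as they would on the PDF page. The finished canvas can then be written with `ImageIO.write(compositor.getCanvas(), "png", file)`.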
