|
I am concatenating many PDF forms into one PDF document using the following code:

    import java.io.FileOutputStream;
    import com.lowagie.text.pdf.PdfCopyFields;
    import com.lowagie.text.pdf.PdfReader;

    PdfCopyFields copy = new PdfCopyFields(new FileOutputStream("concatinated.pdf"));
    copy.setFullCompression();
    copy.addDocument(new PdfReader("form1.pdf"));
    copy.addDocument(new PdfReader("form2.pdf"));
    copy.addDocument(new PdfReader("form3.pdf"));
    // ... one addDocument() call per remaining form ...
    copy.addDocument(new PdfReader("formn.pdf"));
    copy.close();

How can I remove all the embedded fonts (which account for more than 80% of the size of these documents) from concatinated.pdf? Currently I remove the embedded fonts with PDF Optimizer in Acrobat Professional. But as the number of these concatenated documents grows (around 75 as of now, and possibly more in the near future), I am trying to find an easier way to do this than manually removing the fonts from each document one by one.

Any help will be truly appreciated.

Thanks
-new2pdf
|
Is there a better way of doing this?
|
|
new2pdf wrote:
> Is there a better way of doing this?

That's not a question I can answer in only a few lines. Due to lack of time, I have to pass on this question for now. Maybe somebody else can explain how to remove fonts (although that's always a very delicate matter).

br,
Bruno

-------------------------------------------------------------------------
SF.Net email is sponsored by: The Future of Linux Business White Paper from Novell. From the desktop to the data center, Linux is going mainstream. Let it simplify your IT future. http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://itext.ugent.be/itext-in-action/
|
As Bruno said, this is _NOT_ a trivial task.
You will need to understand the complexities of font formats, text encoding and how these relate to PDF content streams. Once armed with that information, you can begin to construct code that will unembed fonts, or at least some of them. It also depends on whether you plan to unembed only fully embedded fonts or subset fonts as well.

I think when I implemented one, it took me a good week to get everything working, and that was starting with the knowledge base, a comprehensive PDF library and a solid font engine.

Leonard

On Nov 30, 2007, at 9:56 AM, Bruno Lowagie wrote:
> That's not a question I can answer in only a few lines. [...]
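For readers wondering what "unembedding" means at the object level: in a PDF, an embedded font program hangs off the font descriptor under /FontFile, /FontFile2 or /FontFile3, and deleting that entry is the easy part. The sketch below models only that step, with a plain java.util.Map standing in for the PDF dictionary (an assumption; this is not iText's PdfDictionary API). Everything Leonard lists above (encodings, widths, subsets, content streams) decides whether the text still renders correctly once the program is gone, and none of it is handled here.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a Map stands in for a PDF font descriptor dictionary.
class UnembedSketch {
    // Keys under which the PDF specification stores an embedded font program:
    // FontFile (Type 1), FontFile2 (TrueType), FontFile3 (CFF / OpenType, per its Subtype).
    static final String[] FONT_FILE_KEYS = {"FontFile", "FontFile2", "FontFile3"};

    /** Removes every embedded font program; returns true if anything was removed. */
    static boolean unembed(Map<String, Object> fontDescriptor) {
        boolean removed = false;
        for (String key : FONT_FILE_KEYS) {
            if (fontDescriptor.remove(key) != null) {
                removed = true;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Object> descriptor = new HashMap<>();
        descriptor.put("Type", "FontDescriptor");
        descriptor.put("FontName", "ZIGEYT+ComicSansMS");
        descriptor.put("FontFile2", new byte[] {0, 1, 0, 0}); // placeholder for the TrueType stream
        System.out.println(unembed(descriptor));                 // true
        System.out.println(descriptor.containsKey("FontFile2")); // false
        System.out.println(unembed(descriptor));                 // false
    }
}
```

A real tool would also have to locate every font descriptor in the file and write the modified document back out; that part is deliberately left out.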
|
Bruno & Leo,
Thanks for your time. These base PDF documents (PDF 1..n) come from third-party companies and from state and federal governments, so we have no control over the fonts they use. We need to create several different combinations of these base documents by combining them. Once combined, the average size of these documents is about 5 MB (about 80% of it fonts). After removing all the fonts with Acrobat Professional's PDF Optimizer, the size drops to 400-500 KB. The problem is that we have to run PDF Optimizer manually on 75+ documents. We tried running a batch sequence from Acrobat Professional, but it only partially worked: it removed some of the fonts and brought the size down to the 1.2 MB range. That is why I posted this question here.
|
In reply to this post by blowagie
This contribution is about TTF font merging and replacement, but it should help with the original font problem. It would also be possible to continue developing this code to remove embedded fonts and replace them with PDF document fonts. With this code you can remove other fonts and replace them with a TTF font.

Here are some additional classes for com.lowagie.text.pdf: itext_font_merging_patch.zip. The main class is FontReplacingPdfSmartCopy (extends PdfSmartCopy).

This subclass of PdfSmartCopy replaces the different subsets of the same font in the resulting PDF file with one font (either under the same font name, or replaced by a different font). It needs the TrueType font files (TTF) to do the job.

This usually reduces the final PDF file size by 40%-60% compared to plain PdfSmartCopy when several single-page PDFs are merged into one big PDF file.

The original motivation was to make it possible to print very large PDF files that have been concatenated from thousands of small PDF files containing the same fonts. Some printers just clog when they receive thousands of embedded fonts in a single print job. Maybe the reason is that the resulting PostScript files are quite huge when there are thousands of embedded fonts. Having fewer fonts makes the concatenated PDF simpler to print-process. It also uses less disk space.

Merging/replacement currently works only with WinAnsiEncoding (ISO-8859-1 / Cp1252 / Latin-1).

Page resources are scanned while copying, checking for a dictionary under the FONT key. FONT references in that font dictionary are replaced on demand. The new font gets a new indirect reference, and the font is written to the stream in the overridden PdfWriter.addSharedObjectsToBody() method.

This class has to be in the com.lowagie.text.pdf package because it needs access to some package-private methods.

I would like to contribute this implementation to the community, and I hope it gets into the iText release as soon as possible. Bruno or Paulo, can you add this to the iText release?

There's also a helper class for concatenating PDFs. PdfConcator makes it easier to concatenate multiple PDF files into a single PDF file. It uses FontReplacingPdfSmartCopy to reduce the number of fonts in the resulting PDF file.
It's also possible to replace TTF fonts with other fonts. Sample usage:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import com.lowagie.text.pdf.PdfConcator; // from itext_font_merging_patch.zip

    PdfConcator pdfConcator = new PdfConcator();
    pdfConcator.setFontDir(new File("fonts")); // or pdfConcator.useDefaultSystemFonts();
    Map fontNameMapping = new HashMap();
    fontNameMapping.put("ComicSansMS", "Arial-Black");
    pdfConcator.setFontNameMapping(fontNameMapping);

    List files = new ArrayList();
    files.add("pdf1.pdf");
    files.add("pdf2.pdf");
    pdfConcator.concat(files, new File("concat.pdf"));

Supports IoC/DI. Example Spring configuration XML:

    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
               http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">
      <bean id="pdfConcator" class="com.lowagie.text.pdf.PdfConcator">
        <property name="fonts">
          <bean factory-bean="fontArrayFactory" factory-method="getInstance"/>
        </property>
        <property name="fontNameMapping">
          <map>
            <entry key="ComicSansMS" value="ArialMT"/>
          </map>
        </property>
      </bean>
      <bean id="fontArrayFactory" class="com.lowagie.text.pdf.BaseFontArrayFactory">
        <property name="fontDirs">
          <list>
            <value>C:/myfonts</value>
            <value>fonts</value>
            <value>/usr/share/fonts/truetype/msttcorefonts</value>
            <value>C:/WINDOWS/FONTS</value>
          </list>
        </property>
        <property name="extensions" value="ttf,otf"/>
        <property name="ignoreUnreadable" value="true"/>
      </bean>
    </beans>

Regards,
Lari Hotari
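The post says that different subsets of the same font are merged. One way to decide which /BaseFont names belong together is to strip the subset tag, which the PDF Reference defines as six uppercase letters followed by "+". This Java sketch shows that grouping step only; it is an illustration, not code from itext_font_merging_patch.zip, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the patch: groups /BaseFont names that differ only by subset tag.
class SubsetGrouping {
    // PDF Reference convention: a subset tag is exactly six uppercase letters followed by '+'.
    private static final Pattern SUBSET_TAG = Pattern.compile("[A-Z]{6}\\+(.+)");

    /** "ZIGEYT+ComicSansMS" -> "ComicSansMS"; names without a valid tag are returned unchanged. */
    static String baseName(String baseFont) {
        Matcher m = SUBSET_TAG.matcher(baseFont);
        return m.matches() ? m.group(1) : baseFont;
    }

    /** Maps each untagged font name to the /BaseFont names (subsets or not) that share it. */
    static Map<String, List<String>> groupSubsets(List<String> baseFonts) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String name : baseFonts) {
            groups.computeIfAbsent(baseName(name), k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> names = List.of("ZIGEYT+ComicSansMS", "ABCDEF+ComicSansMS", "ArialMT", "QWERTY+ArialMT");
        // Prints {ComicSansMS=[ZIGEYT+ComicSansMS, ABCDEF+ComicSansMS], ArialMT=[ArialMT, QWERTY+ArialMT]}
        System.out.println(groupSubsets(names));
    }
}
```

Because the grouping relies only on the name, it inherits the weakness Leonard points out later in the thread: a subset without a tag, or a tagged font that is not really a subset, will be grouped wrongly.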
|
|
So this will only work if the font dictionary specifies a /Subtype of /TrueType and an /Encoding of /WinAnsiEncoding, and does not have a /Differences array, correct? There is no support for Type 1, Type 1C, Mac encodings, custom encodings or CID fonts, correct?

Also, in the case of multiple subsets, how do you determine that the fonts originally came from the same font? Only by the /BaseFont name?

Leonard

On Jan 31, 2008, at 1:02 PM, Lari Hotari wrote:
> This contribution is about TTF font merging and replacement, but it should help with the original font problem. [...]
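Leonard's reading of the restriction can be stated as a predicate. The sketch encodes exactly the conditions in his question (TrueType subtype, WinAnsiEncoding, no /Differences) over a Map-based model of the font dictionary; whether the patch checks precisely these things is not confirmed in the thread, and the names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch over a Map-based model of a font dictionary:
// PDF names are Strings, sub-dictionaries are Maps, arrays are Lists.
class MergeEligibility {
    static boolean isMergeable(Map<String, Object> fontDict) {
        if (!"TrueType".equals(fontDict.get("Subtype"))) {
            return false; // Type 1, Type 1C (Type1 with FontFile3) and Type0/CID fonts are out of scope
        }
        Object encoding = fontDict.get("Encoding");
        if ("WinAnsiEncoding".equals(encoding)) {
            return true; // plain predefined encoding name
        }
        if (encoding instanceof Map) {
            // An encoding dictionary is acceptable only if it adds no /Differences.
            Map<?, ?> encodingDict = (Map<?, ?>) encoding;
            return "WinAnsiEncoding".equals(encodingDict.get("BaseEncoding"))
                    && !encodingDict.containsKey("Differences");
        }
        return false; // MacRomanEncoding, missing /Encoding, etc.
    }

    public static void main(String[] args) {
        Map<String, Object> plain = new HashMap<>();
        plain.put("Subtype", "TrueType");
        plain.put("Encoding", "WinAnsiEncoding");

        Map<String, Object> customEncoding = new HashMap<>();
        customEncoding.put("BaseEncoding", "WinAnsiEncoding");
        customEncoding.put("Differences", List.of(128, "bullet"));
        Map<String, Object> withDifferences = new HashMap<>(plain);
        withDifferences.put("Encoding", customEncoding);

        System.out.println(isMergeable(plain));           // true
        System.out.println(isMergeable(withDifferences)); // false
    }
}
```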
|
In reply to this post by Lari Hotari
I'll have a look.
Paulo

> -----Original Message-----
> From: [hidden email] On Behalf Of Lari Hotari
> Sent: Thursday, January 31, 2008 12:03 PM
> To: [hidden email]
> Subject: [iText-questions] contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)
|
In reply to this post by Leonard Rosenthol
I started the work as a proof of concept. My use case was only for TrueType fonts, so I haven't tried much more. I also looked at replacing fonts with other types of encoding, like Identity-H for barcode fonts, but they use a custom encoding in the page stream, and it would be too much effort to start merging the encodings. It would also require parsing and modifying the page stream.

The fonts are only recognized by the /BaseFont name; no other checking is done. The FirstChar and LastChar ranges are updated in the final font (the minimum FirstChar and the maximum LastChar are selected for the final font).

Subset fonts have a name like "ZIGEYT+ComicSansMS". The subset fonts have their real PostScript font name after the "+" sign. PDF Reference, p. 419:

"For a font subset, the PostScript name of the font—the value of the font’s BaseFont entry and the font descriptor’s FontName entry—begins with a tag followed by a plus sign (+). The tag consists of exactly six uppercase letters; the choice of letters is arbitrary, but different subsets in the same PDF file must have different tags. For example, EOODIA+Poetica is the name of a subset of Poetica®, a Type 1 font. (See implementation note 63 in Appendix H.)"

If you look at the source code, you can see that the implementation is currently fairly lightweight. It just extends PdfSmartCopy with the possibility of font replacement/merging. It uses existing methods to write fonts etc. (copied and pasted from other iText classes in some places). There's also a JUnit test case in the zip, under the test subdirectory. It uses the PdfConcator helper class, which makes it easy to configure a PdfConcator bean instance with IoC/DI (Spring Framework, Guice, etc.).

I hope that this work could serve as a baseline for adding font merging and replacement features to iText. It would be nice to have some kind of template method pattern or strategy pattern for customizing the base solution for different use cases.

Lari
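Lari's rule for the merged range (minimum FirstChar, maximum LastChar) also forces the /Widths array to be rebuilt, because Widths[i] belongs to character code FirstChar + i. The Java sketch below shows one way to do that; filling codes that neither subset uses with 0 and letting the first subset win on overlaps are assumptions of this sketch, not behavior confirmed for the patch.

```java
import java.util.Arrays;

// Sketch: merging the metrics of two subsets of one simple (single-byte) font.
class WidthsMerge {
    /** Simple-font metrics: widths[i] is the glyph width of character code firstChar + i. */
    record Metrics(int firstChar, int lastChar, int[] widths) {}

    static Metrics merge(Metrics a, Metrics b) {
        // Range rule from the thread: minimum FirstChar, maximum LastChar.
        int first = Math.min(a.firstChar(), b.firstChar());
        int last = Math.max(a.lastChar(), b.lastChar());
        int[] widths = new int[last - first + 1]; // codes used by neither subset stay 0
        copyNonZero(b, first, widths);
        copyNonZero(a, first, widths); // copied last, so a wins where both define a width
        return new Metrics(first, last, widths);
    }

    private static void copyNonZero(Metrics m, int first, int[] target) {
        for (int code = m.firstChar(); code <= m.lastChar(); code++) {
            int w = m.widths()[code - m.firstChar()];
            if (w != 0) {
                target[code - first] = w;
            }
        }
    }

    public static void main(String[] args) {
        Metrics a = new Metrics(65, 67, new int[] {722, 0, 722}); // subset using codes 65 and 67
        Metrics b = new Metrics(66, 68, new int[] {667, 0, 722}); // subset using codes 66 and 68
        Metrics merged = merge(a, b);
        System.out.println(merged.firstChar() + ".." + merged.lastChar()); // 65..68
        System.out.println(Arrays.toString(merged.widths()));             // [722, 667, 722, 722]
    }
}
```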
|
While that is the recommendation of the PDF spec, in the real world you will find fonts that are subset but don't have this name change AND fonts with this kind of name that aren't actually subset.

Leonard

On Jan 31, 2008, at 5:28 PM, Lari Hotari wrote:
So there are "illegal PDFs" around too. :) Anyway, this solution isn't very dependent on that subset information.

I mentioned that it would be nice if parts of this font replacement could be customized with a strategy pattern and a template method pattern. For example, one could provide their own strategy to replace a font based on some custom logic and matching. Some parts could be customized with the template method pattern by subclassing the base class. I think these possibilities would make it easy to handle most requirements.

My own use case is quite simple, since there are only a couple of fonts that have thousands of subset fonts. I'm only targeting the merging and replacement at certain fonts. Currently this solution works quite well for this kind of use case.

Lari
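The strategy idea can be sketched as a small interface that the copier would consult for each /BaseFont it meets. The types below are hypothetical (they are not part of iText or of the posted patch); the point is only that the matching logic becomes pluggable, while a template-method base class could keep ownership of the copying itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical extension point; none of these types exist in iText or in the patch.
class FontStrategySketch {
    /** Strategy: decide which font, if any, should replace a given /BaseFont. */
    interface FontReplacementStrategy {
        Optional<String> replacementFor(String baseFont);
    }

    /** Merges tagged subsets under their untagged name, then applies an explicit mapping. */
    static class SubsetMergingStrategy implements FontReplacementStrategy {
        private final Map<String, String> nameMapping;

        SubsetMergingStrategy(Map<String, String> nameMapping) {
            this.nameMapping = new HashMap<>(nameMapping);
        }

        @Override
        public Optional<String> replacementFor(String baseFont) {
            boolean tagged = baseFont.matches("[A-Z]{6}\\+.+");
            String name = tagged ? baseFont.substring(7) : baseFont; // drop the "XXXXXX+" tag
            String mapped = nameMapping.get(name);
            if (mapped != null) {
                return Optional.of(mapped);
            }
            return tagged ? Optional.of(name) : Optional.empty(); // leave untagged, unmapped fonts alone
        }
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("ComicSansMS", "Arial-Black");
        FontReplacementStrategy strategy = new SubsetMergingStrategy(mapping);
        System.out.println(strategy.replacementFor("ZIGEYT+ComicSansMS")); // Optional[Arial-Black]
        System.out.println(strategy.replacementFor("QWERTY+ArialMT"));     // Optional[ArialMT]
        System.out.println(strategy.replacementFor("ArialMT"));            // Optional.empty
    }
}
```

A user with different needs (for example, Leonard's untagged subsets) would supply another implementation instead of patching the copier.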
|
In reply to this post by Paulo Soares
Hi,

Have you had a chance to check this out? (http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip)

Regards,
Lari
|
|
It's not forgotten but, as you said yourself, it solves your specific
problem. I'll see if I can make it more generic but that takes time. Paulo ----- Original Message ----- From: "Lari Hotari" <[hidden email]> To: <[hidden email]> Sent: Tuesday, February 05, 2008 10:02 PM Subject: Re: [iText-questions] contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document) > > > Hi, > > Have you had a chance to check this out? > (http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip) > > Regards, > > Lari > > > > Paulo Soares wrote: >> >> I'll have a look. >> >> Paulo >> >>> -----Original Message----- >>> From: [hidden email] >>> [mailto:[hidden email]] On >>> Behalf Of Lari Hotari >>> Sent: Thursday, January 31, 2008 12:03 PM >>> To: [hidden email] >>> Subject: [iText-questions] contribution: >>> FontReplacingPdfSmartCopy: duplicate TTF font subset merging >>> and replacement (was: How to remove embedded fonts from a pdf >>> document) >>> >>> >>> This contribution is about TTF font merging and replacement, >>> but it should >>> help with the original font problem. It would also be >>> possible to continue >>> developing this code to remove embedded fonts and replace >>> them with pdf >>> document fonts. With this code you can remove other fonts and >>> replace them >>> with some TTF font. >>> >>> Here's (itext_font_merging_patch.zip) some additional classes to >>> com.lowagie.text.pdf . >>> http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip >>> itext_font_merging_patch.zip >>> The main class is FontReplacingPdfSmartCopy (extends PdfSmartCopy). >>> >>> This subclass of PdfSmartCopy will replace different subsets >>> of the same >>> font in the resulting pdf file with one font (same font name >>> or replacement >>> can be also done). It needs the TrueType font files (TTF) to >>> do the job. 
>>> >>> This usually reduces the final PDF file size by 40%-60% >>> compared to plain >>> PdfSmartCopy when several single page pdfs are merged in to >>> one big pdf >>> file. >>> >>> The original motivation was to make it possible to print very >>> large pdf >>> files which have been concatenated of thousands of small pdf files >>> containing the same fonts. Some printers just clog when they receive >>> thounsands of embedded fonts in a single print job. Maybe the >>> reason is that >>> the resulting Postscript files are quite huge when there's >>> thousands of >>> embedded fonts. Having less fonts makes it more simple to >>> print-process the >>> concatenated pdf. It also uses less disk space. >>> >>> Merging/replacement currently works only with WinAnsiEncoding >>> (iso-8859-1 / >>> Cp1252 / latin1) >>> >>> Page resources are scanned while copying and it checks for a >>> dictionary >>> under FONT key. Possible FONT references in the font >>> dictionary are replaced >>> on demand. The new font will get a new indirect reference and >>> the font will >>> be written to the stream in the overridden >>> PdfWriter.addSharedObjectsToBody() method. >>> >>> This class has to be in the com.lowagie.itext.pdf package >>> because it needs >>> access to some package private methods. >>> >>> I would like to contribute this implementation to the >>> community and I hope >>> this gets in to the itext release as soon as possible. Bruno >>> or Paulo, can >>> you add this to the itext release? >>> >>> There's also a helper class for concatenating pdfs. >>> >>> PdfConcator makes it easier to concatenate multiple PDF files >>> in to a single >>> PDF file. It uses FontReplacingPdfSmartCopy to reduce the >>> number of fonts in >>> the resulting PDF file. It's also possible to replace TTF >>> fonts with other >>> fonts. 
Sample usage:
>>>
>>> PdfConcator pdfConcator = new PdfConcator();
>>> pdfConcator.setFontDir(new File("fonts")); // or: pdfConcator.useDefaultSystemFonts();
>>> Map fontNameMapping = new HashMap();
>>> fontNameMapping.put("ComicSansMS", "Arial-Black");
>>> pdfConcator.setFontNameMapping(fontNameMapping);
>>>
>>> List files = new ArrayList();
>>> files.add("pdf1.pdf");
>>> files.add("pdf2.pdf");
>>> pdfConcator.concat(files, new File("concat.pdf"));
>>>
>>> Supports IoC/DI. Example Spring configuration XML:
>>>
>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>     xsi:schemaLocation="http://www.springframework.org/schema/beans
>>>         http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">
>>>   <bean id="pdfConcator" class="com.lowagie.text.pdf.PdfConcator">
>>>     <property name="fonts">
>>>       <bean factory-bean="fontArrayFactory" factory-method="getInstance"/>
>>>     </property>
>>>     <property name="fontNameMapping">
>>>       <map>
>>>         <entry key="ComicSansMS" value="ArialMT"/>
>>>       </map>
>>>     </property>
>>>   </bean>
>>>   <bean id="fontArrayFactory" class="com.lowagie.text.pdf.BaseFontArrayFactory">
>>>     <property name="fontDirs">
>>>       <list>
>>>         <value>C:/myfonts</value>
>>>         <value>fonts</value>
>>>         <value>/usr/share/fonts/truetype/msttcorefonts</value>
>>>         <value>C:/WINDOWS/FONTS</value>
>>>       </list>
>>>     </property>
>>>     <property name="extensions" value="ttf,otf"/>
>>>     <property name="ignoreUnreadable" value="true"/>
>>>   </bean>
>>> </beans>
>>>
>>> Regards,
>>>
>>> Lari Hotari
|
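In the usage example above, the mapping keys are plain PostScript names such as ComicSansMS. Subset fonts embedded in a PDF, however, normally carry a BaseFont name prefixed with six uppercase letters and a plus sign (for example ABCDEF+ComicSansMS), as described in the PDF specification. I don't know how PdfConcator matches names internally, so this is only a minimal sketch, with hypothetical class and method names, of normalising such a name before looking it up in a mapping like fontNameMapping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of iText or the contributed patch. */
public class SubsetNameMapper {
    // Subset prefix per the PDF specification: six uppercase letters followed by '+'
    private static final Pattern SUBSET_TAG = Pattern.compile("[A-Z]{6}\\+(.+)");

    /** Returns the font name with a leading subset tag removed, if there is one. */
    public static String stripSubsetTag(String baseFontName) {
        Matcher m = SUBSET_TAG.matcher(baseFontName);
        return m.matches() ? m.group(1) : baseFontName;
    }

    /** Returns the mapped replacement name, or null when no mapping exists. */
    public static String replacementFor(String embeddedName, Map<String, String> mapping) {
        return mapping.get(stripSubsetTag(embeddedName));
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("ComicSansMS", "Arial-Black");
        System.out.println(replacementFor("ABCDEF+ComicSansMS", mapping)); // Arial-Black
        System.out.println(replacementFor("ComicSansMS", mapping));        // Arial-Black
        System.out.println(replacementFor("QWERTY+Verdana", mapping));     // null
    }
}
```

Names with style suffixes (for example ComicSansMS,Bold) are not handled by this sketch and would need their own rule.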
|
I have a customer who would like to use this feature. The problem is that iText has to be patched (the classes must be in the iText package) to use this, so it would be nice if this could be included in iText.
I'd like to help with this. Is there anything I could do to get this code into a state where it could be added to iText?

Lari

FontReplacingPdfSmartCopy: http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip
|
|
Classes placed in the same iText package are not a patch to iText. License-wise,
as long as you don't change iText you don't have to tell anybody what you are doing, even if your classes are in the com.lowagie package. About your changes: as I said, the scope is too narrow (only TrueType is supported, no differences array, etc.). Without going into Unicode fonts, it's possible to extend your code without too much work to support a lot more single-byte font features.

Paulo

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Lari Hotari
> Sent: Friday, February 29, 2008 10:48 AM
> To: [hidden email]
> Subject: Re: [iText-questions] contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)
>
> I have a customer that would like to use this feature. The problem is that
> iText has to be patched (classes must be in the itext package) to use this.
> Therefore it would be nice if this could be included in iText.
>
> I'd like to help with this. Is there anything that I could do to get this
> part to the state that it could be added to itext?
>
> Lari
>
> FontReplacingPdfSmartCopy:
> http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip
>
> Paulo Soares wrote:
> >
> > It's not forgotten but, as you said yourself, it solves your specific
> > problem. I'll see if I can make it more generic but that takes time.
> >
> > Paulo
> >
> > ----- Original Message -----
> > From: "Lari Hotari" <[hidden email]>
> > To: <[hidden email]>
> > Sent: Tuesday, February 05, 2008 10:02 PM
> > Subject: Re: [iText-questions] contribution: FontReplacingPdfSmartCopy:
> > duplicate TTF font subset merging and replacement (was: How to remove
> > embedded fonts from a pdf document)
> >
> >> Hi,
> >>
> >> Have you had a chance to check this out?
> >> (http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip)
> >>
> >> Regards,
> >>
> >> Lari
|
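For readers unfamiliar with the "differences array" Paulo mentions: a simple font's /Encoding dictionary can carry a /Differences array that overrides its base encoding. Per the PDF specification, each integer in the array sets the code for the glyph name that follows it, and subsequent names take consecutive codes. A minimal sketch, assuming the array has already been parsed into Java objects (Integer for numbers, String for names); the class and method names are illustrative and not iText API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative decoder for a PDF /Differences array; not iText API. */
public class DifferencesDecoder {

    /**
     * Applies a Differences array on top of a base code-to-glyph-name map.
     * An Integer sets the current code; each following name is assigned to the
     * current code, which is then incremented. The base map is not modified.
     */
    public static Map<Integer, String> apply(Map<Integer, String> base, List<?> differences) {
        Map<Integer, String> result = new TreeMap<>(base);
        int code = -1; // no code set yet
        for (Object item : differences) {
            if (item instanceof Integer) {
                code = (Integer) item;
            } else if (item instanceof String) {
                if (code < 0 || code > 255) {
                    throw new IllegalArgumentException("Name " + item + " has no valid code");
                }
                result.put(code, (String) item);
                code++;
            } else {
                throw new IllegalArgumentException("Unexpected entry: " + item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> base = new TreeMap<>();
        base.put(65, "A");
        base.put(66, "B");
        // Equivalent of: /Differences [65 /Alpha /Beta 128 /Euro]
        List<Object> diffs = Arrays.<Object>asList(65, "Alpha", "Beta", 128, "Euro");
        System.out.println(apply(base, diffs)); // {65=Alpha, 66=Beta, 128=Euro}
    }
}
```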
|
I think that the use case of replacing and merging fonts is pretty common.
Anyone who merges a lot of small PDFs will need this, so including this feature in the main iText distribution would also help others.

Paulo, what criteria does the code have to meet before it can be added to iText? You said that the scope is too narrow. Aren't most iText users handling PDFs with TrueType fonts (or document fonts) using a single-byte encoding (WinAnsi)? Would it be possible for the first version to support only TTF + WinAnsi encoding?

Anyway, could we do some kind of requirements analysis for the full scope?

Required features:
- support for all font types: T1, T3, TTF, TTF Unicode, CJK, document fonts
- support for all font encodings (WinAnsi, MacRoman, MacExpert, Unicode, Identity)
- support for merging differences arrays
  - this means that the page streams have to be rewritten so that one merged differences array can be used
  - is there code available in iText for parsing and modifying the page stream?
  - support rewriting strings in the text-showing PDF operators (Tj, TJ, ', ") and in font selection (Tf)
- rewriting AcroForm fields?
- restriction: the original font file has to be available in order to do font merging (no need to merge font glyphs in that case).

Lari
|
|
> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Lari Hotari
> Sent: Wednesday, March 12, 2008 9:02 AM
> To: [hidden email]
> Subject: Re: [iText-questions] contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)
>
> I think that the usecase of replacing & merging fonts is pretty usual.
>
> Anyone who merges a lot of small pdfs will need this. Therefore it would
> help also others if this feature would be included in the main iText
> distribution.

Yes.

> Paulo, What are the criterias of the code so that it could be added to
> iText? You said that the scope is too narrow. Aren't most iText users
> handling pdfs with truetype fonts (or document fonts) using single byte
> encoding (winansi)? Is it possible that the first version only supports ttf
> + winansi encoding?

Yes.

> Anyways, Could we do some kind of requirements analysis for the full scope?
>
> required features:
> - support for all font types: T1, T3, TTF, TTF unicode, CJK, document fonts

T1 and TTF.

> - support for all font encodings (winansi, macroman, macexpert, unicode,
> identity)

Just single byte.

> - support for merging differences arrays
>     - this means that the pagestreams have to be re-written so that one
>       merged differences array can be used

No. You can have the same font with different differences arrays. iText can't (easily) do it.

>     - is there code available in iText for parsing and modifying the
>       pagestream?

PdfContentParser.

>     - support rewriting string in text-showing PDF operators (' , ", Tf)

See above.

> - rewriting acroform fields?

If needed.

> - restriction: the original font file has to be available in order to do
> font merging (no need to merge font glyphs in that case).

That font, or another one if you are doing font replacement.

The bottom line is that I'll include your code in iText and hope that it will be improved later, by you or someone else.

Paulo

>
> Lari
|
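Paulo's remark that the same font can appear with different differences arrays is the crux of why a single merged encoding is not always possible: two fonts can only share one encoding if no character code maps to two different glyph names. The sketch below is my own illustration of that compatibility check on already-decoded code-to-glyph-name maps (the class name EncodingMerger is hypothetical); it is not how iText or FontReplacingPdfSmartCopy handles the problem.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Illustrative check; not part of iText or the contributed patch. */
public class EncodingMerger {

    /**
     * Merges two code-to-glyph-name maps if they agree on every shared code.
     * Returns Optional.empty() when some code maps to different glyphs, in which
     * case the two fonts cannot share one encoding without rewriting their strings.
     */
    public static Optional<Map<Integer, String>> merge(Map<Integer, String> a, Map<Integer, String> b) {
        Map<Integer, String> merged = new TreeMap<>(a);
        for (Map.Entry<Integer, String> e : b.entrySet()) {
            String existing = merged.putIfAbsent(e.getKey(), e.getValue());
            if (existing != null && !existing.equals(e.getValue())) {
                return Optional.empty(); // same code, different glyph: conflict
            }
        }
        return Optional.of(merged);
    }

    public static void main(String[] args) {
        Map<Integer, String> first = new TreeMap<>();
        first.put(128, "Euro");
        Map<Integer, String> second = new TreeMap<>();
        second.put(129, "bullet");
        Map<Integer, String> third = new TreeMap<>();
        third.put(128, "bullet");
        System.out.println(merge(first, second)); // Optional[{128=Euro, 129=bullet}]
        System.out.println(merge(first, third));  // Optional.empty
    }
}
```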
|
Quote from:
http://www.nabble.com/Re%3A-contribution%3A-FontReplacingPdfSmartCopy%3A-duplicate-TTF-font-subset-merging-and-replacement-%28was%3A-How-to-remove-embedded-fonts-from-a-pdf-document%29-p16001839.html

Hello,

Is there any chance that FontReplacingPdfSmartCopy (http://www.nabble.com/How-to-remove-embedded-fonts-from-a-pdf-document-to14033717.html#a15203186) will be included in the iText main distribution? I contributed the patch over a year ago (Jan 31, 2008), and you replied on Mar 12, 2008 that you could include it in iText.

I believe there is demand for this feature, since I've been receiving questions about the contribution by email. I hope every iText user can benefit from it; font replacement is a very typical use case in PDF manipulation. I can sign a Contributor agreement if that's an obstacle to including this contribution.

Someone asked by email for the ability to replace embedded fonts with document fonts (Type 1), and that was possible with a small change. Here's an example (JUnit 4 test method) that replaces ComicSansMS with Helvetica:

@Test
public void testDocFontReplace() throws Exception {
    FontReplacingPdfSmartCopy.logLevel = Level.INFO;
    PdfConcator pdfConcator = new PdfConcator();
    pdfConcator.setFonts(new BaseFont[] {
        BaseFont.createFont(BaseFont.HELVETICA, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED)});
    Map fontNameMapping = new HashMap();
    fontNameMapping.put("ComicSansMS", "Helvetica");
    pdfConcator.setFontNameMapping(fontNameMapping);
    pdfConcator.concat(Collections.singleton(new File("test1.pdf")),
            new File("docfont-test2.pdf"));
}

I had to make a small modification in FontReplacingPdfSmartCopy's setFonts method:

public void setFonts(BaseFont[] fonts) {
    this.fontmap = new HashMap(); // new HashMap<String, BaseFont>();
    for (int i = 0; i < fonts.length; i++) {
        switch (fonts[i].getFontType()) {
        case BaseFont.FONT_TYPE_T3:
        case BaseFont.FONT_TYPE_T1:
        case BaseFont.FONT_TYPE_TT:
            // Only fonts using WinAnsi encoding are considered; others are skipped silently
            if (BaseFont.WINANSI.equals(fonts[i].getEncoding())) {
                // Type 1 fonts are accepted as-is; other types must report the Cp1252 code page
                if (fonts[i].getFontType() == BaseFont.FONT_TYPE_T1
                        || Arrays.asList(fonts[i].getCodePagesSupported()).contains("1252 Latin 1")) {
                    // Keep the first font registered under each PostScript name
                    if (!fontmap.containsKey(fonts[i].getPostscriptFontName())) {
                        logger.log(logLevel, "Adding font {0}", fonts[i].getPostscriptFontName());
                        fontmap.put(fonts[i].getPostscriptFontName(), fonts[i]);
                    } else {
                        logger.log(logLevel, "Discarding duplicate entry for font {0}",
                                fonts[i].getPostscriptFontName());
                    }
                } else {
                    logger.log(logLevel, "Discarding font {0}. It doesn't support Cp1252/ISO-8859-1/Latin1/WinAnsiEncoding.",
                            fonts[i].getPostscriptFontName());
                }
            }
            break;
        default:
            logger.log(logLevel, "Discarding unsupported type of font {0}. Only single byte TTF/OTF fonts are supported by now.",
                    fonts[i].getPostscriptFontName());
        }
    }
}

Regards,
Lari
|
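The modified setFonts only accepts WinAnsi-encoded replacement fonts, so a replacement can only render characters from that single-byte set. A minimal sketch, assuming Java's windows-1252 charset is a close enough approximation of PDF's WinAnsiEncoding (the two are close but differ in a few details), of listing the characters in some text that such a replacement could not show. The class name WinAnsiCheck is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper; not part of iText or the contributed patch. */
public class WinAnsiCheck {
    // windows-1252 is used here as an approximation of PDF's WinAnsiEncoding
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Returns each distinct UTF-16 char in text that windows-1252 cannot encode, in order of appearance. */
    public static List<Character> unencodable(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        List<Character> missing = new ArrayList<>();
        for (char c : text.toCharArray()) {
            if (!encoder.canEncode(c) && !missing.contains(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "Cafe costs 5 EUR" with e-acute and the euro sign, both present in Cp1252
        System.out.println(unencodable("Caf\u00E9 costs 5 \u20AC"));             // []
        // Greek capital omega and two CJK ideographs are outside Cp1252
        System.out.println(unencodable("\u03A9mega and \u6F22\u5B57").size());  // 3
    }
}
```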
|
Lari Hotari wrote:
> I can sign a Contributor agreement if that's a problem in including this
> contribution.

That's not the problem. I think your contribution was overlooked due to personal circumstances (Bruno's son getting ill). To make sure the contribution isn't overlooked, you can post the patch on the SourceForge tracker:
http://sourceforge.net/tracker/?group_id=15255&atid=315255
And you can also send a scan of the signed CLA. The items in the tracker will be looked at next week.

--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info
|
