How to remove embedded fonts from a pdf document

newToMina
I am concatenating many PDF forms into one PDF document using the following code.

   PdfCopyFields copy = new PdfCopyFields(new FileOutputStream("concatinated.pdf"));
   copy.setFullCompression();

   copy.addDocument(new PdfReader("form1.pdf"));
   copy.addDocument(new PdfReader("form2.pdf"));
   copy.addDocument(new PdfReader("form3.pdf"));
   // ... further forms added the same way ...
   copy.addDocument(new PdfReader("formn.pdf"));
   copy.close();

How can I remove all the embedded fonts (which account for more than 80% of the size of this document) from concatinated.pdf?

Currently I am removing the embedded fonts using PDF Optimizer in Acrobat Professional. But as the number of these concatenated documents increases (around 75 as of now, and possibly more in the near future), I am trying to find an easier way to do this than manually removing the fonts from each document one by one. Any help will be truly appreciated.

Thanks
-new2pdf

Re: How to remove embedded fonts from a pdf document

newToMina
Is there a better way of doing this?

Re: How to remove embedded fonts from a pdf document

blowagie
new2pdf wrote:
> Is there a better way of doing this?

That's not a question I can answer in only a few lines.
Due to lack of time, I have to pass on this question for now.
Maybe somebody else can explain how to remove fonts
(although that's always a very delicate matter).
br,
Bruno

-------------------------------------------------------------------------
SF.Net email is sponsored by: The Future of Linux Business White Paper
from Novell.  From the desktop to the data center, Linux is going
mainstream.  Let it simplify your IT future.
http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://itext.ugent.be/itext-in-action/

Re: How to remove embedded fonts from a pdf document

Leonard Rosenthol
As Bruno said, this is _NOT_ a trivial task.

You will need to understand the complexities of font formats, text  
encoding and how these relate to PDF content streams.   Once armed  
with that information, you can begin to construct code that will  
unembed fonts - or at least some of them.   It also depends on  
whether you plan to only unembed fully embedded fonts OR also those  
that are subset...

I think when I implemented one, it took me a good week to get  
everything working...and that was starting with the knowledge base, a  
comprehensive PDF library and a solid font engine.

Leonard



Re: How to remove embedded fonts from a pdf document

newToMina
Bruno & Leo,

Thanks for your time. These base PDF documents (pdf 1..n) come from third-party companies and from state and federal governments, so we have no control over the fonts they use. We need to create several different combinations of these base documents by combining them. Once combined, the average size of these documents is about 5 MB (about 80% of which is fonts). After removing all the fonts using the Acrobat Professional PDF Optimizer, the size is reduced to 400-500 KB.

The problem is that we have to run the PDF Optimizer manually on the 75+ documents. We tried to run a batch sequence from Acrobat Professional, but it only partially worked: it removed only some of the fonts and brought the size down to the 1.2 MB range. That is why I posted this question here.

contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Lari Hotari
In reply to this post by blowagie
This contribution is about TTF font merging and replacement, but it should help with the original font problem. It would also be possible to continue developing this code to remove embedded fonts and replace them with pdf document fonts. With this code you can remove other fonts and replace them with some TTF font.

Here are some additional classes for the com.lowagie.text.pdf package: itext_font_merging_patch.zip (http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip)
The main class is FontReplacingPdfSmartCopy (extends PdfSmartCopy).

This subclass of PdfSmartCopy replaces the different subsets of the same font in the resulting PDF file with a single font (either under the same font name or under a replacement font). It needs the TrueType font files (TTF) to do the job.

This usually reduces the final PDF file size by 40%-60% compared to plain PdfSmartCopy when several single-page PDFs are merged into one big PDF file.

The original motivation was to make it possible to print very large PDF files that have been concatenated from thousands of small PDF files containing the same fonts. Some printers just clog when they receive thousands of embedded fonts in a single print job. Maybe the reason is that the resulting PostScript files are huge when there are thousands of embedded fonts. Having fewer fonts makes it simpler to print-process the concatenated PDF. It also uses less disk space.

Merging/replacement currently works only with WinAnsiEncoding (iso-8859-1 / Cp1252 / latin1)

While copying, the page resources are scanned for a dictionary under the FONT key. Font references found in that font dictionary are replaced on demand. Each new font gets a new indirect reference and is written to the stream in the overridden PdfWriter.addSharedObjectsToBody() method.
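
To illustrate the "replaced on demand" idea, here is a minimal sketch that is not taken from the patch. It models an indirect reference as a plain object number, and the class and method names (ReplacementRegistry, referenceFor, pendingFonts) are hypothetical. The first request for a font name allocates a reference; later requests for the same name reuse it, so many subsets end up pointing at one shared font.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model: one shared replacement font per font name, allocated lazily. */
final class ReplacementRegistry {
    // Font name -> object number standing in for the new indirect reference.
    private final Map<String, Integer> objectNumberByFontName = new LinkedHashMap<>();
    private int nextObjectNumber;

    ReplacementRegistry(int firstFreeObjectNumber) {
        this.nextObjectNumber = firstFreeObjectNumber;
    }

    /** Returns the reference for this font name, allocating it on first use. */
    int referenceFor(String fontName) {
        Integer existing = objectNumberByFontName.get(fontName);
        if (existing != null) {
            return existing;
        }
        int allocated = nextObjectNumber++;
        objectNumberByFontName.put(fontName, allocated);
        return allocated;
    }

    /** Fonts that still have to be written once, after all pages are copied. */
    Map<String, Integer> pendingFonts() {
        return Collections.unmodifiableMap(objectNumberByFontName);
    }
}
```

In the real class the deferred writing would happen in the overridden addSharedObjectsToBody() mentioned above; the sketch only shows the bookkeeping.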

This class has to be in the com.lowagie.text.pdf package because it needs access to some package-private methods.

I would like to contribute this implementation to the community and I hope it gets into an iText release as soon as possible. Bruno or Paulo, can you add this to the iText release?

There's also a helper class for concatenating PDFs.

PdfConcator makes it easier to concatenate multiple PDF files into a single PDF file. It uses FontReplacingPdfSmartCopy to reduce the number of fonts in the resulting PDF file. It's also possible to replace TTF fonts with other fonts. Sample usage:


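 import java.io.File;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import com.lowagie.text.pdf.PdfConcator; // from the patch zip, not part of the iText release

 PdfConcator pdfConcator=new PdfConcator();
 pdfConcator.setFontDir(new File("fonts")); // or pdfConcator.useDefaultSystemFonts();
 // Map the name of a font to replace to the name of its replacement font
 Map fontNameMapping = new HashMap();
 fontNameMapping.put("ComicSansMS", "Arial-Black");
 pdfConcator.setFontNameMapping(fontNameMapping);
 
 // Input files are concatenated in list order
 List files = new ArrayList();
 files.add("pdf1.pdf");
 files.add("pdf2.pdf");
 pdfConcator.concat(files, new File("concat.pdf"));
 
 PdfConcator also supports IoC/DI. Example Spring configuration XML: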

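    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">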
      <bean id="pdfConcator" class="com.lowagie.text.pdf.PdfConcator">
        <property name="fonts">
          <bean factory-bean="fontArrayFactory" factory-method="getInstance"/>
        </property>
        <property name="fontNameMapping">
          <map>
            <entry key="ComicSansMS" value="ArialMT"/>
          </map>
        </property>
      </bean>
      <bean id="fontArrayFactory" class="com.lowagie.text.pdf.BaseFontArrayFactory">
        <property name="fontDirs">
          <list>
            <value>C:/myfonts</value>
            <value>fonts</value>
            <value>/usr/share/fonts/truetype/msttcorefonts</value>
            <value>C:/WINDOWS/FONTS</value>
          </list>
        </property>
        <property name="extensions" value="ttf,otf"/>
        <property name="ignoreUnreadable" value="true"/>
      </bean>
    </beans>
 

Regards,

Lari Hotari



Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Leonard Rosenthol
So this will only work if the FONT dictionary specifies a /Subtype
of /TrueType, an /Encoding of /WinAnsiEncoding and does not have a
/Differences array - correct?

There is no support for Type 1, Type 1C, Mac encodings, for custom  
encodings or for CID fonts, correct?

Also, how do you determine in the case of multiple subsets that the  
fonts were from the same font originally?  Only by /BaseFont name?
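
As a rough illustration of the conditions asked about here, the check could look like the sketch below. It is only a sketch: the thread does not show whether the patch tests exactly these conditions, the font dictionary is modelled as a plain Java Map instead of an iText PdfDictionary, and the names FontEligibility and isMergeCandidate are hypothetical.

```java
import java.util.Map;

/** Hypothetical eligibility check on a font dictionary modelled as a Map. */
final class FontEligibility {
    static boolean isMergeCandidate(Map<String, Object> fontDict) {
        // Only simple TrueType fonts; Type 1, Type 1C and CID fonts are rejected.
        if (!"TrueType".equals(fontDict.get("Subtype"))) {
            return false;
        }
        Object encoding = fontDict.get("Encoding");
        // /Encoding given as a plain name: it cannot carry a /Differences array.
        if (encoding instanceof String) {
            return "WinAnsiEncoding".equals(encoding);
        }
        // /Encoding given as a dictionary: require the WinAnsi base and no /Differences.
        if (encoding instanceof Map) {
            Map<?, ?> encodingDict = (Map<?, ?>) encoding;
            return "WinAnsiEncoding".equals(encodingDict.get("BaseEncoding"))
                    && !encodingDict.containsKey("Differences");
        }
        // Missing or unrecognized encoding: leave the font alone.
        return false;
    }
}
```

Rejecting anything the check does not recognize keeps the failure mode conservative: an untouched font costs file size, while a wrongly replaced one can corrupt the text.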

Leonard


Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Paulo Soares
In reply to this post by Lari Hotari
I'll have a look.

Paulo


Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Lari Hotari
In reply to this post by Leonard Rosenthol
Leonard Rosenthol wrote:
> So this will only work if the FONT dictionary specifies a /Subtype
> of /TrueType, an /Encoding of /WinAnsiEncoding and does not have a
> /Differences array - correct?
>
> There is no support for Type 1, Type 1C, Mac encodings, for custom
> encodings or for CID fonts, correct?
>
> Also, how do you determine in the case of multiple subsets that the
> fonts were from the same font originally?  Only by /BaseFont name?

I started this work as a proof of concept. My use case involved only TrueType fonts, so I haven't tried much beyond that.

I also looked at replacing fonts with other encodings, such as Identity-H for barcode fonts, but they use a custom encoding in the page stream, and merging the encodings would take too much effort. It would also require parsing and modifying the page stream.

The fonts are recognized only by the /BaseFont name; no other checking is done. The FirstChar and LastChar ranges are updated in the final font (the minimum FirstChar and the maximum LastChar are selected for the final font).
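
The range update can be sketched as follows. This is a minimal illustration, not code from the patch, and CharRange is a hypothetical helper type. The merged font takes the smallest FirstChar and the largest LastChar seen across the subsets.

```java
/** Hypothetical helper: the FirstChar..LastChar range of a simple font. */
final class CharRange {
    final int firstChar;
    final int lastChar;

    CharRange(int firstChar, int lastChar) {
        if (firstChar > lastChar) {
            throw new IllegalArgumentException("firstChar must not exceed lastChar");
        }
        this.firstChar = firstChar;
        this.lastChar = lastChar;
    }

    /** Smallest range covering both this range and the other one. */
    CharRange merge(CharRange other) {
        return new CharRange(Math.min(firstChar, other.firstChar),
                             Math.max(lastChar, other.lastChar));
    }

    /** Number of entries a /Widths array needs for this range. */
    int widthCount() {
        return lastChar - firstChar + 1;
    }
}
```

Since a simple font's /Widths array has one entry per code from FirstChar to LastChar, widening the range also means the merged font needs widths for every code in the wider range.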

Subset fonts have a name like "ZIGEYT+ComicSansMS". The real PostScript font name comes after the "+" sign.

PDF reference, p. 419:
"For a font subset, the PostScript name of the font—the value of the font’s BaseFont entry and the font descriptor’s FontName entry—begins with a tag followed by a plus sign (+). The tag consists of exactly six uppercase letters; the choice of letters is arbitrary, but different subsets in the same PDF file must have different tags. For example, EOODIA+Poetica is the name of a subset of Poetica®, a Type 1 font. (See implementation note 63 in Appendix H.)"
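
Stripping the tag described in that passage can be sketched like this. It is a minimal sketch, not the code in the zip, and SubsetFontNames is a hypothetical helper name. It treats a name as tagged only when it starts with exactly six uppercase letters followed by "+".

```java
import java.util.regex.Pattern;

/** Hypothetical helper for the subset-tag naming convention. */
final class SubsetFontNames {
    // Six uppercase letters and a plus sign at the start of the name.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    /** True if the name carries a subset tag such as "ZIGEYT+". */
    static boolean hasSubsetTag(String baseFont) {
        return SUBSET_TAG.matcher(baseFont).find();
    }

    /** Returns the name without the tag; untagged names come back unchanged. */
    static String stripSubsetTag(String baseFont) {
        return SUBSET_TAG.matcher(baseFont).replaceFirst("");
    }
}
```

With this, "ZIGEYT+ComicSansMS" and another subset of the same font reduce to the same key, "ComicSansMS". The name alone is only a heuristic for grouping, since nothing here inspects the font program itself.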

If you look at the source code you can see that the implementation is currently fairly lightweight. It just extends PdfSmartCopy with the possibility of font replacement/merging. It uses existing methods to write fonts etc. (copied and pasted from other iText classes in some places).
There's also a JUnit test case in the zip, under the test subdirectory.
It uses the PdfConcator helper class, which makes it easy to configure a PdfConcator bean instance in IoC/DI (Spring Framework, Guice, etc.).

I hope that this work could serve as a baseline for adding font merging and replacement features to iText.

It would be nice to have some kind of template method pattern or strategy pattern for customizing the base solution for different use cases.

Lari

Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Leonard Rosenthol
While that is the recommendation of the PDF spec, in the real world you will find fonts that are subset but lack this name change AND fonts with this kind of name that aren't actually subset.

Leonard


Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Lari Hotari
Leonard Rosenthol wrote
While that is the recommendation of the PDF spec - you will find in the real world situations where there are fonts that are subset but w/o this name change AND fonts with this name setup that aren't actually subset.

Leonard
So there's "illegal pdfs" around too. :)

Anyway, this solution doesn't depend much on that subset information. As I mentioned, it would be nice if parts of the font replacement could be customized with the strategy pattern and the template method pattern. For example, one could provide a custom strategy that replaces a font based on custom matching logic. Other parts could be customized with the template method pattern by subclassing the base class. I think these options would make it easy to handle most requirements.
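
To illustrate the idea, here is a rough sketch in plain Java. All names in it (FontReplacementStrategy, MappingStrategy, FontResolver) are hypothetical and do not exist in iText or in the patch; font names are plain strings so that the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical strategy: decides which font name should replace a given one. */
interface FontReplacementStrategy {
    /** Returns the replacement PostScript name, or null to keep the font. */
    String replacementFor(String postscriptName);
}

/** Strategy backed by a fixed name mapping. */
final class MappingStrategy implements FontReplacementStrategy {
    private final Map<String, String> mapping;

    MappingStrategy(Map<String, String> mapping) {
        this.mapping = new HashMap<String, String>(mapping);
    }

    public String replacementFor(String postscriptName) {
        return mapping.get(postscriptName);
    }
}

/** Template-method style base class: resolve() is fixed, normalize() can be overridden. */
class FontResolver {
    private final FontReplacementStrategy strategy;

    FontResolver(FontReplacementStrategy strategy) {
        this.strategy = strategy;
    }

    /** Hook for subclasses: by default strips a six-letter subset tag such as "ZIGEYT+". */
    protected String normalize(String name) {
        return name.matches("[A-Z]{6}\\+.+") ? name.substring(7) : name;
    }

    /** Normalizes the name, then asks the strategy; falls back to the normalized name. */
    final String resolve(String name) {
        String base = normalize(name);
        String replacement = strategy.replacementFor(base);
        return replacement != null ? replacement : base;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<String, String>();
        mapping.put("ComicSansMS", "ArialMT");
        FontResolver resolver = new FontResolver(new MappingStrategy(mapping));
        System.out.println(resolver.resolve("ZIGEYT+ComicSansMS"));
    }
}
```

With a split like this, custom matching logic goes into a new strategy, while subclasses only need to touch the normalization hook.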

My own use case is quite simple: only a couple of fonts are involved, but each of them appears as thousands of subset fonts. I'm only targeting the merging and replacement at certain fonts. Currently this solution works quite well for this kind of use case.

Lari

Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Lari Hotari
In reply to this post by Paulo Soares

Hi,

Have you had a chance to check this out?
(http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip)

Regards,

Lari


Paulo Soares wrote
I'll have a look.

Paulo

> -----Original Message-----
> From: itext-questions-bounces@lists.sourceforge.net
> [mailto:itext-questions-bounces@lists.sourceforge.net] On
> Behalf Of Lari Hotari
> Sent: Thursday, January 31, 2008 12:03 PM
> To: itext-questions@lists.sourceforge.net
> Subject: [iText-questions] contribution:
> FontReplacingPdfSmartCopy: duplicate TTF font subset merging
> and replacement (was: How to remove embedded fonts from a pdf
> document)
>
>
> This contribution is about TTF font merging and replacement, but it should
> help with the original font problem. It would also be possible to continue
> developing this code to remove embedded fonts and replace them with pdf
> document fonts. With this code you can remove other fonts and replace them
> with some TTF font.
>
> Here are (itext_font_merging_patch.zip) some additional classes for
> com.lowagie.text.pdf:
> http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip
> The main class is FontReplacingPdfSmartCopy (extends PdfSmartCopy).
>
> This subclass of PdfSmartCopy will replace different subsets of the same
> font in the resulting pdf file with one font (the same font name can be
> kept, or a replacement can be done). It needs the TrueType font files (TTF)
> to do the job.
>
> This usually reduces the final PDF file size by 40%-60% compared to plain
> PdfSmartCopy when several single-page pdfs are merged into one big pdf file.
>
> The original motivation was to make it possible to print very large pdf
> files which have been concatenated from thousands of small pdf files
> containing the same fonts. Some printers just clog when they receive
> thousands of embedded fonts in a single print job. Maybe the reason is that
> the resulting PostScript files are quite huge when there are thousands of
> embedded fonts. Having fewer fonts makes the concatenated pdf simpler to
> print-process. It also uses less disk space.
>
> Merging/replacement currently works only with WinAnsiEncoding
> (iso-8859-1 / Cp1252 / latin1).
>
> Page resources are scanned while copying, checking for a dictionary under
> the FONT key. Possible FONT references in the font dictionary are replaced
> on demand. The new font will get a new indirect reference and the font will
> be written to the stream in the overridden
> PdfWriter.addSharedObjectsToBody() method.
>
> This class has to be in the com.lowagie.text.pdf package because it needs
> access to some package-private methods.
>
> I would like to contribute this implementation to the community and I hope
> this gets into the iText release as soon as possible. Bruno or Paulo, can
> you add this to the iText release?
>
> There's also a helper class for concatenating pdfs.
>
> PdfConcator makes it easier to concatenate multiple PDF files into a single
> PDF file. It uses FontReplacingPdfSmartCopy to reduce the number of fonts in
> the resulting PDF file. It's also possible to replace TTF fonts with other
> fonts. Sample usage:
>
>
>  PdfConcator pdfConcator = new PdfConcator();
>  pdfConcator.setFontDir(new File("fonts")); // or pdfConcator.useDefaultSystemFonts();
>  Map fontNameMapping = new HashMap();
>  fontNameMapping.put("ComicSansMS", "Arial-Black");
>  pdfConcator.setFontNameMapping(fontNameMapping);
>  
>  List files = new ArrayList();
>  files.add("pdf1.pdf");
>  files.add("pdf2.pdf");
>  pdfConcator.concat(files, new File("concat.pdf"));
>  
>  Supports IoC/DI. Example Spring configuration xml:
>
>     <beans xmlns="http://www.springframework.org/schema/beans"
>            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>            xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">
>       <bean id="pdfConcator" class="com.lowagie.text.pdf.PdfConcator">
>         <property name="fonts">
>           <bean factory-bean="fontArrayFactory" factory-method="getInstance"/>
>         </property>
>         <property name="fontNameMapping">
>           <map>
>             <entry key="ComicSansMS" value="ArialMT"/>
>           </map>
>         </property>
>       </bean>
>       <bean id="fontArrayFactory" class="com.lowagie.text.pdf.BaseFontArrayFactory">
>         <property name="fontDirs">
>           <list>
>             <value>C:/myfonts</value>
>             <value>fonts</value>
>             <value>/usr/share/fonts/truetype/msttcorefonts</value>
>             <value>C:/WINDOWS/FONTS</value>
>           </list>
>         </property>
>         <property name="extensions" value="ttf,otf"/>
>         <property name="ignoreUnreadable" value="true"/>
>       </bean>
>     </beans>
>  
>
> Regards,
>
> Lari Hotari
>
>
>
> Bruno Lowagie (iText) wrote:
> >
> > new2pdf wrote:
> >> Is there a better way of doing this?
> >
> > That's not a question I can answer in only a few lines.
> > Due to lack of time, I have to pass on this question for now.
> > Maybe somebody else can explain how to remove fonts
> > (although that's always a very delicate matter).
> > br,
> > Bruno



Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Paulo Soares
It's not forgotten but, as you said yourself, it solves your specific
problem. I'll see if I can make it more generic but that takes time.

Paulo

----- Original Message -----
From: "Lari Hotari" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, February 05, 2008 10:02 PM
Subject: Re: [iText-questions] contribution: FontReplacingPdfSmartCopy:
duplicate TTF font subset merging and replacement (was: How to remove
embedded fonts from a pdf document)


>
>
> Hi,
>
> Have you had a chance to check this out?
> (http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip)
>
> Regards,
>
> Lari
>
>
>



Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Lari Hotari
I have a customer that would like to use this feature. The problem is that iText has to be patched (classes must be in the itext package) to use this. Therefore it would be nice if this could be included in iText.

I'd like to help with this. Is there anything that I could do to get this part to the state that it could be added to itext?

Lari

FontReplacingPdfSmartCopy: http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip

Paulo Soares wrote
It's not forgotten but, as you said yourself, it solves your specific
problem. I'll see if I can make it more generic but that takes time.

Paulo


Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Paulo Soares
Classes in the same iText package are not a patch to iText. License-wise,
as long as you don't change iText you don't have to tell anybody what
you are doing, even if your classes are in the com.lowagie package. About
your changes: as I said, the scope is too narrow (supporting only
TrueType, no differences array, etc.). Without going into Unicode fonts,
it's possible to extend your code without too much work to support a lot
more single-byte font features.

Paulo

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On
> Behalf Of Lari Hotari
> Sent: Friday, February 29, 2008 10:48 AM
> To: [hidden email]
> Subject: Re: [iText-questions] contribution:
> FontReplacingPdfSmartCopy: duplicate TTF font subset merging
> and replacement (was: How to remove embedded fonts from a pdf
> document)
>
>
> I have a customer that would like to use this feature. The
> problem is that
> iText has to be patched (classes must be in the itext
> package) to use this.
> Therefore it would be nice if this could be included in iText.
>
> I'd like to help with this. Is there anything that I could do
> to get this
> part to the state that it could be added to itext?
>
> Lari
>
> FontReplacingPdfSmartCopy:
> http://www.nabble.com/file/p15203186/itext_font_merging_patch.zip

Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Lari Hotari
I think that the use case of replacing and merging fonts is fairly common.

Anyone who merges a lot of small pdfs will need this. Therefore it would also help others if this feature were included in the main iText distribution.

Paulo, what criteria does the code have to meet so that it can be added to iText? You said that the scope is too narrow. Aren't most iText users handling pdfs with TrueType fonts (or document fonts) using a single-byte encoding (winansi)? Could the first version support only ttf + winansi encoding?

Anyway, could we do some kind of requirements analysis for the full scope?

required features:
- support for all font types: T1, T3, TTF, TTF unicode, CJK, document fonts
- support for all font encodings (winansi, macroman, macexpert, unicode, identity)
- support for merging differences arrays
    - this means that the pagestreams have to be re-written so that one merged differences array can be used
         - is there code available in iText for parsing and modifying the pagestream?
              - support rewriting strings in text-showing PDF operators (', ", Tj, TJ)
         - rewriting acroform fields?
- restriction: the original font file has to be available in order to do font merging (no need to merge font glyphs in that case).
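
To make the differences-array item above more concrete: in a PDF encoding dictionary, the Differences array is a list of character codes, each followed by one or more glyph names that are assigned to consecutive codes. The sketch below uses plain Java collections instead of iText's PDF objects, and its class and method names (DifferencesArrays, expand, conflicts) are made up for illustration. It expands two such arrays and checks whether they assign different glyphs to the same code, which is the case where the page streams would have to be rewritten before the fonts could share one array.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not an iText class) for reasoning about Differences arrays. */
final class DifferencesArrays {

    /** Expands e.g. [39, "quotesingle", 128, "Euro", "bullet"] to {39=quotesingle, 128=Euro, 129=bullet}. */
    static Map<Integer, String> expand(List<?> differences) {
        Map<Integer, String> byCode = new TreeMap<Integer, String>();
        int code = -1;
        for (Object item : differences) {
            if (item instanceof Integer) {
                code = (Integer) item;              // start of a new run of codes
            } else {
                if (code < 0) {
                    throw new IllegalArgumentException("glyph name before first code");
                }
                byCode.put(code++, (String) item);  // names fill consecutive codes
            }
        }
        return byCode;
    }

    /** True if both encodings use the same code for different glyph names. */
    static boolean conflicts(Map<Integer, String> a, Map<Integer, String> b) {
        for (Map.Entry<Integer, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, String> first = expand(Arrays.asList(39, "quotesingle", 128, "Euro", "bullet"));
        Map<Integer, String> second = expand(Arrays.asList(128, "Adieresis"));
        System.out.println(first);
        System.out.println(conflicts(first, second));
    }
}
```

If no code conflicts, the two arrays could in principle be unioned without touching the content streams; that is an observation about the data structure, not a statement about what iText supports.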


Lari

Paulo Soares wrote
Classes in the same iText package are not a patch to iText. License-wise,
as long as you don't change iText you don't have to tell anybody what
you are doing, even if your classes are in the com.lowagie package. About
your changes: as I said, the scope is too narrow (supporting only
TrueType, no differences array, etc.). Without going into Unicode fonts,
it's possible to extend your code without too much work to support a lot
more single-byte font features.

Paulo

Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Paulo Soares
 

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On
> Behalf Of Lari Hotari
> Sent: Wednesday, March 12, 2008 9:02 AM
> To: [hidden email]
> Subject: Re: [iText-questions] contribution:
> FontReplacingPdfSmartCopy: duplicate TTF font subset merging
> and replacement (was: How to remove embedded fonts from a pdf
> document)
>
>
> I think that the use case of replacing and merging fonts is fairly common.

Not really.

> Anyone who merges a lot of small pdfs will need this. Therefore it would
> also help others if this feature were included in the main iText
> distribution.

Yes.

> Paulo, what criteria does the code have to meet so that it can be added to
> iText? You said that the scope is too narrow. Aren't most iText users
> handling pdfs with TrueType fonts (or document fonts) using a single-byte
> encoding (winansi)? Could the first version support only ttf + winansi
> encoding?

Yes.

> Anyway, could we do some kind of requirements analysis for the full scope?
>
> required features:
> - support for all font types: T1, T3, TTF, TTF unicode, CJK, document fonts

T1 and TTF.

> - support for all font encodings (winansi, macroman, macexpert, unicode,
>   identity)

Just single byte.

> - support for merging differences arrays
>     - this means that the pagestreams have to be re-written so that one
>       merged differences array can be used

No. You can have the same font with different differences array. iText
can't (easily) do it.

>          - is there code available in iText for parsing and modifying the
>            pagestream?

PdfContentParser.

>               - support rewriting strings in text-showing PDF operators
>                 (', ", Tj, TJ)

See above.

>          - rewriting acroform fields?

If needed.

> - restriction: the original font file has to be available in order to do
>   font merging (no need to merge font glyphs in that case).

That font or other if you are doing font replacement.

The bottom line is that I'll include your code in iText and hope that it
will be improved later, by you or someone else.

Paulo
 


Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement (was: How to remove embedded fonts from a pdf document)

Lari Hotari
Quote from:
http://www.nabble.com/Re%3A-contribution%3A-FontReplacingPdfSmartCopy%3A-duplicate-TTF-font-subset-merging-and-replacement-%28was%3A-How-to-remove-embedded-fonts-from-a-pdf-document%29-p16001839.html
Paulo Soares wrote
 The bottom line is that I'll include your code in iText and hope that it
will be improved later, by you or someone else.

Paulo
Hello,

Is there any chance that FontReplacingPdfSmartCopy (http://www.nabble.com/How-to-remove-embedded-fonts-from-a-pdf-document-to14033717.html#a15203186)
 gets included in the iText main distribution? I contributed the patch over a year ago (Jan 31, 2008) and you replied Mar 12, 2008 that you can include it in iText.

I believe that there is a demand for this feature since I've been receiving some questions about the contribution by email. I hope every iText user could benefit from this feature. I believe that font replacement is a very typical use case in pdf manipulation.

I can sign a Contributor agreement if that's a problem in including this contribution.

Someone requested a change by email to be able to replace embedded fonts with document fonts (type1) and that was possible by making a small change:

Here's an example (Junit4 test method) for replacing ComicSansMS with Helvetica:


    @Test
    public void testDocFontReplace() throws Exception {
        FontReplacingPdfSmartCopy.logLevel = Level.INFO;
        PdfConcator pdfConcator = new PdfConcator();
        pdfConcator.setFonts(new BaseFont[] {
            BaseFont.createFont(BaseFont.HELVETICA, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED)
        });
        Map fontNameMapping = new HashMap();
        fontNameMapping.put("ComicSansMS", "Helvetica");
        pdfConcator.setFontNameMapping(fontNameMapping);
        pdfConcator.concat(Collections.singleton(new File("test1.pdf")), new File("docfont-test2.pdf"));
    }


I had to make a small modification in FontReplacingPdfSmartCopy's setFonts method:

    public void setFonts(BaseFont[] fonts) {
        this.fontmap = new HashMap(); //new HashMap<String, BaseFont>();
        for (int i = 0; i < fonts.length; i++) {
            switch (fonts[i].getFontType()) {
            case BaseFont.FONT_TYPE_T3:
            case BaseFont.FONT_TYPE_T1:
            case BaseFont.FONT_TYPE_TT:
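                // Only WinAnsi-encoded fonts are considered; others are skipped without a log message.
                if (BaseFont.WINANSI.equals(fonts[i].getEncoding())) {
                    // Type 1 fonts are accepted as-is; other types must declare code page 1252.
                    if (fonts[i].getFontType() == BaseFont.FONT_TYPE_T1
                            || Arrays.asList(fonts[i].getCodePagesSupported()).contains("1252 Latin 1")) {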
                        if (!fontmap.containsKey(fonts[i]
                                .getPostscriptFontName())) {
                            logger.log(logLevel, "Adding font {0}", fonts[i].getPostscriptFontName());
                            fontmap.put(fonts[i].getPostscriptFontName(),
                                    fonts[i]);
                        } else {
                            logger.log(logLevel, "Discarding duplicate entry for font {0}", fonts[i].getPostscriptFontName());
                        }
                    } else {
                        logger.log(logLevel, "Discarding font {0}. It doesn't support Cp1252/ISO-8859-1/Latin1/WinAnsiEncoding.", fonts[i].getPostscriptFontName());
                    }
                }
                break;
            default:
                logger.log(logLevel, "Discarding unsupported type of font {0}. Only single byte TTF/OTF fonts are supported by now.", fonts[i].getPostscriptFontName());
            }
        }
    }



Regards,

Lari

Re: contribution: FontReplacingPdfSmartCopy: duplicate TTF font subset merging and replacement

iText Software
Lari Hotari wrote:
> I can sign a Contributor agreement if that's a problem in including this
> contribution.

That's not the problem. I think your contribution was overlooked
due to personal circumstances (Bruno's son getting ill).

To make sure the contribution isn't overlooked, you can post the
patch on the SourceForge tracker:
http://sourceforge.net/tracker/?group_id=15255&atid=315255

And you can also send a scan of the signed CLA.

The items in the tracker will be looked at next week.
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info
