Find/Replace Text in Existing PDF?

sesshomurai
Hi,
  I have a need to search for a known string in an existing PDF from an offset.
Then modify the background of that text (i.e. highlight color) and write out
a new PDF.

Is this possible with iText?

Any tips appreciated.

thanks,
Darren

Re: Find/Replace Text in Existing PDF?

Balder
Yes and no, see http://itext-general.2136553.n4.nabble.com/Search-String-in-PDF-documents-td2152106.html

Kind regards

Balder

----- Reply message -----
From: "sesshomurai" <[hidden email]>
Date: Mon, Jul 11, 2011 15:41
Subject: [iText-questions] Find/Replace Text in Existing PDF?
To: <[hidden email]>

Hi,
 I have a need to search for a known string in an existing PDF from an
offset.
Then modify the background of that text (i.e. highlight color) and write out
a new PDF.

Is this possible with iText?

Any tips appreciated.

thanks,
Darren

--
View this message in context: http://itext-general.2136553.n4.nabble.com/Find-Replace-Text-in-Existing-PDF-tp3659565p3659565.html
Sent from the iText - General mailing list archive at Nabble.com.

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security
threats, fraudulent activity, and more. Splunk takes this data and makes
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
iText-questions mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php




Re: Find/Replace Text in Existing PDF?

sesshomurai
Hi,
  I read that thread but I didn't find it informative. Given that iText loads a PDF
into a structured format, is it possible to analyse the text elements of that PDF?
And if so, it should be possible to replace those structured elements with new ones
with the appropriate visual settings?

It seems such a basic thing, but I understand PDF is rather prohibitive in this regard.

Best,
Darren

On 07/11/2011 05:29 PM, Balder [via iText - General] wrote:
Yes and no, see http://itext-general.2136553.n4.nabble.com/Search-String-in-PDF-documents-td2152106.html

Kind regards

Balder


Re: Find/Replace Text in Existing PDF?

Alexis Pigeon
Hi Darren,

On 27 July 2011 14:09, sesshomurai <[hidden email]> wrote:
> Hi,
>   I read that thread but I didn't find it informative. Given that iText loads a PDF
> into a structured format, is it possible to analyse the text elements of that PDF?

Your assumptions are wrong. The structures used by iText when creating a PDF (Paragraph, Phrase, Chunk, PdfPTable, etc.) are high-level semantics that are 100% lost in the final document (unless you create a tagged PDF, in which case you might be able to maintain the semantics). You should NOT expect to find such a structure when "loading" an existing PDF document with iText.

> And if so, it should be possible to replace those structured elements with new ones
> with the appropriate visual settings?
>
> It seems such a basic thing, but I understand PDF is rather prohibitive in this regard.

Again, your assumptions are wrong. PDF is a format for presenting documents, not for manipulating them.

Cheers,
alexis


Re: Find/Replace Text in Existing PDF?

sesshomurai

My understanding is that you can create PDF documents with iText.
Certainly this is possible. PDF is for creating documents!
This means that the PDF format must encode the presentation in a structured
way. Otherwise, the PDF viewer cannot reproduce the document.

Thus, it is structured and all the document information including
text is represented in the PDF format.

I thought iText is capable of loading the PDF elements and then
creating a PDF document from them. On that part, I may be wrong.

cheers.


On Wed, 27 Jul 2011 06:02:08 -0700 (PDT), "Alexis Pigeon [via iText - General]" <[hidden email]> wrote:
> Your assumptions are wrong. The structures used by iText when creating a PDF
> (Paragraph, Phrase, Chunk, PdfPTable, etc.) are high-level semantics that
> are 100% lost in the final document (unless you create a tagged PDF, in which
> case you might be able to maintain the semantics). You should NOT expect
> to find such a structure when "loading" an existing PDF document with iText.
>
> Again, your assumptions are wrong. PDF is a format for presenting
> documents, not for manipulating them.

Re: Find/Replace Text in Existing PDF?

Alexis Pigeon
Hi Darren,

On 27 July 2011 15:59, sesshomurai <[hidden email]> wrote:

> My understanding is that you can create PDF documents with iText.

Correct.

> Certainly this is possible. PDF is for creating documents!
> This means that the PDF format must encode the presentation in a structured
> way. Otherwise, the PDF viewer cannot reproduce the document.
>
> Thus, it is structured and all the document information including
> text is represented in the PDF format.

Wrong again. There is no such thing as a paragraph, a line, or even a word in PDF. Just bunches of characters written at some positions.
From Wikipedia (http://en.wikipedia.org/wiki/Portable_Document_Format#Text):
"Text in PDF is represented by text elements in page content streams. A text element specifies that characters should be drawn at certain positions. The characters are specified using the encoding of a selected font resource."
 
Cheers,
alexis
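
To make the Wikipedia description concrete, here is a minimal sketch in plain Java (no iText involved). The content stream is hand-written for illustration, and the class name `ContentStreamDemo` and its regex-based "parser" are assumptions of this example, not a real PDF parser: it ignores escapes, nested parentheses, hex strings and font encodings. It only shows that a word a reader sees on the page need not exist as one string in the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of a page content stream; not a real PDF parser. */
final class ContentStreamDemo {

    // Hypothetical, hand-written content stream for illustration only.
    static final String STREAM = String.join("\n",
        "BT",
        "/F1 12 Tf",
        "72 700 Td",
        "(Hel) Tj",
        "[(lo) -250 (Wor)] TJ",
        "(ld) Tj",
        "ET");

    // Matches simple literal strings such as (Hel); ignores escapes and nesting.
    private static final Pattern LITERAL = Pattern.compile("\\(([^()\\\\]*)\\)");

    /** Returns the string operands in the order they appear in the stream. */
    static List<String> stringOperands(String stream) {
        List<String> out = new ArrayList<>();
        Matcher m = LITERAL.matcher(stream);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = stringOperands(STREAM);
        System.out.println(parts);
        boolean inOnePiece = parts.stream().anyMatch(p -> p.contains("Hello"));
        System.out.println("\"Hello\" in a single operand: " + inOnePiece);
        System.out.println("Joined: " + String.join("", parts));
    }
}
```

In this toy stream "Hello" is split across two text-showing operators, and the gap before "World" is a kerning adjustment (-250) rather than a space character, so a naive search for "Hello World" in the raw operands finds nothing.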


Re: Find/Replace Text in Existing PDF?

iText Software
On 28/07/2011 10:07, Alexis Pigeon wrote:
> There is no such thing as a paragraph, line, not even word in PDF.
> Just bunches of characters written at some positions.
For the non-believers: please refer them to
http://affiliate.manning.com/idevaffiliate.php?id=223_212
and ask them to download Chapter 6. The intro of that chapter should
convince them.


Re: Find/Replace Text in Existing PDF?

sesshomurai
In reply to this post by Alexis Pigeon

Ahhh, I see. So each individual character is an element.

Well, that's disappointing I guess. But thanks for the clarification!



On Thu, 28 Jul 2011 01:11:27 -0700 (PDT), "Alexis Pigeon [via iText - General]" <[hidden email]> wrote:
> Wrong again. There is no such thing as a paragraph, line, not even word in
> PDF. Just bunches of characters written at some positions.
> From Wikipedia (http://en.wikipedia.org/wiki/Portable_Document_Format#Text):
> "Text in PDF is represented by text elements in page content streams. A
> text element specifies that characters should be drawn at certain
> positions. The characters are specified using the encoding of a
> selected font resource."


Re: Find/Replace Text in Existing PDF?

Kevin Day
In reply to this post by sesshomurai
Don't give up yet!

It is possible to search for a word using the pdf.parser.* API, locate the ascender and descender of the word, and add a colored rectangle below that position on the page.  If all you want to do is add highlighting, then you can probably do this.  If you want to actually change the text, then that's a much, much harder problem.
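
A rough sketch of the geometry involved, in plain Java. The `Glyph` type and the `highlightBox` and `line` methods are hypothetical names for this example, not iText classes; in iText 5 the per-character positions would presumably come from the `com.itextpdf.text.pdf.parser` package (for instance via `TextRenderInfo`), and the rectangle would then be drawn under the page content with a `PdfStamper`. The sketch assumes the glyphs arrive in reading order on a single line, which real PDFs do not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy geometry sketch; Glyph is a hypothetical type, not an iText class. */
final class HighlightSketch {

    /** One drawn character and its position, in page units. */
    static final class Glyph {
        final char ch;
        final double x;        // left edge of the character
        final double baseline; // y coordinate of the baseline
        final double width;    // horizontal advance
        final double ascent;   // extent above the baseline (positive)
        final double descent;  // extent below the baseline (positive)

        Glyph(char ch, double x, double baseline, double width, double ascent, double descent) {
            this.ch = ch;
            this.x = x;
            this.baseline = baseline;
            this.width = width;
            this.ascent = ascent;
            this.descent = descent;
        }
    }

    /** Builds evenly spaced glyphs for one line of text (sample data only). */
    static List<Glyph> line(String text, double x, double baseline,
                            double width, double ascent, double descent) {
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            glyphs.add(new Glyph(text.charAt(i), x + i * width, baseline, width, ascent, descent));
        }
        return glyphs;
    }

    /** Returns {x, y, width, height} around the first match of word, or null if there is none. */
    static double[] highlightBox(List<Glyph> glyphs, String word) {
        StringBuilder text = new StringBuilder();
        for (Glyph g : glyphs) {
            text.append(g.ch);
        }
        int start = word.isEmpty() ? -1 : text.indexOf(word);
        if (start < 0) {
            return null;
        }
        double left = Double.POSITIVE_INFINITY;
        double right = Double.NEGATIVE_INFINITY;
        double bottom = Double.POSITIVE_INFINITY;
        double top = Double.NEGATIVE_INFINITY;
        for (Glyph g : glyphs.subList(start, start + word.length())) {
            left = Math.min(left, g.x);
            right = Math.max(right, g.x + g.width);
            bottom = Math.min(bottom, g.baseline - g.descent); // descender line
            top = Math.max(top, g.baseline + g.ascent);        // ascender line
        }
        return new double[] {left, bottom, right - left, top - bottom};
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = line("PDF text", 72, 700, 6, 9, 3);
        System.out.println(Arrays.toString(highlightBox(glyphs, "text")));
    }
}
```

With the sample line in `main`, the box for "text" comes out at x 96, y 697, width 24, height 12. Filling that rectangle underneath the existing page content gives the highlight effect without touching the text itself.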

Re: Find/Replace Text in Existing PDF?

sesshomurai
That's a great suggestion Kevin. I'll look into it.

So basically, I want to highlight and also redact (place a black box over the text),
so maybe it's doable with some clever coding.

On 07/28/2011 07:04 PM, Kevin Day [via iText - General] wrote:
Don't give up yet!

It is possible to search for a word using the pdf.parser.* API, locate the ascender and descender of the word, and add a colored rectangle below that position on the page.  If all you want to do is add highlighting, then you can probably do this.  If you want to actually change the text, then that's a much, much harder problem.



Re: Find/Replace Text in Existing PDF?

Leonard Rosenthol-3
Putting a black box over something is NOT redaction – the stuff underneath is still there!!   A LOT of government agencies around the world have gotten into trouble using software that only "covers" stuff but doesn't actually redact.   

If you want to write redaction software, then do actual redaction (remove the data)!!!

Leonard
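
The difference can be shown with a toy example in plain Java. The content stream below is hand-written, and `RedactionDemo`, `coverWithBox`, `removeText` and `extractText` are made-up names for this sketch; none of this is iText API, and the regex only understands simple literal strings shown with the Tj operator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of covering versus removing text; not iText API and not a real PDF parser. */
final class RedactionDemo {

    // Hand-written sample content stream (hypothetical data).
    static final String PAGE = "BT /F1 12 Tf 72 700 Td (Secret: 1234) Tj ET";

    // A simple literal string followed by the Tj (show text) operator.
    private static final Pattern SHOW_TEXT = Pattern.compile("\\(([^()\\\\]*)\\)\\s*Tj");

    /** "Covers" the text: appends a filled black rectangle, leaving the text operator in place. */
    static String coverWithBox(String stream) {
        return stream + "\n0 0 0 rg 72 695 90 14 re f";
    }

    /** Removes the text-showing operators themselves. */
    static String removeText(String stream) {
        return SHOW_TEXT.matcher(stream).replaceAll("");
    }

    /** Naive extraction: concatenates every string shown with Tj. */
    static String extractText(String stream) {
        StringBuilder out = new StringBuilder();
        Matcher m = SHOW_TEXT.matcher(stream);
        while (m.find()) {
            out.append(m.group(1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("covered:  [" + extractText(coverWithBox(PAGE)) + "]");
        System.out.println("redacted: [" + extractText(removeText(PAGE)) + "]");
    }
}
```

The covered stream still yields the original string to anything that reads the file, while the stream with the operator removed yields nothing. Real redaction has to go much further than this sketch (other text operators, images, annotations, metadata, earlier revisions of the file).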

From: sesshomurai <[hidden email]>
Reply-To: Post here <[hidden email]>
Date: Thu, 28 Jul 2011 16:22:26 -0700
To: Post here <[hidden email]>
Subject: Re: [iText-questions] Find/Replace Text in Existing PDF?

That's a great suggestion Kevin. I'll look into it.

So basically, I want to highlight and also redact (place a black box over the text),
so maybe it's doable with some clever coding.


Re: Find/Replace Text in Existing PDF?

sesshomurai

That's a really good point...



On Fri, 29 Jul 2011 03:34:18 -0700 (PDT), "Leonard Rosenthol-3 [via iText - General]" <[hidden email]> wrote:
> Putting a black box over something is NOT redaction – the stuff underneath
> is still there!!   A LOT of government agencies around the world have
> gotten into trouble using software that only "covers" stuff but doesn't
> actually redact.
>
> If you want to write redaction software, then do actual redaction (remove
> the data)!!!
