Studio Pro activities > Document Processing > PDF. Includes "Read text", "Convert to image", "Extract page range" and more.
Activities
Activity | Description |
---|---|
Read Text | Extract the text layer from a PDF file |
Convert to Image | Convert a PDF file to an image |
Extract Page Range | Extract selected pages from the PDF file to a new PDF file |
Get PDF Page Count | Count the number of pages in a PDF file |
Combine to PDF | Combine multiple files into a single multi-page PDF |
Read Text
Description
This activity extracts the text layer from a PDF file and saves it as a string.
Note
Editable fields in a PDF file cannot be read with this activity.
Parameters
Path
- Set a value: enables you to directly write the desired path.
- Save the previous step result: uses the result of the previous activity as the path.
- Calculate a value: enables you to use available properties and methods to form a path.
Display spaces and linebreaks
This parameter keeps the \r and \n characters (spaces and linebreaks) in the resulting text. You can disable the parameter if you do not need them.
Comment
This parameter contains an annotation of the activity. The input text will be displayed above the activity name.
Result
To save the result to a variable, you need to attach a "Save value to variable" activity to this activity, specify the desired variable name, and select the "Save the previous step result" option. The result is saved as a string.
In some cases, this activity may return the text in a different reading order than in the original document. To extract specific information from structured documents, use the OCR activities.
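Once the result is stored in a variable, the string can be post-processed, for example to pull a value from a labeled line. The following is a minimal sketch, assuming that expressions are evaluated with JavaScript semantics (the source does not state this) and that line breaks were kept. The sample text and the helper names splitLines and valueAfterLabel are hypothetical.

```javascript
// Hypothetical sample of a Read Text result with line breaks kept.
const sampleText = "Invoice No: 1042\r\nDate: 2024-01-15\r\nTotal: 99.50";

// Split on Windows (\r\n) or Unix (\n) line breaks, trim each line, drop empty ones.
function splitLines(text) {
  return text
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0);
}

// Return the text after "label:" on the first line starting with that label,
// or null when no such line exists.
function valueAfterLabel(text, label) {
  const line = splitLines(text).find(l => l.startsWith(label + ":"));
  return line ? line.slice(label.length + 1).trim() : null;
}

console.log(valueAfterLabel(sampleText, "Total"));
```

Because the reading order is not guaranteed, a lookup by label like this is usually more robust than relying on a fixed line position.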
Convert to Image
Description
This activity converts a PDF file to an image, or to a set of images if the PDF file has multiple pages.
Note
Editable fields in a PDF file cannot be read with this activity.
Below is an example of how to set up and use the activity.
Parameters
File type
Select the format of the pictures created after the conversion.
Supported output formats:
- .jpeg
- .png
- .tiff
PDF file path
- Set a value: enables you to directly write the desired path to the input PDF file.
- Save the previous step result: chooses the previous activity result as a path.
- Calculate a value: enables you to use available properties and methods to form a path.
Output path
- Set a value: enables you to directly write the desired path to the folder where the output images will be placed.
- Save the previous step result: chooses the previous activity result as a path.
- Calculate a value: enables you to use available properties and methods to form a path.
File prefix
- Set a value: enables you to specify a prefix that will be added to the names of the output images.
- Save the previous step result: chooses the previous activity result as a prefix.
- Calculate a value: enables you to use available properties and methods to form a prefix.
Note
The name of each generated file consists of the value from the "File prefix" field and a counter that runs up to the number of pages in the original PDF file. For instance, if "File prefix" is set to "Test_" and the original PDF file has 2 pages, the files "Test_1" and "Test_2" are generated in the output folder.
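The naming rule from the note can be modeled in a few lines, which helps when a later step needs to know the output file names in advance. This is only a sketch of the rule as described, written in JavaScript as an assumed expression language. The function name expectedImageNames is hypothetical, and file extensions are left out because the note does not show them.

```javascript
// Model of the documented naming rule: prefix followed by a counter from 1 to pageCount.
// Extensions are omitted on purpose; the source shows names without them.
function expectedImageNames(filePrefix, pageCount) {
  const names = [];
  for (let page = 1; page <= pageCount; page++) {
    names.push(filePrefix + page);
  }
  return names;
}

console.log(expectedImageNames("Test_", 2)); // [ 'Test_1', 'Test_2' ]
```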
Pages
- Set a value: allows you to manually specify one or more pages to be converted. For example, you can specify individual pages (1,3) or a range (5-7).
- Calculate a value: allows you to use a special formula or a special method to determine the pages you need.
- Save the previous step result: takes the result of the previous workflow activity as the page numbers.
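To check a page specification before passing it to the activity, it can be expanded into an explicit list of page numbers. The sketch below shows one plausible reading of the "1,3,5-7" notation. How the activity itself parses the value is not documented here, so the function parsePageSpec and its validation rules are assumptions.

```javascript
// Expand a page specification such as "1,3,5-7" into [1, 3, 5, 6, 7].
// Assumed rules: comma-separated tokens, each a page number or an ascending range.
function parsePageSpec(spec) {
  const pages = [];
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) throw new Error("Invalid page token: " + token);
    const first = Number(match[1]);
    const last = match[2] === undefined ? first : Number(match[2]);
    if (first < 1 || last < first) throw new Error("Invalid page range: " + token);
    for (let page = first; page <= last; page++) pages.push(page);
  }
  return pages;
}

console.log(parsePageSpec("1,3,5-7")); // [ 1, 3, 5, 6, 7 ]
```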
Comment
Contains an annotation of the activity. The input text will be displayed above the activity name.
Extract Page Range
Description
This activity allows you to extract selected pages from the PDF file to a new PDF file.
Parameters
PDF file path
- Set a value: allows you to manually specify the path to the PDF file from which you want to extract the page range. Click the "PICK" button to open the file explorer and pick the required directory.
- Calculate a value: allows you to use a special formula or a special method to determine the path.
- Save the previous step result: takes the result of the previous workflow activity as the path.
Output path
- Set a value: allows you to manually specify the path where the PDF file is to be created. Click the "PICK" button to open the file explorer and pick the required directory.
- Calculate a value: allows you to use a special formula or a special method to determine the path.
- Save the previous step result: takes the result of the previous workflow activity as the path.
Pages
- Set a value: allows you to manually specify the range of pages to be extracted from the PDF document. You can specify individual pages (1,3) or a range (5-7).
- Calculate a value: allows you to use a special formula or a special method to determine the range of pages.
- Save the previous step result: takes the result of the previous workflow activity as the range of pages.
You can also specify the pages to be extracted using variables. This is useful when the number of pages to extract comes from another activity. To do that, use the following syntax:
"1-"+pdf_page_count
where "1-" sets the start of the range to page 1 and pdf_page_count is the name of the variable containing the number of pages.
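The same concatenation pattern extends to other ranges. The sketch below assumes JavaScript semantics for the expression (the + operator joins a string and a number into a string), which the source does not confirm. The helper names allPages and lastPages are hypothetical, and the page count of 12 is a placeholder.

```javascript
// Placeholder value; in a workflow it would come from Get PDF Page Count.
const pdf_page_count = 12;

// Equivalent to the expression "1-" + pdf_page_count.
function allPages(pageCount) {
  return "1-" + pageCount;
}

// Range covering the last n pages, never starting before page 1.
// A single page is returned as a plain number string, e.g. "12".
function lastPages(pageCount, n) {
  const first = Math.max(1, pageCount - n + 1);
  return first === pageCount ? String(pageCount) : first + "-" + pageCount;
}

console.log(allPages(pdf_page_count));     // 1-12
console.log(lastPages(pdf_page_count, 3)); // 10-12
```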
Comment
Allows you to add explanatory text to the block. The text will be displayed inside the block on top of the activity name.
Get PDF Page Count
Description
This activity allows you to count the number of pages in a PDF file and return the number of pages to a variable.
Parameters
PDF file path
- Set a value: allows you to manually specify the path to the target PDF file. Click the "PICK" button to open the file explorer and pick the required directory.
- Calculate a value: allows you to use a special formula or a special method to determine the path.
- Save the previous step result: takes the result of the previous workflow activity as the path.
Along with this activity, the "Save value to variable pdf_page_count" block is created automatically, so the number of pages in the PDF file is saved in the variable pdf_page_count.
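One common use of the page count is splitting a long PDF into fixed-size parts, where each part becomes the "Pages" value of a separate Extract Page Range step. The following sketch computes those range strings. It assumes JavaScript as the expression language, and the function name chunkRanges and the chunk size are hypothetical.

```javascript
// Split pages 1..pageCount into consecutive ranges of at most chunkSize pages.
// Each entry is a string in the "first-last" notation, or a single page number.
function chunkRanges(pageCount, chunkSize) {
  const count = Number(pageCount); // accept the count as a number or a numeric string
  const size = Number(chunkSize);
  if (!Number.isInteger(count) || !Number.isInteger(size) || count < 1 || size < 1) {
    throw new Error("pageCount and chunkSize must be positive integers");
  }
  const ranges = [];
  for (let first = 1; first <= count; first += size) {
    const last = Math.min(first + size - 1, count);
    ranges.push(first === last ? String(first) : first + "-" + last);
  }
  return ranges;
}

console.log(chunkRanges(12, 5)); // [ '1-5', '6-10', '11-12' ]
```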
Combine to PDF
Description
This activity allows you to combine multiple files from the same folder into a single multi-page PDF document. Supported file formats: .pdf, .docx, .xlsx, .rtf, .tiff, .bmp, .jpeg, .jpg, .gif, .png, .html, .xps. The files are combined in alphabetical order by file name.
Note
The activity does not scale down larger images when incorporating them into a PDF.
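Because the order depends on file names, numbered files can end up in an unexpected sequence. The sketch below previews the order under the assumption that "alphabetically" means a plain character-by-character string comparison; the activity's exact comparison rules (case handling, numeric awareness) are not documented here. The names previewCombineOrder and padNumber are hypothetical.

```javascript
// Extensions listed as supported in the description above.
const SUPPORTED_EXTENSIONS = [
  ".pdf", ".docx", ".xlsx", ".rtf", ".tiff", ".bmp",
  ".jpeg", ".jpg", ".gif", ".png", ".html", ".xps"
];

// Keep only supported files and sort them with the default string comparison.
function previewCombineOrder(fileNames) {
  return fileNames
    .filter(name => SUPPORTED_EXTENSIONS.some(ext => name.toLowerCase().endsWith(ext)))
    .sort();
}

// Zero-pad a number so that string order matches numeric order.
function padNumber(n, width) {
  return String(n).padStart(width, "0");
}

console.log(previewCombineOrder(["page10.pdf", "page2.pdf", "page1.pdf", "notes.txt"]));
// [ 'page1.pdf', 'page10.pdf', 'page2.pdf' ]

console.log(previewCombineOrder([10, 2, 1].map(n => "page" + padNumber(n, 2) + ".pdf")));
// [ 'page01.pdf', 'page02.pdf', 'page10.pdf' ]
```

If the activity does sort this way, zero-padding the numbers in file names before combining keeps the pages in the intended order.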
Parameters
Path to folder
- Set a value: allows you to manually specify the path to the folder that contains the files to be combined into a PDF document. Click the "PICK" button to open the file explorer and select the desired folder.
- Calculate a value: allows you to use a special formula or a special method to determine the path to folder.
- Save the previous step result: takes the result of the previous workflow activity as the path to folder.
PDF file path
- Set a value: allows you to manually specify the directory where the PDF file is to be created. Click the "PICK" button to open the file explorer and pick the required directory.
- Calculate a value: allows you to use a special formula or a special method to determine the path.
- Save the previous step result: takes the result of the previous workflow activity as the path.
Comment
Allows you to add explanatory text to the block. The text is displayed inside the block on top of the activity name.