Description

Document Conversion to 'PDF' or 'PNG'.

The package generates images from documents of different types. Three main features are provided: functions for generating document thumbnails, functions for performing visual tests of documents, and a function for updating the fields and table of contents of a 'Microsoft Word' or 'RTF' document. To work, 'LibreOffice' and/or 'Microsoft Word' must be installed on the machine. If 'Microsoft Word' is available, it can be used to produce PDF documents or images identical to the originals; otherwise, 'LibreOffice' is used and the rendering can sometimes differ from the original documents.

doconv

The tool offers a set of functions for converting ‘Microsoft Word’ or ‘Microsoft PowerPoint’ documents to ‘PDF’ format, and for converting them to thumbnail images.

To do this, the package uses ‘Microsoft Word’ and ‘Microsoft PowerPoint’; if they are not available, ‘LibreOffice’ can be used instead. A function is also provided to update all fields and the table of contents of a Word document using ‘Microsoft Word’.

These features also underpin functions for visual testing of documents; the ‘doc’, ‘docx’, ‘ppt’, ‘pptx’, ‘html’, ‘pdf’ and ‘png’ formats are supported. The functions can be used with the “testthat” and “tinytest” packages.

Installation

You can install the latest version from GitHub with:

# install.packages("devtools")
devtools::install_github("ardata-fr/doconv")

Example

library(doconv)

Generate thumbnails from a file

You can generate thumbnails as an image with to_miniature:

docx_file <- system.file(package = "doconv", "doc-examples/example.docx")
to_miniature(
  filename = docx_file, 
  row = c(1, 1, 2, 2))

to_miniature uses ‘Microsoft Word’ or ‘Microsoft PowerPoint’ to convert Word or PowerPoint documents to PDF. If the relevant program is not available, it uses ‘LibreOffice’ for the conversion instead.

Thus, this package can only be used when ‘Microsoft Word’ and ‘Microsoft PowerPoint’ are available or, failing that, ‘LibreOffice’.
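The `row` argument in the thumbnail example can be built programmatically. The sketch below assumes that `row` gives, for each page in order, the index of the thumbnail row the page is placed on (our reading of `c(1, 1, 2, 2)` as two pages on each of two rows). `layout_rows` is a hypothetical helper, not part of doconv.

```r
# Hypothetical helper (not part of doconv): build a `row` vector that places
# `per_row` consecutive pages on each thumbnail row.
layout_rows <- function(n_pages, per_row) {
  stopifnot(n_pages >= 1, per_row >= 1)
  # One row index per page, repeated `per_row` times, truncated to n_pages.
  rep(seq_len(ceiling(n_pages / per_row)), each = per_row)[seq_len(n_pages)]
}

layout_rows(4, 2)  # 1 1 2 2
layout_rows(5, 3)  # 1 1 1 2 2
```

For a document known to have four pages, `to_miniature(filename = docx_file, row = layout_rows(4, 2))` would then request the same layout as the example.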

Convert a PowerPoint file to PDF

# Use a dedicated variable so the Word example path is not overwritten
pptx_file <- system.file(package = "doconv", "doc-examples/example.pptx")
to_pdf(pptx_file, output = "pptx_example.pdf")
to_miniature("pptx_example.pdf", width = 1000)

Convert a Word file to PDF

docx_file <- system.file(package = "doconv", "doc-examples/example.docx")
to_pdf(docx_file, output = "docx_example.pdf")
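The two conversions above generalize to a folder of documents. This is a minimal sketch, not a doconv feature: `pdf_path` and `convert_all` are hypothetical helpers, and the converter is passed as an argument (defaulting to `doconv::to_pdf`, called with `output =` as in the examples above) so the loop can be tried without an office suite installed.

```r
# Hypothetical helpers (not part of doconv).

# Replace the file extension with ".pdf"; returns character(0) for no input.
pdf_path <- function(path) {
  sub("\\.[^./]+$", ".pdf", path)
}

# Convert every .docx/.pptx file found in `dir`, writing each PDF next to
# its source file. Returns the output paths invisibly.
convert_all <- function(dir, convert = doconv::to_pdf) {
  files <- list.files(dir, pattern = "\\.(docx|pptx)$",
                      full.names = TRUE, ignore.case = TRUE)
  outputs <- pdf_path(files)
  for (i in seq_along(files)) {
    convert(files[i], output = outputs[i])
  }
  invisible(outputs)
}
```

Calling `convert_all("reports")` would process a (hypothetical) `reports` directory; conversions run one at a time because each call drives an external application.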

Update Word fields and TOC

library(officer)
library(doconv)

read_docx() |> 
  body_add_fpar(
    value = fpar(
      run_word_field("DOCPROPERTY \"coco\" \\* MERGEFORMAT"))) |>
  set_doc_properties(coco = "test") |>
  print(target = "output.docx") |>
  docx_update()

Setup

If ‘Microsoft Word’ or ‘Microsoft PowerPoint’ is not already installed on your machine, install it if possible.

If ‘Microsoft Word’ and ‘Microsoft PowerPoint’ cannot be installed, install ‘LibreOffice’ on your machine; please visit https://www.libreoffice.org/ and follow the installation instructions.

Use function check_libreoffice_export() to check that the software is installed and can export to PDF:

check_libreoffice_export()
#> [1] TRUE

If ‘Microsoft Word’ or ‘Microsoft PowerPoint’ is available on your machine, you can get images or PDFs that look exactly the same as the original document. If not, ‘LibreOffice’ is used to convert Word documents to PDF or to an image; in this case, be aware that ‘LibreOffice’ does not always render the document as ‘Microsoft Word’ would (sections can be misinterpreted, for example).
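On a machine that relies on ‘LibreOffice’, the check can be wrapped in a guard so that a script skips conversion instead of failing. This is a sketch under assumptions: `safe_to_pdf` is a hypothetical helper, not part of doconv, and it only consults the LibreOffice check, so it is not appropriate where ‘Microsoft Word’ is the conversion backend.

```r
# Hypothetical helper (not part of doconv): convert to PDF only when the
# LibreOffice export check succeeds. `check` and `convert` default to the
# doconv functions shown in this README and can be replaced for testing.
safe_to_pdf <- function(input, output,
                        check = doconv::check_libreoffice_export,
                        convert = doconv::to_pdf) {
  # Treat an error in the check the same as a failed check.
  ok <- tryCatch(isTRUE(check()), error = function(e) FALSE)
  if (!ok) {
    warning("LibreOffice export check failed; skipping ", input, call. = FALSE)
    return(invisible(NULL))
  }
  convert(input, output = output)
}
```

The function returns `NULL` with a warning when the check fails, which lets a batch script log the problem and carry on.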

Authorization on macOS

If you are running R on ‘macOS’, you have to authorize a few things before starting.

PDF processing happens in a working directory managed with the function working_directory().

Manual intervention is necessary to authorize the ‘Word’ and ‘PowerPoint’ applications to write to a single directory: the working directory.

These permissions must be set manually, as required by the macOS security policy. We think this is not a problem, because it is unlikely that you will use a Mac as a server.

The user must manually click a few buttons:

  1. allow R to run ‘AppleScript’ scripts that will control Word
  2. allow Word to write to the working directory.

Don’t worry, these are one-time operations.
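To know which directory the authorization prompts concern, the working directory can be displayed first. The sketch below assumes that `working_directory()` returns the directory path as a character string; `report_working_directory` is a hypothetical helper, not part of doconv.

```r
# Hypothetical helper (not part of doconv): show the directory that Word and
# PowerPoint need write access to. `getter` defaults to the doconv function
# named above and is assumed to return a path.
report_working_directory <- function(getter = doconv::working_directory) {
  dir <- getter()
  message("Word and PowerPoint need write access to: ", dir)
  invisible(dir)
}
```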

Related work

  • The docxtractr package provides convert_to_pdf(), which works very well. The functionality in Bob Rudis’ package depends only on ‘LibreOffice’.
Metadata

Version

0.3.2

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows