Generate PDFs from the StarRocks Docusaurus documentation site

Node.js code to:

  1. Generate the ordered list of URLs from the documentation. This is done using code from docusaurus-prince-pdf.
  2. Convert each page to a PDF file with Gotenberg.
  3. Combine the individual PDF files using Ghostscript and pdfcombine.

Clone this repo

Clone this repo to your machine.

Choose the branch that you want a PDF for

The PDF conversion environment uses whichever branch is checked out when you launch it. For example, to generate a PDF for version 3.3, switch to that branch first:

git switch branch-3.3
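
The branch name also gives the version label used later for the PDF title and file name. A minimal sketch, assuming release branches follow the branch-X.Y pattern shown above; the helper name version_from_branch is hypothetical and is not part of this repo:

```shell
# Hypothetical helper: strip the "branch-" prefix from a release branch name.
# Assumes branches are named like "branch-3.3"; other names pass through unchanged.
version_from_branch() {
    printf '%s\n' "${1#branch-}"
}
```

For example, version_from_branch "$(git branch --show-current)" prints 3.3 when branch-3.3 is checked out.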

Launch the conversion environment

The conversion process uses Docker Compose. Change to the starrocks/docs/docusaurus/PDF/ directory and launch the environment:

cd starrocks/docs/docusaurus/PDF
docker compose up --detach --wait --wait-timeout 120 --build

Tip

All of the docker compose commands must be run from the starrocks/docs/docusaurus/PDF/ directory.

Check the status

Tip

If you do not have jq installed, run docker compose ps on its own. The output using jq is easier to read, but the basic command is sufficient.

docker compose ps --format json | jq '{Service: .Service, State: .State, Status: .Status}'

Expected output:

{
  "Service": "docusaurus",
  "State": "running",
  "Status": "Up 14 minutes"
}
{
  "Service": "gotenberg",
  "State": "running",
  "Status": "Up 2 hours (healthy)"
}

Get the URL of the "home" page

Check to see if Docusaurus is serving the pages

From the PDF directory, check the logs of the docusaurus service:

docker compose logs -f docusaurus

When Docusaurus is ready, you will see this line at the end of the log output:

docusaurus-1  | [SUCCESS] Serving "build" directory at: http://0.0.0.0:3000/

Stop watching the logs with Ctrl-C.
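
If you would rather check readiness from a script than watch the logs, the success line can be matched with grep. A minimal sketch, assuming the log text matches the line shown above; the helper name docusaurus_ready is hypothetical:

```shell
# Hypothetical helper: succeed if the log text on stdin contains the
# Docusaurus "Serving" line shown above. Only the stable part of the
# message is matched, so the service-name prefix does not matter.
docusaurus_ready() {
    grep -qF 'Serving "build" directory at:'
}
```

One way to use it is docker compose logs docusaurus | docusaurus_ready && echo ready, run from the PDF directory without the -f flag so that the command exits.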

Find the initial URL

First, open the docs in a browser at the URL shown at the end of the log output, which should be http://0.0.0.0:3000/.

Next, if you are generating a PDF of the Chinese documentation, switch the site to Chinese.

Copy the URL of the first page of the documentation that you want in the PDF and save it. You will pass this URL to the crawler in a later step.

Open a shell in the PDF build environment

Launch a shell from the starrocks/docs/docusaurus/PDF directory:

docker compose exec -ti docusaurus bash

Then, inside the container, change to the PDF directory:

cd /app/docusaurus/PDF

Crawl the docs and generate the PDFs

Tip

The URL in the code sample is for the Chinese documentation. Remove /zh/ from the URL if you want the English documentation.

Run the command:

node generatePdf.js http://0.0.0.0:3000/zh/docs/introduction/StarRocks_intro/
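
The language choice can also be made explicit in a small helper. A minimal sketch, assuming the host, port, and start page shown in the command above; the helper name start_url is hypothetical:

```shell
# Hypothetical helper: print the start URL for a documentation language.
# "zh" keeps the /zh/ prefix; any other value yields the English URL.
# Host, port, and page path are copied from the command above.
start_url() {
    case "$1" in
        zh) printf '%s\n' 'http://0.0.0.0:3000/zh/docs/introduction/StarRocks_intro/' ;;
        *)  printf '%s\n' 'http://0.0.0.0:3000/docs/introduction/StarRocks_intro/' ;;
    esac
}
```

With this defined, node generatePdf.js "$(start_url zh)" is equivalent to the command above, and start_url en gives the English starting page.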

Join the individual PDF files

Note:

There are more than 900 PDF files and more than 4,000 pages in total. Combining them takes five hours on my laptop, so just let it run. I am looking for a faster way to combine the files.

source .venv/bin/activate
pdfcombine -y combine.yaml --title="StarRocks 3.3" -o ../../PDFoutput/StarRocks_3.3.pdf

Note:

You may see this message during the pdfcombine step:

GPL Ghostscript 10.03.1: Missing glyph CID=93, glyph=005d in the font IAAAAA+Menlo-Regular . The output PDF may fail with some viewers.

I have not had any complaints about the missing glyph from readers of the documents produced with this.

Finished file

The individual PDF files and the combined file will be in the starrocks/docs/PDFoutput/ directory on your local machine.
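
A quick sanity check is to count the PDF files that were produced. A minimal sketch; the helper name count_pdfs is hypothetical, and it counts every .pdf file under the directory you give it, including the combined file if it is stored there:

```shell
# Hypothetical helper: count the .pdf files under a directory (recursively).
count_pdfs() {
    find "$1" -type f -name '*.pdf' | wc -l | tr -d ' '
}
```

For example, run count_pdfs starrocks/docs/PDFoutput from the directory that contains your clone.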

Customizing the docs site for PDF

Gotenberg generates the PDF files without the side navigation, header, and footer because these components are not displayed when the media type is print. The edit URLs and the Feedback widget do not belong in a PDF either. They are hidden with CSS by setting display: none on their classes inside an @media print rule.

For example, this snippet in the Docusaurus CSS file src/css/custom.css removes the Feedback form from the PDF:

/* When we generate PDF files we do not need to show the feedback widget. */
@media print {
    .feedback_Ak7m {
        display: none;
    }
}
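
The suffix in .feedback_Ak7m looks like a generated hash, so the class name may change if the Feedback component's styles change. This is an assumption; the README does not say so. A minimal sketch for listing the class names currently present in the built site, assuming the build output is in a directory such as build/; the helper name find_feedback_classes is hypothetical:

```shell
# Hypothetical helper: list the distinct "feedback_<suffix>" class names
# found in files under a directory, for example the Docusaurus build output.
find_feedback_classes() {
    grep -rhoE 'feedback_[A-Za-z0-9_-]+' "$1" | sort -u
}
```

If the name printed by find_feedback_classes build differs from the one in src/css/custom.css, update the @media print rule to match.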