Generate PDFs from the StarRocks Docusaurus documentation site
Node.js code to:
- Generate the ordered list of URLs from the documentation. This is done using code from docusaurus-prince-pdf.
- Convert each page to a PDF file with Gotenberg.
- Combine the individual PDF files using Ghostscript and pdfcombine.
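The conversion step in the list above can be sketched as follows. This is a minimal illustration, not the code in generatePdf.js: it assumes Node.js 18 or later (for the global fetch and FormData), assumes Gotenberg is reachable at the hypothetical address http://gotenberg:3000, and uses Gotenberg's Chromium URL conversion route. The function names are made up for this example.

```javascript
// Sketch only: GOTENBERG_BASE is an assumed address, not taken from this repo.
const GOTENBERG_BASE = "http://gotenberg:3000";

// Build the multipart request that asks Gotenberg's Chromium route to
// render one documentation page as a PDF.
function buildConvertRequest(pageUrl, gotenbergBase = GOTENBERG_BASE) {
  const endpoint = new URL("/forms/chromium/convert/url", gotenbergBase).toString();
  const form = new FormData();
  form.append("url", pageUrl);
  return { endpoint, options: { method: "POST", body: form } };
}

// Convert a single page and return the PDF bytes.
// This part needs a running Gotenberg service to succeed.
async function convertPage(pageUrl) {
  const { endpoint, options } = buildConvertRequest(pageUrl);
  const response = await fetch(endpoint, options);
  if (!response.ok) {
    throw new Error(`Gotenberg returned ${response.status} for ${pageUrl}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

Separating the request builder from the network call makes it possible to inspect the request without a running container.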
Clone this repo
Clone this repo to your machine.
Choose the branch that you want a PDF for
When you launch the PDF conversion environment, it will use the active branch. So, if you want a PDF for version 3.3:
git switch branch-3.3
Launch the conversion environment
The conversion process uses Docker Compose. Launch the environment by running the following command from the starrocks/docs/docusaurus/PDF/ directory.
cd starrocks/docs/docusaurus/PDF
docker compose up --detach --wait --wait-timeout 120 --build
Tip
All of the docker compose commands must be run from the starrocks/docs/docusaurus/PDF/ directory.
Check the status
Tip
If you do not have jq installed, just run docker compose ps. The output using jq is easier to read, but you can get by with the more basic command.
docker compose ps --format json | jq '{Service: .Service, State: .State, Status: .Status}'
Expected output:
{
  "Service": "docusaurus",
  "State": "running",
  "Status": "Up 14 minutes"
}
{
  "Service": "gotenberg",
  "State": "running",
  "Status": "Up 2 hours (healthy)"
}
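If jq is not available, the same three fields can be pulled out with a few lines of Node.js. The sketch below is an illustration only. It assumes the output of docker compose ps --format json is either a single JSON array or one JSON object per line, which varies between Docker Compose versions. The function names are hypothetical.

```javascript
// Summarize the output of `docker compose ps --format json`.
// Depending on the Docker Compose version, that output is either one JSON
// array or one JSON object per line, so both shapes are handled.
function summarizeServices(rawOutput) {
  const text = rawOutput.trim();
  if (text === "") return [];
  const records = text.startsWith("[")
    ? JSON.parse(text)
    : text
        .split("\n")
        .filter((line) => line.trim() !== "")
        .map((line) => JSON.parse(line));
  // Keep only the fields shown in the jq filter above.
  return records.map(({ Service, State, Status }) => ({ Service, State, Status }));
}

// True when at least one service is listed and all of them are running.
function allRunning(services) {
  return services.length > 0 && services.every((s) => s.State === "running");
}

// Example with the two services from the expected output.
const sample = [
  '{"Service":"docusaurus","State":"running","Status":"Up 14 minutes"}',
  '{"Service":"gotenberg","State":"running","Status":"Up 2 hours (healthy)"}',
].join("\n");
console.log(summarizeServices(sample), allRunning(summarizeServices(sample)));
```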
Get the URL of the "home" page
Check to see if Docusaurus is serving the pages
From the PDF directory, check the logs of the docusaurus service:
docker compose logs -f docusaurus
When Docusaurus is ready you will see this line at the end of the log output:
docusaurus-1 | [SUCCESS] Serving "build" directory at: http://0.0.0.0:3000/
Stop watching the logs with CTRL-c.
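If you would rather detect readiness from a script than watch the logs, the check amounts to looking for that one line. A minimal sketch, assuming the log text has already been captured as a string and that the line has the exact wording shown above (findServeUrl is a hypothetical helper, not part of this repo):

```javascript
// Find the address Docusaurus reports in its "[SUCCESS] Serving" log line.
// Returns null when the line has not appeared yet.
function findServeUrl(logText) {
  const match = logText.match(/\[SUCCESS\] Serving "build" directory at: (\S+)/);
  return match ? match[1] : null;
}

// Example using the log line quoted above.
const logLine = 'docusaurus-1 | [SUCCESS] Serving "build" directory at: http://0.0.0.0:3000/';
console.log(findServeUrl(logLine));
```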
Find the initial URL
First open the docs by launching a browser to the URL at the end of the log output, which should be http://0.0.0.0:3000/.
Next, if you are generating a PDF of the Chinese documentation, switch the site to Chinese.
Copy the URL of the starting page of the documentation that you would like to generate a PDF for.
Save the URL.
Open a shell in the PDF build environment
Launch a shell from the starrocks/docs/docusaurus/PDF directory:
docker compose exec -ti docusaurus bash
and cd into the PDF directory:
cd /app/docusaurus/PDF
Crawl the docs and generate the PDFs
Run the command:
Tip
The URL in the code sample is for the Chinese documentation. Remove the /zh/ if you want English.
node generatePdf.js http://0.0.0.0:3000/zh/docs/introduction/StarRocks_intro/
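The difference between the Chinese and English starting URLs is only the /zh/ path prefix. The sketch below shows that transformation, assuming the locale is always the first path segment as in the sample URL. The setLocale helper is hypothetical and is not part of generatePdf.js.

```javascript
// Switch a documentation URL between the Chinese (/zh/) and English variants.
// Assumes the locale, when present, is the first path segment.
function setLocale(startUrl, locale) {
  const url = new URL(startUrl);
  const segments = url.pathname.split("/").filter((s) => s !== "");
  if (segments[0] === "zh") segments.shift(); // drop an existing /zh/ prefix
  if (locale === "zh") segments.unshift("zh"); // add it back for Chinese
  const trailing = segments.length > 0 && url.pathname.endsWith("/") ? "/" : "";
  url.pathname = "/" + segments.join("/") + trailing;
  return url.toString();
}

const chineseStart = "http://0.0.0.0:3000/zh/docs/introduction/StarRocks_intro/";
console.log(setLocale(chineseStart, "en"));
```

Any locale value other than "zh" yields the English URL.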
Join the individual PDF files
Note:
There are 900+ PDF files and more than 4,000 pages in total. Combining takes five hours on my laptop, so just let it run. I am looking for a faster method to combine the files.
source .venv/bin/activate
pdfcombine -y combine.yaml --title="StarRocks 3.3" -o ../../PDFoutput/StarRocks_3.3.pdf
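The title and output filename in the command above follow the version of the branch you switched to. As an illustration, the arguments could be derived from the branch name like this. The pdfcombineArgs helper is hypothetical, and the sketch assumes branch names of the form branch-3.3 and uses only the flags shown in the command above.

```javascript
// Derive the pdfcombine arguments from a release branch name such as "branch-3.3".
// The naming pattern mirrors the command above; the helper itself is hypothetical.
function pdfcombineArgs(branch) {
  const match = /^branch-(\d+\.\d+)$/.exec(branch);
  if (!match) {
    throw new Error(`Expected a branch like "branch-3.3", got "${branch}"`);
  }
  const version = match[1];
  return [
    "-y", "combine.yaml",
    `--title=StarRocks ${version}`,
    "-o", `../../PDFoutput/StarRocks_${version}.pdf`,
  ];
}

console.log(pdfcombineArgs("branch-3.3").join(" "));
```

The returned array is suitable for passing to a process launcher such as Node's child_process.execFile, which avoids shell quoting problems with the space in the title.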
Note:
You may see this message during the pdfcombine step:
GPL Ghostscript 10.03.1: Missing glyph CID=93, glyph=005d in the font IAAAAA+Menlo-Regular . The output PDF may fail with some viewers.
I have not had any complaints about the missing glyph from readers of the documents produced with this.
Finished file
The individual PDF files and the combined file will be on your local machine in starrocks/docs/PDFoutput/
Customizing the docs site for PDF
Gotenberg generates the PDF files without the side navigation, header, and footer, because these components are not displayed when the media type is print. In our docs it also does not make sense to show the edit URLs or the Feedback widget in a PDF. These are hidden with CSS by setting display: none on their classes inside an @media print rule.
For example, this snippet in the Docusaurus CSS file src/css/custom.css removes the Feedback widget from the PDF:
/* When we generate PDF files we do not need to show the feedback widget. */
@media print {
  .feedback_Ak7m {
    display: none;
  }
}