Like many developers, we lean towards using open-source libraries when possible. They’re free and often simpler than commercial tools. And for some PDF documents, an open-source HTML-to-PDF tool is the best answer.
So why did we build the DocRaptor HTML-to-PDF API on top of the PrinceXML commercial engine? After struggling for weeks with open-source libraries, we realized there is a significant technical gap between Prince and all the browser-based HTML-to-PDF libraries:
Browsers assume you’re making a long, scrolling webpage. Without even a concept of a “page”, repeating elements such as headers or watermarks aren’t supported. This one fact, along with a dedicated focus on PDFs, means Prince’s commercial HTML to PDF engine has far superior support for common requirements such as:
If you don’t need any of that functionality, the open-source tools may be a good solution for your document. They’re great for one-page invoices, simple letters, or exact copies of existing webpages. With enough effort and polyfills, slightly more complex documents can be supported, but we always recommend testing the most complex elements of your document first.
Beyond technical capabilities, the other problem we faced was scalability. PDF generation is more complex than it appears.
A traditional web server returns the HTML content for a website in milliseconds and is then free to respond to other requests. A PDF generator must download all the external assets, such as JavaScript and images, and then completely render the document before delivering the PDF. On a normal website, this work is done by the visitor’s browser.
Because of all this additional server-side work, a PDF generator takes much longer to run than a typical web request. Instead of milliseconds, generating a PDF takes seconds or, for an image-heavy document, even minutes. Sometimes assets time out and generation time spikes.
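To make the timeout problem concrete, here is a minimal Python sketch (not how any particular generator is implemented) that downloads a document's assets concurrently and gives up on any asset that takes longer than a fixed timeout. The asset names and delays are hypothetical, and `fetch_asset` only simulates a download with `asyncio.sleep`.

```python
import asyncio
import time
from typing import Dict, List, Optional

# Hypothetical per-asset download times in seconds; a real generator would
# fetch images, stylesheets, and scripts over the network instead.
SIMULATED_DELAYS = {"logo.png": 0.01, "styles.css": 0.02, "chart.png": 0.5}


async def fetch_asset(name: str) -> bytes:
    """Stand-in for a network download: sleeps for the simulated delay."""
    await asyncio.sleep(SIMULATED_DELAYS[name])
    return f"<contents of {name}>".encode()


async def fetch_with_timeout(name: str, timeout: float) -> Optional[bytes]:
    """Return the asset, or None if it does not arrive within `timeout` seconds."""
    try:
        return await asyncio.wait_for(fetch_asset(name), timeout)
    except asyncio.TimeoutError:
        return None


async def gather_assets(names: List[str], timeout: float) -> Dict[str, Optional[bytes]]:
    # Downloads run concurrently, so the total wait is bounded by the slowest
    # asset or the timeout, whichever is shorter, not by the sum of all delays.
    results = await asyncio.gather(*(fetch_with_timeout(n, timeout) for n in names))
    return dict(zip(names, results))


if __name__ == "__main__":
    start = time.perf_counter()
    assets = asyncio.run(gather_assets(list(SIMULATED_DELAYS), timeout=0.1))
    elapsed = time.perf_counter() - start
    missing = [name for name, data in assets.items() if data is None]
    print(f"missing={missing} elapsed={elapsed:.2f}s")
```

Without the timeout, the one slow asset alone would set the render time; with it, the document renders on schedule but without that asset. Every PDF generator has to make some version of that trade-off.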
Startup time, CPU usage, and memory consumption are also higher. A popular Node-based open-source HTML to PDF library notes in its readme:
Note: It is strongly recommended that you keep Chrome running side-by-side with Node.js. There is significant overhead starting up Chrome for each PDF generation which can be easily avoided.
It's suggested to use pm2 to ensure Chrome continues to run. If it crashes, it will restart automatically.
As of this writing, headless Chrome uses about 65mb of RAM while idle.
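The readme's advice boils down to two rules: launch the browser once and reuse it, and relaunch it if it crashes. Below is a minimal Python sketch of that pattern. `FakeBrowser` and `SupervisedRenderer` are hypothetical names for illustration, and in the Node setup the readme describes, the restart role is played by pm2 rather than by application code.

```python
from typing import Optional


class FakeBrowser:
    """Hypothetical stand-in for a headless browser process.

    Creating one represents the expensive startup step the readme warns about.
    """

    def __init__(self) -> None:
        self.alive = True

    def render(self, html: str) -> bytes:
        # A real browser would lay out and print the page; we just tag the input.
        return b"%PDF-fake " + html.encode()

    def crash(self) -> None:
        # Simulates the process dying between requests.
        self.alive = False


class SupervisedRenderer:
    """Keeps one browser running across requests and relaunches it if it dies."""

    def __init__(self) -> None:
        self.browser: Optional[FakeBrowser] = None
        self.launch_count = 0  # how many times we paid the startup cost

    def _ensure_browser(self) -> FakeBrowser:
        if self.browser is None or not self.browser.alive:
            self.browser = FakeBrowser()
            self.launch_count += 1
        return self.browser

    def render(self, html: str) -> bytes:
        # Reuse the running browser; launch a new one only when needed.
        return self._ensure_browser().render(html)


if __name__ == "__main__":
    renderer = SupervisedRenderer()
    for n in range(3):
        renderer.render(f"<p>doc {n}</p>")
    print(renderer.launch_count)  # 1: three documents shared one launch
```

Three documents pay the startup cost once instead of three times; if the browser crashes, the next request transparently launches a replacement.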
These problems are all solvable. Background jobs are the common solution (it's what we do). Lambda functions are another tactic.
If your usage numbers are small, this may not be a concern. If you want to generate a lot of documents or support simultaneous PDF creation, scalability planning will be required.
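Here is a minimal Python sketch of the background-job approach, using only the standard library. The request handler enqueues the HTML and returns a job ID right away, and a fixed number of worker threads do the rendering. `render_pdf` and `PdfJobQueue` are hypothetical placeholders. A production system would more likely use a persistent job queue and separate worker processes than in-memory threads.

```python
import queue
import threading
import uuid
from typing import Dict, Optional, Tuple


def render_pdf(html: str) -> bytes:
    """Hypothetical placeholder for the slow HTML-to-PDF step."""
    return b"%PDF-fake " + html.encode()


class PdfJobQueue:
    """Accepts jobs instantly and renders them on a fixed pool of workers.

    The pool size caps how many PDFs render at once, which bounds CPU and
    memory use no matter how many requests arrive simultaneously.
    """

    def __init__(self, workers: int = 2) -> None:
        self._jobs: "queue.Queue[Tuple[str, str]]" = queue.Queue()
        self._results: Dict[str, bytes] = {}
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, html: str) -> str:
        # Return a job ID immediately; the web request never waits for rendering.
        job_id = str(uuid.uuid4())
        self._jobs.put((job_id, html))
        return job_id

    def result(self, job_id: str) -> Optional[bytes]:
        # None means "not finished yet" (or unknown job ID).
        with self._lock:
            return self._results.get(job_id)

    def wait(self) -> None:
        # Block until every submitted job has been processed.
        self._jobs.join()

    def _work(self) -> None:
        while True:
            job_id, html = self._jobs.get()
            try:
                pdf = render_pdf(html)
                with self._lock:
                    self._results[job_id] = pdf
            finally:
                self._jobs.task_done()


if __name__ == "__main__":
    jobs = PdfJobQueue(workers=2)
    ids = [jobs.submit(f"<p>invoice {n}</p>") for n in range(5)]
    jobs.wait()
    print(jobs.result(ids[0]))
```

Capping the worker count is also the simplest answer to simultaneous PDF creation: extra requests wait in the queue instead of each starting its own render. Failed renders are not recorded in this sketch; a real job system would also track errors and retries.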
In the end, our own difficulties with creating high-quality PDFs and scaling production led us to create DocRaptor. We thought we could save other developers from wasting time and energy as we had. That said, many DocRaptor users have tried an open-source tool before switching to DocRaptor. You should review all your potential solutions.
There are two kinds of open source tools: browser-based and...others. They each have different advantages and disadvantages.
Because you normally write HTML, CSS, and JavaScript for web browsers, browser-based libraries are the easiest for web developers to use. You submit an HTML document or URL and you get a PDF back.
We’d recommend HTML to PDF libraries based on Headless Chrome. It provides modern CSS and JavaScript support and a strong developer community. There are a bunch now, but the most popular Headless Chrome libraries include Puppeteer, electron-pdf, and Athena.
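These libraries wrap a real browser, and headless Chrome can also print a URL to PDF on its own from the command line. Below is a minimal Python sketch that shells out to it. The `--headless` and `--print-to-pdf` flags are Chrome's own; the binary name `google-chrome` and the helper function names are assumptions for illustration.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed binary name; it varies by platform and install
# (e.g. "chromium", "google-chrome-stable", or a full path on macOS/Windows).
CHROME = "google-chrome"


def chrome_pdf_command(url: str, output: Path, chrome: str = CHROME) -> List[str]:
    """Build the argument list for printing `url` to a PDF with headless Chrome."""
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output}",
        url,
    ]


def url_to_pdf(url: str, output: Path, timeout: float = 60.0) -> Path:
    """Run headless Chrome, raising if it fails or exceeds `timeout` seconds."""
    subprocess.run(chrome_pdf_command(url, output), check=True, timeout=timeout)
    return output
```

For example, `url_to_pdf("https://example.com", Path("example.pdf"))` should write `example.pdf`, provided Chrome is installed under the assumed name. Note that this launches a fresh browser for every document, which is exactly the startup overhead discussed earlier; libraries like Puppeteer let you keep one browser open and reuse it.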
Historically, wkhtmltopdf and PhantomJS were the primary open-source HTML to PDF libraries. Do not use these for PDF generation. They are buggy, lack support for modern CSS, have poor font support, and are a pain to install. Additionally, PhantomJS has been officially abandoned by its maintainer. Stick with the Headless Chrome-based libraries and other wkhtmltopdf alternatives.
Not all libraries are based on browser engines. WeasyPrint is a very popular HTML-to-PDF library that is not based on an actual browser. We haven’t used it, but it’s generally well-reviewed. It has one major limitation, though: it does not support JavaScript, only HTML and CSS. That limitation aside, it offers more advanced PDF options than Headless Chrome and is probably the closest open-source alternative to DocRaptor (except we support JavaScript!).
The other non-browser libraries require you to programmatically create your PDF element by element, line by line. This flexibility and power come at the cost of development time and a lot of documentation reading. If you need a level of pixel precision that HTML and CSS cannot provide, then these libraries may be a good option. Popular choices include:
It primarily depends on your document requirements and your budget. Beyond easy access to the Prince engine, DocRaptor’s online HTML to PDF API provides a lot of benefits such as:
On the other hand, open-source HTML to PDF libraries guarantee full control over your document generation pipeline and might save you money in the long run, though when we compared self-hosting to DocRaptor, using DocRaptor was almost always more cost-effective (assuming US-based developer salaries). The choice is yours! If you have any questions about DocRaptor, feel free to contact us at support@docraptor.com.