pdfbox
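PDFBox is a Java library for creating, modifying, and extracting content from PDF files. It is an Apache Software Foundation project released under the Apache License 2.0.
PDFBox provides a set of APIs for extracting text and images, adding and removing pages, reading PDF metadata, encrypting PDF files, and more.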
Main dependencies
<!-- Library used to convert HTML into well-formed XML -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.17.1</version>
</dependency>
<!-- Third-party wrapper around PDFBox that provides HTML-to-PDF conversion -->
<dependency>
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-pdfbox</artifactId>
    <version>1.0.10</version>
</dependency>
Test code
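import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Paths;

// The class name is arbitrary; the original snippet only showed the method body.
public class PdfBoxDemo {
    public static void main(String[] args) {
        // Java version, e.g. "1.8" or "11"
        String version = System.getProperty("java.specification.version");
        // Operating system type: anything that is not Windows is labeled "linux"
        String platform = System.getProperty("os.name", "");
        platform = platform.toLowerCase().contains("window") ? "win" : "linux";
        // Current working directory of the program
        String current = System.getProperty("user.dir");
        System.out.println(String.format("current=%s", current));
        // Path of the HTML file to convert
        File index = Paths.get(current, "..", "index.html").toFile();
        if (!index.exists()) {
            System.out.println(String.format("file not exist,file=%s", index.getAbsolutePath()));
            return;
        }
        File file = Paths.get(current, String.format("java%s_%s.pdf", version, platform)).toFile();
        // try-with-resources closes the output stream even if rendering fails
        try (OutputStream stream = new FileOutputStream(file)) {
            Document doc = Jsoup.parse(index, "UTF-8");
            // Emit XML syntax so that unclosed tags are completed
            doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
            PdfRendererBuilder builder = new PdfRendererBuilder();
            // NOTE font caveat: every font family used in the document has to be registered manually
            builder.useFont(Paths.get(current, "..", "fonts", "simsun.ttc").toFile(), "SimSun");
            builder.useFont(Paths.get(current, "..", "fonts", "msyh.ttc").toFile(), "font-test");
            builder.useFont(Paths.get(current, "..", "fonts", "msyh.ttc").toFile(), "Microsoft YaHei UI");
            // NOTE set the base URL (root directory) used to resolve relative resources
            String baseUrl = Paths.get(current, "..").toUri().toString();
            builder.withHtmlContent(doc.html(), baseUrl);
            builder.toStream(stream);
            builder.run();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}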
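The base URL passed to withHtmlContent is what relative references in the HTML (images, stylesheets) are resolved against. The sketch below only illustrates that idea with the JDK's own java.net.URI; it does not call openhtmltopdf, and the class and method names (BaseUrlSketch, baseUrl, resolve) are hypothetical. It assumes the same layout as the test code, where resources live in the parent of the working directory, and it additionally normalizes the path so the ".." segment does not end up in the URL.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of openhtmltopdf: shows how a base URL
// turns relative references into absolute file: URLs.
class BaseUrlSketch {

    // Builds a directory URL for the parent of `current`, as the test code does,
    // but normalized so that the ".." segment is collapsed.
    static URI baseUrl(String current) {
        Path root = Paths.get(current, "..").toAbsolutePath().normalize();
        String url = root.toUri().toString();
        // Without a trailing "/", the last path segment would be dropped on resolve.
        return URI.create(url.endsWith("/") ? url : url + "/");
    }

    // Resolves a relative reference (e.g. an img src) against the base URL.
    static URI resolve(URI base, String relative) {
        return base.resolve(relative);
    }

    public static void main(String[] args) {
        URI base = baseUrl(System.getProperty("user.dir"));
        System.out.println(base);
        System.out.println(resolve(base, "fonts/simsun.ttc"));
    }
}
```

If images or CSS are missing from the generated PDF, printing the resolved URL this way is a quick check that the base directory points where the HTML expects.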
Preview of the results
pdfbox-demo/java1.8_win.pdf · yjihrp/linux-html2pdf-demo - Gitee.com
pdfbox-demo/java11_linux.pdf · yjihrp/linux-html2pdf-demo - Gitee.com
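The two file names above follow from the naming scheme in the test code: the Java specification version plus a platform tag. A minimal sketch of that scheme, pulled out into a helper so it can be checked in isolation; the class and method names (OutputNameSketch, platformTag, outputName) are hypothetical, and Locale.ROOT is an addition to keep the lowercasing locale-independent.

```java
import java.util.Locale;

// Hypothetical helper that mirrors the file-naming logic of the test code.
class OutputNameSketch {

    // Any OS name containing "window" maps to "win"; everything else,
    // including macOS, is labeled "linux", exactly as in the test code.
    static String platformTag(String osName) {
        return osName.toLowerCase(Locale.ROOT).contains("window") ? "win" : "linux";
    }

    // Builds names such as java1.8_win.pdf or java11_linux.pdf.
    static String outputName(String specVersion, String osName) {
        return String.format("java%s_%s.pdf", specVersion, platformTag(osName));
    }

    public static void main(String[] args) {
        System.out.println(outputName("1.8", "Windows 10")); // java1.8_win.pdf
        System.out.println(outputName("11", "Linux"));       // java11_linux.pdf
        // Name for the JVM and OS this program is running on
        System.out.println(outputName(
                System.getProperty("java.specification.version"),
                System.getProperty("os.name", "")));
    }
}
```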
Useful tools
# Inspect the internal structure of a PDF
java -jar pdfbox-app debug path-to-pdf/test.pdf
java -jar debugger-app path-to-pdf/test.pdf
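Before opening a generated file in the debugger, it can help to confirm that the output is a complete PDF at all. The sketch below is a rough sanity check, not a validator: it only looks for the %PDF- header at the start of the file and the %%EOF marker near the end. The class name PdfSanityCheck, the method looksLikePdf, and the 1024-byte tail window are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sanity check: header and end-of-file marker only.
class PdfSanityCheck {

    static boolean looksLikePdf(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so binary content is safe to scan.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        // Search only the last 1024 bytes for the end-of-file marker.
        String tail = text.substring(Math.max(0, text.length() - 1024));
        return text.startsWith("%PDF-") && tail.contains("%%EOF");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfSanityCheck path-to-pdf/test.pdf");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksLikePdf(data) ? "looks like a PDF" : "does not look like a complete PDF");
    }
}
```

A file that fails this check was most likely truncated, for example because the output stream was never closed.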
Test results