- 2022 03/03
1. doc
Use the Linux program antiword or catdoc. Both are fast, but they extract text only.
catdoc: run yum install catdoc and it is ready to use
$content = shell_exec('/usr/bin/catdoc -d utf-8 ' . escapeshellarg($file));
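antiword:
- Download the source package, then run make && make install
- Copy the binary and its resource files to system-wide locations:
cp /root/bin/antiword /usr/local/bin/
mkdir /usr/share/antiword
cp -R /root/.antiword/* /usr/share/antiword/
chmod 777 /usr/local/bin/*antiword
chmod 755 /usr/share/antiword/*
- Usage:
antiword -t filename.doc    plain-text output (default)
antiword -f filename.doc    formatted text output
antiword -m utf-8 filename.doc
Remember to handle errors.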
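The reminder about error handling can be sketched roughly as below. This is only a minimal sketch: the function names runExtractor and docToText are hypothetical, and it assumes a POSIX shell (for the 2>/dev/null redirect) and the /usr/bin/catdoc path used earlier. It checks the exit code instead of trusting shell_exec, and escapes every argument so file names with spaces or shell metacharacters cannot break the command.

```php
<?php
declare(strict_types=1);

/**
 * Run an external command and return its stdout.
 * Hypothetical helper: throws when the command exits with a non-zero status.
 */
function runExtractor(string $binary, array $args): string
{
    // Escape each argument individually before building the command line.
    $command = escapeshellcmd($binary);
    foreach ($args as $arg) {
        $command .= ' ' . escapeshellarg((string) $arg);
    }

    $output = [];
    $exitCode = 0;
    // Discard stderr; exec() fills $output line by line and sets $exitCode.
    exec($command . ' 2>/dev/null', $output, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException("Command failed with exit code {$exitCode}: {$binary}");
    }

    return implode("\n", $output);
}

/**
 * Hypothetical wrapper around catdoc for .doc files.
 */
function docToText(string $file): string
{
    if (!is_readable($file)) {
        throw new InvalidArgumentException("File is not readable: {$file}");
    }

    return runExtractor('/usr/bin/catdoc', ['-d', 'utf-8', $file]);
}
```

A caller would wrap docToText($file) in try/catch and fall back to another tool (or report the failure) when an exception is thrown. Note that exec() strips trailing whitespace from each output line, which is usually acceptable for text extraction.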
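2. docx
Install phpoffice/phpword with Composer and convert the document to HTML for reading. Text and images are fully preserved.
// Load the document and create an HTML writer for it
$phpWord = \PhpOffice\PhpWord\IOFactory::load($file);
$htmlWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, "HTML");
// Render each section to HTML and concatenate the results
$content = '';
foreach ($phpWord->getSections() as $section) {
    $writer = new \PhpOffice\PhpWord\Writer\HTML\Element\Container($htmlWriter, $section);
    $content .= $writer->write();
}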
3. pdf
Install smalot/pdfparser with Composer (namespace \Smalot\PdfParser)
// Parse the file and extract its text content
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile($file);
$content = $pdf->getText();