November 22, 2008

Linux: Get the Total Page Count of (Almost) All File Types

To find the total number of pages in a document that is going to be faxed or mailed, you can combine OpenOffice's export tools, ImageMagick, and a few Linux console commands.
Getting the page count is easy once every file has been converted to PDF, because you can then use the Linux 'pdfinfo' console command. Converting every file to PDF is the hard part, especially for Microsoft Office files. To convert Microsoft Office documents, OpenOffice must be installed on your system, and it must be running in the background, started with this command:
openoffice "-accept=socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -norestore -nofirststartwizard -nologo -headless &
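
Before running a conversion, it can help to confirm that something is actually accepting connections on port 8100. The following is a minimal sketch using only Python's standard library; the function name is_listening is hypothetical, and a successful connection only shows that the port is open, not that the process behind it is OpenOffice.

```python
import socket


def is_listening(host="localhost", port=8100, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Assumption: port 8100 matches the -accept option used when
    starting OpenOffice. An open port does not prove that the
    listener is OpenOffice itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("Listener reachable on port 8100:", is_listening())
```

If this prints False right after starting OpenOffice, waiting a few seconds and trying again may be enough, since the headless instance can take a moment to start.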

Then, with the DocumentConverter.py script that I found online, you can convert Excel, Word, PowerPoint, RTF, TXT, and HTML files to PDF through OpenOffice, using Python from the Linux console, like this:
python ./DocumentConverter.py sample.doc sample.pdf

To run the command above, OpenOffice must already be running in the background, as mentioned earlier.
If you also need the page count of image files, you can convert them to PDF with ImageMagick's convert command from the Linux console:
convert sample.jpg sample.pdf
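
Since office documents and images need different converters, the choice can be made from the file extension. The sketch below only builds the command line and does not run it. The extension sets and the function name conversion_command are assumptions for illustration; adjust them to the file types you actually receive.

```python
import os

# Assumed extension lists; extend them as needed.
OFFICE_EXTS = {".doc", ".xls", ".ppt", ".rtf", ".txt", ".html"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"}


def conversion_command(path, converter="./DocumentConverter.py"):
    """Return the command (as a list) that converts path to PDF.

    Returns None if the file is already a PDF, and raises
    ValueError for extensions this sketch does not know about.
    """
    base, ext = os.path.splitext(path)
    ext = ext.lower()
    pdf_path = base + ".pdf"

    if ext == ".pdf":
        return None  # nothing to convert
    if ext in OFFICE_EXTS:
        # Requires OpenOffice to be running in the background.
        return ["python", converter, path, pdf_path]
    if ext in IMAGE_EXTS:
        # ImageMagick's convert command.
        return ["convert", path, pdf_path]
    raise ValueError("unsupported file type: %r" % ext)


if __name__ == "__main__":
    for name in ("sample.doc", "sample.jpg", "sample.pdf"):
        print(name, "->", conversion_command(name))
```

The returned list can be passed to subprocess.run(cmd, check=True) to perform the conversion.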

Once the files are converted to PDF, you can get the total page count of a file with:
pdfinfo sample.pdf | grep "Pages:" | awk '{print $2}' | tail -n1
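
If the page count is needed inside a Python script rather than on the console, the same idea can be sketched with the standard library. The function names parse_page_count and page_count are hypothetical. The parser matches lines that start with "Pages:", which is slightly stricter than the grep above, and page_count assumes pdfinfo is installed and on the PATH.

```python
import subprocess


def parse_page_count(pdfinfo_output):
    """Extract the page count from the text printed by pdfinfo.

    Takes the second whitespace-separated field of the last line
    starting with "Pages:". Returns None if no such line exists.
    """
    count = None
    for line in pdfinfo_output.splitlines():
        if line.startswith("Pages:"):
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                count = int(fields[1])
    return count


def page_count(pdf_path):
    """Run pdfinfo on pdf_path and return its page count."""
    result = subprocess.run(
        ["pdfinfo", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise if pdfinfo exits with an error
    )
    return parse_page_count(result.stdout)
```

For example, page_count("sample.pdf") would return the page count as an integer, or None if pdfinfo printed no "Pages:" line.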
