The Boss Handed Me Another Headache, So I Angrily Wrote an Automation Tool!

F2967527 2021-01-17


Author: 小小明, a Pandas data-processing specialist dedicated to helping data practitioners solve their data-processing problems

Editor: 朱小五, a data geek who never quite sticks to his day job

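In day-to-day work, when your boss asks you to save the images from a Word document into a folder, you probably fall apart inside while clicking "Save As" on one picture after another.

But what if your boss asks you to copy out every image from several hundred Word documents? Would you start planning your resignation?

Take the Word documents below, for example: could you quickly copy out all of their images?

The method I shared last time still has drawbacks, though: opening each Word document as an archive and unpacking them one at a time is still slow, and if older .doc files (the Word 2003 format) are mixed in, you first have to save each of them as .docx.

Today I'll show you a brand-new version!

We'll write a program that converts everything in under ten seconds, extracts all the images, and can also batch-convert the image format for real, rather than just renaming the file extension.

(An exe executable is attached at the end of the article.)

Let's get started.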

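Batch converting doc to docx

Through the win32com module, Python can automate Word itself; calling a document's SaveAs method saves files in the format we need in bulk, replacing the manual "Save As" work.

win32com ships with the pypiwin32 package, so installing pypiwin32 is all that is needed:

pip install pypiwin32

The code below converts the .doc files in the given directory to .docx and places the results in the temp_dir subdirectory of that directory: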

import os
import shutil
from pathlib import Path

from win32com import client as wc  # COM automation client from pywin32

doc_path = r'E:\tmp\答疑整理'
temp_dir = 'temp'
# Start from an empty temp directory on every run
if os.path.exists(f'{doc_path}/{temp_dir}'):
    shutil.rmtree(f'{doc_path}/{temp_dir}')
os.mkdir(f'{doc_path}/{temp_dir}')

word = wc.Dispatch('Word.Application')  # launch Word
try:
    for filename in Path(doc_path).glob('*.doc'):
        file = str(filename)
        # Same file name under temp_dir, with '.doc' turned into '.docx'
        dest_name = str(filename.parent / temp_dir / filename.name) + 'x'
        print(file, dest_name)
        doc = word.Documents.Open(file)  # open the .doc file
        doc.SaveAs(dest_name, 12)  # save as .docx; 12 is Word's format code for docx
        doc.Close()  # close the document before opening the next one
finally:
    word.Quit()

[Screenshot: run output]

[Screenshot: the converted files]
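The destination-name logic above can also be written without string concatenation. The following is only a sketch: the helper names `docx_destination` and `is_convertible` and the constant name `WD_FORMAT_XML_DOCUMENT` are invented for illustration and are not part of the original program. It also filters out the `~$` owner files that Word leaves next to open documents, on the assumption that such files may be present in the folder.

```python
from pathlib import PureWindowsPath

# Word's save-format code for .docx; the article passes it as the literal 12.
WD_FORMAT_XML_DOCUMENT = 12


def docx_destination(src, temp_dir="temp"):
    """Return <folder>/<temp_dir>/<name>.docx for a .doc source path."""
    src = PureWindowsPath(src)
    return src.parent / temp_dir / src.with_suffix(".docx").name


def is_convertible(src):
    """True for real .doc files, False for Word's '~$' owner files and other types."""
    src = PureWindowsPath(src)
    return src.suffix.lower() == ".doc" and not src.name.startswith("~$")
```

With these helpers the loop body could call `doc.SaveAs(str(docx_destination(filename)), WD_FORMAT_XML_DOCUMENT)` after checking `is_convertible(filename)`. `PureWindowsPath` is used here so the path handling can be tried on any operating system.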

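Batch extracting images from docx documents

A docx document is really just a zip archive, so it can be unpacked with zip tools. The code below extracts the images from every docx document and moves them into the imgs directory under the temp directory: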

import itertools
import os
import shutil
from pathlib import Path
from zipfile import ZipFile

# doc_path and temp_dir are the variables defined in the conversion step
if os.path.exists(f'{doc_path}/{temp_dir}/imgs'):
    shutil.rmtree(f'{doc_path}/{temp_dir}/imgs')
os.makedirs(f'{doc_path}/{temp_dir}/imgs')

i = 1  # running image number across all documents
for filename in itertools.chain(Path(doc_path).glob('*.docx'), (Path(doc_path)/temp_dir).glob('*.docx')):
    print(filename)
    with ZipFile(filename) as zip_file:
        for names in zip_file.namelist():
            if names.startswith('word/media/image'):
                ext = names[names.find('.'):]  # extension, including the dot
                zip_file.extract(names, doc_path)
                os.rename(f'{doc_path}/{names}',
                          f'{doc_path}/{temp_dir}/imgs/{i}{ext}')
                print('\t', names, f'{i}{ext}')
                i += 1
# The extracted 'word' folder is only a staging area; it does not exist if no image was found
shutil.rmtree(f'{doc_path}/word', ignore_errors=True)

[Screenshot: printed output]

[Screenshot: the extracted images]
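As a variation on the extract-then-rename approach, each image can be streamed from the archive straight to its final name, which avoids the intermediate `word` folder. This is a sketch under assumptions, not the author's code: `extract_images`, `make_fake_docx`, and `demo` are hypothetical names, and the demo builds a tiny stand-in archive rather than a real Word file.

```python
import shutil
import tempfile
from pathlib import Path, PurePosixPath
from zipfile import ZipFile


def extract_images(docx_file, out_dir, start=1):
    """Copy every word/media/image* entry to out_dir as 1.png, 2.jpeg, ...

    Returns the next free number so numbering can continue across documents.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    n = start
    with ZipFile(docx_file) as zf:
        for name in zf.namelist():
            if name.startswith("word/media/image"):
                target = out_dir / f"{n}{PurePosixPath(name).suffix}"
                # Stream the archive member directly to its final file name
                with zf.open(name) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                n += 1
    return n


def make_fake_docx(path):
    """Build a small archive laid out like a .docx (demo data only)."""
    with ZipFile(path, "w") as zf:
        zf.writestr("word/document.xml", "<w:document/>")
        zf.writestr("word/media/image1.png", b"fake-png-bytes")
        zf.writestr("word/media/image2.jpeg", b"fake-jpeg-bytes")


def demo():
    """Extract from the stand-in archive and return (next number, file names)."""
    with tempfile.TemporaryDirectory() as tmp:
        docx = Path(tmp) / "demo.docx"
        make_fake_docx(docx)
        next_n = extract_images(docx, Path(tmp) / "imgs")
        return next_n, sorted(p.name for p in (Path(tmp) / "imgs").iterdir())
```

Passing the returned number as `start` for the next document keeps the numbering continuous, which mirrors the role of the counter `i` in the code above.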

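Batch image format conversion

PIL, the Python Imaging Library, is the de facto standard image-processing library on the Python platform. It is very powerful, yet its API is simple and easy to use.

Because PIL only supports up to Python 2.7 and has long gone unmaintained, a group of volunteers created a compatible fork on top of it called Pillow, which supports current Python 3.x and adds many new features. We can therefore install and use Pillow directly.

If you have Anaconda installed, Pillow is already available. Otherwise, install it from the command line with pip:

pip install pillow

Renaming a file's extension does not actually change the image format. With the pillow library we can genuinely convert the images to jpg in bulk: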

from PIL import Image

if not os.path.exists(f'{doc_path}/imgs'):
    os.mkdir(f'{doc_path}/imgs')

for filename in Path(f'{doc_path}/{temp_dir}/imgs').glob('*'):
    file = str(filename)
    with Image.open(file) as im:
        # JPEG has no alpha channel, so convert to RGB before saving
        im.convert('RGB').save(f'{doc_path}/imgs/{filename.stem}.jpg', 'jpeg')

[Screenshot: the images after conversion]
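To see why renaming alone is not a conversion, the format can be read from the first bytes of a file rather than from its name. The sketch below uses only the standard library; `SIGNATURES`, `sniff_format`, and `demo` are illustrative names, and the table covers just a few common formats, so it is not a complete detector.

```python
import tempfile
from pathlib import Path

# Leading "magic" bytes of a few common image formats
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def sniff_format(path):
    """Guess the real image format from the file's first bytes, ignoring its name."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for magic, fmt in SIGNATURES.items():
        if head.startswith(magic):
            return fmt
    return None


def demo():
    """Give PNG-signed bytes a .jpg name and report (extension, detected format)."""
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp) / "picture.jpg"
        fake.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)
        return fake.suffix, sniff_format(fake)
```

Here `demo()` reports a `.jpg` extension on content that still carries a PNG signature, which is the situation the Pillow conversion above is meant to avoid.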

Complete code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Created: 2020/12/25 21:46
__author__ = 'xiaoxiaoming'

import itertools
import os
import shutil
from pathlib import Path
from zipfile import ZipFile

from PIL import Image
from win32com import client as wc  # COM automation client from pywin32


def word_img_extract(doc_path, temp_dir):
    # Step 1: convert every .doc to .docx inside a fresh temp directory
    if os.path.exists(f'{doc_path}/{temp_dir}'):
        shutil.rmtree(f'{doc_path}/{temp_dir}')
    os.mkdir(f'{doc_path}/{temp_dir}')

    word = wc.Dispatch('Word.Application')  # launch Word
    try:
        for filename in Path(doc_path).glob('*.doc'):
            file = str(filename)
            dest_name = str(filename.parent / temp_dir / filename.name) + 'x'
            print(file, dest_name)
            doc = word.Documents.Open(file)  # open the .doc file
            doc.SaveAs(dest_name, 12)  # save as .docx; 12 is Word's format code for docx
            doc.Close()  # close the document before opening the next one
    finally:
        word.Quit()

    # Step 2: pull the images out of every .docx (original and converted)
    if os.path.exists(f'{doc_path}/{temp_dir}/imgs'):
        shutil.rmtree(f'{doc_path}/{temp_dir}/imgs')
    os.makedirs(f'{doc_path}/{temp_dir}/imgs')

    i = 1  # running image number across all documents
    for filename in itertools.chain(Path(doc_path).glob('*.docx'), (Path(doc_path) / temp_dir).glob('*.docx')):
        print(filename)
        with ZipFile(filename) as zip_file:
            for names in zip_file.namelist():
                if names.startswith('word/media/image'):
                    ext = names[names.find('.'):]  # extension, including the dot
                    zip_file.extract(names, doc_path)
                    os.rename(f'{doc_path}/{names}',
                              f'{doc_path}/{temp_dir}/imgs/{i}{ext}')
                    print('\t', names, f'{i}{ext}')
                    i += 1
    # The staging 'word' folder does not exist if no image was found
    shutil.rmtree(f'{doc_path}/word', ignore_errors=True)

    # Step 3: re-encode every extracted image as a real JPEG
    if not os.path.exists(f'{doc_path}/imgs'):
        os.mkdir(f'{doc_path}/imgs')

    for filename in Path(f'{doc_path}/{temp_dir}/imgs').glob('*'):
        file = str(filename)
        with Image.open(file) as im:
            # JPEG has no alpha channel, so convert to RGB before saving
            im.convert('RGB').save(f'{doc_path}/imgs/{filename.stem}.jpg', 'jpeg')


if __name__ == '__main__':
    doc_path = r'E:\tmp\答疑整理'
    temp_dir = 'temp'
    word_img_extract(doc_path, temp_dir)

The whole run finished in 7 s.
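A run time like this can be measured with a small timing helper. The following is a minimal sketch using only the standard library; `timed` and `demo` are hypothetical names, and the workload inside `demo` is a placeholder rather than the real extraction.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, report=print):
    """Report how long the enclosed block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        report(f"{label}: {time.perf_counter() - start:.2f}s")


def demo():
    """Time a stand-in workload and return the report line instead of printing it."""
    lines = []
    with timed("extract", report=lines.append):
        sum(range(100_000))  # placeholder for word_img_extract(doc_path, temp_dir)
    return lines[0]
```

Wrapping the call in the `__main__` block as `with timed('total'): word_img_extract(doc_path, temp_dir)` would print the elapsed time once the run ends.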

Building a GUI tool

Next we build a graphical tool with PySimpleGUI. Install the library with:

pip install PySimpleGUI

If the download is slow, you can use the Tsinghua mirror instead:

pip install PySimpleGUI -i https://pypi.tuna./simple

Here is the complete code:

import PySimpleGUI as sg

from word_img_extract import word_img_extract

sg.change_look_and_feel('GreenMono')

layout = [
    [
        sg.Text('Folder containing the Word documents:'),
        sg.In(size=(25, 1), enable_events=True, key='-FOLDER-'),
        sg.FolderBrowse('Browse'),
    ], [
        sg.Button('Start extraction', enable_events=True, key='-EXTRACT-'),
        sg.Text(size=(40, 1), key='-TOUT-')
    ]
]
window = sg.Window('Word Document Image Extractor', layout)
while True:
    event, values = window.read()
    if event in (None,):
        break  # the window was closed
    elif event == '-EXTRACT-':
        if values['-FOLDER-']:
            window['-TOUT-'].update('Ready to extract!')
            sg.popup('The program will appear frozen during extraction. Please wait; '
                     'a message will pop up when it is done.\nClick OK to start.')
            window['-TOUT-'].update('Extracting...')
            window.refresh()  # repaint the status text before the blocking call
            # temp_dir is a required argument of this version of word_img_extract
            word_img_extract(values['-FOLDER-'], 'temp')
            window['-TOUT-'].update('Extraction finished!')
            sg.popup('Extraction finished!')
        else:
            sg.popup('Please choose the folder containing the Word documents first!')
    print(f'Event: {event}, values: {values}')
window.close()

[Screenshot: the running GUI]
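The window appears frozen because the extraction runs on the same thread as the event loop. One common way around this, sketched here with the standard library only, is to run the slow call on a worker thread and let the event loop poll a queue for the outcome. `run_in_background`, `slow_job`, and `demo` are hypothetical names; this is not what the article's tool does.

```python
import queue
import threading
import time


def run_in_background(func, *args):
    """Run func(*args) on a worker thread.

    Returns a queue that receives ('done', result) or ('error', exception).
    """
    results = queue.Queue()

    def worker():
        try:
            results.put(("done", func(*args)))
        except Exception as exc:  # hand the failure back to the polling thread
            results.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    return results


def slow_job(n):
    """Stand-in for the slow extraction call."""
    time.sleep(0.05)
    return n * 2


def demo():
    """Poll for the result the way a GUI event loop would between events."""
    results = run_in_background(slow_job, 21)
    while True:
        try:
            return results.get_nowait()
        except queue.Empty:
            time.sleep(0.01)  # a GUI would process window events here instead
```

In the PySimpleGUI loop, the polling could happen by calling `window.read(timeout=100)` and checking the queue on each pass. Note that COM objects are thread-sensitive, so a worker thread that drives Word would probably need to initialize COM itself (for example via `pythoncom.CoInitialize()`); treat that as an assumption to verify rather than a tested recipe.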

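Packaging as an exe

Create and activate a virtual environment:

conda create -n gui python=3.6
conda activate gui

Note: creating and activating a virtual environment is not required; it only keeps the packaged environment lean, so you can skip this step.

Install the packages needed for packaging:

pip install PySimpleGUI
pip install pillow
pip install pywin32
pip install pyinstaller

Run the following command to build the executable: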

pyinstaller -F --icon=C:\Users\Think\Pictures\ico\ooopic_1467046829.ico word_img_extract_GUI.py

Common options:

-F  build a single executable file
-w  hide the console window, which is very useful for GUI programs; for a command-line program, leave this option out
-p  add a custom path to search for imports; usually not needed
-i  set the icon of the executable
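The options listed above can be assembled programmatically when the build is scripted. This is only a sketch: `pyinstaller_command` is a hypothetical helper, and `app.ico` in the usage note is a placeholder file name.

```python
def pyinstaller_command(script, onefile=True, windowed=False, icon=None, paths=()):
    """Assemble a PyInstaller argument list from the options described above."""
    cmd = ["pyinstaller"]
    if onefile:
        cmd.append("-F")  # single executable file
    if windowed:
        cmd.append("-w")  # no console window
    if icon:
        cmd.extend(["-i", icon])  # icon of the executable
    for path in paths:
        cmd.extend(["-p", path])  # extra import search path
    cmd.append(script)
    return cmd
```

For example, `pyinstaller_command('word_img_extract_GUI.py', windowed=True, icon='app.ico')` produces a list that could be passed to `subprocess.run(cmd, check=True)`.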

[Screenshot: packaging result]

Packaging with the -w option removes the console window:

pyinstaller -wF --icon=C:\Users\Think\Pictures\ico\ooopic_1467046829.ico word_img_extract_GUI.py

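Adding a progress bar to the GUI

We rework the processing function into a generator so that it can report its progress as it goes. The complete code is as follows: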

import itertools
import os
import shutil
from pathlib import Path
from zipfile import ZipFile

from PIL import Image
from win32com import client as wc  # COM automation client from pywin32


def word_img_extract(doc_path, temp_dir='temp'):
    """Generator: yields (stage message, progress from 0 to 1000) after each file."""
    if os.path.exists(f'{doc_path}/{temp_dir}'):
        shutil.rmtree(f'{doc_path}/{temp_dir}')
    os.mkdir(f'{doc_path}/{temp_dir}')

    word = wc.Dispatch('Word.Application')  # launch Word
    try:
        files = list(Path(doc_path).glob('*.doc'))
        # Fail only when the folder holds neither .doc nor .docx files
        if not files and not any(Path(doc_path).glob('*.docx')):
            raise Exception('No Word documents found in this folder')
        for i, filename in enumerate(files, 1):
            file = str(filename)
            dest_name = str(filename.parent / temp_dir / filename.name) + 'x'
            # print(file, dest_name)
            doc = word.Documents.Open(file)  # open the .doc file
            doc.SaveAs(dest_name, 12)  # save as .docx; 12 is Word's format code for docx
            doc.Close()  # close the document before opening the next one
            yield 'doc to docx:', i * 1000 // len(files)
    finally:
        word.Quit()

    if os.path.exists(f'{doc_path}/{temp_dir}/imgs'):
        shutil.rmtree(f'{doc_path}/{temp_dir}/imgs')
    os.makedirs(f'{doc_path}/{temp_dir}/imgs')

    i = 1  # running image number across all documents
    files = list(itertools.chain(Path(doc_path).glob('*.docx'), (Path(doc_path) / temp_dir).glob('*.docx')))
    for j, filename in enumerate(files, 1):
        # print(filename)
        with ZipFile(filename) as zip_file:
            for names in zip_file.namelist():
                if names.startswith('word/media/image'):
                    ext = names[names.find('.'):]  # extension, including the dot
                    zip_file.extract(names, doc_path)
                    os.rename(f'{doc_path}/{names}',
                              f'{doc_path}/{temp_dir}/imgs/{i}{ext}')
                    # print('\t', names, f'{i}{ext}')
                    i += 1
        yield 'Extracting images:', j * 1000 // len(files)
    # The staging 'word' folder does not exist if no image was found
    shutil.rmtree(f'{doc_path}/word', ignore_errors=True)

    if not os.path.exists(f'{doc_path}/imgs'):
        os.mkdir(f'{doc_path}/imgs')

    files = list(Path(f'{doc_path}/{temp_dir}/imgs').glob('*'))
    for i, filename in enumerate(files, 1):
        file = str(filename)
        with Image.open(file) as im:
            # JPEG has no alpha channel, so convert to RGB before saving
            im.convert('RGB').save(f'{doc_path}/imgs/{filename.stem}.jpg', 'jpeg')
        yield 'Converting to jpg:', i * 1000 // len(files)


if __name__ == '__main__':
    doc_path = r'E:\tmp\答疑整理'
    for msg, i in word_img_extract(doc_path):
        print(f'\r {msg}{i}', end='')
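Stripped of the Word and image handling, the progress-reporting pattern used above comes down to a generator that yields a label and a scaled counter after each item. The sketch below isolates that idea; `with_progress` and `run_stages` are hypothetical names, and the item lists in the usage note are placeholders.

```python
def with_progress(label, items, scale=1000):
    """Yield (label, progress) after each item; progress climbs to `scale`."""
    items = list(items)
    for done, item in enumerate(items, 1):
        # ... the real work on `item` would happen here ...
        yield label, done * scale // len(items)


def run_stages(stages):
    """Chain several stages; the progress value restarts for each stage."""
    for label, items in stages:
        yield from with_progress(label, items)
```

For instance, `list(run_stages([('convert: ', ['a', 'b', 'c'])]))` gives progress values 333, 666, and 1000 for the three items. An empty stage simply yields nothing, so no division by zero occurs.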

The final, complete code of the GUI program:

import PySimpleGUI as sg

from word_img_extract import word_img_extract

sg.change_look_and_feel('GreenMono')

layout = [
    [
        sg.Text('Folder containing the Word documents:'),
        sg.In(size=(25, 1), enable_events=True, key='-FOLDER-'),
        sg.FolderBrowse('Browse'),
    ], [
        sg.Button('Start extraction', enable_events=True, key='-EXTRACT-'),
        sg.Text(text_color='red', size=(47, 2), key='error'),
    ], [
        sg.Text('Ready:', size=(20, 1), key='-TOUT-'),
        sg.ProgressBar(1000, orientation='h', size=(35, 20), key='progressbar')
    ]
]
window = sg.Window('Word Document Image Extractor', layout)
while True:
    event, values = window.read()
    if event in (None,):
        break  # the window was closed
    elif event == '-EXTRACT-':
        if values['-FOLDER-']:
            window['error'].update('')
            try:
                # Each yielded pair is (stage message, progress from 0 to 1000)
                for msg, i in word_img_extract(values['-FOLDER-']):
                    window['-TOUT-'].update(msg)
                    window['progressbar'].UpdateBar(i)
                window['-TOUT-'].update('Extraction finished!')
            except Exception as e:
                window['error'].update(str(e))
        else:
            sg.popup('Please choose the folder containing the Word documents first!')
window.close()

Repackage:

pyinstaller -wF --icon=C:\Users\Think\Pictures\ico\ooopic_1467046829.ico word_img_extract_GUI.py

[Screenshot: the running GUI with the progress bar]
