System Info
CPU
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Gold 6338N CPU @ 2.20GHz
Stepping: 6
CPU MHz: 2200.000
CPU max MHz: 3500.0000
CPU min MHz: 800.0000
BogoMIPS: 4400.00
Virtualization: VT-x
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 49152K
NUMA node0 CPU(s): 0-31,64-95
NUMA node1 CPU(s): 32-63,96-127
Installed packages:
absl-py==1.4.0
aiofiles==22.1.0
aiohttp==3.8.3
aiosignal==1.3.1
aiosqlite==0.18.0
anyio==3.6.2
argcomplete==1.10.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-timeout==4.0.2
attrs==22.2.0
autograd==1.5
azure-core==1.10.0
azure-storage-blob==12.6.0
Babel==2.11.0
backcall==0.2.0
backports.zoneinfo==0.2.1
beautifulsoup4==4.8.2
bleach==6.0.0
boto3==1.26.64
botocore==1.29.64
cachetools==5.3.0
certifi==2022.12.7
cffi==1.15.1
chardet==3.0.4
charset-normalizer==2.1.1
click==8.1.3
cma==2.7.0
cnvrg==0.7.54
colorama==0.4.6
coloredlogs==15.0.1
comm==0.1.2
compressed-rtf==1.0.6
contourpy==1.0.7
croniter==1.3.8
cryptography==39.0.0
cycler==0.11.0
datasets==2.9.0
debugpy==1.6.6
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.6
docx2txt==0.8
ebcdic==1.1.1
evaluate==0.4.0
executing==1.2.0
extract-msg==0.28.7
fastjsonschema==2.16.2
filelock==3.9.0
flatbuffers==23.1.21
fonttools==4.38.0
fqdn==1.5.1
frozenlist==1.3.3
fsspec==2023.1.0
future==0.18.3
gitdb==4.0.10
GitPython==3.1.30
google-api-core==2.11.0
google-auth==2.16.0
google-auth-oauthlib==0.4.6
google-cloud-core==2.3.2
google-cloud-storage==2.7.0
google-crc32c==1.5.0
google-resumable-media==2.4.1
googleapis-common-protos==1.58.0
grpcio==1.51.1
huggingface-hub==0.12.0
humanfriendly==10.0
idna==2.10
IMAPClient==2.1.0
importlib-metadata==6.0.0
importlib-resources==5.10.2
ipykernel==6.21.1
ipython==8.9.0
ipython-genutils==0.2.0
isodate==0.6.1
isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
json5==0.9.11
jsonpointer==2.3
jsonschema==4.17.3
jstyleson==0.0.2
jupyter-client==8.0.2
jupyter-core==5.2.0
jupyter-events==0.5.0
jupyter-server==2.2.1
jupyter-server-fileid==0.6.0
jupyter-server-mathjax==0.2.6
jupyter-server-terminals==0.4.4
jupyter-server-ydoc==0.6.1
jupyter-ydoc==0.2.2
jupyterlab==3.6.1
jupyterlab-git==0.41.0
jupyterlab-pygments==0.2.2
jupyterlab-server==2.19.0
kiwisolver==1.4.4
lxml==4.9.2
Markdown==3.4.1
MarkupSafe==2.1.2
matplotlib==3.6.3
matplotlib-inline==0.1.6
mistune==2.0.4
mpmath==1.2.1
msrest==0.6.21
multidict==6.0.4
multiprocess==0.70.14
natsort==8.2.0
nbclassic==0.5.1
nbclient==0.7.2
nbconvert==7.2.9
nbdime==3.1.1
nbformat==5.7.3
nest-asyncio==1.5.6
networkx==2.8.2
ninja==1.10.2.4
nncf==2.4.0
notebook==6.5.2
notebook-shim==0.2.2
numpy==1.23.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauthlib==3.2.2
olefile==0.46
onnx==1.12.0
onnxruntime==1.12.1
openvino==2022.3.0
openvino-telemetry==2022.3.0
optimum==1.6.3
optimum-intel==1.6.1
packaging==23.0
pandas==1.5.2
pandocfilters==1.5.0
parso==0.8.3
pdfminer.six==20191110
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.4.0
pkgutil-resolve-name==1.3.10
platformdirs==2.6.2
progress==1.6
prometheus-client==0.16.0
prompt-toolkit==3.0.36
protobuf==3.20.1
psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
pyaml==21.10.1
pyarrow==11.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pycryptodome==3.17
pydot==1.4.2
Pygments==2.14.0
pymoo==0.5.0
pyparsing==2.4.7
pyrsistent==0.19.3
python-dateutil==2.8.2
python-json-logger==2.0.4
python-pptx==0.6.21
pytz==2022.7.1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0
pyzmq==25.0.0
regex==2022.10.31
requests==2.28.2
requests-oauthlib==1.3.1
responses==0.18.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rsa==4.9
s3transfer==0.6.0
scikit-learn==1.2.1
scipy==1.10.0
Send2Trash==1.8.0
sentencepiece==0.1.97
six==1.12.0
smmap==5.0.0
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.3.2.post1
SpeechRecognition==3.8.1
stack-data==0.6.2
sympy==1.11.1
tensorboard==2.11.2
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
terminado==0.17.1
textract==1.6.5
texttable==1.6.7
threadpoolctl==3.1.0
tinycss2==1.2.1
tinynetrc==1.3.1
tokenizers==0.13.2
tomli==2.0.1
torch==1.13.1
torchvision==0.14.1
tornado==6.2
tqdm==4.64.1
traitlets==5.9.0
transformers==4.26.0
typing-extensions==4.4.0
tzdata==2022.7
tzlocal==4.2
uri-template==1.2.0
urllib3==1.25.11
wcwidth==0.2.6
webcolors==1.12
webencodings==0.5.1
websocket-client==1.5.1
Werkzeug==2.2.2
xlrd==1.2.0
XlsxWriter==3.0.8
xxhash==3.2.0
y-py==0.5.5
yarl==1.8.2
ypy-websocket==0.8.2
zipp==3.12.1
Who can help?
@lewtun @michaelbenayoun
Information
Tasks
Reproduction
I exported the summarization model to ONNX and ran it with ONNX Runtime:
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from datasets import load_dataset
billsum = load_dataset("billsum", split="ca_test")
billsum = billsum.train_test_split(test_size=0.2)
to_summarize = billsum["train"][0]['text']
model_id = "google/pegasus-pubmed"
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
prediction = pipe(to_summarize)
I also tried the OpenVINO runtime:
from transformers import AutoTokenizer, pipeline
from optimum.intel.openvino import OVModelForSeq2SeqLM
from datasets import load_dataset
billsum = load_dataset("billsum", split="ca_test")
billsum = billsum.train_test_split(test_size=0.2)
to_summarize = billsum["train"][0]['text']
model_id = "google/pegasus-pubmed"
model = OVModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
prediction = pipe(to_summarize)
Expected behavior
I expected the exported models to provide faster inference than the original PyTorch model. Neither ONNX Runtime nor the OpenVINO runtime improves speed; in fact, inference time increases many times over.
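To make the comparison concrete, the same call can be timed against each backend. The snippet below is a minimal sketch, not something from the original runs: `benchmark` is a hypothetical helper, and the `warmup` and `runs` defaults are arbitrary. It excludes warm-up calls from the measurement, since the first call usually includes one-time setup costs.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 1, runs: int = 5) -> Dict[str, float]:
    """Call fn repeatedly and return latency statistics in seconds.

    Hypothetical helper for illustration; not part of optimum or transformers.
    """
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)

    return {
        "mean": statistics.mean(timings),
        "min": min(timings),
        "max": max(timings),
    }


# Intended usage with the objects from the reproduction above (assumed to exist):
#   stats = benchmark(lambda: pipe(to_summarize))
#   print(stats)
```

Running this once per pipeline (PyTorch, ONNX Runtime, OpenVINO) on the same input would give directly comparable mean, minimum, and maximum latencies.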