Support bigbird ONNX export with attention_type == "block_sparse"

### System Info

```shell
CPU

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 57 bits virtual
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Gold 6338N CPU @ 2.20GHz
Stepping:            6
CPU MHz:             2200.000
CPU max MHz:         3500.0000
CPU min MHz:         800.0000
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            49152K
NUMA node0 CPU(s):   0-31,64-95
NUMA node1 CPU(s):   32-63,96-127
```

```
python == 3.8.10
```

Installed packages:

```
absl-py==1.4.0
aiofiles==22.1.0
aiohttp==3.8.3
aiosignal==1.3.1
aiosqlite==0.18.0
anyio==3.6.2
argcomplete==1.10.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-timeout==4.0.2
attrs==22.2.0
autograd==1.5
azure-core==1.10.0
azure-storage-blob==12.6.0
Babel==2.11.0
backcall==0.2.0
backports.zoneinfo==0.2.1
beautifulsoup4==4.8.2
bleach==6.0.0
boto3==1.26.64
botocore==1.29.64
cachetools==5.3.0
certifi==2022.12.7
cffi==1.15.1
chardet==3.0.4
charset-normalizer==2.1.1
click==8.1.3
cma==2.7.0
cnvrg==0.7.54
colorama==0.4.6
coloredlogs==15.0.1
comm==0.1.2
compressed-rtf==1.0.6
contourpy==1.0.7
croniter==1.3.8
cryptography==39.0.0
cycler==0.11.0
datasets==2.9.0
debugpy==1.6.6
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.6
docx2txt==0.8
ebcdic==1.1.1
evaluate==0.4.0
executing==1.2.0
extract-msg==0.28.7
fastjsonschema==2.16.2
filelock==3.9.0
flatbuffers==23.1.21
fonttools==4.38.0
fqdn==1.5.1
frozenlist==1.3.3
fsspec==2023.1.0
future==0.18.3
gitdb==4.0.10
GitPython==3.1.30
google-api-core==2.11.0
google-auth==2.16.0
google-auth-oauthlib==0.4.6
google-cloud-core==2.3.2
google-cloud-storage==2.7.0
google-crc32c==1.5.0
google-resumable-media==2.4.1
googleapis-common-protos==1.58.0
grpcio==1.51.1
huggingface-hub==0.12.0
humanfriendly==10.0
idna==2.10
IMAPClient==2.1.0
importlib-metadata==6.0.0
importlib-resources==5.10.2
ipykernel==6.21.1
ipython==8.9.0
ipython-genutils==0.2.0
isodate==0.6.1
isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
json5==0.9.11
jsonpointer==2.3
jsonschema==4.17.3
jstyleson==0.0.2
jupyter-client==8.0.2
jupyter-core==5.2.0
jupyter-events==0.5.0
jupyter-server==2.2.1
jupyter-server-fileid==0.6.0
jupyter-server-mathjax==0.2.6
jupyter-server-terminals==0.4.4
jupyter-server-ydoc==0.6.1
jupyter-ydoc==0.2.2
jupyterlab==3.6.1
jupyterlab-git==0.41.0
jupyterlab-pygments==0.2.2
jupyterlab-server==2.19.0
kiwisolver==1.4.4
lxml==4.9.2
Markdown==3.4.1
MarkupSafe==2.1.2
matplotlib==3.6.3
matplotlib-inline==0.1.6
mistune==2.0.4
mpmath==1.2.1
msrest==0.6.21
multidict==6.0.4
multiprocess==0.70.14
natsort==8.2.0
nbclassic==0.5.1
nbclient==0.7.2
nbconvert==7.2.9
nbdime==3.1.1
nbformat==5.7.3
nest-asyncio==1.5.6
networkx==2.8.2
ninja==1.10.2.4
nncf==2.4.0
notebook==6.5.2
notebook-shim==0.2.2
numpy==1.23.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauthlib==3.2.2
olefile==0.46
onnx==1.12.0
onnxruntime==1.12.1
openvino==2022.3.0
openvino-telemetry==2022.3.0
optimum==1.6.3
optimum-intel==1.6.1
packaging==23.0
pandas==1.5.2
pandocfilters==1.5.0
parso==0.8.3
pdfminer.six==20191110
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.4.0
pkgutil-resolve-name==1.3.10
platformdirs==2.6.2
progress==1.6
prometheus-client==0.16.0
prompt-toolkit==3.0.36
protobuf==3.20.1
psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
pyaml==21.10.1
pyarrow==11.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pycryptodome==3.17
pydot==1.4.2
Pygments==2.14.0
pymoo==0.5.0
pyparsing==2.4.7
pyrsistent==0.19.3
python-dateutil==2.8.2
python-json-logger==2.0.4
python-pptx==0.6.21
pytz==2022.7.1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0
pyzmq==25.0.0
regex==2022.10.31
requests==2.28.2
requests-oauthlib==1.3.1
responses==0.18.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rsa==4.9
s3transfer==0.6.0
scikit-learn==1.2.1
scipy==1.10.0
Send2Trash==1.8.0
sentencepiece==0.1.97
six==1.12.0
smmap==5.0.0
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.3.2.post1
SpeechRecognition==3.8.1
stack-data==0.6.2
sympy==1.11.1
tensorboard==2.11.2
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
terminado==0.17.1
textract==1.6.5
texttable==1.6.7
threadpoolctl==3.1.0
tinycss2==1.2.1
tinynetrc==1.3.1
tokenizers==0.13.2
tomli==2.0.1
torch==1.13.1
torchvision==0.14.1
tornado==6.2
tqdm==4.64.1
traitlets==5.9.0
transformers==4.26.0
typing-extensions==4.4.0
tzdata==2022.7
tzlocal==4.2
uri-template==1.2.0
urllib3==1.25.11
wcwidth==0.2.6
webcolors==1.12
webencodings==0.5.1
websocket-client==1.5.1
Werkzeug==2.2.2
xlrd==1.2.0
XlsxWriter==3.0.8
xxhash==3.2.0
y-py==0.5.5
yarl==1.8.2
ypy-websocket==0.8.2
zipp==3.12.1
```


### Who can help?

@lewtun @michaelbenayoun 

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

I converted the summarizer model to onnx and then ran it:

```
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from datasets import load_dataset

billsum = load_dataset("billsum", split="ca_test")
billsum = billsum.train_test_split(test_size=0.2)
to_summarize = billsum["train"][0]['text']

model_id = "google/pegasus-pubmed"
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
prediction = pipe(to_summarize)
```

I also tried the openvino runtime
```
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import OVModelForSeq2SeqLM
from datasets import load_dataset

billsum = load_dataset("billsum", split="ca_test")
billsum = billsum.train_test_split(test_size=0.2)
to_summarize = billsum["train"][0]['text']

model_id = "google/pegasus-pubmed"
model = OVModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
prediction = pipe(to_summarize)
```


### Expected behavior

This is supposed to provide faster inference than the original pytorch model. Neither the onnx and nor the openvino runtime improve speed, in fact the inference time increases by manifold.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support bigbird ONNX export with attention_type == "block_sparse" #754

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Support bigbird ONNX export with attention_type == "block_sparse" #754

Description

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions