close
Skip to content

Add unicode support#1

Closed
termith wants to merge 1 commit into
adblockplus:masterfrom
termith:utf-8
Closed

Add unicode support#1
termith wants to merge 1 commit into
adblockplus:masterfrom
termith:utf-8

Conversation

@termith
Copy link
Copy Markdown

@termith termith commented Jan 17, 2018

Try to use python-abp API for ruadlist:

from abp.filters import parse_filterlist

with open('ruadlist+easylist.txt') as filterlist:
    for line in parse_filterlist(filterlist):
        print(line)

Get error:

Traceback (most recent call last):
  File "/Users/ddemidov/workspace/yandex/python-abp/test_parser.py", line 4, in <module>
    for line in parse_filterlist(filterlist):
  File "/Users/ddemidov/workspace/yandex/python-abp/abp/filters/parser.py", line 318, in parse_filterlist
    yield parse_line(line)
  File "/Users/ddemidov/workspace/yandex/python-abp/abp/filters/parser.py", line 283, in parse_line
    elif content.startswith('!'):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 2: ordinal not in range(128)

Try to fix that

@kvas-it
Copy link
Copy Markdown
Contributor

kvas-it commented Jan 19, 2018

Hi Dmitry, thanks for the pull request.

I've seen people run into this problem before, but it's actually not really a bug in the code. The thing is: parse_filterlist expects an iterable of unicode strings, so if you give it an iterable of byte strings it only works if they don't have non-ascii characters.

However, the documentation is not very helpful about the whole thing and it gives an example that just breaks if you run it under Python 2 and have non-ascii strings in the filterlist, so I agree that it should be fixed.

I can't accept your fix as is because it breaks things on Python 3 (and even some tests on Python 2). If we want parse_filterlist and parse_filter to be flexible about accepting unicode/byte strings (which seems to be what people expect), we need to do some smart auto-detection that works on both Python 2 and 3. Then we'd also need to add a test for this. If you feel like implementing it, give it a go, otherwise I can do it myself, that should not take too long.

Let me know what you prefer.

Cheers,
Vasily

@kvas-it
Copy link
Copy Markdown
Contributor

kvas-it commented Jan 23, 2018

I've implemented more or less this feature in a more compatible way (see https://codereview.adblockplus.org/29677724/). This should land soon...ish.

@kvas-it
Copy link
Copy Markdown
Contributor

kvas-it commented Jan 26, 2018

The fix landed (see 69b6bcb)

@kvas-it kvas-it closed this Jan 26, 2018
abpbot pushed a commit that referenced this pull request May 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants