You are creating those bytes objects yourself:
    item['title'] = [t.encode('utf-8') for t in title]
    item['link'] = [l.encode('utf-8') for l in link]
    item['desc'] = [d.encode('utf-8') for d in desc]
Each of those t.encode(), l.encode() and d.encode() calls creates a bytes object. Don’t do this; keep the values as str objects and leave it to the json module to serialise them.
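A quick way to see the difference is the sketch below. The title values are hypothetical stand-ins for what your selectors return; the point is that json.dumps() happily serialises str values but raises a TypeError as soon as it meets bytes.

```python
import json

# Hypothetical scraped values; selectors hand you str objects like these.
title = ["Café menu", "JSON tutorial"]

# str values serialise fine: the json module produces the JSON text itself.
as_text = json.dumps({"title": title})
print(as_text)

# Pre-encoding turns each value into bytes, which json.dumps() rejects.
encoded = [t.encode("utf-8") for t in title]
try:
    json.dumps({"title": encoded})
    bytes_failed = False
except TypeError as exc:
    bytes_failed = True
    print("TypeError:", exc)
```

With the default ensure_ascii=True the "é" comes out as the escape \u00e9, which is still valid JSON and decodes back to the same text.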
You are also encoding in places where there is no need to. Leave encoding to the json module and to the text-mode file object returned by the open() call.
You also don’t need to convert your items list to a dictionary; it’ll already be an object that can be JSON encoded directly:
    def __init__(self):
        self.file = open('w3school_data_utf8.json', 'w', encoding='utf-8')
    def process_item(self, item, spider):
        line = json.dumps(item) + '\n'
        self.file.write(line)
        return item
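If you want to try the same idea without Scrapy installed, here is a minimal sketch. The class name JsonLinesPipeline, its path argument and the sample items (plain dicts) are hypothetical. It also makes two assumptions beyond the code above: it opens the file in open_spider() and closes it in close_spider() (Scrapy’s pipeline hooks) instead of in __init__, and it passes ensure_ascii=False so non-ASCII text is written to the file as UTF-8 rather than as \uXXXX escapes.

```python
import json
import os
import tempfile


class JsonLinesPipeline:
    """Hypothetical pipeline: writes each item as one JSON line."""

    def __init__(self, path):
        self.path = path

    def open_spider(self, spider):
        # Text mode: the file object encodes str to UTF-8 on write.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # ensure_ascii=False keeps "é" as-is instead of the \u00e9 escape.
        line = json.dumps(item, ensure_ascii=False) + "\n"
        self.file.write(line)
        return item


# Drive the pipeline by hand; spider is unused here, so None is enough.
path = os.path.join(tempfile.mkdtemp(), "w3school_data_utf8.json")
pipeline = JsonLinesPipeline(path)
pipeline.open_spider(spider=None)
items = [
    {"title": ["Café"], "link": ["/cafe"], "desc": ["Menu"]},
    {"title": ["JSON"], "link": ["/json"], "desc": ["Tutorial"]},
]
for item in items:
    pipeline.process_item(item, spider=None)
pipeline.close_spider(spider=None)

with open(path, encoding="utf-8") as f:
    written = f.read()
print(len(written.splitlines()), "lines written")
```

Nothing in process_item() calls .encode() or .decode(); the only encoding step is the one the text-mode file performs.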
I’m guessing you followed a tutorial written for Python 2, but you are using Python 3. I strongly suggest you find a different tutorial; not only is it written for an outdated version of Python, but if it advocates line.decode('unicode_escape') it is teaching some extremely bad habits that lead to hard-to-track bugs. For a good, free book on learning Python 3, I can recommend Think Python, 2nd edition.
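To illustrate one such bug: the snippet below assumes the tutorial’s line.decode('unicode_escape') is applied to UTF-8 encoded bytes (the tutorial’s exact code isn’t shown here). The unicode_escape codec reads non-escape bytes as Latin-1, so each multi-byte UTF-8 character silently becomes several wrong characters.

```python
# Text that is already correctly decoded (a str).
original = "café"

# Encode to UTF-8 bytes, then decode with 'unicode_escape', as the
# outdated approach effectively does. The two UTF-8 bytes of "é"
# (0xC3 0xA9) are read as two separate Latin-1 characters.
garbled = original.encode("utf-8").decode("unicode_escape")

print(ascii(garbled))       # shows the garbled characters as escapes
print(garbled == original)  # the round trip does not give the text back
```

No exception is raised, which is exactly why this kind of mojibake is hard to track down later.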