IllegalCharacterError raised when exporting xlsx #370

MohiniLimbodia · 2019-07-08T15:57:47Z

Using tablib version 0.13.0
The issue is with Control Characters and Surrogates

Steps to reproduce:

from tablib import Dataset
data=Dataset((u'\x1f',),)
data.export('xlsx')

Stack Trace

IllegalCharacterErrorTraceback (most recent call last)
<ipython-input-1-9b9694f739ff> in <module>()
      1 from tablib import Dataset
      2 data=Dataset((u'\x1f',),)
----> 3 data.export('xlsx')

/srv/jupyter/local/lib/python2.7/site-packages/tablib/core.pyc in export(self, format, **kwargs)
    466             raise UnsupportedFormat('Format {0} cannot be exported.'.format(format))
    467 
--> 468         return export_set(self, **kwargs)
    469 
    470     # -------

/srv/jupyter/local/lib/python2.7/site-packages/tablib/formats/_xlsx.pyc in export_set(dataset, freeze_panes)
     41     ws.title = dataset.title if dataset.title else 'Tablib Dataset'
     42 
---> 43     dset_sheet(dataset, ws, freeze_panes=freeze_panes)
     44 
     45     stream = BytesIO()

/srv/jupyter/local/lib/python2.7/site-packages/tablib/formats/_xlsx.pyc in dset_sheet(dataset, ws, freeze_panes)
    145                         cell.value = unicode('%s' % col, errors='ignore')
    146                 except TypeError:
--> 147                     cell.value = unicode(col)

/srv/jupyter/local/lib/python2.7/site-packages/openpyxl/cell/cell.pyc in value(self, value)
    292     def value(self, value):
    293         """Set the value and infer type and display options."""
--> 294         self._bind_value(value)
    295 
    296     @property

/srv/jupyter/local/lib/python2.7/site-packages/openpyxl/cell/cell.pyc in _bind_value(self, value)
    195 
    196         elif isinstance(value, STRING_TYPES):
--> 197             value = self.check_string(value)
    198             self.data_type = self.TYPE_STRING
    199             if len(value) > 1 and value.startswith("="):

/srv/jupyter/local/lib/python2.7/site-packages/openpyxl/cell/cell.pyc in check_string(self, value)
    158         value = value[:32767]
    159         if next(ILLEGAL_CHARACTERS_RE.finditer(value), None):
--> 160             raise IllegalCharacterError
    161         return value
    162 

IllegalCharacterError:

leonardoarroyo · 2019-07-15T18:42:05Z

Also getting this. Openpyxl detects illegal characters with the following regex:

ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')

I'm using django-import-export, which in turn uses tablib, which uses openpyxl. Still trying to figure out who should handle the data cleaning.

leonardoarroyo · 2019-07-15T19:45:52Z

I fixed the issue by cleaning the data before it gets sent to tablib.

If anyone else is having this issue with django-import-export, you can clean your fields by overriding export_field on your resource.

from import_export import resources
from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE

class CleanModelResource(resources.ModelResource):
  def export_field(self, field, obj):
    v = super(CleanModelResource, self).export_field(field, obj)
    if type(v) == str:
      v = ILLEGAL_CHARACTERS_RE.sub('', v)
    return v

@leonardoarroyo

based on fix from @leonardoarroyo jazzband/tablib#370 (comment)

matthewhegarty · 2024-11-12T13:41:10Z

Re the comment in 380,

I think that's probably the right thing to do [raise exception], it will let people know what the problem is and let them react appropriately (change the character, delete it, etc.).

are you minded to handle this within tablib, or let clients handle it for themselves?

It's an open issue in django-import-export but I'm happy to submit a PR to tablib if it's decided that tablib should handle it.

claudep · 2024-11-12T14:58:42Z

What do you mean by "handle it", ignoring those chars?

matthewhegarty · 2024-11-12T15:25:08Z

I'm wondering if there should be logic in tablib to sanitize the 'illegal' chars, by replacing them with an empty string. This is the approach listed as the workaround above and also here. I have also added similar logic to django-import-export here.

It could be an optional flag to export, similar to how we escape excel formulae.

claudep · 2024-11-13T07:48:04Z

@hugovk, any opinion on this? If we add an optional flag, what would be the default?

An alternative replacement character would be �

As for the implementation, instead of adding one more regex to each output string, we could catch IllegalCharacterError and then replace offending characters only in that case. This would be a little more efficient IMO.

claudep mentioned this issue Oct 4, 2019

Fixes #314 - Delegate type coercion to openpyxl #386

Merged

willtho89 added a commit to willtho89/drf-renderer-xlsx that referenced this issue Jun 15, 2021

replace ILLEGAL_CHARACTERS_RE

5f7be90

based on fix from @leonardoarroyo jazzband/tablib#370 (comment)

matthewhegarty mentioned this issue Nov 12, 2024

XLSX export raises IllegalCharacterError django-import-export/django-import-export#2000

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IllegalCharacterError raised when exporting xlsx #370

IllegalCharacterError raised when exporting xlsx #370

MohiniLimbodia commented Jul 8, 2019 •

edited by hugovk

Loading

leonardoarroyo commented Jul 15, 2019 •

edited by hugovk

Loading

leonardoarroyo commented Jul 15, 2019 •

edited by hugovk

Loading

matthewhegarty commented Nov 12, 2024

claudep commented Nov 12, 2024

matthewhegarty commented Nov 12, 2024

claudep commented Nov 13, 2024

IllegalCharacterError raised when exporting xlsx #370

IllegalCharacterError raised when exporting xlsx #370

Comments

MohiniLimbodia commented Jul 8, 2019 • edited by hugovk Loading

leonardoarroyo commented Jul 15, 2019 • edited by hugovk Loading

leonardoarroyo commented Jul 15, 2019 • edited by hugovk Loading

matthewhegarty commented Nov 12, 2024

claudep commented Nov 12, 2024

matthewhegarty commented Nov 12, 2024

claudep commented Nov 13, 2024

MohiniLimbodia commented Jul 8, 2019 •

edited by hugovk

Loading

leonardoarroyo commented Jul 15, 2019 •

edited by hugovk

Loading

leonardoarroyo commented Jul 15, 2019 •

edited by hugovk

Loading