"""
serializejson
=============
+---------------------------+--------------------------------------------------------------------------------------------------------------------------+
| **Authors** | `Baptiste de La Gorce <contact@smartaudiotools.com>`_ |
+---------------------------+--------------------------------------------------------------------------------------------------------------------------+
| **PyPI** | https://pypi.org/project/serializejson |
+---------------------------+--------------------------------------------------------------------------------------------------------------------------+
| **Documentation** | https://smartaudiotools.github.io/serializejson |
+---------------------------+--------------------------------------------------------------------------------------------------------------------------+
| **Sources** | https://github.com/SmartAudioTools/serializejson |
+---------------------------+--------------------------------------------------------------------------------------------------------------------------+
| **Issues** | https://github.com/SmartAudioTools/serializejson/issues |
+---------------------------+--------------------------------------------------------------------------------------------------------------------------+
| **Noncommercial license** | `Prosperity Public License 3.0.0 <https://github.com/SmartAudioTools/serializejson/blob/master/LICENSE-PROSPERITY.rst>`_ |
+---------------------------+--------------------------------------------------------------------------------------------------------------------------+
| **Commercial license** | `Patron License 1.0.0 <https://github.com/SmartAudioTools/serializejson/blob/master/LICENSE-PATRON.rst>`_ |
| | ⇒ `Sponsor me ! <https://github.com/sponsors/SmartAudioTools>`_ or `contact me ! <contact@smartaudiotools.com>`_ |
+---------------------------+--------------------------------------------------------------------------------------------------------------------------+
**serializejson** is a Python library for fast serialization and deserialization
of Python objects in `JSON <http://json.org>`_, designed as a safe, interoperable and human-readable drop-in replacement for the Python `pickle <https://docs.python.org/3/library/pickle.html>`_ package.
Complex Python object hierarchies can be serialized, deserialized or updated at once, allowing you, for example, to save or restore a complete application state in a few lines of code.
The library is built upon
`python-rapidjson <https://github.com/python-rapidjson/python-rapidjson>`_,
`pybase64 <https://github.com/mayeut/pybase64>`_ and
`blosc <https://github.com/Blosc/python-blosc>`_ for optional `zstandard <https://github.com/facebook/zstd>`_ compression.
Some of the main features:
- supports Python 3.7 (possibly lower) or greater.
- serializes arbitrary Python objects into a dictionary by adding a `__class__` key and, when needed, `__init__`, `__new__`, `__state__` or `__items__` keys.
- calls the same object methods as pickle, so almost all picklable objects can be serialized with serializejson without any modification.
- objects that are not picklable yet can always be made serializable by adding methods to the object or by creating plugins for pickle or serializejson.
- generally 2x slower than pickle for dumping and 3x slower for loading (on our benchmark), except for big arrays (optimizations are planned).
- serializes and deserializes bytes and bytearray very quickly in base64 thanks to `pybase64 <https://github.com/mayeut/pybase64>`_ and lossless `blosc <https://github.com/Blosc/python-blosc>`_ compression.
- can serialize properties and attributes through getters and setters if desired (unlike pickle).
- json data stays directly loadable even if you have turned some attributes into slots or properties since your last serialization (unlike pickle).
- can serialize `__init__(self,..)` arguments by name instead of by position, allowing arguments with default values to be skipped and making json data robust to changes in `__init__` parameter order.
- serialized objects generally take less space than when serialized with pickle: for binary data, the 30% increase due to base64 encoding is generally more than compensated by the lossless `blosc <https://github.com/Blosc/python-blosc>`_ compression.
- serialized objects are human-readable and easy to edit. Unlike pickled data, your data will never become unreadable if your code evolves: you will always be able to modify it with a text editor (with find & replace, for example, if you rename an attribute).
- serialized objects are text and therefore versionable and comparable with version-control and comparison tools.
- can safely load untrusted / unauthenticated sources if the `authorized_classes` parameter is carefully restricted to the strictly necessary classes (unlike pickle); see the sketch after the warning below.
- can update existing objects recursively instead of overwriting them. serializejson can be used to save and restore in place a complete application state (⚠ not yet well tested).
- filters out attributes starting with "_" by default (unlike pickle). You can keep them if wanted with `attributes_filter = False`.
- numpy arrays can be serialized as lists, with automatic conversion in both directions, or in a dtype-preserving way (see the sketch after this list).
- supports circular references and serializes duplicated objects only once, using a "$ref" key and the path to the first occurrence in the json: `{"$ref": "root.xxx.elt"}` (⚠ not yet if the object is a list or dictionary).
- accepts json with comments (// and /\* \*/) if `accept_comments = True`.
- can automatically recognize objects in json from their key names and recreate them, without the need for a `__class__` key, if they are passed in `recognized_classes`.
- serializejson is easily interoperable outside of the Python ecosystem, either with this recognition of objects from key names or with `__class__` translation between Python classes and classes in other languages.
- dump and load accept string paths.
- can iteratively encode (with append) and decode (with an iterator) a list in a json file, which saves memory during serialization and deserialization and is useful for logs.
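A minimal sketch of the numpy-to-list round trip mentioned above (the exact json formatting may differ):
.. code-block:: python
import numpy
import serializejson
array = numpy.array([1, 2, 3])
dumped = serializejson.dumps(array, numpy_array_to_list=True)  # plain json list, e.g. [1,2,3]
loaded = serializejson.loads(dumped, numpy_array_from_list=True)  # back to a numpy array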
.. warning::
**⚠** Do not load serializejson files from untrusted / unauthenticated sources without carefully setting the load `authorized_classes` parameter.
**⚠** Never dump a dictionary with a `__class__` key, otherwise serializejson will attempt to reconstruct an object when loading the json.
Be careful not to let a user manually enter a dictionary key somewhere without checking that it is not `__class__`.
Due to a current limitation of rapidjson, we cannot at the moment efficiently detect dictionaries with a `__class__` key in order to raise an error.
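A minimal sketch of safe loading, reusing the json produced in the Examples section below:
.. code-block:: python
import serializejson
json_str = '{"__class__": "set", "__init__": [1, 2]}'
# classes listed in authorized_classes are allowed in addition to the
# default safe classes; any other "__class__" found in the json
# makes the load raise a TypeError.
loaded = serializejson.loads(json_str, authorized_classes=["set"])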
Installation
============
**Latest official release**
.. code-block::
pip install serializejson
**Unreleased development version**
.. code-block::
pip install git+https://github.com/SmartAudioTools/serializejson.git
Examples
================
**Serialization with the function-based API**
.. code-block:: python
import serializejson
# serialize in string
object1 = set([1,2])
dumped1 = serializejson.dumps(object1)
loaded1 = serializejson.loads(dumped1)
print(dumped1)
>{
> "__class__": "set",
> "__init__": [1,2]
>}
# serialize in file
object2 = set([3,4])
serializejson.dump(object2,"dumped2.json")
loaded2 = serializejson.load("dumped2.json")
**Serialization with the class-based API**
.. code-block:: python
import serializejson
encoder = serializejson.Encoder()
decoder = serializejson.Decoder()
# serialize in string
object1 = set([1,2])
dumped1 = encoder.dumps(object1)
loaded1 = decoder.loads(dumped1)
print(dumped1)
# serialize in file
object2 = set([3,4])
encoder.dump(object2,"dumped2.json")
loaded2 = decoder.load("dumped2.json")
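**Dump to json bytes (pickle-style)**
If you need bytes rather than a str, as `pickle.dumps` returns, a short sketch with `dumpb`:
.. code-block:: python
import serializejson
dumped_bytes = serializejson.dumpb(set([1, 2]))  # utf-8 encoded json bytes
loaded = serializejson.loads(dumped_bytes)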
**Update existing object**
.. code-block:: python
import serializejson
object1 = set([1,2])
object2 = set([3,4])
dumped1 = serializejson.dumps(object1)
print(f"id {id(object2)} : {object2}")
serializejson.loads(dumped1,obj = object2, updatables_classes = [set])
print(f"id {id(object2)} : {object2}")
**Iterative serialization and deserialization**
.. code-block:: python
import serializejson
encoder = serializejson.Encoder("my_list.json",indent = None)
for elt in range(3):
encoder.append(elt)
print(open("my_list.json").read())
for elt in serializejson.Decoder("my_list.json"):
print(elt)
>[0,1,2]
>0
>1
>2
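**Appending with the function-based API**
The module-level `append` function gives the same iterative dumping without keeping an Encoder instance around (a sketch; "my_list2.json" is a hypothetical path):
.. code-block:: python
import serializejson
for elt in range(3):
    serializejson.append(elt, "my_list2.json")
for elt in serializejson.Decoder("my_list2.json"):
    print(elt)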
More examples and complete documentation are available `here <https://smartaudiotools.github.io/serializejson/>`_.
License
=======
Copyright 2020 Baptiste de La Gorce
For noncommercial use or a thirty-day limited free-trial period of commercial use, this project is licensed under the `Prosperity Public License 3.0.0 <https://github.com/SmartAudioTools/serializejson/blob/master/LICENSE-PROSPERITY.rst>`_.
For unlimited commercial use, this project is licensed under the `Patron License 1.0.0 <https://github.com/SmartAudioTools/serializejson/blob/master/LICENSE-PATRON.rst>`_.
To acquire a license please `contact me <mailto:contact@smartaudiotools.com>`_, or just `sponsor me on GitHub <https://github.com/sponsors/SmartAudioTools>`_ under the appropriate tier! This funding model helps make my work sustainable and compensates me for the work it took to write this library!
Third-party contributions are licensed under `Apache License, Version 2.0 <http://www.apache.org/licenses/LICENSE-2.0>`_ and belong to their respective authors.
"""
try:
import importlib.metadata as importlib_metadata # New in version 3.8
except ModuleNotFoundError:
import importlib_metadata
try:
__version__ = importlib_metadata.version("serializejson")
except importlib_metadata.PackageNotFoundError:
pass
import os
import warnings
import io
import rapidjson
import gc
import blosc
import errno
from collections import deque
from pybase64 import b64decode, b64encode_as_string
from _collections_abc import list_iterator
try:
import numpy
from numpy import ndarray
use_numpy = True
except ModuleNotFoundError:
use_numpy = False
from . import serialize_parameters
# def add_authorized_classes(*classes):
# if len(classes) == 0 and type(classes[0]) in (tuple,list,set):
# classes = classes[0]
# for elt in classes:
# if not type(elt) is str:
# elt = class_str_from_class(elt)
# authorized_classes.add(elt)
from .tools import (
getstate,
setstate,
instance,
tuple_from_instance,
class_str_from_class,
class_from_class_str,
from_name,
_get_getters,
_get_setters,
_get_properties,
encoder_parameters,
_onlyOneDimSameTypeNumbers,
_onlyOneDimNumbers,
blosc_compressions,
setters_names_from_class,
slots_from_class,
authorized_classes,
Reference,
constructors,
)
authorized_classes.update(
{
"bytes",
"bytearray",
"complex",
"frozenset",
"tuple",
"type",
"range",
"set",
"slice",
"dict_non_str_keys",
"collections.Counter",
"collections.defaultdict",
"collections.deque",
"collections.OrderedDict",
}
)
__all__ = [
"dumps",
"dump",
"loads",
"load",
"append",
"Encoder",
"Decoder",
"getstate",
"class_from_class_str",
]
# flag allowing to keep None as an allowed value for the Decoder's default_value.
no_default_value = []
# --- FUNCTION-BASED API -----------------------
def dump(obj, file, **argsDict):
"""
Dump an object into a json file.
Args:
obj: object to dump.
file (str or file-like): path or file.
**argsDict: parameters passed to the Encoder (see documentation).
"""
if isinstance(file, str):
fp = open(file, "wb")
else:
fp = file
Encoder(**argsDict)(obj, fp)
def dumps(obj, **argsDict):
"""
Dump object into json string.
If you want bytes returned, for a pickle drop-in replacement,
you should either replace `pickle.dumps` calls with `serializejson.dumpb` calls
or add a `from serializejson import dumpb as dumps` at the start of your script.
Args:
obj: object to dump.
**argsDict: parameters passed to the Encoder (see documentation).
"""
return Encoder(**argsDict)(obj, return_bytes=False)
def dumpb(obj, **argsDict):
"""
Dump object into json bytes.
Args:
obj: object to dump.
**argsDict: parameters passed to the Encoder (see documentation).
"""
return Encoder(**argsDict)(obj, return_bytes=False).encode("utf_8")
def append(obj, file=None, *, indent="\t", **argsDict):
"""
Append an object to a json file.
Args:
obj: object to dump.
file (str or file-like):
path or file. The file must be empty or contain a json list.
indent: indent passed to Encoder.
**argsDict: other parameters passed to the Encoder (see documentation).
"""
file = _open_for_append(file, indent)
Encoder(**argsDict)(obj, file)
_close_for_append(file, indent)
def loads(json, *, obj=None, iterator=False, **argsDict):
# we cannot update an object and iterate at the same time
"""
Load an object from a json string or bytes.
Args:
json:
the json string or bytes.
obj (optional):
If provided, the object `obj` will be updated and no new object will be created.
iterator:
if `True` and the json corresponds to a list then the items will be read one by one which reduces RAM consumption.
**argsDict:
parameters passed to the Decoder (see documentation).
Return:
created object, updated object if `obj` is provided or elements iterator if `iterator` is `True`.
"""
decoder = Decoder(**argsDict)
if iterator:
return decoder
else:
return decoder(json=json, obj=obj)
def load(file, *, obj=None, iterator=False, **argsDict):
"""
Load an object from a json file.
Args:
file (str or file-like):
the json path or file-like object.
obj (optional):
if provided, the object `obj` will be updated and no new object will be created.
iterator:
if `True` and the json corresponds to a list then the items will be read one by one which reduces RAM consumption.
**argsDict:
parameters passed to the Decoder (see documentation).
Return:
created object, updated object if passed obj or elements iterator if iterator is True.
"""
if iterator:
return Decoder(file, **argsDict)
else:
return Decoder(**argsDict).load(file=file, obj=obj)
def jsonpath(obj):
"""return the json path of loaded object"""
return id_to_path.get(id(obj), None)
# --- CLASS-BASED API ---------------------------------------------------------
class Encoder(rapidjson.Encoder):
"""
Class for serialization of Python objects into json.
Args:
file (str or file-like):
The json path or file-like object.
When specified, the encoded result will be written there
if you don't pass a file to the `dump()` method later.
attributes_filter (bool or set/list/tuple):
Controls whether "private" attributes starting with "_" are removed from the saved state
for objects without a plugin, __getstate__, __serializejson__ or reimplemented
__reduce_ex__ or __reduce__ methods.
- `False` : filter private attributes for no class (unless filtered in __reduce__ or __getstate__ methods)
- `True` : filter private attributes for all classes
- `set/list/tuple` : filter private attributes for these classes
Use it temporarily.
- In order to stay compatible with pickle, you should rather implement one of the __getstate__, __reduce_ex__, __reduce__ methods or a pickle plugin, filtering attributes starting with "_".
- Otherwise, in order to be independent of this parameter, implement a __serializejson__ method or a serializejson plugin.
- In this method or plugin you can call the helper function: state = serializejson.getstate(self, attributes_filter=True)
properties (bool, None, set/list/tuple, dict):
Controls whether properties are added to the saved state
for objects without a plugin, __getstate__, __serializejson__ or reimplemented
__reduce_ex__ or __reduce__ methods.
- `False` : add properties for no class (like pickle)
- `True` : add properties for all classes
- `None` : (default) add properties defined in the serializejson.properties dict (added by plugins or manually before the encoder call) (see documentation section :ref:`"Add plugins to serializejson"<add-plugins-label>`.)
- `set/list/tuple` : add all properties for classes in this set/list/tuple, in addition to properties defined in the serializejson.properties dict [class1, class2, ..] (not secure with untrusted json, use it only for debugging)
- `dict` : add properties defined in this dict, in addition to properties defined in the serializejson.properties dict {class1: ["property1", "property2"], class2: True}
Use it temporarily.
- In order to stay compatible with pickle, you should rather implement one of the __getstate__, __reduce_ex__, __reduce__ methods or a pickle plugin, retrieving values for properties and returning them in the same dictionary as __slots__, as the second element of a state tuple.
- Otherwise, in order to be independent of this parameter, implement a __serializejson__ method or serializejson plugin retrieving values for properties and returning them in the state dictionary.
- In this method or plugin you can call the helper function: state = serializejson.getstate(self, properties=True or a list of property names)
getters (bool or set/list/tuple):
Controls whether values retrieved with getters are added to the saved state
for objects without a plugin, __getstate__, __serializejson__ or reimplemented
__reduce_ex__ or __reduce__ methods.
- `False` : save no getters other than those called in __getstate__ methods, like pickle.
- `True` : save getters for all objects
- `None` : (default) save getters defined in the serializejson.getters dict (added by plugins or manually before the encoder call) (see documentation section :ref:`"Add plugins to serializejson"<add-plugins-label>`.)
- `set/list/tuple` : save getters for classes in this set/list/tuple, in addition to getters defined in the serializejson.getters dict [class1, class2, ..] (not secure with untrusted json, use it only for debugging)
- `dict` : save getters defined in this dict, in addition to getters defined in the serializejson.getters dict {class1: {"attribute_name": "getter_name", ...}, class2: True}
Use it temporarily.
- In order to stay compatible with pickle, you should rather implement one of the __getstate__, __reduce_ex__, __reduce__ methods or a pickle plugin, retrieving values with getters and returning them in the state, and implement a __setstate__ method calling the setters for these values.
- Otherwise, in order to be independent of this parameter, implement a __serializejson__ method or serializejson plugin retrieving values with getters and returning them in the state, and implement a __setstate__ method calling the setters for these values, or leave the Decoder's setters parameter as True.
- In this method or plugin you can call the helper function: state = serializejson.getstate(self, getters=True or {"a": "getA", "b": "getB"}). With getters as True, the getters will be automatically guessed. Getters as a dict gives the finest control and is faster because getters are not guessed from introspection. With a tuple as key in this dict, you can retrieve several attribute values from one getter.
remove_default_values (bool or set/list/tuple):
Controls whether values equal to their default value are removed from the state in
order to save memory space, for objects without a plugin, __getstate__,
__serializejson__ or reimplemented __reduce_ex__ or __reduce__ methods.
- `False` : remove default values for no class
- `True` : remove default values for all classes
- `set/list/tuple` : remove default values for these classes.
Use it temporarily.
- Since the default values will not be stored and may change between different versions of your code, never use it for long term storage. Be aware that in order to know the default values, serializejson will create an instance of the object's class without any __init__ argument.
- In order to stay compatible with pickle, you should rather implement one of the __getstate__, __reduce_ex__, __reduce__ methods or a pickle plugin, removing values equal to their default value.
- Otherwise, in order to be independent of this parameter, implement a __serializejson__ method or serializejson plugin removing values equal to their default value.
- In this method or plugin you can call the helper function: state = serializejson.getstate(self, remove_default_values=True or a dict {name: default_value, ...})
chunk_size:
Write the file in chunks of this size at a time.
ensure_ascii:
Whether non-ASCII strings are dumped with escaped unicode (`True`) or as UTF-8 (`False`).
indent (None, int or '\\\\t'):
Indentation width to produce pretty printed JSON.
- `None` : Json in one line (quicker than with indent).
- `int` : new lines and `indent` spaces for indent.
- '\\\\t' : new lines and tabulations for indent (takes less space than int > 1).
single_line_init:
whether `__init__` args must be serialized in one line.
single_line_new:
whether `__new__` args must be serialized in one line.
single_line_list_numbers:
whether lists of numbers of the same type must be serialized in one line.
sort_keys:
whether dictionary keys should be sorted alphabetically.
Since Python 3.7, dictionary order is guaranteed to be insertion order.
Some code may now rely on this particular order, like the key order of the state returned by __getstate__.
bytes_compression (None, str or tuple):
Compression for bytes, bytearray and numpy arrays:
- `None` : no compression, use only base 64.
- `str` : compression name ("blosc_zstd", "blosclz", "blosc_lz4", "blosc_lz4hc" or "blosc_zlib") with maximum compression level 9.
- `tuple` : (compression name, compression level) with compression level from 0 (no compression) to 9 (maximum compression)
By default the "blosc_zstd" compression is used with compression level 1.
For the highest compression (but slower dumping) use "blosc_zstd" with compression level 9.
bytes_compression_diff_dtypes (tuple of dtypes):
Tuple of dtypes for which serializejson encodes the first element followed by the differences between consecutive elements of the array before compression.
A cumulative sum will be used at decompression.
bytes_compression_threads (int or str):
Number of threads used for the compression:
- `int` : number of threads used for the compression
- `"cpus"` : use as many threads as CPUs
- `"determinist"` : use one thread with blosc compression for deterministic compression, rather than as many threads as CPUs
bytes_size_compression_threshold (int):
Bytes size threshold beyond which compression is attempted to reduce the size of
bytes, bytearray and numpy arrays if `bytes_compression` is not None.
The default value is 512; below that, compression is generally not
worth it due to the header size and the additional cpu cost.
array_readable_max_size (int, None or dict):
Defines the maximum array.array size for serialization as readable numbers.
By default array_readable_max_size is set to 0 and all non-empty arrays are encoded in base 64.
- `int` : all arrays smaller than or equal to this size are serialized as readable numbers.
- `None` : there is no maximum size and all arrays are serialized as readable numbers.
- `dict` : for each typecode key, the value defines the maximum size for arrays of this typecode to be serialized as readable numbers. If the value is `None` there is no maximum and arrays of this typecode are all serialized as readable numbers. If you want only signed int arrays to be readable, then you should pass `array_readable_max_size = {"i": None}`
.. note::
Serialization of int arrays can take much less space in readable form,
but is much slower than base 64 for big arrays. If you have many or large int arrays and
performance matters, then you should stay with the default value 0.
numpy_array_readable_max_size (int, None or dict):
Defines the maximum numpy array size (product of the array's dimensions) for serialization as readable numbers.
By default numpy_array_readable_max_size is set to 0 and all non-empty numpy arrays are encoded in base 64.
- `int` : all numpy arrays smaller than or equal to this size are serialized as readable numbers.
- `None` : there is no maximum size and all numpy arrays are serialized as readable numbers.
- `dict` : for each dtype key, the value defines the maximum size for arrays of this dtype to be serialized as readable numbers. If the value is `None` there is no maximum and numpy arrays of this dtype are all serialized as readable numbers. If you want only int32 numpy arrays to be readable, then you should pass `numpy_array_readable_max_size = {"int32": None}`
.. note::
Serialization in readable form can take much less space for int32 if the values are smaller than or equal to 9999,
but is much slower than base 64 for big arrays. If you have many or large int32 numpy arrays and
performance matters, then you should stay with the default value 0.
numpy_array_to_list:
whether numpy arrays should be serialized as lists.
.. warning::
This should be used only for interoperability with other json libraries.
If you want readable values in your json, we recommend using
`numpy_array_readable_max_size` instead, which is not destructive.
With `numpy_array_to_list` set to `True`:
- numpy arrays will be indistinguishable from lists in the json.
- `Decoder(numpy_array_from_list=True)` will recreate numpy arrays from lists of bool, int or float (when not an `__init__` args list), with the risk of unwanted conversion of lists to numpy arrays.
- the dtype of the numpy array will be lost if not bool, int32 or float64, and converted to bool, int32 or float64 when loading.
- empty numpy arrays will be converted to [] without any way to guess the dtype and will stay empty lists when loading, even with `numpy_array_from_list = True`.
numpy_types_to_python_types:
whether numpy integers and floats outside of an array must be converted to python types.
It saves space and generally does not affect the loaded result.
strict_pickle (False by default):
If True, serialize with exactly the same behaviour as pickle:
- disabling serializejson plugins for custom serialization (no numpyB64)
- disabling attributes_filter
- disabling keys sorting
- disabling numpy_array_to_list
- disabling numpy_types_to_python_types
- keeping __dict__ and __slots__ separated in a tuple when both are present, instead of merging them into a dictionary (your __setstate__ methods should be ready to receive either a tuple or a dictionary)
- making the same checks as pickle
- raising the same errors as pickle
**plugins_parameters:
Extra keyword arguments are stored in the "serialize_parameters"
global module and accessible in plugin modules, in order to allow
choosing between serialization options in plugins.
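Example (a minimal sketch; the compression backend and level are illustrative):
.. code-block:: python
import serializejson
encoder = serializejson.Encoder(bytes_compression=("blosc_zstd", 9))
dumped = encoder.dumps(bytes(1000))  # compressed with blosc_zstd, then base64-encoded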
"""
"""
bytearray_use_bytearrayB64:
save bytearray with references to serializejson.bytearrayB64
instead of a verbose use of base64.b64decode. It saves space but makes
the json file dependent on the serializejson module.
numpy_array_use_numpyB64:
save numpy arrays with references to serializejson.numpyB64
instead of a verbose use of base64.b64decode. It saves space but makes
the json file dependent on the serializejson module.
"""
def __new__(
cls,
file=None,
*,
strict_pickle=False,
return_bytes=False,
attributes_filter=True,
properties=False,
getters=False,
remove_default_values=False,
chunk_size=65536,
ensure_ascii=False,
indent="\t",
single_line_init=True,
single_line_new=True,
single_line_list_numbers=True,
sort_keys=False,
bytes_compression=("blosc_zstd", 1), #
bytes_compression_diff_dtypes=tuple(),
bytes_size_compression_threshold=512,
bytes_compression_threads=1,
array_use_arrayB64=True,  # keep it?
array_readable_max_size=0, # 'int32':-1
numpy_array_use_numpyB64=True,  # keep it?
numpy_array_readable_max_size=0, # 'int32':-1
numpy_array_to_list=False,
numpy_types_to_python_types=True,
protocol=4,  # pickle protocol
**plugins_parameters,
):
# if not bytes_to_string:
# bytes_mode = rapidjson.BM_NONE
# else:
# bytes_mode = rapidjson.BM_UTF8
if strict_pickle:
attributes_filter = False
sort_keys = False
numpy_array_to_list = False
numpy_types_to_python_types = False
self = super().__new__(
cls,
ensure_ascii=ensure_ascii,
indent=indent,
sort_keys=sort_keys,
bytes_mode=rapidjson.BM_NONE,
number_mode=rapidjson.NM_NAN,
iterable_mode=rapidjson.IM_ONLY_LISTS,
mapping_mode=rapidjson.MM_ONLY_DICTS
# **argsDict
)
self.protocol = protocol
self.attributes_filter = bool_or_set(attributes_filter)
self.properties = _get_properties(properties)
self.getters = _get_getters(getters)
self.remove_default_values = bool_or_set(remove_default_values)
self.file = file
self.return_bytes = return_bytes
if indent is None:
self.single_line_list_numbers = False
self.single_line_init = False
self.single_line_new = False
else:
self.single_line_list_numbers = single_line_list_numbers
self.single_line_init = single_line_init
self.single_line_new = single_line_new
# rapidjson stores self.indent_char and self.indent_count, but does not let us know whether indent is None...
self.indent = indent
self._dump_one_line = indent is None
self.dumped_classes = set()
self.chunk_size = chunk_size
bytes_compression_level = 9
if bytes_compression is not None:
if isinstance(bytes_compression, (list, tuple)):
bytes_compression, bytes_compression_level = bytes_compression
if bytes_compression not in blosc_compressions:
raise Exception(
f"{bytes_compression} compression unknown. Available values for bytes_compression are {', '.join(blosc_compressions)}"
)
self.bytes_compression = bytes_compression
self.bytes_compression_threads = bytes_compression_threads
self.bytes_compression_diff_dtypes = bytes_compression_diff_dtypes
self.bytes_compression_level = bytes_compression_level
self.bytes_size_compression_threshold = bytes_size_compression_threshold
self.array_use_arrayB64 = array_use_arrayB64
self.array_readable_max_size = array_readable_max_size
self.numpy_array_to_list = numpy_array_to_list
self.numpy_array_use_numpyB64 = numpy_array_use_numpyB64
self.numpy_array_readable_max_size = numpy_array_readable_max_size
self.numpy_types_to_python_types = numpy_types_to_python_types
self.strict_pickle = strict_pickle
unexpected_keywords_arguments = set(plugins_parameters) - set(
encoder_parameters
)
if unexpected_keywords_arguments:
raise TypeError(
"serializejson.Encoder got unexpected keywords arguments '"
+ ", ".join(unexpected_keywords_arguments)
+ "'"
)
self.plugins_parameters = encoder_parameters.copy()
self.plugins_parameters.update(plugins_parameters)
return self
def dump(self, obj, file=None, close=True):
"""
Dump object into json file.
Args:
obj: object to dump.
file (optional str or file-like):
the json path or file-like object.
When specified, json is written into this file.
Otherwise json is written into the file passed to `Encoder()` constructor.
close (optional bool):
whether dump must close the file after dumping (True by default).
"""
if file is None:
file = self.file
if isinstance(file, str):
self.fp = open(file, "wb")
else:
self.fp = file
self.__call__(obj, fp=self.fp, chunk_size=self.chunk_size)
if close:
self.fp.close()
del self.fp
def dumps(self, obj):
"""
Dump object into json string.
"""
return self.__call__(obj, return_bytes=False)
def dumpb(self, obj):
"""
Dump object into json bytes.
"""
return self.__call__(obj, return_bytes=True)
def close(self):
if hasattr(self, "fp"):
self.fp.close()
del self.fp
# else :
# raise Exception("json file already closed")
def clear(self, close=False):
self._reset()
self._update_serialize_parameters()
# self.file = open(self.file, "rb+")
if isinstance(self.file, str):
path = self.file
if os.path.exists(path):
self.fp = open(path, "rb+")
self.fp.truncate(0)
else:
self.fp = open(path, "wb+")
else:
self.fp.truncate(0)
if close:
self.fp.close()
del self.fp
# @profile
def append(self, obj, file=None, close=False):
"""
Append object into json file.
Args:
obj: object to dump.
file (optional str or file-like): path or file. If provided, the object will be
dumped into this file instead of the file passed to the Encoder
constructor. The file must be empty or contain a json list.
close :
- `True` : the file will be closed after the append and reopened at the next append.
- `False` (by default) : the file will be kept open for the next append.
You will have to manually close the file with encoder.close()
"""
self._update_serialize_parameters()
if file is None:
file = self.file
if hasattr(self, "fp"):
fp = _open_for_append(self.fp, self.indent)
else:
self.fp = fp = _open_for_append(file, self.indent)
rapidjson.Encoder.__call__(self, obj, stream=fp, chunk_size=self.chunk_size)
_close_for_append(fp, self.indent)
if close:
fp.close()
del self.fp
def get_dumped_classes(self):
"""
Return all the dumped classes.
They can be reused as the `authorized_classes` argument when loading with a ``serializejson.Decoder``.
"""
return self.dumped_classes
# @profile
def default(self, inst):
# equivalent to the "default" callback that can be passed to dump or dumps
id_ = id(inst)
if id_ in self._already_serialized:
path = self._get_path(inst, already_explored=set([id(locals())]))
if path is not None:
return rapidjson.RawJSON('{"$ref": "%s"}' % path)
else:
self._already_serialized.add(id_)
type_inst = type(inst)
if self.numpy_types_to_python_types and type_inst in _numpy_types:
return _numpy_dtypes_to_python_types[type_inst](inst)
if use_numpy and type_inst is ndarray and self.numpy_array_to_list:
if self._dump_one_line or not self.single_line_list_numbers:
# TO REVIEW: not ideal... should test whether the numbers are all of the same type, and maybe not use rapidjson.NM_NATIVE?
return inst.tolist()
if inst.dtype in _numpy_float_dtypes:
number_mode = self.number_mode
else:
number_mode = rapidjson.NM_NATIVE  # speeds things up quite a bit
if inst.ndim == 1:
return rapidjson.RawJSON(
rapidjson.dumps(
inst.tolist(),
ensure_ascii=False,
number_mode=number_mode,
iterable_mode=rapidjson.IM_ONLY_LISTS,
mapping_mode=rapidjson.MM_ONLY_DICTS,
)
)
return [
rapidjson.RawJSON(
rapidjson.dumps(
elt.tolist(),
ensure_ascii=False,
number_mode=number_mode,
iterable_mode=rapidjson.IM_ONLY_LISTS,
mapping_mode=rapidjson.MM_ONLY_DICTS,
)
)
for elt in inst
] # inst.tolist()
if type_inst is tuple:
# isinstance(inst, tuple) would catch struct_time. Put here rather than in tuple_from_instance because it is very json-specific, and tuples have no __reduce__, unlike set, which is for now handled in _dict_from_instance -> tuple_from_instance
self.dumped_classes.add(tuple)
dic = {"__class__": "tuple", "__new__": list(inst)}
elif type_inst is Reference:
return rapidjson.RawJSON(
'{"$ref": "%s%s"}'
% (
self._get_path(inst.obj, already_explored=set([id(inst.__dict__)])),
inst.sup_str,
)
)
else:
# 8.6% of the time (b64 conversion with pybase64.b64encode) for obj = bytes(numpy.arange(2**20, dtype=numpy.float64).data)
dic = self._dict_from_instance(inst)
if not self._dump_one_line:
if self.single_line_init:
args = dic.get("__init__", None)
if isinstance(args, list):
# 91.2% of the time with obj = bytes(numpy.arange(2**20, dtype=numpy.float64).data)
dic["__init__"] = rapidjson.RawJSON(
rapidjson.dumps(
args,
ensure_ascii=self.ensure_ascii,
default=self._default_one_line,
sort_keys=self.sort_keys,
bytes_mode=self.bytes_mode,
number_mode=self.number_mode,
iterable_mode=rapidjson.IM_ONLY_LISTS,
mapping_mode=rapidjson.MM_ONLY_DICTS
# **self.kargs
)
)
if self.single_line_new:
args = dic.get("__new__", None)
if type(args) is list:
# 91.2% of the time with obj = bytes(numpy.arange(2**20, dtype=numpy.float64).data)
dic["__new__"] = rapidjson.RawJSON(
rapidjson.dumps(
args,
ensure_ascii=self.ensure_ascii,
default=self._default_one_line,
sort_keys=self.sort_keys,
bytes_mode=self.bytes_mode,
number_mode=self.number_mode,
iterable_mode=rapidjson.IM_ONLY_LISTS,
mapping_mode=rapidjson.MM_ONLY_DICTS
# **self.kargs
)
)
if self.single_line_list_numbers:
for key, value in dic.items():
if (
key != "__class__"
and (key != "__init__" or not self.single_line_init)
and (key != "__new__" or not self.single_line_new)
and type(value) is list
and _onlyOneDimSameTypeNumbers(value)
):
dic[key] = rapidjson.RawJSON(
rapidjson.dumps(
value,
ensure_ascii=self.ensure_ascii,
default=self._default_one_line,
bytes_mode=self.bytes_mode,
number_mode=self.number_mode,
iterable_mode=rapidjson.IM_ONLY_LISTS,
mapping_mode=rapidjson.MM_ONLY_DICTS
# **self.kargs
)
)
# self._already_serialized_id_dic_to_obj_dic[id(dic)] = (
# inst,
# dic,
# ) # important to keep dic in the tuple, otherwise it would be destroyed and its id reused.
# if self.add_id:
# dic["_id"] = id_
return dic
# raise TypeError('%r is not JSON serializable' % inst)
# @profile
def _default_one_line(self, inst):
type_inst = type(inst)
if self.numpy_types_to_python_types and type_inst in _numpy_types:
return _numpy_dtypes_to_python_types[type_inst](inst)
if type_inst is tuple:
# isinstance(inst, tuple) would catch struct_time. Put here rather than in tuple_from_instance because it is very json-specific, and tuples have no __reduce__, unlike set, which is for now handled in _dict_from_instance -> tuple_from_instance
self.dumped_classes.add(tuple)
return {"__class__": "tuple", "__new__": list(inst)}
if type_inst is Reference:
return {
"$ref": self._get_path(
inst.obj, already_explored=set([id(inst.__dict__)])
)
+ inst.sup_str
}
if type_inst is ndarray and self.numpy_array_to_list:
return inst.tolist()
return self._dict_from_instance(inst)
def _dict_from_instance(self, inst):
if type(inst) is dict: # dictionnary with non string key
d = {"__class__": "dict_non_str_keys"}
init_dict = d
for key, value in inst.items():
# if type(key) is tuple :
# key = list(key)
new_key = None
type_key = type(key)
if type_key is int:
# floats not included, to keep -inf and inf (nan misbehaves in dictionaries)
new_key = str(key)
elif type_key is str:
try:
rapidjson.loads(key)
except:
if key.endswith("'") and (
key.startswith("'")
or key.startswith("b'")
or key.startswith("b64'")
):
new_key = f"'{key}'"
else:
new_key = key
else:
new_key = f"'{key}'"
elif type_key is bytes:
try:
new_key = f"b'{key.decode('ascii_printables')}'"
except:
new_key = f"b64'{b64encode_as_string(key)}'"
elif type_key is tuple:
key = list(key)
if new_key is None:
new_key = rapidjson.dumps(
key,
default=self._default_one_line,
ensure_ascii=self.ensure_ascii,
sort_keys=self.sort_keys,
bytes_mode=self.bytes_mode,
number_mode=rapidjson.NM_NATIVE,
iterable_mode=rapidjson.IM_ONLY_LISTS,
# mapping_mode=rapidjson.MM_ONLY_DICTS
# **self.kargs
)
init_dict[new_key] = value
return d
# if type(inst) is OrderedDict :
# if not self.sort_keys :  # we need access to self.sort_keys, and this is specific to serializejson
# return {
# "__class__" : "collections.OrderedDict",
# "__items__" : dict(inst)
# }
# else :
# return {
# "__class__" : "collections.OrderedDict",
# "__items__" : list(inst.items())
# }
if type(inst) is dotdict:
return dict(inst)
class_, initArgs, state, listitems, dictitems, newArgs = tuple_from_instance(
inst, self.protocol
)
if type(class_) is not str:
class_ = class_str_from_class(class_)
self.dumped_classes.add(class_)
dictionnaire = {"__class__": class_}
for args, method in ((newArgs, "__new__"), (initArgs, "__init__")):
if args is not None:
if type(args) is dict:
dictionnaire[method] = args
else:
if class_ in remove_add_braces:
if args:
dictionnaire[method] = args[0]
else:
dictionnaire[method] = []
elif len(args) == 1:
type_first = type(args[0])
if (
type_first not in (tuple, list)
and not (
self.numpy_array_to_list and type_first is numpy.ndarray
)
and ((type_first is not dict) or "__class__" in args[0])
):
dictionnaire[method] = args[0]
else:
dictionnaire[method] = list(args) # args is a tuple
else:
dictionnaire[method] = list(args) # args is a tuple
if listitems:
dictionnaire["__items__"] = listitems
elif dictitems:
dictionnaire["__items__"] = dictitems
if state:
if (type(state) is not dict) or (
hasattr(inst, "__setstate__") and not all_keys_are_str(state)
):
dictionnaire["__state__"] = state
else:
dictionnaire.update(state)
return dictionnaire
def __call__(self, obj, fp=None, return_bytes=None):
if return_bytes is None:
return_bytes = self.return_bytes
if (
type(obj) is list
and self.single_line_list_numbers
and _onlyOneDimSameTypeNumbers(obj)
):
return rapidjson.dumps(
obj,
ensure_ascii=False,
default=self._default_one_line,
bytes_mode=self.bytes_mode,
number_mode=self.number_mode,
iterable_mode=rapidjson.IM_ONLY_LISTS,
mapping_mode=rapidjson.MM_ONLY_DICTS,
# return_bytes = return_bytes
# **self.kargs
)
self._update_serialize_parameters()
self._reset()
self._root = obj
encoded = rapidjson.Encoder.__call__(
self, obj, stream=fp, chunk_size=self.chunk_size
)
self._clean()
return encoded
def _update_serialize_parameters(self):
blosc.set_nthreads(self.bytes_compression_threads)
serialize_parameters.__dict__.update(self.__dict__)
serialize_parameters.__dict__.update(self.plugins_parameters)
def _reset(self):
self.dumped_classes = set()
self._already_serialized = set()
# self._already_serialized_id_dic_to_obj_dic = dict()
def _clean(self):
del self._already_serialized
# del self.dumped_classes
# del self._already_serialized_id_dic_to_obj_dic
# @profile
# ,list_deep = 10):
def _searchSerializedParent(self, obj, already_explored=set(), attribut=None):
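# Walk up the gc referrers graph from obj towards the encoder root,
# collecting candidate json paths (dict keys, list indices or attribute
# names); used to build the {"$ref": "root..."} references for duplicates.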
root = self._root
if obj is root:
return [(["root"], False)]
id_obj = id(obj)
if id_obj in already_explored:
return []
already_explored = already_explored.copy()
already_explored.add(id_obj)
already_explored.add(id(locals()))
pathElements = list()
refs = gc.get_referrers(obj)
already_explored.add(id(refs))
# potential_parents = [parent_test for parent_test in gc.get_referrers(obj)if ((id(parent_test) not in already_explored) and isinstance(parent_test,(dict,list))) ]
# print(len(potential_parents))
for parent_test in refs:
id_parent_test = id(parent_test)
if id_parent_test not in already_explored:
type_parent_test = type(parent_test)
if type_parent_test is dict:
if self.sort_keys:
parent_test_keys = sorted(parent_test)
else:
parent_test_keys = parent_test.keys()
for key in parent_test_keys: # sorted(parent_test):
value = parent_test[key]
if value is obj:
for elt, is_attribut in self._searchSerializedParent(
parent_test, already_explored, attribut=obj
):
if is_attribut:
pathElements.append((elt, False))
else:
pathElements.append((elt + [f"['{key}']"], False))
break
elif (
type_parent_test is list
and not type(parent_test[-1]) is list_iterator
):
for key, value in enumerate(parent_test):
if value is obj:
for elt, _ in self._searchSerializedParent(
parent_test, already_explored
):
pathElements.append((elt + ["[%d]" % key], False))
break
elif id_parent_test in self._already_serialized:
if hasattr(parent_test, "__dict__"):
dic = self._dict_from_instance(parent_test)
for i, (key, value) in enumerate(dic.items()):
if value is attribut:
for elt, _ in self._searchSerializedParent(
parent_test, already_explored
):
pathElements.append((elt + [".", (i), key], True))
break
if hasattr(parent_test, "__slots__"):
dic = self._dict_from_instance(parent_test)
for i, (key, value) in enumerate(dic.items()):
if value is obj:
for elt, _ in self._searchSerializedParent(
parent_test, already_explored
):
pathElements.append((elt + [".", (i), key], True))
break
return pathElements
def _get_path(self, obj, already_explored=set()):
already_explored.add(id(locals()))
pathElements = self._searchSerializedParent(
obj, already_explored=already_explored
)
if not pathElements:
return None
# return f'impossible to find a path from root object for {obj}'
#raise Exception("impossible to find a path from root object for %s" % obj)
# print("!",pathElements)
# return pathElements[0][0]
return "".join([e for e in sorted(pathElements)[0][0] if isinstance(e, str)])
class Decoder(rapidjson.Decoder):
"""
Decoder for loading objects serialized in json files or strings.
Args:
file (string or file-like):
the json path or file-like object.
When specified, the decoder will read the json from this file
if you don't pass a file to the `load()` method later.
authorized_classes (set/list/tuple):
Defines the classes that serializejson is authorized to recreate from
the `__class__` keywords in json, in addition to the default authorized classes
and the classes authorized by plugins.
The default authorized classes are:
array.array, bytearray, bytes, range, set, slice, time.struct_time, tuple,
type, frozenset, collections.Counter, collections.OrderedDict,
collections.defaultdict, collections.deque, complex, datetime.date,
datetime.datetime, datetime.time, datetime.timedelta, decimal.Decimal,
numpy.array, numpy.bool_, numpy.dtype, numpy.float16, numpy.float32,
numpy.float64, numpy.frombuffer, numpy.int16, numpy.int32, numpy.int64,
numpy.int8, numpy.ndarray, numpy.uint16, numpy.uint32, numpy.uint64,
numpy.uint8, numpyB64.
authorized_classes must be a set/list/tuple of classes or strings
corresponding to the qualified names of classes (`module.class_name`).
If the loaded json contains an unauthorized `__class__`,
serializejson will raise a TypeError exception.
.. warning::
Do not load serializejson files from untrusted / unauthenticated
sources without carefully setting the `authorized_classes` parameter.
Never authorize "eval", "exec", "apply" or other functions or
classes which could allow the execution of malicious code
with for example:
``{"__class__":"eval","__init__":"do_bad_things()"}``
unauthorized_classes_as_dict (False by default):
Controls whether unauthorized classes should be decoded as dicts
without raising a TypeError (or as dotdicts if the dotdict parameter is True;
see the "dotdict" parameter for further explanation).
recognized_classes (set/list/tuple):
Classes (strings with qualified names, or classes) that
serializejson will try to recognize from key names.
A class will be recognized if the key names of a json dictionary are
a superset of the class's default attribute names.
A class's default attribute names are the slots and the attribute names in __dict__ not starting with "_"
after initialization (serializejson will create an instance of each class passed in recognized_classes in order to determine
these attributes).
The instance will be created with __new__ (with no argument), and __init__ will not be called.
If you want to execute some initialization code, you must add a
__setstate__() method to your object, or setters/properties with the Decoder's setters/properties
parameters activated.
updatables_classes (set/list/tuple):
Classes (strings with qualified names, or classes) that
serializejson will try to update if already present in the provided object `obj` when calling `load` or `loads`.
Objects of other classes will be recreated.
properties (bool, None, set/list/tuple, dict):
Controls whether `load` will call property setters instead of
putting the values in self.__dict__, when the object has no `__setstate__` method
and properties were merged with attributes in the state dictionary
when dumping (they are merged if strict_pickle is False).
- False : call property setters for no class (like pickle)
- True : (default) call property setters for all classes
- None : call only the property setters defined in the serializejson.properties dict (added by plugins or manually before the decoder call)
(see documentation section :ref:`"Add plugins to serializejson"<add-plugins-label>`.)
- set/list/tuple : call all property setters for classes in this set/list/tuple, in addition to properties defined in the serializejson.properties dict
[class1, class2, ..] (not secure with untrusted json, use it only for debugging)
- dict : call property setters defined in this dict, in addition to properties defined in the serializejson.properties dict
{class1: ["property1", "property2"], class2: True}
.. warning::
**The property setters are called in the json order!**
- in alphabetical order if `sort_keys = True` or if the object has no __getstate__ method.
- in the order returned by the __getstate__ method if `sort_keys = False`
- Be careful if you rename an attribute, because the property setter call order can change.
- If `properties = True` (or is a list) then serializejson's load will differ from pickle, which doesn't call property setters.
**It is best to add a __setstate__() method to your object:**
- If you want to stay compatible with pickle, with the same behavior.
- If you want to call property setters in a different order than alphabetical order and don't want to code a __getstate__ method giving the order.
- If you want to call property setters in an order robust to an attribute name change.
- If you want to be robust to a change of this `properties` parameter.
- If you want to avoid transitional states while setting attributes one by one.
In this method you can call the helper function:
serializejson.setstate(self, properties=True)
setters (bool, None, set/list/tuple, dict):
Controls whether `load` will try to call `setxxx`, `set_xxx` or `setXxx` methods
or the `xxx` property setter for each attribute of the serialized objects
when the object has no `__setstate__` method.
- False : call no setters other than those called in __setstate__ methods, like pickle.
- True : (default) explore and call all setters for all objects (not secure with untrusted json, use it only for debugging)
- None : call only the setters defined in the serializejson.setters dict (added by plugins or manually before the decoder call)
(see documentation section :ref:`"Add plugins to serializejson"<add-plugins-label>`.)
- set/list/tuple : explore and call setters for classes in this set/list/tuple, in addition to setters defined in the serializejson.setters dict
[class1, class2, ..] (not secure with untrusted json, use it only for debugging)
- dict : call setters defined in this dict, in addition to setters defined in the serializejson.setters dict
{class1: {"attribute_name": "setter_name", ...}, class2: True}
.. warning::
**The attribute setters are called in the json order!**
- in alphabetical order if `sort_keys = True` or if the object has no __getstate__ method.
- in the order returned by the __getstate__ method if `sort_keys = False`
- Be careful if you rename an attribute, because the setter call order can change.
- If `setters = True` (or is a list) then serializejson's load will differ from pickle, which doesn't call attribute setters.
**It is best to add a __setstate__() method to your object:**
- If you want to stay compatible with pickle, with the same behavior.
- If you want to call setters in a different order than alphabetical order and don't want to code a __getstate__ method giving the order.
- If you want to call setters in an order robust to an attribute name change.
- If you want to be robust to a change of this `setters` parameter.
- If you want to avoid transitional states while setting attributes one by one.
In this method you can call the helper function:
serializejson.setstate(self, setters=True or a dict {name: setter_name, ...})
strict_pickle (False by default):
If True, load with exactly the same behaviour as pickle:
- disabling property setters
- disabling setters
- disabling numpy_array_from_list
accept_comments (bool):
Controls whether serializejson accepts to parse json with comments.
numpy_array_from_list (bool):
Controls whether lists of bools, ints or floats with same-type elements should be loaded into numpy arrays.
numpy_array_from_heterogenous_list (bool):
Controls whether lists of bools, ints or floats with same-type or heterogeneous-type elements should be loaded into numpy arrays.
default_value:
The value returned if the path passed to `load` doesn't exist.
It allows having a default object at the first run of the script or
when the json has been deleted, without raising a FileNotFoundError.
chunk_size (int):
Chunk size used when reading the json file.
dotdict (bool):
load dicts as serializejson.dotdict, a dict subclass that allows accessing keys with dot syntax, like object attributes.
A dotdict will be serialized as a plain dict again when dumping.
dotdict allows you to more easily access the elements of a deserialized dictionary, with the same '.' access syntax as for an object, allowing you, if you wish, to later transform the dictionaries in your jsons into real objects by adding the "__class__" field, without having to modify your code.
add_jsonpath:
If True, the source json path will be added to the loaded object as a `_jsonpath` attribute.
If False (by default), nothing will be added to the loaded object, but you can still retrieve the source json path with the "serializejson.jsonpath" function, which finds the path from the object's identifier.
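Example (a minimal sketch of the dotdict option):
.. code-block:: python
import serializejson
decoder = serializejson.Decoder(dotdict=True)
loaded = decoder.loads('{"position": {"x": 1.0, "y": 2.0}}')
print(loaded.position.x)  # dot access on the loaded dict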
"""
"""
Inherited from rapidjson.Decoder:
number_mode (int): Enable particular behaviors in handling numbers
datetime_mode (int): How should datetime, time and date instances be handled
uuid_mode (int): How should UUID instances be handled
parse_mode (int): Whether the parser should allow non-standard JSON extensions (nan, -inf, inf )
"""
def __new__(
cls,
file=None,
*,
authorized_classes=None,
unauthorized_classes_as_dict=False,
recognized_classes=None,
updatables_classes=None,
setters=True,
properties=True,
default_value=no_default_value,
accept_comments=False,
numpy_array_from_list=False,
numpy_array_from_heterogenous_list=False,
chunk_size=65536,
strict_pickle=False,
dotdict=False,
add_jsonpath=False,
):
if accept_comments:
parse_mode = rapidjson.PM_COMMENTS
else:
parse_mode = rapidjson.PM_NONE
self = super().__new__(cls, parse_mode=parse_mode) # , **argsDict)
self.strict_pickle = strict_pickle
if strict_pickle:
setters = False
properties = False
numpy_array_from_list = False
numpy_array_from_heterogenous_list = False
add_jsonpath = False
self.file = file
self.setters = _get_setters(setters)
self.properties = _get_properties(properties)
self._authorized_classes_strs = _get_authorized_classes_strings(
authorized_classes
)
self.unauthorized_classes_as_dict = unauthorized_classes_as_dict
self._class_from_attributes_names = _get_recognized_classes_dict(
recognized_classes
)
self.set_updatables_classes(updatables_classes)
# self.accept_comments = accept_comments
# self.numpy_array_from_list=numpy_array_from_list
self.default_value = default_value
self.chunk_size = chunk_size
self.dotdict = dotdict
self.add_jsonpath = add_jsonpath
self.file_iter = None
self._updating = False
self.numpy_array_from_list = numpy_array_from_list
self.numpy_array_from_heterogenous_list = numpy_array_from_heterogenous_list
if numpy_array_from_heterogenous_list:
self.numpy_array_from_list = True
self.end_array = self._end_array_if_numpy_array_from_heterogenous_list
elif numpy_array_from_list:
self.end_array = self._end_array_if_numpy_array_from_list
return self
def load(self, file=None, obj=None):
"""
Load object from json file.
Args:
file (optional str or file-like):
the json path or file-like object.
When specified, json is read from this file.
Otherwise json is read from the file passed to `Decoder()` constructor.
obj (optional):
If provided, the object `obj` will be updated and no new object will be created.
Return:
created object or updated object if passed obj.
"""
if file is None:
file = self.file
path = None
if isinstance(file, str):
path = file
# print("load",file)
if not os.path.exists(file):
if self.default_value is no_default_value:
raise FileNotFoundError(
errno.ENOENT, os.strerror(errno.ENOENT), file
)
return self.default_value
file = _open_with_good_encoding(file)
elif file is None:  # otherwise assumed to be a file pointer
raise ValueError('Decoder.load needs a "file" path/file argument')
loaded = self.__call__(json=file, obj=obj)
if path:
if self.add_jsonpath:
loaded._jsonpath = path
id_to_path[id(loaded)] = path
return loaded
def loads(self, json, obj=None):
"""
Load object from json string or bytes.
Args:
json:
the json string or bytes.
obj (optional):
If provided, the object `obj` will be updated and no new object will be created.
Return:
created object or updated object if passed obj.
"""
return self.__call__(json=json, obj=obj)
def set_default_value(self, value=no_default_value):
"""
Set the value returned if the path passed to load doesn't exist.
It allows having a default object at the first run of the script or
when the json has been deleted, without raising a FileNotFoundError.
decoder.set_default_value() without any argument will remove the default_value
and reactivate the raising of FileNotFoundError.
"""
self.default_value = value
def set_authorized_classes(self, classes):
"""
Define the classes that serializejson is authorized to recreate from
the `__class__` keywords in json, in addition to the usual classes.
The usual classes are: complex, bytes, bytearray, Decimal, type, set,
frozenset, range, slice, deque, datetime, timedelta, date, time,
numpy.array, numpy.dtype.
authorized_classes must be a list of classes or strings
corresponding to the qualified names of classes (`module.class_name`).
If the loaded json contains an unauthorized `__class__`,
serializejson will raise a TypeError exception.
.. warning::
Do not load serializejson files from untrusted / unauthenticated
sources without carefully setting the `authorized_classes` parameter.
Never authorize "eval", "exec", "apply" or other functions or
classes which could allow execution of malicious code
with for example :
``{"__class__":"eval","__init__":"do_bad_things()"}``
"""
self._authorized_classes_strs = _get_authorized_classes_strings(classes)
def set_recognized_classes(self, classes):
"""
Set the classes (strings with qualified names, or classes) that
serializejson will try to recognize from key names.
"""
self._class_from_attributes_names = _get_recognized_classes_dict(classes)
[docs] def set_updatables_classes(self, updatables):
"""
Set the classes (string with qualified name or classes) that
serializejson will try to update if already in the provided object `obj` when loading with `load` or `loads`.
Otherwise the objects are recreated.
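example (a minimal sketch; "myapp.MyClass" is a hypothetical qualified name):
>>> decoder.set_updatables_classes(["myapp.MyClass"])
>>> decoder.load("state.json", obj=existing_instance)  # updates in place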
"""
updatableClassStrs = set()
if updatables is not None:
for updatable in updatables:
if isinstance(updatable, str):
updatableClassStrs.add(updatable)
else:
updatableClassStrs.add(class_str_from_class(updatable))
self.updatableClassStrs = updatableClassStrs
def start_object(self):
dict_ = dict()
if (
self.root is None and self.json_startswith_curly
):  # not necessarily the real root: the root could be a list, for example
self.root = dict_
if self._updating:
id_ = id(dict_)
self.ancestors.append(id_)
return dict_
def end_object(self, inst):
# self._counter += 1
# self._deserializeds.add()
if self._updating:
self.ancestors.pop()  # pops itself
class_str = inst.get("__class__", None)
if class_str:
if self._updating:
if class_str in self.updatableClassStrs:
ancestor = self.ancestors[-1]
self.node_has_descendants_to_recreate.add(ancestor)
else:
return self._exploreDictToReCreateObjects(
inst
)  # ideally we would avoid exploring and rehydrate the descendants directly; but hydration is not done in place, so the containing objects (e.g. a list) would not get their fields updated
else:
return self._inst_from_dict(inst)
# for recognizing an object just from its attributes
elif "$ref" in inst and len(inst) == 1:
if self.root:
inst_potential = from_name(
inst["$ref"], accept_dict_as_object=True, root=self.root
)  # try to resolve the reference immediately if possible
if inst is inst_potential:
raise Exception('{"$ref": "%s"} pointing to itself' %
inst["$ref"])
if not type(inst_potential) is dict:
# check that it is not an object that has not been recreated yet
return inst_potential
if "__class__" not in inst_potential:
return inst_potential
inst_potential_epured = {
key: inst_potential[key]
for key in ["__class__", "__init__", "__new__"]
if key in inst_potential
}
inst = self._inst_from_dict(inst_potential_epured)
inst_potential["__class__"] = inst
return inst
self.duplicates_to_replace.append(inst)
elif self._class_from_attributes_names:
class_from_attributes_names = self._class_from_attributes_names
attributes_tuple = tuple(sorted(inst))
if attributes_tuple in class_from_attributes_names:
inst["__class__"] = class_from_attributes_names[attributes_tuple]
recognized = True
else:
attributes_set = set(attributes_tuple)
for attribute_names in class_from_attributes_names.keys():
if attributes_set.issuperset(attribute_names):
inst["__class__"] = class_from_attributes_names[attribute_names]
recognized = True
break
else:
recognized = False
if recognized:
if self._updating:
if inst["__class__"] in self.updatableClassStrs:
ancestor = self.ancestors[-1]
self.node_has_descendants_to_recreate.add(ancestor)
else:
# ideally we would avoid exploring and rehydrate the descendants directly; but hydration is not done in place, so the containing objects (e.g. a list) would not get their fields updated
return self._exploreDictToReCreateObjects(inst)
else:
# no authorization check: recognized objects are considered authorized
return instance(**inst)
if self.dotdict:
return dotdict(inst)
return inst
def __call__(self, json, obj=None):
"""
Args:
json : file-like, str or bytes (UTF-8) containing the JSON to be decoded
obj : object to update (optional)
Returns:
a python value
examples:
>>> decoder = Decoder()
>>> decoder('"€ 0.50"')
'€ 0.50'
>>> decoder(b'"\xe2\x82\xac 0.50"')
'€ 0.50'
>>> decoder(io.StringIO('"€ 0.50"'))
'€ 0.50'
>>> decoder(io.BytesIO(b'"\xe2\x82\xac 0.50"'))
'€ 0.50'
"""
blosc.set_nthreads(blosc.ncores)
serialize_parameters.strict_pickle = self.strict_pickle
serialize_parameters.setters = self.setters
serialize_parameters.properties = self.properties
self.converted_numpy_array_from_lists = set()
# self._counter = 0
self._updating = False
# for duplicates -----------
self.root = None
if isinstance(json, str):
self.json_startswith_curly = json.startswith("{")
elif isinstance(json, bytes):
self.json_startswith_curly = json.startswith(b"{")
else:
self.json_startswith_curly = json.read(1) in ("{", b"{")
json.seek(0)
self.duplicates_to_replace = []
# for updating ------------------
if obj is None:
self._updating = False
loaded = rapidjson.Decoder.__call__(self, json, chunk_size=self.chunk_size)
else: # update
self._updating = True
self.ancestors = deque()
self.ancestors.append(None)
self.node_has_descendants_to_recreate = set()
loaded_dict = rapidjson.Decoder.__call__(
self, json, chunk_size=self.chunk_size
)
loaded = self._exploreToUpdate(obj, loaded_dict)
# resolve the duplicates that could not be resolved during deserialization (inside a list, or a duplicate referencing a parent)
duplicates_to_replace = self.duplicates_to_replace
if duplicates_to_replace:
gc.collect()  # not sure this is essential, but just in case; see https://docs.python.org/3/library/gc.html#gc.get_referrers
while duplicates_to_replace:
duplicate_to_replace = duplicates_to_replace.pop()
referenced = from_name(
duplicate_to_replace["$ref"],
accept_dict_as_object=True,
root=loaded,
)
if referenced is duplicate_to_replace:
raise Exception('{"$ref": "%s"} pointing to himself' %
duplicate_to_replace["$ref"])
refs = gc.get_referrers(duplicate_to_replace)
skip = (locals(), refs)
for parent in refs:
if parent not in skip:
if type(parent) is dict:
for key, value in parent.items():
if value is duplicate_to_replace:
parent[key] = referenced
break
elif type(parent) is list:
for key, value in enumerate(parent):
if value is duplicate_to_replace:
parent[key] = referenced
break
elif hasattr(parent, "__slots__"):
for slot in parent.__slots__:
if (
hasattr(parent, slot)
and getattr(parent, slot) is duplicate_to_replace
):
setattr(parent, slot, referenced)
break
# clean ---------------
del self.duplicates_to_replace
if self._updating:
del self.ancestors
del self.node_has_descendants_to_recreate
self._updating = False
if obj is not None:
return obj
return loaded
def __iter__(self):
self._updating = False
file = self.file
if isinstance(file, str):
if not os.path.exists(file):
return iter([self.default_value])
self.file_iter = _json_object_file_iterator(file, mode="rb")
else:
raise Exception("not yet able to load_iter on %s" % str(type(file)))
return self
def _inst_from_dict(self, inst):
class_str = inst["__class__"]
if class_str in self._authorized_classes_strs or not isinstance(class_str, str):
for key in ("__init__", "__new__", "__items__"):
if key in inst:
if (
self.numpy_array_from_list
and isinstance(inst[key], numpy.ndarray)
and id(inst[key]) in self.converted_numpy_array_from_lists
):
inst[key] = inst[key].tolist()
if key != "__items__" and class_str in remove_add_braces:
inst[key] = (inst[key],)
if (
inst["__class__"] == "dict_non_str_keys"
):  # placed here because it is too json-specific to live in tools (which is shared with serializePython and serializeRepr)
return dict_non_str_keys(inst)
return instance(**inst)
if self.unauthorized_classes_as_dict:
if self.dotdict:
warnings.warn(
f"{inst['__class__']} not in authorized_classes leaved as docdict",
Warning,
)
return dotdict(inst)
warnings.warn(
f"{inst['__class__']} not in authorized_classes leaved as dict", Warning
)
return inst
raise TypeError(f"{inst['__class__']} is not in authorized_classes")
# @profile
def _exploreToUpdate(self, obj, loaded_node):
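"""Recursively update `obj` in place from the decoded json tree
`loaded_node`; nodes whose class is not updatable are recreated
instead of updated."""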
# handle the case where loaded_node is a dict ----------------------
if isinstance(loaded_node, dict):
obj_keys = None  # rather than an empty set: an object may have no initialized attributes or slots
obj_class = obj.__class__
if obj_class is dict and ("dict" in self.updatableClassStrs):
is_dict = True
obj_keys = set(obj)
else:  # make sure it is an instance
is_dict = False
class_str = loaded_node.get("__class__")
if (
(class_str is not None)
and (class_str in self.updatableClassStrs)
and (class_str == class_str_from_class(obj_class))
):
if class_str == "set":
# the set itself can be updated, BUT NOT THE OBJECTS INSIDE IT, because we cannot know which existing element corresponds to which json element
obj.clear()
obj.update(self._exploreDictToReCreateObjects(loaded_node))
return obj
if hasattr(obj, "__setstate__"):
# hasMethod(inst, "__setstate__") had to be replaced with hasattr(inst, "__setstate__") to be able to deserialize sklearn.tree._tree.Tree from json: "__setstate__" was not recognized there as a method even though it is present
if "__state__" in loaded_node:
obj.__setstate__(loaded_node["__state__"])
else:
loaded_node.__delitem__("__class__")
if "__init__" in loaded_node:
loaded_node.__delitem__("__init__")
obj.__setstate__(loaded_node)
return obj
if hasattr(obj, "__dict__"):
# TO REVIEW: does not work with slots
obj_keys = set(obj.__dict__)
if hasattr(obj, "__slots__"):
if obj_keys is None:
obj_keys = set()
for slot in slots_from_class(obj_class):
if hasattr(obj, slot):
obj_keys.add(slot)
if obj_keys is not None:
if not is_dict:
setters = serialize_parameters.setters
if type(setters) is dict:
setters = setters.get(obj_class, False)
if setters is True:
setters = setters_names_from_class(obj_class)
# update when the pre-existing object is an instance (with __dict__; __slots__ not handled yet) or a dict --
loaded_node_has_descendants_to_recreate = (
id(loaded_node) in self.node_has_descendants_to_recreate
)
# remove the object's attributes that are not in loaded_node
only_in_obj = obj_keys - set(loaded_node)
for key in only_in_obj:
if is_dict:
del obj[key]
elif not key.startswith("_"):
obj.__delattr__(key)
# update or recreate the other attributes
for key, value in loaded_node.items():
if key not in ("__class__", "__init__"):
if key in obj_keys:
if is_dict:
old_value = obj[key]
else:
old_value = obj.__getattribute__(key)
value = self._exploreToUpdate(old_value, value)
elif loaded_node_has_descendants_to_recreate:
if isinstance(value, dict):
value = self._exploreDictToReCreateObjects(value)
elif isinstance(value, list):
value = self._exploreListToReCreateObjects(value)
if is_dict:
obj[key] = value
elif setters and key in setters:
obj.__getattribute__(setters[key])(value)
else:
obj.__setattr__(key, value)
return obj
return self._exploreDictToReCreateObjects(loaded_node)
# handle the case where loaded_node is a list ---------------------------
if isinstance(loaded_node, list):
if isinstance(obj, list) and ("list" in self.updatableClassStrs):
# update when the pre-existing object is a list
len_obj = len(obj)
del obj[len(loaded_node):]
for i, value in enumerate(loaded_node):
if i < len_obj and isinstance(value, (list, dict)):
obj[i] = self._exploreToUpdate(obj[i], value)
else:
if isinstance(value, dict):
value = self._exploreDictToReCreateObjects(value)
elif isinstance(value, list):
value = self._exploreListToReCreateObjects(value)
obj.append(value)
return obj
else:  # otherwise replace
return self._exploreListToReCreateObjects(loaded_node)
# handle the remaining cases
return loaded_node  # replace
def _exploreDictToReCreateObjects(self, loaded_node):
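"""Walk a decoded dict, recreate any descendant objects flagged during
parsing, then turn the dict itself into an instance if it carries a
`__class__` key."""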
if id(loaded_node) in self.node_has_descendants_to_recreate:
for key, value in loaded_node.items():
if isinstance(value, dict): # and "__class__" in value
loaded_node[key] = self._exploreDictToReCreateObjects(value)
elif isinstance(value, list):
loaded_node[key] = self._exploreListToReCreateObjects(value)
if "__class__" in loaded_node:
return self._inst_from_dict(loaded_node)
else:
return loaded_node
def _exploreListToReCreateObjects(self, loaded_node):
for i, value in enumerate(loaded_node):
if isinstance(value, dict):
loaded_node[i] = self._exploreDictToReCreateObjects(value)
elif isinstance(value, list):
loaded_node[i] = self._exploreListToReCreateObjects(value)
return loaded_node
# ---------------------------------
def _end_array_if_numpy_array_from_list(self, sequence):
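"""Convert a just-decoded json list into a numpy array when all its
elements are numbers of the same type, or ndarrays sharing the same
dtype and shape; otherwise return the list unchanged."""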
if _onlyOneDimSameTypeNumbers(sequence):
array = numpy.array(sequence, dtype=type(sequence[0]))
self.converted_numpy_array_from_lists.add(id(array))
return array
if len(sequence) and isinstance(sequence[0], ndarray):
first_elt = sequence[0]
first_elt_shape = first_elt.shape
first_elt_dtype = first_elt.dtype
if all(
(
isinstance(elt, ndarray)
and elt.dtype is first_elt_dtype
and elt.shape == first_elt_shape
)
for elt in sequence
):
array = numpy.array(sequence, dtype=first_elt_dtype)
self.converted_numpy_array_from_lists.add(id(array))
return array
return sequence
def _end_array_if_numpy_array_from_heterogenous_list(self, sequence):
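"""Convert a just-decoded json list into a numpy array when it is a flat
list of numbers (possibly of mixed numeric types, letting numpy choose
the dtype), or a list of ndarrays sharing the same shape; otherwise
return the list unchanged."""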
if _onlyOneDimNumbers(sequence):
array = numpy.array(sequence)
self.converted_numpy_array_from_lists.add(id(array))
return array
if len(sequence) and isinstance(sequence[0], ndarray):
first_elt = sequence[0]
first_elt_shape = first_elt.shape
if all(
(isinstance(elt, ndarray) and elt.shape == first_elt_shape)
for elt in sequence
):
array = numpy.array(sequence)
self.converted_numpy_array_from_lists.add(id(array))
return array
return sequence
def __next__(self):
try:
return rapidjson.Decoder.__call__(
self, self.file_iter, chunk_size=self.chunk_size
)
except rapidjson.JSONDecodeError as error:
self.file_iter.close()
if error.args[0] == "Parse error at offset 0: The document is empty.":
raise StopIteration
else:
raise
# ----------------------------------------------------------------------------------------------------------------------------
# --- INTERNALS -----------------------------------------------------------------------------------------------------
# ----------------------------------------------------------------------------------------------------------------------------
class dotdict(dict):
"""dot notation access to dictionary attributes"""
def __getattr__(self, attr):
try:
return self[attr]
except KeyError:
raise AttributeError()
__setattr__ = dict.__setitem__
__delattr__ = dict.__delitem__
def bool_or_set(value):
if value is None:
return set()
if isinstance(value, (bool, set)):
return value
if isinstance(value, (list, tuple)):
return set(value)
else:
raise TypeError
def bool_or_dict(value):
if value is None:
return dict()
if isinstance(value, (bool, dict)):
return value
if isinstance(value, (set, list, tuple)):
return {key: True for key in value}
else:
raise TypeError
def dict_non_str_keys(dict_):
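"""Rebuild a dict whose non-str keys were serialized as strings: each key
is decoded back to its original python value (json, quoted str, bytes or
base64), and list keys become tuples."""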
d = dict()
del dict_["__class__"]
for key, value in dict_.items():
try:
key = loads(key)
except Exception:
if key.endswith("'"):
if key.startswith("'"):
key = key[1:-1]
elif key.startswith("b'"):
key = key[2:-1].encode("ascii_printables")
elif key.startswith("b64'"):
key = b64decode(key[4:])
else:
if type(key) is list:
key = tuple(key)
d[key] = value
return d
def all_keys_are_str(dict_):
for key in dict_:
if type(key) != str:
return False
return True
if use_numpy:
_numpy_float_dtypes = set(
(numpy.dtype("float16"), numpy.dtype("float32"), numpy.dtype("float64"))
)
_numpy_types = set(
(
numpy.bool_,
numpy.int8,
numpy.int16,
numpy.int32,
numpy.int64,
numpy.uint8,
numpy.uint16,
numpy.uint32,
numpy.uint64,
numpy.float16,
numpy.float32,
numpy.float64,
)
)
_numpy_float_types = set(
(
numpy.float16,
numpy.float32,
numpy.float64,
)
)
_numpy_int_types = set(
(
numpy.int8,
numpy.int16,
numpy.int32,
numpy.int64,
numpy.uint8,
numpy.uint16,
numpy.uint32,
numpy.uint64,
)
)
_numpy_dtypes_to_python_types = {numpy.bool_: bool}
for numpy_type in _numpy_int_types:
_numpy_dtypes_to_python_types[numpy_type] = int
for numpy_type in _numpy_float_types:
_numpy_dtypes_to_python_types[numpy_type] = float
else:
_numpy_types = set()
NoneType = type(None)
remove_add_braces = {
"set",
"frozenset",
"tuple",
"collections.OrderedDict",
"collections.Counter",
}
def _close_for_append(fp, indent):
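"""Write the closing "]" (with a leading newline when indenting) after an
append, falling back to text mode if the file expects str instead of bytes."""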
if indent is None:
try:
fp.write(b"]")
except TypeError:
fp.write("]")
else:
try:
fp.write(b"\n]")
except TypeError:
fp.write("\n]")
def _open_for_append(fp, indent):
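"""Open `fp` for appending to an already serialized json list: detect the
file encoding, strip the trailing "]" and write a "," separator, or start
a new "[" if the file does not exist yet."""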
length = 0
remove_last_square_close = True
if isinstance(fp, str):
path = fp
if os.path.exists(path):
fp = open(path, "rb+")
# detect encoding
bytes_ = fp.read(3)
len_bytes = len(bytes_)
if len_bytes:
if bytes_[0] == 0:
if bytes_[1] == 0:
fp = open(path, "r+", encoding="utf_32_be")
else:
fp = open(path, "r+", encoding="utf_16_be")
elif len_bytes > 1 and bytes_[1] == 0:
if len_bytes > 2 and bytes_[2] == 0:
fp = open(path, "r+", encoding="utf_32_le")
else:
fp = open(path, "r+", encoding="utf_16_le")
# remove last ]
remove_last_square_close = True
else:
fp = open(path, "wb")
remove_last_square_close = False
elif fp is None:
raise Exception("Incorrect file (file, str ou unicode)")
if remove_last_square_close:
fp.seek(0, 2)
length = fp.tell()
if length == 1:
fp.close()
raise Exception("serializejson can append only to serialized lists")
if length >= 2:
fp.seek(-1, 2)  # go to the last character
last_char = fp.read(1)
if last_char in (b"]", "]"):
fp.seek(-2, 2)
before_last_char = fp.read(1)
if before_last_char in (b"\n", "\n"):
fp.seek(-2, 2)
else:
fp.seek(-1, 2)  # go to the last character
fp.truncate()
else:
fp.close()
raise Exception("serializejson can append only to serialized lists")
if length == 0:
if indent is None:
fp.write(b"[")
else:
fp.write(b"[\n")
elif length > 2:
if indent is None:
try:
fp.write(b",")
except TypeError:
fp.write(",")
else:
try:
fp.write(b",\n")
except TypeError:
fp.write(",\n")
return fp
def _open_with_good_encoding(path):
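"""Open a json file with the right text encoding, detected from the first
bytes: json starts with an ASCII character, so the positions of zero bytes
reveal the utf-16/utf-32 byte orders; a utf-8 BOM is also handled."""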
# https://stackoverflow.com/questions/4990095/json-specification-and-usage-of-bom-charset-encoding/38036753
fp = open(path, "rb")
bytes_ = fp.read(3)
fp.seek(0)
len_bytes = len(bytes_)
if len_bytes:
if (
bytes_ == b"\xef\xbb\xbf"
):  # normally this should not happen (json should never start with a BOM), but a file created by hand in a text editor may contain one
fp = open(path, "r", encoding="utf_8_sig")
elif bytes_[0] == 0:
if bytes_[1] == 0:
fp = open(path, "r", encoding="utf_32_be")
else:
fp = open(path, "r", encoding="utf_16_be")
elif len_bytes > 1 and bytes_[1] == 0:
if len_bytes > 2 and bytes_[2] == 0:
fp = open(path, "r", encoding="utf_32_le")
else:
fp = open(path, "r", encoding="utf_16_le")
return fp
def _get_authorized_classes_strings(classes):
if not type(classes) in (set, list, tuple):
if classes is None:
classes = set()
else:
classes = [classes]
_authorized_classes_strs = authorized_classes.copy()
for elt in classes:
if not type(elt) is str:
elt = class_str_from_class(elt)
_authorized_classes_strs.add(elt)
return _authorized_classes_strs
def _get_recognized_classes_dict(classes):
if classes is None:
return dict()
if not isinstance(classes, (list, tuple)):
classes = [classes]
_class_from_attributes_names = dict()
for class_ in classes:
if isinstance(class_, str):
classToRecStr = class_
classToRecClass = class_from_class_str(class_)
else:
classToRecStr = class_str_from_class(class_)
classToRecClass = class_
serialized_attributes = []
empty_instance = classToRecClass()
for attribute in list(empty_instance.__dict__.keys()) + slots_from_class(class_):
if not attribute.startswith("_"):
serialized_attributes.append(attribute)
serialized_attributes = tuple(sorted(serialized_attributes))
_class_from_attributes_names[serialized_attributes] = classToRecStr
return _class_from_attributes_names
class _json_object_file_iterator(io.FileIO):
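"""Incremental file reader used by `__iter__`: `read` scans quotes, braces
and brackets so that each call returns the bytes of exactly one element of
a top-level json list, letting rapidjson decode the elements one at a time."""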
def __init__(self, fp, mode, **kwargs):
io.FileIO.__init__(self, fp, mode=mode, **kwargs)
self.in_quotes = False
self.in_curlys = 0
self.in_squares = 0
self.in_simple = False
self.in_object = False
self.backslash_escape = False
self.schedule_break = False
self.in_chunk_start = 0
self.s = None
# s = io.FileIO.read(self, 1)
# if s not in (b"[", "["):
# raise Exception('the json data must start with "["')
if "b" in mode:
self.interesting = set(b'\\"{}[]')
self.separators = set(b", \t\n\r")
self.chars = list(b'\\"{}[]')
else:
self.interesting = set('\\"{}[]')
self.separators = set(", \t\n\r")
self.chars = list('\\"{}[]')
def read(self, size=-1):
if self.schedule_break:
self.schedule_break = False
# print("read(1): empty")
return ""
(
backslash,
doublecote,
curly_open,
curly_close,
square_open,
square_close,
) = self.chars
interesting = self.interesting
separators = self.separators
in_quotes = self.in_quotes
in_curlys = self.in_curlys
in_squares = self.in_squares
in_simple = self.in_simple
in_object = self.in_object
backslash_escape = self.backslash_escape # true if we just saw a backslash
in_chunk_start = self.in_chunk_start
if in_chunk_start == 0:
s = self.s = io.FileIO.read(self, size)
else:
s = self.s
for i in range(in_chunk_start, len(s)):
ch = s[i]
if in_simple:
if ch in separators or ch in ("]", 93):
if in_chunk_start < i:
# schedule a stop at the next read; otherwise parsing would stop anyway and we could not reset self.schedule_break to False
self.schedule_break = True
# self.seek(chunk_start + i + 1)
self.in_chunk_start = (i + 1) % len(s)
self.in_quotes = False
self.in_curlys = 0
self.in_squares = in_squares
self.in_simple = False
self.in_object = False
# print("read(2): ",s[in_chunk_start:i])
return s[in_chunk_start:i]
elif ch in interesting:
check = False
if in_quotes:
if backslash_escape:
# we must have just seen a backslash; reset that flag and continue
backslash_escape = False
elif ch == backslash:
backslash_escape = True # we are in a quote and we see a backslash; escape next char
elif ch == doublecote:
in_quotes = False
# signal that we just closed something and that a check is needed
check = True
elif ch == doublecote: # "
in_quotes = True
in_object = True
elif ch == curly_open: # {
in_curlys += 1
in_object = True
elif ch == curly_close: # }
in_curlys -= 1
check = True
elif ch == square_open: # [
in_squares += 1
if in_squares > 1:
in_object = True
else:
in_chunk_start = (i + 1) % len(s)
elif ch == square_close: # ]
in_squares -= 1
check = True
if not in_squares:  # we reached the end of the json list
return ""
if check and not in_quotes and not in_curlys and in_squares < 2:
if in_chunk_start < (i + 1):
# schedule a stop at the next read; otherwise parsing would stop anyway and we could not reset self.schedule_break to False
self.schedule_break = True
# self.seek(chunk_start + i + 1)
self.in_chunk_start = (i + 1) % len(s)
self.in_quotes = False
self.in_curlys = 0
self.in_squares = in_squares
self.in_simple = False
self.in_object = False
# print("read(3): ",s[in_chunk_start: i + 1])
return s[in_chunk_start: i + 1]
elif not in_object:
if ch in separators:
in_chunk_start = i + 1
else:
in_simple = True
self.in_quotes = in_quotes
self.in_curlys = in_curlys
self.in_squares = in_squares
self.in_simple = in_simple
self.in_object = in_object
self.backslash_escape = backslash_escape
self.in_chunk_start = 0
if in_chunk_start:
# print("read(4): ",s[in_chunk_start:])
return s[in_chunk_start:]
return s
id_to_path = dict()