serializejson

Authors	Baptiste de La Gorce
PyPI	https://pypi.org/project/serializejson
Documentation	https://smartaudiotools.github.io/serializejson
Sources	https://github.com/SmartAudioTools/serializejson
Issues	https://github.com/SmartAudioTools/serializejson/issues
Noncommercial license	Prosperity Public License 3.0.0
Commercial license	Patron License 1.0.0 ⇒ Sponsor me or contact me!

serializejson is a Python library for fast serialization and deserialization of Python objects in JSON, designed as a safe, interoperable and human-readable drop-in replacement for the Python pickle package. Complex Python object hierarchies can be serialized, deserialized or updated in one go, which lets you, for example, save or restore a complete application state in a few lines of code. The library is built upon python-rapidjson, pybase64 and blosc for optional zstandard compression.

Some of the main features:

supports Python 3.7 or later (possibly earlier versions too).
serializes arbitrary Python objects into a dictionary by adding a __class__ key and, when needed, __init__, __new__, __state__ and __items__ keys.
calls the same object methods as pickle, so almost all picklable objects are serializable with serializejson without any modification.
objects that are not yet picklable can always be made serializable by adding methods to them or by creating plugins for pickle or serializejson.
generally 2x slower than pickle for dumping and 3x slower for loading (on our benchmarks), except for big arrays (an optimization is planned).
serializes and deserializes bytes and bytearray objects very quickly in base64, thanks to pybase64 and lossless blosc compression.
can serialize properties and attributes with getters and setters if wanted (unlike pickle).
JSON data remains directly loadable even if you have turned some attributes into slots or properties since your last serialization (unlike pickle).
can serialize __init__(self, ...) arguments by name instead of by position, which allows skipping arguments that have default values and makes JSON data robust to a change in the order of __init__ parameters.
serialized objects generally take less space than with pickle: for binary data, the roughly 33% size increase due to base64 encoding is usually more than compensated by lossless blosc compression.
serialized objects are human-readable. Unlike pickled data, your data will never become unreadable when your code evolves: you can always modify it with a text editor (for example with find & replace if you rename an attribute).
serialized objects are plain text and can therefore be versioned and compared with version-control and diff tools.
can safely load untrusted / unauthenticated sources if the authorized_classes parameter is carefully restricted to the strictly necessary classes (unlike pickle).
can update existing objects recursively instead of overriding them, so serializejson can save and restore a complete application state in place (⚠ not yet well tested).
filters attributes starting with “_” by default (unlike pickle). You can keep them with attributes_filter = False.
numpy arrays can be serialized as lists, with automatic conversion in both directions, or in a conservative (non-destructive) way.
supports circular references and serializes duplicated objects only once, using a “$ref” key with the path to the first occurrence in the JSON: {“$ref”: “root.xxx.elt”} (⚠ not yet supported when the object is a list or dictionary).
accepts JSON with comments (// and /* */) if accept_comments = True.
can automatically recognize objects in JSON from their key names and recreate them without a __class__ key, if their classes are passed in recognized_classes.
serializejson is easily interoperable outside the Python ecosystem, thanks to this recognition of objects from key names or to the translation of __class__ between Python classes and classes of other languages.
dump and load accept string paths.
can iteratively encode (with append) and decode (with iterator) a list in a JSON file, which saves memory during serialization and deserialization and is useful for logs.
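The dictionary keys listed above come from the same object protocol that pickle uses. As a rough illustration only, the hypothetical helper to_tagged_dict below asks an object for its pickle reduction (obj.__reduce_ex__(4), the call pickle makes) and maps the pieces to __class__, __init__, __new__ and __state__ keys. It handles just the two most common reduction shapes, ignores __items__, plugins and attribute filtering, and its output layout (for instance "builtins.set" and the nested [[1, 2]] argument list) is not guaranteed to match what serializejson writes.

```python
import json


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def to_tagged_dict(obj):
    """Hypothetical helper: map pickle's __reduce_ex__(4) output to a tagged dict."""
    reduced = obj.__reduce_ex__(4)
    func, args = reduced[0], reduced[1]
    state = reduced[2] if len(reduced) > 2 else None
    tagged = {}
    if getattr(func, "__name__", "") == "__newobj__":
        # Default protocol 2+ path: args is (cls, *new_args), the check pickle itself uses.
        cls = args[0]
        tagged["__class__"] = f"{cls.__module__}.{cls.__qualname__}"
        if len(args) > 1:
            tagged["__new__"] = list(args[1:])
    else:
        # Types such as set reduce to (cls, init_args, state).
        tagged["__class__"] = f"{func.__module__}.{func.__qualname__}"
        tagged["__init__"] = list(args)
    if state is not None:
        tagged["__state__"] = state
    return tagged


print(json.dumps(to_tagged_dict({1, 2})))
print(json.dumps(to_tagged_dict(Point(1, 2))))
```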

Warning

⚠ Do not load serializejson files from untrusted / unauthenticated sources without carefully setting the authorized_classes parameter of load.

⚠ Never dump a dictionary containing a __class__ key, otherwise serializejson will attempt to reconstruct an object when loading the JSON. Be careful never to let a user enter a dictionary key without checking that it is not __class__. Due to a current limitation of rapidjson, we cannot yet efficiently detect dictionaries with a __class__ key in order to raise an error.
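Because this check is not done for you, one option is to validate user-editable data before dumping it. The helper below is a hypothetical sketch, not part of serializejson: it walks plain dicts, lists and tuples and yields the path of every dictionary that contains a __class__ key, using a root.key path style similar to the “$ref” paths mentioned above, so that you can reject or rename those keys.

```python
def find_class_keys(data, path="root"):
    """Hypothetical helper: yield the path of every dict that has a "__class__" key."""
    if isinstance(data, dict):
        if "__class__" in data:
            yield path
        for key, value in data.items():
            yield from find_class_keys(value, f"{path}.{key}")
    elif isinstance(data, (list, tuple)):
        for index, value in enumerate(data):
            yield from find_class_keys(value, f"{path}[{index}]")


user_data = {"name": "preset", "tags": [{"__class__": "eval"}]}
offending = list(find_class_keys(user_data))
if offending:
    print("refusing to dump, __class__ key found at:", offending)
```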

Installation

Latest official release

pip install serializejson

Unreleased development version

pip install git+https://github.com/SmartAudioTools/serializejson.git

Examples

Serialization with the functions API

import serializejson

# serialize to a string
object1 = set([1,2])
dumped1 = serializejson.dumps(object1)
loaded1 = serializejson.loads(dumped1)
print(dumped1)
>{
>        "__class__": "set",
>        "__init__": [1,2]
>}


# serialize to a file
object2 = set([3,4])
serializejson.dump(object2,"dumped2.json")
loaded2 = serializejson.load("dumped2.json")

Serialization with the classes API

import serializejson
encoder = serializejson.Encoder()
decoder = serializejson.Decoder()

# serialize to a string

object1 = set([1,2])
dumped1 = encoder.dumps(object1)
loaded1 = decoder.loads(dumped1)
print(dumped1)

# serialize to a file
object2 = set([3,4])
encoder.dump(object2,"dumped2.json")
loaded2 = decoder.load("dumped2.json")

Update an existing object

import serializejson
object1 = set([1,2])
object2 = set([3,4])
dumped1 = serializejson.dumps(object1)
print(f"id {id(object2)} :  {object2}")
# update object2 in place instead of creating a new set
serializejson.loads(dumped1, obj=object2, updatables_classes=[set])
print(f"id {id(object2)} :  {object2}")  # same id as above, content now {1, 2}

Iterative serialization and deserialization

import serializejson
encoder = serializejson.Encoder("my_list.json", indent=None)
for elt in range(3):
    encoder.append(elt)
encoder.close()  # append() keeps the file open by default (close=False)
with open("my_list.json") as f:
    print(f.read())
for elt in serializejson.Decoder("my_list.json"):
    print(elt)
>[0,1,2]
>0
>1
>2
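Each call to append leaves the file as a valid JSON list, which is what makes it usable for logs. One way to get that property without rewriting the whole file is to overwrite only the closing bracket, as in the standard-library sketch below. This is an assumption about the general technique, not a description of how serializejson implements append; append_to_json_list is a hypothetical name, and the sketch requires the file to end exactly with ] (no trailing whitespace).

```python
import json
import os
import tempfile


def append_to_json_list(path, obj):
    """Hypothetical helper: append obj to a JSON list file, rewriting only its tail."""
    item = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        # Missing or empty file: start a new one-element list.
        with open(path, "wb") as f:
            f.write(b"[" + item + b"]")
        return
    if os.path.getsize(path) < 2:
        raise ValueError(f"{path} does not contain a JSON list")
    with open(path, "r+b") as f:
        f.seek(-2, os.SEEK_END)
        tail = f.read(2)
        if not tail.endswith(b"]"):
            raise ValueError(f"{path} does not end with a JSON list")
        # Replace the final "]" by ",item]" (no comma if the list was empty).
        separator = b"" if tail == b"[]" else b","
        f.seek(-1, os.SEEK_END)
        f.write(separator + item + b"]")


path = os.path.join(tempfile.mkdtemp(), "my_list.json")
for value in range(3):
    append_to_json_list(path, value)
with open(path) as f:
    print(f.read())  # [0,1,2]
```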

More examples and the complete documentation are available at https://smartaudiotools.github.io/serializejson

License

For noncommercial use, or for commercial use during a thirty-day free trial, this project is licensed under the Prosperity Public License 3.0.0.

For unlimited commercial use, this project is licensed under the Patron License 1.0.0. To acquire a license, please contact me or sponsor me on GitHub under the appropriate tier! This funding model helps make my work sustainable and compensates me for the work it took to write this library.

Third-party contributions are licensed under Apache License, Version 2.0 and belong to their respective authors.

Classes API

The classes API is preferred if you have to encode or decode several objects, since it lets you reuse the same Encoder and Decoder instances for all of them. The functions API internally creates a new Encoder or Decoder instance at each call, which has a non-negligible cost if many small objects are serialized one by one.

Moreover, this API lets you retrieve all encoded classes with Encoder.get_dumped_classes(), in order to pass them later to Decoder(authorized_classes = … ).
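To see what such a list of classes looks like, the hypothetical sketch below walks an object graph (dict values, lists, tuples, sets and instance __dict__ attributes) and collects every class that is not a native JSON type. It is only a stand-in for the idea: get_dumped_classes() reports what the encoder actually dumped, which can differ, for instance for objects handled by plugins or custom __reduce__ methods.

```python
JSON_NATIVE = (dict, list, str, int, float, bool, type(None))


def collect_classes(obj, found=None, seen=None):
    """Hypothetical helper: collect the non-JSON-native classes reachable from obj."""
    found = set() if found is None else found
    seen = set() if seen is None else seen
    if id(obj) in seen:  # stop on circular references and shared objects
        return found
    seen.add(id(obj))
    if type(obj) not in JSON_NATIVE:
        found.add(type(obj))
    if isinstance(obj, dict):
        children = obj.values()
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = obj
    elif hasattr(obj, "__dict__") and not isinstance(obj, type):
        children = vars(obj).values()
    else:
        children = ()
    for child in children:
        collect_classes(child, found, seen)
    return found


class Settings:
    def __init__(self):
        self.volume = 0.5
        self.presets = {"a", "b"}
        self.window = (800, 600)


print(sorted(cls.__name__ for cls in collect_classes(Settings())))  # ['Settings', 'set', 'tuple']
```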

Encoder

class serializejson.Encoder(file=None, *, strict_pickle=False, return_bytes=False, attributes_filter=True, properties=False, getters=False, remove_default_values=False, chunk_size=65536, ensure_ascii=False, indent='\t', single_line_init=True, single_line_new=True, single_line_list_numbers=True, sort_keys=False, bytes_compression=('blosc_zstd', 1), bytes_compression_diff_dtypes=(), bytes_size_compression_threshold=512, bytes_compression_threads=1, array_use_arrayB64=True, array_readable_max_size=0, numpy_array_use_numpyB64=True, numpy_array_readable_max_size=0, numpy_array_to_list=False, numpy_types_to_python_types=True, protocol=4, **plugins_parameters)

Class for serializing Python objects into JSON.

Parameters:

file (str or file-like) – The JSON path or file-like object. When specified, the encoded result will be written there, unless you pass another file to the dump() method later.
attributes_filter (bool or set/list/tuple) –
Controls whether “private” attributes starting with “_” are removed from the saved state, for objects without a plugin, a __getstate__ or __serializejson__ method, or a reimplemented __reduce_ex__ or __reduce__ method.
- False : filter private attributes for no class (unless they are filtered in __reduce__ or __getstate__ methods)
- True : filter private attributes for all classes
- set/list/tuple : filter private attributes for these classes
Use it only temporarily.
- To stay compatible with pickle, you should rather implement __getstate__, __reduce_ex__, __reduce__ or a pickle plugin that filters attributes starting with “_”.
- Otherwise, to be independent of this parameter, implement a __serializejson__ method or a serializejson plugin.
- In this method or plugin you can call the helper function: state = serialize.__getstate__(self, attributes_filter = True)
properties (bool, None, set/list/tuple, dict) –
Controls whether properties are added to the saved state, for objects without a plugin, a __getstate__ or __serializejson__ method, or a reimplemented __reduce_ex__ or __reduce__ method.
- False : add properties for no class (like pickle)
- True : add properties for all classes
- None : (default) add properties defined in the serializejson.properties dict (added by plugins or manually before the encoder call) (see the documentation section “Add plugins to serializejson”)
- set/list/tuple : add all properties for classes in this set/list/tuple, in addition to properties defined in the serializejson.properties dict [class1, class2,..] (not secure with untrusted JSON, use it only for debugging)
- dict : add properties defined in the dict, in addition to properties defined in the serializejson.properties dict {class1 : [“property1”,”property2”], class2: True}
Use it only temporarily.
- To stay compatible with pickle, you should rather implement __getstate__, __reduce_ex__, __reduce__ or a pickle plugin that retrieves the values of properties and returns them in the same dictionary as __slots__, as the second element of a state tuple.
- Otherwise, to be independent of this parameter, implement a __serializejson__ method or a serializejson plugin that retrieves the values of properties and returns them in the state dictionary.
- In this method or plugin you can call the helper function: state = serialize.__getstate__(self, properties = True or list of property names)
getters (bool, None, set/list/tuple, dict) –
Controls whether values retrieved with getters are added to the saved state, for objects without a plugin, a __getstate__ or __serializejson__ method, or a reimplemented __reduce_ex__ or __reduce__ method.
- False : call no getters other than those called in __getstate__ methods, like pickle.
- True : save getters for all objects
- None : (default) save getters defined in the serializejson.getters dict (added by plugins or manually before the encoder call) (see the documentation section “Add plugins to serializejson”)
- set/list/tuple : save getters for classes in this set/list/tuple, in addition to getters defined in the serializejson.getters dict [class1, class2,..] (not secure with untrusted JSON, use it only for debugging)
- dict : save getters defined in the dict, in addition to getters defined in the serializejson.getters dict {class1 : {“attribute_name”:”getter_name”,…}, class2: True}
Use it only temporarily.
- To stay compatible with pickle, you should rather implement __getstate__, __reduce_ex__, __reduce__ or a pickle plugin that retrieves values with getters and returns them in the state, and implement a __setstate__ method that calls the setters for these values.
- Otherwise, to be independent of this parameter, implement a __serializejson__ method or a serializejson plugin that retrieves values with getters and returns them in the state, and either implement a __setstate__ method that calls the setters for these values or leave the Decoder’s setters parameter as True.
- In this method or plugin you can call the helper function: state = serialize.__getstate__(self, getters = True or {“a”:”getA”,”b”:”getB”}). With getters = True, the getters are guessed automatically. With getters as a dict, you get the finest control, and it is faster because getters are not guessed through introspection. With a tuple as a key in this dict, you can retrieve several attribute values from one getter.
remove_default_values (bool or set/list/tuple) –
Controls whether values equal to their default value are removed from the state to save space, for objects without a plugin, a __getstate__ or __serializejson__ method, or a reimplemented __reduce_ex__ or __reduce__ method.
- False : remove default values for no class
- True : remove default values for all classes
- set/list/tuple : remove default values for these classes.
Use it only temporarily.
- Since the default values are not stored and may change between versions of your code, never use it for long-term storage. Be aware that, to know the default values, serializejson creates an instance of the object’s class without any __init__ argument.
- To stay compatible with pickle, you should rather implement __getstate__, __reduce_ex__, __reduce__ or a pickle plugin that removes values equal to their default value.
- Otherwise, to be independent of this parameter, implement a __serializejson__ method or a serializejson plugin that removes values equal to their default value.
- In this method or plugin you can call the helper function: state = serialize.__getstate__(self, remove_default_values = True or dict {name : default_value,…})
chunk_size – Write the file in chunks of this size at a time.
ensure_ascii – Whether non-ASCII characters in str are dumped as escaped unicode sequences (True) or as UTF-8 (False).
indent (None, int or '\t') –
Indentation width to produce pretty printed JSON.
- None : JSON on one line (faster than with indentation).
- int : new lines, indented with this number of spaces.
- '\t' : new lines, indented with tabs (takes less space than int > 1).
single_line_init – whether __init__ args must be serialized on one line.
single_line_new – whether __new__ args must be serialized on one line.
single_line_list_numbers – whether lists of numbers of the same type must be serialized on one line.
sort_keys – whether dictionary keys should be sorted alphabetically. Since Python 3.7, dictionaries are guaranteed to preserve insertion order, and some code may rely on this order, like the key order of the state returned by __getstate__.
bytes_compression (None, str or tuple) –
Compression for bytes, bytearray and numpy arrays:
- None : no compression, use only base 64.
- str : compression name (“blosc_zstd”, “blosclz”, “blosc_lz4”, “blosc_lz4hc” or “blosc_zlib”) with the maximum compression level 9.
- tuple : (compression name, compression level) with a compression level from 0 (no compression) to 9 (maximum compression)
By default, “blosc_zstd” compression is used with compression level 1. For the highest compression (but slower dumping), use “blosc_zstd” with compression level 9.
bytes_compression_diff_dtypes (tuple of dtype) – tuple of dtypes for which serializejson encodes the first element of an array followed by the differences between consecutive elements, before compression. A cumulative sum is used to restore the values after decompression.
bytes_compression_threads (int, str) –
Number of threads used for the compression:
- int : number of threads used for the compression
- “cpus” : use as many threads as CPUs
- “determinist” : use a single thread for blosc compression, so that the result is deterministic, and otherwise as many threads as CPUs
bytes_size_compression_threshold (int) – size threshold (in bytes) above which compression is attempted for bytes, bytearray and numpy arrays, if bytes_compression is not None. The default value is 512; below this size, compression is generally not worth it because of the header size and the additional CPU cost.
array_readable_max_size (int,None or dict) –
Defines the maximum array.array size for serialization as readable numbers. By default, array_readable_max_size is 0: all non-empty arrays are encoded in base 64.
- int : all arrays smaller than or equal to this size are serialized as readable numbers.
- None : there is no maximum size and all arrays are serialized as readable numbers.
- dict : for each typecode key, the value defines the maximum size of arrays of this typecode for serialization as readable numbers. If the value is None, there is no maximum and all arrays of this typecode are serialized as readable numbers. If you want only signed int arrays to be readable, pass array_readable_max_size = {“i”:None}
Note

Serializing int arrays as readable numbers can take much less space, but is much slower than base 64 for big arrays. If you have many or large int arrays and performance matters, stay with the default value 0.
numpy_array_readable_max_size (int,None or dict) –
Defines the maximum numpy array size (product of the array’s dimensions) for serialization as readable numbers. By default, numpy_array_readable_max_size is 0: all non-empty numpy arrays are encoded in base 64.
- int : all numpy arrays smaller than or equal to this size are serialized as readable numbers.
- None : there is no maximum size and all numpy arrays are serialized as readable numbers.
- dict : for each dtype key, the value defines the maximum size of arrays of this dtype for serialization as readable numbers. If the value is None, there is no maximum and all numpy arrays of this dtype are serialized as readable numbers. If you want only int32 numpy arrays to be readable, pass numpy_array_readable_max_size = {“int32”:None}
Note

Serializing int32 numpy arrays as readable numbers can take much less space if the values are smaller than or equal to 9999, but is much slower than base 64 for big arrays. If you have many or large int32 numpy arrays and performance matters, stay with the default value 0.
numpy_array_to_list –
whether numpy arrays should be serialized as lists.
Warning

This should be used only for interoperability with other JSON libraries. If you want readable values in your JSON, we recommend using numpy_array_readable_max_size instead, which is not destructive.

With numpy_array_to_list set to True:
- numpy arrays will be indistinguishable from lists in the JSON.
- Decoder(numpy_array_from_list=True) will recreate numpy arrays from lists of bool, int or float that are not __init__ argument lists, with the risk of unwanted conversions of lists to numpy arrays.
- the dtype of a numpy array will be lost if it is not bool, int32 or float64, and the array will be converted to bool, int32 or float64 when loading.
- empty numpy arrays will be converted to [] without any way to guess the dtype, and will stay empty lists when loading, even with numpy_array_from_list = True.
numpy_types_to_python_types – whether numpy integers and floats outside of an array must be converted to Python types. It saves space and generally doesn’t affect
strict_pickle (False by default) –
If True, serialize with exactly the same behaviour as pickle:
- disabling serializejson plugins for custom serialization (no numpyB64)
- disabling attributes_filter
- disabling keys sorting
- disabling numpy_array_to_list
- disabling numpy_types_to_python_types
- keeping __dict__ and __slots__ separated in a tuple if both exist, instead of merging them into a dictionary (your __setstate__ methods should be prepared to receive either a tuple or a dictionary)
- making the same checks as pickle
- raising the same errors as pickle
**plugins_parameters – extra keyword arguments are stored in the “serialize_parameters” global module and are accessible in plugin modules, so that plugins can choose between serialization options.
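The bytes options above describe a pipeline: optional delta encoding (bytes_compression_diff_dtypes), compression only above a size threshold (bytes_size_compression_threshold, bytes_compression) and base64 text encoding. The standard-library sketch below mimics that pipeline for a list of 64-bit integers to show why delta encoding helps: a regularly increasing ramp turns into a run of identical differences that compresses extremely well. zlib stands in for blosc_zstd here, and encode_ints, decode_ints and the payload dictionary are hypothetical, not serializejson's actual JSON format.

```python
import base64
import zlib
from array import array
from itertools import accumulate


def encode_ints(values, threshold=512, level=1):
    """Hypothetical sketch: delta-encode, compress above a threshold, then base64."""
    # Keep the first element, then store the differences between neighbours.
    deltas = array("q", values[:1] + [b - a for a, b in zip(values, values[1:])])
    raw = deltas.tobytes()
    compressed = len(raw) > threshold
    if compressed:
        raw = zlib.compress(raw, level)  # zlib stands in for blosc_zstd here
    return {"compressed": compressed, "b64": base64.b64encode(raw).decode("ascii")}


def decode_ints(payload):
    """Reverse encode_ints: base64, optional decompression, then cumulative sum."""
    raw = base64.b64decode(payload["b64"])
    if payload["compressed"]:
        raw = zlib.decompress(raw)
    return list(accumulate(array("q", raw)))


ramp = list(range(0, 30000, 3))
payload = encode_ints(ramp)
plain_b64 = base64.b64encode(array("q", ramp).tobytes())
print(payload["compressed"], len(payload["b64"]), "vs", len(plain_b64), "chars without compression")
```

Small inputs stay below the threshold and skip compression, which mirrors the rationale given for the 512-byte default.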

dump(obj, file=None, close=True)

Dump object into json file.

Parameters:

obj – object to dump.
file (optional str or file-like) – the json path or file-like object. When specified, json is written into this file. Otherwise json is written into the file passed to Encoder() constructor.
close (optional bool) – whether dump must close the file after dumping (True by default).

dumps(obj): Dump object into json string.

dumpb(obj): Dump object into json bytes.

append(obj, file=None, close=False)

Append object into json file.

Parameters:

obj – object to dump.
file (optional str or file-like) – path or file. If provided, the object will be dumped into this file instead of being dumped into the file passed at the Encoder constructor. The file must be empty or contain a json list.
close –
- True : the file is closed after the append and reopened at the next append.
- False (default) : the file is kept open for the next append.
You will then have to close the file manually with encoder.close().

get_dumped_classes(): Return all dumped classes, in order to reuse them as the authorized_classes argument when loading with a serializejson.Decoder.

Decoder

class serializejson.Decoder(file=None, *, authorized_classes=None, unauthorized_classes_as_dict=False, recognized_classes=None, updatables_classes=None, setters=True, properties=True, default_value=[], accept_comments=False, numpy_array_from_list=False, numpy_array_from_heterogenous_list=False, chunk_size=65536, strict_pickle=False, dotdict=False, add_jsonpath=False)

Decoder for loading objects serialized in json files or strings.

Parameters:

file (string or file-like) – the JSON path or file-like object. When specified, the decoder will read JSON from this file, unless you pass another file to the load() method later.
authorized_classes (set/list/tuple) –
Defines the classes that serializejson is authorized to recreate from the __class__ keys in the JSON, in addition to the default authorized classes and the classes authorized by plugins.

The default authorized classes are: array.array, bytearray, bytes, range, set, slice, time.struct_time, tuple, type, frozenset, collections.Counter, collections.OrderedDict, collections.defaultdict, collections.deque, complex, datetime.date, datetime.datetime, datetime.time, datetime.timedelta, decimal.Decimal, numpy.array, numpy.bool_, numpy.dtype, numpy.float16, numpy.float32, numpy.float64, numpy.frombuffer, numpy.int16, numpy.int32, numpy.int64, numpy.int8, numpy.ndarray, numpy.uint16, numpy.uint32, numpy.uint64, numpy.uint8, numpyB64.

authorized_classes must be a set/list/tuple of classes or of strings corresponding to the qualified names of classes (module.class_name). If the loaded JSON contains an unauthorized __class__, serializejson raises a TypeError exception.

Warning

Do not load serializejson files from untrusted / unauthenticated sources without carefully setting the authorized_classes parameter. Never authorize “eval”, “exec”, “apply” or other functions or classes that could allow the execution of malicious code, for example: {"__class__":"eval","__init__":"do_bad_things()"}
unauthorized_classes_as_dict (False by default) – Controls whether unauthorized classes should be decoded as dict without raising a TypeError (or as dotdict if dotdict parameter is True, see the “dotdict” parameter for further explanation).
recognized_classes (set/list/tuple) – Classes (qualified-name strings or classes) that serializejson will try to recognize from key names. A class is recognized if the key names of a JSON dictionary are a superset of the class’s default attribute names. A class’s default attribute names are its slots and the attribute names in __dict__ that do not start with “_” after initialization (serializejson creates an instance of each class passed in recognized_classes to determine these attributes). Recognized objects are then instantiated with __new__ (without arguments) and __init__ is not called. If you want to execute some initialization code, you must add a __setstate__() method to your object, or use setters/properties with the Decoder’s setters/properties parameters activated.
updatables_classes (set/list/tuple) – Classes (qualified-name strings or classes) that serializejson will try to update in place if they are already present in the object obj provided when calling load or loads. Objects of other classes are recreated.
properties (bool, None, set/list/tuple, dict) –
Controls whether load calls properties’ setters instead of putting their values in self.__dict__, when the object has no __setstate__ method and properties were merged with attributes in the state dictionary when dumping (they are merged if strict_pickle is False).
- False : call properties setters for no class (like pickle)
- True : (default) call properties setters for all classes
- None : call only properties setters defined in the serializejson.properties dict (added by plugins or manually before the decoder call) (see the documentation section “Add plugins to serializejson”)
- set/list/tuple : call all properties setters for classes in this set/list/tuple, in addition to properties defined in the serializejson.properties dict [class1, class2,..] (not secure with untrusted JSON, use it only for debugging)
- dict : call properties setters defined in the dict, in addition to properties defined in the serializejson.properties dict {class1 : [“property1”,”property2”], class2: True}

Warning

The properties’ setters are called in the JSON order!
- in alphabetical order if sort_keys = True or if the object has no __getstate__ method.
- in the order returned by the __getstate__ method if sort_keys = False.
- Be careful if you rename an attribute, because the order of the properties setters calls can change.
- If properties = True (or is a list), serializejson load will differ from pickle, which doesn’t call attributes’ setters.

It is best to add a __setstate__() method to your object:
- if you want to stay compatible with pickle, with the same behavior.
- if you want to call properties setters in an order other than alphabetical and don’t want to code a __getstate__ method giving the order.
- if you want to call properties setters in an order robust to an attribute name change.
- if you want to be robust to changes of this properties parameter.
- if you want to avoid transitional states while attributes are set one by one.
In this method you can call the helper function: serialize.__setstate__(self, properties = True)
setters (bool,None,set/list/tuple,dict) –
Controls whether load will try to call setxxx, set_xxx or setXxx methods or the xxx property setter for each attribute of the serialized objects, when the object has no __setstate__ method.
- False : call no setters other than those called in __setstate__ methods, like pickle.
- True : (default) explore and call all setters for all objects (not secure with untrusted JSON, use it only for debugging)
- None : call only setters defined in the serializejson.setters dict (added by plugins or manually before the decoder call) (see the documentation section “Add plugins to serializejson”)
- set/list/tuple : explore and call setters for classes in this set/list/tuple, in addition to setters defined in the serializejson.setters dict [class1, class2,..] (not secure with untrusted JSON, use it only for debugging)
- dict : call setters defined in the dict, in addition to setters defined in the serializejson.setters dict {class1 : {“attribute_name”:”setter_name”,…}, class2: True}

Warning

The attributes’ setters are called in the JSON order!
- in alphabetical order if sort_keys = True or if the object has no __getstate__ method.
- in the order returned by the __getstate__ method if sort_keys = False.
- Be careful if you rename an attribute, because the order of the setters calls can change.
- If setters = True (or is a list), serializejson load will differ from pickle, which doesn’t call attributes’ setters.

It is best to add a __setstate__() method to your object:
- if you want to stay compatible with pickle, with the same behavior.
- if you want to call setters in an order other than alphabetical and don’t want to code a __getstate__ method giving the order.
- if you want to call setters in an order robust to an attribute name change.
- if you want to be robust to changes of this setters parameter.
- if you want to avoid transitional states while attributes are set one by one.
In this method you can call the helper function: serialize.__setstate__(self, setters = True or dict {name : setter_name,…})
strict_pickle (False by default) – If True, load with exactly the same behaviour as pickle:
- disabling properties setters
- disabling setters
- disabling numpy_array_from_list
accept_comments (bool) – Controls whether serializejson accepts to parse json with comments.
numpy_array_from_list (bool) – Controls whether lists of bool, int or float elements of the same type should be loaded as numpy arrays.
numpy_array_from_heterogenous_list (bool) – Controls whether lists of bool, int or float elements of the same or heterogeneous types should be loaded as numpy arrays.
default_value – The value returned if the path passed to load doesn’t exist. It allows having a default object at the first run of the script or when the JSON file has been deleted, without raising FileNotFoundError.
chunk_size (int) – Chunk size used when reading the json file.
dotdict (bool) – load dicts as serializejson.dotdict, a dict subclass whose keys can also be accessed as attributes with a dot. A dotdict is serialized as a dict again when dumping. dotdict lets you access the elements of a deserialized dictionary more easily, with the same “.” syntax as for an object, so that you can later turn the dictionaries in your JSON files into real objects by adding the “__class__” field, without having to modify your code.
add_jsonpath – If True, the source JSON path is added to the loaded object as a _jsonpath attribute. If False (default), nothing is added to the loaded object, but you can still retrieve the source JSON path with the “serializejson.jsonpath” function, which finds the path from the object identifier.
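The warnings above recommend a __setstate__() method when setters must run in a particular order. The hypothetical Synth class below shows the pattern: buffer_frames depends on sample_rate, and alphabetical key order would call set_buffer_ms first, so __setstate__ calls the setters explicitly in the right order whatever the key order. It is exercised with copy.deepcopy, which goes through the same __reduce_ex__/__setstate__ protocol as pickle; since serializejson is described above as calling the same object methods as pickle, the same method is expected to apply there.

```python
import copy


class Synth:
    """Hypothetical class whose setters must run in a specific order."""

    def __init__(self, sample_rate=48000, buffer_ms=10):
        self.calls = []
        self.set_sample_rate(sample_rate)
        self.set_buffer_ms(buffer_ms)

    def set_sample_rate(self, value):
        self.calls.append("sample_rate")
        self.sample_rate = value

    def set_buffer_ms(self, value):
        # Uses sample_rate, so it must run after set_sample_rate.
        self.calls.append("buffer_ms")
        self.buffer_ms = value
        self.buffer_frames = self.sample_rate * value // 1000

    def __getstate__(self):
        # Keys in alphabetical order: buffer_ms would be set first.
        return {"buffer_ms": self.buffer_ms, "sample_rate": self.sample_rate}

    def __setstate__(self, state):
        # Call the setters in a fixed order, independent of the key order.
        self.calls = []
        self.set_sample_rate(state["sample_rate"])
        self.set_buffer_ms(state["buffer_ms"])


restored = copy.deepcopy(Synth(44100, 20))
print(restored.calls, restored.buffer_frames)  # ['sample_rate', 'buffer_ms'] 882
```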

load(file=None, obj=None)

Load object from json file.

Parameters:

file (optional str or file-like) – the json path or file-like object. When specified, json is read from this file. Otherwise json is read from the file passed to Decoder() constructor.
obj (optional) – If provided, the object obj will be updated and no new object will be created.

Returns:

the created object, or the updated object if obj was passed.

loads(json, obj=None)

Load object from json string or bytes.

Parameters:

json – the JSON string or bytes.
obj (optional) – If provided, the object obj will be updated and no new object will be created.

Returns:

the created object, or the updated object if obj was passed.

set_default_value(value=[]): Set the value returned if the path passed to load doesn’t exist. It allows having a default object at the first run of the script or when the JSON file has been deleted, without raising FileNotFoundError. decoder.set_default_value() without any argument removes the default_value and re-enables raising FileNotFoundError.
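The behaviour of default_value can be sketched with the standard json module. In the hypothetical load_or_default helper below (serializejson's own handling may differ in details), a missing file returns a deep copy of the default. Copying matters because the signature's default is a mutable [] list: a loader that handed out one shared object on every call would let a change made after one load leak into the next.

```python
import copy
import json
import os
import tempfile


def load_or_default(path, default=None):
    """Hypothetical helper: load JSON from path, or return a copy of default if missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        # Return a copy so that callers never mutate the shared default object.
        return copy.deepcopy(default)


DEFAULT_STATE = {"volume": 0.5, "recent_files": []}
missing_path = os.path.join(tempfile.mkdtemp(), "settings.json")
state = load_or_default(missing_path, DEFAULT_STATE)
state["recent_files"].append("song.wav")
print(DEFAULT_STATE["recent_files"])  # []
```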

set_authorized_classes(classes): Define the classes that serializejson is authorized to recreate from the __class__ keys in the JSON, in addition to the usual classes. Usual classes are: complex, bytes, bytearray, Decimal, type, set, frozenset, range, slice, deque, datetime, timedelta, date, time, numpy.array, numpy.dtype. authorized_classes must be a list of classes or of strings corresponding to the qualified names of classes (module.class_name). If the loaded JSON contains an unauthorized __class__, serializejson raises a TypeError exception.

Warning

Do not load serializejson files from untrusted / unauthenticated sources without carefully setting the authorized_classes parameter. Never authorize “eval”, “exec”, “apply” or other functions or classes that could allow the execution of malicious code, for example: {"__class__":"eval","__init__":"do_bad_things()"}
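The allowlist principle behind authorized_classes can be sketched with the standard json module's object_hook: a dictionary carrying __class__ is turned into an object only if its class name is explicitly listed, and a TypeError is raised otherwise, mirroring the behaviour described above. This is a simplified illustration, not serializejson's decoder; AUTHORIZED and restricted_hook are hypothetical names, and only the __class__ and positional __init__ keys are handled.

```python
import json
from decimal import Decimal

# Hypothetical allowlist: qualified name -> callable used to rebuild the object.
AUTHORIZED = {
    "builtins.set": set,
    "builtins.complex": complex,
    "decimal.Decimal": Decimal,
}


def restricted_hook(d):
    """Hypothetical object_hook: rebuild only explicitly authorized classes."""
    if "__class__" not in d:
        return d
    name = d["__class__"]
    if name not in AUTHORIZED:
        raise TypeError(f"{name} is not an authorized class")
    return AUTHORIZED[name](*d.get("__init__", []))


safe = json.loads('{"a": {"__class__": "builtins.set", "__init__": [[1, 2]]}}',
                  object_hook=restricted_hook)
print(safe)  # {'a': {1, 2}}

try:
    json.loads('{"__class__": "builtins.eval", "__init__": ["1 + 1"]}',
               object_hook=restricted_hook)
except TypeError as error:
    print("rejected:", error)
```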

set_recognized_classes(classes): Set the classes (qualified-name strings or classes) that serializejson will try to recognize from key names.
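To make the recognition rule of recognized_classes concrete (a class is recognized when a dictionary's keys are a superset of its default attribute names), here is a simplified sketch, again built on json's object_hook. The helpers are hypothetical; the sketch assumes each class can be instantiated without arguments to discover its public attributes, ignores __slots__, and, as described above for serializejson, rebuilds matching dictionaries with __new__ without calling __init__.

```python
import json


class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


def default_attribute_names(cls):
    """Public instance attributes set by a no-argument __init__ (slots ignored)."""
    return {name for name in vars(cls()) if not name.startswith("_")}


def make_recognizer(classes):
    """Hypothetical factory for an object_hook that recognizes classes by key names."""
    signatures = [(cls, default_attribute_names(cls)) for cls in classes]

    def hook(d):
        keys = set(d)
        for cls, names in signatures:
            if names and keys >= names:
                obj = cls.__new__(cls)  # __init__ is deliberately not called
                obj.__dict__.update(d)
                return obj
        return d

    return hook


data = json.loads('[{"x": 1, "y": 2}, {"x": 1}]', object_hook=make_recognizer([Point]))
print(type(data[0]).__name__, data[0].x, data[0].y, data[1])  # Point 1 2 {'x': 1}
```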

set_updatables_classes(updatables)[source]¶: Set the classes (string with qualified name or classes) that serializejson will try to update if already in the provided object obj when loading with load or loads. Otherwise the objects are recreated.

Functions API¶

The functions API is just a convenient way to create an Encoder or Decoder and call its method in a single instruction. If you have to encode or decode several objects, reuse Encoder and Decoder instances instead of the functions, to avoid creating an Encoder or Decoder at each function call. The function arguments are the same as for the Encoder and Decoder constructors. See the Encoder and Decoder documentation for details.

Encode¶

serializejson.dump(obj, file, **argsDict)[source]¶

Dump an object into json file.

Parameters:

obj – object to dump.

file (str or file-like) – path or file.

**argsDict – parameters passed to the Encoder (see documentation).

serializejson.dumps(obj, **argsDict)[source]¶

Dump an object into a json string. If you want bytes instead, as a drop-in replacement for pickle, you should either replace pickle.dumps calls with serializejson.dumpb calls, or add from serializejson import dumpb as dumps at the start of your script.

Parameters:

obj – object to dump.

**argsDict – parameters passed to the Encoder (see documentation).

serializejson.append(obj, file=None, *, indent='\t', **argsDict)[source]¶

Append an object into json file.

Parameters:

obj – object to dump.

file (str or file-like) – path or file. The file must be empty or contain a json list.

indent – indent passed to Encoder.

**argsDict – other parameters passed to the Encoder (see documentation).
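The contract of append() (the file is empty or already holds a json list, and each call adds one element) can be sketched with the standard json module. append_json is a hypothetical helper, not serializejson's implementation: it rereads and rewrites the whole file on each call, which a real implementation may well avoid, so treat it as an illustration of the semantics only.

```python
import json
import os
import tempfile


def append_json(obj, path, indent="\t"):
    """Hypothetical helper mimicking append(): add obj to the json list in path.

    The file may be missing or empty; otherwise it must hold a json list.
    This naive version rereads and rewrites the whole file on every call.
    """
    items = []
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, encoding="utf-8") as f:
            items = json.load(f)
        if not isinstance(items, list):
            raise ValueError(f"{path} does not contain a json list")
    items.append(obj)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, indent=indent)


path = os.path.join(tempfile.mkdtemp(), "items.json")
append_json({"a": 1}, path)
append_json([2, 3], path)
with open(path, encoding="utf-8") as f:
    print(json.load(f))  # [{'a': 1}, [2, 3]]
```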

Decode¶

serializejson.load(file, *, obj=None, iterator=False, **argsDict)[source]¶

Load an object from a json file.

Parameters:

file (str or file-like) – the json path or file-like object.

obj (optional) – if provided, the object obj will be updated and no new object will be created.

iterator – if True and the json corresponds to a list then the items will be read one by one which reduces RAM consumption.

**argsDict – parameters passed to the Decoder (see documentation).

Returns:

The created object, the updated object if obj is passed, or an iterator over the elements if iterator is True.

serializejson.loads(json, *, obj=None, iterator=False, **argsDict)[source]¶

Load an object from a json string or bytes.

Parameters:

json – the json string or bytes.

obj (optional) – If provided, the object obj will be updated and no new object will be created.

iterator – if True and the json corresponds to a list then the items will be read one by one which reduces RAM consumption.

**argsDict – parameters passed to the Decoder (see documentation).

Returns:

The created object, the updated object if obj is provided, or an iterator over the elements if iterator is True.

Custom object serialization¶

Method 1: Adding pickle methods to object for custom serialization¶

If you can add the required methods directly to your classes’ code, this is the recommended method. If you finally choose to use pickle instead of serializejson, the methods you implemented will still be useful: serializejson uses the same methods as pickle and behaves exactly like pickle if you use Encoder(strict_pickle = True) and Decoder(strict_pickle = True).

Depending on the strategy you choose for recreating or updating an object, you will need to implement different methods:
object.__reduce__()¶
Code the __reduce__() method if you want to recreate your objects with __init__(). (Or code object.__reduce_ex__ if __reduce_ex__ has already been reimplemented in a base class: Python calls __reduce_ex__ first, so you must override it.) You have to return a tuple with: the class, init_args_tuple and, optionally, state. If state is a dictionary and sort_keys=False, elements will be restored in the same order as given by __reduce__(), which lets you fine-tune the order in which elements are restored. For predictable behavior, be careful to always sort state as you want, manually or in alphabetical order with serializejson.sorted_filtered_attributs(self). If your object contains __slots__ and has no __setstate__, state must be a tuple (__dict__to_restore, __slots__dict_to_restore) for your object to be serializable and deserializable with pickle. The convenience function serializejson.sorted_filtered_attributs(self) filters, sorts and splits __slots__ and __dict__ for you if needed.
Warning

Alternatively, you can return a tuple with a callable that returns an instance of the desired class, a tuple of arguments for the callable, and optionally a state. In this case, the callable is considered as the class by the authorized_classes, updatable_classes and set_attributes parameters, except for “apply”, in which case the first element of the tuple in second position is considered as the class.

Never put “apply” in authorized_classes: it would allow untrusted json to execute arbitrary code.
Call __init__() with positional arguments, without state restore.
def __reduce__(self):
    init_args_tuple = (1,)  # a one-element tuple needs a trailing comma
    return self.__class__, init_args_tuple
Call __init__() with named arguments, without state restore.
Naming the arguments allows you to skip the first arguments if they have default values, and is robust to a later change in the order of the __init__ arguments, but you will have to install the python module apply.
def __reduce__(self):
    init_kwargs_dictionary = {"arg3":3}
    return apply,(self.__class__,None,init_kwargs_dictionary)
Call __init__() with positional arguments and restore state from attributes filtered and sorted alphabetically.
def __reduce__(self):
    init_args_tuple = (1,)  # a one-element tuple needs a trailing comma
    state = serializejson.getstate(self)
    return self.__class__, init_args_tuple, state
Call __init__() with named arguments and restore state from attributes filtered and sorted alphabetically.
More robust to a later change in the order of the __init__ arguments, but you have to pip install apply.
def __reduce__(self):
    init_kwargs_dictionary = {"arg3": 3}
    state = serializejson.getstate(self)
    return (apply,
            (self.__class__, None, init_kwargs_dictionary),
            state)
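Because serializejson calls the same methods as pickle, the standard pickle module is a convenient way to check a __reduce__() implementation. The sketch below uses a made-up Point class with __slots__ and no __setstate__, so its state is the (__dict__, __slots__) tuple described above; None stands for the empty __dict__ part, and the slot values are restored after __init__() has run.

```python
import pickle


class Point:  # made-up example class
    __slots__ = ("x", "y", "label")

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.label = None

    def __reduce__(self):
        init_args_tuple = (self.x, self.y)  # passed to __init__() on load
        # __slots__ and no __setstate__: state is (__dict__ part, __slots__ part)
        state = (None, {"label": self.label})
        return self.__class__, init_args_tuple, state


p = Point(1, 2)
p.label = "origin"
q = pickle.loads(pickle.dumps(p))
print(q.x, q.y, q.label)  # 1 2 origin
```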
object.__getstate__()¶
Code the __getstate__() method without __reduce__() if you do not want to call __init__() but only __new__(), and you want a different behavior than the default one, which serializes self.__dict__ and self.__slots__ filtered (if attributes_filter is left at its default value "_") and sorted alphabetically. The __getstate__() method must return the state of the object as an object that will itself be serialized. If __setstate__() is not available, the returned object must be None or a dictionary; otherwise it can be any serializable object. If state is a dictionary and sort_keys=False (the default), elements will be restored in the same order as given by __getstate__(), which lets you fine-tune the order in which elements are restored. For predictable behavior, be careful to always sort state as you want, manually or in alphabetical order.
def __getstate__(self):
    return {"attribut_1": "value_1", "attribut_2": "value_2"}  # and so on
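As a quick check with pickle (which calls the same methods), the made-up Counter class below defines __getstate__() without __reduce__(). On loading, the object is created with __new__(), __init__() is not called again, and the private _cache attribute left out of the state is simply absent from the restored object.

```python
import pickle


class Counter:  # made-up example class
    init_calls = 0  # class attribute: counts how many times __init__ runs

    def __init__(self, start=0):
        Counter.init_calls += 1
        self.value = start
        self._cache = {}  # private attribute, left out of the state

    def __getstate__(self):
        # public attributes only, sorted alphabetically
        return {k: v for k, v in sorted(vars(self).items()) if not k.startswith("_")}


c = Counter(5)
restored = pickle.loads(pickle.dumps(c))
print(restored.value, hasattr(restored, "_cache"), Counter.init_calls)  # 5 False 1
```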
You can use the helper function serializejson.getstate(self) in your __getstate__ method to select the attributes to keep, add or remove, automatically sort keys, filter attributes starting with “_”, retrieve slots, properties and getters, and remove attributes whose value equals their default value.
def __getstate__(self):
    return serializejson.getstate(self)
getstate(self, *, split_dict_slots=True, keep=None, add=None, remove=None, remove_types=None, filter_='_', properties=False, getters=False, extra_getters=None, sort_keys=True, lasts=None, last_classes=None, remove_default_values=False, default_values=None)¶

Generic __getstate__ method to retrieve the state of an object.

Parameters:

split_dict_slots – True if you want to stay compatible with pickle

keep – names of attributes/properties/getters to keep (and order if sort_keys is False)

add – names of attributes to add even if they would be filtered by the filter

remove – names of attributes to remove even if not filtered by the filter

remove_types – types (not strings) of attributes to remove even if not filtered by the filter

filter_ – (bool or str) filter attributes starting with the given string. (“_” by default)

properties – whether properties will be saved. False (default): no properties are saved; True: all properties (or only those in keep) are saved; list/tuple: names of properties to save.

getters – whether values from getters will be saved. False (default): no getters are saved; True: all getters (or only those in keep) are saved (getters are guessed by introspection); dict: dictionary of “attribut”: “getter”, e.g. {“a”: “getA”, “b”: “getB”, (“c”, “d”): “getCD”}. This option allows the finest control and is faster because getters are not guessed by introspection. With a tuple as key in this dict, you can retrieve several attribute values from one getter.

extra_getters – dictionary of extra getters, e.g. {“c”: “retrieve_c”}. Useful when getters is True and not all getters are guessed by introspection.

sort_keys (True by default) – whether to sort the state alphabetically. Be careful: if False, the object attributes may be restored in an arbitrary order.

lasts – names of attributes/properties/getters to put last (and their order if sort_keys is False)

last_classes (set) – set of classes to put at the end (even if sort_keys is True). Useful for classes referencing other objects, when you want to be sure that the referenced objects have already been serialized earlier, so that only references are serialized (e.g. Qt Layouts).

remove_default_values – (False, True) whether attributes/properties/getters whose value equals their default value will be removed, for lighter and more readable serialized files. If remove_default_values is True and you still want to keep an attribute value even if it equals its default value, use the add = “attribut_name” parameter.

default_values – (None or dict) dict: dict of {“attribut”: “attribut_default_value”, …}; None: serializejson will create a new instance of the object’s class, calling __init__() without any argument, to know the default values.
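To clarify how a few of these options interact, here is a deliberately simplified, hypothetical re-implementation named simple_getstate. It only handles instance __dict__ attributes (no slots, properties or getters), accepts filter_ only as a string, and is not serializejson's code; for real use, call serializejson.getstate.

```python
def simple_getstate(obj, *, keep=None, add=(), remove=(), filter_="_",
                    sort_keys=True, remove_default_values=False):
    """Hypothetical, simplified take on a few getstate() options.

    Only instance __dict__ attributes are handled (no slots, properties or
    getters), and filter_ must be a string ("" or None disables filtering).
    """
    attrs = dict(vars(obj))
    if keep is not None:
        # explicit list of attribute names, in the given order
        state = {name: attrs[name] for name in keep if name in attrs}
    else:
        state = {
            name: value
            for name, value in attrs.items()
            if name in add or not (filter_ and name.startswith(filter_))
        }
    for name in remove:
        state.pop(name, None)
    if remove_default_values:
        # default values: a fresh instance built by __init__() without arguments
        defaults = vars(type(obj)())
        state = {
            name: value
            for name, value in state.items()
            if name in add or name not in defaults or defaults[name] != value
        }
    if sort_keys:
        state = dict(sorted(state.items()))
    return state


class Widget:  # made-up example class
    def __init__(self, width=100, height=50):
        self.width = width
        self.height = height
        self._handle = None


w = Widget(width=300)
print(simple_getstate(w))                              # {'height': 50, 'width': 300}
print(simple_getstate(w, remove_default_values=True))  # {'width': 300}
```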

Warning

If __reduce__() is implemented and doesn’t call __getstate__() itself, __getstate__() will not be called.
object.__setstate__(state)¶
Code __setstate__() if you want a different behavior than the default one, which just restores attributes from state. __setstate__() takes as parameter the object describing the state of the instance and puts the instance back in the state it was in before serialization. It can also execute initialization code.
def __setstate__(self, state):
    self.set_x(state["x"])
    self.set_y(state["y"])
    # other initialization code
    ...
You can use the helper function serializejson.setstate(self, state) to automatically call property setters and setter methods.
def __setstate__(self, state):
    serializejson.setstate(self,state,properties = True, setters = True)
setstate(self, state, properties=False, setters=False, extra_setters=None, restore_default_values=False, default_values=None, order=None)¶

Generic __setstate__ method to restore the state of an object.

Parameters:

self – object instance to restore.

state – dictionary containing the state of the object to restore.

properties – False (default): no properties are restored; True: all properties are restored; list or tuple: names of properties to restore.

setters – False (default): no setters are called; True: all setters are called (setters are guessed by introspection, looking for methods named setXxx, set_xxx or setxxx); dict: dictionary of “attribut”: “setter”, e.g. {“a”: “setA”, “b”: “setB”, (“c”, “d”): “setCD”}. This option allows the finest control and is faster because setters are not guessed by introspection, and it allows calling multi-attribute setters (e.g. setCD restores “c” and “d”).

extra_setters – dictionary of extra setters, e.g. {“c”: “restore_c”}. Useful when setters is True and not all setters are guessed by introspection.

restore_default_values – (False, True) whether attributes/properties/setters not present in state will be restored with their default value. Useful when __init__() is not called (update = True, or the object has no __reduce__() method) and the object was encoded with remove_default_values = True.

default_values – (None or dict) dict: dict of {“attribut”: “attribut_default_value”, …}; None: serializejson will create a new instance of the object’s class, calling __init__() without any argument, to know the default values.

order – None: attributes are restored in the order of the state dictionary keys; list or tuple: attributes are restored in this order. If an attribute belongs to a multi-attribute setter (like {(“c”, “d”): “setCD”}), the setter will be called when one of its attributes occurs.
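The setter guessing described for setters=True can be sketched as follows. simple_setstate is a hypothetical helper covering only the attribute-by-attribute case (no properties, no multi-attribute setters, no default values); it is not serializejson's implementation.

```python
def simple_setstate(obj, state, setters=False, order=None):
    """Hypothetical sketch: restore state attributes, optionally through setters.

    setters=True guesses a setter from each attribute name (set_xxx, setXxx or
    setxxx); setters may also be a dict {"attribute": "setter_name"}.
    Attributes without a setter are assigned directly.
    """
    names = list(order) if order is not None else list(state)
    for name in names:
        if name not in state:
            continue
        setter = None
        if setters is True:
            candidates = ("set_" + name, "set" + name[:1].upper() + name[1:], "set" + name)
            for candidate in candidates:
                method = getattr(obj, candidate, None)
                if callable(method):
                    setter = method
                    break
        elif isinstance(setters, dict) and name in setters:
            setter = getattr(obj, setters[name])
        if setter is not None:
            setter(state[name])
        else:
            setattr(obj, name, state[name])


class Rect:  # made-up example class
    def set_width(self, width):
        self.width = max(0, width)  # the setter enforces a constraint


r = Rect.__new__(Rect)  # no __init__() call, as when loading with __getstate__ only
simple_setstate(r, {"width": -5, "height": 3}, setters=True)
print(r.width, r.height)  # 0 3
```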

Note

If __setstate__() is not available, all elements of self.__dict__ and self.__slots__, or of the state returned by __getstate__() or __reduce__() (which in this case must be a dict), will be restored as attributes:

passively, if Decoder or load have the parameters setters = False and properties = False, or strict_pickle = True (like pickle);

actively, with calls to property setters, if properties = True or properties = [..,your_object];

actively, with calls to setters, if setters = True or setters = [..,your_object];

in the order given by __getstate__() or __reduce__() (if sort_keys=False);

otherwise in alphabetical order (even if sort_keys=False).

We recommend adding the __setstate__() method to your object:

if you want to call setters in a different order than alphabetical order and you don’t want to code __getstate__() or __reduce__() for that purpose;

if you want to be robust to an attribute name change;

if you want to be robust to a change of the dump’s set_attribute parameter;

if you want to avoid transitional states while attributes are set one by one;

if you want the same behavior as pickle, so that you can go back to pickle.

If there is restoring code depending on several elements of state, you should code __setstate__() to avoid transitional states during the restoration of attributes one by one.

If there is restoring code and you want 100 % compatibility with pickle, you should put this code in __setstate__() so that it does not depend on setters = True and properties = True.

Method 2: Adding plugin to pickle for custom serialization¶

If you can’t or don’t want to add __reduce__, __getstate__ or __setstate__ methods directly to your object’s code, you have to write a module that you will import after serializejson (or pickle if you want to use pickle), named for example “pickle_Module.py” if you want to patch objects of “Module”.

In this module, you can either:
add __reduce__, __getstate__ or __setstate__ methods dynamically to your object by monkey patching:
See Method 1 for details on these methods. This allows all inheriting objects to benefit from these methods and to be directly serializable.
import MyModule
def MyObject__reduce__(self):
    return self.__class__, ("init_arg1", "init_arg2"), None
# Method dynamicaly added to your objet by monkey patching
MyModule.MyObject.__reduce__  = MyObject__reduce__
or add __reduce__ to the copyreg.dispatch_table dictionary:
This function will be called by pickle or serializejson to serialize this specific class only; inheriting classes will not benefit from this plugin. If you want to use it for inheriting classes, you have to add this __reduce__ function to dispatch_table for each of your inheriting classes.
from copyreg import dispatch_table
import MyModule
def MyObject__reduce__(self):
    return self.__class__, ("init_arg1", "init_arg2"), None
dispatch_table[MyModule.MyObject] = MyObject__reduce__
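The difference between these two options can be checked with pickle itself. In the sketch below (Base, Child and calls are made-up names), a reducer registered in copyreg.dispatch_table is only used for the exact class, while a monkey-patched __reduce__ is inherited by subclasses.

```python
import copyreg
import pickle

calls = []  # records the class name of each object the reducer handled


class Base:  # made-up example classes
    def __init__(self, value):
        self.value = value


class Child(Base):
    pass


def Base__reduce__(self):
    calls.append(type(self).__name__)
    return self.__class__, (self.value,)  # recreate with __init__(value)


# Option 1: copyreg.dispatch_table, looked up for the exact class only
copyreg.dispatch_table[Base] = Base__reduce__
pickle.dumps(Base(1))
pickle.dumps(Child(2))  # Child falls back to the default reduction
print(calls)  # ['Base']

# Option 2: monkey patching, inherited by subclasses
del copyreg.dispatch_table[Base]
calls.clear()
Base.__reduce__ = Base__reduce__
child = pickle.loads(pickle.dumps(Child(3)))
print(calls, type(child).__name__, child.value)  # ['Child'] Child 3
```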

Method 3: Adding plugins to serializejson for custom serialization¶

If you don’t care about making your object picklable, or if your object is already picklable but you want a different behavior than pickle for serializejson (because its json serialization is too verbose or not easily readable), you can add plugins specifically for serializejson serialization.
1. Create a plugin module `serializejson_module_name.py`
where module_name is the name of the module containing your object, and import it after serializejson in your code.
import serializejson
import serializejson_module_name
2. Make the imports in your `serializejson_module_name.py`
try:
    import module_name
except ModuleNotFoundError:
    pass
else:
    from serializejson import (
        # encoding -------------
        dispatch_table,      # pickle plugins
                             # (used if there is no serializejson plugin or method)
        serializejson_,      # serializejson plugins
        encoder_parameters,  # encoder extra parameters for plugins,
                             # with their default value
        getters,             # getters for dumped classes.
                             # keys are classes.
                             # values are True (for automatic getters detection)
                             # or dictionary of {"attribut" : "getAttribut" }
        # encoding and decoding -------
        properties,          # properties for dumped and loaded classes.
                             # keys are classes
                             # values are True (for automatic properties detection)
                             # or list of ["attribut1","attribut2",..]
        # decoding ---------------------
        authorized_classes,  # qualified names of classes authorized to be loaded
        setters,             # setters for loaded classes.
                             # keys are classes.
                             # values are True (for automatic setters detection)
                             # or dictionary of {"attribut" : "setAttribut" }
        constructors,        # custom constructors for loaded classes.
                             # keys are strings corresponding to the class qualified name,
                             # values are the constructors
        decoder_parameters,  # decoder extra parameters for plugins
                             # with their default value
    )
3. Automatically authorize classes to be loaded, without having to specify them in the Decoder’s or load’s `authorized_classes` parameter.
Warning

Be very careful to automatically authorize only harmless objects that can’t be used, alone or in combination with other authorized objects, to run malicious code!
authorized_classes.update({
        "MyModule.XXX",
        "MyModule.YYY",
        "MyModule.ZZZ",
        })
4. Create functions named `XXX_serializejson`
in serializejson_module_name.py for each object of your module, with XXX the name of the class. These functions must return all the information needed to serialize your object, in a tuple.
serializejson.plugins.module_name.XXX_serializejson(obj)[source]¶
Parameters:

obj – the object to serialize.

Returns:

(class, init_args, state, list_items, dict_items, new_args)

Variable-length tuple with one to six elements. Only the first ‘class’ element is required; the others are set to None by default.

class(class or str):
the class or function called to create the object. You should use obj.__class__ or a string “module.submodule.name”.

init_args (tuple,dict or None):

tuple: positional arguments you want to pass to __init__() or to the callable

dict: keyword arguments you want to pass to __init__() or to the callable (takes a little more space)

None : if you don’t want to call the __init__() but only __new__() when loading.

state (None, dict or object):
can be None if the state is already restored by calling __init__()

list_items (list or None):
list of items for classes with a list interface.

dict_items (dict or None):
dictionary of items for classes with a dictionary interface.

new_args (tuple,dict or None):

tuple: positional arguments you want to pass to the __new__() method.

dict: keyword arguments you want to pass to the __new__() method.

None : if you don’t want to call the __init__() but only __new__() when loading.

Example
def XXX_serializejson(obj):
    init_args = (obj.attribute_1, obj.attribute_2)
    state = {"attribute_3": obj.attribute_3}
    return (obj.__class__, init_args, state)
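To show what each element of the returned tuple is used for, here is a hypothetical rebuild() function that recreates an object from such a tuple. It is a simplified sketch of the semantics described above, not serializejson's decoder: it skips authorization checks, updates and references. Tagged is a made-up class with a list interface.

```python
def rebuild(cls, init_args=None, state=None, list_items=None, dict_items=None, new_args=None):
    """Hypothetical sketch: recreate an object from a (class, init_args, state,
    list_items, dict_items, new_args) tuple."""
    if init_args is not None:
        # call the class (or callable) with keyword or positional arguments
        obj = cls(**init_args) if isinstance(init_args, dict) else cls(*init_args)
    elif isinstance(new_args, dict):
        obj = cls.__new__(cls, **new_args)  # __init__() is not called
    else:
        obj = cls.__new__(cls, *(new_args or ()))  # __init__() is not called
    if state is not None:
        if hasattr(obj, "__setstate__"):
            obj.__setstate__(state)
        else:
            obj.__dict__.update(state)  # default: restore state as attributes
    if list_items:
        obj.extend(list_items)
    if dict_items:
        obj.update(dict_items)
    return obj


class Tagged(list):  # made-up example class with a list interface
    def __init__(self, name):
        super().__init__()
        self.name = name


def Tagged_serializejson(obj):
    # (class, init_args, state, list_items)
    return (obj.__class__, (obj.name,), {"color": obj.color}, list(obj))


t = Tagged("primes")
t.color = "red"
t.extend([2, 3, 5])
clone = rebuild(*Tagged_serializejson(t))
print(clone.name, clone.color, clone)  # primes red [2, 3, 5]
```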
Then either add this function dynamically as a `__serializejson__` method to your class by monkey patching. This allows all inheriting objects to benefit from this method and to be directly serializable.
import MyModule
def XXX_serializejson(self):
    return self.__class__, {"init_arg1": 1, "init_arg2": 2}, None
MyModule.XXX.__serializejson__ = XXX_serializejson # method dynamically added to your object by monkey patching
Note

You can add this method directly in your object’s code, but it will make your object’s code depend on serializejson if you use serializejson functions like serializejson.getstate.
Or add `XXX_serializejson` to the `serializejson_` dictionary. This function will then be called by serializejson to serialize this specific class only; inheriting classes will not benefit from this plugin. If you want to use it for inheriting classes, you have to add this XXX_serializejson function to the serializejson_ dictionary for each of your inheriting classes.
import MyModule
def XXX_serializejson(self):
    return self.__class__, {"init_arg1": 1, "init_arg2": 2}, None
serializejson_[MyModule.XXX] = XXX_serializejson
5. Define class properties and attribute getters and setters
properties[MyClass] = True # for automatic detection
# or
properties[MyClass] = ['property1', 'property2'] # list of property names


getters[MyClass] = True # for automatic detection
# or
getters[MyClass] = {'attribut_name': 'getter_name'} # one entry per attribute


setters[MyClass] = True # for automatic detection
# or
setters[MyClass] = {'attribut_name': 'setter_name'} # one entry per attribute
6. Automatically add new parameters to Encoder/dump/dumps or Decoder/load/loads to control your plugin options, if needed
import MyModule
from serializejson import serialize_parameters, encoder_parameters, decoder_parameters

def XXX_serializejson(self):
    if serialize_parameters.module_name_encoder_option_name:
        init_args = ...
        state = ...
    else:
        init_args = ...
        state = ...
    return (self.__class__, init_args, state)
MyModule.XXX.__serializejson__ = XXX_serializejson
encoder_parameters["module_name_encoder_option_name"] = False # record parameter and set default value


def XXX_setstate(self, state):
    if serialize_parameters.module_name_decoder_option_name:
        ...  # restore state one way
    else:
        ...  # restore state another way
MyModule.XXX.__setstate__ = XXX_setstate
decoder_parameters["module_name_decoder_option_name"] = False # record parameter and set default value
You can now use these options for encoding and decoding:
import serializejson
import module_name
obj = module_name.XXX()
# Function API
dumped = serializejson.dumps(obj,module_name_encoder_option_name = True)
print(dumped)

# Class API
encoder = serializejson.Encoder(module_name_encoder_option_name = True)
print(encoder.dumps(obj))

# Function API
print(serializejson.loads(dumped,module_name_decoder_option_name = True))
# Class API
decoder = serializejson.Decoder(module_name_decoder_option_name = True)
print(decoder.loads(dumped))
7. Customize the constructor, if needed
By default, the json “__class__” field corresponds to the class, but sometimes you want to use a different constructor without changing the json “__class__” field:
constructors['my_module.XXX'] = constructor # class or function called to create the object
Or sometimes you want to customize both the serialized __class__ name and the constructor:
import MyModule
def XXX_serializejson(self):
    return "custom_name", {"init_arg1": 1, "init_arg2": 2}, None
MyModule.XXX.__serializejson__ = XXX_serializejson
constructors['custom_name'] = constructor # class or function called to create the object
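The effect of the constructors dictionary can be pictured as a lookup done before falling back to importing the qualified name. The sketch below uses a local, hypothetical constructors registry and resolve() function rather than serializejson's own, and only handles names of the form module.name (nested classes are not supported).

```python
import importlib

# Hypothetical registry mirroring the idea of the constructors dictionary:
# qualified "__class__" name -> callable used to create the object.
constructors = {}


def resolve(qualified_name):
    """Return the callable used to create objects whose "__class__" is qualified_name."""
    if qualified_name in constructors:
        return constructors[qualified_name]
    # fall back to importing "module.name"
    module_name, _, attr = qualified_name.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


constructors["custom_name"] = dict  # hypothetical custom constructor
print(resolve("custom_name"))              # <class 'dict'>
print(resolve("collections.OrderedDict"))  # <class 'collections.OrderedDict'>
```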
8. Share your plugin with the serializejson developer

If your plugin is for a widely used library, share it for inclusion in the next serializejson release. This will spare you from manually importing it after serializejson each time you want to use it.

Object Update¶

Updating an object consists in restoring its state recursively.

Neither __new__() nor __init__() will be called.

All children that are instances of updatables_classes will be updated; the others will be recreated.

If the object has a __setstate__() method, this method will be called with the state.

Otherwise, all the elements of the state dictionary will be restored as attributes: passively if set_attribute = False (like pickle), or actively, with calls to setters, if set_attribute=True or set_attribute=[your object’s class] (in alphabetical order if sort_keys=True, or in an unspecified order if sort_keys=False).

Warning

You must make sure to have all the needed information in the state and not in the __init__ args, which will be discarded when updating. See the documentation section “If you want to make the object updatable”.

If you want to make the object updatable:

Save all needed information outside of __init__ args when dumping:

put all the information needed for an update in state (returned by __getstate__() or in third position by __reduce__()), because __init__() will not be called when updating and all init arguments will be discarded.

minimize the redundancy between the __init__() arguments (returned in second position by __reduce__()) and the information already in state.

you can remove the calls to __init__() by using __getstate__() instead of __reduce__(), if you no longer need to execute code in __init__() when creating objects because all the required initialization code is already in __setstate__() or setters.

Allow restoration of this information:

In a __setstate__() method called with the state:

if you want to call setters in a different order than alphabetical order or the order given by __reduce__() or __getstate__();

if you want to be robust to an attribute name change or a set_attribute parameter change;

if you want to avoid transitional states while attributes are set one by one.

Otherwise all the elements of the state dictionary will be restored as attributes.

Passively if set_attribute = False (like pickle).

Actively, with calls to setters, if set_attribute=True or set_attribute=[your_object] (in alphabetical order or in the order given by __reduce__() or __getstate__()). ⚠ You must make sure to always call load with set_attributes = True (or […,object]) or to add a plugin for these objects with set_attributes = [object].
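The update rules above can be illustrated with a small, hypothetical update() function: it never calls __new__() or __init__() on the target, delegates to __setstate__() when present, recurses into children that are instances of updatable classes, and replaces everything else. Plain dicts stand in for the decoded child states, and Settings and Display are made-up classes; this is not serializejson's implementation.

```python
def update(target, state, updatable_classes=()):
    """Hypothetical sketch of an object update (illustration only).

    Neither __new__() nor __init__() is called on target. Children that are
    instances of updatable_classes are updated recursively when their new
    state is a dict; any other value simply replaces the current attribute.
    """
    if hasattr(target, "__setstate__"):
        target.__setstate__(state)  # the object handles its own restoration
        return target
    updatable = tuple(updatable_classes)
    for name, value in state.items():
        current = getattr(target, name, None)
        if isinstance(current, updatable) and isinstance(value, dict):
            update(current, value, updatable)  # keep the same child object
        else:
            setattr(target, name, value)  # replace (recreate) the attribute
    return target


class Display:  # made-up example classes
    def __init__(self):
        self.brightness = 50


class Settings:
    def __init__(self):
        self.volume = 5
        self.display = Display()


s = Settings()
display_before = s.display
update(s, {"volume": 8, "display": {"brightness": 80}}, updatable_classes=[Display])
print(s.volume, s.display.brightness, s.display is display_before)  # 8 80 True
```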

Versions¶

Version 0.2.0¶

Date:: 2021-02-18

API changed
can serialize dict with no-string keys
add support for circular references and duplicates with {“$ref”: …}

Version 0.1.0¶

Date:: 2020-11-28

change description for PyPI
add license for PyPI
enable load of tuple, time.struct_time, Counter, OrderedDict and defaultdict

Version 0.0.4¶

Date:: 2020-11-24

API changed
add plugins support
add bytes, bytearray and numpy.array compression with blosc zstd
fix iterative append and decode (not fully tested).
fix dump of numpy types without conversion to python types (not yet numpy.float64)

Future Versions (TODO)¶

Add support for:

dict with __class__ key (detect it and raise an exception, or construct a special object for reconstruction)

pandas.DataFrame

singletons (__reduce__ returning a string)

Metaclasses

Add test for:

every combination of Encoder and Decoder parameters.

object update

circular references and duplicates

PySide2

Optimization:

bytes: need pybase64.b64encode directly to str and rapidjson.RawJSON improvements

numpy array: need pybase64.b64decode directly to bytearray.

circular references and duplicates: need rapidjson improvements (Encoder.default call for lists and dictionaries)

list of numbers: speed up _onlyOneDimNumbers function with Cython ?

json iterator:

speed up _json_object_file_iterator function with Cython ?

improve rapidjson for something like raw_decode of the standard json library ?

Improvements :

replace the id check for duplicates with weakref? (ids can be reused)

allow alternative compressors for images?