A Deep Learning based VapourSynth filter for colorizing and restoring old images and videos, based on DeOldify, DDColor, Deep Exemplar based Video Colorization and ColorMNet.
The Vapoursynth filter version has the advantage of coloring the images directly in memory, without the need to use the filesystem to store the video frames.
This filter (HAVC in short) is able to combine the results provided by DeOldify and DDColor, which are some of the best models available for coloring pictures, often providing a final colorized image that is better than the one obtained from the individual models. But the main strength of this filter is the addition of specialized filters that improve the quality of videos colorized with these models, and the possibility to further improve the stability by using these models as input to the Deep Exemplar based Video Colorization model (DeepEx in short) and to ColorMNet. Both DeepEx and ColorMNet are exemplar-based video colorization models: they colorize a video in sequence based on the colorization history, enforcing its coherency by using a temporal consistency loss. ColorMNet is more recent and more advanced than DeepEx, and it is suggested to use it as the default exemplar-based model.
This filter is distributed with the torch package provided with the Hybrid Windows Addons. To use it on Desktop (Windows) it is necessary to install Hybrid and the related Addons. Hybrid is a Qt-based frontend for other tools (including this filter) which can convert most input formats to common audio & video formats and containers. It represents the easiest way to colorize images with the HAVC filter using VapourSynth.
- PyTorch 2.1.1 or later
- VapourSynth R62 or later
- MiscFilters.dll (VapourSynth's Miscellaneous Filters plugin)
pip install vsdeoldify-x.x.x-py3-none-any.whl
With version 4.0.0 of HAVC, a modified version of DDColor has been released to manage the scene-detection properties available in the input clip. This version can be installed with the command:
pip install vsddcolor-1.0.1-py3-none-any.whl
With version 4.5.0 of HAVC, support for ColorMNet has been introduced. All the packages necessary to use ColorMNet are included in Hybrid's torch add-on package. For a manual installation without Hybrid, it is necessary to install all the packages listed on the project page of ColorMNet. To simplify the installation, release 4.5.0 of this filter provides as an asset the spatial_correlation_sampler package compiled against CUDA 12.4, Python 3.12 and torch. To install it, it is necessary to unzip the following archive (using the nearest torch version available on the host system):
spatial_correlation_sampler-0.5.0-py312-cp312-win_amd64_torch-x.x.x.whl.zip
in the Library packages folder: .\Lib\site-packages\
The models are not installed with the package; they must be downloaded from the DeOldify website at: completed-generator-weights.
The models to download are:
- ColorizeVideo_gen.pth
- ColorizeStable_gen.pth
- ColorizeArtistic_gen.pth
The model files have to be copied into the models directory, usually located in:
.\Lib\site-packages\vsdeoldify\models
To use ColorMNet it is also necessary to download the file DINOv2FeatureV6_LocalAtten_s2_154000.pth and save it in
.\Lib\site-packages\vsdeoldify\colormnet\weights
At the first usage it is possible that torch automatically downloads the neural networks resnet101 and resnet34 and, starting with release 4.5.0, also resnet50, resnet18, dinov2_vits14_pretrain and the folder facebookresearch_dinov2_main.
So don't be worried if at the first usage the filter is very slow to start: at initialization almost all the Fastai and PyTorch modules and the resnet networks are loaded.
It is possible to specify the destination directory of the networks used by torch with the function parameter torch_hub_dir. If this parameter is set to None, the files will be downloaded in torch's cache dir; more details are available at: caching-logic.
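For example (a minimal sketch; it is assumed here that torch_hub_dir is exposed by HAVC_ddeoldify as described, and the path is purely illustrative):

from vsdeoldify import HAVC_ddeoldify

# load/download the torch networks from a custom folder instead of the default cache
clip = HAVC_ddeoldify(clip, torch_hub_dir="D:/torch_hub_cache")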
The models used by DDColor can be installed with the command
python -m vsddcolor
The models for Deep Exemplar based Video Colorization can be installed by downloading the file colorization_checkpoint.zip available in: inference code.
The archive colorization_checkpoint.zip has to be unzipped in: .\Lib\site-packages\vsdeoldify\deepex
import vapoursynth as vs
core = vs.core

# 'clip' is the grayscale source clip (e.g. YUV420P16) loaded earlier in the script

# loading plugins
core.std.LoadPlugin(path="MiscFilters.dll")
# changing range from limited to full range for HAVC
clip = core.resize.Bicubic(clip, range_in_s="limited", range_s="full")
# setting color range to PC (full) range.
clip = core.std.SetFrameProps(clip=clip, _ColorRange=0)
# adjusting color space from YUV420P16 to RGB24
clip = core.resize.Bicubic(clip=clip, format=vs.RGB24, matrix_in_s="709", range_s="full")
from vsdeoldify import HAVC_main, HAVC_ddeoldify, HAVC_stabilizer
# DeOldify with DDColor, Preset = "fast"
clip = HAVC_main(clip=clip, Preset="fast")
# DeOldify only model
clip = HAVC_ddeoldify(clip, method=0)
# DDColor only model
clip = HAVC_ddeoldify(clip, method=1)
# To apply video color stabilization filters for ddeoldify
clip = HAVC_stabilizer(clip, dark=True, smooth=True, stab=True)
# Simplest way to use Presets
clip = HAVC_main(clip=clip, Preset="fast", ColorFix="violet/red", ColorTune="medium", ColorMap="none")
# ColorMNet model using HAVC as input for the reference frames
clip = HAVC_main(clip=clip, EnableDeepEx=True, ScThreshold=0.1)
# changing range from full to limited range for HAVC
clip = core.resize.Bicubic(clip, range_in_s="full", range_s="limited")
See __init__.py for the description of the parameters.
NOTE: In the DDColor version included with HAVC the parameter input_size has been renamed to render_factor, because its range of values was changed to be equivalent to render_factor in DeOldify. The relationship between these 2 parameters is the following:
input_size = render_factor * 16
In the modified version of DDColor 1.0.1 the boolean parameter scenechange was added: if this parameter is set to True, only the frames tagged as scene change will be colored.
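For illustration only, a sketch of a direct call (assuming the vsddcolor package exposes a ddcolor() function accepting these parameters, as described above):

from vsddcolor import ddcolor

# render_factor = 24 corresponds to input_size = 24 * 16 = 384;
# with scenechange=True only the frames tagged as scene change are colored
clip = ddcolor(clip, render_factor=24, scenechange=True)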
The filter was developed with the idea of using it mainly to colorize movies. Both DeOldify and DDColor are good models for coloring pictures (see the Comparison of Models). But when they are used for coloring movies they introduce artifacts that usually are not noticeable in still images. Especially in dark scenes, neither DeOldify nor DDColor is able to understand what a dark area is and what color to give it: they often decide to color such a dark area with blue, then in the next frame this area may become red and in the following frame return to blue, introducing a flashing psychedelic effect when all the frames are put together in a movie. To try to solve this problem, pre- and post-process filters have been developed. It is possible to see them in the Hybrid screenshot below.
The main filters introduced are:
Chroma Smoothing: This filter allows reducing the vibrancy of the colors assigned by DeOldify/DDColor by using the parameters de-saturation and de-vibrancy (the effect on vibrancy will be visible only if the option chroma resize is enabled, otherwise this parameter affects the luminosity). The area impacted by the filter is defined by the dark/white thresholds. All the pixels with luma below the dark threshold will be impacted by the filter, while the pixels above the white threshold will be left untouched. All the pixels in between will be gradually impacted depending on their luma value.
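As a purely conceptual sketch of this behaviour (not the actual HAVC code, and with illustrative threshold values), the luma-driven de-saturation can be expressed as follows:

import numpy as np

def chroma_smoothing(u, v, luma, sat=0.5, dark_tht=0.3, white_tht=0.6):
    # de-saturation weight per pixel: 1 below the dark threshold,
    # 0 above the white threshold, linear in between
    w = np.clip((white_tht - luma) / (white_tht - dark_tht), 0.0, 1.0)
    # move the chroma planes (U, V centered at 0.5) toward gray by the weighted amount
    u = u + (0.5 - u) * w * (1.0 - sat)
    v = v + (0.5 - v) * w * (1.0 - sat)
    return u, v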
Chroma Stabilization: This filter tries to stabilize the frames' colors. As explained previously, since the frames are colored individually, the colors can change significantly from one frame to the next, introducing a disturbing psychedelic flashing effect. This filter tries to reduce it by averaging the chroma component of the frames. The average is performed over the number of frames specified by the Frames parameter. Two averaging methods are implemented:
- Arithmetic average: the current frame is averaged using equal weights on the past and future frames
- Weighted average: the current frame is averaged using a weighted mean of the past and future frames, where the weight decreases with time (far frames have a lower weight than the nearest frames).
As explained previously, the stabilization is performed by averaging the past/future frames. Since the non-matched areas of past/future frames are gray (the color information is missing in the past/future), the filter applies a color restore procedure that fills the gray areas with the pixels of the current frame (possibly de-saturated with the parameter "sat"). The image restored in this way is blended with the non-restored image using the parameter "weight". The gray areas are selected by the threshold parameter "tht": all the pixels in the HSV color space with "S" < "tht" will be considered gray. If a scene change is detected (controlled by the parameter "tht_scen"), the color restore is not applied.
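A conceptual sketch of the averaging step (not the HAVC code; chroma_frames is assumed to be a list of per-frame chroma arrays):

import numpy as np

def stabilize_chroma(chroma_frames, center, radius=2, weighted=True):
    # average the chroma of the current frame (index 'center') with 'radius'
    # past and future frames
    idx = list(range(max(0, center - radius), min(len(chroma_frames), center + radius + 1)))
    if weighted:
        # weighted average: far frames get a lower weight than the nearest frames
        weights = np.array([1.0 / (1 + abs(i - center)) for i in idx])
    else:
        # arithmetic average: equal weights on past and future frames
        weights = np.ones(len(idx))
    weights = weights / weights.sum()
    return sum(w * chroma_frames[i] for w, i in zip(weights, idx))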
DDColor Tweaks: This filter is available only for DDColor and has been added because it has been observed that DDColor's inference is quite poor on dark/bright scenes, depending on the luma value. This filter forces the luma of the input image not to fall below the threshold defined by the parameter luma_min. Moreover, this filter allows applying a dynamic gamma correction: the gamma adjustment is applied when the average luma is below the parameter gamma_luma_min. A gamma value > 2.0 improves the DDColor stability on bright scenes, while a gamma < 1 improves the DDColor stability on dark scenes.
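A minimal sketch of the idea (illustrative values; the exact transfer function used by HAVC may differ):

import numpy as np

def ddcolor_tweaks(luma, luma_min=0.1, gamma=2.2, gamma_luma_min=0.3):
    # luma: normalized [0, 1] luma plane of the input frame
    # force the luma not to fall below luma_min
    out = np.maximum(luma, luma_min)
    # dynamic gamma: apply the correction only when the frame is dark on average
    if out.mean() < gamma_luma_min:
        out = np.power(out, 1.0 / gamma)
    return out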
Unfortunately, when applied to movies, the color models tend to assign unstable colors to the frames, especially in the red/violet chroma range. This problem is more visible with DDColor than with DeOldify. To mitigate this issue it was necessary to implement some kind of chroma adjustment. This adjustment allows de-saturating all the colors included in a given color range. The color range must be specified in the HSV color space. This color space is useful because all the chroma is represented by the single parameter "Hue". In this color space the colors are specified in degrees (from 0 to 360), as shown in the DDeoldify Hue Wheel. It is possible to apply this adjustment to all the filters described previously. Depending on the filter, the adjustment can be enabled using the following syntax:
chroma_range = "hue_start:hue_end" or "hue_wheel_name"
for example this assignment:
chroma_range = "290:330,rose"
specifies the range of hue colors 290-360, because "rose" is the hue wheel name that corresponds to the range 330-360.
It is possible to specify more ranges by using the comma "," separator.
When the de-saturation information is not already available in the filter's parameters, it is necessary to use the following syntax:
chroma_adjustment = "chroma_range|sat,weight"
in this case it is necessary to specify also the de-saturation parameter "sat" and the blending parameter "weight".
for example with this assignment:
chroma_range = "300:340|0.4,0.2"
the hue colors in the range 300-340 will be de-saturated by the amount 0.4, and the final frame will be obtained by blending, with a 20% weight, a version of the frame de-saturated by 0.4 on all the pixels (if weight=0, no blending is applied).
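As a purely illustrative sketch (not the HAVC implementation), the parsing of this syntax and the de-saturation of a hue range could look like this:

import numpy as np

def parse_chroma_adjustment(spec="300:340|0.4,0.2"):
    # conceptual parser for the "chroma_range|sat,weight" syntax described above
    chroma_range, params = spec.split("|")
    hue_start, hue_end = (float(x) for x in chroma_range.split(":"))
    sat, weight = (float(x) for x in params.split(","))
    return hue_start, hue_end, sat, weight

def desaturate_hue_range(h, s, hue_start, hue_end, sat):
    # h: hue plane in degrees [0, 360), s: saturation plane in [0, 1] (HSV)
    mask = (h >= hue_start) & (h <= hue_end)
    return np.where(mask, s * sat, s)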
To simplify the usage of this filter, the Preset ColorFix has been added, which allows fixing a given range of chroma combinations. The strength of the filter is controlled by the Preset ColorTune.
Using an approach similar to the Chroma Adjustment, the possibility to remap a given range of colors into another chroma range has been introduced. This remapping is controlled by the Preset ColorMap. For example the preset "blue->brown" allows remapping all the chroma combinations of blue into the color brown. It is not expected that this filter can be applied to a full movie, but it could be useful to remap the colors of some portions of a movie.
The post ColorMapping Guide for vsDeOldify provides useful tips on how to use both the Chroma Adjustment and Color Mapping features provided by this filter.
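For example, reusing the HAVC_main parameters already shown in the script above (a sketch):

# remap the blue tones to brown on this clip (or on a trimmed portion of it)
clip = HAVC_main(clip=clip, Preset="fast", ColorMap="blue->brown")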
As explained previously, this filter is able to combine the results provided by DeOldify and DDColor; to perform this combination, 6 methods (numbered 0-5, matching the method parameter shown in the examples above) have been implemented:
- 0 = DeOldify only coloring model.
- 1 = DDColor only coloring model.
- 2 = Simple Merge: the frames are combined using a weighted merge, where the parameter merge_weight represents the weight assigned to the frames provided by the DDColor model.
- 3 = Constrained Chroma Merge: given that the colors provided by DeOldify's Video model are more conservative and stable than the colors obtained with DDColor, the frames are combined by imposing a limit on the difference in chroma values between DeOldify and DDColor. This limit is defined by the parameter threshold and is applied to the frames converted to "YUV". For example, with threshold=0.1 the chroma values "U","V" of the DDColor frame are constrained to have an absolute percentage difference, with respect to the "U","V" provided by DeOldify, not higher than 10%. If merge_weight is < 1.0, the chroma-limited DDColor frames will be merged again with the frames of DeOldify using the Simple Merge.
- 4 = Luma Masked Merge: the behaviour is similar to the method Adaptive Luma Merge. With this method the frames are combined using a masked merge: the pixels of the DDColor frame with luma < luma_limit will be filled with the (de-saturated) pixels of DeOldify, while the pixels above the white_limit threshold will be left untouched. All the pixels in between will be gradually replaced depending on their luma value. If the parameter merge_weight is < 1.0, the resulting masked frames will be merged again with the non de-saturated frames of DeOldify using the Simple Merge.
- 5 = Adaptive Luma Merge: given that DDColor's performance is quite poor on dark scenes, with this method the images are combined by decreasing the weight assigned to the DDColor frames when the luma is below the luma_threshold. For example, with luma_threshold = 0.6 and alpha = 1, the weight assigned to the DDColor frames starts to decrease linearly when the luma < 60%, down to min_weight. For alpha = 2, the weight decreases quadratically.
The merging methods 2-5 leverage the fact that the DeOldify Video model usually provides frames which are more stable; this feature is exploited to stabilize DDColor as well. Methods 3 and 4 are similar to the Simple Merge, but before the merge with DeOldify the DDColor frame is limited in its chroma changes (method 3) or limited based on the luma (method 4). Method 5 is a Simple Merge where the weight decreases with the luma.
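As an illustration of the Simple Merge only (a sketch, not the HAVC internals; deoldify_clip and ddcolor_clip are hypothetical names for the same video colorized by the two models):

import vapoursynth as vs
core = vs.core

# weighted merge of the two colorized clips; 'weight' is the weight given to DDColor
merge_weight = 0.5
merged = core.std.Merge(clipa=deoldify_clip, clipb=ddcolor_clip, weight=merge_weight)

The other merging methods follow the same pattern, but first constrain or mask the DDColor clip (e.g. with std.Expr / std.MaskedMerge) before the final weighted merge.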
Taking inspiration from the article published on Habr, Mode on: Comparing the two best colorization AI's, it was decided to use it to get the reference images and the images obtained using the ColTran model, and to extend the analysis with the models implemented in the HAVC filter.
The added models are:
D+D: DeOldify (with model Video & render_factor = 23) + DDColor (with model Artistic and render_factor = 24)
DD: DDColor (with model Artistic and input_size = 384)
DS: DeOldify (with model Stable & render_factor = 30)
DV: DeOldify (with model Video & render_factor = 23)
T241: ColTran + TensorFlow 2.4.1 model as shown in Habr
To compare the models I decided to use a metric able to account for the perceptual non-uniformities in the evaluation of the color difference between images. These non-uniformities are important because the human eye is more sensitive to certain colors than to others. Over time, the International Commission on Illumination (CIE) has proposed increasingly advanced models to measure the color distance taking into account human color perception, which they called dE. One of the most advanced is the CIEDE2000 method, which I decided to use as the color similarity metric to compare the models. The final results are shown in the table below (the test image can be seen by clicking on the test number).
Test # | D+D | DD | DS | DV | T241 |
---|---|---|---|---|---|
01 | 10.7 | 8.7 | 8.8 | 12.7 | 15.7 |
02 | 11.8 | 11.7 | 12.7 | 12.7 | 15.9 |
03 | 5.5 | 3.8 | 5.6 | 7.6 | 9.9 |
04 | 6.2 | 8.5 | 4.6 | 5.3 | 9.0 |
05 | 6.6 | 8.4 | 8.8 | 8.6 | 12.5 |
06 | 10.2 | 9.9 | 10.6 | 11.2 | 16.4 |
07 | 6.5 | 6.7 | 6.8 | 7.7 | 10.2 |
08 | 6.7 | 6.4 | 7.5 | 8.3 | 9.9 |
09 | 11.7 | 11.7 | 15.2 | 13.8 | 16.5 |
10 | 7.8 | 8.0 | 9.1 | 8.4 | 9.5 |
11 | 7.5 | 8.0 | 8.0 | 7.8 | 14.8 |
12 | 7.7 | 7.6 | 8.6 | 7.8 | 13.7 |
13 | 11.8 | 11.9 | 14.2 | 13.7 | 16.8 |
14 | 5.3 | 5.2 | 4.4 | 5.3 | 7.2 |
15 | 8.2 | 7.3 | 10.7 | 10.6 | 15.7 |
16 | 12.0 | 12.3 | 9.8 | 12.7 | 19.7 |
17 | 11.1 | 10.2 | 11.6 | 12.4 | 16.7 |
18 | 6.7 | 9.3 | 7.2 | 8.6 | 13.1 |
19 | 3.7 | 4.4 | 4.7 | 3.9 | 4.6 |
20 | 8.7 | 10.1 | 6.9 | 9.2 | 11.0 |
21 | 6.9 | 6.9 | 8.1 | 8.4 | 10.4 |
22 | 11.5 | 11.8 | 13.3 | 12.2 | 12.7 |
23 | 5.6 | 7.1 | 11.4 | 8.8 | 11. |
Avg(dE) | 8.3 | 8.5 | 9.1 | 9.5 | 12.7 |
The calculation of dE with the CIEDE2000 method was obtained by leveraging the computational code available in ColorMine.
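For readers who want to reproduce a similar measurement in Python, the average dE can also be computed with scikit-image (a sketch with illustrative file names; this is not the ColorMine code used for the tests):

import numpy as np
from skimage import io
from skimage.color import rgb2lab, deltaE_ciede2000

# average CIEDE2000 distance between the colorized image and the color reference
ref = rgb2lab(io.imread("reference.png")[..., :3])
col = rgb2lab(io.imread("colorized.png")[..., :3])
print("Avg(dE) = %.1f" % np.mean(deltaE_ciede2000(ref, col)))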
As it is possible to see, the model that performed best is the D+D model (which I called HAVC ddeoldify because it uses both DeOldify and DDColor). This model was the best model in 10 tests out of 23. The DD model also performed well, but there were situations where it provided quite badly colorized images, as in Test #23, and the combination with DeOldify allowed improving the final image significantly. In effect, the average distance of DD was 8.5 while for DV it was 9.5; given that the 2 models were weighted at 50%, if the images were positively correlated a value of 9.0 would have been expected, whereas the average distance measured for D+D was 8.3. This implies that the 2 models were able to compensate for each other. Conversely, T241 was the model that performed worst, with the greatest average difference in colors. Finally, the quality of the DeOldify models was similar, with DS slightly better than DV (as expected).
Given the ability of the CIEDE2000 method to provide a reliable estimate of human color perception, I decided to provide an additional set of tests including some of the cases not considered previously.
The models added are:
DA: DeOldify (with model Artistic & render_factor = 30)
DDs: DDColor (with model ModelScope and input_size = 384)
DS+DD: DeOldify (with model Stable & render_factor = 30) + DDColor (with model Artistic and render_factor = 24)
DA+DDs: DeOldify (with model Artistic & render_factor = 30) + DDColor (with model ModelScope and render_factor = 24)
DA+DD: DeOldify (with model Artistic & render_factor = 30) + DDColor (with model Artistic and render_factor = 24)
The results of this additional test set are shown in the table below (the test image can be seen by clicking on the test number).
Test # | DS+DD | DA+DDs | DA+DD | DDs | DA |
---|---|---|---|---|---|
01 | 7.7 | 7.5 | 8.2 | 8.2 | 8.6 |
02 | 11.8 | 11.4 | 11.9 | 11.6 | 13.2 |
03 | 4.5 | 4.2 | 3.9 | 4.5 | 4.2 |
04 | 5.9 | 5.1 | 6.0 | 6.6 | 5.9 |
05 | 6.4 | 6.5 | 6.7 | 9.5 | 9.0 |
06 | 10.0 | 10.0 | 10.3 | 9.5 | 11.4 |
07 | 6.1 | 7.3 | 6.6 | 8.1 | 8.0 |
08 | 6.2 | 8.1 | 7.3 | 8.1 | 9.4 |
09 | 12.7 | 11.3 | 11.5 | 12.5 | 13.3 |
10 | 8.1 | 7.7 | 8.0 | 7.1 | 9.0 |
11 | 7.2 | 7.3 | 7.4 | 8.6 | 7.9 |
12 | 8.0 | 7.1 | 8.0 | 6.5 | 9.3 |
13 | 12.0 | 11.7 | 12.0 | 11.8 | 13.8 |
14 | 4.5 | 4.6 | 4.8 | 5.8 | 4.8 |
15 | 8.3 | 8.1 | 8.9 | 8.2 | 12.2 |
16 | 10.6 | 10.5 | 10.7 | 12.5 | 9.9 |
17 | 10.8 | 12.1 | 11.4 | 12.3 | 13.5 |
18 | 6.7 | 7.1 | 6.1 | 11.1 | 7.2 |
19 | 3.5 | 4.6 | 4.5 | 5.1 | 7.1 |
20 | 8.0 | 8.1 | 8.2 | 9.3 | 7.6 |
21 | 6.9 | 6.7 | 7.1 | 7.1 | 9.0 |
22 | 12.1 | 11.0 | 10.9 | 12.1 | 11.2 |
23 | 6.2 | 6.3 | 6.0 | 7.8 | 10.2 |
Avg(dE) | 8.0 | 8.0 | 8.1 | 8.9 | 9.4 |
First of all, it should be noted that the individual models added (DA for DeOldify and DDs for DDColor) performed worse than the individual models tested in the previous analysis (DS for DeOldify and DD for DDColor). Conversely, all the combinations of DeOldify and DDColor performed well, confirming the positive impact on the final result, already observed in the previous analysis, obtained by combining the 2 models.
As stated previously, to further stabilize the colorized videos it is possible to use the frames colored by HAVC as reference frames (exemplars) given in input to the supported exemplar-based models: ColorMNet and the Deep Exemplar based Video Colorization model.
In Hybrid the Exemplar Models have their own panel, as shown in the following picture:
For the ColorMNet models there are 2 implementations, selected by the field Mode:
- 'remote' (no limitation on the number of memory frames, but it uses a remote process for the inference)
- 'local' (the inference is performed inside the local VapourSynth thread, but it has a memory limitation)
The field Preset controls the render method and speed; allowed values are:
- 'Fast' (faster but colors are more washed out)
- 'Medium' (colors are a little washed out)
- 'Slow' (slower but colors are a little more vivid)
The field SC thresh defines the sensitivity of the scene detection (suggested value 0.1, see Miscellaneous Filters), while the field SC min freq allows specifying the minimum number of reference frames that have to be generated.
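For reference, the scene detection relies on the SCDetect function of the Miscellaneous Filters plugin loaded at the beginning of the script; a minimal sketch of the equivalent VapourSynth call (presumably applied internally when the threshold is set) is:

# tag the scene changes with the suggested sensitivity
clip = core.misc.SCDetect(clip=clip, threshold=0.10)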
The flag Vivid has 2 different meanings depending on the Exemplar Model used:
- ColorMNet (the frames memory is reset at every reference frame update)
- DeepEx (given that the colors generated by the inference are a little washed out, the saturation of the colored frames will be increased by about 25%)
The field Method allows specifying the type of reference frames (RF) provided as input to the Exemplar-based Models; allowed values are:
- 0 = HAVC same as video (default)
- 1 = HAVC + RF same as video
- 2 = HAVC + RF different from video
- 3 = external RF same as video
- 4 = external RF different from video
- 5 = HAVC different from video
It is possible to specify the directory containing the external reference frames by using the field Ref FrameDir. The frames must be named using the following format: ref_nnnnnn.[png|jpg].
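For example, assuming zero-padded 6-digit frame numbers, the expected file names can be generated like this (directory and frame numbers are illustrative):

import os

ref_dir = "D:/refs"
for n in (0, 120, 1475):
    print(os.path.join(ref_dir, "ref_%06d.png" % n))  # ref_000000.png, ref_000120.png, ...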
Unfortunately all the Deep-Exemplar methods have the problem that they are unable to properly colorize new "features" (new elements not available in the reference frame), so these new elements are often colored with implausible colors (see for an example: New "features" are not properly colored). To try to fix this problem, the possibility to merge the frames propagated by DeepEx with the frames colored by DDColor and/or DeOldify has been introduced. The merge is controlled by the field Ref merge; allowed values are:
- 0 = no merge
- 1 = reference frames are merged with low weight
- 2 = reference frames are merged with medium weight
- 3 = reference frames are merged with high weight
When the field Ref merge is set to a value greater than 0, the field SC min freq is set to 1, to allow the merge for every frame (for some examples see: New RC3 release).
Finally, the flag Reference frames only can be used to export the reference frames generated with the method HAVC and defined by the fields SC thresh and SC min freq.
As stated previously, the simplest way to colorize images with the HAVC filter is to use Hybrid. To simplify the usage, standard Presets have been introduced that automatically apply all the filter's settings. A set of parameters able to provide a satisfactory colorization is the following:
- Preset: medium (fast will increase the speed with a little decrease in color accuracy)
- Color map: none
- ColorFix: violet/red
- Denoise: light
- Stabilize: stable (or morestable)
then enable the Exemplar Models check box and set
- Method: HAVC
- SC thresh: 0.10
- SC SSIM thresh: 0.00
- SC min freq: 10 (5 if the local mode is used)
- normalize: checked
- Mode: remote
- Frames: 0
- Preset: medium (slow will increase the color accuracy but the speed will decrease by about 40%)
- Vivid: checked
The suggested parameters are shown in the following picture:
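A script equivalent of these settings could look like the following sketch; only the parameters already shown in the earlier examples are used (ScThreshold corresponds to the SC thresh field), while the remaining fields (Denoise, Stabilize, SC min freq, Mode, ...) are assumed to be set through Hybrid's GUI:

from vsdeoldify import HAVC_main

clip = HAVC_main(clip=clip, Preset="medium",
                 ColorFix="violet/red", ColorTune="medium", ColorMap="none",
                 EnableDeepEx=True, ScThreshold=0.10)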
In summary, HAVC is often able to provide a final colorized image that is better than the image obtained from the individual models, and can be considered an improvement with respect to the current models. It is suggested to use the settings described in the previous section; for those willing to use a custom configuration, the suggested setup for video encoding is:
- D+D: DeOldify (with model Video & render_factor = 24) + DDColor (with model Artistic and render_factor = 24)
For those willing to accept a decrease in encoding speed of about 40%, it is possible to improve the colorization process a little by using the configuration:
- DS+DD: DeOldify (with model Stable & render_factor = 30) + DDColor (with model Artistic and render_factor = 30)
It is also suggested to enable the DDColor Tweaks (to apply the dynamic gamma correction) and the post-process filters Chroma Smoothing and Chroma Stabilization. Unfortunately it is not possible to provide a one-size-fits-all solution, and the filter parameters need to be adjusted depending on the type of video to be colored.
As a final consideration, I would like to point out that the test results show that image colorization technology is mature enough to be used concretely, both for coloring images and, thanks to Hybrid, videos.
I would like to thank Selur, the author of Hybrid, for his wise advice and for having developed a gorgeous interface for this filter. Despite the large number of parameters and the complexity of managing them appropriately, the interface developed by Selur makes its use easy even for non-expert users.