vidsim

command module
v1.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 30, 2024 License: BSD-3-Clause Imports: 1 Imported by: 0

README

vidsim - find similar/duplicate videos in your collection

vidsim is a tool that scans a set of video files and identifies videos that are "similar." Since frame-by-frame video comparison is prohibitively slow and doesn't scale well for large collections, this tool takes a pragmatic approach, in that it extracts a single frame from erach video and compares those frames. While this is not as reliable, it works well on typical personal video collections (i.e. not requiring months to run).

Installation

go install github.com/abelikoff/vidsim@latest

[!NOTE] > vidsim uses ffmpeg tool for frame generation, so make sure the latter is installed as well.

Operation

The vidsim command supports multiple actions described below.

Find similar videos

In its basic form, vidsim scans specified directories and compares all video files, producting a JSON report for all files considered "similar:"

vidsim process <dir1> <dir2> ...

Since frame extraction and comparison are relatively slow and expensive, vidsim supports caching of the artifacts it computes, using cached values in future re-runs, which massively speeds up the operation. In order to invoke caching, one specifies a directory to be used for cached data with -d option:

vidsim -d .my.cache.dir process <dir1> <dir2> ...
Handle false positives

Since the comparison logic is imprecise, the will inevitably false positive matches: videos identified as similar, which are not. Running the tool repeatedly and revisiting those false positives again and again is annoying and distracting. To address this, vidsim allows marking pairs of videos as false positive matches, so that when it runs next time, this pair of videos will not be reported as a match. Naturally, this is only supported with caching on.

To mark a set of video files as pairwise false positives, use the unmatch command:

vidsim -d .my.cache.dir unmatch <video_file1> <video_file2> ...
Compacting the state

State can be compacted, removing data for files that no longer exist:

vidsim -d .my.cache.dir compact
Considerations about filenames

By default vidsim saves filenames using relative (to the top directories specified) paths. This has two implications:

  • When operating on files (e.g. marking false positives), paths have to be specified precisely using that convention.
  • When compacting the state, the command should be run in the same directory where process command was run, otherwise vidsim will not find the files listed in the state and will think those files have been deleted (it does have a sanity check agains mass deletion however).

Alternatively, one can specify a --abs_paths option to make vidsim store absolute paths. This takes more space but it avoids the problem above.

Future work

  • Add a peek command to examine the cached data.
  • Display cache statistics during processing.

Documentation

Overview

Copyright © 2024 NAME HERE <EMAIL ADDRESS>

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL