Descriptor Storage

DescriptorElement Interface

The DescriptorElement interface defines a standard for storing and retrieving a descriptor vector and its associated UID. Descriptors, also known as feature vectors, are defined here as numpy.ndarray instances. We do not constrain the vector data type at this level.

Descriptor elements are also associated with a UID. No standard for UID generation is imposed here; UID attribution is left to the user or the generating algorithm. Generally a UID, or unique identifier, “is an identifier that is guaranteed to be unique among all identifiers used for those objects and for a specific purpose.”

UIDs are generally constrained to fit the Python Hashable type definition.
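As a rough illustration of the contract described above, the following sketch pairs a vector with a hashable UID. `SketchElement` is a hypothetical stand-in written for this example, not a class provided by the library, and it leaves out configuration and persistence entirely.

```python
from collections.abc import Hashable
from typing import Optional

import numpy as np


class SketchElement:
    """Hypothetical in-memory stand-in for a DescriptorElement implementation."""

    def __init__(self, type_str: str, uuid: Hashable) -> None:
        # Reject UIDs that cannot be hashed (for example, lists).
        if not isinstance(uuid, Hashable):
            raise TypeError(f"UID must be hashable, got {uuid!r}")
        self._type_str = type_str
        self._uuid = uuid
        self._vec: Optional[np.ndarray] = None

    def type(self) -> str:
        return self._type_str

    def uuid(self) -> Hashable:
        return self._uuid

    def has_vector(self) -> bool:
        return self._vec is not None

    def set_vector(self, new_vec: np.ndarray) -> "SketchElement":
        # Copy so later changes to the caller's array do not leak in.
        self._vec = np.array(new_vec)
        return self

    def vector(self) -> Optional[np.ndarray]:
        return self._vec


elem = SketchElement("example-generator", "image-0001")
elem.set_vector(np.array([0.1, 0.2, 0.3]))
print(elem.uuid(), elem.has_vector(), elem.vector().shape)
```

Any hashable value (a string, an integer, a tuple of hashables) would serve as the UID in this sketch.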

Storing Many Elements

We provide an interface for storing groups of descriptor elements called the DescriptorSet. This provides an interface for storing and retrieving sets of DescriptorElement instances, accessing by UID, and iterating over contained elements.
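The behaviour described above (store, look up by UID, iterate) can be pictured with a dictionary-backed sketch. `SketchSet` and `Elem` are hypothetical names for this example only; real elements expose their UID through a `uuid()` method, whereas the simplified `Elem` below uses a plain `uid` field.

```python
from collections.abc import Hashable
from typing import Dict, Iterable, Iterator, NamedTuple


class Elem(NamedTuple):
    """Hypothetical minimal element: only the UID matters for this sketch."""
    uid: Hashable
    payload: str


class SketchSet:
    """Hypothetical dict-backed stand-in for a DescriptorSet implementation."""

    def __init__(self) -> None:
        self._table: Dict[Hashable, Elem] = {}

    def add_many_descriptors(self, elems: Iterable[Elem]) -> None:
        for e in elems:
            # Keyed by UID, so re-adding the same UID replaces the old entry.
            self._table[e.uid] = e

    def has_descriptor(self, uid: Hashable) -> bool:
        return uid in self._table

    def get_descriptor(self, uid: Hashable) -> Elem:
        # Raises KeyError when the UID is not present.
        return self._table[uid]

    def count(self) -> int:
        return len(self._table)

    def iterdescriptors(self) -> Iterator[Elem]:
        return iter(self._table.values())


index = SketchSet()
index.add_many_descriptors([Elem("a", "first"), Elem("b", "second")])
print(index.count(), index.get_descriptor("b").payload)
print(sorted(e.uid for e in index.iterdescriptors()))
```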

Reference

class smqtk_descriptors.interfaces.descriptor_element.DescriptorElement(*args: Any, **kwargs: Any)

Abstract descriptor vector container.

This structure supports implementations that cache descriptor vectors on a per-UUID basis.

UUIDs must maintain uniqueness when transformed into a string.

Descriptor element equality is based on shared descriptor type and vector equality. Two descriptor vectors generated by different types of descriptor generator should not be considered the same (though this may be up for discussion).

Stored vectors should be effectively immutable.
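One way to picture the equality and immutability notes above is the sketch below. The helper names `same_descriptor` and `frozen_copy` are hypothetical, and marking a NumPy array read-only is only one possible way to make a stored vector effectively immutable; the source does not say how implementations achieve this.

```python
import numpy as np


def same_descriptor(type_a: str, vec_a: np.ndarray,
                    type_b: str, vec_b: np.ndarray) -> bool:
    """Equal only when both the type label and the vector contents match."""
    return type_a == type_b and np.array_equal(vec_a, vec_b)


def frozen_copy(vec: np.ndarray) -> np.ndarray:
    """Return a read-only copy so the stored vector cannot be edited in place."""
    out = np.array(vec)
    out.setflags(write=False)
    return out


original = np.array([1.0, 2.0, 3.0])
stored = frozen_copy(original)
original[0] = 99.0  # the caller's array changes, the stored copy does not

print(same_descriptor("gen-a", stored, "gen-a", np.array([1.0, 2.0, 3.0])))
print(same_descriptor("gen-a", stored, "gen-b", np.array([1.0, 2.0, 3.0])))
```

Attempting `stored[0] = 0.0` in this sketch raises a `ValueError`, because NumPy refuses writes to an array whose `write` flag is cleared.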

classmethod from_config(config_dict: Dict, type_str: str, uuid: collections.abc.Hashable, merge_default: bool = True) → T

Instantiate a new instance of this class given the desired type, uuid, and JSON-compliant configuration dictionary.

Parameters
  • type_str – Type of descriptor. This is usually the name of the content descriptor that generated this vector.

  • uuid – Unique ID reference of the descriptor.

  • config_dict – JSON compliant dictionary encapsulating a configuration.

  • merge_default – Merge the given configuration on top of the default provided by get_default_config.

Returns

Constructed instance from the provided config.
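The effect of merge_default can be sketched as follows. `ConfiguredElement`, its `cache_dir` and `compress` options, and the shallow `dict.update` merge are all assumptions made for illustration; the library's actual merge strategy is not described here.

```python
from collections.abc import Hashable
from typing import Any, Dict


class ConfiguredElement:
    """Hypothetical element whose constructor takes two extra options."""

    def __init__(self, type_str: str, uuid: Hashable,
                 cache_dir: str = "/tmp/cache", compress: bool = False) -> None:
        self.type_str = type_str
        self.uuid = uuid
        self.cache_dir = cache_dir
        self.compress = compress

    @classmethod
    def get_default_config(cls) -> Dict[str, Any]:
        return {"cache_dir": "/tmp/cache", "compress": False}

    @classmethod
    def from_config(cls, config_dict: Dict[str, Any], type_str: str,
                    uuid: Hashable, merge_default: bool = True) -> "ConfiguredElement":
        if merge_default:
            # Start from the defaults, then let the given values win.
            merged = cls.get_default_config()
            merged.update(config_dict)
            config_dict = merged
        return cls(type_str, uuid, **config_dict)


elem = ConfiguredElement.from_config({"compress": True}, "example-generator", 7)
print(elem.cache_dir, elem.compress)
```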

classmethod get_default_config() → Dict[str, Any]

Generate and return a default configuration dictionary for this class. This will be primarily used for generating what the configuration dictionary would look like for this class without instantiating it.

By default, we observe what this class’s constructor takes as arguments, aside from the first two assumed positional arguments, turning those argument names into configuration dictionary keys. If any of those arguments have defaults, we will add those values into the configuration dictionary appropriately. The dictionary returned should only contain JSON compliant value types.

It is not guaranteed that the configuration dictionary returned from this method is valid for construction of an instance of this class.

Returns

Default configuration dictionary for the class.
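The constructor introspection described above might look roughly like the sketch below, built on the standard library's `inspect.signature`. The function `sketch_default_config`, the class `FileBackedElement`, and the choice to record arguments without defaults as `None` are assumptions for this example, not the library's implementation.

```python
import inspect
from typing import Any, Dict


def sketch_default_config(cls: type) -> Dict[str, Any]:
    """Build a config dict from constructor arguments, skipping type and UID."""
    params = list(inspect.signature(cls.__init__).parameters.values())
    config: Dict[str, Any] = {}
    # Skip `self` and the two assumed positional arguments (type and UID).
    for param in params[3:]:
        if param.kind in (inspect.Parameter.VAR_POSITIONAL,
                          inspect.Parameter.VAR_KEYWORD):
            continue
        # Assumption: arguments without a default are recorded as None.
        if param.default is inspect.Parameter.empty:
            config[param.name] = None
        else:
            config[param.name] = param.default
    return config


class FileBackedElement:
    """Hypothetical element with one required and two optional options."""

    def __init__(self, type_str, uuid, save_dir, subdir_split=None, overwrite=False):
        self.save_dir = save_dir


print(sketch_default_config(FileBackedElement))
```

In this sketch `save_dir` has no default and ends up as `None`, which shows why a default configuration is not necessarily valid for constructing an instance.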

classmethod get_many_vectors(descriptors: Iterable[smqtk_descriptors.interfaces.descriptor_element.DescriptorElement]) → List[Optional[numpy.ndarray]]

Get the vectors associated with the given descriptors.

Note

Most subclasses should override internal method _get_many_vectors rather than this external wrapper function. If a subclass does override this classmethod, it is responsible for appropriately handling any valid DescriptorElement, regardless of subclass.

Parameters

descriptors – Iterable of descriptors to query for.

Returns

List of vectors associated with the given descriptors, with None in place of any descriptor that has no associated vector. Results are returned in the order that descriptors were given.
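The ordering and None behaviour of the return value can be pictured with the sketch below. `VecHolder` and `sketch_get_many_vectors` are hypothetical names; a real implementation may batch its lookups rather than call `vector()` once per element.

```python
from typing import Iterable, List, Optional

import numpy as np


class VecHolder:
    """Hypothetical element exposing only the vector() accessor."""

    def __init__(self, vec: Optional[np.ndarray] = None) -> None:
        self._vec = vec

    def vector(self) -> Optional[np.ndarray]:
        return self._vec


def sketch_get_many_vectors(
    descriptors: Iterable[VecHolder],
) -> List[Optional[np.ndarray]]:
    # One result per input, in input order; None marks a missing vector.
    return [d.vector() for d in descriptors]


holders = [VecHolder(np.array([1.0, 0.0])), VecHolder(), VecHolder(np.array([0.0, 1.0]))]
vectors = sketch_get_many_vectors(holders)
print([None if v is None else v.tolist() for v in vectors])
```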

abstract has_vector() → bool
Returns

Whether or not this container currently has a descriptor vector stored.

abstract set_vector(new_vec: numpy.ndarray) → smqtk_descriptors.interfaces.descriptor_element.DescriptorElement

Set the contained vector.

If this container already stores a descriptor vector, this will overwrite it.

Parameters

new_vec – New vector to contain.

Returns

Self.

type() → str
Returns

Type label of the DescriptorGenerator that generated this vector.

uuid() → collections.abc.Hashable
Returns

Unique ID for this vector.

abstract vector() → Optional[numpy.ndarray]
Returns

The stored descriptor vector as a numpy array, or None if there is no vector stored in this container.

class smqtk_descriptors.interfaces.descriptor_set.DescriptorSet(*args: Any, **kwargs: Any)

Index of descriptors, keyed and query-able by descriptor UUID.

Note that these indexes do not use the descriptor type strings. Thus, if a set of descriptors has multiple elements with the same UUID but different type strings, they will overwrite each other in these indexes. In such a case, when dealing with descriptors from different generators, it is advisable to use multiple indices.
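The collision described in this note, and the suggested workaround of keeping one index per generator, can be sketched with plain dictionaries. `Elem` and the generator labels below are hypothetical and stand in for real elements and index implementations.

```python
from collections.abc import Hashable
from typing import Dict, NamedTuple


class Elem(NamedTuple):
    """Hypothetical element carrying only a type string and a UID."""
    type_str: str
    uid: Hashable


elems = [Elem("generator-a", 42), Elem("generator-b", 42)]

# A single index keyed only by UID: the second element replaces the first.
single: Dict[Hashable, Elem] = {}
for e in elems:
    single[e.uid] = e

# One index per type string: both elements survive.
per_type: Dict[str, Dict[Hashable, Elem]] = {}
for e in elems:
    per_type.setdefault(e.type_str, {})[e.uid] = e

print(len(single), single[42].type_str)
print(sorted(per_type), [len(idx) for idx in per_type.values()])
```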

abstract add_descriptor(descriptor: smqtk_descriptors.interfaces.descriptor_element.DescriptorElement) → None

Add a descriptor to this index.

Adding the same descriptor multiple times should not add multiple copies of the descriptor in the index (based on UUID). Added descriptors overwrite indexed descriptors based on UUID.

Parameters

descriptor – Descriptor to index.

abstract add_many_descriptors(descriptors: Iterable[smqtk_descriptors.interfaces.descriptor_element.DescriptorElement]) → None

Add multiple descriptors at one time.

Adding the same descriptor multiple times should not add multiple copies of the descriptor in the index (based on UUID). Added descriptors overwrite indexed descriptors based on UUID.

Parameters

descriptors – Iterable of descriptor instances to add to this index.

abstract clear() → None

Clear this descriptor index’s entries.

abstract count() → int
Returns

Number of descriptor elements stored in this index.

abstract get_descriptor(uuid: collections.abc.Hashable) → smqtk_descriptors.interfaces.descriptor_element.DescriptorElement

Get the descriptor in this index that is associated with the given UUID.

Parameters

uuid – UUID of the DescriptorElement to get.

Raises

KeyError – The given UUID doesn’t associate to a DescriptorElement in this index.

Returns

DescriptorElement associated with the queried UUID.

abstract get_many_descriptors(uuids: Iterable[collections.abc.Hashable]) → Iterator[smqtk_descriptors.interfaces.descriptor_element.DescriptorElement]

Get an iterator over descriptors associated to given descriptor UUIDs.

Parameters

uuids – Iterable of descriptor UUIDs to query for.

Raises

KeyError – A given UUID doesn’t associate with a DescriptorElement in this index.

Returns

Iterator of descriptors associated to given uuid values.

get_many_vectors(uuids: Iterable[collections.abc.Hashable]) → List[Optional[numpy.ndarray]]

Get underlying vectors of descriptors associated with given uuids.

Parameters

uuids – Iterable of descriptor UUIDs to query for.

Raises

KeyError – There is no descriptor in this set for one or more of the input UUIDs.

Returns

List of vectors for descriptors associated with given uuid values.

abstract has_descriptor(uuid: collections.abc.Hashable) → bool

Check if a DescriptorElement with the given UUID exists in this index.

Parameters

uuid – UUID to query for.

Returns

True if a DescriptorElement with the given UUID exists in this index, or False if not.

items() → Iterator[Tuple[collections.abc.Hashable, smqtk_descriptors.interfaces.descriptor_element.DescriptorElement]]

Alias for iteritems.

abstract iterdescriptors() → Iterator[smqtk_descriptors.interfaces.descriptor_element.DescriptorElement]

Return an iterator over indexed descriptor element instances.

abstract iteritems() → Iterator[Tuple[collections.abc.Hashable, smqtk_descriptors.interfaces.descriptor_element.DescriptorElement]]

Return an iterator over indexed descriptor key and instance pairs.

abstract iterkeys() → Iterator[collections.abc.Hashable]

Return an iterator over indexed descriptor keys, which are their UUIDs.

keys() → Iterator[collections.abc.Hashable]

Alias for iterkeys.

abstract remove_descriptor(uuid: collections.abc.Hashable) → None

Remove a descriptor from this index by the given UUID.

Parameters

uuid – UUID of the DescriptorElement to remove.

Raises

KeyError – The given UUID doesn’t associate to a DescriptorElement in this index.

abstract remove_many_descriptors(uuids: Iterable[collections.abc.Hashable]) → None

Remove descriptors associated to given descriptor UUIDs from this index.

Parameters

uuids – Iterable of descriptor UUIDs to remove.

Raises

KeyError – A given UUID doesn’t associate with a DescriptorElement in this index.

class smqtk_descriptors.descriptor_element_factory.DescriptorElementFactory(d_type: Type[smqtk_descriptors.interfaces.descriptor_element.DescriptorElement], type_config: Dict[str, Any])

Factory class for producing DescriptorElement instances of a specified type and configuration.

classmethod from_config(config_dict: Dict, merge_default: bool = True) → T

Instantiate a new instance of this class given the configuration JSON-compliant dictionary encapsulating initialization arguments.

This method should not be called via super unless an instance of the class is desired.

Parameters
  • config_dict – JSON compliant dictionary encapsulating a configuration.

  • merge_default – Merge the given configuration on top of the default provided by get_default_config.

Returns

Constructed instance from the provided config.

get_config() → Dict[str, Any]

Return a JSON-compliant dictionary that could be passed to this class’s from_config method to produce an instance with identical configuration.

In most cases, this involves naming the keys of the dictionary after the initialization argument names, as if it were to be passed to the constructor via dictionary expansion. In some cases, where it doesn’t make sense to store some object constructor parameters as configuration values (i.e. they must be supplied at runtime), this method’s returned dictionary may leave those parameters out. In such cases, the object’s from_config class-method would also take additional positional arguments to fill in for the parameters that this returned configuration lacks.

Returns

JSON type compliant configuration dictionary.

Return type

dict

classmethod get_default_config() → Dict[str, Any]

Generate and return a default configuration dictionary for this class. This will be primarily used for generating what the configuration dictionary would look like for this class without instantiating it.

It is not guaranteed that the configuration dictionary returned from this method is valid for construction of an instance of this class.

Returns

Default configuration dictionary for the class.

new_descriptor(type_str: str, uuid: collections.abc.Hashable) → smqtk_descriptors.interfaces.descriptor_element.DescriptorElement

Create a new DescriptorElement instance of the configured implementation.

Parameters
  • type_str – Type of descriptor. This is usually the name of the content descriptor that generated this vector.

  • uuid – UUID to associate with the descriptor

Returns

New DescriptorElement instance
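The factory pattern documented above, in which an element type and its configuration are fixed once and then reused for every new descriptor, can be sketched as follows. `SketchFactory` and `DiskElement` are hypothetical names, and passing the configuration as keyword arguments is an assumption about how the configured type is constructed.

```python
from collections.abc import Hashable
from typing import Any, Dict, Type


class DiskElement:
    """Hypothetical element implementation with one configurable option."""

    def __init__(self, type_str: str, uuid: Hashable, save_dir: str = ".") -> None:
        self.type_str = type_str
        self.uuid = uuid
        self.save_dir = save_dir


class SketchFactory:
    """Hypothetical stand-in for a factory of configured elements."""

    def __init__(self, d_type: Type[DiskElement], type_config: Dict[str, Any]) -> None:
        self._d_type = d_type
        # Copy so later edits to the caller's dict do not affect the factory.
        self._type_config = dict(type_config)

    def new_descriptor(self, type_str: str, uuid: Hashable) -> DiskElement:
        # Every element shares the configuration fixed at factory creation.
        return self._d_type(type_str, uuid, **self._type_config)


factory = SketchFactory(DiskElement, {"save_dir": "descriptors"})
first = factory.new_descriptor("example-generator", "uid-1")
second = factory.new_descriptor("example-generator", "uid-2")
print(first.uuid, second.uuid, first.save_dir == second.save_dir)
```

Code that produces descriptors then needs only the factory, not the storage details of the element type it was configured with.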