Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: .NET: initial adr on list keys method #9932

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

eavanvalkenburg
Copy link
Member

Motivation and Context

We are missing a list keys (or similar) method in our Vector Store Record Collection, as noted in #9911 and #9892
This ADR proposes a way to add that including some different approaches we could take.

The decision is to keep it simple and add it to the VectorStoreRecordCollection interface, but a discussion on how to mitigate breaking changes with that in dotnet is needed.

Description

Contribution Checklist

Copy link
Member

@dmytrostruk dmytrostruk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eavanvalkenburg Thanks for this ADR! I left some comments, let's discuss it with team.

Task<TRecord?> GetAsync(TKey key, GetRecordOptions? options = default, CancellationToken cancellationToken = default);
IAsyncEnumerable<TRecord> GetBatchAsync(IEnumerable<TKey> keys, GetRecordOptions? options = default, CancellationToken cancellationToken = default);
...
Task<IEnumerable<TKey>> ListKeysAsync(CancellationToken cancellationToken = default);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep it as an option in this ADR, but just a comment why we shouldn't probably go with this approach - I'm not sure if dedicated method to get only keys is extensible, because another requirement may be to get keys together with specific column/property (e.g. FirstName).

As an alternative, I would probably add something similar to GetBatchAsync method, but without keys parameter, where GetRecordOptions parameter will contain Filter property (similarly to VectorSearchOptions.Filter). Then, if this Filter is null, all records should be returned (but we should think about pagination here), otherwise it will be possible to filter by key or any other property.

Another point is that we should come up with some mechanism how to extend vector store functionality. Since it's interface at the moment, we don't want to add more methods to it, because it's going to be a breaking change.

Current interface contains methods to support main scenarios for vector stores, so maybe we can think about adding another interface for such cases that are present in this ADR or consider abstract class. Even if we decide to add it to current interface with breaking change, we should be careful about the size of this interface, to align with Interface Segregation Principle.

```csharp
public interface IVectorStoreRecordCollectionListKeys<TKey, TRecord> : IVectorStoreRecordCollection<TKey, TRecord>
{
Task<IEnumerable<TKey>> ListKeysAsync(CancellationToken cancellationToken = default);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described in previous comment, I think there is no need in separate method dedicated to keys, since we can return an entire record, which could be more useful. On the other hand, we may return some extra data, which won't be used by consumer. So, we can implement it in a way that user will be able to decide which properties to return back by providing the list of them in options parameter.

We can add a separate interface to list all keys in the vector store. This method will return a list of keys.

```csharp
public interface IVectorStoreRecordCollectionListKeys<TKey, TRecord> : IVectorStoreRecordCollection<TKey, TRecord>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it inherit IVectorStoreRecordCollection interface? Would it be possible to keep it separately?

- This is a non-breaking change, each vector store record collection must choose to implement this interface.

Cons:
- This is a more complex way to add the functionality
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to avoid this approach, I think we should add more details here why it's complex, so it will be easier to understand in the future why we decided to go with another option.

```
Pros:
- This allows the other options to be used in concert with getting all records
- This is a non-breaking change
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends. For example, if someone already implemented our interface for their own connector and then we make the parameter nullable, it will raise warning CS8767 - Nullability of reference types in type of parameter of doesn't match implicitly implemented member (possibly because of nullability attributes). For projects which handle warnings as errors, it will be a compilation error.

public class GetRecordOptions
{
public bool IncludeVectors { get; init; } = false;
public bool KeysOnly { get; init; } = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described in one of the previous comments, accepting a collection of field/column names to return should be more extensible approach, since it will be possible to specify just a key property name as well as key together with other properties that are required in specific case.

Comment on lines +114 to +116
Option 1: add a new method to the VectorStoreRecordCollection interface

This is the cleanest approach, and for Python, the breaking change can be mitigated by providing a default implementation in the interface that returns an empty list.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before describing Decision section, let's discuss it with the team and then update this section based on team's agreement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants