-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: .NET: initial adr on list keys method #9932
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eavanvalkenburg Thanks for this ADR! I left some comments, let's discuss it with team.
Task<TRecord?> GetAsync(TKey key, GetRecordOptions? options = default, CancellationToken cancellationToken = default); | ||
IAsyncEnumerable<TRecord> GetBatchAsync(IEnumerable<TKey> keys, GetRecordOptions? options = default, CancellationToken cancellationToken = default); | ||
... | ||
Task<IEnumerable<TKey>> ListKeysAsync(CancellationToken cancellationToken = default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep it as an option in this ADR, but just a comment why we shouldn't probably go with this approach - I'm not sure if dedicated method to get only keys is extensible, because another requirement may be to get keys together with specific column/property (e.g. FirstName
).
As an alternative, I would probably add something similar to GetBatchAsync
method, but without keys
parameter, where GetRecordOptions
parameter will contain Filter
property (similarly to VectorSearchOptions.Filter
). Then, if this Filter
is null, all records should be returned (but we should think about pagination here), otherwise it will be possible to filter by key or any other property.
Another point is that we should come up with some mechanism how to extend vector store functionality. Since it's interface at the moment, we don't want to add more methods to it, because it's going to be a breaking change.
Current interface contains methods to support main scenarios for vector stores, so maybe we can think about adding another interface for such cases that are present in this ADR or consider abstract class. Even if we decide to add it to current interface with breaking change, we should be careful about the size of this interface, to align with Interface Segregation Principle.
```csharp | ||
public interface IVectorStoreRecordCollectionListKeys<TKey, TRecord> : IVectorStoreRecordCollection<TKey, TRecord> | ||
{ | ||
Task<IEnumerable<TKey>> ListKeysAsync(CancellationToken cancellationToken = default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As described in previous comment, I think there is no need in separate method dedicated to keys, since we can return an entire record, which could be more useful. On the other hand, we may return some extra data, which won't be used by consumer. So, we can implement it in a way that user will be able to decide which properties to return back by providing the list of them in options
parameter.
We can add a separate interface to list all keys in the vector store. This method will return a list of keys. | ||
|
||
```csharp | ||
public interface IVectorStoreRecordCollectionListKeys<TKey, TRecord> : IVectorStoreRecordCollection<TKey, TRecord> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it inherit IVectorStoreRecordCollection
interface? Would it be possible to keep it separately?
- This is a non-breaking change, each vector store record collection must choose to implement this interface. | ||
|
||
Cons: | ||
- This is a more complex way to add the functionality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to avoid this approach, I think we should add more details here why it's complex, so it will be easier to understand in the future why we decided to go with another option.
``` | ||
Pros: | ||
- This allows the other options to be used in concert with getting all records | ||
- This is a non-breaking change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends. For example, if someone already implemented our interface for their own connector and then we make the parameter nullable, it will raise warning CS8767 - Nullability of reference types in type of parameter of doesn't match implicitly implemented member (possibly because of nullability attributes)
. For projects which handle warnings as errors, it will be a compilation error.
public class GetRecordOptions | ||
{ | ||
public bool IncludeVectors { get; init; } = false; | ||
public bool KeysOnly { get; init; } = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As described in one of the previous comments, accepting a collection of field/column names to return should be more extensible approach, since it will be possible to specify just a key property name as well as key together with other properties that are required in specific case.
Option 1: add a new method to the VectorStoreRecordCollection interface | ||
|
||
This is the cleanest approach, and for Python, the breaking change can be mitigated by providing a default implementation in the interface that returns an empty list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before describing Decision
section, let's discuss it with the team and then update this section based on team's agreement.
Motivation and Context
We are missing a list keys (or similar) method in our Vector Store Record Collection, as noted in #9911 and #9892
This ADR proposes a way to add that including some different approaches we could take.
The decision is to keep it simple and add it to the VectorStoreRecordCollection interface, but a discussion on how to mitigate breaking changes with that in dotnet is needed.
Description
Contribution Checklist