All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, adheres to Semantic Versioning, and is generated by Changie.
- This is a patch release; please also check the full release notes for 0.21.1.
- Adapt to extension-side changes in the new extension versions:
- VSCode: 1.16.0
- IntelliJ Platform: 1.9.1
- This is a patch release; please also check the full release notes for 0.21.
- Fixed the GitLab Context Provider.
- Due to changes in the indexing format, the `~/.tabby/index` directory will be automatically removed before any further indexing jobs run. Indexing jobs are expected to run from scratch (rather than incrementally) after the upgrade.
- Support connecting to llamafile model backend.
- Display Open / Closed state for issues / pull requests in Answer Engine context card.
- Support deleting the entire thread in Answer Engine.
- Add rate limiter options for HTTP-powered model backends.
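A minimal sketch of what such an option might look like in `config.toml`; the `rate_limit` table and `request_per_minute` key are assumptions for illustration, so verify the exact schema against the configuration docs:

```toml
# Hypothetical example: throttle requests to an HTTP-backed chat model.
[model.chat.http]
kind = "openai/chat"
api_endpoint = "https://api.openai.com/v1"
api_key = "your-api-key"

[model.chat.http.rate_limit]
request_per_minute = 600
```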
- Fixed a panic that occurred when specifying a local model (#3464)
- Add pagination to Answer Engine threads.
- Fix Vulkan binary distributions.
- Improve the retry logic for chunk embedding computation in indexing job.
- Search results can now be edited directly.
- Allow switching backend chat models in Answer Engine.
- Added a connection test button in the `System` tab to test the connection to the backend LLM server.
- Optimized CR-LF inference in code completion. (#3279)
- Bumped `llama.cpp` version to b3995.
- For Answer Engine, when the file content is reasonably short (e.g., less than 200 lines of code), include the entire file content directly instead of only the chunk (#3096).
- Allowed adding additional languages through the `config.toml` file.
- Allowed customizing the `system_prompt` for Answer Engine.
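A rough sketch of how both options above might look in `~/.tabby/config.toml`; the table and key names (`additional_languages`, `answer.system_prompt`) are assumptions for illustration, so verify the exact schema against the configuration docs:

```toml
# Illustrative only: key names are assumptions, not confirmed schema.
# Register an extra language for indexing and completion.
[[additional_languages]]
name = "verilog"
exts = ["v", "sv"]
line_comment = "//"

# Customize the Answer Engine system prompt.
[answer]
system_prompt = "You are Tabby, an assistant for our internal codebase."
```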
- Redesigned homepage to make team activities (e.g., threads discussed in Answer Engine) discoverable.
- Supported downloading models with multiple partitions (e.g., Qwen-2.5 series).
- The Chat Side Panel implementation has been redesigned in version 0.18, necessitating an extension version bump for compatibility with 0.18.0.
- VSCode: >= 1.12.0
- IntelliJ: >= 1.8.0
- User Groups Access Control: Server Administrators can now assign user groups to specific context providers to precisely control which contexts can be accessed by which user groups.
- We've reworked the `Web` (a beta feature) context provider into the `Developer Docs` context provider. Previously added context in the `Web` tab has been cleared and needs to be manually migrated to `Developer Docs`.
- Extensive rework has been done in the Answer Engine search box.
  - Developer Docs / Web search is now triggered by `@`.
  - Repository Context is now selected using `#`.
- Supports OCaml.
- Starting from this version, we are utilizing websockets for features that require streaming (e.g., Answer Engine and Chat Side Panel). If you are deploying tabby behind a reverse proxy, you may need to configure the proxy to support websockets.
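One common nginx setup that forwards the WebSocket upgrade headers might look like the following (a sketch, assuming nginx in front of a Tabby instance on port 8080; adapt to your proxy):

```nginx
# Sketch: forward WebSocket upgrade headers to the Tabby backend.
location / {
    proxy_pass http://localhost:8080;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
}
```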
- Discussion threads in the Answer Engine are now persisted, allowing users to share threads with others.
- Fixed an issue where the llama-server subprocess was not being reused when reusing a model for Chat / Completion together (e.g., Codestral-22B) with the local model backend.
- Updated llama.cpp to version b3571 to support the jina series embedding models.
- The search bar in the Code Browser has been reworked and integrated with file navigation functionality.
- GraphQL syntax highlighting support in Code Browser.
- For linked GitHub repositories, issues and PRs are now only returned when the repository is selected.
- Fixed GitLab issues/MRs indexing - no longer panics if the description field is null.
- When connecting to localhost model servers, proxy settings are now skipped.
- Allow setting code completion's `max_input_length` and `max_output_tokens` in `config.toml`.
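A minimal sketch of how these limits might be set in `~/.tabby/config.toml`; the `[completion]` section name is an assumption here and the values are illustrative:

```toml
# Illustrative values; tune for your model's context window.
[completion]
max_input_length = 4096
max_output_tokens = 128
```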
- Code search functionality is now available in the `Code Browser` tab. Users can search for code using regex patterns and filter by language, repository, and branch.
- Initial experimental support for natural language to codebase conversation in `Answer Engine`.
- Incremental issues / PRs indexing by checking `updated_at`.
- Canonicalize `git_url` before performing a relevant code search. Previously, for git_urls with credentials, the canonicalized git_url was used in the index, but the query still used the raw git_url.
- Bump llama.cpp to b3370, which fixes Qwen2 model series inference.
- Bump llama.cpp version to b3334, supporting Deepseek V2 series models.
- Turn on flash attention for the Qwen2-1.5B model to fix the quantization error.
- Properly set number of GPU layers (to zero) when device is CPU.
- Introduced a new Home page featuring the Answer Engine, which activates when the chat model is loaded.
- Enhanced the Answer Engine's context by indexing issues and pull requests.
- Supports web page crawling to further enrich the Answer Engine's context.
- Enabled navigation through various git trees in the git browser.
- Turn on sha256 checksum verification for model downloading.
- Added an environment variable `TABBY_HUGGINGFACE_HOST_OVERRIDE` to override `huggingface.co` with compatible mirrors (e.g., `hf-mirror.com`) for model downloading.
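For example (a sketch; the mirror host and model name are illustrative, and whether the variable takes a bare host or a full URL should be checked against the docs):

```bash
# Download a model through a compatible mirror instead of huggingface.co.
TABBY_HUGGINGFACE_HOST_OVERRIDE=https://hf-mirror.com tabby download --model StarCoder-1B
```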
- Bumped `llama.cpp` version to b3166.
- Improved logging for the `llama.cpp` backend.
- Added support for triggering background jobs in the admin UI.
- Enhanced logging for backend jobs in the admin UI.
- Support GitLab SSO.
- Support connecting with self-hosted GitHub / GitLab.
- Repository Context is now utilized in the "Code Browser" as well.
- llama-server from llama.cpp is now distributed as an individual binary, allowing for more flexible configuration
- The HTTP API is out of experimental status: you can connect Tabby to models through the HTTP API. The following APIs are currently supported (see the sketch after the list):
- llama.cpp
- ollama
- mistral / codestral
- openai
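As a sketch, connecting Tabby's chat model to an OpenAI-compatible endpoint could look roughly like this in `~/.tabby/config.toml` (the endpoint, model name, and key are placeholders; check the model HTTP API docs for the exact fields):

```toml
# Sketch: point the chat model at an OpenAI-compatible HTTP API.
[model.chat.http]
kind = "openai/chat"
model_name = "gpt-4o-mini"
api_endpoint = "https://api.openai.com/v1"
api_key = "your-api-key"
```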
- Fixed display of files where the path contains special characters. (#2081)
- Fixed non-admin users not being able to see the repository in Code Browser. (#2110)
- The `--webserver` flag is now enabled by default in `tabby serve`. To turn off the webserver and only use OSS features, use the `--no-webserver` flag.
- The `/v1beta/chat/completions` endpoint has been moved to `/v1/chat/completions`, while the old endpoint is still available for backward compatibility.
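A quick way to exercise the new endpoint (a sketch; the host, token, and payload are illustrative and depend on your deployment):

```bash
# OpenAI-compatible chat completion request against the new path.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TABBY_TOKEN" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```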
- Upgraded llama.cpp to version b2715.
- Added support for integrating repositories from GitHub and GitLab using personal access tokens.
- Introduced a new Activities page to view user activities.
- Implemented incremental indexing for faster repository context updates.
- Added storage usage statistics in the System page.
- Included an `Ask Tabby` feature in the source code browser to provide in-context help from AI.
- Changed the default model filename from `q8_0.v2.gguf` to `model.gguf` in MODEL_SPEC.md.
- Excluded activities from deactivated users in reports.
- Introduced the `--chat-device` flag to specify the device used to run the chat model.
- Added a "Reports" tab in the web interface, which provides team-wise statistics for Tabby IDE and Extensions usage (e.g., completions, acceptances).
- Enabled the use of segmented models with the `tabby download` command.
- Implemented the "Go to file" functionality in the Code Browser.
- Fix worker unregistration malfunction caused by unmatched addresses.
- Accurate repository context filtering using fuzzy matching on the `git_url` field.
- Support the use of client-side context, including function/class declarations from LSP and relevant snippets from locally changed files.
- Fix worker registration check against enterprise licenses.
- Fix default value of `disable_client_side_telemetry` when `--webserver` is not used.
- Support for SMTP configuration in the user management system.
- Support for SSO and team management as features in the Enterprise tier.
- Fully managed repository indexing using `--webserver`, with job history logging available in the web interface.
- Ensure `~/.tabby/repositories` exists for tabby scheduler jobs: #1375
- Add CPU-only binary `tabby-cpu` to the Docker distribution.
- Due to format changes, re-executing `tabby scheduler --now` is required to ensure that the `Code Browser` functions properly.
- Introducing a preview release of the `Source Code Browser`, featuring visualization of code snippets utilized for code completion in RAG.
- Added a Windows CPU binary distribution.
- Added a Linux ROCm (AMD GPU) binary distribution.
- Fixed an issue with cached permanent redirection in certain browsers (e.g., Chrome) when the `--webserver` flag is disabled.
- Introduced the `TABBY_MODEL_CACHE_ROOT` environment variable to individually override the model cache directory.
- The `/v1beta/chat/completions` API endpoint is now compatible with OpenAI's chat completion API.
- Models from our official registry can now be referred to without the TabbyML prefix. Therefore, for the model TabbyML/CodeLlama-7B, you can simply refer to it as CodeLlama-7B everywhere.
- Tabby now includes built-in user management and secure access, ensuring that it is only accessible to your team.
- The `--webserver` flag is a new addition to `tabby serve` that enables secure access to the tabby server. When this flag is on, IDE extensions will need to provide an authorization token to access the instance.
  - Some functionalities that are bound to the webserver (e.g. playground) will also require the `--webserver` flag.
- Fix #1036: events log should be written to dated JSON files.
- Add distribution support (running completion / chat model on different process / machine).
- Add conversation history in chat playground.
- Add `/metrics` endpoint for Prometheus metrics collection.
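For example, you can check the endpoint directly and then point a Prometheus scrape job at the same path (a sketch; host and port depend on your deployment):

```bash
# Inspect the Prometheus-format metrics exposed by Tabby.
curl http://localhost:8080/metrics
```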
- Fix slow repository indexing caused by the constrained memory arena in the tantivy index writer.
- Make `--model` optional, so users can create a chat-only instance.
- Add `--parallelism` to control the throughput and VRAM usage: #727
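For example, a chat-only instance with higher decoding parallelism might be started like this (a sketch; the model, device, and value are illustrative):

```bash
# Chat-only instance (no --model) with four parallel decoding slots.
tabby serve --device metal --chat-model TabbyML/Mistral-7B --parallelism 4
```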
- llama.cpp backend (CPU, Metal) now requires a redownload of gguf model due to upstream format changes: #645 ggerganov/llama.cpp#3252
- Due to indexing format changes, the `~/.tabby/index` needs to be manually removed before any further runs of `tabby scheduler`.
- `TABBY_REGISTRY` is replaced with `TABBY_DOWNLOAD_HOST` for the GitHub-based registry implementation.
- Improved dashboard UI.
- CPU backend is switched to llama.cpp: #638
- Add `server.completion_timeout` to control the code completion interface timeout: #637
- CUDA backend is switched to llama.cpp: #656
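The `server.completion_timeout` option above is set in `config.toml`; a minimal sketch (the value is illustrative and the unit is assumed, so verify against the docs):

```toml
# Illustrative: bound how long a code completion request may take.
[server]
completion_timeout = 30
```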
- Tokenizer implementation is switched to llama.cpp, so Tabby no longer needs to download an additional tokenizer file: #683
- Fix deadlock issue reported in #718
- Supports golang: #553
- Supports ruby: #597
- Supports using local directory for `Repository.git_url`: use `file:///path/to/repo` to specify a local directory.
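For example, in `~/.tabby/config.toml` (a sketch; the repository name and path are placeholders):

```toml
# Index a local checkout instead of a remote git URL.
[[repositories]]
name = "my-project"
git_url = "file:///home/me/projects/my-project"
```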
- A new UI design for webserver.
- Improve snippet retrieval by deduplicating candidates against existing content + snippets: #582
The currently supported languages are:
- Rust
- Python
- JavaScript / JSX
- TypeScript / TSX
A blog series detailing the technical aspects of Retrieval-Augmented Code Completion will be published soon. Stay tuned!
- Fix Issue #511 by marking ggml models as optional.
- Improve stop words handling by combining RegexSet into Regex for efficiency.
- Fix a critical issue that might cause request deadlocking in the ctranslate2 backend (when load is heavy).
We have introduced a new argument, `--chat-model`, which allows you to specify the model for the chat playground located at http://localhost:8080/playground.

To utilize this feature, use the following command in the terminal:

```bash
tabby serve --device metal --model TabbyML/StarCoder-1B --chat-model TabbyML/Mistral-7B
```
Mainland Chinese users have been facing challenges accessing Hugging Face due to various reasons. The Tabby team is actively working to address this issue by mirroring models to a hosting provider in mainland China called modelscope.cn.
## Download from the Modelscope registry
```bash
TABBY_REGISTRY=modelscope tabby download --model TabbyML/WizardCoder-1B
```
- Implemented more accurate UTF-8 incremental decoding in the GitHub pull request.
- Fixed the stop words implementation by utilizing RegexSet to isolate the stop word group.
- Improved model downloading logic; now Tabby will attempt to fetch the latest model version if there's a remote change, and the local cache key becomes stale.
- Set default `num_replicas_per_device` for the ctranslate2 backend to increase parallelism.