82 Commits

Author SHA1 Message Date
Suhas S
f67539882c fix: CSS files not being indexed (#7072)
CSS files were not being indexed because codeChunker expects code
structures (classes/functions) that don't exist in CSS. The chunker
would return zero chunks, preventing CSS files from being indexed.

This fix routes CSS, HTML, JSON and similar non-code files to
basicChunker instead of codeChunker, ensuring they are properly
indexed while maintaining intelligent chunking for actual code files.
2025-08-25 00:08:15 +08:00
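
A minimal sketch of the routing this commit describes, assuming the chunker is chosen by file extension. The names `selectChunker`, `extensionOf`, and `NON_CODE_EXTENSIONS`, and the exact extension list, are hypothetical and not taken from the repository:

```typescript
// Hypothetical sketch: names and the extension list are assumptions,
// not the repository's actual implementation.
type ChunkerKind = "basic" | "code";

// File types with no class/function structure for a code-aware chunker to find.
const NON_CODE_EXTENSIONS = new Set(["css", "html", "json"]);

// Returns the lowercase extension of the file name, or "" if there is none.
function extensionOf(filepath: string): string {
  const base = filepath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot + 1).toLowerCase();
}

// Route non-code files to the basic chunker; everything else stays code-aware.
function selectChunker(filepath: string): ChunkerKind {
  return NON_CODE_EXTENSIONS.has(extensionOf(filepath)) ? "basic" : "code";
}
```

With this shape, a stylesheet never reaches the code-aware path, so it cannot come back with zero chunks for lack of classes or functions.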
uinstinct
fd19be2bfe cleanup async encoders after tests 2025-08-18 18:31:36 +05:30
uinstinct
e115abd5cd tests for codeChunker function definition 2025-08-11 08:43:43 +05:30
uinstinct
58f4fffae2 fix: function definition to respect max chunk size 2025-08-08 11:21:25 +05:30
Shawn Smith
7d9786606f fix: 🐛 Fix CodebaseIndexer Bugs (#6890)
* fix: 🐛 Fix CodebaseIndexer Bugs

* fix: 🎨 Prettier

* fix: Fixed by reorganizing dependent tests

* build: 💚 New CI Build
2025-07-30 21:51:17 -07:00
Ting-Wai To
6bdae841d8 test: chunkDocumentWithoutId 2025-07-23 17:44:17 -07:00
Ting-Wai To
24574d3544 test: verify chunkDocument filters out chunks exceeding maxChunkSize 2025-07-22 11:24:59 -07:00
Nate Sesti
dbabee5906 Merge branch 'main' into patch-2 2025-06-20 19:13:30 -07:00
uinstinct
4b4e3e4702 fix misused promises errors in core folder 2025-06-13 23:01:28 +05:30
Jacob
6d6f89fe55 fix: Fix the issue where the chunk index might be duplicated 2025-06-09 11:04:23 +08:00
Nate
184be8f092 npx prettier --check "core/**/*.{js,jsx,ts,tsx,json,css,md}" --ignore-path .gitignore --ignore-path .prettierignore --write
2025-05-25 18:34:39 -07:00
Patrick Erichsen
afb6bd0fb5 fix: truncate tagToString to max filename len 2025-05-12 11:28:12 -07:00
Nate
706d8a3b5b remove code chunker error telemetry 2025-03-18 19:33:43 -07:00
jubilantjerry
8662122b84 Remove path requirement in addTag indexing 2025-02-25 15:15:38 +08:00
jubilantjerry
520498b9b3 Add uniqueness constraint for chunk_tags 2025-02-24 17:06:56 +08:00
Dallin Romney
ddfcefa036 path -> ide part 243243 2024-12-10 20:34:21 -08:00
Dallin Romney
b6b3cceb78 (BROKEN COMMI) path -> uri vscode utils and prompt files cont 2024-12-10 17:28:56 -08:00
Dallin Romney
a88915ec71 merge main 2024-12-10 13:28:41 -08:00
Dallin Romney
0bb131680b path-uri-context-updates 2024-12-10 13:19:15 -08:00
Dallin Romney
8bfc6f1605 more core tests 2024-12-05 23:34:57 -08:00
Dallin Romney
af5d38e653 move some stuff back 2024-12-04 17:33:58 -08:00
Dallin Romney
a58b0ff7b5 revert fixtures location for less file changes 2024-12-04 17:31:38 -08:00
Dallin Romney
d19411aa0b revert testdir location for less file changes 2024-12-04 17:30:13 -08:00
Dallin Romney
02f40d04bd core test cleanup 2024-12-04 17:21:04 -08:00
Test
b5cab4609b chore: apply eslint to vscode and core 2024-11-11 07:38:52 -08:00
DongjaJ
0f8625faf4 refactor: declare private class properties and initialize 2024-10-18 14:31:24 +09:00
Nate
65925baa5b don't bother users with non-critical indexing errors 2024-08-29 19:05:22 -07:00
Nate
38cd597431 fall back to basic chunker when parser fails to load 2024-08-29 12:40:20 -07:00
Test
66998efad9 Merge branch 'dev' into pe/repo-map 2024-08-27 14:11:48 -07:00
Test
b7ca371d57 safer indexing 2024-08-27 12:24:17 -07:00
Test
bf426d5e21 feat: get return_type and parameters from snippets 2024-08-26 17:55:39 -07:00
Test
6600501fe7 fix ci again 2024-08-21 22:37:07 -07:00
Patrick Erichsen
5406d0e1a2 test: colocate tests in core 2024-08-16 10:35:40 -07:00
Patrick Erichsen
8cdb68c6d6 update tests 2024-08-15 13:38:15 -07:00
Patrick Erichsen
978f387922 testing: add indexing update tests 2024-08-14 11:30:36 -07:00
Patrick Erichsen
333ba5d6da Update ChunkCodebaseIndex.ts 2024-08-13 10:30:54 -07:00
Patrick Erichsen
3844208a3b feat: improve chunking desc on large projects 2024-08-13 10:27:28 -07:00
Rob Leidle
4c4b211aae Move the bulk insert resolve into the COMMIT callback
db.db.serialize(...) guarantees that each statement runs in order. However, the function itself returns long before all the statements have executed. Therefore, we should resolve the promise only once the final COMMIT has run. In all other cases there will have been an error, and we'll have called reject(err).

This also removed the need for the extra insert that was recently added, so that insert was dropped (in fact, it was causing conflict errors).
2024-08-09 16:14:05 -07:00
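
The pattern described in this message can be sketched as follows. `MiniDb` is a hypothetical minimal interface, loosely modeled on the `serialize`/`run` style of callback-based SQLite bindings. `bulkInsert` and its statement list are likewise assumptions, not the repository's code:

```typescript
// Hypothetical sketch: MiniDb and bulkInsert are illustrative assumptions.
type RunCallback = (err: Error | null) => void;

interface MiniDb {
  // Queues the statements issued inside fn so that they run in order.
  serialize(fn: () => void): void;
  // Queues one statement; callback fires once that statement has run.
  run(sql: string, callback: RunCallback): void;
}

function bulkInsert(db: MiniDb, statements: string[]): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    db.serialize(() => {
      db.run("BEGIN", (err) => {
        if (err) reject(err);
      });
      for (const sql of statements) {
        db.run(sql, (err) => {
          if (err) reject(err);
        });
      }
      // serialize() returns before the queue drains, so only the COMMIT
      // callback proves that every earlier statement has run.
      db.run("COMMIT", (err) => (err ? reject(err) : resolve()));
    });
  });
}
```

Because a promise settles only once, a `reject` from an earlier statement wins over the later `resolve` in the COMMIT callback.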
Rob Leidle
d9efc11193 Turn off chunking for files with more than 1m characters and files lacking extensions
This should help with some of the unnecessary work being done in indexing
2024-08-09 13:41:15 -07:00
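
A minimal sketch of such a guard, assuming the check runs on the file path and its already-loaded contents. `shouldChunk` and `MAX_CHUNKABLE_CHARS` are hypothetical names, and the threshold is read literally from the commit title:

```typescript
// Hypothetical sketch: the names are assumptions; the limit comes from the commit title.
const MAX_CHUNKABLE_CHARS = 1000000;

// Returns false for files that should be skipped before any chunking work is done.
function shouldChunk(filepath: string, contents: string): boolean {
  const base = filepath.split(/[\\/]/).pop() ?? "";
  // A leading dot (e.g. ".gitignore") is treated as a name, not an extension.
  const hasExtension = base.lastIndexOf(".") > 0;
  return hasExtension && contents.length <= MAX_CHUNKABLE_CHARS;
}
```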
Nate Sesti
7582aa8240 Indexing tests (#1979)
* fix missing insertion into lancedb index

* successfully running codebaseindex test

* await

* fix typo

* clean up indexing tests

* expectPlan
2024-08-09 12:54:53 -07:00
Rob Leidle
fad684cb78 Change indexing to operate on batches as returned by getComputeDeleteAddRemove to fix problems (#1971)
The prior approach, of batching on lists of files as returned by
walkDir(...) broke the assumptions made by
getComputeDeleteAddRemove(...). Instead, call
getComputeDeleteAddRemove() and then batch on its results. This
approach uses a bit more memory but is roughly just as fast.

In addition, made markComplete(...)'s signature return Promise<void> as
it is an async function. Awaiting this async function has made
everything "slow" again because we do not do bulk inserts for
markComplete database rows. However, that can be fixed in a subsequent
commit.

Also disabled WAL mode for SQLite: because we do not completely control
the program lifecycle, indexing results were frequently not
checkpointed to disk.

Co-authored-by: Rob Leidle <rleidle@tesla.com>
2024-08-08 15:34:55 -07:00
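
The ordering described above (compute the full result first, then split it into batches) can be sketched with a generic helper. `inBatches` is a hypothetical name, and the sketch deliberately leaves out `getComputeDeleteAddRemove` itself, whose signature is not shown in this log:

```typescript
// Hypothetical sketch: inBatches is an illustrative helper, not the repository's code.
// Splits an already-computed result list into consecutive batches.
function inBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Holding the whole result in memory before batching is what accounts for the slightly higher memory use the message mentions.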
Nate Sesti
faef4a6ed1 use console.debug 2024-08-08 16:20:52 -04:00
Rob Leidle
4eb5eacfb2 Fix a build breakage on MacOS (#1955)
Without this change, the following error occurs on MacOS (but not linux):

  ../core/indexing/chunk/ChunkCodebaseIndex.ts:175:5 - error TS2322: Type 'unknown[]' is not assignable to type 'Chunk[]'.
    Type '{}' is missing the following properties from type 'Chunk': digest, filepath, index, content, and 2 more.

  175     return chunkLists.flat();
          ~~~~~~~~~~~~~~~~~~~~~~~~~

Co-authored-by: Rob Leidle <rleidle@tesla.com>
2024-08-07 07:02:31 -07:00
Rob Leidle
dfbc6aabcb Convert ChunkCodebaseIndexer's sqlite inserts into a single bulk insert to improve performance (#1943)
This change greatly improves the performance when inserting chunks.

Co-authored-by: Rob Leidle <rleidle@tesla.com>
2024-08-06 16:14:05 -07:00
Rob Leidle
30db947c77 Change ChunkCodebaseIndex to add a tag for known chunks rather than rechunking the file (#1926)
Note that the prior logic referenced the wrong file contents so it was
likely impacting the accuracy of portions of the index that use the
chunks table.

Co-authored-by: Rob Leidle <rleidle@tesla.com>
2024-08-05 16:39:12 -07:00
Priyash
e93ce84cd5 feat: enable WAL (Write-Ahead Logging) for improved performance and stability (#1885)
Enable Write-Ahead Logging (WAL) mode for the SQLite database. This change is intended to enhance both the performance and stability of database operations, particularly when handling concurrent transactions and large volumes of data.
2024-08-05 16:38:23 -07:00
Nate Sesti
552f0e0b2d handle method_declaration in code chunker 2024-08-05 11:35:08 -04:00
Nate Sesti
6a784b6a5a tweaks to new tokenizing 2024-07-26 15:04:08 -07:00
Rob Leidle
78a4025414 Offload all token counting to worker processes as well as some optimizations to do more token counting in parallel
Adding the workerpool logic broke gui as vite does not support __dirname
or path.join(...). To work around this, moved the gui dependency of
stripImages out of countTokens.ts
2024-07-26 13:10:36 -07:00
Patrick Erichsen
3e0fae35a7 feat: include recently + open files in codebase search (#1833)
* feat: include recently + open files in codebase search

* cleanup

* Update BaseRetrievalPipeline.ts

* add params object to chunkDocument

* Update package-lock.json
2024-07-26 12:40:52 -05:00