CSS files were not being indexed because codeChunker expects code
structures (classes/functions) that don't exist in CSS. The chunker
would return zero chunks, preventing CSS files from being indexed.
This fix routes CSS, HTML, JSON and similar non-code files to
basicChunker instead of codeChunker, ensuring they are properly
indexed while maintaining intelligent chunking for actual code files.
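
The routing described above can be sketched roughly as follows. This is a
minimal illustration, not the actual implementation: selectChunker, the
Chunker type, and the exact extension list are hypothetical names chosen
for the example, and the real chunkers are represented only by their names.

```typescript
// Hypothetical stand-in: the real basicChunker/codeChunker are functions;
// here only the routing decision is modeled.
type Chunker = "basicChunker" | "codeChunker";

// Assumed list of extensions that have no class/function structure for
// codeChunker to split on. The actual list in the codebase may differ.
const NON_CODE_EXTENSIONS = new Set(["css", "html", "json"]);

function selectChunker(filepath: string): Chunker {
  const dot = filepath.lastIndexOf(".");
  // Files without an extension fall through to codeChunker.
  const ext = dot === -1 ? "" : filepath.slice(dot + 1).toLowerCase();
  return NON_CODE_EXTENSIONS.has(ext) ? "basicChunker" : "codeChunker";
}
```

Keeping the decision in one small function means code files continue to
get structure-aware chunking, while files with no such structure no longer
produce zero chunks.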
db.db.serialize(...) guarantees that each statement runs in order. However, the function itself returns long before all the statements have executed. Therefore, we should only resolve the promise once the final COMMIT has run. In all other cases there will have been an error, and we'll have to call reject(err).
This also removed the need for the recently added extra insert, so it was removed (in fact, the extra insert was causing conflict errors).
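
A minimal sketch of the resolve-on-COMMIT pattern described above. The
SerialDb interface and runInTransaction are hypothetical names for this
example; the interface only mirrors the callback style of serialize/run
and is an assumption about the handle's shape. Rollback handling is
omitted for brevity.

```typescript
// Assumed minimal shape of the database handle used in this sketch.
interface SerialDb {
  serialize(fn: () => void): void;
  run(sql: string, cb: (err: Error | null) => void): void;
}

function runInTransaction(db: SerialDb, statements: string[]): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    // Any failing statement rejects; later reject/resolve calls are no-ops
    // because a promise settles only once.
    const rejectOnError = (err: Error | null) => {
      if (err) reject(err);
    };
    db.serialize(() => {
      db.run("BEGIN", rejectOnError);
      for (const sql of statements) {
        db.run(sql, rejectOnError);
      }
      // serialize() returns before these callbacks fire, so the promise
      // is resolved only from the COMMIT callback.
      db.run("COMMIT", (err) => (err ? reject(err) : resolve()));
    });
  });
}
```

Resolving anywhere else (for example, right after serialize() returns)
would let callers continue before the rows are actually written.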
The prior approach of batching on lists of files as returned by
walkDir(...) broke the assumptions made by
getComputeDeleteAddRemove(...). Instead, call
getComputeDeleteAddRemove(...) once and then batch on its results. This
approach uses a bit more memory but is roughly as fast.
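
As a rough illustration of "batch on its results": compute the full list
first, then split that list into fixed-size groups. toBatches is a
hypothetical helper written for this example, not a function from the
codebase.

```typescript
// Split an already-computed result list into batches of at most `size`
// items. The whole list is held in memory, which is the memory cost
// mentioned above.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("batch size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

Because the diff is computed over the complete file list before any
splitting happens, each batch sees a consistent view of what to add and
what to delete.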
In addition, made markComplete(...)'s signature return Promise<void> as
it is an async function. Awaiting this async function has made
everything "slow" again because we do not do bulk inserts for
markComplete database rows. However, that can be fixed in a subsequent
commit.
Also disabled WAL mode for SQLite: we do not completely control the
program lifecycle, and indexing results were frequently not being
checkpointed to disk.
Co-authored-by: Rob Leidle <rleidle@tesla.com>
Without this change, the following error occurs on macOS (but not Linux):
    ../core/indexing/chunk/ChunkCodebaseIndex.ts:175:5 - error TS2322: Type 'unknown[]' is not assignable to type 'Chunk[]'.
      Type '{}' is missing the following properties from type 'Chunk': digest, filepath, index, content, and 2 more.

    175     return chunkLists.flat();
            ~~~~~~~~~~~~~~~~~~~~~~~~~
Co-authored-by: Rob Leidle <rleidle@tesla.com>
Note that the prior logic referenced the wrong file contents so it was
likely impacting the accuracy of portions of the index that use the
chunks table.
Co-authored-by: Rob Leidle <rleidle@tesla.com>
Enable Write-Ahead Logging (WAL) mode for the SQLite database. This change is intended to enhance both the performance and stability of database operations, particularly when handling concurrent transactions and large volumes of data.
Adding the workerpool logic broke the gui, as Vite does not support
__dirname or path.join(...). To work around this, moved stripImages,
which the gui depends on, out of countTokens.ts.