mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-12-29 15:44:18 +01:00
2ac95c9d56
* SimpleChat:DU:BringIn local helper js modules using importmap Use it to bring in a simple trim garbage at end logic, which is used to trim received response. Also given that importmap assumes esm / standard js modules, so also global variables arent implicitly available outside the modules. So add it has a member of document for now
* SimpleChat:DU: Add trim garbage at end in loop helper
* SimpleChat:DU:TrimGarbage if unable try skip char and retry
* SimpleChat:DU: Try trim using histogram based info TODO: May have to add max number of uniq chars in histogram at end of learning phase.
* SimpleChat:DU: Switch trim garbage hist based to maxUniq simple Instead of blindly building histogram for specified substring length, and then checking if any new char within specified min garbage length limit, NOW exit learn state when specified maxUniq chars are found. Inturn there should be no new chars with in the specified min garbage length required limit. TODO: Need to track char classes like alphabets, numerals and special/other chars.
* SimpleChat:DU: Bring in maxType to the mix along with maxUniq Allow for more uniq chars, but then ensure that a given type of char ie numerals or alphabets or other types dont cross the specified maxType limit. This allows intermixed text garbage to be identified and trimmed.
* SimpleChat:DU: Cleanup debug log messages
* SimpleChat:UI: Move html ui base helpers into its own module
* SimpleChat:DU:Avoid setting frequence/Presence penalty Some models like llama3 found to try to be over intelligent by repeating garbage still, but by tweaking the garbage a bit so that it is not exactly same. So avoid setting these penalties and let the model's default behaviour work out, as is. Also the simple minded histogram based garbage trimming from end, works to an extent, when the garbage is more predictable and repeatative.
* SimpleChat:UI: Add and use a para-create-append helper Also update the config params dump to indicate that now one needs to use document to get hold of gMe global object, this is bcas of moving to module type js. Also add ui.mjs to importmap
* SimpleChat:UI: Helper to create bool button and use it wrt settings
* SimpleChat:UI: Add Select helper and use it wrt ChatHistoryInCtxt
* SimpleChat:UI:Select: dict-name-value, value wrt default, change Take a dict/object of name-value pairs instead of just names. Inturn specify the actual value wrt default, rather than the string representing that value. Trap the needed change event rather than click wrt select.
* SimpleChat:UI: Add Div wrapped label+element helpers Move settings related elements to use the new div wrapped ones.
* SimpleChat:UI:Add settings button and bring in settings ui
* SimpleChat:UI:Settings make boolean button text show meaning
* SimpleChat: Update a bit wrt readme and notes in du
* SimpleChat: GarbageTrim enable/disable, show trimmed part ifany
* SimpleChat: highlight trim, garbage trimming bitmore aggressive Make it easy for end user to identified the trimmed text. Make garbage trimming logic, consider a longer repeat garbage substring.
* SimpleChat: Cleanup a bit wrt Api end point related flow Consolidate many of the Api end point related basic meta data into ApiEP class. Remove the hardcoded ApiEP/Mode settings from html+js, instead use the generic select helper logic, inturn in the settings block. Move helper to generate the appropriate request json string based on ApiEP into SimpleChat class itself.
* SimpleChat:Move extracting assistant response to SimpleChat class so also the trimming of garbage.
* SimpleChat:DU: Bring in both trim garbage logics to try trim
* SimpleChat: Cleanup readme a bit, add one more chathistory length
* SimpleChat:Stream:Initial handshake skeleton Parse the got stream responses and try extract the data from it. It allows for a part read to get a single data line or multiple data line. Inturn extract the json body and inturn the delta content/message in it.
* SimpleChat: Move handling oneshot mode server response Move handling of the oneshot mode server response into SimpleChat. Also add plumbing for moving multipart server response into same.
* SimpleChat: Move multi part server response handling in
* SimpleChat: Add MultiPart Response handling, common trimming Add logic to call into multipart/stream server response handling. Move trimming of garbage at the end into the common handle_response helper. Add new global flag to control between oneshot and multipart/stream mode of fetching response. Allow same to be controlled by user. If in multipart/stream mode, send the stream flag to the server.
* SimpleChat: show streamed generative text as it becomes available Now that the extracting of streamed generated text is implemented, add logic to show the same on the screen.
* SimpleChat:DU: Add NewLines helper class To work with an array of new lines. Allow adding, appending, shifting, ...
* SimpleChat:DU: Make NewLines shift more robust and flexible
* SimpleChat:HandleResponseMultiPart using NewLines helper Make handle_response_multipart logic better and cleaner. Now it allows for working with the situation, where the delta data line got from server in stream mode, could be split up when recving, but still the logic will handle it appropriately. ALERT: Rather except (for now) for last data line wrt a request's response.
* SimpleChat: Disable console debug by default by making it dummy Parallely save a reference to the original func.
* SimpleChat:MultiPart/Stream flow cleanup Dont try utf8-decode and newlines-add_append if no data to work on. If there is no more data to get (ie done is set), then let NewLines instance return line without newline at end, So that we dont miss out on any last-data-line without newline kind of scenario. Pass stream flag wrt utf-8 decode, so that if any multi-byte char is only partly present in the passed buffer, it can be accounted for along with subsequent buffer. At sametime, bcas of utf-8's characteristics there shouldnt be any unaccounted bytes at end, for valid block of utf8 data split across chunks, so not bothering calling with stream set to false at end. LATER: Look at TextDecoder's implementation, for any over intelligence, it may be doing.. If needed, one can use done flag to account wrt both cases.
* SimpleChat: Move baseUrl to Me and inturn gMe This should allow easy updating of the base url at runtime by the end user.
* SimpleChat:UI: Add input element helper
* SimpleChat: Add support for changing the base url This ensures that if the user is running the server with a different port or wants to try connect to server on a different machine, then this can be used.
* SimpleChat: Move request headers into Me and gMe Inturn allow Authorization to be sent, if not empty.
* SimpleChat: Rather need to use append to insert headers
* SimpleChat: Allow Authorization header to be set by end user
* SimpleChat:UI+: Return div and element wrt creatediv helpers use it to set placeholder wrt Authorization header. Also fix copy-paste oversight.
* SimpleChat: readme wrt authorization, maybe minimal openai testing
* SimpleChat: model request field for openai/equivalent compat May help testing with openai/equivalent web services, if they require this field.
* SimpleChat: readme stream-utf-8 trim-english deps, exception2error
* Readme: Add a entry for simplechat in the http server section
* SimpleChat:WIP:Collate internally, Stream mode Trap exceptions This can help ensure that data fetched till that point, can be made use of, rather than losing it. On some platforms, the time taken wrt generating a long response, may lead to the network connection being broken when it enters some user-no-interaction related power saving mode.
* SimpleChat:theResp-origMsg: Undo a prev change to fix non trim When the response handling was moved into SimpleChat, I had changed a flow bit unnecessarily and carelessly, which resulted in the non trim flow, missing out on retaining the ai assistant response. This has been fixed now.
* SimpleChat: Save message internally in handle_response itself This ensures that throwing the caught exception again for higher up logic, doesnt lose the response collated till that time. Go through theResp.assistant in catch block, just to keep simple consistency wrt backtracing just in case. Update the readme file.
* SimpleChat:Cleanup: Add spacing wrt shown req-options
* SimpleChat:UI: CreateDiv Divs map to GridX2 class This allows the settings ui to be cleaner structured.
* SimpleChat: Show Non SettingsUI config field by default
* SimpleChat: Allow for multiline system prompt Convert SystemPrompt into a textarea with 2 rows. Reduce user-input-textarea to 2 rows from 3, so that overall vertical space usage remains same. Shorten usage messages a bit, cleanup to sync with settings ui.
* SimpleChat: Add basic skeleton for saving and loading chat Inturn when ever a chat message (system/user/model) is added, the chat will be saved into browser's localStorage.
* SimpleChat:ODS: Add a prefix to chatid wrt ondiskstorage key
* SimpleChat:ODS:WIP:TMP: Add UI to load previously saved chat This is a temporary flow
* SimpleChat:ODS:Move restore/load saved chat btn setup to Me This also allows being able to set the common system prompt ui element to loaded chat's system prompt.
* SimpleChat:Readme updated wrt save and restore chat session info
* SimpleChat:Show chat session restore button, only if saved session
* SimpleChat: AutoCreate ChatRequestOptions settings to an extent
* SimpleChat: Update main README wrt usage with server
267 lines
8.9 KiB
JavaScript
//@ts-check
// Helpers to work with different data types
// by Humans for All
//

/**
 * Given the limited context size of local LLMs, many a time, when the context gets filled
 * between the prompt and the response, it can lead to repeating text garbage generation.
 * And many a time, setting a penalty with respect to repetition leads to over-intelligent garbage
 * repetition with slight variations. This garbage in turn can overload the
 * available model context, leading to less valuable responses for subsequent prompts/queries,
 * if the chat history is sent to the AI model.
 *
 * So two simple minded garbage trimming logics are experimented with below:
 * * one based on progressively-larger-substring-based-repeat-matching-with-partial-skip, and
 * * another based on char-histogram-driven garbage trimming.
 *
 * In future, the characteristics of the histogram over varying lengths could be used to allow for
 * a more aggressive and adaptive trimming logic.
 */

/**
 * Simple minded logic to help remove repeating garbage at the end of the string.
 * The repetition needs to match perfectly.
 *
 * The logic progressively probes for longer and longer substring based
 * repetition, till there is no longer any repetition. In turn it picks the one with
 * the longest chain.
 *
 * @param {string} sIn
 * @param {number} maxSubL
 * @param {number} maxMatchLenThreshold
 * @returns {{trimmed: boolean, data: string}}
 */
export function trim_repeat_garbage_at_end(sIn, maxSubL=10, maxMatchLenThreshold=40) {
    let rCnt = [0];
    let maxMatchLen = maxSubL;
    let iMML = -1;
    for(let subL=1; subL < maxSubL; subL++) {
        rCnt.push(0);
        let refS = sIn.substring(sIn.length-subL, sIn.length);
        // Walk back one subL sized block at a time, while it matches the reference block at the end.
        for(let i=sIn.length; i >= subL; i -= subL) {
            let curS = sIn.substring(i-subL, i);
            if (refS != curS) {
                break;
            }
            rCnt[subL] += 1;
        }
        // Record the match length, even if the repetition runs all the way to the start of the string.
        let curMatchLen = rCnt[subL]*subL;
        if (maxMatchLen < curMatchLen) {
            maxMatchLen = curMatchLen;
            iMML = subL;
        }
    }
    console.debug("DBUG:DU:TrimRepeatGarbage:", rCnt);
    if ((iMML == -1) || (maxMatchLen < maxMatchLenThreshold)) {
        return {trimmed: false, data: sIn};
    }
    console.debug("DBUG:TrimRepeatGarbage:TrimmedCharLen:", maxMatchLen);
    let iEnd = sIn.length - maxMatchLen;
    return { trimmed: true, data: sIn.substring(0, iEnd) };
}
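The core probe inside trim_repeat_garbage_at_end can be looked at on its own: for a given substring length subL, count how many back-to-back copies of the last subL chars sit at the end of the string, then keep the length whose copies cover the most chars. The sketch below is self-contained and does not import this module; countTrailingRepeats and longestTrailingRepeat are hypothetical helpers, and unlike the exported function they apply neither the maxSubL floor on the match length nor maxMatchLenThreshold.

```javascript
// Hypothetical helper (not part of this module): count how many consecutive
// copies of the last `subL` chars appear at the end of `s`.
function countTrailingRepeats(s, subL) {
    if (subL <= 0 || subL > s.length) {
        return 0;
    }
    const refS = s.substring(s.length - subL);
    let count = 0;
    // Walk backwards one subL sized block at a time, stopping at the first
    // block that differs, or when fewer than subL chars remain.
    for (let i = s.length; i >= subL; i -= subL) {
        if (s.substring(i - subL, i) !== refS) {
            break;
        }
        count += 1;
    }
    return count;
}

// Hypothetical helper: probe substring lengths 1..maxSubL-1 and return the one
// whose repeats cover the most chars at the end (first one wins on ties).
function longestTrailingRepeat(s, maxSubL) {
    let best = { subL: 0, matchLen: 0 };
    for (let subL = 1; subL < maxSubL; subL++) {
        const matchLen = countTrailingRepeats(s, subL) * subL;
        if (matchLen > best.matchLen) {
            best = { subL, matchLen };
        }
    }
    return best;
}

console.log(countTrailingRepeats("hello abcabcabc", 3)); // 3
console.log(longestTrailingRepeat("hello abcabcabc", 10)); // { subL: 3, matchLen: 9 }
```

Here subL=9 also covers 9 chars, but the tie goes to the shorter substring because only a strictly longer match replaces the current best, which is the same comparison the function above uses.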

/**
 * Simple minded logic to help remove repeating garbage at the end of the string, till it can't.
 * If it's not able to trim, then it will try to skip a char at the end and then trim, a few times.
 * This ensures that even if there are multiple runs of garbage with different patterns, the
 * logic still tries to munch through them.
 *
 * @param {string} sIn
 * @param {number} maxSubL
 * @param {number | undefined} [maxMatchLenThreshold]
 * @param {number} [skipMax] max number of consecutive single-char skips tried after a failed trim
 * @returns {string}
 */
export function trim_repeat_garbage_at_end_loop(sIn, maxSubL, maxMatchLenThreshold, skipMax=16) {
    let sCur = sIn;
    let sSaved = "";
    let iTry = 0;
    while(true) {
        let got = trim_repeat_garbage_at_end(sCur, maxSubL, maxMatchLenThreshold);
        if (got.trimmed != true) {
            // Remember the text as it was before this run of skips started.
            if (iTry == 0) {
                sSaved = got.data;
            }
            iTry += 1;
            if (iTry >= skipMax) {
                return sSaved;
            }
            // Skip (drop) a char at the end and retry.
            got.data = got.data.substring(0,got.data.length-1);
        } else {
            iTry = 0;
        }
        sCur = got.data;
    }
}
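The skip-and-retry control flow of trim_repeat_garbage_at_end_loop does not depend on how a single trim step works, so it can be sketched as a generic driver over any trimmer that returns {trimmed, data}. trimWithSkips and the toy trimmer trimRunOfX are hypothetical names used only for illustration; the toy just strips a trailing run of three or more 'x' chars, standing in for the repeat-based trimmer.

```javascript
// Hypothetical generic driver mirroring the control flow of
// trim_repeat_garbage_at_end_loop: keep trimming; when a trim fails, drop one
// char from the end and retry, giving up after skipMax consecutive failures
// and returning the text as it was before that run of skips started.
function trimWithSkips(sIn, trimOnce, skipMax = 16) {
    let sCur = sIn;
    let sSaved = sIn;
    let iTry = 0;
    while (true) {
        const got = trimOnce(sCur);
        if (got.trimmed) {
            // Progress made: the chars skipped so far are accepted as garbage.
            iTry = 0;
            sCur = got.data;
            continue;
        }
        if (iTry === 0) {
            // Remember the result before starting a run of skips.
            sSaved = got.data;
        }
        iTry += 1;
        if (iTry >= skipMax) {
            return sSaved;
        }
        sCur = got.data.substring(0, got.data.length - 1);
    }
}

// Toy trimmer (hypothetical): strip a trailing run of 3 or more 'x' chars.
function trimRunOfX(s) {
    const m = s.match(/x{3,}$/);
    if (m === null) {
        return { trimmed: false, data: s };
    }
    return { trimmed: true, data: s.substring(0, s.length - m[0].length) };
}

console.log(trimWithSkips("answer: 42xxxxxy", trimRunOfX, 4)); // answer: 42
console.log(trimWithSkips("no garbage here", trimRunOfX, 4));  // no garbage here
```

In the first call the trailing 'y' hides the 'x' run until one char is skipped; after that the skips past "42" find nothing more, so the text saved before those skips is returned. Like the original loop, this assumes every successful trim strictly shortens the string, otherwise it would not terminate.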

/**
 * A simple minded attempt to trim garbage at the end, using histogram driven characteristics.
 * There can be variations in the repetitions, as long as no new char pops up.
 *
 * This tracks the chars and their frequency in a specified length of substring at the end
 * and in turn checks if moving further into the generated text from the end remains within
 * the same char subset or goes beyond it, and based on that either trims the string at the
 * end or not. This allows filtering garbage at the end, even if there are certain
 * kinds of small variations in the repeated text with respect to the position of seen chars.
 *
 * Allow the garbage to contain up to maxUniq chars, but at the same time ensure that
 * a given type of char, i.e. numerals or alphabets or other types, doesn't cross the specified
 * maxType limit. This allows intermixed text garbage to be identified and trimmed.
 *
 * ALERT: This is not perfect and only provides a rough garbage identification logic.
 * Also it currently only differentiates between character classes with respect to English.
 *
 * @param {string} sIn
 * @param {number} maxType
 * @param {number} maxUniq
 * @param {number} maxMatchLenThreshold
 * @returns {{trimmed: boolean, data: string}}
 */
export function trim_hist_garbage_at_end(sIn, maxType, maxUniq, maxMatchLenThreshold) {
    if (sIn.length < maxMatchLenThreshold) {
        return { trimmed: false, data: sIn };
    }
    let iAlp = 0;
    let iNum = 0;
    let iOth = 0;
    // Learn
    /** @type {Object<string, number>} */
    let hist = {};
    let iUniq = 0;
    for(let i=0; i<maxMatchLenThreshold; i++) {
        let c = sIn[sIn.length-1-i];
        if (c in hist) {
            hist[c] += 1;
        } else {
            if(c.match(/[0-9]/) != null) {
                iNum += 1;
            } else if(c.match(/[A-Za-z]/) != null) {
                iAlp += 1;
            } else {
                iOth += 1;
            }
            iUniq += 1;
            if (iUniq >= maxUniq) {
                break;
            }
            hist[c] = 1;
        }
    }
    console.debug("DBUG:TrimHistGarbage:", hist);
    if ((iAlp > maxType) || (iNum > maxType) || (iOth > maxType)) {
        return { trimmed: false, data: sIn };
    }
    // Catch and Trim
    for(let i=0; i < sIn.length; i++) {
        let c = sIn[sIn.length-1-i];
        if (!(c in hist)) {
            if (i < maxMatchLenThreshold) {
                return { trimmed: false, data: sIn };
            }
            console.debug("DBUG:TrimHistGarbage:TrimmedCharLen:", i);
            // The last i chars are garbage; keep everything up to and including c.
            return { trimmed: true, data: sIn.substring(0, sIn.length-i) };
        }
    }
    console.debug("DBUG:TrimHistGarbage:Trimmed fully");
    return { trimmed: true, data: "" };
}
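The learn phase of trim_hist_garbage_at_end, followed by the walk back from the end, can be sketched with a Map in place of the plain object. charClass, learnTailCharSet and tailLenWithinSet are hypothetical helpers; this sketch scans the whole window instead of stopping at maxUniq, and leaves the maxType, maxUniq and threshold decisions to the caller.

```javascript
// Hypothetical helper: classify a char the same way as the logic above
// (ASCII digits, ASCII letters, everything else).
function charClass(c) {
    if (/[0-9]/.test(c)) return "num";
    if (/[A-Za-z]/.test(c)) return "alp";
    return "oth";
}

// Hypothetical helper: learn the distinct chars in the last `win` chars of `s`,
// their frequencies, and how many distinct chars fall in each class.
function learnTailCharSet(s, win) {
    const hist = new Map();
    const perClass = { alp: 0, num: 0, oth: 0 };
    const start = Math.max(0, s.length - win);
    for (let i = s.length - 1; i >= start; i--) {
        const c = s[i];
        if (hist.has(c)) {
            hist.set(c, hist.get(c) + 1);
        } else {
            hist.set(c, 1);
            perClass[charClass(c)] += 1;
        }
    }
    return { hist, perClass, uniq: hist.size };
}

// Hypothetical helper: length of the tail made up only of learned chars.
function tailLenWithinSet(s, hist) {
    let n = 0;
    while (n < s.length && hist.has(s[s.length - 1 - n])) {
        n += 1;
    }
    return n;
}

const text = "Result: ok. 1-2-1-2-2-1-2-1-1-2";
const learned = learnTailCharSet(text, 8);
console.log(learned.uniq, learned.perClass); // 3 { alp: 0, num: 2, oth: 1 }
console.log(tailLenWithinSet(text, learned.hist)); // 19
```

The tail "1-2-1-2-2-1-2-1-1-2" is not an exact repetition, yet all 19 of its chars come from the 3 chars learned in the last 8, which is the kind of varied garbage the histogram approach targets and where exact repeat matching finds little.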

/**
 * Keep trimming repeatedly using the hist_garbage logic, till you no longer can.
 * This ensures that even if there are multiple runs of garbage with different patterns,
 * the logic still tries to munch through them.
 *
 * @param {string} sIn
 * @param {number} maxType
 * @param {number} maxUniq
 * @param {number} maxMatchLenThreshold
 * @returns {string}
 */
export function trim_hist_garbage_at_end_loop(sIn, maxType, maxUniq, maxMatchLenThreshold) {
    let sCur = sIn;
    while (true) {
        let got = trim_hist_garbage_at_end(sCur, maxType, maxUniq, maxMatchLenThreshold);
        if (!got.trimmed) {
            return got.data;
        }
        sCur = got.data;
    }
}

/**
 * Try to trim garbage at the end by using both the hist-driven-garbage-trimming as well as
 * skip-a-bit-if-reqd-then-repeat-pattern-based-garbage-trimming, with blind retrying.
 * @param {string} sIn
 * @returns {string}
 */
export function trim_garbage_at_end(sIn) {
    let sCur = sIn;
    // Blindly retry both logics, a fixed two times.
    for(let i=0; i<2; i++) {
        // maxType=8, maxUniq=24, maxMatchLenThreshold=72
        sCur = trim_hist_garbage_at_end_loop(sCur, 8, 24, 72);
        // maxSubL=32, maxMatchLenThreshold=72, skipMax=12
        sCur = trim_repeat_garbage_at_end_loop(sCur, 32, 72, 12);
    }
    return sCur;
}
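trim_garbage_at_end applies the two trimming loops for a fixed two rounds. A variant that repeats whole rounds until one of them changes nothing, with a cap on the number of rounds, might look like the sketch below; runTrimRounds and the two toy trimmers are hypothetical stand-ins for the loop functions of this module, not part of it.

```javascript
// Hypothetical variant: apply each trimmer (string -> string) in order,
// repeating whole rounds until a round changes nothing or maxRounds is hit.
function runTrimRounds(sIn, trimmers, maxRounds = 4) {
    let sCur = sIn;
    for (let round = 0; round < maxRounds; round++) {
        const before = sCur;
        for (const trim of trimmers) {
            sCur = trim(sCur);
        }
        if (sCur === before) {
            break; // fixed point reached, further rounds would change nothing
        }
    }
    return sCur;
}

// Toy trimmers (hypothetical): drop a trailing '!' run, drop a trailing '?' run.
const dropBangs = (s) => s.replace(/!+$/, "");
const dropQuestions = (s) => s.replace(/\?+$/, "");

console.log(runTrimRounds("Done.!!??!!??", [dropBangs, dropQuestions])); // Done.
```

With the same toy trimmers and maxRounds=2 the result would still end in "!!", which shows how a fixed round count can stop short when the garbage alternates between patterns; the cap keeps the cost bounded either way.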

/**
 * NewLines array helper.
 * Allows maintaining a list of lines.
 * Allows a line to be built up/appended part by part.
 */
export class NewLines {

    constructor() {
        /** @type {string[]} */
        this.lines = [];
    }

    /**
     * Extracts lines from the passed string and in turn either
     * appends to a previous partial line or adds a new line.
     * @param {string} sLines
     */
    add_append(sLines) {
        let aLines = sLines.split("\n");
        let lCnt = 0;
        for(let line of aLines) {
            lCnt += 1;
            if (lCnt < aLines.length) {
                // Add back the newline removed during split
                line += "\n";
            } else if (line.length == 0) {
                // Nothing follows the last newline (if any), so there is no partial line to add
                break;
            }
            // Append if required
            if (lCnt == 1) {
                let lastLine = this.lines[this.lines.length-1];
                if (lastLine != undefined) {
                    if (!lastLine.endsWith("\n")) {
                        this.lines[this.lines.length-1] += line;
                        continue;
                    }
                }
            }
            // Add new line
            this.lines.push(line);
        }
    }

    /**
     * Shift the oldest/earliest/0th line in the array. [Old-New|Earliest-Latest]
     * Optionally control whether only full lines (i.e. those with a newline at the end) will be returned,
     * or whether a partial line without a newline at the end (can only be the last line) will be returned.
     * @param {boolean} bFullWithNewLineOnly
     * @returns {string | undefined}
     */
    shift(bFullWithNewLineOnly=true) {
        let line = this.lines[0];
        if (line == undefined) {
            return undefined;
        }
        if ((line[line.length-1] != "\n") && bFullWithNewLineOnly){
            return undefined;
        }
        return this.lines.shift();
    }

}
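NewLines is used in stream mode, where a network chunk can end in the middle of a line, or even in the middle of a multi-byte UTF-8 char. The sketch below shows that flow end to end with the standard TextEncoder/TextDecoder; LineBuffer is a hypothetical, trimmed-down stand-in for NewLines (it keeps the pending partial line in a separate field rather than as the last array entry), and the "data: " lines are made-up sample input.

```javascript
// Hypothetical stand-in for NewLines: keeps complete lines plus one pending
// partial line, so a line split across chunks is stitched back together.
class LineBuffer {
    constructor() {
        /** @type {string[]} */
        this.lines = [];
        this.partial = "";
    }

    /** @param {string} text */
    addAppend(text) {
        const parts = (this.partial + text).split("\n");
        // The last element has no newline after it (it may be "").
        this.partial = parts.pop() ?? "";
        for (const p of parts) {
            this.lines.push(p + "\n");
        }
    }

    /** @param {boolean} fullOnly when true, only newline-terminated lines are returned */
    shift(fullOnly = true) {
        if (this.lines.length > 0) {
            return this.lines.shift();
        }
        if (!fullOnly && this.partial !== "") {
            const last = this.partial;
            this.partial = "";
            return last;
        }
        return undefined;
    }
}

// Sample input: two lines, split mid-line and mid-char ("\u00e9" is 2 bytes in UTF-8).
const bytes = new TextEncoder().encode('data: {"c":"caf\u00e9"}\ndata: [DONE]');
const chunks = [bytes.slice(0, 16), bytes.slice(16, 24), bytes.slice(24)];

const decoder = new TextDecoder("utf-8");
const lb = new LineBuffer();
const got = [];
chunks.forEach((chunk, idx) => {
    const done = idx === chunks.length - 1;
    // stream: true keeps a partial multi-byte char buffered for the next call.
    lb.addAppend(decoder.decode(chunk, { stream: !done }));
    let line;
    while ((line = lb.shift(!done)) !== undefined) {
        got.push(line.trimEnd());
    }
});
console.log(got); // [ 'data: {"c":"café"}', 'data: [DONE]' ]
```

Passing stream: true for all but the last chunk lets TextDecoder hold back the first byte of the split char until its second byte arrives, and asking for partial lines only once done is set lets the final line, which has no trailing newline, come out.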