PAIR ATTEMPT #3 -- THIS IS THE ONE BABY!!!! This implementation combines the best of 1 & 2 in that we resort to the existence of an atom. An atom can now have the property of being a pair thus solving the problem of too many data structures to keep track of. Memory management is reduced to freeing the (string | (car & cdr)) and the atom itself (2 items baby). Some credit goes to Doug Coker who, although used PASCAL as a reference, indirectly lead me to use unions in the atom data structure, instead of void *'s and way-to-many casts. IMPLEMENTED: quote ' car cdr eq + * eval define udefine quit atomcount tge count list cons if begin set! PAIRS should be the only internal constructors that increment BINDS var for each atom using inc_binds. internally, each time we make a non-pair, we should call inc_binds on it. REWRITE of binds construct successfull...implemented TGE passing to eval NOTHING returns an inc_binds atom. LAMBDA now relies on eval_sequence SET! WORKS NOW...(OHHH BAD! MUTATION!!) Need to make a primitive library construct so adding primitives is a breeze later on. LOAD implemented (within confines of my scheme implementation. Main problem is eval can only evalute one expression at a time and not multiple expressions. Therefore, the file one loads must be one big (begin ...) expression :( LET implemented...was in touble for a few seconds...then realized that eval let should just create a new expression which is returned, then evaluated then destroyed (then the evaluated value returned). I'm getting the hang of garbage collection and dealing with scheme/C spaghetti code. I forget why, but within CScheme, one must inc_binds before deleting the atom. COND implemented. I wonder if i should have made an eval_cond? Instead the code is found within the particular case within eval(). BUG fixed in load. Had to return dup_atom (return value) since it might rely on an atom in the expression loaded. The expression loaded is destroyed just before returning. {this concept is used heavliy throught the development of wscheme} FOUND bug with defines...couldn't bind something to #f since that would imply (with old implementation) that the search had failed. Solution was to return a list ie: cons (found_item, null) or #f on search failure. 848 lines: PRIMITIVE construct implemented. Sexy! A few commands still non primitive based since I'll have to move cases out of eval into their own functions. FIGURED out how to have functions return pointers to functions. WHILE developing word/first/bf constructs, realized i need to support n-arg lambda's. So did it. Ran into problem: WScheme destroys atoms that are used in return value:( especially with ((lambda x x) 1 2 3) SOLUTION: within eval...if symbol is a bound, return it atom_dup'ed. A FEW BUGS: needed to make quote return atom_dup when evaluated to fix atom leak when loading a .scm file. Also removed atom_dup on return value of eval_load. As stated in the preceeding paragraph, I've been going back to many of the core routines and returning atom_dup the result. Upon deeper reflection, it would seem only the most primitive elements need to be returned atom-dup'ed during the evaluation process. PRIMITIVES ADDED: even? % word first butfirst LENGTH now also returns length of an atom string. CTRL-D quit's nicely now wscheme. DOTTED PAIR consturct trivial to add. DATE added. Problem with C in that i wasn't mallocing a pointer to the time type. Therefore ctime was writing to randomly w_scheme space. MOVED all special case apply conditions into primitives. Found out i need to return atom_dup for primitive_car/cdr..since eval will destroy_atom values it passes to apply. There has to be one or two places only that require atom_dup...something to think about. Primitives that rely on evaluated data's existence should do this i suppose. STRINGS handled ie: (last "this is kewl") Created gets ie: (word "shroom screames" (gets)) 10/7/98 Bug fixed with set! implementation. This was found 1/2 way through the world 2.0.0 conversion. set! would loose an atom if you passed it an un- bound symbol. Had to destroy the atom that was evaluated and passed to eval_setbang incase it wasn't bound to an existing symbol. Thankfully, the atom count construct made it easy to deal with. 10/8/98 ATOM->STRING and STRING->ATOM perfected (for current implementation) I might decide to create a new atom type: symbol. It would help differentiate between strings and function names and vars. I should probabley mention this now. To add a type to the atom construct, one must add the proper type definition in the .h, create the appropriate predicate and constructor, and modify atom_dup and destroy-atom to support the new type. 10/9/98 I'm getting to carried away with this STRING/SYMBOL concept. I need to get rid of it in favor of scheme natural way of representing sentences. 10/10/98 Removed atom_dup calls on values returned by primitive_car/cdr. Why? I was having problems with set-car/cdr! in that they were mutating with copy's of atoms...which was defeating the purpose of set-car'ing with other existng atoms. Hit on an idea to deal with atom life spans. Inc_bind and Dec_bind within functions that use them. Must find specific functions that actually need this implementation. Strange...i removed all aforementioned constructs and everything seems to be working ok. I add a few logic checks within destroy_atom...mainly to change the type to a bogus value in case the memory was still being referenced after beeing free'd. Hoped to add a bit of efficiency by making a few funtions inlined pre- processor constructs. 10/11/98 Used emacs for the first time during editing and realized scheme does not support ALL whitespaces very nicely. Mainly tabs. Quick hack for now is to ignore them within primitive_load. Better hack to support them within read. Problem #5a. After loading a file with functions makers that contain functions...and binding symbols to the function maker evaluations then loading the function again...loose a LOT of atoms. 10/13/98 When creating a local environment, i was calling destroy_atom instead of destroy_tge on the temporary environment. I must have been on to someting since I'm having problems with environments and atom leaks. Solution, I hope is to have destroy_atom sense procedures and to delete the bound local environment first before tyring to delete the procedure. Realized that upon every invocation of a procedure, a tempoary local invironment is created then destroyed...this is so way not efficient. 10/14/98 Trying to get fancy with environment so that I don't loose atoms when environment rely on other environment. Didn't realize it at first, but one hack ended up not having a local environment for each application evaluation. The result was factorial always returning 1 * 1 * ... since it shared a common env for each itteration. The final calculation was seeing the last bound local var with was always the base case or 1. Need to remove define's need for set! since define works within local env's where set! may mutate outside of it's env. 10/29/98 Decided to implement symbol/string atom types. This SHOULD help in WorldV2 when dealing with keyboard bindings...since i'm having a hard time binding symbols such as * + " . etc... Deleted last reminents of atom_dup (was commented out anyway) 10/30/98 Re-wrote read construct. Now a bit simpler. Doesn't rely on first, bf of strings. Instead, I pass a pointer to the string around. This also fixed the problem of multiple expressions in a single argument to evaluate_and_ parse. It just evalutes the buffer until it's null. This also fixed the problem of scm files needing to be totally within a (begin). Primitive_random implemented. Works within confines of c's random (); Compiled with -Wall. It noticed an atom might not be initialized, which was the case with an empty lambda expression. Now it checks for an empty lambda body and returns null if true. primitives AND and OR implemented (special cases) 11/1/98 Bug in new read construct didn't skip over last ( in a dotted list exp. Read of Report4 spec. Need to implement a few more primitives..namely memq and assq primitivees. 11/2/98 Figure eval should take a list of expressions to evaluate, where it will evaluate each, simulating the stack in a loop, returning the first thing on the stack. 11/3/98 Got rid of eval_sequence in favor of eval's new behavior of requiring a list of things to evlauate (returing the last one of course). Stuck apply within eval, eval loops until a list of things to eval is empty. Tail recursion is handled by noticing that the last application is a user defined function and so resets the list to it's body and resets the env make_env (env, destroy_atom (old_env)) To deal with this env consturct, i inc_binds the env upon the first entry to eval, and dec bind it when leaving. 11/5/98 Append implemented. Why? Trying my damnedest to get tail-recursion to work. Eval has been the focus of a lot of work over. Eval is now going back to just taking the expression you want evaluated. It will then place it in an internalized list for evaluating. 11/6/98 Realizing more about scope and the difference between static and dynamic scope. I'd like to understand more dynamic (non-scheme) scoping so that I could implement it (it seems like it would have it's advantages). ICONS implemented as a #define (inc_bind and cons in one) Tail recursion implemented. Valid also for AND, COND, and IF Tail recursion still buggy. Implemented ATOM memory construct so atom leakage can be identified. 11/7/98 For-sure tail-recursion works now. It's great. Implemented talk-loop. Some of the biggest problems was, when an implementation worked, was pushing the correct atom's to the stack. Since I didn't encapsulate the stack construct, this was a great burden. IE: If was pushing the true-condition and not a list containing the true-condition. The first implementation of tail-recursion led me to deal with applications that needed their own environment. I had to include the environment within the stack construct otherwise 'recursed' function calls would lose the environment for the rest of the expression. All this forced me to implement an extensive debugging construct complete with color and atom creation memory/recall functions. Watching the debugging process makes me wonder if I could find a way to make the evaluation process more efficient. 11/8/98 Tail recursion bug found and deal with. I needed to inc_binds the ret value when popping the stack (since we might delete the return value!) 11/9/98 Error handling. Now evaluation stops when an error occurs during normal evaluation. Created ^C catching (which activates an eval error). Cored first time...seems ok now (doubt it). 11/10/98 Changing what a null atom is. A pointer to NULL; This forced me to re- think the eval logic. As it when through the loop, it depended on the return value being NULL. This broke when NULL also became a Scheme NULL. Now the login within the main while loop in eval is much cleaner. Spent a few hours tracking down what i thought was a bug with primitive_ load and/or environemnt. As it turns out, i left an old eval constuct lying around that assigned the return value primitive_error if it was C:TRUE This means if it were SCHEME:NULL, it would be over written while signaling an evaluation error. 11/11/98 Still can't track down the interrupt problem :( I might JUST have to implement GC. 11/14/98 Vector's under development. Don't know what functions to make soley scheme primitives or C primitives as well. The same goes for a few special cases like env. Implemented 'make-vector', 'vector?', 'vector-length', 'vector-ref', and 'vector-set'. 11/16/98 Replaced special case 'tge' with primitive 'env'. Implemented 'vector' as a primitive. Didn't rely to heavily on exisiting primitives. This hints at the idea of making primitives totally independent from other primitives, but still reliant on the exisiting C data abstraction. This, fortunately, has been the case. Implemented 'list->vector' 'vector->list'. Needed a recursive helper with vector->list. Dealing with vectors sure emphasized the fact that scheme is inately a simple language that really doesn't need a vector type. Only when one is concerned with list references in O(n) time does the simplicity of pure theoretical computing hinder progress. 11/17/98 Implemented 'integer->char' 'char->integer' but still within a string context. Characters are still assumed to be strings of length 1. Added a special case to primitive_display which first checked for a string and didn't display the quotes around it (which the normal call to dump_ atom would otherwise do). 'member' and 'assoc' implemented but doesn't work according to spec since my equal? primitive is more like 'eq?'. 1753 lines at this point 11/24/98 The read construct didn't handle ; (comments very well). The only thing that did was primitive_load. This was an invalid method since load would indiscriminately ignore anything after any ; up to a newline... including a ; within quotes. I now treat ;...\n as a white space character. All that was required was to add the extra line to skip_white_space (). Split evaluate_and_dump() into 2 functions. One which just evaluates a string and returns the atom, and the other which will call evaluate on the string, display the resulting atom, then destory it. This will be usefull for the C->wscheme conversion since i'll be needing to access a lot of scheme data to be used in strictly C code. For instance, if i need the person name, i can do a: *name = data_string (c1=evaluate ("name")); destroy_atom (inc_binds (c1)); 11/29/98 Major bug in the BEGIN wscheme construct. I wasn't updating the list reference in the stack construct (tail recursion). So when a begin statement was evaluated, it was skipped if it wasn't the only thing left to evaluate on the stack. During the world v2 development, i was using COND to avoid the bug. After an hour dealing with converting the rollcall construct, i decided to deal with it then and there. Fortunately, the extensive wscheme tracing output helped me trace it down with minimal effort. I just located the trace output where it was skipping over expressions and went from there. 12/4/98 Thought i could make wscheme more efficient by tweaking the eval process. Instead, lambda functions were being applied twice and not deleted when used directly in an application :( "If it ain't broke...don't fix it" Going to try and implement a construct that 'compiles' lambda expression. It will basically go through a lambda function, replace symbols that represent primitives with the actual primitive, then return the new lambda. Compile construct works! used like this: (define square (compile (lambda (x) (* x x The speed increase is phenominal! Problems which might arise is that i've overridden the basic symbolic nature of scheme :( Just realized that recursive functions won't benefit from this. Need to have it take a pre-bound function instead (or be able to at least). This might be better so that I could define evertying then compile last so cross referencing functions will still compile ok. 12/7/98 compile works just as well on bound functions as it does plain lamda functions. The only problem with bound functions was the evaluator tried to display the new compiled function...which causes a cyclic loop. Another problem was the garbage collector can't handle cycles in pairs :(. So compile should, for now, be used only on lambdas. 12/17/98 Came up with a GC construct. Have each newley created atom report itself to the GC...when GC is needed, it traverses TGE taking roll call. Any atom that it didn't traverse will be destroyed. Needed to get rid of inc_binds, dec_binds, icons, binds predicate. Anywhere i had an icons I probabley should have replaced with gc_register. Maybe not. Need to re-write destroy_atom so it destroys only the atom it's passed. Of course, everywhere i had destroy atom, i should go back to the original instantiation of the atom and register it. I might need a gc_blocking var to prevent registered C atoms from being free'd. Only gc_gc should call destroy_atom. Destroy first binding is trivial now. Removed atom_memory construct since my GC does that anyway. I'll keep atomcount construct...nice to know how many atoms are actually allocated and might be a usefull debugging tool. If things go as planned, not only will wscheme be smaller, but a bit faster since GC will only occur when it's needed. Decided NULL as an underlying C null is bad since the GC attendance sheet relies on filling open seats containing NULL as a value. During garbage collection, will reset the present flag after collecting since there will be less students to deal with (hopefully). eval_primitive was still returning underlying C NULL as a wscheme NULL :( Evaluate was also defaulting the return var t1 to 0. I'm probabley going to find a lot of non-properly converted NULL constructs :( 12/19/98 2nd Garbage collection construct pretty much stable. Based on one global roll-sheet and a global school that the gc works on. Anything I want kept alive i register with the school. This includes the tge and stack when evaluate is called. I can removed anything registerd with the school as well. GC is called and quiet students are removed. During evaluation, GC is called after the stack is touched (before each itteration). Still a problem with let's it seems. Memory is being overwritten and the SHIT HAPPENS (which should never be reached in the evaluate procedure) message keeps comming up. 12/20/98 The problem wasn't with let. Once again, it was garbage collection not doing it's job properly. Vector elements were not being counted during roll call so were being free'd. World, which uses them heavily, was going crazy. Don't know why when experimenting I figured it was let problems. 12/24/98 Major problem in GC implementaion solved for now. When eval list was called, it would call evaluate on each member in the list...since it was recursive and eval might call GC...things in the newly constructed list were obviously being destroyed. Solution was to build the new evaluated list in a 'classroom' then to return the class room. Eval list went from a recursive function to an itterative...but that's ok i suppose. I suppose most of the recursive C functions will have to be converted to itterative ones to take advatage of the GC implementation. Intersting point I've made here. The underlying code, even though its creating a recursive language, itself must be itterative. Otherwise, wscheme will be taking from the underlying C which i'm using. This might make conversion to assembly easier in the long run. I'll be thinking about a simpler evaluation implementation that also takes advantage of the stack-ness of the evaluation process. I'd rather have one common stack that's used to pass values around. 1/14/99 There need be only one NULL in TGE so I'll be weeding code for NULL down a bit. 3/4/99 Implemented a weak quasiquote that will evaluate any unquote within the first level of the list. This should really do a tree traversal where each quasiquote increments a counter and unquote decrements it. Only one NULL exists in entire scheme space. 3/7/99 Quasiquote and unquote work to spec. ,@ not implemented. Allow function definitions using (define (f)) syntax. 4/14/9 Attempting to fix random crash bug associated with the interrupt occuring during sensitive underlying scheme operations like pair mutation. Made all tmp vars in constructors static. 4/16/99 Can't find any reason for random crashing and evaluation errors. Removing ability to destroy_first_binding since i'll be including 'compiling' with the evaluation process. Scheme will mutate lists that are functions by replacing the symbolic name of the function with the actualy function. This is done currently with a separate function that replaces all bound symbols with their actual value (this is bad in some instances and the code for it is quite messy) Might tweak GC construct so that it consits of an array of pairs. That way I won't be creating/destroying pairs all the time. Instead just reusing old ones. I'm sure this will speed things up tremendously since turning off GC made quite a difference in performance. 4/17/99 GC construct now based on slots of atoms. The atoms never die...just become available for rebinding. If more atoms are needed that there are slots, a larger array is realloc'ed. The global structure will be called the "Bin Array" or bina. If an atom will be available for reuse when it's type is AVAILABLE Last working on bins_create and weather the arrary should be an atom_t or array of atom_t's. I think it should be an array of atom_t's since realloc'ing would physically move the atoms which is bad. Of course, this would be OK if atoms representing pairs didn't contain the pointers to the other atoms but offsets from the main head atom. I belive the GC in the purple book use this method (vector). Any atom though would require knowing the first one...any reference to an atom from a pair would require this. I will proabably not go this route. 4/19/99 After rewriting GC construct, realized it was mostly needing to search the entire bin of atoms that took most of the time. Also, the act of GC'ing was expensie (it was happening on average of 5-10 times per tge evaluation). By increasing the bin size, a bigger speed increase was gained since, i assume, it didin't have to GC as much. The GC counter would increment on average of once every 8 tge evaluations. Accidently left residual free()'s in make-port constructors that were free'ing atoms in bins. 4/20/99 Fixed a few minor? bugs. Catching SIGHUP, wasn't resetting the default handler back. So I had to do so explicitly. Anyone who quit un-gracefully would call the handler infinitly. Also, user's files, when saved during a SIGHUP, wereN'T BEing closed. I had to force a GC so the file pointer within wscheme would close it properly. 4/22/99 Major clean up of code. Solidified abstraction layers quite a bit and found where I could create local var's of atom structures (like symbol names) to speed up environment searching. 4/23/99 Realized that during the eval process I was handling non tail recursion when C could be doing it. Able to simplify the manual stack in the eval function to just a list of expressions to evaluate and the environment. Non tail recursion just calls eval again. Currently, I'm converting blocks into lambda's which are then evaluated in their own envirnment. This isn't very efficient since the env it is evlauted in is basically the same as the one that creates the new lambda. Am having a difficult time pin-pointing weather begin creates a new frame or not? If it can be defined in terms of lambda, then it should. Otherwise, it's a construct that 'splices' itself into the block of code in the env it resides in. IE: (lambda () a (begin b c) d) -> (lambda () a b c d) but only after evaluation since (lambda () a (if p t f) c) -> (lambda () a t c) | (lambda () a f c). Still don't see an easy solution to preventing GC on working atoms. One idea is to create them with their presence marked so the first sweep will accept them. If I get rare errors, then doing this should make them even more rare. Gave newly requested atoms a chance at staying alive by marking them initially as being present. 4/24/99 Dealing with the rewrite of the evaluator and GC is a pain in the but. Many bugs are cropping up so am spending alot of time creating abstraction. Have implemented a stack construct that pushes/pops the first element in a list. Cleaned up abstraction of each atom data type. Differetiated between external and internal pairs (no checking is done internally for the validity of a pair). Big problem dicovered when i realized that newly released atoms shouldn't be set as present, since any other atoms, ie list members, might not get roll-called. Basically the environment would slowly dwindel after every GC! Hoping to take advantage of the new binding abstraction by replacing symbols with their actual binding. This should speed things up trememdously. I hope, once procedure symbols are compiled, to do the same with other symbols as well and possibly implementing environments as vectors. This might require a re-write of the environment construct since env's are stack based (first items in are last to be found). 4/27/99 Broke up scheme.cc into 2 other componets: gc and prim. GC contains the core garbage collection routines and prim all scheme internal primitives. Since everything relies on everything else in some ways (can be as little as a primitive displaying the number of allocated atoms via a global var) the split wasn't very clean. Finaly worked out most bugs so that world works. The speed increase after implementing function application binding replacement is phenominal! Bug showed up after one world session. The bindings for keys were being corrupted by eval..since it was mutating the lists (representing expressions). Thus (walk 1) was being saved back as (<#binding walk> 1). Solution was to hid the binding compile construct from the user so that bindings, when displayed, evaluate to their bound var symbol names. Big problem with automatic compiling of functions. It does the same for bound variables. This is bad for local vars. 5/18/99 Can't compile anything that isn't found in the very top level. Also adding internal primitives (primitives that take unevaluted arguments) to the normal primitive pool in the environment. This might help in creating a hacked macro language. Realizing much about internal primitives. When bound to symbols, they don't need a defining environment as do normal evaluated lambdas. They also directly alter the stack (in my implementation) but not always! Do I assume that internal primitives will? Messing with scm i realized they do compile some things such as let from (((x 1)(y 2)) body) -> ((x y) (1 2) body) which makes a lot of sense (don't have to map-car or map-cdr the list of bindings. 5/20/99 Slight problem with makeing special cases internal primitives. I was using a function called pop_then_push which mutated the first element in the stack data structure...unfortunately this had the side effect of mutating code ie in an if statement (tail recursion would replace the if statement with whatever condition occured). I was getting endless loops and couldn't figure out why. Now attempting to implement environments as vector based BST. make_env will create a vector with one element. Must be passed SNULL (for TGE) or another env (for a frame). Currently, just replaced the linked list structure of the env with a vector structure. Problem now with dissappearing elements in vector. Forgot to tweak GC_roll_call to recognize environments as vectors as well. 5/21/99 Most big changes I've done to world have been virtual textbook abstraction cases. The recent one involving changing the environment abstraction went pretty much smoothly. The only kink was my weak implementation of the atom which involves tweaking more than one function in multiple places to fully implemnt it. In switching an environment from a list to a vector, I missed editing bins_roll_call (which noticed an env and treated it as a pair :() Besides that, I needed to touch nothing but those parts that involved low-level atom functions. I would like for each component (gc, io, prim, scheme) to be completley modular. Another great case was re-implementing the garbage collection facility. Because of the atom constructor/destructor abstraction, I was able to comingle it all into a facility that paralells the malloc/free abstracton. insert_binding now inserts into order. One step away from Oln(n). Removed a comment from get_binding that stated, "return null or binding or env". Env since I was trying to be slick and compile on the fly. Returning an env would mean I didn't find it in the local environment and to 'compile' the next thing i found in the upper env's. This of course turned out to be invalid. The only thing I can truley compile are bindings in the TGE. 5/22/99 Wonder if I should do away with the the 'compile' construct i'm currently trying to implement (replacing symbols with it's TGE binding, and instead focus on the general case of envirnment offset. Trying to track down an eval bug that occurs during interrupts. Seems if a GC occurs in the middle of an eval, important atoms (like proc) get cleaned up. I need to either gc_add_classroom every important C atom or workout a decent abstraction. Maybe create a local group of classrooms and alias the C atoms to them temporarily? In the mean time, I made a few primitives behave according to my current GC implementation. I mainly got rid of recursive maping type functions (mapcar, map cdr)since they could be GC'ed at anytime (like during hte actual contruction of whatever list they were making). Attempting a 'reference' construct. A looked up binding will return a modified binding where the name symbol is replaced with a reference. A reference will 'point' (via internal level and offset values) to the binding directly. Level will reference how many env's to traverse up and offset will reference the item in that particular env. An offset of 0 means to actually perform the search in that environment (since TGE will can be mutated). What i need to do to insert new atom types: create global preprocessor value mutate atom struct definition mutate atomis add predicate, selectors (error checking and no error checking) mutate dump_atom mutate destroy_atom mutate bins_roll_call Created once again a new atom type: a pointer to another atom. I'm using this mainly for the GC construct. I'll easily be able to protect C function atom's from the GC. Forgot to add special case to bins_roll_call. Was getting errors after GC since it wasn't traversing what the pointer was pointing to. After a bit of beta testing, come across the fact that cond is not tail recursing very well. Came up with this by implementing 'bins' which returns the head atom of the GC construct. I used loopn (based on cond) to print the length of (bins). It grew when loopn was implented using cond and didn't grow when i used if. 5/23/99 I had a feeling not initializing C atoms would cause a problem later when GC tried to roll_call on a pointer atom. I need to set all C atoms to SNULL before I call gc_add_pointer. 5/28/99 Implemented reset-mov/msg that resets the ipc file pointer to the beginning of the file. Noticed that as the ipc handler did it's thing, it gradually slowed down. I belive it has someting to do with tail recursion not occuring within a let since a tail call is found within a begin within a let inside of handle-message-ipc. Therefore I need to work on tail-recursion for let. Fixed. Problem was for some reason I evaluted the new lambda expressions rather than just pushing it back onto the stack? Must have been residual from an old implementation. 5/31/99 set! wasn't reporting an error when a binding cound't be found. This presented a problem during the scrolling map conversion since half way through parsing world.scm it would die and not evaluate the functions i was working on. I wonder if I shoudl handle errors a bit more gracefully instead of just bombing back to the top environment with an error. 2/2/99 Taking a big leap attempting to implement the standard number base as hex. Construct is as follows: any valid b10 number is taken to be b16. Anything that starts with 0 is ASSUMED to be a hex value. This last definition was needed in order to keep valid variables such as 'a' or 'bf'. I believe a big performance hit occurs with my heavy use of strings and append- string. Since each append requires I know the length of the string, some algorithms will end up in the On^2 family. It might be best to use a custom string library that keeps a char count. Found a comment near the end of bins_request that suggested setting the present flag to true. This would be bad since upon the next GC it would see that the atom was already acounted for and skip it. BUT if it were a cons pair it would not traverse the possibly newly created atoms it was composed of. I actually tried this to my misfortune. Bug in dump_map (). Didn't initialize last_attr to "". For some reason, after all this time, it finally showed itslef? 6/3/99 Idea: Where I don't really delete atoms from memory, i might not destroy atoms logically. Problem with (= x 0fff) since 0fff is the actual string representation. Solution is to hacking in (= x (+ 0fff)) which is inefficient. 6/6/99 Created substring-set! muates a string by overlaying a string ontop of another ie: (substring-set! "meow this" 1 "xyz") -> "mxyz this" This substring-set! concept is a pain in the but. Everything ends up being a refrence to the same string. Doug recommened an enhancement for the GC whereby I use a stack of available atoms rather than linearly searching the array for a free atom. Realized I don't need a bin location in the data structure (even though I removed it a while back) and can instead rely on pointer arithmatic if i need a bin slot number. Oops. Actually I can't since there's no way to corrolate an atom's memory position with it's corresponding pointer in the bin array. Can replace bins.used (count of used atoms) with bins.top. Using a stack along with mark-sweep results in only 28% gc usage. I belive i cut down the lag by half. Going to get rid of my tempoary substring-set! hack in favor of the officla string-append and string-copy primitives. Wasted a good morning tracking down a bug during the doug-stack-hack. I had cut and pasted the line that allocates a bin array for the stack but didn't change the realloc paramter bins to stacks so I was getting core's when trying to free bins then stack (or stack then bins) since it was already freed?! Need to implement list-copy in order to further the compiling construct. That way I'll be able to safely mutate procedures (bound or not). Realized i didn't implement append correctly, it actually muated the first by set-cdr'ing the last elemnt in the list with the next (didnt' even append over a list of lists!!!). Fortunately the behavior of append (creates a fresh list that shares the last element only) gave insight into it's implementation. I belive primitive_append is one of the only primtives that recurses on itself? wc: w_*cc w_*h -> 3618 (amazing) 6/8/99 Found out, after looking at and messing with C and the C++ info pages, that unary -x is the same as ~x+1. Also will be using long long as a base for num's in wscheme. Implementing internal numbers now. I hope to gain some performance. Noticed I'm not protecting atoms in the parsing construct. Must deal with it. Reading the spec, I didn't implement = > < etc correctly since they officially will check N args. Imiplementing number->string and removing my hex->dec dec->hex. Both number-> string and string->number will convert bases 8 10 and 16. The interpreter also recognizes #o #d and #x. Problem with 0 length vectors. 6/16/99 I wonder what the performance loss occurs with make_vector always setting each element to (). Implemented #(...) (external vector parsing) as well as homogenizing # parsing. Was easier than i though since #(...) is just syntatical sugar for (vector ...) so it was a matter of just returning (cons 'vector (read-list buffer++)). 6/18/99 Implemented open-io-file which behaves like open-input-file only you can also write to it. Funny that the aparent reason nothing like this is implemented is it would stray from the 'functional' reasoning (mutation is bad). Re-implemented make-vector to spec (takes a 2nd argument which acts as fill). 6/19/99 After some midnight pondering, realized that the reason wscheme was spending so much time dealint with garbage collection was becuse it had a lot of garbage to deal with. My evaluation process used atoms in a big way. As a temporary fix I've placed some GC power within gc_add_pointer construct (which is used to protect atom_t's from the GC process so C functions don't loose their variables) The speed up seems quite huge. gprof reports that eval() is spending as much time as most of the other top functions (expect bins_gc which still is around 20%). This means I can now go back and focus on streamlining the eval process. Let is now compiled internally into a lambda. Still have the side effect of mutating things that shouldn't ie: (eval '(let ..)) will mutate the list :( This is an annoyance when: (define x '(let ...)) (eval x). Removed gc flag (since i was dealing with it on each atom_request). Will check for gc during actual gc call. wc: w_* -> 3643 An idea is to replace not only procedure application bindings but the arguments as well. Need to also implement letrec (which won't break the enhanced compiling construct). letrec should allow me to also implement defines in the first part of procedure bodys. I might not have to worry about protecting tempoary C atom's from the GC since the atom count ceiling is high enough (1024) to allow for spurious atoms to exist before a forced GC. I need to come up with a situation that would force this to break, possibly reducing the window to 1. Trying: 6/20/99 Another atom saver involves eval checking first for self evaluating expression since those are very common and can be evaluated without the need for using extra atoms. Noticed a very big speed increase! 6/23/99 Deleted unused appendstr function. Apparently the wstr construct is a hit (a nicely abstracted string library which keeps track of the length of the string a la pascal). Successfull in full compiling construct. Small problem I had where I didn't check that I was to replace a 'symbol' with it's reference. I was replacing everything that eval_list was comming up with which messed up most functions. I only found one place that needed deep_list_copy which is primitive_eval. I have yet to see any problems althogh somewhere in the specification it says to copy a list before evaluation (this might actually be the evaluated list of arguments which is created into a new list anyway). Don't know if i mentioned it before but improved internal atom protection by rewriting gc_del_pointer since I push internal C atoms in a stack-like order I based pushing and popping C atoms on a stack construct. All i really needed to do was remove the code that searched for the atom on the stack (which was going to be the first one anyway). I also immediatly free the atom instead of relying on the GC. 7/4/99 Not dealing with zero length vectors has finally caught up to me. Took just a few minutes but when the GC goes to deal with vectors, I would get a vector_ ref error message. This came about after implementing the args construct (which is bound to a vector of strings matching the arguments passed to scheme). When moving in world, thus forcing GC's more often, I was getting the error since args was #(). I had to add a check that vector > 0 in destroy_atom. Might have to redesign the way i evaluate a stream since now it takes a null delimted string. Officially, it should take a pointer to a stream and error if an EOF is reached before a complete expression is read in. If it behaved this way, implementing (read) would be trivial. 7/11/99 Decided to create a page dedicated to WScheme and Scheme in general. Reading over Jaffer's 5d0 update page realized that my 'compile' construct was more like a memoization. 7/16/99 Going to get rid of environments being a sorted vector of bindings since the memoizing? construct is a bit more powerfull in the long run. Shouldn't call it memoize since another def for the concept is tabulation which i'm definately not. It truley is a form of compilation (or optimization). Thought i was being slick trying to make more efficient the compiling process by actually mutating symbols into the binding/reference. Unfortunately this bypassed the eval construct (which shouldn't exist anyways?) since eval will only deep copy the list. I might have to go an extra step and abstract symbols during the evaluation process as well so i can mutate lists (thus preventing the mutation of quoted lists and such). Did a prof on the non BST environment and get_binding went from .26% to %26. Seems i'm relying too heavily on dynamic bindings in that i'm always evaluating quoted lists. I should maybe deal with this issue...possibly having world not rely on the evaluation of a quoted list and have it actually bind it to an internal procedure. 7/17/99 Removed (write-char write-string gets) in favor of a real (write display read- string) primitives. Added boolean construct along with making globals STRUE, SFALSE. Wscheme is now as fast as scm when it comes to evaluating : (define (f x) (or (= x 0) (f (- x 1)))) (f 65535) (pas 8 16) WSCHEME:13sec SCM:15sec A major slow down was wscheme always creating the strings #t and #f. I should also find for other areas where it's creating/destroying the same thing over and over. Removed comment in dump_atom_helper that noted we must display a binding as a symbol. This need no longer apply since I deep_copy a list when (apply) does it's things. Process of removing string->atom atom->string (maybe?) Wed Aug 18 02:15:46 1999 There might be room for code clean up especially with the compiling process. I should let the evaluating mechanisim do most of the work. IE: when i compile a binding, i immediately assing the value to proc. Instead i should let the loop continue the process again thus letting proc be assigned eventually. Mon Aug 23 23:44:49 1999 In my attempt to make more stable wscheme's unstability after a pre-empted GC, i'm realizing that to keep a 'thread' connecting each C call throughout the evaluation process is difficult. Each C call needs passed to it some sort of reference to an atom that will be protected. This means i'm going to have to tweak every primitive to accept a protected reference to an atom. :( Wed Aug 25 18:33:24 1999 Will not be able to create a thread back to make_atom. Thread conversion some- what complete. A few primitives are broken, such as append, but can be re-writ- ten to take advantage of a more functional approach now. Most of the primitives were composed of compolex loops that 'attempted' to keep every C variable protected. This proved to be messy and un-correct. The whole goal is to create a working scheme environment that can withstand multiple threads (if i understand my OS theory) on the same environment. Once that is done I might have a usefull engine for other applications. Things I still need to make thread friendly are most of the constructors. Although the moment of atom creation might be impossible to protect. A solution might be a locking construct that can lock/protect a newly created atom while it's being made. Sun Aug 29 11:51:08 1999 It would seem many steps in the C evaluation process are based on single paths of return values. Keeping this in mind I'm passing references to a protected atom all over the place. Extend_Env now will mutate a reference to an atom with the new environment. This isn't a problem with: eval (exp, extend_env(env, ret), ret); since eval will bind the new env to a protected local stack and thus not conflict with it's own use of ret. What would be bad is: someting (someting (ret), someting (ret), ret); It would work, since ret is a pointer to an atom, but either call would not get a protected return value. In order to prevent this from happening, I must depart totally from the functional'ness of functions. Most of them will now return void :( since i'll be passing references to atoms which will expect to be mutated with the result. The advantage is speed since I won't be using up atoms trying to protect them, which will trigger the GC less. Eval list now returns void. Since this was a highly used function i might reap a nice performance benefit. This goes for all the functions which returned something. Of course, they are passed one more atom now anyway so it might cancel. TO DO: mapcar, mapcdr currently threading cons, but realizing that i'll have to tweak the parsing construct :( thinking of going back to functional paradigm in favor of an interrupt queue. Sun Sep 12 04:06:01 1999 I'm going to enforce that every atom be protected when passed. The expection of course will be with constructors calling GC functions. Becuase of my work trying to make sure anything will survive a GC at any interrupt call, I will be able to put GC control where it belongs (within the atom_request function) and not in the normal code. I'll also be moving direct interrupt control from C to scheme. C will place onto a queue the string to evaluate. I'll need to think about weather it should be a scheme data structure or C or C borrowing from scheme. It looks like i might have to separate C interrupts from scheme since they could occur in the middle of a GC. Scheme and it's internal GC should be able to run whenever it needs. I might still hold onto blocking vars just in case. Wed Oct 6 18:55:14 1999 Renamed destroy_atom to free_atom which describes it's behavior more precisely. Added a new type, protected, which represents the atom's state as it's passed to an atom constructor by an atom_requestor. atom_request will return a useable atom from the GC's heap. If all atoms are used, the heap is grown. Weird thing came up (again?) dealing with interrupts. Seems when an SIGALRM occurs during a gets(), then gets will return NULL :( Very strange. Thu Oct 7 12:45:02 1999 Select looks interesting. Since i'll be dealing with timers within scheme, (basically creating the scheme equivalent of setitimer/signal) will also need to look into signal which will be used within functions that wait on streams. Otherwise, might (or might not) GC when i'm not supposed to. Quite a complex solution for a compolex problem. Although i'll be able to simplify most of the scheme code. Tue Oct 12 23:58:04 1999 Realized another problem with interrupts. How to handle an error either within the interrupt evalution or the current evalution. Fri Oct 22 19:41:11 1999 It occured to me a continuation is just the evaluating stack in question. Passing it around would make sense since implementing call/cc would be trivial. I had initially implemented eval to do just that, crunch over a stack (the expression) until an atom remained. I then migrated much of the logic to normal C function calls. IE: eval will call itself during a non-tail recursive call. I kept the internal stack to implement tail- recursion. Looks like i'll have to implement the evaluating stack entirely now. Fun fun fun!! Tue Nov 2 00:36:25 1999 Need to implement a push/pop frame for the new stack/continuation data-struct. Will streamline the code a bit it would seem. This will be used by internal primitives as well to manipulate the stack. The stack will be made up of a list of frames (that are pushed and poped as needed) or the value to return. (either to the proc that called eval or the next frame. Still don't see why one would pass around an env, exp and a continuation when the continuation (stack) will contain all 3 anyway. So am sticking with this construct. Wed Nov 10 00:10:22 1999 Going to upload the current state of wscheme Fri Jan 21 12:06:57 2000 Bug, since day one, dealing with passing of references to pointers. I was getting strange compiler warnings, someting about passing left values?? It turns out I was passing [x] to functinos expecting char *, which i suppose is bad. Also, I was passing [x] to char *& which was MUTATING!!! the var in question. If that was bad enough, I changed the [x] to a char * which helped me realize the issue. I was mutating the char * references at that point. When I went to free them, crash! It would seem C doesn't like [] variables being mutated. Wonder why? Actually this wasnt a bug since day one, but for some reason a recent?? change brought out this quirky behavior. Ahh yes, I changed the ipc in favor of custom strings to normal external scheme expression. With that came ipc-*-gets calling read and passing the buffer string in question. The read facilities will mutate the buffer variable :( Once I switch over to an external lexical analyzer, I'll be home and dry. Fri Feb 11 01:57:05 2000 Have been working on V3 for a while now (mostly on paper and in head). I need to spend time working out the semantics of the actual C code. In the previous versions, everything, including what was really scheme syntax, was mutable. One could redfine 'if' and 'lambda' with dire consequences. Keeping this in mind, I've been spending a lot of time with the R5RS specification trying to use it as a guide for the new evaluator. (Primitive-Expressions/Syntax/4.1 (variable, literal, combination, procedure, conditional, assignment)) I suppose bison should return return these as special lists. These will then be compiled into someting? Actually, these 6 concepts compose the entire scheme syntax. I'm willing to treat define as a assignment i suppose. Perhaps vice versa? Wed Mar 22 07:43:28 PST 2000 V3 garbage collection thoughts. Am attempting a generational mark/sweep. The main issue i'm dealing with before generation hacks is protecting atoms that can't be inserted into the HEAD list. The parsing process is the main bain. Solution seems to be a temporary 'extra' mark value or 'color' to give any passed out atom. They will survive a sweep until an unknown point in the future. The main issue is how to implement the extra mark value and how to remove the mark when not needed, ie: after the parsing process. Overall, i'm separating even more the modularity of wscheme. I'm trying to keep heap, atom, parsing, and compiling/interpreter separate modules. This is somewhat difficult as the the generational algorithm will require atom mutators to talk to the heap module. Specifically it will need to add the mutated objects to the HEAD list. Since this project is mainly for compiler design research, I really should be putting more attention into debugging details. As it stands now, i'm only hacking in debugging code as i need it. I do have a few data structure renderers, such as a HEAD list dumper and HEAP dumper. Perhaps a separate module that will provide a nice interface into displaying such data structures would be usefull! Hmmm.... I've completley removed any internal memory management details from the atom module. Before an atom could have various reprsentations including atom pointers, used for the 1st memory management schema, bindings, env, etc. This might not be the case later. The biggest issue i'm having is linux seems to foce structures to be long world aligned. The atom struct, although should only by 9 bytes long is 12. I'm taking advantage of this by creating 4 generic byte fields for typing and marking. Might take on a phylogenic classification system when describing and implementing atoms. A lot of redundancy will be removed hopefully since an pair was also a procedure, binding, and a pair. It looks like i'll have 2 char fields that will represent the objects genus and species, heheh. Wed Apr 5 21:26:52 PDT 2000 The 'heap' abstraction, being one of the lowest ones in the wscheme environment, should be clearly defined. It should allow the passing of an available atom, and allow one to add/remove 'head' atoms, those which will be guaranteed to be seen during a the mark phase, should a mark/sweep generator be used. If this were a stop/copy, one would still need a list of head atoms anyways. Theoretically, the GC would just go to the stack and registers (the stack would contain pushed registers belonging to continuations anyways, so maybe registers would be overkill as the GC process would be in affect). The GC should also have the ability to recognize atoms that point to other atoms. This means it would need to work closley with the 'atom' abstraction. This blows. This implies an atom/heap abstraction would have to be built instead. In theory, i could pass the GC a function that would help it decide what to do. The function passed woudl itself need to knwo how to call the GC. (strange loops) All these details are actually just part of the final compiled code. The compiler will know when to register an atom or to protect a set of new incoming atoms. Mon Jun 12 11:24:27 /etc/localtime 2000 Although the 2nd attempt at a cleaner mark sweep collector (with the hopes of converting it to a generational) was somewhat successfull, it allowed was a bit more complicated than the latest compacting collector. Since I used flex/bison to parse scheme, it needed the GC to behave a certain way wile parsing. Namley to not GC. That was the easiest solution next to writing my own parser. The compacting collector I've recently created is simple. You can give it registers (pointers to objects) which is uses as the head pointers and you can request either an array of bytes or vector of pointers. Although i don't think there is lexically any difference between the words 'array' and 'vector', I use the former when refering to a linearly referenced set of bytes and the latter when refering to a linerly referenced set of pointers. Each object can also have bound to it a 7 bit 'type' field for use by the users. With this simple abstraction, many restrictions imposed on me with the MS-GC have been lifted. I've been able to abstract all scheme objects without needing to have the GC aware of them. Example: a 'pair' is an object-vector of length two whos type is a value different than a 'vector' (which itself is based on an object-vector). Strings will be a vector containing the length of the string and an object-array (containing the actual string). This has the advantage of giving the string enough room to grow and shrink since the entire buffer length is known as well as the length of the current string. [number]< [string] >[ary ] A string of length 3 but with a buffer size [ 3 ] \-{---* ] / [123.] of length 4. [ *--}- [....] The issue I had to deal with now was how to utilize the GC-heap with bison/flex. I belive that the only way is to forgoe bison and write my own LL (top-down) parser. I will pass the parser a register, which it will bind to either a pair, vector, or atom. It will then, if the object is a pair or vector, recurse, passing the car or cdr to itself (as a register). The vector will be a bit more challenging, possibly requiring it be parsed as a list, then converted to a vector as the last step. Eventually I'll have to deal with actual compiling of the code into something very efficient. Although still interpreted, I hope it'll be a bit faster than v2. I ran across a nice introduction to the abstract stack machine that many compilers translate into. I was thinking the same thing as emulating a basic CPU would make the most sense of any translator for microcomputers. Thu Jun 15 21:21:21 /etc/localtime 2000 GC abstraction seems to be working well. Originally didn't allow for zero length arrays or vectors (since there need be enough space after the desc field for the heart). But decided the best way to handle them, although useless, is to check, before creating and copying during compacting, for zero length and to allocate enough space for the heart. Implementing a top down parser, although took a few days to get right, was pretty trivial. The key was to keep track of weather it was reading a list. That was solved by passing a islist boolean to the main parser function. Notice that I try and solve a lot of problems using the hidden state of function calls. This MIGHT be a result of my pursuite of functional language implementation as I'm aware of whats really going on. Why use an extra variable or ADT when a function call keeps track of state quite nicely. I was about to attempt to use bison to parse bottom down, but I knew the function to do so would be somewhat trivial. Wed Jun 28 19:39:49 /etc/localtime 2000 Decided to move the ID field in objects from the front of the object to just before the object. This I feel should speed things up a bit since many objects are singular and don't need the extra header-offset calculation to get to them. This is how malloc does things. It must be a good thing. Thu Jun 29 10:18:21 /etc/localtime 2000 Hows this for an auto-sizing heap. Every time I need to GC, the new compacting heap is malloced ORIGINAL_SIZE + GC_HEAP_INCREMENT_SIZE. If there ends up an increment size available, realloc it ORIGINAL_SIZE-*GC_HEAP_INCREMENT_SIZE. Yea?! STATE: (a) (b) (c) (d) (e) HEAPSIZE: [100] -> [110] -> [120] -> [130->120] -> [130->100] a: new heap, do a bunch of stuff b: GC occurs, keep new heap size since there wasn't enough garbage c: ditto d: there was enough garbage to warrant no increase in heap size e: there was so much garbage, we could shrink heap to original size I like this. The top down approach for parsing was bad. If a GC occured during parsing then, the local object pointer would still be pointing to objects in the old heap. The ONLY way around this is to simulate a stack machine by rewriting the parser recursively. It will then have to push the local object in question before calling itself. Since the GC in it's simplest form only needs the registers and stack to do it's job anyways, it makes sense that anything I do will eventually come down to me emulating such concepts. Might as well not be doing it in C, or at least in C but just a big loop that operates on my beginning of wscheme byte code. Sat Jul 22 15:24:13 /etc/localtime 2000 Successfully implementing a stack-machine based interpreter. v3-02.tgz. Realized an issue, or should i say re-realized, with my heap implementation. It works great, [[[ I can pass ATOM's around like crazy resulting in GC's no affecting it. But once I pass a pointer to an object, the GC has no way of upating that ATOM pointer during the compacting phase. I suppose it's not too functional doing what I did but I needed to mutate the 'args' variable which was a list with an extra reference to the last pair in the list (for easy appending). Will have to make it a pair all the time. ]]] Damn. Just passing an ATOM to a function could result in lameness :( In fact it was CORRECT to pass a reference to an ATOM. DUH! That way, every time it needs to get the ATOM value, it will have to indirectly access it. To make everything correct, I first had to assign a registered variable the value in question, then pass a reference to that variable. Mon Jul 24 19:57:44 /etc/localtime 2000 I'll be stopping the current design now in favor of a stack machine. I'm planning on re-writing the heap implementation. Looks like it will consists of the compacting heap, a few registers (32 or so) and a stack. The registers and stack will replace the lame 'register an atom' concept since the comapctor will now first scan the registers then the stack. The interpreter will then basically be hard coded byte code (once I get that all setup and abstracted). Each time i've implemented the interpreter/compiler, it's been ugly. Especially when it comes to making the code behave with the GC (ie: worring about passing variables that are atoms that might be collected, thus moved in memory. even passing references wasn't good enough since the references had to be registered atoms) I'm hoping all this work towards a stack machine will simplify everything. Mon Sep 11 17:50:47 /etc/localtime 2000 Going to re-abstract the memory structure of the scheme-machine beginning with the stack. I'm bending towards creating a hybrid stack/vector type of object. It will be mutated via push/pop and viewed with an offset selector. IE: stack_push(stack, r1); stack_pop(stack, r2); stack_ref(stack, 5); for (i=0; i 804e568 #01:1(11 ) 804e570 #02:1(ff ) 804e578 #03:1(00 ) 804e580 #05:8(10 00 00 00 00 00 00 00 ) 804e58c #05:8(05 00 00 00 00 00 00 00 ) 804e598 #05:8(15 00 00 00 00 00 00 00 ) A quick snapshot of the external representation of the scheme-machine. Note the 7 registers and 7 pointers. Register 7 is the system stack. Pointer 7 is the stack pointer. Keen, no. Vectors are represented with brackets and arrays parenthesis. The vector pointer values are displayed while the array byte values are displayed. All objects on the heap are displayed consecutivley after the the registers. The first heap object, in this case being the stack, is 4 bytes after the first heap address. This is due to the object descriptor existing just before the actual object data. In this case it's 4 bytes. For each heap object is displayed: it's address, descriptor value, object length (not in bytes), and either the address of the vector elements or value of each byte. All objects will be long-word aligned. So arrays with an odd length will create 'holes' of bytes between it and the next object. Fri Sep 15 00:23:39 /etc/localtime 2000 At the point again where I have a decent library implementing a scheme-machine. Thu Feb 1 18:52:43 /etc/localtime 2001 Spent a while reworking again the scheme machine. Currently the memory abstraction is composed of registers and heap. The heap is composed of three types of objects: array (vector of bytes), vector (vector of objects), stack (like vector, only FILO). Hehe. Looking at the last entry, seemed I thought pointers (half the registers) was the way to go. Lame lame lame. Still using the original flex file to hadle the parsing of tokens. Actual parsing is still done with C routines. Although I MIGHT xfer this over to byte code. The byte code interpreter currently works thusly: A program is both an array of opcodes (bytes) along with a corresponding vector of values. That way each opcode can have an operand value along with it. This is somewhat inefficient since most opcodes don't need a value. The scheme machine (SM) is extremely stack biased. Hmm. Then perhaps a separate data vector isn't needed? I optimized the opcode array to be actual C function addresses. This is bad since I can't handle 'bad instruction' situations. I could pre-fill the array but stray instruction poiners would still be an issue. Sat Feb 3 21:06:40 /etc/localtime 2001 The virtual machine is growing. Added a few opcodes like bra (branch relative), add, push, pop, blah blah. Created a high level opcode, namley ass (assign) and vref (variable reference) in an attempt to create an environment of bindings construct. Funny thing is, the C code just calls all the existing opcode functions. I'm wondering then how much of the interpreter should be hidden behind opcodes versus actual byte code. Also thinking of the most efficient method of handling the environment. I might have written down notes on this somewhere but can't find mention of it here. Basically I see a greater chance of optimization if each local environment is a vector. JIT optimization will be trivial. The global environment then will need to be the exception (namley a hash table). Optimizing for bindings outside the local environment is also trivial in that I can just point directly to the binding (misuse of term here? binding in my case is the pair continaing the variable name and value) since its existance isn't dynamic from the point of view of the procedure. Sun Feb 4 21:24:40 /etc/localtime 2001 Was implementing byte code in a lame way: array of bytes and vector of associated data. IE ((push push add dump) #(1 2 () ())) This was lame since for the scheme machine to do anything with a program it would have to compile it first. It would make sense for the byte code to be complete and ready to go out of the box. Solution then is to just store the data after each opcode. My original idea was that I might want to push/pop/etc atoms. But I realize now that that's actually done a few levels higher. I'm trying to think about the scheme machine at an even lower level than scheme primitives. So what will happen now is certain opcodes will create scheme primitives which will then be manipulated as register and stack data. Some example assembly: ((push a) (push 1) (ass) (push b) (push 2) (ass) (push c) (push 3) (ass) (putc 10) (push a) (vref) (dump) (putc 10) (push b) (vref) (dump) (putc 10) (push c) (vref) (dump) (add) (add) (gc)(statsdump) (putc 10)(halt)) Later attempt. Notice new implementation of push/pop and new opcode 'number' to create new values in registers. Lost efficiency of existing scheme values in separate data vector but gained it back by using registers as temporary spots for often used values. ((bra 45) ;;;;; (number 0) ;;;;;;;;; (putc 45) ;; (number 1) ;;;;;;;;; (add) ; (copy) ; (number 80) ;;;;;;;;; (ncmp) ; (brf -23) ;;;;; (pop 0) ;; (ret) ; (jsr 5) (putc 10) ; DUMP BAR (number 0) (number 1) (pop 1) ; LOCAL CONSTANT = 1 (number 10000000) (pop 2) ; LOCAL CONSTANT = 20 (push 1) (add) (copy) (push 2) (ncmp) (brf -7) ; (gc) (statsdump) (halt)) Fri Feb 16 23:34:54 /etc/localtime 2001 Skeleton interpreter was a trivial issue once I realized that new_pair didn't return the new pair on the stack but instead mutated r0 (duh1!!) spent a few hours tracking down the bug. Knew something was up when the stack kept corrupting and the solution was to remove the pop's after new_pair (which is used to add argument values to the arguments register during the combination formation) Sat Feb 17 17:29:44 /etc/localtime 2001 Was implementing internal parsing behavior like quit and if by setting registers to constant values. IE: proc would be set to 17 if it was supposed to handle the arguments as evaluated if expression arguments (the arguments would be the evaluated condition and true/false clause so that all I would have to do is replace the frame with one of the clauses). What I didn't realize, at first, was this would break the garabage collector. Registers can only be valid objects or 0. Oops. All I'll have to do is create a few more OBJECT type I suppose? Sun Feb 18 15:43:22 /etc/localtime 2001 The evaluator so far has been like the original. A big loop (and associated functions) that worked on the expression passed. And once again it was big and ugly and i spent a few days on it. The only difference was It used a local stack versus the C stack so all 'function' calls were really specialized data I pushed on the stack. The main problem was all of the different combinations that the registers and stack could be in and how to efficiently perform the correct actions. (v3-03.tgz:scm.c) I later moved a lot of the code form the main loop into specific functions. Not just the 'machine mechanism' like frame pushing/popping but scheme syntax/primitive handlers. What I plan to do now is emulate completley the 'standard' stack method of evaluation. IE: [quit eval (+ 1 2) will be the initial stack state. The evaluators main loop will just pop the first element (as returned data) and 'return' to the next stack value. So after a call to eval, the stack will look like: [quit (1 2) eval_args + eval_func Because I'll be pushing/popping 'return address' or C function address as OBJ constants, I'll have to implement in the memory abstraction constants. Objects that aren't in the compacting heap and when hit during the compacting phase are ignored. Oh fun fun fun. Perhaps I'll allow a separate (de)allocation routine for them along the lines of malloc/free. Hmmm, MIGHT be able to get rid of registers? Doubt it. Need to parse the special cases: "" #() Sun Feb 18 21:42:49 /etc/localtime 2001 Constants are now implemented in memory abstraction. I've currently created 2 internal sytnax constants quit and eval. Eval, for now, just pushes #f if the returned value is a pair and the object itself otherwise. Quit sets the constant power to false so the WEL stops. I start the process by pushing quit eval and the expression in question: [quit eval (+ 1 2) for example. WEL pops the first arguemnt into the args register and 2nd stack object into the prog register. The prog register is assumed to be the 'return' address so is resolved using the obj_function() selector. This results in the C function being called with then maniuplates the stack/registers to, hopefully, get the desired result. This seems exactly like partf of the normal computing process using a stack/ register machine. In my case I start with the 'return' address and data and act as though I'm just returing from a procedure call. Hopfully a normal evaluation process will look like: [quit eval (+ 1 2) [quit 0 () (1 2) combination eval + [quit 0 () (1 2) combination [prim_plus] [quit ([prim_plus]) () (2) combination eval 1 [quit ([prim_plus]) () (2) combination 1 [quit ([prim_plus] (1) () combination eval 2 [quit ([prim_plus] (1) () combination 2 [quit ([prim_plus] apply (2 1) [quit 3 [3 I wonder (and will ponder) the efficiency of such seemingly complicated yet simple implementation. For that matter, shold I just completley be compiling this to a machine code? I still don't think so in that I will still be interpreting the machine code. Tue Feb 20 17:55:08 /etc/localtime 2001 Discovered a small parsing bug while working on environment extending. Seems my parsing of dot expressions was incorrect as it stopped parsing immediately after the next closing parenthesis. The parser now must mutate what at first is just a normal list into the dotted pair. IE: (1 2 . 3 4) =parses=> (1 2 3 4) =mutates> (1 2 . 3). I don't mind that it gives no warning if it's incorrect syntax. The result is just to ignore everything after the cdr part of the pair (syntatically). (i think this is right?) Completed closures but crashed after the first endless tail call. Took a while to figure out what was happening but fortuatley my debug library help greatly. For some reason the stack seemed to become corrupted. After a bunch of mem.c debugging output it narrowed it down to syntax objects. (I did find a bug in the heap_obj_copy where it would always ignore shadow objects since they appear to be constants. Ahh the price to pay for exception to the rule.) It turns out the syntax objects, which are bound to C variables in the main code file, weren't tagged as constants (but were still being allocated as constants). So of course they were being moved around, which is no big deal, but they were being set to shadows, which wiped out the C function address. But only for the C variables, not the scheme bound vars or existing locations on the stack or lists. Hmmm, intersting now that I think of it. No wonder it was a bitch to track down. Wed Feb 21 22:25:31 /etc/localtime 2001 During the combination phase of evaluation, I 'optimized' it a bit by checking if it was the last argument of the combination being returned. If so I would immediately apply the function or primitive. This had the undesierable side- affect of it not being able to handle functions of no arguments. Why? Oh yea, cause it wasn't storing the function/primitive locally so ended up trying to 'call' null (which is what is pushed initially into the combination frame). IE: (+ 1 2 3) => [0 () (2 3) env .comb. env eval 1 Implemented define. It's currently somewhat of a hack as environments aren't mutation friendly. Or are they? Hmm?. I guess they are since an env is: (parent-env bind1 bind2 ...) Never mind. Thu Feb 22 21:41:27 /etc/localtime 2001 Read some ancient scheme thread (including posts by Aubrey Jaffer). Most of the change seemed to stem from those who had little clue as to the reason for scheme. Some wanted to rename car/cdr/cons or include unless. Their argument was if (with no else clause) was just as pointless. Of course why not just urge them to remove that special if case? Lamers. R5 seems quite elgant. Although I would still prefer to leave out a few things such as internal definitions (leave those for binding to the TGE only!) or possibly even prohibiting mutation of the tge. (Get some nice optimization there!). Initially it seemed they didn't want to remove anything from the original specification. Nill/() #f/() seemed to hang around for quite sometime. Glad they phased those out and instilled a stricter disjointess of object types. Of course I'll be leaving out interation and named-let constructs. Why are those in scheme? I would really like to get a good mental grasp of macros. Are they just fancy quasiquotation expressions? And is there more to continuations than just reinstating captures stacks? Mon Mar 5 15:41:44 /etc/localtime 2001 Spend the last few days ignoring threads in favor of getting the basic syntax implemented. Got stuck on quasiquotes. See about.quasi quotes. It was tricky dealing with them on a stack machine. Initially I thought I needed 2 extra local variables, but needed only 1. Thought perhaps I dind't need any but that was bogus. Tue Mar 6 17:31:09 /etc/localtime 2001 Tweaked the memory abstraction a bit for stacks. Instead of the first 4 bytes representing the element count, it's now an actual pointer to the next item in the vector. Yes, I got rid of an addition. How to deal with things such as file descriptors? If I let the GC reset them then I must get fancier with the memory abstraction. If I require the user to close them then that's just lame. How CAN I free them? Impossible since once they become garbage there's no way to know about them without a linear scan of the heap after each GC. Sucks. Thu Mar 8 19:21:16 /etc/localtime 2001 Going to change the obj descriptor to include a 'cleaner' bit which will signify if it needs separate cleaning. This will complicate the copy collector as I will have to keep track of differences in the number of need-cleaning objects copied. The descriptor type will be the high bits of a long dedicated to type info. The next long will be the size. The next long either the address of the cleaner or the address of the next free space in the stack. Sun Mar 11 21:49:47 /etc/localtime 2001 Jesus. Found a very stupid bug. Surfaced after my rewrite of the mem abstraction. I wanted to support the dirty object (array with a C function pointer as the first 4 bytes representing a clean up function). Thought there was an error in the new implementation. Fortunately it was just my passing of parameters to a constructor. Of course the GC occurs and the object's have sine moved around. Stupid me. Sat Mar 17 22:15:38 PST 2001 Just reimplemented virtual machine. Instead of it popping the return value then operand, just pops operand. The return value is assumed to be on accumulator. This feels better as 1) can now use memory abstraction without having to create a stack first 2) should speed up interpreter as it was often pushing then immediately popping acc. Wed Mar 25 12:40:48 /etc/localtime 2001 Having a bitch of a time passing the port to the scanner without it mangling the port. IE: Scanning a number requires a lookahead which means the port is incremented a character passed what it really needs to be. Upon the next call to read, it begins passed where it should have started. I'm either going to have to dumb the flex program a bit or parse in the virtual machine. Tue Mar 27 08:12:07 /etc/localtime 2001 Solution to over-lookahead was the flex option "%option always-interactive" Spent a few days thinking about the problem in general as well as scrutinizing the flex manual (used info, getting quite proficient with it). Fri Mar 30 22:23:00 /etc/localtime 2001 That wasn't the solution. Solution was to redefine a new YY_INPUT that took one char only. Have been dealing with buffered versus unbuffered streams. I assumed termios's cfmakeraw was what I needed but it was a bit too raw to keep the various methods of inputing scheme code consistent. IE: read-char? would echo to stdout but read-char wouldn't. It was also translating newline to return (very annoying). All I do now is set the term to non-canonical mode and it behaves nice. Now is dealing with a separate stream of input to be fed to eval (this will be the basis of the networking IPC). Problem is since it's a normal stream, the parser sees EOF and returns it. I need for it to block on EOF while parsing while parsing indefinately. I don't see why it doesn't sense an EOF while reading stdin but does when reading a file. Actually I do. It's a canonical issue. Easiest way to deal with this I suppose is to do as I did in the original IPC: Scan a line of text, parse string, eval expression. That seems the only way of getting around the lexical analyser abstraction I'm relying so heavily on. Sat Mar 31 09:21:50 /etc/localtime 2001 Seems I've got a handle on a basic IPC. It basically reads a file and eval's the expression. There really isn't a need for parse (string->atom), just (eval (read port)), unless EOF is returned in which case I suppose I should reset the file pointer (incase of write/read collision). All I need now is to put back a decent threading facility and I can upload the wscheme interpter! Sat Mar 31 15:23:04 /etc/localtime 2001 Have been scrounging around notes looking for the insant I hit upon my virtual machine abstraction. I remembered that it began as a somewhat conventional type of machine with an instruction list (and instruction pointer) along with a data list (one data element for each instruction). I then merged the 2 lists together. It was still messy in that I was constantly pushing and popping. I suppose I realized that I should just stick everything on the stack? Current evaluation process: Stack popped and the assumed primitive, syntax, or opcode is called (C call). I start the process by pushing the primtives quit and eval as well as the expression in question (which is to load wscheme.scm). Wscheme.scm has the main read-eval-print loop. When all done, the only thing left on the stack is :QUIT:, which sets power to false, so the machine stops. Current GC has 4 types: Array, dirty, vector, stack. Array is just length of bytes, dirty same only first 4 are pointer to a destructor, vector length of objects, and stack FILO of objects (GC is aware of all of these). The GC also allows the creation of static objects (must be arrays currently). The compacting collector will only compact objects (and mutate into shadows)within the current heap, skipping those (static objects) that are outside the range. Can't think of anything else to report. I just gave a summary of the machine and GC after noticing the same a few entries back. Wed Apr 11 18:52:52 /etc/localtime 2001 Successfully implemnted sleep blocked threads. If any or all threads are sleeping, they will be skipped or the interpreter will sleep until one needs to be awoken. Thu Apr 12 10:14:43 /etc/localtime 2001 Realized if I just copy the body each time a lambda expression is evaluated, (not the application of a closure) that I'll be able to compile variables into their actual binding. This didn't work initially (before copying lambda's body) since I was mutating the original expression and was scoping variables during their first use. Things like make-adder's weren't working. Each one behaved like the first adder created. Creating a super-list-copy was quite trivial. Only 8 opcodes and 2 recursive calls. Since I never deleted the variable compilation code, it was just a matter of uncommenting the expression mutation code. Sat Apr 14 16:51:38 /etc/localtime 2001 Archiving: v3-4.6.tgz Successfully have a working interpreter (non optimized) and a cheesy multi- user chat. I had to throw in a critical primitive that halts all task switching. That way I could append a string to the end of a file without fear of another thread interfeering. ALTHOUGH that doesn't stop other scheme processes from overwriting at the same time. Doh! That might explain why the chat seemed to not send chars anymore from time to time. Guess I'll really have to instigate true file locking using linux's mechanism. This won't be an issue later when IPC messages are sent via sockets and such. Plan to redo the closure and environment paradigms. I want to be able to reference variable values directly so will be putting variables and values in separate yet coinciding vectors. TGE might be sorted later (like V1) or a hash table. I debated, on paper, weather each envirnment should also encapsuate every parent environment in a local vector so that any variable, besides TGE bound, can be accesses with only 2 vector-ref's. I feel there will be more shallow extended environments anyways so iterating to the correct on shouldn't be much of an issue. Another issue is how to efficiently evaluate arguments and bind them to the new local environment. My current method is to push evaluated args to the stack (in the combination function) then pop them off into a list, then (in the apply function) merge the closure's parameter list with the value list. Perhaps a compiling phase when creating a closure (so i know ahead of time how many variables there are (including . vars)) Looks like I'll be splitting op_eval up a bit into op_prim_eval and op_proc_eval. That way it will handle more efficiently evaluated operand passing. Procedures will require a new envirnment to be extnded and operands bound while primitive functions can just take an argument count and deal with the values on the stack directly. Currently I pop them off and form a list for both cases. This virtual stack machine and memory abstraction sure leaves a lot to be desired when actually comming up with decent logic flow. I'm always having to work around (in an imperative setting) the lack of mutation. At the same time it's a good thing. It would seem so easy to mutate numbers directly on the stack or registers (rather than creating new ones each time). But that would definatley break continuations, among other things. Wed Apr 18 10:40:14 /etc/localtime 2001 Spent the last few days reimplenting the structure of the interpreter. It was very expensive pushing evaluated operands onto the stack (during operand evaluations) then popping them off into a stack then extending them into a new environment. I now leave the operands on the stack and pass the operand count to the primitive in question. The operand count is compiled directly into each combination at the tail of the combination list (which is usually null (or in my case not a pair)). For procedures, the operand count is used along with the argument count (realized via the closures new argument vector) to extened new local environments fast. First the dotted argument list is created (stack popped until operand count equals argument count). Then the new environment value vector is created, bound to the symbol vector, then the remaining evaluated operands (still on the stack) are set to their respective places in the value vector. IE: the stack: #[ ... 1 2 3 4 5] results in the env: #(PARENT-ENV #(x y z) 1 2 (3 4 5)) Yes, I had to reimplement closures a bit and most of the environment/symbol realization abstraction. For closures, I'm able to compile one and reuse the symbol vector (as it never changes). So the only thing I need to create upon each environment extension is a new value vector (and then mutate it's parent env, symbol vector, and binding values). Still need to consider storing not only the symbol vector, but parent vector as well (so it won't have to itterate to the proper parent environment). At this point, it's as fast as guile when looping x number of times. Of course i'm compiling with full optimization switches. Fri Apr 20 20:16:05 /etc/localtime 2001 Looks like I successfully implemented the compiling of bindings and references. Most of the primitives are broken at this point since I also changed where on the stack most opcodes and function expect the environment. Going to backup now and implement heavier compiling. This will require the syntatic functions to return equivalent primitive expressions (maybe) which will then be re- evaluated. This will (hopefully) allow the evaluation process to substitute parts of the expression with the equivalent primitive expression. Wed Apr 25 20:16:20 /etc/localtime 2001 Working on compiling combinations into internal combinations: (a b c) => [b c a . 2]. Problem is to figure out if it's a syntatic expression, I evaluate 'a' which means I need to somehow eventually create the stack: | 'b' 'c' 'a' 2. But since the operator is now at the end of the list, its going to end up evaluating the operator twice. This blows. Guess I'll have to have to check for the syntatic symbol and call the syntatic translator directly rather than have the bound to symbols in the global environment. I thought I could keep them in the environment since it would help enforce the restriction on symbol names. (no syntatic symbols allowed except when used in a syntatic expression) Fixed problem. What I do now is convert list into combination, which is more efficiently evaluated. I first check that the head of a list is a syntatic symbol. If so, I lookup (in the TGE via the normal symbol lookup routine) the syntatic function and pass the rest of the operands to it. I then mutate the original list into the passed combination. IE: ((if p t f)) => ([p '(t f) .if. . 2]) If the head of the list isn't a syntatic function, then it's mutated into a generic combination. IE: ((f a b c)) => ([a b c f . 3]). In either case, the evaluator then evaluates each element in the combination pushing the value on the stack. When no more operands are left, the top most value on the stack (hopefully an opcode, primitive, procedure, or continuation) is then called with the operand count passed (in the accumulator). I wonder if I should pass the current environment to primitive functions? Since a few primitives do need the environment (apply for example and later on eval) I wouldn't want to have to perform an extra pop. Better might be to make those exceptions syntatic expression? * Problem with combinations with no operands. Thu Apr 26 08:15:32 /etc/localtime 2001 Going to do away with what I've been doing which is to compile all expressions into an internal combination form. What I think will be more efficient and easier to implement is to convert external combinations into internal combinations and external syntatic expressions into internal syntatic combinations. IE: (f a b)=>[a b f . 2] (if p t f)=>{.if. p t f} Archiving: v3-5.2.tgz Yea. The code is so much simpler now. There was way too much overhead converting syntatic expressions into normal combinations. Compare op_if in v3-5.2 to current. Seems I lied. When benchmarking the current version against guile, it seemed to be taking longer than before. So I pulled out the archived version I thought was faster than guile and wouldn't you know, it took 96 seconds for it to loop 500000 times and guile 12 seconds. The current version takes 74 seconds which is a 30% improvement! So thats a good thing I suppose. The function: (define (l x) (if (= x 0) 0 (l (- x 1)))) (l 500000) Tue May 15 21:36:56 /etc/localtime 2001 After a bunch of research (namley reading over MIT's implementation's documentation, I've come to the conclusion that a decently optomized interpreter /compiler isn't worth the trouble at this moment. It seemed the optimizations used were a big headache to deal with for both the user and implementor. A case with global variables: It would seem in certain instances 3 disjoint caches are kept (read, write, something else) so when mutation occures with one, all referenced instances of that object need to be updated. Talk about a pain in the ass. I was actually thinking of optimizing global variable calls so that functions that used them would not honor updated values assigned to them. So that if I were to (define car cdr) nothing that depended on them before the re- definition would be affected. The only advantage to this would be my compiler sensing and optimizing for primitive calls versus a complete continuation extension (pushing the environment, next expression and jumping to the primitive function). Fri May 18 01:08:20 /etc/localtime 2001 Spent the day trying to work on the stack abstraction. Instead of a pretty inefficient vector based object I put it in obj.c. That seemed lame in that whenever I needed to print an object I had to reset the stack object to reflect the temporary (but fast) pointer to the current member. I'm moving it into the mem.c file in hopes that I can hide those details from the object abstraction. Fri May 18 16:43:55 /etc/localtime 2001 An example of current compiler. I've spent a reasonable time thinking about a more efficient interpreter and came up with one that does a bit of optimizing before actually computing each expression. (1 (+ 1 2) (* 4 5)) => # # # # # # ()> # # # # # # ()> () The instruction pointer is set to a list of expressions to evaluate. The 'cpu' uses the object type as the opcode. A few opcodes compile the expression and the rest actually do usefull computation. IE: pairs are mutated into a bunch of operations that implement a sequence of expressions to evaluate with the last being the value of the entire expression. the 'push' object (which is really a pair or 2 element vector) pushes it's car on the stack and resets the ip to it's cdr. Blah blah blah. I've also enhanced the stack primitive to the memory abstraction. I spent a few minutes trying it out in the object abstraction but that made no sense as mem wouldn't know what the hell obj was doing with it. What I've done was keep a pointer to the current stack location. When the actual stack object is touched, it's object count counter is updated (relative to the stack pointer). Since I only push/pop/count/swap the stack, everything can be hidden from the object layer and so implementation-wise the stack is just the mutation of a pointer. Looking back and notes and ideas, I didn't want to stray from the ability to bind syntatx with symbols. I thought that was a usefull feature. But the spec clearly forbids using syntatic keywords as variables. The power, then, must lie in macros (or fancy quasi-expressions). Changing the compiling a bit. Instead of a sequenc becomming a bunch of 'nop's and 'call's', it's going to become a bunch of 'nop's and 'comb's. Combs will be compiled into syntatic forms or combination evaluations. (Maybe even compile for global primitives as they do in MIT scheme). As you can see the result is the same. hehe. (1 (+ 1 2) (* 4 5)) => # # # # # () > # # # # # () > () Now make sure a tail call occurs at end of sequence: ((+ 1 (+ 3 4) 5)) => # # # # # # ()> # # # () (1(+ 1 2)(* 3 4)) => # # # # # ()> # # # # # #() Oh i can do if now (notice how only the true clause makes it to full compilation: ((if (+ 1) a b)) => # # # # ()> #() . #()> In a non-tail call setting: ((if (+ 1) a b)2) => # # # ()> #() . # ()>> # # () Now to implement equality primitive and procedures and I can benchmark against last version! Woo hoo. This better be near guile (like 30% versus 300%). Sat May 19 23:05:02 /etc/localtime 2001 And this is the result: (define l (lambda (x) (display x)(newline)(if (= x 0) (l (- x 1))))) l => #) # # # ()> # # # ()> # # # # # # ()> # () . # # # # # ()> # #()>> (l 500000) takes 11 seconds (full compiler optimization). Guile 12 seconds. It originally took 96 seconds at full optimization. Yes. 28 seconds with just -O2 compiler switches. (l 1000000) Guile 49 seconds. Wscheme 42 seconds. Yes. Slowly getting rid of the oldschool opcode functions. Those where the syntax objects (that contained pointers to C functions) that were popped off the stack and called by the 'stack-code' interpreter. I still think that was a kewl way of capturing not only computation and data state but program state as well (instructions were pushed onto the stack along with their data (below it of course)). Wscheme now takes 48 seconds. Fixed an incorrect closure implementation that just replaced the lambda combination with a complete closure. This was wrong I think? Yea since a procedures defining environment will be different each time, unless it's TGE. Why? Each time a procedure is invoked, it creates a new env for itself. Any future closure will need to point to that, not the first env it may see. So although scheme binding mechanism is static, closures are not? Start to clean up the primitive functions. Caught myself doing this: r2=pop(); r1=pop(); mem_vec_set(r1, 0, r2); => mem_vec_set(pop(), 0, pop()); Duh! Looks like I'll have to remove prim_string for the time being as it expects the arguments to be in a list (the old method of passing arguments to primitives). Literals? Seemed I implemented an internal literal (compiled quote expression) as an enhancement in the last version. I suppose so I wouldn't have to go through the evaluation process of checking for the value of quote, realizing it was a syntatic keyword, then stuffing the stack with the quote opcode. By by. Shouldn't have relaized I could do this for everything hehe. OH and by by quote syntax_quote opcode function. By by syntax_if. Removed op_symbol which was initially part of this versions opcode mechanism. I thought I might dispatch on normal objects, with the action normally being to push them on the stack. Instead I just compile into a combination opcode which compiles, yet again, self evaluating objects into either push, pushbinding, or pushreference. Sun May 20 12:24:30 /etc/localtime 2001 It would seem there is no specified order an expression will be evaluated. (a(b()) + c(d())) might evaluate b() before d() or vice versa. Good by syntax_begin. It was pretty easy. Just set acc to the cdr of the begin expression and let the tail part of op_comb do it's thing. That is that is mutate the currnet opcode into either op-call or op-nop (tail call). (and 1 2 3) => #####().#()>.#()> Wow! Implementing and/or was SUPER EASY! Just mutate and/or expression into equivalent nested if expression! bad ass. Now if only I could realize a macro implementation. Oops. Actually implemented or incorrectly. I just cut and pasted the and code so when an expression was true, it returned #t and not the expressions value. Created a new 'or' opcode. At first I played with the idea of a special if opcode, but realized it was really an or operator. Almost even kept the or and-like code and thought I could just replace if with tif and have a special syntax for it which used op-tif rather than op-if. What happens now is the compiler sees (or a b c) and mutates it into: (or a b c) => # # # # # # () op-or either returns the top most stack value or cdr's the IP to eval and test the next expression. Now it's time to work out a nice let implementation. (let ((x 1) (y 2)) (+ x y)) => # # # #) . # # # # # ())> #() Mon May 21 13:24:14 /etc/localtime 2001 This deal with primitives being called by op_apply is a pain. Since after returning from a primitive, op_apply pushes acc (return value from primitives) and cdr's the IP (next instruction, which is what opcode do). My problem is with some primitives that perform non primitive operations like eval and thread. They attempt to mutate the stack into a new state. What I've had to do in these cases is stuff acc and ip with the right values so the resulting remaining actions by op_apply create the state i'm looking for. GC issues. I belive I have a problem with threads and stacks. We'll see. Maybe I need to update SP to new STACK location after a GC? Well it was related. I was updating SP in heap_obj_copy. That means whenever it found a stack it would modify it. I moved it to mem_gc where it first copies the registers, including the stack, and adjusts SP accordingly. I'm going to instead of prim_thread creating a new thread and making it the active one, insert it to the front of the active thread list. I just realized that a use a lot of registers as temporary variables. If I create someting big on one then more than likely it'll linger through GC's. If I didn't cast ch to char, then 1 was passed as 0xffffffff in certain instances, mainly when I was messing with dump3.scm. } else {c=(char)ch; new_symbol(&c, 1); } Will try and narrow the problem down a bit. Wed May 23 04:51:36 /etc/localtime 2001 THE REPL BUG Had a strnage bug I thought was a different problem for the past day or so. I would create a self loading file (which I made tail recursive) that did the dump3 (display scrolling regions of specified text file). It would seg fault. I thought it might be the newly formed threading mechanism but as I trouble shooted, found it might have someting to do with dirty objects. It kind of was where I accounted for dirty objects during a GC copy but not which one. I solved it by zero'ing out the content along with a check for null dirty object (which means don't clean). !!! Perhaps I should do the check in obj_copy?! I still got seg faults and thought it had to do with the dirty object still and not copying it over correctly (or zero'ing it out or GC copying it). Eventually it was that, of all things, the REPL op code object I was creating by mutating a number to REPl type. Well the type specified it as vector. So the problem of course waz the GC copying over 8*4 bytes rather than just 8 bytes. I knew this was the problem becuase I eventually dumped the entire heap, before copying or performing any of the actions I thought were the trouble, and noticed the heap was corrupted just after the repl object! Hmmm!!! Then it hit me. I broke the abstraction and suffered for over a day! During all this I found bugs that would have surfaced later I'm sure. One being obj_size not accounting for dirty objects. I probably had problems regarding this to begin with since I got seg faults when i open ports without binding them and let a GC occur. A few others I can't remember at the moment. Must sleep it's 2:15 am! Wed May 24 12:06:18 /etc/localtime 2001 It was a good thing that occured. It forced me to scrutinize over the code more than average. Many more bugs surfaced during the process which would have been a headache later I'm sure. EXTENT SCOPE C: dynamic static Scheme: static static Oops, the time was messed up on laptop. It was like a day behind or something. I fixed this entry's date (the time is really unimportant). Going to go back to the code and instead of op_apply pushing acc and cdr'ing ip, going to require the primitives to do it. The reason being a few of the 'primitives' were actually doing some major logic flow like resuming continuations and prim_eval. I changed mymind. I forgot to implement quasiquote. Need to do that I guess. I think i'm getting closer to the elusive 'macro' issue. For quasiquote this time, instead of using the machine to evaluate it on the spot, i'm going to translate it into a nested list/quote expression. IE: `(a ,b c) => (list 'a b 'c) Fri May 25 15:07:32 /etc/localtime 2001 Spent all morning (and night) attempting to work out the basic transformation algorithm for quasiquote. At first I attempted a list transformation (notes)but realized not to soon afterwards that one can quote pairs! Duh. Once I jotted a few cases of a cons tree transoformation it all made sense. It's just another modified tree copier! So I put the code in for a quasiquote->cons+quote transformation then added the special quasiquote/unquote cases (which just increment/decrement the depth count and continue as if it were a normal tree copy). Bonus. Should probably consider a method in which to compile binding values in cases where internal syntax re-writes an expression, like quasiquote, using cons. The spec calls for such behavior, namley for primitive behavior to be unmodified when the user mutates things around. Very strange. Something's resetting the handler for SIGALRM (either read-char or char-ready?). To fix the problem I put the signal call in reset_alarm, which just aobut every thread function seems to call. I also noticed that the thread interrupt was instigated by the leep function. I had a call to the alarm initializer commented out, yet it still functions. Baffled me for a few moments. Wed May 30 14:59:49 /etc/localtime 2001 Would like the read functions to automatically be asyncronous. There's no easy way to be signaled when input arrives so I'll have to get fancy with polling in between task switches and sleeping threads. One thing I don't have implemented is priority. As it is, if 10 threads are running and one is sleeping, it'll be added to an arbitrary point in the list (actually the head) and eventually made active. I would like, if polled and found ready, the thread wanting input to be the next active. Wed Jun 6 08:44:04 /etc/localtime 2001 Archived: v3-6.2.tgz Currently seems stable. Threads while supporting the sleep function (puts thread on sleep list). Clock tick has one second resolution which is horrible for interactive threads (lag in between groups of characters typed). When you compile and run this version, you get a cheesy chat facility (just like the very first world!). The message-evaluator is threaded. It just reads the next expression from the IPC file and evaluatess it in TGE. The main loop dispatches on a command character. Currently supports 't'alk (read a string and send command to IPC that displays the message), move a global avatar 'h'-left and 'l'right, and 'q'uit. Best to run like: make clean; make; stty raw ; s ; stty cooked TO DO TODAY: Finer clock resolution (use usleep rather than sleep). Thread support of blocked file descriptor (so I don't have to poll, sleep, repeat). Wed Jun 6 21:00:31 /etc/localtime 2001 I increased the resolution of the thread timer ticker to millisecond resolution by using ftime, rather than time, and converting the structure into a U64. That just happens to be the size of a wscheme number. Ran into a few minor bugs like forgetting to multiply by 1000 for usleep and checking that the difference in wakeup time and current time wasn't negative, otherwise usleep, which casts to unsinged, would sleep for quite a long time. I'm able to, whithin wscheme, call char-ready? and msleep for about 200m sec whree I try again. I was considering putting this process in the threading code itself (kernel?) but decided what I'd be doing internally I can just as easly abstract with a few wscheme functions. I feel VERY ready to upload current version and begin the port of the current world code. So do I keep old code or slowly build up a world engine? If I want to start implementing my 'spirit' machanisms, I'm going to have to basically start from scratch. Thu Jun 7 18:01:06 /etc/localtime 2001 I need to be carefull regarding how much I put in scm.c versus implementing it in scheme. I can easily send telnet escape sequences in scheme. Setting terminal attributes will be a bit more difficult and so will probably create primitive raw/cooked functions. Issues with the parsing of strings into symbols. When it seems to hang up when a string ends with a slash (or even 2!). I'm tired and will sleep now. Current login banner: (It's in color) .--. ._ . ._ Welcome to: ` ' / \ | \ | | \ \ ^ / \ / |-< | | | V V ~--' | \ |__ |_/ 3.0.0 Fri Jun 8 19:59:44 /etc/localtime 2001 Fixed some slasification issues. Seemed I was only slashifying a few chars when for use when displaying characters externally. Now "Test. #one\" (three)" becomes Test\.\ \#one\"\ \(three\) Sun Jun 10 10:10:28 /etc/localtime 2001 Major bug discovered. While working on the avatar movmement code, namley dealing with avatars walking off avatars, I found that the compiler doesn't handle unbound symbols the way I expected. I was sending a dynamically created expression (quasiquote) to the ipc writer which included "Avatar". Up until thas point, it was evaluated after the Avatar symbol was defined. But after tweaking the code (dump-map) to evaluate and thus compile it as an unbound symbol, the result was a # rather than a # (or the possibly of a #). My attempt at removing strings is really haunting me now. If it were a string, not only would it signal an error, as it should, but I wouldn't have to get all stupid when writing symbols as external expressions. The hack to fix this problem would probably include NOT compiling symbols as immediate #'s but rather uncompiled which would require a lookup each time around. Fuck. Fine, I'll reimplement strings! I really liked not having them. But thats what I get when I get really fancy with the language and going back and forth between external and internal forms. Actually, if I were to just define the variable first...hmm. Yea, I like that better. I re-implemented strings. Tue Jun 12 20:29:31 /etc/localtime 2001 Archived: v3-6.3.tgz Tue Jul 3 18:50:29 /etc/localtime 2001 Archived: v3-6.4.tgz Added substring a few days ago. Had to to implement readline for world. Sat Jul 7 12:18:04 PDT 2001 Spent the last few days attempting to deal with connectiosn that don't report their temrinal size (much to the detriment of my chemistry homework). I decided to have it just ask if it can't get that info. It's a clean fix i think. A bug came up relating to critical sectiosn and sleeping threads. If a critical section is engaged and it decides to sleep, the thread is switched! Bad bad bad. Wed Jul 11 16:00:33 /etc/localtime 2001 Attempting to stick in support for signals. Seem to work ok only they're not being spawned as a new thread for some reason? Only happens after certain cases like after forceing a sleep or, in the case of world, after redrawing the map. Can't see why for now and am too lazy to turn on debugging and seeing screen fulls of text scroll by. I think I'm just going to reimplement the threading/interrupt/signal mechanism. It looks a bit hair especially since a signal will interrupt the sleep timer and force the sleeping thread awake. That's no good, no good. Thu Jul 12 08:41:21 /etc/localtime 2001 Added mtime primitive. Sat Jul 14 20:26:38 /etc/localtime 2001 Altered the way new threads are inserted into active list. Now they are just inserted into the front rather than after the current thread in the 'active' list. Wow! Worked first time. Basically re-wrote thread_next so that it: prepends signal handler of signal(s) that occured, prepends all waking threads (via sleeping list). Then swiches to next thread. If no more and no more sleeping powers down the machine. Otherwise, waits for sleeping thread then calls thread_next recursively and returns. Why? Becuase while usleeping, an interrupt would force a premature return from usleep. In which case the entire process would need to be restarted again so that the interrupt handler can be pushed and immediatley executed. Even if it's an unhandled signal, it's safe to recurse since we'd end up back waiting for the sleeping thread anyways. Possibility of a malicious DOS here perhaps maybe? Sun Jul 15 09:56:29 /etc/localtime 2001 The bug was this: Ftime on k9 didn't return a valid milli second value. According to a thread on the net somewhere, previous versiosn of linux defaulted to a 'safe' ftime implementatino which just set it to 0. Some how this caused the resolution in wscheme to be 1 second. I don't see how this would affect anything. But when waiting for a sleeping thread to wake up would immediatley return from the sleep and call thread_next again (current implementation). I thought it was due to the timer causing it to return eary or someting. Regardless, i'm using gettimeofday which returns the seconds since epoch and milliseconds. MILLISECONDS! I didn't catch that and assumed it was still microseconds. This resulted in the timer returning times ahead of schedule as I was adding a number quite larger than 1000. So I just / 1000 the milllisecdonds to get microseconds. All is well now. Signals and being caught and and k9 version is hopefully now on better track of time. Fri Sep 7 01:23:49 /etc/localtime 2001 Just added support for reals and renamed the original number concept to integer. World's broken now since all "/" were assumed to be integer divide. Need to implement divide now i guess or whatever the hell it's called. It's late and i'm tired. Sun Sep 23 18:37:47 /etc/localtime 2001 Attempting to rid wscheme of the 'file' concept. I'm hoping to create a persistent framework in which to compute. IE: the program and environment never die, but rather exist at some point and continue indefinatley. This being the case, file's will be pointless as one will just be able to store anything in a variable. The resulting code will rely on the OS's ability to distinguish between often used and rarley used variables and store them in the underlying persistent memory. Powering off the machine shouldn't upset the computation (much). Powering back on should result in the computation continuing. The OS will have to keep a complete state of the running process and threads. I plan to do this during each garbage collection. How to deal with cached and uncached bindings will be an issue. I figure I'll just 'recompile' code. Often used bindings will be in an active heap and so associated code will reference the memory directly. Less used bindings will be saved in persistent memory and so the code that references the data will actually be code to bring them into the local store, unless already present, then actually extracing the value in question. OK then, how to figure out which bindings aren't being used much? Sat Oct 6 00:30:10 /etc/localtime 2001 Descent support for reals. I'll write about it later. Uploading now. Oh yea, added sin, cos, tan support as well as inexact->exact.