Dino

 ___  _        __   ____  __
|   \(_)_ _  __\ \ / /  \/  |
| |) | | ' \/ _ \ V /| |\/| |
|___/|_|_||_\___/\_/ |_|  |_|

  An LDPL VM Written in LDPL 🦖

=== INTRODUCTION =====================================================

Dino is an interpreter for the LDPL programming language, written in
LDPL. Because LDPL is a compiled language, Dino's goal is to provide
a lightweight, scriptable version of the language that can be used to
quickly prototype ideas or perform system tasks.

Dino can also be used to run basic LDPL programs on systems which
lack a C++ compiler, or to experiment with new LDPL language features
and syntax.

Mostly, though, it's a prehistoric toy.

=== EXAMPLES =========================================================

HELLO:

    $ cat hi.ldpl
    PROCEDURE:
    display "Hey pardner" crlf
    $ dino hi.ldpl
    Hey pardner

LDPL-SPARK:

    $ git clone https://github.com/photogabble/ldpl-spark
    $ dino ldpl-spark/spark.ldpl 9 13 5 17 1
    ▄▆▂█▁
    $ dino ldpl-spark/spark.ldpl 0 30 55 80 33 150
    ▁▂▃▄▂█

LDPL-SPACE-MINES:

    $ git clone https://github.com/photogabble/ldpl-space-mines
    $ dino ldpl-space-mines/spacemines.ldpl
    ==================================================
    YEAR 1: There are 55 people in the colony...

LBI:

    $ git clone https://github.com/Lartu/LBI
    $ dino LBI/src/LBI.ldpl LBI/examples/fib.b
    0 1 1 2...
    $ dino LBI/src/LBI.ldpl LBI/examples/squares.b
    0 1 4 9...

LDPL Examples:

    $ git clone https://github.com/lartu/ldpl
    $ dino ldpl/examples/explode.ldpl
    Enter a sentence: That's all folks!
    That's all folks!
    $ dino ldpl/examples/sqrt.ldpl
    Enter a number: 50
    sqrt(50) = 7.07106781186548

=== GETTING STARTED ==================================================

You must have version 3.0.5 of the official LDPL compiler installed
in your $PATH:

    https://www.ldpl-lang.org/

Once that's done, clone Dino:

    git clone https://github.com/xvxx/dino

And build it:

    cd dino
    make dino

You should see a "File(s) compiled successfully." message if
everything worked.
You now have a `dino` command line program sitting in the current
directory. Run it directly, or add it to your $PATH and enjoy the
fruits of this installation process:

    ./dino -h

To test Dino, run it against the official LDPL Test Battery[1]:

    make test

You should see another "success" message if everything is working
properly. If not, kindly report an issue at this address:

    https://github.com/xvxx/dino/issues

[1] We actually use a slightly modified version of the official LDPL
    Test Battery, since Dino doesn't have a compilation step.

=== BASIC USAGE ======================================================

Let's look at a simple LDPL program:

    $ cat math.ldpl
    DATA:
    x is number
    y is number
    z is number
    PROCEDURE:
    store 1 in x
    store 2 in y
    add x and y in z
    display x "+" y "=" z crlf

First we'll run it using LDPL 3.0.5 as a sanity check:

    $ ldpl math.ldpl
    LDPL: Compiling...
    * File(s) compiled successfully.
    * Saved as math-bin
    $ ./math-bin
    1+2=3

Okay, that seems right. Next we'll run it using Dino:

    $ dino math.ldpl
    1+2=3

Great! We can stop here. But if you want to look under the hood a
bit, you can see the tokens produced by Dino's lexer for this file:

    $ dino lex math.ldpl
    tokens (41):
    <DATA:>, <:NL:>
    <X>, <IS>, <NUMBER>, <:NL:>
    <Y>, <IS>, <NUMBER>, <:NL:>
    <Z>, <IS>, <NUMBER>, <:NL:>
    <PROCEDURE:>, <:NL:>
    <STORE>, <1>, <IN>, <X>, <:NL:>
    <STORE>, <2>, <IN>, <Y>, <:NL:>
    <ADD>, <X>, <AND>, <Y>, <IN>, <Z>, <:NL:>
    <DISPLAY>, <X>, <"+">, <Y>, <"=">, <Z>, <"\r\n">, <:NL:>

Pretty fun. The next step is turning those tokens into a parse tree,
which you can see using `dino parse`:

    $ dino parse math.ldpl
    vars (3):
      0. NUM: X
      1. NUM: Y
      2. NUM: Z
    nodes (4):
      STORE
        0. 1
        1. <NUM> X
      STORE
        0. 2
        1. <NUM> Y
      ADD
        0. <NUM> X
        1. <NUM> Y
        2. <NUM> Z
      DISPLAY
        0. <NUM> X
        1. "+"
        2. <NUM> Y
        3. "="
        4. <NUM> Z
        5. "\r\n"

These nodes are used by the generator to emit dino assembly, our VM's
imaginary syntax and instruction set:

    $ dino asm math.ldpl
    SET %var0, 1
    STORE %X, %var0
    SET %var1, 2
    STORE %Y, %var1
    ADD %X, %Y, %Z
    PRINT %X
    PRINT "+"
    PRINT %Y
    PRINT "="
    PRINT %Z
    PRINT "\r\n"
    EXIT

If we want, we can save this output to a .dinoasm file and run it:

    $ dino math.dinoasm
    1+2=3

Still looks right! Running dinoasm directly can be helpful when
debugging or developing Dino itself. If you want to explore further,
there are a few files in `examples/` with hand-written dinoasm you
can examine or run, too:

    $ dino examples/99.dinoasm
    99 bottles of beer on the wall...

Finally, we can see the bytecode produced by the assembler for our
LDPL program:

    $ dino bytes math.ldpl
    76 68 80 76 2 09 17 01 08 18 17 09 19 02 08 20 19 20 18 20 21 31 18 31 16384 31 20 31 16385 31 21 31 16386 06 "+" "=" "\r\n"

While internally the bytecode is stored as a vector of numbers, when
it's printed to the screen or loaded from a file we separate each
number with a space and display strings literally. This means we can
save the output of `dino bytes` to a .dinocode file and run it
directly. Or even modify it before running it:

    $ dino bytes math.ldpl | sed 's/17 01/17 13/g' > math.dinocode
    $ dino math.dinocode
    13+2=15

Some prefer to write all their code this way:

    $ echo "76 68 80 76 02 31 16384 01 -4 06 \"hax!\n\"" > hi.dinocode
    $ dino hi.dinocode
    hax!

There's also `dino dis` which turns dinocode back into dinoasm,
kinda. It's useful when debugging and checking or challenging
assumptions.

=== HOW IT WORKS =====================================================

Internally, Dino is organized into three parts: compiler, virtual
machine, and tooling, with the `dino` command line program serving as
the primary means of interacting with the suite.

The architecture is pretty standard: Dino's compiler converts LDPL
source code into bytecode using a lexer, a parser, a code generator,
and an assembler.
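
To watch one program move through every compiler stage, a small shell
helper can run the `lex`, `parse`, `asm`, and `bytes` subcommands from
BASIC USAGE in order. This is only a sketch: `dino_stages` is a
made-up name, not something Dino ships, and it assumes `dino` is on
your $PATH.

```sh
# Print the output of each compiler stage for one LDPL source file.
# Assumption: `dino` is on $PATH and accepts the lex/parse/asm/bytes
# subcommands shown in BASIC USAGE.
# `dino_stages` is a hypothetical helper name, not part of Dino.
dino_stages() {
    src="$1"
    for stage in lex parse asm bytes; do
        # Banner so the four outputs are easy to tell apart.
        printf '=== dino %s %s ===\n' "$stage" "$src"
        # Stop at the first stage that fails.
        dino "$stage" "$src" || return 1
    done
}
```

With the `math.ldpl` file from earlier, `dino_stages math.ldpl` prints
the four outputs one after another, each under its own banner.
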
The virtual machine then loads that bytecode into its memory and
performs each instruction one by one, just like your old Nintendo.
The tooling is just the `dino` command line program that drives the
compiler suite.

The traditional bytecode/VM architecture means Dino could (with a few
changes) support languages other than LDPL in the future, but for now
it's focused on supporting the full LDPL 3.0.5 specification on
Linux, macOS, Windows, WebAssembly, and Raspberry Pi.

=== TECHNICAL SPECIFICATION ==========================================

* "Words" are LDPL numbers.
* Instructions are 1-4 words: opcode and then operands.
* Two native types are number and text.
* 11 number registers:
      $a, $x, $y, $z, $e, $c, $i, $t, $sp, $pc, $ac
* $sp is stack pointer, $pc is program counter, $ac is argc,
  $e is error code
* 5 text registers: @a, @x, @y, @t, @e
* One address space for number registers, number variables, text
  registers, text variables, and text literals.
* Parallel address space for number vectors and text vectors.

=== REFERENCE ========================================================

# --- ADDRESS SYNTAX -------------------------------------------------

| NAME            | SYNTAX
+-----------------+---------------------------------------------------
| Number Register | $a, $pc
| Number Variable | %bufsize, %Users
| Text Variable   | @name, @City
| Text Literal    | "heya", "LDPL rox!"
| Label           | print-fn, DISPLAY

# --- MEMORY ADDRESSES -----------------------------------------------

| 1ST  | LAST | TYPE | DESCRIPTION
+------+------+------+------------------------------------------------
| 0000 | 000F | NUM  | Registers ($x, $y, $a, $pc)
| 0010 | 2FFF | NUM  | Variables (%count, %item-size)
| 3000 | 300F | TEXT | Registers (@A, @X, @E)
| 3010 | 3010 | TVEC | Command line arguments @argv
| 3020 | 3FFF | TEXT | Variables (@beer, @name, @label)
| 4000 | FFFF | TEXT | Literals ("Hiya", "SCORE", "????")

# --- REGISTERS ------------------------------------------------------

| NUM  | NAME | DESCRIPTION
+------+------+-------------------------------------------------------
| 0000 | $A   | Accumulator
| 0001 | $X   | Parameter
| 0002 | $Y   | Parameter
| 0003 | $Z   | Parameter
| 0004 | $E   | Non-zero error code
| 0005 | $C   | Carry
| 0006 | $I   | Incrementor
| 0007 | $T   | Temporary value
| 0008 | $SP  | Stack pointer
| 0009 | $PC  | Program counter
| 000A | $AC  | Num of command line arguments given aka ARGC. 8 max.
| 0010 |      | Number variables
| .... |      |
| 3000 | @A   | Text accumulator
| 3001 | @X   | Text register
| 3002 | @Y   | Text register
| 3003 | @T   | Text register
| 3004 | @E   | Error message
| .... |      |
| 3010 | @argv| Command line arguments vector
| .... |      |
| 3020 |      | Text variables
| .... |      |
| 4000 |      | Text literals
| .... |      |
| FFFF |      | Final address

# --- BYTECODE FORMAT ------------------------------------------------

| BYTE | DATA | DESCRIPTION
+------+------+-------------------------------------------------------
| 0000 | 76   | First four bytes are char codes for "LDPL"
| 0001 | 68   |
| 0002 | 80   |
| 0003 | 76   |
| 0004 | 01   | Bytecode version number
| 0005 |      | First instruction
| 0006+|      | Program instructions
| 00XX | 06   | Final EXIT
| 00XX |      | Sub-procedure definitions
| 00XX |      | Text literals

# --- INSTRUCTIONS ---------------------------------------------------

| CODE | NAME              | DESCRIPTION
+------+-------------------+------------------------------------------
| 00   | n/a               | n/a
| ==== | ================= | CONTROL FLOW ============================
| 01   | JUMP label        | Jump to location of label
| 02   | JIF label         | Jump to label if $a is 0 (false)
| 03   | JIT label         | Jump to label if $a is 1 (true)
| 04   | CALL label        | Push location on stack and jump to label
| 05   | RETURN            | Pop loc off top of stack and jump to it
| 06   | EXIT              | Exit program
| 07   | WAIT $r           | Pause for milliseconds in register.
| ==== | ================= | MEMORY COMMANDS =========================
| 10   | STORE %var $r     | %var = value at address $r
| 11   | SET $r 314        | Set $r to a literal number value
| 12   | FETCH $r $x       | Set $r to the value at address in $x.
|      |                   | Like a pointer.
| 13   | PUSH $x           | Push $x onto the stack.
| 14   | POP $a            | Pop off the stack into $a.
| 15   | STOREV %vec $r %v | Set %vec:$r to value of %v.
|      |                   | %vec:@t and @v work too.
| 16   | PUTV %vec $r %a   | Put %vec:$r into %a.
|      |                   | %vec:@t and @v work too.
| ==== | ================= | ARITHMETIC ==============================
| 20   | EQ $x $y $a       | Set $a=1 if $x == $y
| 21   | GT $x $y $a       | Set $a=1 if $x > $y
| 22   | GTE $x $y $a      | Set $a=1 if $x >= $y
| 23   | LT $x $y $a       | Set $a=1 if $x < $y
| 24   | LTE $x $y $a      | Set $a=1 if $x <= $y
| 25   | ADD $x $y $a      | Set $a to $x + $y
| 26   | SUB $x $y $a      | Set $a to $x - $y
| 27   | MUL $x $y $a      | Set $a to $x * $y
| 28   | DIV $x $y $a      | Set $a to $x / $y. Sets $e to 1 if $y is 0.
| 29   | MOD $x $y $a      | Set $a to $x % $y
| 2A   | ABS $x            | Convert $x to its absolute value.
| 2B   | CEIL $x           | Round $x to next whole number.
| 2C   | FLOOR $x          | Round $x to previous whole number.
| 2D   | RANDOM $a         | Put random number in $a.
| 2E   | INCR $x           | Add 1 to $x.
| 2F   | DECR $x           | Subtract 1 from $x.
| ==== | ================= | I/O COMMANDS ============================
| 30   | PRINT $x          | Print content of register $x
| 31   | PRINL $x          | Print content of register $x and newline.
| 32   | ACCEPT $x         | Accept user input into num or text var.
| 33   | ACCEOF $x         | Accept user input until EOF.
| 34   | EXEC @x           | Run @x.
| 35   | EXECO @x @a       | Run @x and put output in @a.
| 36   | EXECC @x $a       | Run @x and put exit code in $a.
| 37   | READ @x @a        | Read file at path @x into @a. Sets $e, @e
| 38   | WRITE @x @y       | Write @x to file at path @y.
| 39   | APPEND @x @y      | Append @x to file at path @y.
| ==== | ================= | TEXT OPERATIONS =========================
| 40   | LEN @x $a         | Get length of string in @x, put in $a.
| 41   | JOIN @x @y @a     | Concatenate text in registers into @a.
| 42   | GETC $x @str @a   | Get character in @str at $x and put into @a.
| 43   | GETCC @str $a     | Get character code of @str and put into $a.
| 44   | GETIDX @x @y $a   | Get index of @x in @y, put in $a.
| 45   | PUTCC $x @a       | Put ASCII character with code $x into @a.
| 46   | COUNT @x @y $a    | Count occurrences of @x in @y, put in $a.
| 47   | SUBSTR @x $x $y @a| Put @x[$x..$y] into @a.
| 48   | SPLIT @x @y @a    | Split @x by @y and put in vector @a
| 49   | REPLCE @x @y @z @a| Replace @x from @y with @z in @a
| 4A   | TRIM @x @a        | Strip L/R whitespace from @x, put in @a.
| ==== | ================= | VECTOR OPERATIONS =======================
| 50   | CLEAR %v          | Clears vector %v.
| 51   | COPY %x %y        | Copies contents of vector %x to vector %y
| 52   | INDEXC %v %a      | Store index count of vector %v in %a
| 53   | INDEXS %v @v      | Store indices of vector %v in vector @v

=== ISSUES ===========================================================

1. This first iteration plays fast and loose with the "byte" in
   bytecode. The .dinocode files aren't really binary and we're not
   doing any bit shifting or fun stuff like that. Once LDPL supports
   bitwise operations we'll revisit the core design so it's more
   bit-tastic. For now, we're just using numbers.

2. Dino is super slow. Performance may never be a priority.

3. The bytecode format, version number, and set of CPU instructions
   are going to change a lot while this is still in development.

4. Extensions are not, and probably won't ever be, supported.

5. Nothing is optimized at all, not even number constants. There are
   way too many instructions generated in most cases.

6. There is hardly any error checking yet, so you might end up
   generating bytecode that can't be run without knowing why.

7. Nested vectors don't work yet, like: `vec1:vec2:2`

8. The `IN - SOLVE` instruction doesn't work yet.

9. You can't use `-i=` to include files yet. For now, `cat` them all
   together and use `dino -` to run a program from stdin. So this:

       $ ldpl -i=lib.ldpl main.ldpl

   becomes:

       $ cat lib.ldpl main.ldpl | dino -
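
Until `-i=` lands, the workaround from issue 9 can be wrapped in a
shell function. A minimal sketch, not part of Dino: `dino_cat` and the
`DINO` variable are hypothetical names, and the function does nothing
more than the `cat ... | dino -` pipeline from that issue.

```sh
# Stand-in for `ldpl -i=lib.ldpl main.ldpl`: concatenate the sources
# and feed them to `dino -` on stdin, as issue 9 suggests.
# `dino_cat` is a hypothetical helper name, not part of Dino.
# Assumption: the (also hypothetical) DINO variable names the
# interpreter to run and defaults to `dino` on $PATH.
dino_cat() {
    if [ "$#" -eq 0 ]; then
        echo "usage: dino_cat FILE.ldpl [FILE.ldpl ...]" >&2
        return 2
    fi
    # Files are concatenated in the order given; the example in
    # issue 9 lists the library before the main program.
    cat -- "$@" | "${DINO:-dino}" -
}
```

`dino_cat lib.ldpl main.ldpl` then stands in for
`ldpl -i=lib.ldpl main.ldpl`.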