Professional Projects
First Aid Skills - St John WA
FIFA AI LEAGUE - Altered State Machine
Taser VR Training - Zarmada Inc
Komatsu Training - Cortiical Ltd
Latest Personal Project
Project Command Center
This project centers around using a variety of tools and devices to create an AI assistant (non-ML).
Project tech stack:
- Core AI (C# console app)
- ESP32 E-Ink (C++, wiring/robotics)
- Android client devices (Flutter, Android subsystems)
- Whisper ML (C++, WSL)
- TCP web-sockets
Overview
At the core we have the AI brain, which sits on its own device / server. It holds persistent state and high-level logic, and can make decisions as requested by a terminal.
Terminal is the word I am using for a device connected to the core server that serves one or more of three purposes:
1. Input. The user can input commands into the device to be acted on locally or sent to the core. Speech, text, etc.
2. Processor. The device can run actions as instructed by the core. Light bulb, etc.
3. Communicator. The device communicates messages from the core to the user. Speakers, screens, etc.
As you can see, a device can fit one or many of these purposes. Complex devices, like a PC or mobile phone, will fit all three.
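In rough C# terms, a terminal and its roles could be modelled something like this. This is only an illustrative sketch; the type and member names are placeholders, not the project's actual code.

```csharp
using System;

// A terminal can fill any combination of the three roles,
// so a [Flags] enum captures "one or many of these purposes".
[Flags]
public enum TerminalRole
{
    None         = 0,
    Input        = 1, // accepts user commands (speech, text, ...)
    Processor    = 2, // runs actions on the core's behalf (light bulb, ...)
    Communicator = 4  // relays messages from the core to the user (speakers, screens, ...)
}

public sealed class Terminal
{
    public string Name { get; }
    public TerminalRole Roles { get; }

    public Terminal(string name, TerminalRole roles)
    {
        Name = name;
        Roles = roles;
    }

    public bool Has(TerminalRole role) => Roles.HasFlag(role);
}

// A phone covers all three roles; the E-Ink display below is a pure communicator.
// var phone  = new Terminal("phone",   TerminalRole.Input | TerminalRole.Processor | TerminalRole.Communicator);
// var inkOne = new Terminal("ink one", TerminalRole.Communicator);
```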
An example of a device that is strictly for communication is this ESP32 E-Ink display I've been working on. It connects to my home network with a static IP, forms a TCP client, and then displays the messages it receives. The goal is that if I say something like "set a timer for 5 minutes, and show it on ink one", the core will start a timer and instruct this display to show it. Communicator!
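The display itself runs C++ on the ESP32, but the core's side of that exchange could be sketched roughly like this in C#. The newline-terminated message format and the helper names are assumptions for illustration, not the project's actual wire protocol.

```csharp
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class DisplayLink
{
    // Push one newline-terminated text line to a communicator terminal
    // over its already-open TCP connection.
    public static async Task ShowAsync(TcpClient display, string text)
    {
        byte[] payload = Encoding.UTF8.GetBytes(text + "\n");
        await display.GetStream().WriteAsync(payload);
    }
}

// Hypothetical use once "set a timer for 5 minutes, and show it on ink one" has been parsed:
// start the timer locally, then tell the display what to render.
// await DisplayLink.ShowAsync(inkOneConnection, "TIMER 05:00");
```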
User Input
As part of this project, I have also been looking at user input mechanisms.
The first stages of the prototype had strict inputs in the form of terminal commands. Everything had to be exact: case-sensitive and in the correct input format. That isn't viable for an end user, though, so I had to expand it.
For each action I define in the core, I have a regex match pattern that goes along with it. When a command comes in, I check it against each pattern to see if it matches. This allows a lot more freedom of input! Instead of "set volume 10", you can say "hey, set the volume to 10 percent please", and it will match the volume action pattern "\bset\b.*\bvolume\b.*\bto\b\s*(\d+)".
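A minimal C# sketch of this kind of regex routing is below; the CoreAction record and the registration list are hypothetical, but the volume pattern is the one quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed record CoreAction(string Name, Regex Pattern, Action<Match> Run);

public static class CommandRouter
{
    // Each core action carries its own regex; the first pattern that matches wins.
    private static readonly List<CoreAction> Actions = new()
    {
        new CoreAction(
            "set-volume",
            new Regex(@"\bset\b.*\bvolume\b.*\bto\b\s*(\d+)", RegexOptions.IgnoreCase),
            m => Console.WriteLine($"Setting volume to {m.Groups[1].Value}%")),
    };

    public static bool TryHandle(string input)
    {
        foreach (var action in Actions)
        {
            Match match = action.Pattern.Match(input);
            if (match.Success)
            {
                action.Run(match);
                return true;
            }
        }
        return false; // no action pattern matched this command
    }
}

// CommandRouter.TryHandle("hey, set the volume to 10 percent please"); // -> "Setting volume to 10%"
```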
Whisper / Speech to Text
With this in place, I wanted to experiment with speech to text. To do this I leverage the Whisper STT model via whisper.cpp. Its accuracy is a little suspect on the smaller / more efficient models, but it runs entirely on device. I also attempted to use Windows Speech Recognition and found it to be *very* bad.
Using WSL, I compiled my own build of whisper.cpp that listens for a trigger phrase and then sends a command down the pipe to the core. From there, the text parsing was exactly the same, so this worked really well! Here you can see me using the phrase "fetch my remote configuration file" to run a basic download-and-display action. The WSL instance runs on the right, and the core TCP receiver runs on the left!
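The core-side TCP receiver in a setup like this could be sketched roughly as follows, reusing the hypothetical CommandRouter from the regex sketch above; the connection handling and logging here are illustrative assumptions, not the project's exact code.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class CoreReceiver
{
    // Accept terminal connections (the WSL whisper.cpp client is just another input terminal)
    // and push every received line through the same parsing path as typed commands.
    public static async Task ListenAsync(int port)
    {
        var listener = new TcpListener(IPAddress.Any, port);
        listener.Start();

        while (true)
        {
            TcpClient client = await listener.AcceptTcpClientAsync();
            _ = Task.Run(async () =>
            {
                using var reader = new StreamReader(client.GetStream(), Encoding.UTF8);
                string? line;
                while ((line = await reader.ReadLineAsync()) != null)
                {
                    if (!CommandRouter.TryHandle(line))
                        Console.WriteLine($"No action matched: {line}");
                }
            });
        }
    }
}
```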
At this point I had inputs going into the core, which was running actions and sending out messages. It hadn't yet resolved into the core / terminal structure I have now, but it was a start. From here I expanded my action list to explore a number of different things: timers, weather APIs and data storage were a few.
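As a loose illustration of one such action, a timer in the core can be little more than a delayed task that hands a message back to whichever communicator should announce it; the names below are made up for the sketch.

```csharp
using System;
using System.Threading.Tasks;

public static class TimerAction
{
    // Start a fire-and-forget timer; when it elapses, hand a message to whichever
    // communicator terminal should announce it (screen, speaker, ...).
    public static void Start(TimeSpan duration, Func<string, Task> notify)
    {
        _ = Task.Run(async () =>
        {
            await Task.Delay(duration);
            await notify($"Timer finished ({duration.TotalMinutes:0} min)");
        });
    }
}

// e.g. TimerAction.Start(TimeSpan.FromMinutes(5),
//          msg => DisplayLink.ShowAsync(inkOneConnection, msg));
```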
Once I had these in place, I started developing my terminals. I made web-socket hooks into my tablet, my phone, my PC, etc. I learnt a lot about running background processes on Android, and how to work around the permissions process to make my life easier.
I learnt so many workarounds to permissions at this point that I gave myself an involuntary dev-sec course that scared the life out of me. I never knew there were that many holes in OS security!
Finally, we arrive at the present. I'm currently working on solidifying the core/terminal structure to give everything rigid responsibilities and to set up easy pathways for onboarding new devices onto the AI network. It's fun, and I think it will keep me occupied for a good while.