skyeye module v0.1.0-stealth.6 (Published: Aug 14, 2024; License: MIT)


SkyEye: AI Powered GCI Bot for DCS

SkyEye is a Ground Controlled Intercept (GCI) bot for the flight simulator Digital Combat Simulator (DCS). A GCI bot lets players request information about the airspace in English, using either voice commands or text entry, and receive answers as speech and text messages.

SkyEye uses Speech-To-Text and Text-To-Speech technology which runs locally on the same computer as SkyEye. No cloud APIs are required. It works with any DCS mission, singleplayer or multiplayer. No special scripting or mission editor setup is required. You can even run SkyEye on your own PC to provide GCI service on a remote multiplayer server.

SkyEye is under active development. Most types of radio calls already work against live multiplayer servers. However, there's still plenty to do before this is ready for widespread use. To see what I'm working on, check out the milestones!

Goals

Implement ALPHA CHECK, BOGEY DOPE, DECLARE, FADED, PICTURE, RADIO CHECK, SNAPLOCK, SPIKED and THREAT calls
Run entirely locally on reasonable consumer hardware
Use modern speech synthesis that sounds like a human (Goodbye, Microsoft SAM! Hello, Piper!)
Hybridize real-world air control communication and brevity with pragmatism
Proactively inform and update players instead of using static tripwire rules
Support accessible interfaces in addition to voice and audio, including keyboard based input and in-game subtitles
Excellent documentation for developers, server administrators and players
Be easy for a beginner programmer to customize
Have useful test coverage, especially of controller logic
Support Windows x86-64, Linux x86-64 and Linux ARM. Experimental functionality on macOS with Apple Silicon.
Allow multiple GCI bots to run on the same DCS and SRS instance with different callsigns and frequencies
Minimize maintenance burden. Ship a static binary with as many pinned dependencies as possible, so this software continues to function with reduced maintainer activity

Anti-Goals

Follow grug-brained principles. Avoid unnecessary design patterns. Keep it simple!
Focused feature set. Don't try to match other bots 1:1 on feature set.
Say "no" to complex features. Provide the basics, and sufficient documentation for others to fork and customize for their use case.

Getting Started

Developers: See CONTRIBUTING.md for instructions on building, running and modifying the bot.
Server admins: Documentation coming Soon™
Players: See the user guide (work in progress) for instructions on using the bot.
Please also see the privacy statement to understand how SkyEye uses your voice and gameplay data to function.

Technology

SkyEye would not be possible without these people and projects, to whom I am deeply grateful:

DCS-SRS by @ciribob. Ciribob also patiently answered many of my questions on SRS internals and provided helpful debugging tips whenever I ran into a block in the SRS integration.
Tacview - specifically, ACMI real time telemetry - provides the data feed from DCS World.
@rurounijones's OverlordBot was a useful reference for SkyEye during early development, and Jones himself was also patient with my questions on Discord.
@ggerganov's whisper.cpp provides speech-to-text.
@rodaine's numwords module is invaluable for parsing numeric quantities from voice input.
Piper by the Rhasspy voice assistant project is used for text-to-speech.
The Jenny dataset by Dioco provides the feminine voice for SkyEye.
@popey's dataset provides the masculine voice for SkyEye.
@amitybell's embedded Piper module makes distribution and implementation of Piper a breeze. @nabbl improved this module by adding support for macOS.
The Opus codec and the hraban/opus module provide audio compression for the SRS protocol.
@hbollon's go-edlib module provides algorithms to help SkyEye understand a callsign or command when it slightly mishears the radio call, or when the user slightly misspeaks.
@lithammer's shortuuid module provides a GUID implementation compatible with the SRS protocols.
@zaf's resample module helps with audio format conversion between Piper and SRS.
@martinlindhe's unit module provides easy angular, length, speed and frequency unit conversion.
@paulmach's orb module provides a simple, flexible GIS library for analyzing the geometric relationships between aircraft.
@proway's go-igrf module implements the International Geomagnetic Reference Field used to correct for magnetic declination.
Cobra is used for the CLI frontend, including configuration, help and examples.
MSYS2 provides a Windows build environment.
Oto was helpful for debugging audio format conversion problems.
zerolog is helpful for general logging and printf debugging.
testify is used in unit tests.
Multiple DCS communities provide invaluable feedback and morale-booster energy:
- Team Lima Kilo and the Flashpoint Levant community
- The Hoggit Discord server
- Digital Controllers
- 1VSC
- CVW8
- @Frosty-nee
The Ace Combat series by PROJECT ACES/Bandai Namco and Project Wingman by Sector D2 are massive influences on my interest in GCI/AWACS, and aviation in general. This project would not exist without the impact of Ace Combat 04: Shattered Skies.
And of course, DCS World is produced by Eagle Dynamics.
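
To illustrate the kind of fuzzy matching mentioned in the go-edlib entry above, here is a minimal sketch in Go. It is not SkyEye's code and it does not use the go-edlib API: it implements plain Levenshtein edit distance with the standard library only. The names `levenshtein` and `closestCommand`, the command list and the distance cutoff are all hypothetical choices made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the number of single-character insertions,
// deletions and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr := make([]int, len(rb)+1)
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(rb)]
}

// closestCommand (hypothetical helper) returns the known command nearest
// to what was heard, if it is within maxDistance edits.
func closestCommand(heard string, known []string, maxDistance int) (string, bool) {
	heard = strings.ToLower(strings.TrimSpace(heard))
	best, bestDist := "", maxDistance+1
	for _, k := range known {
		if d := levenshtein(heard, strings.ToLower(k)); d < bestDist {
			best, bestDist = k, d
		}
	}
	return best, bestDist <= maxDistance
}

func main() {
	// Illustrative command list and cutoff; both are assumptions.
	known := []string{"bogey dope", "picture", "spiked"}
	for _, heard := range []string{"bogy dope", "spyked", "hello world"} {
		match, ok := closestCommand(heard, known, 2)
		fmt.Printf("%q -> %q (matched: %v)\n", heard, match, ok)
	}
}
```

A small cutoff keeps near misses such as a dropped letter while rejecting unrelated phrases. A real recognizer would likely tune the cutoff to the length of the word being matched.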

FAQ

Is this ready?

This project is approaching a Limited Availability release, targeted for early fall 2024. A General Availability release is expected during winter 2024-2025.

You can check current progress here!

What kind of hardware does it require?

CPU: SkyEye is currently very sensitive to both CPU performance and CPU latency. On my system with an AMD 5900X, it takes 1-3 seconds to recognize a voice command, and it starts responding 1 second after that. It does not run well when sharing a CPU with other intensive software.

Avoid running SkyEye on the same physical machine as another intensive app like DCS or the Tacview client. Ideally, run it on a separate computer.
If you're running SkyEye on a cloud provider, ensure your virtual machine has dedicated CPU cores instead of shared CPU cores.
SkyEye is heavily multi-threaded and benefits from multi-core performance.

Memory: SkyEye uses about 2.5-3.0GB of RAM when using the ggml-small.en.bin model.

Disk: SkyEye requires around 1-2GB of disk space depending on the selected Whisper model.

There is some room for improvement:

I'm using an off-the-shelf, general-purpose Whisper model in my development environment. There's some exciting research into faster distilled models and custom-trained models that I will revisit in a few months. I also strongly suspect a combination of advances in AI and Moore's Law will significantly improve Speech-To-Text performance within the next year or so.
I need to investigate hardware acceleration using CUDA, OpenVINO and Core ML. This is challenging because I have limited hardware. If you're interested in this and have hardware, please get in touch!

Why not update OverlordBot?

It would probably be less effort to update OverlordBot to use OpenAI Whisper speech recognition. I certainly wouldn't have had to reimplement the SRS wire protocol from scratch! If you are willing and capable, I encourage you to contribute that change to OverlordBot.

I have some personal, selfish reasons for writing a new bot:

I like programming in Go and *nix more than I like C#/.NET. Intrinsic motivation is extremely important for hobby developers.
I use Go, Python and Linux professionally, so this is more relevant to my career development than .NET development.
I want to learn more about practical network programming with coroutine-based concurrency.
I believe the TRIPWIRE functionality in OverlordBot is damaging to the community and want to eradicate it.
I want to innovate and deliver new features that would be breaking changes to the OverlordBot community.
Given my lack of .NET development skills, it is faster for me to write new software using technologies to which I am "native" rather than contribute to OverlordBot.

Why didn't you implement TRIPWIRE?

TRIPWIRE encourages players to think about themselves in a small bubble. It also clutters the channel with information in a format only useful to a specific player. It encourages players to act as lone wolves rather than as members of a team.

Instead, I have implemented THREAT monitoring. THREAT monitoring warns you when a hostile aircraft is a danger to your coalition. The advantages:

THREAT calls do not require you to individually register with the bot. The bot automatically monitors all friendly aircraft which tune to the SRS frequency.
Locations in THREAT calls are given in either BRAA or BULLSEYE format, depending on whether the call is relevant to a single aircraft or multiple aircraft. TRIPWIRE calls only provide BRAA format.
THREAT monitoring provides continual updates on the threat as long as threat criteria are met, all the way until the merge. A TRIPWIRE call is only given once, at a single requested threat range.
THREAT monitoring considers the bandit group's distance, platform (aircraft & weapons) and aspect (Hot, Flank, Beam, Drag). THREAT calls are broadcast earlier in the BVR timeline for bandits that present a higher relative threat. TRIPWIRE calls only consider threat range, and are broadcast at the same range regardless of other factors.
THREAT monitoring deduplicates calls to multiple friendly aircraft about the same bandit group.
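
The decision described in the list above can be sketched roughly as follows. This is a hypothetical illustration, not SkyEye's controller code: the `Group` type, the `ThreatCall` function, the aspect scale factors and the threat radii are all invented for the example. It also assumes that a call relevant to a single aircraft uses BRAA and a call relevant to several uses BULLSEYE.

```go
package main

import (
	"fmt"
	"sort"
)

// Aspect describes how a bandit group is oriented relative to friendlies.
type Aspect int

const (
	Drag Aspect = iota
	Beam
	Flank
	Hot
)

// Group is a hypothetical bandit group. BaseThreatNM stands in for the
// platform (aircraft and weapons) factor; the values used here are made up.
type Group struct {
	ID           string
	BaseThreatNM float64
	Aspect       Aspect
}

// threatRangeNM shrinks the platform's threat radius for less dangerous
// aspects. The scale factors are illustrative assumptions.
func threatRangeNM(g Group) float64 {
	scale := map[Aspect]float64{Hot: 1.0, Flank: 0.8, Beam: 0.6, Drag: 0.4}
	return g.BaseThreatNM * scale[g.Aspect]
}

// ThreatCall produces at most one call per bandit group, which is how this
// sketch deduplicates. rangesNM maps each friendly callsign on frequency to
// its distance from the group. An empty format means no call is warranted.
func ThreatCall(g Group, rangesNM map[string]float64) (format string, recipients []string) {
	limit := threatRangeNM(g)
	for callsign, r := range rangesNM {
		if r <= limit {
			recipients = append(recipients, callsign)
		}
	}
	sort.Strings(recipients) // map iteration order is random; sort for stable output
	switch len(recipients) {
	case 0:
		return "", nil
	case 1:
		return "BRAA", recipients // location relative to the one threatened aircraft
	default:
		return "BULLSEYE", recipients // shared reference point for several aircraft
	}
}

func main() {
	ranges := map[string]float64{"EAGLE 1": 30, "VIPER 2": 35, "HORNET 3": 70}
	for _, g := range []Group{
		{ID: "hot group", BaseThreatNM: 40, Aspect: Hot},
		{ID: "dragging group", BaseThreatNM: 40, Aspect: Drag},
	} {
		format, who := ThreatCall(g, ranges)
		fmt.Printf("%s: format=%q recipients=%v\n", g.ID, format, who)
	}
}
```

Because the threat radius scales with platform and aspect, a hot group triggers a call at longer range than the same group dragging away, which mirrors the "earlier in the BVR timeline" behavior described above.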

Can I train the speech recognition on my voice/accent?

Since the software runs 100% locally, the speech recognition model is a local file. Server operators can provide a trained model as an alternative to the off-the-shelf model. See this blog post for an example.

I don't plan to provide a mechanism for players to submit their voice recordings to the main repository due to data privacy concerns.

Does this use Line-Of-Sight restrictions?

No. Excluding this feature was an explicit choice in order to avoid the complexity demon.

If this is a critical feature for you, consider using MOOSE's AWACS module instead. It supports Line-Of-Sight and datalink simulation, at the tradeoff of requiring some special setup in the Mission Editor.

OverlordBot also optionally supports this feature, although less than 1% of users used it.

Will this work with DCS's built-in VoIP?

Hopefully in the future Eagle Dynamics will add support for external GCI bots. If anyone at ED is reading this, access to any relevant preview builds would be really helpful!

Could this use a Large Language Model? (llama, mistral, etc.)

This deserves a longer answer; for now, see this issue.

TL;DR: most of the controller logic is simple geometry that completes in about a millisecond. An LLM would be several orders of magnitude slower and less accurate, and would make for a more difficult user experience.

We use AI for the "squishy" problems - understanding human speech, and synthesizing human-like speech. We use traditional code for the algorithmic problems.
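
As a rough illustration of the "simple geometry" involved, the sketch below computes the range and bearing between two aircraft positions using only the Go standard library. It is not SkyEye's implementation (the project credits the orb and go-igrf modules for this kind of work). The `Point` type, the function names, the sample coordinates and the declination value are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// earthRadiusNM is the mean Earth radius (6371 km) in nautical miles.
const earthRadiusNM = 3440.065

// Point is a hypothetical position type for this sketch.
type Point struct {
	LatDeg, LonDeg float64
}

func toRad(deg float64) float64 { return deg * math.Pi / 180 }

// normalizeDeg wraps an angle into the range [0, 360).
func normalizeDeg(deg float64) float64 {
	return math.Mod(math.Mod(deg, 360)+360, 360)
}

// RangeNM returns the great-circle distance from a to b in nautical miles,
// using the haversine formula.
func RangeNM(a, b Point) float64 {
	lat1, lat2 := toRad(a.LatDeg), toRad(b.LatDeg)
	dLat := lat2 - lat1
	dLon := toRad(b.LonDeg - a.LonDeg)
	h := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(lat1)*math.Cos(lat2)*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusNM * math.Asin(math.Sqrt(h))
}

// TrueBearingDeg returns the initial true bearing from a to b in [0, 360).
func TrueBearingDeg(a, b Point) float64 {
	lat1, lat2 := toRad(a.LatDeg), toRad(b.LatDeg)
	dLon := toRad(b.LonDeg - a.LonDeg)
	y := math.Sin(dLon) * math.Cos(lat2)
	x := math.Cos(lat1)*math.Sin(lat2) - math.Sin(lat1)*math.Cos(lat2)*math.Cos(dLon)
	return normalizeDeg(math.Atan2(y, x) * 180 / math.Pi)
}

// MagneticBearingDeg converts a true bearing to a magnetic bearing.
// declinationDeg is positive for east declination.
func MagneticBearingDeg(trueDeg, declinationDeg float64) float64 {
	return normalizeDeg(trueDeg - declinationDeg)
}

func main() {
	// Sample coordinates and declination are arbitrary example values.
	fighter := Point{LatDeg: 34.0, LonDeg: 36.0}
	bandit := Point{LatDeg: 34.5, LonDeg: 37.0}
	trueBrg := TrueBearingDeg(fighter, bandit)
	fmt.Printf("range: %.0f NM\n", RangeNM(fighter, bandit))
	fmt.Printf("true bearing: %03.0f\n", trueBrg)
	fmt.Printf("magnetic bearing: %03.0f\n", MagneticBearingDeg(trueBrg, 5.0))
}
```

Each of these functions is a handful of trigonometric operations, which is why this kind of logic runs in far less time than any model inference.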

Could this provide ATC services?

This deserves a longer answer; for now, see this issue.

TL;DR I have no plans to attempt an ATC bot.

When is SkyEye's birthday?

October 12th. At some point I'll put an Ace Combat 04 easter egg in there.

Directories

Path	Synopsis
cmd/skyeye
internal/application	package application is the main package for the SkyEye application.
internal/conf
pkg/bearings	package bearings contains functions for working with absolute and magnetic compass bearings.
pkg/brevity	package brevity contains types and models for air combat communication brevity.
pkg/coalitions	package coalitions defines the coalitions in DCS World.
pkg/composer	package composer converts brevity responses from structured forms into natural language.
pkg/controller	package controller implements high-level logic for Ground-Controlled Interception (GCI).
pkg/encyclopedia	package encyclopedia is a database of aircraft data.
pkg/parser	parser converts brevity requests from natural language into structured forms.
pkg/pcm	package pcm converts between different representations of PCM audio data.
pkg/radar	package radar implements mid-level logic for Ground-Controlled Interception (GCI).
pkg/recognizer	package recognizer recognizes text from speech.
pkg/sim	package sim provides an interface for receiving telemetry data from DCS World.
pkg/simpleradio	package simpleradio contains a bespoke SimpleRadio-Standalone client.
pkg/simpleradio/audio	package audio implements the SRS audio client.
pkg/simpleradio/data	package data implements the SRS data client.
pkg/simpleradio/types	package types contains types used by the SRS clients.
pkg/simpleradio/voice	package voice contains the types used by the SRS audio protocol to send and receive audio data over the network.
pkg/synthesizer	package synthesizer contains text-to-speech synthesizers.
pkg/synthesizer/speakers	package speakers contains interfaces and implementations for text-to-speech speakers.
pkg/synthesizer/voices	package voices contains the available voices for the synthesizer package.
pkg/tacview	package tacview streams simulation data from TacView.
pkg/tacview/acmi	package acmi streams simulation data from a TacView Air Combat Maneuvering Instrumentation (ACMI) data source.
pkg/tacview/client	client contains clients to stream ACMI data from a local or remote source.
pkg/tacview/properties	package properties contains names of ACMI object properties.
pkg/tacview/tags
pkg/tacview/types	package types contains types used by the TacView clients.
pkg/trackfiles	package trackfiles records aircraft movement over time.