Transforming a toy prototype into a market-ready AI product requires a substantial amount of supporting technology.
Woobo is an AI-powered plush toy equipped with a touch-sensitive screen, touch sensors, accelerometers, and gyroscopes. Its main method of interaction is speech, with the screen acting as backup navigation; the screen is also used by much of the included content. Woobo's ears are light-up buttons that serve as the primary visual indicator that Woobo is listening, and pressing an ear button also activates the built-in microphone.
Woobo's brain is an electronic device housed in a fire-retardant plastic enclosure, along with most of its other electronics. During initial setup, the user is asked to connect Woobo to a Wi-Fi network. Once connected, the toy registers itself with the server, and the server responds with a pairing code for the toy to display. The parent enters this code into Woobo's parent app. If the server receives the correct pairing code from the parent app within a few minutes, it creates a device account for the toy and associates it with the parent account.
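The pairing handshake is simple enough to sketch. The following is a minimal, hypothetical illustration of the server-side bookkeeping; the function names, the six-digit code format, the five-minute window, and the create_device_account helper are assumptions for illustration, not Woobo's actual implementation.

```python
import secrets
import time

PAIRING_TTL_SECONDS = 5 * 60        # assumed "few minutes" window
pending_pairings = {}               # pairing_code -> (toy_id, expires_at)

def register_toy(toy_id):
    """Called when a toy first contacts the server.
    Returns a short code for the toy to display on its screen."""
    code = f"{secrets.randbelow(10**6):06d}"    # e.g. "042317"
    pending_pairings[code] = (toy_id, time.time() + PAIRING_TTL_SECONDS)
    return code

def confirm_pairing(code, parent_account_id):
    """Called when the parent app submits the code shown on the toy."""
    toy_id, expires_at = pending_pairings.pop(code, (None, 0))
    if toy_id is None or time.time() > expires_at:
        raise ValueError("invalid or expired pairing code")
    # Create the device account and tie it to the parent account.
    create_device_account(toy_id, parent_account_id)    # hypothetical helper
```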
Initial contact between the toy and the server is made with an HTTPS REST call. After the toy's device account has been successfully created, the toy switches to an encrypted WebSocket connection and authenticates itself with the server. The server then sends the toy its initial state, and play can begin. A state consists of the speech to be spoken, the actions to be taken, and/or the image(s) to be displayed by the toy before it awaits user input. From there, the toy sends the server the inputs it receives, and the server responds with states for the toy to execute.
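To make the idea concrete, a state exchanged over the WebSocket might look something like the sketch below. The field names, the receive_json/send_json methods, and the perform and listen_for_input helpers are all illustrative guesses, not Woobo's actual wire format.

```python
# A hypothetical "state" payload as the server might send it.
example_state = {
    "state_id": "story_dino_005",
    "speech": "What do you think the dinosaur found in the cave?",
    "actions": ["blink_ears", "wave_arms"],
    "images": ["dino_cave.png"],
    "await_input": True,       # toy pauses here and waits for the child
}

# The toy's side is then a simple execute/report loop.
def run_toy(websocket):
    state = websocket.receive_json()           # initial state from the server
    while True:
        perform(state)                         # speak, move, display images
        if state.get("await_input"):
            user_input = listen_for_input()    # speech, touch, button press
            websocket.send_json({"state_id": state["state_id"],
                                 "input": user_input})
        state = websocket.receive_json()       # next state to execute
```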
A NoSQL database is used to store device- and user-account data because it scales easily. The interconnected nature of states lent them to being stored in a graph database; specifically, a MySQL database with an RDF extension is used to store the state data, and SPARQL is used to query it. Instead of querying the database every time state information is needed, a Redis data store caches recently retrieved states, boosting performance.
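A cache-aside lookup along these lines captures the idea. The sketch below uses the redis-py client; the key scheme, the SPARQL predicates, the time-to-live, and the run_sparql_query helper are assumptions, since the real schema is not described.

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)
STATE_TTL_SECONDS = 300    # assumed cache lifetime for a state

def get_state(state_id):
    """Cache-aside lookup: try Redis first, fall back to the RDF store."""
    cached = cache.get(f"state:{state_id}")
    if cached is not None:
        return json.loads(cached)

    # Hypothetical SPARQL query; the real predicates are placeholders.
    query = f"""
        SELECT ?speech ?action ?image WHERE {{
            <urn:state:{state_id}> <urn:woobo:speech> ?speech .
            OPTIONAL {{ <urn:state:{state_id}> <urn:woobo:action> ?action . }}
            OPTIONAL {{ <urn:state:{state_id}> <urn:woobo:image>  ?image . }}
        }}
    """
    state = run_sparql_query(query)    # hypothetical wrapper around the RDF store
    cache.setex(f"state:{state_id}", STATE_TTL_SECONDS, json.dumps(state))
    return state
```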
A Lucene-based full-text search engine powers content searching, content browsing, and a recommendation system. Whereas the graph database contains state information (a single piece of content can have many states), the search engine is preloaded with just the content information. The server uses the search engine to find the requested content and then queries the graph database, using SPARQL, for the specified initial state. This greatly reduces the complexity of the SPARQL queries used by the server. The content browser lets users browse the available content by type, subject, preference, age, etc.
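The division of labor looks roughly like this sketch, which reuses the get_state function from above. The search_client object and the document fields are hypothetical stand-ins for whatever Lucene-based engine and schema were actually used.

```python
def find_content_and_initial_state(query_text, filters=None):
    """Find content via the full-text search engine, then fetch its
    initial state from the graph database."""
    hits = search_client.search(       # hypothetical Lucene-backed client
        query=query_text,
        filters=filters or {},         # e.g. {"type": "story", "age": 5}
    )
    if not hits:
        return None

    content = hits[0]                  # best-matching piece of content
    # Only now touch the graph database, and only for one known state ID,
    # which keeps the SPARQL query simple.
    return get_state(content["initial_state_id"])
```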
Using a Lucene-based full-text search engine also allowed some search-specific code (e.g., tokenization, normalization, fuzzy matching) to be removed from the server codebase and offloaded onto the search engine. Additionally, the search engine gave the server extra search capabilities, such as wildcard searches.
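Standard Lucene query-parser syntax covers the fuzzy and wildcard cases directly, so the server can largely pass queries through. For example (using the same hypothetical search_client as above):

```python
# Queries a Lucene-based engine can handle natively.
queries = [
    "dino*",            # wildcard: matches "dino", "dinosaur", "dinosaurs", ...
    "dinasour~",        # fuzzy: tolerates a misspelling of "dinosaur"
    "space AND story",  # boolean operators come for free as well
]
for q in queries:
    print(search_client.search(query=q))
```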
Initially, the server did not handle concurrency well. The focus at the time was to prototype a server quickly to demonstrate the process, not concurrency, so Python was a good choice. Of course, concurrency is necessary for production. While threading is possible in Python, threads are still limited by the Global Interpreter Lock (GIL). Switching away from Python was not feasible due to the lack of time, and while a multi-process solution would get around the GIL, it would have been non-trivial. So concurrency was achieved with the Tornado web framework, which necessitated adopting the pytest-tornado plugin to run the unit tests.
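A minimal sketch of a non-blocking Tornado handler shows the approach: while one request awaits the database, cache, or search engine, the single event-loop thread is free to serve other toys, which sidesteps the GIL for I/O-bound work. The route and the fetch_next_state coroutine are illustrative, not Woobo's actual code.

```python
import tornado.ioloop
import tornado.web

class StateHandler(tornado.web.RequestHandler):
    async def post(self):
        # Awaiting I/O yields the event loop to other requests.
        state = await fetch_next_state(self.request.body)  # hypothetical coroutine
        self.write(state)

def make_app():
    return tornado.web.Application([(r"/state", StateHandler)])

if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```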
Pytest unit tests were added to the codebase for quality control. A Jenkins CI server was stood up and integrated with the company's software version control system to run the unit and integration tests on every merge request. The integration tests used a "toy simulator" program to check the server's behavior for a handful of server-toy interactions. Jenkins was chosen because it is freely available, open source, highly customizable, and simple to set up.
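pytest-tornado supplies an app fixture hook, http_client and base_url fixtures, and a gen_test marker for coroutine-style tests. A minimal test against a toy ping handler (the handler and route are made up for illustration) might look like this:

```python
import pytest
import tornado.web

class PingHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("pong")

@pytest.fixture
def app():
    # pytest-tornado spins up an HTTP server around this application
    # to back the http_client and base_url fixtures.
    return tornado.web.Application([(r"/ping", PingHandler)])

@pytest.mark.gen_test
def test_ping(http_client, base_url):
    response = yield http_client.fetch(base_url + "/ping")
    assert response.code == 200
    assert response.body == b"pong"
```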
With the additional technology needed for production (the in-memory cache, Lucene, Jenkins, and so on), server configuration became non-trivial. In the beginning, everything ran on a single AWS server. As time went on, more services and technologies were added to get the server production ready. To better enable scaling, the server was moved to a microservices architecture, with the various services housed in self-contained Docker images. This made scaling and configuration trivial: a new server could be added simply by starting up the appropriate Docker images.
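The Docker SDK for Python gives a sense of how little it takes to bring up another instance once everything is containerized. The image names below are placeholders; the real images and their startup options were internal.

```python
import docker

client = docker.from_env()

# Placeholder image names for illustration only.
SERVICE_IMAGES = [
    "woobo/api-server:latest",
    "woobo/search:latest",
    "woobo/cache:latest",
]

def launch_instance():
    """Bring up one server instance by starting its service containers."""
    for image in SERVICE_IMAGES:
        client.containers.run(image, detach=True,
                              restart_policy={"Name": "always"})
```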
Each server instance also ran a Consul client image. Consul is service-discovery and health-checking software; the client monitors the Docker containers running on its server instance via periodic health checks and reports to a central Consul server. Additionally, the Consul client communicated with an Nginx server to add new server instances to the pool of traffic-ready servers. Consul was chosen because ready-made Docker images for both the server and the client are available online and usable with only minor configuration, and Consul ships with a serviceable service-management web GUI out of the box. So the company's backend infrastructure went from being a monolithic server to a collection of microservices in about one year.
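A Consul HTTP health check treats any 2xx response as passing, so from a service's point of view the whole arrangement can boil down to exposing an endpoint like the sketch below. This is a hypothetical Tornado handler for illustration, not Woobo's actual endpoint.

```python
import tornado.web

class HealthHandler(tornado.web.RequestHandler):
    def get(self):
        # Returning 200 keeps this instance marked healthy in Consul,
        # and therefore in the Nginx-facing pool of traffic-ready servers.
        self.set_status(200)
        self.write({"status": "ok"})
```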