Dropbox achieved cost savings of $1.7 million by using machine learning to streamline the file preview feature of its service.
File preview data is generated in advance by an internal system called Riviera, so that files uploaded to Dropbox can be previewed quickly. In some cases, however, the pre-generated data was never used and the work was wasted. Dropbox therefore launched Cannes, a system that uses machine learning to predict which preview data is worth generating in advance, saving the system resources spent on generation.
During the development of Cannes, the team focused on two things: how much performance degradation could be tolerated when a prediction was wrong, and the simplicity of the machine learning model. The approach was to first draw a line at the allowable level of degradation for previews, and then have the model operate accurately within that line. The team reportedly aimed to keep the model as simple as possible so that it would be clear why a given prediction was made, and so that debugging would be easier in the early stages of deployment.
To build the machine learning model, the team adopted as input data the file extension, the type of account in which the file is stored, and the account's activity over the last 30 days. Cannes v1, built this way, reportedly met its goals, with estimated cost savings of about 1 billion won. The development team then ran A/B tests on production traffic to confirm that any drop in response speed stayed within the allowable range before introducing Cannes into the system.
Riviera, which is responsible for generating preview data, first sends the file ID and type to Cannes when it is about to create preview data. Cannes uses the file ID to collect the account type and usage from external databases. The collected data is transformed into a feature vector, which is fed into the machine learning model to predict whether the file will be previewed within the next 60 days. The prediction is returned to Riviera and stored in a log together with the features, for debugging.
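The request flow described above can be sketched roughly as follows. This is only an illustrative sketch, not Dropbox's implementation: the category lists, the `AccountInfo` fields, the function names, the 0.5 threshold, and the stand-in model are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical categories; the article does not list the real ones.
FILE_TYPES = ["pdf", "docx", "pptx", "other"]
ACCOUNT_TYPES = ["basic", "plus", "business"]


@dataclass
class AccountInfo:
    account_type: str          # assumed to be one of ACCOUNT_TYPES
    active_days_last_30: int   # assumed activity measure: active days in the last 30


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Encode a categorical value as a 0/1 vector over the vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def build_features(file_type: str, account: AccountInfo) -> List[float]:
    """Turn the file type and account data into a numeric feature vector."""
    normalized_type = file_type if file_type in FILE_TYPES else "other"
    return (
        one_hot(normalized_type, FILE_TYPES)
        + one_hot(account.account_type, ACCOUNT_TYPES)
        + [account.active_days_last_30 / 30.0]
    )


def should_prewarm(
    file_id: str,
    file_type: str,
    lookup_account: Callable[[str], AccountInfo],
    model: Callable[[List[float]], float],
    log: List[Dict],
    threshold: float = 0.5,  # assumed cutoff, not taken from the article
) -> bool:
    """Decide whether preview data should be generated in advance for a file."""
    account = lookup_account(file_id)             # fetch account type and usage
    features = build_features(file_type, account)
    score = model(features)                       # predicted chance of a preview
    decision = score >= threshold
    # Keep the features alongside the result so a prediction can be debugged later.
    log.append({"file_id": file_id, "features": features,
                "score": score, "prewarm": decision})
    return decision


if __name__ == "__main__":
    accounts = {"f1": AccountInfo("business", 15), "f2": AccountInfo("basic", 0)}
    toy_model = lambda feats: feats[-1]  # stand-in: score equals the activity fraction
    prediction_log: List[Dict] = []
    print(should_prewarm("f1", "pdf", accounts.__getitem__, toy_model, prediction_log))
    print(should_prewarm("f2", "zip", accounts.__getitem__, toy_model, prediction_log))
```

In this sketch the caller (the role played by Riviera) only receives a yes/no answer, while the log entry keeps the exact features that led to it, which is what makes a surprising prediction traceable afterwards.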
Cannes is currently applied to almost all traffic on Dropbox, and the development team has been able to reduce the annual cost of pre-generating preview data considerably, in line with its estimate. Because Cannes itself costs only $9,000 per year to operate, the net savings are significant. Going forward, the development team plans to experiment with more complex models to improve prediction accuracy, and to fine-tune the model by retraining it on its features.