Itlize

InfluxDB's next-gen time-series engine is built on Rust and supports SQL

Jason Li
Sr. Software Development Engineer
Skilled Angular and .NET developer, team leader for a healthcare insurance company.
November 14, 2022

Time Series and Time Series Databases (DB)

A time series is an ordered sequence of values of a variable, recorded at equally spaced intervals. In short, it is a sequence of discrete-time data. Time series databases store and serve this kind of data, and they are used heavily in IoT devices. The measurements that make up a time series are ordered on a timeline, and that ordering reveals underlying patterns that matter for data analysis. InfluxDB is one such database built around a time-series engine, and correct ordering is vital to how it works. Measurements depend on time: once the order changes, the meaning of the whole data set changes with it. Time series are used in more than one context, including Time Series Analysis, Regression Analysis, and Time Series Forecasting.

Time Series Analysis: It explores how variables change over a period of time. A thorough time series analysis involves a large amount of data. The goal is often to find the equilibrium or optimized state at which the system being studied is most productive.

Regression Analysis: It examines how changes in specific variables cause shifts in other variables over a period of time. A classic example is the stock market, or any system in which one factor sets off a domino effect that alters the outcome.

Time Series Forecasting: It uses information from the past to predict future values. The data includes historical facts and the patterns associated with them. Weather forecasting and economic forecasting are classic examples.
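
As a minimal sketch of forecasting from past values, the snippet below predicts the next reading as the average of the most recent observations. The function name, the window size, and the temperature readings are all hypothetical, and a moving average is only one simple technique among many.

```python
from statistics import mean


def moving_average_forecast(values, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        raise ValueError("not enough observations for the chosen window")
    return mean(values[-window:])


# Hypothetical hourly temperature readings, oldest first.
temperatures = [20.0, 21.0, 23.0, 22.0, 24.0]

forecast = moving_average_forecast(temperatures, window=3)
print(forecast)  # 23.0

# Reordering the same readings changes the forecast, which is why
# ordering matters so much for time series data.
reordered_forecast = moving_average_forecast(list(reversed(temperatures)), window=3)
print(forecast == reordered_forecast)  # False
```

The second call uses exactly the same numbers in a different order and yields a different result, which illustrates the dependency between time and measurements described earlier.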

Time series databases are optimized for time-stamped data because they are developed specifically to handle metrics, events, and measurements.

With these databases, developers can create, enumerate, update, and efficiently organize time series.

Properties of Time Series Databases like InfluxDB

Key properties of time series databases include data location, range queries, high write performance, data compression, scalability, usability, and popularity.

Data Location: Time series databases co-locate chunks of data from the same time range on the same physical part of the database cluster. This enables quicker access and more efficient analysis.

Range Queries: Time series databases make range queries fast and easy, which helps reduce errors when recalling the relevant data. The query language is designed to make such queries simple to write.

High Write Performance: Time series databases are built to sustain high availability and performance for both read and write operations, especially during peak loads. Fast writes are central to their productivity.

Data Compression: Time series databases record data at a high rate and at fine granularity, which calls for a good data compression technique or model. As data gets older, fine granularity matters less, so time series databases provide roll-ups that summarize older data and aid compaction.

Scalability: Time series data grows quickly, and regular databases often struggle at that scale. Time series databases add functionality designed for it, which results in higher insertion rates, faster queries at scale, and better data compression.

Usability: Time series databases offer data retention policies, continuous queries, flexible time aggregations, and range queries, along with useful data compression.

Popularity: With so many features that make data retention and analysis easier, time series databases have become popular for being productive and efficient. IoT (Internet of Things) has increased the popularity of databases like InfluxDB. Other uses include DevOps monitoring and real-time data analysis. Time series databases regularly receive scalability and performance improvements that reduce downtime, lower costs, and support better business decisions.
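
Two of the properties above, range queries over time-ordered data and roll-ups of older data, can be sketched in a few lines. This is an illustration under stated assumptions rather than how any particular database implements them: the sample points, the function names `range_query` and `roll_up`, and the 60-second bucket size are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean

# Hypothetical (timestamp_seconds, value) points, kept sorted by timestamp.
points = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (120, 9.0), (150, 11.0)]
timestamps = [t for t, _ in points]


def range_query(start, end):
    """Return points with start <= timestamp < end, located by binary search."""
    lo = bisect_left(timestamps, start)
    hi = bisect_left(timestamps, end)
    return points[lo:hi]


def roll_up(series, bucket_seconds):
    """Downsample a series by averaging the values in fixed-size time buckets."""
    buckets = defaultdict(list)
    for t, v in series:
        # Align each timestamp to the start of its bucket.
        buckets[t - t % bucket_seconds].append(v)
    return [(start, mean(vs)) for start, vs in sorted(buckets.items())]


print(range_query(30, 120))  # [(30, 3.0), (60, 5.0), (90, 7.0)]
print(roll_up(points, 60))   # [(0, 2.0), (60, 6.0), (120, 10.0)]
```

Because the points are stored in time order, the range query only needs two binary searches and a slice. The roll-up halves the number of stored points here, trading granularity for space, which is the compaction idea described under Data Compression.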

Rust

Rust, a relatively new systems programming language, offers an alternative to C. It is notable for enforcing memory safety without runtime overhead. Rust's linear type system also enables capabilities that can improve the security and reliability of system software, including automatic checkpointing and information flow analysis.

SQL

The Structured Query Language (SQL) is the standard language for Relational Database Management Systems, and it can also be used with time series databases like InfluxDB. SQL statements perform functions such as updating data in a database and retrieving data from it.
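
To make updating and retrieving data concrete, here is a small sketch that uses SQLite through Python's standard `sqlite3` module. SQLite is only a stand-in for a SQL database here, not InfluxDB, and the `readings` table and its columns are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for any SQL-capable database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, ts INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("s1", 1, 20.5), ("s1", 2, 21.0), ("s2", 1, 19.0)],
)

# Update one reading.
conn.execute(
    "UPDATE readings SET value = ? WHERE sensor = ? AND ts = ?",
    (21.5, "s1", 2),
)

# Retrieve the readings of one sensor in time order.
rows = conn.execute(
    "SELECT ts, value FROM readings WHERE sensor = ? ORDER BY ts",
    ("s1",),
).fetchall()
print(rows)  # [(1, 20.5), (2, 21.5)]
```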

InfluxDB’s next-generation time-series engine is built on Rust and it supports SQL

InfluxDB has launched a new time series engine designed to answer queries faster than competing alternatives. It also supports data analysis over very large workloads. The next-gen time-series engine is being released for the managed database service InfluxDB Cloud. According to the market research firm IDC, enterprises are seeing an unprecedented rise in real-time data analytics, which make use of time series data.

As mentioned, time series data is a set of data points collected at regular time intervals, each with a fixed timestamp. These datasets reveal patterns and trends that help enterprises find their weak spots, devise strategies, and make better business decisions that increase productivity.

Time series databases have gained prominence with the advent of streaming technologies. In contrast to earlier practices such as uploading information in high-latency batches, streaming technologies allow time series data to flow into a real-time database. A time series database and an analytics toolset work hand in hand to handle a large influx of continuous data and to mine insights from it.

Developed on Rust for Performance and Scaling

InfluxDB developed the new time-series engine from the company’s IOx open source project, which was introduced in 2020. It is written in the Rust programming language to improve performance and scaling. To speed up storage, InfluxDB reengineered the engine around column-oriented storage, which lets it ingest high volumes of data efficiently and with unbounded cardinality. A column-oriented database is generally faster than a row-oriented one for analytical queries because it uses less memory to store data, and the system only needs to access a smaller portion of the database to answer a query. Cardinality is defined as the number of unique sets of data stored in the database, and supporting higher cardinality is key to scaling. As per InfluxDB, the new time-series engine processes queries across most time-series data within milliseconds. The company uses Apache Parquet files for on-disk storage and Apache Arrow for in-memory data across all the relevant components.
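
The difference between row-oriented and column-oriented layouts, and the idea of cardinality, can be illustrated with plain Python data structures. This is a toy sketch and not the engine's actual storage format: the sample rows, the `to_columns` and `series_cardinality` helpers, and the reading of cardinality as the number of unique tag combinations are all assumptions made for illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"host": "a", "region": "us", "ts": 1, "cpu": 0.5},
    {"host": "b", "region": "us", "ts": 1, "cpu": 0.75},
    {"host": "a", "region": "us", "ts": 2, "cpu": 0.25},
    {"host": "c", "region": "eu", "ts": 2, "cpu": 0.5},
]


def to_columns(records):
    """Pivot row records into one list per field (a column-oriented layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field only has to read that field's column.
avg_cpu = sum(columns["cpu"]) / len(columns["cpu"])
print(avg_cpu)  # 0.5


def series_cardinality(cols, tag_names):
    """Count unique combinations of the given tag columns (assumed definition)."""
    return len(set(zip(*(cols[name] for name in tag_names))))


print(series_cardinality(columns, ["host", "region"]))  # 3
```

Computing the average touches only the `cpu` list and never reads the `host`, `region`, or `ts` values, which is the access pattern that makes column-oriented storage attractive for analytical queries.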

Writing Queries in SQL

With the new time-series engine, InfluxDB finally lets developers write queries in SQL, one of the most popular database languages. Previously, developers could query InfluxDB through the API (Application Programming Interface), Flux, and InfluxQL. Flux is an open source, standalone scripting and query language that focuses on code reuse and, as per InfluxDB, is optimized for Extract, Transform, and Load (ETL). InfluxQL is another query language with an SQL-like syntax. Adding proper SQL support makes it easier to build real-time data solutions, and it should boost adoption because existing teams can apply the SQL skills they already have to new use cases. The new time-series engine also improves observability across traces, logs, and metrics.

All the query languages, according to the company, can be accessed via the DataFusion query engine—which is an extensible query planning, optimization, and execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
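
As a rough idea of what querying time series data with SQL looks like, the sketch below averages a metric per host in 60-second buckets. It runs against SQLite through Python's `sqlite3` module purely as a stand-in: it does not use InfluxDB or DataFusion, the `cpu` table and its columns are hypothetical, and the SQL dialect of the new engine may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpu (host TEXT, ts INTEGER, usage REAL)")
conn.executemany(
    "INSERT INTO cpu VALUES (?, ?, ?)",
    [
        ("a", 0, 10.0), ("a", 30, 20.0), ("a", 60, 30.0), ("a", 90, 50.0),
        ("b", 0, 40.0), ("b", 60, 60.0),
    ],
)

# Integer division aligns each timestamp to the start of its 60-second bucket.
query = """
    SELECT host, (ts / 60) * 60 AS bucket, AVG(usage) AS avg_usage
    FROM cpu
    WHERE ts >= ? AND ts < ?
    GROUP BY host, bucket
    ORDER BY host, bucket
"""
result = conn.execute(query, (0, 120)).fetchall()
for row in result:
    print(row)
# ('a', 0, 15.0)
# ('a', 60, 40.0)
# ('b', 0, 40.0)
# ('b', 60, 60.0)
```

A time-range filter combined with a time-bucketed aggregate is a typical shape for time series queries, and it uses only SQL constructs that most developers already know.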

Conclusion

InfluxDB has launched a new time-series engine that is built on Rust and supports SQL as a query language alongside the existing ones. The technology improves the performance of real-time databases and makes them easier to scale. The focus is on improving how time-series engines and their databases function, which benefits many enterprises and companies.