Real-world datasets are often distributed as CSV files on sites such as Kaggle. However, CSV isn’t well suited to querying and transformation, so it’s usually better to load the data into a database.
In this article, you will use csvs-to-sqlite to convert CSV files into SQLite databases. Once the data is in a database, it’s easier to query with SQL and to integrate into an ETL pipeline.
First, we need a dataset; as an example, we’ll use the CPBL (Chinese Professional Baseball League) dataset.
Convert CSV to SQLite
csvs-to-sqlite is an open-source tool hosted on GitHub, and it can be installed with pip (pip install csvs-to-sqlite).
You can use the following commands to download the data and install the Python package.
After configuring the environment, we can convert the CSV files into several SQLite databases.
The format of the csvs-to-sqlite command is:
csvs-to-sqlite /path/to/your.csv sqlite-db-name.db
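The tool also handles details like type detection, but conceptually the conversion is just reading CSV rows and inserting them into a table. Here is a rough sketch of that idea using only Python’s standard library; the file, table, and column names are made up for illustration and are not part of the actual dataset:

```python
import csv
import sqlite3

# Create a tiny stand-in CSV file (hypothetical columns, not the real CPBL data)
with open("players.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["player", "team", "avg"])
    writer.writerow(["Player A", "Team X", "0.312"])
    writer.writerow(["Player B", "Team Y", "0.287"])

# Load the CSV into a SQLite table, roughly what csvs-to-sqlite automates
conn = sqlite3.connect("players.db")
with open("players.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS players ({cols})")
    conn.executemany(f"INSERT INTO players VALUES ({placeholders})", reader)
conn.commit()
```

The real tool does more than this sketch (for example, inferring column types and extracting repeated values into lookup tables), but the core data flow is the same.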
The result after we convert the CSV files into a database:
Verify the CSV was imported correctly
We can now check whether the data was inserted correctly by opening the database with the “sqlite3” command-line tool and querying it.
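You can run the same kind of check from Python with the built-in sqlite3 module instead of the sqlite3 shell. This is a self-contained sketch: it builds a small in-memory table standing in for the imported CSV data (the table and columns are hypothetical), then queries it:

```python
import sqlite3

# In-memory database with sample rows standing in for the imported CSV
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (player TEXT, team TEXT, avg REAL)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Player A", "Team X", 0.312), ("Player B", "Team Y", 0.287)],
)

# The same kind of query you would type at the sqlite3 prompt
for row in conn.execute("SELECT player, avg FROM players ORDER BY avg DESC"):
    print(row)
```

Against the real database file, you would connect with sqlite3.connect("your.db") instead of ":memory:".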
We have successfully queried the SQLite database, and the results match the CSV data. Now, if a data engineer wants to look up an individual baseball player’s historical data, they can do it much faster with SQL.
The result of querying data from the SQLite database:
I am Simon
Hi, I am Simon, a Customer Success Engineer at InfuseAI. If you found this article helpful, please give it some applause. You’re welcome to share suggestions and discuss with me on the InfuseAI Discord.
InfuseAI is solving data quality issues
InfuseAI makes PipeRider, an open-source data reliability CLI tool that adds data profiling and assertions to data warehouses such as BigQuery, Snowflake, Redshift, and more. Data profiling and assertion results are provided in an HTML report each time you run PipeRider.