Learn Common Database Managing Commands as a Data Engineer | by Lynn Kwong | Feb, 2023

By Jessie Hobb On Feb 28, 2023

Important MySQL DDL commands we should know for managing our tables

As a data engineer, checking and updating the schemas of tables is our bread and butter. There are plenty of tutorials online already, but few of them focus on the conventions that should be followed. SQL is very forgiving: you can write keywords in lowercase or uppercase and name your databases, tables, columns, indexes, and views in whatever way you want. The price is reduced readability, and the queries become difficult to maintain because different people write SQL in different ways.

In this post, we will introduce some common commands for managing table schemas in MySQL, with a focus on the conventions and best practices for each operation. It can work as a handbook (with necessary adjustments) for new data engineers.

Preparation

We will use Docker to start a MySQL 8 container which will work as the MySQL server for this post:

# Create a volume to persist the data.
$ docker volume create mysql8-data
# Create the container for MySQL.
$ docker run --name mysql8 -d -e MYSQL_ROOT_PASSWORD=root -p 13306:3306 -v mysql8-data:/var/lib/mysql mysql:8
# Connect to the local MySQL server in Docker.
$ docker exec -it mysql8 mysql -u root -proot
mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.31    |
+-----------+
1 row in set (0.00 sec)

We will create our databases (also called schemas in MySQL) and tables in this MySQL server. To get started, let’s create a database to store our dummy data:

CREATE DATABASE sales;

A database name should be descriptive, concise, and clear, and should contain no special characters except underscores. It should preferably be lowercase so we can easily tell it apart from MySQL keywords. The same naming convention applies to table and column names.

We will use DBeaver to write queries and view table data in this post.

Create a table

Let’s now create our first table which will store customer data.

CREATE TABLE `sales`.`customers` (
`id` SMALLINT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL,
`job` VARCHAR(50) DEFAULT '',
PRIMARY KEY (`id`),
KEY `ix_name` (`name`)
);

You can use singular or plural table names. I prefer plural as a table can be thought of as a container for data records.

The data definition language (DDL) query for an existing table can be found by this command:

SHOW CREATE TABLE `sales`.`customers`;

It’s recommended to specify the schema name when writing queries so you can have better support for auto-completion.

On Linux, which is what our Docker container runs, MySQL is by default case-sensitive regarding database names, table names, and table aliases. This behavior is controlled by the lower_case_table_names system variable, and on Windows and macOS the names are compared case-insensitively by default. Column names, however, are always case-insensitive, so MySQL leaves you a lot of freedom in how you write them. Even so, we should stick to one naming convention within the same database. It doesn’t matter whether you use camel case or snake case, you just need to be consistent. You may have a preference depending on your backend programming language. For example, as Python developers, we would prefer snake case.

Also, as you can see, we give the prefix ix (meaning index) to the index name. We should generally avoid prefixes for column names in order to keep the queries concise. However, we should provide prefixes for indexes and constraints so that they are more indicative when they show up in error messages. We rarely need to reference indexes or constraints explicitly, so the longer names are not a problem.

Some commonly used prefixes are:

ix for indexes.
fk for foreign key constraints.
uq for unique key constraints.

Besides, there are some conventions regarding how the indexes or constraints should be named:

Indexes: ix_%(column_0_label)s
Foreign keys: fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s
Unique keys: uq_%(table_name)s_%(column_0_name)s

You can check this reference for more definitions and some examples. We will introduce some of them later in this post.
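Since the templates above use Python's %-style placeholders, they can be filled in with a few lines of Python. The following is only a sketch: constraint_name is a hypothetical helper (not part of MySQL or any library), and it assumes that column_0_label is simply the column name, which matches the ix_name index used in this post.

```python
# Hypothetical helper: fills in the naming templates listed above
# with Python %-formatting. Nothing here talks to MySQL.
NAMING_CONVENTION = {
    "ix": "ix_%(column_0_label)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
}


def constraint_name(kind, **parts):
    """Return the index/constraint name built from the template for `kind`."""
    return NAMING_CONVENTION[kind] % parts


# Assumption: column_0_label is just the column name.
print(constraint_name("ix", column_0_label="name"))
print(constraint_name(
    "fk",
    table_name="orders",
    column_0_name="customer_id",
    referred_table_name="customers",
))
print(constraint_name("uq", table_name="customers", column_0_name="name"))
```

The second call produces fk_orders_customer_id_customers, which is the kind of name used for the foreign keys later in this post.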

Rename or duplicate a table

We can use the RENAME command to rename a table:

RENAME TABLE sales.customers TO sales.customer  -- Not executed as I prefer plural
;

If you want to duplicate a table from one schema to another one, you would need to do it in two steps. For example, let’s create a new customers_data schema and duplicate the customers table there.

CREATE DATABASE customers_data;

CREATE TABLE customers_data.customers LIKE sales.customers;
INSERT INTO customers_data.customers
SELECT * FROM sales.customers
;

In this way, the column definitions and indexes of the old table are kept in the new table. If you do it in one step with CREATE TABLE … SELECT …, as shown below, the column definitions are derived from the result of the SELECT, so the primary key, the indexes, and column attributes such as AUTO_INCREMENT are lost, which is not desired in almost all cases:

CREATE TABLE customers_data.customers_copy_direct
SELECT * FROM sales.customers 
;

You can use the SHOW CREATE TABLE command to check the schema of the new table created.
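If you duplicate tables often, the two-step approach (CREATE TABLE … LIKE followed by INSERT … SELECT) can be generated by a small script. This is a minimal sketch in Python: duplicate_table_statements is a hypothetical helper that only builds the SQL text, and it assumes the identifiers are trusted and contain no backticks.

```python
def duplicate_table_statements(source_schema, target_schema, table):
    """Return the two statements that copy the structure first, then the rows.

    Assumption: identifiers are trusted and contain no backticks.
    """
    source = f"`{source_schema}`.`{table}`"
    target = f"`{target_schema}`.`{table}`"
    return [
        # Step 1: copy column definitions and indexes.
        f"CREATE TABLE {target} LIKE {source};",
        # Step 2: copy the data.
        f"INSERT INTO {target} SELECT * FROM {source};",
    ]


for statement in duplicate_table_statements("sales", "customers_data", "customers"):
    print(statement)
```

The printed statements can then be run in DBeaver or in the mysql client.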

If you need to move a table from one database (different host or port) to another one, you can dump the table into a SQL file on one host, and then load it into the other one:

mysqldump -h HOST_1 -P PORT_1 -u USERNAME_1 -p \
--single-transaction --skip-triggers --skip-column-statistics \
SCHEMA_1 TABLE_NAME > TABLE_NAME.sql
mysql -h HOST_2 -P PORT_2 -u USERNAME_2 -p SCHEMA_2 < TABLE_NAME.sql

mysqldump is available after you have installed the MySQL client:

sudo apt-get update
sudo apt-get install mysql-client

The hosts can be the same for the two databases. In this case, the port would be different.

Note the options specified for mysqldump, which are needed in most cases. In particular, with --single-transaction the dump is taken from a consistent snapshot, so InnoDB tables are not locked while they are being dumped.
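When the dump has to be repeated for several tables, it can help to build the command in a script. The sketch below only assembles the argument lists for the mysqldump and mysql commands shown above. The function names are hypothetical, and the host, port, and user values are just the ones from the Docker setup in this post.

```python
import shlex


def build_dump_command(host, port, user, schema, table):
    """Build the mysqldump argument list (the password is prompted for by -p)."""
    return [
        "mysqldump", "-h", host, "-P", str(port), "-u", user, "-p",
        "--single-transaction", "--skip-triggers", "--skip-column-statistics",
        schema, table,
    ]


def build_load_command(host, port, user, schema):
    """Build the mysql argument list that reads the dump file from stdin."""
    return ["mysql", "-h", host, "-P", str(port), "-u", user, "-p", schema]


dump_cmd = build_dump_command("127.0.0.1", 13306, "root", "sales", "customers")
load_cmd = build_load_command("127.0.0.1", 13306, "root", "customers_data")

# Print shell-ready commands; nothing is executed here.
print(shlex.join(dump_cmd) + " > customers.sql")
print(shlex.join(load_cmd) + " < customers.sql")
```

The lists can also be passed to subprocess.run with the dump file opened as stdout or stdin, which avoids shell quoting issues.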

Add/Delete/Change columns

To demonstrate the commands, let’s perform the following actions. The operations may not make much sense on their own, as the focus is on the commands used:

Delete the name column with DROP.
Add the name column back with ADD.
Change the data type (here, the length) of the job column with MODIFY.

ALTER TABLE `sales`.`customers`
DROP `name`,
ADD `name` VARCHAR(50) NOT NULL AFTER `id`,
MODIFY `job` VARCHAR(100) DEFAULT ''
;

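Note that we can use the AFTER column_name keyword to set the position of a column. If a column should become the first one, we need to use the FIRST keyword rather than AFTER column_name. Also note that dropping the name column removes the single-column index ix_name as well, and adding the column back does not recreate it.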

For example, let’s change the name column to be the first one:

ALTER TABLE `sales`.`customers`
MODIFY `name` VARCHAR(50) NOT NULL FIRST
;

Rename a column

We can just rename a column without changing the data type using RENAME COLUMN A TO B:

ALTER TABLE sales.customers 
RENAME COLUMN `job` TO `address`
;

Note that the COLUMN keyword can be omitted for the ADD, DROP, MODIFY, as well as the CHANGE command to be introduced, but not for the RENAME command here.

We can also use the CHANGE command to rename a column and also change the data type. Let’s change name to username and also change the length to 100:

ALTER TABLE sales.customers 
CHANGE `name` `username` VARCHAR(100) NOT NULL
;

Work with foreign keys

Let’s create two new tables to demonstrate the usage of foreign keys. A new products table will store the products info and an orders table the orders made by customers:

CREATE TABLE `sales`.`products` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL,
`price` DECIMAL(12,2),
PRIMARY KEY (`id`),
KEY `ix_name` (`name`),
KEY `ix_price` (`price`)
);

CREATE TABLE `sales`.`orders` (
`customer_id` SMALLINT NOT NULL,
`product_id` INT NOT NULL,
`quantity` SMALLINT NOT NULL,
PRIMARY KEY (`customer_id`, `product_id`),
KEY `ix_product_id` (`product_id`),
KEY `ix_quantity` (`quantity`),
CONSTRAINT `fk_orders_customer_id_customers` FOREIGN KEY (`customer_id`) REFERENCES `customers` (`id`) ON DELETE CASCADE,
CONSTRAINT `fk_orders_product_id_products` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`) ON DELETE CASCADE
);

Note that the id column has no prefix in the customers and products tables but has one in the orders table. The prefix is needed there because the table holds two IDs, one for the customer and another for the product.

If you check the data types of the customer_id and product_id columns, you will find they are the same as those of the id columns in the customers and products tables. This is required for adding foreign keys: the referencing column and the referenced column must have matching data types.

A composite primary key was created using customer_id and product_id. Note that we need to create a separate index for product_id, but not for customer_id: because customer_id is the first (leftmost) column of the composite primary key, lookups on it can already use that key.
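The rule of thumb behind this is the leftmost-prefix rule: a composite index can serve a lookup only when the lookup columns are the first columns of the index, in order. A small Python sketch of that check (is_leftmost_prefix is a hypothetical helper for illustration, not how MySQL implements it):

```python
def is_leftmost_prefix(columns, index_columns):
    """Return True if `columns` are the first columns of `index_columns`, in order."""
    columns = list(columns)
    index_columns = list(index_columns)
    return len(columns) > 0 and index_columns[:len(columns)] == columns


primary_key = ["customer_id", "product_id"]

print(is_leftmost_prefix(["customer_id"], primary_key))                # True
print(is_leftmost_prefix(["product_id"], primary_key))                 # False
print(is_leftmost_prefix(["customer_id", "product_id"], primary_key))  # True
```

This is why product_id gets its own ix_product_id index while customer_id does not.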

Also pay attention to the naming convention for the foreign key constraints, which follows the naming convention introduced in this link.

To check the indexes of a table, we can run one of the following two queries:

SHOW INDEX FROM sales.orders;

SELECT
s.TABLE_SCHEMA,
s.TABLE_NAME,
s.INDEX_NAME,
s.COLUMN_NAME,
s.SEQ_IN_INDEX 
FROM INFORMATION_SCHEMA.STATISTICS s
WHERE 1
AND s.TABLE_SCHEMA = 'sales'
AND s.TABLE_NAME = 'orders'
;

Note that the above queries do not return foreign keys. To check the foreign keys, we need to run the query below. Its result also includes the primary key columns, for which the REFERENCED_* columns are NULL:

SELECT 
TABLE_SCHEMA,
TABLE_NAME,
CONSTRAINT_NAME,
COLUMN_NAME,
REFERENCED_TABLE_SCHEMA,
REFERENCED_TABLE_NAME,
REFERENCED_COLUMN_NAME
FROM
INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE 1
AND TABLE_SCHEMA = 'sales'
AND TABLE_NAME = 'orders'
;

Note that the two special tables INFORMATION_SCHEMA.STATISTICS and INFORMATION_SCHEMA.KEY_COLUMN_USAGE are system tables and are normally referenced in uppercase.

Add and delete indexes and constraints

Let’s demonstrate how to add and delete indexes and constraints. We cannot modify an index or constraint because once the condition is changed for an index/constraint, it has to be regenerated.

Let’s first drop the primary key, indexes, and foreign keys for the orders table:

ALTER TABLE sales.orders 
DROP PRIMARY KEY,
DROP INDEX `ix_product_id`,
DROP INDEX `ix_quantity`,
DROP FOREIGN KEY `fk_orders_customer_id_customers`,
DROP FOREIGN KEY `fk_orders_product_id_products`
;

Note the way we specify how a foreign key should be dropped: DROP FOREIGN KEY …. The more general DROP CONSTRAINT … syntax is only available from MySQL 8.0.19 onwards.
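Each kind of object has its own DROP clause, which is easy to mix up. As a sketch, the mapping can be written down once in Python and used to generate the ALTER TABLE statement. DROP_SYNTAX and drop_statement are hypothetical names, and the helper only builds text for the three kinds used in this post.

```python
# Hypothetical mapping from object kind to the ALTER TABLE clause that drops it.
DROP_SYNTAX = {
    "primary": "DROP PRIMARY KEY",
    "index": "DROP INDEX `{name}`",
    "foreign": "DROP FOREIGN KEY `{name}`",
}


def drop_statement(table, items):
    """Build one ALTER TABLE statement from (kind, name) pairs.

    The name is ignored for the primary key, which has no user-defined name.
    """
    clauses = [DROP_SYNTAX[kind].format(name=name) for kind, name in items]
    return "ALTER TABLE {}\n{}\n;".format(table, ",\n".join(clauses))


print(drop_statement("sales.orders", [
    ("primary", None),
    ("index", "ix_product_id"),
    ("foreign", "fk_orders_customer_id_customers"),
]))
```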

And now let’s add them back:

ALTER TABLE sales.orders 
ADD PRIMARY KEY (`customer_id`, `product_id`),
ADD KEY `ix_product_id` (`product_id`),
ADD KEY `ix_quantity` (`quantity`),
ADD CONSTRAINT `fk_orders_customer_id_customers` FOREIGN KEY (`customer_id`) REFERENCES `customers` (`id`) ON DELETE CASCADE,
ADD CONSTRAINT `fk_orders_product_id_products` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`) ON DELETE CASCADE
;

The syntax is similar to that in the DDL query above.

Create or update a view

A MySQL view is a stored query that can be queried just like a table. A view normally contains some selected columns from one or more tables based on some filtering conditions. It can be used to quickly check some specific data from one or more tables without rewriting the JOIN and WHERE conditions every time.

Let’s create a view for the orders table so we can get the details of the customers and products for the orders directly:

CREATE OR REPLACE VIEW sales.orders_with_details AS
SELECT
o.customer_id,
c.username,
c.address,
o.product_id,
p.name,
p.price,
o.quantity,
p.price * o.quantity AS total
FROM sales.orders o 
JOIN sales.customers c 
ON c.id = o.customer_id 
JOIN sales.products p 
ON p.id = o.product_id 
;

SELECT * FROM sales.orders_with_details;

The name of a view should be indicative of its usage. In this case, orders_with_details is better than orders_view as the former is more indicative of what’s contained in the view.

Standards for writing SQL queries

We should write SQL queries in an easy-to-read way. There is no strict standard for how to write SQL queries. However, following the rules below will make your queries much easier to read and maintain:

Put all SQL keywords in uppercase.
Put all database, table, and column names and aliases in lowercase.
Provide standard acronyms as aliases for your tables. For example, products => p, product_attributes => pa, etc. Don’t use arbitrary table aliases as they will make the queries much more difficult to read.
Start SELECT, FROM, JOIN, WHERE, GROUP BY, ORDER BY, etc. clauses on a new line.
Start each ON and AND condition on a new line.
The same formatting standard applies to nested queries.

You can format your SQL queries automatically in DBeaver or VS Code (with some SQL extensions). However, the formatting is not perfect and we normally need to make some manual adjustments based on the rules above.
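Two of the rules above (uppercase keywords and acronym aliases) are simple enough to sketch in Python. The helpers below are hypothetical and deliberately naive: the keyword list is incomplete, and the regular expression only protects single-quoted strings and backtick-quoted identifiers, so treat it as an illustration rather than a real formatter.

```python
import re

# A small, deliberately incomplete keyword list for illustration.
KEYWORDS = ["select", "from", "join", "on", "where", "and", "or",
            "group", "order", "by", "as"]

# Group 1 matches quoted strings and backtick identifiers, which are left alone.
# Group 2 matches whole-word keywords, which are uppercased.
_PATTERN = re.compile(
    r"('[^']*'|`[^`]*`)|\b(" + "|".join(KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def uppercase_keywords(query):
    """Uppercase SQL keywords outside of quotes and backticks."""
    def fix(match):
        if match.group(1) is not None:
            return match.group(1)
        return match.group(2).upper()
    return _PATTERN.sub(fix, query)


def table_alias(table_name):
    """Build an acronym alias from a snake_case table name."""
    return "".join(word[0] for word in table_name.split("_") if word)


print(uppercase_keywords("select p.name from sales.products p where p.price > 10"))
print(table_alias("products"), table_alias("product_attributes"))
```

The first call prints the query with SELECT, FROM, and WHERE in uppercase, and the second prints the aliases p and pa.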

In this post, we have introduced some common and handy commands for managing table schemas in MySQL. We have covered how databases, tables, columns, indexes, and views should be created and updated, with a focus on the conventions and best practices for each operation. It can work as starting guidance for new data engineers.