What is data integrity and normalization?

July 11, 2024

Data Integrity is used to restrict the invalid data into the database at the column level. Data Base Administrators(DBA) will create a table by making use of Data Integrity. In SQL data integrity is applicable for Data definition language(DDL) create command and it can be achieved in two ways-

There are five types of constraints we need to consider to achieve data integrity

a) Not Null– Not Null is a constraint at a column level that cannot be felt blank. It is a mandatory field to update. In the Employee table below, you can see the Ename column needs to be filled it cannot be null.

Output screen:

b)Data Types– Data types are a constraint at a column level that needs to be unique. It will not accept duplicate values. In the Employee table below, you can see the column name needs to be unique.

Output screen:

c) Primary key– The primary key is a constraint at a column level that needs to be unique and not null. The primary key will always be one. The primary key will always be present on the master table. There are two types of primary keys-

1)Candidate Key- Candidate Key are those columns eligible to become the primary key.

2)Alternate Key- Alternate Key are those columns not selected as primary key.

Primary key= Candidate Key-Alternate Key.

d) Foreign key– The foreign key is a constraint at a column level that will be used to create the relationship between the two tables. The foreign key can be duplicate or null. The foreign key is also called a Referential Integrity Constraint.

The columns between the two tables should be the same.
The primary key will always be present on the master table.
The foreign key will always be present on the child table.
In the below two tables, the Employee Table and the Department Table.
The primary key is a Department Table as the dept_no column has no duplicate values and the Foreign key is the Employee Table as the dept_no column has duplicate values. The child table can be many however master table will always be one.

e) Check Key– The check key is a constraint at a column level that will check any kind of condition written by the developer and the database administrator.

Datatypes in SQL will decide at the column level what kind of data column will have it can be name, address, number, store alphabets, True or false. In SQL every data have a datatype. There are different types of datatypes

a) boolean– It will give a condition either True or False.

b)Varchar- It stores alphanumeric and alpha values. For example, ename in the employee table will have varchar.

c) Numbers– it stores a range of values in different data types

byte
short
int
long

Formula to calculate the range of values is –2^(n-1) to (2^(n-1)-1)

byte- (-128 to 128)
short-(-32768 to 32767)
int- (-2,147,483,648 to 2,147,483,647)
long- (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)

d) decimals– There are two types of datatypes

float- The float data type stores positive and negative numbers with a decimal point. For example,35.3, -2.34.
double- The double data type is capable of storing exactly the double size of data, compared to the float data type For example, 4.5672, 2.45348.

e) char– The char data type is used to store single alphabet characters. For example, ‘A’, ‘4’

f) date– the data datatype stores in an instant in time expressed as a calendar date and time of the day. For example, Amazon knows at what time and day the customer ordered the product, and Netflix knows when and when the customer took the subscription to Netflix.

Normalization is the database design technique that organizes the table to reduce the data’s redundancy and dependency. In normalization, we have 3 normalization forms and certain rules.

Each record should be unique as rows are unique.
Each cell should consist of a single value. There should not be two values in a single cell as rows need to be unique.

Example-In the below table, the value of the cell in column number 1 and row number 2 has two different values associated with it, and it is against normalization.

The table must be in the 1st Normalization Form.
Each table should have one primary key(not null and unique) as one column should be unique and null.

Example-In the below table, the value of the cell in column number 2 and column number 3 has the same column name associated with it, and it is against normalization.

The table must be in the 2nd Normalization Form.
The table should not have a transitive functional dependency, meaning it should not have multiple things dependent on each other in a particular table. There must be different tables to store data accordingly to create a pattern.

Example-In the below table, every column is dependent on one column.

In the below table, after normalization, we have two tables with their details to create the pattern.

Data integrity is important as it restricts invalid data in the database. Constraints and datatypes can achieve it. Data integrity is an important concept to follow normalization, a database design technique that organizes the table to reduce the data’s redundancy and dependency. Normalization helps to reduce dependency on a particular column while creating tables we should always follow normalization rules. Remember to practice, stay updated with the latest trends in SQL COURSE, and maintain a positive attitude throughout your interview process.Normalization helps to organize the data in the database. It is a multi-step process that sets the data into table form and removes the duplicate data from the relational tables to create the pattern.

Also read:

AGGREGATE FUNCTION IN SQL

Different Joins In SQL

Structured Query Language(SQL)Interview Questions & Answer

The different operators in SQL