SQL (Structured Query Language) is a standardized programming language specifically designed for managing and manipulating relational databases. Here are the key points about SQL:
Key Features and Functions:
Data Definition Language (DDL): SQL includes commands for defining and modifying database schema, including tables, indexes, and constraints.
Data Manipulation Language (DML):SQL provides commands for querying and modifying data stored in the database.
Data Control Language (DCL):These commands deal with permissions and access control to the database.
Transaction Control Language (TCL): These commands manage transactions in the database, ensuring data integrity.
A Database Management System (DBMS) is software that provides a systematic way to create, retrieve, update, and manage data in databases. It serves as an interface between end users and databases, ensuring that data is consistently organized and easily accessible.
Key Functions of a DBMS:
Data Definition:Allows users to define the structure of the data, including tables, fields, and data types.
Data Update:Enables inserting, updating, and deleting data in the database
Data Retrieval:Provides mechanisms to query and retrieve specific data using SQL or other query languages.
Data Administration:Manages database access, user permissions, security, and backup/recovery operations.
Transaction Management:Ensures data integrity and consistency through transaction management, including support for ACID (Atomicity, Consistency, Isolation, Durability) properties.
Concurrency Control:Manages simultaneous access to data by multiple users, ensuring data consistency and isolation of transactions.
Types of DBMS:
Relational DBMS (RDBMS):
Data is organized into tables (relations) with rows and columns.
Uses SQL for data manipulation.
Examples: MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server.
NoSQL DBMS:
Designed for unstructured or semi-structured data.
Includes various types such as document-based, key-value, column-family, and graph databases.
Examples: MongoDB (document-based), Redis (key-value), Cassandra (column-family), Neo4j (graph).
In-Memory DBMS:
Stores data primarily in memory for faster access.
Examples: Redis, SAP HANA.
Hierarchical and Network DBMS:
Early forms of DBMS.
Hierarchical DBMS: Organizes data in a tree-like structure (e.g., IBM’s Information Management System (IMS)).
Network DBMS: Uses a graph structure to represent data relationships (e.g., Integrated Data Store (IDS)).
A Relational Database Management System (RDBMS) is a type of database management software that stores data in a structured format, using tables composed of rows and columns. Each table, also known as a relation, contains a set of data entries that are organized into predefined categories. These tables can be linked to each other using relationships, which are defined by primary and foreign keys.
The RDBMS is based on the relational model, which was proposed by E.F. Codd in 1970. This model organizes data into one or more tables where data types may be related to each other. The relational approach allows for the easy querying and manipulation of data using a standard language, typically SQL (Structured Query Language).
Relational tables, a fundamental component of a Relational Database Management System (RDBMS), have several key properties that define their structure and behavior. These properties ensure data is stored in a structured, consistent, and easily accessible manner.
Properties
Rows (Tuples):
Each table consists of rows, with each row representing a single record.
Each row contains data pertaining to a single entity or item.
Columns (Attributes):
Columns represent the attributes of the data stored in the table.
Each column has a specific data type and defines what kind of data can be stored in that field (e.g., integer, varchar, date).
Unique Column Names:
Each column in a table has a unique name to identify it.
Column names help distinguish between different attributes of the data.
Atomic Values:
Each cell in a table contains exactly one value.
This property ensures that data is atomic and indivisible, facilitating easier data manipulation and retrieval.
Unique Rows:
Each row in a table is unique, meaning no two rows are identical in terms of all column values.
Tables often have a primary key to ensure row uniqueness.
Primary Key:
A primary key is a unique identifier for each row in a table.
It ensures that each row can be uniquely identified and accessed.
Foreign Keys:
A foreign key is a column (or a set of columns) that establishes a link between data in two tables.
It references the primary key of another table, creating a relationship between the tables.
Data Integrity:
Constraints can be applied to tables to maintain data integrity.
Common constraints include NOT NULL, UNIQUE, CHECK, and FOREIGN KEY.
Logical Ordering:
The order of rows and columns is logical, not physical.
Data retrieval is based on content, not the order in which data is stored.
Schema-Based:
The structure of a table (schema) is defined explicitly.
This includes the table's columns, data types, and constraints.
Normalization is a process in database design that organizes data to reduce redundancy and improve data integrity. It involves dividing a database into two or more tables and defining relationships between the tables. The main goal of normalization is to eliminate redundant data (for example, storing the same data in more than one table) and ensure data dependencies make sense (only storing related data in a table).
Key Concepts in Normalization:
Elimination of Redundant Data:
Redundancy refers to the unnecessary repetition of data. By organizing data into separate tables, normalization ensures that each piece of data is stored only once, which reduces the amount of space the database uses and improves efficiency.
Data Integrity:
By structuring data into logical groups, normalization helps maintain data integrity. This ensures that changes to data are consistently reflected throughout the database without introducing anomalies.
Denormalization is the process of intentionally introducing redundancy into a database by merging tables or introducing additional fields to optimize read performance. While normalization aims to reduce redundancy and improve data integrity, denormalization balances those benefits against the need for efficient query performance, especially in read-heavy systems.
Key Concepts of Denormalization:
Performance Optimization:
Denormalization can reduce the number of joins required in queries, which can significantly speed up read operations.
This is particularly useful in read-heavy environments where complex queries can become a bottleneck.
Normalization in database design involves organizing tables to minimize redundancy and dependency by dividing a database into two or more tables and defining relationships between the tables. The different normalization forms, also known as normal forms, provide a step-by-step approach to achieve this. Here are the key normalization forms:
Normalization Forms
1NF : Eliminate repeating groups, ensure atomicity.
2NF : Eliminate partial dependencies.
3NF : Eliminate transitive dependencies.
BCNF : Ensure every determinant is a candidate key.
4NF : Eliminate multi-valued dependencies.
5NF : Eliminate join dependencies.
6NF : Handle temporal data with no non-trivial join dependencies.
Primary Key
A primary key is a fundamental component of a relational database table that uniquely identifies each record within the table. It is a column, or a set of columns, where each value must be unique and non-null. The primary key ensures that no two rows can have the same primary key value, thus maintaining the uniqueness and integrity of each record.
Characteristics of Primary Keys:
Uniqueness:Each value in the primary key column(s) must be unique across the table. This ensures that each record can be distinctly identified.
Non-nullability: The primary key cannot contain NULL values, as it must uniquely identify each record.
Immutability:
Ideally, the value of a primary key should not change over time, ensuring consistency and stability in referencing records.
Foreign Key
A foreign key is a column or a set of columns in one table that creates a link between data in two tables. The foreign key in the child table points to the primary key in the parent table, establishing a relationship between the tables. This key is crucial for maintaining referential integrity within the database.
Characteristics of Foreign Keys:
Referential Integrity: The foreign key ensures that the value in the foreign key column matches a value in the primary key column of the parent table. This maintains consistency and integrity across tables.
Relationships: Foreign keys establish and enforce relationships between tables (one-to-one, one-to-many, or many-to-many).
Cascading Actions:Foreign keys can enforce cascading actions, such as CASCADE DELETE or CASCADE UPDATE, which automatically update or delete related records to maintain consistency.
In the context of databases, particularly within relational database management systems (RDBMS) like SQL Server, the term "identity" refers to a column property that automatically generates unique values for new records. This feature is commonly used to create surrogate keys, which are unique identifiers for each row in a table.
Characteristics of Identity Columns:
Automatic Value Generation:
When a new row is inserted into the table, the identity column automatically generates a new value without requiring manual input.
This value is typically sequential, starting from a specified seed value and incrementing by a specified increment value.
Uniqueness:Identity columns ensure that each value is unique within the table, making them suitable for use as primary keys.
Immutability:
Once assigned, the value of an identity column cannot be changed. This immutability helps maintain data integrity.
Collation refers to a set of rules that determine how data is sorted and compared in a database. These rules define how string values are sorted alphabetically and how text is compared for equality. Collation encompasses aspects such as character set, case sensitivity, accent sensitivity, and language-specific rules.
Key Aspects of Collation:
Character Set:Defines the alphabet and symbols available for sorting and comparison.
Case Sensitivity:Determines if uppercase and lowercase characters are treated as distinct (CS) or equivalent (CI).
Accent Sensitivity:Controls whether characters with accents are considered distinct (AS) or equivalent (AI).
Language Rules:Specifies language-specific sorting and comparison rules, addressing differences in alphabetical order and text handling.
Joins in SQL are used to combine rows from two or more tables based on related columns between them. Different types of joins specify how the tables should be combined and what rows should be included in the result set. Here are the most common types of joins:
Types of Join
Inner JoinAn inner join returns only the rows that have matching values in both tables.
Left (Outer) JoinA left join returns all rows from the left table and the matched rows from the right table. If there is no match, the result is NULL on the side of the right table.
Right (Outer) Join A right join returns all rows from the right table and the matched rows from the left table. If there is no match, the result is NULL on the side of the left table.
Full (Outer) JoinA full join returns all rows when there is a match in either the left or right table. Rows that do not have a match in either table will have NULL values.
Cross JoinA cross join returns the Cartesian product of the two tables, combining each row from the first table with every row from the second table.
Self JoinA self join is a regular join, but the table is joined with itself.
A trigger in a database is a special kind of stored procedure that automatically executes (or "fires") in response to certain events on a particular table or view. Triggers are used to enforce business rules, automate system tasks, and maintain data integrity by reacting to data modifications such as inserts, updates, and deletes.
Key Features of Triggers:
Automatic Execution:
Triggers are invoked automatically when the specified event occurs, without explicit calls by users or applications.
Event-Driven:
Triggers respond to data modification events like INSERT, UPDATE, and DELETE.
Association with Tables/Views:
Triggers are tied to specific tables or views in the database.
A nested trigger in SQL refers to a scenario where a trigger initiates another trigger. This occurs when the first trigger performs an operation (such as an INSERT, UPDATE, or DELETE) that causes another trigger to fire. Nesting can happen to multiple levels, depending on the database system's settings and limitations.
Characteristics of Nested Triggers:
Chain Reaction:A trigger on one table performs an operation that causes another trigger on a different table (or the same table) to execute. This can lead to a chain reaction of triggers firing.
Recursive Nature: In some systems, triggers can recursively call themselves, either directly or indirectly, leading to complex nested operations.
Controlled by Database Settings:The depth of trigger nesting and whether recursive triggers are allowed is often controlled by database system settings.
A cursor in SQL is a database object used to retrieve, manipulate, and traverse through the result set of a query one row at a time. Cursors provide a way to process individual rows returned by a database system, allowing for operations that require row-by-row processing rather than set-based processing.
Characteristics of Cursors
Row-by-Row Processing:Unlike standard SQL operations that work on entire sets of rows at once, cursors allow for handling data one row at a time.
Control over Query Results:Cursors provide more control over query results, making it possible to navigate through the data, fetch specific rows, and perform operations such as updates or deletions on individual rows.
Complex Operations:Useful for complex row-by-row operations, such as computations that depend on previous rows or iterative processes that are not easily expressed in standard SQL.
A view in SQL is a virtual table that provides a way to look at data in one or more tables. It is a stored query that users can query just like an actual table. Views can simplify complex queries, enhance security by restricting access to specific data, and present data in a specific format without modifying the underlying tables.
Key Characteristics of Views:
Virtual Table:A view does not store data itself but rather retrieves data from one or more tables upon each access.
Simplification:Views can simplify complex queries by encapsulating them in a single, reusable object. Users can select from the view as if it were a table, without needing to understand the complexity of the underlying query.
Security:Views can restrict access to specific columns or rows of data, providing a layer of security. By granting access to views instead of tables, database administrators can control which data users can see and interact with
Data Abstraction:Views provide a way to present data in a particular format or structure, abstracting away the complexities of the underlying schema. This can help with data consistency and reduce the impact of schema changes on applications.
An index in SQL is a database object that improves the speed of data retrieval operations on a table at the cost of additional storage and slower write operations. Indexes are created on one or more columns of a table to provide a faster access path to the rows.
Key Characteristics of Indexes
Speed Up Queries:Indexes significantly reduce the time required to retrieve data by providing a quick lookup mechanism.
Storage Overhead:Indexes consume additional disk space as they store copies of the indexed columns along with pointers to the actual rows in the table.
Impact on Write Operations:While indexes improve read performance, they can slow down write operations (INSERT, UPDATE, DELETE) because the index needs to be updated whenever the data changes.
A linked server in SQL Server is a feature that allows SQL Server to execute commands against OLE DB data sources on different servers. This enables a SQL Server instance to connect to and query external data sources as if they were part of the local database, facilitating distributed queries, data management, and integration across heterogeneous data sources.
Key Features of Linked Servers:
Heterogeneous Data Access:Linked servers can connect to a variety of data sources, including other SQL Server instances, Oracle, MySQL, Access databases, Excel files, and more.
Distributed Queries:Linked servers enable the execution of distributed queries that can retrieve and manipulate data across multiple servers in a single query.
Data Integration:Facilitate data integration and reporting by allowing data to be pulled from different sources into a unified result set.
Security:Security context can be mapped between the local server and the linked server, ensuring secure access to remote data.
A sub-query, also known as a nested query or inner query, is a query embedded within another SQL query. The outer query, referred to as the main query, uses the result of the sub-query to further refine its results. Sub-queries allow for more dynamic and flexible querying by leveraging results from other queries as part of their execution logic.
Key Characteristics of Sub-queries:
Placement:Sub-queries can be used in various parts of an SQL statement, including the SELECT, FROM, WHERE, HAVING, and JOIN clauses.
Types
Single-row sub-queries:Return a single row with one or more columns. Typically used with comparison operators (=, >, <, etc.).
Multiple-row sub-queries:Return multiple rows. Used with operators like IN, ANY, ALL.
Correlated sub-queries:Refer to columns in the outer query and are executed once for each row processed by the outer query.
Non-correlated sub-queries:Independent of the outer query and are executed once, providing results used by the outer query.
Execution Order:The sub-query is executed first, and its result is used by the main query.
Return Types:Sub-queries can return a single scalar value, a list of values, or a complete result set.