Introduction¶
约 1470 个字 1 行代码 4 张图片 预计阅读时间 5 分钟
What is a Database system
- Database : A very large, integrated collection of data
- Models a real-world enterprise
- Entities (e.g., teams, companies)
- Relationships (e.g., The Patriots is playing in The Superbowl)
- More recently, also includes active components (e.g., business logic)
- A Database Management System (DBMS) is a software system designed to store, manage, and facilitate access to databases
- Provides an interface for users to interact with the database
- Hides the implementation details of how the data is stored and maintained
- Ensures that the data is consistent and secure
- Provides tools for querying and updating the data
- Provides tools for managing the database itself (e.g., backup and recovery)
Purpose of Database Systems¶
Applications¶
Data processing and management are the most important fields of computer applications.
So the knowledge of database systems is essential for computer scientists.
Database Applications
- Banking: all transactions
- Airlines: reservations, schedules
- Universities: registration, grades
- Sales: customers, products, purchases
- Manufacturing: production, inventory, orders, supply chain
- Human resources: employee records, salaries, tax deductions
Databases touch all aspects of our lives although you don’t see them.
Characteristics of DBMS¶
- Efficiency and scalability in data access.
- Reduced application development time.
- Data independence (including physical data independence and logical data independence).
- Data integrity and security.
- Concurrent access and robustness (i.e., recovery).
File-Processing System¶
File-processing system is supported by a conventional Operating System (OS).
New application programs must be written if necessary, and new data files are created as required. But over a long period of time, data files may be in different formats.
Data files are independent each other.
Drawbacks of File-Processing System¶
- Data redundancy and inconsistency: Multiple file formats, duplication of information in different files
- Difficulty in accessing data:need to write a new program to carry out each task
- Data isolation——multiple files and multiple formats cause it difficult to retrieve and share
- Integrity problems Integrity constraints (e.g. account balance > 0) become part of program code
- Hard to add new constraints or change existing ones
- No atomicity of update :Failures may leave database in an inconsistent state with partial updates carried out( atomicity of update refers to the task should either complete or not happen at all)
- Difficult to concurrent access by multiple users
- Concurrent access needed for performance
- Uncontrolled concurrent accesses can lead to inconsistencies
- Example: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at the same time
- Security problems
** But Database systems offer solutions to all the above problems! **
Open source databases
-
MySQL: is the most popular open source database for small system on web sites, is a key part of LAMP (Linux, Apache, MySQL, PHP/Perl/Python), and is a fast growing open source enterprise software stack.
-
PostgreSQL: is a highly scalable, open source object-relational database management system, and is originally developed by the Department of Computer Science, UC Berkeley (called Postgres)
View of Data¶
LEvels of Data Abstraction¶
Different usage needs different level of abstraction.
- Physical level:describes how a record is stored
- Logical level: describes data stored in database,and the relationships among the data on upper level
- view level:View level: application programs hide details of data types. Note that views can also hide information (e.g., employee’s salary) for security purposes.

Schemas and instances¶
Schema--the structure of the database on different level
Analogous to type information of a variable in a program
- Physical schema: database structure design at the physical level
- Logical schema: database structure design at the logical level
- Subschema: schema at view level
instance--the actual content of the database at a particular point in time
Analogous to the value of a variable
Note
The relationship between schema and instance is similar to types and variables in programming languages (type-schema, variable-instance) 即schema是数据库中的蓝图或者目录,用于描述其结构和内容,定义数据应该如何存储和组织,instance是其具体的数据
Physical & Logical Independence¶
Physical data independence – the ability to modify the physical schema without changing the logical schema.
- Applications depend on the logical schema.
- Applications are insulated from how data is structured and stored.
- One of the most important benefits of using a DBMS! 对于物理结构的更改不应该影响到更高级的层面
Logical data independence – protect application programs from changes in logical structure of data.
- Logical data independence is hard to achieve as the application programs are heavily dependent on the logical structure of data.
Data Models¶
Data model is a collection of conceptual tools for describing
- data structure
- data relationships
- data semantics
- data constraints
Different level of data abstraction needs different data model to describe
Database language¶
- Data Definition Language (DDL): Specification notation for defining the database schema.
- Data Manipulation Language (DML): Language for accessing and manipulating the data organized by appropriate data model.
- Data Control Language (DCL)
DDL¶
Specifies a database scheme as a set of definitions of relational schema.
Also specifies storage structure, access methods, and consistency constraints.
DDL statements are compiled, resulting in a set of tables stored in a special file called data dictionary.
for example
Data dictionary contains metadata (i.e., the data about data) about
- Database schema
- Integrity constraints
- Primary Key
- Referential integrity
- Authorization
DML¶
Data Manipulation Language (DML) - Retrieve data from the database - Insert / delete / update data in the database - DML also known as query language
Two classes of DMLs - Procedural DML – user specifies what data is required and how to get those data (e.g., C, Pascal, Java, etc.). - Nonprocedural DML – user specifies what data is required without specifying how to get those data (e.g., SQL, Prolog, etc.).
SQL is the most widely used query language
Database Design¶
Steps of Database Design¶
-
Requirement analysis :What data, applications, and operations needed.
-
Conceptual database design :A high-level description of data, constraints using Entity-Relationship (E-R) model or a similar high level data model
- Logical database design Convert the conceptual design into a DB schema.
- Schema refinement-Normalization of relations: Check relational schema for redundancies and related anomalies.
- Physical database design:Indexing, query, clustering, and database tuning.
- Create and initialize the database & Security design Load initial data, testing. Identify different user groups and their roles.
Entity-Relationship(E-R Model)¶
E-R model of real world
-
Entities (objects) E.g., customers, accounts, bank branch. Entities are described by a set of attributes.
-
Relationships between entities E.g., Account A-101 is held by customer Johnson. Relationship set depositor associates customers with accounts
E-R Model is widely used for database design Database design in E-R model is usually converted to design in the relational model . E-R model was first proposed by Peter Chen.

Users and Administrators¶
Database Users¶
Users are differentiated by the way they expect to interact with the system
Naive users – invoke one of the permanent application programs that have been written previously by a high level language. E.g., people accessing database over the web, bank tellers, clerical staff.
Application programmers – interact with system via SQL calls.
Sophisticated users – form requests in a database query language. E.g., Online Analytical Processing (OLAP), Data mining.
Specialized users – write specialized database applications that do not fit into the traditional data processing framework. E.g., CAD, Expert System (ES), KDB.

Database Administrators¶
Database administrator (DBA): A special user having central control over database and programs accessing those data.
- DBA has the highest privilege for the database.
- DBA coordinates all the activities of the database system.
- DBA controls all users authority to the database.
- DBA has a good understanding of the enterprise’s information resources and requirements
Database administrator's duties/functions include:
- Schema definition
- Storage structure and access method definition
- Schema and physical organization modification
- Granting of authorization for data access Routing maintenance
- Monitoring performance and responding to changes in requirements Security for the database (e.g. periodically backup database, recovery when failure)
Transaction Management¶
Concurrent use/access is important, but causes problems/conflict.
A transaction is a collection of operations that performs a single logical function in a database application.
Transaction requirements include atomicity, consistence, isolation, durability.
Transaction-management component ensures that the database remains in a consistent (or correct) state, although system failures (e.g., power failures and operating system crashes) and transaction failures.
Concurrency-control manager controls the interaction among the concurrent transactions.
Database Architecture¶
storage manager¶
Storage Manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
Storage Manager is responsible for the following tasks:
- Interaction with the file manager
- Efficient storing, retrieving and updating of data
Storage Manager includes
- Transaction manager
- Authorization and integrity manger
- File manager (interaction with the file system to process data files, data dictionary, and index files)
- Buffer manager
Query Processor¶
Query Processor includes DDL interpreter, DML compiler, and query processing.
- Parsing and translation
- Optimization
- Evaluation
