File System and Data Base
Lectures prepared by:
Saurabh Mittal
JK Padampat Singhania Instt. of Management & Tech.
Various Types of Files
Depending upon the type of content, a file can be categorized as:
1. Data file
2. Program file
3. Object code file
4. Executable file
5. Text file (unformatted file)
6. Formatted file
Data Files
A data file is used to store data records. Data files are well-defined data structures that contain related data organized in convenient groupings (records) of data items.
Each data file has two additional types of records:
Header record and trailer record.
Header records contain file identification information and keep apart different groups of records in a file.
Trailer records contain codes to mark the end of a set of data records. These also record file usage information.
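The header and trailer records described above can be sketched in Python. The record layout (a leading `H`, `D`, or `T` tag followed by comma-separated fields) and the function names are assumptions made for illustration; the lecture does not prescribe a format.

```python
def build_data_file(file_id, records):
    """Return the lines of a data file: header, data records, trailer."""
    lines = [f"H,{file_id}"]                          # header: file identification
    lines += [f"D,{','.join(r)}" for r in records]    # data records
    lines.append(f"T,{len(records)}")                 # trailer: end marker plus record count
    return lines


def read_data_file(lines):
    """Return the data records, checking the trailer's record count."""
    if not lines or not lines[0].startswith("H,"):
        raise ValueError("missing header record")
    if not lines[-1].startswith("T,"):
        raise ValueError("missing trailer record")
    records = [line.split(",")[1:] for line in lines[1:-1]]
    expected = int(lines[-1].split(",")[1])
    if expected != len(records):
        raise ValueError("trailer count does not match records read")
    return records
```

Storing the record count in the trailer is one simple form of the "file usage information" mentioned above; it lets a reader detect a truncated file.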
Categories of Data Files
Depending upon the nature of data, data files can be categorized as:
1. Master file
2. Transaction file
3. Work file
4. Audit file
5. Backup file
Program Files
Program files are used to store programs in different languages provided by different software vendors. These files have different extensions depending on the language used to write a program.
e.g.: 1. A program file in the C language has the extension .C.
2. A program file in C++ has the extension .CPP.
Object Code Files
These files store compiled programs written in a language. These files contain the machine code.
e.g.: After compilation, C compiler creates a file having extension .OBJ
Executable Files
These files store ready to execute programs. These files may have extension .EXE, .COM or .BAT. These programs can be directly executed from the command prompt.
Unformatted Text Files
These files are simple files containing simple text.
Text files can be created using any text editor or line editor.
e.g.: Text files can be created using MS-DOS Edit editor or Notepad editor provided by MS-Windows.
Formatted Text Files
These files contain formatted text. These also contain some commands and symbols to format the text. These files can be created using any word processor.
e.g.: MS-WORD creates a formatted text file having extension .DOC.
FILE ORGANIZATION
File organization refers to the relationship of the key of the record to the physical location of that record in the computer file. The two main objectives of computer-based file organization are:
Ease of file creation and maintenance, and
Providing an efficient means of storing and retrieving information.
The selection of a particular file organization depends upon factors such as the type of application, the method of processing for updating files, the size of the file, etc.
The four file organization methods that are commonly used in business data processing applications are:
1. Serial
2. Sequential
3. Direct or Random
4. Indexed Sequential
Serial File Organization
In serial file organization, records are stored without any consideration of their order or sequence. Records can be accessed only serially, one after another.
Examples : memory dumps, archive files, records of events, transaction files.
Each record is written after the last record in the current file. The order of records in the serial file is according to the time when the data was generated.
Sequential File Organization
In a sequential file, records are arranged in ascending, descending, or chronological order of a key field. To access these records, the computer must read the file in sequence from the beginning. The retrieval search ends only when the desired key matches the key field of the currently read record.
On average, about half the file has to be searched to retrieve the desired record from a sequential file.
Sequentially organised files are normally created and maintained on storage media such as magnetic tape, cartridge tape, magnetic disks, etc.
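The "about half the file" figure can be checked with a small Python sketch. The record layout (a dictionary with a `key` field) and the 100-record file are assumptions for illustration only.

```python
def sequential_search(records, key):
    """Scan records in stored order; return (record, records_read)."""
    for count, record in enumerate(records, start=1):
        if record["key"] == key:
            return record, count
    return None, len(records)


# Hypothetical file of 100 records with keys 1..100 in ascending order.
records = [{"key": k} for k in range(1, 101)]
reads = [sequential_search(records, k)[1] for k in range(1, 101)]
average_reads = sum(reads) / len(reads)   # (n + 1) / 2 for n records
```

If every key is equally likely to be requested, the average number of records read is (n + 1) / 2, which is where the rule of thumb comes from.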
Applications:
Payroll System
Billing and customer statement preparation
Bank cheque processing
Financial accounting
Advantages of Sequential File Organization
Easy to organise, maintain, and understand.
There is no address-generation overhead; locating a particular record requires only the specification of the key field.
Relatively inexpensive I/O media and devices can be used.
It is the most economical and efficient file organization where the activity ratio (the ratio of the total number of records in the transaction file to the total number of records in the master file) is very high. That is why this file organization is most suitable for transaction files.
Disadvantages of Sequential File Organization
It proves to be very inefficient and uneconomical for applications in which the activity ratio is very low.
Since an entire sequential file may need to be read to retrieve and update a few records, accumulation of transactions into batches is required.
Transactions must be sorted and placed in sequence prior to processing. Random enquiries are virtually impossible to handle.
Timeliness of data in the file deteriorates while batches are being accumulated.
Data redundancy is typically high since the same data may be stored in several files, sequenced on different keys.
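A minimal Python sketch of batch updating a sequential master file follows. It assumes records are `(key, balance)` pairs and transactions are `(key, amount)` pairs; these layouts and the function names are illustrative, not taken from the lecture. The activity ratio follows the definition given earlier (transaction records divided by master records).

```python
def batch_update(master, transactions):
    """Merge a batch of transactions into a key-sorted master file in one pass."""
    pending = sorted(transactions)            # the batch must be sorted on the key first
    new_master, i = [], 0
    for key, balance in master:               # the whole master file is read in sequence
        while i < len(pending) and pending[i][0] < key:
            i += 1                            # skip transactions with no master record
        while i < len(pending) and pending[i][0] == key:
            balance += pending[i][1]          # apply every transaction for this key
            i += 1
        new_master.append((key, balance))
    return new_master


def activity_ratio(master, transactions):
    """Transaction records divided by master records, as defined above."""
    return len(transactions) / len(master)
```

Every master record is read and rewritten even when only a few are changed, which is why the approach pays off only when the activity ratio is high.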
Direct or Random File Organization
A direct file consists of records organised in such a way that it is possible for the computer to directly locate the desired record without having to search through any other records first.
A record is stored by its key field.
For mapping key fields to locations, an arithmetic procedure called a hashing algorithm is frequently used. This address-generating function is selected so that the generated addresses are distributed uniformly over the entire range of the file area and, ideally, a unique address is generated for each record key.
A direct access storage device (DASD) such as drum, disk, etc., is essential for storing a direct file.
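The idea of mapping a key to a storage address can be sketched in Python. The division-remainder hash and the linear probing used when two keys map to the same address are assumptions chosen for illustration; the lecture names no particular hashing algorithm, and the class name `DirectFile` is hypothetical.

```python
class DirectFile:
    """Toy direct file: a fixed number of slots addressed by hashing the key."""

    def __init__(self, slots):
        self.slots = [None] * slots

    def _address(self, key):
        return key % len(self.slots)              # division-remainder hashing

    def store(self, key, data):
        addr = self._address(key)
        for step in range(len(self.slots)):       # linear probing on collision
            probe = (addr + step) % len(self.slots)
            if self.slots[probe] is None or self.slots[probe][0] == key:
                self.slots[probe] = (key, data)
                return probe
        raise RuntimeError("file area is full")

    def fetch(self, key):
        addr = self._address(key)
        for step in range(len(self.slots)):
            probe = (addr + step) % len(self.slots)
            if self.slots[probe] is None:
                return None                       # empty slot: key was never stored
            if self.slots[probe][0] == key:
                return self.slots[probe][1]
        return None
```

With 7 slots, keys 10 and 17 both hash to address 3; the second record is placed in the next free slot, which shows why a perfectly unique address per key is a goal rather than a guarantee.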
Applications: most suitable for interactive online applications, such as:
1. Airline/Railway reservation systems
2. Teller facility in banking applications
Advantages of Direct Access Organization
The access to and retrieval of a record is quick and direct
Transactions need not be sorted and placed in sequence prior to processing.
Accumulation of transactions into batches is not required before processing them. They may be processed as and when generated.
It can also provide up-to-the-minute information in response to enquiries from online stations that are in use simultaneously.
If required, it is also possible to process direct file records sequentially in a record key sequence.
A direct file organization is most suitable for interactive online applications.
Disadvantages of Direct Access Organization
1. These files must be stored on a direct access storage device. Hence, relatively expensive hardware and software resources are required.
2. File updating (addition and deletion of records) is more difficult as compared to sequential files.
3. Address generation overhead is involved for accessing each record due to hashing function.
4. May be less efficient in the use of storage space than a sequentially organised file.
5. Special security measures are necessary for online direct files that are accessible from several stations.
6. System design around it is complex and costly.
INDEXED SEQUENTIAL FILE ORGANIZATION
Basic principle: "having an index (directory)"
In an indexed sequential file, records are stored sequentially on a direct access device (e.g., magnetic disk) and data is accessible either randomly or sequentially.
Sequential access reads one record at a time until the desired item of data is found. The records of the file can be stored in random sequence, but the index table is in sorted sequence on the key value.
This technique is known as Indexed Sequential Access Method (ISAM).
e.g.:
A directory (index) in a large multistoried building
The table of contents in a book helps us locate the page of the desired topic so that we can turn directly to that page to begin reading instead of searching each page.
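A small Python sketch of the index idea: records sit in arrival order, while a separate index sorted on the key supports both direct and sequential access. The sample records and function names are hypothetical, and a real ISAM implementation would keep the index on disk, often in several levels.

```python
import bisect

data = [                       # records in arrival (unsorted) order
    (305, "Ravi"), (101, "Asha"), (210, "Meena"),
]
index = sorted((key, pos) for pos, (key, _) in enumerate(data))  # sorted on key
keys = [key for key, _ in index]


def direct_read(key):
    """Look the key up in the sorted index, then jump to the record."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return data[index[i][1]]
    return None


def sequential_read():
    """Yield records in key order by walking the index."""
    for _, pos in index:
        yield data[pos]
```

The binary search over the index replaces the record-by-record scan of a plain sequential file, while walking the index still gives key-order processing.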
Applications
This file organization is a compromise approach that combines some of the advantages of both the sequential and direct approaches, and is therefore used in almost all applications, such as:
Material A/C, Banking Industry, etc.
Advantages
1. Permits the efficient and economical use of sequential processing techniques when the activity ratio is high.
2. Permits direct access processing of records in a relatively efficient way when the activity ratio is low.
Disadvantages
1. These files must be stored on a direct access storage device. Hence, relatively expensive hardware and software resources are required.
2. Less efficient in the use of storage space than some other alternatives.
MASTER FILE
Master files are files of a fairly permanent nature, e.g., inventory, payroll, etc. They include some information which is of a permanent nature and also data which is continuously updated by recent transactions.
e.g.: emp_Master - Emp_No, Emp_Name, Emp_Address, Date_of_Joining
These attributes about an employee change less frequently. The normal means of updating a master file is by:
1. Addition of data/records
2. Deletion of data/records
3. Amending of data/values
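The three update operations can be sketched in Python against an in-memory master file keyed by Emp_No. The dictionary representation and the function name `apply_update` are assumptions for illustration.

```python
def apply_update(master, action, emp_no, **fields):
    """Apply one update to an in-memory master file keyed by Emp_No."""
    if action == "add":
        if emp_no in master:
            raise KeyError(f"{emp_no} already exists")
        master[emp_no] = dict(fields)        # addition of a record
    elif action == "delete":
        del master[emp_no]                   # deletion of a record
    elif action == "amend":
        master[emp_no].update(fields)        # amending of values
    else:
        raise ValueError(f"unknown action: {action}")
```

For example, `apply_update(emp_master, "amend", 101, Emp_Address="Delhi")` changes only the address of employee 101 and leaves the other fields untouched.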
Types of master files
Static master files (or reference files) - Permanent or semi-permanent nature.
These are subject to occasional revision.
Example of business entities - products, suppliers, customers, employee, etc.
Dynamic master files (or table files) - Transitory nature
Example of business entities - customer orders, works orders, price lists.
TRANSACTION FILE
Transaction files are files in which the data relating to business events is recorded prior to a further stage of processing. They are created from the source documents used for recording events or transactions.
A transaction file is a temporary file containing all relevant data about all transactions of one type.
e.g.: >> Customers' orders for products, purchase orders, job cards, invoices, dispatch notes, etc.
>> Daily sale transaction file, Daily stores issue file
Purposes:
1. To accumulate data about relevant events as they occur.
2. To update master files to reflect the results of most recent transactions
DESIGNING REPORTS
Systems analysts specify reports when they need a record of data or a report of information, or need to circulate a large volume of information to several persons simultaneously.
Only those reports whose printing is absolutely necessary should be printed. One well designed report may sometimes replace several poorly designed ones. Providing unnecessary details assists no one, so analysts should be alert to avoid producing extraneous data.
Printed Reports
Printed reports vary in size, but the standard sizes often used are:
9 1/2 by 11 inches
11 by 14 7/8 inches
8 by 14 7/8 inches
These sizes are for continuous forms (pin-fed or fan-fold forms) - connected sheets of papers that feed into the printer one after the other. All the features of printed output are available in microfilm and microfiche, the two film output methods.
Film output reduces output cost by approximately one third. After developing in the microfiche machine, the film can be stored and retrieved when needed. For reference data used only sporadically, such as a private savings account balance that changes infrequently and may have interest posted every three months, microfilm could be a useful output option.
A page of output takes very little space when stored on microfilm. One square inch of film can store as much information as several pages of a paper report. A 3.5 inch card stores the equivalent of hundreds of pages.
Special Forms
Reports could be printed on plain paper. But usually, when an organization sends a report to its customers, the logo and the name of the organization are also printed on it.
Sometimes, the report is printed on a paper which is "pre-printed".
On high school mark sheets:
The name of the board, the year and logo of the board, as well as the name of the examination passed are preprinted on the paper. The marks, name, school name, date of birth, etc., are printed afterwards.
On an electricity bill:
The titles of the various labels are preprinted. The values against them are printed by the information system.
Layout of a Printed Report
An output layout is the arrangement of items on the output medium. When analysts design an output layout, they are building a mock-up of the actual report or document as it will appear after the system is in operation. The layout should show the location and position of the following:
All variable information
Item details
Summaries and totals
Separators
All preprinted details
Document names and titles
Corporate name and address
Instructions & Notes
The layout is a blueprint that will guide the construction of programmes later in the development process. Each variable in the layout must be accounted for in programme instructions.
GUIDELINES FOR DESIGNING PRINTED REPORT
Reports and documents have to be designed to be read from left to right and from top to bottom.
The most important items should be the easiest to find (the roll number in a marks cross list is most important, hence it is placed in the leftmost column).
All pages should have a title and page number and show the date on which the output was produced.
All columns must be labeled.
Abbreviations must be avoided.
Some organizations specify standards that guide design practices in addition to the above guidelines.
REPORT GENERATION
Commercial outputs need formatted output. These formatted outputs are known as reports. A formatted report may have the following:
1. Report heading
2. Page heading
3. Page numbering
4. Footers
5. Some remarks
6. Date, month, and year of printing
Commercial outputs may also extend over multiple pages. In that case, the program must make provision for certain details to be printed on every page.
e.g.: One may have to print page headings on every page.
Report may be - 1. Single page 2. Multiple pages
Multiple Page Report
Multiple page reports in general may consist of the following parts:
Report Heading represents the title of the report and appears only on the first page of the report.
Report Sub Heading is printed as and when required on each page.
Page heading is printed on the top of each page.
Page Footer is printed at the bottom of each page. It is generally the total number of records on the page, sum of the numeric data, etc.
Report Detail constitutes the main body of report. The output details or the information of the main consequence are listed in this group.
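The parts listed above can be sketched as a small pagination routine in Python. The function name, the column widths, and the exact wording of the page heading and footer are assumptions for illustration.

```python
def paginate(title, rows, rows_per_page, date):
    """Return report pages as lists of lines; rows are (name, amount) pairs."""
    pages = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        page_no = start // rows_per_page + 1
        lines = []
        if page_no == 1:
            lines.append(title)                              # report heading: first page only
        lines.append(f"Date: {date}  Page No. {page_no}")    # page heading: every page
        lines += [f"{name:<10}{amount:>8}" for name, amount in chunk]   # report detail
        page_total = sum(amount for _, amount in chunk)
        lines.append(f"Records: {len(chunk)}  Page total: {page_total}")  # page footer
        pages.append(lines)
    return pages
```

Calling `paginate("Sales Report", rows, 2, "01-01-2024")` with three rows yields two pages: the title appears once, while the page heading and footer repeat on each page.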
REPORTS WITH CONTROL BREAK
In a control break report, we need to group information based on some common criterion. For presenting summarised information, one can use the concept of a control break.
Example:
In the sample report at the end of this section, the control break is on the region. Analysis of the report shows the following points:
1. Records of a particular region are grouped together and printed in one place.
2. When the control break occurs, certain procedures are followed, such as printing the total sales of that region.
3. It may be necessary to print the performance of each region on a separate page, i.e., when the control break occurs the report continues on a fresh page.
4. At the end of the report, it may be necessary to calculate and print the grand totals.
ABC Company Ltd.
Gurgaon    Date:    Page No.
Region wise Sales Report
Sale of Mumbai Region: 67,50,000
Sale of Delhi Region: 45,00,000
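A control break can be sketched in Python by comparing each record's region with the previous one. The input is assumed to be already sorted by region; the function name and the "Grand Total" line wording are illustrative assumptions.

```python
def control_break_report(rows):
    """rows: (region, amount) pairs already sorted by region."""
    lines, grand_total = [], 0
    current, subtotal = None, 0
    for region, amount in rows:
        if region != current:                     # control break on region
            if current is not None:
                lines.append(f"Sale of {current} Region: {subtotal}")
            current, subtotal = region, 0
        subtotal += amount
        grand_total += amount
    if current is not None:                       # flush the last group
        lines.append(f"Sale of {current} Region: {subtotal}")
    lines.append(f"Grand Total: {grand_total}")
    return lines
```

The subtotal is printed and reset whenever the region changes, and the grand total is printed once at the end of the report.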
LABEL GENERATION
Creating labels means generating address slips on the computer. Creating mailing labels is a two-step process:
(1) Designing Label Form
(2) Production of Labels
Designing Label Form
In this step, we design the labels. We select the label layout and then place the fields, text, pictures, and other objects on it.
Production of Labels
Once we have designed the label form, we can print the labels at any time. At this point, the labels are ready to use.
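The two steps can be sketched in Python: a label form with named fields, then production of one label per address record. The field names (`name`, `street`, `city`, `pin`) and the layout are hypothetical.

```python
LABEL_FORM = "{name}\n{street}\n{city} - {pin}"     # step 1: the label layout


def produce_labels(form, addresses):
    """Step 2: fill the label form once per address record."""
    return [form.format(**address) for address in addresses]
```

Because the form is kept separate from the data, the same form can be reused to print labels at any later time from a new list of addresses.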
RELEVANCE OF DATABASE MANAGEMENT SYSTEMS
A database management system (DBMS) is very important for an organization. One of the main advantages of using a database system is that the organization can exert, via the database administrator (DBA), centralised management and control over the data. The database administrator is the focus of the centralised control.
Important features of DBMS:
(a) Reduction of Redundancies:
DBA avoids unnecessary duplication of data and effectively reduces the total amount of data storage required. It also eliminates the extra processing necessary to trace the required data in a large mass of data. Another advantage of avoiding duplication is the elimination of the inconsistencies that tend to be present in redundant data files. Any redundancies that exist in the DBMS are controlled and the system ensures that these multiple copies are consistent.
(b) Shared Data: A database allows the sharing of data under its control by any number of application programs or users.
(c) Integrity: Data integrity means that the data contained in the database is both accurate and consistent. Therefore, data values being entered for storage could be checked to ensure that they fall within a specified range and are of the correct format. Centralised control can also ensure that adequate checks are incorporated in the DBMS to provide data integrity.
(d) Security: Data is of vital importance to an organization and may be confidential. Such confidential data must not be accessed by unauthorized persons. The DBA, who has the ultimate responsibility for the data in the DBMS, can ensure that proper access procedures are followed, including proper authentication schemes for access to the DBMS and additional checks before permitting access to sensitive data. Different levels of security could be implemented for various types of data and operations.
(e) Conflict Resolution:
The DBA resolves the conflicting requirements of various users and applications. The DBA chooses the best file structure and access method to get optimal performance for the response-critical applications, while permitting less critical applications to continue to use the database, albeit with a relatively slower response.
(f) Data Independence:
Physical data independence allows changes in the physical storage devices or organization of the files to be made without requiring changes in the conceptual view or any of the external views, and hence in application programs using the database.
Logical data independence implies that application programs need not be changed if fields are added to an existing record nor do they have to be changed if fields not used by application programs are deleted.
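Logical data independence can be illustrated with a short Python sketch: an application that reads only the fields it needs keeps working when other fields are added or removed. The record layouts and the `Fax` and `Email` fields are hypothetical.

```python
def employee_names(records):
    """An 'application program' that uses only Emp_No and Emp_Name."""
    return {r["Emp_No"]: r["Emp_Name"] for r in records}


old_schema = [{"Emp_No": 101, "Emp_Name": "Asha", "Fax": "n/a"}]
# A field is added (Email) and a field this program never used is dropped (Fax).
new_schema = [{"Emp_No": 101, "Emp_Name": "Asha", "Email": "asha@example.com"}]
```

`employee_names` returns the same result for both layouts, so the program does not have to be changed when the record definition evolves.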
INTEGRATION OF APPLICATION
In a typical business organization, several applications work simultaneously to realise its business objectives. When implemented and managed separately, the applications neither achieve a cost-effective solution nor are they easily manageable. Application availability and sharing can be greatly enhanced by integrating the existing applications into one single unit.
Management control and security concerns make application integration mandatory. Therefore, these individual applications need to be integrated into an enterprise-wide, not merely department-wide, application. Many solutions, such as ERP (Enterprise Resource Planning), have come to the fore.
According to a recent survey, a typical global company has 30 to 50 enterprise applications and spends 25 to 40 percent of its information technology (IT) budget on application integration. And integration requirements are intensifying with the current wave of acquisitions and mergers.
Application integration aims at creating a cost-effective integration architecture and infrastructure to promote interoperability among applications. And interoperability is the key to agility for the accelerating change that rules today's global marketplace.
Cross-functional business processes through integration enable the open flow of information between systems, across organizations, between enterprises and among trading partners. Several technologies are available to integrate web-based applications, front- and back-office systems, ERP systems, and package software applications.
Some of the most popular application integration solutions are:
PeopleSoft, Oracle, SAP, and other leading ERP systems, as well as all leading database management systems (DBMSs). A typical application integration looks similar to the one depicted in the figure given below:
[Figure: an Integrated Central Application connected to the Accounts, Production, Management, Communication, Marketing, and Administration applications]
ISSUES RELATED TO INTEGRATION OF APPLICATIONS
1. Cost: Cost is the most important issue in application integration. The cost of integration must be justified by the overall improvement achieved.
2. Compatibility: Generally, the applications are developed and implemented in isolation with little or no consideration with other applications. This fact gives rise to compatibility problems among the component applications.
3. Legacy applications: Some applications in the organization may be very old (say, implemented in a very old technology) and yet cannot be discarded. Such systems are called legacy systems. Integration of these systems is sometimes very challenging.
4. Migration: The database and/or processes may need to be re-implemented in a new technology from the existing old ones. Often the source code does not exist and only an executable form of the process is available. In such cases, a reverse engineering exercise may be needed.
5. Operational control: Once an application is integrated with another one, there is in general a shift in control from one entity to another. There may also be a sharing of control over the application. Therefore, a redesign of control and authority is often required.
6. Security: Application integration exposes the systems to more security threats than when they are stand-alone. The vulnerability of the system to security threats must be considered very seriously.
7. User training: Along with the integration, there may be a significant change in the way users have been interacting with the system hitherto. Therefore, user training in the new environment must also be considered carefully.
INTRODUCTION TO MICRO DATABASE MANAGER
The database manager is a program module which provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. Databases typically require a large amount of storage space (gigabytes), held on disk. Data is moved between disk and main memory as and when needed. The goal of the database system is to simplify and facilitate access to data. Performance in terms of response time is also very important.
The responsibilities of database manager module:
1. Interface with the file manager: The database manager interacts with the file manager of the operating system, storing raw data on disk using the file system usually provided by a conventional operating system. The database manager translates a DML (Data Manipulation Language) statement into a sequence of low-level file system commands for storing, retrieving, and updating data in the database.
2. Integrity enforcement: It enforces integrity by checking that updates to the database do not violate consistency constraints. For example, in a bank database, it is the task of the database manager to see that no bank account balance falls below Rs. 1000, as that would violate a consistency constraint.
3. Security enforcement: By defining security checks and constraints, the database manager ensures that the database is safe. The database manager has the power to grant users access to the database and also to deny it. It is solely responsible for granting access rights, such as read-only or read-write, to a user.
4. Backup and recovery: The database is such a valuable asset that the database manager must ensure that it is not damaged or lost. Therefore, it regularly takes backups of the database. In case of any failure, it initiates a suitable recovery procedure to restore the database. It must also do so in the least amount of time.
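The bank-balance constraint from the integrity enforcement example can be sketched in Python. The in-memory `accounts` dictionary, the `withdraw` function, and the `IntegrityError` class are assumptions for illustration; a real database manager would enforce the rule inside the DBMS itself.

```python
MIN_BALANCE = 1000      # Rs., the consistency constraint from the example


class IntegrityError(Exception):
    """Raised when an update would violate a consistency constraint."""


def withdraw(accounts, account_no, amount):
    """Apply a withdrawal only if the constraint still holds afterwards."""
    new_balance = accounts[account_no] - amount
    if new_balance < MIN_BALANCE:
        raise IntegrityError("balance would fall below the minimum")
    accounts[account_no] = new_balance      # update is applied only after the check
    return new_balance
```

The check runs before the stored value is changed, so a rejected update leaves the account exactly as it was.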
[Figure: layered diagram listing, in order, Database Management System, Database Manager, Operating System, and Computer Hardware]