1 Data Processing And Information
1.1 Data and information
data: raw figures or facts that have not been processed and have no meaning on their own
information: data that has been processed and given context and meaning
Direct and Indirect Data
direct data is data that has been collected from the source for a specific purpose and
used for only that purpose
indirect data is data that is obtained from a third party and used for a different purpose
than the one it was originally collected for
Sources of Direct Data:
questionnaires: A set of prepared questions that can be distributed to people online or
physically in order to collect data from individuals. They are user friendly, easy to
distribute, and easier to analyse since all respondents answer the same questions.
interviews: A one-to-one meeting between an interviewer and an interviewee, in which
the interviewer asks questions to obtain information. The questions can be closed-ended,
which makes the responses easier to quantify, or open-ended, which gives more in-depth
detail and better quality data.
observation: A method of collecting data by watching an event/activity, analysing it and
recording the data. The observer gets the information first-hand rather than from a
third party.
data logging: The use of sensors and computers to gather and analyse data, allowing it
to be saved or represented as an output in graphs, charts, etc. It is most likely to be
used in scientific experiments where human intervention isn't suitable, such as
situations that require continuous monitoring and gathering of data.
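A minimal sketch of the software loop a data logger might run, assuming a hypothetical read_sensor() function standing in for the hardware; readings are timestamped and appended to a CSV file so they can later be charted:

    import csv
    import time
    from datetime import datetime

    def read_sensor():
        # Hypothetical placeholder: a real logger would query the sensor hardware here
        return 21.5

    # Log one reading per minute with no human intervention
    with open("temperature_log.csv", "a", newline="") as log_file:
        writer = csv.writer(log_file)
        for _ in range(60):                                    # one hour of readings
            writer.writerow([datetime.now().isoformat(), read_sensor()])
            log_file.flush()                                   # data is kept even if power is lost
            time.sleep(60)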
Sources of Indirect Data:
electoral registers: A record of citizens who are eligible to vote in an election; it
contains a list of their personal data such as legal name, address, contact information,
etc. Some of this data is removed from the open version of the register, which is
accessible to organisations and the public for certain uses depending on local laws.
businesses collecting info that is used by third parties: Businesses sell the information
that they collect from their customers. For example, when someone purchases
something online they are often asked to tick a box authorising the business to share
their details with other organisations. Customers often provide personal information
that has a commercial value. Businesses use this information to create mailing lists that
can be purchased by other organisations/individuals to send emails or even brochures
through the post.
Note: businesses collect the data from customers as direct data for shipping their
products/providing services etc.; when this data is purchased by third parties it is
considered indirect data, and it can be used for targeting the required audience
(customers), analysing buying trends, etc.
Advantages of Direct Data:
The source of the data is known, so we know how reliable it is
Only the required data is gathered, so it is very relevant
Data can be sold later as indirect data for other purposes
Data is likely to be up to date
Data can easily be presented in the required format, as a preferred and suitable source is used
Disadvantages of Direct Data:
It can take longer to gather the data than to acquire it from an existing indirect data
source
Larger samples can be difficult to collect.
Can be more expensive than indirect data due to the preparation and gathering of the
required data, such as producing questionnaires or buying additional equipment such as
data loggers
Data might be out of date by the time the project is completed
Advantages of Indirect Data:
The data is readily available
Allows a larger set of data to be examined with less time and cost involved than with
direct data
A larger sample size can be used
Allows data to be gathered about subjects (e.g. people) that the gatherer doesn't have
physical access to
Disadvantages of Indirect Data:
Can be less reliable as the source may be unknown
Not all data will be relevant
Might be out of date
Might be difficult to extract the data as it could be in the wrong format
There may be sampling bias (data collected for a certain purpose from only a certain
sample may not give an accurate answer for the purpose it is being used for now)
1.2 Quality Of Information
Factors affecting the quality of information are:-
accuracy:
If the data collected is inaccurate, the information produced after processing will be
inaccurate and hence of poor quality. Misspelling words or misplacing characters can
lead to inaccuracy, e.g. recording 10:30 am instead of 10 o'clock at night
relevance:
Data must be relevant to the purpose; irrelevant data needs to be removed before
processing to give better quality information, e.g. being given a bus timetable when a
train timetable is required
age:
Information must be up to date; old information may be irrelevant or inaccurate and
hence of poor quality, e.g. not updating family registers makes the emergency contacts
useless in an emergency because the information is outdated
level of detail:
Information must have the required level of detail. Too much detail makes it difficult to
extract the necessary information, and too little detail will not provide the information
needed
completeness:
Information must be complete, containing everything required, to be of good quality;
if not, it can't be used properly for a particular purpose, e.g. not having the venue of an
event mentioned on its advertising poster makes it incomplete. (note: level of detail and
completeness are separate factors; information can have a high level of detail and still be complete)
1.3 Encryption
encryption: It’s the process of converting plain text into cipher text which makes the
original data unintelligible
The need for encryption:
Encryption is important when sending or storing sensitive data such as personal data or
a company's sales figures
Data being sent across a network or the internet can be easily intercepted by hackers
Data stored on storage media could be stolen or lost
Hence the purpose of encryption is to scramble the data in order to make it difficult or
impossible to read if it is accessed by an unauthorized user
Methods Of Encryption:-
Symmetric:
A method of encryption that uses the same private (secret) key to both encrypt and
decrypt data. The sender and receiver both require the same key, so it needs to be
agreed on before transmission of the data or sent along with the files.
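As an illustration only, a minimal symmetric-encryption sketch in Python using the third-party cryptography package (an assumption; install it with pip install cryptography): the same key object both encrypts and decrypts.

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()              # the single shared secret key
    cipher = Fernet(key)

    ciphertext = cipher.encrypt(b"company sales figures")   # sender encrypts
    plaintext = cipher.decrypt(ciphertext)                   # receiver decrypts with the SAME key
    print(plaintext)                                         # b'company sales figures'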
Asymmetric:
A method of encryption that uses a public key (available to anyone) to encrypt data and
a private key (known only to the recipient) to decrypt data. The key used for encryption
cannot be used for decryption, and vice versa.
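A comparable asymmetric sketch, again assuming the cryptography package is installed: anyone holding the public key can encrypt, but only the holder of the private key can decrypt.

    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()            # can be shared with anyone

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    ciphertext = public_key.encrypt(b"secret message", oaep)   # encrypted with the PUBLIC key
    plaintext = private_key.decrypt(ciphertext, oaep)          # decrypted with the PRIVATE key only
    print(plaintext)                                           # b'secret message'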
Encryption Protocols:
An encryption protocol is a set of rules setting out how the algorithms should be used to secure
information. There are several protocols including:
IPsec (internet protocol security)
SSH (secure shell)
Transport Layer Security (TLS) and Secure Socket Layer (SSL)
TLS and SSL:
These are the most popular protocols used when accessing web pages securely. TLS is an improved
version of SSL and has now, more or less, taken over from it.
Three main purposes of SSL/TLS:
Enable encryption in order to protect data
Make sure that the people/companies exchanging data are who they say they are
(authentication)
Ensure the integrity of the data to make sure it has not been corrupted or altered
The use of SSL/TLS in client server communication:
TLS is used for applications that require data to be exchanged securely over a client-server
network, such as web browsing sessions and file transfers. In order to establish a connection
between the client and server, a handshake needs to take place, which authenticates the server
to the client before the transfer of data can take place. It allows both parties to agree on a set
of rules for communication and authenticate each other, as well as communicate securely
through asymmetric and symmetric encryption.
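The handshake is normally handled by a library. A minimal sketch using Python's standard ssl module: the client wraps an ordinary socket in TLS, the server's certificate is verified, and the negotiated protocol version and symmetric cipher suite are reported.

    import socket
    import ssl

    hostname = "example.com"                        # any HTTPS-enabled server
    context = ssl.create_default_context()          # verifies the server's certificate (authentication)

    with socket.create_connection((hostname, 443)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=hostname) as tls_sock:
            print(tls_sock.version())               # e.g. 'TLSv1.3', agreed during the handshake
            print(tls_sock.cipher())                # the symmetric cipher suite used for the session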
Uses of encryption:
hard disk encryption:
When a file is created/written on the disk it is automatically encrypted, and when it is
read it is automatically decrypted, while the other files remain encrypted.
The whole disk is encrypted so that data is protected if the disk is stolen or left
unattended.
Keys need to be kept somewhere secure but accessible, as data can't be recovered
without the key.
Data can be permanently lost if the encrypted disk crashes or the OS gets corrupted.
email encryption:
Email encryption uses asymmetric encryption, so the sender and recipient each need
to send the other a digitally signed message in order to add each other's digital
certificate to their contacts.
Encrypting emails also encrypts any attachments.
Emails are susceptible to being intercepted by hackers, therefore encrypting all
emails, including downloaded ones, is good practice.
encryption in https websites:
Hypertext Transfer Protocol Secure is shown by a URL beginning https:// or by a padlock symbol.
A session key is encrypted by the web browser using the server's public key and sent to
the web server, where it is decrypted using the server's private key; after that, all
exchange of information is encrypted using the session key.
HTTPS uses asymmetric encryption initially to establish a secure session, then uses
symmetric encryption from that point forward (see the sketch after this list).
After the session has ended, the symmetric key is disposed of.
Slower than HTTP and needs to be kept up to date by the host, but it is more secure for
data transfer, and sites with HTTPS are given more priority by search engines.
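A rough sketch of the session-key idea, combining the two earlier examples (cryptography package assumed): the client generates a symmetric session key, protects it with the server's public key, and the server recovers it with its private key; both sides then use the symmetric key. This mirrors the classic key-transport description above; modern TLS usually agrees the session key through a key-agreement handshake instead.

    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # Server side: key pair generated in advance; the public key is in its certificate
    server_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    server_public = server_private.public_key()

    # Browser side: create a session key and send it encrypted with the server's public key
    session_key = Fernet.generate_key()
    wrapped_key = server_public.encrypt(session_key, oaep)

    # Server side: unwrap the session key with the private key
    unwrapped_key = server_private.decrypt(wrapped_key, oaep)

    # From here on, both sides share the same symmetric session key
    request = Fernet(unwrapped_key).decrypt(Fernet(session_key).encrypt(b"GET /index.html"))
    print(request)                                   # b'GET /index.html'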
1.4 Checking The Accuracy Of Data
validation: The process of checking data to make sure it matches the acceptable rules
(the checks below are illustrated in the sketch after this list)
presence check: Checks that data is present and has been entered
range check: Checks that data is within a defined range; it has both an upper and a lower
boundary
type check: Ensures that the data is of a defined type
length check: Ensures that data is of a specified length
format check: Ensures that data is in a specified format, such as a date in dd/mm/yyyy
check digit: Uses an algorithm to calculate a digit or character from the given data and
attaches it as the last digit; this is then recalculated to check that the initial digits were
entered correctly
look up check: Checks that the data entered is within a list of accepted values
consistency check: Checks that the data is consistent with other selected fields, such
as password confirmation in sign-up forms
limit check: Checks that the data is within a specified range, but with only one boundary,
such as a 13+ age check for a ride at a park
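A rough Python sketch of how a few of these validation rules might be coded; the field values and the simple sum-based check digit scheme are invented for illustration:

    from datetime import datetime

    ACCEPTED_TITLES = ["Mr", "Mrs", "Miss", "Ms", "Dr"]        # list used by the lookup check

    def presence_check(value):                  # data must be present
        return value.strip() != ""

    def range_check(mark, low=0, high=100):     # both an upper and a lower boundary
        return low <= mark <= high

    def limit_check(age, minimum=13):           # only one boundary (13+ ride)
        return age >= minimum

    def type_check(value):                      # data must be a whole number
        return value.isdigit()

    def length_check(value, length=8):          # data must be exactly `length` characters
        return len(value) == length

    def format_check(date_text):                # data must be a valid date in dd/mm/yyyy format
        try:
            datetime.strptime(date_text, "%d/%m/%Y")
            return True
        except ValueError:
            return False

    def lookup_check(title):                    # data must appear in the accepted list
        return title in ACCEPTED_TITLES

    def check_digit_ok(code):                   # invented scheme: last digit = sum of the others mod 10
        digits = [int(d) for d in code]
        return digits[-1] == sum(digits[:-1]) % 10

    print(range_check(87), format_check("29/02/2023"), check_digit_ok("12340"))   # True False True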
verification: It is the process of checking that the entered data matches the original
source
visual checking: A manual check performed by the user entering the data. After data
entry is complete, the data on screen is compared against the original document/source
and any errors are corrected before proceeding (e.g. banking apps asking the user to
confirm the amount entered for a money transfer)
double data entry: A method of entering the data twice and then comparing the two
entries; if they don't match, an error has occurred. (The comparison could be performed
by another person or by a computer, e.g. entering a password twice during sign-up, as
sketched below.)
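A tiny sketch of computer-performed double data entry, as on a sign-up form:

    # Double data entry carried out by the computer: the two entries must match exactly
    first_entry = input("Choose a password: ")
    second_entry = input("Re-enter the password: ")

    if first_entry == second_entry:
        print("Password accepted")
    else:
        print("Entries do not match - an error has occurred, please re-enter")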
parity check: A method of verification used to check whether data has been changed or
corrupted during transmission from one medium to another. Each byte, for example, is
assigned a parity bit; the parity is agreed in advance and can be either odd or even. If
odd parity is used, the parity bit is set to 1 or 0 so that the total number of 1s in the
byte is odd. The same is done if even parity is used.
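A small sketch of even parity on a single byte: the sender appends a parity bit so that the number of 1s is even, and the receiver re-counts the 1s to detect a single flipped bit.

    def add_even_parity(seven_bits):
        # Append a parity bit so that the total number of 1s is even
        ones = seven_bits.count("1")
        return seven_bits + ("0" if ones % 2 == 0 else "1")

    def even_parity_ok(byte):
        # Receiver's check: an even number of 1s means no single-bit error was detected
        return byte.count("1") % 2 == 0

    sent = add_even_parity("1010011")        # '10100110'
    print(even_parity_ok(sent))              # True - transmission looks correct
    print(even_parity_ok("10100111"))        # False - one bit was flipped in transit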
checksum: A method of checking whether data has been changed or corrupted during data
transmission. The data is sent in blocks and an additional value, the checksum, is
calculated by applying an algorithm such as a hash function to the data. The checksum is
sent at the end of the blocks of data and recalculated at the receiver's end. The two
checksum values are compared, and if they don't match an error has occurred during
the transmission
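A minimal checksum sketch using Python's standard hashlib module: the checksum is calculated before transmission, sent with the block, and recalculated and compared at the receiver's end.

    import hashlib

    def checksum(block):
        # Hash function applied to a block of data
        return hashlib.sha256(block).hexdigest()

    data_block = b"order#1042,qty=7,price=19.99"
    sent_checksum = checksum(data_block)          # calculated before transmission

    # ... the block and its checksum are transmitted ...

    received_block = b"order#1042,qty=7,price=19.99"
    if checksum(received_block) == sent_checksum:
        print("Checksums match - no error detected")
    else:
        print("Checksums differ - an error occurred during transmission")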
hash total: A hash total is calculated before the transmission of data by adding up all the
numbers in one or more selected fields. The hash total is sent along with the data and is
recalculated at the receiver's end; if the hash totals are the same, the data has been
transmitted correctly. If the selected field is alphanumeric, it is converted to numbers for
the purpose of the hash total and then added up. The total itself has no real-world meaning.
control total: Similar to a hash total and performs the same action, but the calculation
is done on numeric fields only, so the value produced is meaningful, as alphanumeric fields
are not used (e.g. if a marks field is used for a control total, the total can be meaningful
and used to find the class average mark for that test)
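A short sketch of a hash total and a control total over a small batch of invented payroll records:

    records = [
        {"employee_id": 1043, "name": "A. Khan",  "hours_worked": 37},
        {"employee_id": 2217, "name": "B. Osei",  "hours_worked": 40},
        {"employee_id": 3581, "name": "C. Silva", "hours_worked": 25},
    ]

    # Hash total: sum of a chosen field (employee_id); the total itself is meaningless
    hash_total = sum(r["employee_id"] for r in records)

    # Control total: sum of a numeric field (hours_worked); the total is meaningful
    control_total = sum(r["hours_worked"] for r in records)

    print(hash_total)       # 6841 - only useful for comparison before and after transmission
    print(control_total)    # 102  - also tells us the total hours to be paid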
The need for both verification and validation:
Validation is always carried out by a computer whereas verification can be carried out
by a human or a computer
Validation is checking that the data entered is reasonable and sensible
Verification is checking that the data has been entered, copied or transmitted correctly,
but it doesn't tell you whether the data is sensible or not
Even with both verification and validation, data can still be incorrect: if the original
source data is wrong, e.g. two digits of a phone number are swapped, it will still pass
validation because it is in the correct format, and verification will only confirm that it
matches the (incorrect) source
Verification is a way of ensuring that the user doesn't make a mistake when inputting
data, whereas validation is checking that the data input conforms with what the system
considers to be sensible and reasonable
By using both, the chance of entering incorrect data is reduced
1.5 Data Processing
Data processing is when data is collected and translated into usable information. Data
processing starts with data in its raw form and translates it into a more readable format such as
graphs, diagrams and reports. The processing is required to give the data the structure and
context necessary for it to be understood by computers and then used by employees throughout
an organization. Data processing includes actions such as:
collection and storage
editing and updating
sorting and searching
output and dissemination
Batch Processing
In a batch processing system, the individual operations or transactions that need to be
performed on the data are collected together into a batch and then processed at a later time,
instead of being worked on one by one by an operator in real time. The data is searched using
sequential access.
Examples:
automated backups
the processing of employees wages
customer orders
stock control
It makes use of 2 files:
master file: Stores important data that doesn't change often, such as a person's name,
number and address, and is sorted in order of the key field
transaction file: Stores data that changes frequently, perhaps weekly or daily, such as
hours worked, items sold today, or number of visitors
In order to update the master file, a new blank file is created and used as the new master
file. The following basic algorithm is used (sketched in code below).
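A simplified Python sketch of the sequential master-file update: both files are assumed to be sorted by the key field, each master record is read in turn, updated if a matching transaction exists, and written to the new master file. The record layout is invented for illustration.

    # Old master file and transaction file, both sorted by the key field (employee_id)
    master_file = [
        {"employee_id": 101, "name": "A. Khan",  "hours_to_date": 120},
        {"employee_id": 102, "name": "B. Osei",  "hours_to_date": 95},
        {"employee_id": 103, "name": "C. Silva", "hours_to_date": 110},
    ]
    transaction_file = [
        {"employee_id": 101, "hours_worked": 37},
        {"employee_id": 103, "hours_worked": 25},
    ]

    new_master_file = []                 # the new, blank master file
    t_index = 0
    for record in master_file:           # read each master record in key-field order
        updated = dict(record)
        # step through the sorted transaction file looking for records with a matching key
        while (t_index < len(transaction_file)
               and transaction_file[t_index]["employee_id"] == updated["employee_id"]):
            updated["hours_to_date"] += transaction_file[t_index]["hours_worked"]
            t_index += 1
        new_master_file.append(updated)  # write the (possibly updated) record to the new master file

    print(new_master_file)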
Use of batch processing in payroll: hours worked are recorded in a transaction file during
the pay period, sorted into key-field order, and then processed in one batch (often
overnight) against the employee master file to calculate wages and produce payslips.
Use of batch processing with customer orders: orders received during the day are stored
in a transaction file and processed in one batch against the stock/customer master file,
updating stock levels and producing invoices and picking lists.
Advantages of Batch Processing:
It is a single, automated process requiring little human participation which can
reduce costs
Processing can be scheduled for when there is little demand for computer resources, for
example at night, allowing more work to be got out of the hardware
As it is an automated process, there will be none of the transcription and update errors
that human operators would produce
There are fewer repetitive tasks for the human operator
Disadvantages of Batch processing:
Only data of the same type can be processed, since an identical, automated process is
being applied to all the data
Errors cannot be corrected until the batch process is complete
Information is not up to date until the master file has been updated by the
transaction file
Online Processing
Online processing is done on a computer that is in direct communication with the
user. Data is processed almost immediately, with only a short delay, and output is
provided instantly, making it seem as if the user is in direct communication with
the computer. Each transaction is processed before the next transaction is dealt
with. Data is searched using direct access.
Electronic Funds Transfer (EFT):
The process of sending money from one bank account to another using computer software and
without the involvement of bank staff, e.g. ATMs, online banking.
Electronic Funds Transfer at Point of Sale (EFTPOS):
A transfer of funds made at a point of sale, i.e. at the checkout counter or when a waiter
brings the card machine to the table; payment is transferred electronically from the
customer's account to the retailer's account.
Automatic Stock Control:
An automated system which manages stock control with little human input.
Electronic Data Exchange:
Electronic data exchange, or electronic data interchange (EDI), is a method of exchanging data
and documents without the use of paper. The documents can take any form, such as an invoice
or an order, with the exchange taking place electronically between computers using a standard
format.
An EDI generally has these steps (sketched in code after the list):
1. A company decides to buy some goods, creates an order and does not print it
2. EDI software creates an electronic version of the order and sends it automatically to the
supplier
3. Supplier’s computer system receives the order and updates its system
4. Supplier’s computer system automatically sends a message back to the company,
confirming receipt of the order
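Real EDI uses agreed standards such as EDIFACT or ANSI X12. Purely to illustrate the idea of step 2, a hedged sketch of order data being turned into a structured electronic document and posted to a hypothetical supplier endpoint (the URL and field names are invented):

    import json
    import urllib.request

    # The order held as structured data instead of a printed document
    order = {
        "document_type": "PURCHASE_ORDER",
        "order_number": "PO-2024-0091",
        "buyer": "Example Retail Ltd",
        "lines": [{"item_code": "SKU-123", "quantity": 50}],
    }

    # Hypothetical supplier endpoint; a real EDI exchange would use an agreed standard
    # document format and transport rather than this ad-hoc JSON-over-HTTP sketch
    request = urllib.request.Request(
        "https://supplier.example.com/edi/orders",
        data=json.dumps(order).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # response = urllib.request.urlopen(request)   # the supplier's system would confirm receipt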
Business To Business Buying and Selling:
Refers to buying and selling between two businesses.
A B2B marketplace is similar to a B2C marketplace in terms of appearance
On B2B marketplaces, bulk orders can be placed and edited online
Buyers can compare products from different sellers, receive testers/samples and
receive discounts on large orders
Sellers can save the time and money that would have been used to set up a large website
and spend it on marketing their products, and they can do mini test sale runs to see if a
product sells well
B2B marketplaces have more government regulations and more complex taxation; shipping is
also complicated and expensive for large orders
Online Stores:
Online stores are websites for a certain shop/chain to sell their products and services
online.
Orders are placed much as they are in a physical shop: by browsing the online catalogue,
adding selected items to a virtual cart and then checking out online.
Customers can look at a wide range of shops online and compare prices
Customers don't need to spend extra money on travelling, making shopping online
cheaper and faster
Items are usually cheaper since no high-street store is required and staff wages are
lower
Shopping can be done at convenience without being rushed
Reviews for services and products can be found instantly online
Method for checking out online: the contents of the virtual cart are confirmed, the
customer enters (or logs in to retrieve) delivery details, chooses a delivery option,
enters payment details, the payment is authorised electronically, and an order
confirmation is shown and emailed to the customer.
Advantages of Online System:
Easier to maintain and upgrade, as banks etc. have less busy times when the system can
be shut down for maintenance
Errors are revealed immediately, allowing them to be worked on straight away
Useful for online money transactions
Useful in online shopping
Support and stability
Disadvantages of Online System:
Lots of online requests can be difficult to manage; some are spam, which can cause the
system to crash
May require specialized staff to manage the online systems which increases costs
Failure of network can cause the system to go down
Requires information to be entered immediately, making the system expensive to run
Real Time Processing
A real-time processing system is one where data is processed as soon as it has been input and
output is generated immediately. The processing takes place continuously and only stops
when the system is turned off by the user.
Examples:
computer games
traffic lights
green houses
Some real-time systems use a feedback loop, where the output directly affects the input. They
make use of a microprocessor and sensors: the sensors measure physical variables and send the
readings to the microprocessor, which compares them with a stored value. If a reading is, for
example, greater than the stored value, the microprocessor sends control signals to an actuator,
which turns the relevant device on or off. This immediately affects the new readings the sensors
pick up, e.g. in air-conditioning systems. Feedback is basically when the output of the system
affects the new input: raising the air-conditioning temperature will raise the temperature of the
room, so the new inputs will differ.
Air-Conditioning Systems:
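A sketch of the air-conditioning feedback loop just described, with hypothetical read_temperature() and set_cooling() functions standing in for the sensor and actuator:

    import time

    SET_POINT = 22.0                       # stored value the microprocessor compares against

    def read_temperature():
        # Hypothetical sensor read; real hardware access would go here
        return 24.3

    def set_cooling(on):
        # Hypothetical actuator control: switches the compressor on or off
        print("cooling on" if on else "cooling off")

    while True:                            # processing is continuous until the system is switched off
        reading = read_temperature()       # sensor sends the physical variable to the microprocessor
        set_cooling(reading > SET_POINT)   # the output (cooling) will in turn affect future readings (feedback)
        time.sleep(5)                      # check again after a short interval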
Rocket Guidance System:
A rocket guidance system makes use of real-time processing. As the rocket is launched it could
veer off course (divert from its path) and hence crash. This is where sensors come in: they
measure the relevant variables and send the readings back to the microprocessor, which compares
them against stored values. The microprocessor immediately sends appropriate control commands
to actuators to rotate the rocket back on course. Here the output (rotating the rocket) affects the
new input (the rocket back on its path) to the control system. As the rocket moves, its position
also constantly changes, so the processing is done continuously to ensure the rocket stays on its
path or readjusts its path according to the situation; any delay in receiving instructions could
cause the rocket to veer off course or crash. The guidance system therefore provides stability for
the rocket and controls its movement.
Advantages of Real Time Processing:
has fast, real-time analysis/processing
information is always up to date, allowing the computer/microprocessor to take immediate
action
data is collected instantaneously
Disadvantages of Real Time Processing:
occupies the CPU constantly, hence it can be expensive (uses constant power)
requires expensive and complex computer systems
difficult to maintain as it has no down time