Data warehousing
(CS614)
Assignment # 3 (GRADED)
Total marks = 20
Deadline Date: 16-Feb-2015
Please carefully read the following instructions before
attempting the assignment.
Rules for Marking
It should be clear that your assignment will not get any credit if:
- The assignment is submitted after the due date.
- The submitted assignment does not open or the file is corrupt.
- The assignment is copied. Note that strict action will be taken if the submitted assignment is copied from the internet or any other source.
1) You should consult the recommended books to clarify your concepts, as the handouts are not sufficient.
2) You are supposed to submit your assignment in .doc format. Other formats such as scanned images, PDF, ZIP, RAR, BMP, DOCX, etc. will not be accepted.
3) You are advised to upload your assignment at least two days before the due date.
4) The assignment is graded and will contribute marks to the final grade.
Important Note:
The assignment carries 20 marks. Note that no assignment will be accepted via email after the due date in any case (whether because of load shedding, emergency electric failure, internet malfunction, etc.). Hence, refrain from uploading the assignment in the last hour before the deadline, and try to upload solutions at least 02 days before the deadline to avoid inconvenience later on.
For any query please contact: CS614@vu.edu.pk
Consider the applicant table below:
Applicant_Info
Question:
Apply all three steps of the Basic Sorted Neighborhood (BSN) method to find the duplicate records in the table. Records are considered duplicates if the value of the “Applicant_id” column is the same in these records.
Use the following rules for the key:
Key:
The key consists of the first three characters of “Applicant_id”, then the first three characters of “Applicant_Name”, and then the first two characters of the “father_Name” column.
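The key rule above can be sketched in a few lines of Python. This is only an illustration: the function name make_key and the sample values are hypothetical and are not taken from the Applicant_Info table. The rule does not say whether case or spaces should be normalized, so the sketch takes the characters exactly as they appear.

```python
def make_key(applicant_id: str, applicant_name: str, father_name: str) -> str:
    """Concatenate 3 chars of the id, 3 chars of the name, 2 chars of the father's name."""
    # Slicing never raises on short strings; it simply returns what is available.
    return applicant_id[:3] + applicant_name[:3] + father_name[:2]


# Hypothetical record, made up for illustration (not from the assignment table).
print(make_key("MC1401", "Ahmed Ali", "Ali Khan"))  # MC1AhmAl
```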
The BSN method comprises the three steps given below:
a) Create key
In step 1, you will create the key for each record according to the rules mentioned above. For this, you can add an extra column at the end of the table to show the key created for each record.
b) Sort the data
In step 2, you will sort the records on the basis of the key created in step 1.
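As a rough sketch of the sorting step, the snippet below orders a few (key, Applicant_id) pairs by their key. The pairs are hypothetical sample values, not the assignment's data, and plain lexicographic string ordering is assumed.

```python
# Hypothetical (key, Applicant_id) pairs; not the assignment's data.
keyed_records = [
    ("MC1SarKh", "MC1402"),
    ("BC1AhmAl", "BC1301"),
    ("MC1AhmAl", "MC1401"),
]

# sorted() is stable, so records with equal keys keep their original relative order.
sorted_records = sorted(keyed_records, key=lambda rec: rec[0])

for key, applicant_id in sorted_records:
    print(key, applicant_id)
```

After sorting, records with similar keys sit next to each other, which is what the merge step relies on.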
c) Merge
In step 3, take the window size (w) to be two (2). You are required to identify the similar records on the basis of the sorted key.
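One way to picture the merge step is a window of w records sliding over the key-sorted list, with each record compared only to the others inside the window. The sketch below assumes that reading of the window and uses the assignment's duplicate rule (same Applicant_id). The function name find_duplicates and the sample records are hypothetical.

```python
def find_duplicates(sorted_records, w=2):
    """Return index pairs (j, i) of records within a window of size w
    whose Applicant_id values are equal. Input must already be sorted by key."""
    pairs = []
    for i, (_, id_i) in enumerate(sorted_records):
        # Compare record i with the up-to (w - 1) records just before it.
        for j in range(max(0, i - (w - 1)), i):
            _, id_j = sorted_records[j]
            if id_i == id_j:
                pairs.append((j, i))
    return pairs


# Hypothetical, already key-sorted (key, Applicant_id) pairs.
records = [
    ("BC1AhmAl", "BC1301"),
    ("MC1AhmAl", "MC1401"),
    ("MC1AhmAl", "MC1401"),
    ("MC1SarKh", "MC1402"),
]
print(find_duplicates(records))  # [(1, 2)]
```

With w = 2, each record is compared only with its immediate predecessor in sorted order, so duplicates that the sort does not place side by side would be missed.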






