EMAIL – A SPECIAL CASE OF UNSTRUCTURED DATA
A presentation by W H Inmon
privacy is a big issue
- the US stance: the employer owns the email
- other countries: the employee owns the email
the legal staff of some corporations mandates the storage of emails
the legal staff of other corporations prohibits the storage of emails
emails are a prominent form of unstructured text
when it comes to storing emails, some corporations have a huge amount of data tied up
it is one thing to store emails; it is quite another to go back and analyze their content
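When dealing with emails, it is important to handle the attachments as well as the body of the email. In addition, it may be advisable to remove the trail of earlier emails that is often carried along inside a reply or forward, because of the multiplicative effect that occurs when emails are "piggybacked": each message in a thread repeats the text of the messages before it, so the same text gets stored and analyzed many times over.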
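As a rough illustration of the two steps above (separating attachments from the body, and cutting off the piggybacked trail), here is a minimal Python sketch using the standard library `email` package. The names `strip_trail`, `split_email`, and `TRAIL_START` are hypothetical, and the trail markers are an assumed heuristic that real mail clients will not always follow; this is not how any particular textual ETL tool does it.

```python
import re
from email import message_from_string, policy
from email.message import EmailMessage

# Assumed heuristic markers for the start of a quoted earlier message.
# Real clients vary widely; this list is illustrative, not exhaustive.
TRAIL_START = re.compile(r"^(On .+ wrote:|-{2,}\s*Original Message\s*-{2,})$")


def strip_trail(body: str) -> str:
    """Keep only the newest message text.

    Drops lines quoted with '>' and everything from the first trail marker on.
    """
    kept = []
    for line in body.splitlines():
        if TRAIL_START.match(line.strip()):
            break  # everything after this is the piggybacked trail
        if line.lstrip().startswith(">"):
            continue  # quoted line from an earlier message
        kept.append(line)
    return "\n".join(kept).strip()


def split_email(raw: str):
    """Return (new_text, attachments); attachments is a list of (filename, bytes)."""
    msg = message_from_string(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    attachments = [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
    return strip_trail(body), attachments


# Build a small sample reply with one attachment (made-up data).
sample = EmailMessage()
sample["Subject"] = "Re: invoice"
sample.set_content(
    "Approved, see attached.\n\nOn Mon, Bob wrote:\n> Can you approve this?\n"
)
sample.add_attachment(
    b"total=100",
    maintype="application",
    subtype="octet-stream",
    filename="invoice.dat",
)

text, files = split_email(sample.as_string())
print(text)
print([name for name, _ in files])
```

Keeping the attachments as a separate list lets them be routed to their own processing, while only the newest text of each message goes into the analysis, which avoids counting the same trail once per reply.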
But there is a special problem associated with email: most emails are short and written with few words. As a consequence, emails usually have very little content in them. A deep analysis of the text found in emails is almost impossible because there is simply not much substantive text to analyze.
This means that the client must be aware of the potential limitations of emails before engaging in an analytical project based on email. If the client has unrealistic expectations about what can be accomplished, the client is bound to be disappointed, regardless of the merits of the textual ETL tool or the sophistication of the project design.