Data mining is the process of extracting patterns from data. As more data are gathered, with the amount of data doubling every three years,[1] data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
While data mining can be used to uncover patterns in data samples, it is important to be aware that the use of non-representative samples of data may produce results that are not indicative of the domain. Similarly, data mining will not find patterns that may be present in the domain, if those patterns are not present in the sample being "mined". There is a tendency for insufficiently knowledgeable "consumers" of the results to attribute "magical abilities" to data mining, treating the technique as a sort of all-seeing crystal ball. Like any other tool, it only functions in conjunction with the appropriate raw material: in this case, indicative and representative data that the user must first collect. Further, the discovery of a particular pattern in a particular set of data does not necessarily mean that pattern is representative of the whole population from which that data was drawn. Hence, an important part of the process is the verification and validation of patterns on other samples of data.
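The validation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pattern is taken to be a simple "if bread then butter" rule, the two samples are invented toy data, and the function names and the 0.2 tolerance are hypothetical choices, not part of any standard method.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return None  # the rule cannot be evaluated on this sample
    return sum(1 for t in matching if consequent <= t) / len(matching)


def pattern_holds(mined, validation, antecedent, consequent, tolerance=0.2):
    """Check whether a rule found in `mined` is roughly as strong in `validation`.

    `tolerance` is an arbitrary illustrative threshold (an assumption).
    """
    mined_conf = rule_confidence(mined, antecedent, consequent)
    valid_conf = rule_confidence(validation, antecedent, consequent)
    if mined_conf is None or valid_conf is None:
        return False
    return valid_conf >= mined_conf - tolerance


# Invented toy data: the sample in which the pattern was discovered...
MINED_SAMPLE = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "eggs"},
]

# ...and a second, independent sample used only for validation.
VALIDATION_SAMPLE = [
    {"bread"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

if __name__ == "__main__":
    print(rule_confidence(MINED_SAMPLE, {"bread"}, {"butter"}))
    print(rule_confidence(VALIDATION_SAMPLE, {"bread"}, {"butter"}))
    print(pattern_holds(MINED_SAMPLE, VALIDATION_SAMPLE, {"bread"}, {"butter"}))
```

In this toy data the rule holds in every relevant transaction of the first sample but in only one of four in the second, so the check rejects it. This is the situation the paragraph above warns about: a pattern that belongs to one sample rather than to the population.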
The term data mining has also been used in a related but negative sense, to mean the deliberate searching for apparent but not necessarily representative patterns in large amounts of data. To avoid confusion with the other sense, the terms data dredging and data snooping are often used. Note, however, that dredging and snooping can be (and sometimes are) used as exploratory tools when developing and clarifying hypotheses.
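One way to see why dredging tends to turn up apparent patterns is to compute how likely it is that at least one of many tests looks "significant" purely by chance. The sketch below assumes the tests are independent and share the same false-positive rate `alpha`; real searches rarely satisfy this exactly. The function name is hypothetical.

```python
def chance_of_spurious_pattern(alpha, num_tests):
    """Probability that at least one of `num_tests` independent tests
    yields a false positive, each at false-positive rate `alpha`.

    Assumes independence between tests, which is a simplification.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 20, 100):
        print(k, chance_of_spurious_pattern(0.05, k))
```

Under these assumptions, examining 20 unrelated candidate patterns at a 5% false-positive rate gives roughly a 64% chance that at least one appears significant. For this reason, patterns found by exploratory dredging are better treated as hypotheses to test on fresh data than as findings.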
