{"id":386,"date":"2014-04-08T10:39:00","date_gmt":"2014-04-08T14:39:00","guid":{"rendered":""},"modified":"2015-11-30T07:51:16","modified_gmt":"2015-11-30T12:51:16","slug":"lying-with-big-data","status":"publish","type":"post","link":"https:\/\/jeremy-wu.info\/?p=386","title":{"rendered":"Lying with Big Data"},"content":{"rendered":"<div id=\"pl-386\"  class=\"panel-layout\" >\n<div id=\"pg-386-0\"  class=\"panel-grid panel-no-style\" >\n<div id=\"pgc-386-0-0\"  class=\"panel-grid-cell\" >\n<div id=\"panel-386-0-0-0\" class=\"so-panel widget widget_sow-editor panel-first-child panel-last-child\" data-index=\"0\" >\n<div\n\t\t\t\n\t\t\tclass=\"so-widget-sow-editor so-widget-sow-editor-base\"\n\t\t\t\n\t\t><\/p>\n<div class=\"siteorigin-widget-tinymce textwidget\"><p>About 45 years ago, I spent a whopping $1.95 on a little book titled &#8220;How to Lie with Statistics.&#8221;<\/p><p>Besides the catchy title, its bright orange cover has a comic character sweeping numbers under a rug.\u00a0 Darrell Huff, a magazine editor and a freelance writer, wrote the book in 1954.\u00a0 It went on to become the most popular statistics book in the world for more than half a century. 
\u00a0A translated version was published in China around 2002.<\/p><p>It takes only a few hours to read leisurely through the entire book of about 140 pages and 80 pictures, but it was a major reason why I pursued an education and a professional career in statistics.<\/p><p>The corners of the book are now worn; the pages have turned yellow.\u00a0 One can identify some of the social changes in the last 60 years from the book.\u00a0 For example, $25,000 is no longer an enviable annual salary; few of today\u2019s younger generation may know what a \u201ctelegram\u201d was; \u201cgay\u201d has a very different meaning now; and \u201cAfrican Americans\u201d has replaced \u201cNegroes\u201d in daily usage.\u00a0 Indicative of the bygone era, the image of a cigar, a cigarette, or a pipe appeared in at least one out of every five pictures in the book \u2013 even babies were puffing away in high chairs. \u00a0The word \u201ccomputer\u201d did not show up once among its 26,000 words.<\/p><p>Huff\u2019s words were simple, but sharp and direct.\u00a0 He provided example after example showing that the most respected magazines and newspapers of his time lied with statistics, just like the dreadful \u201cadvertising man\u201d and politician.<\/p><p>According to Huff, most humans have \u201ca bias to favor, a point to prove, and an axe to grind.\u201d\u00a0 They tend to over- or under-state the truth in responding to surveys; those who complete surveys are systematically different from those who do not respond; and built-in partiality occurs in the wording of a questionnaire, appearance of an interviewer, or interpretation of the results.<\/p><p>In Huff\u2019s time, there were no desktop computers or mobile devices; statistical charts and infographics were drawn by hand; data collection, especially complete counts like a census, was difficult and costly.\u00a0 Huff conjectured, and the statistics profession has also concurred, that the only reliable small sample is one that is random and representative where 
all sources of bias have been removed.<\/p><p>Calling anyone a liar was harsh then, and it still is now.\u00a0 The dictionary definition of a lie is a false statement made with deliberate intent to deceive.\u00a0 Huff considered lying to include chicanery, distortion, manipulation, omission, and trickery; ignorance and incompetence were only excuses for not recognizing them as lies.\u00a0 One may also lie by selectively using a mean, a median, or a mode to mislead readers, although each of them is a correct average.<\/p><p>No matter how broadly or narrowly lies may be defined, it cannot be denied that people do lie with statistics every day.\u00a0 To some media\u2019s credit, there are now fact-checkers who regularly examine stories or statements, most of them based on numbers, and evaluate their degree of truthfulness.<\/p><p>In the era of Big Data, lies occur at higher velocity, in bigger volume, and with greater variety.<\/p><p>Moore\u2019s law is not a legal, physical, or natural law, but a loosely fitted regression equation on a logarithmic scale.\u00a0 Each of us has probably won the Nigerian lottery or its variations via email at least a few times.\u00a0 While measures of gross domestic product or pollution are becoming more accurate because of Big Data, nations liberally use their aggregate or per capita average, depending on which favors their point of view.<\/p><p>Heavy mining of satellite, radar, audio-message, sensor, and other Big Data may one day solve the tragic mystery of Malaysia Airlines Flight MH370, but the many pure speculations, conspiracy theories, accusations of wrongdoing, and irresponsible lies quoting these data have mercilessly added anguish and misery to the families of the passengers and the crew.\u00a0 No one seems to be tracking the velocity, volume, and variety of the false positives that have been generated for this event, or for other data mining efforts with Big Data.<\/p><p>The responsibility is of course not on the data; it is on the 
people.\u00a0 There is the old saying that \u201cfigures don\u2019t lie, but liars figure.\u201d \u00a0Big Data \u2013 in terms of advancing technology and the availability of massive amounts of randomly and non-randomly collected electronic data \u2013 will undoubtedly expand the study of statistics and bring our understanding and governance to new heights.<\/p><p>Huff observed that \u201cwithout writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense.\u201d \u00a0Today many statisticians are still using terms like \u201cType I error\u201d and \u201cType II error\u201d in promoting statistical understanding, while these concepts and underlying pitfalls are seldom mentioned in Big Data discussions.<\/p><p>At the end of his book, Huff suggested that one can try to recognize sound and usable data in the wilderness of fraud by asking five questions: Who says so? How does he know? What\u2019s missing? Did somebody change the subject? Does it make sense?\u00a0 They are not perfect, but they are worth asking.\u00a0 On the other hand, healthy skepticism should not become overzealous in discrediting truly sound and innovative findings.<\/p><p>Faced with the self-raised question of why he wrote the book, especially with a title and content that provide ideas for using statistics to deceive and swindle, Huff responded that \u201c[t]he crooks already know these tricks; honest men must learn them in defense.\u201d<\/p><p>How I wish there were a book about how to lie with Big Data now!\u00a0 In the meantime, Huff\u2019s book remains as enlightening as it was 45 years ago, although the price of the book has gone up to $5.98 and is almost matched by its shipping cost.<\/p><p>Jeremy S. Wu, Ph. 
D.,\u00a0<a href=\"mailto:jeremy.s.wu@gmail.com\">jeremy.s.wu@gmail.com<\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>About 45 years ago, I spent a whopping $1.95 on a little book titled &#8220;How to Lie with Statistics.&#8221;Besides the catchy title, its bright orange cover has a comic character [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":412,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,6,18],"tags":[8,22,86,85],"class_list":["post-386","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-big-data","category-statistics","category-statistics-2-0","tag-data-quality","tag-lies","tag-random","tag-sampling"],"_links":{"self":[{"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=\/wp\/v2\/posts\/386","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=386"}],"version-history":[{"count":11,"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=\/wp\/v2\/posts\/386\/revisions"}],"predecessor-version":[{"id":750,"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=\/wp\/v2\/posts\/386\/revisions\/750"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=\/wp\/v2\/media\/412"}],"wp:attachment":[{"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=386"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=386"},{"taxonomy":"post_t
ag","embeddable":true,"href":"https:\/\/jeremy-wu.info\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=386"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}