Read Online Practical Data Science with R Nina Zumel John Mount Jim Porzak 9781617291562 Books
By
Dale Gilbert on Thursday, May 9, 2019
Product details
- Paperback 389 pages
- Publisher Manning; 1st edition (April 13, 2014)
- Language English
- ISBN-10 1617291560
Practical Data Science with R Nina Zumel John Mount Jim Porzak 9781617291562 Books Reviews
- Practical Data Science is a fun, interesting book. There are parts that lost me, like watching a ball under the three shells, but the 70% of the book that I DID get is remarkable. I learned many things that I will put to use. Well worth the price of the book. For example, I loved the lookup vectors to change values, and the graphs are very interesting. This book does not waste your time. Hopefully soon I can grasp everything. Good job. I recommend it for all R users. I love that the examples are about real business situations and not plant life.
- This book is what I was looking for for my new job as a Credit Risk Data Modeler (basically data science applied to credit problems). Its coverage is broad, but deep and applied enough that I have been able to apply its contents in my day-to-day work. I look forward to a second edition which will hopefully rectify the following:
Earlier in the book it seemed the authors took great pains to explain in layman's terms the various statistical elements of the topic they were covering. They provided very clear and meaningful explanations which made a lot of sense of complex topics. But later in the book it seemed that this approach largely went out the window and they started using more technical boilerplate to describe the various statistical tests and procedures. Rather than give the technical boilerplate (as you'd see it in a textbook) and then elaborate on it with a more human-centric explanation, they would just leave it at the nearly impenetrable technical description and then proceed to explain how to conduct the test or procedure in R. But without an understanding of what you're trying to accomplish and why, it's hard to write the code to actually do it. Keep in mind that I'm relatively well prepared for this book too, having had as much stats and econometrics as I could fit into my four-year degree. If I found some sections of the book too technical to understand, then it seems likely that the book would benefit from some additional explanation and discussion in those later sections.
Also, I have a good deal of "boots on the ground" experience with this book in my attempts to apply it in my daily work. I've found that it is useful, but could be more useful if there were more discussion of various practical problems. For instance, much of my work is focused on producing a predictive model of the likelihood of charge-off: if we approve and fund this application, how likely is it to perform or charge off? The book shares some high-level approaches to finding problems in data (using plots and summaries), fixing those problems using various techniques, selecting variables, and conducting the statistical modeling (logistic in my case). But it fails to really tie those areas together beyond the high level. For instance, what are the assumptions of a logistic regression? How do you resolve issues in your data to ensure that you meet those assumptions and can perform a valid logistic regression? How do you really select variables when you're faced with at least 20 possibilities (and potentially many, many more if you count interaction terms, unfixed variables, and variables which have been fixed in different ways)?
I suppose, for what it was, that it is "mission accomplished." I'd just like to see a lot more. Perhaps there's need for a second volume? Perhaps "Advanced Practical Data Science with R"? Alternatively, this book could have a second edition with a lot more content covering: finding data problems; resolving those problems intelligently (for instance, resolving missing data is basically left as "either drop the affected records" or "use the mean as a replacement for the missing value," but there are alternative methods which may be more suitable); what data problems will cause issues in OLS regression, logistic regression, and machine learning; and how to practically select variables and a model.
I feel like the book gave me some tools to apply (like a small box of tools you might purchase from a hardware store), but left a lot out. So now I'm in deep water trying to figure out why my logistic regression isn't predictive enough and what I can do about it. Is it the data and how I fixed variables? Is it the variables I've selected? Should I have used automated variable selection techniques? Or just manually tried different variables? How does an experienced practitioner approach these problems? I know they iterate: explore data, clean data, select variables, select model, test model, look at data, change data, change variables, etc. But practically speaking, what does it look like?
In the book they offer a hand-coded basic variable selection script, and mention that one could also use stepwise variable selection. In the real world I'm reasonably sure that this is not actually done, mostly because their selection script does about as well as stepwise at selecting appropriate variables. There are many other, better ways of selecting variables, I've discovered, and I wish that they'd discussed some of those ways (pros and cons), and shown how to conduct them in a meaningful fashion. Same thing with building a model.
In my case, I have a whole bunch of variables, limited data (about 2000 records, with the desired outcome only occurring in 120 of those), and the automated tools (various R packages I've discovered and applied) either take a long time to run and/or yield poor results. But if not automated tools then what? Manually add variables and ANOVA test the difference between the first and second model?
I'd just like more: more discussion, elaboration, and examples of how practical data science is conducted. This book seems like it does a fantastic job as an introduction to the topic, but you'll quickly find that you're in deep water without a clue how to swim, as in my case. You'll be left to your own devices, and find yourself wishing, as I do, that there was more in the book (or another book) that you could study after this one to help take you from beginner data scientist to intermediate.
Overall, I'm very glad I bought and read the book.
- Hands-on textbook that covers pretty much all aspects of data science (with keen attention to business demands). Importantly, it doesn't shy away from discussing statistical details behind the most common routines in machine learning, which I really appreciated as I was tired of typical DS books that take a black box approach that just shows "how" without explaining the "why". I think it's worth having a copy of this book irrespective of whether you are a beginner to data science or a veteran. Highly recommended!
If you buy a hard copy, I would also recommend having a look at the colorful figures in the companion soft copy of the book.
- Love this book. Anyone interested in data science should get this hands-on experiential learning book.
- The book has a lot of good examples and gets you jump started on being productive.
- Great, very detailed book! I am an analytics professional looking to start using R programming to leverage my data the best way I can.
- Good Read
- An awesome book that's helped me on occasion with my coursework and my job.
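
One of the reviews above describes two basic ways of handling missing data: drop the affected records, or use the mean as a replacement for the missing value. For readers unfamiliar with the idea, here is a minimal sketch of both. It is written in Python for illustration and is not taken from the book; the records, the `income` field, and the function names `drop_missing` and `impute_mean` are all hypothetical.

```python
from statistics import mean

# Hypothetical records; None marks a missing income value.
records = [
    {"id": 1, "income": 52000.0},
    {"id": 2, "income": None},
    {"id": 3, "income": 48000.0},
    {"id": 4, "income": 61000.0},
]


def drop_missing(rows, field):
    """Strategy 1: drop every record whose field is missing."""
    return [r for r in rows if r[field] is not None]


def impute_mean(rows, field):
    """Strategy 2: replace each missing value with the mean of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    # Build new dicts so the input rows are left unchanged.
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


dropped = drop_missing(records, "income")
imputed = impute_mean(records, "income")

print(len(dropped), "records kept after dropping")
print("imputed income for id 2:", imputed[1]["income"])
```

Dropping shrinks the data set, which matters when there are only a couple of thousand records as in the reviewer's case, while mean replacement keeps every record but flattens the variation in the filled field.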