The first part of this series discussed the heterogeneity of data projects and how a uniform approach can help home in on the solution. It also covered the first two elements: refining the question and identifying the right data. Here we tackle the next two elements.
Plan Your Approach
At this step, we begin to go into the technicalities of data science. This post is not designed to go into the detail of each approach, but it will attempt to ask the relevant questions.
How will you process the data that you now possess? In almost all cases, this step will involve data wrangling (also known as data munging or data cleaning). To determine what “clean” form your data must take for proper analysis, it is important to determine the transformations and algorithms necessary for your question.
Big Data has become a radiology buzzword (machine learning, AI, and disruptive innovation are also up there).
However, there is a real problem with using the term Big Data: it isn’t just one set of data problems. Big Data is a conglomerate of different data challenges: the volume, heterogeneity, and velocity of data are all important dimensions. Machine learning and the internet of things are other layers superimposed on the big data problem.
Sometimes it is helpful to step back and approach data problems with a common framework, a way to think about how and which facets of data science fit in a real-life workflow in the face of an actual problem.
Below is a 6-element framework that helps me think about data-driven informatics problems. The elements are generally in chronological order, but they are not “steps” because you will frequently find yourself going back and redefining many things. However, the framework helps you maintain a big-picture outlook. The reason any sufficiently complex data problem requires a team approach.