Kaggle is a website that hosts coding competitions in machine learning, big data, and all things data science.
Newly launched on Kaggle is a healthcare-related competition! A group of health institutions provided a large data set of raw EEG tracings from three patients, both interictal and preictal (up to 1 hour before a seizure). The goal? Predict which “unknown” EEGs are preictal so healthcare providers can intervene.
Also, with the timely arrival of the Internet of Things (IoT), wearables, and big data, can you imagine the impact of giving patients an accurate 5-minute warning every time a seizure is about to start?
Unfortunately, we’re still a ways from 100% accuracy. There have been many PubMed-worthy papers (recently this one and this one) achieving >95% sensitivity and specificity on this popular computational epilepsy problem. 95% sounds great, but in a wearable that monitors around the clock, a 5% error rate still means many, many false-positive detections.
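To see why 95% is not enough, here is a rough back-of-the-envelope sketch. The 5-minute window length and the one-seizure-per-week frequency are illustrative assumptions of mine, not numbers taken from the competition or the papers, and the function names are made up for this example.

```python
def false_alarms_per_day(specificity, window_minutes=5):
    """Expected false alarms per day if every window is seizure-free."""
    windows_per_day = 24 * 60 / window_minutes
    return windows_per_day * (1 - specificity)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of alarms that are real, by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumption: one preictal window per week, out of all 5-minute windows.
windows_per_week = 7 * 24 * 60 / 5
prevalence = 1 / windows_per_week

alarms = false_alarms_per_day(0.95)
ppv = positive_predictive_value(0.95, 0.95, prevalence)
print(f"False alarms per day: {alarms:.1f}")
print(f"Chance a given alarm is real: {ppv:.2%}")
```

Under these assumptions, the wearable would buzz roughly 14 times a day for nothing, and fewer than 1 in 100 alarms would precede an actual seizure.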
So the challenge is here: This time, the data is publicly available, and the solution will be crowd-sourced with a $20,000 prize. Do you have what it takes to beat out the world’s machine learning gurus?
Predict seizures in long-term human intracranial EEG recordings
Epilepsy afflicts nearly 1% of the world’s population, and is characterized by the occurrence of spontaneous seizures. For many patients, anticonvulsant medications can be given at sufficiently high doses to prevent seizures, but patients frequently suffer side effects. For 20-40% of patients with epilepsy, medications are not effective. Even after surgical removal of epilepsy, many patients continue to experience spontaneous seizures. Despi
Source: Melbourne University AES/MathWorks/NIH Seizure Prediction | Kaggle
Crowd-Sourcing, Machine Learning, and Radiology
Before you write this off and think that the complexity of radiology is far above that of an EEG, think again. The National Institutes of Health (technically the National Heart, Lung, and Blood Institute) also hosted a Kaggle competition on cardiac MRI back in March of 2016.
The challenge: automatically calculate ejection fraction from stacks of cardiac MRI images.
The grand prize: $200,000.
Who Won? Who’s Next?
Those who wish to compete must figure out how to computationally segment the left ventricle (LV) on a cardiac MRI without human input, identify end-systole and end-diastole, then extrapolate the LV volume at each.
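Once the LV has been segmented on every slice, the arithmetic itself is simple. Here is a minimal sketch, assuming the segmentation (the hard part) has already produced an LV area for each short-axis slice in each time frame. The function names and the numbers are made up for illustration, and this is not the winning team's method. Ejection fraction is (EDV - ESV) / EDV, where EDV and ESV are the end-diastolic and end-systolic volumes.

```python
def lv_volume_ml(slice_areas_mm2, slice_spacing_mm):
    """Approximate LV volume by summing area * spacing over the slices."""
    volume_mm3 = sum(area * slice_spacing_mm for area in slice_areas_mm2)
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


def ejection_fraction(areas_by_frame, slice_spacing_mm):
    """areas_by_frame holds one list of per-slice LV areas per time frame."""
    volumes = [lv_volume_ml(areas, slice_spacing_mm) for areas in areas_by_frame]
    edv = max(volumes)  # end-diastole: the frame with the largest LV volume
    esv = min(volumes)  # end-systole: the frame with the smallest LV volume
    return (edv - esv) / edv, edv, esv


# Made-up areas (mm^2) for 3 slices across 3 time frames, 10 mm apart.
frames = [
    [1500, 1800, 1200],
    [1000, 1200, 800],
    [700, 800, 500],
]
ef, edv, esv = ejection_fraction(frames, slice_spacing_mm=10.0)
print(f"EDV {edv:.0f} mL, ESV {esv:.0f} mL, EF {ef:.0%}")
```

Picking end-diastole and end-systole as the largest and smallest volumes in the cycle is a simplification. A real pipeline also has to cope with missing slices, variable spacing, and segmentation errors.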
You would think software engineers behind established cardiac MRI software like Syngo.via would blow away all the competitors. This was not what happened.
Qi Liu and Tencia Lee walked away with the grand prize, topping almost 1,400 other algorithms. Their health care background? Zip. They use extensive data analytics in their daily work in hedge fund management, and they applied the same data-driven rigor to MRI images they can’t read themselves. Their algorithm provides a fully automated LV contour, and its calculated LVEF is similar in accuracy to that of trained cardiac imagers.
Conclusion
If you still think data science and machine learning are decades away from competing against highly trained professionals, it’s time to rethink your paradigm. Data science competitions are no longer just geeky platforms for the programming hobbyist. The NIH now recognizes them as a powerful way to tap into the collective cross-disciplinary wisdom of the finance and tech industries.