Learning from machine learning: ensembling, and other important skills

In my downtime, I’ve been using Kaggle to get better at applying machine learning to solve problems. The process is not only teaching me new technical skills, but also reminding me of some useful principles that can be applied elsewhere. To keep things digestible, this is the second post of two (the first one is here).

A short list of important skills for a data scientist

When trying to get better at a skill, I try to tackle the highest leverage points first. Here’s what I’ve been able to gather about three skills that are important in being a data scientist*, from talking with others, reading about machine learning, and experiencing it firsthand with the client projects I do.

  1. Feature engineering
  2. Communication (includes visualization)
  3. Ensembling

The first two are relatively self-explanatory; ensembling, in my opinion, brings some pretty interesting concepts that apply to decision-making more broadly.

*I’ll be referring to the “applier of machine learning” aspect of “data science”.

Feature engineering

Feature engineering is the process of cleaning, transforming, combining, disaggregating, etc., your data to improve your machine learning model’s predictive performance. Essentially, you’re using existing data to come up with new representations of it in the hopes of providing more signal to the model. (Feature selection is the complement: removing less useful features feeds the model less noise, which is also good.) The practitioner’s own domain knowledge and experience play a big role here in engineering features in a way that improves the model’s performance instead of hurting it.

There are a few tactics that can be generally applied to engineer better features, such as normalizing the data to help certain kinds of machine learning models perform better. But usually, the largest “lift” in performance comes from engineering features in a way that’s specific to the domain or even problem.
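For instance, here’s a minimal sketch of the normalization tactic using scikit-learn’s StandardScaler (the feature values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical raw features on very different scales:
# annual income (dollars) and age (years).
X = np.array([
    [55000, 34],
    [120000, 52],
    [38000, 27],
])

# StandardScaler rescales each column to zero mean and unit variance,
# so scale-sensitive models (SVMs, k-NN, regularized linear models)
# treat the features comparably.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)
```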

An example is using someone’s financial data to predict their likelihood of defaulting on a loan. You might have the person’s annual income and monthly debt payments (e.g. for auto loans, mortgages, credit cards, and the new loan they’re applying for), but those somewhat closer to the lending industry will tell you that a “debt-to-income ratio” is a better metric for predicting default, because it essentially measures how capable the person is of paying off his/her debt, all in one number. After calculating it, a data scientist would add this feature to the training data, and would find that their machine learning model performs better at predicting default.
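That step might look like the following in pandas (a sketch; the column names are hypothetical):

```python
import pandas as pd

# Hypothetical loan application data.
df = pd.DataFrame({
    "annual_income": [60000, 95000, 42000],
    "monthly_debt_payments": [1500, 1200, 1800],
})

# Debt-to-income ratio: monthly debt obligations divided by monthly
# gross income, collapsing two raw columns into one domain-specific signal.
df["debt_to_income"] = df["monthly_debt_payments"] / (df["annual_income"] / 12)
```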

As such, feature engineering (and in fact, most of machine learning) is more of an art than a science, where a creative spark for an innovative way to engineer a domain-specific feature is more effective than hard and fast rules. They say feature engineering can’t be taught from books, only learned through experience, which is why I think Kaggle is in an interesting position: they’re essentially crowdsourcing the best machine learning methodologies for all sorts of problems and domains. There’s a treasure trove of knowledge on there, and if it were structured a little better, Kaggle could contribute a lot to machine learning education.


What potentially useful features/data could we engineer from timestamp strings? We could generate numeric columns for year, month, day, day of week, etc.–much more digestible for a machine learning model.
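For instance, a minimal pandas sketch (the column name is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"created_at": ["2016-05-27 17:40:51", "2016-01-03 09:15:00"]})

# Parse the strings once, then expand them into numeric columns a model can use.
ts = pd.to_datetime(df["created_at"])
df["year"] = ts.dt.year
df["month"] = ts.dt.month
df["day"] = ts.dt.day
df["day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
df["hour"] = ts.dt.hour
```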

Communication

During a recent chat with one of the core developers of the Python scikit-learn package, I asked what he thought some of the most important skills for a data scientist are. I sort of expected technical skills, but one of the first things that came up was communication: being able to convey findings, and why those findings matter, to both internal and external stakeholders, like customers. This one’s self-explanatory: what good is data if you can’t act on it?

In fact, it seems like communicating well might be even more important for data scientists than for professions like programmers or designers, because there’s a larger gap between result and action. For example, with a design or an app, a decision maker can look at it or play around with it to understand it well enough to make a decision, whereas a decision maker usually can’t just look at a bunch of numbers spit out by a machine learning model and know what to do: how are those numbers actionable, and why should anyone believe them? Visualization is a piece of this: choosing the right charts, design, etc. to communicate your data’s message most effectively.

Ensembling

In machine learning, an ensemble is a collection of models that can be combined into something that performs better than the individual models.

An example: one way this is done is via the voting method. The different base, or “level 0”, models each make a prediction on, say, whether a person is going to go into default in the next 90 days. Model A predicts “yes”, model B predicts “yes”, and model C predicts “no”. The final decision then becomes the majority vote, here “yes”.
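In scikit-learn, hard majority voting looks roughly like this (a sketch with synthetic data and an arbitrary choice of base models):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for a binary "will this person default?" dataset.
X, y = make_classification(n_samples=500, random_state=0)

# Three level 0 models; voting="hard" takes the majority class vote.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```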

There are many other ways of ensembling models together. An important and powerful one is called stacking, which means applying another machine learning model–called a “generalizer”, or “level 1” model–to the predictions of the base models themselves. This improves on the voting method because the level 1 model learns, from the training data you feed into the system, which level 0 models to believe more than others, instead of arbitrarily saying “the majority rules”.

 


A high-level flow chart of how stacking works.
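Here’s a hedged sketch of stacking using scikit-learn’s StackingClassifier (added to the library well after this post was written); the level 1 generalizer is a logistic regression trained on the level 0 models’ cross-validated predictions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Level 0 base models.
level0 = [
    ("rf", RandomForestClassifier(random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# The level 1 generalizer learns, from cross-validated level 0
# predictions, how much to trust each base model, instead of a
# simple majority vote.
stack = StackingClassifier(
    estimators=level0, final_estimator=LogisticRegression(), cv=5
)
stack.fit(X, y)
print(stack.predict(X[:5]))
```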

Ensembling is a key technique in machine learning for improving predictive performance. Why does it work? We all have an intuitive understanding of why it should, because it’s a decision-making framework we’ve all probably used, or been a part of, before. Different people know different things, and so may make different decisions given a particular problem. When we combine them in some way–like a majority vote in Congress or at the company we work at–we “diversify” away the potential biases and randomness that come from following just one decision maker. Then, if you add in some mechanism to learn which decision makers should have their decisions weighed more than others based on past performance, the system can become even more predictive. What areas could benefit from this improved, performance-based decision-making process?*

*Proprietary trading companies, where every trade is a data point and data is thus generated very frequently, do this more intelligent form of ensembling, in a way, by allocating more money to traders who’ve performed better than others historically. A trader who is maybe only slightly profitable but makes uncorrelated trades–for example, by trading in another asset class–will still be given a decently sized allocation, because his trades hedge other traders’ trades, improving the overall performance of the prop trading firm. Analogously, in machine learning, ensembling models that make uncorrelated predictions improves overall predictive performance.

Resources

Here are some resources related to the topics described above that were recommended to me and that I found most useful; I hope they’re helpful to you too.

  • A good overview of the principles of data science and machine learning for non-technical and technical folk alike: Data Science for Business
  • Code example of stacking done with sklearn models
  • An important thing for a data scientist to have before any of the stuff above is a good understanding of statistics; Elements of Statistical Learning is a detailed survey of the statistical underpinnings of machine learning.

Learning from machine learning: deliberate practice

In my downtime, I’ve been using Kaggle to get better at applying machine learning to solve problems. The process is not only teaching me new technical skills, but also reminding me of some useful principles that can be applied elsewhere. To keep things digestible, this is the first post of two.

Deliberate practice, with Kaggle

Deliberate practice–practice that is repeatable, hard, and has fast feedback (e.g. with a coach)–is needed to master any skill. Kaggle provides a great medium for deliberate practice in machine learning: you can still work on the problems from old competitions, read about what the top performers did, and get instant feedback on how well your machine learning model performed vs. other people’s.


Aside from accessible deliberate practice, self-learning this way has another big benefit over some of the in-person data science/machine learning classes I’ve observed: the student has control. I can learn as fast or as slow as I need to. I can learn about what I want: not only about what I find most interesting, but about what the top performers on Kaggle and other experts are doing to be successful.

I attempt to solve a machine learning problem on Kaggle, see how I performed, read about and take notes on what the top performers did, and fill in my knowledge gaps with lots of research on Google, continuously cycling between writing down questions about new terms or concepts that come up and answering them. The self-paced, deliberate nature of this learning avoids what Sal Khan calls “Swiss cheese gaps” in education–though of course, it is up to the learner him/herself to stay disciplined and engaged.


The “cycle” of deliberate practice described above. Important things to note: it is closed, which allows for learning from feedback, and it is fast, which allows that learning to happen quickly and to be timely.

Something like Khan Academy provides a great structure for self-paced, deliberate-practice-oriented learning in more “traditional” academic topics. I see opportunity for more things like it in other educational areas. Also, if anyone has found any helpful tools for self-learning, I’d love to hear about them. I personally use Google Docs for a lot of note-taking, mind42 for topic hierarchies, pinboard to keep track of my online research, and sometimes Quizlet to help me memorize things.

Next: 80/20-ing machine learning

In the next post, I will get slightly more technical and into some of the “highest leverage” machine learning concepts and skills, as well as share some resources (including advice from one of the most helpful machine learning educators and practitioners I’ve had the pleasure to interact with). There should also be at least one principle/mental model for those less interested in the technicals of machine learning. As always, please be critical and feel free to discuss anything and everything; I love learning from other perspectives.

My Attempt To Make Clinical Trials More Efficient

Screenshot of the prototype

For a few months, on nights and weekends while working at my most recent job, I worked on a project to help make clinical trials more efficient, and even built a prototype (the screenshot above; you can play around with it here). I gave it the memorable and exciting name “Clinical Research Network”.

Though my project didn’t “succeed” in the traditional sense, I learned a lot about this interesting area of health/biotech, and got to practice several important product development skills. The following are the important parts of the story–fair warning, it’s still a long post.

Clinical trials have a hard time recruiting enough patients, which causes a lot of waste.

I received an email from HeroX one day about a competition to see who could come up with the best idea to help clinical trials recruit more patients. Intrigued, I did more research on the problem, and decided to enter the competition: worst case I would spend a little time writing a proposal that didn’t win, but still get to learn more about this fascinating problem.

As discussed in a previous post, roughly 10% of clinical trials terminate unsuccessfully because they’re unable to recruit enough patients for the study. There are roughly a thousand new clinical trials every year, and since a clinical trial costs on average $30M-$40M, a lot of money is spent on clinical trials that don’t end up contributing much to the advancement of science and medicine.*

The HeroX competition’s more quantifiable goal was to come up with ideas that could double the patient recruitment rate from 3% to 6%, the patient recruitment rate being defined as the number of patients who participate in clinical trials divided by the total number of patients out there. The more patients who participate in clinical trials, the faster medical research accelerates.

*The numbers used to “size up” the problem are very rough, and taken from various sources. My model also did not account for the fact that a lot of clinical trials that do complete successfully still have trouble recruiting patients fast enough, and so go way over schedule and over budget. But the order of magnitude should be close. See the model for more details.

Questioning assumptions, asking why

The problem was framed so that solutions tackling recruitment came to mind first, e.g. increasing patient awareness of clinical trials through tools and advertising, or connecting patients to clinical trials automatically by leveraging EMR data.

But I wanted to understand the problem at a deeper level instead of taking things at face value. I put together a simple model in Google Sheets and let the numbers shed some light on the problem. Interestingly, even if all clinical trials were able to recruit enough patients with the wave of a magic wand, the patient recruitment rate would only increase by 4%, much less than the competition’s desired 100% increase, or doubling, of the patient recruitment rate. This suggests that if we really want to accelerate medical research and get more of the patient population to participate in clinical trials, we’re not only going to need to recruit patients better; we’ll also need a lot more clinical trials, ones that happen faster and more efficiently.

Screenshot of Patient Recruitment Model

I wrote a proposal for the competition, submitted it, and…

What idea did I submit?

An idea for a SaaS product that would mine/learn from all the data we have on previous clinical trials (a lot of it public) and help pharmaceutical companies and investigators learn from the past. The product would essentially be a search engine on top of a “similarity graph”: pharma and/or doctors/investigators could describe their clinical trial, see other trials that were similar in some way (perhaps in disease treated, or in inclusion/exclusion criteria), and learn from what made those clinical trials succeed or fail.

Why did I submit that?

  1. There’s a lot of data out there on clinical trials, even publicly available data like clinicaltrials.gov. There has to be some sort of knowledge we can learn from all the clinical trials we’ve already conducted, from both the successes and failures.
  2. Clinical trials face many different obstacles to recruiting patients, mostly because they themselves are very different–different populations, different diseases, different treatments, different investigators running the trial, different locations. But this doesn’t mean that trials aren’t similar to other trials in some way, so something that worked for one trial could also work for another, depending on how they’re similar.
  3. As mentioned before, I realized that the actual clinical trial process needs to be faster, more efficient, and cheaper to drive a meaningful acceleration of medical research. This was a tool that pharma and investigators/doctors could use to both plan and run a clinical trial more efficiently.

My idea didn’t win any of the prizes for the competition, but that’s ok.

If interested, you can see the winning entries (as well as the “top 10”, not sure where all the other entries went).

Getting out of the office

I asked for feedback on how my entry was judged, but didn’t get anything back. Still following my curiosity about the problem, I decided to talk to more people actually involved in clinical trials–I had originally found out about the competition two weeks before the deadline, so with some more time I felt I could come up with something more useful.

I developed a script to scrape clinicaltrials.gov for investigator contact info, and was able to gather a good list of physicians in the NYC area. I also used Mechanical Turk to fill in what I wasn’t able to scrape, such as a doctor’s research institution. After writing a bunch of emails requesting to meet, one doctor actually got back to me! After that it was a bit easier, as I would ask the doctors if they knew anyone else I could talk to, and also name-drop the institutions I had already visited. I got to speak with a couple of ex-pharma individuals through this effort too.

The two biggest things I learned from speaking to the handful of physicians and ex-pharma folk:

  1. Physicians don’t really talk to and learn from each other when it comes to clinical trials, e.g. about patient recruitment best practices. They’re extremely busy, and there isn’t really an incentive to help another physician who may be seen as a “competitor” (both in terms of revenue and research).
  2. Though investigators (physicians) recruit patients for a clinical trial, pharma and “contract research organizations” (CROs) recruit the investigators to run a clinical trial (among a ton of other stuff to set up and support the trial). It seemed that industry’s methods for investigator selection were pretty manual: they would rely on their own personal, immediate networks, maybe look at which investigators they worked with in the past.

Building something fast

I decided to build an MVP based on my learnings. There’s a lot that can be improved in the clinical trials process, so I thought about leverage, and about a decision tree: decisions made earlier in a process can have a big impact on the decisions made later. The early task of “investigator selection” that pharma does when setting up a clinical trial (point 2) sounded like a good one to tackle with technology. It also isn’t something that investigators themselves are super concerned with, which would get around the obstacles discovered in point 1. There’s a lot of public data out there on clinical trials (clinicaltrials.gov) and the research that came out of them (PubMed), so I wanted my tool to leverage this data.

I threw together something really quickly using Flask, the Python web framework. Use cases: pharma could type in a drug and find the researchers who had published the most research on that drug–those physicians might be good candidates as investigators for a clinical trial that used that drug (perhaps to treat a different disease). Patients could type in the disease they had and find the physicians who were perhaps the most knowledgeable about that disease. On the backend, data was scraped from PubMed, and essentially just restructured to be more useful for this particular case.
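The core of the search endpoint might have looked something like this minimal Flask sketch (the route, data shape, and names are hypothetical reconstructions, not the actual prototype code):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical: drug name -> researchers ranked by publication count,
# precomputed from the scraped PubMed data.
RESEARCHERS_BY_DRUG = {
    "metformin": [{"name": "Dr. Example", "publications": 42}],
}

@app.route("/search")
def search():
    # e.g. /search?drug=metformin returns candidate investigators.
    drug = request.args.get("drug", "").lower()
    return jsonify(RESEARCHERS_BY_DRUG.get(drug, []))

if __name__ == "__main__":
    app.run()
```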

I started showing the “Clinical Research Network” to people in the biotech space to see what they thought…

The end?

…and I quickly found out that several companies, both small and large, were tackling this exact problem. They had way better credentials, more money, and free snacks at the office–how can I compete with free snacks?

So I put this project on hold, mulled over the possibility of working for one of those companies, and decided to move on to other ideas I was thinking about. I like writing post-mortems for my projects, and one of the biggest learnings was that I seemed to have “overextended” myself in a sense: the struggle felt like a very steep uphill climb from the beginning because I didn’t have the industry credentials and didn’t yet have the industry network, both very important in an industry like biotech and healthcare.

Overall, the project was a great learning experience, and I got to practice several problem solving skills I find powerful and fun.

Pharma Paid Physicians $6.5B in 2014 – Looking Into The Open Payments Dataset

My friend Jesse introduced me to the Open Payments dataset, which tracks the details of all payments made by “applicable” healthcare manufacturers (like pharmaceutical companies and medical device manufacturers) to any doctor they work with. A federal program maintains this database, which is a product of the Sunshine Act, part of the Affordable Care Act.

Why does this database exist? Basically because of the incentives created by industry being able to pay doctors to work on things that will ultimately help industry–like new drugs or medical devices. The hope is that more transparency will reduce any harmful influence that industry could have on medical research, education, and clinical decision making. In the words of Senator Grassley, co-author of the Sunshine Act:

Disclosure brings about accountability, and accountability will strengthen the credibility of medical research, the marketing of ideas and, ultimately, the practice of medicine. The lack of transparency regarding payments made by the pharmaceutical and medical device community to physicians has created a culture that this law should begin to change substantially. The reform represented in the Grassley-Kohl Sunshine Law is in patients’ best interest.

The healthcare industry pays physicians a lot, almost $6.5B in 2014 alone. What is being paid for though (or, what does industry report the payments are for)? Who’s getting paid, and how much? I decided to do a quick analysis to start answering these questions and to see if there was anything interesting at a high level.

Most top paid physicians get paid royalties or license fees

The most a single physician got paid in 2014 was almost $44M. The interesting thing is that for this physician and several other top-paid physicians, almost the entire total came from payments categorized in the unhelpfully named category “Compensation for services other than consulting, including serving as faculty or as a speaker at a venue other than a continuing education program” (orange).

A large majority of the other top-paid physicians got paid primarily from “Royalty or License” (green), which makes sense: a surgeon may invent a new surgical technique and license it to a medical device company.

Another interesting phenomenon is that a handful of doctors among the top 100 earners were paid by industry solely for their research (purple). The status quo of industry having all the money and thus paying for/funding research–sometimes both the design and the execution of the research–can create incentives with negative consequences for the validity of the results.

You can play around with the charts like the one below by zooming, mousing over data points to see their values, and showing/hiding different data series by clicking on each one in the legend. Physician names have been replaced with numbers for anonymity.

Chart embedded below, or link

Orthopedic surgeons received the most industry payments, followed by cardiovascular physicians

Orthopedic surgeons received the most money from industry in 2014, almost twice the amount that cardiovascular physicians received. Interestingly, most of the payments to orthopedic surgeons, and other types of surgeons, were for royalties or licenses (green), whereas most payments to physicians–cardiovascular and otherwise–were for “Compensation for services other than consulting” (orange), “Research” (purple), and “Consulting” (purple).

Click to show interactive chart (some labels are crazy long, so embedding didn’t look good; “A&O” stands for “Allopathic & Osteopathic Physicians”):
Payment Received by Physician Specialty in 2014 (Top 50)

The healthcare industry pays a lot of money for research

Out of the $6.5B in total payments to physicians in 2014, $3.2B, or almost half, went toward research. We can see this when aggregating the payments by the name of the drug or device manufacturer: companies like Genentech, Pfizer, and Novartis dominate the dollar amount of payments made to physicians, and most of their payments are for “Research” (brown). Further down the list, you can see medical device manufacturers like Stryker and Medtronic paying physicians mostly for “Royalty and License” (green).

Click to show interactive chart:

Payment Sources in 2014 (Top 50)

Physicians in CA received, by far, the most money from industry.

The graph below shows how much money physicians received for research and “general” payments (any payment that isn’t classified as “Research”), grouped by the state they work in; the size of each bubble represents the number of physicians in that state.

CA had significantly more physicians receiving payments (8,081) than the runner-up state, NY (5,981), and so physicians working in CA received a lot more money from industry, in aggregate.

Payments Received by State
Though drilling into state-by-state differences in the data (e.g. the dominant “purpose” for which CA physicians get paid vs. physicians in other states) is an exercise for another time, we get a hint for why this phenomenon might exist by looking at the teaching hospitals affiliated with the physicians who were paid the most by industry.

Click to show interactive chart:

Payments by Affiliated Teaching Hospital in 2014 (Top 50)

Physicians affiliated with the City of Hope National Medical Center in Los Angeles received the most industry payments, by far, and almost all of it from royalties or license fees (green). Genentech has been known to pay massive royalties for the drugs developed at City of Hope, including the crazy-expensive cancer treatments Herceptin and Avastin.

Do physicians get rewarded with fancy dinners and extravagant trips?

By looking at the data, we can find which physicians got paid the most for “Entertainment”, “Food and Beverage”, and “Travel and Lodging”. But we won’t know for sure, because remember, all of this payment data is reported by the healthcare industry itself, and while there are some financial penalties for inaccurate reports, I don’t see an easy way for the government to verify the validity of the data.

The “worst offenders” were essentially given three $60 meals a day, every day of the year, by industry; went on $590-per-day trips; and were given $43 a day (about $300 a week) for entertainment and fun. Sounds like the life (except a little more on the entertainment and fun, please).
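The aggregation behind numbers like these looks roughly like the pandas sketch below; the column names follow my recollection of the Open Payments schema, so treat them as assumptions:

```python
import pandas as pd

# Hypothetical extract of the general payments file.
df = pd.read_csv("payments.csv")

perks = ["Entertainment", "Food and Beverage", "Travel and Lodging"]
mask = df["Nature_of_Payment_or_Transfer_of_Value"].isin(perks)

# Total perk payments per physician, then a rough per-day rate.
per_physician = (
    df[mask]
    .groupby("Physician_Profile_ID")["Total_Amount_of_Payment_USDollars"]
    .sum()
    .sort_values(ascending=False)
)
print((per_physician / 365).head(10))  # approximate dollars per day
```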

Conclusion

There’s a lot of money being transferred from the healthcare industry to physicians, which means a ton of data, since all of it has to be reported now. In fact, I didn’t even touch another part of the dataset–how much ownership each physician has in a particular drug or device manufacturer–which could give even more color on misaligned incentives. Also, without aggregating some of the data fields, the raw, transaction/payment-level data took up close to 6GB of space, and I didn’t want to spin up a Spark cluster or something. Luckily, the Open Payments site provides a web service that let me aggregate and filter the raw data, dramatically reducing the dataset’s size.

With the Sunshine Act being first introduced in 2007, then shot down, then enacted as part of the ACA in 2010, and with the Centers for Medicare and Medicaid Services (CMS) now responsible for collecting this data on top of everything else it does, hopefully we find some useful applications for the Open Payments dataset.

This analysis and post were done pretty quickly; many thanks to Carol for giving me some immediate ideas and feedback! And to iPython Notebook, and the pandas and plotly libraries.

Learnings from being on my own

Ernest was a baller.

I’m always looking to learn and grow as much as I can, and so am now working for myself. I’m currently consulting for other businesses, doing product development and/or data analysis, since I have a generalist software + statistics background. I see it as a great way to work with different, awesome people, on different problems, while learning about different industries: it’s a way for me to take lots of little bets in my journey of doing interesting things, finding my passion/what I want to focus on, and becoming the best version of myself.

Here are some of the biggest things I’ve learned so far, even though it’s only been a short amount of time. Hopefully they are helpful and mostly generalizable, but everyone’s life is different so your mileage may vary.

1. Reflect on when in your life you’ve felt happiest and most fulfilled.

I looked back on my life and thought about when I really felt the most alive, happy, and fulfilled. For me, it came down to experiences where I manifested my dreams, despite any perceived risk. Of course, I could not have done it without the help and support of friends and family and partners-in-crime–I feel life is so much less meaningful without others–but it was not being dependent on anyone but myself in taking action to maximize my potential that made me feel fulfilled*.

For example, one of the first pieces of software I ever developed was a math flashcards application built in Visual Basic, with cheesy cartoon characters and everything. As a middle schooler who had just learned how to program, I was super proud of it and really excited whenever I got to work on it, because I had come up with the idea and it was up to me to manifest and build my own “dream”.

Another time when I felt happy and fulfilled was the period of a year or two of learning how to pick up girls. That itself is a story for another time, but again, I loved the experience of facing and overcoming perceived risk, via action, to become the best version of myself. There’s no doubt that I felt a lot of discomfort in a countless number of situations. But, especially in situations where the perceived risk is high but the real risk is low, the pain of regret usually hurts more than the pain of failure.

As a result, my overarching goal in life is to maximize the time I spend on these types of experiences.

What experiences have made you feel the most fulfilled in life?

2. Think about death.

Jeff Bezos, Steve Jobs, the ancient Stoics, and many others have used the tactic of thinking about death when examining life.

I like Bezos’s thought experiment the best for decision making, and I use it all the time: visualize that you are old and on your deathbed–would you regret having made decision A vs. decision B (vs. decision C, etc.)?

We all die someday. The inevitability is out of our control. So why not try to live the best life you can live?

3. Do things that make you happy, every day.

About a week after leaving my job, one random day, I felt like I was in a deep rut: negative emotions like fear and self-doubt were spiraling out of control in my head. I needed to change things up–being in such a bad mood wasn’t moving me forward in life at all.

Taking 10 minutes to meditate helped (Tara Brach has some great guided meditations, Headspace is also great for beginners).

I hadn’t listened to any music in several days, so I put on some EDM, changed my environment a little, and cranked on work for a bit at a coffee shop. For those of you who’ve worked in a library and/or coffee shop before: it’s strangely motivating, isn’t it?

I went to the gym in the late afternoon, which also helped because it took my mind off the negative emotions and gave me a sense of progress.

Later that night, I went to an event and met new people. It was great to put myself in their shoes for a little while and understand what they’re up to, and what they care about most.

Thanks for reading!

The new journey has only just begun, but those are the practices and mindsets I’ve implemented that have helped me so far. As always, advice is useless if you don’t internalize it, make it part of your mindset, and practice it.

Have a safe and relaxing holiday season!

 

*Reminds me of Rand’s Objectivism, I guess

What I learned from my side project in education technology: Formata

Screenshots of what the student would see, taken from the deck I sent to teachers.

Last winter, I built an MVP for an ed-tech product, called Formata. Here’s what it was, why I did it, and what I learned from it.

Why Education

I had been (and still am) trying little side projects in different industries because I like learning about and understanding new things. At the time, I had done some stuff in productivity and fintech, and I knew I wanted to have an impact on education eventually in my life. It’s been so influential on me and is a huge lever to get us closer to what I call “opportunity equality” worldwide, so I decided to do a small project in education this time.

Principles of Educational Impact

I did a little thought experiment: I imagined myself as a middle school kid again, and thought about what influenced my education the most. “My teachers” was the answer. Students spend the majority of their weekdays in school, and it’s the teachers who interact with them and understand each and every child. I saw it firsthand on a farm on the other side of the world: far more than the facilities and the curriculum, it’s the teacher who inspires the student and really has an impact on him or her.

Next, I asked, “Ok, so if teachers have the most impact on a child’s education, what makes a good teacher? What does ‘good’ even mean? And how do you measure it?” I did some research, and came across the Gates Foundation’s Measures of Effective Teaching project, a project backed by hundreds of millions of dollars and pursuing these exact questions. Awesome!

Some more research led me to the interesting and sometimes controversial world of teacher evaluation. Traditionally, teachers have been evaluated by two methods: student test scores (also known as “value added”), and observations by someone like the principal. The thought is basically that student test scores, as the outcome of a teacher’s teaching, should correlate with his or her teaching ability. Sometimes, administration has a rubric for what they think makes a teacher good, and so a few times a year, the principal might sit in on a class for 15 or so minutes to observe and evaluate the teacher.

There are some fundamental issues with both methods, which I’ll mention briefly. It’s hard to see how a principal observing each teacher a few times a year, for 15 minutes, could have any strong relationship with how good the teacher actually is. The Gates Foundation has done research showing that teacher observations are less reliable than test scores; however, the tests on which teachers are usually evaluated (usually state-wide standardized ones) only happen once a year, and if teachers know the results are tied to their employment, there’s a strong incentive to “teach to the test”.

Who interacts with teachers the most? Who would be best at evaluating them? The students themselves. Again, the Gates Foundation did a bunch of research on what exactly students should evaluate teachers on, sort of quantifying the aspects of a good teacher. They narrowed the most important characteristics down to what they called the “7 C’s”: caring, control, captivate, clarify, confer, consolidate, and challenge. Structured in the right way (e.g. low-stakes and anonymized, so the students aren’t incentivized to fudge), student perception questionnaires that asked about these characteristics were pretty reliable at discerning high-performing teachers from the rest.

Building A Product

I noticed that in the Gates Foundation’s research, the student perception surveys were being administered with pen, paper, envelopes, stickers, etc. I felt like the surveys could be administered much more efficiently with technology; the results could also be tabulated and organized much better for teachers and administrators to learn from.

To further validate my idea, I went to a bunch of ed-tech meet-ups, talked to teachers, and asked them what they thought about it. They all agreed that having more feedback on their teaching, more frequently, would be helpful.

I thought this was a pretty quick MVP to build; I could even do some of the analysis of the feedback for the teachers manually myself at first. All a teacher would have to do was give me the email addresses of his/her students, and I could auto-generate emails and questionnaires, send them off, and aggregate the results.
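Mechanically, the MVP loop could be as simple as this sketch (the survey URL scheme, addresses, and SMTP setup are placeholders, not the actual Formata code):

```python
import smtplib
import uuid
from email.message import EmailMessage

def send_surveys(student_emails, teacher_name):
    # Assumes a mail server is reachable on localhost.
    with smtplib.SMTP("localhost") as smtp:
        for address in student_emails:
            # A random per-student token keeps responses anonymous,
            # so students aren't incentivized to fudge their answers.
            token = uuid.uuid4().hex
            msg = EmailMessage()
            msg["Subject"] = f"Quick feedback survey for {teacher_name}"
            msg["From"] = "surveys@formata.example"
            msg["To"] = address
            msg.set_content(
                f"Please rate your class: https://formata.example/s/{token}"
            )
            smtp.send_message(msg)
```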

Visualizations of student feedback I could generate for teachers, so they could pinpoint what to work on

Moving On

After a month of reaching out to teachers–those I had already met or knew, and those I didn’t–and sending them my slide deck about Formata and its benefits, I finally got a few who said they were willing to try it. They were extremely busy though (all teachers are overworked), and had to get permission from their department heads, who had to get permission from the principal, to use it. The effort fizzled out, I re-evaluated my own time, and I moved on.

What I Learned

I learned about a lot of different things, but overall, I think this project reinforced two principles for me:

  • Ask better questions when doing customer development, and solve a problem.
    • My idea never really solved an important problem for my target audience, teachers. I should’ve talked to more administrators, who may care more about teacher evaluation. Also, you’re bound to get positive but not very useful answers when you ask someone what they think about your idea; whether it solves a big enough problem for them to actually integrate your product into their life is a different story. Not solving an important enough problem for teachers, coupled with lots of bureaucracy and the fact that they’re overworked, was not a recipe for excited users.
  • Keep doing things, don’t worry about failure.
    • I got to learn about an important and fascinating area of education by doing this project. I also got to learn about the realities of the space. I learned more about the power of customer development: that through observation and/or asking better questions, you can get to true pain points that people will pay you to solve. I learned that some types of problems and tasks excite me more than others. This project was also a great way for me to practice first principles thinking.

Thanks for reading this journal of sorts.

Cancer clinical trials and the problem of low patient accrual

Inspired by this contest to come up with ideas to increase the low patient accrual rate for cancer clinical trials, I decided to look more into the data. Bold, by the way, is one of my all-time favorite books, and was co-authored by Peter Diamandis: creator of the herox.com website, founder of the XPRIZE Foundation, and co-founder of Planetary Resources. Truly someone to look up to.

Anyways, the premise of the contest is that over 20% of cancer clinical trials don’t complete, so the time and effort spent on them is wasted. The most common reason for this termination is the clinical trial not being able to recruit enough patients. Just how common is the low-accrual reason, though? And are there obvious characteristics of clinical trials that can help us better predict which ones will complete successfully, and what does that suggest about building better clinical trial protocols? I saw this as an opportunity to explore an interesting topic while playing around with the trove of data at clinicaltrials.gov and various Python data analysis libraries: seaborn for graphing, scikit-learn for machine learning, and the trusty pandas for data wrangling.

Basic data characteristics

I pulled the trials for a handful of the cancers with the most clinical trials (completed, terminated, and in progress), got around 27,000 trials, and observed the following:

  • close to 60% of the studies are based in the US*
location_distribution
*where a clinical trial is “based” can mean where the principal investigator (the researcher who’s running the clinical trial) is based. clinicaltrials.gov doesn’t give the country that the principal investigator’s institution is in, so as a proxy, I used the country with the largest number of hospitals the study could recruit patients at.
  • almost 25% of all US based trials ever (finished and in progress) are still recruiting patients

overall_status_distribution

  • of those trials that are finished and have results, close to 20% terminated early, and 80% completed successfully (which matches the numbers the contest cited)

finished_status_distribution

  • almost 50% of all US based trials are in Phase II, almost 25% are in Phase I

phase_distribution

  • and interestingly, the termination rate does not differ very significantly across studies in different phases

status_by_phase

Termination reasons

Next, I was interested in finding out just how common insufficient patient accrual was as a trial termination reason vs. other reasons. This was a little tricky, as clinicaltrials.gov gives principal investigators a free-form text field to enter their termination reason. So “insufficient patient accrual” could be described as “Study closed by PI due to lower than expected accrual” or “The study was stopped due to lack of enrollment”. I used k-means clustering (after term frequency-inverse document frequency feature extraction) of the termination reasons to find groups of reasons that meant similar things, and then manually de-duped the groups (e.g. combining the “lack of enrollment” and “low accrual” groups into one because they meant the same thing).
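Roughly, the clustering step looked like this (a sketch, with a few example reason strings standing in for the real scraped data):

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reasons = [
    "Study closed by PI due to lower than expected accrual",
    "The study was stopped due to lack of enrollment",
    "Sponsor withdrew funding",
    "Terminated early for safety concerns",
]

# tf-idf turns the free-form reasons into vectors; k-means then groups
# reasons that use similar terms, and the groups get manually de-duped.
vectors = TfidfVectorizer(stop_words="english").fit_transform(reasons)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)
print(labels)
```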

I found that about 52% of terminated clinical trials end because of insufficient patient accrual. Since roughly 20% of finished trials terminate early, this implies that about 10% of clinical trials that end (either successfully, or because they’re terminated early) do so because they can’t recruit enough patients: 0.52 × 20% ≈ 10%.

termination_reasons

Predicting clinical trial termination?

Clinicaltrials.gov provides a bunch of information on each clinical trial–trial description, recruitment locations, eligibility criteria, phase, and sponsor type (industry, institutional, other), to name a few–which begs the question: can this information be used to predict whether a trial will terminate early, specifically because of low patient accrual? Are there visible aspects of a clinical trial that are related to a higher or lower probability that it fails to recruit enough patients? One might think that the complexity of the trial’s eligibility criteria and the number of hospitals the trial can recruit from could be related to sufficient patient accrual.

Here was my attempt at answering this question analytically: fitting a multiclass logit (logistic regression) classifier–predicting whether a trial would be “completed”, “terminated because of insufficient accrual”, or “terminated for other reasons”–on a random partition of the clinical trial data, and measuring its accuracy at classifying out-of-sample clinical trials. The predictors were of two types: characteristic (e.g. phase, number of locations, sponsor type) and “textual”, i.e. features extracted from text-based data like the study’s description and eligibility criteria. Some of these features came from a tf-idf vectorization process similar to the one described in the k-means section above; others were simply the character lengths of these text blocks. Below is a plot showing the relationship between two of these features–the length of the eligibility criteria text and the length of the study’s title–two metrics that perhaps get at the complexity of a clinical trial.

complexity
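Stripped down, the training and evaluation step looked roughly like this sketch (synthetic data standing in for the real matrix of characteristic and textual features):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in feature matrix; three classes: completed, terminated for
# low accrual, terminated for other reasons.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, n_classes=3, random_state=0
)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Multiclass logistic regression, scored on held-out trials.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```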

The result: the logit model correctly predicted whether trials would complete successfully, terminate because of low accrual, or terminate for other reasons only 83.6% of the time. That’s a pretty small improvement over predicting “this trial will complete successfully” for every trial you come across, which would be correct 80.6% of the time (see the Completed vs. Terminated pie chart above). Cancer clinical trials are very diverse, so it makes sense that there don’t seem to be any one-size-fits-all solutions to improving patient accrual.

 

Amazon’s secret sauce: the flywheel model

Amazon’s flywheel of growth. From Andreessen Horowitz’s blog post http://a16z.com/2014/09/05/why-amazon-has-no-profits-and-why-it-works/

After finishing The Everything Store recently, I wanted to share an interesting framework that Bezos used when founding Amazon. The book, by the way, is a phenomenal read and gives great insight into Bezos’s character and how he has led an innovative Amazon. Ambition, persistence, spontaneity, and being neurotic/obsessive are some of the most common traits of the successful people I’ve read about so far, and he certainly embodies all of them.

Bezos thought about Amazon’s business model as a “flywheel” in the early days, and claimed that this was their secret sauce. Without going into what an actual flywheel is, this was another way of saying that the business model possessed a positive reinforcement loop that grew stronger if you fed any part of it. To quote the book:

… Bezos and his lieutenants sketched their own virtuous cycle, which they believed powered their business. It went something like this: lower prices led to more customer visits. More customers increased the volume of sales and attracted more commission-paying third-party sellers to the site. That allowed Amazon to get more out of fixed costs like the fulfillment centers and the servers needed to run the website. This greater efficiency then enabled it to lower prices further. Feed any part of this flywheel, they reasoned, and it should accelerate the loop.

Starting up the flywheel can be difficult, but once results accumulate, momentum builds and the business accelerates. In the flywheel model, all incentives are aligned in the same direction. Some strategic and managerial conclusions:

  1. Design for success: the flywheel model is just another example of how leaders can design for the successful operation of their company before any real rubber hits the road. All planning and no action is bad, but having some sort of goal and a plan before doing any serious execution, in principle, works a lot more efficiently than trying things haphazardly and seeing what works.
  2. Design for alignment: a business model is least impeded when the result of anyone’s actions promote everyone’s desires and best interests, especially when that cycle is self-reinforcing.
  3. Do everything you can to start, protect, and build that initial momentum.

Fun fact: during the 1999 holiday season, a lost box of stuffed Jigglypuffs wreaked havoc on a few Amazon distribution centers (they weren’t called fulfillment centers back then). Bezos ordered staff to pull all-nighters looking for the bundle of Pokemon–customers always came first.

What trying to blog somewhat regularly does for me

A young Benjamin Franklin. I’m currently reading Isaacson’s biography of him: it’s brilliant, and Franklin was a baller. More to come later…

I have not been at all regular with my blog. I also have a bunch of draft posts on various topics just sitting there, partially written, mostly because I started writing and got stuck, or distracted, or ran out of time. Noticing that has made me want to write this, a short blog post about blogging (meta-blogging?).

Trying to blog somewhat regularly forces me to structure my thoughts, to come up with a cohesive, brief story that gets my point across and hopefully gets others thinking. This is something I haven’t mastered yet–as evidenced by my collection of half-written draft posts–but I guess that’s the process of becoming a better writer, and where editing comes in. I wonder: do all the best bloggers edit their posts? I remember editing and revising essays for school over and over again, a process that took a lot of time, yet some of the best bloggers I follow seem to write off the cuff while maintaining brevity and an easy-to-follow structure in their posts.

The process of regularly structuring my point of view for writing also leads to the discovery of both holes in my thinking and also areas of opportunity that I can do more research on. Blogging also acts as a sort of accountability tactic: if I blog about doing something, then I feel even more compelled to do it. It certainly is a learning experience for me, and hopefully others can learn (about blogging, and about whatever else I talk about and share) along with me.

Warren Buffett: insights into his character, obsession with OPM

Buffett’s house in Omaha, Nebraska. He bought it in 1957 and still lives in it today.

I tend to idolize Warren Buffett a little, something rekindled after recently reading Making of An American Capitalist.

He’s brilliant, humble, focused, self-confident, and frugal. He started his own “golf ball” business as a kid, employing an army of friends to fish golf balls out of ponds at local golf courses, and then to clean, organize, and resell them. During his short time at Penn, Buffett joined a fraternity. He would spend parties at his frat house sitting on the ledge by a window, expounding on investing, the gold standard, and other economic concepts–a throng of guys and gals would always gather on the floor in front of him, hanging on his every word. In the early days of running his first fund, Buffett was insanely secretive about his investments, worked from his home like a hermit wearing only t-shirts and underwear, and refused to compromise on his fund’s 6-month lock-up period and $50,000 minimum investment (a lot at the time), even for celebrity investors. Those are just some of the captivating insights into Buffett’s character.

Buffett’s vast amount of wealth does not necessarily intrigue me that much–it’s how he built it that does: with self-reliance, focus, discipline, and authenticity.

He is also obsessed with “other people’s money”, or OPM, and OPM is essentially how he was able to build such a great fortune. One of Buffett’s first outright purchases of a company was an insurance company–he owns the well-known GEICO today–and he used the float to fund his investments. That early purchase is said to be worth half of Berkshire Hathaway’s value today. This insightful post by Noh-Joon on Quora explains that, as well as how Buffett is able to essentially turn a 5% gain in actual investment appreciation into a 15% return (hint: leverage, plus effectively negative interest rates from insurance underwriting discipline). Not to mention, he’s a great stock picker.
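To make the mechanics concrete, here’s a toy calculation with made-up numbers (mine, not the book’s or the Quora post’s): if insurance float lets you control three dollars of assets for every dollar of your own equity, at roughly zero borrowing cost, then 5% appreciation on assets becomes a 15% return on equity.

```python
equity = 100.0           # your own capital
insurance_float = 200.0  # premiums held before claims are paid out
assets = equity + insurance_float

appreciation = 0.05  # 5% return on total invested assets
borrow_cost = 0.0    # disciplined underwriting can make float cost ~0 or less

gain = assets * appreciation - insurance_float * borrow_cost
print(gain / equity)  # 0.15, i.e. a 15% return on equity
```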