A Space Talent Spotlight Series Interview with Dr. Bradley Voytek, an Associate Professor in the Department of Cognitive Science, Endowed Chair at the Halıcıoğlu Data Science Institute, and Vice Chair of the Data Science Undergraduate Program at the University of California, San Diego. Dr. Voytek was one of the first data scientists at Uber. You can read more on Dr. Voytek’s recent research in Neuroscience here and here.
What is your background?
While I initially studied physics, I became fascinated with the brain and ended up switching my major to Psychology. It was the late 90s during the first dot-com boom, so it felt like a good idea to take some computer science courses on top of that, ultimately creating what most would call a Cognitive Science degree today. By the end of it, I had become highly attuned to the deepening intersection between neuroscience, philosophy, and computation. Once I graduated I worked at UCLA for a few years before returning to UC Berkeley to do my PhD in Neuroscience, where I studied theoretical and computational neuroscience and its intersection with cognitive neuroscience. After that, I joined Uber as their first data scientist to help build out their data strategy and team, back when they were a startup with only about 10 employees. After leaving Uber I returned to neuroscience research, and ultimately joined UCSD, where I helped found the Halıcıoğlu Data Science Institute and the undergraduate data science major.
What have been your top career accomplishments so far?
The first one is what really got me into data science. It was a tool called brainSCANr. During my PhD I was super interested in how our brain brings two regions together to help us remember things. I wanted to better understand how they communicate, particularly the physical characteristics such as the white matter that connects them and the neurotransmitters that are at work there. I thought there would be some website that had a record of each brain region’s inputs and outputs, but this didn’t exist, and it still doesn't. I instead had to spend hours on end reading through papers that had been published in the 70s. It was frustrating: we had all the research, so why didn't we have a nice synthesis of how these regions connect and interact? There are about 3 million papers in neuroscience alone. The field is huge, and no single person can read all of those papers. I was a MATLAB programmer at the time, and I had the idea of building a tool to do that synthesis. My wife was a Python programmer, so we challenged each other to a “code-off” to see who could build it first; she ended up winning, so I became a Python programmer after that. The tool was able to produce clusters of things that are related, but also identify gaps where maybe things should be related but aren’t, effectively helping to guide research. At the time we called that semi-automated hypothesis generation. This really was a data science project before data science was even a thing, and it’s what drove me to enter the field and ultimately join Uber.
Another more recent accomplishment was some work that I did with Richard Gao. We have all this info about the brain, like where things are, what neurons there are, what kinds of neurons, etc. But there are other databases, such as one from the Allen Institute, started by Microsoft co-founder Paul Allen, that has a huge record of how strongly certain genes are expressed in the brain, and another from the National Institutes of Health that has huge maps of human brain structure. Our team had our own map of connectivity surrounding areas of the brain that relate to memory. By integrating all of these data sources, we were able to characterize different regions and answer questions like how long a particular type of memory can be stored in the brain, and what those connections look like. All of this data made it possible to find links that were not obvious, inform any findings we came across, and ultimately ask deeper questions about the brain and cognition.
The third one is the last project I did for Uber. This was really a proof of concept, using agent-based modeling to simulate a city. Running experiments for the company is hard, because you need to control as many variables as you can. When using Uber, you'll get an ETA, and that estimate needs to be accurate and low. You want your food in 30 minutes, not 3 hours; you want your Uber to get to you in 6 minutes, not half an hour. But the dispatch algorithm needs to know how to get those numbers as low as possible. Typically, websites can get away with A/B testing since every user is independent. But in an interconnected system such as Uber’s, if you change the dynamics of the dispatch algorithm for one user, it changes the experience of another user. A user can see an ETA of 5 minutes, but if someone else books that driver, and the driver that IS available is now further away, that ETA will change. So A/B testing couldn’t work here in a live system. I ended up creating a system that uses historical Uber data and looks at the probability of a user requesting a ride at any given moment, at any given place in the city, and this is dynamic as well. For example, on a weekend night there are more users around the bars of a city, on Monday mornings demand might be more localized around workplaces, and on top of that, you have drivers coming in from neighboring cities. So we simulated cities using historical data to learn their dynamics, and then from there simulated these cities under different dispatch algorithms to find which performed the best. We would then take the algorithms that performed best in simulation and test them out in the real world through more careful experimentation. It was a ton of fun to work on, and it changed the way I think about doing science. You don’t always have to do experiments; you can use data to simulate plausible experiments and narrow things down to ideas that are more tractable.
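Editor's illustration: the idea of replaying the same demand under different dispatch algorithms can be sketched in a few lines of Python. This is a toy under loud assumptions, not the system built at Uber: the city is a hypothetical 20x20 grid, demand is synthetic (a made-up hotspot near the centre instead of historical data), travel time is grid distance, and the two policies (`nearest_driver`, `random_driver`) are illustrative stand-ins. What it does show is why simulation sidesteps the A/B problem: both policies face the identical stream of ride requests, because demand is drawn from its own seeded random generator.

```python
import random

def manhattan(a, b):
    """Grid distance between two (x, y) cells; stands in for travel time."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def nearest_driver(rider, free, rng):
    """Hypothetical policy A: send the closest free driver."""
    return min(free, key=lambda d: manhattan(d["pos"], rider))

def random_driver(rider, free, rng):
    """Hypothetical policy B (baseline): send any free driver."""
    return rng.choice(free)

def clamp(v, size):
    """Keep a coordinate inside the grid."""
    return max(0, min(size - 1, v))

def sample_pickup(rng, size, hotspot_share=0.5):
    """Synthetic demand: roughly half the requests cluster near the centre."""
    if rng.random() < hotspot_share:
        c = size // 2
        return (clamp(c + rng.randint(-2, 2), size),
                clamp(c + rng.randint(-2, 2), size))
    return (rng.randrange(size), rng.randrange(size))

def simulate(policy, seed, n_drivers=20, ticks=500, size=20, p_request=0.6):
    """Run one simulated city and return the mean pickup distance."""
    demand_rng = random.Random(seed)      # drives the world, not the policy
    policy_rng = random.Random(seed + 1)  # randomness used by the policy only
    drivers = [{"pos": (demand_rng.randrange(size), demand_rng.randrange(size)),
                "free_at": 0} for _ in range(n_drivers)]
    pickups = []
    for t in range(ticks):
        if demand_rng.random() >= p_request:
            continue  # no request this tick
        # Draw the full request before looking at drivers, so the demand
        # stream is identical whatever the policy does.
        rider = sample_pickup(demand_rng, size)
        dest = (demand_rng.randrange(size), demand_rng.randrange(size))
        free = [d for d in drivers if d["free_at"] <= t]
        if not free:
            continue  # request goes unserved
        driver = policy(rider, free, policy_rng)
        pickup = manhattan(driver["pos"], rider)
        pickups.append(pickup)
        # The driver is busy for the pickup leg plus the trip itself.
        driver["free_at"] = t + pickup + manhattan(rider, dest)
        driver["pos"] = dest
    return sum(pickups) / len(pickups) if pickups else float("inf")

if __name__ == "__main__":
    for name, policy in [("nearest", nearest_driver), ("random", random_driver)]:
        print(name, round(simulate(policy, seed=0), 2))
```

In this toy, assigning a driver changes who is available for the next rider, which is exactly the interference that breaks user-level A/B tests. Running each policy over the same seeded city gives a like-for-like comparison, and only the better candidates would then move on to careful real-world testing.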
What were the critical steps/choices that helped you get ahead?
Persistence. My idea for brainSCANr actually came from a panel I was on at Stanford during my PhD. Someone in the audience had asked a question regarding how complicated neuroscience is and how difficult it was to learn. I commented that we actually know a lot about the brain, but the problem is that no one person can read and integrate all that info, and that we probably need some machine learning or AI tools that could help us synthesize it. Another panelist, someone I respected a lot, basically said that that was nonsense. But I strongly disagreed: there IS a ton of information, and it’s way too much for any one person to digest, so why not have algorithmic tools that could help us bring this together? I truly believed in its utility, and I knew no one out there was just going to do free work, so I went out and taught myself the basics of NLP. There are some ideas that just stick with you, and when they’re with you for that long, you just have to go out and act. There are a lot of people out there with great ideas, but not a lot who are prepared to execute on them. You have to be willing to see a project through and take the initiative.
What part of your education had the most impact on your career?
I would say that my first exposure to programming and a philosophy of mind class I took were pretty big changes for me. On the philosophy side, we were talking about how you take a complex problem and think about it logically. Philosophy actually has a lot of roots in mathematics, so it was a big eye-opener for me that you could take mathematical logic and apply it to non-mathematical objects. Relatedly, programming helped me think about how you take an idea in your head and break it down into small, manageable chunks. You might have a hunch or an idea of what the endpoint might be, but you don't know how far away it is, or what the result will actually look like when you get there. In some cases it might even be years away. This might cause people to freeze or procrastinate, but making little changes every day eventually adds up to a big change; you just have to be patient. So rather than being intimidated by your end goal, you realize that each of these changes is progress. It’s a core principle of agile software development, but it has lots of applications elsewhere, even in your daily life.
What about your career have you enjoyed the most and least?
Most: In the sciences, it’s those moments of discovery; those moments that make you stop and think “oh, there might be something real here”. Of course you want to take a step back, get skeptical, and tackle it from different angles to make sure you’re not fooling yourself. But having something, and knowing that you’re seeing something that no one else has ever seen before, is a really unique feeling. There’s this feeling of flow that happens when I work in the sciences. That feeling of trying something and having it work is hard to compare to anything else, and hard to describe.
Least: My least favorite part would have to be everything else that goes on in the background that I have to do to get to those moments, like running research groups, online exams, reports, paperwork, etc. They’re mechanics that are just part of the day to day, but most of it is OK; the feeling of discovery definitely makes up for it. Maybe the other side would be the failures that come with science. Ideas fail more often than they work, and ideas go unfunded more often than not. An idea to early researchers is almost like a baby, and when it doesn’t get funded, some do take it personally. Really it’s hard not to, but there’s no room for ego in research. I’m at the stage of my career where that’s not really a pain point anymore, but at first it is hard.
Where do you see the most promising career opportunities in the future?
I’m probably a bit biased, but I definitely have to say it’s in data science, machine learning, and analytics. We're not going to have any less data going forward, and the amount of info is only growing, though there are some caveats to this. Conversations around data ethics, data privacy, and data provenance will become more commonplace. We will need to make decisions on who owns the data, who has the rights to use the data and the algorithms used to analyze the data, and who will handle the hardware infrastructure that will support the creation of these datasets. There’s not going to be an end to this.
If you know the skills of data science, programming, and math, you are going to have career options going forward. I think the exciting ones will be around the intersection of technical domains with traditionally non-technical sectors, like text mining classic stories to identify common themes, grammatical structures, and really the bones of how a story was written. There are definitely some naysayers who believe this takes away from the creative process; they feel it’s too artificial. They may oppose Netflix, for example, using something like this to create a story that borrows bits and pieces of past works. Sure, but humans have been doing this forever. Data will be at the foundation of a lot of what we create going forward.
What advice/resources would you share with the next generation?
Don't focus too much on the technical. Of course, you have to know the technical skills, but take time to understand the sociological and philosophical side of things too. Our ideas don't just exist on paper and in math; they exist embedded within society. Stop and ask: how am I implementing this algorithm, and how does it impact people? There are times when you might get to work on projects and get so lost in them that you forget about how they could impact others. Without stopping to think about these effects, you’re potentially setting yourself up to do harm to other people. As a student, it’s so easy to get stuck in the cycle of “I gotta take linear algebra and then discrete math…” etc., but taking classes outside the technical domains to get a more well-rounded perspective is endlessly beneficial.