The VP, Data Engineering role is a hands-on opportunity to design and implement Company’s large-scale data acquisition pipeline. This work will directly impact the future valuation of the company as it scales past its current inflection point and works toward step-change growth events in the next few years.
The company has been very successful to date at proving the business model at limited scale. The next step in the company’s evolution is to demonstrate this success at scale, ramping up operations 5x–10x or more. To succeed, it is imperative that we deepen our knowledge of the YouTube content ecosystem and improve our forecasting technology based on large-scale historical data. This will be one of the company’s strategic differentiators, opening up diverse new opportunities to extend our first-mover advantage and consolidate our position as the industry leader.
Research, architect, and develop new applications for data capture and processing at the scale of hundreds of millions of records
Explore opportunities and relationships in available data sources that lead to new insights and value
Develop, test, performance-test, and deploy new data-centric applications responsible for data capture and business-specific processing workflows
Continuously improve the resilience and reliability of data collection and processing
Work closely with engineers throughout the organization to identify and implement tools and processes for data collection, transport, and processing
Communicate with the development, operations, and analytics engineers who use our applications and platforms, both to explain functionality and to gather requirements for new projects
BS in a technical discipline, preferably Computer Science/Engineering; MS preferred
Strong programming and scripting background (preferably Go/Python) within a Linux environment
Familiarity with the Linux operating system and command-line tools
Programming experience with SQL/PostgreSQL; Redshift experience preferred
Experience with data acquisition from websites and social media at the scale of hundreds of millions of records (including APIs and web scraping)
Experience with AWS services, including EC2, S3, and Redshift
Experience with large volumes of data
Experience learning new technologies and developing new applications to solve large-scale data problems
Experience building, working with, and deploying data producers/consumers, pipelines, and distributed systems
Flexible, creative, agile approach to collaboration and development
Experience with complex REST APIs, including the Google and YouTube APIs
Some experience and/or interest in applied ML, AI, and deep learning applications (NLP, RNNs, CV/image processing) in production (e.g., TensorFlow, PyTorch, Keras)
Familiarity with Jupyter notebooks and pure Python 3.x
Experience with GCP services, including BigQuery and Google Cloud Storage