The Vice President of Data Engineering role is a hands-on opportunity to design and implement Company’s large-scale data acquisition pipeline.
Responsibilities
Research, architect, and develop new applications for data capture and processing at the scale of hundreds of millions of records
Explore opportunities and relationships in available data sources leading to new insights and value
Develop, test, performance-test, and deploy new data-centric applications that capture data and process business-specific workflows
Continuously improve the resilience and reliability of data collection and processing
Work closely with engineers throughout the organization to identify and implement tools and processes for data collection, transport, and processing
Communicate with the development, operations, and analytics engineers who use our applications and platforms, both to explain functionality and to gather requirements for new projects
Qualifications/Skills
BS in a technical discipline, preferably Computer Science/Engineering; MS preferred
Strong programming and scripting background (preferably Go / Python) within a Linux environment
Familiar with the Linux operating system and command-line tools
Programming experience with SQL/PostgreSQL; Redshift preferred
Experience with applications that acquire data from websites and social media (including via APIs and web scraping) at the scale of hundreds of millions of records
Experience with AWS services, including EC2, S3, and Redshift
Experience with large volumes of data
Experience in learning new technologies and developing new applications to solve large data problems
Experience building, working with, and deploying data consumers/producers, pipelines, and distributed systems
Flexible, creative, agile approach to collaboration and development
Preferred
Experience with complex REST APIs, including Google and YouTube
Some experience and/or interest in applied ML, AI, and Deep Learning applications (NLP, RNN, CV/Image Processing) in production (e.g. TensorFlow, PyTorch, Keras)
Familiar with Jupyter Notebooks and pure Python 3.x
Experience with Google Cloud Platform (GCP) services, including BigQuery and Google Cloud Storage