You are going to build innovative data pipelines for processing and analyzing our client’s large user datasets (250+ billion events per month). A unique challenge of the role is being comfortable developing with varied technologies: custom transformation/integration apps in Python and Java, pipelines in Spark, Kafka, and Kinesis, and data transformation and analysis in SQL.
Responsibilities:
— Develop ETL (Extract, Transform, Load) data pipelines in Spark, Kinesis, Kafka, and custom Python apps to transfer massive amounts of data (over 20 TB/month) efficiently between systems;
— Engineer complex, efficient, distributed data transformation solutions using Python, Java, Scala, and SQL;
— Productionize machine learning models, making efficient use of resources in a clustered environment;
— Research, plan, design, develop, document, test, implement and support proprietary software applications;
— Perform analytical data validation to ensure the accuracy and completeness of reported business metrics;
— Be open to taking on, learning, and implementing engineering projects outside your core competency;
— Understand the business problem and engineer/architect/build an efficient, cost-effective and scalable technology infrastructure solution;
— Monitor system performance after implementation and iteratively devise solutions to improve performance and user experience;
— Research and innovate new data product ideas to grow the client’s revenue opportunities and contribute to the company’s intellectual property.
Requirements:
— 3+ years of experience developing in Python to transform large datasets on distributed and clustered infrastructure;
— 5+ years of experience in engineering ETL data pipelines for Big Data Systems;
— Proficiency in SQL, with some experience performing data transformations and data analysis in SQL;
— Comfortable juggling multiple technologies and high-priority tasks.
Nice to have:
— BS or higher degree in computer science, engineering, or another related field;
— 5+ years of object-oriented programming experience in a language such as Java, Scala, or C++;
— Prior experience designing and building ETL infrastructure involving streaming systems such as Kafka, Spark, or AWS Kinesis;
— Experience implementing clustered/distributed/multi-threaded infrastructure to support machine learning processing on Spark or SageMaker;
— Experience with distributed columnar databases such as Vertica, Greenplum, Redshift, or Snowflake.
Success in this role:
— Demonstrate a passion for data;
— Be eager to research and learn new technologies in order to develop creative and efficient ways to solve business problems;
— Take full responsibility for the initiative;
— Stay focused on the successful implementation of the task at hand before moving on to the next engineering challenge;
— Go above and beyond: while engineering for current tasks, think of the big picture and adjust code bases and processes accordingly. Look for ways to make systems more robust and fault-tolerant, monitor for failures, and program for automated recovery.
We offer:
— Opportunity to work on bleeding-edge projects;
— Work with a highly motivated and dedicated team;
— Competitive salary;
— Flexible schedule;
— Medical insurance;
— Benefits program;
— Corporate social events.
About us:
Grid Dynamics is an engineering services company known for transformative, mission-critical cloud solutions for the retail, finance, and technology sectors. We have architected some of the busiest e-commerce services on the Internet and have never had an outage during the peak season. Founded in 2006 and headquartered in San Ramon, California, with offices throughout the US and Eastern Europe, we focus on big data analytics, scalable omnichannel services, DevOps, and cloud enablement.