• Education: Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field. A Master’s degree or relevant certifications are a plus.
Experience
• Knowledge of retail business concepts such as demand forecasting, customer segmentation, pricing optimization, inventory management, and omni-channel analytics.
• Strong experience with cloud platforms (AWS, Azure, or GCP) and their associated data tools for processing and storing retail data.
• Hands-on experience with SQL, Python, and Scala for building and optimizing data pipelines.
• 3-5 years of experience in data engineering, with a focus on the retail domain (e.g., retail data such as sales transactions, customer data, and inventory systems).
• Proven experience with Databricks and its ecosystem (Delta Lake, Apache Spark, Databricks notebooks) for building scalable data pipelines.
Technical Skills
• Knowledge of data warehousing tools and technologies like Snowflake, BigQuery, or Redshift is a plus.
• Familiarity with retail-specific analytical models, such as sales trend analysis, predictive inventory models, and price elasticity models.
• Solid understanding of data storage technologies such as S3, Azure Data Lake Storage (ADLS), or Google Cloud Storage for large-scale retail data.
• Expertise in building and optimizing ETL workflows using Databricks and Spark for big data processing.
Soft Skills
• Strong communication skills, with the ability to explain complex data concepts to retail business stakeholders.
• Ability to work independently, prioritize tasks, and meet deadlines in a fast-paced retail environment.
• Excellent problem-solving and troubleshooting skills, particularly in the context of retail data complexities.
Preferred Qualifications
• Experience integrating machine learning models into retail data pipelines for predictive analysis.
• Previous exposure to business intelligence (BI) tools such as Tableau, Power BI, or Looker, and their application to retail-specific analytics.
• Familiarity with CI/CD practices for data pipeline deployment, including version control using tools like Git.