-
Senior Spark Developer
- IBM (San Jose, CA)
-
Introduction
A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.
We are seeking a skilled Spark Developer to join our IBM Software team. As part of our team, you will be responsible for developing and maintaining high-quality software products, working with a variety of technologies and programming languages.
IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
Your role and responsibilities
• Design, develop, and optimize big data applications using Apache Spark and Scala.
• Architect and implement scalable data pipelines for both batch and real-time processing.
• Collaborate with data engineers, analysts, and architects to define data strategies.
• Optimize Spark jobs for performance and cost-effectiveness on distributed clusters.
• Build and maintain reusable code and libraries.
• Work with data storage systems, messaging platforms, and file formats such as HDFS, Hive, HBase, Cassandra, Kafka, and Parquet.
• Implement data quality checks, logging, monitoring, and alerting for ETL jobs.
• Mentor junior developers and lead code reviews to ensure best practices.
• Ensure security, governance, and compliance standards are adhered to in all data processes.
• Troubleshoot and resolve performance issues and bugs in big data solutions.
Required technical and professional expertise
* 12+ years of total software development experience.
* 5+ years of hands-on experience with Apache Spark and Scala.
* Proficiency in Scala with deep knowledge of functional programming.
* Strong experience with distributed computing, parallel data processing, and cluster computing frameworks.
* Strong problem-solving skills and the ability to work independently or as part of a team.
* Experience with cloud platforms such as AWS, Azure, or GCP (especially EMR, Databricks, or HDInsight).
* Solid understanding of Spark tuning, partitions, joins, broadcast variables, and performance optimization techniques.
* Hands-on experience with Kafka, Hive, HBase, NoSQL databases, and data lake architectures.
* Familiarity with CI/CD pipelines, Git, Jenkins, and automated testing.
Preferred technical and professional experience
* Experience with Databricks, Delta Lake, or Apache Iceberg.
* Exposure to machine learning pipelines using Spark MLlib or integration with ML frameworks.
* Contributions to open-source big data projects are a plus.
* Excellent communication and leadership skills.
* Understanding of data lake and lakehouse architectures.
* Knowledge of Python, Java, or other backend languages is a plus.
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.