Senior Data Engineer (m/d/f)

The web was created by scientists and for scientists, to foster scientific collaboration and drive progress for a better world. Join our team to take the web back to its roots and achieve that original mission.
We’re a passionate team of pragmatic optimists from around the world and from many different backgrounds. Together, we focus on building great products that change the way scientists communicate for the better.

We love what we do. We connect the world of science and make research open to all.

Objective of the Role

As part of ResearchGate’s data engineering teams, you empower decision-making for product managers and data scientists by continuously improving our data pipelines and architecture. To ensure fast and reliable data access, you will shape and work toward a long-term vision for our data and machine learning infrastructure. Join us to empower ResearchGate and make science happen faster.

Responsibilities

  • Become an essential member of our Machine Learning Infrastructure Architecture Team and shape the long-term vision of ML at ResearchGate
  • Develop a system that enables data teams to quickly iterate on ML-based workloads and easily deploy their models to our production systems
  • Ensure that the data pipelines we use at ResearchGate are ready for future challenges
  • Provide technical leadership and partner with fellow engineers to architect, design and build infrastructure that scales and stays highly available while reducing operational overhead
  • Engineer efficient, adaptable and scalable data architectures to make building and maintaining big data applications easy and enjoyable for others
  • Build fault-tolerant, self-healing, adaptive, and highly accurate data processing pipelines
  • Work with data scientists, data analysts, backend engineers, and product managers to solve problems, identify trends and leverage the data we produce
  • Build workflows involving large datasets and/or machine learning models in production using distributed computing and big data processing concepts and technologies

Requirements

  • Experience in designing and implementing data pipelines and ML applications
  • Experience working with data at petabyte scale
  • Experience designing and operating robust distributed systems
  • Experience with Java is preferred
  • Working knowledge of relational databases and query authoring (SQL)
  • Experience using technologies like Kafka, Hadoop, Hive, and Flink
  • Experience with machine learning tools, frameworks and libraries such as Python, R, Jupyter Notebook, scikit-learn, PyTorch, and TensorFlow is a plus

Environment

You'll be working in a team-based environment where code is written, tested and shipped continuously. Our engineering team is passionate about building maintainable, scalable web applications that are constantly optimized to meet the needs of our users: more than 15 million researchers worldwide.
Our hiring process is uncomplicated. You'll be interviewed by the people you'll be working with, so you can quickly find the role that suits you best and start making an impact.
We’re located at the heart of Berlin, one of the most exciting cities in the world and a place where people from all walks of life feel welcome. Work to change the world of science and have a good time while you’re at it: we offer free, healthy lunches and many fun events.