Vinted is the world’s biggest second-hand fashion marketplace. Our aim is to make second-hand the first choice worldwide 🌎. Currently ~20 million people use our platform, and Vinted’s Data Warehouse (DWH) team makes sure our analysts, management and product teams have up-to-date data to make good decisions.
This September Vinted reached a major milestone: EBITDAM = 0 (EBITDA before Marketing, as we consider marketing an investment), meaning that if we stopped our marketing efforts, we could run the company without external investment. The company is growing fast, and it’s getting difficult for Vinted DWH to keep up with the ever-growing workload.
At the moment Vinted DWH is a one-person team, and we’ve decided to double it. You would be working alongside Lech to:
- Ensure that the company has new metrics ready for analysis every day (nightly, hourly and real-time / streaming job pipelines);
- Enable analysts to analyse ever-growing amounts of data (currently we have 300+ TB of data in our main cluster);
- Enable product teams to use Data Warehouse heavy lifting in our core product for: statistics for Vinted members about their item visibility; machine learning model training and data preparation for detecting scammers, spammers and other bad actors; the recommendations engine.
Here's what is currently on the roadmap:
- Adapt the system to GDPR requirements;
- Find and set up solutions that make data processing easier for Data Warehouse users. Analysts and backend engineers constantly update and add new jobs to our infrastructure, pushing it to its capacity limits. We ensure that they have the tools and knowledge to fully own the development, maintenance and optimization of their jobs;
- Simplify our event ingestion pipeline, which processes ~10 billion events every week.
We expect the platform to grow 2–4 times next year, so there are huge challenges ahead. If this sounds interesting, you may well be just who we need: we’re looking for an engineer to join Vinted DWH.
We are looking for someone who likes to solve problems related to data. As there are many unsolved problems in the domain, we are always on the lookout for new techniques and technologies; we experiment a lot and use unconventional approaches. Knowledge of how database systems work is a big plus. Experience with Apache Spark is not mandatory, though very useful. We value pragmatism, big-picture thinking, curiosity and problem-solving skills. We expect you to be familiar with most, and have deep knowledge of at least a few, of the following disciplines: Database Systems, Algorithms, Software Engineering, Systems Architecture, Big Data, Systems Scaling, Systems Performance Tuning, Computer Science.
This is a mid / senior level position.
- Learning 📚 🎫 ✈️ budget (10% of gross salary)
- 25 working days of holidays 🏝
- We buy all the tech 💻 🎧 ⌨ you need to do your job
- Daily breakfast and lunch 🍜 🌯 🥕 🍔 🍳 at Vinted Noise restaurant
- Budgets for ♕ 🍻 🎾 🥝 🍩 ⛸ 🎳 🎸 🎲 🎡 teambuildings
- In-house 🏋 gym
- Shop 👠 👕 ⌚️ @ Vinted budget
If you’re interested, contact Titas (firstname.lastname@example.org) and he will be in touch with you.
Here are some of the technologies we use: Hadoop HDFS, Apache Spark (ETL, streaming, ML, ad-hoc analysis), Apache Kafka (message bus for integrating tracking events with our core product), Cloudera Impala (for run-time metric aggregation and quick ad-hoc querying), and Oozie (for job scheduling). We use Ruby and Scala for everyday work.