I Visited a Store of the Future
I’m a driven and ambitious 16-year-old learning the skills and knowledge I will need to solve important problems in the world.🌎
This past week, I was invited to Seattle to speak for Microsoft at a blockchain conference called TruffleCon, which had over 700 industry leaders in attendance. While there, I took the opportunity to visit some of the major companies that call Seattle home, like Starbucks, Microsoft, and Amazon.
One of the Highlights: Visiting Amazon Go
On one of our trips, we visited the Amazon offices and the Spheres, and got to see an Amazon Go store in detail.
Naturally, I was intrigued by Amazon Go because of its cashier-less, high-tech system and wanted to learn more. Here's what I learned about (1) the Amazon Go process and (2) how the computer vision tech works.
Reinventing Traditional Stores
Amazon Go is SO SICK. Walking in feels very different from a conventional grocery store. It’s so simple, yet makes so much sense.
Amazon Go proposes a new kind of store with no checkout required. When you shop at Amazon Go, you never have to wait in line. The store works with the Amazon Go app: you sign in, take the product(s) you need, and leave.
It works by using similar types of technologies found in self-driving cars, like computer vision, sensor fusion and deep learning. This technology can detect when products are taken or returned to the shelves and keeps track of them in your virtual cart.
When you leave the store with your goods, your Amazon account is charged and you are sent a receipt. It sounds simple, but trust me, it’s very different from a conventional store.
Process with Amazon Go
- Enter the store, and you get quick access to groceries, fresh coffee (speaking of, I tried my first real coffee in Seattle👏) and other convenience goods.
- Super easy interface. As a user, you sign in by scanning the app at the front of the store.
- You can go around the store, pick up any items, and add them to a bag. Shop like you normally would. The system detects which items you pick up or put back and tracks them in the virtual cart attached to your account.
- The consumer exits, the card associated with their Amazon account is charged, and they get a receipt.
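The flow above can be sketched as a simple virtual cart that mirrors what the sensors observe and charges on exit. This is my own illustrative sketch; the class and method names are hypothetical, not Amazon's API.

```python
# Hypothetical sketch of the Amazon Go shopping flow: a virtual cart
# that mirrors what the sensors observe, then charges on exit.
# All names here are my own illustration, not Amazon's actual system.

class VirtualCart:
    def __init__(self, account_id):
        self.account_id = account_id
        self.items = {}          # item name -> (quantity, price)

    def pick(self, item, price):
        """Sensors saw the shopper take an item off a shelf."""
        qty, _ = self.items.get(item, (0, price))
        self.items[item] = (qty + 1, price)

    def put_back(self, item):
        """Sensors saw the shopper return an item to a shelf."""
        qty, price = self.items[item]
        if qty <= 1:
            del self.items[item]
        else:
            self.items[item] = (qty - 1, price)

    def exit_store(self):
        """On exit, charge the account and produce a receipt."""
        total = sum(qty * price for qty, price in self.items.values())
        return {"account": self.account_id, "items": dict(self.items), "total": total}

cart = VirtualCart("amzn-12345")
cart.pick("cold brew", 4.50)
cart.pick("sandwich", 6.00)
cart.pick("sandwich", 6.00)
cart.put_back("sandwich")        # changed your mind: no extra charge
receipt = cart.exit_store()
print(receipt["total"])  # 10.5
```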
Meanwhile, behind the scenes: there is magic 🔮happening with technology. I’ll get into this a bit later.
Improvement from Conventional Store:
The Amazon Go process cuts out about two steps of the traditional store model, including the most time-consuming and least rewarding one: checking out.
How does the magic aka Amazon Go work?
Interface Journey — Amazon Go App
To get started with Amazon Go, you need an Amazon account and the free Amazon Go app on your phone.
As a customer, you open the Amazon Go app on your phone, then hold it up to a scanner that works like a subway turnstile to enter the store. The Key screen brings up the QR code that the store’s turnstiles scan to let you in, and the Receipts screen serves up what you bought after you’ve left.
After that, you don’t need your phone. You can start picking up items and putting them in bags found in-store or brought from home, without needing to scan each item.
“Just Walk Out” technology
The technology is a combination of deep learning, computer vision, and data pulled from multiple sensors, similar to what powers self-driving cars. This makes sure customers are only charged for the items they actually pick up. Basically, cameras track you throughout the store; the ceiling is filled with them.
The goal is to connect a product with a specific shopper, using cameras that capture images when people enter the store, when they remove items from a shelf, and when they leave with items in their hands or bags. There is also facial recognition, and consumer information is collected: images of the consumer, details like height and weight, biometrics, a username and password, and even purchase history.
Sensor fusion is the software that intelligently combines data from several sensors to track a person as they remove, or later pick up, different items from different shelves.
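To make the idea concrete, here is a toy sketch of one common fusion technique: combining noisy position estimates from several cameras by weighting each camera by its confidence (inverse variance). This is a generic technique, not Amazon's actual implementation.

```python
# Toy sketch of sensor fusion: merge noisy position estimates from
# several cameras into one track, weighting each camera by how
# confident it is (inverse variance). Generic technique, not Amazon's
# actual algorithm.

def fuse_positions(estimates):
    """estimates: list of ((x, y), variance) readings from different cameras."""
    wx = wy = wsum = 0.0
    for (x, y), var in estimates:
        w = 1.0 / var            # confident cameras count for more
        wx += w * x
        wy += w * y
        wsum += w
    return (wx / wsum, wy / wsum)

# Three cameras see the same shopper; the middle one is least certain,
# so it barely pulls the fused estimate.
cams = [((2.0, 3.0), 0.1), ((2.6, 3.4), 0.9), ((2.1, 2.9), 0.1)]
x, y = fuse_positions(cams)
```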
High Level Architecture of the Platform
Computer Vision Core Components
The core of the platform is computer-vision-based machine learning, used to seamlessly track everyone in the store and estimate their intentions.
There are six core problems the system needs to get right for an experience like this:
- Sensor fusion: aggregate signals across different sensors (or cameras)
- Calibration: have each camera know its location in the store very accurately
- Person detection: continuously identify and track each person in the store
- Object recognition: distinguish between different items
- Pose estimation: detect what exactly each person near a shelf is doing with their arms
- Activity analysis: determine whether a person has picked up or returned an item, and associate items with the specific customer
First up: Person Detection
This is very important because the system can’t treat “who took what” as a series of independent picks; it has to continuously track each person the whole time they are in the store. Two hard cases come up:
- Occlusion: where a person is blocked from view by something in the store
- Tangled State: where people are very close to each other & you can’t distinguish between them.
Amazon solves these issues with custom camera hardware that captures both RGB video and distance. The system segments each image, groups pixels into blobs, and labels each blob as person or not-person. It then builds a location map for each frame by triangulating each person across multiple cameras.
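The triangulation step can be sketched in 2D: two calibrated cameras each report a bearing (direction) toward a detected person, and intersecting the two rays gives the person's position on the floor map. This is a minimal illustration of the geometry, not Amazon's code.

```python
# Minimal 2D triangulation sketch: two calibrated cameras each report
# a direction toward a detected person; intersecting the two rays
# locates the person on the store's floor map. Illustrative only.

def triangulate(c1, d1, c2, d2):
    """Intersect ray c1 + t*d1 with ray c2 + s*d2 (2D points/directions)."""
    # Solve t*d1 - s*d2 = c2 - c1, a 2x2 linear system.
    det = d2[0] * d1[1] - d1[0] * d2[1]
    if abs(det) < 1e-9:
        raise ValueError("rays are parallel; need another camera")
    rx, ry = c2[0] - c1[0], c2[1] - c1[1]
    t = (d2[0] * ry - d2[1] * rx) / det
    return (c1[0] + t * d1[0], c1[1] + t * d1[1])

# Camera A at (0, 0) sees the person at 45 degrees; camera B at (4, 0)
# sees them at 135 degrees. The rays cross at (2, 2).
pos = triangulate((0, 0), (1, 1), (4, 0), (-1, 1))
print(pos)  # (2.0, 2.0)
```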
Linking it all together: this part of the task ensures the labels are preserved across frames of video, moving from locating customers to tracking them through the store.
When a tangled state occurs (e.g., distinguishing store associates, who behave differently, from customers standing very close together), the system marks those customers as low confidence and schedules them to be re-identified over time.
Next: Item Detection
Product ID detection: We need to figure out which specific items are off the shelf and in someone’s hand.
Some problems & solutions with this:
- Items that are very similar, like two different flavours of the same brand of drink, can be distinguished using residual neural networks that do refined product recognition (across multiple frames) after the convolutional neural network (CNN — basically image classification) layer identifies the item class.
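The multi-frame refinement idea can be sketched without any neural network machinery: take per-frame confidence scores for each candidate product (here invented numbers standing in for a fine-grained model's output) and aggregate them across frames, so one blurry frame can't cause a mix-up.

```python
# Sketch of two-stage item recognition: after a coarse classifier picks
# the item class (e.g. "soda can"), a fine-grained model scores each
# specific product per frame, and scores are averaged across frames.
# The scores below are made up for illustration.

def refine_across_frames(frame_scores):
    """frame_scores: list of {product: confidence} dicts, one per frame."""
    totals = {}
    for scores in frame_scores:
        for product, conf in scores.items():
            totals[product] = totals.get(product, 0.0) + conf
    n = len(frame_scores)
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / n

frames = [
    {"cola_cherry": 0.55, "cola_vanilla": 0.45},   # nearly ambiguous
    {"cola_cherry": 0.80, "cola_vanilla": 0.20},
    {"cola_cherry": 0.40, "cola_vanilla": 0.60},   # one confusing frame
]
product, confidence = refine_across_frames(frames)
print(product)  # cola_cherry
```

Even though one frame favours the wrong flavour, averaging over the clip settles on the right product.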
Then: Customer Association
Probably the most challenging problem is combining all of the information from the above steps to finally answer the “Who took what?” question.
This decision builds on a stick-figure model of each customer, constructed from the tracking work above.
Pose estimation: the cameras look from the top down, so they need to trace a path through the pixels representing the arm between the item and a customer. A simple top-down blob model does not work for this; a stick-figure-like model of the customer is needed.
This algorithm uses a CNN with a cross-entropy loss function to build the point cloud of detected joints, self-regression for vector generation, and pairwise regression to group the vectors together. The model can be used on other video clips to help solve many other problems that rely on pose estimation.
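The grouping step can be illustrated with a much simpler stand-in: given detected joints from a top-down view, pair each wrist with the nearest shoulder to decide whose arm is reaching toward a shelf. The real system uses learned pairwise regressions; this greedy nearest-neighbour version just shows the idea, with made-up coordinates.

```python
# Toy sketch of the grouping step in pose estimation: pair each detected
# wrist with the nearest detected shoulder to decide whose arm reached
# the shelf. The real system learns these pairings; this greedy
# nearest-neighbour version is only illustrative.

def group_arms(shoulders, wrists):
    """shoulders: {person_id: (x, y)}, wrists: {wrist_id: (x, y)}.
    Returns wrist_id -> person_id."""
    assignment = {}
    for wid, (wx, wy) in wrists.items():
        best = min(
            shoulders,
            key=lambda pid: (shoulders[pid][0] - wx) ** 2
                          + (shoulders[pid][1] - wy) ** 2,
        )
        assignment[wid] = best
    return assignment

shoulders = {"customer_A": (1.0, 1.0), "customer_B": (4.0, 1.0)}
wrists = {"wrist_1": (1.6, 1.8), "wrist_2": (3.5, 1.9)}
print(group_arms(shoulders, wrists))
# {'wrist_1': 'customer_A', 'wrist_2': 'customer_B'}
```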
Action determination: the system needs to accurately account for a world where the customer can put items back on the shelf.
For example, a customer can put an item back and push the remaining ones further back on the shelf. It’s easy to think that this is just a customer picking up an item BUT it’s not. To solve for this, the system needs to count all the items on the shelf rather than using a simple assumption based on space.
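That counting idea reduces to a comparison of the shelf's item count before and after an interaction. Here's a minimal sketch of that decision rule (my own framing of it, with invented counts):

```python
# Sketch of action determination by counting: compare how many items the
# vision system counts on the shelf before and after an interaction,
# instead of guessing from the empty space left behind. Illustrative only.

def classify_action(count_before, count_after):
    delta = count_before - count_after
    if delta > 0:
        return ("take", delta)
    if delta < 0:
        return ("return", -delta)
    return ("no_change", 0)

# A customer returns one item but pushes the rest toward the back of the
# shelf. A space-based heuristic would call this a "take"; the count
# says otherwise.
print(classify_action(4, 5))  # ('return', 1)
print(classify_action(5, 3))  # ('take', 2)
```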
Too many positions: another issue is that there are a TON of different ways someone can pick up an item — too many to predict or build a data set for right now. To solve this, they generated synthetic activity data using simulators. Within these simulators, they created virtual customers (with variations in clothing, hair, build, height, etc.), cameras, lighting and shadows, and simulated the same camera hardware limitations.
Finally: Entry & Exit Detection
The next thing to figure out is when someone enters or exits the store, which tells the system when to start and stop tracking them. This system has these components:
- Mobile app to scan a QR code when you show up at the store. Lots of testing was done to make sure it works in different situations (scanning with the phone up or down, handling groups, etc.)
- An association system that links your likeness in the video to your account, based on your position at the store entrance when you scan the QR code
- A shopping session created from that association
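The association step is essentially an event-matching problem: pair each QR scan with whichever tracked person was at that gate at that moment. Here's a rough sketch, with event shapes and names I made up:

```python
# Sketch of entry association: when a QR code is scanned at a gate,
# link the account on the code to whichever tracked person is standing
# at that gate closest in time. Event formats are my own invention.

def associate_entries(scans, tracks, max_gap=1.0):
    """scans: list of (time, gate, account); tracks: list of (time, gate, person_id).
    Returns person_id -> account for each matched scan."""
    sessions = {}
    for s_time, s_gate, account in scans:
        candidates = [
            (abs(t_time - s_time), person)
            for t_time, t_gate, person in tracks
            if t_gate == s_gate and abs(t_time - s_time) <= max_gap
        ]
        if candidates:
            _, person = min(candidates)   # closest in time wins
            sessions[person] = account
    return sessions

scans = [(10.0, "gate_1", "acct_A"), (10.2, "gate_2", "acct_B")]
tracks = [(10.1, "gate_1", "person_7"), (10.3, "gate_2", "person_8")]
print(associate_entries(scans, tracks))
# {'person_7': 'acct_A', 'person_8': 'acct_B'}
```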
There might be an issue with groups:
- Customers (especially families) want to shop as a group but have only one person pay. To enable this, the “head”/payer scans the same code for each person as they enter the store. This creates a session that links all of the people in the group to the same account. Now, you can exit as an individual or as a team anytime you want.
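A group session can be sketched as one account with several tracked identities hanging off it: everyone is followed individually, but every pick bills the single payer. The names below are illustrative, not Amazon's.

```python
# Sketch of group sessions: the payer scans their QR code once per
# person entering; everyone gets their own tracked identity, but all
# picks are billed to the single linked account. Names are my own.

class GroupSession:
    def __init__(self, account_id):
        self.account_id = account_id
        self.members = []        # tracked person ids in this session
        self.charges = []        # (person, item, price)

    def admit(self, person_id):
        """Payer scans the QR code again as this person walks in."""
        self.members.append(person_id)

    def record_pick(self, person_id, item, price):
        if person_id not in self.members:
            raise ValueError("person not in this session")
        self.charges.append((person_id, item, price))

    def close(self):
        """Any member exiting individually still bills the one account."""
        return {"account": self.account_id,
                "total": sum(price for _, _, price in self.charges)}

session = GroupSession("amzn-parent")
for person in ("parent", "kid_1", "kid_2"):
    session.admit(person)
session.record_pick("kid_1", "juice", 2.00)
session.record_pick("parent", "salad", 8.50)
print(session.close()["total"])  # 10.5
```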
Is Amazon Go the Store of the Future?
We visited Amazon’s original Seattle Amazon Go store to experience it first hand. From observation, the technology seems to work extremely well — I tried picking up multiple items at once and putting them back, and it knew not to charge me.
The experience is also very quick and feels exciting. There isn’t much traffic in the store, and the cashier-less system is very efficient. I don’t think the technology will be the limiting factor, so I can see the system easily tracking and charging people even in more complex situations.
Plug: Visit an Amazon Go
At the time of writing, there are four Amazon Go stores in Seattle, four in Chicago, and two in San Francisco. Check out the latest list of Amazon Go locations.
Interested in keeping up to date with my journey?