Prepared by Gregory Rokita

Edmunds’ Pomelo: Automobile Dealership Analytics in Real Time using MongoDB. April 3rd, 2012. Greg Rokita, Sharat Nair, Edmunds.com, Inc.

Presentation Transcript


  1. Edmunds’ Pomelo: Automobile Dealership Analytics in Real Time using MongoDB • April 3rd, 2012 • Greg Rokita, Sharat Nair • Edmunds.com, Inc. • Prepared by Gregory Rokita

  2. Assumptions • Understanding of MongoDB • Experience with Java • Basic understanding of serialization protocols e.g. Thrift, Protocol Buffers • Basic understanding of messaging protocols e.g. JMS

  3. Agenda • Edmunds • Scale of Big Data operations • Use case for Pomelo Application • System Overview & Design • Real time integration with MongoDB • Real time data creation for MongoDB • Implementation • MongoDB Consumer • MongoDB REST service • Q&A

  4. Edmunds.com and Scale • Premier online resource for automotive information launched in 1995 as the first automotive information Web site • 15 million unique visitors • 210 million page views • 1 million+ new inventory items per day • 2 TB of new data every month • 40 node Hadoop cluster aggregating logs, transactions, calls, referrals, advertising, vehicle, pricing, inventory and other data sets

  5. Pomelo Application • Analytics tool for Automotive Dealers and Edmunds’ Dealer Sales • Performance measurement for Edmunds traffic and its correlation to calls & referrals • iPad, HTML5, Sencha Touch & Charts

  6. Unifying data for MongoDB

  7. Processing data for MongoDB-Oozie

  8. Populating MongoDB - Publishing System

  9. Targeting MongoDB - Producer-Consumer matching [Diagram not preserved. Labels: GenericThrift Producer (GTP), Publish DealerMetrics, DealerMetrics Virtual Topic, Broker, Destination Interceptor, DealerMetrics Queue, MongoDB Consumer. Environments: Prod, Test. Locations: LAX, EC2.]

  10. Integration with MongoDB – layered architecture for transport • Thrift: type safety, versioning and service • Camel: retries and error handling • ActiveMQ: message persistence, durability and failover
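The slide credits the Camel layer with retries and error handling. As a rough illustration of that idea only, the plain-Java sketch below retries a message handler a bounded number of times and parks the message in a dead-letter list once the attempts run out. It is not Camel's API; `RedeliverySketch`, `deliver`, and `deadLetters` are hypothetical names, and the attempt limit is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for bounded redelivery; not Camel's API.
final class RedeliverySketch {
    /** Messages that exhausted their attempts (stand-in for a dead-letter queue). */
    final List<String> deadLetters = new ArrayList<>();
    private final int maxAttempts;

    RedeliverySketch(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /** Returns the number of attempts used, or -1 if the message was dead-lettered. */
    int deliver(String message, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return attempt; // handler succeeded
            } catch (RuntimeException e) {
                // Treat the failure as transient and try again;
                // a production route would normally wait between attempts.
            }
        }
        deadLetters.add(message); // attempts exhausted
        return -1;
    }
}
```

In the layering on the slide, durability and failover would still come from the broker; a retry loop like this only covers failures inside the consumer.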

  11. Preparing data for MongoDB - summary

  12. Thrift IDL definition

  13. Mongo Connection
      <bean id="mongo" class="com.edmunds...MongoDBConnectionFactory">
          <property name="address" value="pl1db470.media.edmunds.com:27017,pl1db471.media.edmunds.com:27017"/>
      </bean>
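The `address` property above is a comma-separated list of `host:port` pairs. The internals of Edmunds' `MongoDBConnectionFactory` are not shown in the slides, so the following is only a sketch of how such a string could be split into seed addresses. `SeedListParser` and `parse` are hypothetical names, and falling back to port 27017 when no port is given is an assumption.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the MongoDBConnectionFactory from the slide.
final class SeedListParser {
    static final int DEFAULT_PORT = 27017; // MongoDB's default port

    /** Splits "host1:port1,host2:port2" into unresolved socket addresses. */
    static List<InetSocketAddress> parse(String address) {
        List<InetSocketAddress> seeds = new ArrayList<>();
        for (String entry : address.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon < 0 ? hostPort : hostPort.substring(0, colon);
            int port = colon < 0
                    ? DEFAULT_PORT
                    : Integer.parseInt(hostPort.substring(colon + 1));
            // createUnresolved avoids a DNS lookup at parse time
            seeds.add(InetSocketAddress.createUnresolved(host, port));
        }
        return seeds;
    }
}
```

With the MongoDB Java driver of that era, each pair would typically become a `ServerAddress`, and the resulting list would be handed to the `Mongo` constructor as the replica-set seed list.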

  14. Mongo Connection - cont’d
      @Autowired
      public MongoDbDealerMetricsConsumer(Mongo mongo) {
          collection = mongo.getDB(DB_NAME).getCollection(COLLECTION_NAME);
          // Descending index on the last-active date, the field getActiveDate() sorts on
          collection.ensureIndex(new BasicDBObject(LAST_ACTIVE_DATE, -1));
      }

  15. Mongo consumer
      private void processDealerMetrics(DealerMetrics dealerMetrics) throws TException {
          String cddId = dealerMetrics.getCddDealershipId();
          BasicDBObject query = new BasicDBObject();
          query.put(CDD_ID, cddId);
          DBObject dmObj = (DBObject) JSON.parse(serializeToJson(dealerMetrics));
          /*
           * findAndModify arguments, in order:
           *   query     - query to match
           *   fields    - fields to be returned
           *   sort      - sort to apply before picking first document
           *   remove    - if true, document found will be removed
           *   update    - update to apply
           *   returnNew - if true, the updated document is returned, otherwise the old
           *               document is returned (or it would be lost forever)
           *   upsert    - do upsert (insert if document not present)
           */
          collection.findAndModify(query, null, null, false, dmObj, true, true);
      }
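The `findAndModify` call above passes a whole document as the update, with `upsert` and `returnNew` both true. The in-memory sketch below mimics that behavior with a map keyed by dealership ID, to make the replace-or-insert semantics concrete. It is an illustration, not driver code: `UpsertSketch`, `findAndReplace`, and the `pageViews` field used in the usage note are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of a whole-document upsert keyed by dealership ID.
final class UpsertSketch {
    private final Map<String, Map<String, Object>> byCddId = new HashMap<>();

    /**
     * Replaces the document stored under cddId, or inserts it if absent.
     * Returns the new document when returnNew is true; otherwise returns the
     * previous document, which is null when the call performed an insert.
     */
    Map<String, Object> findAndReplace(String cddId,
                                       Map<String, Object> replacement,
                                       boolean returnNew) {
        // Copy so later changes to the caller's map do not leak into the store
        Map<String, Object> stored = new LinkedHashMap<>(replacement);
        Map<String, Object> previous = byCddId.put(cddId, stored);
        return returnNew ? stored : previous;
    }

    int size() {
        return byCddId.size();
    }
}
```

Calling `findAndReplace("d1", doc, true)` twice with different `pageViews` values leaves a single entry holding the second document, which is why the consumer can process repeated DealerMetrics messages for the same dealership without creating duplicates.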

  16. Public interface to Mongo - Dealer
      public List<DBObject> getDocument(String cddId) {
          final BasicDBObject query = new BasicDBObject();
          query.put(CDD_ID, cddId);
          final DBObject object = collection.findOne(query);
          if (object == null) {
              // findOne returns null when no document matches the dealership ID
              return newArrayList();
          }
          object.removeField(OBJECT_ID);
          object.removeField(LAST_ACTIVE_DATE);
          return newArrayList(object);
      }

  17. Public interface to Mongo - Active list
      public List<DBObject> getActiveList() {
          final BasicDBObject query = new BasicDBObject();
          query.put(LAST_ACTIVE_DATE, getActiveDate());
          query.put(DMA_NAME, getDmaCriteria());
          final BasicDBObject keys = new BasicDBObject();
          keys.put(OBJECT_ID, 0);
          keys.put(CDD_ID, 1);
          keys.put(DEALERSHIP_NAME, 1);
          return collection.find(query, keys).toArray();
      }

      private Object getActiveDate() {
          return collection.find().sort(getSortCriteria()).next().get(LAST_ACTIVE_DATE);
      }

      private BasicDBObject getSortCriteria() {
          return new BasicDBObject(LAST_ACTIVE_DATE, -1);
      }

      private BasicDBObject getDmaCriteria() {
          return new BasicDBObject("$in", DMAS);
      }
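The active-list query works in two steps: find the most recent last-active date, then return the dealership ID and name of every document that carries that date and belongs to one of the listed DMAs. The sketch below reproduces that logic over plain maps. The slides show only constants, so the field names `lastActiveDate`, `dmaName`, `cddId`, and `dealershipName`, the use of a `Long` for the date, and the class name `ActiveListSketch` are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory version of the two-step active-list query.
final class ActiveListSketch {
    static List<Map<String, Object>> activeList(List<Map<String, Object>> docs,
                                                Set<String> dmas) {
        // Step 1: newest lastActiveDate, i.e. what a descending sort plus next() picks
        Long latest = null;
        for (Map<String, Object> doc : docs) {
            Long date = (Long) doc.get("lastActiveDate");
            if (date != null && (latest == null || date > latest)) {
                latest = date;
            }
        }
        List<Map<String, Object>> result = new ArrayList<>();
        if (latest == null) {
            return result; // nothing stored yet
        }
        // Step 2: keep documents on that date in one of the DMAs, projecting two fields
        for (Map<String, Object> doc : docs) {
            boolean onLatestDate = latest.equals(doc.get("lastActiveDate"));
            boolean inDma = dmas.contains(doc.get("dmaName"));
            if (onLatestDate && inDma) {
                Map<String, Object> projected = new LinkedHashMap<>();
                projected.put("cddId", doc.get("cddId"));
                projected.put("dealershipName", doc.get("dealershipName"));
                result.add(projected);
            }
        }
        return result;
    }
}
```

One difference from the slide code: on an empty collection the cursor's `next()` would throw, whereas this sketch returns an empty list.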

  18. Rest service
      @GET
      @Path("{id}")
      @Produces(MediaType.APPLICATION_JSON)
      public List<DBObject> get(@PathParam("id") String cddId) {
          return dealerMetricsMongoDao.getDocument(cddId);
      }

      @GET
      @Path("list")
      @Produces(MediaType.APPLICATION_JSON)
      public List<DBObject> getDealerList() {
          return dealerMetricsMongoDao.getActiveList();
      }

  19. Q&A Greg Rokita grokita@edmunds.com Sharat Nair snair@edmunds.com
