{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":730832824,"defaultBranch":"main","name":"nimble","ownerLogin":"facebookincubator","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2023-12-12T19:16:56.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/19538647?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1715272438.0","currentOid":""},"activityList":{"items":[{"before":"277b2c136fc081dd626d9c0d74f1b6f62dc4cbdc","after":"5ba326f5deb84e44d769c4e39b71d231f3b2300b","ref":"refs/heads/main","pushedAt":"2024-06-28T18:14:10.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Transform Optimizations + Trainer Role (#68)\n\nSummary:\nPull Request resolved: https://github.com/facebookincubator/nimble/pull/68\n\nTo better utilize our workers, I enabled the following:\n1. Parallel encoding\n2. Parallel stream reading\n3. 
I/O decoupling\n\nThese increased our CPU utilization on workers from ~30% to ~85%, speeding up transforms by a lot.\n\nIn addition, I created a new role to train encoding layouts for jobs (with enough items in them).\nThis trainer is not super robust right now, and will be improved in the future (for example, detect if it is stuck, and restart).\nNot sure how much this is contributing to transform speed yet.\n\nReviewed By: sdruzkin\n\nDifferential Revision: D59125380\n\nfbshipit-source-id: 6d0d2ef3bd34ba268719353238d6b8fd176e8446","shortMessageHtmlLink":"Transform Optimizations + Trainer Role (#68)"}},{"before":"8c40f78effe5e3c7f26cab8bf53ed09924d25322","after":"277b2c136fc081dd626d9c0d74f1b6f62dc4cbdc","ref":"refs/heads/main","pushedAt":"2024-06-28T17:09:46.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Update Velox submodule (#67)\n\nSummary: Pull Request resolved: https://github.com/facebookincubator/nimble/pull/67\n\nReviewed By: pedroerp\n\nDifferential Revision: D59162073\n\nPulled By: sdruzkin\n\nfbshipit-source-id: 7d64b4832f34f3726b6562c80f83bbc66f60508e","shortMessageHtmlLink":"Update Velox submodule (#67)"}},{"before":"94422a84b245de81c5ece9ec1c9add61b6b1bc7d","after":"8c40f78effe5e3c7f26cab8bf53ed09924d25322","ref":"refs/heads/main","pushedAt":"2024-06-27T16:47:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Use FlatMapLayoutPlanner as a default planner (#66)\n\nSummary:\nPull Request resolved: https://github.com/facebookincubator/nimble/pull/66\n\nPromote FlatMap layout planner to be a default layout planner.\n\nChanges:\n* rename FlatMapLayoutPlanner to 
DefaultLayoutPlanner\n* rename tests\n* always create the DefaultLayoutPlanner in the writer\n\nReviewed By: helfman\n\nDifferential Revision: D58894881\n\nfbshipit-source-id: ed2270bdd377c1fde7e07ffa2d22447001ee4489","shortMessageHtmlLink":"Use FlatMapLayoutPlanner as a default planner (#66)"}},{"before":"d8d629e5d03499b5061af9c9a14abc95046bbec0","after":"94422a84b245de81c5ece9ec1c9add61b6b1bc7d","ref":"refs/heads/main","pushedAt":"2024-06-26T06:37:29.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Fix IOStats for Nimble (#65)\n\nSummary:\nPull Request resolved: https://github.com/facebookincubator/nimble/pull/65\n\nX-link: https://github.com/facebookincubator/velox/pull/10216\n\nIOStats are being calculated in different layers of the IO stacks.\nSince Nimble and DWRF don't share parts of the stack, some IOStats calculations were not affecting Nimble.\n\nProbably the right thing to do is to move all IOStats calculations to the bottom layers (WSFile, cache and SSD reads), where IO is actually performed (and these layers are shared between Nimble and DWRF).\nBut it seems that for this change, we need a design clarifying what we actually want to track and how to track it.\n\nSince we don't have the cycles to create this design right now, I opted for a simple solution, where I create a simple layer on the Nimble side, which will calculate these stats.\n\nReviewed By: Yuhta, sdruzkin\n\nDifferential Revision: D58559606\n\nfbshipit-source-id: 7a13710e5273bd07f19106564c86cce88902da38","shortMessageHtmlLink":"Fix IOStats for Nimble 
(#65)"}},{"before":"8578738cd6a51f6af015d40d86b9d8973a4546f9","after":"d8d629e5d03499b5061af9c9a14abc95046bbec0","ref":"refs/heads/main","pushedAt":"2024-06-14T21:52:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add stream byte count to 'nimble_dump histogram'\n\nSummary:\nAdd total storage bytes for encodings shown by `nimble_dump histogram`.\n\nThe caveat is that for nested encodings like a dictionary, we will sum up sizes of child encodings. It also does not include the extra 5 bytes for the chunk header per stream.\n\nReviewed By: HuamengJiang\n\nDifferential Revision: D58444058\n\nfbshipit-source-id: cb74a7a93e489eb9d32572d8ce19da36967a5a2e","shortMessageHtmlLink":"Add stream byte count to 'nimble_dump histogram'"}},{"before":"0ff23c86e5b2473d5dc26566962bd04eeb618e7e","after":"8578738cd6a51f6af015d40d86b9d8973a4546f9","ref":"refs/heads/main","pushedAt":"2024-06-14T18:56:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Fix failure when selected feature is not in map (#63)\n\nSummary:\nPull Request resolved: https://github.com/facebookincubator/nimble/pull/63\n\nThis change fixes the failure that happens when the selected feature does not exist in the target flatmap, for the merged flatmap reader as well as the struct flatmap reader.\n\nReviewed By: Yuhta\n\nDifferential Revision: D58495009\n\nfbshipit-source-id: dc1c64753ee184d241dec00f4e7a8c90c6af243d","shortMessageHtmlLink":"Fix failure when selected feature is not in map 
(#63)"}},{"before":"d2b65bd114359fb0998adb866f208dc1a3c38c76","after":"0ff23c86e5b2473d5dc26566962bd04eeb618e7e","ref":"refs/heads/main","pushedAt":"2024-06-05T22:19:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Remove unused variable in DeduplicationUtils\n\nReviewed By: HuamengJiang\n\nDifferential Revision: D58117446\n\nfbshipit-source-id: c799fe806b6240e8a626287ffe54e78f9cf7e4b9","shortMessageHtmlLink":"Remove unused variable in DeduplicationUtils"}},{"before":"5462e6fe92d0d2651531011826cc49e25ccc9f3b","after":"d2b65bd114359fb0998adb866f208dc1a3c38c76","ref":"refs/heads/main","pushedAt":"2024-06-04T19:10:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Deprecate memory pool currentBytes api\n\nSummary:\nDeprecate the currentBytes API in memory pool, which gives different stats for different\ntypes of memory pool. For a leaf memory pool, it returns the actual used memory reservation,\nbut for a non-leaf memory pool, it returns the aggregated memory reservation from all\nits child pools, which includes both used and unused reservations.\nThis PR deprecates currentBytes and switches to usedBytes for actual used\nreservation stats and reservedBytes for reservation stats. Both return the same kind\nof stats for different types of memory pools. 
The new usedBytes is a bit expensive for\nnon-leaf memory pools, as it needs to traverse the child memory pool hierarchy to\nget the actual used memory.\n\nX-link: https://github.com/facebookincubator/velox/pull/10024\n\nReviewed By: tanjialiang\n\nDifferential Revision: D58091982\n\nPulled By: xiaoxmeng\n\nfbshipit-source-id: bd94f69743bd0e135c1b7540b5466e58fe682295","shortMessageHtmlLink":"Deprecate memory pool currentBytes api"}},{"before":"acad60d4a7d44b78ca3fdd499459a226ab74a392","after":"5462e6fe92d0d2651531011826cc49e25ccc9f3b","ref":"refs/heads/main","pushedAt":"2024-06-04T04:02:12.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Open source CI fixes (#61)\n\nSummary:\nFixing a series of issues with CI. The repo rename facebookexternal->facebookincubator broke jobs, and since then other errors accumulated. 
This hopefully addresses all of them.\n\nAlso advancing Velox's submodule.\n\nPull Request resolved: https://github.com/facebookincubator/nimble/pull/61\n\nReviewed By: tanjialiang\n\nDifferential Revision: D58116284\n\nPulled By: pedroerp\n\nfbshipit-source-id: c27251b0c34b9465a57e75b731ca3458e9dfeb9a","shortMessageHtmlLink":"Open source CI fixes (#61)"}},{"before":"c2fa2a47cf23c0ed6127400f9fead9493bbb754c","after":"acad60d4a7d44b78ca3fdd499459a226ab74a392","ref":"refs/heads/main","pushedAt":"2024-06-03T20:20:10.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Optimize readWithVisitor for TrivialEncoding and MainlyConstantEncoding (#60)\n\nSummary:\nX-link: https://github.com/facebookincubator/velox/pull/10021\n\nPull Request resolved: https://github.com/facebookincubator/nimble/pull/60\n\n- Fast path for `TrivialEncoding::readWithVisitor`\n- Fast path for `MainlyConstantEncoding::readWithVisitor`\n- Store `encodingType`, `dataType`, `rowCount` in `Encoding` object memory to reduce memory fetch on `data_`\n- Use skip functor only in `readWithVisitorSlow` to avoid virtual call cost\n\nbypass-github-export-checks\n\nReviewed By: oerling\n\nDifferential Revision: D58085138\n\nfbshipit-source-id: e308d2c44c8e45f89a2367c8b88f1adb6511b6f9","shortMessageHtmlLink":"Optimize readWithVisitor for TrivialEncoding and MainlyConstantEncodi…"}},{"before":"8c8e73d13bed179b273291a523da95082877c7d9","after":"c2fa2a47cf23c0ed6127400f9fead9493bbb754c","ref":"refs/heads/main","pushedAt":"2024-05-31T23:49:14.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Fix internal compression integration regression 
(#59)\n\nSummary:\nPull Request resolved: https://github.com/facebookincubator/nimble/pull/59\n\nWith the recent changes related to open sourcing Nimble, we moved internal compressors to be compiled using conditional macros.\n\nSince this affects a header file, this macro needs to be defined on EVERY project that includes this header.\n\nWe are now switching this to not include internal compressors if a macro is defined.\nIt is easier to set up a global macro in the CMake system than in our internal build (and less error prone).\n\nReviewed By: sdruzkin\n\nDifferential Revision: D58037836\n\nfbshipit-source-id: 97cce505caa8005d92ac29621f4f54541cbee09d","shortMessageHtmlLink":"Fix internal compression integration regression (#59)"}},{"before":"d924245d5cfd4ba2cac56d2bf7f124536b89af7c","after":"8c8e73d13bed179b273291a523da95082877c7d9","ref":"refs/heads/main","pushedAt":"2024-05-28T21:26:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Optimize readWithVisitor for RleEncoding and NullableEncoding (#58)\n\nSummary:\nX-link: https://github.com/facebookincubator/velox/pull/9896\n\nPull Request resolved: https://github.com/facebookincubator/nimble/pull/58\n\n- Add fast path for `RleEncoding::readWithVisitor`\n- Use `materializeBoolsAsBits` in `NullableEncoding::readWithVisitor`\n- Merge `ChunkedBoolsDecoder` with `ChunkedDecoder`\n- Optimize the data type dispatch in `EncodingUtils.h` to improve compilation time\n\nbypass-github-export-checks\n\nReviewed By: oerling\n\nDifferential Revision: D57675525\n\nfbshipit-source-id: 419c24d81007b22d92a555e1648ac97077aaeecb","shortMessageHtmlLink":"Optimize readWithVisitor for RleEncoding and NullableEncoding 
(#58)"}},{"before":"d98b3ead481922c1431bb34f0404a9b5df4906a6","after":"d924245d5cfd4ba2cac56d2bf7f124536b89af7c","ref":"refs/heads/main","pushedAt":"2024-05-28T19:48:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Address UB in EncodingSelectionPolicy\n\nSummary:\nHypothesis for UB:\n\n* We are [creating](https://www.internalfb.com/code/fbsource/[14c653f501db]/fbcode/dwio/nimble/encodings/EncodingSelectionPolicy.h?lines=54) `EncodingSelectionPolicy` based on [actual type](https://www.internalfb.com/code/fbsource/[14c653f501db1b52956d55a235a2a710976ad0b7]/fbcode/dwio/nimble/common/Types.h?lines=182%2C189%2C196%2C203%2C210%2C217%2C224%2C231%2C238%2C245%2C252%2C258%2C264)\n* However, we were creating `ManualEncodingSelectionPolicy` based on the [physical type](https://www.internalfb.com/code/fbsource/[14c653f501db1b52956d55a235a2a710976ad0b7]/fbcode/dwio/nimble/common/Types.h?lines=183%2C190%2C197%2C204%2C211%2C218%2C225%2C232%2C239%2C246%2C253%2C259%2C265)\n* This was problematic for cases where actual type doesn't match physical type because we weren't able to determine the correct overridden method, thus, a wrong `select` method i.e. 
from `ReplayedEncodingSelectionPolicy` was getting invoked, which led to an invalid encoding type and thrown exceptions.\n\n* With this diff, we enforce the same types in the class hierarchy for `EncodingSelectionPolicy`\n\nReviewed By: HuamengJiang, vladima\n\nDifferential Revision: D57709332\n\nfbshipit-source-id: 8d964c33c6a8bc6fc61e708dce485c9606101626","shortMessageHtmlLink":"Address UB in EncodingSelectionPolicy"}},{"before":"08c8928a346544bbedebe72d1f888560dfbf404e","after":"d98b3ead481922c1431bb34f0404a9b5df4906a6","ref":"refs/heads/main","pushedAt":"2024-05-23T16:42:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add support for deduplicated map in validation services and add test for nested cases\n\nSummary:\nThis diff adds support for the validation service to add id_score_list columns into the deduplicated map columns.\n\nGiven that the maps are typically wrapped in a flat map, this diff also adds tests for nested maps.\n\nReviewed By: sdruzkin, HuamengJiang\n\nDifferential Revision: D57626634\n\nfbshipit-source-id: 442cb80c2cc2d5b901655093c1f0154e0dae560c","shortMessageHtmlLink":"Add support for deduplicated map in validation services and add test …"}},{"before":"d4f82ba2f9b2fb0936ba4f42d1d468c7b972c46b","after":"08c8928a346544bbedebe72d1f888560dfbf404e","ref":"refs/heads/main","pushedAt":"2024-05-21T19:26:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Remove unused function from dwio/nimble/velox/VeloxReader.cpp\n\nSummary:\n`-Wunused-function` has identified an unused function. This diff removes it. 
In many cases these functions have existed for years in an unused state.\n\nThis diff may also be removing code related to antiquated usage of OpenSSL 1.1.0.\n\nReviewed By: palmje\n\nDifferential Revision: D57577765\n\nfbshipit-source-id: d41e806473eec2de4c754f784eb72d227361e3b9","shortMessageHtmlLink":"Remove unused function from dwio/nimble/velox/VeloxReader.cpp"}},{"before":"76dbd875ddcd56816b979634fc4e9c02173848c2","after":"d4f82ba2f9b2fb0936ba4f42d1d468c7b972c46b","ref":"refs/heads/main","pushedAt":"2024-05-20T20:27:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Fix debug build for Nimble (#57)\n\nSummary:\nPull Request resolved: https://github.com/facebookincubator/nimble/pull/57\n\nD57343997 broke the build. Let's fix by just not setting a variable and using the defined value directly.\n\nReviewed By: rukshanperera\n\nDifferential Revision: D57570628\n\nfbshipit-source-id: 7a76965c0afeb72e8c4ad3aed0e8017dca56ae47","shortMessageHtmlLink":"Fix debug build for Nimble (#57)"}},{"before":"7039d93c447b6623bbce0c313a51c82b646904c5","after":"76dbd875ddcd56816b979634fc4e9c02173848c2","ref":"refs/heads/main","pushedAt":"2024-05-20T16:05:06.000Z","pushType":"push","commitsCount":4,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Support deduplication in SlidingWindowMapFieldWriter\n\nSummary:\n### Context about SlidingWindowMap\n\nThe motivation of `SlidingWindowMap` is to introduce dictionary encoding to id_score_list feature which is represented as `map` in velox. 
We want to be able to deduplicate id_score_list in storage and also when reading it back.\n\n```\nMapVector of id_score_list_features\n\noffsets  keys  vals\n0        1     0.1\n2        2     0.2\n4        1     0.1\n6        2     0.2\n         3     0.3\n         4     0.4\n         3     0.3\n         4     0.4\nSlidingWindowMapWriter writes into file\n\noffsets  lengths  keys  vals\n0        2        1     0.1\n0        2        2     0.2\n2        2        3     0.3\n2        2        4     0.4\n\nSlidingWindowMapReader reads from file and puts the deduplicated map into a Dictionary