Google Summer of Code 2018 Work Product Submission



coala

Palash Nigam

I am a second-year B.Tech student of Computer Science and Engineering at the International Institute of Information Technology, Bhubaneswar, Odisha, India. I worked as a GSoC student with coala to improve coala’s I/O mechanism using the FileFactory class and to add features that provide more support for the NextGen-Core’s caching mechanism.


Patches Tarball


SHA-256:

11072a232d80160a6142e822b1ab66c28ccd4a72d6ea3762111ecdc922bb8479

Bonding

Phase 1

Phase 2

Phase 3


Links to commits and repositories I've worked on:

Repository | Link to Commit/s | Description
cEPs | View | cEP-0026.md: Adds optimize caching cEP.
projects | View | Updated the details of Optimize Caching project.
projects | View | optimize_caching.md: Changed primary mentor from adtac to Makman2.
devops | View | planet.ini: Add palash25 gsoc blog feed.
coala | View | Added FileFactory class. Objects of this class are used to represent files and replace file contents in the file dict.
coala | View | Added Directory class. This class will act as an interface to directories, providing useful information about them. Objects of this class can be used by bears that operate only on directories and perform analysis based on things like directory structure.
coala | View | Added memoized_property. memoized_property is a decorator that caches the properties of the FileFactory class.
coala | View | NextGen_Core.rst: Update caching section.
coala | View | IO.rst: Add FileFactory docs.
coala | View | NextGen_Core.rst: Add link to IO docs.
coala | View | FileFactory: Use cached_property instead of memoized_property.
coala | View | A new attribute newline was added to FileFactory to force newlines on the file content.
coala | View | Processing: Add middleware FileDict that provides the bears with the actual file contents instead of the FileFactory objects.
coala | View | Processing: Modify tests to use actual files instead of hard coded tuples as file content.
coala | View | Added support for collection types like dict and set for persistent_hash.
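
To illustrate the persistent_hash entry above: Python's built-in hash() is not suitable for a cache that survives between runs, because string hashes are randomized per process and dict and set objects are not hashable at all. The following is only a minimal sketch of how unordered collections could be hashed independently of their iteration order. The function name persistent_hash_sketch and its structure are assumptions made for illustration and are not coala's actual implementation.

```python
import hashlib


def persistent_hash_sketch(obj):
    """Return a SHA-1 digest of obj that is stable across interpreter runs.

    Hypothetical helper for illustration; not coala's persistent_hash.
    """
    # Prefix with the type name so that e.g. a list and a tuple differ.
    kind = type(obj).__name__.encode('utf-8')
    if isinstance(obj, dict):
        # Hash every key/value pair, then sort the digests so that the
        # insertion order of the dict does not influence the result.
        parts = sorted(
            persistent_hash_sketch(key) + persistent_hash_sketch(value)
            for key, value in obj.items())
    elif isinstance(obj, (set, frozenset)):
        # Sets have no defined order, so sort the element digests as well.
        parts = sorted(persistent_hash_sketch(item) for item in obj)
    elif isinstance(obj, (list, tuple)):
        # Ordered sequences keep their order.
        parts = [persistent_hash_sketch(item) for item in obj]
    else:
        # Assumption: leaves are simple values (str, int, float, bytes,
        # None) whose repr() is deterministic.
        parts = [repr(obj).encode('utf-8')]

    digest = hashlib.sha1(kind + b':')
    for part in parts:
        digest.update(part)
    return digest.digest()
```

With this sketch, {'a': 1, 'b': 2} and {'b': 2, 'a': 1} produce the same digest, while [1, 2] and [2, 1] do not.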


Optimize Caching for the NextGen-Core

Work Done

  1. Improved I/O mechanism for coala.
    1. Added a class FileFactory to interface with files and to provide file contents in different forms: as a string, as binary data, or as a tuple.
    2. Added a Directory class to interface with directories and provide useful information about them, such as the directory path, parent directory and timestamps. This will later be used to implement the ignore-directories functionality.
    3. Added documentation for FileFactory and improved the NextGen-Core docs.
  2. Caching support.
    1. Cached the properties of FileFactory, which turned out to give a significant performance boost in consecutive coala runs.
    2. Added support for unordered collection types in persistent_hash, which improved the NextGen-Core’s ability to hash more complex task objects (which are used in NextGen caching).
  3. Integrated FileFactory with the core.
    1. Added line endings support for FileFactory.
    2. Implemented a middleware, FileDict, that mimics a dictionary and provides the bears in the old core with the actual file contents instead of the FileFactory objects, thereby maintaining backwards compatibility with the old core.
    3. Modified the tests to use actual files and the FileFactory objects instead of hard-coded file contents.
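
The FileFactory items above (different views of the file contents, cached properties and the newline attribute) can be sketched roughly as follows. This is an illustration under stated assumptions and not the class that was merged into coala: the class name FileFactorySketch, the property names raw, string and lines, the UTF-8 decoding, and the interpretation of newline as "make sure the last line ends with a newline" are all assumptions. It also uses functools.cached_property (Python 3.8+) as a stand-in for the cached_property decorator mentioned in the commit list.

```python
import os
from functools import cached_property  # stand-in, requires Python 3.8+


class FileFactorySketch:
    """Hypothetical file wrapper; each property is read once, then cached."""

    def __init__(self, filename, newline=True):
        self.filename = os.path.abspath(filename)
        # Assumption: newline=True forces a trailing newline on the last line.
        self.newline = newline

    @cached_property
    def raw(self):
        """File contents as binary data."""
        with open(self.filename, 'rb') as fp:
            return fp.read()

    @cached_property
    def string(self):
        """File contents as a single string (assumes UTF-8)."""
        return self.raw.decode('utf-8')

    @cached_property
    def lines(self):
        """File contents as a tuple of lines that keep their line endings."""
        lines = self.string.splitlines(keepends=True)
        if self.newline and lines and not lines[-1].endswith('\n'):
            lines[-1] += '\n'
        return tuple(lines)

    @cached_property
    def timestamp(self):
        """Last modification time of the file."""
        return os.path.getmtime(self.filename)
```

Because the decorator stores each computed value on the instance, the file is read from disk only on the first access, and later accesses of raw, string or lines return the stored object.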

Challenges

One of the first challenges I faced was designing the FileFactory. With my mentors’ help, we came up with a design that could be used for the NextGen-Core while also maintaining compatibility with the old core.

The biggest challenge I faced in these three months was the integration of the aforementioned FileFactory with the core. Initially about thirty tests were failing, but after we discussed and settled on using a middleware (FileDict) to maintain backwards compatibility with the old core, the number of test failures dropped to just six. After a while I discovered that FileFactory was removing line endings from the file contents. This was fixed by adding support for newlines in FileFactory, which in turn led to a successful integration.
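
The middleware idea can be sketched as a read-only mapping that stores file objects internally but hands out plain line tuples, which is what code written for the old core expects. This is a hedged illustration only: FileDictSketch and FileProxySketch are hypothetical names, and the real FileDict in coala may be structured differently.

```python
from collections.abc import Mapping


class FileProxySketch:
    """Hypothetical stand-in for a FileFactory object with a lines attribute."""

    def __init__(self, lines):
        self.lines = tuple(lines)


class FileDictSketch(Mapping):
    """Dictionary-like wrapper mapping file names to their line tuples."""

    def __init__(self, file_objects):
        # Maps file name -> file object; the objects are never handed out.
        self._files = dict(file_objects)

    def __getitem__(self, filename):
        # Old-style consumers receive the contents, not the wrapper object.
        return self._files[filename].lines

    def __iter__(self):
        return iter(self._files)

    def __len__(self):
        return len(self._files)


# Usage: code that expects {filename: tuple_of_lines} works unchanged.
file_dict = FileDictSketch({'a.py': FileProxySketch(['x = 1\n', 'y = 2\n'])})
for filename, lines in file_dict.items():
    print(filename, len(lines))
```

Deriving from collections.abc.Mapping means that items(), values(), get() and the in operator all go through __getitem__, so every access path returns file contents rather than the wrapped objects.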

Work to be done

Two major features were left out because of time constraints:

  1. Ignore directories functionality for the NextGen-Core.
  2. Cache control flags for the NextGen-Core (which was already mentioned as a stretch issue in my GSoC proposal).