Avoiding LF/CRLF line termination issues #244

@spc90

I spent quite a lot of time trying to figure out why a brand-new SGpp clone would fail to build on an otherwise regular Ubuntu system. It turned out that, for some reason, the git config core.autocrlf flag was set to true (globally), which meant that all datasets provided with SG++ were checked out with Windows-style CRLF (carriage return + line feed) line terminations instead of the plain LF terminations of Unix systems. (FYI, classic Mac OS used a third convention, a bare CR; modern macOS uses LF.) This caused the ARFFTools utility class in datadriven/tools to read .arff files incorrectly, and scons failed with a UnicodeDecodeError (which merely masks the actual failure: asserts in the datadriven boost tests).

The change in line terminations would have gone unnoticed if not for the fact that ARFFTools uses std::getline. On Unix systems, std::getline strips only the '\n' delimiter, so it returns an empty string for a blank line with the correct Unix termination '\n', but for a blank line with the Windows termination "\r\n" it returns a non-empty string containing the '\r' character.
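To make the std::getline behaviour concrete, here is a minimal sketch of a reader that tolerates both terminations. The names getlineAnyEol and splitLines are hypothetical helpers made up for illustration; they are not part of ARFFTools, and this is not necessarily how the parser should be fixed.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of ARFFTools): std::getline plus removal of
// a single trailing '\r', so CRLF and LF input yield identical lines.
inline std::istream& getlineAnyEol(std::istream& in, std::string& line) {
  if (std::getline(in, line) && !line.empty() && line.back() == '\r') {
    line.pop_back();
  }
  return in;
}

// Collect all lines of a text, with or without the '\r' stripping,
// to compare plain std::getline against the tolerant variant.
inline std::vector<std::string> splitLines(const std::string& text, bool stripCr) {
  std::istringstream in(text);
  std::vector<std::string> lines;
  std::string line;
  while (stripCr ? static_cast<bool>(getlineAnyEol(in, line))
                 : static_cast<bool>(std::getline(in, line))) {
    lines.push_back(line);
  }
  return lines;
}
```

For the CRLF input "a\r\n\r\nb\r\n", splitLines(text, false) reports the blank second line as "\r", whereas splitLines(text, true) reports it as an empty string, which is what code testing for empty lines expects.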

I still don't understand why the Ubuntu system I tested on had that git config flag set in the first place. However, an easy way to never worry about this again (and generally good practice for cross-platform code that handles datasets) would be to add a .gitattributes file to SGpp that fixes the line endings of the dataset files, similar in effect to git config core.autocrlf input. Unlike a local git config, this file is versioned with the repository, and an explicit eol attribute in it takes precedence over a user's core.autocrlf setting, so every checkout would get the line endings that the std::getline-based parsing expects.
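A minimal sketch of what such a file could contain, assuming the affected datasets are the .arff files mentioned above. The pattern is an assumption; SGpp may need further patterns for other dataset formats.

```
# .gitattributes (sketch; the file pattern is an assumption)
# Treat .arff datasets as text and always check them out with LF line
# endings, on every platform, regardless of the user's core.autocrlf.
*.arff text eol=lf
```

Pinning eol=lf rather than relying on platform defaults means the files are byte-identical on every checkout, at the cost of Windows users also seeing LF in these files.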

Labels

discussion: General problem for which possible solutions need to be discussed
feature request: Desirable, nice-to-have feature