[15 Aug 14:42] marino, dillon https://marc.info/?l=openbsd-cvs&m=143956261214725&w=2
[15 Aug 14:44] [gitweb-dfbsd] - localedef(1): eliminate need for "print" definition - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/97055fc243c274ccb7531c2ad47484df211bed6a - John Marino
[15 Aug 14:47] buggs: how can you see the diff?
[15 Aug 14:48] what does this even mean? that you can't use latin at all any more?
[15 Aug 14:51] http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/locale/setrunelocale.c.diff?r1=1.11&r2=1.12
[15 Aug 14:52] thanks
[15 Aug 15:04] YRabbit: nope
[15 Aug 15:11] disabling the kms console makes the resolution okay
[15 Aug 15:23] --> kropotki1 (~Thunderbi@217.19.28.216) joined the channel
[15 Aug 15:27] buggs: why are you repeating that link?
[15 Aug 15:31] luxh: it means that OpenBSD now works like FreeBSD has worked for ages
[15 Aug 15:35] marino, obviously I was not aware of its precious appearance
[15 Aug 15:35] [gitweb-dfbsd] - cldr2def: Slim down ctype src files - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/6e46cba7c2068ca21f8d50de601d72784bbecb9b - John Marino || ctypedef: Replace entire "print" sections with one element - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/e2b72454ba857934524b7045e27559c5175f8cbf - John Marino
[15 Aug 15:35] *previus
[15 Aug 15:35] *previous
[15 Aug 15:42] actually looking at the link, I think all this does is change short codes (e.g. de_DE) to default to UTF-8 rather than some one-byte encoding
[15 Aug 15:43] no, i need to see the full context
[15 Aug 15:44] the use of strstr suggests that the only legal encoding is UTF-8 and if it's not specified then it falls back to ascii
[15 Aug 15:44] (which I guess is C-locale)
[15 Aug 15:45] so basically OpenBSD users can't use ISO8859 even if they want to
[15 Aug 15:50] have you guys tried synergy? it's nice
[15 Aug 15:52] is it an energy drink?
[15 Aug 15:55] no, it's a keyboard/mouse multiplexer via network
[15 Aug 16:09] --> sephe (~sephe@122.84.143.132) joined the channel
[15 Aug 16:10] moin
[15 Aug 16:10] moin
[15 Aug 16:11] there are many comments about the openbsd commit on hackernews => https://news.ycombinator.com/item?id=10061028
[15 Aug 16:14] seems highly related to the heated discussion last night
[15 Aug 16:14] interesting: "Crashing on invalid data sounds like a great idea. Leaving garbage through doesn't."
[15 Aug 16:15] that was basically my and Riviera's point
[15 Aug 16:15] dillon had a proposal about locale strictness but it's opt-in strict. I don't mind it if it's reversed
[15 Aug 16:16] right now it cripples regex by default
[15 Aug 16:16] i also do not have a true sense for what the impacts of doing nothing really is
[15 Aug 16:17] (doing nothing == recommending forcing C-locale for legacy scripts that are affected)
[15 Aug 16:17] (and also documenting on UPDATING page)
[15 Aug 16:18] most of hacker news got derailed by talk about UTC and the metric system
[15 Aug 16:19] the openbsd decision makes plenty of sense, the ISO* encodings are clearly obsolete
[15 Aug 16:19] that's not at all what people are talking about
[15 Aug 16:20] the issue is about byte values 80-FF
[15 Aug 16:20] which is not legal UTF-8
[15 Aug 16:20] but is not meant to be invalid
[15 Aug 16:20] just removing ISO only exacerbates that problem
[15 Aug 16:20] ?
[15 Aug 16:20] of course it is
[15 Aug 16:20] that was the point of not defining 80-FF
[15 Aug 16:21] nope, the locale arbitrarily decides it's invalid
[15 Aug 16:21] ?
[15 Aug 16:21] wtf
[15 Aug 16:21] 80-FF are not in UTF-8 definition, full stop
[15 Aug 16:21] but the people who created the initial data in the first place had no idea it would have to be used in an utf8 encoding years later
[15 Aug 16:21] any single byte character with that value is invalid, by definition
[15 Aug 16:22] the UTF8 designers intentionally omitted those values
[15 Aug 16:23] as bapt said, until syscons supports UTF-8, it's probably a bad idea to default to it
[15 Aug 16:23] we are talking about short codes
[15 Aug 16:24] fr_FR. nobody needs to use those, they can and maybe should use full locale specification
[15 Aug 16:25] it's not a bad idea, the last major operating systems to use 8-bit encodings were Microsoft Windows 98 and the Apple System operating systems
[15 Aug 16:26] ftigeot: then go ahead and upgrade syscons and i'll be happy to sign off on UTF-8 defaults for short-codes
[15 Aug 16:28] it sounds like you have a problem with ISO-8859-1 and ISO-8859-15 just existing
[15 Aug 16:28] yes
[15 Aug 16:28] why?
[15 Aug 16:28] they're obsolete
[15 Aug 16:28] you don't have to use it
[15 Aug 16:28] but who are you to tell me I can't use them/
[15 Aug 16:28] ?
[15 Aug 16:28] and prevent non-latin characters to be used in the same documents
[15 Aug 16:28] if I want ISO...
[15 Aug 16:28] Is not that my choice?
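A quick way to check the claim that bytes 0x80-0xFF are not legal UTF-8 on their own is to decode each one in isolation. The sketch below uses Python's built-in codecs as a stand-in for a libc locale (an assumption for illustration; it is not the OpenBSD or DragonFly code under discussion, and `decodes` is a hypothetical helper): every such byte is a character in ISO-8859-1, none is a complete character in UTF-8, and 0xA3 is the pound sign in Latin-1.

```python
import unicodedata

def decodes(raw: bytes, codec: str) -> bool:
    """Return True if `raw` decodes cleanly (strict mode) with `codec`."""
    try:
        raw.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

# Every single byte from 0x80 to 0xFF, each taken on its own.
HIGH_BYTES = [bytes([b]) for b in range(0x80, 0x100)]

# ISO-8859-1 assigns a character to every byte value.
latin1_ok = sum(decodes(raw, "iso-8859-1") for raw in HIGH_BYTES)
# In UTF-8, 0x80-0xBF are continuation bytes, 0xC2-0xF4 are lead bytes that
# need continuation bytes after them, and 0xC0, 0xC1, 0xF5-0xFF never occur,
# so none of these is a complete character by itself.
utf8_ok = sum(decodes(raw, "utf-8") for raw in HIGH_BYTES)

print(latin1_ok, utf8_ok)                                    # 128 0
print(unicodedata.name(bytes([0xA3]).decode("iso-8859-1")))  # POUND SIGN
```

Both sides of the argument are visible here: the bytes are perfectly meaningful in an 8-bit charset, yet strictly invalid when read as UTF-8.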
[15 Aug 16:29] (speaking as any user)
[15 Aug 16:29] and frankly I do not consider ISO-8859 is obsolete
[15 Aug 16:29] I think it still has use
[15 Aug 16:29] -1 for european countries, fine
[15 Aug 16:29] but in general, no
[15 Aug 16:32] ftigeot: hopefully you caught I removed a great number of -1 and -15 ISO-8859 locales
[15 Aug 16:32] that was in direct response to your recommendations
[15 Aug 16:32] yes and I can't thank you enough for that
[15 Aug 16:32] :-)
[15 Aug 16:32] so at least you won't get -1 and -15 mixed up in same locale
[15 Aug 16:36] --> sephe (~sephe@122.84.143.132) joined the channel
[15 Aug 16:38] ftigeot: if possible, please help test the latest inet6 change b8a24f4403701576907cfd190f1d85de68b07d06
[15 Aug 17:01] https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-guri-update.pdf
[15 Aug 17:01] gnihi
[15 Aug 17:02] * profmakx wonders whether anything happened to font rendering in the last 2 weeks
[15 Aug 17:10] did you upgrade packages?
[15 Aug 17:13] yes
[15 Aug 17:13] twice
[15 Aug 17:14] did you upgrade your system too?
[15 Aug 17:14] yes
[15 Aug 17:14] multiple times
[15 Aug 17:15] (I am not complaining, it looks as if the fonts became sharper)
[15 Aug 17:16] [gitweb-dfbsd] - cldr2def: Add 6 Arabic locales: AE EG JO MA QA SA - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/10bbbe2bb68f712bb51dceeb037be46d392bdf3e - John Marino || Add 6 Arabic locales: AE EG JO MA QA SA - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/252b00558add1721cc08e392ece1f9578fc3b30c - John Marino
[15 Aug 17:35] --> _hasso_ (~hasso@0010-0000-0000-0000-58f1-8384-07d0-2001.dyn.estpak.ee) joined the channel
[15 Aug 17:41] <_hasso_!~hasso@0010-0000-0000-0000-58f1-8384-07d0-2001.dyn.estpak.ee> marino: a commit message mentioning et_EE caught my attention ... 8859-15 is of course OK, but only if 8-bit locale really MUST be used ... using 8-bit locale in general is asking for trouble nowadays
[15 Aug 17:42] the user is free to use UTF-8
[15 Aug 17:42] -15 is only the symlink to et_EE (short code)
[15 Aug 17:43] et_EE.UTF-8 removes any ambiguity
[15 Aug 17:44] the symlink is just to be consistent with the rest of europe and US
[15 Aug 17:54] <_hasso_!~hasso@0010-0000-0000-0000-58f1-8384-07d0-2001.dyn.estpak.ee> any 8-bit stuff must die, asap :)
[15 Aug 17:54] huh
[15 Aug 17:54] why?
[15 Aug 17:54] utf8 has its own set of problems
[15 Aug 17:57] yeah, for example that my utf8 locale doesn't allow me to print a £ sign
[15 Aug 17:57] <_hasso_!~hasso@0010-0000-0000-0000-58f1-8384-07d0-2001.dyn.estpak.ee> yes, but dealing with countless 8-bit codepages is not one of them
[15 Aug 17:58] that doesn't mean 8-bit doesn't have uses and needs to die though
[15 Aug 17:59] 8-bit encodings are still used on ms-dos systems
[15 Aug 17:59] the rest of the world has moved on already
[15 Aug 17:59] *yawn*
[15 Aug 17:59] can we please make my bloody utf8 rendering work again?
[15 Aug 17:59] because that did work before "we moved on"
[15 Aug 18:03] profmakx: what's the issue?
[15 Aug 18:04] ftigeot: why would americans need more than ISO-1 ?
[15 Aug 18:04] how many of them need cyrillic, CJK, etc ?
[15 Aug 18:04] --> sinetek (~quassel@114.134.184.204) joined the channel
[15 Aug 18:04] maybe for euro sign .... but not language itself
[15 Aug 18:05] USA is big enough not to give a shit about the rest of the world. see the metric system and soccer
[15 Aug 18:05] not that I agree about the former, that sucks
[15 Aug 18:06] marino it doesn't show
[15 Aug 18:06] <00a3>
[15 Aug 18:06] not really big in terms of population
[15 Aug 18:06] that's what I get sometimes
[15 Aug 18:07] it's called football
[15 Aug 18:07] profmakx: what's the setting? syscons? x11 program?
[15 Aug 18:07] why would I care about syscons?
[15 Aug 18:07] that has been broken for a decade
[15 Aug 18:07] YRabbit and I did some ports adjusting although I don't think you'd be affected
[15 Aug 18:08] profmakx: okay but I'm trying to help.
[15 Aug 18:08] just describe the situation so I understand better
[15 Aug 18:08] it's in uxterm and urxvt
[15 Aug 18:08] also in roxterm
[15 Aug 18:08] locale is set to ?
[15 Aug 18:08] en_GB.UTF-8
[15 Aug 18:09] does it work if LC_CTYPE=xx_Comm_US.UTF-8 in env?
[15 Aug 18:09] (assuming current master)
[15 Aug 18:09] is it really U00a3 or just a3?
[15 Aug 18:09] depends on how current you want master to be
[15 Aug 18:10] Warning: locale not supported by C library, locale unchanged
[15 Aug 18:10] Fontconfig warning: ignoring xx_Comm_US.UTF-8: not a valid region tag
[15 Aug 18:10] * marino looks it up
[15 Aug 18:10] apparently not recent enough?
[15 Aug 18:10] profmakx: your fontconfig isn't recent enough
[15 Aug 18:10] we had to patch it
[15 Aug 18:10] DragonFly v4.3.1.171.gc1b91-DEVELOPMENT #8: Tue Aug 11 14:11:04 BST 2015
[15 Aug 18:10] and libX11
[15 Aug 18:11] profmakx: a3 is extended ascii
[15 Aug 18:11] not unicode
[15 Aug 18:11] what is producing the A3 value?
[15 Aug 18:12] ah, it's both
[15 Aug 18:12] me hitting shift-3
[15 Aug 18:12] where the pound symbol is located
[15 Aug 18:12] no, it's c2a3 in utf-8
[15 Aug 18:13] profmakx: okay, my guess is keyboard is not mapped to utf-8
[15 Aug 18:13] my terminal display is completely fucked up as well
[15 Aug 18:13] i don't know anything about keyboard mapping / drivers
[15 Aug 18:13] Oo
[15 Aug 18:13] wat?
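The byte values being compared here (a3 vs c2a3) are the Latin-1 and UTF-8 encodings of the same code point, U+00A3. A minimal sketch, again using Python's codecs as a stand-in (`encodings_of` is a hypothetical helper), prints both for the pound sign and for the euro sign that comes up a few minutes later:

```python
def encodings_of(ch: str) -> dict:
    """Show the code point of `ch` and its bytes (as hex) in a few charsets."""
    result = {"code point": f"U+{ord(ch):04X}"}
    for codec in ("utf-8", "iso-8859-1", "iso-8859-15"):
        try:
            result[codec] = ch.encode(codec).hex()
        except UnicodeEncodeError:
            result[codec] = None  # character not representable in this charset
    return result

print(encodings_of("\u00a3"))  # pound sign
# {'code point': 'U+00A3', 'utf-8': 'c2a3', 'iso-8859-1': 'a3', 'iso-8859-15': 'a3'}
print(encodings_of("\u20ac"))  # euro sign
# {'code point': 'U+20AC', 'utf-8': 'e282ac', 'iso-8859-1': None, 'iso-8859-15': 'a4'}
```

A single 0xA3 byte is therefore what a Latin-1 producer emits, while a UTF-8 locale expects the two-byte c2 a3; the euro sign needs three bytes in UTF-8 and does not exist in ISO-8859-1 at all.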
[15 Aug 18:13] my guess it's producing 1-byte A3 instead of 2byte c2-a3
[15 Aug 18:14] crap, let me recheck this
[15 Aug 18:15] pound is unicode hex = a3, utf-8 hex = c2a3
[15 Aug 18:16] (and extended ascii = a3)
[15 Aug 18:16] the keyboard should be outputting 2 bytes but it's either producing 00a3 or just a3, which is wrong for utf-8 AFAICT
[15 Aug 18:18] let me try something then
[15 Aug 18:18] 1
[15 Aug 18:20] setting the locale before setxkbmap doesn't change anything
[15 Aug 18:20] also now my fonts are completely fucked up, so I am going to reboot
[15 Aug 18:20] k. i don't know anything about keyboards though
[15 Aug 18:22] euro doesn't work either
[15 Aug 18:22] <20ac>
[15 Aug 18:30] utf-8 is 0xE282AC
[15 Aug 18:30] 3-bytes
[15 Aug 18:35] I would say that some library somewhere is broken
[15 Aug 18:35] one that xterm and rxvt use, and firefox doesn't
[15 Aug 18:35] neither does emacs
[15 Aug 18:36] profmakx: The problem is probably not xterm or rxvt, but the individual shell programs
[15 Aug 18:36] KeyPress event, serial 32, synthetic NO, window 0xa00001, root 0xb5, subw 0x0, time 904138, (407,191), root:(408,210), state 0x1, keycode 12 (keysym 0xa3, sterling), same_screen YES, XLookupString gives 2 bytes: (c2 a3) "£" XmbLookupString gives 2 bytes: (c2 a3) "£" XFilterEvent returns: False
[15 Aug 18:36] like
[15 Aug 18:36] zsh
[15 Aug 18:36] fair enough
[15 Aug 18:36] can I get that fixed then *please*
[15 Aug 18:36] it is indeed fucking zsh
[15 Aug 18:36] bash works fine
[15 Aug 18:37] profmakx: e.g. characters should show up correctly in the terminal when running "cat /dev/stdin" and typing stuff
[15 Aug 18:37] what about tcsh?
[15 Aug 18:37] i don't care
[15 Aug 18:37] I want to use zsh
[15 Aug 18:37] has it just not been rebuilt
[15 Aug 18:37] port options? i'll take patches
[15 Aug 18:37] after the frenzy?
[15 Aug 18:55] While working on that patch it solidified my position a bit that it shouldn't barf on input. Ok to barf on a bad reg-ex control string, but not on input. The regex API *never* barfed on input before this. It's reasonable to say that programmers shouldn't expect it to barf on input now
[15 Aug 18:55] that's the basic issue
[15 Aug 18:55] there is another set of regex calls that explicitly take wchar arrays, but programs like sed don't use it. sed uses the original regex API (which in our new libc now processes utf)
[15 Aug 19:01] yo
[15 Aug 19:11] dillon: but one would not expect the limitation to 8-bit encoding to barf since there are no illegal coding possible
[15 Aug 19:11] interesting thread. Python's solution / the UTF-8B proposal seems reasonable
[15 Aug 19:11] it's kind of apples to oranges comparison
[15 Aug 19:12] ultimately the ycombinator thread is talking about the same issue... being able to represent arbitrary 8-bit-clean data in a codepoint stream, which has the side effect of never erroring on the input stream
[15 Aug 19:12] python is not a poster child for standards or compatibility
[15 Aug 19:12] but it solves a big problem that needs to be solved
[15 Aug 19:14] in that respect, it is ahead of the game. It's clear that this is a really big problem with UTF input processing. Simply aborting with an error is not a solution anyone wants.
[15 Aug 19:14] i didn't really like the patch, but I wouldn't fuss much if it was off by default
[15 Aug 19:14] e.g. OPT IN to cripple regex
[15 Aug 19:14] no, kills the whole purpose. But why not implement the python3/utf-8b idea instead ?
[15 Aug 19:14] then the input stream can't error out, ever.
[15 Aug 19:15] let somebody else blaze that trail
[15 Aug 19:15] and let it get adoption
[15 Aug 19:15] I don't consider that an option. We need to deal with this issue
[15 Aug 19:16] IMO all we are obligated to do is advertise the issue on release and updating notes
[15 Aug 19:16] The more I read about it, the more I think that UTF translations should never barf on 8-bit raw data
[15 Aug 19:16] dillon: i don't like your solution because it's on by default
[15 Aug 19:16] and I think it's incorrect
[15 Aug 19:16] I believe it is the correct solution, and the best solution
[15 Aug 19:16] but I have no issues with a SLOP_MODE
[15 Aug 19:17] or whatever
[15 Aug 19:17] your solution is for programs to start barfing on input because it is malformed.
[15 Aug 19:17] i am using correct literally
[15 Aug 19:17] programs which never, ever barfed on an input stream before, now suddenly start failing
[15 Aug 19:17] that is just not acceptable to me
[15 Aug 19:17] the output of regex with bad input is incorrect
[15 Aug 19:17] dillon: well, not to be pedantic but where's the real world problem?
[15 Aug 19:18] golang also uses a similar methodology
[15 Aug 19:18] one firefox patch that we can handle systematically
[15 Aug 19:18] that's three languages that think it's a serious issue and have their own solutions
[15 Aug 19:18] wait until one of them gets standardized? I really don't think we should blaze this trail
[15 Aug 19:18] ha, golang didn't quite get it right though. But it looks like python did
[15 Aug 19:19] and I really would like regex to work correctly OOTB
[15 Aug 19:19] so you are saying we should back-out our locale support until a standard emerges ?
[15 Aug 19:19] sure there are opponents to python solution
[15 Aug 19:19] we can't run with what we have now if it barfs on input
[15 Aug 19:19] what's wrong with my suggestion?
[15 Aug 19:19] my preferred solution is to adopt python3's methodology
[15 Aug 19:20] I don't like the opt-in/opt-out idea either way. I think the processing needs to naturally not barf on input in all situations
[15 Aug 19:20] but python-2B is not utf-8
[15 Aug 19:20] or 8b whatever it's called
[15 Aug 19:21] again, I shouldn't have to repeat this, but there is an expectation historically that these programs do *NOT* barf on the input stream. And now they can. That's the core problem and it needs to be addressed
[15 Aug 19:21] things change. historically it was an 8-bit stream
[15 Aug 19:21] and it's very simple to keep that 8-bit processing
[15 Aug 19:22] you are basically telling people , too bad, programs can now barf on input now.
[15 Aug 19:22] I am saying that is not an acceptable solution
[15 Aug 19:22] i don't see why the root can't configure it
[15 Aug 19:22] who needs more than 128 characters
[15 Aug 19:22] dillon: i am telling them there are at least 3 approaches to dealing with it
[15 Aug 19:22] we aren't saying "too bad" with no solution
[15 Aug 19:22] all the solutions are trivial
[15 Aug 19:23] all of your solutions involve forcing the user base to make configuration adjustments to their system if they want historical operation
[15 Aug 19:23] that is not acceptable
[15 Aug 19:23] dillon: why not? it's a new release.
[15 Aug 19:23] we aren't making this change to 4.2
[15 Aug 19:24] and frankly, if you want to argue that, we can argue the reverse... that people who want strict checking and want programs to barf on input can then configure the system to do that.
[15 Aug 19:24] but by default, the system will not barf on input
[15 Aug 19:24] dillon: no, because your patch breaks regex by default
[15 Aug 19:24] The idea that a whole slew of system utilities can now barf on input is unacceptable, period.
[15 Aug 19:24] which is far more insidious than this perceived issue
[15 Aug 19:24] executing setxkbmap /Xmodmap fucks up firefox rendering completely
[15 Aug 19:24] also interesting
[15 Aug 19:24] it isn't just regex, but any processing of an input stream. From editors to pipe programs to anything else.
[15 Aug 19:26] surely I am not the only person that prefers output to be correct over failure
[15 Aug 19:26] what good is staying up if the output is garbage and could fuck something up downstream?
[15 Aug 19:26] i would really like to trust this
[15 Aug 19:27] and I can't trust modifying streams for the sole purpose of keeping a root user from configuring his system correctly
[15 Aug 19:30] we've been running like this for a week already. nobody seems to be reporting broken scripts ...
[15 Aug 19:31] i will take the action to get with Bapt and lock dports down though. that's needed
[15 Aug 19:33] it's the unix way. if the input is garbage, no reason why the output shouldn't be garbage too
[15 Aug 19:34] one person's use case does not a universe-make. And, again, the idea that normal system utilities can now barf on 'malformed input' would be a huge change and I think detrimental
[15 Aug 19:34] mutt now cores dump trying to read my mails :-(
[15 Aug 19:35] in strcmp()
[15 Aug 19:35] dillon: this affects the world right? what are other OS's doing?
[15 Aug 19:35] I shudder to think what grep would do on my mailbox files if my LANG was set to UTF-8
[15 Aug 19:36] well, that's what these conversations are all about. Clearly languages are the front-line, they want to process UTF-8 but they also want to process generic 8-bit data and they don't want to have to select between the two
[15 Aug 19:36] and as you said, BSDs are only just now starting to deal with this for general utilities and look what a mess it has created already
[15 Aug 19:36] it is clearly a problem
[15 Aug 19:37] f*cking zsh seems to do the *output* wrong.
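Both sides of the "barf on input" argument can be reproduced in miniature. This sketch uses Python's `re` module rather than the libc regex API that sed and grep call, and `grep_bytes`/`grep_utf8` are hypothetical names: byte-wise matching (roughly what the C locale gave us) never fails but counts a two-byte UTF-8 character as two characters, while decoding a mailbox-style line as strict UTF-8 first gets characters right but aborts on one stray Latin-1 byte, and `errors="replace"` keeps going only by discarding that byte.

```python
import re

# One line of a hypothetical mailbox: ASCII text plus a single ISO-8859-1
# byte (0xE9 is "e acute" in Latin-1, but not valid UTF-8 in this position).
LINE = b"Subject: caf\xe9 menu"
# A correctly encoded UTF-8 pound sign: two bytes for one character.
POUND = "\u00a3".encode("utf-8")

def grep_bytes(pattern: bytes, data: bytes) -> bool:
    # C-locale style: every byte is one character, so nothing is ever rejected.
    return re.search(pattern, data) is not None

def grep_utf8(pattern: str, data: bytes, errors: str = "strict") -> bool:
    # UTF-8-locale style: convert to code points first, then match.
    # With errors="strict" a single bad byte aborts the whole search.
    text = data.decode("utf-8", errors=errors)
    return re.search(pattern, text) is not None

print(grep_bytes(rb"caf. menu", LINE))            # True: 0xE9 is just a byte
print(grep_bytes(rb"^.$", POUND))                 # False: seen as two "characters"
print(grep_utf8(r"^.$", POUND))                   # True: one code point
print(grep_utf8(r"menu", LINE, errors="replace")) # True, but 0xE9 became U+FFFD
try:
    grep_utf8(r"menu", LINE)
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)
```

Neither mode is both 8-bit clean and character-aware, which is the gap the UTF-8B proposal later in the log tries to close.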
[15 Aug 19:37] dillon: let's think about it this way:
[15 Aug 19:37] up until now, the locale has been essentially ignored, sed used locale-c for comparisons regardless
[15 Aug 19:38] export LANG=en_US.ISO8859-1 => all characters typed are shown as
[15 Aug 19:38] we could set sed to do that by default, and use an option switch to enable locales
[15 Aug 19:38] dillon: would that be reasonable?
[15 Aug 19:38] not really
[15 Aug 19:38] why not? it's the historical operation
[15 Aug 19:38] you want to start special-casing individual utilities ?
[15 Aug 19:38] utf-8 handling is brand new
[15 Aug 19:39] dillon: well, is it so bad?
[15 Aug 19:39] instead of fixing the input processing which automatically fixes all the utilities ?
[15 Aug 19:39] I have been using utf-8 encoding with dragonfly since 2006 or so fwiw
[15 Aug 19:39] dillon: i don't recognize this as a problem, that's the issue
[15 Aug 19:39] my preference is to adjust the wchar conversion processing to code the illegal byte sequences
[15 Aug 19:39] well, I do. And it's a big problem.
[15 Aug 19:39] i haven't conceded this is some tragic thing
[15 Aug 19:40] i think explicitly requesting utf-8 processing is not unreasonable
[15 Aug 19:40] esp. given the potential impacts
[15 Aug 19:40] you want to add an option to all the utilities to explicitly process as utf-8 ?
[15 Aug 19:40] including third-party utilities ?
[15 Aug 19:40] how many are we talking about?
[15 Aug 19:41] depends on the amount
[15 Aug 19:41] if it's just sed and grep, I think it's okay
[15 Aug 19:41] it's a non-starter. It is far far better to make a few adjustments to libc's wchar conversion routines
[15 Aug 19:41] you seem to think that this is a small problem affecting only a few utilities. I beg to differ.
[15 Aug 19:41] you don't have consensus, at least from me. I almost always defer after a nice discussion, but i just don't agree
[15 Aug 19:42] The last thing I want is to have to make adjustments to utility after utility after utility for the next 10 years
[15 Aug 19:42] dillon: because i haven't seen any real world impacts/
[15 Aug 19:42] instead of making one adjustment to libc's wchar conversion routines and being done with it
[15 Aug 19:42] are you joking?
[15 Aug 19:42] no, I am not
[15 Aug 19:42] what has happened thus far?
[15 Aug 19:42] we've already seen multiple real-world impacts across the board, including this discussion right now
[15 Aug 19:42] involving dfly
[15 Aug 19:42] i don't care about go
[15 Aug 19:42] or python
[15 Aug 19:42] Well, a lot of other people disagree with you
[15 Aug 19:42] this isn't about what you care about
[15 Aug 19:43] i mean you are predicting some huge disaster mess with dfly administration
[15 Aug 19:43] this is about what the community as a whole cares about, and for DragonFly, frankly, I need to have final word on this and my final word is that utilities which process input streams , like sed or grep or vi or whatever, should not barf
[15 Aug 19:44] well.... if you want to handle things like that.
[15 Aug 19:44] I don't see a problem with coding 8-bit clean to code points.
[15 Aug 19:44] lack of consensus is the problem
[15 Aug 19:45] I see no reason to wait for a 'consensus'. I say move forward with a solution that works, and if the rest of the world agrees on a different solution down the line we can adapt to it.
[15 Aug 19:45] i think I'm okay with "sed can't break" but the solution isn't debatable
[15 Aug 19:45] I will say that, clearly, just from what I see in these threads, there is going to have to be some sort of 8-bit clean solution
[15 Aug 19:46] and it will have to work by default.
[15 Aug 19:46] your solution is to break programs which never broke before. Not acceptable.
[15 Aug 19:46] i did not say that at all
[15 Aug 19:46] I say we should code to UTF-8B and if the world comes up with a different solution down the line, fine, we re-code to that solution. But not coding a solution at all is not acceptable.
[15 Aug 19:46] <_hasso_!~hasso@0010-0000-0000-0000-58f1-8384-07d0-2001.dyn.estpak.ee> python-2.x supports utf-8, you just have to work with it differently
[15 Aug 19:47] Just punting on it, or putting the burden on individual utilities instead of where it should be in the wchar API is not acceptable
[15 Aug 19:47] wchar API?
[15 Aug 19:48] i am starting to regret updating the regex
[15 Aug 19:48] the wchar conversion routines that are at the core of the issue, converting a byte stream to wchar_t's and back again
[15 Aug 19:49] /usr/src/lib/libc/locale/
[15 Aug 19:49] because the not everything can convert but that's what UTF-8 is
[15 Aug 19:54] what's the actual 8B proposal? to map 80-FF to some unused but legal part of utf-8 ?
[15 Aug 19:55] basically to map invalid UTF-8 bytes to U+DC00+n code points
[15 Aug 19:56] low surrogates?
[15 Aug 19:56] those code points are reserved for the second part of a surrogate pair . Since it's not used in the first part, there is no confusion
[15 Aug 19:56] I'm reading from the thread
[15 Aug 19:56] https://news.ycombinator.com/item?id=10061028
[15 Aug 19:56] search for 'is to map an invalid'
[15 Aug 19:57] so no regex would map to this ...
[15 Aug 19:57] regex any char would of course i.e. '.'
[15 Aug 19:57] i wouldn't mind the ability to do that, but not by default
[15 Aug 19:58] e.g. I'd still modify sed to enable that
[15 Aug 19:58] it would have to be the default. we could add an env variable for the strict mode you desire
[15 Aug 19:58] but there's no point doing it if it isn't the default
[15 Aug 19:58] utilities need to work out of the box, as in not crash on malformed input (for sed, grep, etc)
[15 Aug 19:58] non-default plus modify sed / grep
[15 Aug 19:58] assuming it's limited tools
[15 Aug 19:58] if you want yours to crash on malformed input you can set the env
[15 Aug 19:58] no, it isn't a limited set of tools
[15 Aug 19:59] how many piping programs are there? do you want 'less' to crash too?
[15 Aug 19:59] will it?
[15 Aug 19:59] dozens, even hundreds or thousands if we get into dports
[15 Aug 19:59] I dunno, but if third party utilities start to jump on the wchar bandwagon then it will become a bigger and bigger issue
[15 Aug 19:59] is that really the impact? that would change the complexion of this
[15 Aug 19:59] yes, that's really the impact.
[15 Aug 20:00] and we haven't seen it because? regex was limited to 1 byte until now?
[15 Aug 20:00] you want a default that will cause us no end of trouble, pretty much forever.
[15 Aug 20:00] I want a default that nips that in the bud
[15 Aug 20:00] just because you haven't personally seen it in your own use cases doesn't mean that it can't turn into a big problem
[15 Aug 20:01] we've only just gotten this in and we are already hitting problems, and you want to assume that the problem set is some small limited fixed number ?
[15 Aug 20:01] no , i am asking . this isn't a problem right now
[15 Aug 20:01] I see the problem set as becoming an unending thorn in our sides and I want a solution that removes that thorn before it begins to fester
[15 Aug 20:02] I don't want to deal with never-ending bug submissions and complaints , let alone have to deal with special-casing individual utilities all so you can have your 'abort on malformed input' concept be the default
[15 Aug 20:02] well, you want to plaster over it. presumably the culprit gets fixed eventually.
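The UTF-8B idea quoted above matches what Python 3 ships as the `surrogateescape` error handler (PEP 383): each byte that cannot be decoded becomes the lone low surrogate U+DC00 plus the byte value, so 0x80-0xFF land in U+DC80-U+DCFF, and encoding with the same handler restores the original bytes. The sketch below shows that round trip and that a regex `.` matches the escaped byte; it illustrates the technique only and is not the libc patch being proposed (`decode_8bit_clean`/`encode_8bit_clean` are hypothetical names).

```python
import re

def decode_8bit_clean(data: bytes) -> str:
    # PEP 383 "surrogateescape": each byte b that cannot be decoded
    # becomes the lone low surrogate U+DC00 + b (0x80-0xFF -> U+DC80-U+DCFF).
    return data.decode("utf-8", errors="surrogateescape")

def encode_8bit_clean(text: str) -> bytes:
    # The same handler turns those surrogates back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")

RAW = b"caf\xe9 \xc2\xa3"   # one stray Latin-1 byte, then a valid UTF-8 pound sign
TEXT = decode_8bit_clean(RAW)

print(ascii(TEXT))                                # 'caf\udce9 \xa3'
print(re.fullmatch(r"caf. .", TEXT) is not None)  # True: "." matches the escaped byte
print(encode_8bit_clean(TEXT) == RAW)             # True: lossless round trip
```

Valid UTF-8 (the pound sign) still decodes to a real code point, so only the undecodable bytes are escaped; because a strict UTF-8 decoder never produces lone surrogates, the escaped values cannot collide with decoded text.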
[15 Aug 20:03] and when it does, it still works just as well as it worked before
[15 Aug 20:03] but frankly, you are making some major assumptions as to the input sets
[15 Aug 20:03] what if someone wants to grep their mailbox ?
[15 Aug 20:03] okay, i guess I can live with DC00
[15 Aug 20:03] it's basically filtering the bad chars out
[15 Aug 20:03] ok. I'll research it some more and then create a patch for people to try out
[15 Aug 20:04] we definitely need a way to block the bad conversion
[15 Aug 20:04] it's not a locale thing so it needs a diff. name
[15 Aug 20:04] I'll keep the LOCALE_STRICT env concept intact
[15 Aug 20:05] the default will be to convert it, however. Strict mode will not be the default
[15 Aug 20:05] fine
[15 Aug 20:06] STRICT_UTF8_CONVERSION?
[15 Aug 20:06] seems a bit wordy.
[15 Aug 20:06] but I won't quibble
[15 Aug 20:06] NO_SURROGATE_UTF8 ?
[15 Aug 20:07] I kinda like LOCALE_STRICT
[15 Aug 20:07] it's not limited to locales
[15 Aug 20:07] is it?
[15 Aug 20:07] for all intents and purposes the world is moving to UTF-only, ultimately locales will only really support base ascii and UTF
[15 Aug 20:07] maybe it is
[15 Aug 20:08] to me it's turning off this non-standard conversion
[15 Aug 20:08] and allowing failure modes to show, which are now masked
[15 Aug 20:08] well, come up with a name if you don't like LOCALE_STRICT, and paste it. I'll check my IRC history when I get back from lunch
[15 Aug 20:11] i'm not sure if 80-FF is the only thing that doesn't convert
[15 Aug 20:12] probably 110000 - 1FFFFF as well
[15 Aug 20:12] nvm
[15 Aug 20:12] wrong direction
[15 Aug 20:12] unless you take 4 bytes together?
[15 Aug 20:13] yeah, 0x11 0x01 0x02 0x03 would be an illegal sequence
[15 Aug 20:13] and that's a huge section, can't remap that
[15 Aug 20:15] huh?
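The sequences a strict decoder rejects are a wider set than lone high bytes. This sketch checks a few with Python's strict UTF-8 decoder (used as a stand-in; `is_valid_utf8` is a hypothetical helper): U+10FFFF is the highest encodable code point, the four-byte form that would represent U+110000 is rejected, as are an overlong encoding, the UTF-8 form of a surrogate, and a lone continuation byte. For comparison, the bytes 11 01 02 03 are plain ASCII and decode fine; the out-of-range case is the F4 90 80 80 form.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (Python's strict decoder)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

CASES = [
    ("U+10FFFF, the highest code point", b"\xf4\x8f\xbf\xbf"),
    ("what would be U+110000", b"\xf4\x90\x80\x80"),
    ("overlong two-byte form of '/'", b"\xc0\xaf"),
    ("UTF-8 form of the surrogate U+DCE9", b"\xed\xb3\xa9"),
    ("lone continuation byte", b"\x80"),
    ("bytes 11 01 02 03 (all ASCII)", b"\x11\x01\x02\x03"),
]

for label, data in CASES:
    print(f"{data.hex(' '):12} valid={is_valid_utf8(data)!s:5}  {label}")
```

Only the first and last cases decode. The surrogate case matters for UTF-8B: since well-formed UTF-8 can never carry U+DC80-U+DCFF, those values are free to stand for escaped bytes.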
[15 Aug 20:17] basically any sequence beginning in 0x80-0xFF which does not map canonically to a code point
[15 Aug 20:17] --> impy (~impy@78-22-147-131.access.telenet.be) joined the channel
[15 Aug 20:18] dillon: utf8 ranges from 0000 0000 10FF FFFF
[15 Aug 20:18] --> impy (~impy@78-22-147-131.access.telenet.be) joined the channel
[15 Aug 20:18] anything more is also illegal
[15 Aug 20:18] so a longer non-canonical UTF-8 sequence (which is already illegal) is also assumed to be bad. That is, a longer UTF-8 sequence which can map to a shorter sequence
[15 Aug 20:19] anything which the mbr/wcs API can't map back to the original input byte sequence would be escaped, basically.
[15 Aug 20:22] hmm. maybe i'm confusing myself. I assume the goal is there is no possible input that can't be handled but not sure how that works. i'll just see what you come up with
[15 Aug 20:22] it just seems like there are lots of ways to get an illseq error
[15 Aug 20:24] --> zrj (~arch@84.240.17.161) joined the channel
[15 Aug 20:30] haven't left for lunch yet. still researching. will go in a sec.
[15 Aug 20:32] python's solution is a bit of a mess because it tries to make security exceptions for hidden byte sequences. That is a potential issue but I would argue that 8-bit-clean processing is more important. If one is converting back to ascii for e.g. file name paths, then security handling for those paths should be done in the ascii domain
[15 Aug 20:38] interesting. this seems to be codified
[15 Aug 20:38] https://en.wikipedia.org/wiki/UTF-8
[15 Aug 20:38] search for 'More recent converters'
[15 Aug 20:38] <--- lunch. I'll continue reading when I get back
[15 Aug 20:55] :( I tried customink to make a dfly shirt but it's >$25 per
[15 Aug 20:55] only slightly cheaper in bulk
[15 Aug 20:56] dillon: i'm out of pocket for a few hours
[15 Aug 20:58] http://www.customink.com/designs/dflybsd/eqw0-00a7-r332/hotlink?pc=HL-142344&cm_mmc=hotlink-_-5-_-Body_txt-_-viewbutton
[15 Aug 21:02] have you guys ever had dfly lock on startup right after it says newaliases: no recipients?
[15 Aug 21:02] if I ctrl-c it continues fine
[15 Aug 21:02] but if I don't it just hangs indefinitely
[15 Aug 21:09] zach: hit ctrl-t to see what's running
[15 Aug 21:10] will do
[15 Aug 21:12] Studbolt: it's dma
[15 Aug 21:29] zach: have you talked to corecode about it?
[15 Aug 21:31] --> zrj (~arch@84.240.17.161) joined the channel
[15 Aug 21:31] tuxillo: no
[15 Aug 21:32] most of my dfly systems are VMs though too, so that may lower priority
[15 Aug 21:33] it's not about priorities, it's about knowing what's going on
[15 Aug 21:33] I've got 6 systems up and running simultaneously now though with dfly
[15 Aug 21:33] more than my 1-2 from before
[15 Aug 21:34] I'm also not sure how to debug it at the boot process short of giving you the contents of the config files that dma could be referencing
[15 Aug 21:35] dunno, open a bug ticket in bugs.dragonflybsd.org including all the information you've got
[15 Aug 21:57] lol
[15 Aug 21:58] my kids watch shows teaching them shapes with things like is that a ? no, it's a .
[15 Aug 21:58] so my oldest comes to me and often brings me or points out a shape, usually right, but he looks at me and asks, is that a rectangle?
[15 Aug 21:58] and then he laughs and says, no, it's a daddy [15 Aug 22:04] --> RelativeK (~tsyesika@c83-250-129-79.bredband.comhem.se) joined the channel [15 Aug 22:11] awww [15 Aug 22:18] I have to take them to the park and I don't want to :\, they've been so crazy in the parks lately, watching both totally split brain is getting increasingly difficult given their increased size, strength, and speed [15 Aug 22:18] --> nighty (~nighty@hokuriku.rural-networks.com) joined the channel [15 Aug 22:30] leashes [15 Aug 22:42] zach: its likely that dma is stuck in the dns resolver [15 Aug 22:47] luxh: I guess I never told you my leash story, heh [15 Aug 22:47] --> Bluerise (~Bluerise@p2003006C2F56AE0074A394AE6CB1D43B.dip0.t-ipconnect.de) joined the channel [15 Aug 22:47] anyway, I think I might not be able to take them out anyway, I checked the weather, it's going to be 35.55C here within the hour [15 Aug 22:47] 96F [15 Aug 22:47] I think that's a bit much for a 2 and 4 year old in the park [16 Aug 00:12] I would like to put wlan_serialize_enter()/exit() around ieee80211_ifdetach() in if_iwn.c like it was done for ath to avoid a panic on kldunload [16 Aug 00:12] does that sound reasonable? [16 Aug 00:57] marino: http://apollo.backplane.com/DFlyMisc/locale01.patch [16 Aug 00:58] marino: this defaults to UTF-8B encoding for illegal sequences instead of erroring out. If setenv LOCALE_STRICT then the old error-out behavior will be used. [16 Aug 00:59] marino: I'm still doing some conversion testing to make sure it works as expected in all cases [16 Aug 01:14] jh32: hmm [16 Aug 01:15] jh32: theoretically... well, if it was done for ath then it should be ok for iwn [16 Aug 02:21] --> Daimao (~wired@cpe-45-48-33-56.socal.res.rr.com) joined the channel [16 Aug 02:57] blast [16 Aug 02:58] this API is messed up. 
I might not be able to properly encode it [16 Aug 02:58] <--- break [16 Aug 03:10] ok, it won't be possible to make the routines work byte-for-byte for encode and decode but basic munging will work. [16 Aug 03:11] the basic problem is mbrtowc() is supposed to collect partial bytes in code sequences, but if an error occurs in a 2-4 byte code sequence we can't drain the saved buffer to produce multiple output wchars because we would have to return 0 (tell the caller not to advance his input buffer) [16 Aug 03:11] and returning 0 has a special meaning, so we can't return 0. [16 Aug 03:56] locale01.patch updated. I've done some basic testing so far. It's probably the best we can do with the API. [16 Aug 03:57] UTF-8B encoding is used when possible, and the replacement character (U+FFFD) is used if a multi-byte sequence winds up being invalid and we couldn't rewind due to a state continuation. [16 Aug 03:57] http://apollo.backplane.com/DFlyMisc/locale01.patch [16 Aug 05:06] --> tkusumi (~tkusumi@i121-115-18-114.s41.a013.ap.plala.or.jp) joined the channel [16 Aug 06:05] So, I build mksh 51 from source (from mirbsd.org). Do I just need to replace my existing mksh and mksh.1? [16 Aug 06:05] built* [16 Aug 06:16] I'm trying to make sense of CVE 2014-7187. For kicks I ran the shellshock exploits against mksh R50. [16 Aug 06:17] and exploit that corresponds to CVE 2014-7187 seems to have worked, but I still have errors for the provided input (even after replacing bash with mksh). [16 Aug 06:18] Same goes for the recent R51. [16 Aug 06:25] or not use it at all [16 Aug 06:25] --> kerma (~kerma@87-92-243-246.bb.dnainternet.fi) joined the channel [16 Aug 06:25] I'm still not totally sure though, because mksh kicks a bad identifier error when I attempt to input it. [16 Aug 06:26] you'd have to look at their repo to see what fixes they've committed to it [16 Aug 06:27] Looks like it doesn't work when I attempt to do something that requires elevated permissions, though. 
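The mbrtowc() return-value problem described above is easier to see from the caller's side. The sketch below shows one way a caller could layer UTF-8B escaping on top of the standard mbrtowc(). Here (size_t)-1 marks a mis-coding, (size_t)-2 marks a valid but incomplete tail, and 0 is reserved for the null wide character, so none of them can be reused to mean "rewind and emit the buffered bytes one by one". decode_utf8b() is a hypothetical name, not anything in locale01.patch. The sketch assumes the program has already selected a UTF-8 LC_CTYPE, in which every byte mbrtowc() rejects is >= 0x80.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode len bytes of src into dst (room for len
 * entries) and return the number of wide characters produced.
 * Assumes a UTF-8 LC_CTYPE, so any byte mbrtowc() rejects is >= 0x80
 * and its escape lands in U+DC80..U+DCFF. */
size_t decode_utf8b(const char *src, size_t len, wchar_t *dst)
{
    mbstate_t st;
    size_t i = 0, n = 0;

    memset(&st, 0, sizeof(st));
    while (i < len) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, src + i, len - i, &st);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /* -1: invalid sequence; -2: valid prefix cut off by the end
             * of the buffer. Escape only the first byte (UTF-8B style),
             * discard the unspecified/partial state and resynchronize
             * on the next byte. */
            dst[n++] = (wchar_t)(0xDC00 + (unsigned char)src[i]);
            memset(&st, 0, sizeof(st));
            i++;
        } else if (r == 0) {
            /* 0 is reserved for an embedded NUL, which is a single
             * byte in UTF-8, so advance past it explicitly. */
            dst[n++] = L'\0';
            i++;
        } else {
            dst[n++] = wc;
            i += r;
        }
    }
    return n;
}
```

Because the caller sees the whole buffer, it can always fall back to escaping a single byte. mbrtowc() itself cannot do this once it has already swallowed part of a sequence into its state.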
[16 Aug 06:27] So, nevermind. [16 Aug 06:42] oops, I meant to execute the administrivia, not let it post. oh well [16 Aug 06:56] new locale01.patch up, fixes issue where terminator hidden in malformed byte sequence could wind up getting lost. [16 Aug 07:29] --> zrj (~arch@212.59.11.230) joined the channel [16 Aug 07:29] grrr. [16 Aug 07:35] this has no chance of working properly. Other UTF-8 decoders which implement UTF-8B use mbrtowc() and expect a -1 return for mis-codings. And python will even escape surrogates returned by mbrtowc(). So it will break python instantly. [16 Aug 07:55] --> ftigeot (~ftigeot@213.215.11.28) joined the channel [16 Aug 07:56] ftigeot: there is no way to modify the mbr*() and wcs*() functions to process 8-bit-clean data so we will have to adjust the regex library itself, at least for sed and grep and such, to make them work the way we want. [16 Aug 07:57] ftigeot: I'm still investigating but if we escape illegal byte codes in the mbr*() functions it will cause python's own escapes to break. [16 Aug 07:58] ftigeot: I'm working on a binary buffer encoding/decoding abstraction that these programs can call, instead of rolling our own in each program. [16 Aug 08:13] morning [16 Aug 08:14] do we absolutely have to interpret input data ? the previous dragonfly behavior was fine IMHO [16 Aug 08:16] LANG was only used to choose translation strings and as a hint of the character encoding to use and didn't do anything else [16 Aug 08:19] you mean go back to using "C" for regex ? [16 Aug 08:19] yeah, there was nothing wrong with it [16 Aug 08:20] won't that break for non-latin character sets ? 
[16 Aug 08:20] not sure which ones but I have been using UTF-8 on dragonfly for many, many years [16 Aug 08:20] like japanese or chinese [16 Aug 08:21] codes >= 0x100 [16 Aug 08:21] I had no trouble reading text [16 Aug 08:21] in terms of regex sequence specifications [16 Aug 08:21] I don't think I tried to use an input method to type asian characters though [16 Aug 08:24] all the wc API functions are really badly designed. That is, the standard is badly designed. It makes them almost unusable. [16 Aug 08:27] applications behave differently with LC_CTYPE=C than with no LC_CTYPE [16 Aug 08:27] (on master) [16 Aug 08:28] LC_CTYPE=C ls => ????? etc... [16 Aug 08:28] LC_CTYPE= ls => Σὲ γνωρίζω ἀπὸ τὴν κόψη [16 Aug 08:28] that was with LANG=el_GR.UTF-8 [16 Aug 08:28] that kinda sounds like iti s working as intended [16 Aug 08:29] It was better before [16 Aug 08:29] if I set LANG to en_US.UTF-8 I can't see the greek characters anymore [16 Aug 08:29] that's a different issue [16 Aug 08:29] on previous dragonfly versions, en_US.UTF-8 allowed non-ascii characters to be shown [16 Aug 08:29] that's basically what the openbsd commit changed [16 Aug 08:31] I can see french accented characters with en_US.UTF-8 so the new behavior is not consistent [16 Aug 08:32] what happens if you set LC_CTYPE to that rollup utf8 set that marino added ? [16 Aug 08:32] I guess... 
en_Comm_US.UTF-8 [16 Aug 08:32] try that [16 Aug 08:33] LC_CTYPE=xx_Comm_US.UTF-8 ls [16 Aug 08:33] -rw-r--r-- 1 ftigeot wheel 0B Aug 16 08:32 Зарегистрируйтесь [16 Aug 08:33] -rw-r--r-- 1 ftigeot wheel 0B Aug 16 08:25 Σὲ γνωρίζω ἀπὸ τὴν κόψη [16 Aug 08:33] -rw-r--r-- 1 ftigeot wheel 0B Aug 16 08:23 Зарегистрируйтесь сейчас на Десятую Международную Конференцию по Unicode [16 Aug 08:33] -rw-r--r-- 1 ftigeot wheel 0B Aug 16 08:30 aàeéooô.txt [16 Aug 08:33] well, I can't read what you post [16 Aug 08:33] works as expected :) [16 Aug 08:33] xterm should be able to display it [16 Aug 08:33] let me grab a picture [16 Aug 08:34] this irc program is filtering out what it thinks is unprintable [16 Aug 08:34] its ok [16 Aug 08:34] in anycase, this then comes down to whether we should require people to set LC_CTYPE that way, or whether we should adopt the openbsd commit that forces it that way [16 Aug 08:34] well, I created some files with names pasted from the UTF8-demo.txt file [16 Aug 08:35] we should have UTF-8 work fine by default IMHO [16 Aug 08:35] most people are apparently now using UTF-8, even with chinese and japanese characters [16 Aug 08:46] --> ftigeot_ (~ftigeot@2001:7a8:600:1:219:d1ff:fe81:28) joined the channel [16 Aug 08:54] dillon: it's still worth a screenshot => http://dl.wolfpond.org/UTF-8_xterm.jpg [16 Aug 08:55] fun [16 Aug 09:03] it's also interesting to see the en_US locale hides the english texts in runes (probably used by the witches of Salem) and braille (used by blind people) [16 Aug 09:14] dillon: fyi I'm working on generating the majority of the rollup [16 Aug 09:14] what I have in place now is good for the interim [16 Aug 09:15] dillon: so are we back to my idea of having sed use locale-c by default and add a switch for full utf-8 ? [16 Aug 09:15] dillon: also, grep is probably not affected, it has it's only regex I think [16 Aug 09:17] marino: I think at least for now, yes. 
Since I can't change the existing mbr*/wcs*() APIs if we want to support binary streams in these functions we will have to use an API extension for converting binary streams (i.e. do escaping) [16 Aug 09:17] marino: And we should seriously consider just making LC_CTYPE a generic UTF (your common UTF) if *any* UTF locale is specified, similar to what OpenBSD did. [16 Aug 09:17] marino: for things like ls [16 Aug 09:18] dillon: alternatively we can read env var like STRICT_LOCALE [16 Aug 09:18] (anything that uses isprint() equivalent for wchar) [16 Aug 09:18] dillon: openbsd just eliminated ISO* and other non-UTF-8 encodings [16 Aug 09:18] yah. Ignore my last locale01.patch, I have to rip most of that out, but I will keep the LOCALE_STRICT env and I am adding two API functions to do escaping [16 Aug 09:18] That's fine with me too [16 Aug 09:19] it did not address this specific/rollup [16 Aug 09:19] We don't need this other stuff [16 Aug 09:19] dillon: the other stuff is not hurting anything [16 Aug 09:19] But don't commit changes to libc/locale for that kind of rip-out yet [16 Aug 09:19] lemme get my API extensions in, then we'll rip the other junk out [16 Aug 09:19] also I found a bug in the existing utf8.c code. [16 Aug 09:20] dillon: the 4 / 6 thing? [16 Aug 09:20] that's one bug. a second bug is also in mbrtowc(): [16 Aug 09:20] if (wch < lbound) { ... really needs to be [16 Aug 09:20] if (wch < lbound || (wch & ~0x10ffff)) { [16 Aug 09:20] ... [16 Aug 09:20] dillon: i didn't follow exactly the backlog but it seemed like 8B just has too many issues to work [16 Aug 09:20] otherwise it can generate wchar's that the reverse function will say is illegal [16 Aug 09:20] dillon: ah yes [16 Aug 09:21] 8B works great actually, but not in the existing API functions [16 Aug 09:21] dillon: can you commit those two fixes? bapt will want them [16 Aug 09:21] it's completely unworkable in the existing API functions [16 Aug 09:21] marino: Yes, I will commit those ... 
ah, I'll do that tomorrow [16 Aug 09:21] or i can do it I guess [16 Aug 09:22] marino: I'm too tired and I'll have to re-merge my current work. The two fixes are trivial though so I will get them in tomorrow. (might be tomorrow evening though) [16 Aug 09:22] mnmm.. sure, go ahead. [16 Aug 09:22] the 6 -> 4, and the additional check along w/ lbound [16 Aug 09:22] ok [16 Aug 09:22] both in _UTF8_mbrtowc() [16 Aug 09:23] did you ever confirm less is also affected? [16 Aug 09:23] no [16 Aug 09:23] the only tool I know for sure is affected is sed [16 Aug 09:24] ultimately though things like less and more are going to wind up using all this stuff [16 Aug 09:24] i'm thinking we should just add LOCAL_STRICT detection to sed, and then see if there are any other real world issues [16 Aug 09:25] (meaning sed operates in locale c by default) [16 Aug 09:25] yes, I will have a locale_isstrict() API function in my commit set in libc/locale that sed will be able to call efficiently [16 Aug 09:25] hi [16 Aug 09:26] still not sure that should use "C" though, because collation ranges in the regex might need interpretation too [16 Aug 09:26] for a specific (asian) language [16 Aug 09:26] what we might end up doing is just using the binary converter and just say that it will use UTF-8B [16 Aug 09:26] dillon: alright. the implication was nothing else was needed but that was based on this api thing not really working [16 Aug 09:29] dillon: the sed-uses-C just mean to make sed works like it does on release 4.2 [16 Aug 09:30] meant* [16 Aug 09:30] yah, for now that's fine. We will revisit it later. [16 Aug 09:37] arg, i transferred 4.0 packages but forgot to swap them [16 Aug 09:48] ah, I was going to ask [16 Aug 09:48] done now [16 Aug 09:48] I have a new world + kernel compiled up fo pkgbox64 (compiled via /usr/src). It's ready to install and reboot [16 Aug 09:49] haven't installed and rebooted yet, can I ? 
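The extra bound check discussed above for _UTF8_mbrtowc() is easy to get subtly wrong. As pasted in the log, `wch & ~0x10ffff` is not equivalent to `wch > 0x10ffff`. Because 0x10FFFF is not one less than a power of two, the inverted mask still has bits 16-19 set, so it also rejects valid code points in planes 1-15 (U+10000-U+FFFFF). An explicit upper limit is still needed once the 5- and 6-byte forms are dropped, because a 4-byte sequence carries 21 payload bits and can express values up to U+1FFFFF. The sketch below uses plain comparisons and also rejects UTF-16 surrogates, which are not legal in UTF-8. The function names are hypothetical and this is not the utf8.c code. This log does not show which form the committed fix finally took.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: accept a code point decoded from a UTF-8
 * sequence whose length implies the minimum value lbound
 * (0x80 for 2 bytes, 0x800 for 3, 0x10000 for 4). */
bool utf8_cp_valid(uint32_t wch, uint32_t lbound)
{
    if (wch < lbound)                    /* overlong encoding */
        return false;
    if (wch > 0x10FFFF)                  /* past the last Unicode code point */
        return false;
    if (wch >= 0xD800 && wch <= 0xDFFF)  /* UTF-16 surrogates */
        return false;
    return true;
}

/* The bitmask form, kept only for comparison: it is NOT a range check
 * and returns true (rejects) for valid values such as U+10000. */
bool mask_rejects(uint32_t wch)
{
    return (wch & ~(uint32_t)0x10FFFF) != 0;
}
```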
[16 Aug 09:49] yes please [16 Aug 09:49] installing [16 Aug 09:50] can I reboot pkgbox64 ? [16 Aug 09:50] yep [16 Aug 09:50] ok, rebooting [16 Aug 09:53] dillon: utf8.c fixes pushed, thanks [16 Aug 09:53] ok, its up again [16 Aug 09:53] ok, cool [16 Aug 09:53] bapt needs it for his libc review [16 Aug 09:57] [gitweb-dfbsd] - libc/locale: limit utf8 illegal input detection to 10FF FFFF - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/594d13a07466451297cbd6348924ef9db90f0e56 - John Marino [16 Aug 09:59] i should test mac and see how sed works there. they have same regex so they should have same issue unless they are also handling bad conversions [16 Aug 10:03] hiya [16 Aug 10:04] so gleb wants to hack up net80211 and drivers so it doens't have a primary network device [16 Aug 10:04] ie, there'd be no ath0, no iwn0, etc [16 Aug 10:04] you'd just clone it via some existing cloning mechanism (used for tun, tap, etc I think) and get wlanX interfaces [16 Aug 10:04] how do people feel about thaT? [16 Aug 10:05] adrian: I think it should have been done that way in the first place [16 Aug 10:05] yeah [16 Aug 10:05] I mostly do too [16 Aug 10:05] adrian: I don't mind reporting, even though it will take a week, if you guys do that. 
It really helps all the locking [16 Aug 10:05] There are just some leftover pieces (like debugging) [16 Aug 10:05] (or most of it anyhow) [16 Aug 10:05] that i have to shift around into cdevs for ioctls first [16 Aug 10:05] as there's no parent device to ioctl against anymore [16 Aug 10:05] ok [16 Aug 10:06] I'll update you when it's done [16 Aug 10:06] maybe when you're doing the next merge I can continue to fix up net80211 to be easier to port [16 Aug 10:07] You'll have to merge in a bunch of other clone related and ifcofnig bits, and some rc script changes to [16 Aug 10:07] too* [16 Aug 10:07] --> dflybot (~jsb@leaf.dragonflybsd.org) joined the channel [16 Aug 10:13] [gitweb-dfbsd] - utf8.c: Fix typo - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/61730ce66a557e03bb9d82858c614ffd016097e0 - John Marino [16 Aug 10:14] <--- Zzzz time. Got the escaping API working, will integrate it tomorrow [16 Aug 10:14] or monday [16 Aug 10:15] --> kerma (~kerma@87-92-243-246.bb.dnainternet.fi) joined the channel [16 Aug 10:19] http://apollo.backplane.com/DFlyMisc/locale02.patch (not integrated w/ your commits yet) [16 Aug 10:19] works... tested with /dev/urandom for round-trip 8-bit clean [16 Aug 10:20] uses the python UTF-8B escaping method and also escapes any UTF-8 input surrogates (which are illegal for UTF-8 and would mess up the round-trip 8-bit clean feature) [16 Aug 10:21] test program: http://apollo.backplane.com/DFlyMisc/utfescape.c [16 Aug 10:21] (simple test program, makes a few assumptions) [16 Aug 10:21] ok< --- zzz for real [16 Aug 10:21] ok [16 Aug 10:21] night [16 Aug 10:27] i have a feeling that one this xx_Comm_US LC_CTYPE is generated, dillon is going to direct that all locales use it [16 Aug 10:27] (and remove xx_Comm_US) [16 Aug 10:27] s/that one/that once/ [16 Aug 10:28] and honestly I'm running out of reasons to oppose that ... 
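The round-trip property mentioned above is that arbitrary bytes, such as a /dev/urandom sample, survive a decode followed by an encode unchanged. The self-contained sketch below illustrates it with the Python-style UTF-8B convention. A byte that does not start a valid sequence becomes U+DC00 + byte (U+DC80-U+DCFF). Encoded surrogates are treated as invalid input, so a decoded value can never collide with an escape. The function names and the uint32_t code-point arrays are assumptions for illustration. This is not the API in locale02.patch, nor the two functions later committed to libc/locale.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of a valid UTF-8 sequence at s (n bytes left), storing its
 * code point in *cp; returns 0 for anything invalid (bad lead byte,
 * truncation, bad continuation, overlong, > U+10FFFF, surrogate). */
static size_t utf8_seq(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t c, lbound;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if (s[0] >= 0xC2 && s[0] <= 0xDF)      { len = 2; c = s[0] & 0x1F; lbound = 0x80; }
    else if (s[0] >= 0xE0 && s[0] <= 0xEF) { len = 3; c = s[0] & 0x0F; lbound = 0x800; }
    else if (s[0] >= 0xF0 && s[0] <= 0xF4) { len = 4; c = s[0] & 0x07; lbound = 0x10000; }
    else return 0;
    if (n < len) return 0;
    for (size_t j = 1; j < len; j++) {
        if ((s[j] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[j] & 0x3F);
    }
    if (c < lbound || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Hypothetical: decode n bytes into out[] (capacity >= n). */
size_t utf8b_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    size_t i = 0, k = 0;

    while (i < n) {
        uint32_t cp;
        size_t len = utf8_seq(s + i, n - i, &cp);

        if (len == 0) {
            out[k++] = 0xDC00 + s[i];   /* s[i] >= 0x80: U+DC80..U+DCFF */
            i++;
        } else {
            out[k++] = cp;
            i += len;
        }
    }
    return k;
}

/* Hypothetical: encode code points produced by utf8b_decode() back to
 * bytes; escaped values U+DC80..U+DCFF become the original raw byte. */
size_t utf8b_encode(const uint32_t *cp, size_t n, unsigned char *out)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t c = cp[i];

        if (c >= 0xDC80 && c <= 0xDCFF) {
            out[k++] = (unsigned char)(c - 0xDC00);
        } else if (c < 0x80) {
            out[k++] = (unsigned char)c;
        } else if (c < 0x800) {
            out[k++] = 0xC0 | (c >> 6);
            out[k++] = 0x80 | (c & 0x3F);
        } else if (c < 0x10000) {
            out[k++] = 0xE0 | (c >> 12);
            out[k++] = 0x80 | ((c >> 6) & 0x3F);
            out[k++] = 0x80 | (c & 0x3F);
        } else {
            out[k++] = 0xF0 | (c >> 18);
            out[k++] = 0x80 | ((c >> 12) & 0x3F);
            out[k++] = 0x80 | ((c >> 6) & 0x3F);
            out[k++] = 0x80 | (c & 0x3F);
        }
    }
    return k;
}
```

Each decode step either consumes a canonical sequence, which re-encodes to the same bytes, or escapes exactly one byte, which re-encodes to that byte. Concatenating the encoded pieces therefore reproduces the input. An encoded surrogate such as ED B2 80 comes out as three escapes rather than as U+DC80, which is why input surrogates have to be escaped.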
[16 Aug 10:37] Well, this will solve the *potential* problem with the language "xx":) [16 Aug 10:37] y [16 Aug 10:38] and obviously one utf-8 locale will improve POLA [16 Aug 11:08] morning [16 Aug 11:14] hi [16 Aug 11:30] --> mneumann (~mneumann@nat-wh-kha.rz.uni-karlsruhe.de) joined the channel [16 Aug 11:39] --> buggs (~buggs@braetling.cip.ifi.lmu.de) joined the channel [16 Aug 11:49] --> tkusumi (~tkusumi@i121-115-18-114.s41.a013.ap.plala.or.jp) joined the channel [16 Aug 12:30] --> Cthulhux (~JKL@p4FE28501.dip0.t-ipconnect.de) joined the channel [16 Aug 14:06] [gitweb-dfbsd] - kernel/iwn: Grab the WLAN serializer around ieee80211_ifdetach() - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/6e4c1236fb481784a286fa5bbe2c02a54ca9f656 - Johannes Hofmann [16 Aug 14:14] [digest-dfbsd] - Lazy Reading for 2015/08/16 - http://www.dragonflydigest.com/2015/08/16/16580.html - Justin Sherrill [16 Aug 14:55] ll [16 Aug 14:58] --> mneumann (~mneumann@nat-wh-kha.rz.uni-karlsruhe.de) joined the channel [16 Aug 15:57] --> stateless (~sin@saturn.2f30.org) joined the channel [16 Aug 17:06] morning [16 Aug 17:06] side note: now that pkgbox64 is done building stuff I've restarted gobuilder on it. and we're running sephes latest commits so I am hopeful it fixes the network related panic [16 Aug 17:08] I dunno about all the domains but certainly for CTYPE I think if any UTF locale is selected (i.e. not the default or "C"), we should just use the rollup utf8 for LC_CTYPE. [16 Aug 17:43] yo [16 Aug 17:44] --> Bluerise (~Bluerise@p2003006C2F56AE00583E664FFFFCAF98.dip0.t-ipconnect.de) joined the channel [16 Aug 20:00] --> marino (~marino@178.162.201.97) joined the channel [16 Aug 20:25] --> zrj (~arch@84.240.17.161) joined the channel [16 Aug 20:33] marino: new patch. 
Major functionality additions to new API functions + manual page entries [16 Aug 20:33] marino: http://apollo.backplane.com/DFlyMisc/locale02.patch [16 Aug 20:34] moin [16 Aug 20:34] marino: I plan to commit this later today [16 Aug 20:40] --> eadler (~toor@c-67-188-9-244.hsd1.ca.comcast.net) joined the channel [16 Aug 20:42] dillon: did I mess up utf.8 twice? [16 Aug 20:42] what ? [16 Aug 20:42] 10ffff vs 10ffffff [16 Aug 20:42] I will check [16 Aug 20:44] maximum 4-byte encoding is U+10000 to U+1FFFFF [16 Aug 20:44] allowed maximum is U+10FFFF [16 Aug 20:44] 4 byte encodings can encode 21 bits [16 Aug 20:45] hi [16 Aug 20:45] it all looks right to me... 10ffff and not 10fffff [16 Aug 20:45] what is utf.8 ? [16 Aug 20:45] everything in libc/locale looks ok [16 Aug 20:45] dillon: then line changed which means it's wrong in master (my fault) [16 Aug 20:46] which line ? [16 Aug 20:46] ~213 [16 Aug 20:46] on utf8.c [16 Aug 20:46] checking [16 Aug 20:47] ah, you did get that wrong [16 Aug 20:47] yeah, seems like [16 Aug 20:47] I was looking at my code, when I merged I just blindly made the adjustments so I didn't notice that you got it wrong [16 Aug 20:47] what a winner. 2 line patch and messed it up twice [16 Aug 20:48] go ahead and fix, I'll re-merge before I commit (unless you want me to just fix it when I commit) [16 Aug 20:48] k [16 Aug 20:55] [gitweb-dfbsd] - utf8.c: Fix second error of two-line patch - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/95590d9d35b8d503fcd8e43a8615b109f83bde8a - John Marino [16 Aug 21:14] dillon: if current buildworld completes without issue, i'm going to upload a generated rollup utf8 along with the tool that generated it [16 Aug 21:14] i suppose at that point I can modify cldr2def tool to use it for all utf8 locales [16 Aug 21:17] sounds good [16 Aug 21:17] <-- lunch. 
will commit after I get back [16 Aug 21:36] --> kerma (~kerma@87-92-243-246.bb.dnainternet.fi) joined the channel [16 Aug 22:29] --> tcb (~quassel@163.17.63.188.dynamic.wline.res.cust.swisscom.ch) joined the channel [16 Aug 22:36] [gitweb-dfbsd] - Add locale tool to generate "rollup" UTF-8 src file - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/775a693d971d3b6e2e5b71b5566451b0c6d2da0d - John Marino || Update common UTF-8 src file with generated one. - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c663aee353a5501a8c2031b3b558e45fc4144b17 - John Marino [16 Aug 22:39] --> freakazoi (~matt@pool-96-245-252-116.phlapa.fios.verizon.net) joined the channel [16 Aug 22:51] [gitweb-dfbsd] - rollup UTF-8: Manually add NO-BREAK_SPACE - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/e9e78086c16862f7a18f42143846d8d62986691f - John Marino [16 Aug 23:37] hrm [17 Aug 00:05] --> profmakx (~profmakx@karp.morphism.de) joined the channel [17 Aug 00:15] --> marino (~marino@178.162.201.97) joined the channel [17 Aug 00:27] doing final buildworld test, then committing the two new API functions [17 Aug 00:52] --> tkusumi (~tkusumi@i121-115-18-114.s41.a013.ap.plala.or.jp) joined the channel [17 Aug 00:53] [gitweb-dfbsd] - locale - Add two new API functions - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/8a84c799639d2e842952ce56fad530a085b16712 - Matthew Dillon [17 Aug 01:03] marino: I'm all committed, we can sync things up and remove the other encodings [17 Aug 01:03] [gitweb-dfbsd] - locale gen tools: Set all UTF-8 to same rollup CTYPE - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/236ac5fcb19a88c43de00e2ab309f57cb3e2d912 - John Marino || UTF-8 locales: Change all to use single master CTYPE file - http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/12ce37e1e1b34c18b52ec5aa73af0cd9c38d93a7 - John - 1 more [17 Aug 01:11] --> marino (~marino@36.Red-83-37-188.dynamicIP.rima-tde.net) joined the channel [17 Aug 01:18] --> marino 
(~marino@36.Red-83-37-188.dynamicIP.rima-tde.net) joined the channel [17 Aug 01:19] dillon: there's only one UTF-8 CTYPE now [17 Aug 01:19] YRabbit: xx_Comm_US went away so you'll have to reconfigure [17 Aug 01:21] marino: awesome. I think that will work well [17 Aug 01:21] my main object was erased when the rollup was generated from CLDR definition files [17 Aug 01:21] *objection [17 Aug 01:22] can we get rid of the non-UTF8 locale support code? e.g. big5, euc, gb18030, gb2312, gbk, mskanji ? [17 Aug 01:22] and just have "C" and UTF8-based support ? [17 Aug 01:22] dillon: I don't see a reason to [17 Aug 01:23] it's not hurting anything [17 Aug 01:23] well, I'm not going to write support functions for my new API function for them, too much work. [17 Aug 01:23] But the real question is... does anyone evne use them ? [17 Aug 01:23] because if not, we should scrap them. they're unnecessary weight [17 Aug 01:24] syscons doesn't support multibyte [17 Aug 01:24] i don't think you need to put any effort to them [17 Aug 01:25] i also don't think they are useless or redundant but i'm sure people are quick to disagree [17 Aug 01:27] i don't feel too qualified to comment on how asian languages are used anyway [17 Aug 01:27] maybe if sephe and tkusumi weigh in and swear it's only UTF-8 , we can trash them [17 Aug 01:28] (I'd like to keep ISO-x though) [17 Aug 01:31] mskanji (aka shift-jis) is still widely used (for historical reason i think) though i personally don't care since i only use LANG=C anyway. [17 Aug 01:41] maybe not. if shift-jis==mcrosoft-kanji then yes, but if != then maybe no. not sure since i don't know any details of these. [17 Aug 01:41] https://en.wikipedia.org/wiki/Shift_JIS [17 Aug 01:41] hmm [17 Aug 01:42] tkusumi: there is ja_JP.SJIS [17 Aug 01:43] yah. There seem to be numerous different encodings for several asian languages. So there's also ja_JP.eucJP, and of course ja_JP.UTF-8 [17 Aug 01:44] marino: yes. may be we can scrap ms but not sure. 
[17 Aug 01:44] dillon: i'd suggest revisiting in a year. libc can handle them, they work [17 Aug 01:44] we just don't do any new support