Mozilla Nederland
The Dutch Mozilla community

The USA Freedom Act and Browsing History

Mozilla Blog - Fri, 22/05/2020 - 17:29

Last Thursday, the US Senate voted to renew the USA Freedom Act, which authorizes a variety of forms of national surveillance. As has been reported, this renewal does not include an amendment offered by Sen. Ron Wyden and Sen. Steve Daines that would have explicitly prohibited the warrantless collection of Web browsing history. The legislation is now being considered by the House of Representatives, and today Mozilla and a number of other technology companies sent a letter urging the House to adopt the Wyden-Daines language in its version of the bill. This post helps fill in the technical background of what all this means.

Despite what you might think from the term “browsing history,” we’re not talking about browsing data stored on your computer. Web browsers like Firefox store, on your computer, a list of the places you’ve been so that you can go back and find things, and so the browser can provide better suggestions when you type in the awesomebar. That’s why you can type ‘f’ in the awesomebar and it might suggest you go to Facebook.

Browsers also store a pile of other information on your computer, like cookies, passwords, and cached files, that helps improve your browsing experience, and all of this can be used to infer where you have been. This information obviously has privacy implications if you share a computer or if someone gets access to your computer, and most browsers provide some sort of mode that lets you surf without storing history (Firefox calls this Private Browsing). Anyway, while this information can be accessed by law enforcement if they have access to your computer, it’s generally subject to the same conditions as other data on your computer, and those conditions aren’t the topic at hand.

In this context, what “web browsing history” refers to is data which is stored outside your computer by third parties. It turns out there is quite a lot of this kind of data, generally falling into four broad categories:

  • Telecommunications metadata. Typically, as you browse the Internet, your Internet Service Provider (ISP) learns every website you visit. This information leaks via a variety of channels (DNS lookups, the IP addresses of sites, TLS Server Name Indication (SNI)), and ISPs have various policies for how much of this data they log and for how long. Now that most sites use TLS encryption, this data generally will be just the name of the Web site you are going to, but not what pages you go to on the site. For instance, if you go to WebMD, the ISP won’t know what page you went to, they just know that you went to WebMD. (The sketch after this list walks through where each of these leaks happens.)
  • Web Tracking Data. As is increasingly well known, a giant network of third party trackers follows you around the Internet. What these trackers are doing is building up a profile of your browsing history so that they can monetize it in various ways. This data often includes the exact pages that you visit and will be tied to your IP address and other potentially identifying information.
  • Web Site Data. Any Web site that you go to is very likely to keep extensive logs of everything you do on the site, including what pages you visit. It may also record what outgoing links you click: for instance, when you do searches, many search engines record not just the search terms, but which links you click on, even when they go to other sites. In addition, many sites include various third party analytics systems which themselves may record your browsing history or even make a recording of your behavior on the site, including keystrokes, mouse movements, etc., so it can be replayed later.
  • Browser Sync Data. Although the browsing history stored on your computer may not be directly accessible, many browsers offer a “sync” feature which lets you share history, bookmarks, passwords, etc. between browser instances (such as between your phone and your laptop). This information has to be stored on a server somewhere and so is potentially accessible. Firefox encrypts this data by default, but in some other browsers you need to enable that feature yourself.
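To make the telecommunications-metadata point concrete, here is a minimal Python sketch (standard library only; the hostname and page are made-up examples) of what an on-path observer such as an ISP can and cannot see when you load an HTTPS page:

    import socket
    import ssl

    host = "www.webmd.com"   # example site name
    path = "/diabetes"       # the page you actually read

    # 1. DNS lookup: the query for the site name typically goes out in
    #    cleartext, so a passive observer learns which site you looked up.
    addr = socket.getaddrinfo(host, 443)[0][4][0]

    # 2. TCP connection: the destination IP address is visible by necessity.
    sock = socket.create_connection((addr, 443))

    # 3. TLS handshake: the hostname is repeated in cleartext in the Server
    #    Name Indication (SNI) field of the ClientHello.
    ctx = ssl.create_default_context()
    tls = ctx.wrap_socket(sock, server_hostname=host)

    # 4. The HTTP request itself is encrypted: the observer never sees the
    #    path "/diabetes", only that you connected to www.webmd.com.
    tls.sendall("GET {} HTTP/1.1\r\nHost: {}\r\n\r\n".format(path, host).encode())

Each of the leaks in steps 1-3 reveals only the site name or address, which matches the claim above: the ISP learns that you went to WebMD, but not which page you read.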

So there’s a huge amount of very detailed data about people’s browsing behavior sitting out there on various servers on the Internet. Because this is such sensitive information, in Mozilla’s products we try to minimize how much of it is collected with features such as encrypted sync (see above) or enhanced tracking protection. However, even so, there is still far too much data about user browsing behavior being collected and stored by a variety of parties.

This information isn’t being collected for law enforcement purposes but rather for a variety of product and commercial reasons. However, the fact that it exists and is being stored means that it is accessible to law enforcement if they follow the right process; the question at hand here is what that process actually is, and specifically in the US what data requires a warrant to access — demanding a showing of ‘probable cause’ plus a lot of procedural safeguards — and what can be accessed with a more lightweight procedure. A more detailed treatment of this topic can be found in this Lawfare piece by Margaret Taylor, but at a high level, the question turns on whether data is viewed as content or metadata, with content generally requiring a more heavyweight process and a higher level of evidence.

Unfortunately, historically the line between content and metadata hasn’t been incredibly clear in the US courts. In some cases the sites you visit (e.g., www.webmd.com) are treated as metadata, in which case that data would not require a warrant. By contrast, the exact page you went to on WebMD would be content and would require a warrant. However, the sites themselves reveal a huge amount of information about you. Consider, for instance, the implications of having Ashley Madison or Stormfront in your browsing history. The Wyden-Daines amendment would have resolved that ambiguity in favor of requiring a warrant for all Web browsing history and search history. If the House reauthorizes USA Freedom without this language, we will be left with this somewhat uncertain situation but one where in practice much of people’s activity on the Internet — including activity which they would rather keep secret — may be subject to surveillance without a warrant.

The post The USA Freedom Act and Browsing History appeared first on The Mozilla Blog.

Categories: Mozilla-nl planet

Protecting Search and Browsing Data from Warrantless Access

Mozilla Blog - Fri, 22/05/2020 - 17:20

As the maker of Firefox, we know that browsing and search data can provide a detailed portrait of our private lives and needs to be protected. That’s why we work to safeguard your browsing data, with privacy features like Enhanced Tracking Protection and more secure DNS.

Unfortunately, too much search and browsing history is still collected and stored around the Web. We believe this data deserves strong legal protections when the government seeks access to it, but in many cases that protection is uncertain.

The US House of Representatives will have the opportunity to address this issue next week when it takes up the USA FREEDOM Reauthorization Act (H.R. 6172). We hope legislators will amend the bill to limit government access to internet browsing and search history without a warrant.

The letter in the link below, sent today from Mozilla and other companies and internet organizations, calls on the House to preserve this essential aspect of personal privacy online.

Read our letter (PDF)


The post Protecting Search and Browsing Data from Warrantless Access appeared first on The Mozilla Blog.

Categories: Mozilla-nl planet

Request for comment: how to collaboratively make trustworthy AI a reality

Mozilla Blog - Thu, 14/05/2020 - 21:41

A little over a year ago, I wrote the first of many posts arguing: if we want a healthy internet — and a healthy digital society — we need to make sure AI is trustworthy. AI, and the large pools of data that fuel it, are central to how computing works today. If we want apps, social networks, online stores and digital government to serve us as people — and as citizens — we need to make sure the way we build with AI has things like privacy and fairness built in from the get go.

Since writing that post, a number of us at Mozilla — along with literally hundreds of partners and collaborators — have been exploring the questions: What do we really mean by ‘trustworthy AI’? And, what do we want to do about it?

 How do we collaboratively make trustworthy AI a reality? 

Today, we’re kicking off a request for comment on v0.9 of Mozilla’s Trustworthy AI Whitepaper — and on the accompanying theory of change diagram that outlines the things we think need to happen. While I have fallen out of the habit, I have traditionally included a simple diagram in my blog posts to explain the core concept I’m trying to get at. I would like to come back to that old tradition here:

This cartoonish drawing gets to the essence of where we landed in our year of exploration: ‘agency’ and ‘accountability’ are the two things we need to focus on if we want the AI that surrounds us every day to be more trustworthy. Agency is something that we need to proactively build into the digital products and services we use — we need computing norms and tech building blocks that put agency at the forefront of our design process. Accountability is about having effective ways to react if things go wrong — ways for people to demand better from the digital products and services we use every day and for governments to enforce rules when things go wrong. Of course, I encourage you to look at the full (and fancy) version of our theory of change diagram — but the fact that ‘agency’ (proactive) and ‘accountability’ (reactive) are the core, mutually reinforcing parts of our trustworthy AI vision is the key thing to understand.

In parallel to developing our theory of change, Mozilla has also been working closely with partners over the past year to show what we mean by trustworthy AI, especially as it relates to consumer internet technology. A significant portion of our 2019 Internet Health Report was dedicated to AI issues. We ran campaigns to: pressure platforms like YouTube to make sure their content recommendations don’t promote misinformation; and call on Facebook and others to open up APIs to make political ad targeting more transparent. We provided consumers with a critical buying guide for AI-centric smart home gadgets like Amazon Alexa. We invested ~$4M in art projects and awarded fellowships to explore AI’s impact on society. And, as the world faced a near universal health crisis, we asked questions about how issues like AI, big data and privacy will play during — and after — the pandemic. As with all of Mozilla’s movement building work, our intention with our trustworthy AI efforts is to bias towards action and working with others.

A request for comments

It’s with this ‘act + collaborate’ bias in mind that we are embarking on a request for comments on v0.9 of the Mozilla Trustworthy AI Whitepaper. The paper talks about how industry, regulators and citizens of the internet can work together to build more agency and accountability into our digital world. It also talks briefly about some of the areas where Mozilla will focus, knowing that Mozilla is only one small actor in the bigger picture of shifting the AI tide.

Our aim is to use the current version of this paper as a foil for improving our thinking and — even more so — for identifying further opportunities to collaborate with others in building more trustworthy AI. This is why we’re using the term ‘request for comment’ (RFC). It is a very intentional hat tip to a long-standing internet tradition of collaborating openly to figure out how things should work. For decades, the RFC process has been used by the internet community to figure out everything from standards for sharing email across different computer networks to best practices for defeating denial of service attacks. While this trustworthy AI effort is not primarily about technical standards (although that’s part of it), it felt (poetically) useful to frame this process as an RFC aimed at collaboratively and openly figuring out how to get to a world where AI and big data work quite differently than they do today.

We’re imagining that Mozilla’s trustworthy AI request for comment process includes three main steps, with the first step starting today.

Step 1: partners, friends and critics comment on the white paper

During this first part of the RFC, we’re interested in: feedback on our thinking; further examples to flesh out our points, especially from sources outside Europe and North America; and ideas for concrete collaboration.

The best way to provide input during this part of the process is to put up a blog post or some other document reacting to what we’ve written (and then share it with us). This will give you the space to flesh out your ideas and get them in front of both Mozilla (send us your post!) and a broader audience. If you want something quicker, there is also an online form where you can provide comments. We’ll be holding a number of online briefings and town halls for people who want to learn about and comment on the content in the paper — sign up through the form above to find out more. This phase of the process starts today and will run through September 2020.

Step 2: collaboratively map what’s happening — and what should happen 

Given our focus on action, mapping out real trustworthy AI work that is already happening — and that should happen — is even more critical than honing frameworks in the white paper. At a baseline, this means collecting information about educational programs, technology building blocks, product prototypes, consumer campaigns and emerging government policies that focus on making trustworthy AI a reality.

The idea is that the ‘maps’ we create will be a resource for both Mozilla and the broader field. They will help Mozilla direct its fellows, funding and publicity efforts to valuable projects. And, they will help people from across the field see each other so they can share ideas and collaborate completely independently of our work.

Process-wise, these maps will be developed collaboratively by Mozilla’s Insights Team with involvement of people and organizations from across the field. Using a mix of feedback from the white paper comment process (step 1) and direct research, they will develop a general map of who is working on key elements of trustworthy AI. They will also develop a deeper landscape analysis on the topic of data stewardship and alternative approaches to data governance. This work will take place from now until November 2020.

Step 3: do more things together, and update the paper

The final — and most important — part of the process will be to figure out where Mozilla can do more to support and collaborate with others. We already know that we want to work more with people who are developing new approaches to data stewardship, including trusts, commons and coops. We see efforts like these as foundational building blocks for trustworthy AI. Separately, we also know that we want to find ways to support African entrepreneurs, researchers and activists working to build out a vision of AI for that continent that is independent of the big tech players in the US and China. Through the RFC process, we hope to identify further areas for action and collaboration, both big and small.

Partnerships around data stewardship and AI in Africa are already being developed by teams within Mozilla. A team has also been tasked with identifying smaller collaborations that could grow into something bigger over the coming years. We imagine this will happen slowly through suggestions made and patterns identified during the RFC process. This will then shape our 2021 planning — and will feed back into a (hopefully much richer) v1.0 of the whitepaper. We expect all this to be done by the end of 2020.

Mozilla cannot do this alone. None of us can

As noted above: the task at hand is to collaboratively and openly figure out how to get to a world where AI and big data work quite differently than they do today. Mozilla cannot do this alone. None of us can. But together we are much greater than the sum of our parts. While this RFC process will certainly help us refine Mozilla’s approach and direction, it will hopefully also help others figure out where they want to take their efforts. And, where we can work together. We want our allies and our community not only to weigh in on the white paper, but also to contribute to the collective conversation about how we reshape AI in a way that lets us build — and live in — a healthier digital world.

PS. A huge thank you to all of those who have collaborated with us thus far and who will continue to provide valuable perspectives to our thinking on AI.

The post Request for comment: how to collaboratively make trustworthy AI a reality appeared first on The Mozilla Blog.

Categories: Mozilla-nl planet

Welcome Adam Seligman, Mozilla’s new Chief Operating Officer

Mozilla Blog - Wed, 13/05/2020 - 18:00

I’m excited to announce that Adam Seligman has joined Mozilla as our new Chief Operating Officer. Adam will work closely with me to help scale our businesses, growing capabilities, revenue and impact to fulfill Mozilla’s mission in service to internet users around the world.

Our goal at Mozilla is to build a better internet. To provide products and services that people flock to, and that elevate a safer, more humane reality, one less based on surveillance and exploitation. To do this — especially now — we need to engage with our customers and other technologists; ideate, test, iterate and ship products; and develop revenue sources faster than we’ve ever done.

Adam has a proven track record of building businesses and communities in the technology space. With a background in computer science, Adam comes to Mozilla with nearly two decades of experience in our industry. He managed a $1B+ cloud platform at Salesforce, led developer relations at Google and was a key member of the web platform strategy team at Microsoft.

Adding Adam to our team will accelerate our ability to solve big problems of online life, to create product offerings that connect to consumers, and to develop revenue models in ways that align with our mission. Adam is joining Mozilla at a time when people are more reliant than ever on the internet, but also questioning the impact of technology on their lives. They are looking for leadership and solutions from organizations like Mozilla. Adam will help grow Mozilla’s capacity to offer a range of products and services that meet people’s needs for online life.

“The open internet has brought incredible changes to our lives, and what we are witnessing now is a massive acceleration,” said Adam Seligman, Mozilla’s new Chief Operating Officer. “The open internet is keeping our industries running and our children learning. It’s our lifeline to family and friends, and it’s our primary source of information. It powers everything from small business to social movements. I want to give back to an internet that works for people — not against them. And there is no better champion for a people-first vision of the internet than Mozilla.”

In his capacity as Chief Operating Officer, Adam will lead the Pocket, Emerging Technologies, Marketing and Open Innovation teams to accelerate product growth and profitable business models, and work in close coordination with Dave Camp and the Firefox organization to do the same.

I eagerly look forward to working together with Adam to navigate these troubled times and build a vibrant future for Mozilla’s product and mission.

The post Welcome Adam Seligman, Mozilla’s new Chief Operating Officer appeared first on The Mozilla Blog.

Categories: Mozilla-nl planet

What the heck happened with .org?

Mozilla Blog - Tue, 12/05/2020 - 02:19

If you are following the tech news, you might have seen the announcement that ICANN withheld consent for the change of control of the Public Interest Registry and that this had some implications for .org. However, unless you follow a lot of DNS inside baseball, it might not be that clear what all this means. This post is intended to give a high level overview of the background here and what happened with .org. In addition, Mozilla has been actively engaged in the public discussion on this topic; see here for a good starting point.

The Structure and History of Internet Naming

As you’ve probably noticed, Web sites have names like “mozilla.org”, “google.com”, etc. These are called “domain names.” The way this all works is that there are a number of “top-level domains” (.org, .com, .io, …) and then people can get names within those domains (i.e., names that end in one of those). Top-level domains (TLDs) come in two main flavors:

  • Country-code top-level domains (ccTLDs) which represent some country or region, like .us (United States) or .uk (United Kingdom).
  • Generic top-level domains (gTLDs), which are everything else (.com, .org, .net, …).

Back at the beginning of the Internet, there were five gTLDs which were intended to roughly reflect the type of entity registering the name:

  • .com: for “commercial-related domains”
  • .edu: for educational institutions
  • .gov: for government entities (really, US government entities)
  • .mil: for the US Military (remember, the Internet came out of US government research)
  • .org: for organizations (“any other domains”)

It’s important to remember that until the 90s, much of the Internet ran under an Acceptable Use Policy which discouraged/forbade commercial use and so these distinctions were inherently somewhat fuzzy, but nevertheless people had the rough understanding that .org was for non-profits and the like and .com was for companies.

During this period the actual name registrations were handled by a series of government contractors (first SRI and then Network Solutions) but the creation and assignment of the top-level domains was under the control of the Internet Assigned Numbers Authority (IANA), which, in practice, mostly meant the decisions of its Director, Jon Postel. However, as the Internet became bigger, this became increasingly untenable, especially as IANA was run under a contract to the US government. Through a long and somewhat complicated series of events, in 1998 this responsibility was handed off to the Internet Corporation for Assigned Names and Numbers (ICANN), which administers the overall system, including setting the overall rules and determining which gTLDs will exist (which ccTLDs exist is determined by ISO 3166-1 country codes, as described in RFC 1591). ICANN has created a pile of new gTLDs, such as .dev, .biz, and .wtf (you may be wondering whether the world really needed .wtf, but there it is). As an aside, many of the newer names you see registered are not actually under gTLDs, but rather ccTLDs that happen to correspond to countries lucky enough to have cool sounding country codes. For instance, .io is actually the British Indian Ocean Territory’s TLD and .tv belongs to Tuvalu.

One of the other things that ICANN does is determine who gets to run each TLD. The way this all works is that ICANN determines who gets to be the registry, i.e., who keeps the records of who has which name as well as some of the technical data needed to actually route name lookups. The actual work of registering domain names is done by a registrar, who engages with the customer. Importantly, while registrars compete for business at some level (i.e., multiple people can sell you a domain in .com), there is only one registry for a given TLD and so they don’t have any price competition within that TLD; if you want a .com domain, VeriSign gets to set the price floor. Moreover, ICANN doesn’t really try to keep prices down; in fact, they recently removed the cap on the price of .org domains (bringing it in line with most other TLDs). One interesting fact about these contracts is that they are effectively perpetual: the contracts themselves are for quite long terms and registry agreements typically provide for automatic renewal except under cases of significant misbehavior by the registry. In other words, this is a more or less permanent claim on the revenues for a given TLD.

The bottom line here is that this is all quite lucrative. For example, in FY19 VeriSign’s revenue was over $1.2 billion. ICANN itself makes money in two main ways. First, it takes a cut of the revenue from each domain registration and second it auctions off the contracts for new gTLDs if more than one entity wants to register them. In the fiscal year ending in June 2018, ICANN made $136 million in total revenues (it was $302 million the previous year due to a large amount of revenue from gTLD auctions).

ISOC and .org

This brings us to the story of ISOC and .org. Until 2003, VeriSign operated .com, .net, and .org, but ICANN and VeriSign agreed that VeriSign would give up running .org (while retaining the far more profitable .com). As stated in their proposal:

As a general matter, it will largely eliminate the vestiges of special or unique treatment of VeriSign based on its legacy activities before the formation of ICANN, and generally place VeriSign in the same relationship with ICANN as all other generic TLD registry operators. In addition, it will return the .org registry to its original purpose, separate the contract expiration dates for the .com and .net registries, and generally commit VeriSign to paying its fair share of the costs of ICANN without any artificial or special limits on that responsibility.

The Internet Society (ISOC) is a nonprofit organization with the mission to support and promote “the development of the Internet as a global technical infrastructure, a resource to enrich people’s lives, and a force for good in society”. In 2002, they submitted one of 11 proposals to take over as the registry for .org and ICANN ultimately selected them. ICANN had a list of 11 criteria for the selection and the board minutes are pretty vague on the reason for selecting ISOC, but at the time this was widely understood as ICANN using the .org contract to provide a subsidy for ISOC and ISOC’s work. In any case, it ended up being quite a large subsidy: in 2018, PIR’s revenue from .org was over $92 million.

The actual mechanics here are somewhat complicated: it’s not like ISOC runs the registry itself. Instead they created a new non-profit subsidiary, the Public Interest Registry (PIR), to hold the contract with ICANN to manage .org. PIR in turn contracts the actual operations to Afilias, which is also the registry for a pile of other domains in their own right. [This isn’t an uncommon structure. For instance, VeriSign is the registry for .com, but they also run .tv for Tuvalu.] This will become relevant to our story shortly. Additionally, in the summer of 2019, PIR’s ten year agreement with ICANN renewed, but under new terms: looser contractual conditions to mirror those for the new gTLDs (yes, including .wtf), including the removal of a price cap and certain other provisions.

The PIR Sale

So, by 2018, ISOC was sitting on a pretty large ongoing revenue stream in the form of .org registration fees. However, ISOC management felt that having essentially all of their funding dependent on one revenue source was unwise and that actually running .org was a mismatch with ISOC’s main mission. Instead, they entered into a deal to sell PIR (and hence the .org contract) to a private equity firm called Ethos Capital, which is where things get interesting.

Ordinarily, this would be a straightforward-seeming transaction, but under the terms of the .org Registry Agreement, ISOC had to get approval from ICANN for the sale (or at least for PIR to retain the contract):

7.5 Change of Control; Assignment and Subcontracting. Except as set forth in this Section 7.5, neither party may assign any of its rights and obligations under this Agreement without the prior written approval of the other party, which approval will not be unreasonably withheld. For purposes of this Section 7.5, a direct or indirect change of control of Registry Operator or any subcontracting arrangement that relates to any Critical Function (as identified in Section 6 of Specification 10) for the TLD (a “Material Subcontracting Arrangement”) shall be deemed an assignment.

Soon after the proposed transaction was announced, a number of organizations (especially Access Now and EFF) started to surface concerns about the transaction. You can find a detailed writeup of those concerns here but I think a fair summary of the argument is that .org was special (and in particular that a lot of NGOs relied on it) and that Ethos could not be trusted to manage it responsibly. A number of concerns were raised, including that Ethos might aggressively raise prices in order to maximize their profit or that they could be more susceptible to governmental pressure to remove the domain names of NGOs that were critical of them. You can find Mozilla’s comments on the proposed sale here. The California Attorney General’s Office also weighed in opposing the sale in a letter that implied it might take independent action to stop it, saying:

This office will continue to evaluate this matter, and will take whatever action necessary to protect Californians and the nonprofit community.

In turn, Ethos and ISOC mounted a fairly aggressive PR campaign of their own, including creating a number of new commitments intended to alleviate concerns that had been raised, such as a new “Stewardship Council” with some level of input into privacy and policy decisions, an amendment to the operating agreement with ICANN to provide for additional potential oversight going forward, and a promise not to raise prices by more than 10%/year for 8 years. At the end of the day these efforts did not succeed: ICANN announced on April 30 that they would withhold consent for the deal (see here for their reasoning).

What Now?

As far as I can tell, this decision merely returns the situation to the status quo ante (see this post by Milton Mueller for some more detailed analysis). In particular, ISOC will continue to operate PIR and be able to benefit from the automatic renewal (and the agreement runs through 2029 in any case). To the extent to which you trusted PIR to manage .org responsibly a month ago, there’s no reason to think that has changed (of course, people’s opinions may have changed because of the proposed sale). However, as Mueller points out, none of the commitments that Ethos made in order to make the deal more palatable apply here, and in particular, thanks to the new contract in 2019, ISOC is free to raise prices without being bound by the 10% annual commitment that Ethos had offered.

It’s worth noting that “Save dot Org” at least doesn’t seem happy to leave .org in the hands of ISOC and in particular has called for ICANN to rebid the contract. Here’s what they say:

This is not the final step needed for protecting the .Org domain. ICANN must now open a public process for bids to find a new home for the .Org domain. ICANN has established processes and criteria that outline how to hold a reassignment process. We look forward to seeing a competitive process and are eager to support the participation in that process by the global nonprofit community.

For ICANN to actually try to take .org away from ISOC seems like it would be incredibly contentious and ICANN hasn’t given any real signals about what they intend to do here. It’s possible they will try to rebid the contract (though it’s not clear to me exactly whether the contract terms really permit this) or that they’ll just be content to leave things as they are, with ISOC running .org through 2029.

Regardless of what the Internet Society and ICANN choose to do here, I think that this has revealed the extent to which the current domain name ecosystem depends on informal understandings of what the various actors are going to do, as opposed to formal commitments to do them. For instance, many opposed to the sale seem to have expected that ISOC would continue to manage .org in the public interest and felt that the Ethos sale threatened that. However, as a practical matter the registry agreement doesn’t include any such obligation, and in particular nothing really stops them from raising prices much higher in order to maximize profit, as opponents argued Ethos might do (although ISOC’s nonprofit status means they can’t divest those profits directly). Similarly, those who were against the sale and those who were in favor of it seem to have had rather radically different expectations about what ICANN was supposed to do (actively ensure that .org be managed in a specific way versus just keep the system running with a light touch) and at the end of the day were relying on ICANN’s discretion to act one way or the other. It remains to be seen whether this is an isolated incident or whether this is a sign of a deeper disconnect that will cause increasing friction going forward.

The post What the heck happened with .org? appeared first on The Mozilla Blog.

Categories: Mozilla-nl planet

William Lachance: A principled reorganization of docs.telemetry.mozilla.org

Mozilla planet - Mon, 11/05/2020 - 17:44

(this post is aimed primarily at an internal audience, but I thought I’d go ahead and make it public as a blog post)

I’ve been thinking a bunch over the past few months about the Mozilla data organization’s documentation story. We have a first-class data platform here at Mozilla, but using it to answer questions, especially for newer employees, can be quite intimidating. As we continue our collective journey to becoming a modern data-driven organization, part of the formula for unlocking this promise is making the tools and platforms we create accessible to a broad internal audience.

My data peers are a friendly group of people and we have historically been good at answering questions on forums like the #fx-metrics slack channel: we’ll keep doing this. That said, our time is limited: we need a common resource for helping bring people up to speed on how to use the data platform to answer common questions.

Our documentation site, docs.telemetry.mozilla.org, was meant to be this resource: however in the last couple of years an understanding of its purpose has been (at least partially) lost and it has become somewhat overgrown with content that isn’t very relevant to those it’s intended to help.

This post’s goal is to re-establish a mission for our documentation site — towards the end, some concrete proposals on what to change are also outlined.

Setting the audience

docs.telemetry.mozilla.org was and is meant to be a resource useful for data practitioners within Mozilla.

Examples of different data practitioners and their use cases:

  • A data scientist performing an experiment analysis
  • A data analyst producing a report on the effectiveness of a recent marketing campaign
  • A Firefox engineer trying to understand the performance characteristics of a new feature
  • A technical product manager trying to understand the characteristics of a particular user segment
  • A quality assurance engineer trying to understand the severity of a Firefox crash

There are a range of skills that these different groups bring to the table, but there are some common things we expect a data practitioner to have, whatever their formal job description:

  • At least some programming and analytical experience
  • Comfortable working with and understanding complex data systems with multiple levels of abstraction (for example: the relationship between the Firefox browser which produces Telemetry data and the backend system which processes it)
  • The time necessary to dig into details

This also excludes a few groups:

  • Senior leadership or executives: they are of course free to use docs.telemetry.mozilla.org if helpful, but it is expected that the type of analytical work covered by the documentation will normally be done by a data practitioner and that relevant concepts and background information will be explained to them (in the form of high-level dashboards, presentations, etc.).
  • Data Engineering: some of the material on docs.telemetry.mozilla.org may be incidentally useful to this internal audience, but explaining the full details of the data platform itself belongs elsewhere.
What do these users need?

In general, a data practitioner is trying to answer a specific set of questions in the context of an exploration. There are a few things that they need:

  • A working knowledge of how to use the technological tools to answer the questions they might have: for example, how to write a SQL query computing the median value of a particular histogram for a particular usage segment (a sketch of such a query follows this list).
  • A set of guidelines on best practices on how to measure specific things: for example, we want our people using our telemetry systems to use well-formulated practices for measuring things like “Monthly Active Users” rather than re-inventing such things themselves.
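As promised above, here is a rough sketch of what such a median query might look like, written as Python driving BigQuery. The project, table, and column names are hypothetical, and real Telemetry histogram columns need more unpacking than the scalar shown here; this only illustrates the shape of the task:

    # Requires the google-cloud-bigquery package and credentials.
    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
    SELECT
      -- APPROX_QUANTILES(x, 2)[OFFSET(1)] is the (approximate) median.
      APPROX_QUANTILES(payload.page_load_ms, 2)[OFFSET(1)] AS median_ms
    FROM `my-project.telemetry.main_v1`      -- hypothetical table
    WHERE normalized_channel = 'release'     -- the usage segment
      AND DATE(submission_timestamp) >= '2020-05-01'
    """
    row = list(client.query(query).result())[0]
    print(row.median_ms)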
What serves this need?

A few years ago, Ryan Harter did an extensive literature review on writing documentation on technical subjects - the takeaway from this exploration is that the global consensus is that we should focus most of our attention on writing practical tutorials which enable our users to perform specific tasks in the service of the above objective.

There is a proverb, allegedly attributed to Confucius, which goes something like this:

“I hear and I forget. I see and I remember. I do and I understand.”

The understanding we want to build is how to use our data systems and tools to answer questions. Some knowledge of how our data platform works is no doubt necessary to accomplish this, but it is mostly functional knowledge we care about imparting to data practitioners: the best way to build this understanding is to guide users in performing tasks.

This makes sense intuitively, but it is also borne out by the data: this is what our users are looking for. Looking through the top pages on Google Analytics, virtually all of them [1] refer either to a cookbook or a howto guide.

Happily, this allows us to significantly narrow our focus for docs.telemetry.mozilla.org. We no longer need to worry about:

  • Providing lists or summaries of data tools available inside Mozilla [2]: we can talk about tools only as needed in the context of tasks they want to accomplish. We may want to keep such a list handy elsewhere for some other reason (e.g. internal auditing purposes), but we can safely say that it belongs somewhere else, like the Mozilla wiki, mana, or the telemetry.mozilla.org portal.
  • Detailed reference on the technical details of the data platform implementation. Links to this kind of material can be surfaced inside the documentation where relevant, but it is expected that an implementation reference will normally be stored and presented within the context of the tools themselves (a good example would be the existing documentation for GCP ingestion).
  • Detailed reference on all data sets, ping types, or probes: hand-written documentation for this kind of information is difficult to keep up to date with manual processes [3] and is best generated automatically and browsed with tools like the probe dictionary.
  • Detailed reference material on how to submit Telemetry. While an overview of how to think about submitting telemetry may be in scope (recall that we consider Firefox engineers a kind of data practitioner), the details are really a seperate topic that is better left to another resource which is closer to the implementation (for example, the Firefox Source Documentation or the Glean SDK reference).

Scanning through the above, you’ll see a common theme: avoid overly detailed reference material. The above is not to say that we should avoid background documentation altogether. For example, an understanding of how our data pipeline works is key to understanding how up-to-date a dashboard is expected to be. However, this type of documentation should be written bearing in mind the audience (focusing on what they need to know as data practitioners) and should be surfaced towards the end of the documentation as supporting material.

As an exception, there is also a very small amount of reference documentation which we want to put at top-level because it is so important: for example the standard metrics page describes how we define “MAU” and “DAU”: these are measures that we want to standardize in the organization, and not have someone re-invent every time they produce an analysis or report. However, we should be very cautious about how much of this “front facing” material we include: if we overwhelm our audience with details right out of the gate, they are apt to ignore them.
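To illustrate why shared definitions matter, here is a toy Python sketch of DAU/MAU-style counts over (date, client_id) records. The field names and the 28-day window are illustrative; the real definitions are exactly what the standard metrics page pins down:

    from datetime import date, timedelta

    # Toy records of (submission_date, client_id); real data lives in BigQuery.
    records = [
        (date(2020, 5, 1), "a"), (date(2020, 5, 1), "b"),
        (date(2020, 5, 2), "a"), (date(2020, 4, 10), "c"),
    ]

    def dau(recs, day):
        """Distinct clients seen on a single day."""
        return len({c for d, c in recs if d == day})

    def mau(recs, day, window=28):
        """Distinct clients seen in the trailing `window` days."""
        start = day - timedelta(days=window - 1)
        return len({c for d, c in recs if start <= d <= day})

    print(dau(records, date(2020, 5, 1)))  # 2 ("a" and "b")
    print(mau(records, date(2020, 5, 1)))  # 3 ("c" falls inside the window)

Two analysts who disagree about, say, the window length will report different “MAU” numbers for the same data, which is precisely the kind of re-invention a standardized definition prevents.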

Concrete actions
  • We should continue working on tutorials on how to perform common tasks: this includes not only the low-level guides that we currently have (e.g. BigQuery and SQL tutorials) but also information on how to effectively use our higher-level, more accessible tools like GLAM and GUD to answer questions.
  • Medium term, we should remove the per-dataset documentation and replace it with a graphical tool for browsing this type of information (perhaps Google Data Catalog). Since this is likely to be a rather involved project, we can keep the existing documentation for now — but for new datasets, we should encourage their authors to write tutorials on how to use them effectively (assuming they are of broad general interest) instead of hand-creating schema definitions that are likely to go out of date quickly.
  • We should set clear expectations and guidelines of what does and doesn’t belong on docs.telemetry.mozilla.org as part of a larger style guide. This style guide should be referenced somewhere prominent (perhaps as part of a pull request template) so that historical knowledge of what this resource is for isn’t lost.
Footnotes
  1. For some reason which I cannot fathom, a broad (non-Mozilla) audience seems unusually interested in our SQL style guide. 

  2. The current Firefox data documentation has a project glossary that is riddled with links to obsolete and unused projects. 

  3. docs.telemetry.mozilla.org has a huge section devoted to derived datasets (20+), many of which are obsolete or not recommended. At the same time, we are missing explicit reference material for the most commonly used tables in BigQuery (e.g. telemetry.main). 

Categories: Mozilla-nl planet

Daniel Stenberg: Manual cURL cURL

Mozilla planet - Mon, 11/05/2020 - 08:46

The HP Color LaserJet CP3525 Printer looks like any other ordinary printer done by HP. But there’s a difference!

A friend of mine fell over this gem, and told me.

TCP/IP Settings

If you go to the machine’s TCP/IP settings using the built-in web server, the printer offers the ordinary network configuration options but also one that sticks out a little extra. The “Manual cURL cURL” option! It looks like this:

I could easily confirm that this is genuine. I did this screenshot above by just googling for the string and printer model, since there appear to be printers like this exposing their settings web server to the Internet. Hilarious!

What?

How on earth did that string end up there? Certainly there’s no relation to curl at all except for the actual name used there? Is it a sign that there’s basically no humans left at HP that understand what the individual settings on that screen are actually meant for?

Given the contents in the text field, a URL containing the letters WPAD twice, I can only presume this field is actually meant for Web Proxy Auto-Discovery. I spent some time trying to find the user manual for this printer configuration screen but failed. It would’ve been fun to find “manual cURL cURL” described in a manual! They do offer a busload of various manuals, maybe I just missed the right one.
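For reference, Web Proxy Auto-Discovery boils down to fetching a proxy auto-config (PAC) file from a well-known URL and handing it to the proxy engine. A rough Python sketch, using the conventional “wpad” hostname and with all error handling omitted:

    import urllib.request

    # The PAC file is JavaScript defining a FindProxyForURL() function.
    pac = urllib.request.urlopen("http://wpad/wpad.dat", timeout=5).read()
    print(pac.decode()[:200])

That would at least explain why the field holds a URL with WPAD in it, whatever the label says.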

Does it use curl?

Yes, it seems HP generally use curl at least as I found the “Open-Source Software License Agreements for HP LaserJet and ScanJet Printers” and it contains the curl license:

The curl license as found in the HP printer open source report.

HP using curl for Print-Uri?

Independently, someone else recently told me about another possible HP + curl connection. This user said his HP printer makes HTTP requests using the user-agent libcurl-agent/1.0:

I haven’t managed to get this confirmed by anyone else (although the license snippet above certainly implies they use curl) and that particular user-agent string has been used everywhere for a long time, as I believe it is copied widely from the popular libcurl example getinmemory.c where I made up the user-agent and put it there already in 2004.
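For the curious, getinmemory.c simply sets CURLOPT_USERAGENT to that string. Transposed to Python’s pycurl binding (a sketch; the URL is a placeholder), the equivalent looks like this:

    from io import BytesIO
    import pycurl

    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, "https://example.com/")
    # The string everyone copied from the example, verbatim:
    c.setopt(c.USERAGENT, "libcurl-agent/1.0")
    c.setopt(c.WRITEDATA, buf)
    c.perform()
    c.close()

Any device shipping code derived from that example will show up in server logs with exactly this user-agent.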

Credits

Frank Gevaerts tricked me into going down this rabbit hole as he told me about this string.

Categories: Mozilla-nl planet

Ludovic Hirlimann: Recommendations are moving entities

Mozilla planet - Sat, 09/05/2020 - 15:02

At my new job we publish an open-source web mapping application using a mix of technologies; we also offer it as SaaS. Last Thursday I looked at how our Nginx server was configured, TLS-wise.

I was thrilled to see the comment in our Nginx config saying the configuration had been built using Mozilla's SSL config tool. At the same time I was shocked to see that the configuration, which dated from early 2018, was completely out of date: half of the ciphers were gone. So we took a modern config and applied it.

Once done, we turned to the Observatory to check our score, and my colleague and I were disappointed to get an F. So we fixed what we could easily (the ciphers) and added an issue to our product to make it more secure for our users.
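As a quick spot check after a change like this, it helps to see what your server actually negotiates. A minimal Python sketch (the hostname is a placeholder):

    import socket
    import ssl

    host = "example.com"  # your server here
    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # e.g. 'TLSv1.3' and the negotiated cipher suite
            print(tls.version(), tls.cipher())

This only shows the single suite negotiated with your local OpenSSL, so it complements rather than replaces a full scan like the Observatory's.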

We'll also probably add a calendar entry to check our score on a regular basis: as the recommendations change, our software configuration will have to change too.

Categories: Mozilla-nl planet

Mozilla Security Blog: May 2020 CA Communication

Mozilla planet - Fri, 08/05/2020 - 19:05

Mozilla has sent a CA Communication and Survey to inform Certification Authorities (CAs) who have root certificates included in Mozilla’s program about current expectations. Additionally, this survey will collect input from CAs on potential changes to Mozilla’s Root Store Policy. This CA communication and survey has been emailed to the Primary Point of Contact (POC) and an email alias for each CA in Mozilla’s program, and they have been asked to respond to the following items:

  1. Review guidance about actions a CA should take if they realize that mandated restrictions regarding COVID-19 will impact their audits or delay revocation of certificates.
  2. Inform Mozilla if their CA’s ability to fulfill the commitments that they made in response to the January 2020 CA Communication has been impeded.
  3. Provide input into potential policy changes that are under consideration, such as limiting maximum lifetimes for TLS certificates and limiting the re-use of domain name verification.

The full communication and survey can be read here. Responses to the survey will be automatically and immediately published by the CCADB.

With this CA Communication, we reiterate that participation in Mozilla’s CA Certificate Program is at our sole discretion, and we will take whatever steps are necessary to keep our users safe. Nevertheless, we believe that the best approach to safeguard that security is to work with CAs as partners, to foster open and frank communication, and to be diligent in looking for ways to improve.

The post May 2020 CA Communication appeared first on Mozilla Security Blog.

Categories: Mozilla-nl planet

William Lachance: This Week in Glean: mozregression telemetry (part 2)

Mozilla planet - Fri, 08/05/2020 - 16:32

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

This is a special guest post by non-Glean-team member William Lachance!

This is a continuation of an exploration of adding Glean-based telemetry to a python application, in this case mozregression, a tool for automatically finding the source of Firefox regressions (breakage).

When we left off last time, we had written some test scripts and verified that the data was visible in the debug viewer.

Adding Telemetry to mozregression itself

In many ways, this is pretty similar to what I did inside the sample application: the only significant difference is that these are shipped inside a Python application that is meant to be installable via pip. This means we need to specify the pings.yaml and metrics.yaml (located inside the mozregression subdirectory) as package data inside setup.py:

    setup(
        name="mozregression",
        ...
        package_data={"mozregression": ["*.yaml"]},
        ...
    )

There were also a number of Glean SDK enhancements which we determined were necessary. Most notably, Michael Droettboom added 32-bit Windows wheels to the Glean SDK, which we need to make building the mozregression GUI on Windows possible. In addition, some minor changes needed to be made to Glean’s behaviour for it to work correctly with a command-line tool like mozregression — for example, Glean used to assume that Telemetry would always be disabled via a GUI action so that it would send a deletion ping, but this would obviously not work in an application like mozregression where there is only a configuration file — so for this case, Glean needed to be modified to check if it had been disabled between runs.
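To give a sense of the shape of this, here is a minimal sketch of how a command-line tool can wire Glean up to a configuration file instead of a GUI preference. It assumes the Glean Python SDK roughly as it looked at the time; the option name and version string are made up:

    from pathlib import Path
    from glean import Glean, load_metrics, load_pings

    # Read the user's choice from a config file rather than a GUI toggle;
    # "enable-telemetry" is a hypothetical option name for this sketch.
    config = {"enable-telemetry": True}

    Glean.initialize(
        application_id="mozregression",
        application_version="4.0.0",
        upload_enabled=config["enable-telemetry"],
    )

    here = Path(__file__).parent
    pings = load_pings(here / "pings.yaml")
    metrics = load_metrics(here / "metrics.yaml")

    metrics.usage.variant.set("console")  # record which variant ran...
    pings.usage.submit()                  # ...and submit the usage ping

If upload_enabled flips from True to False between runs, the modified Glean notices on the next start and sends the deletion ping.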

Many thanks to Mike (and others on the Glean team) for so patiently listening to my concerns and modifying Glean accordingly.

Getting Data Review

At Mozilla, we don’t just allow random engineers like myself to start collecting data in a product that we ship (even a semi-internal like mozregression). We have a process, overseen by Data Stewards to make sure the information we gather is actually answering important questions and doesn’t unnecessarily collect personally identifiable information (e.g. email addresses).

You can see the specifics of how this worked out in the case of mozregression in bug 1581647.

Documentation

Glean has some fantastic utilities for generating markdown-based documentation on what information is being collected, which I have made available on GitHub:

https://github.com/mozilla/mozregression/blob/master/docs/glean/metrics.md

The generation of this documentation is hooked up to mozregression’s continuous integration, so we can be sure it’s up to date.

I also added a quick note to mozregression’s web site describing the feature, along with (very importantly) instructions on how to turn it off.

Enabling Data Ingestion

Once a Glean-based project has passed data review, getting our infrastructure to ingest it is pretty straightforward. Normally we would suggest just filing a bug and letting us (the data team) handle the details, but since I’m on that team, I’m going to go into a (little bit) of detail about how the sausage is made.

Behind the scenes, we have a collection of ETL (extract-transform-load) scripts in the probe-scraper repository which are responsible for parsing the ping and probe metadata files that I added to mozregression in the step above and then automatically creating BigQuery tables and updating our ingestion machinery to insert data passed to us there.

There’s quite a bit of complicated machinery behind the scenes to make this all work, but since it’s already in place, adding a new thing like this is relatively simple. The changeset I submitted as part of a pull request to probe-scraper was all of 9 lines long:

diff --git a/repositories.yaml b/repositories.yaml
index dffcccf..6212e55 100644
--- a/repositories.yaml
+++ b/repositories.yaml
@@ -239,3 +239,12 @@ firefox-android-release:
   - org.mozilla.components:browser-engine-gecko-beta
   - org.mozilla.appservices:logins
   - org.mozilla.components:support-migration
+mozregression:
+  app_id: org-mozilla-mozregression
+  notification_emails:
+    - wlachance@mozilla.com
+  url: 'https://github.com/mozilla/mozregression'
+  metrics_files:
+    - 'mozregression/metrics.yaml'
+  ping_files:
+    - 'mozregression/pings.yaml'

A Pretty Graph

With the probe scraper change merged and deployed, we can now start querying! A number of tables are automatically created according to the schema outlined above: notably “live” and “stable” tables corresponding to the usage ping. Using sql.telemetry.mozilla.org we can start exploring what’s out there. Here’s a quick query I wrote up:

SELECT
  DATE(submission_timestamp) AS date,
  metrics.string.usage_variant AS variant,
  count(*)
FROM `moz-fx-data-shared-prod`.org_mozilla_mozregression_stable.usage_v1
WHERE DATE(submission_timestamp) >= '2020-04-14'
  AND client_info.app_display_version NOT LIKE '%.dev%'
GROUP BY date, variant;

… which generates a chart like this:

This chart represents the absolute volume of mozregression usage since April 14th 2020 (around the time when we first released a version of mozregression with Glean telemetry), grouped by mozregression “variant” (GUI, console, and mach) and date - you can see that (unsurprisingly?) the GUI has the highest usage. I’ll talk about this more in an upcoming installment, speaking of…

Next Steps

We’re not done yet! Next time, we’ll look into making a public-facing dashboard demonstrating these results and making an aggregated version of the mozregression telemetry data publicly accessible to researchers and the general public. If we’re lucky, there might even be a bit of data science. Stay tuned!

Categorieën: Mozilla-nl planet

Wladimir Palant: What data does Xiaomi collect about you?

Mozilla planet - vr, 08/05/2020 - 13:43

A few days ago I published a very technical article confirming that Xiaomi browsers collect a massive amount of private data. This fact was initially publicized in a Forbes article based on the research by Gabriel Cîrlig and Andrew Tierney. After initially dismissing the report as incorrect, Xiaomi has since updated their Mint and Mi Pro browsers to include an option to disable this tracking in incognito mode.

Xiaomi demonstrating a privacy fig leaf (image credits: 1mran IN, Openclipart)

Is the problem solved now? Not really. There is now exactly one non-obvious setting combination where you can have your privacy with these browsers: “Incognito Mode” setting on, “Enhanced Incognito Mode” setting off. With these not being the default and the users not informed about the consequences, very few people will change to this configuration. So the browsers will continue spying on the majority of their user base.

In this article I want to provide a high-level overview of the data being exfiltrated here. TL;DR: Lots and lots of it.

Disclaimer: This article is based entirely on reverse engineering Xiaomi Mint Browser 3.4.3. I haven’t seen the browser in action, so some details might be wrong. Update (2020-05-08): From a quick glance at Xiaomi Mint Browser 3.4.4 which has been released in the meantime, no further changes to this functionality appear to have been implemented.

Event data

When allowed, Xiaomi browsers will send information about a multitude of different events, sometimes with specific data attached. For example, an event will typically be generated when some piece of the user interface shows up or is clicked, an error occurs, or the current page’s address is copied to the clipboard. There are more interesting events as well, however. For example:

  • A page started or finished loading, with the page address attached
  • Change of default search engine, with old and new search engines attached
  • Search via the navigation bar, with the search query attached
  • Reader mode switched on, with the page address attached
  • A tab clicked, with the tab name attached
  • A page being shared, with the page address attached
  • Reminder shown to switch on Incognito Mode, with the porn site that triggered the reminder attached
  • YouTube searches, with the search query attached
  • Video details for a YouTube video opened or closed, with video ID attached
  • YouTube video played, with video ID attached
  • Page or file downloaded, with the address attached
  • Speed dial on the homepage clicked, added or modified, with the target address attached
Generic annotations

Some pieces of data will be attached to every event. These are meant to provide context and, of course, to group related events. This data includes, among other things, the following (an illustrative sketch of a combined payload follows the list):

  • A randomly generated identifier that is unique to your browser instance. While this identifier is supposed to change every 90 days, this won’t actually happen due to a bug. In most cases, it should be fairly easy to recognize the person behind the identifier.
  • An additional device identifier (this one will stay unchanged even if app data is cleared)
  • If you are logged into your Mi Account: the identifier of this account
  • The exact time of the event
  • Device manufacturer and model
  • Browser version
  • Operating system version
  • Language setting of your Android system
  • Default search engine
  • Mobile network operator
  • Network type (wifi, mobile)
  • Screen size
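
To make the scale of this concrete, here is the promised sketch of a combined payload. It is a purely illustrative reconstruction; the field names and values are invented, based only on the categories of data described above, and are not Xiaomi's actual schema:

# Illustrative only: invented field names and values, not Xiaomi's schema.
event = {
    # the specific event, with its attached data:
    "event": "page_load_finished",
    "page_url": "https://example.com/some/article",
    # generic annotations sent along with every event:
    "instance_id": "8c6976e5-...",   # meant to rotate every 90 days; doesn't, due to the bug
    "device_id": "a3f1...",          # survives clearing app data
    "mi_account_id": None,           # set when logged into a Mi Account
    "timestamp": "2020-05-08T13:43:00Z",
    "device": "Xiaomi Mi 9",
    "browser_version": "3.4.3",
    "os_version": "Android 10",
    "locale": "en-US",
    "default_search_engine": "google",
    "carrier": "example-carrier",
    "network_type": "wifi",
    "screen_size": "1080x2340",
}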
Conclusions

Even with the recent changes, Xiaomi browsers are massively invading users’ privacy. The amount of data collected by default goes far beyond what’s necessary for application improvement. Instead, Xiaomi appears to be interested in where users go, what they search for and which videos they watch. Even with a fairly obscure setting to disable this tracking, the default behavior isn’t acceptable. If you happen to be using a Xiaomi device, you should install a different browser ASAP.

Categorieën: Mozilla-nl planet

Daniel Stenberg: video: common mistakes when using libcurl

Mozilla planet - vr, 08/05/2020 - 08:02

As I posted previously, I did a webinar and here’s the recording and the slides I used for it.

The slides.

Categorieën: Mozilla-nl planet

The Talospace Project: Firefox 76 on POWER

Mozilla planet - vr, 08/05/2020 - 05:46
Firefox 76 is released. Besides other CSS, HTML and developer features, it refines that somewhat obnoxious zooming bar a bit, improves Picture-in-Picture further (great for livestreams: using it a lot for church), and most notably adds critical alerts for website breaches and improved password security (both generating good secure passwords and notifying you when a password used on one or other sites may have been stolen). The .mozconfigs are unchanged from Firefox 67, which is good news, because we've been stable without changing build options for quite a while at this point, and we might be able to start investigating why some build options that should function currently fail. In particular, PGO and LTO would be nice to get working.
Categorieën: Mozilla-nl planet

Daniel Stenberg: qlog with curl

Mozilla planet - do, 07/05/2020 - 23:38

I want curl to be on the very bleeding edge of protocol development, to aid the Internet protocol development community in testing out protocols early and working out kinks in the protocols and server implementations, using curl’s vast set of tools and switches.

For this, curl supported HTTP/2 really early on and helped shape the protocol and test out servers.

For this reason, curl has supported HTTP/3 since August 2019. It is a convenient and well-known client that you can use to poke at your brand new HTTP/3 servers, and with it we can work on getting all the rough edges smoothed out before the protocol reaches its final state.

QUIC tooling

One of the many challenges QUIC and HTTP/3 have is that with a new transport protocol comes entirely new paradigms. With new paradigms like this, we need improved or perhaps even new tools to help us understand the network flows back and forth, to make sure we all have a common understanding of the protocols and to make sure we implement our end-points correctly.

QUIC only exists as an encrypted-only protocol, meaning that we can no longer easily monitor and passively investigate network traffic like before. QUIC also encrypts more of the protocol than TCP + TLS do, leaving even less for an outsider to see.

The current QUIC analyzer tool lineup gives us two options.

Wireshark

We all of course love Wireshark and if you get a very recent version, you’ll be able to decrypt and view QUIC network data.

With curl, and a few other clients, you can ask to get the necessary TLS secrets exported at run-time with the SSLKEYLOGFILE environment variable. You’ll then be able to see every bit in every packet. This way of extracting secrets works with QUIC as well as with the traditional TCP+TLS based protocols.
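
A minimal sketch of that workflow, assuming a curl built against a TLS library that supports key logging (the URL and file path here are arbitrary):

import os
import subprocess

# Ask curl to export the TLS session secrets while fetching a page.
env = dict(os.environ, SSLKEYLOGFILE="/tmp/tls-secrets.txt")
subprocess.run(
    ["curl", "--silent", "--show-error", "--output", os.devnull,
     "https://example.com/"],
    env=env,
    check=True,
)
# Then point Wireshark at /tmp/tls-secrets.txt (Preferences -> Protocols ->
# TLS -> "(Pre)-Master-Secret log filename") to decrypt the capture.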

qvis/qlog

If you find the Wireshark network view a little bit too low level, leaving a lot for you to understand and draw conclusions from on your own, the next-level tool here is the common QUIC logging format called qlog. This is an agreed-upon common standard for logging QUIC traffic, with an accompanying web based visualizer tool, qvis, that lets you upload your logs and get visualizations generated. This becomes extra powerful if you have logs from both ends!

Starting with this commit (landed in the git master branch on May 7, 2020), all curl builds that support HTTP/3 – independent of what backend you pick – can be told to output qlogs.

Enable qlogging in curl by setting the new standard environment variable QLOGDIR to point to a directory in which you want qlogs to be generated. When you then run curl, you’ll get files created in there named [hex digits].log, where the hex digits are the “SCID” (Source Connection Identifier).
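
Here's a minimal sketch of that in practice, assuming an HTTP/3-capable curl build on your PATH and using a public HTTP/3 test server as the target:

import os
import subprocess
import tempfile

# Create a directory for the qlogs and point curl at it via QLOGDIR.
qlog_dir = tempfile.mkdtemp(prefix="qlogs-")
env = dict(os.environ, QLOGDIR=qlog_dir)

# --http3 requires a curl built against an HTTP/3 backend (ngtcp2 or quiche).
subprocess.run(
    ["curl", "--http3", "--silent", "--output", os.devnull,
     "https://cloudflare-quic.com/"],
    env=env,
    check=True,
)

# Each connection produces a file named after its Source Connection ID.
print(os.listdir(qlog_dir))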

Credits

qlog and qvis are spear-headed by Robin Marx. qlogging for curl with Quiche was pushed for by Lucas Pardue and Alessandro Ghedini. In the ngtcp2 camp, Tatsuhiro Tsujikawa made it very easy for me to switch it on in curl.

The top image is snapped from the demo sample on the qvis web site.

Categorieën: Mozilla-nl planet

The Mozilla Blog: Mozilla research shows some machine voices score higher than humans

Mozilla planet - do, 07/05/2020 - 17:36

This blog post is to accompany the publication of the paper Choice of Voices: A Large-Scale Evaluation of Text-to-Speech Voice Quality for Long-Form Content in the Proceedings of CHI’20, by Julia Cambre and Jessica Colnago from CMU, Jim Maddock from Northwestern, and Janice Tsai and Jofish Kaye from Mozilla. 

In 2019, Mozilla’s Voice team developed a method to evaluate the quality of text-to-speech voices. It turns out there was very little that had been done in the world of text to speech to evaluate voice for listening to long-form content — things like articles, book chapters, or blog posts. A lot of the existing work answered the core question of “can you understand this voice?” So a typical test might use a syntactically correct but meaningless sentence, like “The masterly serials withdrew the collaborative brochure”, and have a listener type that in. That way, the listener couldn’t guess missed words from other words in the sentence. But now that we’ve reached a stage of computerized voice quality where so many voices can pass the comprehension test with flying colours, what’s the next step?

How can we determine if a voice is enjoyable to listen to, particularly for long-form content — something you’d listen to for more than a minute or two? Our team had a lot of experience with this: we had worked closely with our colleagues at Pocket to develop the Pocket Listen feature, so you can listen to articles you’ve saved, while driving or cooking. But we still didn’t know how to definitively say that one voice led to a better listening experience than another.

The method we used was developed by our intern Jessica Colnago during her internship at Mozilla, and it’s pretty simple in concept. We took one article, How to Reduce Your Stress in Two Minutes a Day, and we recorded each voice reading that article. Then we had 50 people on Mechanical Turk listen to each recording — 50 different people each time. (You can also listen to clips from most of these recordings to make your own judgement.) Nobody heard the article more than once. And at the end of the article, we’d ask them a couple of questions to check they were actually listening, and to see what they thought about the voice.

For example, we’d ask them to rate how much they liked the voice on a scale of one to five, and how willing they’d be to listen to more content recorded by that voice. We asked them why they thought that voice might be pleasant or unpleasant to listen to. We evaluated 27 voices, and here’s one graph which represents the results. (The paper has lots more rigorous analysis, and we explored various methods to sort the ratings, but the end results are all pretty similar. We also added a few more voices after the paper was finished, which is why there’s different numbers of voices in different places in this research.)

(Chart: voice comparison graph)

As you can see, some voices rated better than others. The ones at the left are the ones people consistently rated positively, and the ones at the right are the ones that people liked less: just as examples, you’ll notice that the default (American) iOS female voice is pretty far to the right, although the Mac default voice has a pretty respectable showing. I was proud to find that the Mozilla Judy Wave1 voice, created by Mozilla research engineer Eren Gölge, is rated up there along with some of the best ones in the field. It turns out the best electronic voices we tested are Mozilla’s voices and the Polly Neural voices from Amazon. And while we still have some licensing questions to figure out, making sure we can create sustainable, publicly accessible, high quality voices, it’s exciting to see that we can do something in an open source way that is competitive with very well funded voice efforts out there, which don’t have the same aim of being private, secure and accessible to all.

We found there were some generalizable experiences. Listeners were 54% more likely to give a higher experience rating to the male voices we tested than the female voices. We also looked at the number of words spoken in a minute. Generally, our results indicated that there is a “just right speed” in the range of 163 to 177 words per minute, and people didn’t like listening to voices that were much faster or slower than that.
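
As a trivial illustration of how that reading-speed metric works (the article length and duration here are made up for the example):

def words_per_minute(word_count: int, duration_seconds: float) -> float:
    """Reading speed: words spoken per minute of audio."""
    return word_count / (duration_seconds / 60.0)

# A ~1200-word article read in 7 minutes lands inside the preferred band:
print(round(words_per_minute(1200, 7 * 60)))  # -> 171 (within 163-177 wpm)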

But the more interesting result comes from one of the things we did at a pretty late stage in the process, which was to include some humans reading the article directly into a microphone. Those are the voices circled in red:

(Chart: voice comparison graph with the human voices circled in red)

What we found was that some of our human voices were being rated lower than some of the robot voices. And that’s fascinating. That suggests we are at a point in technology, in society right now, where there are mechanically generated voices that actually sound better than humans. And before you ask, I listened to those recordings of human voices. You can do the same. Janice (the recording labelled Human 2 in the dataset) has a perfectly normal voice that I find pleasant to listen to. And yet some people were finding these mechanically generated voices better.

That raises a whole host of interesting questions, concerns and opportunities. This is a snapshot of computerized voices, in the last two years or so. Even since we’ve done this study, we’ve seen the quality of voices improve. What happens when computers are more pleasant to listen to than our own voices? What happens when our children might prefer to listen to our computer reading a story than ourselves?

A potentially bigger ethical question comes with the question of persuasion. One question we didn’t ask in this study was whether people trusted or believed the content that was read to them. What happens when we can increase the number of people who believe something simply by changing the voice that it is read in? There are entire careers exploring the boundaries of influence and persuasion; how does easy access to “trustable” voices change our understanding of what signals point to trustworthiness? The BBC has been exploring British attitudes to regional accents in a similar way — drawing, fascinatingly, from a study of how British people reacted to different voices on the radio in 1927. We are clearly continuing a long tradition of analyzing the impact of voice and voices on how we understand and feel about information.

The post Mozilla research shows some machine voices score higher than humans appeared first on The Mozilla Blog.

Categorieën: Mozilla-nl planet
