Planet Mozilla - The Dutch Mozilla community
https://planet.mozilla.org/

Dustin J. Mitchell: Taskcluster's DB (Part 1) - Azure to Postgres

Wed, 28/10/2020 - 22:00

This is a deep-dive into some of the implementation details of Taskcluster. Taskcluster is a platform for building continuous integration, continuous deployment, and software-release processes. It’s an open source project that began life at Mozilla, supporting the Firefox build, test, and release systems.

The Taskcluster “services” are a collection of microservices that handle distinct tasks: the queue coordinates tasks; the worker-manager creates and manages workers to execute tasks; the auth service authenticates API requests; and so on.

Azure Storage Tables to Postgres

Until April 2020, Taskcluster stored its data in Azure Storage tables, a simple NoSQL-style service similar to AWS’s DynamoDB. Briefly, each Azure table is a list of JSON objects with a single primary key composed of a partition key and a row key. Lookups by primary key are fast and parallelize well, but scans of an entire table are extremely slow and subject to API rate limits. Taskcluster was carefully designed within these constraints, but that meant that some useful operations, such as listing tasks by their task queue ID, were simply not supported. Switching to a fully-relational datastore would enable such operations, while easing deployment of the system for organizations that do not use Azure.

Always Be Migratin’

In April, we migrated the existing deployments of Taskcluster (at that time all within Mozilla) to Postgres. This was a “forklift migration”, in the sense that we moved the data directly into Postgres with minimal modification. Each Azure Storage table was imported into a single Postgres table of the same name, with a fixed structure:

create table queue_tasks_entities(
  partition_key text,
  row_key text,
  value jsonb not null,
  version integer not null,
  etag uuid default public.gen_random_uuid()
);
alter table queue_tasks_entities add primary key (partition_key, row_key);

The importer we used was specially tuned to accomplish this import in a reasonable amount of time (hours). For each known deployment, we scheduled a downtime to perform this migration, after extensive performance testing on development copies.

We considered options to support a downtime-free migration. For example, we could have built an adapter that would read from Postgres and Azure, but write to Postgres. This adapter could support production use of the service while a background process copied data from Azure to Postgres.

This option would have been very complex, especially in supporting some of the atomicity and ordering guarantees that the Taskcluster API relies on. Failures would likely have led to data corruption and a downtime much longer than the planned one. So, we opted for the simpler, planned migration. (We’ll revisit the idea of online migrations in part 3.)

The database for Firefox CI occupied about 350GB. The other deployments, such as the community deployment, were much smaller.

Database Interface

All access to Azure Storage tables had been via the azure-entities library, with a limited and very regular interface (hence the _entities suffix on the Postgres table name). We wrote an implementation of the same interface, but with a Postgres backend, in taskcluster-lib-entities. The result was that none of the code in the Taskcluster microservices changed. Not changing code is a great way to avoid introducing new bugs! It also limited the complexity of this change: we only had to deeply understand the semantics of azure-entities, and not the details of how the queue service handles tasks.

Stored Functions

As the taskcluster-lib-entities README indicates, access to each table is via five stored database functions:

  • <tableName>_load - load a single row
  • <tableName>_create - create a new row
  • <tableName>_remove - remove a row
  • <tableName>_modify - modify a row
  • <tableName>_scan - return some or all rows in the table

Stored functions are functions defined in the database itself that can be redefined within a transaction. Part 2 will get into why we made this choice.
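
For illustration, here is a hypothetical sketch of what the _load function for the queue_tasks_entities table above might look like. This is an assumption based only on the table definition shown earlier, not the actual Taskcluster implementation:

-- hypothetical sketch, not the real taskcluster-lib-entities definition
create or replace function queue_tasks_entities_load(pk text, rk text)
returns table (partition_key text, row_key text, value jsonb, version integer, etag uuid)
as $$
  -- fetch a single row by its composite primary key
  select t.partition_key, t.row_key, t.value, t.version, t.etag
  from queue_tasks_entities t
  where t.partition_key = pk and t.row_key = rk;
$$ language sql;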

Optimistic Concurrency

The modify function is an interesting case. Azure Storage has no notion of a “transaction”, so the azure-entities library uses an optimistic-concurrency approach to implement atomic updates to rows. This uses the etag column, which changes to a new value on every update, to detect and retry concurrent modifications. While Postgres can do much better, we replicated this behavior in taskcluster-lib-entities, again to limit the changes made and avoid introducing new bugs.

A modification looks like this in Javascript:

await task.modify(task => {
  if (task.status !== 'running') {
    task.status = 'running';
    task.started = now();
  }
});

For those not familiar with JS notation, this is calling the modify method on a task, passing a modifier function which, given a task, modifies that task. The modify method calls the modifier and tries to write the updated row to the database, conditioned on the etag still having the value it did when the task was loaded. If the etag does not match, modify re-loads the row to get the new etag, and tries again until it succeeds. The effect is that updates to the row occur one-at-a-time.

This approach is “optimistic” in the sense that it assumes no conflicts, and does extra work (retrying the modification) only in the unusual case that a conflict occurs.
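
Sketched in JavaScript, the retry loop amounts to something like this. It is a simplified illustration, not the actual taskcluster-lib-entities code; load and tryUpdate are illustrative stand-ins for the real data layer:

// tryUpdate(row, etag) performs UPDATE ... WHERE etag = $etag and resolves
// to true when exactly one row was changed.
async function modifyWithRetry(load, tryUpdate, modifier) {
  while (true) {
    const row = await load();          // read the current value and its etag
    modifier(row);                     // apply the caller's changes
    if (await tryUpdate(row, row.etag)) {
      return row;                      // no concurrent modification happened
    }
    // etag mismatch: a concurrent writer won the race; reload and retry
  }
}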

What’s Next?

At this point, we had fork-lifted the Azure tables into Postgres and no longer required an Azure account to run Taskcluster. However, we hadn’t yet seen any of the benefits of a relational database:

  • data fields were still trapped in a JSON object (in fact, some kinds of data were hidden in base64-encoded blobs)
  • each table still only had a single primary key, and queries by any other field would still be prohibitively slow
  • joins between tables would also be prohibitively slow

Part 2 of this series of articles will describe how we addressed these issues. Then part 3 will get into the details of performing large-scale database migrations without downtime.


Mozilla Addons Blog: Extensions in Firefox 83

Wed, 28/10/2020 - 16:30

In addition to our brief update on extensions in Firefox 83, this post contains information about changes to the Firefox release calendar and a feature preview for Firefox 84.

Thanks to a contribution from Richa Sharma, the error message logged when tabs.sendMessage is passed an invalid tab ID is now much easier to understand. It had regressed to a generic message due to an earlier refactoring.

End of Year Release Calendar

The end of 2020 is approaching (yay?), and as usual people will be taking time off and will be less available. To account for this, the Firefox Release Calendar has been updated to extend the Firefox 85 release cycle by 2 weeks. We will release Firefox 84 on 15 December and Firefox 85 on 26 January. The regular 4-week release cadence should resume after that.

Coming soon in Firefox 84: Manage Optional Permissions in Add-ons Manager

Starting with Firefox 84, currently available on the Nightly pre-release channel, users will be able to manage optional permissions of installed extensions from the Firefox Add-ons Manager (about:addons).


We recommend that extensions using optional permissions listen for the browser.permissions.onAdded and browser.permissions.onRemoved API events. This ensures the extension is aware of the user granting or revoking optional permissions.
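
A minimal sketch of such listeners, using the WebExtensions permissions API events named above:

// Log newly granted optional permissions and react to revocations.
browser.permissions.onAdded.addListener((permissions) => {
  // permissions.permissions and permissions.origins describe what was granted
  console.log('granted:', permissions.permissions, permissions.origins);
});
browser.permissions.onRemoved.addListener((permissions) => {
  // disable any functionality that relied on the revoked permissions here
});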



Will Kahn-Greene: Everett v1.0.3 released!

Wed, 28/10/2020 - 14:00
What is it?

Everett is a configuration library for Python apps.

Goals of Everett:

  1. flexible configuration from multiple configured environments

  2. easy testing with configuration

  3. easy documentation of configuration for users

From that, Everett has the following features:

  • is composable and flexible

  • makes it easier to provide helpful error messages for users trying to configure your software

  • supports auto-documentation of configuration with a Sphinx autocomponent directive

  • has an API for testing configuration variations in your tests

  • can pull configuration from a variety of specified sources (environment, INI files, YAML files, dict, write-your-own)

  • supports parsing values (bool, int, lists of things, classes, write-your-own)

  • supports key namespaces

  • supports component architectures

  • works with whatever you're writing--command line tools, web sites, system daemons, etc

v1.0.3 released!

This is a minor maintenance update that fixes a couple of minor bugs, addresses a Sphinx deprecation issue, drops support for Python 3.4 and 3.5, and adds support for Python 3.8 and 3.9 (largely adding those environments to the test suite).

Why you should take a look at Everett

At Mozilla, I'm using Everett for a variety of projects: Mozilla symbols server, Mozilla crash ingestion pipeline, and some other tooling. We use it in a bunch of other places at Mozilla, too.

Everett makes it easy to:

  1. deal with different configurations between local development and server environments

  2. test different configuration values

  3. document configuration options

First-class docs. First-class configuration error help. First-class testing. This is why I created Everett.

If this sounds useful to you, take it for a spin. It's a drop-in replacement for python-decouple and the os.environ.get('CONFIGVAR', 'default_value') style of configuration, so it's easy to try out.
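
As a rough sketch of what "drop-in" means here (based on Everett's documented ConfigManager API; see the documentation linked below for specifics):

import os
from everett.manager import ConfigManager

# plain-environ style:
debug = os.environ.get('DEBUG', 'false')

# Everett style: basic_config() pulls from the process environment
# (and a .env file, if present)
config = ConfigManager.basic_config()
debug = config('debug', default='false')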

Enjoy!

Where to go for more

For more specifics on this release, see here: https://everett.readthedocs.io/en/latest/history.html#october-28th-2020

Documentation and quickstart here: https://everett.readthedocs.io/

Source code and issue tracker here: https://github.com/willkg/everett


Wladimir Palant: What would you risk for free Honey?

Wed, 28/10/2020 - 11:17

Honey is a popular browser extension built by the PayPal subsidiary Honey Science LLC. It promises nothing less than preventing you from wasting money on your online purchases. Whenever possible, it will automatically apply promo codes to your shopping cart, thus saving your money without you lifting a finger. And it even runs a reward program that will give you some money back! Sounds great, what’s the catch?

With such offers, the price you pay is usually your privacy. With Honey, it’s also security. The browser extension is highly reliant on instructions it receives from its server. I found at least four ways for this server to run arbitrary code on any website you visit. So the extension can mutate into spyware or malware at any time, for all users or only for a subset of them – without leaving any traces of the attack like a malicious extension release.

Flies buzzing around an open honeypot, despite the fly swatter nearby. (Image credits: Honey, Glitch, Firkin, j4p4n)

The trouble with shopping assistants

Please note that there are objective reasons why it’s really hard to build a good shopping assistant. The main issue is how many online shops there are. Honey supports close to 50 thousand shops, yet I easily found a bunch of shops that were missing. Even the shops based on the same engine are typically customized and might have subtle differences in their behavior. Not just that, they will also change without an advance warning. Supporting this zoo is far from trivial.

Add to this the fact that with most of these shops there is very little money to be earned. A shopping assistant needs to work well with Amazon and Shopify. But supporting everything else has to come at close to no cost whatsoever.

The resulting design choices are the perfect recipe for a privacy nightmare:

  • As much server-side configuration as somehow possible, to avoid releasing new extension versions unnecessarily
  • As much data extraction as somehow possible, to avoid manual monitoring of shop changes
  • Bad code quality with many inconsistent approaches, because improving code is costly

I looked into Honey primarily due to its popularity: according to the product’s website, it is used by more than 17 million users. Given the above, I didn’t expect great privacy choices. And while I haven’t seen anything indicating malice, the poor choices made still managed to exceed my expectations by far.

Unique user identifiers

By now you are probably used to reading statements like the following in companies’ privacy statements:

None of the information that we collect from these events contains any personally identifiable information (PII) such as names or email addresses.

But of course a persistent semi-random user identifier doesn’t count as “personally identifiable information.” So Honey creates several of those and sends them with every request to its servers:

HTTP headers sent with requests to joinhoney.com

Here you see the exv value in the Cookie header: it is a combination of the extension version, a user ID (bound to the Honey account if any) and a device ID (locally generated random value, stored persistently in the extension data). The same value is also sent with the payload of various requests.

If you are logged into your Honey account, there will also be x-honey-auth-at and x-honey-auth-rt headers. These are an access and a refresh token respectively. It’s not that these are required (the server will produce the same responses regardless) but they once again associate your requests with your Honey account.

So that’s where this Honey privacy statement is clearly wrong: while the data collected doesn’t contain your email address, Honey makes sure to associate it with your account among other things. And the account is tied to your email address. If you were careless enough to enter your name, there will be a name associated with the data as well.

Remote configure everything

Out of the box, the extension won’t know what to do. Before it can do anything at all, it first needs to ask the server which domains it is supposed to be active on. The result is currently a huge list with some of the most popular domains like google.com, bing.com or microsoft.com listed.

Clearly, not all of google.com is an online shop. So when you visit one of the “supported” domains for the first time within a browsing session, the extension will request additional information:

Honey asking its server for shops under the google.com domain

Now the extension knows to ignore all of google.com but the shops listed here. It still doesn’t know anything about the shops however, so when you visit Google Play for example there will be one more request:

Google Play metadata returned by Honey server

The metadata part of the response is most interesting as it determines much of the extension’s behavior on the respective website. For example, there are optional fields pns_siteSelSubId1 to pns_siteSelSubId3 that determine what information the extension sends back to the server later:

Honey sending three empty subid fields to the server

Here the subid1 field and its companions are empty because pns_siteSelSubId1 is missing from the store configuration. Were it present, Honey would use it as a CSS selector to find a page element, extract its text and send that text back to the server. Good if somebody wants to know what exactly people are looking at.

Mind you, I only found this functionality enabled on amazon.com and macys.com, yet the selectors provided appear to be outdated and do not match anything. So is this some outdated functionality that is no longer in use and that nobody bothered removing yet? Very likely. Yet it could jump to life any time to collect more detailed information about your browsing habits.
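
Based on that description, the extraction step presumably amounts to something like the following sketch. The names are hypothetical, and as we’ll see below, the extension actually uses jQuery rather than native DOM APIs:

// Hypothetical sketch of the described extraction (not Honey's actual code)
function extractSubId(storeConfig) {
  const el = document.querySelector(storeConfig.pns_siteSelSubId1);
  return el ? el.textContent : '';  // sent back to the server as subid1
}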

The highly flexible promo code applying process

As you can imagine, the process of applying promo codes can vary wildly between different shops. Yet Honey needs to do it somehow without bothering the user. So while store configuration normally tends to stick to CSS selectors, for this task it will resort to JavaScript code. For example, you get the following configuration for hostgator.com:

Store configuration for hostgator.com containing JavaScript code

The JavaScript code listed under pns_siteRemoveCodeAction or pns_siteSelCartCodeSubmit will be injected into the web page, so it could do anything there: add more items to the cart, change the shipping address or steal your credit card data. Honey requires us to put lots of trust into their web server, isn’t there a better way?

Turns out, Honey actually found one. Allow me to introduce a mechanism internally labeled “DAC”, for reasons I haven’t yet been able to understand:

Honey requesting the DAC script to be applied

The acorn field here contains base64-encoded JSON data. It’s the output of the acorn JavaScript parser: an Abstract Syntax Tree (AST) of some JavaScript code. When reassembled, it turns into this script:

let price = state.startPrice;
try {
  $('#coupon-code').val(code);
  $('#check-coupon').click();
  setTimeout(3000);
  price = $('#preview_total').text();
} catch (_) { }
resolve({ price });

But Honey doesn’t reassemble the script. Instead, it runs it via a JavaScript-based JavaScript interpreter. This library is explicitly meant to run untrusted code in a sandboxed environment. All one has to do is make sure that the script only gets access to safe functionality.

But you are wondering what this $() function is, aren’t you? It almost looks like jQuery, a library that I called out as a security hazard on multiple occasions. And indeed: Honey chose to expose full jQuery functionality to the sandboxed scripts, thus rendering the sandbox completely useless.

Why did they even bother with this complicated approach? Beats me. I can only imagine that they had trouble with shops using Content Security Policy (CSP) in a way that prohibited execution of arbitrary scripts. So they decided to run the scripts outside the browser where CSP couldn’t stop them.

When selectors aren’t actually selectors

So if the Honey server turned malicious, it would have to enable Honey functionality on the target website, then trick the user into clicking the button to apply promo codes? It could even make that attack more likely to succeed because some of the CSS code styling the button is conveniently served remotely, so the button could be made transparent and spanning the entire page – the user would be bound to click it.

No, that’s still too complicated. Those selectors in the store configuration, what do you think: how are these turned into actual elements? Are you saying document.querySelector()? No, guess again. Is anybody saying “jQuery”? Yes, of course Honey uses jQuery in its extension code as well! And that means that every selector could potentially be booby-trapped.

In the store configuration pictured above, the pns_siteSelCartCodeBox field has the selector #coupon-code, [name="coupon"] as its value. What if the server replaces the selector with <img src=x onerror=alert("XSS")>? Exactly, this will happen:

An alert message saying “XSS”

This message actually appears multiple times because Honey will evaluate this selector a number of times for each page. It does that for any page of a supported store, unconditionally. Remember that whether a site is a supported store or not is determined by the Honey server. So this is a very simple and reliable way for this server to leverage its privileged access to the Honey extension and run arbitrary code on any website (Universal XSS vulnerability).
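
The underlying issue is a well-known jQuery footgun: if a string passed to $() looks like HTML, jQuery creates the corresponding elements rather than querying for them. A minimal sketch of the difference (assuming jQuery is loaded on the page):

// attacker-controlled "selector" received from the server
const selector = '<img src=x onerror=alert("XSS")>';

$(selector);                      // jQuery parses this as HTML: the image fails
                                  // to load and the onerror handler runs

document.querySelector(selector); // would merely throw a SyntaxError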

How about some obfuscation?

Now we have simple and reliable, but isn’t it also too obvious? What if somebody monitors the extension’s network requests? Won’t they notice the odd JavaScript code?

That scenario is rather unlikely actually, e.g. if you look at how long Avast has been spying on their users with barely anybody noticing. But Honey developers are always up to a challenge. And their solution was aptly named “VIM” (no, they definitely don’t mean the editor). Here is one of the requests downloading VIM code for a store:

A request resulting in base64-encoded data in the mainVim field and more in the subVims object

This time, there is no point decoding the base64-encoded data: the result will be binary garbage. As it turns out, the data here has been encrypted using AES, with the start of the string serving as the key. But even after decrypting you won’t be any wiser: the resulting JSON data has all key names replaced by numeric indices and values are once again encrypted.

You need the following script to decrypt the data (requires CryptoJS):

const keys = [
  "alternate", "argument", "arguments", "block", "body", "callee", "cases",
  "computed", "consequent", "constructor", "declaration", "declarations",
  "discriminant", "elements", "expression", "expressions", "finalizer",
  "handler", "id", "init", "key", "kind", "label", "left", "method", "name",
  "object", "operator", "param", "params", "prefix", "properties", "property",
  "quasi", "right", "shorthand", "source", "specifiers", "superClass", "tag",
  "test", "type", "update", "value"
];

function decryptValue(obj) {
  if (Array.isArray(obj))
    return obj.map(decryptValue);
  if (typeof obj != "object" || !obj)
    return obj;

  let result = {};
  for (let key of Object.keys(obj)) {
    let value = obj[key];
    if (key.startsWith("_"))
      key = keys[parseInt(key.substr(1), 10)];
    if (typeof value == "string")
      value = CryptoJS.AES.decrypt(value.slice(1), value[0] + "+" + key)
                      .toString(CryptoJS.enc.Utf8);
    else
      value = decryptValue(value);
    result[key] = value;
  }
  return result;
}

var data = "<base64 string here>";
data = JSON.parse(CryptoJS.AES.decrypt(data.slice(10), data.slice(0, 10))
                          .toString(CryptoJS.enc.Utf8));
console.log(decryptValue(data));

What you get is once again the Abstract Syntax Tree (AST) of some JavaScript code. The lengthy chunks of JavaScript code are for example categorizing the pages of a shop, determining what kind of logic should apply to these. And the sandboxing is once again ineffective, with the code being provided access to jQuery for example.

So here is a mechanism, providing the server with a simple way to run arbitrary JavaScript code on any website it likes, immediately after the page loads and with sufficient obfuscation that nobody will notice anything odd. Mission accomplished?

Taking over the extension

Almost. So far we were talking about running code in the context of websites. But wouldn’t running code in the context of the extension provide more flexibility? There is a small complication: the Content Security Policy (CSP) mechanism disallows running arbitrary JavaScript code in the extension context. At least that’s the case with the Firefox extension, due to the Mozilla Add-ons requirements; on Chrome, the extension simply relaxed the CSP protection.

But that’s not really a problem of course. As we’ve already established, running the code in your own JavaScript interpreter circumvents this protection. And so the Honey extension also has VIM code that will run in the context of the extension’s background page:

The extension requesting VIM code that will run in the background page

It seems that the purpose of this code is extracting user identifiers from various advertising cookies. Here is an excerpt:

var cs = {
  CONTID: {
    name: 'CONTID',
    url: 'https://www.cj.com',
    exVal: null
  },
  s_vi: {
    name: 's_vi',
    url: 'https://www.linkshare.com',
    exVal: null
  },
  _ga: {
    name: '_ga',
    url: 'https://www.rakutenadvertising.com',
    exVal: null
  },
  ...
};

The extension conveniently grants this code access to all cookies on any domains. This is only the case on Chrome however, on Firefox the extension doesn’t request access to cookies. That’s most likely to address concerns that Mozilla Add-ons reviewers had.

The script also has access to jQuery. With the relaxed CSP protection of the Chrome version, this allows it to load any script from paypal.com and some other domains at will. These scripts will be able to do anything that the extension can do: read or change website cookies, track the user’s browsing in arbitrary ways, inject code into websites or even modify server responses.

On Firefox the fallout is more limited. So far I could only think of one rather exotic possibility: add a frame to the extension’s background page. This would allow loading an arbitrary web page that would stay around for the duration of the browsing session while being invisible. This attack could be used for cryptojacking for example.

About that privacy commitment…

The Honey Privacy and Security policy states:

We will be transparent with what data we collect and how we use it to save you time and money, and you can decide if you’re good with that.

This sounds pretty good. But if I still have you here, I want to take a brief look at what this means in practice.

As the privacy policy explains, Honey collects information on availability and prices of items with your help. Opening a single Amazon product page results in numerous requests like the following:

Honey transmitting data about Amazon products to its server

The code responsible for the data sent here is only partly contained in the extension, much of it is loaded from the server:

Obfuscated VIM code returned by Honey server

Yes, this is yet another block of obfuscated VIM code. That’s definitely an unusual way to ensure transparency…

On the bright side, this particular part of Honey functionality can be disabled. That is, if you find the “off” switch. Rather counter-intuitively, this setting is part of your account settings on the Honey website:

The relevant privacy setting is labeled Community Hero

Don’t know about you, but after reading this description I would be no wiser. And if you don’t have a Honey account, it seems that there is no way for you to disable this. Either way, from what I can tell this setting won’t affect other tracking like pns_siteSelSubId1 functionality outlined above.

On a side note, I couldn’t fail to notice one more interesting feature not mentioned in the privacy policy. Honey tracks ad blocker usage, and it will even re-run certain tracking requests from the extension if blocked by an ad blocker. So much for your privacy choices.

Why you should care

In the end, I found that the Honey browser extension gives its server very far reaching privileges, but I did not find any evidence of these privileges being misused. So is it all fine and nothing to worry about? Unfortunately, it’s not that easy.

While the browser extension’s codebase is massive and I certainly didn’t see all of it, it’s possible to make definitive statements about the extension’s behavior. Unfortunately, the same isn’t true for a web server that one can only observe from outside. The fact that I only saw non-malicious responses doesn’t mean that it will stay that way in the future or that other people will have the same experience.

In fact, if the server were to invade users’ privacy or do something outright malicious, it would likely try to avoid detection. One common way is to only do it for accounts that accumulated a certain amount of history. As security researchers like me usually use fairly new accounts, they won’t notice anything. Also, the server might decide to limit such functionality to countries where litigation is less likely. So somebody like me living in Europe with its strict privacy laws won’t see anything, whereas US citizens would have all of their data extracted.

But let’s say that we really trust Honey Science LLC given its great track record. We even trust PayPal who happened to acquire Honey this year. Maybe they really only want to do the right thing, by any means possible. Even then there are still at least two scenarios for you to worry about.

The Honey server infrastructure makes an extremely lucrative target for hackers. Whoever manages to gain control of it will gain control of the browsing experience for all Honey users. They will be able to extract valuable data like credit card numbers, impersonate users (e.g. to commit ad fraud), take over users’ accounts (e.g. to demand ransom) and more. Now think again how much you trust Honey to keep hackers out.

But even if Honey had perfect security, they are also a US-based company. And that means that at any time a three letter agency can ask them for access, and they will have to grant it. That agency might be interested in a particular user, and Honey provides the perfect infrastructure for a targeted attack. Or the agency might want data from all users, something that they are also known to do occasionally. Honey can deliver that as well.

And that’s the reason why Mozilla’s Add-on Policies list the following requirement:

Add-ons must be self-contained and not load remote code for execution

So it’s very surprising that the Honey browser extension in its current form is not merely allowed on Mozilla Add-ons but also marked as “Verified.” I wonder what kind of review process this extension went through, given that none of the remote code execution mechanisms were detected.

Edit (2020-10-28): As Hubert Figuière pointed out, extensions acquire this “Verified” badge by paying for the review. All the more interesting to learn what kind of review has been paid here.

While Chrome Web Store is more relaxed on this front, their Developer Program Policies also list the following requirement:

Developers must not obfuscate code or conceal functionality of their extension. This also applies to any external code or resource fetched by the extension package.

I’d say that the VIM mechanism clearly violates that requirement as well. As I have yet to discover a working mechanism for reporting violations of Chrome’s Developer Program Policies, it remains to be seen whether this will have any consequences.


Patrick Cloke: django-render-block 0.8 (and 0.8.1) released!

Tue, 27/10/2020 - 21:57

A couple of weeks ago I released version 0.8 of django-render-block, this was followed up with a 0.8.1 to fix a regression.

django-render-block is a small library that allows you to render a specific block from a Django (or Jinja) template; this is frequently used for emails when …
