This is the start of the Synthetic channel, created by loleg on January 22, 2022. Any member can join and read this channel. This channel's purpose is: Synthetic data generated from computer simulations or algorithms provides an inexpensive alternative to real-world data that's increasingly used to create accurate AI models.
loleg 14:15
Census.gov
Example datasets:
GitHub
A hands-on tutorial showing how to use Python to do anonymisation with synthetic data - GitHub - theodi/synthetic-data-tutorial: A hands-on tutorial showing how to use Python to do anonymisation wi...
See also introductory article
AIcrowd | NeurIPS 2019 : Disentanglement Challenge | Challenges
Disentanglement: from simulation to real-world
loleg 23:13
Not to be confused with synthetic biology, here we are talking about data that "are not obtained by direct measurement". This is one of the many excellent topics that were debated by students participating in the University of Bern 3 day mini-course for Open Government Data that I had the pleasure of supporting for the second time.
"Crowdsourcing A.I. to solve real world problems" (AIcrowd slogan)
January 24, 2022
10:43
@jurek joined the channel.
loleg 17:44
:wave: welcome @jurek !
January 25, 2022
loleg 11:53
January 28, 2022
loleg 07:50
A few months ago we revealed we were working on something new. The Block Protocol is now in public draft, and an early preview of our new product, HASH, is available for download on GitHub under an open-source license.
January 31, 2022
loleg 21:11
Train machine learning models on your dataset and generate synthetic data that is statistically equivalent. Get started free with a single click.
February 17, 2022
loleg 09:43
The Economist
OpenMined is an open-source community whose goal is to make the world more privacy-preserving by lowering the barrier-to-entry to private AI technologies.
February 18, 2022
loleg 09:46
Curated Health
💻 Synthetic data has been defined as “the generation of artificial data with the aim of reproducing the statistical properties of an original dataset” and it intends to capture the original ...
February 25, 2022
loleg 07:20
Generate massive amounts of fake (but reasonable) data for testing and development.
loleg 20:05
A separate, but related, proposal aims to address “synthetic content,” an umbrella term encompassing fake news, synthetic audio, and deepfake images and videos, in which a person’s face is stitched onto someone else using AI. Among the provisions, makers of deepfake software would be required to verify the real names of creators and “conspicuously label” any deepfaked content. Deepfake apps, popular in China, have stirred public debate over privacy and ownership of personal data. https://www.wired.com/story/china-regulate-ai-world-watching/?mc_cid=1e1ad5e9ba&mc_eid=6ebf3bae07
February 26, 2022
loleg 19:12
...For all its benefits, open data also carries risk. Open data can certainly accelerate AI development, but using massive public datasets to train models can unintentionally undermine privacy or perpetuate encoded biases. Even the pioneering ImageNet data faced some of these risks as creators removed people-related categories and blurred individuals’ faces to try to protect their privacy.7 For open datasets released by the public sector, government leaders should be cognizant of the risks and take steps to ensure that open data offers a safe path to future AI. https://www2.deloitte.com/us/en/insights/industry/public-sector/open-data-ai-explainable-trustworthy.html
Deloitte Insights
Government leaders need to be cognizant of the risks of using open data to train AI and ensure a safe path to bring benefits to citizens.
March 06, 2022
loleg 16:31
Created for AI Day events in Bangkok (Apr 21, 2018) and Singapore (Apr 29, 2018), this tutorial is based on: TensorFlow.js Introduction to the TensorFlow.js core API Machine Learning in JavaScript (TensorFlow Dev Summit 2018) TensorFlow.js is a general purpose, WebGL-accelerated numerics platform...
April 06, 2022
loleg 10:34
GitHub
SynPopToolbox is a Python framework designed for analysis, visualization and manipulation of a synthetic population produced by the land-use simulation software FaLC (https://github.com/falc-sim-or...
loleg 22:44
“This is in opposition to data where we learn on real datasets, of real people, often collected without consent and then make "synthetic data" that often leaks private information! See: https://t.co/D7idLIN9EF for more.”
May 19, 2022
loleg 10:01
GitHub
Python Data Anonymization & Masking Library For Data Science Tasks - GitHub - ArtLabss/open-data-anonymizer: Python Data Anonymization & Masking Library For Data Science Tasks
June 01, 2022
loleg 11:20
Unity Blog
June 27, 2022
loleg 07:31
Your agent will be evaluated on 1000 episodes and will have a total available time of 48 hours to finish. Your submissions will be evaluated on AWS EC2 p2.xlarge instance which has a Tesla K80 GPU (12 GB Memory), 4 CPU cores, and 61 GB RAM. If you need more time/resources for evaluation of your submission please get in touch.
AI Habitat
Overview In 2022, we are hosting the ObjectNav challenge in the Habitat simulator 1. ObjectNav focuses on egocentric object/scene recognition and a commonsense understanding of object semantics (where is a bed typically located in a house?). For details on how to participate, submit and train age...
July 16, 2022
loleg 05:58
“ICYMI: Machine learning models! On Living with Machines we are grappling with large and varied digitised collections. These include textual sources, maps, census records etc. One way in which we can try and manage this scale and complexity is through machine learning. 🧵”
August 17, 2022
loleg 16:38
loleg 20:25
GitHub
OpenBytes, a Sandbox Project in the LF AI & Data Foundation., aims to bring transformational changes to AI by making open datasets more available & accessible. - OpenBytes
September 07, 2022
loleg 14:14
Unity Blog
November 16, 2022
14:30
@loleg updated the channel header to:
See also ~Deidentification
loleg 14:32
loleg 22:31
Confidential Computing Consortium
The Confidential Computing Consortium is focused on projects securing data in use and accelerating the adoption of confidential computing.
December 16, 2022
loleg 18:52
GitHub
Random fake data generator written in go. Contribute to brianvoe/gofakeit development by creating an account on GitHub.
January 09
loleg 15:58
Our privacy-preserving synthetic data platform help unlock your data's potential and give you the freedom to use and share data with confidence. Learn more
January 12
loleg 21:21
GitHub
Sharing sensitive data is vital in enabling many modern data analysis and machine learning tasks. However, current methods for data release are insufficiently accurate or granular to provide meaningful utility, and they carry a high risk of deanonymization or membership inference attacks. In this...
loleg 22:27
Zero-Knowledge Proofs
What are they, how do they work, and are they fast yet?
6 Designing Access with Differential Privacy | Handbook on Using Administrative Data for Research and Evidence-based Policy
February 04
loleg 22:59
GitLab
With Satya’s copilot announcements at Microsoft Ignite in the rear-view mirror, it’s a good time to talk more about the kind of work and creative thinking that made it possible. If you aren’t already familiar with the new ways to innovate with AI, such as the AI-based copilot to build your flow i...
loleg:
Open data initiatives are increasing everywhere. These are often driven by the public sector aiming to promote data availability to trigger innovation and event
forum.opendata.ch
Session #2 room C.02 @ Opendata.ch/2019 Participants: ca. 4 Adapted from the original protocol in German of “Was macht das #eID-Konsortium mit meinen Daten?”. Participants’ experiences on the topic Board. Community. Developer. The question of electronic identity is hotly discussed at the Sw...
December 16, 2021
loleg:
ProtonMail Blog
Knowledge is power. Today, we're launching Privacy Decrypted, a new initiative to arm activists with essential knowledge.
January 24, 2022
System
:
10:41
@jurek joined the channel.
April 28, 2022
loleg:
"Upload your unredacted sensitive documents and our magic AI will remove PII for you!"
https://news.ycombinator.com/item?id=25275637 Amnesia – High-Accuracy Data Anonymization (openaire.eu)
https://news.ycombinator.com/item?id=20811372 Presidio: Customizable data protection and PII data anonymization
GitHub
Contribute to psal/anonymouth development by creating an account on GitHub.
Almost Secure
Analyzing a sample of Jumpshot data confirms the suspicion that Avast did indeed sell personally identifiable data of their users, lots of it.
“@ZimMatthias @mattlungrenMD @AureliaAugusta @adjiboussodieng @codyaustun @JvNixon @andrey_kurenkov We get the pretrained checkpoint here! Creds to the LSC-CNN folks for releasing their weights. It should be a google drive link if you navigate their README. https://t.co/TSVmE9mtnh”
Weblaw AG
Sind «Namen Schall und Rauch»? Vermutlich sind sie dies eher nicht, vielmehr aufgeladen mit Bedeutungen, Zuschreibungen und Erwartungen, verbürgen Bonität oder das Gegenteil davon, all dies unter spezifischen Rahmenbedingungen. Das pseudonyme Verwenden eines anderen Namens kann es ermöglichen, di...
November 16, 2022
loleg:
Confidential Computing Consortium
The Confidential Computing Consortium is focused on projects securing data in use and accelerating the adoption of confidential computing.
System
:
@loleg updated the channel header to:
See also ~Synthetic
loleg:
GitHub
Set of tools for helping with data (in FHIR format) anonymization. - GitHub - microsoft/Tools-for-Health-Data-Anonymization: Set of tools for helping with data (in FHIR format) anonymization.
The free, extendable, open source data anonymization tool.
Image Pasted at 2022-11-16 14-33.png
@jurek will you care to join an upcoming hackathon? This topic is really heating up here in Bern.
November 25, 2022
jurek:
In principle sure :slightly_smiling_face:. Although, I am quite busy at the moment. Are you at REJOHA? If yes maybe we could talk there.
loleg:
As usually am distracted by hundreds of things at a hackathon, only saw your reply now :slightly_smiling_face:
December 20, 2022
loleg:
Generates synthetic data and user interfaces for privacy-preserving data sharing and analysis. - GitHub - microsoft/synthetic-data-showcase: Generates synthetic data and user interfaces for privacy...
March 09
loleg:
Our anonymisation course will introduce you to the latest thinking on the topic. Through practical exercises, we will develop your understanding.